This year’s WiDS Datathon Excellence in Research Award (Phase II) broadened participants’ focus to examine the impact of climate change. Over a period of three months, participants explored a domain related to their chosen dataset and task and composed a short research paper. Thank you to our partners at the US Environmental Protection Agency (EPA), MIT Critical Data, and Climate Change AI for providing datasets and mentoring participants throughout their exploration.
The WiDS Datathon Phase II drew participants from more than 39 countries. Teams that submitted papers collaborated across Antigua, Canada, France, India, Nigeria, Peru, Saudi Arabia, Spain, Trinidad and Tobago, and the US. Nearly 88% of participants reported that this was their first time participating in Phase II, and many of them consider themselves beginners. Participants said that the research experience sparked their curiosity and gave them new experience working on a data science research project, where they gained skills to apply to their respective career paths.
Subject matter experts from the WiDS Datathon Committee and partners evaluated the research papers on their potential for real-world impact, rigor in scientific methodology, and clarity of communication.
Below, the winners share stories about their experience and invite you to read their papers.
Winners
Best in CCAI Track:
Predicting building gas consumption in France: impact of census variables
by Beatriz Mora and Adetomiwa Adedeji
From Spain to Nigeria, Beatriz and Adetomiwa collaborated across countries to compete in the WiDS Datathon. Beatriz, who holds a PhD, was a data scientist before transitioning to teaching computer science. Adetomiwa is a machine learning engineer by profession and participates in several Kaggle competitions.
Both Beatriz and Adetomiwa are driven to learn new things, which is what attracted them to the datathon. When asked about their experience, they said that without a leaderboard or a fixed prediction task in Phase II, you have to “challenge yourself to be better than yourself, rather than others”.
Best in EPA Track:
Data-driven Investigation on Impact of Air Quality in Different Demographics
by Areerat Kichkha, Irina Amari, Jaelin Lee
Areerat, Irina, and Jaelin came together to participate in the WiDS Datathon Phase II across Ottawa, Canada; St. Louis, Missouri; and New York, NY. Jaelin, Director at dbNCR, is pursuing entrepreneurship in AI and a PhD, and wanted the experience of writing a data science research paper. Areerat, an economist and Executive Director at AIRLEAP.ORG, is networking and connecting data science methods with topics in economics and the environmental sciences. Irina, Data Scientist at Netsas, has extensive business experience in data-related roles and a passion for AI and ML. She participated in Phase II to improve her research skills.
Jaelin says that she liked having time for a slower learning process that focused on iteration, trying new things, and getting feedback. Areerat enjoyed getting a sense of the entire data science pipeline, which left room for creativity. Irina enjoyed the opportunity to collaborate with data scientists from different backgrounds to come up with a solution to a complex environmental problem.
Best in MIT Track:
Determinants of Medical Resources in the U.S.: a County-level Econometric Analysis
by Yunxin Wu, Xu Wang, Daniela Alvarez
Xu, Yunxin, and Daniela collaborated in the WiDS Datathon Phase II. Xu and Yunxin have known each other for 20 years. Xu is pursuing his PhD in economics, and Yunxin is developing her career in the domains of technology, research, data science, and business management, and aspires to further her career in interdisciplinary research. When Yunxin met Daniela during the Datathon office hours, they found a common interest in exploring the COVID track. Daniela is a self-taught data scientist from Peru who was eager to participate in Phase II as her first experience in research.
The team collaborated well. Xu led the project by applying his PhD knowledge and tools to formalize the research question, design empirical strategies, perform statistical analysis, and professionally present the findings. Yunxin researched external datasets, helped select and focus the research topic, used data science techniques, and customized non-conventional variables. She appreciated the opportunity to practice academic writing, data analysis, visualization, and augmentation, as well as networking and learning different coding habits, styles, and version control methods. Daniela contributed to data cleaning, exploration, and preliminary analysis using machine learning tools. She also landed a new data science job during the datathon, and this experience positively impacted her career direction. She says,
“You don’t need to be a machine learning engineer or a data scientist to contribute, and everyone can contribute with their expertise and learn machine learning skills to help within their own fields” — Daniela Alvarez
The team believes that research is not all about prediction accuracy; it is about quantitative thinking. It was a great learning experience that offered both research skills and opportunities for teamwork and collaboration. Xu realizes that collaborating on a project requires considerable soft skills and team camaraderie, and that participating in the WiDS Datathon Phase II was “good practice making mistakes and a great sandbox to learn.”
Honorable Mentions
Best Young Researcher:
Automatic PM2.5 Prediction Including Feature Engineering and Model Tuning
by Olivia Zhao
Olivia Zhao, a rising senior at Chapel Hill High School, had a small amount of data science exposure through her high school course. When a teacher at her school shared the opportunity, the WiDS Datathon Phase II piqued her interest, as she would have more freedom to explore the data rather than being set on a specific path. It was indeed a challenge, and it felt daunting at first.
Olivia persisted by looking at examples of prior Phase II papers. She was intrigued by the EPA dataset, as it related to the topics she was learning in her earth science class. She wanted to learn how people worked with this pollutant in data science and how data science can be applied to real-life questions, rather than simply to numbers and tables.
To Olivia, the WiDS Datathon Phase II clarified how people use data science in real life. She is now considering data science as a career path to pursue.
Best in Data Science Novice:
You Have to Pay Attention to the Road Even if You Don’t Know Where You’re Going
by Christine Kuta, Aissatou Cissoko Diallo, Drashti B. Desai
Christine, Aissatou, and Drashti collaborated across Massachusetts and New Jersey. Each brought a distinct expertise to the table: Aissatou was the programmer, Drashti was the data visualizer, and Christine was the engineer.
New to data science, both Aissatou and Drashti were initially hesitant to participate but persevered. Aissatou says, “We saw where we were struggling, and what we had to do to overcome it. As a newbie you are always shy, but you can get over it. People will catch you when you need it. Even when it is intimidating, you can do it.” Drashti adds, “There is a notion that you cannot do it. You have to learn by going, and by doing it.”
They were pleased with their outcome and learned a lot from each other. Aissatou says that the datathon helped her bridge the gap between being a confused new graduate and being confident enough to apply for an entry-level data scientist role and feel ready for it, saying, “I am no longer afraid of data science”. Drashti applied what she learned in Phase II to her master’s degree in business analysis. She realizes that data science is more than coding, modeling, and predictive analysis; it is also about gaining insights. Christine has been studying data science as an addition to her existing knowledge and skills, and her WiDS experience has helped her gain a new level of confidence in exploring data science opportunities.
The team, which met virtually through WiDS, is now working together on a new project: an analysis of economic factors and the impact of the environment.
Best Statistical Analysis:
An evaluation of hazardous air pollutants in diverse demographic groups across United States
by Lesley Chapman Hannah
Lesley Chapman Hannah learned about the WiDS Datathon through a notice sent out by her graduate department at American University. Lesley does research for a living. As a postdoc at the National Cancer Institute (NCI), she studies health-related questions and applies machine learning to research questions. When she was deciding which WiDS Datathon Phase II track to pursue, the EPA dataset seemed to be a natural extension of the WiDS Datathon focus on climate change.
Lesley’s interests aligned with this year’s Phase II focus on open-ended exploration. She enjoys looking at every aspect of a project, researching, and applying a chosen method. She had been wanting to try out her statistical approach in other areas, so the WiDS Datathon Phase II was a great place to start. Lesley has enjoyed learning and gaining new skills this year through the WiDS Datathon 2022.
Best in Mixed Expertise Team:
Improving the Energy Consumption of Buildings Using Machine Learning
by Juana Martin Gonzalez, Sayantica Pattanayak, Priya Shivaani Chauhan
Juana, Sayantica, and Priya collaborated across time zones to compete together in the WiDS Datathon Phase II. Sayantica, Assistant Professor at the University of St. Thomas, loves participating in Kaggle competitions and teaches deep learning. She was motivated to participate in Phase II to do more research work and experiment with different models. Juana, an undergraduate student in physics, saw Phase II as an opportunity to learn Python and build statistical skills. She is considering pursuing a PhD and was eager to use the Phase II experience to see what research is like. Priya, Senior Director of External Relations at First Source, aspires to pursue a master’s in Data Science at UC Berkeley. She previously did research for the UC Davis Department Chair, and was one of seven recipients at the 22nd National Conference on Undergraduate Research.
Juana says, “When it comes to research it doesn’t happen overnight”. There are so many questions and directions you could take with a project that you really need to take the time to work with a team and think in depth about your choices. Sayantica says that working in teams and having people ask you questions improves your skills, and that research constantly pushes you to ask yourself questions. As beginners in writing research papers, both Juana and Priya say that focusing on the learning process during the WiDS Datathon Phase II has improved their skill sets.
Congratulations to everyone who participated in the WiDS Datathon Excellence in Research opportunity this year! We are already in the planning stages for what’s to come next year. Stay tuned this fall for updates and announcements about the WiDS Datathon 2023!