WiDS Posts | May 5, 2021

Learn from WiDS Datathon 2021 Winners, including a Kaggle Grandmaster

After the WiDS Datathon 2021 leaderboard closed, WiDS Datathon Committee Member Maggie Demkin of Kaggle arranged a series of calls with the first-, second-, and third-place winners to learn more about their experiences and approaches to the competition. Recordings of these conversations are now available so that everyone can learn from the winners.


First-place winner Kim Montgomery is a Kaggle Grandmaster and a data scientist at H2O.ai with a PhD in applied mathematics. Kim has worked in mathematical biology and is currently working on applications of machine learning in the electrical utility industry. Erin LeDell, the other first-place winner, is the Chief Machine Learning Engineer at H2O.ai. Erin has a PhD in Biostatistics and founded the Women in Machine Learning & Data Science (WiML/DS) meetup organization.

Kim and Erin describe their final model, a stacked ensemble of one LightGBM model, two XGBoost models, and two AutoML models, as well as their feature engineering. In her presentation, Kim does a deep dive into her research on the dataset and describes several creative approaches to feature engineering. Although their model was complex, Erin explains that it doesn't have to be, noting, "If you want to just use one of the constituent models, […] you could make something simpler with pretty close accuracy."

Learn from Kim and Erin’s winning approach:

The second-place winners were Mahsa Shoaran, Masoud Farivar, Cong Ding, and Hossein Abedi, from the École polytechnique fédérale de Lausanne, or EPFL, a research institute and university in Lausanne, Switzerland, that specializes in natural sciences and engineering.

Mahsa Shoaran is an assistant professor at EPFL in both Electrical Engineering and Neuroprosthetics and the recipient of the Google Faculty Research Award in Machine Learning. Her research interests lie at the intersection of machine learning and neural interfaces. Before joining EPFL, Mahsa was at Cornell; prior to that, she completed her postdoc at Caltech, where she and Masoud met. Masoud Farivar is a senior data scientist at the Swiss Data Science Center (SDSC) and a scientist at EPFL. He previously worked at Google after completing his PhD at Caltech.

Cong Ding is a PhD student in Biotechnology and Bioengineering at EPFL in Geneva. Cong received her master's degree from Tsinghua University in Beijing and has also hosted Kaggle competitions. Hossein Abedi was a senior data scientist at ProAI and is an incoming PhD student at EPFL. He received a master's degree from Amirkabir University and is a Kaggle Expert.

Three of the four members of this team competed in the WiDS Datathon 2020, finishing in fifth place that year. They talk about their journey as a team and describe their use of gradient boosting methods, their feature selection and engineering, and the additional strategies they used to develop their model.

Hear how the EPFL team built their model:

The WiDS Datathon third place winners were Maya Saghiv, Liz Vaknin, Or Katz, and Pavel Vodolazov, all from Israel.

Maya Saghiv is an industrial engineer with an MBA who has worked in industrial engineering, operations research, and optimization. Maya also received her data scientist certification from the Technion (Israel Institute of Technology). Liz Vaknin is a master's student in Electrical Engineering as well as a deep learning researcher at NEC, specializing in medical engineering.

Or Katz is an electrical engineer who also works at NEC, as a deep learning researcher in computer vision. Or is also a Kaggle Competitions Master and Notebooks Master. Pavel Vodolazov works as a data scientist at NEC after receiving his master's degree in financial mathematics and working as a data scientist in financial services and telecom.

The third-place team discussed their approach to feature selection and engineering, the training methods they used, and several techniques for data imputation. They included dynamic variables in their feature engineering and reported several notable findings from their final model. The team also submitted a research paper for the WiDS Datathon Excellence in Research Award, whose winners will be announced later this summer.

Listen to techniques used by this winning team:

Congratulations to all who participated in the WiDS Datathon 2021, with a special thanks to all who shared their work and their stories.