Tag

Finance

Poster Blitz

In our first-ever poster blitz at the 2024 WiDS Worldwide, Stanford conference, participants summarized their research and projects on stage in 60 seconds or less. Presenters ranged from middle schoolers to postdocs, a diverse and talented group.

WiDS Datathon 2024 Challenge #1 Winners

Three ways to enrich your data science career with the WiDS Datathon 2024 experience

Triple Negative Breast Cancer Datathon – A Tutorial

Working with the WiDS Datathon dataset over the past week has been a thrilling exercise. This dataset presents an opportunity to learn about interesting, real-world modeling challenges and differs from the curated datasets found in textbooks and classic machine learning exercises. For that reason, I discuss some of the challenges you may encounter around missing data, multicollinearity, and linear/nonlinear approaches. I will also provide resources to help you with these topics.

Fighting Crypto Crime with Data Science

Kim is the Director of Research at Chainalysis, where she examines trends in cryptocurrency economics and crime. She was trained in economics at the London School of Economics and in politics at Oxford University. Previously, she explored technological advancements in developing countries as an academic research associate at the London School of Economics and was an economics researcher at the New York City Economic Development Corporation.

Mentorship, Data Ethics, and Leadership

Kate Kolich serves as the Assistant Governor and the General Manager of Information Data and Analytics at the Reserve Bank of New Zealand.
In our episode, we discuss Kate’s role at the Reserve Bank, the role of her team, highlights from her career, and her insights on being a successful woman leader in her field.

The Bridge Between Dance and Data Science

Veronica Edwards is a dancer and choreographer who, after finding that physics was not in fact the major she wanted, studied sociology at Princeton University. In her words, she stumbled into her first job after graduation, which was with the company ReadWorks, and recently joined Polygence as a senior data analyst. In this podcast episode, she talks about Polygence, her wide-ranging educational and career background, the advice she has for others, and how dance has been an important part of her life and learning.

Using AI to Fight the Climate Crisis

Priya Donti, Executive Director of Climate Change AI, explains multiple ways that machine learning and AI can be used to mitigate climate change. Her work at the intersection of climate change, computer science, and data science led her to co-found Climate Change AI, our partner for the WiDS Datathon 2023 challenge to improve long-term weather forecasting.

Teaching and Learning Data Science in Latin America

Lesly Zerna, Google Developer Expert at Universidad Privada Boliviana, is a Bolivian researcher, teacher, mentor, content creator, and software developer. She shares her passion for learning and teaching data science and artificial intelligence. Lesly says learning works best when you know your personal learning style and apply that to practical topics you are interested in.

Data in Seismology and Genomics Research

Finding new ways to collect data – and a willingness to share it – are the hallmarks of a career in academia, according to Eileen Martin and Nila Monnier Ioannidis, when they were at Stanford, as a PhD student and postdoc, respectively. Now, Eileen is an Assistant Professor at Virginia Tech, moving to become an Assistant Professor at Colorado School of Mines in January 2022. Nila is an Assistant Professor at UC Berkeley.

The Multifaceted World of Data Storage

Janet George spoke to us when she was Western Digital’s chief data officer and first female Fellow. In this episode Janet explains that from manufacturing to product development, data science plays an important role in the storage industry. Janet is now the Group Vice President, Autonomous Enterprise, Advanced Analytics, Machine Learning & Artificial Intelligence at Oracle.

Eliminating Bias

Jennifer Chayes spoke to us when she was a technical fellow and managing director at Microsoft Research. She believes data scientists should build algorithms with Fairness, Accountability, Transparency, and Ethics – or FATE. Jennifer now serves as Associate Provost and Dean at the University of California at Berkeley (UCB).

WiDS Welcome | Margot Gerritsen | WiDS Stanford 2023

Margot Gerritsen, Professor Emerita & WiDS Executive Director, Stanford University, opens WiDS Stanford 2023.

Biography:
Margot is Professor [Emerita] in the Department of Energy Science & Engineering at Stanford. Her specialties are data analysis, computer simulation and mathematical analysis of natural and engineering processes. From 2010 to 2018, she directed the Institute for Computational and Mathematical Engineering. From 2015 to 2020, she was the Senior Associate Dean for Educational Affairs in the School of Earth, Energy and Environmental Sciences. She co-founded WiDS in 2015. Margot is the Executive Director of WiDS Worldwide, and co-hosts the WiDS Podcast series.

Opening Address | Srinija Srinivasan | WiDS Stanford 2023

Srinija Srinivasan, Co-Founder, Loove, opens WiDS Stanford 2023.

Biography:
Born in India and raised in Lawrence, Kansas, Srinija Srinivasan followed her siblings to college in California. Having studied artificial intelligence at Stanford and worked at a large-scale AI project after graduating, Srinija joined Yahoo! in 1995 as their fifth employee and self-titled Ontological Yahoo. She served as Vice President, Editor-in-Chief at Yahoo! for over 15 years, where her work centered on the human experience, from the categorization system of the Yahoo! Directory to editorial and policy issues globally. During that time she also chaired the board of non-profit SFJAZZ, and these experiences together inspired her to co-found Loove, a music venture exploring how commerce and technology can be guided by artistic values rather than letting our culture be led by market values. She’s a board member of the On Being Project and a vice chair of Stanford University’s Board of Trustees. She lives in Palo Alto, CA and Brooklyn, NY.

Proportionate Impacts of Policing in Chicago | Trina Reynolds-Tyler

Trina Reynolds-Tyler, Data Director, Invisible Institute, presents the Technical Vision Talk “(DIS) Proportionate Impacts of Policing in Chicago”. Through public data records requests, the Invisible Institute received an unprecedented amount of data related to misconduct records of the Chicago Police Department. Beneath the Surface analyzed these records to uncover patterns of gender-based violence at the hands of police. A volunteer team of over 200 community members generated training data for Judy, our nickname for the algorithm, which then parsed through narratives of complaints in more than 27,000 misconduct records between 2011 and 2015. We were then able to run a targeted search and identify a range of testimony representing shared experiences, connecting people across time and space. But what proportion is significant enough to constitute evidence of a deeper issue? How does the universe of information we use to define the numerator or denominator impact our willingness to deepen our questions? Where do we draw the line between significance and meaningfulness when using data science to understand policing in America?

Biography:
Trina Reynolds-Tyler is the Data Director at the Invisible Institute, an abolitionist, and a native of south side Chicago. She leads Beneath the Surface, a project employing machine learning to identify gender based violence at the hands of Chicago police. Trina works to document how communities unable to depend on the police are creating safety and accountability outside of the carceral state. As a data scientist, she centers the practice of narrative justice in her inquiries.

Trina organizes with Not Me We, and is serving on a University of Chicago council attempting to measure the institution’s impact on the south side population. She developed the skills to use data science for real-world problems as a Pozen Center for Human Rights intern with the Human Rights Data Analysis Group (HRDAG), and was a Pearson Institute Fellow. Trina holds a master’s degree in public policy from the University of Chicago.

Putting our values into practice in data science | Megan Price, Jennifer Pan, Trina Reynolds-Tyler

Panel: Putting our values into practice in data science work

Moderator:
Megan Price, Executive Director, Human Rights Data Analysis Group (HRDAG). As the Executive Director of the Human Rights Data Analysis Group, Megan drives the organization’s overarching strategy, leads scientific projects, and presents HRDAG’s work to diverse audiences. Her scientific work includes analyzing documents from the National Police Archive in Guatemala and contributing analyses submitted as evidence in multiple court cases in Guatemala. Her work in Syria includes collaborating with the Office of the United Nations High Commissioner for Human Rights (OHCHR) and Amnesty International on several analyses of conflict-related deaths in that country. In 2022 she was named a Fellow of the American Statistical Association.

Panelists:
Jennifer Pan is a Professor of Communication and Senior Fellow at the Freeman Spogli Institute at Stanford University. Her research resides at the intersection of political communication and authoritarian politics. Using large-scale datasets on political activity in China and other authoritarian countries, her work answers questions about how autocrats perpetuate their rule; how political censorship, propaganda, and information manipulation work in the digital age; and how preferences and behaviors are shaped as a result. Her papers have appeared in peer-reviewed publications such as Science, the American Political Science Review, the American Journal of Political Science, and Journal of Politics. She graduated from Princeton University, summa cum laude, and received her Ph.D. from Harvard University’s Department of Government.

Trina Reynolds-Tyler, Data Director, Invisible Institute, an abolitionist, and a native of south side Chicago. She leads Beneath the Surface, a project employing machine learning to identify gender based violence at the hands of Chicago police. Trina works to document how communities unable to depend on the police are creating safety and accountability outside of the carceral state. As a data scientist, she centers the practice of narrative justice in her inquiries.

Trina organizes with Not Me We, and is serving on a University of Chicago council attempting to measure the institution’s impact on the south side population. She developed the skills to use data science for real-world problems as a Pozen Center for Human Rights intern with the Human Rights Data Analysis Group (HRDAG), and was a Pearson Institute Fellow. Trina holds a master’s degree in public policy from the University of Chicago.

Real World Successes and Lessons Learned in Deploying ML Models | Wendy Ku

Wendy Ku, Computer Vision Tech Lead, Senior Data Scientist, Getty Images, presents the Technical Vision Talk “ML through a wide-angle lens: Real World Successes and Lessons Learned in Deploying ML Models”. Image search has been a well-established problem area across industries, with a wide range of applications including e-commerce, social media and search engines. As we collectively create and consume more visual content, image search capabilities are becoming increasingly important. In recent years, multiple large-scale image-text models have been released, reinventing the performance of image-text understanding tasks. However, applying these generalized models out-of-the-box often results in less-than-desired performance. In practice, deploying and maintaining an image search system presents a different set of challenges.

Wondering what else is involved in a machine learning solution besides training and deployment? Or how real world model evaluations differ from Kaggle scoreboards? This talk will cover the less discussed journey of bringing language and image-text models to production.

Biography:
Wendy is a Senior Data Scientist at Getty Images, where she develops multilingual and visual-language representation models to improve users’ search experience. She leads Getty Images’ efforts on diagnosing bias and improving fairness in machine learning systems. Prior to joining Getty Images, Wendy was involved in product and operations optimization projects in cybersecurity, consumer finance and restaurant companies. When she’s not working, Wendy enjoys working on her art and running.

Embrace the journey: learnings & inspiration from a non linear path into Data | Gabriela de Queiroz

Gabriela de Queiroz, Principal Cloud Advocate, Microsoft presents the Technical Vision Talk “Embrace the journey: learnings and inspiration from a non-linear path into Data Science”. This talk focuses on the importance of embracing non-linear career paths and the cumulative effect of seemingly disparate skills in becoming a successful data scientist. The talk highlights the power of a learning and growth mindset in overcoming obstacles and unlocking one’s full potential. Attendees will leave the talk feeling empowered to embrace their unique backgrounds and experiences and approach their careers with openness, honesty, and a willingness to learn and grow. Whether you are just starting out on your data science journey or looking to take your skills to the next level, this talk is an opportunity to be inspired, connect with like-minded individuals, and explore the limitless possibilities of a career in data science.

Biography:
Gabriela leads and manages the Global AI/ML/Data team in Education Advocacy. Before that, she worked at IBM as a Program Director on Open Source, Data & AI Technologies and then as Chief Data Scientist at IBM, leading AI Strategy and Innovations.

Gabriela is the founder of AI Inclusive, a global organization that is helping increase the representation and participation of gender minorities in Artificial Intelligence. She is also the founder of R-Ladies, a worldwide organization for promoting diversity in the R community with more than 200 chapters in 55+ countries.

Killing Diseases with Really Big Computers: Building Analysis Tools to Solve Disease | Marisa Torres

Marisa Torres, Bioinformatics Lead, Lawrence Livermore National Lab (LLNL), presents the Technical Vision Talk “Killing Diseases with Really Big Computers: Building Analysis Tools to Solve Disease”. Our team at LLNL has been working on improving bioinformatics tools and models for COVID-19 and for cancer. We think rapid response to a disease outbreak should be a national security priority. We’ve run huge gene simulations for drug discovery and built machine learning models on the results. For COVID-19, we’ve aimed to design viral inhibitors with no adverse health reactions, and we’ve successfully released most of this work publicly as a searchable and usable tool. We’re now scaling up our work from a few COVID-19 genes, up to tens of thousands of human genes for the American Heart Association. When designing therapeutics, we use every informatic and statistical tool available. We create new machine learning strategies for scaling virtual screens for the human or microbe. We use 3D models and docking poses of protein structures to make predictions. We screen for safety properties, because we don’t want detrimental interactions.

Biography:
Marisa designs, implements, and integrates data in relational databases, provides software engineering support for DNA signature discovery, and responds to internal and external customer requests for signature analysis. She has provided signature development and bioinformatics analysis for the Environmental Protection Agency and the National Bio-Forensics Analysis Center. In 2000, Marisa designed DNA signatures that were promoted for use in the BASIS program. She has taken the lead on signature erosion checking, which during the most recent DHS proposal cycle was recognized as important for continued reliable detection of pathogens, and she supports public health and biosecurity customers combining her versatile skill set of software engineering and biology background.

Harnessing AI and Data Science for Health Equity within Communities | Irene Dankwa-Mullan

Irene Dankwa-Mullan, Chief Health Equity Officer at Merative & Affiliate Professor at GWU Milken Institute School of Public Health, presents the Technical Vision Talk “Harnessing AI and Data Science for Health Equity within Communities”. A robust data science agenda can help support communities in their interventions to achieve health equity, and measure progress toward ensuring quality and optimal health for all. However, there are challenges for data science in promoting community-engaged interventions addressing health disparities. This talk will provide a background on the role of data science in promoting a vision for a productive health AI ecosystem of research, technology development and implementation to improve community health and advance health equity.

Biography:
Irene Dankwa-Mullan is an affiliate professor in the Department of Health Policy and Management, Milken Institute School of Public Health at The George Washington University. She is a nationally recognized industry physician, scientist, thought leader, and author with over 20 years of diverse leadership experience in primary care, healthcare, businesses, and the community. She also serves in a strategic advisory role for various health technology start-ups. Irene most recently served as Chief Health Equity Officer at IBM Watson Health and provided leadership for the data and evidence strategy for implementation of technology and clinical decision-support solutions. She was previously Deputy Director for extramural scientific programs at the National Institute. Irene has published widely on health equity, community and public health and building AI technologies for social good.

Keynote: A Sparkle in the Dark: The Outlandish Quest for Dark Matter | Maria Elena Monzani

Maria Elena Monzani, Lead Scientist, SLAC National Accelerator Laboratory and Kavli Institute for Particle Astrophysics and Cosmology, Stanford University leads the Keynote Address “A Sparkle in the Dark: The Outlandish Quest for Dark Matter.” The nature and origin of dark matter are among the most compelling mysteries of contemporary science. There is strong evidence for dark matter from its role in shaping the galaxies and galaxy clusters that we observe in the universe. Still, for over three decades, physicists have been trying to detect the dark matter particles themselves with little success.

This talk will describe the leading effort in that search, the LUX-ZEPLIN (LZ) detector. LZ is an instrument that is superlative in many ways. It consists of 10 tons of liquified xenon gas, maintained at almost atomic purity and stored in a refrigerated titanium cylinder a mile underground in a former gold mine in Lead, South Dakota.

During its science run, LZ is projected to accumulate a massive dataset, consisting of many petabytes of data and recording several billion particle interactions, only a handful of which might be produced by potential dark matter candidates (if nature cooperates). Identifying the dark matter signals in this mass of data represents an extreme “needle in a haystack” problem, and requires leveraging advanced detector design and state-of-the-art machine learning algorithms. The talk will present some of the challenges in constructing this large-scale underground experiment and interpreting its data, along with the prospects LZ presents for finally discovering the dark matter particle, and recently released results from its initial search for new physics.

Biography:
Maria Elena Monzani is a dark matter data wrangler. Her research field is Astroparticle physics, which focuses on topics at the intersection between particle physics and astrophysics/cosmology, using the tools of data intensive science. She received a dual PhD from University of Milano and University of Paris 7, performing research with the Borexino experiment that measured neutrinos produced by the Sun. She then held a postdoctoral position at Columbia University before joining SLAC in 2007 to work on the Fermi Gamma-ray Space Telescope. Today, Monzani is a lead scientist at SLAC and a senior member of the Kavli Institute for Particle Astrophysics and Cosmology at Stanford. She leads the software computing effort for the LZ Dark Matter Experiment and the science operations team for the Fermi satellite. She is also an Adjunct Scholar at the Vatican Observatory, and enjoys discussing the shared philosophical foundations of the scientific and religious endeavors.

Preparing for a career in DS | Montse Cordero, Adriana Velez Thames, Elaine Yi Xu, Sanne Smith

Panel: Preparing for a career in data science

Moderator:
Sanne Smith, Director of Master’s Program, Education Data Science, Stanford University, is the director of the master’s program Education Data Science and a lecturer at the Stanford Graduate School of Education. She teaches courses that introduce students to coding, data wrangling and visualization, various statistical methods, and the interpretation of quantitative research. She studies social networks and thriving, diverse contexts.

Panelists:
Montse Cordero, Mathematics Designer, youcubed, is a mathematics designer for youcubed, a center at Stanford University that aims to inspire, educate and empower teachers of mathematics, transforming the latest research on maths learning into accessible and practical forms. Montse is a co-author and professional development provider for youcubed’s Explorations in Data Science high school curriculum and has participated in multiple national summits for the advancement of data science in K-12 education (Data Science 4 Everyone Coalition, National Academies of Sciences Engineering and Medicine). Montse is also a mathematician interested in work at the intersection of combinatorics, algebra, and geometry. In all facets of their work, Montse endeavors to change the ways our culture thinks and talks about mathematics.

Adriana Velez Thames, Geophysicist-Data Scientist, Springboard Alumni. Adriana recently completed a transition to Data Science after many years in the Oil and Gas industry as a Senior Geophysicist. Her primary focus was in seismic data processing for imaging the Earth’s subsurface to guide energy exploration projects. From 2012 to 2019, she worked at TGS, where her responsibilities included QC of deliverables, testing of internal software updates, and conducting test projects and benchmarks. This involved extensive analysis and manipulation of terabyte-sized digital subsurface data using sophisticated algorithms. She believes that data-driven decisions are the best way to solve problems in any industry. Having been born in Colombia and attained post-graduate degrees in Russia, she is fluent in English and Spanish and has working proficiency in Russian. Currently she continues educational studies in data science and spatial data science.

Elaine Yi Xu, Staff Business Data Analyst, Intuit, is a passionate data analytics and data science practitioner, putting her undergrad degree in Statistics and MS in Info Sys and DS into everyday business decision-making. She’s been working in-house in web analytics, product analytics, and marketing analytics for multiple industries, including retail (lululemon), automotive (Kelley Blue Book), and most recently at Intuit, the global technology platform. She specializes in the measurement of Go-To-Market marketing strategies, assessment of marketing campaign effectiveness, optimization of user experience, and A/B Testing. She strives to be the connective tissue between business, analytics, engineering, and data science, combining all facets of science to help arrive at the most optimal business decisions.

Bringing Motion Diffusion Models to Immersive Entertainment | Jhanvi Shriram and Ketaki Shriram

Jhanvi Shriram, Co-Founder and CEO, Krikey, alongside Ketaki Shriram, Co-Founder and CTO, Krikey, present the Technical Vision Talk “Bringing Motion Diffusion Models to Immersive Entertainment”. Most generative models thus far have focused on utilizing LLMs for consumer products. The introduction of motion diffusion models to this space provides a novel avenue to engage consumers, especially in the field of entertainment. This talk will cover a text-to-animation motion diffusion model. This model generates animations in less than 5 minutes. These animations can be applied to any 3D file and utilized with any 3D software. Practical applications include optimizing production pipelines for gaming, film, and immersive learning. We will also cover the implications for these industries as they adopt new generative tools in production workflows. To learn more about our tool and try it for yourself, please visit krikey.ai.

Jhanvi Biography:
Jhanvi is currently the CEO of Krikey, an AI gaming tools service that she co-founded with her sister. Krikey recently closed their Series A round, led by Reliance Jio, India’s biggest telecom operator. Prior to Krikey, Jhanvi worked at YouTube as a Production Strategist on operations and creator community programs, which sparked her interest in working with content creators. She also worked at JauntVR and Participant Media. In 2014, Jhanvi and her sister, Ketaki Shriram, co-produced a feature film titled “True Son,” which followed a 22-year-old’s political campaign in Stockton, CA. The film premiered at the 2014 Tribeca Film Festival and was acquired by FusionTV/Univision. Jhanvi holds a BA (Political Science and African Studies) and MBA from Stanford University, and an MFA (Producing) from USC. You can learn more here: krikey.ai.

Ketaki Shriram Biography:
Dr. Shriram is a scientist, film producer, and wildlife photographer interested in the impact of immersive worlds on human behavior. She is currently the Chief Technology Officer at Krikey, an AI gaming tools service that she co-founded with her sister. Krikey recently closed their Series A round, led by Reliance Jio, India’s biggest telecom operator. Dr. Shriram received her BA, MA, and PhD at the Stanford Virtual Human Interaction Lab. She previously worked at Google [x] and at Meta’s Reality Labs. Dr. Shriram was selected for the Forbes 30 Under 30 2020 Class in the Gaming category. You can learn more here: krikey.ai.

Data Democratization Panel | Priya Donti, Julia Stewart Lowndes, Nikki Tulley, Michela Taufer

Panel: Data democratization: a powerful means for creating sustainable and equitable communities

Moderator:
Michela Taufer is an ACM Distinguished Scientist and holds the Dongarra Professorship in High-Performance Computing in the Department of Electrical Engineering and Computer Science at the University of Tennessee Knoxville (UTK). She earned her undergraduate degree (Laurea) in Computer Engineering from the University of Padova (Italy) and her doctoral degree (Ph.D.) in Computer Science from the Swiss Federal Institute of Technology or ETH (Switzerland). From 2003 to 2004, she was a La Jolla Interfaces in Science Training Program (LJIS) Postdoctoral Fellow at the University of California San Diego (UCSD) and The Scripps Research Institute (TSRI), where she worked on interdisciplinary projects in computer systems and computational chemistry.

Michela is well-known for her work in establishing trustworthy scientific discoveries on heterogeneous cyberinfrastructures. Throughout her career, she has put the principle of trustworthiness into practice. She has promoted scientific computing for the general population through volunteer computing, defined accurate scientific applications on accelerators and GPUs, and developed in situ analysis methods for scientific workflows on converging HPC and Cloud platforms. She has been serving as the principal investigator of several NSF collaborative projects. She has significant experience in mentoring a diverse population of students on interdisciplinary research and establishing long-lasting workforce development.

Panelists:
Priya Donti is the Co-Founder and Executive Director of Climate Change AI (CCAI), a global non-profit initiative to catalyze impactful work at the intersection of climate change and machine learning, which she is currently running through the Cornell Tech Runway Startup Postdoc Program. She will also join MIT EECS as an Assistant Professor in Fall 2023. Her research focuses on developing physics-informed machine learning methods for forecasting, optimization, and control in high-renewables power grids. Priya received her Ph.D. in Computer Science and Public Policy from Carnegie Mellon University, and is a recipient of the MIT Technology Review’s 2021 “35 Innovators Under 35” award, the ACM SIGEnergy Doctoral Dissertation Award, the Siebel Scholarship, the U.S. Department of Energy Computational Science Graduate Fellowship, and best paper awards at ICML (honorable mention), ACM e-Energy (runner-up), PECI, the Duke Energy Data Analytics Symposium, and the NeurIPS workshop on AI for Social Good.

Julia Stewart Lowndes, Director, Openscapes, is a marine ecologist working at the intersection of actionable environmental science, data science, and open science. Julia’s main focus is mentoring teams to develop technical and leadership mindsets and skills for data-intensive research, grounded in climate solutions, inclusion, and kindness. She founded Openscapes in 2018 as a Mozilla Fellow and Senior Fellow at the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California Santa Barbara (UCSB), having earned her PhD from Stanford University in 2012 studying drivers and impacts of Humboldt squid in a changing climate.

Nikki Tulley, Doctoral Student, University of Arizona; Indigenous Researcher, NASA Ames Research Center. Nikki is from the Navajo Nation (NN), an Indigenous Nation located in the United States. The work and research Nikki does is influenced by her upbringing. Born and raised on the NN Reservation, she has seen firsthand the impacts of water access and water quality challenges rural communities face. The NN has wicked water problems related to anthropogenic activities and climate change. Now, as an Indigenous Scientist, she recognizes the opportunity to braid traditional ecological knowledge and western science together to address water challenges. Taking a step beyond braiding the two knowledge systems together, she has begun to use Earth Observation satellite imagery to tell a story of the changes being monitored from space and those observed from the landscapes. Nikki’s passion is empowering communities through data access and capacity building. She believes that community involvement in research can significantly aid in seeking solutions for resilient and sustainable communities.

Read More

Uncovering Online Censorship and Propaganda in China | Jennifer Pan

Jennifer Pan, Professor of Communication and FSI Senior Fellow, Stanford University presents the Technical Vision Talk “Uncovering Online Censorship and Propaganda in China.” Although digital communication technologies have revolutionized the way information can flow across borders and national boundaries, governments all over the world impose restrictions on access to digital information. Nowhere is the effort to control and manipulate the flow of digital information more sophisticated, more extensive and more sustained than in China. Controlling China’s digital ecosystem involves a huge organizational effort that is obviously designed to suppress information, but this effort paradoxically reveals the goals, intentions, and actions of the Chinese regime when its footprints are analyzed at scale.

Biography:
Jennifer Pan is a Professor of Communication and Senior Fellow at the Freeman Spogli Institute at Stanford University. Her research resides at the intersection of political communication and authoritarian politics. Using large-scale datasets on political activity in China and other authoritarian countries, her work answers questions about how autocrats perpetuate their rule; how political censorship, propaganda, and information manipulation work in the digital age; and how preferences and behaviors are shaped as a result. Her papers have appeared in peer-reviewed publications such as Science, the American Political Science Review, the American Journal of Political Science, and the Journal of Politics. She graduated from Princeton University, summa cum laude, and received her Ph.D. from Harvard University’s Department of Government.

Read More

What is the Cost of Being Wrong | Megan Price

Megan Price, Executive Director, Human Rights Data Analysis Group (HRDAG) presents the Technical Vision Talk “What is the Cost of Being Wrong?” Machine learning models are a versatile tool in a statistician’s analytical toolbox. As George Box is credited with saying, “All models are wrong, some are useful.” How can we identify the contexts when machine learning models are most useful? How can we identify the contexts where they pose the most risk for harm? These questions will be answered using examples from work by the Human Rights Data Analysis Group.

Biography:
As the Executive Director of the Human Rights Data Analysis Group, Megan drives the organization’s overarching strategy, leads scientific projects, and presents HRDAG’s work to diverse audiences. Her scientific work includes analyzing documents from the National Police Archive in Guatemala and contributing analyses submitted as evidence in multiple court cases in Guatemala. Her work in Syria includes collaborating with the Office of the United Nations High Commissioner for Human Rights (OHCHR) and Amnesty International on several analyses of conflict-related deaths in that country. In 2022 she was named a Fellow of the American Statistical Association.

Read More

Openscapes Supporting Kinder Science for Future Us | Julia Stewart Lowndes

Julia Stewart Lowndes, Director, Openscapes presents the Technical Vision Talk “Openscapes: Supporting Kinder Science for Future Us”. At Openscapes, we believe open science can accelerate interoperable, data-driven solutions and increase diversity, equity, inclusion, and belonging in research and beyond. Our main activity is mentoring environmental and Earth science teams in open science, and connecting and elevating these researchers both through tech like R, Python, Quarto, and JupyterHubs and communities like RLadies, Black Women in Ecology, Evolution, and Marine Science, Ladies of Landsat, and NASA. We will share stories and approaches about open science as a daily practice – better science for future us – and welcome you to join the movement.

Biography:
Julia Stewart Lowndes, PhD, is a marine ecologist working at the intersection of actionable environmental science, data science, and open science. Julia’s main focus is mentoring teams to develop technical and leadership mindsets and skills for data-intensive research, grounded in climate solutions, inclusion, and kindness. She founded Openscapes in 2018 as a Mozilla Fellow and Senior Fellow at the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California Santa Barbara (UCSB), having earned her PhD from Stanford University in 2012 studying drivers and impacts of Humboldt squid in a changing climate.

Read More

Making Biosignal Interfaces Accessible | Momona Yamagami

Momona Yamagami, Incoming Assistant Professor, Electrical and Computer Engineering, Rice University presents the Technical Vision Talk on “Making Biosignal Interfaces Accessible”. Biosignal interfaces that use electromyography sensors, accelerometers, and other biosignals as inputs show promise for improving accessibility for people with disabilities. However, generalized models that are not personalized to the individual’s abilities, body sizes, and skin tones may not perform well. Individualized interfaces that are personalized to the individual and their abilities could significantly enhance accessibility.

In this talk, I discuss how continuous (i.e., 2-dimensional trajectory-tracking) and discrete (i.e., gesture) electromyography (EMG) interfaces can be personalized to the individual. For the continuous task, we used methods from game theory to iteratively optimize a linear model that mapped EMG input to cursor position. For the discrete task, we developed a dataset of participants with and without disabilities performing gestures that are accessible to them. As biosignal interfaces become more commonly available, it is important to ensure that such interfaces have high performance across a wide spectrum of users.

Biography:
Momona will be an Assistant Professor at Rice University Electrical & Computer Engineering starting summer 2023 as part of the Digital Health Initiative. Her research focuses on modeling and enhancing human-machine interaction (HMI) to support accessibility and health using biosignals and control theory applied to the field of HCI (human-computer interaction). She is currently a CREATE postdoctoral scholar at the University of Washington in Seattle, WA, advised by Prof. Jennifer Mankoff.

Momona’s dissertation research leveraged control theory methods to model and enhance continuous HMIs and explore biosignals like electromyography (EMG) as accessible machine inputs for people with and without disabilities. Her current research interests include how multi-input biosignals can improve HMI accessibility for new and emerging technology like virtual reality and support the health of people with disabilities.

Read More

Closing Remarks | Susan Malaika | WiDS Stanford 2023

Susan Malaika, Senior Technical Staff Member, IBM leads the closing remarks at WiDS Stanford 2023.

Biography:
Susan Malaika is a Senior Technical Staff Member in the Ecosystem Engineering & Developer Advocacy group at IBM. Her specialties include open source software, open data, databases, and community building. Susan has worked in various engineering roles, particularly in database development. She has served as Tech Advisory Council Representative at LF-AI & Data as well as co-lead for the Principles Working Group in the Trusted AI Committee. Susan was a founder of the JanusGraph project, now at the Linux Foundation. She has led workshops, hackathons, and meetups, and she engages with universities globally – and in the MENA region in particular. Susan also leads a tech community of a few hundred members in the New York area and hosts community sessions to foster collaboration.

Read More

Keynote: Why ‘users first’ is important for a good monetization | Gayatree Ganu

Gayatree Ganu, Vice President, Data Science, Facebook presents the Keynote Address “Put the horse before the cart: Why ‘users first’ is important for a good monetization strategy”.
Meta has over 3B users on our platform engaging with our different products and services. Meta also makes over $100B annually through advertising. There is a strong connection between user engagement on our platform and how we build a sustainable business. Our mission statement for ads at Meta is “Make meaningful connections between people and businesses”. Connecting users to monetization or ads is an important part of Meta’s long term success. In this talk I will describe the frameworks to connect user engagement and revenue potential, allowing us to focus our products and services. We will also discuss how high quality and relevant ads can actually bring more engagement to our platform, making it a win-win situation. We will cover a lot of fun and challenging data science topics from weighted metrics, producer-consumer experimental setups, counterfactuals, incrementality, all at an extraordinary scale of 3B users and $100B!

Biography:
Gayatree Ganu leads the Engagement Ecosystem and Monetization Data Science teams at Facebook. The Engagement Ecosystem team’s mission is to inform Facebook’s strategy through better understanding and forecasting the health of the app. The Monetization team’s mission is to give everyone a voice and to champion economic prosperity. Gayatree leads a Data Science team with a diverse portfolio spanning modeling and machine learning, product optimizations of user experience, and strategic innovations. Gayatree has a PhD in Computer Science in Search and Recommendations from Rutgers University. She joined Facebook (now Meta) in 2013 and has worked on several problems and product areas through the last 10 years.

Gayatree believes deeply in fairness and equality in opportunity and is passionate about bringing more representation and providing sustained support to women and under-represented minorities in Tech. She leads recruiting for all Data Science roles at Meta, and is helping build an organization that values diverse perspectives as well as strong technical and analytical skills.

Read More

Optimization in the loop machine learning for energy and climate | Priya Donti

Priya Donti, Co-Founder and Executive Director, Climate Change AI presents the Technical Vision Talk “Optimization-in-the-loop machine learning for energy and climate”. Addressing climate change will require concerted action across society, including the development of innovative technologies. While machine learning (ML) methods have the potential to play an important role, these methods often struggle to contend with the physics, hard constraints, and complex decision-making processes that are inherent to many climate and energy problems. To address these limitations, I present the framework of “optimization-in-the-loop ML,” and show how it can enable the design of ML models that explicitly capture relevant constraints and decision-making processes. For instance, this framework can be used to design learning-based controllers that provably enforce the stability criteria or operational constraints associated with the systems in which they operate. It can also enable the design of task-based learning procedures that are cognizant of the downstream decision-making processes for which a model’s outputs will be used. By significantly improving performance and preventing critical failures, such techniques can unlock the potential of ML for operating low-carbon power grids, improving energy efficiency in buildings, and addressing other high-impact problems of relevance to climate action.

Biography:
Priya Donti is the Co-founder and Executive Director of Climate Change AI, a global non-profit initiative to catalyze impactful work at the intersection of climate change and machine learning, which she is currently running through the Cornell Tech Runway Startup Postdoc Program. She will also join MIT EECS as an Assistant Professor in Fall 2023. Her research focuses on developing physics-informed machine learning methods for forecasting, optimization, and control in high-renewables power grids. Priya received her Ph.D. in Computer Science and Public Policy from Carnegie Mellon University, and is a recipient of the MIT Technology Review’s 2021 “35 Innovators Under 35” award, the ACM SIGEnergy Doctoral Dissertation Award, the Siebel Scholarship, the U.S. Department of Energy Computational Science Graduate Fellowship, and best paper awards at ICML (honorable mention), ACM e-Energy (runner-up), PECI, the Duke Energy Data Analytics Symposium, and the NeurIPS workshop on AI for Social Good.

Read More

Productizing Data for Humanitarian Aid Applications | Kathryn Hymes

Kathryn Hymes, Lead of Product and Innovation, Médecins Sans Frontières-USA presents the Technical Vision Talk “Productizing Data for Humanitarian Aid Applications”. In humanitarian efforts focused on delivering medical interventions in low-resource settings, there are many opportunities for data science to improve decision-making and produce valuable insights, both on the ground and in long-term operations. This talk will focus on product approaches to data that support insights for long-term engagement with some of the work of Médecins Sans Frontières, a global aid organization focused on public health.

Biography:
Kathryn Hymes is a technologist, computational linguist, and game designer. She currently serves as the lead of product and innovation at Médecins Sans Frontières-USA. She leads a humanitarian tech team building new products rooted in modern engineering practice to aid in MSF’s global work. Previously she was the head of international product expansion at Slack and an advisor at Airtable. She is a fellow at the Berkman Klein Center for Internet and Society with a focus on how playful design can contribute to a better digital life. Kathryn is a co-founder of Thorny Games (https://thornygames.com/), an award-winning design studio that regularly collaborates with universities, nonprofits and museums to apply playful design to hard problems. Her writing has appeared in The Atlantic, Wired, and The New York Times. Kathryn holds an MS in Computational and Mathematical Engineering from Stanford, an MA in Linguistics from Stanford, and a BS in Math from UCLA.

Read More

Elena Martinez Encourages High School Girls Through the WiDS Next Gen Program

Photograph of Elena Martinez

Elena Martinez, who entered school as a “limited English speaker,” discovered an early affinity for math and went on to graduate from college with a degree in math and computer science. She now encourages high school girls to pursue STEM through the WiDS Next Gen Program. Elena has served on the WiDS Next Gen Committee and is a first-year graduate student in the Stanford Institute for Computational and Mathematical Engineering (ICME).

Read More

WiDS Datathon 2022 Sparks Collaboration, Learning, and New Friendships

Side by side portraits of Pravallika Myeni and Anissa Amziani

Two women on different sides of the world who had never participated in a Kaggle competition came together as a team to compete in the WiDS Datathon 2022, leading to new skills, confidence, and a lasting friendship. The datathon goal was to analyze the energy efficiency of buildings. Participants analyzed regional differences in building energy efficiency, creating models to predict building energy consumption, an important first step in understanding how to maximize energy efficiency.

We asked Pravallika Myeni and Anissa Amziani to tell us more about themselves and the experience of working together during the datathon competition.

Read More

Machine Learning Provides Insights and Solutions Across Industries

Six portraits of women. Tamara Kolda, Julia Ling, Manogna Mantripragada, Sara Khalid, Sherrie Wang, Marzyeh Ghassemi. With WiDS branded illustrations. Machine learning provides insights and solutions.

Machine learning impacts so many parts of our daily lives, whether it’s doing a Google search, using your smart watch, or getting movie recommendations. Healthcare providers and businesses are using it to gain insights to guide decision making. And it’s also being applied to solve some of the world’s toughest problems like climate change and food security. Several recent WiDS talks, workshops, and podcasts describe the exciting applications of machine learning today and offer skills development workshops for new and experienced data scientists.

Read More

Principles of Good Data Viz | Jenn Schilling

What key principles of design and data viz do you need to know to create effective and clear graphs? This talk will cover preattentive attributes, Gestalt principles, and principles of color use. It will provide the key concepts from design and data viz research that you need to know to communicate data effectively. The talk will include examples to demonstrate applying the concepts and comparing data viz effectiveness.

This workshop was conducted by Jenn Schilling, Founder of Schilling Data Studio.

Read More

Data Science in Healthcare | Mrs Emily Godson (née Wheaton)

The integrated use of data science and machine learning in healthcare has grown in popularity in recent years, with many applications becoming engrained in our healthcare systems. Recent advancements in the digitalization of healthcare data, together with the production of masses of data from operational activities in healthcare settings and from patient-level sensors and scans, have enabled many more applications and research.

In this session we will discuss data science applications in the healthcare industry as well as some of the ethics and considerations required when delivering Data Science solutions in the industry.

This workshop was conducted by Mrs Emily Godson (née Wheaton), Data Scientist / Big Data Mining – Senior at Hitachi Vantara.

Read More

Introduction to Linear Regression | Laura Lyman

Linear regression is a fundamental tool in statistics and data science for modeling the relationship between different parameters. It can be used for prediction, forecasting and error reduction by fitting a predictive model between a response variable and a collection of explanatory variables based on an observed data set. Through linear regression analysis, we can quantify the strength of the linear relationship between the response and different explanatory variables, and we can identify parameters that may contain redundant information.

This workshop introduces the basics of simple and multiple linear regression. We will present both mathematical theory and applications in the context of real data sets — ranging from survey results collected by the US National Center for Health Statistics (NHANES), to real estate listings in Sacramento, CA. After the talk, the R code used will be provided, so attendees can revisit examples of how to apply this foundational modeling method.
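As a rough illustration of the idea described above, the following minimal sketch fits a multiple linear regression and reports the coefficient of determination. It is not the workshop's material: the workshop used R, whereas this sketch uses Python with NumPy, and the data, variable names (`x1`, `x2`, `y`), and true coefficients are synthetic assumptions chosen only for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration; not the NHANES or
# Sacramento real estate data mentioned in the abstract).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 3.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix: a column of ones for the intercept, then the
# two explanatory variables.
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares estimate of the coefficients.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# R^2: the share of the variance in y explained by the linear model.
residuals = y - X @ beta
r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

print("coefficients (intercept, x1, x2):", beta)
print("R^2:", r_squared)
```

With enough observations and modest noise, the estimated coefficients should land close to the values used to generate the data, and an R^2 near 1 indicates a strong linear relationship.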

This workshop was conducted by Laura Lyman, Instructor of Mathematics, Statistics, and Computer Science (MSCS) at Macalester College.

Read More

How data visualization helps people understand and explore data

Photographs of Fernanda Viégas, Nicole Crosdale, Jenn Schilling, and Pariza Kambo. With WiDS branded illustrations.

With the massive amounts of data that are generated and collected today, data visualization is an invaluable tool to help people explore and understand what it all means. Data visualizations can be exploratory to help analyze the data and explanatory to present insights to a broader audience. Both art and science, data visualization turns information into images and helps people see patterns, trends, and outliers in large data sets. Here is a sampling of recent WiDS talks and workshops that delve into different aspects of data visualization.

Read More

Introduction to Precision Medicine: From Statistics to Society

Precision medicine aims to learn from data how to match the right treatment to the right person at the right time. One common goal in precision medicine is the estimation of optimal dynamic treatment regimens (DTRs), sequences of decision rules that recommend treatments to patients in a way that, if followed, would optimize outcomes for each individual and overall in the targeted population. In this presentation, we will describe how the precision medicine framework formalizes sequential clinical decision-making and briefly review a subset of the most popular strategies for learning optimal dynamic treatment regimens. We will then invite the workshop group to ideate and discuss the critical opportunities and challenges for the translation of DTRs to clinical and community care, the role of stakeholder engagement and cross-disciplinary collaboration, and considerations for evaluating DTRs in practice.

This workshop was conducted by Nikki Freeman and Anna Kahkoska from the University of North Carolina at Chapel Hill.

Slides and resources used in this workshop: https://bit.ly/precision_medicine_slides

Read More

Earth observation & machine learning for agroecological applications

The usage of machine learning (ML) has been growing exponentially. Its significant power in generalization and a large amount of available data make machine learning indispensable. In parallel, humanity is focused more than ever on space exploration, developing cutting-edge Earth Observation (EO) technology. Have you ever wondered how these two can be combined?

One domain that can benefit greatly from this combination is agriculture. With climate change and population rise, maintaining natural ecosystems while enhancing agricultural productivity and supporting farmers is of primary importance. In this sense, ML and EO technologies are the key enablers in developing actionable recommendations for farmers and policymakers to achieve resilient agriculture. In this workshop, we discuss the usage of ML for EO-related applications, focusing on agriculture and ecosystem services. We will present two applications of how ML bridges the gap between scientific knowledge and actionable advice for farmers and policymakers. The first application will consist of a predictive ML model related to the occurrence of pests in cotton fields. The second application will showcase the combination of a geographical model and an ML algorithm to identify the local-specific contribution of agricultural management to ecosystem services. For both applications, there will be live demonstrations using Python and R. By the end of this workshop, we hope you will be comfortable establishing the link between machine learning, Earth observation, and sustainable agriculture, and will have the necessary tools to start your journey. We wish you a fruitful exploration of this field!

This workshop was conducted by Roxanne Suzette Lorilla and Ornela Nanushi from the National Observatory of Athens.

Slides and materials used in this workshop: https://bit.ly/agroecological_applica…

Read More

Catching Fire: Autonomous Drones to Detect and Track Wildfires | Mathworks

Can drones help prevent natural disasters? Wildfires have become highly destructive in recent years, ravaging the environment and human lives. In this hands-on workshop, build a wildfire detection system with autonomous drones. Explore cutting-edge methods to detect fire outbreaks and predict their direction of spread. Gain skills in simulation and AI that you can apply to life-saving problems.

This workshop was conducted by Shweta Singh, Sheeba Ransing and Arushi Kapurwan from Mathworks.

Resources used for this workshop can be accessed on GitHub: https://bit.ly/wids_catching_fire
Slides for this workshop: https://bit.ly/3DvenVR

Read More

Mitigating Bias in Machine Learning and Data Science

Photographs of Menglin Cao, Leda Braga, and Susan Athey. With WiDS branded illustrations in the background.

AI and machine learning are increasingly being used across industry and government to make decisions impacting many parts of our lives. These technologies could determine who gets a job interview, what products are advertised to different audiences, or what government resources are allocated to different populations. Bias can become embedded in the development of AI systems either through the data and/or the development and evaluation of algorithms. This can result in inaccurate predictions that can significantly impact people’s lives. Several WiDS talks describe how bias in machine learning can impact everything from online ads to search recommendations to bus routes.

Read More

A Data Scientist’s Deep Dive into the WiDS Datathon

Working with the WiDS Datathon dataset over the past week has been a thrilling exercise. This dataset presents an opportunity to learn about interesting and real-world modeling challenges, and is different from other curated datasets in textbooks and classic machine learning exercises. For that reason, I discuss some of the challenges you may experience around missing data, multicollinearity and linear/ nonlinear approaches. I will also provide resources to help you on these topics.

Read More

Creating Data Visualizations with Spotify Data | Nicole Crosdale

This workshop is targeted toward those who are new to coding. This presentation will teach an individual how to analyze their personal Spotify data, create visualizations and prepare their data to be used in business processes. This demonstration will use Python so a new coder will understand foundational coding syntax that can be used in other languages.

This workshop was conducted by Nicole Crosdale, a graduate student at the University of Florida.

Resources and slides for this workshop: https://bit.ly/spotify_resources

Read More

Using MATLAB and Python Together | Mathworks

You’ve heard it before – Python vs MATLAB vs R, but in reality, programming languages are often used together! In this hands-on workshop, you’ll learn how to use MATLAB and Python together with practical examples. Specifically, you’ll learn how to:

– Call Python libraries from MATLAB
– Call user-defined Python commands, scripts, and modules
– Manage and convert data between languages
– Package MATLAB algorithms to be called from Python

This workshop was conducted by Heather Gorr, Senior Product Marketing Manager, MATLAB and Grace Woolson, Student Competitions Technical Evangelist – Data Science at Mathworks.

Resources and slides for this workshop: https://bit.ly/matlab_python_slides

Read More

From Integrated Circuits to AI at the Edge: Fundamentals of Deep Learning & Data-Driven Hardware

In this workshop, I would like to share my journey transitioning from an electrical engineer focusing on ultra-low power integrated circuit design to an AI Solution Architect. Through specific examples of how the two fields connect, I will discuss the fundamentals of deep learning and data-driven hardware design. I will start with my experience in the semiconductor industry designing application-specific and data-dependent hardware for IoT systems and then discuss how this experience led to my career in AI specializing in areas including high-performance computing, edge computing, and more recently, federated learning.

I hope the attendees will not only find the technical content informative but also see how a growth mindset truly helped me find my career passion. Having a broad knowledge of the ecosystem that supports AI applications – such as the hardware stack, hardware level optimization, and application-specific hardware design – can be very helpful to understanding and choosing the right platform for operational AI. I also hope to use this opportunity to connect with fellow AI/hardware enthusiasts in WiDS.

This workshop was conducted by Chu Lahlou, AI Specialized Cloud Solution Architect at Microsoft.

Read More

Linear Least Squares | Abeynaya Gnanasekaran

The least squares method is one of the most widely used techniques in data science and is used to fit a linear model to data. In this workshop, we will study least squares problems from a linear algebraic perspective and discuss the techniques to solve them.

This workshop assumes that you have a basic understanding of linear algebra including concepts such as matrices, rank, range space, orthogonality, and matrix decompositions (Cholesky, QR, SVD).
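To make the linear algebraic view concrete, here is a minimal sketch, not taken from the workshop, that solves one small least squares problem min ||Ax - b|| three ways using the decompositions listed above: the normal equations with a Cholesky factorization, a reduced QR factorization, and the SVD. The matrix `A` and vector `b` are random and purely illustrative, and `A` is assumed to have full column rank.

```python
import numpy as np

# Illustrative overdetermined system (assumed full column rank).
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3))
b = rng.normal(size=50)

# 1) Normal equations A^T A x = A^T b, using Cholesky A^T A = L L^T.
#    Cheap, but it squares the condition number of A.
L = np.linalg.cholesky(A.T @ A)
z = np.linalg.solve(L, A.T @ b)      # solve L z = A^T b
x_chol = np.linalg.solve(L.T, z)     # solve L^T x = z

# 2) Reduced QR: A = Q R, so the solution satisfies R x = Q^T b.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# 3) SVD: A = U diag(s) V^T, so x = V diag(1/s) U^T b.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / s)

print(x_chol, x_qr, x_svd)
```

For a well-conditioned problem like this one, all three approaches should agree to within floating-point error. `np.linalg.solve` is a general-purpose solver used here for brevity; a dedicated triangular solver would exploit the structure of `L` and `R`.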

This workshop was conducted by Abeynaya Gnanasekaran, a Senior Research Engineer at Raytheon Technologies Research Center.

Read More

Evaluating Effectiveness: Robustness, Reproducibility, and Interpretability of Algorithms

Photographs of Cindy Orozco Bohorquez, Chiara Sabatti, and Joëlle Pineau. WiDS branded illustrations in the background.

While there have been amazing achievements with machine learning in recent years, reproducing results for state-of-the-art deep learning methods is seldom straightforward. Three leading data scientists share their views at recent WiDS conferences on the importance of establishing structures, standards and best practices to guide us towards consistently producing high quality science and reliable findings.

Read More

Low-Code AI: Making AI accessible to everyone | Mathworks

Learn how you can apply AI in your field without extensive knowledge in programming. This hands-on session includes a quick recap on the fundamentals of AI and two exercises where you will learn how to classify human activities using MATLAB® interactive tools and apps:

– Accessing and preprocessing data acquired from a mobile device
– Classifying the labeled data using two apps: The Classification Learner app and the Deep Network Designer app

At the end of the workshop, you will be able to design and train different machine learning and deep learning models without extensive programming knowledge. In addition, you will also learn how to automatically generate code from the interactive workflow. This will not only help you to reuse the models without manually going through all the steps but also to learn programming or advance your coding skills.

This workshop was conducted by Gaby Arellano Bello and Neha Sardesai, Senior Application Engineers in Education at MathWorks.

Access resources for this workshop: https://bit.ly/low_code_ai_resources

Read More

Introduction to Explainable AI | Supreet Kaur

Thumbnail for Introduction to Explainable AI | Supreet Kaur

Responsible AI is getting more attention than ever. Companies have started exploring Explainable AI as a way to explain model results to senior leadership and increase their trust in AI algorithms. This workshop gives an overview of the area, explains why it matters today, and covers some practical techniques you can use to implement it. As a bonus, it also covers some industry use cases and the limitations of these techniques. Join me in unboxing this black box!

This workshop was conducted by Supreet Kaur, Assistant Vice President at Morgan Stanley.

Slides for this workshop: https://bit.ly/explainableai_slides

Read More

Baby steps towards building your first ML model | Manogna Mantripragada

Thumbnail for Baby steps towards building your first ML model | Manogna Mantripragada

This workshop aims to enable young data scientists to start their first ML project. It helps them understand the process, from gathering data to building an ML model. Building an ML model is easy, but building it correctly is much harder than it appears.

This workshop was conducted by Manogna Mantripragada, Data Scientist at Greenlink Analytics.

Access resources for this workshop: https://bit.ly/energy_burden_analysis…

Read More

Exploratory data analysis using personal data from Strava and Apple Watch | Deepnote

Thumbnail for Exploratory data analysis using personal data from Strava and Apple Watch | Deepnote

During the workshop, we show a simple exploratory data analysis using Deepnote. We focus on personal data from a Camino de Santiago pilgrimage, which we retrieved through the Strava API, and show you how to get the same data from your own device. Using this data, we explain the theory behind exploratory data analysis and show some use cases.

This workshop was conducted by Tereza Vaňková and Alleanna Clark of Deepnote.

Resources used in this workshop:
– https://bit.ly/deepnote_notebook
– https://bit.ly/deepnote_slides

Read More

WiDS Regional Event Highlights, June-July 2022

WiDS Regional event attendees of June and July of 2022

The WiDS 2022 season continued throughout the summer, with WiDS ambassadors hosting regional events in Armenia, Tokyo, Delhi, Portugal, and more. As the season draws to a close, we are so proud of each and every WiDS ambassador who has played an active role in bringing WiDS to their local communities, inspiring data scientists worldwide! In total, almost 200 regional events were held this year in 53 countries.

Read More

Dashboard Design Thinking | Jenn Schilling

Thumbnail for Dashboard Design Thinking | Jenn Schilling

Best practices in data visualization and dashboard design are numerous and sometimes contradictory, but a straightforward method to apply design thinking to creating dashboards is effective and universally applicable. This session will cover the details of design thinking and how it can be applied to dashboard development to create impactful dashboards that meet user needs and provide valuable insights.

This workshop was conducted by Jenn Schilling, Senior Research Analyst at the University of Arizona.

Read More

Exploring Hidden Markov Models | Julia Christina Costacurta

Thumbnail for Exploring Hidden Markov Models | Julia Christina Costacurta

Hidden Markov Models (HMMs) are used to describe and analyze sequential data in a wide range of fields, including handwriting recognition, protein folding, and computational finance. In this workshop, we will cover the basics of how HMMs are defined, why we might want to use one, and how to implement an HMM in Python. This workshop might be of particular interest to attendees from May 25’s “Intro to Markov Chains and Bayesian Inference” session. Introductory background in probability, statistics, and linear algebra is assumed.
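
To make the definition concrete, here is a minimal sketch of the forward algorithm for a discrete HMM in NumPy. It is not taken from the workshop notebook; the two-state model and its probabilities are invented for illustration. The function computes the log-likelihood of an observation sequence, rescaling at each step to avoid numerical underflow.

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Log P(obs) under a discrete HMM, via the scaled forward algorithm.

    pi:  (S,)   initial state probabilities
    A:   (S, S) transitions, A[i, j] = P(next state j | current state i)
    B:   (S, K) emissions,   B[i, k] = P(symbol k | state i)
    obs: sequence of integer symbols in range(K)
    """
    alpha = pi * B[:, obs[0]]          # joint prob. of first symbol and each state
    log_lik = 0.0
    for symbol in obs[1:]:
        scale = alpha.sum()
        log_lik += np.log(scale)       # accumulate the factor removed by rescaling
        alpha = ((alpha / scale) @ A) * B[:, symbol]
    return log_lik + np.log(alpha.sum())

# Hypothetical two-state, two-symbol model.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],
              [0.3, 0.7]])
obs = [0, 1, 0]

log_lik = forward_log_likelihood(pi, A, B, obs)
```

For a sequence this short, the result can be checked by summing the joint probability over all eight possible state paths; the forward recursion gives the same number in time linear in the sequence length.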

This workshop was conducted by Julia Christina Costacurta, PhD Candidate at Stanford University

Useful resources for this workshop:
– https://bit.ly/hmm_presentation
– https://bit.ly/hmm_tutorial_notebook

Read More

Alternative approaches to A/B Experiments – 3 Causal Impact Approaches | Jennifer Vlasiu

Thumbnail for Alternative approaches to A/B Experiments - 3 Causal Impact Approaches | Jennifer Vlasiu

Make answering ‘what if’ analysis questions a whole lot easier by learning about state-of-the-art, end-to-end applied frameworks for causal inference.

We will cover:
– Microsoft’s DoWhy package, an end-to-end library for causal inference in Python (documentation at microsoft.github.io)
– Bayesian Causal Impact in R
– MLE Causal Impact in Python
– Bonus: A/A testing, when to use it and why it matters

We will apply these models to understand the impact of a marketing rewards campaign, as well as the impact of a product/feature upgrade.

This workshop was conducted by Jennifer Vlasiu, Data Science & Big Data Instructor at York University

Useful resources for this workshop:
– https://bit.ly/github_casual_impact

Read More

Introduction to Deep Learning for Image Classification | Cindy Gonzales

Thumbnail for Introduction to Deep Learning for Image Classification | Cindy Gonzales

Image classification is a task in the Computer Vision domain that takes in an image as input and outputs a label for that image. Deep learning is the most effective modern method for modeling this task. In this interactive workshop, we will walk through a Jupyter Notebook that gives an overview of how to perform multi-class image classification in Python using the PyTorch library. The intention is to give the audience a broad overview of the classification task and inspire participants to explore the vast fields of visual recognition and computer vision at large.

This workshop was conducted by Cindy Gonzales, Data Science Team Lead for the Biosecurity and Data Science Applications Group at Lawrence Livermore National Laboratory

Useful resources for this workshop:
– https://bit.ly/deep_learning_files
– https://bit.ly/deep_learning_notebook

Read More

Applying Data Science for Good

Photographs of Newsha Ajami, Denice Ross, Andrea Gagliano, Nadia Fawaz, Sherrie Wang, and Maria Gargiulo, with WiDS branded illustrations

The pandemic has changed the way that people think about their lives and their work. A recent survey conducted by Gartner suggests that instead of using the term ‘great resignation’, organizations need to think about the ‘great reflection’, as employees are seeking more purpose in their work.

Read More

Counterfactual Explanations: The Future of Explainable AI | Aviv Ben Arie

Thumbnail for Counterfactual Explanations: The Future of Explainable AI | Aviv Ben Arie

As data scientists, the ability to understand our models’ decisions is important, especially for models that could have a high impact on people’s lives. This may pose several challenges, as most models used in the industry are not inherently explainable. Today, the most popular explainability methods are SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanations). Each method offers convenient APIs, backed by solid mathematical foundations, but falls short in intuitiveness and actionability.

In this workshop/article, I will introduce a relatively new model explanation method – Counterfactual Explanations (CFs). CFs are explanations based on minimal changes to a model’s input features that lead the model to output a different (mostly opposite) predicted class. CFs have been shown to be more intuitive for humans to comprehend and provide actionable feedback, compared to traditional SHAP and LIME methods. I will review the challenges in this novel field (such as how to ensure that the CF proposes changes which are feasible), provide a bird’s-eye view of the latest research and give my perspective, based on my research in collaboration with Tel Aviv University, on the various aspects in which CFs can transform the way data science practitioners understand their ML models.
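
To show what a “minimal change that flips the prediction” can look like, here is a toy sketch for the simplest possible case, a linear classifier, where the smallest L2 change has a closed form. This is an illustrative assumption, not the method presented in the workshop: the weights, the two features, and the function names are hypothetical, and the sketch ignores the feasibility constraints discussed above.

```python
import numpy as np

def predict(w, b, x):
    """Class 1 if the linear score is positive, else class 0."""
    return int(w @ x + b > 0)

def linear_counterfactual(w, b, x, margin=1e-3):
    """Smallest L2 change to x that flips a linear classifier's prediction.

    Moves x along the weight vector to the decision boundary, then a
    fraction `margin` past it. If x lies exactly on the boundary the
    score is zero and x is returned unchanged.
    """
    score = w @ x + b
    shift = -(score / (w @ w)) * w      # projection onto the boundary
    return x + (1.0 + margin) * shift

# Hypothetical model over two scaled features, e.g. income and debt.
w = np.array([0.8, -0.5])
b = -0.2
x = np.array([0.3, 0.6])                # original input, predicted class 0

x_cf = linear_counterfactual(w, b, x)   # nearest input predicted as class 1
change = x_cf - x                       # the "explanation": what to change
```

Here `change` reads as “raise the first feature and lower the second by these amounts”. For non-linear models there is no closed form, and practical CF methods search for such changes under constraints on which features may move.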

This workshop was conducted by Aviv Ben Arie, Data Science Manager at Intuit

Read More

Effective & Ideal Data Presentation using Visualization Techniques & Lucid Perceptions | Pariza

Thumbnail for Effective & Ideal Data Presentation using Visualization Techniques & Lucid Perceptions | Pariza

Research suggests that the human brain processes visualizations better than text, and data visualizations build on that strength.

Data visualization is the last phase in the data life cycle. It is the art and science of making data easy to understand and consume for the end user. Data visualizations present clusters of data in an easy-to-understand layout and that’s the reason it becomes mandatory for large amounts of complex data. Ideal data visualization shows the right amount of data, in the right order, in the right visual form, to convey the high priority information to the right audience and for the right purpose. If the data is presented in too much detail, then the consumer of that data might lose interest and the insight.

There are innumerable types of visual graphing techniques available for visualizing data. The right visualization arises from an understanding of the totality of the situation in context of the business domain’s functioning, consumers’ needs, nature of data, and the appropriate tools and techniques to present data. Ideal data visualization should tell a true, complete and simple story backed by data effectively, while keeping it insightful and engaging.

This workshop was conducted by Pariza Kamboj, Professor at Sarvajanik College of Engineering & Technology (SCET).

Useful resources for this workshop:
– Workshop #1: https://youtu.be/lRBuknaPRNI
– Jupyter code: https://bit.ly/jupyter_notebook2
– https://bit.ly/cars3_data
– https://bit.ly/execution_google_colab
– https://bit.ly/anaconda_installation_…

Read More

Intro to Markov Chains and Bayesian Inference | Mackenzie Simper

Thumbnail for Intro to Markov Chains and Bayesian Inference | Mackenzie Simper

Markov chains are a special type of random process which can be used to model many natural processes. This workshop will be a gentle introduction to Markov chains, giving basic properties and many examples. The second part of the workshop will focus on one specific application of Markov chains to data science: Sampling from posterior distributions in Bayesian inference. Introductory background in probability, statistics, and linear algebra is assumed.
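
As a small illustration of the second part, the sketch below uses a random-walk Metropolis sampler, one common Markov chain method, to draw from the posterior of a coin’s bias after observing 7 heads and 3 tails under a uniform prior. The example, its numbers, and its function names are assumptions chosen for illustration, not content from the workshop.

```python
import numpy as np

def log_posterior(theta, heads, tails):
    """Unnormalized log posterior of a coin's bias under a uniform prior."""
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf                   # zero probability outside (0, 1)
    return heads * np.log(theta) + tails * np.log(1.0 - theta)

def metropolis(heads, tails, n_steps=20000, step=0.1, seed=0):
    """Random-walk Metropolis: each state depends only on the previous one."""
    rng = np.random.default_rng(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = np.empty(n_steps)
    for i in range(n_steps):
        proposal = theta + step * rng.normal()
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        samples[i] = theta
    return samples

samples = metropolis(heads=7, tails=3)
posterior_mean = samples[2000:].mean()   # drop the first 2000 draws as burn-in
```

In this case the exact posterior is Beta(8, 4) with mean 8/12, so the sample mean should land close to that value. The sampler itself only ever needs the unnormalized posterior, which is what makes the approach useful when no closed form exists.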

This workshop was conducted by Mackenzie Simper, PhD Student at Stanford University.

Slides for this workshop: https://bit.ly/markov_chains_ppt

Read More

Open-sourced Propensity Model Package: From Modeling to Activation (Workshop #2) | Google

Thumbnail for Open-sourced Propensity Model Package: From Modeling to Activation (Workshop #2) | Google

A propensity model attempts to estimate the propensity (probability) of a behavior (e.g., conversion, churn, purchase, etc.) happening during a well-defined time period into the future based on historical data. It is a widely used technique by organizations or marketing teams for providing targeted messages, products or services to customers. This workshop shares an open-sourced package developed by Google, for building an end-to-end Propensity Modeling solution using datasets like GA360, Firebase or CRM and using the propensity predictions to design, activate and measure the impact of a media campaign. The package has enabled companies from e-commerce, retail, gaming, CPG and other industries to make accelerated data-driven marketing decisions.

This workshop was conducted by Lingling Xu, Bingjie Xu, Shalini Pochineni and Xi Li, data scientists on the Google APAC team.

Useful resources for this workshop:
– Workshop #1: https://youtu.be/rQhQca8RCuM
– https://bit.ly/propensity_modeling_pa…
– https://bit.ly/bigquery_export_schema
– https://bit.ly/ga_sample_dataset
– https://bit.ly/ml_windowing_pipeline

Read More

WiDS 2022 Highlights

Thumbnail for WiDS 2022 Highlights

The WiDS Worldwide conference took place in March 2022, held in-person at Stanford University and online. The conference featured keynotes, technical talks, panel discussions, and more. You’ll want to experience the energy in the room, hearing from data science thought leaders and conference attendees.

Read More

How Natural Language Processing is Changing How We Interact with Computers

Photographs of Dora Demszky, Jingwen Lu, Riyanka Bhowal, Rama Akkiraju, Vidya Setlur, and Tolúlọpẹ́ Ògúnrẹ̀mí, with WiDS branded illustrations

Whether it’s asking Siri about the weather, Google for directions, or a customer service bot about your bank account, Natural Language Processing (NLP) is an expected capability across applications today. In WiDS Conference keynotes, workshops and technical talks, experts explore various uses for NLP and how it is shaping how people interact with computers.

Read More

WiDS Datathon Impact 2022

Three WiDS Datathon participants working on a computer together.

The WiDS Datathon encourages women to hone their data science skills through an annual challenge focused on social impact, bringing people together across borders to collaborate on teams. This year, the datathon was held on Kaggle from January through the end of February, with over 4,000 participants registered from 95 countries, submitting over 25,000 Kaggle entries. The challenge focused on mitigating the effects of climate change with a focus on energy efficiency, with data provided by partners at Climate Change AI, Lawrence Berkeley National Laboratory, the US Environmental Protection Agency, and MIT Critical Data.

Read More

Predicting customer choice: A case study on integrating AI within a discrete choice model | Kathryn

Thumbnail for Predicting customer choice: A case study on integrating AI within a discrete choice model | Kathryn

Neural networks have been widely celebrated for their power to solve difficult problems across a number of domains. We explore an approach for leveraging this technology within a statistical model of customer choice. Conjoint-based choice models are used to support many high-value decisions at GM. In particular, we test whether using a neural network to model customer utility enables us to better capture non-compensatory behavior (i.e., decision rules where customers only consider products that meet acceptable criteria) in the context of conjoint tasks. We find the neural network can improve hold-out conjoint prediction accuracy for synthetic respondents exhibiting non-compensatory behavior only when trained on very large conjoint data sets. Given the limited amount of training data (conjoint responses) available in practice, a mixed logit choice model with a traditional linear utility function outperforms the choice model with the embedded neural network.

This workshop was conducted by Kathryn Schumacher, Staff Researcher in the Advanced Analytics Center of Expertise within General Motors’ Chief Data and Analytics Office.

Read More

Open-sourced Propensity Model Package: Accelerating Data-Driven Decisions (Workshop #1) | Google

Thumbnail for Open-sourced Propensity Model Package: Accelerating Data-Driven Decisions (Workshop #1) | Google

A propensity model attempts to estimate the propensity (probability) of a behavior (e.g., conversion, churn, purchase, etc.) happening during a well-defined time period into the future based on historical data. It is a widely used technique by organizations or marketing teams for providing targeted messages, products or services to customers. This workshop shares an open-sourced package developed by Google, for building an end-to-end Propensity Modeling solution using datasets like GA360, Firebase or CRM and using the propensity predictions to design, activate and measure the impact of a media campaign. The package has enabled companies from e-commerce, retail, gaming, CPG and other industries to make accelerated data-driven marketing decisions.

This workshop was conducted by Lingling Xu, Bingjie Xu, Shalini Pochineni and Xi Li, data scientists on the Google APAC team.

Useful resources for this workshop:
– https://bit.ly/github_propensity_mode…
– https://bit.ly/bigquery_export_schema
– https://bit.ly/ga_sample_dataset
– https://bit.ly/ml_windowing_pipeline

Read More

Basic to Intermediate Level SQL | Sreelaxmi Chakkadath

Thumbnail for Basic to Intermediate Level SQL | Sreelaxmi Chakkadath

This workshop covers basic to intermediate SQL. We start with querying a database and using filters to clean the data, then move on to joining tables, aggregate functions, and the use of ‘CASE WHEN’ for better query performance. We compare subqueries with Common Table Expressions (CTEs), cover window functions, including lead and lag functions and the scenarios in which they can be used, and finish with pivot tables and when not to use them!
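
As a taste of two of these topics, the sketch below combines a CTE with a ‘CASE WHEN’ expression. The workshop itself uses PostgreSQL; to keep the example self-contained, it runs the query against an in-memory SQLite database through Python’s built-in `sqlite3` module instead. The `orders` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('ana', 120.0), ('ana', 30.0),
        ('ben', 15.0),  ('ben', 20.0),
        ('cy', 300.0);
""")

query = """
WITH totals AS (                       -- CTE: one row per customer
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
)
SELECT customer,
       total,
       CASE WHEN total >= 100 THEN 'high' ELSE 'low' END AS tier
FROM totals
ORDER BY customer;
"""

rows = conn.execute(query).fetchall()
for customer, total, tier in rows:
    print(customer, total, tier)
conn.close()
```

The same aggregation could be written as a subquery in the FROM clause; naming it in a CTE keeps the outer query readable and lets the intermediate result be referenced more than once.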

This workshop was conducted by Sreelaxmi Chakkadath, Data Science Master’s student at Indiana University Bloomington.

Useful resources for this workshop:
– PostgreSQL install link: https://www.postgresql.org/
– https://bit.ly/sql_workshop_script
– https://bit.ly/sql_workshop_codes
– https://bit.ly/sql_ppt_slides

Read More

Demystifying Data Pre-processing & Data Wrangling for Data Science | Pariza Kamboj

Thumbnail for Demystifying Data Pre-processing & Data Wrangling for Data Science | Pariza Kamboj

Data science is evolving rapidly and is proving decisive in ERP (Enterprise Resource Planning). The datasets needed to build analytical models are collected from many sources, such as government, academia, web scraping, APIs, databases, files, sensors, and more. We cannot use such real-world data directly for analysis because it is often inconsistent, incomplete, and likely to contain many errors. We often hear the phrase “garbage in, garbage out”. Dirty or messy data, riddled with inaccuracies and errors, results in a badly trained model, which in turn can lead to poor business decisions and can sometimes even be hazardous to the domain. Even a powerful algorithm fails to provide correct analysis when applied to bad data. Therefore, data must be curated, cleaned, and refined before it is used in data science and in products based on data science. These tasks fall under “Data Preparation”, which includes two methods: Data Pre-processing and Data Wrangling. Most data scientists spend the majority of their time on data preparation.

This workshop was conducted by Pariza Kamboj, Professor at Sarvajanik College of Engineering & Technology (SCET).

Useful resources for this workshop:
– https://bit.ly/jupyter_code
– https://bit.ly/cars3_dataset
– https://bit.ly/execution_google_colab
– https://bit.ly/anaconda_installation_…

Read More

Tolúlọpẹ́ Ògúnrẹ̀mí: From WiDS London to Stanford PhD

Portrait of Tolúlọpẹ́ Ògúnrẹ̀mí

Tolúlọpẹ́ Ògúnrẹ̀mí’s involvement with WiDS began as a WiDS London ambassador, working with Stanford prior to getting accepted to the Stanford Computer Science (CS) PhD program. She has been able to combine a love of languages with her ability in mathematics and computer science and is now focused on Natural Language Processing (NLP) for low-resource languages, particularly sub-Saharan African languages. Tolúlọpẹ́ came full circle with WiDS, attending and volunteering for the WiDS Worldwide event in 2022.

Read More

How can we make sense of the unseen world? Using AI, sensors & IoT for scene exploration | Mathworks

Thumbnail for How can we make sense of the unseen world? Using AI

Have you wondered about being able to detect buried objects? Do you think your mobile device can be used to detect them? Metal is all around us, often buried and out of sight. Detecting it matters in many places on Earth and supports a variety of applications, such as providing insight into land use, finding historic artifacts, determining the presence of various devices, and more.

In our workshop, we will explore using your own mobile device as a metal detector in your local environment. During this workshop we will provide an overview of the basics of sensors, AI, and IoT which will be required for building a prototype of our application. We’ll do hands-on exercises where you will acquire data from sensors, obtain summary statistics on the acquired data, and train a human activity classifier to understand what was done while data was being collected. We will also have an engaged discussion regarding topics to be mindful of with respect to this application such as considerations regarding the collection and usage of location data. You will leave motivated and ready to use sensors, AI, and IoT in your own projects via MATLAB!

Workshop presenters:
– Louvere Walker-Hannon, Application Engineering Senior Team Lead, MathWorks
– Loren Shure, Consulting Application Engineer, MathWorks
– Sarah Mohamed, Senior Software Engineer, MathWorks
– Shruti Karulkar, Quality Engineering Manager, MathWorks

Read More

Data Storytelling for Data Scientists | Hana M.K

Thumbnail for Data Storytelling for Data Scientists | Hana M.K

Workshop presented by Hana M.K., Data Storytelling and Presentation Instructor and host of the show “The Art of Communicating Data”.

As humans, we enjoy stories. But as data practitioners, we sometimes forget that we need a compelling data story to accompany our work when sharing with others. In this workshop you’ll learn why it’s necessary for data scientists to also be data storytellers and how to craft a data story.

Read More

Catie Cuan | Stanford University | WiDS 2022

Thumbnail for Catie Cuan | Stanford University | WiDS 2022

Catie Cuan, PhD student, Stanford University

Catie is currently a PhD Candidate in the Mechanical Engineering department at Stanford University, where she completed a Master’s of Science in Mechanical Engineering in spring 2020. Her artistic and research work focuses on dance and robotics.

Read More

A Turing Test for Chest Radiology AI | Tanveer Syeda-Mahmood | IBM | WiDS 2022

Tanveer Syeda-Mahmood, IBM Fellow, IBM Research Center, presents a Technical Vision Talk at the WiDS Worldwide conference.

Chest radiographs are the most common imaging exams in hospitals and clinics, comprising 60% of x-rays in the US. They are also one of the hardest to interpret due to their low resolution in reflecting 2D projections of 3D volumes, and cognitive biases leading to interpretation errors. AI assistance with automated preliminary reads can expedite clinical workflows, reduce bias and increase diagnostic throughput of radiologists.

Read More

Opening Address | Debra Satz | Stanford University | WiDS 2022

Thumbnail for Opening Address | Debra Satz | Stanford University | WiDS 2022

Debra Satz, Dean of the School of Humanities and Sciences, Stanford University, delivers the Opening Address at the WiDS Worldwide conference.

Debra is the Vernon R. and Lysbeth Warren Anderson Dean of the School of Humanities and Sciences at Stanford University, the Marta Sutton Weeks Professor of Ethics in Society, Professor of Philosophy, and, by courtesy, Political Science.

Read More

Keynote: The Rigorous and Human Life of Data | Cecilia Aragon | University of Washington

Thumbnail for Keynote: The Rigorous and Human Life of Data | Cecilia Aragon | University of Washington

Cecilia Aragon, Professor, Human Centered Design & Engineering, University of Washington, presents a Keynote at the WiDS Worldwide conference.

Very often, the words ‘rigorous’ and ‘human-centered’ have been used as opposites in technical fields, with the implication that a focus on human aspects makes science ‘soft’ or ‘insufficiently technical’. Cecilia will argue in this talk that this is a false dichotomy.

While extraordinary advances in our ability to collect, analyze, and interpret vast amounts of data have been transforming the fundamental nature of data science, the human aspects of data science, including how to support scientific creativity and human insight, how to address ethical concerns, and the consideration of societal impacts, have been less studied. Yet these human issues are becoming increasingly vital to the future of data science. Cecilia will reflect on a 30-year career in data science in industry, government, and academia, discuss what it means for data science to be both rigorous and human-centered, and speculate upon future directions for data science.

Read More

Estimating Undocumented Human Rights Violations in Conflict Settings | Maria Gargiulo | HRDAG

Thumbnail for Estimating Undocumented Human Rights Violations in Conflict Settings | Maria Gargiulo | HRDAG

Maria Gargiulo, Statistician, Human Rights Data Analysis Group, presents a Technical Vision Talk at the WiDS Worldwide conference.

Collecting data on human rights violations in conflict settings is difficult and dangerous, and the data that results is often incomplete on multiple levels. Some victims’ stories are never recorded, and those whose stories are documented may still be missing critical information about the victim, the perpetrator, or other contextual details about the violation. Furthermore, the data that is documented may not be statistically representative of the victim population as a whole. Drawing population-level inferences from this data without correcting for the missingness risks incorrectly answering questions about patterns of violence.

This talk will demonstrate how multiple systems estimation and multiple imputation can be used together to address both levels of missingness in order to draw population level inferences that are statistically valid and include a measure of uncertainty.

Read More

Panel: Data Science in Healthcare: Opportunities & Challenges | WiDS 2022

Thumbnail for Panel: Data Science in Healthcare: Opportunities & Challenges | WiDS 2022

WiDS Worldwide panel: Data Science in Healthcare: Opportunities & Challenges

Moderated by Tina Hernandez Boussard, Associate Professor, Stanford University

Panelists:
– Sylvia K. Plevritis, Chair of Biomedical Data Science, Stanford University
– Tanveer Syeda-Mahmood, IBM Fellow, IBM Research Center
– Jinoos Yazdany, Chief of Rheumatology, Zuckerberg San Francisco General Hospital

Read More

Beyond Bias: Algorithmic Unfairness, Infrastructure and Genealogies of Data | Alex Hanna | WiDS 2022

Thumbnail for Beyond Bias: Algorithmic Unfairness

Alex Hanna, Director of Research, DAIR Institute, presents a Technical Vision Talk at the WiDS Worldwide conference.

Problems of algorithmic bias are often framed in terms of lack of representative data or formal fairness optimization constraints to be applied to automated decision-making systems. However, these discussions sidestep deeper issues with data used in AI, including problematic categorizations and the extractive logics of crowd work and data mining.

In this talk, Alex will make two interventions: first, reframing data as a form of infrastructure, and as such implicating politics and power in the construction of datasets; and second, discussing the development of a research program around the genealogy of datasets used in machine learning and AI systems.

Read More

Confronting Data Bias in Travel Demand Modeling | Tierra Bills | UCLA | WiDS 2022

Thumbnail for Confronting Data Bias in Travel Demand Modeling | Tierra Bills | UCLA | WiDS 2022

Tierra Bills, Assistant Professor of Civil and Environmental Engineering and Public Policy, UCLA, presents a Technical Vision Talk at the WiDS Worldwide conference.

Should regions invest in more buses on transit routes, or new bus routes to provide greater transportation accessibility for vulnerable communities? What mix of transportation improvements will offer the greatest boost in accessibility for travelers who most need it? Such questions can be addressed using travel demand analysis tools.

This presentation will summarize various biases in travel data that arise due to underrepresentation of vulnerable populations, how they may come to be, and how such biases can influence travel modeling outcomes.

Read More

Panel: Algorithms and Data for Equity | WiDS 2022

Thumbnail for Panel: Algorithms and Data for Equity | WiDS 2022

WiDS Worldwide panel: Algorithms and Data for Equity

Moderated by Jenny Suckale, Associate Professor, Stanford University

Panelists:
– Tierra Bills, Assistant Professor of Civil and Environmental Engineering and Public Policy, UCLA
– Jessica Granderson, Director for Building Technology, White House Council on Environmental Quality
– Ling Jin, Research Scientist, Lawrence Berkeley National Laboratory

Read More

Keynote: What makes intelligent visual analytics tools really intelligent? | Vidya Setlur | Tableau

Thumbnail for Keynote: What makes intelligent visual analytics tools really intelligent? | Vidya Setlur | Tableau

Vidya Setlur, Director of Tableau Research, Tableau, presents a Keynote at the WiDS Worldwide conference.

In this keynote, Vidya will discuss how natural language can be leveraged in various aspects of the analytical workflow, ranging from smarter data transformations, visual encodings, and autocompletion to supporting analytical intent and conversational interfaces. With a better understanding of how users explore data in their flow of analysis, can people doing analysis be supported by more intelligent tools? The talk explores this question.

Read More

Inclusive Search and Recommendations | Nadia Fawaz | Pinterest | WiDS 2022

Thumbnail for Inclusive Search and Recommendations | Nadia Fawaz | Pinterest | WiDS 2022

Nadia Fawaz, Senior Staff Applied Research Scientist – Tech Lead Inclusive AI at Pinterest, presents a Technical Vision Talk at the WiDS Worldwide conference.

This tech talk shows how machine learning technologies are paving the way for more inclusive inspirations in Search and in our augmented reality technology Try-On, and are also driving advances for more diverse recommendations across the platform. Developing inclusive AI in production requires an end-to-end iterative and collaborative approach.

Read More

A Mathematician’s View of Machine Learning (and Why It Matters) | Tamara Kolda | MathSci.ai

Thumbnail for A Mathematician's View of Machine Learning (and Why It Matters) | Tamara Kolda | MathSci.ai

Tammy Kolda, Mathematical Consultant at MathSci.ai, presents a Technical Vision Talk at the WiDS Worldwide conference.

A (trained) machine learning model, such as a deep neural network, operates loosely as follows: it takes features as an input and produces a classification as an output. Watch Tammy argue that “more data” and “bigger models” are not a panacea, and instead develop mathematical methodology for understanding how to move beyond the current limits of machine learning.

Read More

Andrea Gagliano | Getty Images | WiDS 2022

Thumbnail for Andrea Gagliano | Getty Images | WiDS 2022

Andrea Gagliano, Head of Data Science, AI/ML, Getty Images, talks about the importance of representation in images and videos, and how her work helps to make the technology sector more inclusive.

Andrea Gagliano is an artist and technologist who uses her practice to ask questions and provoke conversation around artificial intelligence, machine learning, and data in society and culture. She is the Head of Data Science at Getty Images where her team is responsible for building ML/AI capabilities for visual search and discovery.

Read More

Career Panel | WiDS 2022

Thumbnail for Career Panel | WiDS 2022

WiDS 2022 Career Panel

Moderated by Suzanne Weekes, Executive Director, SIAM

Panelists:
– Cecilia Aragon, Professor, Human Centered Design & Engineering, University of Washington
– Sharon Hutchins, VP & Chief of Operations, Intuit AI+Data
– Tamara Kolda, Mathematical Consultant, MathSci.ai
– Maggie Wang, Robotics Software Engineer, Skydio

Read More

Skydio Autonomy: Data-Driven Approaches Towards Real-Time 3D Reconstruction in Drones | Maggie Wang

Thumbnail for Skydio Autonomy: Data-Driven Approaches Towards Real-Time 3D Reconstruction in Drones | Maggie Wang

Maggie Wang, Robotics Software Engineer at Skydio, presents a Technical Vision Talk at the WiDS Worldwide conference.

Skydio is the leading US drone company and the world leader in autonomous flight. Our drones are used for everything from capturing amazing video, to inspecting bridges, to tracking progress on construction sites. Using six 4K navigational cameras, our drones create a 3D model of their surroundings that updates at a rate of over one million data points per second, and run up to nine deep neural networks to predict into the future.

In this talk, Maggie will discuss how data-driven processes are used in Skydio 3D Scan, a revolutionary adaptive scanning software that enables Skydio drones to autonomously generate 3D models with comprehensive coverage and ultra-high resolution.

Read More

Unsupervised Learning for Network Intrusion Detection | Nandi Leslie | Raytheon | WiDS 2022

Thumbnail for Unsupervised Learning for Network Intrusion Detection | Nandi Leslie | Raytheon | WiDS 2022

Nandi Leslie, Engineering Fellow at Raytheon Technologies, presents a Technical Vision Talk at the WiDS Worldwide conference.

For nearly 40 years, computer scientists and engineers have been concerned with the problem of monitoring networks for unauthorized activities. More recently, anomaly-based intrusion detection systems have been developed to protect enterprise and mobile networks from such attacks. Nonetheless, in-vehicle networks remain vulnerable to a variety of remote attacks that erode information confidentiality, availability, and integrity.

In this talk, Nandi develops an ensemble hierarchical agglomerative clustering (E-HAC) algorithm for detecting remote attacks on the CAN bus. E-HAC is an ensemble learning approach over multiple clustering algorithms with different linkages and pairwise distances between observations. In addition, she presents prediction performance results for a dataset consisting of CAN bus and remote attack network traffic to demonstrate the effectiveness of this E-HAC algorithm.

Read More

Replication, Robustness and Interpretability: Improving How We Communicate Scientific Findings | Chiara Sabatti

Thumbnail for Replication

Chiara Sabatti, Professor of Biomedical Data Science and Statistics at Stanford University, presents a Technical Vision Talk at the WiDS Worldwide conference.

In a world where large comprehensive datasets are readily available in digital form, scientists engage in data analysis before formulating precise hypotheses, with the goal of exploring and identifying tantalizing patterns. In this talk Professor Sabatti will help us review some classical approaches to quantifying the strength of evidence, identify some of their limitations, and explore novel proposals. We will underscore the connections between clear, precise reporting of scientific evidence and “social good”.

Read More

Using Wearable Data to Empower Individuals in Managing Their Health | Torey Lee | WHOOP | WiDS 2022

Thumbnail for Using Wearable Data to Empower Individuals in Managing Their Health | Torey Lee | WHOOP | WiDS 2022

Torey Lee, Senior Data Scientist, WHOOP, presents a Technical Vision Talk at the WiDS Worldwide conference.

The COVID-19 pandemic has catalyzed a shift in healthcare; people are demanding more personal, engaging, and holistic care. Torey discusses the broad capabilities of wearable devices, explores a framework for detecting COVID-19 using an optical sensor, and examines how wearable data empowered one individual in seeking treatment for an illness.

Read More

WiDS Educational Outreach 2022

Thumbnail for WiDS Educational Outreach 2022

The WiDS Educational Outreach program aspires to take data science to secondary school students. Through the program we strive to educate and inspire young minds by facilitating relevant courses and paths to consider future careers involving data science, artificial intelligence (AI) and other related areas.

Watch this video to learn of the Education Outreach collaborations with schools around the world from Hyderabad, India to Dar es Salaam, Tanzania, and more.

Read More

WiDS Datathon 2022

Thumbnail for WiDS Datathon 2022

The WiDS Datathon is an initiative to provide a platform for data science enthusiasts to learn, apply, and hone their data science skills through the social impact challenges presented to them. Participants are trained and mentored by partners, ambassadors, and data enthusiasts.

Watch how the WiDS Datathon has evolved over the past four years and get insight into the 2022 challenge, which focused on climate change.

Read More

WiDS 2022 Opening Video

Thumbnail for WiDS 2022 Opening Video

Welcome to the 7th annual WiDS Worldwide Conference. WiDS Co-directors Margot Gerritsen, Karen Matthys, and Judy Logan highlight WiDS Worldwide global initiatives and impact. WiDS aims to achieve at least 30% representation of women across all domains and levels by 2030. Join us in reaching WiDS 30×30!

Read More

LIVE: Women in Data Science (WiDS) Worldwide Conference 2022

Thumbnail for LIVE: Women in Data Science (WiDS) Worldwide Conference 2022

Join us online on March 7, 2022, for the Women in Data Science (WiDS) Worldwide conference, a technical conference featuring outstanding women doing exceptional work in data science and related fields, in a wide variety of domains. Everyone is welcome and encouraged to attend. Broadcast live from Stanford University, 8am – 5pm PST.

Read More

A Beginner’s Tutorial for the WiDS Datathon 2022 challenge

two methodologies that you might consider when deciding how to develop a model.

Climate change is one of the critical challenges facing humanity today. Over the past few years, there have been widespread climate-driven disruptive events such as floods and wildfires. The devastation caused by these events has resulted in an awareness of the urgency of the issue. Indeed, people and governments have started working together in the direction of climate-focused coordinated action. At WiDS, we believe that it will be important for future data scientists to gain familiarity with mathematical and statistical models used to model climate data. For this reason, the focus of the WiDS Datathon this year is a climate-focused challenge: prediction of building energy consumption.

Read More

Cindy Orozco Bohorquez: From Bogotá, to Stanford PhD, to WiDS Worldwide Speaker

Portrait of Cindy Orozco Bohorquez

​Cindy Catherine Orozco Bohorquez started her PhD at Stanford’s Institute for Computational and Mathematical Engineering (ICME) the same year that WiDS launched its first conference. Her involvement with WiDS started as a shy volunteer and evolved to fulfilling her dream of becoming a speaker at global and regional WiDS conferences, a WiDS workshop instructor, and a member of a thriving community of women data scientists.

Read More

Women Using Data Science to Build a More Sustainable World

Photographs of Ma Xin, Sherrie Wang, Lesly Goh, Nida Rizwan Farid, Rosalind Archer, and Newsha Ajami. With WiDS branded illustrations in the background.

Data science is a crucial tool to quantify, predict and communicate about the impact of climate change. For example, a recent study used machine learning to analyze over 100,000 weather events that could be linked to global warming and discovered that 80 percent of the earth’s land has been adversely impacted and at least 85 percent of the world’s population has been affected by extreme weather events caused by climate change.

The Women in Data Science (WiDS) Conference and podcast series have been showcasing leading women data science experts who use data science to help us understand the impact of climate change and potential solutions to combat it. Our WiDS Datathon 2022 dataset will also focus on the impacts of climate change.

Here are some recent podcasts, panels, and talks from our WiDS conferences that address sustainability.

Read More

First Time Kaggle Participant Team Earns WiDS Datathon Excellence in Research Honorary Mention

The WiDS Datathon Excellence in Research Award is an opportunity for WiDS Datathon teams to write papers about their research. The papers are judged on their potential for real-world impact, rigor in scientific methodology, and clarity of communication. Team Parameters Patrol from the United States and the United Kingdom with teammates Natalie Pirkola, Elena Barbulescu, Stacy Forsyth and Kate Tereshchenko won an Honorary Mention for their research paper, Feature Engineering to Improve Performance.

Read More

An Introduction to Time Series Forecasting | Walmart

Thumbnail for An Introduction to Time Series Forecasting | Walmart

Forecasting with time series data is an active research topic and is applied wherever quantities change over time (through seasonality or trend) to support important decisions: e-commerce orders, stock market prices, weather prediction, product demand and usage, and more. This workshop covers time series analysis, which seeks to understand the nature of a series and supports forecasting, along with an overview of popular forecasting models such as ARIMA, SMA, SES, and Prophet, followed by a case-study walk-through.
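As a rough illustration of two of the simpler models named above, the sketch below implements a simple moving average (SMA) and simple exponential smoothing (SES) in plain Python. It is not taken from the workshop materials; the function names, the `orders` data, and the parameter values are made up for this example.

```python
def simple_moving_average(series, window):
    """One-step-ahead forecast: the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


def simple_exponential_smoothing(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing.

    The level is updated as: level = alpha * y + (1 - alpha) * level,
    starting from the first observation. The final level is the forecast.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


# Hypothetical daily order counts, invented for illustration only.
orders = [120, 132, 128, 141, 150, 147, 158]

sma_forecast = simple_moving_average(orders, window=3)
ses_forecast = simple_exponential_smoothing(orders, alpha=0.5)
print(f"SMA forecast: {sma_forecast:.2f}")
print(f"SES forecast: {ses_forecast:.2f}")
```

A larger `window` or a smaller `alpha` gives smoother, slower-reacting forecasts. Neither model captures trend or seasonality, which is where models such as ARIMA or Prophet come in.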

This workshop was conducted by Apurva Sinha & Sinduja Subramaniam at Walmart Global Tech.

Read More

Adapting to Climate Change Bit by Bit w/Planetary Health Informatics & Machine Learning, Sara Khalid

Thumbnail for Adapting to Climate Change Bit by Bit w/Planetary Health Informatics & Machine Learning

Living through a pandemic in the era of climate change, it can be easy to sense doom and gloom. Yet living in the era of data science, there has never been a better time for the machine learning community to act. This talk will introduce the audience to planetary health and some of the most pressing issues facing us (and our planet), review the state of the art in artificial intelligence and data science methods in planetary health informatics, summarize the latest research, and highlight opportunities for budding and experienced data scientists in this rapidly growing and pertinent field.

This workshop was conducted by Sara Khalid, University Research Lecturer and Senior Research Associate at University of Oxford.

Read More

AI & Neuroscience: Combining Real-Time Brain Imaging and Machine Learning | Romy Lorenz

Thumbnail for AI & Neuroscience: Combining Real-Time Brain Imaging and Machine Learning | Romy Lorenz

Cognitive neuroscientists are often interested in broad research questions, yet use overly narrow experimental designs by considering only a small subset of possible experimental conditions. This limits the generalizability and reproducibility of many research findings. In this workshop, I present an alternative approach, “The AI Neuroscientist”, that resolves these problems by combining real-time brain imaging with a branch of machine learning, Bayesian optimization. Neuroadaptive Bayesian optimization is an active sampling approach that allows one to intelligently search through large experiment spaces with the aim of optimizing an unknown objective function. It thus provides a powerful strategy to efficiently explore many more experimental conditions than is currently possible with standard brain imaging methodology. Alongside methodological details on non-parametric Bayesian optimization using Gaussian process regression, I will present results from a clinical study where we applied the method to map cognitive dysfunction in stroke patients. Our results demonstrate that this technique is both feasible and robust for clinical cohorts as well. Moreover, our study highlights the importance of moving beyond traditional ‘one-size-fits-all’ approaches where patients are treated as one group. Our approach can be combined with brain stimulation or other therapeutics, thereby opening new avenues for precision medicine targeting a diverse range of neurological and psychiatric conditions.

This workshop was conducted by Romy Lorenz, Postdoctoral Fellow at Stanford University and University of Cambridge.

Read More

How do I get started with Machine Learning? | Mathworks

Thumbnail for How do I get started with Machine Learning? | Mathworks

Data Science workflows typically entail using Machine Learning.

Machine Learning can provide insight into various datasets and can assist with automating various types of analysis.

In this workshop you will explore a process for getting started with implementing Machine Learning interactively to train a model to predict tsunami intensity and implement other relevant tasks.

This workshop was conducted by Louvere Walker-Hannon and Heather Gorr from MathWorks.

Read More

Data Visualization: Turning Information Into Images

Photographs of Miriah Meyer, Fernanda Viégas, Fanny Chevalier, and Nathalie Henry Riche. With WiDS branded illustrations in the background.

Data scientists work with large data sets that require computational analysis to gain insights and knowledge that often drive important decisions within organizations in industry, academia, non-profits, and government. For these insights to have the desired impact, data scientists need to communicate clearly so that they are understood well and quickly. Data visualization lets data scientists provide unique views into and about the data, turning data sets into at-a-glance insights.

Read More

What would we do without Linear Algebra, Part 3: Singular Value Decomposition & Principal Component

In this third workshop in linear algebra, we will investigate the link between Principal Component Analysis and the Singular Value Decomposition. Along the way, we are introduced to several linear algebra concepts including linear regression, eigenvalues and eigenvectors, and conditioning of a system. We will use shared Python scripts and several examples to demonstrate the ideas discussed.
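The link between the two can be summarized in a short derivation. This is the standard textbook argument rather than a transcription of the workshop. The notation is assumed here: X is an n × p data matrix whose columns have been centered (n > 1), C is its sample covariance matrix, and X = UΣVᵀ is a thin SVD.

```latex
% Assumed notation: X is n x p with zero column means, n > 1,
% and X = U \Sigma V^{\top} is a thin SVD, so U^{\top} U = I.
\begin{align*}
C &= \frac{1}{n-1} X^{\top} X \\
  &= \frac{1}{n-1} V \Sigma^{\top} U^{\top} U \Sigma V^{\top} \\
  &= V \left( \frac{\Sigma^{2}}{n-1} \right) V^{\top}
\end{align*}
% Hence the columns of V are the eigenvectors of C (the principal directions),
% the eigenvalues are \sigma_i^{2} / (n-1) (the explained variances),
% and the principal component scores are X V = U \Sigma.
```

In other words, the eigendecomposition of the covariance matrix can be read off from the SVD of the centered data without ever forming XᵀX explicitly.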

This workshop builds on the previous two workshops in linear algebra (Part I and Part II), and we will assume that the linear algebra concepts introduced in those workshops are familiar to the audience. They include: vector algebra (including inner products, angle between vectors), matrix-vector multiplications, matrix-matrix multiplications, matrix-vector solves, singularity, and singular values.

Links:
1. Code is available for viewers to follow along: https://github.com/lalyman/lin-alg-wo…
2. The covariance matrix is defined for centered X, and the inequality n 1 given is strict.

This workshop was conducted by Laura Lyman, PhD student at Stanford University, ICME.

Read More

Pocket AI and IoT, or How to be a Data Scientist using Your Mobile Device | Mathworks

Want to learn more about trends like AI, IoT and wearable tech? In one hour, we will cut through the hype by building a “smart” fitness tracker using your own mobile device. We’ll do hands-on exercises: you’ll acquire data from sensors, design a step counter and train a human activity classifier. You will leave motivated and ready to use machine learning and sensors in your own projects!

This workshop was conducted by Louvere Walker-Hannon, Shruti Karulkar, & Sarah Mohamed from MathWorks.

Read More

Telling and Sharing Stories | Izzy Aguiar

How can sharing stories help us as a community? How do we learn how to find a story from the events of someone else’s life or our own? How can this relate to our own tendency as data-scientists to connect the dots, to find meaning through patterns? Join us in this WiDS workshop on telling and sharing stories where we will address these questions and learn how our stories are important in shaping the community we want to see in Data Science.

This workshop was conducted by Izzy Aguiar, PhD student at Stanford University, ICME.

Read More

Bayesian Machine Learning & Sampling Methods | Walmart

In this workshop, you will learn about the core concepts of Bayesian machine learning (BML): how it differs from frequentist approaches, the building blocks of Bayesian inference, and what known ML techniques look like in a Bayesian setup. You will also learn how to use various sampling techniques for Bayesian inference and why we need such techniques in the first place. The workshop will also provide links and materials to continue your Bayesian journey afterwards.

This workshop is meant as an introduction to select BML modules. We strongly recommend that you continue exploring Bayesian methods once you have taken this first step.
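To give a flavor of what a sampling technique for Bayesian inference looks like, here is a minimal random-walk Metropolis sketch in plain Python. It is an illustrative example, not the workshop's own code. The coin-flip model, the uniform prior, the function names, and all parameter values are assumptions chosen so that the result can be compared with a known answer.

```python
import math
import random


def log_posterior(theta, heads, tails):
    """Unnormalized log posterior of a coin's bias under a uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def metropolis(heads, tails, n_samples=20000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for the coin-bias posterior."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


# Hypothetical data: 7 heads and 3 tails.
samples = metropolis(heads=7, tails=3)
kept = samples[2000:]  # discard an initial burn-in period
posterior_mean = sum(kept) / len(kept)
print(f"Estimated posterior mean: {posterior_mean:.3f}")
```

For this toy model the exact posterior is Beta(8, 4), with mean 8/12, so the estimate can be checked directly. The sampler only ever uses the unnormalized posterior, which is why such methods remain usable when the normalizing constant has no closed form.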

This workshop was conducted by Ashwini Chandrashekharaiah & Debanjana Banerjee at Walmart Global Tech.

Read More

Dealing with Missing Data

Photographs of Fatima Abu Salem, Maria Gargiulo, Madeleine Udell, and Megan Price. With WiDS branded illustrations in the background.

We live in an era of big data, with data sets that require computational analysis to gain insights and knowledge. The volume of big data has been increasing steadily and will only continue to climb. Statista estimates that since we started the WiDS initiative in 2015, the volume of data has increased from 15.5 to 74 zettabytes, and it forecasts that data volume will double again by 2024.

Yet with all of this data, one of the biggest challenges that data scientists and researchers face is dealing with missing data. In some cases, the data is missing because the data sets required for the analysis are not readily accessible, while other cases involve data sets that are incomplete and not uniformly populated.

Read More

Recommender Systems | Walmart

Recommender systems play a major role in the e-commerce industry. They keep users engaged by recommending relevant content and have a significant role in driving digital revenue.

Following tremendous gains in computer vision and natural language processing with deep neural networks in the past decade, the recent years have seen a shift from traditional recommender systems to deep neural network architectures in research and industry.

In this workshop, we focus on the temporal domain from the perspective of both traditional recommender systems and deep neural networks. We start with the classic latent factor model, introduce temporal dynamics into it, and show how this improves performance. We then move on to sequential modelling with deep neural networks, presenting the state of the art in the field and discussing its advantages and disadvantages.
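The idea of adding temporal dynamics to a latent factor model can be sketched as follows. This is an illustrative toy, not the presenters' model: it assumes one common formulation in which each item gets an extra bias per time bin, and the data, function names, and hyperparameters are all invented for the example.

```python
import random


def train(ratings, n_users, n_items, n_bins, n_factors=2,
          lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit r_hat = mu + b_u + b_i + b_i,bin(t) + p_u . q_i by SGD.

    `ratings` is a list of (user, item, time_bin, rating) tuples.
    Returns a predict(user, item, time_bin) function.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, _, r in ratings) / len(ratings)
    b_user = [0.0] * n_users
    b_item = [0.0] * n_items
    # Temporal dynamics: one extra bias per item and time bin.
    b_item_time = [[0.0] * n_bins for _ in range(n_items)]
    p = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]

    def predict(u, i, t):
        dot = sum(pf * qf for pf, qf in zip(p[u], q[i]))
        return mu + b_user[u] + b_item[i] + b_item_time[i][t] + dot

    for _ in range(epochs):
        for u, i, t, r in ratings:
            err = r - predict(u, i, t)
            b_user[u] += lr * (err - reg * b_user[u])
            b_item[i] += lr * (err - reg * b_item[i])
            b_item_time[i][t] += lr * (err - reg * b_item_time[i][t])
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return predict


def rmse(predict, ratings):
    """Root mean squared error of `predict` on the given ratings."""
    total = sum((r - predict(u, i, t)) ** 2 for u, i, t, r in ratings)
    return (total / len(ratings)) ** 0.5


# Hypothetical ratings: item 0 is rated lower in time bin 1 than in bin 0.
ratings = [(0, 0, 0, 5.0), (1, 0, 0, 4.0), (2, 0, 1, 2.0), (0, 0, 1, 3.0),
           (0, 1, 0, 3.0), (1, 1, 1, 3.0), (2, 1, 1, 2.0), (1, 0, 1, 2.0)]
mean_rating = sum(r for _, _, _, r in ratings) / len(ratings)

model = train(ratings, n_users=3, n_items=2, n_bins=2)
baseline_rmse = rmse(lambda u, i, t: mean_rating, ratings)
model_rmse = rmse(model, ratings)
print(f"baseline RMSE {baseline_rmse:.3f}, model RMSE {model_rmse:.3f}")
```

The comparison here is on the training data only, so it shows that the model fits, not that it generalizes. A real evaluation would hold out the most recent ratings.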

This workshop was conducted by Aleksandra Cerekovic & Selene Xu at Walmart Global Tech.

Read More

Do You See What I See: Exploration of Using AI and AR | MathWorks

Welcome to the world of artificial intelligence (AI) and augmented reality (AR)! This workshop explains AI and AR via hands-on exercises where you will interact with your augmented world. You will learn about applications where the technologies of AI+AR are combined, their limitations, and their impacts in society. You’ll leave armed with code, inspiration, and an ethical framework for your own projects!

Artificial intelligence (AI) is used in a variety of industries for many applications, and it can be combined with other technologies to help reveal the implications of those applications. In this workshop, you explore how pose estimation results obtained with deep learning change depending on a location supplied through augmented reality. Together, these technologies provide insight into how poses could be interpreted differently depending on the scene. This workshop also raises awareness of the consequences of using AI for applications that differ from its originally intended use, which could lead to both technical and ethical challenges.

Specific topics that will be covered in this workshop are listed below:
• understand how AI and AR can be used for applications
• explore how to implement AI and AR
• discover what tools can be used to implement AI and AR
• review code that implements pose estimation using AI and changing background scenes using AR
• gain guidance regarding challenges to address societal impacts of the results from applications that use AI and AR

In addition to receiving an overview of terminology and an understanding of the workflows for each topic, code will be provided to demonstrate how to implement these workflows with tools from MathWorks.

This workshop was conducted by Louvere Walker-Hannon, Shruti Karulkar, & Sarah Mohamed from MathWorks.

Read More

Graph Theory for Data Science, Part III: Characterizing Graphs in the Real World

Graph theory provides an effective way to study relationships between data points, and is applied to everything from deep learning models to social networks. This workshop is part III in a series of three workshops. Throughout the series we will progress from introductory explanations of what a graph is, through the most common algorithms performed on graphs, and end with an investigation of the attributes of large-scale graphs using real data.

And in particular for Part III:
Many of the systems we study today can be represented as graphs, from social media networks to phylogenetic trees to airplane flight paths. In this workshop we will explore real-world examples of graphs, discussing how to extract graphs from real data, data structures for storing graphs, and measures to characterize graphs. We will work with real examples of graph data to create a table of values that summarize different example graphs, exploring values such as the centrality, assortativity, and diameter of each graph. Python code will be provided so that attendees can get hands-on experience analyzing graph data.
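As a small illustration of the kind of summary values mentioned above, the sketch below computes the diameter and degree centrality of a graph stored as an adjacency list. It is not the workshop's code. The function names and the example graph are invented, and the graph is assumed to be undirected and connected.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Hop counts from `source` to every reachable node, via breadth-first search."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def diameter(adjacency):
    """Longest shortest path between any two nodes (assumes a connected graph)."""
    return max(
        max(shortest_path_lengths(adjacency, node).values())
        for node in adjacency
    )


def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adjacency)
    return {node: len(neighbors) / (n - 1)
            for node, neighbors in adjacency.items()}


# Hypothetical example: a path graph a - b - c - d stored as an adjacency list.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

graph_diameter = diameter(path)
centrality = degree_centrality(path)
print("diameter:", graph_diameter)
print("degree centrality:", centrality)
```

Running one breadth-first search per node is fine for small graphs. For large real-world graphs, sampling or approximation is usually needed.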

This workshop was conducted by Stanford ICME PhD student, Julia Olivieri.

Read More

Responsible Data Science

Photographs of Amanda Obidike, Andrea Martin, Danielle Jiang, Kristian Lum, Mary Gray, Zhamak Dehghani, and Emily Miller. With WiDS branded illustrations in the background.

Data science is being applied in a growing number of domains that affect everyone’s lives, in healthcare, financial services, agriculture, resource management, and beyond. While data science has huge potential for good, there are also unintended consequences. Data scientists need to take steps to mitigate as many unintended consequences as they can using Responsible Data Science — a set of policies, procedures, and best practices to ensure algorithmic fairness, transparency, and explainability.

Read More

Collaborators Who Met During 2021 Datathon Win Excellence in Research Award

Photographs side by side of Maya Tadmor-Saghiv and Pavel Vodolazov

Maya Tadmor-Saghiv and Pavel Vodolazov teamed up to win third place in the 2021 WiDS Datathon and first place for the WiDS Datathon Excellence in Research Award for their paper on practices for handling missing data in ICU predictive modeling. They met during the datathon when they decided to join forces and then went on to collaborate on the prize-winning paper: Bridge Over Troubled Data – Practices for Handling Missing Data in Intensive Care Unit Predictive Modeling.

Read More

Graph Theory for Data Science, Part II: Graph Algorithms: Traversing the tree and beyond

Graph theory provides an effective way to study relationships between data points, and is applied to everything from deep learning models to social networks. This workshop is part II in a series of three workshops. Throughout the series we will progress from introductory explanations of what a graph is, through the most common algorithms performed on graphs, and end with an investigation of the attributes of large-scale graphs using real data.

And in particular for Part II:
Graph-based algorithms are essential for everything from tracking relationships in social networks to finding the shortest driving distance on Google Maps. In this workshop we will explore some of the most useful graph algorithms, from both the breadth-first and depth-first methods for searching graphs, to Kruskal’s algorithm for finding a minimum spanning tree of a weighted graph, to approximation methods for solving the traveling salesman problem. We will use hands-on examples in Python to explore the computational complexity and accuracy of these algorithms, and discuss their broader applications.
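Two of the algorithms named above can be sketched in a few lines of plain Python. This is an illustrative example rather than the workshop's own code. The function names and the small example graph are made up, and Kruskal's algorithm is written with a simple union-find structure, which is one common way to implement it.

```python
from collections import deque


def bfs_order(adjacency, start):
    """Return the nodes in the order breadth-first search first visits them."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
    return visited


def kruskal(num_vertices, edges):
    """Minimum spanning tree of a connected weighted graph.

    `edges` is a list of (weight, u, v) tuples with vertices 0..num_vertices-1.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):  # consider the cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so no cycle
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Hypothetical 4-node graph: a square with weighted sides.
adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
edges = [(1, 0, 1), (4, 0, 2), (2, 1, 3), (3, 2, 3)]

visit_order = bfs_order(adjacency, 0)
mst = kruskal(4, edges)
print("BFS order:", visit_order)
print("MST edges:", mst, "total weight:", sum(w for w, _, _ in mst))
```

Sorting the edges dominates the cost of Kruskal's algorithm, giving O(E log E) time, while breadth-first search visits each node and edge once.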

This workshop was conducted by Stanford ICME PhD student, Julia Olivieri.

Read More

Hands On Deep Learning and IoT Workshop | Mathworks

In this workshop, we engage beginner and intermediate participants interested in getting started with Deep Learning and the Internet of Things (IoT). We’ll do hands-on exercises where you’ll use a webcam and a neural network to recognize images, aggregate data, and run real-time IoT analytics. Our goal is to get you excited about IoT and Deep Learning, and to set you up for success with various types of projects for work, school, and beyond.

This workshop was conducted by Louvere Walker-Hannon, Shruti Karulkar, & Sarah Mohamed from MathWorks.

Read More

Natural Language Processing | Riyanka Bhowal, Walmart

Natural language processing has direct real-world applications, from speech recognition to automatic text generation, from lexical semantics understanding to question answering. In just a decade, neural machine learning models became widespread, largely displacing statistical methods, which require elaborate feature engineering. Popular techniques include the use of word embeddings to capture semantic properties of words. In this workshop, we take you through the ever-changing journey of neural models while addressing their boons and banes.

The workshop will address the concepts of word embedding, frequency-based and prediction-based embedding, positional embedding, and multi-headed attention, and their application in an unsupervised context.

This workshop was conducted by Riyanka Bhowal, Senior Data Scientist at Walmart Global Tech.

Read More

WiDS Honors Juneteenth 2021

Photographs of Margot Gerritsen, Karen Matthys, Judy Logan, Danielle Jiang, Rediet Abebe, Kristian Lum, Emily Miller, Dina Machuve, Afua Bruce, Fernanda Viégas, Cindy Orozco Bohorquez, Kalinda Griffiths, Suzy Weekes, and Talitha Washington. With WiDS branded illustrations in the background.

To celebrate Juneteenth 2021, we revisit our pledges, report progress, and renew the commitments we made on Juneteenth last year, in the wake of George Floyd’s death. We reinforced our commitment to extend our outreach to the communities, organizations, universities, and schools that serve Black and underrepresented minority communities.

Read More

Using Natural Language Processing to Analyze US History Textbooks

In this workshop, Dora Demszky, a Stanford PhD student, illustrates how natural language processing (NLP) can be used to answer social science questions. The workshop focuses on applying NLP to the content of 15 US history textbooks used in Texas in order to analyze the representation of historically marginalized people and groups.

The workshop is based on a paper (https://journals.sagepub.com/doi/pdf/…) that also has an associated toolkit, and it will provide examples of how to use this toolkit in a Jupyter notebook that will be made available.

Read More

Graph Theory for Data Science, Part I: What is a graph and What Can We Do With It?

Graph theory provides an effective way to study relationships between data points, and is applied to everything from deep learning models to social networks. This workshop is part I in a series of three workshops. Throughout the series we will progress from introductory explanations of what a graph is, through the most common algorithms performed on graphs, and end with an investigation of the attributes of large-scale graphs using real data.

And in particular for Part I:
Graphs are structures that represent pairwise connections, and are used for everything from finding the shortest route between two locations to Google’s PageRank algorithm. Are you interested in learning about graph theory but don’t know where to start? In this workshop we will introduce graphs, develop comfort with their associated terminology, and investigate real-world applications with a focus on intuitive explanations and examples.

This workshop was conducted by Stanford ICME PhD student, Julia Olivieri.

Read More

Pocket AI and IoT, or How to be a Data Scientist using your Mobile Device | MathWorks

Want to learn more about trends like AI, IoT and wearable tech? In less than one hour, we will cut through the hype by building a “smart” fitness tracker using your own mobile device.

We’ll do hands-on exercises: you’ll acquire data from sensors, design a step counter and train a human activity classifier. You will leave motivated and ready to use machine learning and sensors in your own projects!

This workshop was conducted by Louvere Walker-Hannon, Shruti Karulkar, & Sarah Mohamed from MathWorks.

Read More

What would we do without Linear Algebra, Part II: Diving Deeper, Singular Value Decomposition

Prerequisite: We will assume that you are familiar with vector and matrix algebra.

This is the second workshop devoted to linear algebra, which forms the foundation of many algorithms in data science. In part I of the series we introduced vector and matrix algebra, and briefly looked at the intriguing and ever so useful Singular Value Decomposition (SVD). In this workshop, we will take a deeper dive into the SVD. We will explain how it is derived, how it can be computed, and also how it is used.

This workshop is taught by Professor Margot Gerritsen and Stanford ICME PhD student, Laura Lyman.

Read More

Recent Stanford Grad Bianca Yu Discusses How Role Models Help Young Women Imagine Their Own Futures

Photograph of Bianca Yu

Bianca Yu, a recent Stanford graduate, talks about the importance of strong women role models in STEM and data science to help young women believe in what they can achieve. Before returning to Stanford to pursue her Master’s degree in Bioengineering, Bianca is helping educate younger women about data science through the WiDS Education Outreach Program.

Read More