With a BS in psychology and a minor in math, I fell into the role of database manager and data analyst in previous jobs, until I decided to change careers and pursue formal education in data science. As an older student transitioning careers and a first-time attendee, I found WiDS Berkeley nothing short of inspiring, so inspiring that I’m writing about it! If my writing moves you, you can check out the Stanford and Berkeley conference recordings.
Since I had a Python midterm in the morning (I aced it :), I missed the beginning of the conference. I caught the last Tech Talk before lunch by Dr. Deb Agarwal, Senior Scientist, Lawrence Berkeley National Lab — Tales From the Front Lines of Wrangling Earth Science Data. She spoke of the application of data science to earth sciences and the importance of data standardization in one of her projects — the AmeriFlux Network. AmeriFlux is a connected network of individually managed sites collecting ecosystem data on carbon dioxide, water, and energy fluxes across the Americas, allowing large-scale analysis of major climate and ecological biomes. This was particularly interesting to me because my #1 passion is caving and cave research — I had just completed an internship at Carlsbad Caverns National Park where my main focus was research on anthropogenic carbon dioxide, but I had also participated in collecting data for some nationwide studies. These were exactly the kind of studies for which data standardization is essential to analyze nationwide data coming from individual collection sites. Dr. Agarwal also said it took over a decade to implement these standards!
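To get a feel for why that standardization matters, here is a minimal Python sketch of harmonizing two sites onto a shared schema. The column names, units, and timestamp formats are invented for illustration and are not the actual AmeriFlux data format:

```python
import pandas as pd

# Hypothetical raw exports from two independently managed sites.
# Column names, units, and timestamp formats are invented for illustration;
# the real AmeriFlux standard defines its own variable names and units.
site_a = pd.DataFrame({
    "time": ["2020-03-01 00:00", "2020-03-01 00:30"],
    "co2_flux_umol_m2_s": [1.8, 2.1],
})
site_b = pd.DataFrame({
    "datetime_utc": ["03/01/2020 00:00", "03/01/2020 00:30"],
    "fc_mg_m2_s": [0.08, 0.09],  # same quantity, different name and units
})

MG_CO2_TO_UMOL = 1000.0 / 44.01  # mg of CO2 -> umol of CO2

def standardize(df, time_col, flux_col, flux_in_mg=False):
    """Map one site's export onto a shared schema:
    a UTC timestamp plus CO2 flux in umol m-2 s-1."""
    out = pd.DataFrame()
    out["timestamp"] = pd.to_datetime(df[time_col], utc=True)
    flux = df[flux_col].astype(float)
    out["co2_flux"] = flux * MG_CO2_TO_UMOL if flux_in_mg else flux
    return out

# Once every site speaks the same schema, network-wide analysis is one concat away.
combined = pd.concat(
    [
        standardize(site_a, "time", "co2_flux_umol_m2_s"),
        standardize(site_b, "datetime_utc", "fc_mg_m2_s", flux_in_mg=True),
    ],
    ignore_index=True,
)
print(combined)
```

Doing this by hand for two toy tables is easy; agreeing on and enforcing one schema across hundreds of independent sites is the part that took a decade.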
My mind was already spinning with the thought that I could combine my two passions — cave science and data science — but then came the keynote, which opened my mind even further to an application of data science I had never previously considered: social justice. Prof. Jennifer Chayes, Associate Provost and Dean of the School of Information, spoke about the newly formed Division of Computing, Data Science, and Society, for which she was recruited away from a decades-long career at Microsoft. First, she talked about her non-traditional path to her current position, which made me feel better about my status as an older student changing careers. A quarter-billion-dollar donation was made to the Division for the creation of the Data Hub — an interdisciplinary building meant to house data science and programs including business, social work, the sciences, public health, and more! The idea is to create a single space that allows data science to cross disciplinary boundaries, with a focus on making data science more ethical and applying it in ways that are tangibly beneficial to society.
I had heard of bias in machine learning algorithms, and Professor Chayes explained how to address it with a nifty acronym, FATE:
Fairness: unbiased algorithmic recommendations
Accountability: monitor non-discrimination compliance
Transparency: understand algorithm’s logic
Ethics: ethical AI encompasses all of the above
Many data scientists, including myself at this point, do not understand how ML algorithms work. They take a dataset, train a model, and call it a day, and without a true understanding of the algorithms, the interpretation of the data can be biased or incorrect. At their core, machine learning models only detect existing patterns and predict that those same patterns will continue. So, if there has been a history of racial discrimination in bank loan practices, the algorithm will detect that and continue to recommend higher interest rates based on race. Even if the protected attribute, race, is removed as a factor for consideration, as is required by law, there are other race-identifying attributes that the algorithm can detect, because the pattern of discrimination is still there.
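This proxy effect is easy to reproduce. Below is a small, purely illustrative Python sketch on synthetic data (nothing shown at the conference): the model never sees the protected attribute, yet a correlated proxy feature lets it keep reproducing the historically biased decisions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Synthetic data invented purely for illustration:
# a protected attribute, a proxy that correlates with it (think zip code),
# and a historical decision that was biased on the protected attribute.
protected = rng.integers(0, 2, n)                  # group membership (0/1)
proxy = (protected + (rng.random(n) < 0.1)) % 2    # agrees with it ~90% of the time
income = rng.normal(55, 5, n)
historically_high_rate = (
    0.4 * protected + 0.01 * (60 - income) + rng.normal(0, 0.1, n)
) > 0.2

# A "fair" model: the protected attribute is dropped, but the proxy stays in.
X = np.column_stack([proxy, income])
model = LogisticRegression(max_iter=1000).fit(X, historically_high_rate)
pred = model.predict(X)

for g in (0, 1):
    print(f"group {g}: predicted high-rate fraction = {pred[protected == g].mean():.2f}")
# The gap between groups persists, because the proxy still encodes the protected attribute.
```

Dropping the protected column changes nothing here; the bias lives in the historical labels and leaks back in through whatever correlates with group membership.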
Gender-based hiring discrimination exists outside of STEM fields as well. Professor Chayes gave a poignant example of musicians auditioning for an orchestra — more men were hired than women, so they held blind auditions. Still, more men were hired than women. Even though the protected attribute, gender, had been removed, the sound of high heels was a gender-identifying attribute strong enough to trigger that bias. Next, they gave all applicants the same shoes, and finally the gender ratio evened out. This same bias exists very strongly in STEM and therefore in HR algorithms, and several examples were provided of discriminatory hiring practices and text-interpreting ML algorithms. As someone trying to get started in this industry, learning this made me want to remove any evidence of my gender from my resume and job applications.
Next was the live-streamed keynote address from Tsu-Jae King Liu, Dean, UC Berkeley College of Engineering, at WiDS Stanford. Yes, there were several jests about how she attended WiDS Stanford over WiDS Berkeley, all in good fun. She gave several examples of how academia drove technological innovation — it was interesting to think about the interaction between research, development, and application. As an advocate of the importance of diversity, especially in technology, I never quite know how to answer: Why does it matter? Well, thankfully, Professor Liu answered that question quite explicitly for me and provided an illuminating example with stark real-world consequences: airbag deployments result in 50% more injuries in women because only male-sized dummies were used during testing. Diversity initiatives aren’t just about equity and rectifying systemic oppression; they have significant beneficial implications for the organizations that enact these measures and for real-world outcomes, especially in the fields of medicine, technology, criminal justice, and law.
There was a lot of great discussion in the panel on building inclusive data communities, but one thing particularly stuck out to me. Prof. Niloufar Salehi, Assistant Professor, Berkeley School of Information, spoke about her participation in the creation of the Feminist Data Manifest-No, a declaration of feminist data science principles in which each refusal is paired with a commitment. Some highlights:
We refuse to be disciplined by data, devices, and practices that seek to shape and normalize racialized, gendered, and differently-abled bodies in ways that make us available to be tracked, monitored, and surveilled. We commit to taking back control over the ways we behave, live, and engage with data and its technologies.
We refuse to understand data as disembodied and thereby dehumanized and departicularized. We commit to understanding data as always and variously attached to bodies; we vow to interrogate the biopolitical implications of data with a keen eye to gender, race, sexuality, class, disability, nationality, and other forms of embodied difference.
We refuse to accept that data and the systems that generate, collect, process, and store it are too complex or too technical to be understood by the people whose lives are implicated in them. We commit to seek to make systems and data intelligible, tangible, and controllable.
We refuse work about minoritized people. We commit to mobilizing data so that we are working with and for minoritized people in ways that are consensual, reciprocal, and that understand data as always co-constituted.
These tenets of the Feminist Data Manifest-No resonated with me and echoed concepts introduced earlier in the day. Like the principles of FATE, they seek to remove discriminatory bias from data science. The manifesto acknowledges that data cannot be disembodied: you can’t simply remove a protected attribute, because there are so many other identifying attributes an algorithm can detect and use to perpetuate discrimination. It also calls for more transparency and for working with and for minoritized people. My main takeaway, and something I have always believed, is that data is interpretive, and we as a society must resist the tendency toward reduction. Data is powerful and a strong driver of analytical decision-making; however, it must be used ethically, and to achieve that, the biases that affect the data must be acknowledged transparently and understood as limitations.