How Stanford is preparing the next generation of data scientists
The gateway course for the new Data Science & Social Systems major teaches students how data can be used to address important social problems.
Every day, data scientists analyze vast amounts of information about the world, using computational methods to understand problems and phenomena in new ways and to decide what to do about them.
But it’s not enough to use data on its own – it must be understood within its social and political context as well, according to Stanford political scientist Jeremy Weinstein. This year, Weinstein, along with Stanford statisticians Guenther Walther and Chiara Sabatti, has launched two new degrees: a Bachelor of Science in Data Science and a Bachelor of Arts in Data Science & Social Systems.
“There’s basically no new technological frontier that doesn’t depend on or engage in some important way with human behavior or a political or social institution,” explained Weinstein, a professor of political science in the School of Humanities and Sciences who serves as faculty director of the BA program in Data Science & Social Systems. “For example, when staffing the tech industry of the future, you want people who can effortlessly move between the technical team, the policy team, and the trust and safety team. The Data Science and Social Systems program is designed to prepare professionals who can work at those intersections.”
This past spring, more than 90 students took the new gateway course for the major, DATASCI 154: Solving Social Problems with Data. In the course, which Weinstein co-taught with Mallory Nobles, the program’s associate director, students developed skills in quantitative analysis, modeling, and coding. They also honed their ability to frame problems, choose appropriate research designs, and interpret data in its social and political context.
Merging the engineering and social science perspectives
The course brought two mindsets together: a social science approach, rooted in an understanding of causal inference, and an engineering approach, based in learning algorithmic design and optimization.
As Weinstein and Nobles emphasized to their students, these perspectives are interconnected.
“When you ask and answer causal questions about a social problem, you’re deepening your understanding of the underlying causes, which can give you clues about how you might go about solving it, and when you design an algorithmic solution, you then want to understand its effect when deployed in the world, which brings you back to causal inference,” said Weinstein, who is also the faculty director of Stanford Impact Labs.
Students explored the value of these approaches through modules designed with scholars from across Stanford.
For example, Jennifer Pan of the Department of Communication introduced students to the role of data science and causal inference techniques in studying the impact of social media on polarization and the spread of disinformation. Marshall Burke of the Department of Earth System Science engaged students in thinking through how machine learning approaches can help measure a changing climate, while social scientific methods are critical for understanding the impact of mitigation and adaptation policies. Ramesh Johari of the Department of Management Science and Engineering and David Scheinker of the School of Medicine introduced students to the challenge of delivering equitable access to health care, drawing on their work on diabetes at Stanford Medicine to show how algorithmic approaches can improve the delivery of patient care.
Students learned to shift comfortably between engineering and social science perspectives themselves. Class assignments paired statistics, computer science, and math with topics in the social and behavioral sciences, such as psychology, sociology, economics, and political science. For their final project, students wrote a research proposal to tackle a social problem of their choosing.
The importance of questions
As Josh Orszag learned, getting the data is the “easy” part. Data can’t get you very far unless you have a meaningful research question.
“If you don’t have the right research question, you’re not going to get anywhere,” said Orszag, a Data Science and Social Systems major interested in issues related to democracy and governance. The challenge is figuring out which problem you want your data to help address.
Orszag teamed up with Ava Kerkorian, a prospective Data Science and Social Systems major, to think about how to build trust in the election process.
Kerkorian said that throughout the research design process, she and Orszag went back and forth as they worked out how to tackle such a complex issue in a way that was specific, scalable, and actionable.
“So many times during this project, we had to take a pause and ask ourselves, how do we measure trust? What would success look like? What is confidence? Are we even sure this is something we want?” Kerkorian said.
They ended up proposing a study of whether a nudge could influence people’s attitudes about the fairness of an election. A nudge is a concept from behavioral economics that sways behavior through small suggestions or positive reinforcement; in their proposal, the nudge was a message from an Independent Redistricting Commission explaining how redistricting works.
Thinking critically, ethically about impact
The course made Serena Lee, also a Data Science and Social Systems major, think critically about what it means to be a responsible data scientist.
“This class taught me that the work starts with how to collect data because that has a lot of value-laden decisions, from whom to involve in the dataset to what questions to ask, what wording to use, and how far in the past to look at the data,” Lee said. For their final project, Lee and Annie Zhu set out to explore the influence of video-based misinformation compared with text-based misinformation. Specifically, they proposed studying different ways platforms could flag potentially harmful posts.
Eva Gorenburg, who also took the class this quarter, said learning the ins and outs of research design has changed how she sees data.
“I think it’s really easy to take numbers as objective fact, but what we learned is even in studies that seem super quantitative and objective, there are tons of choices in the study design that impact results,” Gorenburg said.
Students also learned that what they choose to measure – and not measure – and how they use their data all have social consequences.
“If you just rely on observational study, observation or opinion, there’re so many essential experiences that you’re leaving out,” said Emily Winn, an environmental systems engineering major. “Solving social problems with data allows us to see things on a much broader scale than previously before.”
Winn and Gorenburg worked together on their final project, a proposal to study the social impacts of arsenic poisoning on women in rural Bangladesh, where little data on its effects exists. Specifically, they wanted to know whether arsenicosis lowers a woman’s likelihood of marrying, since marriage is essential to the economic and social security of women in the region.
Just the beginning
Understanding social problems is not the same as solving them.
“Social problems exist for complex reasons,” said Weinstein. “Solving problems involves significant stakeholder consultation and understanding what the pathway is from a new insight or a new tool to actual change in the world.”
For Esha Thapa, a Data Science and Social Systems major, the class marks the beginning of an interesting journey examining these dynamics in greater depth.
“It’s definitely not a process that ends with the quarter ending,” said Thapa. “It’s something that we need to take with us for the rest of our careers and this is a great gateway course in that respect.”
Following Solving Social Problems with Data, students in the major will continue to take a range of core classes in data science, ethics, and social sciences. In their senior year, students will take a capstone practicum where they will apply computational and statistical methods to address a social issue in a real-life setting.
Students interested in data science can pursue one of two degrees: a Bachelor of Science in Data Science, overseen by Walther and Sabatti, or a Bachelor of Arts in Data Science & Social Systems, which Weinstein and Nobles direct.