Solving big questions
requires big computation

Introduction

Understanding the origins of our solar system, the future of our planet or humanity requires complex calculations run on high-power computers.

A common thread among research efforts across Stanford’s many disciplines is the growing use of sophisticated algorithms, run by brute computing power, to solve big questions.

In Earth sciences, computer models of climate change or carbon sequestration help drive policy decisions, and in medicine computation is helping unravel the complex relationship between our DNA and disease risk. Even in the social sciences, computation is being used to identify relationships between social networks and behaviors, work that could influence educational programs.

“There’s really very little research that isn’t dependent on computing,” says Ann Arvin, vice provost and dean of research. Arvin helped support the recently opened Stanford Research Computing Center (SRCC) located at SLAC National Accelerator Laboratory, which expands the available research computing space at Stanford. The building’s green technology also reduces the energy used to cool the servers, lowering the environmental costs of carrying out research.

“Everyone we’re hiring is computational, and not at a trivial level,” says Stanford Provost John Etchemendy, who provided an initial set of servers at the facility. “It is time that we have this facility to support those faculty.”

Here are just a few examples of how Stanford faculty are putting computers to work to crack the mysteries of our origins, our planet and ourselves.

Myths once explained our origins. Now we have algorithms.

Our Origins

Q: How did the universe form?

For thousands of years, humans have looked to the night sky and created myths to explain the origins of the planets and stars. The real answer could soon come from the elegant computer simulations conducted by Tom Abel, an associate professor of physics at Stanford.

Cosmologists face an ironic conundrum. By studying the current universe, we have gained a tremendous understanding of what occurred in the fractions of a second after the Big Bang, and how the first 400,000 years created the ingredients – gases, energy, etc. – that would eventually become the stars, planets and everything else. But we still don’t know what happened after those early years to create what we see in the night sky.

“It’s the perfect problem for a physicist, because we know the initial conditions very well,” says Abel, who is also director of the Kavli Institute for Particle Astrophysics and Cosmology at SLAC. “If you know the laws of physics correctly, you should be able to exactly calculate what will happen next.”

Easier said than done. Abel’s calculations must incorporate the laws of chemistry, atomic physics, gravity, how atoms and molecules radiate, gas and fluid dynamics and interactions, the forces associated with dark matter and so on. Those processes must then be simulated out over the course of hundreds of millions, and eventually billions, of years. Further complicating matters, a single galaxy holds one billion moving stars, and the simulation needs to consider their interactions in order to create an accurate prediction of how the universe came to be.

“Any of the advances we make will come from writing smarter algorithms,” Abel says. “The key point of the new facility is it will allow for rapid turnaround, which will allow us to constantly develop and refine and validate new algorithms. And this will help us understand how the very first things were formed in the universe.” —Bjorn Carey //

Q: How did we evolve?

The human genome is essentially a gigantic data set. Deep within each person’s six billion data points are minute variations that tell the story of human evolution, and provide clues to how scientists can combat modern-day diseases.

To better understand the causes and consequences of these genetic variations, Jonathan Pritchard, a professor of genetics and of biology, writes computer programs that can investigate those links. “Genetic variation affects how cells work, both in healthy variation and in response to disease,” Pritchard says. How that variation displays itself – in appearance or how cells work – and whether natural selection favors those changes within a population drive evolution.

Consider, for example, variation in the gene that codes for lactase, an enzyme that allows mammals to digest milk. Most mammals turn off the lactase gene after they’ve been weaned from their mother’s milk. In populations that have historically revolved around dairy farming, however, Pritchard’s algorithms have helped to elucidate signals of strong selection, since the advent of agriculture, for keeping the ability to process milk active throughout life. There has been similarly strong selection on skin pigmentation in non-Africans that allows better synthesis of vitamin D in regions where people are exposed to less sunlight.

The algorithms and machine learning methods Pritchard used have the potential to yield powerful medical insights. Studying variations in how genes are regulated within a population could reveal how and where particular proteins bind to DNA, or which genes are turned on in different cell types – information that could help design novel therapies. These inquiries can generate hundreds of thousands of data sets, and parsing them can take up to tens of thousands of hours of computer work.

Pritchard is bracing for an even bigger explosion of data; as genome sequencing technologies become less expensive, he expects the number of individually sequenced genomes to jump by as much as a hundredfold in the next few years. “Storing and analyzing vast amounts of data is a fundamental challenge that all genomics groups are dealing with,” says Pritchard, who is a member of Stanford Bio-X. “Having access to SRCC will make our inquiries go easier and more quickly, and we can move on faster to making the next discovery.” —Bjorn Carey //

7 billion people live on Earth. Computers might help us survive ourselves.

Our Planet

Q: How can we predict future climates?

There is no lab large enough to conduct experiments on the global-scale interactions between air, water and land that control Earth’s climate, so Stanford’s Noah Diffenbaugh and his students use supercomputers.

Computer simulations reveal that if human emissions of greenhouse gases continue at their current pace, global warming over the next century is likely to occur faster than any global-scale shift recorded in the past 65 million years. This will increase the likelihood and severity of droughts, heat waves, heavy downpours and other extreme weather events.

Climate scientists must incorporate into their predictions a growing number of data streams – including direct measurements as well as remote-sensing observations from satellites, aircraft-based sensors, and ground-based arrays.

“That takes a lot of computing power, especially as we try to figure out how to use newer unstructured forms of data, such as from mobile sensors,” says Diffenbaugh, an associate professor of environmental Earth system science and a senior fellow at the Stanford Woods Institute for the Environment.

Diffenbaugh’s team plans to use the increased computing resources available at SRCC to simulate air circulation patterns at the kilometer scale over multiple decades. This has rarely been attempted before, and could help scientists answer questions such as how the recurring El Niño ocean circulation pattern interacts with elevated atmospheric carbon dioxide levels to affect the occurrence of tornadoes in the United States.

“We plan to use the new computing cluster to run very large high-resolution simulations of climate over regions like the U.S. and India,” Diffenbaugh says. One of the most important benefits of SRCC, however, is not one that can be measured in computing power or cycles. “Perhaps most importantly, the new center is bringing together scholars from across campus who are using similar methodologies to figure out new solutions to existing problems, and hopefully to tackle new problems that we haven’t imagined yet.” —Ker Than //

Q: How can we predict if climate solutions work?

The capture and trapping of carbon dioxide gas deep underground is one of the most viable options for mitigating the effects of global warming, but only if we can understand how that stored gas interacts with the surrounding structures.

Hamdi Tchelepi, a professor of energy resources engineering, uses supercomputers to study interactions between injected CO2 gas and the complex rock-fluid system in the subsurface.

“Carbon sequestration is not a simple reversal of the technology that allows us to extract oil and gas. The physics involved is more complicated, ranging from the micro-scale of sand grains to extremely large geological formations that may extend hundreds of kilometers, and the timescales are on the order of centuries, not decades,” says Tchelepi, who is also the co-director of the Stanford Center for Computational Earth and Environmental Sciences (CEES).

For example, modeling how a large plume of CO2 injected into the ground migrates and settles within the subsurface, and whether it might escape from the injection site to affect the air quality of a faraway city, can require solving tens of millions of equations simultaneously. SRCC will help augment the high computing power already available to Stanford Earth scientists and students through CEES, and will serve as a testing ground for custom algorithms developed by CEES researchers to simulate complex physical processes.

Tchelepi, who is also affiliated with the Precourt Institute for Energy, says people are often surprised to learn the role that supercomputing plays in modern Earth sciences, but Earth scientists use more computer resources than almost anybody except the defense industry, and their computing needs can influence the designs of next-generation hardware. “Earth science is about understanding the complex and ever-changing dynamics of flowing air, water, oil, gas, CO2 and heat. That’s a lot of physics, requiring extensive computing resources to model.” —Ker Than //

Q: How can we build more efficient energy networks?

Photo: San Francisco lighted up at night.

When folks crank their air conditioners during a heat wave, you can almost hear the electric grid moan. The sudden, larger-than-average demand for electricity can stress electric plants, and energy providers scramble to redistribute the load, or ask industrial users to temporarily shut down. To handle those sudden spikes in use more efficiently, Ram Rajagopal, an assistant professor of civil and environmental engineering, used supercomputers to analyze the energy usage patterns of 200,000 anonymous households and businesses in Northern California and from that develop a model that could tune consumer demand and lead to a more flexible “smart grid.”

Today, utility companies base forecasts on a 24-hour cycle that aggregates millions of households. Not surprisingly, power use peaks in the morning and evening, when people are at home. But when Rajagopal looked at 1.6 billion hourly data points, his plots showed dramatic variations.

Some households conformed to the norm and others didn’t. This forms the statistical underpinning for a new way to price and purchase power – by aggregating as few as a thousand customers into a unit with a predictable usage pattern. “If we want to thwart global warming we need to give this technology to communities,” says Rajagopal. Some consumers might want to pay whatever it costs to stay cool on hot days; others might conserve or defer demand to get price breaks. “I’m talking about neighborhood power that could be aligned to your beliefs,” says Rajagopal.

Establishing a responsive smart grid and creative energy economies will become even more important as solar and wind energy – which face hourly supply limitations due to Mother Nature – become a larger slice of the energy pie. —Tom Abate //

Know thyself. Let computation help.

Ourselves

Q: How does our DNA make us who we are?

Our DNA is sometimes referred to as our body’s blueprint, but it’s really more of a sketch. Sure, it determines a lot of things, but so do the viruses and bacteria swarming our bodies, our encounters with environmental chemicals that lodge in our tissues and the chemical stew that ensues when our immune system responds to disease states.

All of this taken together – our DNA, the chemicals, the antibodies coursing through our veins and so much more – determines our physical state at any point in time. And all that information makes for a lot of data if, like genetics professor Michael Snyder, you collected it 75 times over the course of four years.

Snyder is a proponent of what he calls “personal omics profiling,” or the study of all that makes up our person, and he’s starting with himself. “What we’re collecting is a detailed molecular portrait of a person throughout time,” he says.

So far, he’s turning out to be a pretty interesting test case. In one round of assessment he learned that he was becoming diabetic and was able to control the condition long before it would have been detected through a periodic medical exam.

If personal omics profiling is going to go mainstream, serious computing will be required to tease out which of the myriad tests Snyder’s team currently runs give meaningful information and should be part of routine screening. Snyder’s sampling alone has already generated half a petabyte of data – roughly enough raw information to fill a dishwasher-size rack of servers.

Right now, that data and the computer power required to understand it reside on campus, but new servers will be located at SRCC. “I think you are going to see a lot more projects like this,” says Snyder, who is also a Stanford Bio-X affiliate and a member of the Stanford Cancer Center. “Computing is becoming increasingly important in medicine.” —Amy Adams //

Q: How do we learn to read?

A love letter, with all of its associated emotions, conveys its message with the same set of squiggly letters as a newspaper, novel or an instruction manual. How our brains learn to interpret a series of lines and curves into language that carries meaning or imparts knowledge is something psychology Professor Brian Wandell has been trying to understand.

Wandell hopes to tease out differences between the brain scans of kids learning to read normally and those who are struggling, and use that information to find the right support for kids who need help. “As we acquire information about the outcome of different reading interventions we can go back to our database to understand whether there is some particular profile in the child that works better with intervention 1, and a second profile that works better with intervention 2,” says Wandell, a Stanford Bio-X member who is also the Isaac and Madeline Stein Family Professor and professor, by courtesy, of electrical engineering.

His team developed a way of scanning kids’ brains with magnetic resonance imaging, then knitting the million collected samples together with complex algorithms that reveal how the nerve fibers connect different parts of the brain. “If you try to do this on your laptop, it will take half a day or more for each child,” he says. Instead, he uses powerful computers to reveal specific brain changes as kids learn to read.

Wandell is associate director of the Stanford Neurosciences Institute, where he is leading the effort to develop a computing strategy – one that involves making use of SRCC rather than including computing space in their planned new building. He says one advantage of having faculty share computing space and systems is to speed scientific progress. “Our hope for the new facility is that it gives us the chance to set the standards for a better environment for sharing computations and data, spreading knowledge rapidly through the community,” he says. —Amy Adams //

Q: How do we work effectively together?

There comes a time in every person’s life when it becomes easy to settle for the known relationship, for better or for worse, rather than seek out new ties with those who better inspire creativity and ensure success.

Or so finds Daniel McFarland, professor of education and, by courtesy, of organizational behavior, who has studied how academic collaborations form and persist. McFarland and his own collaborators tracked signs of academic ties such as when Stanford faculty co-authored a paper, cited the same publications or got a grant together. Armed with 15 years of collaboration output on 3,000 faculty members, they developed a computer model of how networks form and strengthen over time.

“Social networks are large, interdependent forms of data that quickly confront limits of computing power, and especially so when we study network evolution,” says McFarland.

Their work has shown that once academic relationships have been established, they tend to continue out of habit, regardless of whether they are the most productive fit. McFarland argues that successful academic programs or businesses should work to bring new members into collaborations and also spark new ties to prevent more senior people from falling back on known but less effective relationships. At the same time, he comes down in favor of retreats and team building exercises to strengthen existing good collaborations.

McFarland’s work has implications for Stanford’s many interdisciplinary programs. He has found that collaborations across disciplines often fall apart due in part to the distant ties between researchers. “To form and sustain these ties, pairs of colleagues must interact frequently to share knowledge,” he writes. “This is perhaps why interdisciplinary centers may be useful organizational means of corralling faculty and promoting continued distant collaborations.” —Amy Adams //

Q: What can computers tell us about how our body works?

As you sip your morning cup of coffee, the caffeine makes its way to your cells, slots into a receptor site on the cells’ surface and triggers a series of reactions that jolt you awake. A similar process takes place when Zantac provides relief for stomach ulcers, or when chemical signals produced in the brain travel cell-to-cell through your nervous system to your heart, telling it to beat.

In each of these instances, a drug or natural chemical is activating a cell’s G-protein coupled receptor (GPCR), the cellular target of roughly half of all known drugs, says Vijay Pande, a professor of chemistry and, by courtesy, of structural biology and of computer science at Stanford. This exchange is a complex one, though. In order for caffeine or any other molecule to influence a cell, it must fit snugly into the receptor site, which consists of 4,000 atoms and transforms between an active and inactive configuration. Current imaging technologies are unable to view that transformation, so Pande has been simulating it using his Folding@Home distributed computer network.

So far, Pande’s group has demonstrated a few hundred microseconds of the receptor’s transformation. Although that’s an extraordinarily long chunk of time compared to similar techniques, Pande is looking forward to accessing the SRCC to investigate the basic biophysics of GPCR and other proteins. Greater computing power, he says, will allow his team to simulate larger molecules in greater detail, simulate folding sequences for longer periods of time and visualize multiple molecules as they interact. It might even lead to atom-level simulations of processes at the scale of an entire cell. All of this knowledge could be applied to computationally design novel drugs and therapies.

“Having more computer power can dramatically change every aspect of what we can do in my lab,” says Pande, who is also a Stanford Bio-X affiliate. “Much like having more powerful rockets could radically change NASA, access to greater computing power will let us go way beyond where we can go routinely today.” —Bjorn Carey //

Computing is as important to scientific discovery today as theory and experiments.

Hamdi Tchelepi,
professor of energy resources engineering