
Regina Barzilay is an MIT professor and AI faculty lead of the MIT Abdul Latif Jameel Clinic for Machine Learning in Health. Photo: MIT
MIT professor and National Academy of Medicine member Regina Barzilay is using machine learning to transform breast cancer risk prediction and expand the possibilities of personalized medicine.
By Jamie Durana
When Regina Barzilay was diagnosed with breast cancer in 2014, it upended her life and shifted the direction of her research. She was already an accomplished computer scientist specializing in natural language processing, and her experience as a patient shed light on the possibility of new applications for machine learning and revealed a stark disconnect between technology’s promise and its implementation in health care. “It was upsetting to see that all these great technologies are not translated into patient care,” she recalls. “I wanted to change it.” After she went through her own treatment, Barzilay’s work took on an urgent new focus: could the very technologies she used in her research predict who might be at risk for breast cancer?
Every year millions of women in the United States go for their annual mammogram. For many, it can be an experience punctuated by anxiety, confusion, stress, even fear. Fear of the uncomfortable and sometimes painful test, and fear of the potential diagnosis: about 1 in 8 US women will develop breast cancer. Mammography is considered a gold-standard tool in breast cancer detection, and current US guidelines recommend that women aged 40 and above receive a mammogram each year. However, results aren’t always conclusive and, even if a scan doesn’t reveal anything, many women find themselves on pins and needles, wondering what next year’s scan might reveal. These challenges, along with an average cost of a few hundred dollars without insurance, lead roughly 60% of women to skip the scans altogether.
What if a single mammogram could help doctors predict a woman’s risk of developing breast cancer years into the future? That’s the potential of the machine learning tool called MIRAI that Barzilay’s team has created. Machine learning is a kind of artificial intelligence (AI) that trains computers to recognize patterns in data and improve their performance over time and with exposure to more data. By analyzing subtle cues in mammogram images, the tool identifies patterns associated with cancer development long before they’re visible to the human eye. With a more nuanced understanding of an individual patient’s risk, inconclusive mammograms and the annual waiting game for patients could become a thing of the past. Trained on data from almost 2 million mammograms, MIRAI has already shown promise in early risk prediction.
From Computer Science to Cancer Research

Illustration: Talia Lewis
Now a professor at MIT, Barzilay has long had an affinity for math and science. She loved math as a child and was good at it, which encouraged her pursuit of other STEM subjects. After college, Barzilay taught in a high school but ultimately craved the opportunity to take her own studies further and went on to earn a PhD in computer science at Columbia University in New York. Barzilay says she was intrigued by the constantly growing and changing field: there was always the chance to learn something new.
“It was really the early days of the Internet, but it was clear even then how many opportunities and possibilities you can just go and explore,” Barzilay says. That mindset has guided her efforts to explore how new technologies, like machine learning, could improve health care.
After her PhD, Barzilay was a postdoctoral researcher focused on natural language processing, a type of machine learning that teaches computers to understand human language. She says it was “an exciting journey” because the tools are constantly improving and showing promise for an ever-widening variety of applications.
Much of Barzilay’s current research is focused on drug discovery, the process of identifying possible new medicines. A core pursuit that guides her research is how machine learning can better predict disease, understand how those disease processes might appear in different patients, and identify how patients will respond to treatment. A person’s experience with an illness is more than the difference between a ‘sick’ and a ‘not sick’ state; it is complex and unique to each patient. Predicting what that process will look like for someone is a challenge that Barzilay believes machine learning can help solve.
“Even looking at the simple scenario of a cold: some people who have a cold might be back to normal in two days, some might have a complication and be ill for two weeks, others might develop a high fever,” she says. Barzilay says applying machine learning technologies—which have the capacity to take in large amounts of data—in health care has the potential to fill in the blanks and expand a physician’s view “because a machine sees much more than any human doctor would ever see.”
Building the Model: From Concept to Global Validation
One can look at reading mammogram scans as identifying telltale signs of cancerous growth, typical tissue development gone awry. Tumor development is a “long process of tissue transformation,” Barzilay says. Her team decided to try using mammogram scans to train a machine learning model that would be able to decipher this seemingly invisible development process.

Example mammogram scan from a National Cancer Institute (NCI) public image library. MIRAI identifies signs of breast cancer risk by analyzing mammograms. Photo: NCI
An initial challenge for Barzilay’s team was getting the mammogram data needed to train the technology that would eventually become MIRAI. There’s no single large set of mammogram data that’s publicly available in the United States. It took nearly two years, but the team found a partner in Massachusetts General Hospital and received five years’ worth of mammogram scans to train MIRAI. Outcome data, information about whether or not patients represented in the scans went on to develop breast cancer, was a critical part of the data provided by Barzilay’s clinical collaborator, a radiologist at the hospital. “For each image, we knew the diagnosis for the woman over the next five years, and this data was used to train the model,” Barzilay says.
The way MIRAI works essentially comes down to its remarkable ability to recognize in mammogram scans subtle harbingers of cancer development, tiny signs that are invisible to the human eye. The model, trained on mammogram scans paired with five years of patient follow-up data, learned to associate those seemingly imperceptible tissue patterns with eventual cancer outcomes. “Some [patients] developed cancer, some didn’t, and some developed it in two years, some in five years,” Barzilay explains.
“For the human eye, it’s a problem of pattern recognition,” she says. A machine learning model like MIRAI can be exposed to an enormous amount of data, representing more patients than an individual physician would be able to treat. Barzilay says the exact pattern MIRAI is detecting is unknown, similar to the way an iPhone’s facial recognition feature works: what an iPhone pinpoints to identify a user’s face—nose, eyes, or another feature—is unclear, but it has built that recognition after seeing the user’s face from different angles.
The fact that MIRAI can evaluate a mammogram and predict breast cancer risk up to five years in advance shows how much information might be hidden in plain sight. And the model continues to grow its knowledge base.
Following its development using the initial set of mammograms, MIRAI’s performance was validated using patient data from seven hospitals across the United States, Israel, Sweden, Taiwan, and Brazil. To date, the model has been validated on roughly 2 million mammogram scans in 48 hospitals in 22 countries. This demonstrates MIRAI’s accuracy across a range of populations, illustrating that it has the potential to be used as a breast cancer prediction tool virtually everywhere.
There are different kinds of mammography machines in different countries. To ensure that MIRAI could be used in a wide range of clinical settings, the model needed to be able to interpret scans generated by different machines. “We had to put a lot of time into ensuring that the method is invariant to the source from which the image is taken,” says Barzilay. The result is that MIRAI can be used around the world.
Throughout the development and validation process, the team was struck by how much MIRAI could understand about patients through the scans. Patient questionnaires are often used to help assess risk. The questions cover things like a patient’s age when menstruation began, reproductive history, nicotine use, and more. Although these are important sources of background information about patients, the data collected may be flawed or incomplete. “Patients may not remember answers to all of these questions, or maybe they don’t necessarily want to divulge the information when filling out the form,” Barzilay says. That means the relationship between certain answers and certain outcomes isn’t always obvious. But MIRAI was able to predict many answers to these questions based solely on the mammogram scans because “the tissue itself imprints a lot of information,” says Barzilay. She says the team was surprised to find that “the image itself has a lot of answers”: the model accurately predicted things like the patient’s age range or breastfeeding history.

Barzilay and her lab celebrate her induction into the National Academy of Engineering (NAE). Barzilay was elected to both the National Academy of Medicine and the NAE in 2023. Photo: Regina Barzilay Group
Rethinking Risk Assessment
Traditional risk indicators for breast cancer have proven immensely valuable, and assessing risk through these means saves lives. But there are limitations. For example, screening for BRCA, or the breast cancer gene mutation, is an important tool and can dramatically increase survival rates for affected individuals, but the mutation accounts for only about 15% of breast cancer cases. Barzilay says, “This means the vast majority of patients who get breast cancer, like me, don’t know that they are at risk.”
Barzilay says traditional risk factor prediction methods fall short across the board. Some can be especially unreliable for certain populations, compounding the problem. For example, the widely used prediction assessment called Tyrer-Cuzick has been shown to underestimate breast cancer risk in Black women. That’s not an issue with MIRAI. A 2021 study found it consistently outperformed traditional tools like Tyrer-Cuzick across all patient groups.
Regulatory Hurdles: When Technology Outpaces Guidelines
Medical recommendations about when women should begin having mammograms and how frequently they should be done vary across countries. But those general standards tend to be based on age. Barzilay sees MIRAI as an opportunity to individualize preventive care by using its predictive power to base breast cancer screenings on each patient’s body. “We cannot create a dress that is going to fit everybody: you really need to have a dress that is based on your individual body,” she says. “It’s the same thing with breast cancer screening.” Barzilay paints a picture of preventive care that would allow doctors to tailor breast cancer screening. For example, women might have their first mammogram at age 35, and MIRAI could then identify the risk trajectory for each patient over the next five years. “Then you say, OK, these women don’t need to come for another ten years, these women actually need to come in every three years, and this woman needs to come in every year,” explains Barzilay.
Although MIRAI holds incredible potential to be a powerful tool for personalized preventive care, its path to widespread use in clinical settings comes with significant hurdles. One of the biggest is the disconnect between the rapid pace of technological innovation and the relatively slow enactment of medical guidelines for clinical practice. Barzilay says that some current guidelines about risk prediction have not kept pace with advancements in technology and data assessment over the years.
MIRAI has the potential to expand physicians’ field of information about patients. But, as Barzilay points out, “AI changes so fast and guidelines have a much slower way of updating. This difference is a big barrier for bringing tools like MIRAI into clinical care.”
The Future of Predictive Care
According to Barzilay, MIRAI is just one example of how machine learning can help shape personalization in health care. She imagines a world where predictive diagnostics become a staple in preventive medicine. Breast cancer screening is somewhat unique, Barzilay points out, because mammograms are automatically recommended to patients based on age. But what about the wide range of other illnesses a person may never know they have or have a predisposition to develop until symptoms appear? Barzilay sees vast potential in routine testing and specialized imaging to uncover risks long before they become critical, helping make prediction a routine and reliable part of preventive care.
Barzilay’s vision goes beyond diagnostics: she is enthusiastic about the potential for drug design to become more tailored to individuals. Designing drugs is a time- and money-intensive process, and still there are so many diseases for which no drugs exist, Barzilay says. Machine learning has the potential to help cut time and cost, while also finding more solutions. “Today, when you’re taking a drug, it may or may not be effective for you, it may or may not generate side effects for you,” she says. “We’re talking about mass-produced things, but I hope, as AI continues to evolve, you will be able to really personalize it much more, increasing the efficacy and decreasing the cost.”
Barzilay sees the success of MIRAI as signaling broad potential for machine learning applications in medicine and a transformation in how doctors assess risk. What if the rules guiding prediction of disease or reaction to medications weren’t based solely on statistics or a patient’s medical history, but also on the story your own body tells?