By Elizabeth Devitt
Illustration by Tavis Coburn
Nigam Shah has never peered into a child’s eyes, watching for the dusty haze of chronic inflammation that, if found too late, can lead to blindness.
But by designing computer programs that sift through millions of electronic health records, Shah, MBBS, PhD, has helped pediatricians save youngsters’ sight.
Children with the most common form of arthritis, juvenile idiopathic arthritis, not only have inflamed joints, but up to 30 percent of them also get eye inflammation, or uveitis. Uveitis doesn’t always cause symptoms — and kids don’t always tell anyone if they do — so these young patients need to be checked by an ophthalmologist every three to six months. Yet even with reminders, busy schedules can crowd out follow-up appointments and doctors may miss their chance to treat the earlier stages of uveitis.
So Shah, an assistant professor of medicine, and his team at Stanford’s Center for Biomedical Informatics Research joined forces with rheumatologist Jennifer Frankovich, MD, a clinical assistant professor of pediatrics at Stanford, to search a group of young arthritis patients for risk factors that might provide early warning cues. They applied automated methods and a customized algorithm to 1.2 million electronic health records specifically prepared for research use at Stanford. Through this effort, they discovered that children with juvenile arthritis who also had allergies had odds of developing uveitis two-and-a-half times higher than those of other youngsters with arthritis. This new information may help medical teams identify which patients most critically need to keep up with their eye exams.
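For readers curious what "two-and-a-half times higher odds" means arithmetically, here is a minimal Python sketch of an odds ratio computed from a two-by-two table. The counts are invented purely for illustration and are not the Stanford team's data; the function name `odds_ratio` is likewise hypothetical.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Return the odds ratio for a 2x2 table of exposure versus outcome."""
    cells = (exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases)
    if min(cells) <= 0:
        # A zero cell makes the ratio undefined; a real analysis would handle this explicitly.
        raise ValueError("all four cell counts must be positive")
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed


# Hypothetical counts (NOT from the study): arthritis patients with and without allergies.
allergy_uveitis, allergy_no_uveitis = 25, 75
no_allergy_uveitis, no_allergy_no_uveitis = 20, 150

example_or = odds_ratio(
    allergy_uveitis, allergy_no_uveitis, no_allergy_uveitis, no_allergy_no_uveitis
)
print(f"odds ratio: {example_or:.2f}")
```

With these made-up numbers the odds of uveitis are 25 to 75 among children with allergies and 20 to 150 among the rest, which works out to a ratio of about 2.5. A published analysis would also report a confidence interval and adjust for other factors, which this sketch leaves out.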
A growing cadre of database miners are creating algorithms to wrest more knowledge from data piling up in virtual warehouses — gene sequences, protein structures, archived medical images and electronic medical records, to name some of the types of information they’re digging through. Sorting through this biomedical “big data” has become a specialty practice in its own right, with consulting companies vying to analyze medical information for the benefit of industry or — as is the case with a company launched by Shah — patients. The results of this type of research are changing the way doctors and hospitals deliver medical care.
“People are beginning to realize data mining is more than just genomics,” says Shah. “And when people hear that data mining can keep a kid from going blind, they want to know more.”
A new approach to research
Nowadays, the recommended practice for doctors is to base treatment decisions on the results of clinical trials; software like the popular UpToDate decision aid has made that information easily accessible online (for a fee). Shah’s long-term vision has doctors tapping into biomedical big data for information to guide their treatment decisions just as easily.
With the right algorithm, says Shah’s colleague and collaborator Russ Altman, MD, PhD, professor of bioengineering, of genetics and of medicine, sifting through electronic health records can become a learning system for one doctor, a group of doctors in one clinical practice, an entire hospital, a network of hospitals or a nationwide health-care system. “There are two big buckets of applications for data mining,” says Shah. “Those that drive the practice of medicine and those that drive science.” The answers that can be teased out of the data depend on the question you ask.
As to the practice of medicine, he envisions a world in which care is customized to individual patients. Shah and two partners recently founded a company called Kyron to provide a search engine for physicians looking for clinical evidence in real time — to explore the best medical-care options for the person in their exam room right at that moment.
“Instead of pulling up a paper that provides a summary of someone else’s research, doctors will be able to ask questions based on their own patient,” Shah says. For example, if a physician sees a patient with a history of heart attacks and seizures, a computer algorithm could search millions of records to find people of the same age and ethnicity, and with similar conditions, to determine whether a new prescription would conflict with the anti-seizure medication or worsen the underlying heart disease. The customized answer could include how many patients were found with similar medical issues, how strongly their query was associated with those other patients, and — most important — whether the associations between the questions and answers would make any difference in the health of the patient. Right now, Shah says, creating those computer programs is a brute force enterprise: “It’s like the early days of the Web at Yahoo where someone was manually categorizing Web pages.”
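The kind of query Shah describes can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the record fields, the five-year age window, the names `find_similar` and `summarize`, and the toy patients. It is not Kyron's search engine, only one simple way to filter records for similar patients and compare outcomes with and without a drug.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    patient_id: str
    age: int
    ethnicity: str
    conditions: frozenset
    medications: frozenset
    had_adverse_event: bool


def find_similar(records, age, ethnicity, conditions, age_window=5):
    """Keep records close in age, matching in ethnicity, and sharing every listed condition."""
    wanted = frozenset(conditions)
    return [
        r for r in records
        if abs(r.age - age) <= age_window
        and r.ethnicity == ethnicity
        and wanted <= r.conditions
    ]


def summarize(cohort, drug):
    """Count matched patients and compare adverse-event rates on and off the drug."""
    on_drug = [r for r in cohort if drug in r.medications]
    off_drug = [r for r in cohort if drug not in r.medications]

    def rate(group):
        return sum(r.had_adverse_event for r in group) / len(group) if group else None

    return {
        "matched": len(cohort),
        "on_drug": len(on_drug),
        "event_rate_on_drug": rate(on_drug),
        "event_rate_off_drug": rate(off_drug),
    }


# Toy records, entirely made up.
HISTORY = frozenset({"heart attack", "seizures"})
RECORDS = [
    PatientRecord("p1", 58, "A", HISTORY, frozenset({"drug_x"}), True),
    PatientRecord("p2", 61, "A", HISTORY, frozenset({"drug_x"}), False),
    PatientRecord("p3", 60, "A", HISTORY | {"diabetes"}, frozenset(), False),
    PatientRecord("p4", 35, "A", HISTORY, frozenset({"drug_x"}), True),   # too young
    PatientRecord("p5", 59, "B", HISTORY, frozenset({"drug_x"}), False),  # different ethnicity
    PatientRecord("p6", 60, "A", frozenset({"seizures"}), frozenset({"drug_x"}), False),
]

cohort = find_similar(RECORDS, age=60, ethnicity="A", conditions=HISTORY)
summary = summarize(cohort, "drug_x")
print(summary)
```

The summary reports how many similar patients were found and how they fared, which mirrors the first two pieces of the customized answer described above. Judging whether a difference in rates is meaningful for the patient in the exam room is the hard part, and a raw comparison like this one cannot answer it.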
With the help of Louis Monier, the former chief technical officer of the internet search engine AltaVista, and Noah Zimmerman, PhD, a Stanford biomedical informatics alum, Shah hopes Kyron can offer doctors a new way to practice medicine before the end of 2014. If the company is successful, analyzing medical records from routine care to generate medical insights could become as simple as a Web search.
In research, letting computers do the legwork can save major amounts of time and money. Traditionally it takes about seven years from the time a researcher applies for a National Institutes of Health grant to the publication of study results. But with access to huge databases, studies can be done in weeks — at far less cost. For instance, when the Ohio-based MetroHealth health-care provider joined forces with a health-care analytics company, it spent only three months and about $25,000 to get the same results as a multimillion-dollar Norwegian study that took 13 years to follow 26,714 people and discover that men were at an increased risk of blood clots if they were tall and obese.
The learning-based system also affords researchers an opportunity to uncover “natural experiments,” says Shah. These are experiments that can’t easily be done because it would be hard to recruit a group of people with a specific list of treatments or medical procedures. For example, Shah and his colleagues used EHRs to investigate cilostazol, a drug prescribed for people who suffer from blood vessel blockage in their legs. Although the label states the drug shouldn’t be used in patients with congestive heart failure, some clinicians still feel that their patients would benefit from it. By mining the EHRs in Stanford’s database, they were able to show that using cilostazol caused no additional deaths in these high-risk patients.
“You can’t do a randomized clinical trial for everything,” says Shah, referring to the gold standard of medical knowledge: dividing patients with similar traits and treatment needs into groups, then testing the treatment options against a placebo, without anyone knowing which treatment they received. Although he’s quick to point out that data mining isn’t meant to replace clinical trials, Shah notes the computer “trials” may represent “real” patients better than the trials that enroll people who meet very specific criteria.
Altman usually spends his research time studying pharmacogenomics, harnessing the speed of computers to search databases for tiny variations among people’s genes that can be linked to responses to drugs. “But mining electronic medical records is a natural outgrowth of that work,” he says. “What better way to find these different drug effects than to utilize EHRs which have much more diverse sources of information?”
At Stanford, researchers benefit from access to STRIDE, a comprehensive database of Stanford patient records, spearheaded by Henry Lowe, MD, an associate professor of medicine and former director of the Stanford Center for Clinical Informatics. He designed the database to provide researchers access to clinical data without compromising patient privacy. At the last accounting, STRIDE contained information from 2 million medical records of patients from the Lucile Packard Children’s Hospital and Stanford Hospital & Clinics, including 25 million clinical documents, 1.3 million surgical pathology reports, 16 million pharmacy orders and 157 million laboratory test results.
Stanford researchers use STRIDE for almost 400 consultations a year, says Lowe. Most often, they explore the data as a “first step” to see how many patients might match what they are looking for in a study: age, gender, ethnicity, prescriptions, health problems and medical procedures. “STRIDE is a big enabler,” says Lowe. Without a way to aggregate that patient information and then de-identify the material to protect patient privacy it would be “a mind-numbing process” to search millions of records for the widely divergent sources of patient information — doctors’ notes, prescription orders or reports from medical procedures — that might be needed for a study, he says.
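To give a feel for the "first step" Lowe describes, the Python sketch below strips direct identifiers from toy records, swaps the record number for a keyed pseudonym, and counts how many patients match a study's criteria. It is an assumption-laden illustration, not how STRIDE works: the field names, the secret key and the helper functions are all hypothetical, and real de-identification involves far more than dropping a few fields.

```python
import hashlib
import hmac

# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "mrn"}


def pseudonym(record_number, secret_key):
    """Derive a stable, non-reversible key from a record number using HMAC-SHA256."""
    digest = hmac.new(secret_key, record_number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record, secret_key):
    """Drop direct identifiers and attach a pseudonymous patient key."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["patient_key"] = pseudonym(record["mrn"], secret_key)
    return cleaned


def count_matches(records, min_age, max_age, diagnosis):
    """Count de-identified records that fall in an age range and carry a diagnosis."""
    return sum(
        1 for r in records
        if min_age <= r["age"] <= max_age and diagnosis in r["diagnoses"]
    )


# Toy data, entirely made up.
SECRET_KEY = b"example-key-kept-outside-the-research-copy"
RAW = [
    {"mrn": "001", "name": "Pat A", "age": 9, "diagnoses": ["juvenile arthritis"]},
    {"mrn": "002", "name": "Pat B", "age": 12, "diagnoses": ["juvenile arthritis", "allergy"]},
    {"mrn": "003", "name": "Pat C", "age": 40, "diagnoses": ["hypertension"]},
]

research_copy = [deidentify(r, SECRET_KEY) for r in RAW]
eligible = count_matches(research_copy, 0, 17, "juvenile arthritis")
print(eligible)
```

Using a keyed hash rather than a plain one means the same patient always gets the same key across documents, while someone without the secret cannot regenerate the key from a known record number.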
The scientists at Stanford aren’t the only ones trying to milk more information from millions of patient records pooling in hospitals around the world.
In 2013, the University of Oxford, in the United Kingdom, announced the launch of its Big Data Institute, described by Oxford vice chancellor Andrew Hamilton as an opportunity to “transform the way we treat patients and understand disease in the coming decades.” Researchers at the new institute plan to analyze large patient data sets, including electronic health records collected through the National Health Service, to uncover more effective medical treatments.
In the European Union, a handful of initiatives, such as EURECA and the European Medical Information Framework, are linking EHRs with other databases to deepen their mining resources. The National Center for Hematology in Moscow uses EHRs from more than 3 million patients to more efficiently recruit patients for clinical trials.
Closer to home, scientists in eMERGE, a consortium of nine research groups spread across almost a dozen states and clinical sites, can search DNA databases linked to all their EHRs. For example, researchers at the Vanderbilt University School of Medicine in Nashville, Tenn., a network member, searched eMERGE databases to study the DNA of 6,300 people with abnormally low levels of the thyroid hormones. As a result, the scientists found a previously unknown association with a tiny change in one gene, locating a new risk factor for this common thyroid disorder — all by using information already there, says Joshua Denny, MD, an associate professor of biomedical informatics and medicine at Vanderbilt. “It might take more than a year to find all those people for a clinical trial, and then we would have to do the genotyping,” says Denny. “I see EHR mining as cheaper, faster and better.”
It’s also a handy way to get research results quickly corroborated by other scientists. When Altman mined the Food and Drug Administration database of adverse drug reactions and found that the combination of a common blood pressure drug and a particular antidepressant could cause blood sugar spikes, his next step was to see if that finding held true when he searched for that effect in Stanford patient records: It did. Then he simply emailed a colleague at Vanderbilt and asked him to run a similar search through his EHRs. At the same time, Altman sent that request to another researcher at Harvard. “The time from the first email to a submitted paper was 46 days,” says Denny. He couldn’t even estimate how many months — or years — it would take to run the same study as a regular clinical trial.
The Kaiser Permanente Medical Group relies on its 14-million-patient database and half a million DNA sequenced samples to improve patient care and efficiency, says Robert Pearl, MD, the executive director and CEO, who also lectures at the Stanford Graduate School of Business on strategic change in the health-care industry. “Right now, mining big data allows us to ask clinical research questions such as: Which hospitalized patients are likely to get much worse, so I can get them into ICU sooner? In the future, our clinicians will be able to query the database with patient info and then find out what happened with the last 1,000 patients with that condition. Today that ability resides at the research level,” says Pearl.
Public health organizations are also using electronic health data in many ways, including automated reporting of notifiable diseases such as tuberculosis, keeping tabs on influenza outbreaks, and assessing the safety of medical products. The FDA’s Mini-Sentinel program runs surveillance through a distributed data network that includes electronic health data from more than 120 million people to monitor the safety of medical products on the market. “Electronic health information has enormous potential to improve a whole range of patient activities,” says Richard Platt, MD, principal investigator for the FDA program and chair of the Harvard Medical School’s Department of Population Medicine.
In December 2013, the Patient-Centered Outcomes Research Institute, known as PCORI, announced $93.5 million in funding for the U.S. National Patient-Centered Clinical Research Network, which has the goal of speeding up comparisons of the effectiveness of medical treatments. “Electronic health information will be a critical component of this network and the potential return on this investment is very large,” says Platt, a director of the network’s coordinating center.
Working out the kinks
Lynn Etheredge, a consultant with the Rapid Learning Project at George Washington University in Washington, DC, says the biggest stumbling block for mining EHRs is the lack of standardization in the information that is collected and stored. “We could have an extraordinary system of national registries and database networks for all illnesses that have the critical pieces needed for clinical research, but our EHRs don’t yet deliver on this potential,” he says. Currently every institution is building its own system and few of them “talk” directly to each other. (A recent paper by Stanford’s Lowe and colleagues in the Journal of the American Medical Informatics Association described a method to transform clinical data — into a sort of universal currency — to help algorithms analyze multiple clinical data sets.)
Another caveat is that EHR systems are built for individual patient care. As a result, substantial effort may be needed to make the data useful for addressing questions that require combining information from many systems, says Platt. The FDA invests millions of dollars a year in working with its data-generating partners, such as the 18 organizations that make up the Mini-Sentinel project, to ensure consistency, quality and completeness of data across health-care organizations. Even so, the available data is insufficient to answer some research questions. For instance, says Platt, information in health-care records is often lacking or imprecise about important risk factors, such as smoking or exposures to over-the-counter medications.
The other obvious concern in setting up searches through millions of patients’ medical records is privacy. In fact, wouldn’t the patient privacy law, known as HIPAA, preclude use of patient data in this way? Not necessarily. Although the law restricts unauthorized sharing of personal health information, it allows such sharing if it would benefit the public’s health.
While health-care providers take every precaution to eliminate information that could identify individual patients before researchers get access to the data, it’s hard to ignore the threat posed to privacy when researchers examine broad swaths of patient information. With a written request, patients can opt out of including their medical information in Stanford’s research database built from their records. But Altman notes that some patients are more than willing to give up a little privacy in return for improved medical care, pointing to the growth of websites such as PatientsLikeMe where more than 200,000 people have shared intimate details about their disease symptoms with the hope of learning better ways to manage their illnesses. “The kind of privacy that we used to know is an illusion in this day and age,” says Shah. “What’s critical is security — you don’t want unauthorized access to people’s medical information. So we need to have a discussion about what is an acceptable risk of privacy loss.”
As EHR use gets mainstreamed into health care, the reams of medical information generated will attract more data miners and companies like Kyron that could profit from patient information in the process of improving patients’ health care. “Who owns that data?” is a very important and relevant question to ask, says Shah. Right now, he says, people willingly trust their data to Facebook and Google and the like, who mine it to make money (mostly by showing ads). “I don’t have all the answers,” Shah says. “But if there is money to be made then ultimately obtaining consent in some form is necessary.”
Steve Goodman, MD, PhD, professor of medicine at Stanford and the vice chair of PCORI’s methodology committee, says the benefits will likely outweigh the risks. Plenty of people appear to agree. In a 2012 survey conducted by Consumer Reports, nine in 10 responded positively to the statement: “My health data should be used to help improve the care of future patients who might have the same or similar conditions.”
“A system like this has the amazing ability to support physicians and give on-the-fly summaries of patients,” says Altman. “If you’re an internal medicine specialist like me, you’ll most likely follow 2,000 to 3,000 patients really well over the course of your career. Maybe you’d see another 10,000 to 20,000 patients on top of that. Imagine if you could augment that experience to 1 million patients. That’s what data mining can do — and we’re only scratching the surface.”