STANFORD MEDICINE

Volume 18 Number 1 Winter/Spring 2001



studying biology without a test tube
BY SANDY McCALLUM FIELD

BIOINFORMATICISTS USE COMPUTERS TO FIND PATTERNS IN GENOMIC DATABASES.

Biochemistry professor Doug Brutlag's lab is unlike most others at Stanford University School of Medicine. There are no centrifuges, test tubes or Bunsen burners on the laboratory benches. There are no signs warning of radiation or biohazards. Instead, there are computers, lots of them. Next to the sink stands an espresso machine.

One of the perks of Brutlag's type of research is that it is safe to eat in his laboratory -- which is a good thing considering how much lab work he and his lab members have ahead of them.

Brutlag's lab is one of a handful that have pioneered the now-booming field of bioinformatics. Whether working in academia or industry, the bioinformaticist's goal is to devise computational methods that use genomic information to determine biological molecules' structures and functions. Or, as Brutlag jokes, a bioinformaticist comes up with ways to learn about proteins and DNA by doing experiments in silico. He switched from test tubes to computers in 1985, making him the senior bioinformaticist among the five Stanford faculty members specializing in this field.

Why are bioinformaticists so busy these days? In February, scientists announced they had solved the mystery of the human genome sequence, spelling out all 3 billion DNA letters that comprise the complete set of human genes. But now an even bigger question looms: What does it all mean?

Bioinformaticists are betting that their computer programs will come up with answers faster than human beings can. And like other scientists, bioinformaticists expect the answers to provide valuable information for understanding health, designing drugs and predicting disease.

One of Brutlag's favorite ways to predict the function of a protein without doing any "wet" experiments is by using computers to search the protein's amino acid sequence for patterns -- called motifs -- that have a record of cropping up in other proteins. If the bioinformaticist finds any motifs, he or she has also found a big clue to that protein's function: If proteins share motifs, the odds are high they'll have similar functions as well. If the motif search turns up no suggestions -- as is the case about 20 percent of the time -- the biologists turn to standard lab techniques to provide answers.

Though Brutlag's computerized approach can't supply all the answers, those it can deliver come quickly: in under two minutes. "Once we know a gene's sequence, we can use eMOTIF to search the database and turn up a function for the protein made by that gene within two seconds to two minutes," Brutlag says. Using standard laboratory techniques, it would take a researcher at least two days to ascertain a function for a gene -- and quite possibly several months.

Not long ago, no easy way to search for motifs existed. Then in 1991, Steve Henikoff, PhD, from the Fred Hutchinson Cancer Research Center in Seattle, developed BLOCKS, a motif database. Using this database, researchers can learn the extent of the similarity between an unknown sequence and a motif, and the probability that these two proteins are related. Underlying the program is an algorithm that aligns members of protein families and identifies any blocks of amino acids shared among them. This database is the starting point for many of the programs that the Brutlag laboratory has developed.

Out of this work has come eMOTIF, a program that two of Brutlag's past postdocs, Craig Nevill-Manning, PhD, and Tom Wu, PhD, developed in 1997 in hopes of providing a faster, more informative means to search protein sequences. The analysis software, now licensed by Stanford to companies for a small annual fee, is one of the most widely used programs of its kind in industry.

One of the advantages of eMOTIF is that its searches take into account that some amino acids in a sequence are more important than others. The Brutlag group's program figures out which amino acids are vital for the protein's function and which can be replaced by another amino acid with no harm done, condensing the information into a document called a consensus sequence. Then eMOTIF predicts a protein's function by using the newly discovered protein's consensus sequence to search the database for proteins with similar sequences.
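For readers who want to see the idea in miniature, here is a rough Python sketch of a consensus-style search. It is not eMOTIF's actual method, which the article does not spell out; the sequences, the protein names, the `max_alternatives` cutoff and both function names are invented for illustration. The sketch keeps positions where every family member agrees, allows a small set of substitutes where only a few amino acids appear, and treats everything else as a wildcard.

```python
import re

# Hypothetical toy data: four aligned sequences from one invented protein family.
FAMILY = ["GKSTLA", "GKTTLV", "GKSSLA", "GKTTLI"]

# Hypothetical "database" of proteins to search.
DATABASE = {
    "protein_A": "MAAGKSTLVQQ",
    "protein_B": "MTTPLLRRAAW",
    "protein_C": "GKTSLAKK",
}


def consensus_pattern(aligned, max_alternatives=2):
    """Turn equal-length aligned sequences into a regular expression.

    A column with one amino acid becomes a literal, a column with up to
    max_alternatives amino acids becomes a character class, and any more
    variable column becomes a wildcard. The cutoff is an assumption.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    parts = []
    for column in zip(*aligned):
        residues = sorted(set(column))
        if len(residues) == 1:
            parts.append(residues[0])
        elif len(residues) <= max_alternatives:
            parts.append("[" + "".join(residues) + "]")
        else:
            parts.append(".")
    return "".join(parts)


def search(pattern, database):
    """Return the names of proteins whose sequence contains the pattern."""
    regex = re.compile(pattern)
    return [name for name, seq in database.items() if regex.search(seq)]


pattern = consensus_pattern(FAMILY)
hits = search(pattern, DATABASE)
print(pattern)
print(hits)
```

A real tool would also weigh how likely each match is to occur by chance, which this toy version ignores.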

Current lab members are proving eMOTIF's great potential for speedily analyzing the bounty of data provided by genome sequencing projects. Graduate student Jimmy Huang has written a computer program that automates an eMOTIF search for all of the proteins encoded by an entire genome. This February, as soon as the sequence of the human genome was reported, fellow graduate student Serge Saxonov took the next step and used Huang's program to run eMOTIF on all 30,000 proteins in the genome.

Saxonov's project is moving with the rhythm typical of bioinformatics research: "Doing the run took only a couple of days -- though it's taking many weeks to interpret the results," Brutlag says. The group's goal is to identify functions for the many proteins in the human genome that have yet to be characterized.

Though the Brutlag lab specializes in unearthing protein functions, several of the more recent projects (which happen to be particularly interesting to drug designers) focus on predicting protein shapes. Information about how the chain of amino acids bends and twists to form a protein's three-dimensional structure provides clues as to the protein's function -- and even bigger leads on how to alter it.

One of the lab's structural projects, called 3MOTIF, takes all of the eMOTIFs and finds them in the database of known three-dimensional protein structures. In other words, the program, developed by Nevill-Manning and graduate student Steve Bennett, allows researchers to take their motif information for an unknown protein a step further by providing some idea of the structure of that motif.

A drug designer might find this very useful: If the structure of an enzyme's substrate-binding pocket can be predicted down to the nearest atom, it is possible to design molecules that might block this site and act as inhibitors of the enzyme. This can provide an effective and specific drug.

Graduate students Amit Singh and Jessica Shapiro are interested in protein structure as well. Singh created a program that compares protein structures by superimposing one on top of the other. This comparison reveals conserved structural features -- dubbed sMOTIFs. Using sMOTIFs, Shapiro hopes to develop a program that will predict a protein's shape without the tedious and time-consuming task of crystallizing the protein to determine its structure using X-rays.

 

While most bioinformatics efforts focus on predicting the functions and shapes of proteins, one member of Brutlag's laboratory is venturing into new territory: DNA. Now that scientists have sequenced the human genome, they're increasingly interested in mining these strands of DNA for meaning.

To that end, graduate student Xiaole Liu has developed a software program, called Bioprospector, that looks for DNA motifs. Though scientists who analyze DNA sequences have long recognized some of the simpler motifs in DNA, such as the nucleotide pattern that marks the beginning of a gene, they've had more difficulty ferreting out some of the longer, less obvious signals embedded in DNA. Bioprospector allows researchers to recognize more subtle motifs, such as those that control when genes switch on and off -- information that can come in handy when designing new drugs or gene therapy strategies.
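To give a feel for what recognizing a DNA motif involves, the Python sketch below scores every window of a DNA string against a simple position weight matrix built from a few example sites. This is a generic textbook technique for scanning with a motif that is already known, not Bioprospector's own algorithm, which has the harder job of discovering motifs from scratch. The example sites, the DNA string, the pseudocount and the uniform background frequency are all assumptions made for illustration.

```python
import math

# Hypothetical example sites, invented for illustration.
SITES = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length sites.

    The pseudocount and uniform background are simplifying assumptions.
    """
    pwm = []
    for i in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2(counts[base] / total / background) for base in "ACGT"}
        )
    return pwm


def best_match(pwm, dna):
    """Return (position, window, score) for the highest-scoring window."""
    width = len(pwm)
    scored = []
    for start in range(len(dna) - width + 1):
        window = dna[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        scored.append((score, start))
    score, start = max(scored)
    return start, dna[start:start + width], score


pwm = build_pwm(SITES)
pos, site, score = best_match(pwm, "GGCGCTATAATGCGC")
print(pos, site, round(score, 2))
```

Windows that resemble the example sites score high, while unrelated stretches score low or negative, so a threshold on the score can separate likely signals from background.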

The demand for expertise in applying computational methods to biological problems is high -- and incoming students know it. As a member of the admissions committee for Stanford's BioMedical Informatics program, Brutlag has seen a boom in interest in the field.

"We are really getting top students in this area and are only taking the top 10 percent of those. We had over 120 applications for eight graduate student spots this year," says Brutlag. "When they finish their degrees, these students are getting top job offers from pharmaceutical companies to work on drug design problems."

Will bioinformatics make classical methods of experimentation obsolete? "No," says Brutlag. "Everyone won't be a bioinformaticist; bioinformatics will simply focus biology on really novel problems," he predicts. SM

VISIT THE BRUTLAG GROUP BIOINFORMATICS WEB SITE AT http://motif.stanford.edu.
