Dennis Ko, Arend Sidow, PhD, and Shelley Force Aldred.

Short Take

Dr. Clean Genes

Sorting out the genome exposes vital DNA sequences

By Amy Adams
Photograph by Bryce Duffy

Neat freaks be warned -- the human genome is a very untidy place. It is strewn with random DNA and dotted with redundant genes. Viral sequences are wedged between human genes. Replicates of single genes have scattered throughout the chromosomes. That's in addition to the mutations that crop up in each generation. And it's not just us -- the genomes of all vertebrates have become increasingly cluttered and mutated with each evolutionary step.

Except, that is, for some isolated genetic sequences that have remained virtually untouched by evolutionary debris. Despite millions of years of accumulated change, these genetic snippets are still nearly identical even in far distant species. According to geneticists such as Arend Sidow, PhD, these unaltered regions are no mere evolutionary oversights -- they are regions too important to change. Any stray mutations, inserted sequences or genomic rearrangements that cropped up must have so ruined the gene's function that the organism died, taking its problematic DNA with it.

Sidow, an assistant professor in Stanford's pathology and genetics departments, suspects that locating these unchanged sequences could guide researchers to the most important parts of the genome -- prime hunting grounds for disease-causing mutations or sequences that control when and how a gene is used. "They have enormous predictive value for mutations in humans," Sidow says. The more conserved the region, the more deadly any mutation may be. But so far, finding these sequences has been difficult.

It turns out that software Sidow developed to help deduce evolutionary relationships between genes can be put to use in finding conserved DNA sequences. The software, called ProPhylER (short for Protein Phylogenies and Evolutionary Rates), is particularly adept at seeing through accumulated genetic clutter to work out how genes are related. It figures out when the genes duplicated and which members of a gene family in one species are related to members of similar gene families in another species. "The initial goal was to find out what the gene relationships were," Sidow says, adding that this goal of tidying up gene family trees has been with him since his graduate school days. "I wanted to put the genes in a rational framework," he says. Once he works out these gene family trees, he can turn ProPhylER's power to picking out similarities in the related genes.

To run a ProPhylER analysis, Sidow selects gene sequences from among those housed in online databases of publicly available genes. These include sequences from biology's usual suspects such as humans, mice, rats and frogs, as well as a few sequenced genes from animals such as chickens or fish. ProPhylER then plots the degree of similarity across the gene's length, with spikes showing where evolution has caused the sequences to diverge and dramatic dips where the genes are nearly identical. Many such dips exactly correlate with regions that are known to be important to a protein's function -- regions that recognize a hormone or bind two proteins together. Any evolutionary change to this region would have rendered the protein useless, so the gene sequence for that region stayed the same while neighboring sequences accumulated changes and rearrangements.
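The kind of plot described above can be illustrated with a small sketch. This is not ProPhylER's actual method, which the article does not detail; it is a simplified stand-in that assumes the sequences are already aligned and scores each position by the fraction of species that differ from the most common base. The toy sequences, the function names and the window width are all hypothetical.

```python
from collections import Counter


def column_variability(aligned):
    """Per-position variability: fraction of sequences that differ
    from the most common character in that alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    scores = []
    for i in range(length):
        column = [seq[i] for seq in aligned]
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(1 - top_count / len(column))
    return scores


def conserved_windows(scores, width=5, threshold=0.0):
    """Start positions of windows whose mean variability is at or
    below the threshold (the 'dips' in the plot)."""
    return [
        start
        for start in range(len(scores) - width + 1)
        if sum(scores[start:start + width]) / width <= threshold
    ]


# Hypothetical, pre-aligned toy sequences from four species.
toy_alignment = [
    "ATGCCGTACGATTA",
    "ATGCCGTACGCTAA",
    "TTGCCGTACGATCA",
    "ACGCCGTACGGTTC",
]

scores = column_variability(toy_alignment)
dips = conserved_windows(scores, width=5)
print(scores)
print(dips)
```

In this toy case the middle stretch is identical in all four sequences, so it shows up as a run of zero-variability windows, while the flanking positions score higher. A real analysis would also have to build the alignment and account for how the species are related, which this sketch leaves out.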

Revealing the Jewels Amid the Junk

As an example of ProPhylER's use, Sidow points to the gene that's involved in cystic fibrosis. The ProPhylER (pronounced "profiler") analysis for this gene highlights a region just before the gene begins that's similar in several species. Because this region has changed very little while the bases around it have followed their own evolutionary paths, that region is bound to be important for the gene to work correctly. This conserved section lines up with a known binding site for a protein that latches onto the DNA and regulates when and where the gene gets used. A mutation here prevents the protein from binding and turning this gene on in the appropriate tissue, leading to cystic fibrosis.

That ProPhylER identifies regions of known importance is reassuring to Sidow, but the software is only truly useful if it can guide researchers through the genetic jumble to regions of hitherto unrecognized importance. To that end, Sidow has begun collaborations with two Stanford graduate students who are using ProPhylER to speed their search for conserved sequences. Although for now Sidow and ProPhylER-adept members of his lab have to run the analysis, he hopes one day to have the software available online in a user-friendly format that researchers worldwide could access.

One of Sidow's collaborations is with Shelley Force Aldred, a graduate student in the lab of genetics professor Richard Myers, PhD. Force Aldred entered graduate school interested in understanding how stretches of DNA that come as much as 80,000 base pairs before or after a gene can regulate when and where that gene is used. Traditionally, the only way to find these regions is a patience-testing technique called bashing that requires making mutations along that enormous DNA landmass and hoping to find something interesting -- a process akin to poking random holes in the ground in the hopes of finding oil. "If you wanted to understand that region you had to bash the heck out of it," Force Aldred says.

Force Aldred decided to speed up this tedious process with a two-pronged approach. One of those prongs uses ProPhylER to search for stretches of DNA that are virtually unchanged in the DNA regions surrounding the gene of interest in humans and mice. "If you find noncoding DNA that's highly conserved it must be something important," she says. It's almost impossible to find these conserved regions without ProPhylER because they may be at dramatically different locations before the human and mouse genes. "Even if I had sequence data, I don't have the skills to do the alignment," Force Aldred says.

The other prong involves developing a new bench science to pull out DNA chunks that regulate where genes are turned on and off in tissue culture. In theory, the regions Force Aldred pulls out experimentally should be those same regions that show a dramatic dip in Sidow's plot of DNA variability. "It's like a validation of what we're doing," Force Aldred says. It both substantiates her time-saving approach so others can use it with confidence and shows the value of Sidow's data. "We hope it will shave years off of bashing experiments," Force Aldred says.

Picking Out Key Protein Parts

Force Aldred stresses that although she's using Sidow's data and tools to examine gene regulation, a very important use for ProPhylER will be what it tells researchers about the proteins fashioned by the genes. One project that takes advantage of this capability is Sidow's collaboration with another graduate student, Dennis Ko, who works in the lab of developmental biology professor Matthew Scott, PhD.

Ko began his graduate career studying the childhood neurodegenerative disease Niemann-Pick C. Although other researchers had found that two genes called NPC1 and NPC2 are mutated in children with the disease, no one knew much about these proteins' normal roles. Ko heard Sidow talk at a genetics department retreat and realized that ProPhylER could speed up his studies. "I thought I could apply his tools to point out regions of importance," Ko says.

Sidow used ProPhylER to analyze NPC2 in six different vertebrate species. First he worked out which proteins from the other species are directly related to NPC2. He then compared those evolutionarily related genes for similarities. From this, ProPhylER zeroed in on several regions that were remarkably consistent in each species. "Because nothing is known about how parts of NPC2 are involved in fulfilling the protein's molecular function, this was the first glimpse of which regions are important," Ko says.

Ko made mutations in the conserved regions and tested the modified proteins in tissue culture. Sure enough, mutations in each of the conserved regions prevented the NPC2 protein from functioning normally in the cells. Some mutations blocked NPC2 from binding cholesterol -- one known function of the protein -- but others had no effect on cholesterol binding, telling Ko that the protein has another role in the cell. Ko says that without Sidow's analysis he would have either made mutations blindly or guessed which regions were likely to be important based on similarities to other known cholesterol-binding proteins.

Speeding up genetic research such as Ko's and Force Aldred's looks like ProPhylER's most immediate use. But Sidow sees a day when ProPhylER could perform the ultimate in genomic spring cleaning -- pinpointing all the most conserved regions of the human genome. "Maybe we can narrow the genome down to a much smaller fraction that's important," Sidow says. With that roadmap to the genome's most conserved sequences, researchers seeking disease-causing mutations need only look in isolated spots rather than sequencing the clutter of the entire genome. "You give me a genotype and I can say 'this person has a mutation at this position.'" From there, Sidow says, ProPhylER could predict just how bad any mutations might be -- if the mutation alters a highly conserved region it could spell trouble. How and when such an analysis could become part of medicine Sidow leaves for others to decide.

"More information has never been bad," he says. "It's a matter of how that information is used."

My Kingdom for a Cow

Cow genome data would strengthen analysis

Forget sequencing the genome of monkeys and apes. Sidow would like science to turn its attention to the cow.

That's because the more distant the evolutionary relationship between the vertebrates used in a ProPhylER analysis, the greater the accuracy of the results. Think of it this way: How surprising would it be that humans and our close relatives the apes share many genetic sequences? Not very.

But if humans, cows and dogs all have a similar sequence wedged amidst accumulated mutations, then that sequence must have been preserved for a reason. "The more different species you have the better the statistics are," Sidow says.

With that in mind, Sidow says that the cow and the dog genomes would do nicely for his purposes, or the pig and the cat. Anything, really, so long as it comes from a far-flung branch of the vertebrate evolutionary tree.
