published 08/24/1999



Coming Soon: A global genomic map of single-nucleotide polymorphisms (SNPs), the tiny differences between two people's DNA that largely determine everything from who's the natural athlete and who's the klutz to who's likely to get lung cancer from smoking and who's not. In the not-so-distant future, scientists will also be able to tell who's at risk for cardiovascular disease, whatever their lifestyle, as well as who will respond, or not, to this drug or that. But the techniques now used for discovering and mapping SNPs are costly, tedious, and Ph.D.-intensive. The real mark of a SNP-detection assay suitable for industrial scale-up will be its downward mobility: For characterizing huge numbers of SNPs among large populations, cheap, fast, and easy is the way to go.


Ahead of schedule and under budget. Not bad for government work.

"The idea that I could be standing in front of you now and claiming we'll have 90 percent of the human genome sequenced within 12 months," Francis Collins told assembled journalists at a San Francisco press briefing in June, "would have appeared ridiculous only two or three years ago." Yet that's exactly what the director of the National Human Genome Research Institute in Bethesda was doing. The human genome, Collins continued, should be fully, accurately sequenced by 2002 instead of 2005 as originally planned, with 90 percent of it available in a public database by the middle of next year.

Pretty big stuff. But the Human Genome Project has a spindly little brother who, within a few years, is going to be just as big or bigger. In recent months, a public effort to tease out, identify, and genomically map the single-nucleotide polymorphisms, or SNPs (pronounced "snips"), that account for the vast majority of genetic differences among humans has been abetted by another major initiative: an unprecedented consortium of 10 juggernaut pharmaceutical houses and the Wellcome Trust.

A global SNP map for the human species could lead to a treasure chest for Big Pharma, because association studies based on such a map (see the sidebar, "When Nucleotides Collide: What Are SNPs and How Can You Use Them?") could be used to return benched drugs to first-string status, to locate obscure disease-related genes, and as preventive diagnostics. That's a paradigm shift, but it might not be the only one in store. For the SNP map's payoffs to be realized, costs have to drop dramatically. Some companies you've undoubtedly heard of, and a couple you perhaps haven't, are coming up with ways of making that happen -- and in the bargain, a whole new way of looking at DNA.

"Based on what people say or how they feel or how they look or how their lab tests look," says David Cox, co-director of the Stanford Human Genome Center, "we may say, 'Well, all those people have the same disease.' But in fact, we just may not have the right glasses on. There may really be five diseases there. Say we have a treatment for this disease that doesn't work very well -- in fact, it only works in about one in five people, but for them it works great. Then on examining everybody's SNPs, lo! and behold, we find five distinct genetic patterns associated with the disease, that the patient population for whom the drug worked corresponds precisely to one of these five distinct genetic groups -- and that now, we can identify ahead of time who the drug's going to benefit. That's big-time."

Big-time enough to have caused several biotech companies to start burning the midnight oil -- and a fair amount of cash -- a couple of years ago in efforts to identify clinically relevant SNPs and turn them into intellectual property, either for their own drug discovery use or for licensing to other companies. Most recently, CuraGen Corp., of New Haven, announced it had accumulated an inventory of 120,000 SNPs located in protein-coding regions of the genome. But CuraGen is not the lone player in this highly competitive field. Among others, there's Iceland-based deCode Genetics Inc., which has taken a population-based approach to genomics, and Celera Genomics, which has amassed enormous computational resources to analyze genomic sequence data and is reportedly taking the lead in association studies. In addition, Millennium Predictive Medicine Inc. joined forces with Becton Dickinson and Co. in February 1999 to develop pharmacogenomic tests for cancer patients. (For more complete details of these and other pharmacogenomics companies, see "SNPs: Patent la Difference?" in Signals' archives.)

Last autumn, the National Institutes of Health (NIH) tossed $30 million at the National Human Genome Research Institute to go head-to-head with the SNP privateers in a race to identify 100,000-plus SNPs and quickly usher them into the public domain, thereby keeping the lock off this treasure trove.

The public/private race grew lopsided early this year when the giant nonprofit Wellcome Trust and 10 major multinational pharmaceutical houses funded a $45-million consortium dedicated to generating a full-strength SNP map within two years of the April kickoff and -- significantly -- placing the results into a public database open to researchers of every stripe, free of charge. The SNP Consortium, whose work is being carried out by five genome-research centers in the United States and England, expects to identify a minimum of 300,000 SNPs, with 150,000 of them mapped, by April 2001.

++ THE SNP CONSORTIUM ++

Big Pharma Members             Academic Centers
AstraZeneca plc                The Whitehead Institute for Biomedical Research
Bayer AG                       Washington University School of Medicine
Bristol-Myers Squibb Co.       The Wellcome Trust's Sanger Centre
F. Hoffmann-La Roche Ltd.      Stanford Human Genome Center
Glaxo Wellcome plc             Cold Spring Harbor Laboratory
Hoechst Marion Roussel AG
Novartis AG
Pfizer Inc.
G.D. Searle & Co. Inc.
SmithKline Beecham plc

"For ten huge pharmaceutical companies to come together in a not-for-profit effort, and then to share their results with the public like this, is pretty unprecedented," says Cox of the Stanford Human Genome Center, which is doing the bulk of the consortium's mapping work. "I don't know that anything like that has ever happened before."

To the chagrin of early birds like Genset S.A., Incyte Pharmaceuticals Inc. and Myriad Genetics Inc., the big money won't be in merely finding SNPs -- the NIH, the SNP Consortium, and allied entities have seen to it that public databases will be bulging with enough SNPs to get around anybody's patent. "Everybody will be able to do this sort of work without being held hostage to commercial databases," says SNP Consortium CEO Arthur Holden. They won't have to wait too long, either, because data will be released pretty much as they're compiled.

The consensus human genome's amped-up schedule has a halo effect on its younger sibling. Once most of the sequence is known, SNP seekers can quickly map lots of the SNPs they find just by doing a database search.

With the creation of a genome-wide SNP map in the near future now a virtual certainty, there's a running debate about how to use it. Nobody really knows how big test populations have to be, or how many SNPs per DNA sample must be looked at, to yield meaningful data. The numbers, of course, will vary immensely according to the application. Association studies employed to find novel susceptibility genes involved in complex, multigene diseases or to determine drug-friendly vs. non-friendly genotypes may require the characterization, or "scoring," of tens or, ideally, hundreds of thousands of SNPs among each of thousands or even tens of thousands of people.

As results from association studies are obtained, drug companies may be able to use the information to resurrect dead drugs by examining banked blood samples to see if some subgroups in a previous study had benefited significantly from a drug that, judging from the entire patient population taken as a whole, was a dud. Hundreds or thousands of subjects in upcoming or ongoing trials could also be SNP-genotyped and checked for clustering of efficacy or side effects among selected subsets. The numbers of SNPs such applications might require for scoring would depend on how much was known about the genetics involved.

Beyond the world of clinical trials looms the land of the clinical diagnostic test, wherein blood samples are collected from as many as several million people scattered in space and time, and anywhere from one to a few hundred SNPs per sample are scored to see which drug is best suited for that individual. That's a potential win/win/win/win/win situation: for the patient, the doctor, insurance companies, public health authorities -- and the pharmaceutical companies, who would profit from the sale of the diagnostic, the increased likelihood of compliance, and the public-relations advantages of seeing scare headlines about "100,000 prescription-drug-related deaths per year" relegated to the ash heap of history.

That day isn't here yet. The current tools used for discovering and mapping SNPs -- variations on the brute-force DNA-sequencing theme -- aren't suitable for scale-up. "It costs somewhat under a dollar using conventional technology to assess one patient for one SNP," says Elliott Sigal, vice president for applied genomics at Bristol-Myers Squibb Co. (a SNP Consortium player). "If you wanted to look at 100,000 SNPs, you wouldn't really want to be doing $100,000 worth of experimentation per patient for very many patients. What we're expecting -- and, I think, what we're seeing -- is moves by biotechs to make that genotyping technology more cost-effective."

Of all the chunks clogging the SNP-scoring pipeline, among the biggest is the reliance of virtually all current SNP-scoring technologies on amplification of DNA samples by means of polymerase chain reaction (PCR), a kind of chemical Xerox machine for chosen few-hundred-nucleotide DNA sequences. PCR is slow -- it requires 20 or 30 iterations of "thermal cycling" (back-and-forth changes of the reaction temperature) -- and takes a whole day to run. It's expensive, too, and will remain so at least until Hoffmann-LaRoche Inc.'s patent on the polymerase enzyme employed in the reaction expires in a few years.

Finally, PCR produces an exponential amplification of the target DNA sequence -- from 1 to 2 strands, then 4, 8, and so forth. That's a blessing and two curses, the blessing being lots of product from minuscule amounts of original material. The two curses are, first off, that the reaction's nonlinearity, combined with its temperature dependence and rates that vary from one DNA moiety to another, precludes reliable conclusions regarding the quantity of the original material. And second, that a sample must be handled fastidiously lest it be contaminated by extraneous material -- one stray DNA molecule, not to mention one ill-timed sneeze, can ruin your prep (remember, that's a whole day!).
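To put rough numbers on that doubling, here is a minimal Python sketch. The perfect-doubling case follows the 1, 2, 4, 8 progression described above; the `efficiency` parameter is an assumption added for illustration (a common textbook way to model less-than-perfect cycles), not something drawn from the companies quoted here.

```python
def pcr_copies(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds of PCR.

    efficiency=1.0 means every strand is duplicated each cycle (perfect
    doubling); lower values are a hypothetical stand-in for the
    sequence- and temperature-dependent variation the text mentions.
    """
    return start_copies * (1.0 + efficiency) ** cycles


# Perfect doubling from a single starting molecule.
for cycles in (20, 30):
    print(f"{cycles} cycles: {pcr_copies(1, cycles):,.0f} copies")

# A small per-cycle shortfall compounds over 30 cycles, which is why the
# final yield says little about how much material you started with.
ideal = pcr_copies(1, 30, efficiency=1.0)
slightly_off = pcr_copies(1, 30, efficiency=0.9)
print(f"Yield ratio, 100% vs. 90% efficiency: {ideal / slightly_off:.1f}x")
```

The same compounding explains the contamination problem: a single stray molecule gets the full benefit of every cycle, too.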

Says Holden of the SNP Consortium: "Patent costs are just one aspect of it. The problem with PCR is it's never been particularly simple to use. I think it's a technology platform whose time has come and, maybe, is rapidly going." Stanford's Cox bemoans the headache of "keeping track of all those different assays."


As with the Internet, it may be the plumbers, not the content providers, who are the best bet. One of those wrench-bearing minions is Affymetrix Inc., a Santa Clara, CA-based company with 440 employees. Affymetrix has adopted the photolithographic approach of some of its neighboring Silicon Valley semiconductor companies and developed a "DNA chip" containing precisely ordered arrays of oligonucleotides built up nucleotide by nucleotide on a glass-wafer substrate. In a month or two, according to Rob Lipshutz, VP for corporate development, Affymetrix will launch a chip checking for 1,500 SNPs and containing something on the order of 60,000 oligonucleotide "probes," any one of which will "light up" if it hybridizes to a piece of DNA being tested. You build in redundancy by adding probes whose sequences begin and end one or two or three nucleotides to the left or to the right of one another. The test itself is straightforward: PCR the DNA sample, do the hybridization, scan the chip to see which "pixels" have lit up, and analyze the data.
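The arithmetic works out to about 40 probes per SNP (60,000 divided by 1,500). The shifted-window redundancy can be sketched in Python as follows. This is an illustration only, not Affymetrix's actual probe design: the sequence, the probe length, and the function names are all made up for the example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Sequence of the strand that would hybridize to `seq`."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tiled_probes(reference: str, snp_index: int, alleles: str,
                 probe_len: int = 9, max_shift: int = 3) -> dict:
    """Return {allele: [probe, ...]}, one probe per shifted window.

    Each window is moved one, two, or three nucleotides left or right
    of a window centered on the SNP, so every probe still covers it.
    """
    probes = {}
    center = probe_len // 2
    for allele in alleles:
        # Target sequence carrying this allele at the SNP position.
        target = reference[:snp_index] + allele + reference[snp_index + 1:]
        windows = []
        for shift in range(-max_shift, max_shift + 1):
            start = snp_index - center + shift
            end = start + probe_len
            if start < 0 or end > len(target):
                continue  # window would run off the end of the sequence
            windows.append(reverse_complement(target[start:end]))
        probes[allele] = windows
    return probes


reference = "ACGTTGCAGGATCCATGCAAGT"  # hypothetical sequence
probes = tiled_probes(reference, snp_index=11, alleles="AG")
for allele, probe_list in probes.items():
    print(allele, probe_list)

print("Probes per SNP on the chip described above:", 60_000 // 1_500)
```

Because each allele gets several overlapping probes, one misbehaving hybridization does not decide the call by itself.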

Lipshutz says he expects Affymetrix to be "a very strong contender" as SNP-density requirements grow because "the more SNPs you look at, the more powerful the chip-based strategy is."


GeneCHIP Probe Array Synthesis Process
Image Courtesy of Affymetrix

But upsizing a chip has its downside. For one thing, any given complementary pair of DNA strands will have its own peculiar parameters for optimal hybridization; when you've got to do hundreds, or thousands, of hybridizations under essentially the same chemical and thermal conditions, you can get some bad data. Plus, it takes a lot of time to redesign a chip: Suppose someone comes along and says, "We've decided you should forget about these 30 SNPs -- they're not clinically relevant -- and instead check for these other 55 SNPs." You're back to square one. Finally, as currently configured, the Affymetrix chip requires that the sample DNA be amplified by PCR, with all the limitations that imposes on speed, price, ease, and scalability.


Mark Chee was director of genomics research at Affymetrix before leaving, he says, to contemplate the problem of scalability. He is now vice president of genomics at Illumina Inc., which started lab operations late last year in San Diego and still has fewer than 30 employees.

"Everybody else in the array business uses what we call an 'ordered array'," says Chee. "If you go to a particular x/y location on Affymetrix's chip, they can tell you the exact probe sequence at that location. But there's a scale-up problem. As you go to smaller and smaller features on an ordered array in order to get more and more elements onto the chip, your mechanical tolerances get tighter and tighter. Those problems are solvable, but they're expensive to solve."

Illumina takes a radically different approach to creating its array. A fiber optic strand's core and surrounding cladding are made of different materials. If you polish one end of a bundle of these strands to a smooth finish, then stick it into an etching medium, the cores will get etched a little deeper than the cladding. The end of each strand becomes, effectively, a little well. The bundle can then be dipped into a dish containing a bunch of beads that have been first optically bar-coded, then coated with different oligonucleotides -- beads with barcode A are coated with oligonucleotide A, and so forth. These beads spontaneously and randomly seat themselves in the wells at the ends of the fiber-optic strands. An array is born.

Image Courtesy of Illumina

Redundancy of bead-strand attachments ensures that each approximately 1.2 mm. by 1.2 mm. fiber-optic bundle (they're actually hexagonal) holds enough bead/sensors to do upwards of 1,000 genotypes, Chee says. A set of 96 bundles can be configured to mate with a microtiter plate, so you can dip one bundle into each sample-containing well of a 96-well plate. Hybridization generates a light signal.

Despite the beads' random seating, says Chee, their codes can be read (and their associated oligonucleotides thereby determined) after the fact. "We don't have to deal with the physical positioning challenges that everybody else has to deal with," says Chee. "But they don't have to deal with figuring out what's where after they've made the array. We think that's a good tradeoff. We have a unique ability to scale." To look at five more SNPs, you just add five new kinds of coated beads to the array, which can furthermore be scaled up to accommodate 384- or 1,536-well plates.
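The tradeoff Chee describes, random assembly followed by after-the-fact decoding, can be mimicked in a few lines of Python. This is a toy model under stated assumptions: the barcodes, oligonucleotide sequences, and function names are hypothetical, and reading a barcode is reduced to a dictionary lookup.

```python
import random


def assemble_random_array(bead_types: dict, n_wells: int, seed: int = 0) -> list:
    """Let beads seat themselves at random; return the barcode in each well."""
    rng = random.Random(seed)
    barcodes = list(bead_types)
    return [rng.choice(barcodes) for _ in range(n_wells)]


def decode_array(wells: list, bead_types: dict) -> dict:
    """Read each well's barcode after assembly and group wells by probe."""
    wells_by_probe = {}
    for well_index, barcode in enumerate(wells):
        probe = bead_types[barcode]
        wells_by_probe.setdefault(probe, []).append(well_index)
    return wells_by_probe


# Hypothetical bead types: barcode -> oligonucleotide coating.
bead_types = {"A": "ACGTACGT", "B": "TTGACCAA", "C": "GGCATTCA"}

wells = assemble_random_array(bead_types, n_wells=30)
layout = decode_array(wells, bead_types)
for probe, positions in layout.items():
    print(probe, "found in wells", positions)

# Looking at one more SNP means adding one more bead type, nothing else.
bead_types["D"] = "CCATGGTA"
```

Nothing in the assembly step needs to know where a bead will land; all of the bookkeeping happens in the decoding pass.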

Illumina's technology is flexible, too. A matrix of 96 arrays can process either 2,000 samples for modest numbers of SNPs or, by loading the same sample onto all the microtiter plate wells, hundreds of thousands of SNPs on that sample.

But Illumina's sentient spaghetti isn't fully cooked yet. While no apparent roadblocks stand in the way of a viable product, Illumina's technology is still in the development phase. And it still depends on front-end PCR to beef up the sample size to detectable levels.

Madison, Wisconsin-based Third Wave Technologies Inc., with about 100 employees, has a PCR-free SNP-detection technique known as Invader that has drawn close attention from people at the heart of the SNP Consortium's mapping initiative -- and, to prove it, $19.5 million in new cash from a just-concluded round of mezzanine financing by sophisticates including, tellingly, the Wellcome Trust and the venture arm of SmithKline Beecham, another consortium player.

According to Bruce Neri, Third Wave's senior vice president for R&D, "Nature doesn't like single-stranded DNA," expressing its distaste via so-called "flap endonucleases" (or FENs), which clip off loose flaps of unhybridized DNA when construction processes underway in the region configure those dangling ends into recognizable structures. Third Wave has domesticated and patented a family of FENs, trademarked Cleavases, which team up with specially designed oligonucleotide probes to force a yes-or-no answer from a target DNA stretch in response to the question: "Does your SNP contain allele X?"

Here's how Invader works: Once you know the exact sequence of nucleotides on each side of a SNP site, you can construct two special single-stranded oligonucleotide "probes." The first, "signal" probe contains three domains: a stretch of DNA that's 100 percent complementary to several nucleotides abutting one side of the SNP; then, a single nucleotide which may or may not be complementary to the SNP -- you don't know yet, or you wouldn't have to run the test; and finally, a series of nucleotides distinctly uncomplementary to their counterparts on the far side of the SNP. Left to its own devices, that uncomplementary stretch will wave in the breeze while the other end of the probe is hybridizing its little tail off. The signal probe is added in excess so that it outnumbers target DNA by, say, a million to one.

Image Courtesy of Third Wave

The second oligonucleotide probe, called the Invader, contains a string of nucleotides complementary to precisely those SNP-neighboring nucleotides on the target DNA that have zero affinity to the signal probe. At one end, the Invader also contains an additional nucleotide guaranteed to hover at the SNP site when the rest of the Invader hybridizes to neighboring DNA.

Whether the tagalong nucleotide is complementary to the actual SNP or not makes no difference; either way, it will displace the signal probe at the SNP site and force the loser's end flap into just the configuration that FENs love to eat for breakfast -- but (and here's the really neat part!) only if the nucleotide at the signal probe's SNP-site position happens to be complementary to the actual SNP allele sitting on the target. Nature hasn't offered a full explanation yet of why things work this way, but they really do.

So, if you've guessed lucky and built a signal probe whose diagnostic nucleotide is indeed complementary to the SNP, the FEN clips off the probe's dangling flap and turns it loose into the surrounding medium. The end of the liberated flap farthest from the cleavage site is, by the way, labeled in such a way that it will fluoresce only after the flap has been clipped. No SNP match, no FEN clip, no signal, no problem. It must have been the other allele.
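Stripped of the chemistry, the decision rule is a simple yes-or-no test, sketched here in Python. The function names are hypothetical, and the two-probe genotype call at the end is an assumption about how one might combine two such reactions for a person's two chromosome copies; the article itself describes only the single-probe answer.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def invader_signal(target_allele: str, probe_base: str) -> bool:
    """True if the flap would be clipped and a signal produced.

    Cleavage happens only when the signal probe's diagnostic base is
    complementary to the nucleotide actually present at the SNP site.
    """
    return COMPLEMENT[target_allele] == probe_base


def call_genotype(sample_alleles, allele_x: str, allele_y: str) -> str:
    """Combine one reaction per allele-specific probe (illustrative only)."""
    x_signal = any(invader_signal(a, COMPLEMENT[allele_x]) for a in sample_alleles)
    y_signal = any(invader_signal(a, COMPLEMENT[allele_y]) for a in sample_alleles)
    if x_signal and y_signal:
        return f"{allele_x}/{allele_y}"
    if x_signal:
        return f"{allele_x}/{allele_x}"
    if y_signal:
        return f"{allele_y}/{allele_y}"
    return "no call"


print(invader_signal("G", "C"))               # probe matches the SNP
print(invader_signal("A", "C"))               # probe does not match
print(call_genotype(("G", "A"), "G", "A"))    # one copy of each allele
print(call_genotype(("A", "A"), "G", "A"))    # two copies of the same allele
```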

Happily, FENs perform quite efficiently at a temperature at which DNA hybridization is a transient affair. Signal probes are designed to be fidgety at this temperature, constantly sitting down briefly and then (clipped or unclipped) hopping off of the target DNA. The hybridizing rump of a clipped signal probe doesn't hog the site, but instead soon makes way for the next, yet-to-be-clipped signal probe (there are a lot of those unclipped signal probes hanging around nearby, because you originally dumped in a huge excess of them).

The Invader reaction yields about one Cleavase clip per target molecule every two seconds, assuming the signal probe is complementary to the SNP. That's almost 1,000 brightly fluorescing signals in a half hour's time. To heighten the assay's sensitivity, Third Wave has added a second, simultaneous phase the company calls Invader Squared: The signal probe, instead of being fluorescently labeled for readout, is made so that its clipped flap -- a la Invasion of the Body Snatchers -- itself becomes a secondary Invader probe, quickly hybridizing to a synthetic DNA sequence tossed into the reaction tube in excess. Also present is lots and lots of a fluorescently labeled secondary signal probe, built so that invasion always forces it to undergo a FEN clip job. Now, instead of 1,000 fluorescent signals after a half hour you have 1,000 times 1,000, or a million of them.
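The back-of-the-envelope numbers can be checked with a short Python sketch. It simply reproduces the article's multiplication rather than modeling the reaction kinetics; the constant names are made up for the example.

```python
SECONDS_PER_CLIP = 2          # about one clip per target every two seconds
REACTION_SECONDS = 30 * 60    # a half hour

# Primary reaction: clips per target molecule in half an hour.
primary_clips = REACTION_SECONDS // SECONDS_PER_CLIP
print(f"Primary clips per target: {primary_clips:,}")

# The article rounds that figure up to 1,000 before squaring it.
rounded_clips = 1_000
print(f"Invader Squared, rounded:   {rounded_clips * rounded_clips:,}")

# Squaring the unrounded figure gives a somewhat smaller number,
# still in the same ballpark.
print(f"Invader Squared, unrounded: {primary_clips ** 2:,}")
```

Either way, the growth is a product of two linear processes, not the cycle-after-cycle doubling of PCR.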

Because the Invader reaction amplifies a signal molecule, not the target DNA, there's no possibility of runaway reproduction of a contaminant oligonucleotide. Third Wave president and CEO Lance Fors says that this method has proven highly accurate, providing at least a thousandfold signal-to-noise ratio. Moreover, unlike PCR, which amplifies exponentially and somewhat variably with respect to temperature and substrate moiety, the Invader reaction is carried out at a single temperature and proceeds linearly over several orders of magnitude, allowing good quantitation. It is also simple to run, demanding only the Third Wave reagents, relatively unskilled labor, and, with the exception of a fluorescent plate scanner, only common laboratory instruments.

A non-PCR-dependent detection assay like Third Wave's Invader can be coupled with other readout systems, as University of Wisconsin nucleic acid chemist Lloyd Smith showed recently in a paper published in the May 1999 Proceedings of the National Academy of Sciences (Vol. 96, pp. 6301-6). Modifying the Invader reaction so that, instead of a signal vs. no-signal outcome, the assay generated a different oligonucleotide signal for each allele of the SNP in question, Smith and his colleagues were able to then characterize several SNPs by means of matrix-assisted laser desorption/ionization time-of-flight mass spectrometry, or MALDI-TOF MS (a technology that Sequenom Inc., of San Diego, is commercializing for SNP genotyping), with 100 percent accuracy, in one case even avoiding an error that occurred during the PCR-dependent sequencing used to verify their results.

Smith, who was himself one of the pioneers of DNA sequencing, is also, incidentally, a co-founder of Third Wave, along with Fors, who was a graduate student in the fabled Caltech lab of Leroy Hood when Smith was a post-doc there. The combination of the Invader reaction, an intermediate sample preparation step, and MALDI-TOF MS analysis takes only about 5 hours to run and lends itself to automated sample handling, in contrast to SNP genotyping by sequencing PCR products, which is fussy and takes more than a full day in total to complete, he says.

Superior technology aside, it will take vast improvements in the price/performance ratio to bring Big Pharma aboard as an industrial customer, says Allen Roses, vice president and worldwide director of genetics at SNP Consortium participant Glaxo Wellcome Inc. "We do maybe 25 Phase II [clinical] studies a year," Roses says. "Consider a Phase II study that has 500 people and 200,000 SNP data points per person. At a dollar a SNP, that's $100 million -- forget it. At a penny a SNP, that's still $1 million. To put up a million dollars up front doesn't make a lot of sense, because it's so early in the life of a molecule [that's being tested in humans as a therapeutic] and we know most of these molecules will fail. Now if the cost were a tenth of a penny per SNP...$200,000 every two weeks or so, that might be doable."
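Roses' arithmetic is easy to reproduce. The Python sketch below multiplies people by SNPs by price, with the price expressed in tenths of a cent so the sums stay exact; the function name and that unit choice are assumptions made for the example.

```python
def study_cost_dollars(people: int, snps_per_person: int, tenths_of_cent_per_snp: int) -> float:
    """Total genotyping cost, with the per-SNP price in tenths of a cent."""
    return people * snps_per_person * tenths_of_cent_per_snp / 1000


PEOPLE = 500
SNPS_PER_PERSON = 200_000

# Price points from the quote: a dollar, a penny, a tenth of a penny.
for label, price in (("$1.00", 1000), ("$0.01", 10), ("$0.001", 1)):
    cost = study_cost_dollars(PEOPLE, SNPS_PER_PERSON, price)
    print(f"At {label} per SNP: ${cost:,.0f} per study")

# Sigal's earlier example: one patient, 100,000 SNPs, about a dollar each.
print(f"One patient, 100,000 SNPs: ${study_cost_dollars(1, 100_000, 1000):,.0f}")
```

With these inputs the tenth-of-a-penny case comes to $100,000 per study, so the $200,000 figure in the quote presumably rests on assumptions not spelled out in the interview.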

Nobody expects to see SNP characterizations at a tenth of a penny anytime soon. But prices are going to fall, and demand will materialize at whatever point end users can see benefits from genotyping. Indeed, the very existence of a useful genomic SNP map will make it dangerous for combatants in a highly competitive industry not to use it.

SIDEBAR:
When Nucleotides Collide: What Are SNPs, and How Can You Use Them?

In less than three years, scientists should have finished sequencing the entire human genome -- a massive, international effort that's going to come in way ahead of schedule. But, as awesome as this accomplishment is, the Human Genome Project's just the warm-up.

For, precisely put, there isn't really "a human genome;" there are about six billion of them. Line up all the DNA in one of your cells so that your mother's and father's genetic contributions lie side by side (X and Y chromosomes excepted), compare equivalent single strands from each of the two double helices, and chances are those two 3-billion chemical-letter stretches (we'll call the four letters, or nucleotides, in this alphabet A, T, G, and C) will differ by about 0.1 percent. Almost all of those differences, moreover, will be in the form of single-nucleotide polymorphisms, or SNPs (pronounced "snips"): a "G" at position number 1,967,448,512 on one strand, an "A" at the identical position on the other.
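In code, spotting SNPs between two already-aligned strands amounts to a position-by-position comparison. The Python sketch below uses two short, made-up sequences; real genomes also require alignment and error handling that this ignores.

```python
def find_snps(strand_a: str, strand_b: str) -> list:
    """Return (position, base_a, base_b) for every single-letter difference.

    Positions are 1-based, and the strands are assumed to be aligned
    and of equal length.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must be the same length")
    return [
        (index + 1, a, b)
        for index, (a, b) in enumerate(zip(strand_a, strand_b))
        if a != b
    ]


maternal = "ACGTTGCAGGATCC"  # hypothetical stretch from one parent
paternal = "ACGTTGCAAGATCC"  # same stretch from the other parent
print(find_snps(maternal, paternal))

# Scaling up: 0.1 percent of 3 billion letters.
GENOME_LENGTH = 3_000_000_000
print(f"Expected differences: {GENOME_LENGTH // 1000:,}")
```

At one difference per thousand letters, that is on the order of three million sites where your two copies of the genome disagree.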

SNPs located in the coding region of genes can alter the structure of proteins, giving someone sickle-cell anemia, or blue eyes instead of brown ones, or an apoE4 allele (a risk factor for Alzheimer's disease) instead of an apoE3 allele, or type B instead of type O blood. But the research community has a higher purpose in mind for these pinpoint variations in our genomic chemistry. By laboriously determining the invariant sequences of several nucleotides abutting each side of a SNP, researchers can assign that SNP to a specific location (or "SNP site") on a chromosome, eventually creating a reliable, high-density map consisting of hundreds of thousands of SNP sites spaced more or less evenly throughout the genome. It seems SNPs are relatively stable over our evolutionary history -- over a span of numerous generations, a SNP allele is likely to remain genetically linked with the stretch of DNA in its proximity -- leading specific SNPs to associate with specific alleles of potentially interesting genes.

Imagine two large groups of unrelated people, chosen randomly except for one variable: one group's members are all hypertensive, while the other's all have normal blood pressure. Imagine further that you have a SNP map and can quickly, accurately, and cheaply determine, at each of a large collection of SNP sites, what percentage of the members of each group have, say, an "A" instead of a "G." Suppose that, of all those sites, you find just 10 where SNP allele frequencies differ substantially between the two populations: For instance, 80 percent of the hypertensives, but only 15 percent of the controls, sport a "G" at DNA position number 564,000,001. The logical next step might be to look very carefully at the genes in the vicinity of each of these areas and see if any of them are partly responsible for hypertension.
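That thought experiment translates into a simple scan, sketched below in Python. The data are invented to match the 80 percent vs. 15 percent example, each person is reduced to a single letter per site, and the fixed `min_gap` cutoff is a stand-in for the proper statistical test a real study would use.

```python
def allele_frequency(calls: list, allele: str) -> float:
    """Fraction of people in a group carrying `allele` at one site."""
    return sum(1 for call in calls if call == allele) / len(calls)


def flag_sites(cases: dict, controls: dict, allele: str = "G", min_gap: float = 0.5) -> list:
    """Return (site, case_freq, control_freq) where the groups differ sharply."""
    flagged = []
    for site in cases:
        case_freq = allele_frequency(cases[site], allele)
        control_freq = allele_frequency(controls[site], allele)
        if abs(case_freq - control_freq) >= min_gap:
            flagged.append((site, case_freq, control_freq))
    return flagged


# Hypothetical data: 100 hypertensives and 100 controls at two SNP sites.
cases = {
    564_000_001: ["G"] * 80 + ["A"] * 20,
    12_345: ["G"] * 50 + ["A"] * 50,
}
controls = {
    564_000_001: ["G"] * 15 + ["A"] * 85,
    12_345: ["G"] * 48 + ["A"] * 52,
}

for site, case_freq, control_freq in flag_sites(cases, controls):
    print(f"Site {site:,}: {case_freq:.0%} of cases vs. {control_freq:.0%} of controls")
```

Only the first site clears the cutoff, which is the cue to go look at the genes nearby.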

That's not exactly shooting fish in a barrel, but it's a lot better than searching randomly for relevant genes. You could use the same approach (known as an association study or cohort study) to analyze genomic differences between those, say, who responded positively to a particular drug, those who responded weakly or not at all, and those who suffered from significant side-effects.

Bruce Goldman
Cover Graphic: Model-Damon Lewis


Copyright © 2010. Signals (signalsmag.com) is an online magazine of analysis for biotechnology executives.