Plucking Proteins From Cell Soup
If you're a lover of Japanese cuisine, and the least bit dexterous with chopsticks, you already know that they're the perfect utensil for picking up even one small grain of rice -- or for holding on to a slippery soba noodle. But you'd also have to agree with the sushi chef that chopsticks are not the appropriate tool for cutting up a whole fish.
Protein chemists and biochemists face a similar dilemma: While standard two-dimensional (2D) gel electrophoresis is a great way to isolate one individual protein from a mix, it's just not up to the task of resolving all the proteins that constitute the entire human protein complement (the proteome). The abundant species, like albumin, may be a snap, but the rare ones, which are generally the most interesting, remain hidden. That doesn't even take into account the vast amount of work necessary to remove each individual spot from the gel and identify it. And there are lots of proteins: Due to post-translational modifications and RNA splicing events, 100,000 genes do not produce 100,000 proteins. It's more like one billion -- although some estimates put it closer to 20 million. And, if you're interested in studying protein-protein interactions, you could be looking at 50 billion possibilities. An all-out assault on the human proteome requires industrial-strength instrumentation and methods -- not to mention sophisticated bioinformatics capabilities to correlate and analyze all the data.
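To get a feel for where numbers like that come from, here is a back-of-envelope sketch (a minimal Python illustration, not anything presented at the conference) of how fast the count of candidate pairwise interactions grows with the number of distinct protein species; the species counts in the loop are assumptions chosen purely for illustration.

```python
from math import comb

# Illustrative (assumed) counts of distinct protein species in a proteome.
# Even numbers far smaller than the article's estimates yield staggering
# numbers of candidate protein-protein pairs.
for n_species in (100_000, 320_000, 1_000_000):
    pairs = comb(n_species, 2)  # n * (n - 1) / 2 possible pairings
    print(f"{n_species:>9,} proteins -> {pairs:,} candidate pairwise interactions")
```

Even a few hundred thousand distinct protein species puts the candidate pair count in the tens of billions, which is the scale of the figure quoted above.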
Researchers are striving not only to isolate, analyze and characterize all possible human proteins, but also to determine how those proteins interact with one another. Proteins are drugs as well as drug targets; they do the body's work, and when something goes awry, disease follows. Today's protein analysis tools are simply not efficient enough for the gargantuan task that lies ahead.
The speakers at last week's conference on Investing In And Commercializing New Technologies For Proteomics, organized by Global Business Research Ltd. and held in San Francisco's Japantown, drove this point home time and time again. Not surprisingly, plenty of companies are developing new analytical instrumentation -- and others are trying to industrialize existing methods to meet the demands now placed on them by large-scale protein mapping efforts. According to Eric Schmidt, a senior research analyst at SG Cowen Securities, the separations market (including 2D gels and liquid chromatography methods) is worth about $200 million to $300 million today, and mass spectrometry companies already command a $300 million to $400 million market for proteomics applications alone. These companies include, but are certainly not limited to, Bruker Daltonics Inc., Micromass (a division of Waters Corp.), Packard Instrument Co. Inc., Varian Inc. and PE Corp. (through its Applied Biosystems business group).
Most researchers feel that 2D gels, in their present format, represent the biggest bottleneck in large-scale proteomics research. According to Stephen Martin, the director of Applied Biosystems' proteomics research center in Foster City, CA, even using automated methods, "it's possible to identify only the most abundant proteins [in a cell lysate]" from a 2D gel. And, depending on the gel itself, that equates to anywhere from 100 to 600 proteins (not even close to the one billion that may exist). According to Hanno Langen, the head of proteomics at Basel, Switzerland-based Roche Genetics, "With a gel of a total cellular extract, we see only 250 different gene products." In fact, one common problem with cell lysates is that many of the spots are not unique proteins. "There are lots of proteins we see over and over again, some of them 125 times," Langen added. "We feel 2D gels are great for separation, but bad for identification."
Thus, researchers at Roche Genetics have devised a so-called 3D method, which adds a differential fractionation step (for instance, immunoprecipitation) ahead of standard 2D gels. They're also fractionating the cells themselves into various compartments (i.e., nucleus, cytoplasm, mitochondria) prior to any separation of the proteins within each group. Other companies are attacking the problem by improving the gels themselves or devising high-speed automated scanning methods for collecting data on each of the protein spots on a gel. They're also developing automated picking methods to lift each spot off the gel for further biochemical analysis, and even looking into alternative approaches such as isotope-coded affinity tags (ICATs) that may make it possible to avoid 2D gels altogether. (For a discussion of ICAT technology, which was developed by Ruedi Aebersold and his colleagues at the University of Washington, see the Signals article "Proteomics Gears Up." Applied Biosystems now has an exclusive license to the technology.)
Despite researchers' widespread frustration with 2D gels, "There's no other technique that can currently deliver the resolution," according to Mary Lopez, the vice president of R&D at Sydney, Australia-based Proteome Systems Ltd. What's needed, she said, are methods to "optimize gels to get the most information out of them. They need to be reproducible."
"The key technologies driving proteomics research today are reproducible 2D gel technology, staining and scanning technology, mass spectrometry for identification, databases (both protein and genome) and database-searching algorithms," explained Applied Biosystems' Martin. "The initial challenge in proteomics research today is the automation and integration of these technologies."
Applied Biosystems has chosen to leverage its mass spectrometer (MS) capabilities across all divisions and product lines. "It's a fundamentally enabling technology," Martin explained, and can be applied in studies involving structural proteomics, functional proteomics and bioinformatics.
Researchers use these pricey instruments (they can run anywhere from $100,000 to nearly $400,000) to identify individual proteins after those proteins have been separated out of a complex mixture via 2D gels (which resolve proteins by both charge and mass). Once instrument makers figured out how to turn peptides in solid or liquid form into intact ions in the gas phase -- the "soft ionization" advance behind today's instruments -- direct mass analysis of these fragile molecules became possible.
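In practice, the identification step usually comes down to comparing the masses the instrument measures against theoretical masses computed from sequence databases (peptide-mass fingerprinting). The sketch below is only an illustration of that matching idea -- the protein names, masses and tolerance are all invented, and no vendor's actual scoring algorithm is implied.

```python
# Toy peptide-mass-fingerprint matcher: score each database protein by how
# many observed peptide masses (from the mass spectrometer) fall within a
# tolerance of its theoretical peptide masses. All values are placeholders.

TOLERANCE_DA = 0.2  # assumed mass tolerance, in daltons

theoretical_db = {  # protein name -> theoretical tryptic peptide masses (Da)
    "protein_A": [842.5, 1045.6, 1479.8, 2211.1],
    "protein_B": [914.4, 1045.6, 1620.7, 1993.9],
}

observed = [842.6, 1479.7, 2211.2]  # masses read off the spectrum

def score(theoretical, observed, tol=TOLERANCE_DA):
    """Count observed masses that match some theoretical mass within tol."""
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)

ranked = sorted(theoretical_db, key=lambda p: score(theoretical_db[p], observed),
                reverse=True)
for protein in ranked:
    print(protein, score(theoretical_db[protein], observed))
```

Real search tools score far more carefully, and against vastly larger databases, but the core operation is the same: find the protein whose predicted peptides best explain the observed masses.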
There are several variations of MS, including the increasingly popular MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight), MS/MS (or tandem MS) and ESI (electrospray ionization) MS. However, warned Martin, advances in MS platforms by themselves "won't deliver substantially against the requirements of biology."
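The time-of-flight part of MALDI-TOF, at least, rests on simple physics: ions accelerated through a fixed voltage receive the same kinetic energy per charge, so lighter ions fly down the tube faster, and the mass-to-charge ratio falls out of the measured flight time. Here is a small sketch with purely illustrative instrument parameters (the 20 kV voltage and 1 m flight tube are assumptions, not any particular machine's specs):

```python
from math import sqrt

E_CHARGE = 1.602176634e-19    # elementary charge, C
DALTON = 1.66053906660e-27    # 1 Da in kg

# Illustrative (assumed) instrument parameters
ACCEL_VOLTAGE = 20_000.0      # accelerating voltage, V
TUBE_LENGTH = 1.0             # flight tube length, m

def flight_time(mass_da, charge=1, voltage=ACCEL_VOLTAGE, length=TUBE_LENGTH):
    """Flight time for an ion of the given mass (Da): z*e*V = 1/2 * m * (L/t)^2."""
    m_kg = mass_da * DALTON
    return length * sqrt(m_kg / (2.0 * charge * E_CHARGE * voltage))

def mz_from_flight_time(t_seconds, voltage=ACCEL_VOLTAGE, length=TUBE_LENGTH):
    """Recover m/z (Da per unit charge) from a measured flight time."""
    m_over_z_kg = 2.0 * E_CHARGE * voltage * (t_seconds / length) ** 2
    return m_over_z_kg / DALTON

t = flight_time(1000.0)  # a 1,000 Da, singly charged peptide
print(f"flight time ~ {t * 1e6:.1f} microseconds")
print(f"recovered m/z ~ {mz_from_flight_time(t):.1f} Da")
```

Running it gives a flight time of roughly 16 microseconds for a 1,000 Da singly charged peptide, and recovers the same m/z from that time.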
While protein chemists and instrument manufacturers struggle with the very practical details of building systems for high-speed, high-throughput protein analysis, others are attacking proteomics from the theoretical end. After all, it's not enough to know a protein's sequence: "Given a protein sequence, it's very hard to tell what its structure is, and even harder to tell how it folds," explained IBM's Ajay Royyuru.
Mega-corporation IBM is backing the most theoretical of all proteomics efforts. It has dedicated $100 million over five years to computational biology -- in the form of Blue Gene, a future computer dedicated to solving one huge puzzle: protein folding. Tackling such a massive computational challenge requires significant advances in both hardware and software. Blue Gene will be a massively parallel machine capable of more than one quadrillion operations per second (one petaflop) -- roughly one million times faster than today's desktop PC, which handles on the order of a billion operations per second, according to Royyuru, manager of the structural biology group at IBM's Thomas J. Watson Research Center in Yorktown Heights, NY. "The goals of the Blue Gene project are twofold, and they are equally important," Royyuru explained. "One goal is the large-scale simulation of protein folding; the other is to take computing to a new level and scale (in both machine design and software)."
But modeling alone won't completely solve the problem. "We must be able to connect with experimental data in order to interpret the simulations correctly," he said. "If simulations can be demonstrated to accurately reproduce several types of experimental results, then we can get information not available from experiments."
There are some experimental data, too: the public protein structure database contains roughly 12,000 experimentally determined protein structures. But, according to Tim Harris, president and CEO of San Diego-based Structural GenomiX Inc., only about 2,500 of those are unique proteins (the database contains lots of lysozymes, for instance).
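The redundancy Harris describes is straightforward to picture: many entries are independent structures of the same molecule, so the count of unique proteins comes from grouping entries (in practice by clustering their sequences rather than by name). A toy sketch with invented entries:

```python
from collections import Counter

# Invented database entries: (entry_id, protein). Real deduplication clusters
# sequences at some identity threshold; names are used here only to illustrate.
entries = [
    ("1ABC", "lysozyme"), ("2DEF", "lysozyme"), ("3GHI", "lysozyme"),
    ("4JKL", "myoglobin"), ("5MNO", "myoglobin"),
    ("6PQR", "ribonuclease A"),
]

per_protein = Counter(protein for _, protein in entries)
print(f"{len(entries)} entries, {len(per_protein)} unique proteins")
print(per_protein.most_common(1))  # the 'lots of lysozymes' effect
```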
Harris' company focuses on determining the three-dimensional structures of proteins through X-ray crystallography. But here, too, progress will require the interplay between computer modeling and experimental results. "Real structures are absolutely critical for modeling," Harris explained. Once you have them, you can "advance high throughput screening hits with rational drug design; examine a wider variety of potential targets; and improve selectivity against related proteins."
According to Harris, structural considerations "play a role in all phases of the drug discovery process…If you have the three-dimensional image of a protein [obtained through X-ray crystallography], you can infer what the function is." But, he cautioned, "it's much more than just having the structures. You have to know what to do with them. Once you've got the structures, you can assign functions to proteins and enable new drug discovery."
originally published 09/28/2000