
Systems biology practitioners – at universities, government and non-profit labs, and companies – have been hard at work these last several years developing the technologies and computational tools they believe will allow them to actually study intact biological systems – such as human beings. They’ve created massive databases bursting with information – especially in genomics and proteomics. They’ve devised amazingly clever algorithms and bioinformatics capabilities to interpret these data. They’ve even modeled diseases in silico. But despite these advances – and the knowledge that the next scientific challenge is to integrate every piece of biological information into a cohesive whole – the field is just now at the point where everyone understands exactly what it will take to accomplish this awesome feat.

Imagine, if you will, that a group of brilliant scientists from an alien culture has landed on Earth, quite close to a superhighway. They immediately notice that there are hundreds and hundreds of similarly shaped objects speeding down the road, so they set out to commandeer one to understand how it works. They use their special powers to reduce the car to its component parts – and then each scientist analyzes one of these parts and gains a complete understanding of its nature. But, they discover, that’s not enough to tell them how these different parts work together to achieve motion. So, they spirit away another car and begin again – but this time, they start with an intact vehicle. Being oddly shaped, distinctly non-humanoid creatures, the aliens can’t actually drive the car, so they set about formulating a hypothesis as to how the thing might work. And then they test that hypothesis – perhaps by removing a tire. Well, the car still runs (and it can even move, sort of), so they form a new hypothesis – and pull out a spark plug. And so on.
Their human counterparts would instantly understand the dilemma that the aliens face – for modern-day practitioners of the life sciences in their myriad manifestations have come to the same point. Especially in the last 50 years, researchers have employed a distinctly reductionist approach to understanding the mysteries of how organisms function. They’ve gleaned a deep – though certainly not complete – understanding of the genome, the genes it contains and the proteins they produce. They’ve categorized biochemical and metabolic pathways in great detail and can reproduce enzymatic reactions in a test tube with ease. They know quite a lot about intracellular transport, organelles, organs and circulatory, immune and nervous systems. They also have a handle on physiology, pathophysiology and disease. But, frankly, researchers still don’t understand how all these components work together as a system – be it a fruit fly or a human being.

To do that requires very, very big biology. It means that scientists must tie all these disparate chunks of information into one cohesive unit. For studies in humans, genomic, proteomic, metabolomic, physiological and even environmental data will have to be integrated in the context of disease states. This means layer upon layer upon layer of data – and the need to look at multiple variables simultaneously to ascertain how they interact with each other.
It also requires a new level of collaboration between scientists who never used to communicate at all – molecular biologists, engineers (computer and others), physicists, chemists and mathematicians. It’s a daunting challenge – and would be literally impossible if it weren’t for the advent of sophisticated computational methods, algorithms and bioinformatics capabilities. And even those aren’t powerful enough yet to tackle the extremely complex problem of modeling a metabolic signaling pathway, for instance.
The approaches under way in systems biology today fall into two general categories – top-down (starting with a disease state) and bottom-up (beginning with genes and proteins). But whichever approach is used, there seems to be general agreement among the players in this field that the study of systems biology will require both hypothesis-driven research and discovery science. It integrates biology, computation and technology in an entirely new way. Ideally, systems biology involves an interaction between an experiment and a simulation, or a predictive model. The model – whether it be of a cell, an organ, a biochemical process or an entire organism – is used to design experiments that will test its assumptions. Once the results are known, they’re fed back into the model, which is adjusted accordingly. New simulations are run, and their assumptions are put to the test. It’s an iterative process – and key to the approach taken by many systems biology researchers.
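For readers who think in code, the experiment-and-simulation loop described above can be sketched in a few lines of Python. This is a toy illustration only, not any group's actual method: the "biological system" is a made-up linear dose response with a hidden slope, the model is a single guessed parameter, and the names `run_experiment`, `predict` and `refine_model` are invented for this sketch.

```python
import random

TRUE_SLOPE = 2.5  # hidden property of the toy "biological system" (assumption)


def run_experiment(dose, rng):
    """Stand-in for a laboratory measurement: a noisy response to a dose."""
    return TRUE_SLOPE * dose + rng.gauss(0.0, 0.05)


def predict(slope, dose):
    """The model's simulated response for a given dose."""
    return slope * dose


def refine_model(slope, dose, observed, rate=0.5):
    """Feed the result back: move the parameter partway toward the observation."""
    error = observed - predict(slope, dose)
    return slope + rate * error / dose


def iterate(initial_slope, doses, seed=0):
    """Alternate experiment and model adjustment, recording each estimate."""
    rng = random.Random(seed)
    slope = initial_slope
    history = [slope]
    for dose in doses:
        observed = run_experiment(dose, rng)
        slope = refine_model(slope, dose, observed)
        history.append(slope)
    return history


history = iterate(initial_slope=1.0, doses=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print("successive model estimates:", [round(s, 3) for s in history])
```

Each pass through the loop roughly halves the gap between the model's parameter and the hidden value, which is the point of the exercise: the model improves only because it keeps being confronted with new measurements.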

Biotech visionary Leroy Hood made his mark in discovery-based science: He spent much of his scientific career developing and bringing into universal practice the leading reductionist tools – sequencers and synthesizers for DNA and proteins – and was a prime mover of the Human Genome Project. But producing databases jam-packed with data on DNA and protein sequences was just the beginning, a means to generate the building blocks for what’s to come. Without all this information – and Hood firmly believes that biology is information science – systems biology would not be possible.
He’s its strongest proponent, too, and co-founded the not-for-profit Institute for Systems Biology (ISB) in early 2000 to pioneer systems approaches to biology and medicine. (For background on Hood and the ISB, see the Signals article, “Systems Biology In The Post-Genomics Era.”) The ISB has already established the resources necessary for this massive undertaking – including facilities to generate large-scale data for genes, messenger RNA and proteins, a cross-disciplinary cadre of scientists, technologists to invent the next generation of instruments and tools that will be required to analyze biological information, and computer scientists to develop the requisite bioinformatics capabilities to test and model biological systems.
Three years after founding Seattle-based ISB, Hood believes that “the most important advance [in the field] has been to really articulate exactly what systems biology will entail.” And that’s “the effective integration of global sets of data of very different types.” It will require the development of “better global technology for making measurements and new computational tools for generating and integrating [the data].”
In a capsule, systems biology is hypothesis-driven, global, quantitative, integrative and iterative, Hood explained. It’s a sweeping vision, but not everyone understands it. “Some people confuse systems biology with discovery science,” he said. “Systems biology uses discovery tools, but they are only a part of what systems biology is.”

Indeed, it turns out that there are just about as many definitions of systems biology as there are practitioners. And not all of them think it’s a brand new discipline. “I’ve been doing systems biology for 20 years,” explained H. Steven Wiley, the director of the Biomolecular Systems Initiative at the Pacific Northwest National Laboratory (PNNL) in Richland, WA. Wiley’s research has focused on understanding mechanisms of cell communication and signaling using the epidermal growth factor receptor system as a model. He’s combined the techniques of molecular and cellular biology with biochemical and optical assays in his work, and used the results to build computer models of the underlying cellular processes. Wiley brought that perspective from the University of Utah to PNNL, where he’s building a multidisciplinary program to understand complex biological systems from a systems perspective.
“Twenty years ago, a scientist would add a drug [to an experimental system], see an effect, make an hypothesis, then test the hypothesis,” he said. That researcher was dealing with a system, which might consist of 10 parts. “You can’t study all 10 at once, so you try one, but then you find out there are actually 10 parts in that one part. This approach drives reductionism. You focus down until you finally get to the molecular level. But how do you relate that to the original problem?”
That’s where systems biology comes to the rescue, by building computer models of the individual parts – and of their parts, as well. “Your models become more complex, but you never stop focusing on the original problem. You don’t lose the forest for the trees,” Wiley said. However, even though “computer models allow you to handle the complexity,” the experimental technology has to be scalable to keep up with this complexity. And, with advances in microarrays and such, today a researcher “can work on 100 things at once.” With the advent of high-throughput technologies – which were developed in conjunction with the Human Genome Project – “We’re much better at doing computer models than we were. We have more understanding of how to model biological systems.”

The PNNL’s Biomolecular Systems Initiative is part of a larger effort mounted by the U.S. Department of Energy’s (DOE) Genomes To Life program, and, together with Oak Ridge National Laboratory, serves as a national center for systems biology experimentation. It’s a good thing, too, because Wiley said that the technical capabilities required for very large scale, extremely high-throughput research like this “don’t exist at universities,” and due to business constraints and Wall Street pressures, “companies can’t look at long-range problems. That’s the problem with systems biology – there’s no good place to do it.” (For similar reasons, Hood left his post at the University of Washington to form the ISB.)
However, Wiley wasn’t implying that companies don’t have a place in the grand scheme. “Companies have a desire to develop quantifiable, reproducible, reliable assays that they can use” to generate data that can automatically be put into computer databases. It’s all part of quantifying biology, which until now has been a largely descriptive science. “We are really at the beginning of biology as a hard quantitative science,” he said. And it’s about to take off.
Indeed, there are any number of systems biology companies already hard at work on this problem – whether they’re starting from the top and working their way down or the other way around.

Target Discovery Inc., for instance, calls itself a discovery biology company, and is developing enabling technologies in expressional proteomics, interactional proteomics and metabolomics – technologies that are intended to vastly improve the quality of data needed for truly quantitative biology. Indeed, that’s the Palo Alto, CA-based company’s forte: It’s devised a computationally based approach to the use of systems biology in drug discovery – and mathematical modeling of biological systems via artificial intelligence lies at the core.
The company’s CEO, Jeffrey Peterson, as well as its CSO, Luke Schneider, are chemical engineers by training – and as such they bring a distinctly different view to the world of experimental biology. In essence, both think that the mathematical, in silico modeling of unit operations now commonly employed in a wide variety of industries can be applied in a biological setting, as well. “We can generate testable hypotheses, create a model that makes a prediction and then validate it experimentally,” Schneider said. “We’re bringing a new discipline to biology, but it’s an age-old discipline in chemistry, physics and engineering. The tools have been around for a long time.”
“I like to call it mathematical biology,” he continued. “Systems biology is really the hypothesis-generating tool. The math directs what you need to do, and the ‘omics’ technologies provide the empirical means” to support the hypothesis. “It’s already been demonstrated that hypothesis-free experimentation doesn’t work. You can’t do one without the other.”
As we all know, discovery science (the “omics”) has already generated mountains of data – and continues to do so. It’s already far more than the human mind can comprehend. And, as the number of variables and contents of the databases continue to grow, current bioinformatics approaches won’t be able to handle it either. That’s why Target Discovery uses artificial intelligence (AI) for its mathematical models of underlying biochemical processes in human systems: “AI systems have demonstrated the ability to arrive at solutions more complex than a human would design, but which are more robust with unanticipated strengths,” Peterson explained.
“AI techniques winnow the universe of possible models down to a handful that best match the experimental data. From this point, you can play with them in silico until you get a bifurcation in predicted outcomes,” where half the models indicate one course of action while the other half point a different direction, Schneider said. “The bifurcation is a conflict between models that can be experimentally tested and resolved in the laboratory. Once you’re down to the model that best correlates to all relevant data, you have a predictive tool,” which can then be used in target selection, lead selection and optimization, and clinical optimization during drug discovery and development. “Playing with a mathematical model facilitates selection of the optimal target in the biological context of a disease pathway. You can illuminate the potential surrogate markers, predict tox profiles, and so forth,” he continued. “This approach will be the key to the efficiency and speed breakthroughs so badly needed in drug development today.”
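The winnow-then-bifurcate idea Schneider describes can be illustrated with a deliberately simple Python sketch. It is not Target Discovery's AI system: ordinary least-squares ranking stands in for the AI step, and the candidate models, the data points and the names `winnow` and `find_bifurcation` are all hypothetical.

```python
# Hypothetical candidate models: each maps a drug dose to a predicted marker level.
CANDIDATES = {
    "linear_up": lambda d: 1.0 + 0.5 * d,
    "saturating": lambda d: 1.0 + 2.0 * d / (1.0 + d),
    "bell": lambda d: 1.0 + 1.25 * d - 0.35 * d * d,
    "flat": lambda d: 1.5,
    "linear_down": lambda d: 2.0 - 0.5 * d,
}

# Made-up observations: (dose, measured marker level), all at low doses.
DATA = [(0.0, 1.0), (0.5, 1.5), (1.0, 1.9)]


def sum_squared_error(model, data):
    """How badly a model misses the existing measurements."""
    return sum((model(dose) - observed) ** 2 for dose, observed in data)


def winnow(candidates, data, keep):
    """Keep the names of the few models that best match the data."""
    ranked = sorted(candidates, key=lambda name: sum_squared_error(candidates[name], data))
    return ranked[:keep]


def direction(model, reference_dose, dose):
    """+1 if the model predicts a rise relative to the reference dose, -1 for a fall."""
    change = model(dose) - model(reference_dose)
    return (change > 0) - (change < 0)


def find_bifurcation(candidates, names, reference_dose, test_doses):
    """Return the first untested dose at which the surviving models disagree."""
    for dose in test_doses:
        signs = {direction(candidates[n], reference_dose, dose) for n in names}
        if 1 in signs and -1 in signs:
            return dose
    return None


survivors = winnow(CANDIDATES, DATA, keep=2)
split_dose = find_bifurcation(CANDIDATES, survivors, reference_dose=1.0,
                              test_doses=[2.0, 3.0, 4.0])
print("surviving models:", survivors)
print("dose worth testing in the lab:", split_dose)
```

In this toy case two models fit the low-dose data almost equally well, yet one predicts the marker will keep rising at a higher dose while the other predicts it will fall. That disagreement marks the experiment worth running.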

Entelos Inc. also uses an in silico approach to systems biology – but in this case, the firm develops large-scale models of human disease (dubbed PhysioLabs) and simulates experiments on the computer. Each PhysioLab is specific to a disease or therapeutic area, and to build them the Foster City, CA company’s scientists and engineers glean information from thousands of peer-reviewed journal articles as well as the firm’s scientific advisors and development partners. The data – genomic, proteomic, physiologic and environmental – are then integrated into the model in the context of a disease. Importantly, the models are validated by their ability to reproduce results that are generated experimentally, after which researchers can begin to simulate experiments aimed at identifying pathways, genes, targets, drug candidates and so forth.
Entelos has already developed PhysioLabs for diabetes, obesity, adipocytes and asthma, and is working on one for rheumatoid arthritis. “The technology is the same for all platforms,” explained CSO Tom Paterson, and each is being developed in collaboration with pharmaceutical partners. “This provides us with a means to focus development in a way that provides value to the pharma, plus gives us an additional doorway to scientific advisors that may have relationships with our partners.”
The company’s modeling employs a top-down approach, first identifying the systems of the body that are relevant to disease and then working down to the tissues, cells, proteins and genes. Genomics and proteomics researchers, who work from the bottom up, have “industrialized empirical biology and created a vast amount of data quickly,” he said. And that’s great. However, “people have been dredging through these databases for a long time and haven’t gained a lot of insight. But the data do give you a lot of clues.” When you put those clues together, you can make the transition to hypothesis-driven research, Paterson said. And that’s what Entelos does.
[Figure: A screen shot of the Diabetes PhysioLab. Courtesy of Entelos.]
“The most valuable clue to understanding a disease is the clinical manifestation of that disease, and our hypotheses encompass the entire disease state.” For even in a particular disease (say rheumatoid arthritis) “not everyone has the same disease. It’s really a collection of syndromes with different manifestations and responses to treatment. The fact that we have collections of individuals who are similar but not identical gives us a rich set of clues that tell us something about the system that manifests the disease state,” he explained. “Now we use the clues to reverse engineer… We map out the different physiologic systems, tissues and cells that participate in the disease state,” an undertaking that requires close collaboration with specialists in that particular disease. For comparison, Entelos also models the normal physiology of those very same tissues and cells – a model it can then perturb at will to create the disease state based on known and hypothesized causes for the disease. “We have to get our virtual patients to behave the same way the actual patients do,” Paterson said.
Ultimately, though, Entelos intends to use its capabilities not to “simply represent a single hypothesis of a disease state but also to map out the knowledge gaps and formulate multiple hypotheses of what might be going on,” he added. “It’s what you don’t know that surprises you.”

While Entelos uses virtual patients in its approach to systems biology, Beyond Genomics Inc. uses real ones. The Waltham, MA firm measures peptides and proteins from clinical samples and then integrates the information with metabolomic and genomic data using bioinformatics tools and statistical methods. Put in the context of biological pathways and disease mechanisms, the company claims its approach is able to identify specific targets for drug development and markers for disease diagnosis.
“If we can measure genomic, proteomic and metabolomic data simultaneously from tissue and/or body fluids, and then compare a healthy state with a diseased one and look at the differences and understand what has brought on the disease, we can treat it,” explained Stephen Naylor, Beyond Genomics’ CTO.
How does the company’s system work? Citing its December 2002 deal with GlaxoSmithKline plc (GSK) as an example, Naylor said that “GSK provides us with plasma samples from a cohort of age- and sex-matched controls as well as a cohort of patients with metabolic syndrome X [a set of heart disease risk factors that tend to cluster in some people]. Say there are 20 controls and 20 patients. Each of the 40 samples is analyzed individually; they are not pooled. The differences between samples contain important biological information which we don’t want to lose. Then we prep the samples to isolate the proteins and small molecules (including lipids) and analyze each sample with mass spectrometry. This gives us a specific molecular weight for each protein and lipid present in each sample – and 40 data files for the proteins and 40 data files for the lipids,” he explained.

Then, these data are processed by merging the protein and lipid data files for each sample into a single file. Those 40 files are subsequently run through an “alignment and normalization” process and a “principal component” analysis to ascertain the differences in the protein and lipid constituents among those 40 samples. A subsequent clustering plot might demonstrate that “maybe 15 or 16 of the patients would be relatively tightly clustered, with some outliers, and typically 10 to 12 of the controls would be tightly clustered, with some outliers,” Naylor explained. The company can also identify the individual protein and lipid components via tandem mass spectrometry. “In the final step we use nonlinear kernel PCA [algorithms for pattern reconstructions]” to analyze the differences between patients and controls. “These are the first elements of networks. It’s not a pathway per se but we’re moving towards that.”
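To make the merge-and-analyze step concrete, here is a minimal Python sketch of an ordinary (linear) principal component analysis on merged per-sample profiles. It is not Beyond Genomics' pipeline: the intensities are invented, only six samples are used instead of 40, the first component is found by simple power iteration, and the nonlinear kernel PCA Naylor mentions is not attempted. The function names are assumptions made for this illustration.

```python
import math


def merge_sample(protein_intensities, lipid_intensities):
    """Merge one sample's protein and lipid measurements into a single profile."""
    return list(protein_intensities) + list(lipid_intensities)


def mean_center(rows):
    """Subtract each feature's mean so PCA describes variation between samples."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of the mean-centered features."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols] for ci in cols]


def first_component(cov, iterations=200):
    """Approximate the leading principal component by power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """Score each sample along the component."""
    return [sum(x * c for x, c in zip(row, component)) for row in centered]


# Invented intensities: (proteins, lipids) for three controls and three patients.
controls = [([10.0, 5.0], [3.0, 8.0]), ([10.2, 5.1], [3.1, 7.9]), ([9.8, 4.9], [2.9, 8.1])]
patients = [([12.0, 5.0], [5.0, 8.0]), ([12.3, 5.1], [5.2, 7.9]), ([11.8, 4.9], [4.9, 8.1])]

profiles = [merge_sample(p, l) for p, l in controls + patients]
centered = mean_center(profiles)
scores = project(centered, first_component(covariance(centered)))
control_scores, patient_scores = scores[:3], scores[3:]
print("control scores:", [round(s, 2) for s in control_scores])
print("patient scores:", [round(s, 2) for s in patient_scores])
```

With these made-up numbers the two groups fall on opposite sides of zero along the first component, the one-dimensional analogue of the clustering plot described above. Real data would of course include the outliers Naylor mentions.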
Of course, Beyond Genomics’ system can incorporate genomic and metabolomic data, as well. “We can acquire information across all three ‘omics’ and integrate the data,” Naylor added. The firm’s pharma partners can then compare the results with publicly available information. “We make certain that our bioinformatics can not only take in our own data but can also bring in [external] data and map that. It goes both ways.” And that capability – to be able to incorporate information from multiple sources – is critical to systems biology’s eventual success.

But not every systems biology company relies so heavily on computer modeling and bioinformatics. BioSeek Inc., for instance, starts with primary human cells to model diseases and identify targets and mechanisms of action of drug candidates. “Primary cells are closer to humans than cell lines or mice or yeast,” explained president and co-founder Rolf Ehrhardt. As well, primary cells have an intact regulatory framework, which apparently is lost in cultured cell lines. “Cells also have the ability to integrate or process a lot of information. When we challenge them with a drug, the cells process the information for us. The data are generated in a much more biological context” than they would be if the Burlingame, CA firm used computer simulations.
However, being primary cells, they don’t last forever, so BioSeek has developed ways to ensure the reproducibility of its experiments. “Each cell type has a limited passage number,” explained Ellen Berg, the company’s CSO and co-founder. “We keep banks of cells that have the desired characteristics.” Moreover, since BioSeek’s assays “reflect almost all the pathways inside a cell, we know right away if a cell has mutated.” Plus, the company phenotypes everything with a clinical match. “We’ve put a lot of effort into validating and benchmarking these systems,” Ehrhardt added.
“This is a systems approach,” Berg said. “We do global analysis on every protein and every component of the cell.” To accomplish this, company researchers use different types of primary human cells – including endothelial and epithelial cells and lymphocytes – and manipulate them to mirror various diseases either by switching genes on and off or by adding compounds such as cytokines or chemokines. Moreover, different cell types can be mixed together to study how they interact with each other. All this is done in the bottom of a well on a standard microtiter plate. BioSeek focuses on protein readouts (rather than mRNA, for instance) “because proteins give us a large amount of information,” she said. “We get 7-30 readouts per system.”

Researchers might mix all three cell types together, for example, and add cytokines to simulate inflammatory disease, thus turning on the appropriate pathways in the cells. After the cells have processed the information “we read out the key inflammatory proteins at the end of the pathway,” Berg explained. “We can also challenge the cells with a drug (or a gene), and because drugs (and genes) can be clustered according to function, we can distinguish any given drug (or gene) with this system.” In fact, Berg said, “We’ve tested and mapped over 200 well-characterized and/or approved drugs in inflammation. We’ve also taken known genes and under- or over-expressed them. They act as expected from [information published in] the literature.”
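One simple way to picture how drugs might be grouped by function from their protein readouts is to compare readout profiles by correlation, as in the Python sketch below. It illustrates the general idea only and is not BioSeek's analysis: the seven-readout profiles, the drug names, the class labels and the function names are all invented.

```python
import math

# Invented reference profiles: name -> (functional class, seven protein readouts).
REFERENCE_PROFILES = {
    "reference_drug_A": ("class_1", [0.2, 0.3, 1.0, 0.9, 0.2, 1.1, 0.3]),
    "reference_drug_B": ("class_1", [0.3, 0.2, 0.9, 1.0, 0.3, 1.0, 0.2]),
    "reference_drug_C": ("class_2", [1.0, 0.9, 0.3, 0.2, 1.0, 0.3, 0.9]),
    "reference_drug_D": ("class_2", [0.9, 1.1, 0.2, 0.3, 0.9, 0.2, 1.0]),
}


def pearson(x, y):
    """Pearson correlation between two equal-length readout profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify(profile, references):
    """Return the most similar reference drug and its functional class."""
    best = max(references, key=lambda name: pearson(profile, references[name][1]))
    return best, references[best][0]


# Readouts for a hypothetical uncharacterized compound.
unknown_profile = [0.95, 1.0, 0.25, 0.3, 1.05, 0.25, 0.9]
nearest_drug, predicted_class = classify(unknown_profile, REFERENCE_PROFILES)
print("closest reference:", nearest_drug, "| predicted class:", predicted_class)
```

The uncharacterized compound's readouts rise and fall with those of the second class and against those of the first, so it is assigned to the second class.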
As everyone knows, big pharma companies are struggling to come up with new compounds, but they aren’t having much luck. According to Ehrhardt, that’s because “they are looking with old [drug discovery] systems.” Technologies like BioSeek’s could change all that.
“Everybody needs better biological methods to find the functions of [newly discovered] targets so they can be prioritized,” explained Stanford University professor Eugene Butcher, chairman of BioSeek’s scientific advisory board and a co-founder. “The only way we can hope to figure out how human cells respond in general is to measure their responses in many different environments” and with high-throughput technology. Once the data are generated “we can reconstruct a roadmap to study the functional relationships of genes and proteins in many cellular environments.”

Like BioSeek, Odyssey Thera Inc. studies complex protein pathways in the context of living human cells, but there the similarity ends. The San Ramon, CA firm combines human cell lines appropriate to specific disease models with a rapid way of mapping proteins directly to their site of action. Although Odyssey Thera does focus on proteins, “We distance ourselves from proteomics,” explained CEO Marnie MacDonald. Instead, the firm measures protein-protein interactions, i.e. signal transduction, in real time to discover how cytokines and hormones act as signals, for instance, or the mechanisms by which a drug will interrupt a particular pathway.
These studies are made possible by a fluorescence-based assay – the protein fragment complementation assay (PCA) – that works like this: A cell line is stably or transiently transfected with two genes of interest that code for two proteins of interest, proteins that normally bind with each other. Scientists also take a reporter protein (such as an enzyme that generates a fluorescent signal upon conversion of substrate to product) and dissect it into two pieces, which are fused to the two different genes of interest. When the proteins produced by these genes subsequently bind to one another, the reporter fragments reassemble, regenerating the enzymatic activity and giving off a fluorescent signal. If one blocks a pathway – say, with a drug – then there’s a decrease in the signal. The key to this approach, MacDonald said, is the fact that “there is no background [i.e., non-specific binding]. You get no signal unless the fusion proteins come together.”

[Figure: High Throughput Screening of Drug Effects in Living Cells. The compound, which blocks a key signaling pathway in cancer cells, prevents membrane localization of a kinase-substrate complex as detected by PCA (green signal). Cell nuclei are shown in blue. Images were collected by automated microscopy in 96-well or 384-well plates. Automated image analysis allows quantitation of the membrane:cytoplasmic fluorescence. Courtesy Odyssey Thera.]
The human cell lines, which must be “well behaved” and amenable to thorough characterization, are chosen based on the type of drug that company scientists wish to discover: For instance, “we use cancer cells for cancer drug discovery,” she said. “We screen for drugs directly in the cell” using two kinds of screens – high-content and high-throughput. As a result, Odyssey Thera is able to construct detailed maps of the biochemical pathways in these cells.
“Assays that measure molecular events inside the cell have been lacking,” MacDonald said. “Once we have them, we can apply them in a high-throughput manner to map the entire signaling networks involved in cancer” as well as other diseases.

Obviously, having a system like that – or any of the others described in this article – up and running full bore will provide a significant opportunity to streamline the drug discovery process. But when might systems biology actually start to make an impact? Not soon, according to ISB’s Hood. Although some progress has already been made in model organisms, “most spectacularly work on sea urchin development done by Caltech’s Eric Davidson… getting into higher organisms is really difficult and will go much more slowly.”
According to Hood, the short-term challenges for systems biology fall into two categories. First, researchers need to develop better technology to make global measurements, and for this “microfluidics and nanotechnology will be the source.” Second, there’s a need to develop “computational and mathematical tools for storing, capturing, analyzing, integrating, modeling and defining the elements of systems,” he said.
Some of the innovation necessary to achieve these goals is bound to come from new companies created by Accelerator Corp., a collaborative partnership formed among the ISB, VC firms MPM Capital, ARCH Venture Partners and Versant Ventures, as well as Alexandria Real Estate Equities Inc. Founded in May 2003, and backed by $15 million in committed capital, Accelerator will identify, finance and develop leading-edge ideas and technologies in systems biology – whether they originate from within ISB itself or elsewhere.
The idea was an instant hit. According to Carl Weissman, Accelerator’s president and CEO (and a venture partner at MPM Capital), “We’ve gotten a flood of responses, including about 70 cold calls” just in the first few weeks. Many of those are coming from academics who want to start a company, he added, but “existing companies that want access to cash” are also interested. The latter, though, are “not appropriate. We are looking for cutting-edge technology and ideas that require some level of development before an A [financing] round.” Accelerator intends to invest in six to eight companies over the next three years, Weissman said, and will be extremely selective about those it chooses.
And while those yet-to-be companies go about developing new technologies to apply to systems biology, research programs currently underway at already-established firms, academic and government labs and non-profit institutions will continue to contribute to a growing knowledge base that will help define the direction that systems biology will take in the future.
For all that, “Systems biology really hasn’t done anything yet,” said PNNL’s Wiley. “For the first time we can actually see our way clear to how it is to be done. The time is right to take what we’ve learned and do it.”