Overview
Research in the Computational Life Sciences and Informatics Core (CLSIC) focuses on understanding (and predicting) life using
a "Systems Biology" approach. Systems Biology aims at a system-level understanding of biological systems: how the individual
parts that make up the whole are connected to one another and work together. The ultimate goal of Systems Biology is to
develop in-silico biosystems. As a complex discipline, Systems Biology draws on data from all biological fields, including
genetics, biochemistry, structural biology, cell biology, physiology, and biophysics. Through the use of mathematical models,
it can establish the regulatory and communication pathways and the relationships among components across the hierarchy from
DNA to individual organisms.
The formation of the Bindley Bioscience Center's CLSIC is a cornerstone in establishing the capacity to deploy such systems approaches
for integrating molecular information, enabling us to understand, and ultimately to predict, the behavior of
living systems. One key component for success in emerging Systems Biology approaches is the capacity to link informatics systems
with core research expertise and infrastructure in computational sciences and technology. To develop this capacity, the CLSIC
is engaging in cooperative partnerships with specific academic units and/or other Centers. Current research in the CLSIC is
focused on:
- Automated data management systems to organize information flow in projects and to provide a common place for
scientists to integrate scientific information.
- Development of informatics tools for assays designed to study particular cellular regulatory events
and diseases. These tools should identify statistically significant changes in biological processes and
also facilitate the development and testing of effective markers for disease progression and/or therapeutic
efficacy.
- Mathematical and statistical models to simulate living biological systems. These models should move Systems Biology
from the stage of integrative omics to that of predictive omics.
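As a rough illustration of what statistically identifying significant changes can involve, here is a minimal sketch, assuming each measured feature (for example, a protein abundance) has replicate values under a control and a treated condition. It applies a two-sided permutation test to every feature and then controls the false discovery rate with the Benjamini-Hochberg procedure. The choice of tests, the function names, and the dict-based input layout are assumptions made for illustration; the text does not say which statistics the CLSIC tools actually use.

```python
import random
from statistics import mean


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)          # fixed seed for reproducible p-values
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # random relabelling of the samples
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def flag_significant(control, treated, alpha=0.05):
    """Return {feature: adjusted p-value} for features passing the FDR cutoff.

    control, treated: hypothetical dicts mapping feature name -> list of measurements.
    """
    features = sorted(control)
    pvals = [permutation_pvalue(control[f], treated[f]) for f in features]
    qvals = benjamini_hochberg(pvals)
    return {f: q for f, q in zip(features, qvals) if q <= alpha}


if __name__ == "__main__":
    # Toy replicate measurements (invented for illustration only).
    control = {"protein_A": [1, 2, 3, 4, 5], "protein_B": [5, 5, 6, 6, 7]}
    treated = {"protein_A": [10, 11, 12, 13, 14], "protein_B": [5, 6, 5, 7, 6]}
    print(flag_significant(control, treated))  # only protein_A is reported
```

A permutation test is used here because it makes no normality assumption and needs only the standard library; a t-test or a moderated statistic would be an equally reasonable choice in a real pipeline.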
Infrastructure
To advance campus-wide life sciences research efficiently and effectively, investigators at the Bindley Bioscience Center
are building a Computational Life Sciences Data System (CLSDS) with the following components:
- The system is based on an e-notebook concept, whereby scientists can rely entirely on electronic records during
any scientific activity. The e-notebook comprises two workflows: the Experimental Workflow (EWF) and
the Data Mining Workflow (DMWF). The EWF covers all laboratory activity related to sample analysis, starting
at sample reception and lasting through the end of an experiment. The DMWF covers all data analysis activities, whether
performed during or after an experiment.
- The CLSDS uses LS*LIMS, a lab management and automation system for high-throughput life sciences developed by Applied
Biosystems, to track all experimental activities. The software includes a workflow engine that fully supports automation
of laboratory processes and data flow. It also provides version control for protocols and process
templates. Using the process viewer, the status of process runs can be viewed in real time and drilled down to view
the status or results of individual containers and samples. With LS*LIMS, samples can be tracked by container, location, and
other categories. Scientists can also annotate the original sample with all relevant information, regardless of the
sample type. The software maintains the genealogy of every sample and offers full support for barcode tracking.
LS*LIMS will be customized to fit researchers’ needs, including the creation of workflows and optimization
of the data flow.
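The sample-tracking ideas above (barcodes, containers, locations, annotations attached regardless of sample type, and a genealogy for every sample) can be sketched as a small in-memory registry. This is not the LS*LIMS API, which the text does not describe; the `Sample` and `SampleRegistry` classes, their methods, and the barcodes in the usage lines are hypothetical and serve only to show how a parent link per sample is enough to reconstruct a full lineage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sample:
    barcode: str
    sample_type: str
    container: str
    location: str
    parent: Optional[str] = None  # barcode of the sample this one was derived from
    annotations: Dict[str, str] = field(default_factory=dict)


class SampleRegistry:
    """Hypothetical in-memory stand-in for LIMS-style sample tracking."""

    def __init__(self) -> None:
        self._samples: Dict[str, Sample] = {}

    def _add(self, sample: Sample) -> Sample:
        # Barcodes must be unique, since they are the tracking key.
        if sample.barcode in self._samples:
            raise ValueError(f"duplicate barcode: {sample.barcode}")
        self._samples[sample.barcode] = sample
        return sample

    def receive(self, barcode: str, sample_type: str, container: str, location: str) -> Sample:
        """Register an original sample at reception (the start of the EWF)."""
        return self._add(Sample(barcode, sample_type, container, location))

    def derive(self, parent: str, barcode: str, sample_type: str,
               container: str, location: str) -> Sample:
        """Register a sample derived from an existing one (e.g. a digest or fraction)."""
        if parent not in self._samples:
            raise KeyError(f"unknown parent barcode: {parent}")
        return self._add(Sample(barcode, sample_type, container, location, parent=parent))

    def annotate(self, barcode: str, **info: str) -> None:
        """Attach free-form annotations to a sample, whatever its type."""
        self._samples[barcode].annotations.update(info)

    def genealogy(self, barcode: str) -> List[str]:
        """Barcodes from this sample back to its original source sample."""
        lineage: List[str] = []
        current: Optional[str] = barcode
        while current is not None:
            lineage.append(current)
            current = self._samples[current].parent
        return lineage

    def at_location(self, location: str) -> List[str]:
        """Barcodes of all samples currently stored at the given location."""
        return [s.barcode for s in self._samples.values() if s.location == location]


if __name__ == "__main__":
    reg = SampleRegistry()
    reg.receive("S001", "serum", "tube-17", "freezer-2")
    reg.derive("S001", "S001-D1", "tryptic digest", "plate-4:A1", "bench-3")
    reg.derive("S001-D1", "S001-D1-F3", "LC fraction", "plate-9:C3", "bench-3")
    reg.annotate("S001-D1-F3", instrument="LC-MS")
    print(reg.genealogy("S001-D1-F3"))  # ['S001-D1-F3', 'S001-D1', 'S001']
```

Storing only a parent barcode per sample keeps each record small; a production LIMS would also need to handle pooled samples with several parents, which this sketch does not.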
Supporting the information flow in the life sciences requires hardware to hold the information. Figure
1 depicts the hardware architecture of informatics for life sciences at the Bindley Bioscience Center. All
experimental activities, including LC-MS, are captured by LS*LIMS. One server in the data farm is dedicated to the LS*LIMS system;
other servers are used for data mining. Users access the data farm through a Web service. Based on each individual’s priority,
the data farm's security system allows the user to query only certain data sets. Processing results are captured
by the system, which also tracks every data transaction.
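The access rule described above, in which a user's priority determines which data sets they may query, results are captured, and every transaction is recorded, can be sketched as follows. The numeric priority levels, the `DataFarm` and `Transaction` classes, and their methods are assumptions made for illustration; the text does not specify how the data farm's security system is actually implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Transaction:
    timestamp: datetime
    user: str
    dataset: str
    action: str
    allowed: bool


class DataFarm:
    """Hypothetical sketch: priority-gated queries plus a transaction log."""

    def __init__(self) -> None:
        self._datasets: Dict[str, Tuple[int, List[dict]]] = {}  # name -> (required level, records)
        self._users: Dict[str, int] = {}                          # user -> priority level
        self.log: List[Transaction] = []

    def add_user(self, user: str, level: int) -> None:
        self._users[user] = level

    def add_dataset(self, name: str, required_level: int, records: List[dict]) -> None:
        self._datasets[name] = (required_level, list(records))

    def _record(self, user: str, dataset: str, action: str, allowed: bool) -> None:
        self.log.append(Transaction(datetime.now(timezone.utc), user, dataset, action, allowed))

    def query(self, user: str, dataset: str,
              predicate: Callable[[dict], bool] = lambda r: True) -> List[dict]:
        """Return matching records if the user's priority meets the data set's requirement."""
        required, records = self._datasets[dataset]
        allowed = self._users.get(user, 0) >= required  # unknown users get level 0
        self._record(user, dataset, "query", allowed)    # denied attempts are logged too
        if not allowed:
            raise PermissionError(f"{user} may not query {dataset}")
        return [r for r in records if predicate(r)]

    def store_result(self, user: str, name: str, records: List[dict]) -> None:
        """Capture processing results as a new data set at the user's own level."""
        self.add_dataset(name, self._users[user], records)
        self._record(user, name, "store", True)


if __name__ == "__main__":
    farm = DataFarm()
    farm.add_user("alice", 2)
    farm.add_user("bob", 1)
    # Toy LC-MS peak list (invented values, for illustration only).
    farm.add_dataset("lcms_run_01", 2, [{"mz": 445.12, "intensity": 120000},
                                        {"mz": 512.30, "intensity": 34000}])
    hits = farm.query("alice", "lcms_run_01", lambda r: r["intensity"] > 50000)
    farm.store_result("alice", "lcms_run_01_filtered", hits)
    try:
        farm.query("bob", "lcms_run_01")
    except PermissionError as err:
        print(err)
    for t in farm.log:
        print(t.user, t.dataset, t.action, t.allowed)
```

Logging the attempt before the permission check raises means refused queries leave an audit trail as well, which matches the requirement that every data transaction be tracked.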
Resources
The Computational Life Sciences & Informatics Core (CLSIC) shares a number of other campus-wide resources. CLSIC
interacts closely with its sister Discovery Park center, the e-Enterprise Center (e-Center). The e-Center engages
in research on the goal-oriented application of digital technology to business, government, and societal problems. The e-Center
also provides infrastructure and an integrated environment for a number of goal-oriented, cyber/computational activities on
the Purdue campus, with a special focus on areas where Purdue has, or can develop, national leadership. The e-Center brings
together faculty and students with strengths in modeling, simulation, optimization, database systems, software engineering,
information security, communication, management, algorithm engineering, operations research, production systems, decision theory,
system analysis, risk management, marketing, and customer service.
Information Technology at Purdue, known as ITaP, provides all information technology and maintenance services for
the CLSIC. ITaP was established in 2001 to integrate the information technology infrastructure and programs within
the University for learning, discovery, and engagement. High-performance and data-intensive computing resources
available to researchers include an IBM SP supercomputer, PC clusters, distributed computing solutions, and mass
data storage. The clusters and SP provide an aggregate capability of 1.75 teraflops. ITaP is currently adding
resources that will increase the aggregate capability to 3.9 teraflops. In addition, the Purdue SP is linked by
way of the I-Light high-speed optical fiber connection (currently 1 Gbps) to the IBM SP supercomputer at Indiana
University, providing an additional 1+ teraflops of distributed terascale computing capability.
CLSIC collaborates on many scientific projects with the Indiana Center for Database Systems (ICDS) and the Department of Computer Sciences.
The ICDS and the Department of Computer Sciences provide high-quality computing facilities for use by computer
science faculty, students, and administrative personnel. The facilities are operated by a technical staff that is
responsible for installing and maintaining the systems and also assists faculty and students in developing
software systems for research projects. The staff includes a director, a facilities manager, an administrative assistant, one
network engineer, one hardware engineer, six system administrators, and several student assistants.
ITaP Research Computing Overview
Research Computing provides a variety of computing resources in support of Purdue faculty, staff, and students. The group supports
data-intensive and numerically intensive research applications on high-performance computing systems such as Purdue's IBM SP and the DXUL
archival storage system. It also provides training and technical support in areas including scientific programming, program
design, optimization, and parallel programming.
e-Enterprise Center Overview
The e-Enterprise Center (e-Center) engages in research in the goal-oriented application of digital technology to business, government
and societal problems. It also provides infrastructure and an integrated environment for a number of goal-oriented, cyber/computational
activities on the Purdue campus, with a special focus on areas where Purdue has, or can develop, national leadership.
Envision Center Overview
Advances in computing platforms and instrumentation techniques have resulted in exponential growth of data. Efficient interpretation
of these data is fast emerging as a key challenge in science, engineering, and business. The human-computer interface has emerged
as a major information bottleneck: computer speeds increase, but human comprehension remains essentially constant.
Indiana Center for Database Systems (ICDS) Overview
The Indiana Center for Database Systems (ICDS) takes an interdisciplinary approach to solving practical problems in a wide variety
of database systems and their applications. Research activities and projects in the center include multimedia databases, data
mining, data streaming and sensors, database security and privacy, knowledge bases and web services.
Rosen Center Overview
The Rosen Center provides high-performance computing and storage.
Statistical Bioinformatics Center Overview
The cycle of theory, experiment, and information is nowhere more important than in the life sciences, where we are learning
how to piece together various levels of expertise into a global or systems-level understanding of biology. Statistical Bioinformatics
is involved at each level: accumulation, organization, and analysis of biological data. Hypotheses can be initiated, tested,
and refined, and new experiments can be formulated to supply further information.
Statistical Consulting Service (SCS) Overview
The Department of Statistics provides statistical software and design consulting services for the University community. Faculty,
students and staff are encouraged to use these services, which are offered free of charge.
Publications
Journal Publications
Xiang Zhang, Wade Hines, Jiri Adamec, John M. Asara, Stephen Naylor, and Fred E. Regnier. An automated method for the analysis
of stable isotope labeling data in proteomics. Submitted to the Journal of the American Society for Mass Spectrometry, 2004.
Xiang Zhang, Mourad Ouzzani, Vincent Jo Davisson, Fred E. Regnier, and Ahmed K. Elmagarmid. An Intelligent Data System for
Mass Spectrometry. Submitted to Computer Science and Engineering, 2004.
Xiang Zhang, Jiri Adamec, Stephen Naylor, and Fred Regnier. In-Silico Modeling of a Binary Elution Chromatography Approach
to Proteomic Mapping. Submitted to Analytical Chemistry, 2004.
Posters & Invited Talks
Xiang Zhang, Jiri Adamec, Stephen Naylor, Vincent J. Davisson, Fred E. Regnier. A binary elution chromatography approach
for high throughput proteomics, Indiana Proteomics Symposium, Bloomington, IN, Oct. 2004.
Xiang Zhang, Jiri Adamec, Vincent J. Davisson. Systems biology approach for drug discovery, Bio-nano Technology Symposium,
West Lafayette, IN, July 2004.
Xiang Zhang, Wade Hines, Jiri Adamec, John M. Asara, Stephen Naylor, Fred E. Regnier. An automated method for the analysis
of stable isotope labeling data in proteomics. 52nd ASMS Conference on Mass Spectrometry and Allied Topics, Nashville,
TN, May 2004.
Molecular Properties
The molecular properties that can now be rapidly obtained on a large scale include genome sequences, mRNA expression profiles,
protein-protein interactions, and metabolite fluxes. The rate at which the three-dimensional structures of macromolecules
and macromolecular assemblies can be determined has also increased sharply. Advances in fluorescence microscopy and single-molecule
techniques are steadily improving our ability to monitor the activity of molecules within individual living cells. Molecular
interactions can be treated as mechanical problems (using concepts such as force, elasticity, and tension) through techniques
such as atomic force microscopy. Technological advances continue to increase the rate of data generation; however, data
generation must be linked to analysis and understanding. With the appropriate tools, the key system properties of a living
cell, including its design principles, structures, dynamics, and control methods, can be modeled. Furthermore, our understanding
of these system properties can be codified in complex simulation models that can be experimentally tested and refined. Computational
Life Sciences can be divided into three major categories:
- Bioinformatics: The research, development, or application of computational tools and approaches for expanding
the use of biological, medical, behavioral, or health data, including those to acquire, store, organize, archive,
analyze, or visualize such data.
- Computational Biology: The development and application of data-analytical and theoretical methods, mathematical
modeling and computational simulation techniques to the study of biological, behavioral, and social systems.
- Systems Biology: The development of quantitative, mechanism-based models of the whole cell, collections of
cells, or large pieces of the cellular machinery, where the objective is an integrated picture that complements
the reductionist viewpoint of molecular biology.
As part of the implementation plan, a new Core Facility will be developed cooperatively in partnership with specific
academic units and/or other Centers. One key component for success in the emerging systems biology approaches is the capacity
to link informatics systems with core research expertise and infrastructure in computational sciences and technology.
The formation of the Computational Life Sciences and Informatics Core is a cornerstone in establishing the capacity
to deploy such systems approaches for integrating molecular information in order to understand, and to predict, the behavior
of living systems.