Institute for Systems Biology  CSB Home

Informatics Core



Overview:
From its inception, the Informatics Core has had three aims:
  1. Provide a computational infrastructure that enables scientists to work in the way they want, access the data they require and perform the analyses they need. To this end, the Core has combined the most applicable practices of industrial software development with the creativity of research-led innovation. An integrated analysis framework called ICUBED has been built, allowing complex analysis of semantically rich, heterogeneous information in support of high-throughput experiments.

  2. Foster a culture of cooperation between research groups. The Informatics Core supports the diverse research and development activities of the Center. The Core encourages commonality between the independent development teams by promoting best practices and maintaining software. Education on the importance of standards, and on the mechanisms for adopting them, has played a significant part in the Core's undertakings.

  3. Ensure the computational infrastructure is reliable, flexible and open. All of the software developed by the Core follows these basic ideals:

    • reliable architecture: built to a professional standard, designed and documented, and both robust and failsafe.
    • flexible architecture: can be used to rapidly develop and deploy new functionality, adapted to suit new scientific methods, and applied to a wide range of analyses and data-driven discoveries.
    • open architecture: designed to make interoperability with other systems easier through the use of common standards and multiple integration mechanisms. By contrast, a closed architecture requires other systems to be altered and thereby subsumed into a dominant monolithic architecture.

Resources:
We have provided snapshot releases for components of the Center software that may be of use to the local general community. If you need help using these components then please contact the

Imaging Services (Cecilia)
The general-purpose, high-throughput image repository is available both as source code, which is built using Apache Ant, and as a WAR file for deployment in Apache Tomcat. The software has been tested under Tomcat 5 only.

Synonym Service
The synonym service provides a means to map identifiers from one namespace to another. Complex mappings can be defined through an administration tool, and data can be loaded through a loading tool. The system can be built and installed using Apache Maven (version 2).
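
As an illustration of the kind of identifier mapping described above, the following is a minimal in-memory sketch. It is not the synonym service's actual interface: the class name SynonymMap, its methods, the namespace names and the identifiers are all hypothetical.

```python
from collections import defaultdict


class SynonymMap:
    """Toy in-memory stand-in for an identifier-mapping service (hypothetical API)."""

    def __init__(self):
        # (source namespace, identifier) -> {target namespace: set of identifiers}
        self._table = defaultdict(lambda: defaultdict(set))

    def load(self, rows):
        """Load (ns_a, id_a, ns_b, id_b) rows; each mapping is stored in both directions."""
        for ns_a, id_a, ns_b, id_b in rows:
            self._table[(ns_a, id_a)][ns_b].add(id_b)
            self._table[(ns_b, id_b)][ns_a].add(id_a)

    def translate(self, namespace, identifier, target_namespace):
        """Return the sorted identifiers in target_namespace mapped to the input."""
        targets = self._table.get((namespace, identifier), {})
        return sorted(targets.get(target_namespace, ()))


# Illustrative data only: one symbol mapped to two probe identifiers.
synonyms = SynonymMap()
synonyms.load([
    ("symbol", "GENE_A", "probe", "P001"),
    ("symbol", "GENE_A", "probe", "P002"),
])
print(synonyms.translate("symbol", "GENE_A", "probe"))  # ['P001', 'P002']
print(synonyms.translate("probe", "P002", "symbol"))    # ['GENE_A']
```

Storing each row in both directions keeps lookups symmetric, which suits one-to-many mappings such as a gene symbol with several probes.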

Registry Service
The Core currently uses Apache jUDDI as the main registry service. To ensure service interoperability we have built and maintain a standard set of interoperability unit tests for Java, Ruby, Perl and C# (based on document/literal WSDL 1.1). A taxonomy has been built to describe registry services, as well as a number of tools to aid in both the browsing and managing of services. The Core is building up a series of controlled vocabularies used in the RDF metadata documents (for example, a genomics ontology that is a subset of MAGE).
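
To make the idea of taxonomy-based service discovery more concrete, here is a small sketch of a registry that stores services with category labels and finds them at run time. It does not use the jUDDI API; the ServiceRegistry class, the category names and the example.org endpoints are assumptions made for illustration only.

```python
class ServiceRegistry:
    """Toy registry: services are registered with taxonomy categories (hypothetical)."""

    def __init__(self):
        self._services = []

    def register(self, name, endpoint, categories):
        """Record a service together with the categories that describe it."""
        self._services.append(
            {"name": name, "endpoint": endpoint, "categories": frozenset(categories)}
        )

    def find(self, *required):
        """Return the names of services that carry every required category."""
        wanted = set(required)
        return [s["name"] for s in self._services if wanted <= s["categories"]]


# Illustrative entries; the endpoints are placeholders, not real services.
registry = ServiceRegistry()
registry.register("synonym", "http://example.org/synonym", {"infrastructure", "mapping"})
registry.register("statistics", "http://example.org/r", {"infrastructure", "analysis"})
print(registry.find("infrastructure"))  # ['synonym', 'statistics']
print(registry.find("analysis"))        # ['statistics']
```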

R Service
The R statistical Web Service is a dynamic web service that gives any client application access to the full power of the R statistical language. It dynamically marshals requests from SOAP via Ruby to R. In effect, this means that R scripts can be called from any environment on any machine. Because the system relies on a number of technologies, a download is not available.

If you would like to use or evaluate the service, please contact the team.

Disclaimer and License
The software is distributed under the Apache license, and any issues associated with the software should be reported to the team.

This software and all associated documents are provided "as is" without any warranty of any kind, either expressed, implied, or statutory, including, but not limited to, any implied warranties of merchantability, fitness for a particular purpose and freedom from infringement, or that the software and all associated documents will be error free. The authors make no representations that the use of the software or any associated documents will not infringe any patent or proprietary rights of third parties. In no event will the author be liable for any damages, including but not limited to direct, indirect, special or consequential damages, arising out of, resulting from, or in any way connected with the use of the software or any associated documents.

Projects:
The Informatics Core has been developing software in four key areas:

  • Infrastructure: software to support heterogeneous data integration
  • Genomics: software to support the analysis of microarray data
  • Imaging: to support the access to and analysis of cellular imaging experiments
  • Cytoscape: we are active contributors to the community Cytoscape project

Infrastructure
As part of the Core's mission we have developed a Service Oriented Architecture (SOA) that is: interoperable, allowing researchers to develop algorithms in the way they prefer; flexible, allowing new functionality to be added with a minimum of coding; and non-intrusive, allowing developers to access their data without being required to adhere to a pre-specified object model.


A conceptual schema of ICUBED: a data access component uses URNs to identify data and associated metadata, and a data analysis component uses dynamically discovered web services.

The ISB Informatics Infrastructure, referred to as ICUBED, is a modular, service-oriented research enterprise architecture capable of integrating emerging technologies. The ICUBED enterprise architecture is designed for interoperability and extensibility and combines 'top-down' and 'bottom-up' design. In ICUBED, developers are able to use their own evolving data models (the bottom-up part), while formally defined domain-specific data models and services (the top-down part) are provided through a number of common services.

There are two sides to the architecture: data access and data analysis. The data access side uses LSIDs (Life Science Identifiers) to provide an identity system for mapping data items to each other and to their RDF-defined metadata. The data analysis side is based around Web Services. The ontology describing each Web Service is stored in a registry service, allowing resources to be reasoned over and discovered at run time. A standard ID mechanism, coupled with the use of 'meta models' and ontologies, means that a formal data-centric integration strategy is available to developers.
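
The identity scheme described above can be sketched as follows, assuming the standard LSID URN layout urn:lsid:authority:namespace:object with an optional trailing revision. The example.org authority, the identifier itself and the metadata dictionary (standing in for RDF metadata) are hypothetical and are not taken from ICUBED.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The "urn:lsid" prefix is matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(not part for part in parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# A plain dictionary stands in for the RDF metadata attached to each identifier.
metadata = {
    parse_lsid("urn:lsid:example.org:image:12345"): {"channel": "GFP"},
}

key = parse_lsid("URN:LSID:example.org:image:12345")
print(key.namespace)   # image
print(metadata[key])   # {'channel': 'GFP'}
```

Because the parsed identifier is an immutable tuple, it can be used directly as a dictionary key, which is one simple way to tie a data item to its metadata.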

We have used ICUBED to support a number of research areas, including genomics, microfluidics and imaging. The services developed with this infrastructure are available for download in the Resources section. These services are designed to be "cross-cutting", providing functionality that can be used in various applications: a synonym service performs identifier mapping operations, a statistical service executes R scripts, and a registry service allows resources to be discovered dynamically.

Genomics
We have made extensive use of GenePattern to build pipelines for microarray analysis. We have provided custom tools to link these GenePattern instances to SBEAMS, the ISB microarray data warehouse.

We have been working closely with the Genomics Core, who have developed a number of tools including SlimArray. Members of the Informatics Core have been working on the development of the desktop microarray analysis tool SeqExpress.

Imaging
An area where we have already applied ICUBED is in the development of software, named Cecilia, for the automatic analysis of high throughput cellular imaging. Cecilia consists of a number of services, each of which is dynamically locatable through our registry service. As these services are designed to be orchestrated externally, they can be reused within other distributed applications.


Within Cecilia, data is captured from the device, parsed into an intermediate form and published via a SOAP interface to a data store. The data is held in a staging area in the data store until resources are available for processing. Once processed, the data can be queried via both LSIDs and SOAP.

Through Cecilia, the image data is captured directly from the microscopes and specially built drivers are used to integrate the equipment. Access to the image repository service is through a SOAP publish interface. When the data and associated metadata are published they are passed through an extract-transform-load (ETL) system into a data repository. The system has been designed to scale to the level of throughput required by the current generation of cell population based imaging experiments.
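
The publish, stage, process and query flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the ImageStore class, its method names, the identifier and the "transform" step (lower-casing metadata keys and counting pixels) are hypothetical and do not reflect Cecilia's actual interfaces, which are SOAP based.

```python
from collections import deque


class ImageStore:
    """Toy data store with a staging area and a queryable repository (hypothetical)."""

    def __init__(self):
        self._staging = deque()   # published items waiting for processing
        self._repository = {}     # processed records keyed by identifier

    def publish(self, identifier, pixels, metadata):
        """Accept an image and its metadata; nothing is queryable yet."""
        self._staging.append((identifier, pixels, metadata))

    def process_staged(self):
        """Extract each staged item, transform it, and load it into the repository."""
        processed = 0
        while self._staging:
            identifier, pixels, metadata = self._staging.popleft()
            record = {
                # Transform step: normalise metadata keys and derive a summary value.
                "metadata": {key.lower(): value for key, value in metadata.items()},
                "pixel_count": len(pixels),
            }
            self._repository[identifier] = record
            processed += 1
        return processed

    def query(self, identifier):
        """Return the processed record, or None if it is unknown or still staged."""
        return self._repository.get(identifier)


store = ImageStore()
store.publish("urn:lsid:example.org:image:1", [0, 12, 255, 7], {"Channel": "GFP"})
print(store.query("urn:lsid:example.org:image:1"))  # None (still in staging)
print(store.process_staged())                       # 1
print(store.query("urn:lsid:example.org:image:1"))
```

Separating the staging queue from the repository lets publishing stay fast while processing runs whenever resources allow.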

Cytoscape
Members of the Informatics Core have contributed extensively to the open-source Cytoscape development community. As part of this work a plugin manager system for Cytoscape has been developed. We have also helped in routine code maintenance and documentation for the Cytoscape community.

To help support ISB software projects, the Informatics Core has been working with the Gaggle team to create a Cytoscape goose. This goose connects networks together through the Gaggle Boss by taking advantage of the improved network handling.

Cytoscape has also been integrated with ICUBED allowing it to query the repository (through a UDDI plugin) and use the associated services.

Center for Systems Biology at the Institute for Systems Biology
1441 N. 34th Street, Seattle, WA 98103
Phone: 206.732.1200 | Fax: 206.732.1299 | Email:

© 2007, Institute for Systems Biology, All Rights Reserved