From its inception, the aims of the Informatics Core have been to:
- Provide a computational infrastructure that enables scientists to work in the way they want, access the data
they require and perform the analyses they need. To this end, the Core has adopted the most applicable practices of industrial
software development alongside the creativity of research-led innovation. An integrated analysis framework called
ICUBED has been built, allowing for the complex analysis of
semantically rich, heterogeneous information to support high-throughput experiments.
- Foster a culture of cooperation between research groups. The Informatics Core supports the
diverse research and development activities of the Center. The Core fosters commonality between
these independent development teams by promoting best practices and maintaining shared software. Education on the
importance of, and mechanisms for, adopting standards has played a significant part in the Core's undertakings.
- Ensure the computational infrastructure will be reliable, flexible and open.
All of the software developed by the Core follows these basic ideals:
- reliable architecture: built to a professional standard, fully designed and documented,
and exhibiting both robust and failsafe properties.
- flexible architecture: used to rapidly develop and deploy new functionality,
adapted to suit new scientific methods, and able to support a wide range of
analyses and data-driven discoveries.
- open architecture: designed to ease interoperability with other systems
through the use of common standards and multiple integration mechanisms. A closed architecture, by contrast, requires
other systems to be altered and thereby subsumed into a dominant monolithic architecture.
We have provided snapshot releases for components of the Center software that may be of use to the general
community. If you need help using these components, please contact
Imaging Services (Cecilia)
The general-purpose high-throughput image repository is available both as source code, built using Apache Ant, and as
a Java WAR file for deployment in Apache Tomcat. The software has been tested under Tomcat 5 only.
The synonym service provides a means to map identifiers from one namespace to another. Complex mappings can be defined
through an administration tool, and data can be loaded through a separate loading tool. The system can be built and
installed using Apache Maven (version 2).
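The mapping behaviour described above can be sketched as a minimal in-memory service in Java. The class and method names here (and the example Entrez-to-symbol mapping) are illustrative assumptions, not the Core's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of an identifier-mapping (synonym) service.
// Class and method names are hypothetical, not the Core's actual API.
public class SynonymSketch {
    // (sourceNamespace:identifier) -> (targetNamespace -> mapped identifier)
    private final Map<String, Map<String, String>> mappings = new HashMap<>();

    private static String key(String namespace, String id) {
        return namespace + ":" + id;
    }

    // Loading step: register a mapping between two namespaces.
    public void addMapping(String fromNs, String fromId, String toNs, String toId) {
        mappings.computeIfAbsent(key(fromNs, fromId), k -> new HashMap<>()).put(toNs, toId);
    }

    // Lookup step: map an identifier into a target namespace, or null if unknown.
    public String map(String fromNs, String fromId, String toNs) {
        Map<String, String> targets = mappings.get(key(fromNs, fromId));
        return targets == null ? null : targets.get(toNs);
    }

    public static void main(String[] args) {
        SynonymSketch s = new SynonymSketch();
        s.addMapping("entrez", "7157", "symbol", "TP53");
        System.out.println(s.map("entrez", "7157", "symbol")); // prints TP53
    }
}
```

A production service would persist these mappings and expose the lookup over SOAP, but the directional (source namespace, identifier, target namespace) shape of the operation is the same.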
The Core currently uses Apache jUDDI as the main registry service. To ensure service interoperability we have built and
maintain a standard set of interoperability unit tests for Java, Ruby, Perl and C# (based on document/literal WSDL 1.1).
A taxonomy has been built to describe registry services, along with a number of tools to aid in both browsing and
managing services. The Core is building up a series of controlled vocabularies used in the RDF metadata documents
(for example, a genomics ontology that is a subset of MAGE).
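To illustrate the shape of the interoperability unit tests mentioned above, the sketch below round-trips a payload through a client and asserts it comes back unchanged. The client here is a local stub standing in for a WSDL-generated document/literal proxy, so all names are assumptions:

```java
// Shape of an interoperability round-trip test: call the service with a
// known payload and check the response is unchanged. The client below is
// a local stub standing in for a WSDL-generated document/literal proxy.
public class InteropTestSketch {
    // Stand-in for a generated service client; real tests would call the
    // deployed endpoint over SOAP from each target language.
    interface EchoClient {
        String echo(String payload);
    }

    static boolean roundTripOk(EchoClient client, String payload) {
        return payload.equals(client.echo(payload));
    }

    public static void main(String[] args) {
        EchoClient stub = payload -> payload; // in-process stub
        System.out.println(roundTripOk(stub, "hello")); // prints true
    }
}
```

The same round-trip check, repeated against clients generated for Java, Ruby, Perl and C#, is what makes the tests a cross-language interoperability suite.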
The R statistical Web Service is a dynamic web service that gives any client application access to the full power of the
R statistical language. It dynamically marshals requests from SOAP via Ruby to R; in effect, R
scripts can be called from any environment on any machine. Because this system relies on a number of technologies, a
download is not available.
If you would like to use or evaluate the service, please contact
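As a rough illustration of how a document/literal SOAP request to such a service might be assembled by hand, the sketch below builds a minimal SOAP 1.1 envelope around an R script. The operation name and namespace URI are invented for illustration; the real contract is defined by the service's WSDL:

```java
// Builds a minimal SOAP 1.1 request envelope for an R-script execution call.
// The operation and namespace names here are illustrative assumptions; the
// real service's WSDL defines the actual contract.
public class REnvelopeSketch {
    public static String buildRequest(String script) {
        return "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
             + "<soap:Body>"
             + "<executeScript xmlns=\"urn:example:r-service\">" // hypothetical operation
             + "<script>" + escape(script) + "</script>"
             + "</executeScript>"
             + "</soap:Body>"
             + "</soap:Envelope>";
    }

    // Minimal XML escaping so R code (which uses < and & freely) embeds safely.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(buildRequest("mean(c(1, 2, 3))"));
    }
}
```

In practice a SOAP toolkit would generate this envelope from the WSDL; the point is only that any language able to emit such a document can drive R remotely.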
Disclaimer and License
The software is distributed under the Apache license, and any issues associated with the software should be
reported to Jennifer Dougherty.
This software and all associated documents are provided "as is" without any warranty of any kind, either expressed,
implied, or statutory, including, but not limited to, any implied warranties of merchantability, fitness for a
particular purpose and freedom from infringement, or that the software and all associated documents will be error
free. The authors make no representations that the use of the software or any associated documents will not
infringe any patent or proprietary rights of third parties. In no event will the author be liable for any damages,
including but not limited to direct, indirect, special or consequential damages, arising out of, resulting from,
or in any way connected with the use of the software or any associated documents.
The Informatics Core has been developing software in four key areas:
- Infrastructure: software to support heterogeneous data integration
- Genomics: software to support the analysis of microarray data
- Imaging: software to support access to and analysis of cellular imaging experiments
- Cytoscape: we are active contributors to the community Cytoscape project
As part of the Core's mission we have developed a Service Oriented Architecture (SOA)
that is: interoperable, allowing researchers to develop algorithms in the way they
prefer; flexible, allowing new functionality to be added with a minimum of
coding; and non-intrusive, allowing developers to access their data without being
required to adhere to a pre-specified object model.
A conceptual schema of ICUBED: a data access component uses URNs to identify data and associated
metadata, and a data analysis component uses dynamically discovered web services.
The ISB Informatics Infrastructure, referred to as ICUBED, is a modular, service-oriented research
enterprise architecture capable of integrating emerging technologies. The ICUBED enterprise architecture
is designed for interoperability and extensibility, combining facets of both 'top-down' and 'bottom-up'
design. In ICUBED, developers are able to use their own evolving data models (bottom-up), while
formally defined domain-specific data models and services (top-down) are provided through a number
of common services.
There are two sides to the architecture: data access and data analysis. The data access side uses LSIDs to
provide an identity system for mapping data items to each other and to their RDF-defined metadata. The
data analysis side is based around Web Services; the ontology describing each Web Service is
stored in a registry service, allowing resources to be reasoned over and discovered at run time. A standard
ID mechanism, coupled with the use of 'meta models' and ontologies, means that a formal data-centric
integration strategy is available to developers.
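LSIDs follow the syntax urn:lsid:authority:namespace:object, optionally with a trailing revision, as defined by the OMG LSID specification. The minimal Java sketch below parses that form; the class name and the example identifier are illustrative, not actual ISB LSIDs:

```java
// Parses Life Science Identifiers of the form
//   urn:lsid:<authority>:<namespace>:<object>[:<revision>]
// as defined by the OMG LSID specification. The example LSID is made up.
public class LsidSketch {
    public final String authority, namespace, object, revision;

    public LsidSketch(String lsid) {
        String[] parts = lsid.split(":");
        if (parts.length < 5 || !parts[0].equalsIgnoreCase("urn")
                || !parts[1].equalsIgnoreCase("lsid")) {
            throw new IllegalArgumentException("Not an LSID: " + lsid);
        }
        authority = parts[2];   // who assigned the identifier
        namespace = parts[3];   // data collection within that authority
        object = parts[4];      // the identified data item
        revision = parts.length > 5 ? parts[5] : null; // optional version
    }

    public static void main(String[] args) {
        LsidSketch id = new LsidSketch("urn:lsid:example.org:images:42:1");
        System.out.println(id.authority + " / " + id.namespace + " / " + id.object);
    }
}
```

Because the authority and namespace are embedded in every identifier, any data item can be resolved back to its owning service and its RDF metadata without a central lookup table.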
We have used ICUBED to support a number of research areas, including genomics, microfluidics and imaging.
The services developed with this infrastructure are available for download in the
resource section. These services are designed to
be "cross-cutting", providing functionality that can be used in various applications: a
synonym service performs identifier-mapping operations, a statistical service executes
R scripts, and a registry service allows resources to be discovered dynamically.
We have made extensive use of GenePattern
to build pipelines for microarray analysis. We have provided custom tools to link these GenePattern instances to the ISB microarray data warehouse.
We have been working closely with the Genomics Core, who have developed
a number of tools including SlimArray.
Members of the Informatics Core have been working on the development of the desktop microarray analysis tool.
An area where we have already applied ICUBED is in the development of software, named Cecilia, for the automatic analysis
of high throughput cellular imaging. Cecilia consists of a number of services, each of which is dynamically locatable through
our registry service. As these services are designed to be orchestrated externally, they can be reused within other distributed systems.
Within Cecilia, data is captured from the device, parsed into an intermediate form and published via a SOAP interface to a
data store. The data is held in a staging area in the data store until resources are available for processing; once processed,
the data can be queried via both LSIDs and SOAP.
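The publish/stage/process/query flow described above can be caricatured in a few lines of Java. Everything here (the names, the queue-backed staging area, the placeholder processing step) is an illustrative assumption; the real system works over SOAP and LSIDs:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Sketch of the publish/stage/process/query flow described above.
// All names are illustrative; the real system works over SOAP and LSIDs.
public class StagingSketch {
    private final Queue<String> staging = new ArrayDeque<>();      // records awaiting resources
    private final Map<String, String> processed = new HashMap<>(); // id -> processed result

    // Publish: captured data enters the staging area.
    public void publish(String record) {
        staging.add(record);
    }

    // Process: when resources become available, drain the staging area.
    public void processAll() {
        String record;
        while ((record = staging.poll()) != null) {
            // Placeholder "analysis": index the record under a derived id.
            processed.put("img-" + processed.size(), record.toUpperCase());
        }
    }

    // Query: processed data is retrievable by identifier.
    public String query(String id) {
        return processed.get(id);
    }

    public static void main(String[] args) {
        StagingSketch s = new StagingSketch();
        s.publish("well-a1");
        s.processAll();
        System.out.println(s.query("img-0")); // prints WELL-A1
    }
}
```

The staging area decouples capture rate from processing rate, which is what lets the pipeline absorb bursts from high-throughput instruments.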
Through Cecilia, the image data is captured directly from the microscopes and specially built drivers are used to integrate
the equipment. Access to the image repository service is through a SOAP publish interface. When the data and associated
metadata are published, they are passed through an extract-transform-load (ETL) system into a data repository. The system
has been designed to scale to the level of throughput required by the current generation of cell-population-based
imaging experiments.
Members of the Informatics Core have contributed extensively to the open-source Cytoscape development community.
As part of this work a plugin manager system for Cytoscape has been developed. We have also helped in routine code
maintenance and documentation for the Cytoscape community.
To help support ISB software projects, the Informatics Core has been working with the Gaggle team to create a Cytoscape
goose. This goose connects networks together through the Gaggle Boss by taking advantage of Cytoscape's improved network support.
Cytoscape has also been integrated with ICUBED, allowing it to query the repository (through a UDDI plugin)
and use the associated services.