Case Study: caArray

A New Generation for Array-Based Cancer Research

Challenge

Successful software evolves to meet the changing needs of its user community and the “world” in which it exists. caArray is no exception. Originally deployed in January 2005, caArray enables researchers to acquire, disseminate, and aggregate high-quality cancer array information via the cancer Biomedical Informatics Grid (caBIG®), the biomedical informatics network developed by the National Cancer Institute (NCI). caArray supports leading array manufacturers Affymetrix (AFFX) and Illumina (ILMN) and a variety of proven and cutting-edge experiment methodologies. NCI engaged 5AM to spearhead caArray’s first major revision since its inception. The challenge was to meet the user community’s changing requirements, optimize the architecture, produce or reproduce supporting software assets, and incorporate new developments in the caBIG®-wide architecture.

“In addition to driving a shift in the way research is taking place at our lab, caArray has also enabled us to make our data available to other institutions. We believe this data will enable cancer research by informing pre-clinical and clinical studies in humans, essentially initiating a paradigm-shift in the way research is currently taking place.”

– Chuck Donnelly, Director, Computational Sciences, Jackson Laboratory Cancer Center

Clear Communication Leads Change

As the effort’s lead team, 5AM orchestrated fundamental change in development, testing and management practices across a matrix of interacting teams spanning five companies. We met with stakeholders around the country to elicit and refine the requirements, channeling them into a long-term vision that continues to be iteratively and incrementally fulfilled today. Use cases served as the basis for communication, keeping it as transparent as possible for all stakeholders inside and outside the NCI.

New Functions and Features

To address over 400 logged bugs preventing adoption of caArray, 5AM developed and executed a plan to target key functionality of this J2EE product. We maintained the production system while creating the next generation, focusing on delivering key features that were unavailable yet central to the application’s use. We set up a distinct integration environment to support a vast improvement in quality, introducing automated daily builds, code quality checks, comprehensive automated tests and coverage reporting. 5AM reintroduced the ability to load, validate and parse massive data files and provided an interoperable mechanism for analyzing the data by creating and submitting the corresponding domain model (in UML) to the caBIG® curator team.

Boot Camp Trains Adopters

We developed a code-based boot camp and delivered the training at successive caBIG® Annual Meetings to a wide range of adopters with varying skill levels. 5AM also organized, documented and implemented a change control process to support all groups.

Result

5AM turned concern into optimism by delivering on this challenging product. Confidence in the software has increased; caArray 2.0 is now among the most widely used software products in caBIG®. Over 300 research centers in the U.S., Australia, England, and New Zealand deploy caArray 2.0.

“caArray is making our workflow more efficient by allowing the user to have more control over what he or she enters in terms of metadata. There is no forced sequential entry of data required, so users can upload data files and associate metadata with ease.”

– Sunita Koul, Software Developer, Bioinformatics Core at Washington University School of Medicine

caArray 2.0 has adopted a federated model, enabling researchers to search data from laboratories around the world via local installations of caArray. Its user-friendly web interface is complemented by rich programmatic APIs that permit analytical tools on and off the Grid (like caIntegrator2, geWorkbench and GenePattern) to pull data from caArray for visualization and analysis.

Our expertise in moving toward a use-case-oriented program that was iterative, incremental and focused on architecture and risk mitigation was rewarded: the software build and deployment automation piloted under this project, which makes caArray 2.0 simple to install and deploy, is now being introduced across the caBIG® product teams.

“The software is very well organized and exhibits clean design, making installation and usage remarkably easy. We are excited about using caArray as a data distribution system in that it has effective mechanisms for security and configuring data visibility. These are core requirements and they have been very well met by the current design.”

– Gerald Fontenay, Computer Systems Engineer, Lawrence Berkeley National Laboratory

With caArray 2.0, NCI-affiliated researchers worldwide now have the ability to quickly validate, upload, and analyze array data on caBIG®. This essential tool is accelerating progress in all facets of cancer research—with cancer patients being the project’s most important beneficiaries.