Why there are so many legacy applications in pharma and what to do about it

This post was written by Randy Julian on April 21, 2010
Posted Under: Products

Computers have now been used to collect, store and perform calculations on laboratory data for over 30 years. Since the record retention policies for most regulated functions in a drug company state that records must be kept for at least 35 years, some of what was collected in the early days of computerized laboratory operation must still be kept.

During the 1990s the Food and Drug Administration recognized that computers, being used to collect so-called “electronic records”, were fundamentally different from measurement systems which printed to paper.  A device which printed only to paper, and kept no internal record, could be used to meet FDA evidence requirements if the paper were signed, dated and secured against modification.  Computer systems were expected to meet the same requirements, despite a lack of tools to ensure electronic files were as valid as the earlier paper outputs had been.

IT departments, instrument vendors and laboratory software vendors struggled with the problem of computer system validation and the security of electronic records through the 1990s and into the 2000s. The primary response was to create a heavyweight, system-wide validation process to ensure that regulated computerized systems were tested and compliant with the new agency regulation known as 21 CFR Part 11. Once a system was validated, data could be stored or managed electronically as long as the validation was maintained through system upgrades and modifications and as long as the application was operational. Data could be migrated from one system to another only through extensive validation of both the new application and the migration tools. The expense of migration and validation gave an incentive to keep regulated systems operational over extended periods of time. It also increased the number of systems an organization had to run, since it became more cost-effective to transition from one system to another by leaving old data “retired” in place.

As the use of computerized laboratory systems increased through the expansion of applications such as Laboratory Information Management Systems (LIMS) and Electronic Laboratory Notebooks (eLN), the demand on IT support groups also increased. New, more complex systems based on relational databases such as Oracle were added on top of existing ‘legacy’ systems. The support and maintenance costs to IT departments across the industry have become staggeringly high. Ironically, while customer expectations were being raised by the expansion of retail, mass-market computer technology, IT departments saddled with decades of system acquisitions spent most of their resources maintaining older, less valuable systems. Every drug company can make a graph like this:

[Figure: Application cost-benefit plot]

A tipping point has come with the latest wave of industry consolidation and healthcare reform legislation. Newly merged companies found themselves with expensive, duplicate functionality provided by incompatible, non-integrated legacy systems during a time when margins and market cap plunged. Solving the integration problem and reducing the cost of maintaining legacy systems, so that more resources can be devoted to driving innovation in drug discovery and development, has shifted from a nice-to-have to a must-have. That leaves two questions:

  • How can regulated data, which must be accessed using some part of a laboratory application, be migrated from a legacy system without losing the minimum functionality needed to meet regulatory requirements?
  • How can data stored in multiple, incompatible data models in a variety of database systems be integrated so that the assets of merged companies can be properly combined to help improve product pipelines?

Indigo BioSystems has attacked this problem and produced an elegant solution to both questions.

As an integrated laboratory data management and analysis system, INDIGO was designed to store the data from any laboratory operation – with the goal of organizing and integrating information in order to support automated analysis.  INDIGO uses a very abstract database system, which flexibly stores relationships between data objects, as opposed to storing data in fixed structures as is done in most legacy applications and relational database systems.  INDIGO also allows open format raw data to be stored, since direct access to raw data is critical to most data analysis tasks.  By using open format data and storing data extracted from relational databases in a ‘semantic’ model where data is stored and annotated for meaning, INDIGO ensures that data can be searched and found indefinitely.  Finally, to perform data analysis, INDIGO provides full analysis tools, including complex computational capabilities used to visualize and report on data and results.
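To make the contrast with fixed table structures concrete, here is a minimal sketch of a store that keeps only annotated relationships between data objects. It is not INDIGO's implementation: the `RelationshipStore` class, the identifiers such as `sample:42`, and the relationship names are all hypothetical and chosen purely for illustration.

```python
class RelationshipStore:
    """Toy store of (subject, relationship, object) statements.

    Hypothetical illustration only; not INDIGO's actual data model.
    """

    def __init__(self):
        self._statements = set()

    def add(self, subject, relationship, obj):
        # No schema is declared up front: any object can be related to any other.
        self._statements.add((subject, relationship, obj))

    def find(self, subject=None, relationship=None, obj=None):
        # A value of None acts as a wildcard for that position.
        return sorted(
            s for s in self._statements
            if (subject is None or s[0] == subject)
            and (relationship is None or s[1] == relationship)
            and (obj is None or s[2] == obj)
        )


store = RelationshipStore()
store.add("sample:42", "measured_by", "instrument:hplc-7")
store.add("sample:42", "has_raw_file", "file:run-0042.mzML")
store.add("sample:43", "measured_by", "instrument:hplc-7")
# A new kind of relationship can be added later without altering any table.
store.add("instrument:hplc-7", "located_in", "lab:B2")

samples_on_hplc7 = [
    subject
    for subject, _, _ in store.find(relationship="measured_by", obj="instrument:hplc-7")
]
print(samples_on_hplc7)
```

Because each statement carries its own relationship name, a search can be phrased in terms of meaning ("what was measured by this instrument?") rather than in terms of a particular table layout.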

Indigo BioSystems combined all of these capabilities into a single system following design principles pioneered by large-scale data management operations at Google, Amazon and Yahoo. By insisting that INDIGO be capable of large-scale scientific analysis on the order of these internet giants, our engineers avoided the design error which now plagues almost all enterprise software: poor performance due to an inherent lack of scalability. At a time when the drug industry badly needs better productivity, the most frequently cited culprit is slow-performing software, much of it purchased to speed up the very processes it now slows down.

By extracting data from legacy systems, storing it in INDIGO's massively scalable storage system, and replacing the critical search, calculation and reporting functions of those systems, it is now possible to truly retire legacy systems and at the same time integrate and aggregate their data. Some or all of a legacy system's functionality can be implemented using the scripts and workflow engine supported by INDIGO. Data can be extracted into “resources” without performing extensive reverse engineering of legacy data models. A “Resource Oriented Architecture” allows links to be formed between data items as needed to create a dynamic data model which grows more searchable and valuable over time.
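The idea of extracting rows into “resources” and linking them later can be sketched roughly as follows. This is an assumption-laden illustration, not INDIGO's extraction tooling: the in-memory SQLite database stands in for a legacy system, and the table names, the `extract_resources` and `link` helpers, and the `describes` relation are all hypothetical.

```python
import sqlite3


def extract_resources(conn, table, id_column):
    """Copy each row of a legacy table into a dict keyed by a path-like resource id.

    Columns are read from the cursor, so the legacy schema does not have to be
    modelled in advance. The table name is interpolated directly into the SQL,
    so only trusted names should be passed in.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    resources = {}
    for row in cursor:
        record = dict(zip(columns, row))
        resources[f"{table}/{record[id_column]}"] = record
    return resources


def link(links, source, relation, target):
    """Record a named link between two resources."""
    links.setdefault(source, []).append((relation, target))


# Stand-in for a legacy database (hypothetical tables and values).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (sample_id INTEGER, lot TEXT);
CREATE TABLE results (result_id INTEGER, sample_ref INTEGER, assay TEXT, value REAL);
INSERT INTO samples VALUES (1, 'LOT-A');
INSERT INTO results VALUES (10, 1, 'potency', 98.5);
""")

resources = {}
resources.update(extract_resources(conn, "samples", "sample_id"))
resources.update(extract_resources(conn, "results", "result_id"))

# Links are added only when they are needed, after extraction has finished.
links = {}
for resource_id, record in resources.items():
    if resource_id.startswith("results/"):
        link(links, resource_id, "describes", f"samples/{record['sample_ref']}")

print(links)
```

In this sketch the extraction step knows nothing about how the tables relate to each other; the relationship between results and samples is expressed afterwards as a link, which is one way a data model could grow over time.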
