The Evolution of a Platform
This blog is intended to give an inside view of Indigo’s product development including the rationale and thought processes we go through to create products that solve problems with data analysis in the life sciences. I usually try to explain the ideas behind our products.
This post is a little different.
As a technologist in a software company it is easy to surround yourself with like-minded technophiles. We thrive on the details of our solutions (which we sometimes treat better than we do our own children – just ask our spouses). But at some point, unless you are such a loser that you really want to unleash another crappy product with a blinking 12:00 on the display, you have to give up the love of complexity and strive for simplicity.
Easier said than done. It’s so hard it will make you want to quit. If you do, you’re in good company; crappy products are everywhere. If you don’t quit, you have a shot at really helping people.
I’m not saying that we’ve achieved the ultimate goal, but now that we know our approach to automating data analysis actually works, we have started the drive toward simplicity. This push has resulted in some unexpected benefits.
This is a long post so I will cut to the punch line. We have simplified our product to a platform that scientists can use to automate data analysis. After the work described below, it now looks like this:

It is an application which has a data repository function that makes raw and processed data available to an automation framework which includes all the features needed to automate lab data analysis. Period.
How we got to this from our “architecture” diagram:

is the story I’ll tell in this blog.
This summer I gave a talk at an AAPS meeting in Seattle on the “21st Century Bioanalytical Laboratory” and, during the vendor-heavy session I was in, the usual suspects showed up. “Get ‘yur LIMS here”, “What? You don’t have an eLN? What are you a caveman?” I wanted to shake everyone in the audience and say: “Do you really think that doing the same thing you’ve done for years will produce different results?” Come on. To paraphrase Kevin Spacey in K-PAX: “With so many doctors on this planet, why are so many people still sick?”
I loaded my PowerPoint for bear and got ready to try and wake everyone up. By the way, if you can choose to go last in a vendor-heavy session, always do it. It lets you search Google for images to make the big money marketing guys look flat footed. You can make your point in more contrast than the big boys whose uptight presentations feel like a slide-whipping…I’m just sayin’.
You can download my slides by clicking here.
The gist was this: we are doing informatics wrong for how pharma works these days, and if you keep doing things the old way, you will probably get laid off. Odds are you will get laid off anyway, but why walk around wearing a sign that says “You can probably do without me.”?
One way to not get laid off is to perform with magical efficiency. They cut your budget – you get more done. They take away people – you get more done. They ask you to work with CROs from far, far away – you get more done.
It seems unlikely that magical efficiency will come without using computers better than you do today. My favorite customer says most of pharma has reached the rarefied air of using Excel to do everything. He calls this “Spreadsheet Monkey Business” and can calculate the lost efficiency – it’s scary. Why are people content to abuse spreadsheets while waiting for the next round of cut-backs? I think it’s because Excel is easy to understand and Big Thinking Change is hard to understand.
At the end of my talk someone came to the microphone and earnestly said: “I understand LIMS, I understand eLN, but Indigo, I have never heard of anything like what you were talking about”. A few years ago that comment would have made my head swell with pride. It’s so nice to have someone confirm your genius by admitting to the room they don’t understand you. Not this time. This time, it felt like a wasp had stung my eyeball. I almost blacked out. If our product is so complex that a 20-minute talk, replete with jokes, teasing about Bioanalytical LIMS (sorry – that’s just too easy), and plenty of pictures, can’t explain it, we’ve got a big problem. It’s not just the story, it’s the way we think about the product that matters – what does it take for people to understand a new complex product? It’s only their jobs and the salvation of the industry that’s at stake. It takes a lot – new things are hard to understand. Give an audience ten new things, and you are talking gibberish.
It started when we committed the sin of creating artificial distinctions between the components of our platform. We did it so that we could tell everyone how smart we were and show them we were a real company because we had lots of cool products. It let us talk (and talk and talk) about how clever each and every part really is. Why? Steve Jobs doesn’t do this. Hell, he glued the iPhone shut so that no one would talk about its guts. Trust me they are WAY cool, but they don’t really matter. Not that everyone got the iPhone idea at first either (Why would I want an “app” for my phone?). They got that it was a phone and that it could do some other stuff too. Now looking at Apple’s “magical” profits, people are starting to sort of get it.
Indigo started with the idea that we could automate the slow, manual error-prone analysis of lab data. We could speed up drug discovery and development, help cure disease and ease human suffering, and the market would pay for that help. We now know that this is true – now more than ever before. But automating data analysis is like saying “work smarter, not harder”. How do you actually pull it off? We figured out that in order to automate data analysis, first you have to organize data so that software can get to the raw materials of analysis easily. That means data integration and aggregation.
We are driving toward a basic science workflow:
Think of an industrial robot building iPhones (I don’t know if this is how they do it, but it should be). Robot arms are cool, but they are bolted somewhere and the parts they assemble have to be within reach. Also, if the parts are all in boxes with tape and foam peanuts, the robot will have trouble getting the part to grip. So, we need to bring the parts together, and “standardize” them, at least to the point that the robot can pick them up and make your iPhone.
We chose to use RDF and data linking (see my earlier post on this) to store structured data and a separate component called “HyperStore” and “OpenStore” to store unstructured data (XML files with raw data, etc.). Why two components? Bragging – the kind of hubris for which Zeus will smite you. It doesn’t really matter how it works. And there don’t need to be two components. It’s just a repository. It holds structured and unstructured data. The structured bit can hold the entire World Wide Web (we are not kidding about this – it scales to biblical proportions). The unstructured bit is blindingly fast and will allow you to search all your data at Google speeds (at least if you use our cloud approach and we parallelize the search…). That’s it. The platform has a repository. If you want a propeller-head deep dive come on in and we’ll show you the robot arm making the iPhone, but if you just want to make a call, you won’t care.
Solving the data integration and standardization problem for laboratory data was so challenging and rewarding that we couldn’t stop talking about it. Here’s another tip: only get involved in international standards bodies. And for real fun get on the site selection and menu committees as quickly as possible…I’m just sayin’.
With the integration task solved, we moved on to how to actually make people’s lives better by delivering functionality that helped people work faster. Here again, we were very clever – “too clever by half” (said the actor to the bishop). Our idea was to create plug-ins out of Java using a modularity standard called OSGi. This is a good idea, but it means that someone has to write actual compiled code to create the module – and that is just too slow. Interestingly, for years we have been using the “R” statistics environment to develop algorithms – most recently a new peak picking approach based on some science fiction that is now working its way through the patent office (more on that later). We would take these algorithms and then move them from “R” into a plug-in. After a while we realized that it makes no sense to have a platform for which you have to write Java plug-ins to get functionality, especially when the R script has been done for months already. Why not just skip straight to providing modules based on R? Everyone loves R. So, we created a component to run R scripts that operates on data in the repository. This way, as soon as we see something working we can incorporate it directly into a workflow. Customers can get R scripts from us, create them on their own or download any of the 2500 packages on CRAN and Bioconductor.
For example, we apply machine learning techniques implemented in R to automated chromatographic peak review and almost eliminate manual modification of peak-picking parameters. You can look at the lectures from a course I taught on this at Purdue a few years ago, but it starts with feature selection:
Then you can have experts help build training sets:
And then you can automate the identification of bad peaks:
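To make those three steps concrete, here is a minimal sketch. It is written in Python rather than R, and it uses a deliberately simple nearest-centroid classifier as a stand-in; it is not the method we actually run. The feature names, the peak records and the labels are all made up for illustration.

```python
from statistics import mean

def extract_features(peak):
    """Step 1 (feature selection): reduce a peak record to a few numbers.

    The three features (signal-to-noise, width, tailing) are hypothetical.
    """
    return (peak["height"] / peak["noise"], peak["width"], peak["tailing"])

def train_centroids(labeled_peaks):
    """Step 2 (training set): average the features of expert-labeled peaks."""
    by_label = {}
    for peak, label in labeled_peaks:
        by_label.setdefault(label, []).append(extract_features(peak))
    # One mean feature vector (centroid) per label.
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def classify(peak, centroids):
    """Step 3 (automation): pick the label whose centroid is nearest.

    Uses squared Euclidean distance on unscaled features, which is
    crude; a real model would normalize the features first.
    """
    features = extract_features(peak)

    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: distance(centroids[label]))

# Hypothetical expert-labeled training set.
training = [
    ({"height": 1000, "noise": 10, "width": 6, "tailing": 1.0}, "good"),
    ({"height": 800, "noise": 10, "width": 7, "tailing": 1.1}, "good"),
    ({"height": 50, "noise": 10, "width": 20, "tailing": 2.5}, "bad"),
    ({"height": 30, "noise": 10, "width": 25, "tailing": 3.0}, "bad"),
]
centroids = train_centroids(training)

new_peak = {"height": 40, "noise": 10, "width": 18, "tailing": 2.2}
label = classify(new_peak, centroids)
```

With this toy training set the low, wide, tailing peak lands nearest the "bad" centroid, so it would be flagged for review instead of someone tweaking peak-picking parameters by hand.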
Now we are getting somewhere: Indigo is a platform that has a repository which can automatically be populated from instruments and databases and it automates the task of running R scripts to do insanely good things with data – helps you do more with less.
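Stripped to its bones, the idea looks something like the sketch below. Treat it as an illustration only: it is Python standing in for R, the `Repository` class is a toy in-memory dictionary and not our actual store, and the script and dataset names are hypothetical.

```python
class Repository:
    """Toy in-memory stand-in for a repository of raw and processed data."""

    def __init__(self):
        self.raw = {}
        self.processed = {}

    def add_raw(self, key, values):
        # In a real system this would be populated from instruments and databases.
        self.raw[key] = list(values)

def run_script(repo, script, key):
    """Run one analysis script on a raw dataset and store the result."""
    result = script(repo.raw[key])
    repo.processed[key] = result
    return result

def baseline_corrected_max(signal):
    """Hypothetical analysis script: tallest point above the lowest reading."""
    return max(signal) - min(signal)

repo = Repository()
repo.add_raw("run-001", [2.0, 2.5, 9.0, 3.0])
result = run_script(repo, baseline_corrected_max, "run-001")
```

The point of the shape is that the script never knows where the data came from, and a new script can be dropped into a workflow without touching the repository code.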
The platform needed a few other bits to really make people happy. It needed a way to generate the myriad of files that other systems needed to operate (text files, oh well…if you make me.). OK, we can use a standard template engine and let people add, change and remove templates all they want.
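As a rough illustration of the template idea, here is a sketch using `string.Template` from Python’s standard library. The template names, the tab-separated layout and the sample rows are assumptions for the example, not the engine or file formats we ship.

```python
from string import Template

# Hypothetical template registry; users could add, change or remove entries.
templates = {
    "sample_list": Template("$sample_id\t$analyte\t$concentration\n"),
}

def render(name, rows):
    """Fill the named template once per row and join the lines into one file body."""
    template = templates[name]
    return "".join(template.substitute(row) for row in rows)

rows = [
    {"sample_id": "S1", "analyte": "caffeine", "concentration": 1.5},
    {"sample_id": "S2", "analyte": "caffeine", "concentration": 2.25},
]
text = render("sample_list", rows)

# Adding and removing a template is just a dictionary update.
templates["csv"] = Template("$sample_id,$concentration\n")
csv_text = render("csv", rows)
del templates["csv"]
```

Because the layout lives in the template and not in the code, a new downstream system that wants a slightly different text file only needs a new template.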
We also needed a way for people to create web pages that would provide interactivity. Web pages collect data and put it in the repository, do queries, and fire off R-scripts. The Web UI engine is another open source project which allows editing, uploading and deleting of web forms.
Indigo integrates all these tools together and gives the customer total control. It makes it easy and fun to build automated data analysis processes – not just manual data analysis – which you can do any old way you want (unless you miss the cut on the next round of…well, you know).
That’s it: Indigo is a solution platform that uses an advanced repository to make data directly accessible to the best statistics system in the world and includes all the connective bits to talk to instruments, databases and other systems.
It’s just Indigo. It’s one thing. It has one name (like Oracle, Sting or Cher). If you want to know how the internal parts work you can do an exam, but looking at the insides won’t make you appreciate what it does unless you are already an Indigo employee.
Next time, I will talk about how if you have Indigo, you can do almost anything with R. I also want to explain how by putting the platform on the cloud you gain insane amounts of power cheap. The goal is magical efficiency – you can do it.







