Is there room for 90 providers of genomics software?

Image courtesy Libertas Academica via Flickr.

“Let a hundred flowers bloom, let a hundred schools of thought contend,” Mao said during the Hundred Flowers Campaign of 1957. Now, in the middle of a genomics revolution, it feels that way in the market for software to analyze next-generation sequencing data. New companies are being formed, large software and hardware firms are expanding into the life sciences, and others are offering cloud-based services alongside conventional software. But will customers really spend scarce funds on commercial analysis software? And can the market absorb all these new offerings? The software supplied by the vendors of the equipment that generates the data is not complete – talk to anyone with a 454, Illumina or Life Technologies system. Even for Sanger sequencing, the established “gold standard” in routine use for almost 30 years, other software tools supplement the vendor-supplied software to meet a particular scientist’s requirements.

For example, in next-generation sequencing all three major vendors provide software that takes the massive amount of data (from millions to billions of reads) and performs alignment against a reference sequence, producing a file in BAM format. (BAM is short for Binary Alignment/Map, a standard developed by the 1000 Genomes Project.) This is historically known as secondary analysis (‘primary analysis’ is what the instrument does in producing the data, say turning voltage signals from an Ion Torrent 316 chip into base calls with assigned quality scores).
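To make the output of secondary analysis concrete: SAM, the text equivalent of BAM, encodes each aligned read as one tab-delimited line. A minimal stdlib-Python sketch of pulling out the core fields follows; the field layout comes from the SAM/BAM specification, but the read name, position and sequence shown are invented for illustration, and real pipelines use dedicated libraries rather than hand parsing.

```python
# Minimal sketch: parse one SAM-format alignment record (the text
# equivalent of BAM). The specific read shown here is made up.
sam_line = "read0001\t0\tchr1\t10468\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"

fields = sam_line.split("\t")
record = {
    "qname": fields[0],        # read name
    "flag": int(fields[1]),    # bitwise flags (strand, pairing, ...)
    "rname": fields[2],        # reference sequence name
    "pos": int(fields[3]),     # 1-based leftmost mapping position
    "mapq": int(fields[4]),    # mapping quality
    "cigar": fields[5],        # alignment description, e.g. 8M = 8 aligned bases
    "seq": fields[9],          # read bases
    "qual": fields[10],        # per-base quality, ASCII-encoded
}

print(record["rname"], record["pos"], record["cigar"])
```

Multiply this one record by hundreds of millions of reads per run and the scale of the secondary-analysis problem becomes apparent.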

Functions beyond alignment to a reference vary with the vendor. (These can be deemed tertiary analysis.) A common vendor-supplied function is to call variants after the alignment of sequence reads to a reference (the standard output is a VCF file, short for Variant Call Format, again from the 1000 Genomes Project). Another function could be de novo assembly. And while this tertiary analysis sounds straightforward (alignment to a reference, or assembling a small genome from scratch), it is not a ‘one-size-fits-all’ proposition.
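A VCF record is likewise one tab-delimited text line per putative variant. As a sketch (stdlib Python; the column layout follows the VCF specification, but the particular variant shown is invented for illustration):

```python
# Minimal sketch: parse one VCF (Variant Call Format) record.
# The variant shown here is invented for illustration only.
vcf_line = "chr1\t887560\trs3748597\tT\tC\t233.9\tPASS\tDP=54"

chrom, pos, vid, ref, alt, qual, filt, info = vcf_line.split("\t")[:8]

# INFO is a semicolon-separated list of key=value annotations
info_fields = dict(kv.split("=") for kv in info.split(";") if "=" in kv)

print(f"{chrom}:{pos} {ref}>{alt} (quality {qual}, depth {info_fields['DP']})")
```

The judgement calls live upstream of this file: which reads support the call, what quality thresholds were applied, and how the FILTER column was populated all differ between pipelines.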

Given the complexity of the data and the varied needs of individual researchers, this is a reasonable business opportunity: to specialize in the data analysis segment of the next-generation sequencing market.

There are a few unique aspects to providing software (or software-as-a-service in the cloud, known as SaaS) to the research market. First and foremost, any research group with genomic data experience (dating back to whole-genome gene expression and before) will have one or several experienced computational biologists who can assemble data analysis pipelines and are accustomed to working with large datasets. Secondly, there is a large (and growing) set of free-to-use tools published by research groups in prominent journals, with new tools appearing very frequently. (Ask any bioinformatics specialist for their current list of ‘go-to’ tools and how it has changed over the past 12 or 18 months, and be prepared to hear an inventory of software that has remained useful over time, and another set that was useful at one point but later proved less so once a better tool appeared.) Third, and perhaps most importantly, funds for a given project are set aside for reagents and perhaps some equipment, including computational hardware (especially if the project is large), but not for purchasing analysis software or software services. Funding for software often has to be sought as a department- or institute-wide site license, which complicates matters substantially (for both the researcher and the vendor trying to make the sale).

It is hard to compete with free. Of course the implementation of any software, even software provided free of charge, is not free once you count the time and computational hardware capacity needed to put it into use. Beyond saving the initial outlay, another strong argument for the ‘do it yourself’ approach to bioinformatics is that assembling a solution from hand-picked tools means each tool is understood on its own, with its particular strengths and weaknesses known to the user (or discovered in due course). For purchased software, while technical support is offered, the degree of transparency about how the software is constructed varies between vendors; in other words, the ‘secret sauce’ can be a black box to the customer, with surprising (usually negative) effects discovered empirically.

Against this background there are many software vendors. A friend put together this list of genomics software and service providers, currently with 90 listed. Some are long-established companies dating back to before the Human Genome Project was completed – the whole-genome expression days, or whatever you want to call the late 1990s (Ingenuity, Accelrys, GenomeQuest); others are well-established firms that have been developing software since the early days of NGS, circa 2005 (CLC bio, SoftGenetics, Knome); and still others were formed more recently and are gaining market share and awareness (Geneious, DNAnexus, BioTeam). Some are major, recognized multinational firms (Microsoft, Samsung, GE Healthcare). Each company offers a better mousetrap: an emphasis on a hand-curated protein-interaction database drawn from the primary literature (Ingenuity), data analysis as a cloud-based service (Appistry, DNAnexus), analysis of multiple samples across whole-genome or whole-exome sequence data (Knome) – the list goes on and on, each company with its unique ‘value proposition’, the simple “this is what we offer that is better than anything else”.

Whatever the true current number of bioinformatics companies might be, we are in a phase of the market that will tolerate a lot of creative approaches before the inevitable pruning begins. In the early days of both the automobile and aviation industries there was no standardization of function, and little agreement on what controls were needed or required. Eventually the marketplace sorted out both the functions that were required and the companies that produced an effective product, and an inevitable contraction took place. For bioinformatics software (or software as a service), the market is still in its explosive growth phase. Revolutions tend to begin in a messy way.

Given the incomplete nature of what a genome is to begin with (a prior post was written up here), there is no one ‘best way’ to analyse something so complex, with so many judgement calls to be made along the entire process. The imperfect quality of the data stems from a typical NGS error rate of about 1%. Any filtering will suppress false positive results at the expense of an elevated false negative rate, and any variant not identified is simply not discovered; it is the false positive result (a putative variant identified that turns out not to be real) that limits career advancement and gets individuals in trouble (not to mention wasting limited resources). Variation that goes undetected does not get individuals in trouble; however, it may mean the difference between a major discovery and a negative result, or between a definitive diagnosis for a patient with an unknown condition and a negative result from whole-exome or whole-genome sequencing.
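The trade-off can be put in rough numbers. Assuming (for illustration) an independent 1% per-base error rate and 30x coverage at a site, the chance that sequencing errors alone mimic a variant depends steeply on how many supporting reads the caller demands:

```python
from math import comb

ERROR_RATE = 0.01   # assumed per-base error rate (~1%, per the text)
COVERAGE = 30       # assumed read depth at the position

def p_at_least(k, n=COVERAGE, p=ERROR_RATE):
    """Binomial tail: probability of >= k erroneous reads out of n."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Requiring more supporting reads suppresses false positives...
for k in (1, 2, 3, 4):
    print(f">= {k} error-carrying reads at one site: {p_at_least(k):.2e}")
# ...but a real heterozygous variant (expected in roughly half the reads)
# can still be missed when its local coverage happens to dip, so the
# false negative rate rises in turn.
```

Multiplied across the roughly three billion positions of a human genome, even small per-site probabilities like these explain why every pipeline ends up making its own filtering judgement calls.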

Is there room for 90 companies to produce software or provide software as a service for genomics? Eventually, definitely not. In the near term, as the revolution continues, a hundred flowers bloom.

About Dale Yuzuki

A sales and marketing professional in the life sciences research-tools area, Dale is currently employed by Olink as their Americas Field Marketing Director. For additional biographical information, please see his LinkedIn profile here, and also find him on Twitter @DaleYuzuki.



2 thoughts on “Is there room for 90 providers of genomics software?”

  • Bioduediligence

    I see little use of non-free software in the academic world. This is a key reason why the US government is funding open-source software/pipeline efforts being conducted at University of Maryland IGS and other places, so the time and resources are not duplicated. As the clinical market develops, presumably there will be a place for some (<90…) players to provide value-added services.

    • Dale Yuzuki

      I was a little surprised when a friend who started working for Ingenuity told me how many customers they had for their pathway analysis software, on a yearly subscription basis. Twelve years ago (circa 2000) companies like DoubleTwist and GeneLogic were out selling proprietary software databases, and really made a splash before they crashed.

I realize now that I didn’t include a basic fact regarding NGS – the exponential growth of ever-larger datasets, against only linear growth in qualified computational biologists. A cursory look at the ‘Careers’ search engine reveals no fewer than 25 currently open positions: staff bioinformatics scientists, field application scientists for bioinformatics, staff software engineers in C++, a software engineer for NGS mapping, even a Summer 2013 bioinformatics intern. And from what I remember, this number (25) was about 20 six months ago, so the need is only growing.

      This is an important point, that the demand-side for bioinformatics expertise will only explode in the future along with the size and scope of the data collected. The only question is the mechanics (and economics) of the ‘how’.