What does Google X have to offer in life science diagnostic development?
Screen capture of The Atlantic Magazine interview via http://youtu.be/7dzI_azZEGI
I’ve picked up a phrase, ‘it’s a narrow world’, from somewhere in my travels. Way back in my laboratory manager days in Santa Monica, California at the John Wayne Cancer Institute (‘laboratory manager’ sounds so much better than ‘laboratory technician’), I met a young scientist named Andrew Conrad who started a company called National Genetics Institute. (This was around 1992 or 1993.) Their aim was fast (and inexpensive) PCR-based diagnostics.
They had two unique qualities: one was that they were based in west Los Angeles, where I grew up; the other was a technology that re-used components from other equipment to build a fast thermal cycler. Their system at that time was a home-made one that used pumps from other medical equipment (think dialysis machines and the like) and plumbing to move high volumes of heated water into circulating water baths. They also added some liquid-handling robotics. (Remember, this was the early 1990s, and the first 96-well plate-based systems were just appearing about that time. This Wikipedia article indicates the Society for Biomolecular Screening started a standardization initiative in 1996.)
Fast forward twenty years to March 2013, and Andrew Conrad’s name re-appears as the chief scientist for the Google X Life Sciences project, and in the subsequent months there was a veritable avalanche of exposure: an interview with the Wall Street Journal (complete with a photo of Dr. Conrad with an Attune flow cytometer in the background), an interview with Wired journalist Steven Levy entitled ‘We’re hoping to build the Tricorder’, and most recently a description of their work with modeling human skin, complete with an Atlantic Magazine video.
Two larger questions need to be asked, however. What is Google’s unique competence? And what research would Google X have to offer?
Google’s Unique Competence
During last October’s American Society of Human Genetics meeting in San Diego, our group at Thermo Fisher Scientific generated 15 posts from the conference (you can access them here at Behind the Bench). One plenary talk that I did not discuss before was given on Sunday, Oct. 19 (8:30am) by David Glazer of Google, with the title “Lessons from a Mixed Marriage: Big Sequencing Meets Big Data”. David said a number of interesting things, and I’ll take the time here to share a synopsis of that talk.
He said that genomics is now becoming an ‘N of Millions’ activity, and this is certainly true, given projects such as the Resilience Project of the Mount Sinai Medical Center in New York, the 100,000 Genomes Project of Genomics England in the UK, and of course the recent Precision Medicine Initiative announced by the President, which seeks the genetic profiles, medical histories and other data from a million or more Americans.
Importantly, the speaker from Google laid out a brief history of big data mining, from the development of MapReduce in 2004, through Hadoop in 2005 and Apache Spark in 2009, to Google Dremel most recently in 2010, as key milestones for the analysis of very large datasets. What is meant by ‘very large’? Think trillions of rows of data. And the guiding principles are to go big, go fast, and go standard.
As an example of ‘big’, he brought up YouTube, where 300 hours of video are currently uploaded every minute. Google’s YouTube search engine covers more than 100 Petabytes of data. (That is the equivalent of 100,000 one-Terabyte hard drives.)
They applied Dremel and another tool called BigQuery to 1,000 whole-genome sequences from the publicly-available 1000 Genomes Project datasets, to see how well their computational code could sift through .vcf files. (Remember – each whole-genome sequence contains 3-4 million variants.) The first task: segregate variants by population. After a total time of only 10 seconds, a graph (in R) was produced that validated prior analysis. After another 10 lines of code, a graph of shared variation across all 1,000 samples was produced. Another few lines of code produced the distribution of SNP heterozygosity by population of origin.
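To give a feel for what a per-population heterozygosity tally involves, here is a minimal sketch in plain Python. It is not the code shown in the talk, and it does not use Dremel or BigQuery: the sample names, population labels and genotype calls are all made up for illustration, and a real analysis would read the calls from .vcf files rather than a small dictionary.

```python
from collections import defaultdict

# Hypothetical toy data: VCF-style genotype calls ("0/0", "0/1", "1/1")
# per sample, plus a made-up sample-to-population map.
GENOTYPES = {
    "sample_a": ["0/1", "0/0", "1/1", "0/1"],
    "sample_b": ["0/0", "0/0", "0/0", "1/1"],
    "sample_c": ["0/1", "0/1", "0/1", "0/0"],
    "sample_d": ["1/1", "0/0", "0/0", "0/0"],
}
POPULATIONS = {
    "sample_a": "POP1",
    "sample_b": "POP1",
    "sample_c": "POP2",
    "sample_d": "POP2",
}


def is_heterozygous(call):
    """Return True when the two alleles of a diploid call differ."""
    alleles = call.replace("|", "/").split("/")
    return len(alleles) == 2 and alleles[0] != alleles[1]


def heterozygosity_by_population(genotypes, populations):
    """Fraction of non-missing calls that are heterozygous, per population."""
    het = defaultdict(int)
    total = defaultdict(int)
    for sample, calls in genotypes.items():
        pop = populations[sample]
        for call in calls:
            if "." in call:  # skip missing calls such as "./."
                continue
            total[pop] += 1
            het[pop] += is_heterozygous(call)
    return {pop: het[pop] / total[pop] for pop in total}


if __name__ == "__main__":
    rates = heterozygosity_by_population(GENOTYPES, POPULATIONS)
    for pop, rate in sorted(rates.items()):
        print(pop, rate)
```

The logic is simple; what made the demonstration impressive was running this kind of query over millions of variants per genome across a thousand genomes in seconds.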
He next went through a principal component analysis (PCA), solving not only a 1000×1000 computational problem but then scaling it to 1 million x 1 million. You get the idea – the folks from Google are experts at handling huge datasets and mining them for search, whether for a particular cat video on YouTube or the frequency of heterozygosity at a given locus across many samples. He concluded with this XKCD cartoon to illustrate how insurmountable some Big Data challenges may remain.
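For readers who have not run a PCA on genotype data, the sketch below shows the idea at toy scale. It is my own illustration under stated assumptions, not the method used in the talk: the genotype matrix is invented, and I use simple power iteration on the sample-by-sample matrix to find the first principal component, where a production system would use a distributed or randomized solver.

```python
import math

# Hypothetical genotype matrix: rows are samples, columns are variants,
# values are alternate-allele counts (0, 1 or 2). Made up for illustration.
GENOTYPE_MATRIX = [
    [2, 2, 0, 0, 1],
    [2, 1, 0, 0, 1],
    [2, 2, 0, 1, 1],
    [0, 0, 2, 2, 1],
    [0, 1, 2, 2, 1],
    [0, 0, 2, 1, 1],
]


def center_columns(matrix):
    """Subtract each variant's mean so every column has mean zero."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def first_principal_component(matrix, iterations=200):
    """Return unit-scaled sample coordinates on the first principal component."""
    centered = center_columns(matrix)
    n = len(centered)
    # Sample-by-sample similarity matrix. This n x n object is the one that
    # grows to 1000 x 1000, or a million by a million, at scale.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered]
            for r1 in centered]
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-zero start
    for _ in range(iterations):
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


if __name__ == "__main__":
    for index, score in enumerate(first_principal_component(GENOTYPE_MATRIX)):
        print("sample", index, round(score, 3))
```

In this toy matrix the first three samples and the last three land on opposite sides of zero, which is the same kind of population separation a PCA plot of real genomes shows. The similarity matrix grows with the square of the number of samples, which is why going from a thousand samples to a million is such a leap.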
Google’s expertise at Big Data and datamining is undisputed. And with more than $60B in annual revenue (primarily from their search engine, in particular their AdWords search marketing), there’s no question that they have the resources. And when they start to monetize genomic analysis for scientists, there will be a healthy business there.
The Google X project is another matter entirely. A semi-secretive facility for making ‘major technological advancements’, their mandate is to gain at least a 10-fold improvement over an existing method. Google’s self-driving car project and Project Glass are two better-known projects, and Google X Life Sciences has a similar goal.
Where does Google X Life Sciences start? Their list includes a contact lens for blood-sugar monitoring, a spoon for people with tremors, the Baseline study, and cancer detection via nanoparticles. It is the last two that are of interest to the geneticist. For Baseline, studying what ‘normal’ means is a useful exercise, but it is difficult to envision how this can align with the ’10-fold improvement’ goal. There are many efforts underway at many different research institutions that have been looking at this question for many years. For example, the US National Institute on Aging has been conducting The SardiNIA Study since it launched in 2001 – with two important dimensions: genetic homogeneity and careful phenotyping.
What Google’s Baseline may fail to capture is control over both environmental and genetic variables. This is where careful work from geneticists comes in – choosing ‘natural experiments’ such as an island population that traces its lineage back some 8,000 years. Google needs to be very careful whom they choose as their ‘normals’ – and be prepared that most of the work is not in the data generation, but rather in collecting the phenotyping data and in figuring out which population of individuals to use as a baseline to begin with.
But the larger question is this: what is the protein signal they want to monitor that is so indicative of early-stage cancer, yet couldn’t be examined with a blood draw? That is a different question altogether. There are technologies available for single-cell analysis, as well as for cell-free DNA (and RNA) analysis, to look at circulating tumor cells and for particular biomarkers that could be somatic mutation-based, methylation-based, copy-number variant based, you name it. Not to mention the exquisite technologies available for very sensitive protein detection. The hard part is defining the requisite biomarkers in the blood, which many companies are actively pursuing, with many biomarkers already available today.
As for the other (non-life science) large Google X projects, self-driving cars have huge social, legal and policy implications (laid out here recently in the Washington Post). For the recently-pulled Google Glass, putting it back into secret development in the hands of a fashion design expert does not bode well for its future, as it suggests Google doesn’t perceive Glass to be a social interaction problem, just a lifestyle / design one.
I for one am not optimistic that Google X Life Sciences (or Google X in general) will be able to perform the needed 10x disruption they have mandated. Self-driving car development should be (and currently is) ongoing in the laboratories of major automobile manufacturers, as cars are their core competency. The world of augmented reality got a boost recently with Microsoft’s announcement of the HoloLens, and if early predictions come true, it could be a game-changer in how individuals interact with a combination of the real world and the virtual. I for one am not waiting on Facebook to introduce complex computing hardware, nor am I waiting on Google for hardware, for that matter.
Innovation is very hard: there are thousands and thousands of misses for every success. Google X Life Sciences is trying to do what many small startups are also trying to do: solve a problem (early detection of cancer) that is very, very difficult. They just have a lot more funding to do it, and over 100 people working on it. They are optimistic that they will show results within five years – so it will certainly be something to watch.
PS – For all of you who have followed me over to the Behind the Bench blog, many thanks! (And for those who haven’t discovered it yet, please pay us a visit.)