Next Generation Technologist

Next Generation Sequencing, Marketing, and the Genomic Revolution

March 2, 2015
by Dale Yuzuki

A few observations from Advances in Genome Biology and Technology


View from the Thermo Fisher suite

In the closing session of the Advances in Genome Biology and Technology meeting recently concluded in Marco Island, Florida, the main organizers Dr. Eric Green (National Human Genome Research Institute Director, Bethesda MD) and Dr. Elaine Mardis (co-director, with Rick Wilson, of The Genome Institute at Washington University, St. Louis MO) asked an interesting question – whether they should set up an area where blog posts could be written (and recorded) for the benefit of others. I think it is a good idea – a forum by which participants can share their thoughts and impressions, as over the years these get scattered when certain blogs appear (and disappear as circumstances change).

Notes from AGBT 2010

As one example, Anthony Fejes shared a lot on his old blog, and if you want a snapshot of how things have changed in five years you can access his notes from AGBT 2010 here. (His 2009 observations are summarized in a single post here, dated Feb 9.) Then in 2010 he moved over to the blog network, but it was short-lived, and he is now blogging under a subdomain (here’s a link to his AGBT 2011 notes). Not having been able to attend since (the most recent post was an invitation to ‘guest blog’ from the 2012 meeting, but there were no takers), he has moved on to other topics in bioinformatics, and notably hosted a Reddit AMA a few months ago (I admit that I missed it, even though /r/bioinformatics is an interesting sub). He even received ‘Reddit Gold’, which is a way readers can reward useful contributions with additional Reddit superpowers.

Anyway, back to the topic of guest blogging and AGBT: there were a number of new faces on Twitter using the #AGBT15 hashtag, while some notables who blogged about AGBT in prior years were absent in person, such as Lex Nederbragt (his ‘In Between the Lines of Code’ is widely read, and last year was his first AGBT) and Keith Robison. Even so, Keith’s ‘Omics! Omics!’ site had a number of notable #AGBT15 posts, including a nice description of the just-announced 10X Genomics platform, and this review of Thermo Fisher Scientific’s progress with the Ion Torrent platform. Another blogger who comes to mind is Dan Koboldt of MassGenomics.

One simple fact: the intersection of attendees and public communication via these new media platforms will change from year to year. This year perhaps other voices will start to share their opinions online – but Lex, Keith and Dan all have limited travel funds (who doesn’t?) and, like anyone else, have to be selective about which conferences to attend in a given year. (Of course there were commercial media present, such as GenomeWeb, whose reporting will surely be written up and published over the next few days, as well as a number of Wall Street analysts who will be writing up research reports for their respective audiences.)

The challenges of getting blog posts written

And blogs quickly become dormant – I see that the excellent GenomesUnzipped hasn’t posted since March of 2014 – for a wide variety of reasons, the primary one being that everyone is busy. The opportunity cost of writing down thoughts versus doing ‘regular work’, special projects, or fulfilling other commitments is completely understandable.

So if AGBT as an organization does set up a guest-blogging platform (which I do plan on supporting, for what that’s worth), there is a bar it must clear – the time and effort required to share opinion and feedback ‘on the record’. I hope that in the coming year there will be more people who can summarize their thoughts and impressions – this year it seemed that only James Hadfield, with his excellent Core Genomics blog, was there blogging in person.

And thus the value of in-person (“IRL”) interaction at a conference – the social component of asking for opinions and perspectives is invaluable. There is no dependence on getting people to write a guest post – they can simply share what they think. (And at all those – ahem – after-session informal get-togethers colloquially known as ‘parties’, there were plenty of opinions shared on a wide variety of topics, for sure.)

Okay, that does leave me as a blog-post writer who attended as well – and I want to thank all those who came up to me during AGBT to introduce yourselves. If I’ve learned anything through almost three years of blogging and tweeting, it is that there are many people who find out who you are through your writing whom you may never meet. (And thinking about it, we have all benefited enormously from the writings of people we will never meet – in particular the many writers of the Great Books that have so shaped Western civilization.) Thus the reach of the written word – even in an ephemeral medium such as an online blog – is a powerful thing.

Impressions from AGBT

So on to what I thought about AGBT: I’ve already written up (for the Thermo Fisher Scientific blog Behind the Bench) a post about my favorite talk at this year’s AGBT – Carlos Bustamante (Stanford) on a unique application of forensic DNA analysis combined with his expertise in human population genomics: tracing the transatlantic slave trade from 1721 through several centuries (his data goes through 1910). His presentation was entitled “PhenoCap: A Targeted Capture Panel for Comprehensive Phenotyping of Forensic DNA Samples”. Keep an eye out for it; it should go live in a few days. (There are already some five video interviews from AGBT 2015 up on the website, with many more to come.)

For sure I would be remiss not to mention 10X Genomics; they launched their product at this conference, and made quite an impression. It is a sample-prep device that takes 1 ng of >100 kb-length genomic DNA, partitions it into hundreds of thousands of picoliter-volume emulsified droplets (although, to be frank, the word ‘emulsion’ has never been used by them in any of their presentations nor in my conversations with them), and uses one unique barcode per microreaction (picoreaction?) to individually label each >100 kb strand, so that afterward the hundreds of thousands of uniquely-labeled sequences can be deconvoluted and aligned together. I won’t reiterate the details – Keith Robison wrote them all up here.
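The deconvolution step in that description can be sketched in a few lines – a toy illustration of grouping barcode-labeled short reads back to their source molecule (the barcodes and sequences below are invented, and this is in no way 10X’s actual software):

```python
from collections import defaultdict

def group_reads_by_barcode(reads):
    """Group (barcode, read_sequence) pairs by their droplet barcode.

    Each barcode labels the short reads derived from one long (>100 kb)
    input molecule, so grouping recovers the linked-read sets that can
    then be aligned together.
    """
    groups = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)
    return dict(groups)

# Toy input: two reads share barcode BC01, i.e. came from one molecule
reads = [
    ("BC01", "ACGTTAGC"),
    ("BC02", "TTGACCAT"),
    ("BC01", "GGCATACG"),
]
linked = group_reads_by_barcode(reads)
# linked["BC01"] now holds both reads from the same source molecule
```

The real pipeline of course also has to handle barcode sequencing errors and multiple molecules per droplet, but the grouping principle is the same.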

Shortly after the afternoon presentation, which I described to a friend as ‘pitch perfect’, I was minding my own business at dinner when John Stuelpnagel (a founder of Illumina as well as of 10X) joined us. (It brought back fond memories of my own cubicle at their old building on Towne Centre Drive, where I was only around the corner from John’s cube. I found out quickly that he was an early-riser like myself, and found his input always insightful and so valuable. Good times.) After talking about Poisson distributions and clarifying species specificity and other minutiae, he told me that they had mathematically modeled the numbers their engineers needed to design the system to (you can imagine how expensive it is to synthesize and manufacture many hundreds of thousands of oligos, but that is how Illumina started way back in 2000-2001), and that once they built their system the experimental results matched perfectly with their calculations! This can be called ‘optimum company management through mathematics’.
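For context on the Poisson distributions mentioned above: partitioning dilute DNA into droplets is commonly modeled with Poisson statistics, where the loading concentration sets the odds that any droplet holds zero, one, or multiple molecules. A minimal sketch (the 0.3 molecules-per-droplet figure is purely illustrative, not an actual 10X design parameter):

```python
import math

def occupancy_probability(k, mean_molecules_per_droplet):
    """P(a droplet contains exactly k molecules) under Poisson loading."""
    lam = mean_molecules_per_droplet
    return math.exp(-lam) * lam**k / math.factorial(k)

# With dilute loading, most occupied droplets hold a single molecule
lam = 0.3
p_empty = occupancy_probability(0, lam)   # ~0.74
p_single = occupancy_probability(1, lam)  # ~0.22
p_multi = 1.0 - p_empty - p_single        # ~0.04
```

This is the kind of calculation that lets engineers trade off reagent cost (many empty droplets) against the risk of two molecules sharing one barcode.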

From the applications perspective, friends I spoke to (several core facility directors, whom I just might see at the upcoming Association for Biomolecular Resource Facilities meeting in Saint Louis MO, March 28-31) were eager to give this platform a try. They all agreed that for cancer applications Levi Garraway’s plenary talk was noteworthy (yes, a plenary speaker in the morning session had several data slides featuring 10X Genomics results).

One other notable item regarding long haplotyping (for background on diploid phasing, I wrote a piece called ‘The Unexplored Diploid Landscape’ here): several presenters said they had optical mapping data (a reference to BioNano Genomics), but showed no data from that platform. It seemed as if optical mapping was used only in a validation function, to confirm findings from either Pacific Biosciences’ long reads (averaging on the order of 10 kb, up to 47 kb long) or a short-read method (i.e. Moleculo).

Another impression from AGBT

Craig Venter presenting at the Pacific Biosciences workshop; Deanna Church (Personalis) in the foreground


On the note of Pacific Biosciences, it seemed like they were everywhere. (By ‘everywhere’, I mean in the sessions.) Certainly The Gene Myers spoke about his next step – development of a local aligner, along with other tools for making much-improved human genome reference assemblies with far lower compute requirements. (It was not unusual to hear of 60x improvements in speed, just by looking at these problems in a different way – thus my own admiration for people who have the freedom and ability to choose the problems they want to work on.) If you want to see what a splash The Gene Myers made at AGBT last year, see here.

Toward getting fully-phased human genomes, starting with CHM1, Mike Hunkapiller (CEO of Pacific Biosciences) showed metrics for the CHM1 hydatidiform mole (a haploid human genome), with N50s far longer than any human genome reference assembly to date (for HuRef-1, it was an N50 of 10.5 Mb). In addition there are some 18 projects at four collaborators worldwide working on other CHM samples (i.e. derived from different individuals). Important work here – an overarching theme of phased haplotypes, and what the scientific community can learn about the role of genetic variation in an allele-specific manner.
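For readers unfamiliar with the N50 metric cited above: it is the contig length at which contigs of that length or longer cover at least half of the total assembly. A small sketch with toy numbers:

```python
def n50(contig_lengths):
    """N50: the length L such that contigs of length >= L cover at
    least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

# Toy assembly totaling 100: cumulative lengths 40, then 70 (>= 50),
# so the N50 is 30
print(n50([40, 30, 20, 10]))  # -> 30
```

This is why longer reads push N50 up so dramatically: fewer, longer contigs reach the halfway point sooner.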

A final impression from AGBT

The splash from prior years about single-molecule sequencing from newer technologies (i.e. Oxford Nanopore, Quantum Biosystems, ZS Genetics, and others) seems to have died down. There were a few presentations that included Oxford Nanopore data, but they had no error-rate metrics, described the platform as feeling like an ‘alpha or beta’ version, and used His-tag metal-chelating purification resin (hello QIAGEN!) to enrich for hairpin-containing library molecules – so the MinION platform is still something of a work-in-progress. I myself am amazed that they can manufacture 512 single-molecule pores as a product and read electrical signals as nucleotides pass through. (For additional context and background, here’s a post from AGBT14 about ONT.)

A parting thought

One observation (from another friend – yes, another core facility director) was that there was no data or other findings from the HiSeq X Ten whole-genome sequencing platform. Naturally, J. Craig Venter said he was planning to purchase many more of them for Human Longevity, Inc. to fulfill its goal of 1M whole human genomes, and that he wouldn’t share any results he will be commercializing. And other than a poster (or two?) about technical aspects of the platform, it was something of an omission that did not go unnoticed.

February 22, 2015
by Dale Yuzuki

Marketing Precision Medicine Pt2

Borrowed from this infographic:


After writing up ‘Marketing Precision Medicine’, it turned out I had spent so much time on background and context that I was only able to briefly mention the marketing challenge that the Precision Medicine Initiative brings.

So here is Part 2, getting deeper into the Marketing challenge of precision medicine – a concern about privacy and data security.

Again published via LinkedIn, and of course I will need to come up with a Part 3 next week – what government can do to help.


February 16, 2015
by Dale Yuzuki

Marketing Precision Medicine

Borrowed from this infographic:


Last week I listened in with interest on the Precision Medicine Workshop held at the NIH, and wrote up some thoughts about the need for them to market it effectively. You can access this post on LinkedIn – something of an experiment in posting in various places.

We are living through a remarkable period in the history of healthcare, implementing precision medicine on a population-scale of a million individuals.

February 9, 2015
by Dale Yuzuki

Commentary from Behind the Bench

Right about one calendar year ago I was asked to start writing for a new blog for the Genetic Analysis division of Thermo Fisher Scientific. Called ‘Behind the Bench’, it took a few months to get off the ground, and started posting right at the end of May 2014.

A few personal comments on what I’ve learned this past year:

  • Finding post ideas is not a problem when your own customers are doing fascinating things
  • The mechanics of legal and regulatory approval can be daunting, but I have plenty of company among others who have similar constraints
  • Producing nice-looking video takes a lot of effort and expertise, but is so much easier (and less expensive) now
  • Being a social media professional doesn’t mean my prior experience no longer counts; it becomes more relevant than ever
  • Marketing is going through a huge tectonic shift, away from email toward digital (in all its forms – webinars, video, white-papers, ebooks) and social
  • Presence online doesn’t mean anything to those who are still offline; peer recommendations and in-person involvement (such as interaction with a flesh-and-blood salesperson) still mean a lot to customers

If you are interested in the most popular posts from 2014, they are available here. If you’d like to see all 113 posts published to-date, they are here.

Lastly, if you would like to ‘listen in’ on live-tweets from an upcoming conference (such as AGBT in Marco Island FL Feb 25, AACR in Philadelphia PA April 19, and ESHG in Glasgow, Scotland June 6) follow me on Twitter. And if you are interested in the top tweets from 2014, you can access them here too.

If there are topics you’d like to see in the future (suitable for this space rather than Behind the Bench), feel free to leave a comment below.


February 3, 2015
by Dale Yuzuki

The Core Competency of Google is not Life Sciences

What does Google X have to offer in life science diagnostic development?

Screen capture of The Atlantic Magazine interview via


I’ve picked up a phrase, ‘it’s a narrow world’, from somewhere in my travels. Way back in my laboratory manager days in Santa Monica, California at the John Wayne Cancer Institute (‘laboratory manager’ sounds so much better than ‘laboratory technician’), I met a young scientist named Andrew Conrad who started a company called National Genetics Institute. (This was around 1992 or 1993.) Their aim was fast (and inexpensive) PCR-based diagnostics.

They had two unique qualities: one was that they were based in west Los Angeles, where I grew up; the other was a technology to re-use other equipment components to build a fast thermal cycler. Their system at that time was a home-made one that used pumps from other medical equipment (think dialysis machines and the like) and plumbing to move high volumes of heated water into circulating water baths. Add in some liquid-handling robotics – remember, this was the early 1990s, and the first 96-well plate-based systems were just appearing about that time. (This Wikipedia article indicates the Society for Biomolecular Screening started a standardization initiative in 1996.)

Fast forward twenty years to March 2013, and Andrew Conrad’s name re-appears as the chief scientist for the Google X Life Sciences project, and in the subsequent months there was a veritable avalanche of exposure: an interview with the Wall Street Journal (complete with a photo of Dr. Conrad with an Attune flow cytometer in the background), an interview with Wired journalist Steven Levy entitled ‘We’re hoping to build the Tricorder’, and most recently a description of their work with modeling human skin, complete with an Atlantic Magazine video.

Two larger questions need to be asked, however. What is Google’s unique competence? And what does Google X research have to offer?

Google’s Unique Competence

During last October’s American Society for Human Genetics meeting in San Diego, our group at Thermo Fisher Scientific generated 15 posts from this conference (you can access them here at Behind the Bench).  One plenary talk that I did not discuss before was on Sunday Oct. 19 (8:30am) given by David Glaser of Google, with the title “Lessons from a Mixed Marriage: Big Sequencing Meets Big Data”. David said a number of interesting things, and I’ll take the time here to share a synopsis of that talk.

He said that genomics is now becoming an ‘N of Millions’ activity, and this is certainly true, given projects such as the Resilience Project of the Mount Sinai Medical Center in New York, the 100,000 Genomes Project of Genomics England in the UK, and of course the recent Precision Medicine Initiative announced by the President that seeks the genetic profiles, medical histories and other data from a million or more Americans.

Importantly the speaker from Google laid out a brief history of big data mining, from the development of MapReduce in 2004, Hadoop in 2005, Apache Spark in 2009, and most recently Google Dremel in 2010, as key milestones for the analysis of very large datasets. What is meant by ‘very large’? Think trillions of rows of data. And the guiding principles are to go big, to go fast, and go standard.

As an example of ‘big’, he brought up YouTube, which currently has 300 hours of video uploaded every minute. Google’s YouTube search engine covers more than 100 Petabytes of data. (That is 100,000 one-terabyte hard drives’ worth.)

They applied Dremel and another tool called BigQuery to 1,000 whole-genome sequences from the publicly-available 1000 Genomes Project datasets, to see how well their computational code could sift through .vcf files. (Remember – each whole-genome sequence contains 3-4 million variants.) The first task: segregate variants by population. After a total time of only 10 seconds, a graph (in R) was produced that validated prior analysis. After another 10 lines of code, a graph of shared variation across all 1,000 samples was produced. Another few lines of code, and out came the distribution of SNP heterozygosity by population of origin.
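As a rough illustration of the heterozygosity-by-population tally described above – in plain Python rather than Dremel/BigQuery, and with invented sample IDs and genotype calls:

```python
from collections import Counter

def het_counts_by_population(genotypes, sample_to_pop):
    """Count heterozygous calls (e.g. '0/1' or '1|0') per population.

    genotypes: dict mapping sample ID -> list of GT strings as they
    appear in a VCF sample column (phased '|' or unphased '/').
    """
    counts = Counter()
    for sample, gts in genotypes.items():
        pop = sample_to_pop[sample]
        for gt in gts:
            alleles = gt.replace("|", "/").split("/")
            if len(set(alleles)) > 1:  # differing alleles = heterozygous
                counts[pop] += 1
    return counts

# Toy data in the style of 1000 Genomes sample/population labels
genotypes = {
    "HG00096": ["0/0", "0/1", "1/1"],
    "NA18939": ["0|1", "1|0", "0/0"],
}
pops = {"HG00096": "GBR", "NA18939": "JPT"}
het = het_counts_by_population(genotypes, pops)
# het["GBR"] is 1, het["JPT"] is 2
```

The point of the talk, of course, was that Google runs this kind of aggregation over trillions of rows in seconds, not that the logic itself is complicated.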

He next went through a PCA analysis, solving not only a 1,000×1,000 computational problem but then scaling it to 1 million × 1 million. You get the idea – the folks from Google are experts at huge datasets and at mining them for search, whether for a particular cat video on YouTube or the frequency of heterozygosity at a given locus across many samples. He concluded with this XKCD cartoon to illustrate how insurmountable some Big Data challenges may remain.
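The PCA step he described can be sketched on a toy genotype matrix – center each variant, then take the top singular vectors. This is a generic NumPy illustration of PCA on genotype data, not Google’s code, and the two “populations” below are simulated:

```python
import numpy as np

def genotype_pca(genotype_matrix, n_components=2):
    """PCA on a samples x variants matrix of allele counts (0/1/2).

    Centering each variant and taking the top singular vectors yields
    the principal components typically used to separate samples by
    ancestry.
    """
    centered = genotype_matrix - genotype_matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

rng = np.random.default_rng(0)
# Two simulated populations with very different allele frequencies
pop_a = rng.binomial(2, 0.2, size=(20, 500))
pop_b = rng.binomial(2, 0.8, size=(20, 500))
pcs = genotype_pca(np.vstack([pop_a, pop_b]).astype(float))
# PC1 (pcs[:, 0]) separates the two groups
```

At 1,000×1,000 this is trivial on a laptop; the engineering feat is distributing the same linear algebra across a 1 million × 1 million problem.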

Google’s expertise at Big Data and datamining is undisputed. And with more than $60B in annual revenue (primarily from their search engine, in particular their AdWords search marketing), there’s no question there. And when they start to monetize genomic analysis for scientists, there will be a healthy business there.

Google X

The Google X project is another matter entirely. A semi-secretive facility for making ‘major technological advancements’, their mandate is to gain at least a 10-fold improvement over an existing method. Google’s self-driving car project and Project Glass are two better-known projects, and Google X Life Sciences has a similar goal.

Where does Google X Life Sciences start? Their list includes a contact lens for blood-sugar monitoring and a spoon for people with tremors. It is the last two items that are of interest to the geneticist: the Baseline study, and cancer detection via nanoparticles. For Baseline, studying what ‘normal’ means is a useful exercise, but it is difficult to envision how this aligns with the ’10-fold improvement’ goal. There are many efforts underway at many different research institutions that have been looking at this question for many years. For example, the US National Institute on Aging has been conducting The SardiNIA Study since it launched in 2001 – with two important dimensions: genetic homogeneity and careful phenotyping.

What Google’s Baseline may fail to capture is control over both environmental and genetic variables. This is where careful work from geneticists comes in – choosing ‘natural experiments’ such as an island population that traces its lineage back some 8,000 years. Google needs to be very careful about who they choose as their ‘normals’ – and be prepared for most of the work being not in data generation, but rather in phenotype data collection and in figuring out which population of individuals to baseline in the first place.

But the larger question is this: what protein signal do they want to monitor that is so indicative of early cancer yet could not be examined with a blood draw? That is a different question altogether. There are technologies available for single-cell analysis, as well as for cell-free DNA (and RNA) analysis, to look at circulating tumor cells and at particular biomarkers – somatic mutation-based, methylation-based, copy-number variant-based, you name it. Not to mention the exquisite technologies available for very sensitive protein detection. The requisite blood biomarkers only need to be defined – something many companies (and many biomarkers) are actively pursuing, with many available today.

As for the other large (non-life-science) Google X projects: self-driving cars have huge social, legal and policy implications (laid out recently in the Washington Post). And for the recently-pulled Google Glass, putting it back into secret development in the hands of a fashion-design expert does not bode well for its future, as Google doesn’t perceive Glass to be a social-interaction problem, just a lifestyle / design one.

I for one am not optimistic that Google X Life Sciences (or Google X in general) will be able to deliver the 10x disruption they have mandated. Self-driving car development should be (and currently is) ongoing in the laboratories of major automobile manufacturers, as cars are their core competency. The world of augmented reality got a boost recently with Microsoft’s announcement of the HoloLens, and if early predictions come true, it could be a game-changer in how individuals interact with a combination of the real world and the virtual. I for one am not waiting on Facebook to introduce complex computing hardware, nor am I waiting on Google for hardware, for that matter.

Innovation is very hard: there are thousands and thousands of misses for every success. Google X Life Sciences is trying to do what many small startups are also trying to do: solve a problem (early detection of cancer) that is very very difficult. They just have a lot more funding to do it, and over 100 people working on it. They are optimistic that they will show results within five years – so it will be certainly something to watch.

PS – For all of you that have followed me over to the Behind the Bench blog, many thanks! (And for those who haven’t discovered it yet, please pay us a visit.)
