The importance of data sharing

David Haussler (UC Santa Cruz): “Every clinician should be able to compare their genome to others.”

Here at the first plenary session of the Advances in Genome Biology and Technology (AGBT) 2016 meeting, there is a mix of the strange and the familiar. Strange, in that we are not in Marco Island at the Marriott resort, and it took some time and effort to become familiar with the layout and the location of the important places (ahem, the vendor suites and public places to consume beverages). Familiar, in seeing old friends, meeting new ones, and remembering acquaintances from prior years. And familiar, too, in hearing where the latest research is leading, as well as progress on some long-standing problems.

David Haussler from the University of California Santa Cruz gave a plenary talk entitled “Global sharing of better and more genomes”, and something he said struck a chord with me: without the collaborative sharing of genomic information (both phenotype and genotype), the entire field of the clinical application of genomic data will simply not make progress. He illustrated this point with a slide stating the problem: genome data is held in silos, unshared, and not standardized for exchange.

The problem of silo’ed, unshared and unstandardized genomic data

What is great about the work of the Global Alliance for Genomics and Health (and on Twitter at @GA4GH) is that they are actively working on several fronts to solve this problem. But first a bit of context.

The Bermuda Principles were set forth in 1996 during the Human Genome Project, at a meeting David Bentley described as ‘one of the most important early meetings… to help shape the spirit in which it was carried out’. (An oral history recording about this meeting is available here.)

In one of his talks Francis Collins described what a courageous step this was: for a project as ambitious, resource-consuming and large-scale as the Human Genome Project, nothing quite like this had been done before. (For additional details, a 2002 essay from John Sulston about that history is available here.)

Many years after the completion of the HGP, the HapMap project, over a thousand GWAS and then the 1000 Genomes Project, here we are today, on the path to making genomic discoveries have practical impact in healthcare. I’m reminded of Eric Green’s and Mark Guyer’s Nature Perspective piece from 2011, ‘Charting a course for genomic medicine from base pairs to bedside’, which lays out the progression from understanding the structure of genomes, to understanding the biology of genomes, and then on to understanding the biology of disease and advancing the science of medicine, as a memorable ‘heat-map’ that spans the years from 2011 to 2020.

The problem remains, however, that with genomic data held in silos, unshared, and in non-standard formats, progress will not be made. There is a famous quotation from Stewart Brand from 1984 that is often shortened to “Information wants to be free”. However, the full quote is this:

On the one hand information wants to be expensive, because it’s so valuable. The right information in the right place just changes your life.


On the other hand, information wants to be free, because the cost of getting it out is getting lower and lower all the time.


So you have these two fighting against each other. (Stewart Brand)

This tension, between the value of genomic information that in the right place can literally save a person’s life and the ever-lower cost of distributing that information, is something that needs to be resolved.

Dr. Haussler spoke of three major projects of the Global Alliance for Genomics and Health. The first is the ‘Beacon Project’, which asks for a simple yes/no answer ‘to test the willingness of international sites to share genetic data in the simplest of all technical contexts’ (information about this is here). The second is the ‘BRCA Challenge’, which aims to pool BRCA1 and BRCA2 data into a global resource (headed by Sir John Burn of Newcastle University and Stephen Chanock of the NCI); there are over 12,000 coding variants known within these two genes, all ‘scattered everywhere’. The third project, on which he spent some time, is the generation of a new form of human genome variation map, built as a graph.
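To make the yes/no idea behind the Beacon Project concrete, here is a minimal sketch in Python. It is a toy, not the GA4GH Beacon API: the names `AlleleQuery`, `ToyBeacon` and `ask_all`, the site names, and the chromosome/position/allele values are all made up for illustration. The only assumption taken from the talk is that each site answers a single question with yes or no and reveals nothing else.

```python
from typing import NamedTuple


class AlleleQuery(NamedTuple):
    """Hypothetical query: 'have you seen this allele at this position?'"""
    chromosome: str
    position: int
    allele: str


class ToyBeacon:
    """A stand-in for one data-holding site (not a real Beacon server)."""

    def __init__(self, name, observed):
        self.name = name
        self._observed = set(observed)  # stays private to the site

    def has_allele(self, query):
        # The site reveals only True or False, never the underlying genomes.
        return query in self._observed


def ask_all(beacons, query):
    """Put the same question to every site and collect the yes/no answers."""
    return {beacon.name: beacon.has_allele(query) for beacon in beacons}


# Illustrative data only; these are not real variants.
query = AlleleQuery("17", 1000, "T")
sites = [
    ToyBeacon("site_a", [AlleleQuery("17", 1000, "T")]),
    ToyBeacon("site_b", [AlleleQuery("13", 2000, "G")]),
]
answers = ask_all(sites, query)
print(answers)
```

Even an answer this small tells a clinician which sites are worth contacting about a variant, which is why it works as a first test of willingness to share.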

He laid out how this graph structure works: diverse genomic sequencing data are merged into one graph that reflects the total diversity discovered in a given genomic region, with each position given a unique base-level identifier. The goal is to create a ‘Rosetta Stone of the human genome’, allowing for different kinds of ‘connections’, including inversions that connect sequences on their sides. The more variation put into the graph, he claimed, the better it becomes (“we are beating GATK3 at variant calling”), and he showed data comparing several graph-based mapping methods across the MHC region.
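A rough way to picture such a graph is sketched below in Python. This is only a toy model under my own assumptions, not the data model presented in the talk: the class `SequenceGraph`, its methods, and the short sequences are hypothetical. Each node holds a stretch of bases, a base is identified by (node id, offset), and joins connect the left ("L") or right ("R") side of one node to a side of another, which is what lets an inversion be expressed.

```python
from dataclasses import dataclass, field

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(bases):
    return bases.translate(COMPLEMENT)[::-1]


@dataclass
class SequenceGraph:
    """Toy variation graph; all names here are illustrative assumptions."""
    nodes: dict = field(default_factory=dict)  # node id -> bases
    joins: set = field(default_factory=set)    # each join is a pair of node sides

    def add_node(self, node_id, bases):
        self.nodes[node_id] = bases

    def join(self, a, a_side, b, b_side):
        # A side is (node id, "L" or "R"); joins have no direction.
        self.joins.add(frozenset([(a, a_side), (b, b_side)]))

    def spell(self, walk):
        """Spell the sequence of a walk given as (node id, forward?) steps."""
        out = []
        for i, (node_id, forward) in enumerate(walk):
            if i > 0:
                prev_id, prev_forward = walk[i - 1]
                exit_side = (prev_id, "R" if prev_forward else "L")
                entry_side = (node_id, "L" if forward else "R")
                if frozenset([exit_side, entry_side]) not in self.joins:
                    raise ValueError(f"no join between {exit_side} and {entry_side}")
            bases = self.nodes[node_id]
            out.append(bases if forward else reverse_complement(bases))
        return "".join(out)


graph = SequenceGraph()
for node_id, bases in [(1, "ACG"), (2, "T"), (3, "C"), (4, "GGA"), (5, "AAC")]:
    graph.add_node(node_id, bases)
graph.join(1, "R", 2, "L")  # one allele at a single-base variant
graph.join(1, "R", 3, "L")  # the other allele
graph.join(2, "R", 4, "L")
graph.join(3, "R", 4, "L")
graph.join(4, "R", 5, "R")  # right side to right side: node 5 is read inverted

allele_one = graph.spell([(1, True), (2, True), (4, True)])
allele_two = graph.spell([(1, True), (3, True), (4, True)])
inverted = graph.spell([(4, True), (5, False)])
print(allele_one, allele_two, inverted)
```

Both alleles share nodes 1 and 4, so a base in the flanking sequence keeps the same identifier, for example (1, 0), no matter which genome it came from. Adding another genome adds nodes and joins rather than a separate copy of the region.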

The Global Alliance for Genomics and Health is working on important and vital projects (he touched on the development of a common genome ‘API’ with many applications interfacing with it); will you get involved with the genomic data you have in your silo? GA4GH invites organizations and individuals to become members.

About Dale Yuzuki

A sales and marketing professional in the life sciences research-tools area, Dale is currently employed by SeqOnce Biosciences as their Director of Business Development. For additional biographical information, please see my LinkedIn profile here, and also find me on Twitter @DaleYuzuki.

2 thoughts on “The importance of data sharing”

  • Michael Rhodes

    Nice article, seems to be a growing trend towards this. Seems to me that a lot of the research community are failing to recognize how genomic sequencing is now part of healthcare. Given that even an anonymized genome is very easy to link back to a person if you have some genomic data on them, I believe it will move the other way: people will increasingly want their genome information secured, not shared. I wonder if in the next few years there will be a huge ruckus when people realize their genome (i.e. medical record) was shared, especially as many of the users of the data will be commercial entities looking to make a profit from the information. As a community I think we are failing to explain to an average person why sharing the info is safe (assuming it is) and has a return for society. The best argument in many ways is that your genome is available to anyone who acquires items you have interacted with; the courts have said that you don’t have privacy rights over this detritus (at least as far as government organizations are concerned), so everyone may as well just make it public. Although this raises so many issues (genetic discrimination, marriage discrimination and parentage, to name a few) that I find it hard to buy into.

    • Dale Yuzuki Post author

      Hi Michael, great to hear from you, and wow was Joe’s technology talk at AGBT fascinating (I saw the poster the night before).

      Completely agree on the many issues raised, and the ELSI (Ethical Legal and Social Issues) topics are difficult.

      And on the commercial entities part, seeing the 23andMe CEO on the stage at AGBT was a bit surreal, especially as it is obvious she is playing it both ways – aggressively championing the cause of ‘participants rather than research subjects’ and yet these participants’ genomic information IS the product they are selling. And I could not believe the gall to criticize the NIH in front of the Director of the NHGRI, but there it is.

      We’re going to see this played out for years to come, for sure. Let’s catch up sometime – call me the next time you are in the DC area!