Sunday, 19 January 2014

The $1000 genome is here... Kind of...

I’m not an avid follower of tech news but something that popped up on my radar this week seemed worthy of a blog post. As Bio-IT World reports in “What You Need to Know About Illumina’s New Sequencers”, Illumina have announced the first sub-$1000 human genome:

Sequencing costs have been coming down steadily and dramatically since the invention of “Next Generation” techniques and the “$1000 genome” - a full human genome for under $1000 - has long been one of the holy grail targets of cheap sequencing. The cost-per-genome that Illumina quote does indeed represent a substantial drop:

This is not for everyone, as you need to buy at least ten machines as a “HiSeq X Ten” package at $1 million apiece.

According to the Illumina press release:

The HiSeq X Ten is the world’s first platform to deliver full coverage human genomes for less than $1,000, inclusive of typical instrument depreciation, DNA extraction, library preparation, and estimated labor. Purpose-built for population-scale human whole genome sequencing, the HiSeq X Ten is an ideal platform for scientists and institutions focused on the discovery of genotypic variation to enable a deeper understanding of human biology and genetic disease. It can sequence tens of thousands of samples annually with high-quality, high-coverage sequencing, delivering a comprehensive catalog of human variation within and outside coding regions.

The $1000 price tag only applies “when used at this scale” and it doesn’t say anything about computational costs - storing and processing the vast quantities of data coming off the machine. For many sequencing applications, the computational cost now exceeds the sequencing cost, although I suspect that genome re-sequencing is at the cheaper/easier end of the processing spectrum. Which brings me to the other aspect of my “kind of…” qualifier: the HiSeq X still only produces 150bp reads, and at 30x coverage. This is ample for certain applications and will enable you to re-sequence (i.e. use an existing genome sequence as a scaffold to map the short reads onto) most of a “normal” human genome. It will probably struggle, however, when looking at repetitive sequences. Sequencing a genome de novo (i.e. without a template for assembly) will not be possible at the sub-$1000 price tag. Likewise, samples with heterogeneity, such as cancer genomes, need much more than 30x coverage.
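To put the 150bp/30x figures in perspective, here is a rough back-of-envelope sketch of how many reads that implies per genome. It uses the standard coverage relation (coverage = number of reads × read length / genome size). The genome size of roughly 3.2 billion bases is my own round-number assumption, not something from the Illumina announcement, and the function name is purely illustrative.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: float) -> int:
    """Smallest number of reads N such that N * read_length / genome_size >= coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


# Assumption: approximate haploid human genome size (round number, not from the press release).
GENOME_SIZE_BP = 3_200_000_000
READ_LENGTH_BP = 150  # read length quoted for the HiSeq X
COVERAGE = 30         # depth quoted for the sub-$1000 genome

n_reads = reads_needed(GENOME_SIZE_BP, READ_LENGTH_BP, COVERAGE)
total_bases = n_reads * READ_LENGTH_BP

print(f"{n_reads:,} reads, {total_bases / 1e9:.0f} Gb of raw sequence per genome")
```

Under those assumptions, that is hundreds of millions of reads per genome before any quality scores or alignments are stored, which is why the computational side is not a trivial omission from the price tag.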

As a bioinformatician, announcements like this fill me with a mixture of excitement and dread. Don’t get me wrong: being able to generate so much more data is great. The problem is, we need to be able to do something with all that data. Short 150bp read data is, ultimately, quite limiting: you need loads of it to get decent coverage/assembly and you are always going to be stuck where greater lengths are required to discriminate between repeats etc. Processing, quality-controlling, filtering and assembling these short reads remain a bioinformatic headache. This is definitely progress but, personally, I am still waiting for long-read single-molecule sequencing before I get too excited.
