Tuesday, 10 May 2016

Please sponsor me: Running for Premature Babies in the SMH Half Marathon - only 5 days to go!

You may have noticed a slight lack of posts recently. This is in part (or a lot!) due to a major lifestyle change that took place at the beginning of March, when we became first-time parents. One of the things advertised in the hospital was the “Running for Premature Babies” campaign in the Sydney Morning Herald Half Marathon. This seemed like a good motivator to get back(‽) in shape, and raise some much-needed cash for the Royal Women’s Hospital Foundation at the same time.

The good news is that they have already raised the $108k needed for a new X-ray machine, so everything raised from now on will go to research, which obviously appeals to me as a researcher. Anyhoo… if you can spare a few bucks, it would be much appreciated - and if you are in Sydney on Sunday, do cheer on the folks in pink! (Not the best photo, I know! I'd just run 18km for the first time in my life!)

Tuesday, 29 March 2016

If you are British, please fight the government's plans to make all schools into academies

If you don’t know about the problem, read Mark Steel’s excellent column:

Thank God our schools have finally been liberated by our national free spirit George Osborne

Now, please sign either or both of these petitions:

The government has announced that every school in England will become an academy. This was not in their manifesto and is therefore a completely undemocratic move.

State schools are accountable to parents, the local community and to local authorities. By forcing schools to become academies the accountability will be to a trust and to accountants. Her Majesty Chief Inspector of schools has concerns over education provided in academies and so should you.

I hate posting so much about depressing politics, but blame the Tories. I simply cannot comprehend how they can think this is good for the children or the country. The only plausible explanation is pure self-interest and greed, wanting to give even more to their rich mates. How this can happen in a supposed democracy is terrifying. If nothing else, it has highlighted how sick the system really is.

In the modern age, there really is no excuse for not having hands-on democracy, with the people voting (electronically) on important issues like this. (The problem, of course, is that the people with the power to change things are the ones who would lose most by giving up some of that power.) https://petition.parliament.uk/ is a start, I guess.

Sunday, 27 March 2016

Are data scientists just "research parasites"?

Although it passed me by at the time, the New England Journal of Medicine - a highly respected top-tier medical journal - featured an editorial on data sharing [1] in January. It was so bad that the International Society for Computational Biology (ISCB) felt the need to respond in the most recent issue of PLoS Computational Biology [2]. I’m glad they did, for the editorial was awful.

It starts quite well:

The aerial view of the concept of data sharing is beautiful. What could be better than having high-quality information carefully reexamined for the possibility that new nuggets of useful data are lying there, previously unseen? The potential for leveraging existing results for even more benefit pays appropriate increased tribute to the patients who put themselves at risk to generate the data. The moral imperative to honor their collective sacrifice is the trump card that takes this trick.

But then rapidly goes downhill:

However, many of us who have actually conducted clinical research, managed clinical studies and data collection and analysis, and curated data sets have concerns about the details. The first concern is that someone not involved in the generation and collection of the data may not understand the choices made in defining the parameters. Special problems arise if data are to be combined from independent studies and considered comparable. How heterogeneous were the study populations? Were the eligibility criteria the same? Can it be assumed that the differences in study populations, data collection and analysis, and treatments, both protocol-specified and unspecified, can be ignored?

Many of us who have actually conducted data analysis would retort: if you have concerns about the details, then you should be making those details clear. If choices are important, explain them! For sure, you cannot just blindly combine multiple datasets that have different biases etc., but what decent scientist would do that (without an explicit caveat regarding that assumption)?

Longo and Drazen seem to be implying that all data scientists are bad scientists. As I’ve said before, bioinformatics is just like bench science and should be treated as such. If you are making dodgy assumptions about data, you are doing it wrong. (Though people do make mistakes - the data collectors too.)

It gets worse:

A second concern held by some is that a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”

Apparently, some people might think I am a “research parasite” because I sometimes analyse other people’s (published) data without talking to them about it. I’m glad the ISCB called them out on this. Newsflash: science only makes progress by people trying to disprove what other researchers (and, ideally, themselves) have posited. Science is a shared endeavour. If someone uses your data to do something (good), good! If you don’t want that, embargo the data or delay publication. Then question your motives; if glory is what you seek, perhaps you’re in the wrong profession?

A researcher frightened of “stolen productivity” is perhaps a researcher struggling for ideas. (I’d love someone else to answer some of the questions I have kicking around so that I could move on to the next thing!) A researcher scared of someone trying “to disprove what the original investigators had posited” has bigger problems.

The rest of the editorial is not so bad, as it tells the tale of a fruitful collaboration between “new investigators” and “the investigators holding the data”. Of course, this is the ideal scenario, short of generating the data themselves. The fact that the authors felt the need to stress this - and their language of “symbiosis” versus “parasitism” - demonstrates that Longo and Drazen are utterly clueless about the modus operandi of the disciplines they discredit. Whilst ideal, direct collaboration is not always feasible. Sometimes - when the original investigators are too attached to their pet hypothesis or conclusion - it is not even desirable.

They end:

How would data sharing work best? We think it should happen symbiotically, not parasitically. Start with a novel idea, one that is not an obvious extension of the reported work. Second, identify potential collaborators whose collected data may be useful in assessing the hypothesis and propose a collaboration. Third, work together to test the new hypothesis. Fourth, report the new findings with relevant co-authorship to acknowledge both the group that proposed the new idea and the investigative group that accrued the data that allowed it to be tested. What is learned may be beautiful even when seen from close up.

This sounds OK - and the described model may even be data sharing at its best - but the implication that anything short of this ideal is somehow inadequate is naive and unhelpful.

First, one person’s novel idea is another person’s obvious extension. And anyway, why should having one idea give you automatic rights to all obvious extensions?! Why should the rest of us trust the data gatherers to do a good job - especially if they exhibit attitudes towards data akin to these authors?

Second, identifying a potential collaborator does not guarantee collaboration. Ironically, the kind of paranoid narcissist that would use a term like “research parasite” is unlikely to be open to collaboration.

Third, citation is itself a form of acknowledgement of “the investigative group that accrued the data”. Wanting full co-authorship where additional intellectual input is not required is just greedy. (And a note to the narcissist: self-citations are generally seen as lower impact than citations by wholly independent groups.)

Longo and Drazen should stick to commenting on what they know, whatever that is, and leave data scientists to worry about how they conduct themselves. With this editorial, they have done everyone - not least themselves - a deep disservice.


  1. Longo DL, Drazen JM (2016) Data Sharing. N Engl J Med 374(3): 276–277. doi:10.1056/NEJMe1516564.

  2. Berger B, Gaasterland T, Lengauer T, Orengo C, Gaeta B, Markel S, et al. (2016) ISCB’s Initial Reaction to The New England Journal of Medicine Editorial on Data Sharing. PLoS Comput Biol 12(3): e1004816. doi:10.1371/journal.pcbi.1004816.

Saturday, 26 March 2016

Meet the world's newest lifeform: Syn 3.0

Every now and then, a piece of science is done that is truly ground-breaking and world-changing. One such piece is:

Hutchison III CA et al. (2016) Design and synthesis of a minimal bacterial genome. Science 351(6280): aad6253-1. DOI: 10.1126/science.aad6253

Science has a summary here but it’s worth reading the whole paper. Syn 3.0 itself is pretty impressive, but what’s even more impressive is the approach taken to make it. In addition to using current knowledge of fundamental biological machinery, the Venter group used large-scale transposon mutagenesis and selection to identify additional genes that were either essential (i.e. no growth without them) or “quasi-essential”, where removal resulted in a major growth deficit.

They also had to overcome the problem of redundancy: even in a genome as reduced as that of Mycoplasma species, there can sometimes be multiple genes that do the same thing. Removing either one alone makes little difference, but removing both is lethal - something hard to identify when knocking out single genes one at a time. Whatever the Intelligent Design crowd would like to believe, biology is messy.
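To make the redundancy problem concrete, here is a toy sketch in Python. It is not the Venter group's method (they used transposon mutagenesis on real cells); the gene names, the functions they provide and the "viable if every required function is covered" rule are all hypothetical simplifications. It shows how single-gene knockouts flag only some genes as essential, while a redundant pair only reveals itself when both members are removed together.

```python
from itertools import combinations

# Hypothetical toy genome: each gene maps to the single function it provides.
# geneB and geneC are redundant: both provide "transport".
GENOME = {
    "geneA": "replication",
    "geneB": "transport",
    "geneC": "transport",
    "geneD": "translation",
    "geneE": "motility",  # assumed not required in rich medium
}

# Functions assumed necessary for growth in this toy model.
REQUIRED = {"replication", "transport", "translation"}


def viable(genes):
    """Toy rule: a cell is viable if every required function is still provided."""
    provided = {GENOME[g] for g in genes}
    return REQUIRED <= provided


def single_knockout_essentials(genome):
    """Genes whose individual removal is lethal."""
    all_genes = set(genome)
    return {g for g in all_genes if not viable(all_genes - {g})}


def synthetic_lethal_pairs(genome):
    """Pairs of individually dispensable genes whose joint removal is lethal."""
    all_genes = set(genome)
    dispensable = sorted(all_genes - single_knockout_essentials(genome))
    return [
        (a, b)
        for a, b in combinations(dispensable, 2)
        if not viable(all_genes - {a, b})
    ]


if __name__ == "__main__":
    print(sorted(single_knockout_essentials(GENOME)))  # ['geneA', 'geneD']
    print(synthetic_lethal_pairs(GENOME))              # [('geneB', 'geneC')]
```

Note that a naive approach which deletes every gene found to be individually dispensable, all at once, would remove both geneB and geneC here and kill the cell: exactly the trap that redundancy sets.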

Of course, Syn 3.0 is just the start, as the goal was making a “minimal cell”:

“A minimal cell is usually defined as a cell in which all genes are essential. This definition is incomplete, because the genetic requirements for survival, and therefore the minimal genome size, depend on the environment in which the cell is grown. The work described here has been conducted in medium that supplies virtually all the small molecules required for life. A minimal genome determined under such permissive conditions should reveal a core set of environment-independent functions that are necessary and sufficient for life. Under less permissive conditions, we expect that additional genes will be required.”

Robust life will therefore need a lot more genes. It will be interesting to see how many are required for autotrophy - life that needs only inorganic chemicals and an energy source.

Even within the “minimal cell” concept, Syn 3.0 represents a somewhat arbitrary end-point. In identifying the “quasi-essential” genes, a judgement had to be made regarding what constitutes an acceptable growth rate*. Whittling down to 473 genes is impressive, but this number could no doubt be even smaller if slower growth rates were accepted. (Modern life is in competition with lots of other highly evolved organisms. Early life would have been able to get by with much lower growth rates, so this is not a “minimal cell” in that context.)

There is also a lot of exciting potential ahead for manually reducing the number of genes by true intelligent design. Fusing interacting gene products together, for example, might eliminate the need for so many genes contributing to core processes. (Looking for apparent protein fusion/fission events in evolution is a reasonably successful method for predicting protein-protein interactions.) With time, we might be able to “wind back the clock” and remove some of the unnecessary complexity that has probably crept into the system due to the underlying evolutionary process.

I also wonder how many of the current crop of genes of unknown function - a surprising 149 genes - can be replaced over time with genes of known function. (In other words, how many of them represent convergent evolution of functions we already know about, in forms we do not yet recognise?) And how many of the rest are genome-/condition-specific?

Like all of the best science, this work opens the door to more questions than it answers! Some exciting times ahead, I think.

[*The important but oft-overlooked concept that any assessment of life is context- and environment-dependent exposes another flaw with Intelligent Design as a testable hypothesis: designed to do what? To assess how well-designed something is, one needs to know its purpose and/or the acceptable design traits. To hide from the fact that Intelligent Design is Creationism, supporters often make the argument that the identity of the designer (Creator) is not important - but without knowledge of the designer, how can one predict the motivation behind the design?]

The blood of dinosaurs

Courtesy of sciencegasm. Remember this when you see any cute Easter chicks*...

[*Yes, I know it’s not a baby chicken. Some kind of gull, maybe?]

Friday, 25 March 2016

The solution to extremism is not more extremism

Like all civilised beings, I was dismayed (if not shocked) at the recent events in Brussels. Until our world is rid of religious extremism, I fear that such terrorist atrocities will always be a part of it. However, the way to combat religious extremism is not political extremism.

Terrible as these events were, the damage and death toll are slight compared to a natural disaster, or even a bad plane crash. Just as we do not stop flying, or living in earthquake zones, neither should we go for a Trumpesque (albeit temporary) ban on Muslim immigrants, nor a right-wing blame-immigrants-for-everything backlash against refugees.

Caitlin Moran nailed it with a tweet:

Of course, some people missed the point, retorting that the terrorists were home-grown extremists - even more reason not to blame the refugees - or that not all refugees are fleeing ISIS. To me, though, the sentiment is sound. Caitlin is not talking about those specific guys, but rather the kind of regressive human who sees indiscriminate violence (whether motivated by religious or political ideologies) as an acceptable means to their desired end.

More recently, the poet Brian Bilston summed it up even better, with the preface:

Here is a new poem entitled “Refugees”.

Please bear with it.

If the terrorists succeed in making us lose our humanity and sink to their level, they win. If we live in a climate of fear due to an inflated sense of the risk posed by a scattering of abhorrent individuals, they win. Surely, a civilised society can do better than that?

Wednesday, 23 March 2016

Cassetteboy has done it again

Normally, the phrase for this kind of thing would be: “It’s funny, because it’s true.” However, being true makes this tragic. (But still funny at the same time.)

Emperor's New Clothes rap | Cassetteboy



And if you missed it the first time, check out Cassetteboy’s prophetic Cameron’s Conference Rap from 2014: