
Friday, 27 January 2017

In Defense of Science

From the Evolution Directory (evoldir) mailing list today:

Governmental scientists employed at a subset of agencies have been forbidden from presenting their findings to the public. We have drafted the following response for distribution, and encourage other scientists to post it to their websites, when feasible.

Graham Coop
Professor of Evolution and Ecology
UC Davis

Michael B. Eisen
Professor of Molecular and Cell Biology
UC Berkeley

Molly Przeworski
Professor of Biological Sciences
Columbia University


The message, for any affected US scientists out there:

We are deeply concerned by the Trump administration’s move to gag scientists working at various governmental agencies. The US government employs scientists working on medicine, public health, agriculture, energy, space, clean water and air, weather, the climate and many other important areas. Their job is to produce data to inform decisions by policymakers, businesses and individuals. We are all best served by allowing these scientists to discuss their findings openly and without the intrusion of politics. Any attack on their ability to do so is an attack on our ability to make informed decisions as individuals, as communities and as a nation.

If you are a government scientist who is blocked from discussing their work, we will share it on your behalf, publicly or with the appropriate recipients. You can email us at USScienceFacts@gmail.com.

I’ve also heard a rumour that Michael Eisen is running for the Senate. That would be cool - we need fewer Trumps and more science-savvy politicians.

Sunday, 27 March 2016

Are data scientists just "research parasites"?

Although it passed me by at the time, the New England Journal of Medicine - a highly respected top-tier medical journal - featured an editorial on data sharing [1] in January. It was so bad that the International Society for Computational Biology (ISCB) felt the need to respond in the most recent issue of PLoS Computational Biology [2]. I’m glad they did, for the editorial was awful.

It starts quite well:

The aerial view of the concept of data sharing is beautiful. What could be better than having high-quality information carefully reexamined for the possibility that new nuggets of useful data are lying there, previously unseen? The potential for leveraging existing results for even more benefit pays appropriate increased tribute to the patients who put themselves at risk to generate the data. The moral imperative to honor their collective sacrifice is the trump card that takes this trick.

But then rapidly goes downhill:

However, many of us who have actually conducted clinical research, managed clinical studies and data collection and analysis, and curated data sets have concerns about the details. The first concern is that someone not involved in the generation and collection of the data may not understand the choices made in defining the parameters. Special problems arise if data are to be combined from independent studies and considered comparable. How heterogeneous were the study populations? Were the eligibility criteria the same? Can it be assumed that the differences in study populations, data collection and analysis, and treatments, both protocol-specified and unspecified, can be ignored?

Many of us who have actually conducted data analysis would retort: if you have concerns about the details, then you should be making those details clear. If choices are important, explain them! For sure, you cannot just blindly combine multiple datasets that have different biases etc., but what decent scientist would do that (without an explicit caveat regarding that assumption)?
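To see why blindly pooling heterogeneous studies is so dangerous, here is a toy sketch in Python. The numbers are entirely hypothetical (a classic Simpson’s paradox setup, not data from any real trial): two studies with different case mixes each show the treatment winning, yet naive pooling reverses the conclusion.

```python
# Hypothetical (successes, total) counts from two studies with different
# case mixes: study A enrolled mostly mild cases, study B mostly severe.
studies = {
    "study_A": {"treated": (81, 87), "control": (234, 270)},
    "study_B": {"treated": (192, 263), "control": (55, 80)},
}

def rate(successes, total):
    return successes / total

# Within each study, the treated arm has the higher success rate...
for name, arms in studies.items():
    t = rate(*arms["treated"])
    c = rate(*arms["control"])
    print(f"{name}: treated {t:.2f} vs control {c:.2f}")

# ...but naively summing the counts across studies flips the result,
# because treatment allocation is confounded with case severity.
pooled_t = rate(*map(sum, zip(*(s["treated"] for s in studies.values()))))
pooled_c = rate(*map(sum, zip(*(s["control"] for s in studies.values()))))
print(f"pooled:  treated {pooled_t:.2f} vs control {pooled_c:.2f}")
```

The fix is not to avoid reanalysis, of course - it is to document the eligibility criteria and case mix so that a reanalyst knows to stratify rather than pool.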

Longo and Drazen seem to be implying that all data scientists are bad scientists. As I’ve said before, bioinformatics is just like bench science and should be treated as such. If you are making dodgy assumptions about data, you are doing it wrong. (Though people do make mistakes - the data collectors too.)

It gets worse:

A second concern held by some is that a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”

Apparently, some people might think I am a “research parasite” because I sometimes analyse other people’s (published) data without talking to them about it. I’m glad the ISCB called them out on this. Newsflash: science only makes progress by people trying to disprove what other researchers (and, ideally, themselves) have posited. Science is a shared endeavour. If someone uses your data to do something (good), good! If you don’t want that, embargo the data or delay publication. Then question your motives; if glory is what you seek, perhaps you’re in the wrong profession?

A researcher frightened of “stolen productivity” is perhaps a researcher struggling for ideas. (I’d love someone else to answer some of the questions I have kicking around so that I could move on to the next thing!) A researcher scared of someone trying “to disprove what the original investigators had posited” has bigger problems.

The rest of the editorial is not so bad, as it tells the tale of a fruitful collaboration between “new investigators” and “the investigators holding the data”. Of course, this is the ideal scenario, short of the new investigators generating the data themselves. The fact that the authors felt the need to stress this - and the language used of “symbiosis” versus “parasitism” - demonstrates that Longo and Drazen are utterly clueless about the modus operandi of the disciplines they discredit. Whilst ideal, direct collaboration is not always feasible. Sometimes - when the original investigators are too attached to their pet hypothesis or conclusion - it is not desirable.

They end:

How would data sharing work best? We think it should happen symbiotically, not parasitically. Start with a novel idea, one that is not an obvious extension of the reported work. Second, identify potential collaborators whose collected data may be useful in assessing the hypothesis and propose a collaboration. Third, work together to test the new hypothesis. Fourth, report the new findings with relevant co-authorship to acknowledge both the group that proposed the new idea and the investigative group that accrued the data that allowed it to be tested. What is learned may be beautiful even when seen from close up.

This sounds OK - and the described model may even be data sharing at its best - but the implication that anything short of this ideal is somehow inadequate is naive and unhelpful.

First, one person’s novel idea is another person’s obvious extension. And anyway, why should having one idea give you automatic rights to all obvious extensions?! Why should the rest of us trust the data gatherers to do a good job - especially if they exhibit attitudes towards data akin to these authors?

Second, identifying a potential collaborator does not guarantee collaboration. Ironically, the kind of paranoid narcissist that would use a term like “research parasite” is unlikely to be open to collaboration.

Third, citation is itself a form of credit that acknowledges “the investigative group that accrued the data”. Wanting full co-authorship where additional intellectual input is not required is just greedy. (And a note to the narcissist: self-citations are generally seen as lower impact than citations by wholly independent groups.)

Longo and Drazen should stick to commenting on what they know, whatever that is, and leave data scientists to worry about how they conduct themselves. With this editorial, they have done everyone - not least of which themselves - a deep disservice.


  1. Longo D.L., Drazen J.M. Data Sharing. N Engl J Med, 2016. 374(3): p. 276–7. doi:10.1056/NEJMe1516564.

  2. Berger B, Gaasterland T, Lengauer T, Orengo C, Gaeta B, Markel S, et al. (2016) ISCB’s Initial Reaction to The New England Journal of Medicine Editorial on Data Sharing. PLoS Comput Biol 12(3): e1004816. doi:10.1371/journal.pcbi.1004816.

Saturday, 5 July 2014

STAP retractions are both a failing and a triumph of science

It was looking inevitable and this week two high profile Nature articles on “STAP” (stimulus-triggered acquisition of pluripotency) stem cells were finally retracted in Nature:

Several critical errors have been found in our Article and Letter (http://dx.doi.org/10.1038/nature12969), which led to an in-depth investigation by the RIKEN Institute. The RIKEN investigation committee has categorized some of the errors as misconduct (see Supplementary Data 1 and Supplementary Data 2). Additional errors identified by the authors that are not discussed in RIKEN’s report are listed below.

...

We apologize for the mistakes included in the Article and Letter. These multiple errors impair the credibility of the study as a whole and we are unable to say without doubt whether the STAP-SC phenomenon is real. Ongoing studies are investigating this phenomenon afresh, but given the extensive nature of the errors currently found, we consider it appropriate to retract both papers.

Nature cover the retractions in an editorial, “STAP retracted”, which runs with the tagline,

“Two retractions highlight long-standing issues of trust and sloppiness that must be addressed.”

You can get a sense of those issues from the retraction statement and the editorial, which concludes:

“we and the referees could not have detected the problems that fatally undermined the papers. The referees’ rigorous reports quite rightly took on trust what was presented in the papers.”

They also highlight “sloppiness” in science, manifest as a “growth in the number of corrections reported in journals in recent years”. (Something not helped, in my opinion, by high-profile journals such as Science and Nature burying so many of the important methods in Supplementary Data, which is rarely reviewed or edited as critically as material in the main text body.)

You can read more about those issues in the editorial and elsewhere, such as the Faculty of 1000 blog. The STAP papers, their initial irreproducibility and eventual retraction highlight potential failings of the current scientific system, which places far too much emphasis on output quantity and impact rather than (true) quality and integrity.

However, they also highlight the tremendous success of the scientific system.

The fact is, the experiments were repeated, the failure to reproduce results was documented, suspicions were raised and investigations made. Science works because, ultimately, you cannot fake it. Whatever data you make up, whatever results you misinterpret, whatever sloppiness leads to “conclusions [that] seem misleadingly robust”, the truth will out eventually. You cannot hoodwink nature.

And that is why science remains far and away the best (probably only) method we have for establishing the truth about reality. The system may be flawed, it may waste money and it may leave poor unsuspecting suckers chasing wild geese, but eventually it will self-correct. So, whilst I would never put my trust in individual scientists (unless they have earnt it) or individual results, and I remain skeptical of every new claim, I still emphatically trust science itself.

Wednesday, 21 May 2014

Mary Anning (21 May 1799 – 9 March 1847)

Today’s Google Doodle is one of my favourites, celebrating the life (or 215th birthday) of Mary Anning, a palaeontologist who discovered many fossils along the Lyme Regis coast, including the first complete ichthyosaur skeleton (at age 12, after her brother found the skull) and the first plesiosaur.

Lyme Regis is just down the coast from where we used to live in Southampton and it really has a fantastic shoreline, part of the Jurassic Coast. We paid a visit with a friend in 2011 and although we did not find any ichthyosaurs or plesiosaurs, there were plenty of ammonites to be found in the rocks. It is very humbling to look at something that died tens to hundreds of millions of years ago and has been sitting in a rock ever since, waiting to be found.

I think that there was a cast of Mary Anning’s ichthyosaur at the Lyme Regis Museum, which is built on the site of her birthplace (or it might have been Dinosaurland Fossil Museum, which is in her old church). Well worth a visit if you find yourself on the Dorset coast!

Thursday, 21 November 2013

RIP Fred Sanger (1918-2013)

I opened my email this morning to the news that Fred Sanger had died. This was not entirely surprising, given that he was 95, but still sad. Although I never met him, I think it is fair to say that I am one of many scientists whose careers have been shaped and influenced by the work of this great scientist.

I still remember sitting in lectures as an undergraduate and discovering how “Sanger” sequencing worked - like many of the ideas that change the world, it was gloriously simple and yet spectacularly clever. And, I think it is fair to say, it changed the face of biology forever.

Indeed, that was back in 1977, and Sanger sequencing is still used all over the world today, even in the face of stiff competition from “Next Generation” methods. It was the sequencing method (albeit in a much tweaked and automated version) that got us the Human Genome and one of the world’s leading sequencing centres - the Wellcome Trust Sanger Institute at Hinxton, outside Cambridge - still bears his name.
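The chain-termination idea really is gloriously simple, and it can even be caricatured in a few lines of Python. This is a toy sketch, not the wet-lab protocol: it pretends that in each of the four lanes a dideoxy base can terminate extension at every position where that base is incorporated, producing fragments whose lengths mark those positions; sorting all fragments by length (as the gel does) then spells out the newly synthesized strand.

```python
def read_gel(template):
    """Toy Sanger readout: recover the synthesized strand from fragment lengths."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    # The polymerase synthesizes the complement of the template.
    synthesized = "".join(complement[b] for b in template)
    # One "lane" per base: fragment lengths where a ddNTP of that base
    # could have terminated extension.
    lanes = {base: [i + 1 for i, b in enumerate(synthesized) if b == base]
             for base in "ACGT"}
    # Shortest fragments run furthest down the gel, so sorting by length
    # reads the sequence from position 1 upwards.
    fragments = sorted((length, base)
                       for base, lengths in lanes.items()
                       for length in lengths)
    return "".join(base for _, base in fragments)

print(read_gel("ATGGCTTAC"))  # TACCGAATG
```

All the cleverness of the real method - balancing dNTP/ddNTP ratios so terminations occur at every position, and separating fragments that differ by a single base - is hidden inside that `sorted` call, which is rather the point: the concept fits in your head.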

The centre has a press release about the “remarkable man”, which has been written by greater wordsmiths than I:

“Fred Sanger, who died on Tuesday 19 November 2013, aged 95, was the quiet giant of genomics, the father of an area of science that we will explore for decades to come.

His achievements rank alongside those of Francis Crick, James Watson and Rosalind Franklin in discovering the structure of DNA. We are proud that he graciously agreed to allow our Institute to be named after him.

In research marked by two Nobel Prizes, he developed methods that allow us to determine the order of the building blocks of DNA and of proteins. This technique allowed the languages of life to be read.

Because of Fred’s work we have been able to interpret those languages and to use that knowledge for good.”

There is more, including quotes and links out to other resources about his work, at the site.

I remember thinking in those lectures back in Nottingham how I wished that one day I might have an idea as good as Sanger sequencing. I doubt that I ever will; instead, I will just have to settle for trying to do the best I can with all of the amazing sequence data that now exists as a result.