Monday, 20 October 2014

Do we need a bioinformatics dead links database?

Keith Bradnam on his ACGT blog has recently highlighted the irksome issue of “link rot” - the unfortunate but all-too-common scenario in which a publication points to a URL that ceases to exist after an oft-too-brief time has elapsed. (In severe cases, before the paper is even published!)

Personally, I think that journals need to take greater responsibility for maintaining the integrity of their publications. Surely it is not beyond a big publishing house to have a bot that systematically checks their published links once a week and reports any errors? Authors could be notified and/or online manuscripts updated. In severe cases where the URL is critical to the paper, such as databases and webservers, lack of repair could even lead to retraction.
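To give a feel for how little such a bot would need, here is a minimal Python sketch using only the standard library. The function names (fetch_status, find_broken_links) and the User-Agent string are my own illustrative choices, not anything a publisher actually runs, and it assumes that a status of 400 or above (or no response at all) counts as a broken link.

```python
from http.client import HTTPException
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if no response was received."""
    # HEAD avoids downloading the page body. Some servers reject HEAD,
    # so a real bot might retry with GET before reporting a failure.
    request = Request(url, method="HEAD", headers={"User-Agent": "link-rot-check"})
    try:
        # urlopen follows redirects, so this is the status of the final page.
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except (OSError, ValueError, HTTPException):
        # DNS failure, refused connection, timeout or malformed URL.
        return 0


def find_broken_links(
    urls: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, int]:
    """Map each failing URL to its status code (0 means no response)."""
    broken = {}
    for url in urls:
        status = fetch(url)
        if status == 0 or status >= 400:
            broken[url] = status
    return broken
```

Calling find_broken_links on the list of URLs extracted from a paper returns only the failures, which could then be emailed to the corresponding author. Running it once a week is a job for whatever scheduler is to hand (cron, for example). The fetch argument is there so that the checking logic can be tested without touching the network.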

The problem goes beyond publications, however. For (sometimes mysterious) reasons of their own, even established biological resources quite often change their URLs too. For example, Pfam embeds the Wikipedia entry for a domain into their pages. I was browsing the globin page last week and clicked on a link to see the entry in Interpro, only to be greeted with:

Sorry, the requested URL has changed

The requested URL: http://www.ebi.ac.uk/interpro/IEntry/ac=IPR002338

has moved to

/interpro/entry/IPR002338;jsessionid=BB50E35E3DD96F6C83B7AD432B8E2E2A

You will automatically be redirected to the correct URL in a few seconds. Please update your bookmarks.

It redirected OK but I wonder how long it will be before that redirect itself disappears… and how many Wikipedia articles and database cross-references run the risk of breaking.

Perhaps we need a database where bioinformaticians can report, track, update and hopefully fix all these rotten links? In the meantime, if anyone spots any in my own webpages or papers, please let me know!

The perils of having a (soft) cat

Why Evolution is True posted this cartoon from lunarbaboon last week:

It’s never stopped me going to work but I’d be lying if I said that something like this (without the shoes) had not delayed me going to bed once or twice.

Monday, 13 October 2014

Every conference should be like ABiC 2014 (#ABiC14)

Last weekend, I attended the inaugural Australian Bioinformatics Conference (ABiC 2014) in Melbourne. (Twitter tag #ABiC14 so as not to be confused with the ABIC Foundation conference in California.) I’ve been to quite a few conferences in my time, including a fair few bioinformatics conferences. The latter (in my experience) are often quite dull affairs - good for meeting people and “networking” but not so good for interesting science beyond the invited keynote speakers.

Happily, ABiC14 went far beyond expectations. I signed up because it is important for me as a relative newbie to meet more Australian bioinformaticians. This was accomplished but was only the start of the positive experience of this conference. In fact, the conference was so good that I thought it worth posting its successful attributes/decisions as a model for future conferences (and reminder to myself should I organise one).

Here then, in approximate chronological order, are my top take-home quality attributes of ABiC 2014:

Organisation. The conference website was well organised and clear prior to the meeting. The program outline was available in good time, enabling planning, with abstracts etc. available in a single PDF. The only real improvement I can think of would be to have a printed program (just the outline, not full abstracts) provided upon arrival. However, because there were no last minute changes to the program, this turned out to be a moot point.

Coffee. Science in general, and bioinformatics specifically, is fuelled by coffee. The Aussies do coffee well and ABiC 2014 had the genius idea of a sponsored barista who made a decent cup of fresh coffee to order. I took advantage of this on arrival, which really set me up for the morning session and got the whole thing off to a great start.

Venue. The Royal Children’s Hospital was a great venue. Access by public transport was easy. The lecture theatre was comfortable and a good size. Eduroam provided free WiFi access (even if HTTP sites were unavailable for some reason). Coffee breaks, lunch, and poster sessions were near to the lecture theatre and promoted mingling.

Size. I’m not sure what the final number of delegates at ABiC 2014 was but for me it was the perfect number. (I would guess around 180, give or take.) There were enough people for some interesting diversity, presentations and posters but not so many that you could not actually chat to people and visit all the interesting posters.

Interesting Science! This is clearly one of the most important aspects of a good conference but sadly one that is often not achieved by a bioinformatics meeting (beyond the keynotes). I think there were several contributing factors to this. First, there were not too many talks and no parallel sessions. There were twenty talks in total, in six sessions of 3-4 talks of 10-50 mins each. Each session kicked off with an invited speaker. By keeping the quantity down, the quality of the talks - and stamina of the audience - was maintained at a very high level. I was expecting to “zone out” and struggle to maintain concentration a few times but I found my attention captured by each speaker. Talk quality was further enhanced by…

No published conference proceedings. Because of the large computer science community within bioinformatics, a lot of bioinformatics conferences have the submitted work published as papers in conference proceedings or a journal special issue. Even when the journal is a good one, this massively decreases the likelihood of getting interesting science, which will normally be either (a) published in a higher impact biology or general science journal, or (b) not yet ready for publication. As a result, talks tend to be more technical and specialist… and boring. ABiC 2014 avoided this trap.

Food. You don’t go to a conference for the food but having tasty snacks at the breaks and at lunch does add to a general vibe of happiness and satisfaction.

Beer. Scientists and computer geeks are not always the most extrovert of individuals, and a little something to grease the wheels of social interaction never goes amiss. The ABiC committee went as far as to establish the beverage categories of choice with a pre-conference questionnaire. They gained extra brownie points from me for the presence of Fat Yak among the offerings.

Posters. This is largely a composite of the above points but the poster session was one of the better ones I have attended. For a start, the posters were up for the entire conference. It really bugs me when conferences do not do this and it all comes down to choice of venue - so well done ABiC for selecting a venue with this capacity. Furthermore, the poster location had great access - being a wide corridor - and was between the auditorium and the morning/afternoon drinks and snacks, maximising quality exposure. Finally, restricting the number of talks saved lots of interesting science for the posters!

Conference Dinner. If you are lucky, the conference dinner will be one of the best networking points of a conference, as your table represents a fairly captive audience with whom you have to interact. Nevertheless, conferences are generally charged to grants and researchers are on a budget, so it is frustrating to be forced to spend a load of cash on a meal that rarely lives up to the expectation of the price tag. ABiC 2014 took a pragmatic solution of booking out a restaurant and bar, and everyone paying for their own meal and drinks ordered off a standard single course menu. It worked really well. The food was good (but not excessive) as was the company. There was only one course but you never need more than that when there are morning and afternoon snacks in the breaks.

Timing. As far as I could tell, the talks generally seemed to stick to time, so coffee/lunch breaks were not truncated. Again, by keeping the number of talks (and talks per session) low, this was easier to achieve. Nonetheless, it's still something that is not achieved by all conferences. (I remember one conference in which my talk started 5 minutes after it was supposed to finish!)

Twitter. I’m a bit sporadic on Twitter at the best of times and not entirely convinced that live tweeting in a conference is not generally more of a distraction than a benefit - for those present, at least. However, I did enjoy having the #ABiC14 twitter feed open during the meeting and felt that it added to the sense of community. Of course, the aforementioned interest of the talks (and the smallish size of the meeting) limited the distraction.

As of the end of ABiC 2014, there was no plan for an ABiC 2015. I, for one, really hope that there is one - and that the organisers are able to stick to the same model.

Sunday, 5 October 2014

Wikipedia joker almost prescient as Rabbitohs win their first NRL Grand Final for over forty years

Congratulations to the South Sydney Rabbitohs, who have just won the Australian National Rugby League Grand Final 2014. The Rabbitohs are our local team, so we watched the game (on TV) - and it was a cracker!

Although I must confess that it was the first NRL game that I’ve watched properly (being more familiar with Union back in the UK & Ireland), I must also admit that I don’t think it will be my last. It's pretty brutal but very exciting.

I decided to do a bit of reading up on the Rabbitohs before the match and was surprised to see that Wikipedia already listed a Rabbitohs win! It was taken down at some point but I got a screen grab first (right). The final score was 30-6, so the jokester in question was only a couple of digits out!

The game started with a crunching tackle which broke the cheekbone of Bunnies’ hero and man-of-the-match (Clive Churchill Medal winner) Sam Burgess. Burgess went on to play the entire game with a broken cheekbone, reminiscent of their 1970 victory when the skipper played 70 minutes with a broken jaw! Tough game, tough players!

Thursday, 18 September 2014

Why are kinases called kinases?

Kinases are one of the biggest and most important classes of enzymes (i.e. proteins with catalytic activity) in biology. You can always recognise an enzyme because of the suffix “-ase”. What comes before the -ase then indicates the nature of the enzymatic activity.

The major classes of enzyme have names that give a good clue as to the general nature of this activity. Oxidoreductases, for example, catalyse oxidation-reduction (“redox”) reactions. Transferases transfer chemical groups from one molecule to another. Kinases are transferases: they transfer a phosphate group from one organic molecule (usually ATP, the cell’s primary energy carrier) to another (a protein, lipid or carbohydrate). And this is actually the origin of the kin- part of the name: from the Greek kinein “to move”.

For those wondering, there is also such a thing as a phosphorylase but this does something subtly different; whereas a kinase transfers organic phosphate groups (i.e. phosphates attached to carbon-based biomolecules), a phosphorylase transfers inorganic phosphate groups (i.e. phosphate+hydrogen) to acceptor biomolecules.

But why -ase? This stems from the ending of diastase, the first enzyme ever discovered. According to Wikipedia:

“The name “diastase” comes from the Greek word διάστασις (diastasis) (a parting, a separation) because when beer mash is heated, the enzyme causes the starch in the barley seed to transform quickly into soluble sugars and hence the husk to separate from the rest of the seed.”

So there you have it. A kinase is an early example of an enzyme that moves something from one molecule to another, hence a name that literally means “an enzyme to move”.

Thursday, 7 August 2014

How (not) to apply for a PhD

As with most academics, I get a fair number of unsolicited enquiries about possible PhD placements. Unfortunately, a number of these appear to be from students who are receiving little or no advice regarding how to go about making an application.

Every now and then, an application stands out from the bunch for being particularly good or bad. A while back, I received one of the latter, which made me so sad that I thought I would turn my response into a post. What made it particularly tragic was that it was from a student who had received government funding to study abroad, and was therefore in a fairly strong position.

The email (name redacted) was as follows:

On [DATE] "XXX baby" <XXX@yahoo.com> wrote:

Subject: Hello doctor

My name is XXX, I finished M.Sc. degree in XXX University in Iraq at (2012), I have obtained a fund from the Iraqi government to study PHD in Microbiology in the Ustralia.

I had opportunity to be a student of Iraq and My interests are about Bacteriology and Immunity in general and I really hope to get your kind acceptance to pursue my PHD under your supervision in University of UNSW.

Kindly find attached my C.V please which I hope it gives detailed overview about me

Thank you very much

The CV was then attached as several JPEGs of scanned pages. Unusual attachments plus a username of “XXX baby” and subject line of “Hello doctor” meant that this one almost went straight in the bin as spam. Given the number of typos and other errors, it might have been better if it had.

I’ve had some others that were almost as bad, including one that started “Hello Sir !” and proceeded to end every sentence with an exclamation mark! Yes! Every sentence! No! That’s not a good idea!

Lest I get misunderstood, I must stress that the point here is not to be mean to these students. The issue is that they are clearly not getting the advice they need, especially given the fact that they are writing in their second (plus) language and applying to academics with a different culture.

Here then, is my advice/guidance for those wanting to make an unsolicited application for a PhD studentship (though most points still apply if a project is being advertised). I get many applications from overseas students. If I am even to consider you as a potential student then you need to impress me. The following impress me:

Professionalism. Send a well structured email, with a sensible subject such as “PhD enquiry” and a CV (if attached) that is provided as a single sensibly named PDF (or docx), i.e. your name and “CV” feature somewhere in there.

Genuine Interest. Personalise your message and provide some indication that you really know who I am. “Dear Sir” indicates a blanket mailshot to all and sundry and is thus destined for the bin. Knowing who I am is not enough, though. I also want an indication that you know and understand at least something of the research that goes on in my lab. Referencing degree subjects or research experience that match neither my background nor research focus indicates poor research/understanding. A PhD is long, hard graft and I need to know that you have genuine interest or everyone’s time will be wasted.

A clear CV. Your CV should have relevant skills and metrics highlighted. If you are from overseas, remember that I probably do not know what your grades mean, so place them in context. What proportion of students get those grades/medals etc.? This is a research post, so describe some of your research projects and your role in them. When it comes to CVs, evidence is the name of the game. Don’t just list skills and positive attributes: provide examples.

Motivation/Enthusiasm. Good grades are not enough and academic ability will only get you so far in a PhD. Motivation and enthusiasm are critical. As well as a CV that stresses relevant achievements, include a personal statement that convinces me that you want to do a PhD (with me) for the right reasons, and are likely to see it through.

Ask questions. This is basically genuine interest + motivation/enthusiasm but worth stating in its own right as intelligent questions are the evidence of those things. It's your PhD and your life - you should care about what you might be doing. The caveat is this: do not ask a question that is answered by ten minutes of reading my lab's webpages and/or paper abstracts.

Good communication skills. If English is not your first language, get your emails proof-read by someone with good English. Exclamation points after every sentence indicate that communication will be tricky, as does failure to appropriately understand/respond to emails. I am not going to think you are not keen if you take a few days to give a measured response. I am going to think that communication may be insurmountably difficult if I get a speedy response that is riddled with errors.

Funding. Unless you are applying for a specific funded project, you will need to secure your own funding. A clear indication of a funding plan is therefore crucial. If you already have funding, this is good. Better still would be to detail exactly what that funding covers (i.e. duration? fees AND living expenses? any attached conditions?) and to indicate how the funding was won and how competitive it was. Winning a competitive scholarship is one way to impress. Even better, provide evidence in the form of an official notification of funding etc.

References. Ultimately, it is very tricky to assess a student from a CV and covering letter alone. Again, evidence is the name of the game. Provide two or more faculty members or professional scientists who can provide an academic reference. These should have institution email addresses, not personal gmail/yahoo addresses, as anyone can create these and they will carry less weight.

Fail to hit most of these points and your application is in the bin. (I now have a generic response that I send out to generic applications.) This may seem harsh but there is a lot at stake and it is important to get a good fit between student, supervisor and project; a poor student/fit is a net drain on lab productivity.

A PhD is not something to embark upon lightly. It will consume many years of your life and will quite possibly determine the direction of the rest of your professional life. A PhD application should be made with all of the research, care and attention to detail that this implies.

Saturday, 5 July 2014

STAP retractions are both a failing and a triumph of science

It was looking inevitable and this week two high profile Nature articles on “STAP” (stimulus-triggered acquisition of pluripotency) stem cells were finally retracted:

Several critical errors have been found in our Article and Letter (http://dx.doi.org/10.1038/nature12969), which led to an in-depth investigation by the RIKEN Institute. The RIKEN investigation committee has categorized some of the errors as misconduct (see Supplementary Data 1 and Supplementary Data 2). Additional errors identified by the authors that are not discussed in RIKEN’s report are listed below.

...

We apologize for the mistakes included in the Article and Letter. These multiple errors impair the credibility of the study as a whole and we are unable to say without doubt whether the STAP-SC phenomenon is real. Ongoing studies are investigating this phenomenon afresh, but given the extensive nature of the errors currently found, we consider it appropriate to retract both papers.

Nature cover the retractions in an editorial, “STAP retracted”, which runs with the tagline,

“Two retractions highlight long-standing issues of trust and sloppiness that must be addressed.”

You can get a sense of those issues from the retraction statement and the editorial, which concludes:

“we and the referees could not have detected the problems that fatally undermined the papers. The referees’ rigorous reports quite rightly took on trust what was presented in the papers.”

They also highlight “sloppiness” in science, manifest as a “growth in the number of corrections reported in journals in recent years”. (Something not helped, in my opinion, by high profile journals such as Science and Nature burying so much of the important methods in Supplementary Data, which is rarely reviewed or edited as critically as material in the main text body.)

You can read more about those issues in the editorial and elsewhere, such as the Faculty of 1000 blog. The STAP papers, their initial irreproducibility and eventual retraction highlight potential failings of the current scientific system, which places far too much emphasis on output quantity and impact rather than (true) quality and integrity.

However, they also highlight the tremendous success of the scientific system.

The fact is, the experiments were repeated, the failure to reproduce results was documented, suspicions were raised and investigations made. Science works because, ultimately, you cannot fake it. Whatever data you make up, whatever results you misinterpret, whatever sloppiness leads to “conclusions [that] seem misleadingly robust”, the truth will out eventually. You cannot hoodwink nature.

And that is why science remains far and away the best (probably only) method we have for establishing the truth about reality. The system may be flawed, it may waste money and it may send poor unsuspecting suckers chasing wild geese, but eventually it will self-correct. So, whilst I would never put my trust in individual scientists (unless they have earnt it) or results, and I remain skeptical of every new claim, I still emphatically trust science itself.