Tuesday 26 April 2011

Why many research articles are wrong.

Nature is affectionately known in the research community (mostly among those who have not had a Nature paper) as the Journal of Irreproducible Results. Nature has been responsible for some publishing disasters (cold fusion comes to mind), but the other journals should not gloat.

Here are a couple of articles about why science is often wrong:
The Truth Wears Off
Significance Chasing

Word Associations

How should you visualise gigabytes of textual data?

Word arcs
Word Spectra
Other Visualisations

Data Scraping and Privacy

It is amazing to see how far "reputable" companies will go at the request of others to pry into people's personal affairs.

This is what I have come to expect from Big Pharma, but not from the scrapers.

WSJ Article

Video Game Violence

How do you objectively measure the effect of video games on levels of violence?

Here are some articles about a study that tries to create an objective measure.

Techdirt: does-being-more-vocal-video-game-violence-debate-mean-you-have-better-argument.shtml
eurekalert: Press Release

The "study" creates this objective measure by comparing the number of publications produced by each side of the argument. I wonder if they have ever heard of publication bias.

Here is another study that finds a link between video games and mental health issues:
Slashdot Article

Wednesday 2 March 2011

Impact Factors for Bioinformatics

Publish or be damned, or in this case publish and be damned.

NAR 7.9
Bioinformatics 4.9
BMC Bioinformatics 3.4
PLoS ONE 4.3
Briefings in Bioinformatics 7.3
BMC Genomics 3.8
Proteins: Structure, Function, and Bioinformatics 3.4

Sunday 27 February 2011

What are the chances?

Here is another pair of news stories from the 27th of February 2011. Both made the front page of the BBC website that day and both were about fatalities of people who had come from Tadcaster, Yorkshire.

http://www.bbc.co.uk/news/uk-england-york-north-yorkshire-12592527
http://www.bbc.co.uk/news/uk-12591428

Coincidences happen ridiculously often, so you always need to think about the data.
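
One rough way to see why such coincidences are common is a birthday-problem style calculation. The sketch below assumes each story is independently tied to one of a fixed number of equally likely places, which is certainly not true of real news. The function name and the numbers in the loop are hypothetical, chosen only for illustration.

```python
def prob_shared_place(n_places, n_stories):
    """Chance that at least two of n_stories involve the same place.

    Assumption: every story is independently linked to one of
    n_places equally likely places (a deliberate simplification).
    """
    if n_stories > n_places:
        return 1.0  # pigeonhole: a repeat is guaranteed
    p_all_distinct = 1.0
    for i in range(n_stories):
        # The (i+1)-th story must avoid the i places already used.
        p_all_distinct *= (n_places - i) / n_places
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # Hypothetical numbers: 1000 possible places, varying story counts.
    for stories in (10, 20, 40, 80):
        print(stories, round(prob_shared_place(1000, stories), 3))
```

The probability of a match grows much faster than intuition suggests as the number of stories rises, and that is before counting all the other kinds of coincidence a reader might notice.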

Thursday 27 January 2011

Getting the most out of Google - part 1

Basic Searching

  1. Phrases in quotation marks.
  2. Default Boolean is AND.
  3. Use OR explicitly.
  4. Exclude a term using the minus sign (the results should not contain it). The minus must not be separated from the term by a space.
  5. Use a + to make an explicit inclusion of a short word ignored by Google e.g. +the.
  6. Also search with synonyms by using the ~ character.
  7. Number ranges are separated by .. e.g. 5..10. If one of the numbers is missing, the other acts as a minimum (5..) or a maximum (..10).
  8. There are no stemming wildcards, so * is only a wildcard for a complete word.
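
The rules above can be sketched as a few small helpers that assemble a query string. This is only an illustration: the helper names are made up, and the search URL is the commonly used public form, assumed rather than taken from any documentation. Google has changed or retired some operators over time, so results may differ.

```python
from urllib.parse import quote_plus


def phrase(text):
    """Rule 1: wrap an exact phrase in quotation marks."""
    return f'"{text}"'


def any_of(*terms):
    """Rule 3: OR must be written explicitly (AND is the default)."""
    return " OR ".join(terms)


def exclude(term):
    """Rule 4: minus sign directly before the term, no space."""
    return f"-{term}"


def number_range(low=None, high=None):
    """Rule 7: low..high; omit one side for an open-ended range."""
    left = "" if low is None else str(low)
    right = "" if high is None else str(high)
    return f"{left}..{right}"


def build_query(*parts):
    """Rule 2: terms separated by spaces are ANDed together."""
    return " ".join(parts)


def search_url(query):
    # Assumed URL form; quote_plus encodes spaces and quotation marks.
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    q = build_query(
        phrase("impact factor"),
        any_of("bioinformatics", "genomics"),
        exclude("nature"),
    )
    print(q)
    print(search_url(q))
```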

Special Syntax

These allow you to focus your search onto certain elements of the web.
  1. intitle: limits the search to the titles of webpages.
  2. inanchor: searches only in the text of hyperlinks.
  3. intext: searches only in the page text, not in anchors or URLs.
  4. site: restricts the search to a particular domain.
  5. inurl: limits the search to pages whose URL contains the term.
  6. link: searches for pages that link to that URL.
  7. cache: shows Google's cached copy of a page - this is useful if the page has moved/been deleted.
  8. filetype: restricts the search to files with that suffix e.g. ppt.
  9. related: finds related pages.
  10. define: finds definitions of a term.
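
As a minimal sketch, the special syntax can be generated with one helper that joins an operator and its value with a colon and no space. The helper name and the operator set are illustrative. Note that the domain restriction is spelled site: in Google's syntax, and some of these operators may no longer be supported.

```python
# Operator names taken from the list above (domain restriction is "site").
KNOWN_OPERATORS = {
    "intitle", "inanchor", "intext", "site", "inurl",
    "link", "cache", "filetype", "related", "define",
}


def special(operator, value):
    """Build an operator:value term, e.g. filetype:ppt."""
    if operator not in KNOWN_OPERATORS:
        raise ValueError(f"unknown operator: {operator}")
    if " " in value:
        # Multi-word values are quoted so they stay attached to the operator.
        value = f'"{value}"'
    return f"{operator}:{value}"  # no space after the colon


if __name__ == "__main__":
    # Hypothetical search: slides about impact factors on one domain.
    terms = [
        special("intitle", "impact factor"),
        special("filetype", "ppt"),
        special("site", "bbc.co.uk"),
    ]
    print(" ".join(terms))
```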
For more detail, there is the O'Reilly book Google Hacks. The problem is that parts of it are badly outdated, so it is perhaps best borrowed from a library rather than bought.