Monday 31 May 2010

The End Of Science

This was the cover title for a series of articles in Wired in July 2008. They were welcoming us to the petabyte age - where we will routinely deal with datasets containing petabytes of data. Inside, the titles are less provocative; the first is The End of Theory.

In this article Chris Anderson looks at how tools like Google and the Cloud are changing the way we look at data. He raises the question of how we have to deal with such massive amounts of data.

"It forces us to view data mathematically first and establish a context for it later."
Peter Norvig has gone so far as to change George Box's maxim "All models are wrong but some are useful" to "All models are wrong, and increasingly you can succeed without them."

"... faced with massive data, this approach to science - hypothesize, model, test - is becoming obsolete."
There are a couple of problems with this idea:
  1. It becomes very difficult to distinguish science and pseudo-science. The Bible Code and other such books suddenly become more convincing.
  2. There is still a model, or rather an assumption, which is that homology between already seen examples will apply to new unseen examples.
What is really happening is that you no longer look for the universal laws but for what they produce. This is how science has worked before. It is the process by which Newton produced his law of gravitation, while the more fundamental explanation of what gravity means had to wait for Einstein.

Anderson goes too far when he talks about the limits of biology:

"Now biology is heading in the same direction. The models we were taught in school about 'dominant' and 'recessive' genes steering a strictly Mendelian process have turned out to be an even greater simplification of reality than Newton's laws. The discovery of gene-protein interactions and other aspects of epigenetics has challenged the view of DNA as destiny and even introduced evidence that environment can influence inheritable traits, something once considered a genetic impossibility."
This overstates the case. Biology according to pure Darwinian evolution is likely to be wrong, and some epigenetic factors do follow a Lamarckian process, but not all of them, and Darwinian evolution and Mendelian inheritance still hold for most genes. We only know that they are not completely true because we know the mechanism of epigenetics, which is precisely the kind of knowledge Anderson says we do not need.

"There is now a better way. Petabytes allow us to say: 'Correlation is enough.' We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot."
Sorry, but he should read any statistics textbook on data mining: they show that you can find almost any correlation you like depending on the way you partition the data. The higher the dimensionality of the data (the more variables), the more likely this is to happen. So this paragraph is nonsense.
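
The point about dimensionality can be checked with a small simulation. What follows is only a rough sketch under stated assumptions: every number is independent random noise, the sample size of 30 and the function names are invented for illustration, and nothing here comes from Anderson's article. It looks for the strongest correlation between one noise "target" and a growing number of unrelated noise variables.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_variables, n_samples=30, seed=0):
    """Largest |r| between a noise target and n_variables noise columns.

    Every value is independent random noise (an assumption of this sketch),
    so any correlation that turns up is spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_variables):
        column = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(target, column)))
    return best


if __name__ == "__main__":
    # More candidate variables means a stronger "best" correlation by chance.
    for n in (1, 10, 100, 1000):
        print(n, round(strongest_chance_correlation(n), 2))
```

With a fixed seed the strongest correlation can only stay the same or grow as more variables are searched, which is the trap: the more places you look, the more impressive the best coincidence becomes, with no mechanism behind it.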

Finally:

"Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.
There's no reason to cling to our old ways. It's time to ask: What can science learn from Google?"
To understand the importance of mechanism you need to read studies like those of Richard Peto looking at heart attacks and the use of aspirin as a primary medical treatment when patients are hospitalised. Why did they think to use aspirin? Because its mechanism of action is to prevent clotting.

Saturday 29 May 2010

Conspiracy Theories

From the assassination of Lincoln to 9/11, nothing seems to attract the public imagination more than a conspiracy theory. Now there is an entire conspiracy theory literature, and books are written about the dangers of conspiracy theorists. We live in the "brave new world of big brother" (people, including me, should read the books before slinging the names around indiscriminately). Governments do want to hide some of the things they do, so how can we tell when we are being lied to?

This blog entry was in fact inspired by the book "You are being lied to". Some of the articles are by respected authors such as Noam Chomsky and Howard Bloom but many of the authors are unknown to me, so why should I believe anything they say?

How can we tell if there is any truth to the claims? Well we need to use logic, science and reason and we need to find hard evidence.

Take for example the article by Jim Marrs - What is Missing from this Picture. I read the article about how much evidence is missing from high profile cases and you have to wonder. How can the police and the government lose so much evidence? What do they have to hide? The problem is that this is building a whole story on one or two pieces of information. To use a phrase from Terry Pratchett, you are living on narrativium. You have a story and you make the facts fit the story you want to believe in. So when you read articles like this you have to be careful not to be caught up by the story.

One example is the campaign to find illegal behaviour in the Clinton-Gore administration and to connect this with the deaths of Vince Foster and Ron Brown (and even TWA 800). Now we know that certain right-wing groups were trying their hardest to smear Clinton after they had got the Republicans to employ a special prosecutor in an operation called the Arkansas Project. In the end it was the people behind the smears and the prosecutor who ended up as the losers. It is very easy to find coincidence and circumstantial evidence, but if this is all you have then you have to be very careful about what conclusions you draw.

For a Hollywood take on conspiracy theories you can enjoy:

  • The Net - A hacker's identity and life are destroyed as she uncovers a conspiracy.
  • Conspiracy Theory - Mel Gibson hams it up
  • Enemy of the State - The NSA bad guy's birthday is 9/11
  • JFK - but read the websites about Oliver Stone's playing with history.

Best of all, and a must-watch for anyone interested in conspiracies, is Arlington Road. This is absolutely brilliant, especially as it showed how the US was vulnerable to home-grown terrorism before 9/11.

Friday 28 May 2010

Chariots of the Gods and Pseudoscience

I have always been a science-fiction geek, and so I was always interested in stories of alien encounters and secret government cover-ups such as Project Blue Book and Majestic-12.

One of the early books I read, probably when I was about 12 or 13, was "The Gold of the Gods" by Erich von Daniken. In this book von Daniken describes being shown into a huge system of tunnels that extended out under the Pacific Ocean and that were filled with gold artefacts and a library that was written on a huge collection of metal discs. The book has photos of the entrance to the cave system and of some of the artefacts.

This is one of a series of books that develops his theme from "Chariots of the Gods": that the meso-American civilisations were exposed to alien technology, that we have been visited by aliens whom we think of as gods, and that UFOs are their chariots.

So here is more evidence of these visitations, because there is a level of technology that does not fit with what we know of meso-American history. The problem is that it does not exist. The poor archaeologist who showed him the site then had to explain to the authorities where all these artefacts had gone, as they did not actually exist. Von Daniken had made up the extent of the tunnel system and the library. This is an example of pseudo-science: the hijacking of scientific language and presentation to make something appear to be scientific while not actually following the scientific method. You cannot always trust what you read or believe the photos that you see.

For a clear debunking of von Daniken's claims you should read Ronald Story's book The Space Gods Revealed. Sadly it looks like Story's book is out of print but von Daniken's frauds continue to sell well. As Wikipedia would say, knowledge is not democratic. Just because lots of people believe it does not mean it is true.

Thursday 27 May 2010

The Curious Case of "The Total Synthesis of Taxol - K.C. Nicolaou"

This is an interesting case of scientific competition which is discussed in hushed tones amongst synthetic organic chemists. K.C. Nicolaou is one of the leading exponents of total synthesis of natural products. One of the early natural products to be produced synthetically was Taxol, which is the naturally occurring toxin in Yew trees. Taxol stops cell division and so it is a potential anti-cancer drug.

Nicolaou had been competing against Robert Holton's group from Florida State University to be the first to produce a total synthesis. Holton submitted his two papers on the synthesis to the Journal of the American Chemical Society (JACS) on the 21st of December 1993. Nicolaou had submitted a paper on the synthesis of related taxoids to the same journal three weeks earlier, on the 30th of November (this would appear in the same February 1994 issue of JACS as the Holton papers).

Meanwhile Nicolaou submitted a paper on the Total Synthesis of Taxol to Nature on the 24th of January, which was accepted on the 31st of January and published in the 17th February edition of Nature. The paper before Nicolaou's in the journal was submitted on 27th September 1993 and accepted on 21st December 1993. So it more usually took about three months to review a paper, not one week, and even after acceptance Nature could take another six to seven weeks to publish, not the two and a half weeks seen for Nicolaou's paper. Nicolaou would then go on to elaborate the synthesis in four papers published in JACS in 1995, and he included the synthesis of Taxol in his book Classics in Total Synthesis.

Now we cannot know whether Nicolaou was a reviewer for the Holton JACS papers, but he would have been a logical choice unless Holton asked that he should not be allowed to review because of a conflict of interest. Even if he did not review the papers, he was probably aware that they had been submitted. Doesn't the rapid submission of the paper to Nature, and its fast-track review and publication, deserve some further investigation?

All in all this case raises questions about the publishing ethics of the journal Nature, the confidentiality of submissions to JACS and the integrity of scientists. Nicolaou has since gone on to be a highly acclaimed scientist with many significant international awards. Meanwhile Robert Holton's biography is rather more modest.

When Peer Review Fails - Fabricated Results

The idea behind peer review is that your peers are also experts in the field and so they should be able to spot weaknesses in the science and prevent the publication of fabricated results.

Here are a few cases where this has not worked.
This is likely to be the tip of the iceberg; these are serial offenders who had fabricated most of the data for their entire careers. It is much harder to find the one-off fabrication, or just the science that turns out to be wrong through error and is irreproducible. Peers should be able to reproduce the experiment, which is what makes it science, but this is not always the case, as was seen with Cold Fusion.

Here is an article about retraction rates which shows Science and then Nature are at the top of the list. It also suggests a correlation between impact factor and retraction rates. Nature is affectionately referred to amongst the scientific community as the Journal of Irreproducible Results (cold fusion paper, memory of water paper ...)

Peer Review

The aim of Peer Review is to make sure that publications and grant submissions meet an acceptable standard as defined by a community of peers (the idea of trial by your peers goes back to Magna Carta). The question is: does it work?

Only 8% of members of the Scientific Research Society agreed that 'peer review works well as it is.' (Chubin and Hackett, 1990; p.192)

"A recent U.S. Supreme Court decision and an analysis of the peer review system substantiate complaints about this fundamental aspect of scientific research." (Horrobin, 2001)

"[Peer review] is a non-validated charade whose processes generate results little better than does chance." (Horrobin, 2001)

"Peer Review is one of the sacred pillars of the scientific edifice" (Goodstein, 2000)

"Peer Review is central to the organization of modern science…why not apply scientific [and engineering] methods to the peer review process" (Horrobin, 2001).


Currently most peer review is carried out anonymously. That is, the reviewers are anonymous but the authors of a paper or grant proposal are known to the reviewer. Often when you submit a paper you are asked for a list of potential reviewers, some of whom will be used to review your paper. These two features of the review process often combine to make the success or failure of a paper submission dependent on how good the submitting author is at "networking". Well-known names with a large circle of connections select members of this group as their reviewers, and so they are more frequently published, become more famous as experts in that area and gain more connections. This is a "rich get richer" scenario, which makes it very difficult for new researchers to break into a field, especially a field which is fiercely competitive and in which the leading members have large egos.

Some of the open access journals are proposing alternatives:
  1. Publish the reviews alongside the paper.
  2. Name the reviewers so they are no longer anonymous to the authors.
  3. Anonymise the papers so that the reviewers do not know who the authors are.
  4. Publish the papers online before peer review and let the community review them before they take a fixed form.
The consequence of (1) is to moderate the language of reviews. Reviewers are likely to be more careful about what they say if they know these reviews are going to be made public. Combining (1) and (2) is even better as it forces reviewers to act objectively, because their reputation amongst the community matters. Idea (3) is hard to achieve as reviewers will be able to work out who the authors are from a knowledge of their field and the literature cited (all scientists cite themselves more than others). The problem with (4) is how active the community will be. If there is a very active community in a small field this might work, but there is still the question of the status of the papers in the repository before they reach their final form, when they are effectively not peer reviewed.

The quotes above were taken from an e-mail advertisement for a conference on improving peer review.

References

Chubin, D. R. and Hackett, E. J., 1990, Peerless Science: Peer Review and U.S. Science Policy; New York, State University of New York Press.

Horrobin, D., 2001, "Something Rotten at the Core of Science?" Trends in Pharmacological Sciences, Vol. 22, No. 2, February 2001. Also at http://www.whale.to/vaccine/sci.html and http://post.queensu.ca/~forsdyke/peerrev4.htm (both Web pages were accessed on February 1, 2010)

Goodstein, D., 2000, "How Science Works", U.S. Federal Judiciary Reference Manual on Evidence, pp. 66-72 (referenced in Horrobin, 2001)


Wednesday 26 May 2010

Extrapolation to Absurdity

Here is a typical newspaper article using statistics wrongly. This particular article is from The Independent and is about the number of people who can recognise Winston Churchill. The article describes a study carried out on behalf of the Royal Mint to celebrate 70 years since Churchill became Prime Minister for the first time. People were asked to identify famous 20th Century Prime Ministers and only 19% could not name Churchill, but this increased to 32% of 25-34 year olds and 44% of those aged 16-24.

So, assuming a linear model and extrapolating beyond the range of the dataset, the report predicts that in about 80 years' time Churchill will no longer be recognised. This is wrong for two reasons:
  1. The model is unlikely to be linear - the recognition factor is likely to tail off more slowly in the future as there will always be a core of Churchill recognisers that will be above zero (historians and politicians for example).
  2. You cannot extrapolate outside of the experimental range with confidence. So what they could say is that in 8 years' time the proportion of 24-34 year olds who cannot recognise Churchill will rise to 44% (assuming those sampled who are now 16-24 and did not recognise him still cannot do so in the future sample).
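
Both objections can be seen by pushing a straight line beyond the data. The sketch below is not the newspaper's calculation: it simply draws a line through the two age-band figures quoted above, using assumed band midpoints of 20 and 29.5 years, and treats a negative "age" as a stand-in for a cohort that has not been born yet. The function names are invented for illustration.

```python
# Survey figures quoted above: percent of each age band who could NOT name Churchill.
# The band midpoints (20 and 29.5 years) are an assumption made for this sketch.
observed = {20.0: 44.0, 29.5: 32.0}  # age midpoint -> percent not recognising


def fit_line(points):
    """Return (slope, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = sorted(points.items())
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def linear_prediction(age, points=observed):
    """Percent not recognising Churchill according to the straight line."""
    slope, intercept = fit_line(points)
    return slope * age + intercept


if __name__ == "__main__":
    # Inside the sampled range the line reproduces the data; outside it does not
    # stay between 0% and 100%, which no real percentage can do.
    for age in (29.5, 20.0, 0.0, -30.0, 70.0):
        print(age, round(linear_prediction(age), 1))
```

Far enough from the sampled ages the line predicts that more than 100% of a future cohort, and fewer than 0% of 70 year olds, fail to recognise Churchill. A percentage cannot do either, so the relationship cannot be linear over that range and the extrapolated date is meaningless.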


Tuesday 25 May 2010

Making Mountains of Surmise from Molehills of Evidence

The title of this post is taken from an article by the great exponent of popular mathematics and early skeptic, Martin Gardner, who died last Saturday. I knew of Gardner because of his books like Mathematical Carnival, which made maths fun and accessible. These were books that I would read along with my best friend when we were 12-13 and wonder at the beauty of mathematics. Gardner wrote a puzzle column for Scientific American which combined puzzles with magic and showed the strong link between Maths, Logic and Magic.

The particular case he was referring to was the book Atlantis written by Ignatius Donnelly in 1882, which pointed out the possible links between the civilisations of Central and South America and Ancient Egypt. They both built pyramids, they both had a calendar of 365 days and they both had flood legends. What he could not have known is that civilisations such as the Incas and Aztecs flourished from the 12th century AD, over 3500 years after the building of the Great Pyramid. Even the Maya pre-classic period does not begin until 2000 BC, 500 years later.

This is an example of correlation that does not imply causation. They both happen to have discovered the same things. A 365-day calendar is the sensible choice as it is the one that corresponds to the Earth's orbit. Flood legends are ubiquitous in human civilisations because floods happen everywhere, and pyramids are a strong and simple structure to build if you want a large monument with limited materials.

This is also a perfect example of another phenomenon from biology, that of Convergent Evolution. We end up with the same products from very different histories because of the environmental constraints. This lesson about constraints and convergence is a very important one as it is easy to fall into Donnelly's error and think that events are less probable than they actually are.

Monday 24 May 2010

MMR and ex-Doctor Wakefield

This is perhaps one of the clearest cases of unethical behaviour in medical research. It also had devastating consequences for the vaccination program in the UK: see the Times article on the consequences (note the comments in support of Wakefield). There is also an article in the Telegraph on MMR and autism.

Andrew Wakefield carried out a study which was published in the Lancet (and later retracted by the journal) that claimed to show a link between giving children the MMR vaccine and autism. Having found Wakefield guilty of conducting unethical medical treatment, fabricating results and not declaring conflicts of interest, the General Medical Council passed sentence today and Wakefield was struck off the medical register.
BBC report

A nice cartoon summary of the case by the Tall Guy sums up the most important points.

One of the lead campaigners in the UK against the vaccination deniers is Ben Goldacre who has reported on the case extensively in his Blog as well as in his book Bad Science.
The media's MMR hoax
Goldacre's comments on a Today Programme interview with Wakefield

David Aaronovitch's Tweet comments on the Today Programme interview.

  • Peculiar opening question to Wakefield. If you had known how much fuss it would cause would you have spoken out?
  • Wakefield says 'let us have an open debate' unchallenged with the fact of the overwhelming evidence. Now on 'choice' theme.
  • Wakefield's dishonest red herring on single vaccine unchallenged. Simply challenged on drop in MMR uptake.
  • Getting better as he's forced to admit he didn't have proof, then told he didn't actually have evidence. Now he's evading and eliding.
  • Now the point about his egregious conflict of interest. And the amount Wakefield was paid as expert in litigation.
  • He 'has no idea' how much money he received for involvement with litigation. Is allowed self-serving last word. Too short.
  • Remarkable that, on the day that he is struck off, the Today programme tweets that Wakefield "defends his investigation". Still not got!
BMJ article by Dr Evan Harris - about who else is responsible.
Guardian Article - Importance of Ethics in Science

Science and Politics - The Creation of Synthetic Life

Yesterday the media had the splash headlines that a group led by Craig Venter (of Human Genome fame) had successfully created the first synthetic living organism. This new life is also known as Synthia. There has already been a media response about the potential dangers and the need to think about the possible implications. The BBC coverage of the story is linked below:
Artificial Life Breakthrough
Susan Watts' Blog post

Ben Goldacre was less impressed
"This new Synthia lifeform business is slightly overstated, nice proof of concept and doubtless fidlly as f**k, but nothing amazingly new."

The US Government has made a response with President Obama sending a letter to request a Bioethical review of the matter.
Obama Letter to Presidential Commission for the Study of Bioethical Issues

This provoked some interesting Tweet responses:

"*Of course* Obama has to consult *faith communtities" about Synthia. What a ridiculous country the US has become. :<"

"Obama was doing so well until he mentioned faith communities. No point, we know they'll be up in arms about Synthia, the halfwit Luddites."

PS I just read the front pages of today's papers which have headlines like God2.0. The problem with Venter is he has a very large ego and he likes to create a big spectacle. I suspect that he will regret the God2.0 epithet as when Obama consults the faith communities it is not going to go down well and I would predict harsh legislation.

There is also this Guardian article on the atheistic background to the research:
Andrew Brown - Guardian CiF

Pharyngula has summarised the ill-informed responses in the media
Pharyngula's Blog

Thursday 20 May 2010

Google is objective isn't it?

How many people think about the results they get back from Google?

You just type in a search and off it goes and finds you the information impartially - Right?

Well, not necessarily. There are ways for people to make sure they come higher in Google's output than competing ideas. This has been used during political campaigning to smear opponents by associating their webpages with searches for insulting descriptions. It was also used by a certain German car firm that was blocked by Google from searches because they had been manipulating the system.

Behind all of this Google also has to make money, and some of that money comes through advertising and AdWords. The people who pay the cash want the impact, so they want to come higher up in the search outputs.

Another way that Google influences the way people search is in the automatic completion of search phrases, which can lead people in directions that do not reflect the actual community. One such case has been reported for nano-technology:
http://www.nanowerk.com/news/newsid=16330.php

Fruitbatgate - The case for academic freedom

WARNING THIS ARTICLE CONTAINS DESCRIPTIONS OF A SEXUAL NATURE.

Why did I have to put that header? Well, I open myself to the possibility that I might be reprimanded in the same way as Dr Dylan Evans, because in discussing his case I have to mention the cause of it all. Dr Evans has been found guilty of sexual harassment by his employers, University College Cork, because he showed colleagues an article from PLOS One (a journal in which I have published as well and for which I review) which was titled "Fellatio by fruit bats prolongs copulation time."

One of his female colleagues objected, saying that she felt harassed and "disgusted". This was the final act in Dr Evans's "inappropriate" behaviour, as he had kissed her on both cheeks and had complimented her on her appearance.

Here is a blog discussing the case:
http://scienceblogs.com/pharyngula/2010/05/bat_sex_is_not_protected_by_ac.php

Here is the evidence presented at the case and the UCC statement regarding the release of confidential information.
http://felidware.com/DylanEvans/

Here is the Telegraph article about the case
http://www.telegraph.co.uk/science/science-news/7734926/Academic-disciplined-over-fruit-bat-sex-paper.html

Who do you trust? Trusting in Reputation

We live in a world of "Information Overload". So how do we know which information we can trust and which we cannot?

One way we often sort opinions from each other is based upon reputation. Everyone has their own favorite intellectuals and scientists whose views they are more likely to agree with than those of their opponents. In this way knowledge is very much like politics and it cannot claim to be completely objective. This is a warning to always beware hype, especially when someone is telling you they know the absolute definite truth.

Now one of my favorite TV scientists is Professor Robert Winston. He is a very well respected fertility expert and a pioneer of IVF treatment, for which he received a peerage. He is currently Professor of the Public Understanding of Science at Imperial College London. So his recent work has focused more on the media aspects of science than on the scientific research itself, but he still is an expert and he still should be carrying out decent science and he should be someone whose opinions I can trust. Here is one of his articles:
Are women more likely to conceive if they enjoy sex?

Ok so is this a good article? Can I trust what it says? Here is a bit of the text:

"Publication of an inconclusive study like this might encourage infertile patients to feel even worse about themselves when there was no clear evidence that orgasm improves fertility in the majority of women."

So another question. Is it responsible to publish this anecdotal study in a newspaper, especially in the women's section? He is saying that it might be true but we have not got enough evidence.

Here are some links to pages critical of this sort of article and also some of Prof Winston's other endorsements:
Petra Boynton on a similar sexual health article in the Times
Ben Goldacre criticising his promotion of Omega-3

So there is not always one view and even the most respected scientists can get things wrong (Linus Pauling and vitamin C). So what is the best judge of what to trust? There is a growing movement towards what are called "evidence based" approaches in fields such as medicine and even politics. So we do not make decisions based on hunches or ideologies but instead based on evidence.

How do I know if anything is right?

Q: Why do you think that the world is round?

A1: It has to be round as we can travel from one point in any direction and return to the same point without leaving the surface.

A2: It is round because we have seen it from space.

Which of these two answers is most convincing?

We like to see things for ourselves and that is perhaps some of the strongest evidence (although there are plenty of optical illusions and magicians to show that we cannot always believe our eyes). Everyday experience says that the world is flat. So we depend on the idea of a shared experience; after all we cannot all experience everything and there are many things we do not want to experience at all. For example we do not want to have to experience poisons or toxic chemicals.

The other answer is perhaps more interesting as it is more abstract but more revealing. It shows how we can create an artificial model in our minds that can take into account all possible worlds - all possible realities and we can show that it is true for all of these cases. This is part of the reason there is such a strong connection between mathematics and knowledge.

Wednesday 19 May 2010

Reliable Knowledge

This is the title of an inspirational book by John Ziman that I read as an undergraduate. So I must give thanks to him for inspiring me to write this Blog, but I must also point out that there is no other connection between Ziman's book and what I write here.

The actual reason for writing this Blog is to provide a set of course materials for an MSc course on Scientific Method that I will be running in September. This Blog is to provide the pre-course reading and also some of the resources for discussion groups during the week-long course. The idea is to create a large-scale narrative framework that can be followed in a non-linear way depending on the views and questions of the students.

Another way to think of it is like a set of post-it notes each with a single bite-sized piece for revision. It is the way you link them together that makes the course.