Friday, 25 June 2010

Cyril Cleverdon and Document Retrieval

Cyril Cleverdon is one of those people you will not see mentioned in the news. He was a librarian at Cranfield. In the 1950s he wrote some of the pioneering work on document retrieval and, most importantly, defined the terms precision and recall.

Precision is the fraction of retrieved documents that are relevant to the search.

Precision = no. of retrieved relevant documents / no. of retrieved documents

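Precision is the number of true positives (relevant documents retrieved) divided by the total number of documents retrieved, that is, true positives plus false positives. It is therefore a measure of type I error: every irrelevant document that gets retrieved is a false positive and lowers precision.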
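A minimal sketch of the precision formula in Python, assuming documents are identified by string IDs held in sets. The function name `precision` and the document IDs are hypothetical, chosen only for illustration.

```python
def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of retrieved documents that are relevant: TP / (TP + FP)."""
    if not retrieved:
        # Assumed convention: precision is undefined when nothing is retrieved;
        # here we return 0.0 instead of raising.
        return 0.0
    true_positives = len(retrieved & relevant)   # retrieved and relevant
    false_positives = len(retrieved - relevant)  # retrieved but not relevant
    return true_positives / (true_positives + false_positives)


retrieved = {"d1", "d2", "d3", "d4"}  # hypothetical search results
relevant = {"d1", "d3", "d5"}         # hypothetical relevance judgements
print(precision(retrieved, relevant))  # 0.5
```

Four documents are retrieved and two of them are relevant, so precision is 2/4 = 0.5. Note that precision only needs the retrieved documents to be judged; the relevant document `d5` that was missed does not affect it.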

Recall is the fraction of the documents relevant to the query that are successfully retrieved.

Recall = no. of relevant documents retrieved / no. of relevant documents

Recall is the number of true positives retrieved out of the total number of positives (true positives plus false negatives), and is a measure of type II error. It is much harder to measure or infer than type I error, because we cannot be sure of the total number of relevant documents except in cases where we use synthetic data.
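A minimal sketch of the recall formula in Python, again assuming documents are string IDs held in sets; the function name `recall` and the IDs are hypothetical. The catch is visible in the signature: the function needs the complete set of relevant documents, which in this toy example is simply written out by hand, much like synthetic data.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of relevant documents that were retrieved: TP / (TP + FN)."""
    if not relevant:
        # Without a known set of relevant documents, recall cannot be computed.
        raise ValueError("recall is undefined without known relevant documents")
    true_positives = len(retrieved & relevant)   # relevant and retrieved
    false_negatives = len(relevant - retrieved)  # relevant but missed
    return true_positives / (true_positives + false_negatives)


retrieved = {"d1", "d2", "d3", "d4"}  # hypothetical search results
relevant = {"d1", "d3", "d5"}         # hypothetical, fully known relevant set
print(recall(retrieved, relevant))    # 2/3, about 0.667
```

Two of the three relevant documents are found, so recall is 2/3. If `d5` were never identified as relevant, the same search would appear to have a recall of 1.0, which is exactly why recall is so hard to measure on real collections.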
