Ranking Retrieval

In this world of ever-expanding multimedia, retrieving content and searching it properly through queries is an important task. Image retrieval systems have started to appear, but they are still not up to the mark. By image retrieval I mean searching based on characteristic visual properties of the images themselves, not their associated text.

Query by example is a typical retrieval method: we provide a sample image and ask the system to search the database for similar images. The audio analogue is query by humming, where one hums a tune and all similar tunes should show up. As can be imagined, this is quite an extraordinary job – searching through ALL the multimedia content out there (the WWW) is almost impossible!

Anyway, this post is about measuring how good or bad a proposed retrieval algorithm is. What would you typically like? All the images similar to the query to come up; wrong images popping up to be minimized; and the results somehow ranked in order of content similarity, with the highest rank being the most similar. The Average Normalized Modified Retrieval Rank (ANMRR) proposes to do exactly that: it converts a seemingly subjective task into an objective, quantitative one.

The initial approach was to average the retrieval rate – for each query, the fraction of ground-truth images (labelled truly similar images) found within a certain number of top results. However, setting this “certain number” soon becomes a problem. Also, since the number of ground-truth images (NG) differs for each query, this Average Retrieval Rate (ARR) is biased by the smallest group.

To counter the first problem, a rank is assigned to each kth ground-truth image. Then a “relevant ranks” threshold K (typically twice the size of the ground truth for that query) is set, indicating the level of tolerance for that query, and a new rank is formulated as either the actual rank (if it falls within K) or a worst-case penalty of 1.25K. This is averaged across all ground-truth images of the query to give the Average Rank (AVR).
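As a quick sketch, the AVR computation just described might look like this in Python (the function name is my own, and the choice K = 2·NG follows the “typically twice the ground truth size” rule above):

```python
def average_rank(retrieved, ground_truth):
    """Average Rank (AVR) for a single query.

    retrieved: ranked list of item ids, best match first (rank 1).
    ground_truth: set of item ids labelled as truly similar.
    Assumes the relevant-ranks threshold K = 2 * NG, as described above.
    """
    ng = len(ground_truth)
    k_thresh = 2 * ng                 # "relevant ranks" threshold K
    penalty = 1.25 * k_thresh         # worst-case rank for misses
    # 1-based rank of every retrieved item
    position = {item: i + 1 for i, item in enumerate(retrieved)}
    ranks = []
    for g in ground_truth:
        r = position.get(g)
        # keep the actual rank only if it falls within the tolerance K
        ranks.append(r if r is not None and r <= k_thresh else penalty)
    return sum(ranks) / ng
```

For example, with ground truth {'a', 'c'} and retrieval order ['a', 'b', 'c', 'd'], the ranks are 1 and 3, so AVR = 2.0.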

Further, to counter the bias introduced by different sizes of ground-truth sets, a Modified Retrieval Rank (MRR) is defined as the Average Rank minus half of (ground-truth size + 1). MRR = 0 when all ground-truth images are retrieved at the very top ranks, but its upper bound still depends on NG(query). Normalizing by this upper bound, so that queries with different numbers of ground-truth images become comparable, gives the Normalized Modified Retrieval Rank (NMRR):
$NMRR = \frac{AVR - 0.5[1+NG]}{1.25K - 0.5[1+NG]}$
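Plugging an AVR value into this formula is straightforward; a minimal sketch, again assuming K = 2·NG:

```python
def nmrr(avr, ng):
    """Normalized Modified Retrieval Rank for one query.

    avr: the Average Rank (AVR) for the query.
    ng:  number of ground-truth images NG for the query.
    Assumes K = 2 * NG, so the normaliser is 1.25 * K - 0.5 * (1 + NG).
    """
    k_thresh = 2 * ng
    return (avr - 0.5 * (1 + ng)) / (1.25 * k_thresh - 0.5 * (1 + ng))
```

A perfect retrieval of NG = 2 images has AVR = (1 + 2) / 2 = 1.5 and thus NMRR = 0; if both ground-truth images miss the threshold, AVR = 1.25K = 5 and NMRR = 1.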

This NMRR is averaged over all queries to obtain the ANMRR. A low value of ANMRR (around, say, 0.1) indicates quite good performance, while anything above 0.5 certainly means the algorithm needs a good review. The ANMRR is seen to coincide with subjective evaluation of the results. It is also the base comparison criterion for the MPEG-7 Colour Descriptors and their experiments.
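Putting the whole pipeline together, here is a self-contained sketch of ANMRR over a set of queries (the names and the input layout are my own; K = 2·NG is assumed throughout):

```python
def anmrr(queries):
    """ANMRR over a list of queries.

    Each query is a pair (retrieved, ground_truth):
    a ranked list of item ids and the set of relevant ids.
    """
    scores = []
    for retrieved, ground_truth in queries:
        ng = len(ground_truth)
        k_thresh = 2 * ng                  # threshold K = 2 * NG
        penalty = 1.25 * k_thresh          # worst-case rank
        position = {item: i + 1 for i, item in enumerate(retrieved)}
        # penalised rank for each ground-truth item
        ranks = [position[g] if position.get(g, penalty + 1) <= k_thresh
                 else penalty
                 for g in ground_truth]
        avr = sum(ranks) / ng
        # NMRR for this query, normalised to [0, 1]
        scores.append((avr - 0.5 * (1 + ng)) /
                      (1.25 * k_thresh - 0.5 * (1 + ng)))
    return sum(scores) / len(scores)    # average NMRR = ANMRR
```

A single perfectly answered query yields ANMRR = 0, while a query that retrieves nothing relevant yields 1.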

The paper titled Color and Texture Descriptors provides a nice, compact explanation of the same.

3 thoughts on “Ranking Retrieval”

1. Carlos Pérez Lara

I have always had this question: is the ANMRR part of the MPEG-7 standard for all descriptors? I am researching MPEG-7 – have you ever worked with MPEG-7?

Greetings from Mexico.

1. Makarand Tapaswi Post author

Hi, yes, for most of the features where they test them on tasks such as image retrieval (any retrieval), they use the ANMRR to measure performance. I wouldn’t call myself an expert at MPEG-7, but I have used some of the features for non-MPEG-7-related tasks 🙂

1. Carlos Pérez Lara

Yes! In fact I’m reading about ANMRR. The measure was developed to evaluate the performance of the descriptors. I implemented the Color Layout and Edge Histogram descriptors and the results were good. The standard always mentions a dataset of images – is that dataset free? Do you know anything about the dataset?