Query by example is a typical retrieval method: we supply a sample image and ask the system to search the database for similar images. An audio analogue of this is query by humming, where one hums a tune and all similar tunes should show up. As can be imagined, this is quite an extraordinary job – searching through ALL the multimedia content out there (the WWW) is almost impossible!
Anyway, this post is about developing a way to rank how good or bad a proposed retrieval algorithm is. What would you typically like? To have all the images similar to the query come up. To minimize wrong images popping up. To somehow rank them in order of content similarity, with the highest rank being the most similar. On the whole, the Average Normalized Modified Retrieval Rank (ANMRR) proposes to do exactly that. It converts a seemingly subjective task into an objective, quantitative one.
The initial approach was to average the retrieval rate – a rate which measures how many of the ground truth images (the labelled, truly similar images) are found within a certain number of top results for each query. However, setting this "certain number" could soon be a problem. Also, since the number of ground truth images, NG(q), differs for each query, this Average Retrieval Rate (ARR) would be biased by the queries with the smallest ground truth sets.
To counter the first problem, a rank Rank(k) was assigned to the kth ground truth image in the result list. Then a "relevant ranks" threshold K (typically twice the size of the ground truth set for that query) was set, indicating the level of tolerance for that query, and a new rank was formulated as the actual Rank(k) if it falls within K, or a worst-case penalty of 1.25K otherwise. This is averaged over the NG(q) ground truth images (not over all images retrieved) to give the Average Rank, AVR(q).
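As a minimal sketch of the step above (assuming K = 2·NG(q), and marking ground truth images that were never retrieved with `math.inf` so they receive the 1.25K penalty):

```python
import math

def avr(ranks, ng):
    """Average Rank (AVR) for a single query.

    ranks: 1-based ranks at which the ground truth images appear in the
           result list; use math.inf for a ground truth image that was
           never retrieved.
    ng:    number of ground truth images for this query, NG(q).
    """
    k_thresh = 2 * ng            # relevant-ranks threshold K, assumed 2 * NG(q)
    penalty = 1.25 * k_thresh    # worst-case rank for images beyond K
    capped = [r if r <= k_thresh else penalty for r in ranks]
    return sum(capped) / ng

# A perfect retrieval puts the ground truth at ranks 1..NG,
# so AVR = (NG + 1) / 2:
print(avr([1, 2, 3, 4], 4))   # 2.5
```

Note how the `1.25 * K` cap bounds the damage any single missed image can do, which is what makes the later normalization possible.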
Further, to counter the bias introduced by different sizes of ground truth sets, a Modified Retrieval Rank (MRR) is defined as the Average Rank minus half of (the ground truth size + 1), i.e. MRR(q) = AVR(q) − 0.5·(NG(q) + 1). The MRR = 0 in case all images are perfectly found and ranked, but the upper bound is still dependent on NG(q). This is then normalized by that upper bound, to prevent the mismatch in the number of ground truth images, giving the Normalized Modified Retrieval Rank (NMRR) as

NMRR(q) = MRR(q) / (1.25·K − 0.5·(NG(q) + 1)),

which always lies between 0 (perfect) and 1 (nothing found).
This NMRR is averaged over all queries to obtain the ANMRR. A lower value of ANMRR (around, say, 0.1) indicates quite good performance, while anything above 0.5 certainly means the algorithm needs a good review. The ANMRR has been seen to coincide with subjective evaluation of the results. It is also the baseline comparison criterion for the MPEG-7 Colour Descriptors and their experiments.
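Putting the whole pipeline together, here is a sketch (again assuming K = 2·NG(q) and `math.inf` for misses; `per_query` is a hypothetical list of (ranks, NG) pairs, one per query):

```python
import math

def nmrr(ranks, ng):
    """Normalized Modified Retrieval Rank for a single query.

    ranks: 1-based ranks of the NG(q) ground truth images in the result
           list (math.inf for images never retrieved).
    ng:    number of ground truth images, NG(q).
    """
    k_thresh = 2 * ng                          # K, assumed 2 * NG(q)
    penalty = 1.25 * k_thresh                  # worst-case rank
    capped = [r if r <= k_thresh else penalty for r in ranks]
    avr = sum(capped) / ng                     # Average Rank
    mrr = avr - 0.5 * (1 + ng)                 # Modified Retrieval Rank
    return mrr / (1.25 * k_thresh - 0.5 * (1 + ng))

def anmrr(per_query):
    """Average NMRR over all queries; per_query is a list of (ranks, ng)."""
    return sum(nmrr(r, ng) for r, ng in per_query) / len(per_query)

# Perfect retrieval gives 0, total failure gives 1:
print(nmrr([1, 2, 3], 3))           # 0.0
print(nmrr([math.inf] * 3, 3))      # 1.0
```

The denominator is exactly the MRR of the worst case (every ground truth image penalized at 1.25K), which is what pins NMRR into the [0, 1] range regardless of NG(q).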
The paper titled "Color and Texture Descriptors" provides a nice, compact explanation of the same.