Normalized Discounted Cumulative Gain — What it does and how it works
Normalized Discounted Cumulative Gain (NDCG) is a loss metric used when training ranking models. The idea is that when search results are ordered from most relevant to least relevant, the NDCG score is maximized because the further down the list of results you go, the more the metric "discounts" the result, so it pays to have the best results up top.
To start, Discounted Cumulative Gain (DCG) emphasizes highly relevant documents that appear at the beginning of the list while discounting values that appear farther down the list by applying a logarithmic scale for reduction. For example, as seen in the below table, DCG punishes document #6 with a relevance of two more than document #2 with a relevance of two.
With the logarithmic reduction, DCG will be optimized to return a list of the most relevant documents first.
The limitation to DCG is that you cannot compare results with another query. What if another query returns eight results instead of six? That query will likely result in a higher score, but is it better? This is where NDCG comes into play.
To normalize DCG values, you need to know the best ordering for that query. Reviewing the example above, the best order would be 3,3,2,2,1,0. With this information, we can divide the DCG value by the Ideal DCG value, normalizing the results.
Secondly, with the knowledge of the ideal ordering, we can increase or decrease the list size based on the list of documents the query returns, allowing the query results to be comparable. Sticking with the above example, if the returned list is 3,3,2,2,1,0,0,3, then the list size would be reduced to six values, producing a higher NDCG value showing that this query results in higher relevant documents.
There are limitations to NDCG, which are noted at the bottom of the Wikipedia page. For example, if we have two lists of documents:
Example A: 3,3,2,2,1,0,0,3
Example B: 3,3,2,2,1,0
Best Order: 3,3,2,2,1,0
The NDCG value would be identical for each query due to the list size being reduced to size values. Therefore, examples A and B would be the same results. However, within the Wikipedia page, they do note an alternative ranking system, including negative values.
Every metric has its pros and cons and NDCG will not be the optimal solution for every problem though I have found it to work well for most. I hope this article gave you insight into the how and why behind NDCG! If you have any questions or input, don't hesitate to reach out to me!