One of the most common and extensively studied knowledge extraction tasks is text categorization. Customers frequently ask how we evaluate the quality of our categorization models' output, especially in scenarios where each document may belong to several categories.
The goal is to track changes throughout the continuous improvement cycle of our models and to know whether each change improved or degraded quality, so we can decide to commit or reject it.
This post answers that question by describing the metrics we commonly adopt for model quality assessment, depending on the categorization scenario at hand.