Are Topically Diverse Documents Also Interesting?

Our paper "Are Topically Diverse Documents Also Interesting?", with Hosein Azarbonya, Ferron Saan, Jaap Kamps, and Maarten Marx, has been accepted as a short paper at the Conference and Labs of the Evaluation Forum (CLEF'15). \o/

Text interestingness is a measure for assessing the quality of documents from the users’ perspective: it reflects their willingness to read a document. Various approaches have been proposed for measuring the interestingness of texts. Most of them assume that interesting texts are also topically diverse, and therefore estimate interestingness from topical diversity. In this paper, we investigate the relation between interestingness and topical diversity on the Dutch and Canadian parliamentary proceedings. We apply an existing measure of interestingness, which is based on structural properties of the proceedings (e.g., how much interaction there is between speakers in a debate), and then compute the correlation between this measure and topical diversity.
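To make the correlation step concrete, here is a minimal sketch in Python. It is not the code or the exact measures from the paper: it assumes that each document comes with a topic distribution (e.g., from a topic model), uses normalized Shannon entropy as a stand-in for topical diversity, and uses the Pearson coefficient as the correlation. The function names are hypothetical.

```python
import math


def topical_diversity(topic_probs):
    """Normalized Shannon entropy of a topic distribution.

    Assumption: entropy is used here as a simple stand-in for a topical
    diversity measure. Returns 0 for a single-topic document and 1 for a
    document spread uniformly over all topics.
    """
    if len(topic_probs) < 2:
        return 0.0
    total = sum(topic_probs)
    probs = [p / total for p in topic_probs if p > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(topic_probs))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Given one interestingness score and one topic distribution per document, calling pearson on the interestingness scores and the topical_diversity values yields a single coefficient for the collection. A rank-based coefficient would be an equally reasonable choice here.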

Our main findings are: (1) in general, there is a relatively low correlation between interestingness and topical diversity; (2) there are two extreme categories of documents: highly interesting but hardly diverse ones (focused interesting documents), and highly diverse but not interesting ones. When we remove these two extreme types of documents, there is a positive correlation between interestingness and diversity.
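As an illustration of what removing the two extreme categories could look like, here is a small Python sketch. How the extremes are identified in the paper is not described in this post, so the cut-offs low and high, the scaling of both scores to [0, 1], and the function name are all assumptions made for the example.

```python
def drop_extremes(docs, low=0.3, high=0.7):
    """Filter out the two extreme document categories.

    docs: list of (interestingness, diversity) pairs, both assumed to be
    scaled to [0, 1]. The thresholds `low` and `high` are hypothetical.
    """
    kept = []
    for interest, diversity in docs:
        # Highly interesting but hardly diverse: "focused interesting".
        focused_interesting = interest >= high and diversity <= low
        # Highly diverse but not interesting.
        diverse_not_interesting = diversity >= high and interest <= low
        if not (focused_interesting or diverse_not_interesting):
            kept.append((interest, diversity))
    return kept
```

The correlation between interestingness and diversity can then be recomputed on the returned pairs only.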

For details, please read our paper: