"Poison Pills and Antidotes: Inoculating Relevance Feedback" is an article published in Amsterdam Science Magazine, covering one of the cool contributions of our CIKM2016 paper. We also have an extended abstract describing this part, "Inoculating Relevance Feedback Against Poison Pills", accepted for presentation at DIR2016, with Hosein Azarbonyad, Jaap Kamps, Djoerd Hiemstra and Maarten Marx.
Relevance Feedback (RF) is a common approach for enriching queries to improve retrieval performance, given a set of judged documents that are either explicitly assessed by the user or implicitly inferred from user behavior.
Although it has been shown that relevance feedback improves retrieval performance on average, for some topics, employing certain relevant documents may decrease average precision compared to the initial run. This is mostly because the feedback document is only partially relevant and contains off-topic terms; adding these terms to the query as expansion terms hurts retrieval performance. Relevant documents that hurt retrieval performance after feedback are called “poison pills”. In this article, we discuss the effect of poison pills on relevance feedback and present Significant Words Language Models (SWLM) as an approach for estimating the feedback model that tackles this problem.
Significant Words Language Models are a family of models that aim to estimate a model for a set of documents so that all, and only, the significant shared terms are captured (see here and here). This makes these models not only distinctive, but also supported by all the documents in the set.
Put loosely, SWLM iteratively removes two types of words from the model: general words, i.e., common words used frequently across all documents, and page-specific words, i.e., words mentioned in some of the relevant documents but not in the majority of them (see the figure above). This prevents noise words from interfering with relevance feedback, and thus improves retrieval performance by protecting relevance feedback against poison pills.
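To make the idea of iteratively removing general and specific words more concrete, here is a minimal Python sketch. It is a simplified illustration, not the estimation procedure from our papers: the function name `significant_words_sketch`, the fixed mixing weights `lam_sw`, `lam_g`, `lam_s`, the toy `background` model, and the way the specific model is built from in-set document frequency are all assumptions made for this example.

```python
from collections import Counter


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()} if total > 0 else dict(weights)


def significant_words_sketch(docs, background, lam_sw=0.4, lam_g=0.3,
                             lam_s=0.3, iters=30, eps=1e-6):
    """EM-style sketch: every term occurrence is explained by a mixture of a
    general (background) model, a specific model, and the significant-words
    model. Only the significant-words model is re-estimated."""
    counts = [Counter(tokens) for tokens in docs]
    doc_models = [normalize(c) for c in counts]
    vocab = set().union(*counts)
    n_docs = len(docs)

    # Assumed heuristic for the specific model: a term is "specific" when it
    # occurs in only a few documents of the set.
    specific = {}
    for t in vocab:
        df = sum(1 for c in counts if t in c)
        mean_p = sum(m.get(t, 0.0) for m in doc_models) / n_docs
        specific[t] = mean_p * (1.0 - df / n_docs)
    specific = normalize(specific)

    # Start from the plain average model, then iteratively push mass away
    # from terms that the general or specific model explains well.
    sw = normalize(sum(counts, Counter()))
    for _ in range(iters):
        expected = dict.fromkeys(vocab, 0.0)
        for c in counts:
            for t, n in c.items():
                p_sw = lam_sw * sw[t]
                p_g = lam_g * max(background.get(t, 0.0), eps)
                p_s = lam_s * specific[t]
                # E-step: share of this occurrence credited to the
                # significant-words model.
                expected[t] += n * p_sw / (p_sw + p_g + p_s)
        # M-step: renormalize the expected counts.
        sw = normalize(expected)
    return sw


# Toy relevant set: shared topical words, one common word, page-specific words.
docs = [
    "the nobel prize winners arafat peace".split(),
    "the nobel prize winners physics laureate".split(),
    "the nobel prize winners oslo ceremony".split(),
]
background = {"the": 0.5}  # hypothetical collection probabilities
model = significant_words_sketch(docs, background)
top_terms = sorted(model, key=model.get, reverse=True)[:3]
```

In this toy set, the common word "the" and the words that occur in a single document lose probability mass over the iterations, while the shared topical words keep it.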
We investigated the effect of poison pills on relevance feedback. To do so, for each topic with more than ten relevant documents, we add the relevant documents to the feedback set one by one, in the order of their ranking in the initial run. After adding each document, we track the change in the performance of the feedback run compared to the feedback run without that document.
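The procedure can be sketched in Python as follows. This is a rough outline under stated assumptions, not our experimental code: `find_poison_pills` and `run_feedback` are hypothetical names, `run_feedback` stands in for any feedback system that returns a ranked list of document ids for a given feedback set, and the toy rankings (with only three relevant documents) are made up for illustration.

```python
def average_precision(ranking, relevant):
    """Average precision of a ranked list of document ids."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def find_poison_pills(relevant_in_rank_order, relevant, run_feedback):
    """Add relevant documents one by one and flag those whose addition
    lowers average precision compared to the run without them."""
    feedback_set = []
    # Baseline: the run with an empty feedback set, i.e. the initial run.
    previous_ap = average_precision(run_feedback([]), relevant)
    pills = []
    for doc_id in relevant_in_rank_order:
        feedback_set.append(doc_id)
        ap = average_precision(run_feedback(list(feedback_set)), relevant)
        if ap < previous_ap:
            pills.append((doc_id, ap - previous_ap))
        previous_ap = ap
    return pills


# Toy stand-in for a feedback system: the ranking depends only on the
# number of feedback documents (made-up data).
toy_runs = {
    0: ["x", "d1", "d2", "d3"],
    1: ["d1", "x", "d2", "d3"],
    2: ["x", "y", "d1", "d2", "d3"],
    3: ["d1", "d2", "d3", "x"],
}


def toy_feedback(feedback_set):
    return toy_runs[len(feedback_set)]


pills = find_poison_pills(["d1", "d2", "d3"], {"d1", "d2", "d3"}, toy_feedback)
# Only "d2" is flagged: adding it lowers average precision.
```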
Employing SWLM enables the feedback system to control the contribution of each feedback document and prevents its specific or general terms from affecting the feedback model. The figure above shows how SWLM empowers the feedback system to deal with poison pills. It illustrates the performance of different systems on topic 374 of the Robust04 dataset. As can be seen, adding the seventh relevant document to the feedback set leads to a substantial drop in feedback performance in all the systems. The query is “Nobel prize winners” and the seventh document is about one of the Nobel Peace Prize winners, Yasser Arafat, but it ends with a discussion of Middle East issues, which contains some highly frequent terms that are not relevant to the query.
However, SWLM is able to identify this document as a poison pill and, by reducing its contribution to the feedback model, i.e., learning a low weight for it, prevents the severe drop in feedback performance.
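The effect of a low document weight can be illustrated with a small Python sketch. Note the assumptions: SWLM learns the weights, whereas here they are simply set by hand, and `feedback_model` as well as the toy documents are hypothetical.

```python
from collections import Counter


def feedback_model(docs, weights):
    """Weighted mixture of per-document unigram models, normalized to sum to 1."""
    mixture = Counter()
    for tokens, weight in zip(docs, weights):
        counts = Counter(tokens)
        length = sum(counts.values())
        for term, n in counts.items():
            mixture[term] += weight * n / length
    total = sum(mixture.values())
    return {term: p / total for term, p in mixture.items()}


# Two on-topic documents and one partially relevant document whose ending
# is dominated by off-topic terms (made-up data).
docs = [
    "nobel prize winners physics".split(),
    "nobel prize winners chemistry".split(),
    "nobel peace prize arafat middle east talks middle east talks".split(),
]

uniform = feedback_model(docs, [1.0, 1.0, 1.0])
# Hand-set low weight for the third document (assumption: SWLM would learn it).
inoculated = feedback_model(docs, [1.0, 1.0, 0.1])
```

With uniform weights, off-topic terms such as "middle" and "east" receive a noticeable share of the feedback model; with a low weight for the third document, their probability shrinks and the shared query terms dominate.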
In short, SWLM inoculates the feedback model against poison pills: it automatically determines whether adding a specific relevant document to the feedback set hurts retrieval performance for a specific topic, and controls that document's effect on the feedback model accordingly.
For more information, please take a look at our papers:
- Mostafa Dehghani, H. Azarbonyad, J. Kamps, D. Hiemstra, and M. Marx. “Inoculating Relevance Feedback Against Poison Pills”, In Proceedings of Dutch-Belgian Information Retrieval (DIR’16), 2016.
- Mostafa Dehghani, H. Azarbonyad, J. Kamps, D. Hiemstra, and M. Marx. “Luhn Revisited: Significant Words Language Models”, In Proceedings of The ACM International Conference on Information and Knowledge Management (CIKM’16), 2016, pp. 1301-1310.