Sources of Evidence for Automatic Indexing of Political Texts

Our paper "Sources of Evidence for Automatic Indexing of Political Texts", with Hosein Azarbonyad, Jaap Kamps, and Maarten Marx, has been accepted as a short paper at the 37th European Conference on Information Retrieval (ECIR'15). \o/  

Political texts on the Web, documenting laws and policies and the process leading to them, are of key importance to government, industry, and every individual citizen. Yet access to such texts is difficult due to the ever-increasing volume and complexity of the content, prompting the need for indexing or annotating them with a common controlled vocabulary or ontology. In this research, we investigate the effectiveness of different sources of evidence (such as the labeled training data, textual glosses of descriptor terms, and the thesaurus structure) for automatically indexing political texts.

Our main findings are the following:

First, using a learning-to-rank (LTR) approach that integrates all features, we observe significantly better performance than previous systems.

Second, the analysis of feature weights reveals the relative importance of various sources of evidence, also giving insight into the underlying classification problem.

Third, a lean-and-mean system using only four features achieves 97% of the performance of the large LTR model.
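
To make the idea of ranking descriptor terms by combining several sources of evidence more concrete, here is a minimal sketch. It is not the system from the paper: the feature names (`title_sim`, `gloss_sim`), the toy numbers, and the pairwise perceptron used for training are all assumptions chosen for illustration. The point is only to show how learned feature weights turn per-descriptor evidence into a ranking, and how those weights can then be inspected.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]


def score(weights: Features, feats: Features) -> float:
    """Linear score: weighted sum of the feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def train_pairwise(
    pairs: List[Tuple[Features, Features]], epochs: int = 10, lr: float = 0.1
) -> Features:
    """Pairwise perceptron (an illustrative stand-in for an LTR learner).

    Each pair is (features of a correct descriptor, features of an incorrect one).
    Weights are updated whenever the correct descriptor is not scored higher.
    """
    weights: Features = {}
    for _ in range(epochs):
        for pos, neg in pairs:
            if score(weights, pos) <= score(weights, neg):
                for name in set(pos) | set(neg):
                    diff = pos.get(name, 0.0) - neg.get(name, 0.0)
                    weights[name] = weights.get(name, 0.0) + lr * diff
    return weights


def rank(weights: Features, candidates: Dict[str, Features]) -> List[str]:
    """Return descriptor names ordered from highest to lowest score."""
    return sorted(candidates, key=lambda d: score(weights, candidates[d]), reverse=True)


# Hypothetical training pairs: (correct descriptor, incorrect descriptor).
training_pairs = [
    ({"title_sim": 0.9, "gloss_sim": 0.4}, {"title_sim": 0.2, "gloss_sim": 0.5}),
    ({"title_sim": 0.7, "gloss_sim": 0.6}, {"title_sim": 0.1, "gloss_sim": 0.3}),
]
weights = train_pairwise(training_pairs)

# Hypothetical candidate descriptors for one unseen document.
candidates = {
    "agriculture": {"title_sim": 0.8, "gloss_sim": 0.3},
    "fisheries": {"title_sim": 0.1, "gloss_sim": 0.4},
}
ranking = rank(weights, candidates)

print(ranking)
print(weights)  # inspecting the weights hints at which evidence matters most
```

In a sketch like this, a lean variant simply means training and scoring with a small subset of the feature names.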

For more details, please refer to our paper:
