Recently, together with Alex and Hosein, two other members of the ExPoSe project, I participated in Talk of Europe - Creative Camp #3, a one-week, hackathon-style event. We proposed to develop a system for Discovering Links in Political Conversations.
Here is a brief report on what we did at this event:
Entity recognition and disambiguation is a powerful tool for enriching parliamentary debates and linking them to external sources. However, general-purpose entity linkers often fail to accurately detect the entities that are most relevant to the parliamentary domain. Therefore, at the Talk of Europe Creative Camp #3, we focused on designing and developing a specialized entity linker for political debates. As an application of this entity linker, we developed an interactive tool that uses the extracted entities to let users explore the debates based on their relation to the concepts of an external knowledge base, namely Wikipedia. The tool helps users dig into parliamentary debates from their own perspective and see which aspects of the concepts they are interested in are discussed most in parliament.
Our work at the event consisted of two parts, each of which resulted in an independent system:
- Entity Linker for Political Conversations
- ExPoSe WikiCat Browser
The following provides a brief explanation of each of these two systems:
Entity Linker for Political Conversations
The goal of this part of our work was to enrich parliamentary proceedings by linking mentioned concepts and named entities to corresponding external resources, such as biographies or encyclopedia entries. To do so, we developed a customized entity recognition and disambiguation system that targets the most salient entity types in this domain: politicians and political parties. We combined this specialized linker with general-purpose entity linking systems to achieve promising precision while maintaining acceptable recall.
At the moment, the linker can be applied to Dutch proceedings. It generates the links in the PoliticalMashup format.
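As an illustration only, combining a specialized linker with a general-purpose one could look like the minimal sketch below. It rests on assumptions of our own: the specialized linker is a hypothetical exact-match gazetteer of politicians and parties, the general-purpose output is a hand-made stand-in list, and specialized links win wherever the two overlap. `Mention`, `GAZETTEER`, `specialized_link` and `merge_links` are hypothetical names, not part of pm_el_tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention
    uri: str     # linked external resource
    source: str  # which linker produced the link


# Hypothetical gazetteer of politicians and parties (assumption: a real one
# would be built from a curated domain resource, not hard-coded).
GAZETTEER = {
    "Mark Rutte": "https://nl.wikipedia.org/wiki/Mark_Rutte",
    "VVD": "https://nl.wikipedia.org/wiki/Volkspartij_voor_Vrijheid_en_Democratie",
}


def specialized_link(text: str) -> list[Mention]:
    """Hypothetical domain linker: exact-match every gazetteer name in the text."""
    mentions = []
    for name, uri in GAZETTEER.items():
        start = text.find(name)
        while start != -1:
            mentions.append(Mention(start, start + len(name), uri, "specialized"))
            start = text.find(name, start + len(name))
    return mentions


def merge_links(specialized: list[Mention], general: list[Mention]) -> list[Mention]:
    """Keep every specialized link; add general links only where they overlap none."""
    merged = list(specialized)
    for m in general:
        if all(m.end <= s.start or m.start >= s.end for s in specialized):
            merged.append(m)
    return sorted(merged, key=lambda m: m.start)


text = "Mark Rutte spoke about the Euro."
# Stand-in for a general-purpose linker's output (made up for this example).
general = [
    Mention(0, 4, "https://example.org/wrong-sense-of-Mark", "general"),
    Mention(27, 31, "https://nl.wikipedia.org/wiki/Euro", "general"),
]
links = merge_links(specialized_link(text), general)
```

Here the general-purpose link on "Mark" is dropped because it overlaps the specialized link for "Mark Rutte", while the non-overlapping link for "Euro" is kept. Preferring the domain linker on overlaps is one simple way to trade a little recall for precision.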
The implementation of the aforementioned system is available here: https://bitbucket.org/aolieman/pm_el_tools
ExPoSe WikiCat Browser
As the second part of our work, we developed a system named “ExPoSe WikiCat Browser”. The ExPoSe WikiCat Browser enables users to browse and investigate the data from a particular point of view. It uses the entities extracted from the data to project the data onto an arbitrary ontology, so the structure of the selected ontology determines the point of view from which users see the data.
As an example from the parliamentary domain, assume a user is interested in analyzing the relation between national laws of a European country and EU legislation. One approach would be to investigate when EU legislation was discussed in the national parliament and in the context of which (proposed) national laws. In other words, it is desirable to see how debates within the parliament of that country can be projected onto the topics related to the European Parliament in terms of both the subject matter and time.
In ExPoSe WikiCat Browser, we use the entity linker’s output to identify the topics of the data, and we let the user select a sub-hierarchy of Wikipedia’s category hierarchy onto which the entities mentioned in the text are projected. In the selected hierarchy, each node is a category that represents a topic, and its descendant nodes are its sub-topics.
In the parliamentary data example, the ExPoSe WikiCat Browser is given the entities extracted from national parliamentary debates, and “European Union” is selected as the root category of the hierarchy.
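Selecting a sub-hierarchy amounts to collecting a root category and everything reachable from it through subcategory links. A minimal sketch, assuming the category graph is already available as a hypothetical in-memory `children` mapping (parent to subcategories), is a breadth-first traversal; `sub_hierarchy` and the toy data are illustrative, not the browser's actual code.

```python
from collections import deque


def sub_hierarchy(children: dict[str, list[str]], root: str) -> set[str]:
    """Collect the root category and every category reachable via subcategory links."""
    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first traversal
        category = queue.popleft()
        for child in children.get(category, []):
            # Skip categories already visited, since the category graph may contain cycles.
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical toy fragment of a category graph (parent -> subcategories).
children = {
    "European Union": ["Politics of the European Union", "Economy of the European Union"],
    "Politics of the European Union": ["European Parliament"],
    "Economy of the European Union": ["Euro"],
    "Euro": ["Economy of the European Union"],  # artificial cycle
    "Sports": ["Football"],
}
eu_categories = sub_hierarchy(children, "European Union")
```

Wikipedia's category graph is not a strict tree, which is why the traversal tracks visited nodes; in practice one may also want to cap the depth, since deep paths tend to drift away from the root topic.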
Subsequently, the system extracts all the categories in the selected hierarchy and maps the extracted entities to Wikipedia categories. It then filters the entities, keeping only those that represent concepts related to the extracted categories. Finally, using the debates and the related entities they contain, the system calculates the importance and recency of each category (i.e., each node in the hierarchy). The importance of a node reflects how much the topic of that category is addressed in the national parliamentary debates, based on the frequency of the entities related to the category. The recency of a node reflects how recently the topic of that category was discussed in the debates.
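The mapping, filtering, and scoring steps can be sketched as follows. This is a minimal illustration under our own assumptions: each linked mention is a `(entity, debate_id, debate_date)` tuple, each entity's categories are given in a hypothetical `entity_categories` mapping, importance is a plain count of mentions per category, and recency is the date of the latest debate mentioning one of the category's entities. Only direct category membership is counted; aggregating scores up to ancestor nodes would be a further design choice.

```python
from collections import Counter
from datetime import date


def score_categories(mentions, entity_categories, hierarchy):
    """Compute importance (mention frequency) and recency (latest debate date) per category.

    mentions: (entity, debate_id, debate_date) tuples produced by the entity linker.
    entity_categories: entity -> set of Wikipedia categories the entity belongs to.
    hierarchy: set of categories in the user-selected sub-hierarchy.
    """
    importance = Counter()
    recency = {}
    for entity, _debate_id, when in mentions:
        # Map the entity to its categories, keeping only those inside the hierarchy;
        # entities with no such category are thereby filtered out.
        for category in entity_categories.get(entity, set()) & hierarchy:
            importance[category] += 1
            if category not in recency or when > recency[category]:
                recency[category] = when
    return importance, recency


# Hypothetical toy data.
hierarchy = {"European Union", "Euro", "Treaties of the European Union"}
entity_categories = {
    "Euro": {"Euro", "Currencies"},
    "Treaty of Lisbon": {"Treaties of the European Union"},
    "Ajax": {"Football clubs"},
}
mentions = [
    ("Euro", "debate-1", date(2014, 3, 5)),
    ("Euro", "debate-2", date(2015, 6, 1)),
    ("Treaty of Lisbon", "debate-2", date(2015, 6, 1)),
    ("Ajax", "debate-3", date(2015, 7, 1)),
]
importance, recency = score_categories(mentions, entity_categories, hierarchy)
```

In this toy run, "Ajax" and the "Currencies" category fall outside the selected hierarchy and contribute nothing.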
We provide a graphical user interface in which the user can traverse paths in the hierarchy and see which categories are discussed most in the national parliamentary debates (the importance of a node is shown by its size), which categories were discussed recently (the recency of a node is shown by its color), and which debates are related to which categories at different levels of abstraction. For a given category, links to all related debates are listed, grouped by the entities that belong to that category.
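How importance and recency are turned into size and color is a design choice. A minimal sketch, assuming linear scales, an arbitrary pixel range, and an arbitrary "cold" and "hot" color pair (all hypothetical, not the browser's actual settings), could be:

```python
from datetime import date


def node_size(importance, max_importance, min_px=10.0, max_px=60.0):
    """Scale a node's radius linearly with its importance (hypothetical pixel range)."""
    if max_importance == 0:
        return min_px
    return min_px + (max_px - min_px) * importance / max_importance


def node_color(when, oldest, newest, old_rgb=(200, 200, 255), new_rgb=(255, 60, 60)):
    """Interpolate from a 'cold' to a 'hot' RGB color according to how recent the date is."""
    span = (newest - oldest).days
    t = 1.0 if span == 0 else (when - oldest).days / span
    r, g, b = (round(o + (n - o) * t) for o, n in zip(old_rgb, new_rgb))
    return f"#{r:02x}{g:02x}{b:02x}"


# Hypothetical usage: a node with half the maximum importance, and the two date extremes.
size = node_size(2, 4)
hot = node_color(date(2015, 6, 1), date(2014, 3, 5), date(2015, 6, 1))
cold = node_color(date(2014, 3, 5), date(2014, 3, 5), date(2015, 6, 1))
```

A logarithmic size scale may be preferable when a few categories dominate the counts; the linear version is used here only for simplicity.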
In short, this system provides a hierarchical grouping of the elements or documents in a large collection based on Wikipedia categories. Loosely speaking, it assigns categories to documents using the entities extracted from them, and scores each category based on the dates of the documents assigned to it (recency) as well as the frequency of the entities in those documents (importance). The system provides a dynamic interface that shows the categories, their recency and importance, and their relations, allowing the user to browse and investigate the data from a very abstract level (categories) down to a very detailed level (entities in documents).
The general idea of the system is to find an abstract representation of data from a particular point of view, chosen by the user, and to enable the user to dive from this abstract representation into the detailed information. This idea is applicable to other domains as well. For example, discussion forums on social media keep growing, and investigating and analyzing this kind of data is becoming increasingly complex. It would therefore be desirable to have a system that not only represents the data from the point of view that matters to us but also provides a dynamic way to browse the data at different levels of abstraction. With these domains in mind, we plan to extend our system and evaluate it in applications from other domains.
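The drill-down from categories to entities to documents can be backed by a nested index. The sketch below is only an illustration under our own assumptions: mentions are the same hypothetical `(entity, debate_id, debate_date)` tuples as above, and `debates_by_category` builds a category to entity to debate-id index, which is one way to list a category's debates grouped by entity.

```python
from collections import defaultdict
from datetime import date


def debates_by_category(mentions, entity_categories, hierarchy):
    """Build category -> entity -> sorted debate ids for the browser's detail view."""
    index = defaultdict(lambda: defaultdict(set))
    for entity, debate_id, _when in mentions:
        # A debate is assigned to every in-hierarchy category of each entity it mentions.
        for category in entity_categories.get(entity, set()) & hierarchy:
            index[category][entity].add(debate_id)
    return {
        category: {entity: sorted(ids) for entity, ids in by_entity.items()}
        for category, by_entity in index.items()
    }


# Hypothetical toy data.
hierarchy = {"European Union", "Euro", "Treaties of the European Union"}
entity_categories = {
    "Euro": {"Euro"},
    "Treaty of Lisbon": {"Treaties of the European Union"},
    "Ajax": {"Football clubs"},
}
mentions = [
    ("Euro", "debate-1", date(2014, 3, 5)),
    ("Euro", "debate-2", date(2015, 6, 1)),
    ("Treaty of Lisbon", "debate-2", date(2015, 6, 1)),
    ("Ajax", "debate-3", date(2015, 7, 1)),
]
index = debates_by_category(mentions, entity_categories, hierarchy)
```

Note that one debate ("debate-2") can appear under several categories, which matches the idea of documents being reachable from multiple points in the hierarchy.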