Two-Way Parsimonious Classification Models for Evolving Hierarchies

Our paper "Two-Way Parsimonious Classification Models for Evolving Hierarchies", with Hosein Azarbonyad, Jaap Kamps, and Maarten Marx, has been accepted as a long paper at the Conference and Labs of the Evaluation Forum (CLEF'16). \o/

Modern web data is highly structured in terms of entities and relations from large knowledge bases, geo-temporal references, and social network structure. These knowledge bases contain many facts and entities in a graph or hierarchical structure, making it possible to express concepts at different levels of abstraction. However, because the data is dynamic, this structure may evolve over time. For example, nodes in a hierarchy can be added, removed, or even moved to a different position. Modeling objects in such evolving structures and building robust classifiers for them is therefore hard: it requires a set of stable features that are not affected by these kinds of changes.


For example, suppose we build a classifier for the “US president” from recent data. A standard classifier would not distinguish the office from the person who currently holds it, leading to obvious issues after the elections in 2016. If instead we separate the model of the function from the model of the person fulfilling it, for example by abstracting over several presidents, the more general model would in principle be robust over time.

These challenges are ubiquitous when dealing with dynamic data annotated with concepts from a hierarchical structure. We study the problem in the context of parliamentary data, as a particular kind of web data. Parliamentary proceedings of public government are fully annotated data with a rich dynamic structure that links every speech to its speaker, the speaker's role in the parliament, and their political party.

Consider a simple hierarchy of a multi-party parliament, as shown in the figure, which defines categories for the different layers of membership in the parliament. Also assume that all speeches of the members of parliament are available. It is desirable to use text classification approaches to study how the speeches of politicians relate to ideology or to other factors, such as party membership or party status as government or opposition, over different periods of parliament. To this end, we need models that represent each object at the intermediate levels of the hierarchy as a category covering all its descendant objects.

However, since members and parties can move within the parliament hierarchy from one period to the next, it is challenging to estimate models that transfer across time. For instance, after elections governments change: prior opposition parties may form the new government, and prior government parties may form the new opposition. Thus, if the model of, say, status in terms of government and opposition is affected by terms related to the parties’ ideology, it will not be valid in the next period. This requires making these models less dependent on the “accidental” parties and members forming the government in a particular period, and capturing the essential features of the abstract notion of status. In order to estimate a robust model for an object in an evolving hierarchy, we need to explicitly take into account all the relations between the object and the objects in other layers, and capture the essential features by removing those that are better explained by objects in other layers. By estimating independent models for related objects in this way, we can ensure that the models remain valid even if the relational structure of the hierarchy changes over time.

Based on this, we propose Hierarchical Significant Words Language Models (HSWLM), an extension of significant words language models (SWLM) for modeling hierarchical objects. These models are highly robust against structural changes because they capture all, and only, the significant terms as a stable set of features. Our inspiration comes from the early work on information retrieval by Luhn [1], which argues that in order to establish a model consisting of significant words, we need to eliminate both common words and specific words. Following this idea, and with respect to the structure of the hierarchy, we define general terms as terms already explained by ancestor models, and specific terms as terms already explained by the models of descendants. We then employ the parsimonization technique to hierarchically eliminate both as non-essential terms from the models.
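To make the two-way idea more concrete, here is a minimal sketch in Python. It assumes that parsimonization is done with the usual EM-style estimation of parsimonious language models, which this post does not spell out; the function names, the mixing weight `lam`, the pruning `threshold`, and the toy term counts are all illustrative assumptions, not the exact HSWLM procedure from the paper.

```python
def parsimonize(term_weights, background, lam=0.1, iterations=100, threshold=1e-4):
    """Shift probability mass away from terms that `background` already explains.

    term_weights: dict term -> count (or probability) for the object being modeled.
    background:   dict term -> probability under the model we want to factor out.
    lam, iterations, threshold: illustrative settings, not values from the paper.
    """
    total = sum(term_weights.values())
    model = {t: w / total for t, w in term_weights.items()}
    for _ in range(iterations):
        # E-step: share of each term's weight credited to the object's own model.
        expected = {}
        for t, p in model.items():
            own = lam * p
            other = (1.0 - lam) * background.get(t, 0.0)
            expected[t] = term_weights[t] * own / (own + other)
        # M-step: renormalize, drop negligible terms, renormalize again.
        norm = sum(expected.values())
        model = {t: e / norm for t, e in expected.items() if e / norm >= threshold}
        kept = sum(model.values())
        model = {t: p / kept for t, p in model.items()}
    return model


def two_way_parsimonize(term_weights, ancestor, descendant, **settings):
    """Remove general terms (ancestor) first, then specific terms (descendant)."""
    without_general = parsimonize(term_weights, ancestor, **settings)
    return parsimonize(without_general, descendant, **settings)


# Toy data (made up): term counts for all "government" speeches in one period.
government_counts = {"the": 50, "budget": 10, "reform": 8, "liberal": 6}
# Ancestor model (whole parliament); the remaining mass sits on terms not shown.
parliament_model = {"the": 0.6, "budget": 0.02, "reform": 0.02, "liberal": 0.02}
# Descendant model (one governing party); again only a few terms are shown.
party_model = {"liberal": 0.3, "budget": 0.01, "reform": 0.01}

status_model = two_way_parsimonize(government_counts, parliament_model, party_model)
for term, prob in sorted(status_model.items(), key=lambda item: -item[1]):
    print(f"{term}\t{prob:.3f}")
```

In this toy run the general term "the" and the party-specific term "liberal" lose their mass, so what remains for the status model is concentrated on "budget" and "reform".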

The main aim of our research is to develop appropriate language models for classifying objects in evolving hierarchies. We break this down into a number of concrete research questions:

  1. How can we estimate robust language models for objects in evolving hierarchies by explicitly taking the relations between the levels into account?
  2. How effective are hierarchical significant words language models for classifying textual objects at different levels of the hierarchy across time periods?
  3. Do the resulting hierarchical significant words language models capture common characteristics of classes at different levels of the hierarchy over time?

In our research, we address all of these research questions. We have conducted extensive experiments on richly annotated parliamentary proceedings that link every speech to its speaker, their political party, and their role in the parliament. For the answers to these research questions, please refer to the paper.

  1. H. P. Luhn. The automatic creation of literature abstracts. IBM J. Res. Dev., 2(2):159–165, 1958.
