Our paper "On Horizontal and Vertical Separation in Hierarchical Text Classification", with Hosein Azarbonyad, Jaap Kamps, and Maarten Marx, has been accepted as a long paper at The ACM International Conference on the Theory of Information Retrieval (ICTIR'16). \o/
Hierarchy is an effective and common way of representing information, and much real-world textual data can be organized in this way. Organizing data in a hierarchical structure is valuable since it determines relationships in the data at different levels of resolution and picks out different categories relevant to each of the different layers of memberships. In a hierarchical structure, a node at any layer could be an indicator of a document, a person, an organization, a category, an ideology, and so on, which we refer to as “hierarchical entities”. Taking advantage of the structure in the hierarchy requires a proper way of modeling and representing entities, taking their relations in the hierarchy into consideration.
There are two types of dependencies in the hierarchies:
- Horizontal dependency, which refers to the relations of entities in the same layer. (A simple example would be the dependency between siblings which have some commonalities in terms of being descendants of the same entity.)
- Vertical dependency, which addresses the relations between ancestors and descendants in the hierarchy. (For example, the relation between the root and other entities.)
Due to the existence of two-dimensional dependencies between entities in the hierarchy, modeling them without regard to their relationships might result in overlapping models that are not capable of making different entities distinguishable. Overlapping models are undesirable because, when the data representations are not well-separated, classification and retrieval systems are less likely to work well. Thus, two-dimensional separability, i.e. horizontal and vertical separability, is one of the key requirements of hierarchical classification.
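To make the notion of overlapping models concrete, here is a minimal sketch that measures how far apart unigram language models of sibling and parent entities are. Everything in it is an assumption for illustration: the toy hierarchy, the function names, the choice of Jensen-Shannon divergence as an overlap measure, and the crude step of dropping shared terms. It is not the estimation method from the paper.

```python
import math
from collections import Counter


def language_model(texts):
    """Maximum-likelihood unigram model over whitespace-separated tokens."""
    counts = Counter(word for text in texts for word in text.split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical models, 1 for disjoint ones."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_mixture(a):
        return sum(prob * math.log2(prob / m[w]) for w, prob in a.items() if prob > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def drop_terms(model, terms):
    """Remove the given terms and renormalize (a crude stand-in for separation)."""
    kept = {w: prob for w, prob in model.items() if w not in terms}
    total = sum(kept.values())
    return {w: prob / total for w, prob in kept.items()}


# Hypothetical toy hierarchy: one parent ("sports") with two sibling entities.
docs = {
    "sports/football": ["the match ended after the goal", "the striker scored the goal"],
    "sports/tennis": ["the match ended after the serve", "the player won the set"],
}
football = language_model(docs["sports/football"])
tennis = language_model(docs["sports/tennis"])
parent = language_model([t for texts in docs.values() for t in texts])

# Naive models overlap: siblings share general terms, and the parent contains both.
horizontal = js_divergence(football, tennis)
vertical = js_divergence(parent, football)

# Dropping the terms the siblings share leaves only entity-specific terms.
shared = set(football) & set(tennis)
separated = js_divergence(drop_terms(football, shared), drop_terms(tennis, shared))

print(f"horizontal (naive): {horizontal:.3f}")
print(f"vertical (naive):   {vertical:.3f}")
print(f"horizontal (shared terms dropped): {separated:.3f}")
```

In this toy setting the naive sibling models overlap on general terms such as "the" and "match", while the filtered models share no terms at all, so their divergence reaches the maximum of 1 bit.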
The concept of separability is of crucial importance in information retrieval, especially when the task is not just ranking items by their probability of being relevant, but also making a boolean decision on whether or not an item is relevant, as in information filtering. Regarding this concern, the Probability Threshold Principle (PTP) has been presented as a stronger version of the Probability Ranking Principle for binary classification; it discusses optimizing a threshold for separating items according to their probability of class membership. PTP is a principle based on separability in the score space. In this research, we discuss separability in the data representation and define the Strong Separation Principle as the counterpart of PTP in the feature space.
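The idea of optimizing a threshold in the score space can be sketched as follows. This is only an illustration under assumptions: the scores and labels are made-up toy data, F1 is chosen arbitrarily as the effectiveness measure, and the helper names `f1_at` and `best_threshold` are hypothetical.

```python
def f1_at(threshold, scores, labels):
    """F1 when every item scoring at or above the threshold is accepted."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(scores, labels):
    """Try each observed score as a threshold and keep the one with the highest F1."""
    return max(sorted(set(scores)), key=lambda t: f1_at(t, scores, labels))


# Hypothetical estimated probabilities of class membership for six items.
labels = [1, 1, 1, 0, 0, 0]
separated = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]    # classes do not overlap in score space
overlapping = [0.9, 0.4, 0.6, 0.5, 0.7, 0.1]  # classes are interleaved in score space

for name, scores in [("separated", separated), ("overlapping", overlapping)]:
    t = best_threshold(scores, labels)
    print(f"{name}: threshold={t}, F1={f1_at(t, scores, labels):.2f}")
```

With well-separated scores, some threshold classifies every toy item correctly; with interleaved scores, even the best threshold makes mistakes. This is the intuition behind wanting separation in the scores, and in turn in the representations that produce them.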
The main aim of this research is to understand and validate the effect of the separation property on hierarchical classification and discuss how to provide horizontally and vertically separable language models for text-based hierarchical entities. We break this down into three concrete research questions:
- RQ1: What makes separability a desirable property for classifiers?
- We demonstrate that, based on the ranking and classification principles, the separation property in the data representation theoretically leads to separation in the scores and consequently improves the accuracy of classifiers’ decisions. We state this as the “Strong Separation Principle” for optimizing the expected effectiveness of classifiers. Furthermore, we define two-dimensional separation in hierarchical data and discuss its necessity for hierarchical classification.
- RQ2: How can we estimate horizontally and vertically separable language models for the hierarchical entities?
- We show that to estimate horizontally and vertically separable language models, they should capture all, and only, the essential terms of the entities taking their positions in the hierarchy into consideration. Based on this and inspired by Significant Words Language Models, we introduce Hierarchical Significant Words Language Models (HSWLM) and evaluate them on real-world data to demonstrate that they present models for hierarchical entities that possess both horizontal and vertical separability.
- RQ3: How does separability improve transferability?
- We investigate the effectiveness of language models of hierarchical entities possessing two-dimensional separation across time and show that separability makes the models capture essential characteristics of a class, which consequently improves transferability over time.
For more details, please take a look at the paper:
- Mostafa Dehghani, H. Azarbonyad, J. Kamps, and M. Marx, "On Horizontal and Vertical Separation in Hierarchical Text Classification", in Proceedings of the ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR'16), 2016.