Some Highlights of MILA Deep Learning and Reinforcement Learning Summer Schools 2017

A couple of weeks ago, I attended the MILA Deep Learning Summer School (DLSS) from June 26th to July 1st and the Reinforcement Learning Summer School (RLSS) from July 3rd to 5th, 2017, organized by Yoshua Bengio and Aaron Courville. You can find information about the lectures here. In the following, I will share some of "my" highlights from the summer schools.

Different types of learning problems

The first day of the summer school was on general topics of machine learning and neural networks. Doina Precup gave a talk that was a gentle refresher on the general concepts of machine learning, and then Hugo Larochelle covered the basics of neural networks. In the second part of his talk, Hugo started by dividing learning problems into different types, based on the data and the settings of the problem during training and inference. Based on his grouping, each learning problem can be classified into one of these categories:

(more…)

Learning to Attend, Copy, and Generate for Session-Based Query Suggestion

Our paper "Learning to Attend, Copy, and Generate for Session-Based Query Suggestion", with Sascha Rothe, Enrique Alfonseca, and  Pascal Fleury, has been accepted as a long paper at the international Conference on Information and Knowledge Management (CIKM'17). This paper is on the outcome of my internship at Google Research. \o/

Users interact with search engines during search sessions and try to direct their search by submitting a sequence of queries. Based on these interactions, search engines provide a prominent feature in which they assist users in formulating their queries to better represent their intent during Web search, by providing suggestions for the next query.

 

Query suggestion might address the need to disambiguate user queries, making the direction of the search clearer for both the user and the search engine. It might help users formulate a precise and succinct query when they are not familiar with the specific terminology, or when they lack understanding of the internal vocabulary and structure needed to formulate an effective query. It has been shown that, in general, query suggestion accelerates search satisfaction by either diving deeper into the current search direction or by moving to a different aspect of the search task.

(more…)

Share your Model instead of your Data!

Our paper "Share your Model instead of your Data: Privacy Preserving Mimic Learning for Ranking", with Hosein Azarbonyad, Jaap Kamps, and Maarten de Rijke, has been accepted at Neu-IR: SIGIR Workshop on Neural Information Retrieval (NeuIR'17). \o/

In this paper, we aim to lay the groundwork for the idea of sharing a privacy-preserving model instead of sensitive data in IR applications. This would allow researchers in industry to share the knowledge learned from actual users’ data with the academic community, leading to better collaboration among all researchers in the field.

Deep neural networks demonstrate undeniable success in several fields, and employing them is taking off for information retrieval problems. It has been shown that supervised neural network models perform better as the training dataset grows bigger and becomes more diverse. Information retrieval is an experimental and empirical discipline; thus, having access to large-scale real datasets is essential for designing effective IR systems. However, in many information retrieval tasks, due to the sensitivity of the data from users and privacy issues, not all researchers have access to large-scale datasets for training their models.

Much research has been done on the general problem of preserving the privacy of sensitive data in IR applications, where the question is: how should we design effective IR systems without damaging users’ privacy?

One of the solutions so far is to anonymize the data and try to hide the identity of users. However, there is no guarantee that the anonymized data will be as effective as the original data.

With machine learning-based approaches, sharing a trained model instead of the original data has turned out to be an option for transferring knowledge. The idea of mimic learning is to use a model that is trained on the signals from the original training data to annotate a large set of unlabeled data, and to use these labels as training signals for training a new model. It has been shown, for many tasks in computer vision and natural language processing, that we can transfer knowledge this way and that the newly trained models perform as well as the model trained on the original training data.
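As a rough sketch of this mimic-learning recipe (the data, feature dimensions, and model choices below are placeholders for illustration, not the setup from our paper):

# Mimic learning sketch: transfer knowledge via a teacher's predicted labels,
# so the sensitive training data itself never leaves the owner's side.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

# 1) Teacher model trained in-house on the sensitive data (placeholder data here).
X_private = np.random.rand(1000, 20)                 # stand-in for sensitive features
y_private = (X_private[:, 0] > 0.5).astype(int)      # stand-in for the true labels
teacher = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X_private, y_private)

# 2) The teacher annotates a large public/unlabeled dataset.
X_public = np.random.rand(5000, 20)
y_weak = teacher.predict(X_public)                   # predicted labels become the training signal

# 3) Student model trained only on public data plus teacher labels; this is what gets shared.
student = LogisticRegression(max_iter=500).fit(X_public, y_weak)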

(more…)

On Search Powered Navigation

Our paper "On Search Powered Navigation", with Glorianna Jagfeld, Hosein Azarbonyad, Alex Olieman, Jaap Kamps, Maarten Marx, has been accepted as a short paper at . \o/

Knowledge graphs and other hierarchical domain ontologies hold great promise for complex information seeking tasks, yet their massive size defies the standard and effective way smaller hierarchies are used as a static navigation structure in faceted search or standard website navigation. As a result, we see only limited use of knowledge bases in entity surfacing for navigational queries, and fail to realize their full potential to empower search. Seeking information in structured environments consists of two main activities: exploratory browsing and focused searching.

Exploratory browsing refers to activities aimed at better defining the information need and increasing the level of understanding of the information space, while focused searching includes activities such as query refining and comparison of results, which are performed after the information need has been made more concrete. Based on the interplay of these two actions, a search system is supposed to provide a connected space of information for the users to navigate, as well as search to adjust the focus of their browsing towards useful content.

In our paper, we introduce the concept of Search Powered Navigation (SPN), which enables users to combine navigation with query-based searching in a structured information space, and offers a way to find a balance between exploration and exploitation. We hypothesize that SPN enables users to exploit the semantic structure of a large knowledge base in an effective way. We test this hypothesis by conducting a user study in which users are engaged in exploratory search activities and investigate the effect of SPN on the variability in users’ behaviour and experience. We employed an exploratory search system on parliamentary data in two modes, pure navigation and search powered navigation, and tested two types of tasks, broad- and focused-topic tasks.

(more…)

Beating the Teacher: Neural Ranking Models with Weak Supervision

Our paper "Neural Ranking Models with Weak Supervision", with Hamed Zamani, Aliaksei Severyn, Jaap Kamps, and W. Bruce Croft, has been accepted as a long paper at The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR2017). \o/
This paper is the outcome of my pet project during my internship at Google Research.

Despite the impressive improvements achieved by unsupervised deep neural networks in computer vision, natural language processing, and speech recognition tasks, such improvements have not yet been observed in ranking for information retrieval. The reason may be the complexity of the ranking problem, as it is not obvious how to learn from queries and documents when no supervised signal is available. In our paper, we propose to train neural ranking models using weak supervision, where labels are obtained automatically without human annotators or any external resources, e.g., click data.

To this aim, we use the output of a known unsupervised ranking model, such as BM25, as a weak supervision signal. We further train a set of simple yet effective ranking models based on feed-forward neural networks, and study their effectiveness under various learning scenarios (point-wise and pair-wise models) and with different input representations, from encoding query-document pairs into dense/sparse vectors to using word embedding representations. Our findings also suggest that supervised neural ranking models can greatly benefit from pre-training on large amounts of weakly labeled data that can be easily obtained from unsupervised IR models.
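As a minimal illustration of the point-wise variant of this idea (not the exact architecture or features from the paper; the synthetic data and dimensions below are placeholders), a small feed-forward ranker can simply regress towards BM25 scores used as weak labels:

# Point-wise weak supervision sketch: a tiny feed-forward ranker is trained to
# reproduce BM25 scores, which act as noisy labels. All data here is synthetic.
import torch
import torch.nn as nn

dim = 32                                   # placeholder size of a query-document feature vector
features = torch.rand(10000, dim)          # stand-in for encoded (query, document) pairs
bm25_scores = torch.rand(10000, 1)         # stand-in for BM25 scores used as weak labels

ranker = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(ranker.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                     # point-wise objective

for step in range(1000):
    idx = torch.randint(0, features.size(0), (256,))
    loss = loss_fn(ranker(features[idx]), bm25_scores[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

A pair-wise variant would instead take two candidate documents per query and apply a loss on the difference of their scores, which is one of the learning scenarios compared in the paper.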

We have three main research questions:

  • RQ1. Is it possible to learn a neural ranker only from labels provided by a completely unsupervised IR model such as BM25, as the weak supervision signal, that will exhibit superior generalization capabilities?
  • RQ2. What input representation and learning objective is most suitable for learning in such a setting?
  • RQ3. Can a supervised learning model benefit from a weak supervision step, especially in cases where labeled data is limited?

(more…)

SIGIR2017 Tutorial on "Neural Networks for Information Retrieval"

We will be giving a full-day tutorial on "Neural Networks for Information Retrieval", with Tom Kenter, Alexey Borisov, Christophe Van Gysel, Maarten de Rijke, and Bhaskar Mitra, at The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR2017). \o/

The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR research. It covers key architectures, as well as the most promising future directions. It is structured as follows:

  • Basic concepts:
    • The main concepts involved in neural systems will be covered, such as backpropagation, distributed representations/embeddings, convolutional layers, recurrent networks, sequence-to-sequence models, dropout, loss functions, and optimization schemes such as Adam.
  • Semantic matching:
    • Different methods for supervised, semi-supervised, and unsupervised learning for semantic matching will be discussed.
  • Learning to Rank with Neural Networks:
    • Feature-based models for representation learning, ranking objectives and loss functions, and training a neural ranker under different levels of supervision will be discussed.
  • Modeling user behavior with Neural Networks:
    • Probabilistic graphical models, neural click models, and modeling biases using neural networks will be described.
  • Generating Models:
    • Ideas from machine reading, question answering, conversational IR, and dialogue systems will be covered.

Hmmm... got excited? Join us at SIGIR2017 🙂
The material from our SIGIR 2017 tutorial on Neural Networks for Information Retrieval (NN4IR) is available online at http://nn4ir.com.

Modeling Retrieval Problem using Neural Networks

Despite the buzz surrounding deep neural network (DNN) models for information retrieval, the literature still lacks a systematic, basic investigation of how we can generally model the retrieval problem using neural networks.
Modeling the retrieval problem in the context of neural networks refers to the general way we frame the problem with regard to the essential components of a neural network: what we consider as the objective function, which kind of architecture we employ, how we feed the data to the network, etc.

In this post, I try to present different general architectures that can be considered for modeling the retrieval problem. First, I provide a categorization of models based on their objective function, and then I discuss different approaches with regard to their inference time. Note that in the figures I use a fully connected feed-forward neural network, but it can be replaced by more complex or more expressive neural models such as LSTMs or CNNs.

Categorizing Models by the Type of Objective Function

The retrieval problem can be formulated in the neural network framework in different ways, depending on the objective function that is defined to be optimized: Retrieval as Regression, Retrieval as Ranking, and Retrieval as Classification. I am going to explain these models and discuss their pros and cons.

Retrieval as Regression

The first architecture frames the retrieval problem as a scoring problem, which can be phrased as a regression problem. In the regression model (the leftmost model in the figure above), given a query q and a document d, we aim at generating a score, which could, for example, be interpreted as the probability that document d is relevant given query q. In this model, the network learns to produce calibrated scores, which are then used to rank documents. This model is also referred to as the point-wise model in the learning-to-rank literature.
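A minimal sketch of this regression/point-wise framing (the encoders are omitted and the dimensions are placeholders; any representation of the query and document could be plugged in):

# Point-wise "retrieval as regression": score a single (query, document) pair.
# Query/document vectors are placeholders; any encoder could produce them.
import torch
import torch.nn as nn

class PointwiseScorer(nn.Module):
    def __init__(self, q_dim=128, d_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(q_dim + d_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, q_vec, d_vec):
        x = torch.cat([q_vec, d_vec], dim=-1)    # joint query-document representation
        return torch.sigmoid(self.mlp(x))        # calibrated score in [0, 1]

# The scores are then sorted to rank the candidate documents for a query.
scorer = PointwiseScorer()
q = torch.rand(5, 128)    # one query vector repeated for 5 candidate documents
d = torch.rand(5, 128)    # 5 candidate document vectors
print(scorer(q, d).squeeze(-1))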

(more…)

Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity

Our paper "Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity", with Hosein Azarbonyad, Tom Kenter, Maarten Marx, Jaap Kamps, and Maarten de Rijke, has been accepted as a long paper at The 39th European Conference on Information Retrieval (ECIR'17). \o/

Quantitative notions of topical diversity in text documents are useful in several contexts, e.g., to assess the interdisciplinarity of a research proposal or to determine the interestingness of a document. An influential formalization of diversity has been introduced in biology. It decomposes diversity in terms of elements that belong to categories within a population and formalizes the diversity of a population d as the expected distance between two randomly selected elements of the population:

div(d) = \sum_{i=1}^{T} \sum_{j=1}^{T} p_ip_j \delta(i,j),

where p_i and p_j are the proportions of categories i and j in the population and \delta(i, j) is the distance between i and j.
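As a tiny worked example (the proportions and pairwise distances below are made up; in the topical-diversity setting they would come from P(t|d) and a topic-to-topic distance):

# Diversity of one population d: the expected distance between two randomly
# drawn categories, div(d) = sum_i sum_j p_i * p_j * delta(i, j). Toy numbers.
import numpy as np

p = np.array([0.7, 0.2, 0.1])          # stand-in proportions of T = 3 categories
delta = np.array([[0.0, 0.8, 0.9],     # stand-in pairwise distances delta(i, j)
                  [0.8, 0.0, 0.5],
                  [0.9, 0.5, 0.0]])

div = float(p @ delta @ p)             # the quadratic form computes the double sum
print(div)                             # ~0.37; a more uniform p or larger distances give higher diversity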

This notion of diversity has been adapted to quantify the topical diversity of a text document. Words are considered elements, topics are categories, and a document is a population. When using topic modeling to measure the topical diversity of a text document d, we can model elements based on the probability of a word w given d, P(w|d), categories based on the probability of w given topic t, P(w|t), and populations based on the probability of t given d, P(t|d). In probabilistic topic modeling, at estimation time, these distributions are usually assumed to be sparse:

  1. First, the content of a document is assumed to be generated by a small subset of words from the vocabulary (i.e., P(w|d) is sparse).
  2. Second, each topic is assumed to contain only some topic-specific related words (i.e.,  P(w|t) is sparse).
  3. Finally, each document is assumed to deal with a few topics only (i.e., P(t|d) is sparse).

When approximated using currently available methods, P(w|t) and P(t|d) are often dense rather than sparse. Dense distributions cause two problems for the quality of topic models when used for measuring topical diversity: generality and impurity. General topics mostly contain general words and are typically assigned to most documents in a corpus. Impure topics contain words that are not related to the topic. Generality and impurity of topics both result in low quality P(t|d) distributions.

Figure: Different topic re-estimation approaches. TM is a topic modeling approach, e.g., LDA; DR is document re-estimation, TR is topic re-estimation, and TAR is topic assignment re-estimation.

(more…)

Telling how to narrow it down: Effect of Browsing Path Recommendation on Exploratory Search

Our paper "Telling how to narrow it down: Effect of Browsing Path Recommendation on Exploratory Search", with Glorianna Jagfeld, Hosein Azarbonyad, Alex Olieman, Jaap Kamps, Maarten Marx, has been accepted as a short paper at The ACM SIGIR Conference on Human Information Interaction & Retrieval (CHIIR'17). \o/

There are several information needs requiring sophisticated human-computer interactions that currently remain unsolved or poorly supported by major search applications. One of these cases is exploratory search, which refers to search tasks that are open-ended, multi-faceted, and iterative, like learning or topic investigation. This type of search often occurs in a domain unknown or poorly known to the searchers, which can make it hard for them to formulate proper queries for retrieving useful documents.

Exploratory search is composed of two main activities, exploratory browsing and focused searching. Exploratory browsing refers to activities that aim at better defining the information need and raising the understanding of the information space. Focused searching corresponds to activities like query refinement and comparison of results after the information need has been shaped more clearly. Based on this composition, an exploratory search system needs to provide its users with a connected space of information to browse and investigate, as well as facilities to adjust the focus of their search towards useful documents.

Using structured data to organize unstructured information is one of the promising approaches for supporting complex search tasks, including exploratory search. Structure in the data provides overviews at different levels of abstraction and empowers users to explore the data from different points of view. However, it may still be difficult to find useful paths of exploration, and clues can help the users.

The main aim of this research is to investigate user behavior in exploratory search when a recommendation engine for browsing paths is provided along with the browsing system. To do so, we have employed the ExPoSe-Browser (Exploratory Political Search Browser) as the baseline system and built a recommendation engine as a supplementary feature for the system. We have conducted a user study involving exploratory search tasks, which revealed general differences in the browsing behavior of the subjects using the two different systems.

(more…)