Our paper "Fidelity-Weighted Learning", with Arash Mehrjou, Stephan Gouws, Jaap Kamps, Bernhard Schölkopf, has been accepted at Sixth International Conference on Learning Representations (ICLR2018). \o/
Fidelity-weighted learning (FWL) is a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. It modulates the parameter updates to a student network which trained on the task we care about, on a per-sample basis according to the posterior confidence of its label-quality estimated by a Bayesian teacher, who has access to a rather small amount of high-quality labels.
The success of deep neural networks to date depends strongly on the availability of labeled data which is costly and not always easy to obtain. Usually, it is much easier to obtain small quantities of high-quality labeled data and large quantities of unlabeled data. The problem of how to best integrate these two different sources of information during training is an active pursuit in the field of semi-supervised learning and here, with FWL, we propose an idea to address this question.
Learning from samples of variable quality
For a large class of tasks, it is also easy to define one or more so-called “weak annotators”, additional (albeit noisy) sources of weak supervision based on heuristics or “weaker”, biased classifiers trained on e.g. non-expert crowd-sourced data or data from different domains that are related. While easy and cheap to generate, it is not immediately clear if and how these additional weakly-labeled data can be used to train a stronger classifier for the task we care about. More generally, in almost all practical applications machine learning systems have to deal with data samples of variable quality. For example, in a large dataset of images only a small fraction of samples may be labeled by experts and the rest may be crowd-sourced using e.g. Amazon Mechanical Turk. In addition, in some applications, labels are intentionally perturbed due to privacy issues.
Assuming we can obtain a large set of weakly-labeled data in addition to a much smaller training set of “strong” labels, the simplest approach is to expand the training set by including the weakly-supervised samples (all samples are equal). Alternatively, one may pretrain on the weak data and then fine-tune on observations from the true function or distribution (which we call strong data). Indeed, a small amount of expert-labeled data can be augmented in such a way by a large set of raw data, with labels coming from a heuristic function, to train a more accurate neural ranking model. The downside is that such approaches are oblivious to the amount or source of noise in the labels.
All labels are equal, but some labels are more equal than others, just like animals.
Inspired by George, Animal Farm, 1945.
We argue that treating weakly-labeled samples uniformly (i.e. each weak sample contributes equally to the final classifier) ignores potentially valuable information of the label quality. Instead, we propose Fidelity-Weighted Learning (FWL), a Bayesian semi-supervised approach that leverages a small amount of data with true labels to generate a larger training set with confidence-weighted weakly-labeled samples, which can then be used to modulate the fine-tuning process based on the fidelity (or quality) of each weak sample. By directly modeling the inaccuracies introduced by the weak annotator in this way, we can control the extent to which we make use of this additional source of weak supervision: more for confidently-labeled weak samples close to the true observed data, and less for uncertain samples further away from the observed data.
How fidelity-weighted learning works?
We propose a setting consisting of two main modules:
- One is called the student and is in charge of learning a suitable data representation and performing the main prediction task,
- The other is the teacher which modulates the learning process by modeling the inaccuracies in the labels.