Universal Transformers

Thanks to Stephan Gouws for his help on writing and improving this blog post. Transformers have recently become a competitive alternative to RNNs for a range of sequence modeling tasks. They address a significant shortcoming of RNNs, namely their inherently sequential computation, which prevents parallelization across elements of the input sequence, whilst still addressing the […]
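
To make the parallelization contrast concrete, here is a minimal NumPy sketch (an illustration of ours, not code from the post): the RNN must iterate over time steps because each hidden state depends on the previous one, while self-attention processes all positions in a single matrix operation.

```python
import numpy as np

def rnn_forward(x, W_h, W_x):
    # Sequential: each hidden state depends on the previous one,
    # so the loop over time steps cannot be parallelized.
    h = np.zeros(W_h.shape[0])
    hs = []
    for x_t in x:
        h = np.tanh(W_h @ h + W_x @ x_t)
        hs.append(h)
    return np.stack(hs)

def self_attention(x):
    # Parallel: every position attends to every other position
    # in one batched matrix computation (simplified, no projections).
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x
```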

Fidelity-Weighted Learning

Our paper “Fidelity-Weighted Learning”, with Arash Mehrjou, Stephan Gouws, Jaap Kamps, and Bernhard Schölkopf, has been accepted at the Sixth International Conference on Learning Representations (ICLR 2018). \o/ tl;dr: Fidelity-weighted learning (FWL) is a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. It modulates the parameter updates to […]
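
The excerpt is cut off, but the core idea of confidence-modulated updates lends itself to a small sketch. The NumPy code below is a hypothetical illustration for a linear student: per-sample updates are down-weighted where the teacher is uncertain. The names (`teacher_mean`, `teacher_var`) and the `exp(-variance)` weighting are assumptions chosen for illustration, not the paper's actual formulation.

```python
import numpy as np

def fidelity_weighted_step(w, X, teacher_mean, teacher_var, lr=0.1):
    # Hypothetical sketch: fit the teacher's soft labels, but scale each
    # sample's contribution by a confidence weight derived from the
    # teacher's predictive variance (high variance -> small weight).
    preds = X @ w
    errors = preds - teacher_mean
    weights = np.exp(-teacher_var)
    grad = (X.T @ (weights * errors)) / len(X)
    return w - lr * grad
```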