Universal Transformers

Thanks to Stephan Gouws for his help on writing and improving this blog post. Transformers have recently become a competitive alternative to RNNs for a range of sequence modeling tasks. They address a significant shortcoming of RNNs, i.e. their inherently sequential computation which prevents parallelization across elements of the input sequence, whilst still addressing the […]

Fidelity-Weighted Learning

Our paper "Fidelity-Weighted Learning", with Arash Mehrjou, Stephan Gouws, Jaap Kamps, Bernhard Schölkopf, has been accepted at Sixth International Conference on Learning Representations (ICLR2018). \o/ The success of deep neural networks to date depends strongly on the availability of labeled data which is costly and not always easy to obtain. Usually, it is much easier […]