Vision Transformer: Farewell Convolutions?

Vision Transformer (ViT) is a pure self-attention-based architecture (Transformer) without CNNs. ViT stays as close as possible to the Transformer architecture that was originally designed for text-based tasks. One of the key characteristics of ViT is its extremely simple way of encoding inputs, together with its use of a vanilla Transformer architecture with no fancy tricks. In ViT, […]

Long Range Arena: A Benchmark for Efficient Transformers

Transformers do not scale well to long sequences, largely because the cost of self-attention grows quadratically with sequence length. In recent months, a wide spectrum of efficient, fast Transformers has been proposed to tackle this problem (check out this post). However, there was no well-established method for evaluating this class of models in a systematic way […]

Efficient Transformers

Transformers have garnered immense interest lately due to their effectiveness across a range of domains like language, vision, and reinforcement learning. The self-attention mechanism is a defining characteristic of Transformer models. The mechanism can be viewed as a graph-like inductive bias that connects all tokens in a sequence with a relevance-based pooling operation. A […]
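
To make the "relevance-based pooling" view concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not the implementation from the post: the learned query, key, and value projections of a real Transformer are omitted, so each token vector serves as its own query, key, and value, and the function names `softmax` and `self_attention` are made up for this example.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Simplified self-attention over a list of n vectors of dimension d.

    Assumption: no learned projections, so queries = keys = values = tokens.
    Returns (outputs, weights), where weights is an n x n matrix.
    """
    n, d = len(tokens), len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # Relevance of every token to the current one (scaled dot product).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        w = softmax(scores)
        # Pooling: weighted average of all token vectors.
        pooled = [sum(w[j] * tokens[j][c] for j in range(n)) for c in range(d)]
        weights.append(w)
        outputs.append(pooled)
    return outputs, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
print(len(weights), "x", len(weights[0]), "attention weights")
```

Every token attends to every other token, so the weight matrix has one entry per pair of tokens. That all-pairs connectivity is the graph-like structure described above.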

MetNet: A Neural Weather Model for Precipitation Forecasting

Weather has an enormous impact on markets and on renewable energy, which is expected to reach 80% of the world’s electricity production. There are many social and economic benefits of accurate weather forecasting, from improvements in our daily lives to substantial impacts on agriculture, energy, and transportation, and to the prevention of human and economic losses […]

Universal Transformers: The Infinite Use of Finite Means!

Thanks to Stephan Gouws for his help in writing and improving this blog post. Transformers have recently become a competitive alternative to RNNs for a range of sequence modeling tasks. They address a significant shortcoming of RNNs, namely their inherently sequential computation, which prevents parallelization across elements of the input sequence, whilst still addressing the […]

Learning to Transform, Combine, and Reason in Open-Domain Question Answering

Our paper "Learning to Transform, Combine, and Reason in Open-Domain Question Answering", with Hosein Azarbonyad, Jaap Kamps, and Maarten de Rijke, has been accepted as a long paper at the 12th ACM International Conference on Web Search and Data Mining (WSDM 2019). \o/ We have all come to expect getting direct answers to complex questions from search […]

SIGIR2018 Workshop on Learning From Noisy/Limited Data for IR

We are organizing the "Learning From Noisy/Limited Data for Information Retrieval" workshop, which is co-located with SIGIR 2018. This is the first edition of the workshop. The goal is to bring together researchers from industry, where data is plentiful but noisy, with researchers from academia, where data is sparse but clean, to […]

Fidelity-Weighted Learning

Our paper "Fidelity-Weighted Learning", with Arash Mehrjou, Stephan Gouws, Jaap Kamps, and Bernhard Schölkopf, has been accepted at the Sixth International Conference on Learning Representations (ICLR 2018). \o/ The success of deep neural networks to date depends strongly on the availability of labeled data, which is costly and not always easy to obtain. Usually, it is much easier […]

Avoiding Your Teacher's Mistakes: Training Neural Networks with Controlled Weak Supervision

This post is about a project I did in collaboration with Aliaksei Severyn, Sascha Rothe, and Jaap Kamps during my internship at Google Research. Deep neural networks have shown impressive results on many tasks in computer vision, natural language processing, and information retrieval. However, their success is conditioned on the availability of exhaustive […]