Vision Transformer: Farewell Convolutions?

Vision Transformer (ViT) is a pure self-attention-based architecture (a Transformer) with no convolutions. ViT stays as close as possible to the Transformer architecture originally designed for text-based tasks. One of its key characteristics is the extremely simple way it encodes its inputs, combined with a vanilla Transformer architecture and no fancy tricks. In ViT, […]
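
To make that simple input encoding concrete, here is a minimal NumPy sketch of ViT-style patch embedding (the patch size, dimensions, and variable names below are illustrative assumptions, not taken from the post):

```python
import numpy as np

# Sketch of ViT-style input encoding (hypothetical sizes): split an image into
# non-overlapping patches, flatten each patch, project it linearly, prepend a
# learnable [CLS] token, and add position embeddings.

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    h, w, c = image.shape
    p = patch_size
    patches = image.reshape(h // p, p, w // p, p, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * c)
    return patches  # (num_patches, patch_dim)

rng = np.random.default_rng(0)
image = rng.normal(size=(224, 224, 3))        # toy input image
patches = patchify(image, patch_size=16)      # (196, 768)

d_model = 768
w_embed = rng.normal(size=(patches.shape[1], d_model)) * 0.02   # patch projection
cls_token = rng.normal(size=(1, d_model)) * 0.02                # learnable [CLS] token
pos_embed = rng.normal(size=(patches.shape[0] + 1, d_model)) * 0.02

tokens = np.concatenate([cls_token, patches @ w_embed], axis=0) + pos_embed
print(tokens.shape)  # (197, 768): a plain token sequence fed to a vanilla Transformer
```

The point of the sketch is that, once the image is turned into a token sequence this way, nothing downstream needs to know it came from an image.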

Long Range Arena: A Benchmark for Efficient Transformers

Transformers do not scale very well to long sequence lengths, largely because of the quadratic complexity of self-attention. In recent months, a wide spectrum of efficient, fast Transformers has been proposed to tackle this problem (check out this post). However, there was no well-established way to evaluate this class of models systematically […]
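
As a rough illustration of that quadratic cost (a toy sketch with made-up sizes, not code from the benchmark), the (n, n) attention score matrix alone grows quadratically with the sequence length n:

```python
import numpy as np

# Illustrative only: the score matrix Q @ K^T has shape (n, n), so doubling the
# sequence length roughly quadruples the memory and compute it requires.

def attention_scores(q, k):
    """Scaled dot-product relevance scores; the (n, n) matrix is the bottleneck."""
    d = q.shape[-1]
    return q @ k.T / np.sqrt(d)

rng = np.random.default_rng(0)
d = 64
for n in (1_000, 2_000, 4_000):
    q = rng.normal(size=(n, d)).astype(np.float32)
    k = rng.normal(size=(n, d)).astype(np.float32)
    scores = attention_scores(q, k)
    print(n, scores.shape, f"{scores.nbytes / 1e6:.1f} MB")
```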

Efficient Transformers

Transformers have garnered immense interest lately due to their effectiveness across a range of domains like language, vision, and reinforcement learning. The self-attention mechanism is a key defining characteristic of Transformer models. It can be viewed as a graph-like inductive bias that connects all tokens in a sequence through a relevance-based pooling operation. A […]
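
As a minimal sketch of that relevance-based pooling view (toy shapes and names chosen for illustration, not code from the survey), plain self-attention can be written as a softmax-weighted average over all value vectors:

```python
import numpy as np

# Each token's output is a weighted average (a pooling) of all value vectors,
# with the weights given by softmax-normalized relevance scores between tokens.

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    q, k, v = x @ wq, x @ wk, x @ wv
    relevance = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n, n): every token attends to every token
    return relevance @ v                                  # pool values by relevance

rng = np.random.default_rng(0)
n, d = 8, 16
x = rng.normal(size=(n, d))
wq, wk, wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
print(self_attention(x, wq, wk, wv).shape)  # (8, 16)
```

Efficient Transformer variants mostly differ in how they sparsify or approximate that fully connected (n, n) relevance matrix.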

MetNet: A Neural Weather Model for Precipitation Forecasting

Weather has an enormous impact on renewable energy, which is expected to reach 80% of the world's electricity production, and on energy markets. There are many social and economic benefits to accurate weather forecasting, from improvements in our daily lives, to substantial impacts on agriculture, energy, and transportation, to the prevention of human and economic losses […]

Fidelity-Weighted Learning

Our paper "Fidelity-Weighted Learning", with Arash Mehrjou, Stephan Gouws, Jaap Kamps, Bernhard Schölkopf, has been accepted at Sixth International Conference on Learning Representations (ICLR2018). \o/ The success of deep neural networks to date depends strongly on the availability of labeled data which is costly and not always easy to obtain. Usually, it is much easier […]

Avoiding Your Teacher's Mistakes: Training Neural Networks with Controlled Weak Supervision

This post is about a project I did in collaboration with Aliaksei Severyn, Sascha Rothe, and Jaap Kamps during my internship at Google Research. Deep neural networks have shown impressive results on many tasks in computer vision, natural language processing, and information retrieval. However, their success is conditioned on the availability of exhaustive […]

Learning to Attend, Copy, and Generate for Session-Based Query Suggestion

Our paper "Learning to Attend, Copy, and Generate for Session-Based Query Suggestion", with Sascha Rothe, Enrique Alfonseca, and Pascal Fleury, has been accepted as a long paper at the international Conference on Information and Knowledge Management (CIKM'17). This paper is on the outcome of my internship at Google Research. \o/ Users interact with search engines […]

Beating the Teacher: Neural Ranking Models with Weak Supervision

Our paper "Neural Ranking Models with Weak Supervision", with Hamed Zamani, Aliaksei Severyn, Jaap Kamps, and W. Bruce Croft, has been accepted as a long paper at The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR2017). \o/ This paper is on the outcome of my pet project during my internship […]