Saba (@Saba_A96)
Followers 133 · Following 575 · Media 8 · Statuses 141
MSc @Mila_Quebec and @UMontrealDIRO
Joined November 2020
why intern at Ai2?
- interns own major parts of our model development, sometimes even leading whole projects
- we're committed to open science & actively help our interns publish their work
reach out if u wanna build open language models together. links below
14 · 45 · 697
Nearly 3 years after our NeurIPS paper, SOTA architectures are now adopting NoPE. Kimi Linear uses NoPE for all full-attention layers (not a RoPE hybrid).
The brilliant Kimi Linear paper. It's a hybrid attention that beats full attention, cutting the key-value cache by up to 75% and delivering up to 6x faster decoding at 1M context. Full attention is…
7 · 34 · 371
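A quick sanity check on where a 75% cache cut could come from. This assumes the hybrid interleaves three linear-attention layers (which keep no KV cache) per full-attention layer, the ratio the Kimi Linear paper reports:

```latex
% Assumed 3:1 linear-to-full layer interleave; only full-attention layers cache KV
\text{KV cache kept} = \frac{1}{3+1} = 25\%,
\qquad
\text{reduction} = 1 - \frac{1}{4} = 75\%
```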
Stanford just published a huge 470-page study, "The Principles of Diffusion Models." It explains how diffusion models turn noise into data and ties their main ideas together. It starts from a forward process that adds noise over time, then learns the exact reverse. The reverse…
13 · 184 · 1K
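As a minimal illustration of that forward/reverse structure (standard DDPM notation, not taken from the monograph itself): the forward process is a fixed Gaussian noising, and the model learns a step-by-step reverse that maps noise back to data.

```latex
% Fixed forward (noising) process, with \bar\alpha_t = \prod_{s \le t}(1-\beta_s):
q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)I\big)
% Learned reverse (denoising) process, run from pure noise back to data:
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big)
```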
Delighted to share that my supervisor @aagrawalAA has been awarded the 2025 Mark Everingham Prize, one of the most prestigious honors in the field! Looking forward to seeing her work continue to inspire.
I am quite excited to share that our efforts in organizing and running "The VQA series of challenges" have been recognized with the 2025 Mark Everingham Prize -- https://t.co/oZrGEBBU5B for "stimulating a new strand of vision and language research". Thank you to the PAMI TC…
0 · 0 · 17
Attending @ICCVConference in Honolulu this week! I'll be presenting our work on multimodal reward-guided decoding. Come check it out on October 21 (morning), poster #122. If you're around, I'd love to connect and chat about multimodal models and real-time video generation!
I'm happy to share that our paper "Controlling Multimodal LLMs via Reward-guided Decoding" has been accepted to #ICCV2025! w/ @proceduralia, @koustuvsinha, @adri_romsor, @michal_drozdzal, and @aagrawalAA. Read more: https://t.co/wIRL9jsAr1 Here's what we did:
0 · 6 · 20
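For intuition, here is a generic sketch of reward-guided decoding, not necessarily the paper's exact procedure: at each step, several candidate continuations are sampled and a reward model picks the one to keep. The `lm` and `reward_model` interfaces below are hypothetical.

```python
# Generic reward-guided decoding sketch (hypothetical `lm` / `reward_model`
# interfaces; the ICCV paper's exact method may differ).
def reward_guided_decode(lm, reward_model, prompt: str,
                         steps: int = 64, k: int = 4, block: int = 16) -> str:
    text = prompt
    for _ in range(steps):
        # propose k candidate continuations of `block` tokens each
        candidates = [lm.generate(text, max_new_tokens=block) for _ in range(k)]
        # keep the continuation the reward model scores highest
        text += max(candidates, key=lambda c: reward_model.score(text + c))
    return text
```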
I will be speaking about "Reasoning, data-efficiency and alignment in vision-language models" at the CLVL workshop tomorrow (Oct 20) at ICCV, 9:15am! So stop by if you are interested in these topics, or just want to learn about what my lab is up to! https://t.co/4DrufmJ03Z
CLVL 2025: Celebrating a Decade of Vision & Language Innovation! Join us for a reflection on a remarkable decade-long journey for the CLVL workshop series with an amazing set of speakers!
2 · 5 · 34
Exciting opportunity to work with @RajeswarSai on cutting-edge research in video modeling and multimodal reasoning! He's recruiting grad students. Don't miss it!
I'm looking forward to co-supervising students in the upcoming academic year at Mila. There is much to explore in the space of action-conditioned video modeling and long-context multimodal reasoning. We are advancing, and if this aligns with your interests, please apply.
0 · 0 · 4
Introducing linear scaling of reasoning: The Markovian Thinker. Reformulate RL so thinking scales with O(n) compute, not O(n^2), with O(1) memory, architecture-agnostic. Train R1-1.5B into a Markovian thinker with a 96K thought budget, ~2x accuracy.
14 · 202 · 918
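A minimal sketch of the chunked, Markovian idea at inference time: the model reasons in fixed-size chunks and carries only a bounded textual state between them, so attention cost per chunk is constant and total compute grows linearly with the number of chunks. The `llm.generate` helper, markers, and sizes below are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of Markovian (chunked) thinking; helper names are hypothetical.
CHUNK_TOKENS = 8_192   # fixed reasoning window per chunk
CARRY_TOKENS = 4_096   # bounded state carried between chunks
MAX_CHUNKS = 12        # e.g. ~96K total thought budget

def markovian_think(llm, question: str) -> str:
    carryover = ""  # bounded textual state: O(1) memory
    for _ in range(MAX_CHUNKS):
        prompt = f"{question}\n[STATE]\n{carryover}\n[CONTINUE THINKING]\n"
        chunk = llm.generate(prompt, max_new_tokens=CHUNK_TOKENS)
        if "[FINAL ANSWER]" in chunk:
            return chunk.split("[FINAL ANSWER]")[-1].strip()
        # Keep only the tail of the chunk as the next state. Attention never
        # sees more than question + carryover + one chunk at a time, so
        # per-chunk cost is constant and total compute is linear in chunks.
        carryover = chunk[-CARRY_TOKENS:]
    return carryover  # thought budget exhausted
```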
It's clear next-gen reasoning LLMs will run for millions of tokens. RL at 1M tokens needs ~100× the compute of 128K. Our Markovian Thinking keeps compute scaling linear instead. Check out Milad's thread; some of my perspectives below:
[quoting Milad's thread above: "Introducing linear scaling of reasoning: The Markovian Thinker"]
18 · 94 · 898
Exciting news! Our work on "The Promise of RL for Autoregressive Image Editing" has been accepted at NeurIPS 2025! EARL: a simple, scalable RL pipeline for high-quality, controllable edits. Check out the project on GitHub: https://t.co/KpaXflG5uC
github.com · mair-lab/EARL: Editing with Autoregression and RL
We built a new autoregressive + RL image editing model using a strong verifier, and it beats SOTA diffusion baselines using 5× less data. EARL: a simple, scalable RL pipeline for high-quality, controllable edits. 1/
0 · 7 · 18
Exciting news! Our paper "WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation" is accepted for an oral presentation at EMNLP 2025! WebMMU addresses a critical gap in AI evaluation: how well can models understand and build websites? 1/n
2 · 18 · 24
Announcing the World Modeling Workshop 2026
When: Feb 4-6, 2026
Where: Mila (Montréal) + Online (free)
What: Keynotes, Methods Deep Dive, and Tutorials
https://t.co/WukFtNON3o · worldmodel.mila@gmail.com
Details below:
6 · 57 · 242
Internship @ServiceNowRSRCH to build the next generation of computer-use agents that are safe and secure against malicious attacks. Focus on intervention strategies and defenses to make agents robust to unsafe behavior. Apply here:
0 · 29 · 35
A Hindu wedding without a sacred fire? A Chinese banquet with forks? Do text-to-image models meet cultural expectations, both explicitly stated and implicitly assumed? Excited to share our latest paper on evaluating cultural alignment in T2I models: https://t.co/UCcaWGtqNG
1 · 23 · 57
7/ Huge thanks to the amazing team and everyone who supported us along the way. Grateful for all the collaboration and effort! @_rabiulawal @sikarwar_ank @a_kazemnejad @OOOOLGAluo @joanrod_ai @RajeswarSai @sivareddyg @chrisjpal @benno_krojer @aagrawalAA
1 · 1 · 10
6/ Although diffusion-based approaches were previously seen as the dominant method for image editing, RL on AR models boosts performance, making them competitive with diffusion models while being more data-efficient. EARL shows AR+RL is a promising combination for image editing.
1 · 1 · 8
5/ Moreover, we conduct the first systematic analysis of SFT vs RL for image editing, showing RL post-training excels without paired data for complex edits (counting, spatial, and action changes). SFT alone is insufficient due to the lack of high-quality paired datasets.
1 · 1 · 8
4/ We also explored Chain-of-Thought (CoT) reasoning: add explanations before edits. While CoT during SFT hurt performance (Emu3 isn't pretrained for reasoning), RL still improved some of these weaker reasoning SFT models.
1 · 1 · 8
3/ EARL combines:
- Autoregressive generation over discrete text+vision tokens (Emu3)
- GRPO for stable RL
- A Qwen-VL-72B verifier for reward
No denoising. No complex pipelines. Just a clean RL setup that works.
1 · 1 · 8
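For intuition, a minimal sketch of a GRPO-style update of the kind EARL describes: group-relative advantages computed from verifier rewards. Helper names and shapes are assumptions; the real pipeline operates on Emu3's discrete image tokens.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: score each sampled edit against the
    mean/std of its group (all samples for the same edit instruction)."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

# One conceptual RL step for image editing:
# 1) sample G candidate edits autoregressively for one instruction
# 2) score each with a VLM verifier (did the edit follow the instruction?)
# 3) weight each sample's token log-probs by its group-relative advantage
def grpo_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    # logprobs: (G,) summed token log-probs of each sampled edit
    adv = grpo_advantages(rewards).detach()
    return -(adv * logprobs).mean()  # clipping/KL terms omitted for brevity
```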