Saba (@Saba_A96)
Followers 133 · Following 575 · Media 8 · Statuses 141
MSc @Mila_Quebec and @UMontrealDIRO
Joined November 2020
why intern at Ai2?
- interns own major parts of our model development, sometimes even leading whole projects
- we're committed to open science & actively help our interns publish their work
reach out if u wanna build open language models together. links below
14 · 45 · 697
Nearly 3 years after our NeurIPS paper, SOTA architectures are now adopting NoPE. Kimi Linear uses NoPE for all full-attention layers (not a RoPE hybrid).
The brilliant Kimi Linear paper. It's a hybrid attention that beats full attention, cutting the key-value cache by up to 75% and delivering up to 6x faster decoding at 1M context. Full attention is…
7 · 34 · 371
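A quick sanity check on where a 75% cache cut could come from. This assumes the hybrid interleaves three linear-attention layers (which keep no KV cache) per full-attention layer, the ratio the Kimi Linear paper reports:

```latex
% Assumed 3:1 linear-to-full layer interleave; only full-attention layers cache KV
\text{KV cache kept} = \frac{1}{3+1} = 25\%,
\qquad
\text{reduction} = 1 - \frac{1}{4} = 75\%
```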
Stanford just published a huge 470-page study, "The Principles of Diffusion Models." It explains how diffusion models turn noise into data and ties their main ideas together. It starts from a forward process that adds noise over time, then learns the exact reverse. The reverse…
13 · 184 · 1K
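As a minimal illustration of that forward/reverse structure (standard DDPM notation, not taken from the monograph itself): the forward process is a fixed Gaussian noising, and the model learns a step-by-step reverse that maps noise back to data.

```latex
% Fixed forward (noising) process, with \bar\alpha_t = \prod_{s \le t}(1-\beta_s):
q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)I\big)
% Learned reverse (denoising) process, run from pure noise back to data:
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big)
```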
Delighted to share that my supervisor @aagrawalAA has been awarded the 2025 Mark Everingham Prize, one of the most prestigious honors in the field! Looking forward to seeing her work continue to inspire.
I am quite excited to share that our efforts in organizing and running "The VQA series of challenges" have been recognized with the 2025 Mark Everingham Prize -- https://t.co/oZrGEBBU5B for "stimulating a new strand of vision and language research". Thank you to the PAMI TC…
0 · 0 · 17
Attending @ICCVConference in Honolulu this week! I'll be presenting our work on multimodal reward-guided decoding. Come check it out on October 21 (morning), poster #122. If you're around, I'd love to connect and chat about multimodal models and real-time video generation!
I'm happy to share that our paper "Controlling Multimodal LLMs via Reward-guided Decoding" has been accepted to #ICCV2025! w/ @proceduralia, @koustuvsinha, @adri_romsor, @michal_drozdzal, and @aagrawalAA. Read more: https://t.co/wIRL9jsAr1 Here's what we did:
0 · 6 · 20
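For intuition, here is a generic sketch of reward-guided decoding, not necessarily the paper's exact procedure: at each step, several candidate continuations are sampled and a reward model picks the one to keep. The `lm` and `reward_model` interfaces below are hypothetical.

```python
# Generic reward-guided decoding sketch (hypothetical `lm` / `reward_model`
# interfaces; the ICCV paper's exact method may differ).
def reward_guided_decode(lm, reward_model, prompt: str,
                         steps: int = 64, k: int = 4, block: int = 16) -> str:
    text = prompt
    for _ in range(steps):
        # propose k candidate continuations of `block` tokens each
        candidates = [lm.generate(text, max_new_tokens=block) for _ in range(k)]
        # keep the continuation the reward model scores highest
        text += max(candidates, key=lambda c: reward_model.score(text + c))
    return text
```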
I will be speaking about "Reasoning, data-efficiency and alignment in vision-language models" at the CLVL workshop tomorrow (Oct 20) at ICCV, 9:15am! So stop by if you are interested in these topics, or just want to learn about what my lab is up to! https://t.co/4DrufmJ03Z
CLVL 2025: Celebrating a Decade of Vision & Language Innovation! Join us for a reflection on a remarkable decade-long journey for the CLVL workshop series with an amazing set of speakers!
2 · 5 · 34
Exciting opportunity to work with @RajeswarSai on cutting-edge research in video modeling and multimodal reasoning! He's recruiting grad students. Don't miss it!
I'm looking forward to co-supervising students in the upcoming academic year at Mila. There is much to explore in the space of action-conditioned video modeling and long-context multimodal reasoning. We are advancing, and if this aligns with your interests, please apply.
0 · 0 · 4
Introducing linear scaling of reasoning: The Markovian Thinker. Reformulate RL so thinking scales with O(n) compute, not O(n^2), with O(1) memory, architecture-agnostic. Train R1-1.5B into a Markovian thinker with a 96K thought budget, ~2x accuracy.
14 · 202 · 918
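A minimal sketch of the chunked, Markovian idea at inference time: the model reasons in fixed-size chunks and carries only a bounded textual state between them, so attention cost per chunk is constant and total compute grows linearly with the number of chunks. The `llm.generate` helper, markers, and sizes below are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of Markovian (chunked) thinking; helper names are hypothetical.
CHUNK_TOKENS = 8_192   # fixed reasoning window per chunk
CARRY_TOKENS = 4_096   # bounded state carried between chunks
MAX_CHUNKS = 12        # e.g. ~96K total thought budget

def markovian_think(llm, question: str) -> str:
    carryover = ""  # bounded textual state: O(1) memory
    for _ in range(MAX_CHUNKS):
        prompt = f"{question}\n[STATE]\n{carryover}\n[CONTINUE THINKING]\n"
        chunk = llm.generate(prompt, max_new_tokens=CHUNK_TOKENS)
        if "[FINAL ANSWER]" in chunk:
            return chunk.split("[FINAL ANSWER]")[-1].strip()
        # Keep only the tail of the chunk as the next state. Attention never
        # sees more than question + carryover + one chunk at a time, so
        # per-chunk cost is constant and total compute is linear in chunks.
        carryover = chunk[-CARRY_TOKENS:]
    return carryover  # thought budget exhausted
```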
It's clear next-gen reasoning LLMs will run for millions of tokens. RL at 1M tokens needs ~100× the compute of 128K. Our Markovian Thinking keeps compute scaling linear instead. Check out Milad's thread; some of my perspectives below:
[quoting Milad's thread above: "Introducing linear scaling of reasoning: The Markovian Thinker"]
18 · 94 · 898
Exciting news! Our work on "The Promise of RL for Autoregressive Image Editing" has been accepted at NeurIPS 2025! EARL: a simple, scalable RL pipeline for high-quality, controllable edits. Check out the project on GitHub: https://t.co/KpaXflG5uC
github.com · mair-lab/EARL: Editing with Autoregression and RL
We built a new autoregressive + RL image editing model using a strong verifier, and it beats SOTA diffusion baselines using 5× less data. EARL: a simple, scalable RL pipeline for high-quality, controllable edits. 1/
0 · 7 · 18
Exciting news! Our paper "WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation" is accepted for an oral presentation at EMNLP 2025! WebMMU addresses a critical gap in AI evaluation: how well can models understand and build websites? 1/n
2 · 18 · 24
Announcing the World Modeling Workshop 2026
When: Feb 4-6, 2026
Where: Mila (Montréal) + Online (free)
What: Keynotes, Methods Deep Dive, and Tutorials
https://t.co/WukFtNON3o · worldmodel.mila@gmail.com
Details below:
6 · 57 · 242
Internship @ServiceNowRSRCH to build the next generation of computer-use agents that are safe and secure against malicious attacks. Focus on intervention strategies and defenses to make agents robust to unsafe behavior. Apply here:
0 · 29 · 35
A Hindu wedding without a sacred fire? A Chinese banquet with forks? Do text-to-image models meet cultural expectations, both explicitly stated and implicitly assumed? Excited to share our latest paper on evaluating cultural alignment in T2I models: https://t.co/UCcaWGtqNG
1 · 23 · 57
7/ Huge thanks to the amazing team and everyone who supported us along the way. Grateful for all the collaboration and effort! @_rabiulawal @sikarwar_ank @a_kazemnejad @OOOOLGAluo @joanrod_ai @RajeswarSai @sivareddyg @chrisjpal @benno_krojer @aagrawalAA
1 · 1 · 10
6/ Although diffusion-based approaches were previously seen as the dominant method for image editing, RL on AR models boosts performance, making them competitive with diffusion models while being more data-efficient. EARL shows AR+RL is a promising combination for image editing.
1 · 1 · 8
5/ Moreover, we conduct the first systematic analysis of SFT vs RL for image editing, showing RL post-training excels without paired data for complex edits (counting, spatial, and action changes). SFT alone is insufficient due to the lack of high-quality paired datasets.
1 · 1 · 8
4/ We also explored Chain-of-Thought (CoT) reasoning: add explanations before edits. While CoT during SFT hurt performance (Emu3 isn't pretrained for reasoning), RL still improved some of these weaker reasoning SFT models.
1 · 1 · 8
3/ EARL combines:
- Autoregressive generation over discrete text+vision tokens (Emu3)
- GRPO for stable RL
- A Qwen-VL-72B verifier for reward
No denoising. No complex pipelines. Just a clean RL setup that works.
1 · 1 · 8
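For intuition, a minimal sketch of a GRPO-style update of the kind EARL describes: group-relative advantages computed from verifier rewards. Helper names and shapes are assumptions; the real pipeline operates on Emu3's discrete image tokens.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: score each sampled edit against the
    mean/std of its group (all samples for the same edit instruction)."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

# One conceptual RL step for image editing:
# 1) sample G candidate edits autoregressively for one instruction
# 2) score each with a VLM verifier (did the edit follow the instruction?)
# 3) weight each sample's token log-probs by its group-relative advantage
def grpo_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    # logprobs: (G,) summed token log-probs of each sampled edit
    adv = grpo_advantages(rewards).detach()
    return -(adv * logprobs).mean()  # clipping/KL terms omitted for brevity
```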