Jason Ramapuram Profile
Jason Ramapuram

@jramapuram

Followers
1K
Following
2K
Media
44
Statuses
268

ML Research Scientist, MLR | Formerly: DeepMind, Qualcomm, Viasat, Rockwell Collins | Swiss-minted PhD in ML | Barista alumnus ☕ @ Starbucks | 🇺🇸🇮🇳🇱🇻🇮🇹

Joined August 2009
@jramapuram
Jason Ramapuram
11 months
Enjoy attention? Want to make it ~18% faster? Try out Sigmoid Attention. We replace the traditional softmax in attention with a sigmoid and a constant (not learned) scalar bias based on the sequence length. Paper: Code: This was…
16
163
835
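The mechanism described in the tweet above can be sketched in a few lines. This is a minimal single-head, unmasked NumPy illustration, not the released implementation; the bias form b = -log(n) for sequence length n follows the Sigmoid Attention paper, and the shapes and names are illustrative assumptions.

```python
import numpy as np

def sigmoid_attention(q, k, v):
    """Attention with an elementwise sigmoid in place of softmax.

    A constant (not learned) scalar bias b = -log(n), where n is the
    sequence length, is added to the logits before the sigmoid.
    """
    n, d = q.shape[-2], q.shape[-1]
    b = -np.log(n)                                   # constant bias from sequence length
    logits = q @ k.swapaxes(-1, -2) / np.sqrt(d)     # scaled dot-product logits
    weights = 1.0 / (1.0 + np.exp(-(logits + b)))    # sigmoid: no row normalization
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
out = sigmoid_attention(q, k, v)
print(out.shape)  # (8, 16)
```

Because each attention weight is computed independently, the sigmoid avoids the row-wise reduction across the sequence that softmax requires, which is where the claimed speedup comes from.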
@jramapuram
Jason Ramapuram
4 days
RT @teelinsan: Uncertainty quantification (UQ) is key for safe, reliable LLMs, but are we evaluating it correctly? 🚨 Our ACL2025 paper f…
0
11
0
@jramapuram
Jason Ramapuram
8 days
RT @HPouransari: 🌟 Explore key insights from the FastVLM project (a real-time vision-language model) in this blog post:
machinelearning.apple.com
Vision Language Models (VLMs) enable visual understanding alongside textual inputs. They are typically built by passing visual tokens from a…
0
38
0
@jramapuram
Jason Ramapuram
17 days
Data mixing ratios are critical for modern LLM training. This work takes a first principles approach and develops scaling laws for the mixing ratios, enabling “train small” -> “get guarantees at scale”. Definitely worth a read.
@MustafaShukor1
Mustafa Shukor
17 days
We propose new scaling laws that predict the optimal data mixture for pretraining LLMs, native multimodal models, and large vision encoders! Only small-scale experiments are needed; we can then extrapolate to large-scale ones. These laws allow 1/n 🧵
0
2
14
@jramapuram
Jason Ramapuram
28 days
Love Mamba? Take a deep dive into this work from Apple MLR.
@TeresaNHuang
Teresa Huang
1 month
Is the mystery behind the performance of Mamba 🐍 keeping you awake at night? We've got you covered! Our ICML2025 paper demystifies input selectivity in Mamba through the lens of approximation power, long-term memory, and associative-recall capacity.
0
0
4
@jramapuram
Jason Ramapuram
29 days
RT @mkirchhof_: Can LLMs access and describe their own internal distributions? With my colleagues at Apple, I invite you to take a leap for…
0
19
0
@jramapuram
Jason Ramapuram
1 month
RT @eugene_ndiaye: Some 📸🤳from the ongoing #MlssSenegal2025 🙌🏿
0
18
0
@jramapuram
Jason Ramapuram
1 month
RT @ssahoo_: 🚨 “The Diffusion Duality” is out! @ICML2025 ⚡️ Few-step generation in discrete diffusion language models by exploiting the u…
0
100
0
@jramapuram
Jason Ramapuram
2 months
RT @pals_nlp_wrkshp: Join us at @emnlpmeeting for "Tailoring AI: Exploring Active and Passive LLM Personalization" 🎯🧠 To answer: when s…
0
16
0
@jramapuram
Jason Ramapuram
2 months
RT @ruomingpang: At WWDC we introduce a new generation of LLMs developed to enhance the Apple Intelligence features. We also introduce the…
machinelearning.apple.com
With Apple Intelligence, we're integrating powerful generative AI right into the apps and experiences people use every day, all while…
0
110
0
@jramapuram
Jason Ramapuram
2 months
RT @thoma_gu: I will be attending #CVPR2025 and presenting our latest research at Apple MLR! Specifically, I will present our highlight pos….
0
19
0
@jramapuram
Jason Ramapuram
2 months
RT @stevenstrogatz: My new #math series in the New York Times, "Math, Revealed," is aimed at everyone, whether you love math or not. Have a….
nytimes.com
In the world of taxicab geometry, even the Pythagorean theorem takes a back seat.
0
147
0
@jramapuram
Jason Ramapuram
2 months
RT @Maureendss: Now that @ISCAInterspeech registration is open, it's time for some shameless promo! Sign up and join our Interspeech tutorial:…
interspeech2025.org
0
5
0
@jramapuram
Jason Ramapuram
2 months
RT @GoogleDeepMind: Video, meet audio. 🎥🤝🔊 With Veo 3, our new state-of-the-art generative video model, you can add soundtracks to clips y…
0
1K
0
@jramapuram
Jason Ramapuram
2 months
RT @reach_vb: Let's goo! Starting today you can access 5000+ LLMs powered by MLX directly from the Hugging Face Hub! 🔥 All you need to do is c…
0
20
0
@jramapuram
Jason Ramapuram
3 months
RT @zhaisf: Proud to report that TarFlow is accepted to #ICML2025 as a Spotlight 🎉 I’m really looking forward to new ideas and applications….
0
13
0
@jramapuram
Jason Ramapuram
3 months
Great push by @FlorisWeers in getting these models out in a clean, easy-to-use script:
github.com
Contribute to apple/ml-sigmoid-attention development by creating an account on GitHub.
0
0
2
@jramapuram
Jason Ramapuram
3 months
Stop by poster #596 from 10am to 12:30pm tomorrow (Fri 25 April) at #ICLR2025 to hear more about Sigmoid Attention! We just pushed 8 trajectory checkpoints each for two 7B LLMs: Sigmoid Attention and a 1:1 Softmax Attention baseline (trained with a deterministic dataloader for 1T tokens):
@jramapuram
Jason Ramapuram
6 months
Small update on SigmoidAttn (arXiv incoming):
- 1B and 7B LLM results added and stabilized.
- Hybrid Norm [on the embed dim, not the seq dim], `x + norm(sigmoid(QK^T / sqrt(d_{qk}))V)`, stabilizes longer sequences (n=4096) and larger models (7B). H-Norm is used with Grok-1, for example.
1
14
45
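The Hybrid Norm residual update in the tweet above can be sketched as a toy single-head NumPy block. Two assumptions to flag: the tweet says the norm acts on the embedding dimension but not which norm, so an RMS-style norm is used here, and the constant sequence-length bias from the main SigmoidAttn method is omitted to match the formula exactly as written.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Normalize over the embedding (last) dimension -- "on embed dim, not seq dim".
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

def hybrid_norm_block(x, wq, wk, wv):
    """Residual update x + norm(sigmoid(QK^T / sqrt(d_qk)) V)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    d_qk = q.shape[-1]
    logits = q @ k.swapaxes(-1, -2) / np.sqrt(d_qk)
    attn = 1.0 / (1.0 + np.exp(-logits))   # elementwise sigmoid, as in SigmoidAttn
    return x + rms_norm(attn @ v)

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 8))                        # (seq len n=4, embed dim 8)
wq, wk, wv = (rng.standard_normal((8, 8)) for _ in range(3))
y = hybrid_norm_block(x, wq, wk, wv)
print(y.shape)  # (4, 8)
```

Normalizing the attention output before the residual add bounds its scale, which is a plausible reason it helps stability at longer sequences and larger model sizes.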
@jramapuram
Jason Ramapuram
4 months
RT @MartinKlissarov: Here is an RL perspective on understanding LLMs for decision making. Are LLMs best used as: policies / rewards / tra…
0
29
0
@jramapuram
Jason Ramapuram
4 months
RT @prlz77: Our work on fine-grained control of LLMs and diffusion models via Activation Transport will be presented @iclr_conf as spotligh….
machinelearning.apple.com
Large generative models are becoming increasingly capable and more widely deployed to power production applications, but getting these…
0
10
0
@jramapuram
Jason Ramapuram
4 months
RT @MustafaShukor1: We release a large-scale study to answer the following: - Is late fusion inherently better than early fusion for multim…
0
77
0