Hadi Pouransari
@HPouransari
Followers: 1K
Following: 427
Media: 32
Statuses: 210
ML Research @Apple, PhD @Stanford.
California, USA
Joined July 2019
I won't be at NeurIPS, but there will be some fun MLX demos at the Apple booth:
- Image generation on M5 iPad
- Fast, distributed text generation on multiple M3 Ultras
- FastVLM real-time on an iPhone
6
17
223
We use latent continuous thoughts for retrieval, optimized via the downstream next-token-prediction (NTP) loss and unified under one LLM backbone. Since representations are shared, documents can be precomputed, eliminating 2-stage RAG. We match raw-text performance but with a much shorter context budget. 📉🚀
Happy to introduce my internship work at @Apple. We introduce CLaRa: Continuous Latent Reasoning, an end-to-end training framework that jointly trains retrieval and generation! 🧠📦 🔗 https://t.co/jEapFfeD7D
#RAG #LLMs #Retrieval #Reasoning #AI
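The appeal of the shared backbone is that document latents can be cached offline and reused at query time. Below is a minimal sketch of that precompute-then-retrieve pattern; encode_document / encode_query are hypothetical placeholders (deterministic random projections, not the actual CLaRa encoder), whereas in CLaRa both roles are played by the same LLM backbone and the retrieved latents are fed back to it as continuous thoughts for generation.

```python
# Minimal sketch: precompute document latents once, retrieve by similarity.
# encode_document / encode_query are hypothetical stand-ins for the shared
# LLM backbone; here they are toy deterministic random projections.
import numpy as np

DIM = 64  # placeholder latent ("continuous thought") dimension

def encode_document(text: str) -> np.ndarray:
    seed = sum(map(ord, text))  # toy deterministic seed, not a real encoder
    return np.random.default_rng(seed).normal(size=DIM)

def encode_query(text: str) -> np.ndarray:
    # Shared representation space: reuse the same (placeholder) encoder.
    return encode_document(text)

# Offline: cache latents for the whole corpus once.
corpus = ["doc about spoken QA", "doc about state-space models", "doc about Paris"]
doc_latents = np.stack([encode_document(d) for d in corpus])  # (num_docs, DIM)

def retrieve(query: str, k: int = 1) -> list[str]:
    scores = doc_latents @ encode_query(query)  # dot-product similarity
    return [corpus[i] for i in np.argsort(-scores)[:k]]

print(retrieve("state-space models"))
```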
1
9
33
Come work with us! The Machine Learning Research (MLR) team at Apple is seeking a passionate AI researcher to work on Efficient ML algorithms, including models optimized for fast inference and efficient training methods. Apply here:
jobs.apple.com
Apply for an AIML - Machine Learning Researcher, MLR job at Apple. Read about the role and find out if it’s right for you.
6
42
377
Our research team is hiring PhD interns 🍏 Spend your next summer in Paris and explore the next frontiers of LLMs for uncertainty quantification, calibration, RL and post-training, and Bayesian experimental design. Details & Application ➡️
jobs.apple.com
Apply for an Internship - Machine Learning Research on Uncertainty job at Apple. Read about the role and find out if it’s right for you.
8
59
345
Why can’t programmers tell the difference between Halloween & Christmas? Because oct 31 = dec 25.
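The arithmetic behind the joke, for anyone who wants to check it: 31 in base 8 equals 25 in base 10.

```python
# Octal 31 is decimal 25: 3*8 + 1 == 25, hence OCT 31 == DEC 25.
assert int("31", 8) == 25
print(0o31 == 25)  # True
```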
62
167
1K
🚀New Paper https://t.co/KB2hZljDHu We conduct a systematic data-centric study for speech-language pretraining, to improve end-to-end spoken-QA! 🎙️🤖 Using our data-centric insights, we pretrain a 3.8B SpeechLM (called SpeLangy) outperforming 3x larger models! 🧵👇
3
40
127
If you are excited about Multimodal and Agentic Reasoning with Foundation Models, Apple ML Research has openings for Researchers, Engineers, and Interns in this area. Consider applying through the links below or feel free to send a message for more information.
- Machine
jobs.apple.com
Apply for an AIML - Machine Learning Researcher, MLR job at Apple. Read about the role and find out if it’s right for you.
12
53
460
🚀 Come work with me in the Machine Learning Research team at Apple! I’m looking for FT research scientists with a strong track record of impactful publications on generative modeling (NeurIPS, ICML, ICLR, CVPR, ICCV, etc.) to join my team and work on fundamental generative modeling
jobs.apple.com
Apply for an AIML - Machine Learning Researcher, MLR job at Apple. Read about the role and find out if it’s right for you.
7
41
348
SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. Arxiv: https://t.co/bCzxawF452 🧵
6
84
415
I'm super excited about M5. It's going to help a lot with compute-bound workloads in MLX. For example:
- Much faster prefill. In other words, time-to-first-token will go down.
- Faster image / video generation
- Faster fine-tuning (LoRA or otherwise)
- Higher throughput for
52
108
1K
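A rough illustration of why prefill is compute-bound: it is dominated by large matrix multiplications over the entire prompt at once. The sketch below times one such matmul in MLX (assuming the mlx package is installed); the shapes and dtype are placeholders, not a benchmark of any particular model or of M5 hardware.

```python
# Rough illustration of a compute-bound, prefill-like workload in MLX:
# one big matmul per transformer layer over the whole prompt at once.
# Shapes are placeholders; this is not a benchmark of any specific model.
import time
import mlx.core as mx

prompt_len, d_model = 4096, 4096           # illustrative sizes
x = mx.random.normal((prompt_len, d_model)).astype(mx.float16)
w = mx.random.normal((d_model, d_model)).astype(mx.float16)

mx.eval(x, w)                              # materialize inputs before timing
start = time.perf_counter()
y = x @ w                                  # the matmul that dominates prefill
mx.eval(y)                                 # force lazy evaluation to complete
elapsed = time.perf_counter() - start

flops = 2 * prompt_len * d_model * d_model
print(f"{elapsed * 1e3:.1f} ms, ~{flops / elapsed / 1e12:.1f} TFLOP/s")
```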
🚨While booking your travel for #NeurIPS2025, make sure to stay through Sunday, December 7 (8am-5pm) for the CCFM Workshop (Continual and Compatible Foundation Model Updates). We have received exciting paper contributions and have an amazing lineup of speakers.
Is your AI keeping up with the world? Announcing the #NeurIPS2025 CCFM Workshop: Continual and Compatible Foundation Model Updates
When/Where: Dec. 6-7, San Diego
Submission deadline: Aug. 22, 2025 (opening soon!)
https://t.co/oIrrtiRcNy
#FoundationModels #ContinualLearning
0
3
21
📣 Internship at Apple ML Research
We’re looking for a PhD research intern with interests in efficient multimodal models and video. For our recent work, see https://t.co/gOZIopzufv
This is a pure-research internship where the objective is to publish high-quality work. Internship
machinelearning.apple.com
Vision Language Models (VLMs) enable visual understanding alongside textual inputs. They are typically built by passing visual tokens from a…
3
30
296
📣We have PhD research internship positions available at Apple MLR. DM me your brief research background, resume, and availability (earliest start date and latest end date) if interested in the topics below.
Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment 💡We propose dividing LLM parameters into 1) anchor (always used, capturing commonsense) and 2) memory bank (selected per query, capturing world knowledge). [1/X]🧵
7
50
462
How to do Chain-of-Thought reasoning for language diffusion models? See 👇
🧵1/ Latent diffusion shines in image generation for its abstraction, iterative refinement, and parallel exploration. Yet, applying it to text reasoning is hard — language is discrete. 💡 Our work LaDiR (Latent Diffusion Reasoner) makes it possible — using VAE + block-wise
0
3
22
We’re excited to share our new paper: Continuously-Augmented Discrete Diffusion (CADD) — a simple yet effective way to bridge discrete and continuous diffusion models on discrete data, such as language modeling. [1/n] Paper: https://t.co/fQ8qxx4Pge
6
36
238
LLMs are currently this one big parameter block that stores all sorts of facts. In our new preprint, we add context-specific memory parameters to the model and pretrain the model along with a big bank of memories. 📑 https://t.co/xTNn2rNTK5 Thread 👇
arxiv.org
The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge...
Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment 💡We propose dividing LLM parameters into 1) anchor (always used, capturing commonsense) and 2) memory bank (selected per query, capturing world knowledge). [1/X]🧵
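A toy sketch of the anchor-plus-memory-bank split described in the quoted thread: always-active anchor parameters plus a large table of memory parameters from which a few blocks are selected per query. The module names, sizes, and the mean-pooled way the selected memories are folded in are made up for illustration; they are not the paper's architecture.

```python
# Toy sketch of "anchor parameters + per-query memory bank" from the thread
# above. Module names and the selection mechanism are illustrative only.
import torch
import torch.nn as nn

class AnchorWithMemories(nn.Module):
    def __init__(self, d_model=64, n_memories=1024, mem_per_query=4):
        super().__init__()
        # Anchor: always-used parameters (commonsense / reasoning).
        self.anchor = nn.Linear(d_model, d_model)
        # Memory bank: large table of parameter blocks, fetched per query.
        self.memory_bank = nn.Embedding(n_memories, d_model)
        self.mem_per_query = mem_per_query

    def forward(self, x, memory_ids):
        # x: (batch, d_model) hidden states.
        # memory_ids: (batch, mem_per_query) ids of the memory blocks
        # selected for this query (e.g. by some retriever).
        mems = self.memory_bank(memory_ids)   # (batch, k, d_model)
        ctx = mems.mean(dim=1)                # fold the selected memories in
        return self.anchor(x) + ctx           # anchor path + memory path

model = AnchorWithMemories()
x = torch.randn(2, 64)
ids = torch.randint(0, 1024, (2, 4))
print(model(x, ids).shape)  # torch.Size([2, 64])
```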
0
22
176
I love this line of research from my colleagues at Apple: Augmenting a language model with a hierarchical memory makes perfect sense for several reasons:
- Intuitively, the memory parameters should be accessed much less frequently than the weights responsible for reasoning. You
Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment 💡We propose dividing LLM parameters into 1) anchor (always used, capturing commonsense) and 2) memory bank (selected per query, capturing world knowledge). [1/X]🧵
8
75
696
Memories complement RAG and can be combined for enhanced results. Post-hoc memory learning is possible (see Qwen, Gemma, etc.), with more ablations in the paper. Paper: https://t.co/FCgKFyvXB4 With @GrangierDavid, C. Thomas, @mkirchhof_, and @OncelTuzel at Apple MLR. [X/X]
arxiv.org
The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge...
0
9
58
🚀 Consider hypothetical hardware storing a bank with three memory levels:
- Anchor model: 0.8 GB @ RAM
- Level 1: 39 GB @ Flash
- Level 2: 155 GB @ External Disk
- Level 3: 618 GB @ Cloud
Total fetch time: 38 ms (vs. 198 ms for a single-level flat memory bank). [9/X]
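A back-of-the-envelope way to see why the hierarchical layout wins: each query fetches only a small slice from each level, so the slow tiers contribute little time despite holding most of the bytes. The per-level fetch sizes and bandwidths below are placeholder guesses for illustration only; they will not reproduce the 38 ms / 198 ms figures from the thread.

```python
# Back-of-the-envelope calculator for hierarchical memory fetch time.
# All per-query fetch sizes and bandwidths are placeholder guesses,
# NOT the numbers behind the 38 ms / 198 ms figures above.
LEVELS = [
    # (name,                 bytes fetched per query, bandwidth in bytes/s)
    ("RAM (anchor)",         0,      100e9),  # anchor is resident, nothing to fetch
    ("Flash (Level 1)",      64e6,   5e9),
    ("External disk (L2)",   16e6,   0.5e9),
    ("Cloud (Level 3)",      4e6,    0.1e9),
]

total = sum(size / bw for _, size, bw in LEVELS)
print(f"hypothetical total fetch time: {total * 1e3:.1f} ms")
for name, size, bw in LEVELS:
    print(f"  {name:20s} {size / bw * 1e3:6.1f} ms")
```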
1
6
33