Hadi Pouransari Profile
Hadi Pouransari

@HPouransari

Followers: 1K
Following: 427
Media: 32
Statuses: 210

ML Research @Apple, PhD @Stanford.

California, USA
Joined July 2019
@HPouransari
Hadi Pouransari
1 day
Thanks to all who reached out, and sorry if I can't respond to everyone.
@HPouransari
Hadi Pouransari
2 months
📣We have PhD research internship positions available at Apple MLR. DM me your brief research background, resume, and availability (earliest start date and latest end date) if interested in the topics below.
0
0
11
@awnihannun
Awni Hannun
15 days
I won't be at NeurIPS, but there will be some fun MLX demos at the Apple booth:
- Image generation on M5 iPad
- Fast, distributed text generation on multiple M3 Ultras
- FastVLM real-time on an iPhone
6
17
223
@YizheZhangNLP
Yizhe Zhang (hiring)
20 days
We use latent continuous thoughts for retrieval optimized via downstream NTP loss, unified under one LLM backbone. Since representations are shared, documents can be precomputed—eliminating 2-stage RAG. We match raw text performance but with a much shorter context budget. 📉🚀
@Jiehenlp
Jie He
22 days
Happy to introduce my internship work at @Apple. We introduce CLaRa: Continuous Latent Reasoning, an end-to-end training framework that jointly trains retrieval and generation! 🧠📦 🔗 https://t.co/jEapFfeD7D #RAG #LLMs #Retrieval #Reasoning #AI
1
9
33
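A toy sketch of the idea in the two tweets above, in which a single shared backbone produces continuous latents for both documents and queries, so document latents can be cached and retrieval plus generation are trained with one next-token-prediction loss. This is not the CLaRa code: the GRU backbone, the sizes, and the soft dot-product retrieval are stand-in assumptions chosen only to show why precomputed latents remove the second encoding stage of 2-stage RAG.

```python
# Toy sketch (not the CLaRa implementation): one shared backbone encodes both
# documents and queries into continuous latents, document latents are
# precomputed once, retrieval is a dot product over those cached latents, and
# the retrieved latents condition generation directly, so a single
# next-token-prediction (NTP) loss trains retrieval and generation together.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 1000, 128  # hypothetical sizes

class SharedBackbone(nn.Module):
    """Stand-in for the shared LLM backbone (a GRU keeps the sketch short)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.encoder = nn.GRU(DIM, DIM, batch_first=True)
        self.to_latent = nn.Linear(DIM, DIM)
        self.lm_head = nn.Linear(DIM, VOCAB)

    def encode(self, tokens):                        # (B, T) -> (B, DIM)
        h, _ = self.encoder(self.embed(tokens))
        return self.to_latent(h[:, -1])               # last state as the latent "thought"

    def generate_logits(self, latents, tokens):       # latents: (B, K, DIM)
        x = torch.cat([latents, self.embed(tokens)], dim=1)
        h, _ = self.encoder(x)
        return self.lm_head(h[:, latents.size(1):])   # logits for the answer tokens only

model = SharedBackbone()

# 1) Precompute document latents once; this is what removes the second
#    encoding stage of classic 2-stage RAG.
docs = torch.randint(0, VOCAB, (8, 32))               # 8 fake documents, 32 tokens each
with torch.no_grad():
    doc_latents = model.encode(docs)                  # (8, DIM), cached offline

# 2) Encode the query, retrieve by similarity, generate, and backprop one NTP loss.
query = torch.randint(0, VOCAB, (1, 16))
answer = torch.randint(0, VOCAB, (1, 8))

q_latent = model.encode(query)                        # (1, DIM)
weights = F.softmax(q_latent @ doc_latents.T, dim=-1)                   # soft, differentiable retrieval
retrieved = (weights.unsqueeze(-1) * doc_latents).sum(1, keepdim=True)  # (1, 1, DIM)

logits = model.generate_logits(retrieved, answer)     # condition generation on retrieved latents
loss = F.cross_entropy(logits.reshape(-1, VOCAB), answer.reshape(-1))
loss.backward()                                       # end-to-end NTP gradient
print(float(loss))
```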
@OncelTuzel
Oncel Tuzel
27 days
Come work with us! The Machine Learning Research (MLR) team at Apple is seeking a passionate AI researcher to work on Efficient ML algorithms, including models optimized for fast inference and efficient training methods. Apply here:
jobs.apple.com
Apply for an AIML - Machine Learning Researcher, MLR job at Apple. Read about the role and find out if it’s right for you.
6
42
377
@mkirchhof_
Michael Kirchhof
1 month
Our research team is hiring PhD interns 🍏 Spend your next summer in Paris and explore the next frontiers of LLMs for uncertainty quantification, calibration, RL and post-training, and Bayesian experimental design. Details & Application ➡️
jobs.apple.com
Apply for an Internship - Machine Learning Research on Uncertainty job at Apple. Read about the role and find out if it’s right for you.
8
59
345
@MIT_CSAIL
MIT CSAIL
2 months
Why can’t programmers tell the difference between Halloween & Christmas? Because oct 31 = dec 25.
62
167
1K
@vishaal_urao
Vishaal Udandarao
2 months
🚀New Paper https://t.co/KB2hZljDHu We conduct a systematic data-centric study for speech-language pretraining, to improve end-to-end spoken-QA! 🎙️🤖 Using our data-centric insights, we pretrain a 3.8B SpeechLM (called SpeLangy) outperforming 3x larger models! 🧵👇
3
40
127
@alexttoshev
Alexander Toshev
2 months
If you are excited about Multimodal and Agentic Reasoning with Foundation Models, Apple ML Research has openings for Researchers, Engineers, and Interns in this area. Consider applying through the links below or feel free to send a message for more information. - Machine
jobs.apple.com
Apply for an AIML - Machine Learning Researcher, MLR job at Apple. Read about the role and find out if it’s right for you.
12
53
460
@itsbautistam
Miguel Angel Bautista
2 months
🚀 Come work with me in the Machine Learning Research team at Apple! I’m looking for FT research scientists with a strong track record of impactful publications on generative modeling (NeurIPS, ICML, ICLR, CVPR, ICCV, etc.) to join my team and work on fundamental generative modeling
jobs.apple.com
Apply for an AIML - Machine Learning Researcher, MLR job at Apple. Read about the role and find out if it’s right for you.
7
41
348
@EranMalach
Eran Malach
2 months
SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. Arxiv: https://t.co/bCzxawF452 🧵
6
84
415
@awnihannun
Awni Hannun
2 months
I'm super excited about M5. It's going to help a lot with compute-bound workloads in MLX. For example:
- Much faster prefill. In other words time-to-first-token will go down.
- Faster image / video generation
- Faster fine-tuning (LoRA or otherwise)
- Higher throughput for
52
108
1K
@FartashFg
Fartash Faghri
2 months
🚨While booking your travel for #NeurIPS2025, make sure to stay on Sunday, December 7 8am-5pm for CCFM Workshop (Continual and Compatible Foundation Model Updates). We have received exciting paper contributions and have an amazing lineup of speakers.
@FartashFg
Fartash Faghri
5 months
Is your AI keeping up with the world? Announcing #NeurIPS2025 CCFM Workshop: Continual and Compatible Foundation Model Updates
When/Where: Dec. 6-7, San Diego
Submission deadline: Aug. 22, 2025 (opening soon!)
https://t.co/oIrrtiRcNy #FoundationModels #ContinualLearning
0
3
21
@FartashFg
Fartash Faghri
2 months
📣 Internship at Apple ML Research We’re looking for a PhD research intern with interests in efficient multimodal models and video. For our recent works see https://t.co/gOZIopzufv This is a pure-research internship where the objective is to publish high-quality work. Internship
machinelearning.apple.com
Vision Language Models (VLMs) enable visual understanding alongside textual inputs. They are typically built by passing visual tokens from a…
3
30
296
@HPouransari
Hadi Pouransari
2 months
📣We have PhD research internship positions available at Apple MLR. DM me your brief research background, resume, and availability (earliest start date and latest end date) if interested in the topics below.
@HPouransari
Hadi Pouransari
2 months
Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment 💡We propose dividing LLM parameters into 1) anchor (always used, capturing commonsense) and 2) memory bank (selected per query, capturing world knowledge). [1/X]🧵
7
50
462
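A minimal sketch of the anchor + memory-bank split described in the quoted tweet. This is not the paper's implementation: here the memories are realized as soft-prompt vectors fetched per query from a large bank through a top-k router, and all sizes and the routing mechanism are illustrative assumptions. The point is only that a small always-loaded anchor and a sparsely fetched memory bank can be pretrained jointly with ordinary next-token prediction; extra FFN rows or LoRA-style deltas would be equally valid ways to attach the fetched memories.

```python
# Toy sketch of "anchor + memory bank" pretraining (not the paper's code).
# A small anchor model is always resident; per query, a router fetches a few
# entries from a large memory bank and feeds them to the anchor as soft prompts.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 1000, 128
BANK_SIZE, TOP_K = 4096, 4            # large memory bank, few entries fetched per query

class AnchorLM(nn.Module):
    """Small always-loaded model: handles commonsense / reasoning."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.body = nn.GRU(DIM, DIM, batch_first=True)   # toy stand-in for transformer blocks
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens, memories):                 # memories: (B, K, DIM)
        x = torch.cat([memories, self.embed(tokens)], dim=1)
        h, _ = self.body(x)
        return self.head(h[:, memories.size(1):])        # logits for the real tokens only

anchor = AnchorLM()
memory_bank = nn.Parameter(torch.randn(BANK_SIZE, DIM) * 0.02)  # world knowledge lives here
router = nn.Linear(DIM, BANK_SIZE)                               # scores bank entries per query

tokens = torch.randint(0, VOCAB, (2, 16))             # a batch of queries

# Route: score the whole bank from a cheap query summary, keep only TOP_K entries.
query_summary = anchor.embed(tokens).mean(dim=1)      # (B, DIM)
scores = router(query_summary)                        # (B, BANK_SIZE)
top_scores, top_idx = scores.topk(TOP_K, dim=-1)
selected = memory_bank[top_idx]                       # (B, TOP_K, DIM), fetched per query
selected = selected * torch.sigmoid(top_scores).unsqueeze(-1)   # keep routing weights differentiable

# Anchor and selected memories are trained jointly with ordinary next-token prediction.
logits = anchor(tokens[:, :-1], selected)
loss = F.cross_entropy(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
loss.backward()
print(float(loss))
```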
@HPouransari
Hadi Pouransari
2 months
How to do Chain-of-Thought reasoning for language diffusion models? See 👇
@haoqik322
Murray Kang
2 months
🧵1/ Latent diffusion shines in image generation for its abstraction, iterative refinement, and parallel exploration. Yet, applying it to text reasoning is hard — language is discrete. 💡 Our work LaDiR (Latent Diffusion Reasoner) makes it possible — using VAE + block-wise
0
3
22
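A toy illustration of the general recipe the LaDiR tweet describes: map discrete text into a continuous latent with a VAE-style bottleneck, then run diffusion (noising and learned denoising) in that latent space instead of on tokens. The plain autoencoder, the one-step denoising objective, and all sizes below are simplifying assumptions, not the LaDiR code.

```python
# Toy latent-diffusion-over-text sketch (illustrative only, not LaDiR).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, BLOCK = 1000, 64, 16        # one "block" of reasoning tokens

encoder = nn.Sequential(nn.Embedding(VOCAB, DIM), nn.Flatten(), nn.Linear(BLOCK * DIM, DIM))
decoder = nn.Linear(DIM, BLOCK * VOCAB)                 # latent -> token logits for the block
denoiser = nn.Sequential(nn.Linear(DIM + 1, DIM), nn.ReLU(), nn.Linear(DIM, DIM))

tokens = torch.randint(0, VOCAB, (8, BLOCK))            # a batch of text blocks

# 1) Autoencoding: the VAE-like bottleneck gives a continuous latent per block.
z = encoder(tokens)                                     # (8, DIM) continuous latents
recon_logits = decoder(z).view(8, BLOCK, VOCAB)
recon_loss = F.cross_entropy(recon_logits.reshape(-1, VOCAB), tokens.reshape(-1))

# 2) Diffusion in latent space: corrupt z with Gaussian noise at a random level
#    and train the denoiser to recover the clean latent (inference would repeat
#    this refinement step iteratively, starting from pure noise).
t = torch.rand(8, 1)                                    # noise level in [0, 1]
noisy_z = (1 - t) * z.detach() + t * torch.randn_like(z)
pred_z = denoiser(torch.cat([noisy_z, t], dim=-1))
diff_loss = F.mse_loss(pred_z, z.detach())

(recon_loss + diff_loss).backward()
print(float(recon_loss), float(diff_loss))
```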
@UnderGroundJeg
Huangjie Zheng
2 months
We’re excited to share our new paper: Continuously-Augmented Discrete Diffusion (CADD) — a simple yet effective way to bridge discrete and continuous diffusion models on discrete data, such as language modeling. [1/n] Paper: https://t.co/fQ8qxx4Pge
6
36
238
@mkirchhof_
Michael Kirchhof
2 months
LLMs are currently this one big parameter block that stores all sorts of facts. In our new preprint, we add context-specific memory parameters to the model, and pretrain the model along with a big bank of memories. 📑 https://t.co/xTNn2rNTK5 Thread 👇
arxiv.org
The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge...
@HPouransari
Hadi Pouransari
2 months
Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment 💡We propose dividing LLM parameters into 1) anchor (always used, capturing commonsense) and 2) memory bank (selected per query, capturing world knowledge). [1/X]🧵
0
22
176
@awnihannun
Awni Hannun
2 months
I love this line of research from my colleagues at Apple: Augmenting a language model with a hierarchical memory makes perfect sense for several reasons:
- Intuitively the memory parameters should be accessed much less frequently than the weights responsible for reasoning. You
@HPouransari
Hadi Pouransari
2 months
Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment 💡We propose dividing LLM parameters into 1) anchor (always used, capturing commonsense) and 2) memory bank (selected per query, capturing world knowledge). [1/X]🧵
8
75
696
@HPouransari
Hadi Pouransari
2 months
Memories complement RAG and can be combined for enhanced results. Post-hoc memory learning is possible (see Qwen, Gemma, etc.), with more ablations in the paper. Paper: https://t.co/FCgKFyvXB4 With @GrangierDavid, C. Thomas, @mkirchhof_, and @OncelTuzel at Apple MLR. [X/X]
arxiv.org
The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge...
0
9
58
@HPouransari
Hadi Pouransari
2 months
🚀 Consider hypothetical hardware storing a bank with three memory levels:
Anchor model: 0.8GB @ RAM
Level 1: 39GB @ Flash
Level 2: 155GB @ External Disk
Level 3: 618GB @ Cloud
Total fetch time: 38ms (vs. 198ms for a single-level flat memory bank). [9/X]
1
6
33
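A back-of-the-envelope sketch of the fetch-time arithmetic behind this comparison. The tier names mirror the tweet, but the bandwidths, per-query fetch size, and access fractions below are invented for illustration, so the output will not reproduce the 38ms / 198ms figures; it only shows why serving most per-query fetches from fast tiers beats pulling everything from a single slow flat bank.

```python
# Toy fetch-time comparison with made-up numbers (illustration only).
tiers = {
    # name:          (bandwidth GB/s, fraction of per-query fetch served here)
    "RAM (anchor)":   (50.0, 0.00),   # anchor is always resident, nothing to fetch
    "Flash (L1)":     (5.0,  0.85),   # hot memories live in the fast tier
    "Disk (L2)":      (0.5,  0.13),
    "Cloud (L3)":     (0.05, 0.02),
}
fetch_per_query_gb = 0.05             # assumed total memory fetched per query

# Tiered bank: each slice of the fetch is served at its own tier's bandwidth.
tiered_ms = sum(frac * fetch_per_query_gb / bw * 1000 for bw, frac in tiers.values())

# Flat bank: assume the whole bank sits in one slower tier (disk here), so every
# fetch pays that tier's bandwidth.
flat_ms = fetch_per_query_gb / 0.5 * 1000

print(f"tiered: {tiered_ms:.1f} ms   flat: {flat_ms:.1f} ms")
```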