Keivan Alizadeh
@KeivanAlizadeh2
Join our innovative team at #Apple as a Research Scientist/Engineer specializing in LLM #Reasoning, #Planning, and General #Intelligence. We are seeking an ideal candidate who: - Is available to start by the end of this year - Holds a PhD or will graduate by year-end - Has 3-5
Our paper on the reasoning illusion asked important questions about the current evaluation paradigm for reasoning models and how they behave with respect to complexity. We hope our findings help the community look beyond benchmarks to better understand the logical scaling & behavior of
🧵 1/8 The Illusion of Thinking: Are reasoning models like o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet really "thinking"? 🤔 Or are they just throwing more compute towards pattern matching? The new Large Reasoning Models (LRMs) show promising gains on math and coding benchmarks,
We have open-sourced the GSM-Symbolic templates and generated data! 🎉 - GitHub: https://t.co/qpRdfbntbw - Hugging Face: https://t.co/TXC4cIyadF I will also be attending #NeurIPS2024. If you're attending too and would like to discuss research ideas on reasoning, let's connect :)
How can we make RLHF more robust? With a simple trick: instead of limiting the KL divergence to a single SFT model, we can search around a model soup that resides in a higher-reward region. Please check out our intern's great work!
1/🔔Excited to share my internship work, SALSA: Soup-based Alignment Learning for Stronger Adaptation, (NeurIPS workshop paper)! 🎉 Proximal Policy Optimization (PPO) often limits exploration by keeping models tethered to a single reference model. SALSA, however, breaks free
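The trick above can be sketched in a few lines: rather than anchoring the PPO-style KL penalty to one SFT reference model, anchor it to the uniform average ("soup") of several fine-tuned checkpoints. The snippet below is a toy illustration with made-up numbers, not the SALSA implementation.

```python
import math

def average_soup(checkpoints):
    """Uniform model soup: elementwise average of parameter vectors."""
    n = len(checkpoints)
    return [sum(p) / n for p in zip(*checkpoints)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q):
    """KL(p || q) between two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two hypothetical SFT checkpoints of a toy one-layer "model" whose
# parameters double as output logits, purely for illustration.
ckpt_a = [2.0, 0.5, -1.0]
ckpt_b = [1.0, 1.5, -0.5]
soup = average_soup([ckpt_a, ckpt_b])

policy_logits = [2.5, 0.0, -1.5]
# The KL penalty is computed against the soup instead of a single reference.
penalty = kl(softmax(policy_logits), softmax(soup))
```

Because the soup sits between the individual checkpoints, regularizing toward it lets the policy explore a wider region of parameter space than tethering to either checkpoint alone.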
1/ LLM inference is very expensive, and LLMs don't necessarily use their full capacity to respond to a specific prompt. That's why many researchers have been investigating adaptive computation methods such as early exiting, layer/expert pruning, speculative decoding, mixture of
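Of the adaptive-computation ideas listed above, early exiting is the easiest to sketch: check an intermediate prediction after each layer and stop as soon as it is confident enough. The layers, classifier head, and threshold below are all hypothetical toys, not any particular paper's method.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def early_exit_forward(layers, x, classifier, threshold=0.9):
    """Run layers one at a time; stop once the intermediate prediction
    clears the confidence threshold. Assumes `layers` is non-empty."""
    for depth, layer in enumerate(layers, start=1):
        x = layer(x)
        probs = softmax(classifier(x))
        if max(probs) >= threshold:
            return probs, depth  # exited early, skipping remaining layers
    return probs, len(layers)

# Toy "layers" that sharpen the representation at every step.
layers = [lambda v: [2 * xi for xi in v]] * 4
classifier = lambda v: v  # identity head, for illustration only
probs, depth_used = early_exit_forward(layers, [1.0, 0.2, -0.3], classifier)
```

Here the prediction becomes confident after two of the four layers, so half the compute is skipped for this input.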
** Intern position on LLM reasoning ** @mchorton1991, @i_mirzadeh, @KeivanAlizadeh2 and I are co-hosting an intern position at #Apple to work on understanding and improving reasoning capabilities of LLMs. The ideal candidate: - Has prior publications on LLM reasoning - Is
1/ Can Large Language Models (LLMs) truly reason? Or are they just sophisticated pattern matchers? In our latest preprint, we explore this key question through a large-scale study of both open-source models like Llama, Phi, Gemma, and Mistral and leading closed models, including the
nobody will remember: - your salary - how “busy you were” - how many hours you worked people will remember: - nothing. You will not be remembered. You have conquered no lands. Forged no new nations. Fought for no noble causes. You existed for a brief moment but soon you, and
Hey guys, I'm going to present LLM in a Flash at ACL 2024. Hit me up if you are in Bangkok. https://t.co/t67MbvpPOO Updates from the previous version: - Llama 2 results - Some results on Apple GPUs (Metal) - Speculative decoding - Memory-latency tradeoff - Impact of longer generation
Introducing 🪆Matryoshka Representations for Adaptive Deployment🪆 TL;DR: up to 14× lower real-world classification & retrieval costs at web scale, at no loss in accuracy & w/o any overhead across setups. Paper: https://t.co/JMP9ED72L2 Code: https://t.co/SEccseeDxz [1/11]
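The "adaptive deployment" part can be sketched simply: in a nested (Matryoshka-style) embedding, the first d coordinates are themselves a usable embedding after re-normalization, so cheap retrieval can run on a short prefix. The 8-dimensional vectors below are made-up numbers for illustration.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def truncate_embedding(vec, dim):
    """Keep only the first `dim` coordinates and re-normalize; in a
    nested embedding this prefix is itself a usable representation."""
    return normalize(vec[:dim])

def cosine(a, b):
    # Assumes a and b are already unit-normalized.
    return sum(x * y for x, y in zip(a, b))

# Hypothetical 8-d embeddings; serve cheap retrieval from a 4-d prefix.
query = [0.9, 0.1, 0.3, -0.2, 0.05, 0.0, 0.01, -0.02]
doc   = [0.8, 0.2, 0.25, -0.1, 0.04, 0.01, 0.0, -0.01]
full_sim  = cosine(normalize(query), normalize(doc))
cheap_sim = cosine(truncate_embedding(query, 4), truncate_embedding(doc, 4))
```

In this toy example the 4-d prefix similarity closely tracks the full 8-d similarity, which is the property that makes truncation cheap at serving time.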
Instead of a single neural network, why not train lines, curves and simplexes in parameter space? Fantastic work by @Mitchnw et al. exploring how this idea can lead to more accurate and robust models: https://t.co/9VpB3TdJR6
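The "lines in parameter space" idea can be sketched as sampling a parameter vector anywhere along a segment between two trained endpoints; training optimizes both endpoints so every point on the line is a good model. The endpoint values below are made-up for illustration.

```python
def line_point(w0, w1, alpha):
    """Sample a parameter vector on the line segment between two
    endpoint parameter vectors, at interpolation coefficient alpha."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(w0, w1)]

# Hypothetical trained endpoints of a tiny 3-parameter model.
endpoint_0 = [0.2, -0.5, 1.0]
endpoint_1 = [0.4, -0.1, 0.6]
midpoint = line_point(endpoint_0, endpoint_1, 0.5)
```

Sampling different alphas at test time yields an ensemble of models for the storage cost of two, and midpoints of well-trained subspaces tend to be more accurate and robust than either endpoint.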
I've been seeing a lot of talk around the recent Vision Transformer (ViT) paper, so I thought I'd highlight some of my favorite previous work on self-attention and transformers in computer vision! Link to ViT: https://t.co/1bEXg9Ad06 (thread 👇)
Catch us at this ECCV, where @JamesPa91074457 and @sarahmhpratt from our lab present their work as spotlights!!! VisualCOMET: https://t.co/mDhvcsDpHD Grounded Situation Recognition: https://t.co/c9RDo3KvCj
#ECCV2020
Check out our spotlight presentation at ECCV this week on Grounded Situation Recognition + new demo! Demo: https://t.co/wo5l5R2iL1 Project Page (paper, code, dataset, video): https://t.co/y5OPuAaKxO Poster: Tue 14:00, Wed 00:00 (UTC+1) Spotlight Q+A: Wed 08:50 (UTC+1)
Excited to share our work on diagnosing breast cancer. We extend the self-attention mechanism to learn representations on 100s-of-megapixel images end-to-end, and on a GTX-1080 GPU at that, improving on the previous best method by 8% and matching the performance of pathologists. (1/n)
NED makes it possible to compare methods across different tasks. It gives insights into SOTA methods (in few-shot, unsupervised learning, etc.) vs. simple baselines.
Glad to be part of the NED team. A simple framework toward more realistic ML systems. NED doesn't separate train and test: just go iN thE wilD, collect data, and evaluate. NED extends ML models to ML Systems, which contain both the model and the training strategy.
Sharing In The Wild: From ML Models to Pragmatic ML systems In The Wild (NED) is a learning and evaluation framework designed to further progress towards general ML systems capable of excelling in the real world. Paper: https://t.co/gyJAVy9hJA Site: https://t.co/LYoVtXjJyA
Hello, world! This is the Twitter handle of the Reasoning, AI and VisioN lab, RAIVN Lab (like the bird) at @uwcse, led by Prof. Ali Farhadi. Check out our webpage: https://t.co/tgBTEVWbzL & follow us. Transitioning from low-key to active to share cool work 🥳 RTs appreciated
Sharing Supermasks in Superposition (SupSup): a model that sequentially learns thousands of tasks with negligible forgetting, even without access to task-identity information. arXiv: https://t.co/9eFj4TzamK code: https://t.co/DvMSkDMwSO blog: https://t.co/sEW0izlkiF (1/8)
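The core mechanism can be sketched as follows: keep one randomly initialized network frozen, and learn a separate binary "supermask" per task that carves a task-specific subnetwork out of the shared weights. Everything here (the 8-parameter "network", the fixed masks) is a toy stand-in, not the SupSup code.

```python
import random

random.seed(0)

# A fixed, randomly initialized weight vector that is never trained.
base_weights = [random.uniform(-1, 1) for _ in range(8)]

def apply_supermask(weights, mask):
    """A supermask zeroes out a subset of the frozen weights, selecting
    a task-specific subnetwork inside one shared network."""
    return [w * m for w, m in zip(weights, mask)]

# Hypothetical per-task binary masks (in SupSup these are learned).
masks = {
    "task_a": [1, 0, 1, 1, 0, 0, 1, 0],
    "task_b": [0, 1, 0, 1, 1, 1, 0, 0],
}

net_a = apply_supermask(base_weights, masks["task_a"])
net_b = apply_supermask(base_weights, masks["task_b"])
```

Because only masks are stored per task and the underlying weights never change, adding a new task cannot overwrite what earlier tasks learned, which is why forgetting is negligible.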
Excited to share Grounded Situation Recognition -- our work (with @yatskar, @LucaWeihs, Ali Farhadi, and @anikembhavi) on predicting and grounding situations in images! Paper: https://t.co/jlSoOZsERr Code: https://t.co/WwDMQZaHPq