Ashvini Jindal
@akjindal53244
Followers
524
Following
940
Media
14
Statuses
215
LLM @LinkedIn, Llama-3.1-Storm-8B, Winner of NeurIPS LLM Efficiency Challenge, Creator of Arithmo-Mistral-7B Mathematical Reasoning LLM
San Francisco, CA
Joined July 2012
Llama-3.1-Storm-8B has arrived! A new 8B-parameter LLM that outperforms @Meta Llama-3.1-8B-Instruct and Hermes-3-Llama-3.1-8B across diverse benchmarks! Our new 8B LLM pushes the boundaries of what's possible with smaller language models
15
55
249
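For readers who want to try the model announced above, here is a minimal sketch of loading it with Hugging Face transformers. The repo ID akjindal53244/Llama-3.1-Storm-8B and the chat-style pipeline usage are assumptions based on the announcement, not instructions from the post itself.

```python
# Minimal sketch: chatting with Llama-3.1-Storm-8B via transformers.
# Assumes the Hugging Face repo ID below and a GPU with enough memory; adjust as needed.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="akjindal53244/Llama-3.1-Storm-8B",  # assumed repo ID
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is (2 + 2) * 3?"},
]
out = generator(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # assistant reply
```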
New Anthropic Research: "Inverse Scaling in Test-Time Compute." We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns.
59
168
1K
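Not from the paper itself, but a hedged sketch of how one could probe the effect described above: evaluate the same model at several reasoning-token budgets and check whether accuracy keeps improving. The `ask_model` helper is a hypothetical placeholder for your own inference call.

```python
# Hypothetical harness: run the same model at several reasoning budgets and
# see whether accuracy keeps rising or starts to drop (inverse scaling).
def ask_model(question: str, max_reasoning_tokens: int) -> str:
    raise NotImplementedError("plug in a model/API call that caps reasoning tokens")

def accuracy_at_budget(tasks, budget: int) -> float:
    """tasks: iterable of (question, expected_answer) string pairs."""
    tasks = list(tasks)
    correct = sum(
        expected.strip().lower() in ask_model(q, max_reasoning_tokens=budget).lower()
        for q, expected in tasks
    )
    return correct / len(tasks)

# Usage idea:
#   for budget in (256, 1024, 4096, 16384):
#       print(budget, accuracy_at_budget(eval_set, budget))
# Inverse scaling shows up as accuracy *falling* at the larger budgets.
```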
Excited to share that our LMUnit models with @ContextualAI just claimed the top spots on RewardBench2. How did we manage to rank +5% higher than models like Gemini, Claude 4, and GPT4.1? More in the details below: 1/11
8
25
150
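The core idea behind LMUnit, as described above, is to score responses against explicit natural-language unit tests rather than a single holistic rating. Below is a hedged sketch of that evaluation shape; `score_unit_test` is a hypothetical stand-in for an LMUnit-style scoring model, not Contextual AI's API.

```python
# Hedged sketch of "natural-language unit tests": each criterion is scored separately,
# then aggregated. The scoring function is a placeholder for a judge/reward model.
from statistics import mean

def score_unit_test(query: str, response: str, unit_test: str) -> float:
    """Return a score (e.g., 1-5) for how well `response` satisfies `unit_test`."""
    raise NotImplementedError("wire this to a judge or reward model of your choice")

def evaluate(query: str, response: str, unit_tests: list[str]) -> float:
    return mean(score_unit_test(query, response, t) for t in unit_tests)

unit_tests = [
    "Does the response answer the question that was actually asked?",
    "Are all numerical claims in the response correct?",
    "Is the response free of unsupported speculation?",
]
```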
Introducing T5Gemma: the next generation of encoder-decoder/T5 models! Decoder models adapted to be encoder-decoder. 32 models with different combinations. Available on Hugging Face and Kaggle: https://t.co/8eFH7yKger
22
138
778
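A sketch of what running one of these encoder-decoder checkpoints might look like with transformers. Both the checkpoint ID and the availability of T5Gemma support under AutoModelForSeq2SeqLM in your installed transformers version are assumptions; check the Hugging Face "google" organization for the actual released IDs.

```python
# Sketch: generating with a T5Gemma encoder-decoder checkpoint (IDs assumed, verify on HF Hub).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/t5gemma-2b-2b-ul2"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer(
    "Summarize: T5Gemma adapts decoder-only Gemma models into encoder-decoder models.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```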
New paper & dataset! NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions - Synthesizes 2.8M challenging and diverse questions which require multi-step reasoning, along with reference answers - Shows steeper data scaling curve for knowledge distillation
2
90
427
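A quick sketch of pulling a few examples from the dataset announced above. The repository ID and column names are assumptions inferred from the announcement; verify them on the dataset card before use.

```python
# Sketch: streaming a few NaturalReasoning examples (dataset ID and columns assumed).
from datasets import load_dataset

ds = load_dataset("facebook/natural_reasoning", split="train", streaming=True)
for i, row in enumerate(ds):
    print(row.get("question"))          # assumed column name
    print(row.get("reference_answer"))  # assumed column name
    if i == 2:
        break
```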
What if a [MASK] was all you needed? ModernBERT is great, but we couldn't stop wondering if it could be greater than previous encoders in different ways. Maybe we don't need task-specific heads? Maybe it can do all sorts of tasks with only its generative head? Spoiler: yes.
17
105
805
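To make the "no task-specific head" idea above concrete, here is a small sketch that does zero-shot sentiment classification with only ModernBERT's masked-LM head, via a cloze template. The template and verbalizer words ("great"/"terrible") are illustrative choices, not the authors' setup.

```python
# Sketch: classification with only the masked-LM head, no task-specific head.
# Compares the [MASK] probabilities of two verbalizer words under a cloze template.
from transformers import pipeline

fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")

review = "The battery died after two days and support never replied."
template = f"Review: {review} Overall, the product was [MASK]."

scores = {p["token_str"].strip(): p["score"]
          for p in fill(template, targets=["great", "terrible"])}
print(max(scores, key=scores.get), scores)  # expected: "terrible" scores higher
```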
Finished a run of R1-style GRPO on Qwen-2.5-0.5B (base model); it yields +10 accuracy points on GSM8K. Literally just works. The base model scores 41.6% as reported in the Qwen paper vs ~51% with GRPO.
41
110
1K
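A hedged sketch of what a GSM8K GRPO run like the one above could look like with the TRL library. The prompt format and exact-match reward below are simplified assumptions, not the poster's actual script, and hyperparameters are illustrative.

```python
# Rough GRPO sketch with TRL on Qwen2.5-0.5B and GSM8K (not the poster's exact setup).
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def build_prompt(question: str) -> str:
    return f"Question: {question}\nThink step by step, then give the final answer after '####'.\n"

train = load_dataset("openai/gsm8k", "main", split="train")
train = train.map(lambda x: {"prompt": build_prompt(x["question"]),
                             "answer": x["answer"].split("####")[-1].strip()})

def exact_match_reward(completions, answer, **kwargs):
    """1.0 if the completion's final '####' answer matches the reference, else 0.0."""
    rewards = []
    for completion, ref in zip(completions, answer):
        match = re.search(r"####\s*(-?[\d,\.]+)", completion)
        pred = match.group(1).replace(",", "") if match else None
        rewards.append(1.0 if pred == ref.replace(",", "") else 0.0)
    return rewards

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B",
    reward_funcs=exact_match_reward,
    args=GRPOConfig(output_dir="qwen0.5b-grpo", num_generations=8, max_completion_length=512),
    train_dataset=train,
)
trainer.train()
```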
Everyone is talking about RL these days. But are we done with SFT? The answer is NO. If we revive SFT in another form, it can even beat RL! Very happy to introduce Critique Fine-Tuning, a new form of SFT, which can more efficiently activate language models' reasoning
23
96
699
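Critique Fine-Tuning, as described above, trains the model to critique a candidate response rather than imitate a reference one. Here is a hedged sketch of what such a training example might look like; the field names and prompt wording are illustrative, not the paper's exact format.

```python
# Hypothetical CFT-style example: input = question + a (possibly noisy) candidate solution,
# target = a critique of that solution. Field names are illustrative.
def make_cft_example(question: str, candidate_solution: str, critique: str) -> dict:
    prompt = (
        f"Question:\n{question}\n\n"
        f"Candidate solution:\n{candidate_solution}\n\n"
        "Critique the candidate solution step by step and state whether it is correct."
    )
    return {"prompt": prompt, "completion": critique}

example = make_cft_example(
    question="What is 17 * 24?",
    candidate_solution="17 * 24 = 17 * 20 + 17 * 4 = 340 + 64 = 404.",
    critique="The decomposition is right, but 17 * 4 = 68, not 64, so the total is 340 + 68 = 408. The candidate answer is incorrect.",
)
```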
Here is Tülu 3 405B, our open-source post-training model that surpasses the performance of DeepSeek-V3! The last member of the Tülu 3 family demonstrates that our recipe, which includes Reinforcement Learning with Verifiable Rewards (RLVR), scales to 405B - with performance on
151
372
2K
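RLVR, mentioned above, rewards the policy only when an answer can be checked programmatically rather than by a learned reward model. Below is a minimal sketch of such a verifier for numeric answers; it is deliberately simplified and is not the Tülu 3 implementation.

```python
# Minimal verifiable-reward sketch: binary reward from answer matching, no reward model.
import re

def extract_final_answer(text: str):
    match = re.search(r"(?:final answer|answer)\s*[:=]?\s*(-?\d+(?:\.\d+)?)", text, re.IGNORECASE)
    return match.group(1) if match else None

def verifiable_reward(completion: str, reference: str) -> float:
    pred = extract_final_answer(completion)
    return 1.0 if pred is not None and float(pred) == float(reference) else 0.0

print(verifiable_reward("Some reasoning... Final answer: 42", "42"))  # 1.0
print(verifiable_reward("Final answer: 41", "42"))                    # 0.0
```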
I don't have too too much to add on top of this earlier post on V3 and I think it applies to R1 too (which is the more recent, thinking equivalent). I will say that Deep Learning has a legendary ravenous appetite for compute, like no other algorithm that has ever been developed
DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M). For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being
371
2K
14K
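A back-of-the-envelope check of the budget quoted above (2048 GPUs for roughly two months). The ~$2 per GPU-hour rental-equivalent rate is an assumption used only to show the figures are self-consistent.

```python
# Rough sanity check of the "2048 GPUs for 2 months, ~$6M" figure quoted above.
gpus = 2048
days = 60                     # "2 months"
price_per_gpu_hour = 2.0      # assumed rental-equivalent rate, USD

gpu_hours = gpus * days * 24  # ~2.95M GPU-hours
cost = gpu_hours * price_per_gpu_hour
print(f"{gpu_hours / 1e6:.2f}M GPU-hours, ~${cost / 1e6:.1f}M")  # ~2.95M GPU-hours, ~$5.9M
```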
Ever wondered if model merging works at scale? Maybe the benefits wear off for bigger models? Maybe you considered using model merging for post-training of your large model but weren't sure if it generalizes well? cc: @GoogleAI @GoogleDeepMind @uncnlp Excited to announce my
6
87
393
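In its simplest form, the model merging discussed above is parameter-space averaging of checkpoints that share an architecture. Here is a hedged sketch of uniform weight averaging in plain PyTorch; it illustrates the basic operation, not the specific merging method studied in that thread.

```python
# Simple parameter-space merge (uniform or weighted averaging) of checkpoints
# with identical architectures; assumes floating-point parameters.
import torch

def merge_state_dicts(state_dicts, weights=None):
    weights = weights or [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage (hypothetical paths):
#   sd_a = torch.load("finetune_math.pt")
#   sd_b = torch.load("finetune_code.pt")
#   model.load_state_dict(merge_state_dicts([sd_a, sd_b]))
```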
Finally got some time to chat about these new models. We started the Qwen2.5 project the moment we released Qwen2. Through this process we realized a lot of problems and mistakes that we had made. In terms of pretraining, we simply focused on leveling up the quality and
Welcome to the party of Qwen2.5 foundation models! This time, we have the biggest release ever in the history of Qwen. In brief, we have: Blog: https://t.co/lih1QNWCVv Blog (LLM): https://t.co/XZGw7hLoD0 Blog (Coder): https://t.co/3msdDONsqJ Blog (Math): https://t.co/shJdCOx2pL
25
33
325
Super proud of the team behind the Llama-3.1-Storm-8B model, which is now available on @monsterapis for fine-tuning and deployment in a single click! Why does it matter? This model is beating all the benchmarks through the roof! With their self-curation technique, it is helping the model
Today marks 1 month since the release of the Llama-3.1-Storm-8B model, with 52k+ collective downloads and model availability on @huggingface, @ollama, and @UnslothAI. We are starting a Discord group for Storm: https://t.co/p2BL0QpJV1 From today onwards, the model is also
0
2
3
Today is the day! After 1.5 years of hard work, "Build A Large Language Model (From Scratch)" is finally out! Print or ebook copies are available on Manning's site: https://t.co/bOi8w6b57u. And it's also available on Amazon soon: https://t.co/fBR1J3otxG.
89
334
2K
This is really cool from Google. On demand podcasts about your favourite papers and books.
36
142
2K
A 20% boost on a metric is rare, especially when it's code generation. PlanSearch, our new search method based on diverse plans, outperforms baselines by huge margins. It's not just a search method, but also a philosophy. How are these numbers achieved? Can they be predicted?
7
25
227
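As the post above describes, PlanSearch searches over diverse natural-language plans before writing code. Here is a hedged pseudo-sketch of that plan-first loop with hypothetical `llm` and `run_tests` helpers; it is an illustration of the idea, not the authors' implementation.

```python
# Hypothetical plan-first search loop for code generation:
# sample diverse natural-language plans, generate code per plan, keep what passes the tests.
def llm(prompt: str, temperature: float = 0.8) -> str:
    raise NotImplementedError("placeholder for your model/API call")

def run_tests(code: str, tests: list[str]) -> bool:
    raise NotImplementedError("placeholder for a sandboxed test runner")

def plan_search(problem: str, tests: list[str], n_plans: int = 8):
    plans = [llm(f"Sketch a distinct high-level plan to solve:\n{problem}") for _ in range(n_plans)]
    for plan in plans:
        code = llm(f"Problem:\n{problem}\n\nFollow this plan and write Python code:\n{plan}",
                   temperature=0.2)
        if run_tests(code, tests):
            return plan, code
    return None
```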
Andrej Karpathy says that as we expand our brains into an exocortex on a computing substrate, we will be renting our brains, and open source will become more important, because if it's not your weights, it's not your brain.
67
197
1K