Ashvini Jindal
@akjindal53244
Followers
524
Following
940
Media
14
Statuses
215
LLM @LinkedIn, Llama-3.1-Storm-8B, Winner of NeurIPS LLM Efficiency Challenge, Creator of Arithmo-Mistral-7B Mathematical Reasoning LLM
San Francisco, CA
Joined July 2012
Llama-3.1-Storm-8B has arrived! A new 8B-parameter LLM that outperforms @Meta Llama-3.1-8B-Instruct and Hermes-3-Llama-3.1-8B across diverse benchmarks! Our new 8B LLM pushes the boundaries of what's possible with smaller language models
15
55
249
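For readers who want to try the model announced above, here is a minimal sketch of loading it with Hugging Face transformers. The repo ID akjindal53244/Llama-3.1-Storm-8B and the chat-style pipeline usage are assumptions based on the announcement, not instructions from the post itself.

```python
# Minimal sketch: chatting with Llama-3.1-Storm-8B via transformers.
# Assumes the Hugging Face repo ID below and a GPU with enough memory; adjust as needed.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="akjindal53244/Llama-3.1-Storm-8B",  # assumed repo ID
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is (2 + 2) * 3?"},
]
out = generator(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # assistant reply
```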
New Anthropic Research: "Inverse Scaling in Test-Time Compute." We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns.
59
168
1K
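Not from the paper itself, but a hedged sketch of how one could probe the effect described above: evaluate the same model at several reasoning-token budgets and check whether accuracy keeps improving. The `ask_model` helper is a hypothetical placeholder for your own inference call.

```python
# Hypothetical harness: run the same model at several reasoning budgets and
# see whether accuracy keeps rising or starts to drop (inverse scaling).
def ask_model(question: str, max_reasoning_tokens: int) -> str:
    raise NotImplementedError("plug in a model/API call that caps reasoning tokens")

def accuracy_at_budget(tasks, budget: int) -> float:
    """tasks: iterable of (question, expected_answer) string pairs."""
    tasks = list(tasks)
    correct = sum(
        expected.strip().lower() in ask_model(q, max_reasoning_tokens=budget).lower()
        for q, expected in tasks
    )
    return correct / len(tasks)

# Usage idea:
#   for budget in (256, 1024, 4096, 16384):
#       print(budget, accuracy_at_budget(eval_set, budget))
# Inverse scaling shows up as accuracy *falling* at the larger budgets.
```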
Excited to share that our LMUnit models with @ContextualAI just claimed the top spots on RewardBench2. How did we manage to rank +5% higher than models like Gemini, Claude 4, and GPT4.1? More in the details below: 1/11
8
25
150
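The core idea behind LMUnit, as described above, is to score responses against explicit natural-language unit tests rather than a single holistic rating. Below is a hedged sketch of that evaluation shape; `score_unit_test` is a hypothetical stand-in for an LMUnit-style scoring model, not Contextual AI's API.

```python
# Hedged sketch of "natural-language unit tests": each criterion is scored separately,
# then aggregated. The scoring function is a placeholder for a judge/reward model.
from statistics import mean

def score_unit_test(query: str, response: str, unit_test: str) -> float:
    """Return a score (e.g., 1-5) for how well `response` satisfies `unit_test`."""
    raise NotImplementedError("wire this to a judge or reward model of your choice")

def evaluate(query: str, response: str, unit_tests: list[str]) -> float:
    return mean(score_unit_test(query, response, t) for t in unit_tests)

unit_tests = [
    "Does the response answer the question that was actually asked?",
    "Are all numerical claims in the response correct?",
    "Is the response free of unsupported speculation?",
]
```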
Introducing T5Gemma: the next generation of encoder-decoder/T5 models! Decoder models adapted to be encoder-decoder. 32 models with different combinations. Available on Hugging Face and Kaggle: https://t.co/8eFH7yKger
22
138
778
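A sketch of what running one of these encoder-decoder checkpoints might look like with transformers. Both the checkpoint ID and the availability of T5Gemma support under AutoModelForSeq2SeqLM in your installed transformers version are assumptions; check the Hugging Face "google" organization for the actual released IDs.

```python
# Sketch: generating with a T5Gemma encoder-decoder checkpoint (IDs assumed, verify on HF Hub).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/t5gemma-2b-2b-ul2"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer(
    "Summarize: T5Gemma adapts decoder-only Gemma models into encoder-decoder models.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```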
New paper & dataset! NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions - Synthesizes 2.8M challenging and diverse questions which require multi-step reasoning, along with reference answers - Shows steeper data scaling curve for knowledge distillation
2
90
427
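A quick sketch of pulling a few examples from the dataset announced above. The repository ID and column names are assumptions inferred from the announcement; verify them on the dataset card before use.

```python
# Sketch: streaming a few NaturalReasoning examples (dataset ID and columns assumed).
from datasets import load_dataset

ds = load_dataset("facebook/natural_reasoning", split="train", streaming=True)
for i, row in enumerate(ds):
    print(row.get("question"))          # assumed column name
    print(row.get("reference_answer"))  # assumed column name
    if i == 2:
        break
```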
What if a [MASK] was all you needed? ModernBERT is great, but we couldn't stop wondering if it could be greater than previous encoders in different ways. Maybe we don't need task-specific heads? Maybe it can do all sorts of tasks with only its generative head? Spoiler: yes.
17
105
805
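To make the "no task-specific head" idea above concrete, here is a small sketch that does zero-shot sentiment classification with only ModernBERT's masked-LM head, via a cloze template. The template and verbalizer words ("great"/"terrible") are illustrative choices, not the authors' setup.

```python
# Sketch: classification with only the masked-LM head, no task-specific head.
# Compares the [MASK] probabilities of two verbalizer words under a cloze template.
from transformers import pipeline

fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")

review = "The battery died after two days and support never replied."
template = f"Review: {review} Overall, the product was [MASK]."

scores = {p["token_str"].strip(): p["score"]
          for p in fill(template, targets=["great", "terrible"])}
print(max(scores, key=scores.get), scores)  # expected: "terrible" scores higher
```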
Finished a run of R1-style GRPO on Qwen-2.5-0.5B (base model); it yields +10 accuracy points on GSM8K. Literally just works. The base model scores 41.6% as reported in the Qwen paper vs ~51% with GRPO.
41
110
1K
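A hedged sketch of what a GSM8K GRPO run like the one above could look like with the TRL library. The prompt format and exact-match reward below are simplified assumptions, not the poster's actual script, and hyperparameters are illustrative.

```python
# Rough GRPO sketch with TRL on Qwen2.5-0.5B and GSM8K (not the poster's exact setup).
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def build_prompt(question: str) -> str:
    return f"Question: {question}\nThink step by step, then give the final answer after '####'.\n"

train = load_dataset("openai/gsm8k", "main", split="train")
train = train.map(lambda x: {"prompt": build_prompt(x["question"]),
                             "answer": x["answer"].split("####")[-1].strip()})

def exact_match_reward(completions, answer, **kwargs):
    """1.0 if the completion's final '####' answer matches the reference, else 0.0."""
    rewards = []
    for completion, ref in zip(completions, answer):
        match = re.search(r"####\s*(-?[\d,\.]+)", completion)
        pred = match.group(1).replace(",", "") if match else None
        rewards.append(1.0 if pred == ref.replace(",", "") else 0.0)
    return rewards

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B",
    reward_funcs=exact_match_reward,
    args=GRPOConfig(output_dir="qwen0.5b-grpo", num_generations=8, max_completion_length=512),
    train_dataset=train,
)
trainer.train()
```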
Everyone is talking about RL these days. But are we done with SFT? The answer is NO. If we revive SFT in another form, it can even beat RL! Very happy to introduce Critique Fine-Tuning, a new form of SFT, which can more efficiently activate language models' reasoning
23
96
699
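Critique Fine-Tuning, as described above, trains the model to critique a candidate response rather than imitate a reference one. Here is a hedged sketch of what such a training example might look like; the field names and prompt wording are illustrative, not the paper's exact format.

```python
# Hypothetical CFT-style example: input = question + a (possibly noisy) candidate solution,
# target = a critique of that solution. Field names are illustrative.
def make_cft_example(question: str, candidate_solution: str, critique: str) -> dict:
    prompt = (
        f"Question:\n{question}\n\n"
        f"Candidate solution:\n{candidate_solution}\n\n"
        "Critique the candidate solution step by step and state whether it is correct."
    )
    return {"prompt": prompt, "completion": critique}

example = make_cft_example(
    question="What is 17 * 24?",
    candidate_solution="17 * 24 = 17 * 20 + 17 * 4 = 340 + 64 = 404.",
    critique="The decomposition is right, but 17 * 4 = 68, not 64, so the total is 340 + 68 = 408. The candidate answer is incorrect.",
)
```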
Here is Tülu 3 405B, our open-source post-training model that surpasses the performance of DeepSeek-V3! The last member of the Tülu 3 family demonstrates that our recipe, which includes Reinforcement Learning with Verifiable Rewards (RLVR), scales to 405B - with performance on
151
372
2K
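RLVR, mentioned above, rewards the policy only when an answer can be checked programmatically rather than by a learned reward model. Below is a minimal sketch of such a verifier for numeric answers; it is deliberately simplified and is not the Tülu 3 implementation.

```python
# Minimal verifiable-reward sketch: binary reward from answer matching, no reward model.
import re

def extract_final_answer(text: str):
    match = re.search(r"(?:final answer|answer)\s*[:=]?\s*(-?\d+(?:\.\d+)?)", text, re.IGNORECASE)
    return match.group(1) if match else None

def verifiable_reward(completion: str, reference: str) -> float:
    pred = extract_final_answer(completion)
    return 1.0 if pred is not None and float(pred) == float(reference) else 0.0

print(verifiable_reward("Some reasoning... Final answer: 42", "42"))  # 1.0
print(verifiable_reward("Final answer: 41", "42"))                    # 0.0
```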
I don't have too too much to add on top of this earlier post on V3 and I think it applies to R1 too (which is the more recent, thinking equivalent). I will say that Deep Learning has a legendary ravenous appetite for compute, like no other algorithm that has ever been developed
DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M). For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being
371
2K
14K
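A back-of-the-envelope check of the budget quoted above (2048 GPUs for roughly two months). The ~$2 per GPU-hour rental-equivalent rate is an assumption used only to show the figures are self-consistent.

```python
# Rough sanity check of the "2048 GPUs for 2 months, ~$6M" figure quoted above.
gpus = 2048
days = 60                     # "2 months"
price_per_gpu_hour = 2.0      # assumed rental-equivalent rate, USD

gpu_hours = gpus * days * 24  # ~2.95M GPU-hours
cost = gpu_hours * price_per_gpu_hour
print(f"{gpu_hours / 1e6:.2f}M GPU-hours, ~${cost / 1e6:.1f}M")  # ~2.95M GPU-hours, ~$5.9M
```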
Ever wondered if model merging works at scale? Maybe the benefits wear off for bigger models? Maybe you considered using model merging for post-training of your large model but weren't sure if it generalizes well? cc: @GoogleAI @GoogleDeepMind @uncnlp Excited to announce my
6
87
393
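In its simplest form, the model merging discussed above is parameter-space averaging of checkpoints that share an architecture. Here is a hedged sketch of uniform weight averaging in plain PyTorch; it illustrates the basic operation, not the specific merging method studied in that thread.

```python
# Simple parameter-space merge (uniform or weighted averaging) of checkpoints
# with identical architectures; assumes floating-point parameters.
import torch

def merge_state_dicts(state_dicts, weights=None):
    weights = weights or [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage (hypothetical paths):
#   sd_a = torch.load("finetune_math.pt")
#   sd_b = torch.load("finetune_code.pt")
#   model.load_state_dict(merge_state_dicts([sd_a, sd_b]))
```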
Finally got some time to chat about these new models. We started the Qwen2.5 project the moment we released Qwen2. Through this process we realized a lot of problems and mistakes that we had made. In terms of pretraining, we simply focused on leveling up the quality and
Welcome to the party of Qwen2.5 foundation models! This time, we have the biggest release ever in the history of Qwen. In brief, we have: Blog: https://t.co/lih1QNWCVv Blog (LLM): https://t.co/XZGw7hLoD0 Blog (Coder): https://t.co/3msdDONsqJ Blog (Math): https://t.co/shJdCOx2pL
25
33
325
Super proud of the team behind the Llama-3.1-Storm-8B model, which is now available on @monsterapis for fine-tuning and deployment in a single click! Why does it matter? This model is beating all the benchmarks through the roof! With their self-curation technique, it is helping the model
Today marks 1 month since the release of the Llama-3.1-Storm-8B model, with 52k+ collective downloads and model availability on @huggingface, @ollama, and @UnslothAI. We are starting a Discord group for Storm: https://t.co/p2BL0QpJV1 From today onwards, the model is also
0
2
3
Today is the day! After 1.5 years of hard work, "Build A Large Language Model (From Scratch)" is finally out! Print or ebook copies are available on Manning's site: https://t.co/bOi8w6b57u. And it's also available on Amazon soon: https://t.co/fBR1J3otxG.
89
334
2K
This is really cool from Google. On demand podcasts about your favourite papers and books.
36
142
2K
A 20% boost on a metric is rare, especially when it's code generation. PlanSearch, our new search method based on diverse plans, outperforms baselines by huge margins. It's not just a search method, but also a philosophy. How are these numbers achieved? Can they be predicted?
7
25
227
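As the post above describes, PlanSearch searches over diverse natural-language plans before writing code. Here is a hedged pseudo-sketch of that plan-first loop with hypothetical `llm` and `run_tests` helpers; it is an illustration of the idea, not the authors' implementation.

```python
# Hypothetical plan-first search loop for code generation:
# sample diverse natural-language plans, generate code per plan, keep what passes the tests.
def llm(prompt: str, temperature: float = 0.8) -> str:
    raise NotImplementedError("placeholder for your model/API call")

def run_tests(code: str, tests: list[str]) -> bool:
    raise NotImplementedError("placeholder for a sandboxed test runner")

def plan_search(problem: str, tests: list[str], n_plans: int = 8):
    plans = [llm(f"Sketch a distinct high-level plan to solve:\n{problem}") for _ in range(n_plans)]
    for plan in plans:
        code = llm(f"Problem:\n{problem}\n\nFollow this plan and write Python code:\n{plan}",
                   temperature=0.2)
        if run_tests(code, tests):
            return plan, code
    return None
```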
Andrej Karpathy says that as we expand our brains into an exocortex on a computing substrate, we will be renting our brains, and open source will become more important, because if it's not your weights, it's not your brain.
67
197
1K