Ashvini Jindal

@akjindal53244

Followers: 524 · Following: 940 · Media: 14 · Statuses: 215

LLM @LinkedIn, Llama-3.1-Storm-8B, Winner 🏆 of NeurIPS LLM Efficiency Challenge, Creator of Arithmo-Mistral-7B Mathematical Reasoning LLM

San Francisco, CA
Joined July 2012
@akjindal53244
Ashvini Jindal
1 year
🚀 Llama-3.1-Storm-8B has arrived! A new 8B parameter LLM that outperforms @Meta Llama-3.1-8B-Instruct and Hermes-3-Llama-3.1-8B across diverse benchmarks! Our new 8B LLM pushes the boundaries of what's possible with smaller language models.
15
55
249
@aryopg
Aryo Pradipta Gema
4 months
New Anthropic Research: "Inverse Scaling in Test-Time Compute" We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns. 🧵
59
168
1K
@w33lliam
William Berrios
5 months
Excited to share 🤯 that our LMUnit models with @ContextualAI just claimed the top spots on RewardBench2 🥇 How did we manage to rank +5% higher than models like Gemini, Claude 4, and GPT-4.1? More details below: 🧵 1/11
8
25
150
@osanseviero
Omar Sanseviero
4 months
Introducing T5Gemma: the next generation of encoder-decoder/T5 models! 🔧 Decoder models adapted to be encoder-decoder 🔥 32 models with different combinations 🤗 Available on Hugging Face and Kaggle https://t.co/8eFH7yKger
22
138
778
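For context on the release above: T5Gemma checkpoints are encoder-decoder models, so they would load through the seq2seq interface in transformers rather than the causal-LM one. The sketch below is a minimal illustration; the checkpoint id is a placeholder assumption, not taken from the announcement.

```python
# Minimal sketch: loading an encoder-decoder (T5-style) checkpoint via transformers.
# The model id below is a placeholder assumption; substitute the actual T5Gemma id.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/t5gemma-placeholder"  # hypothetical id for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer(
    "Summarize: encoder-decoder models pair a bidirectional encoder "
    "with an autoregressive decoder.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```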
@jaseweston
Jason Weston
9 months
🚨 New paper & dataset! 🚨 NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions - Synthesizes 2.8M challenging and diverse questions which require multi-step reasoning, along with reference answers - Shows steeper data scaling curve for knowledge distillation
2
90
427
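A dataset like the one announced above would typically be consumed through the Hugging Face datasets library; the sketch below streams a few examples. The dataset id and field layout are assumptions for illustration, not confirmed from the tweet.

```python
# Minimal sketch: streaming a large question/answer dataset with Hugging Face `datasets`.
# The dataset id is a placeholder assumption; swap in the real NaturalReasoning id.
from datasets import load_dataset

ds = load_dataset("facebook/natural_reasoning", split="train", streaming=True)  # assumed id

for i, example in enumerate(ds):
    print(example)  # expect fields such as a question and a reference answer
    if i == 2:
        break
```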
@bclavie
Ben Claviรฉ
9 months
What if a [MASK] was all you needed? ModernBERT is great, but we couldn't stop wondering if it could be greater than previous encoders in different ways. Maybe we don't need task-specific heads? Maybe it can do all sorts of tasks with only its generative head? Spoilers: Yes
17
105
805
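The question raised above, whether an encoder's masked-LM head can stand in for task-specific heads, can be illustrated with a plain fill-mask call where the task is phrased as a cloze prompt. The checkpoint id and prompt template below are assumptions for illustration.

```python
# Minimal sketch: doing classification through the masked-LM head instead of a task head.
# Checkpoint id and prompt wording are assumptions for illustration only.
from transformers import pipeline

fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")  # assumed encoder checkpoint

review = "The battery died after two days."
prompt = f"Review: {review} Sentiment: this review is [MASK]."

# Inspect the top predictions for the masked slot; "negative"-like tokens should rank high.
for candidate in fill(prompt, top_k=5):
    print(candidate["token_str"], round(candidate["score"], 4))
```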
@abacaj
anton
10 months
Finished a run of (R1-style) GRPO on Qwen-2.5-0.5B (base model); it yields +10 accuracy points on GSM8K. Literally just works. The base model scores 41.6% as reported in the Qwen paper vs ~51% with GRPO.
41
110
1K
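The run described above relies on GRPO's group-relative advantage: several completions are sampled per prompt, each gets a reward (e.g., whether the final GSM8K answer matches), and rewards are normalized within the group. The sketch below illustrates only that normalization step under those assumptions; it is not the exact training recipe from the tweet.

```python
# Minimal sketch of GRPO-style group-relative advantages (illustrative, not the exact recipe).
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) rewards for sampled completions per prompt."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    # Advantage of each completion relative to the other completions of the same prompt.
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each, binary correctness rewards.
rewards = torch.tensor([[1.0, 0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(group_relative_advantages(rewards))
```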
@WenhuChen
Wenhu Chen
10 months
Everyone is talking about RL these days. But are we done with SFT? The answer is NO. If we revive SFT in another form, it can even beat RL! Very happy to introduce Critique Fine-Tuning, a new form of SFT, which can more efficiently activate language models' reasoning
23
96
699
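Critique Fine-Tuning, as described above, supervises the model to critique a candidate response instead of imitating a reference answer. A minimal sketch of assembling one such training pair follows; the field names and prompt template are illustrative assumptions, not the paper's exact format.

```python
# Minimal sketch: turning (question, noisy response, critique) triples into SFT text pairs.
# Field names and prompt template are illustrative assumptions, not the paper's exact format.
def build_cft_example(question: str, candidate_response: str, critique: str) -> dict:
    prompt = (
        f"Question:\n{question}\n\n"
        f"Candidate response:\n{candidate_response}\n\n"
        "Critique the candidate response, pointing out any errors:"
    )
    return {"prompt": prompt, "target": critique}

example = build_cft_example(
    question="What is 17 * 24?",
    candidate_response="17 * 24 = 398",
    critique="Incorrect: 17 * 20 = 340 and 17 * 4 = 68, so 17 * 24 = 408.",
)
print(example["prompt"])
```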
@allen_ai
Ai2
10 months
Here is Tülu 3 405B 🐫 our open-source post-training model that surpasses the performance of DeepSeek-V3! The last member of the Tülu 3 family demonstrates that our recipe, which includes Reinforcement Learning from Verifiable Rewards (RLVR), scales to 405B - with performance on
151
372
2K
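RLVR, mentioned in the Tülu 3 recipe above, assigns reward only when a completion can be checked mechanically against a reference. The sketch below shows one such verifiable-reward function; the answer-extraction rule is an assumption for illustration, not Ai2's actual verifier.

```python
# Minimal sketch of a verifiable reward (RLVR-style): reward 1.0 only when the extracted
# final answer matches the reference. The extraction regex is an illustrative assumption.
import re

def verifiable_reward(completion: str, reference: str) -> float:
    match = re.search(r"answer is\s*([-\d.,/]+)", completion.lower())
    if match is None:
        return 0.0
    return float(match.group(1).strip().rstrip(".") == reference.strip())

print(verifiable_reward("Step by step... so the answer is 42.", "42"))  # 1.0
print(verifiable_reward("I think it's 41.", "42"))                      # 0.0
```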
@karpathy
Andrej Karpathy
10 months
I don't have too too much to add on top of this earlier post on V3 and I think it applies to R1 too (which is the more recent, thinking equivalent). I will say that Deep Learning has a legendary ravenous appetite for compute, like no other algorithm that has ever been developed
@karpathy
Andrej Karpathy
11 months
DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M). For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being
371
2K
14K
@prateeky2806
Prateek Yadav
1 year
Ever wondered if model merging works at scale? Maybe the benefits wear off for bigger models? Maybe you considered using model merging for post-training of your large model but weren't sure if it generalizes well? cc: @GoogleAI @GoogleDeepMind @uncnlp 🧵👇 Excited to announce my
6
87
393
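Model merging, the subject of the thread above, in its simplest form is parameter-space averaging of checkpoints that share an architecture. The sketch below shows uniform averaging under that assumption; checkpoint paths are placeholders, and published recipes (task arithmetic, TIES, etc.) are more involved.

```python
# Minimal sketch: uniform parameter averaging of two checkpoints with identical architectures.
# Paths are placeholders for illustration.
import torch

def average_state_dicts(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    merged = {}
    for name, tensor_a in sd_a.items():
        tensor_b = sd_b[name]
        merged[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
    return merged

sd_a = torch.load("checkpoint_a.pt", map_location="cpu")  # placeholder path
sd_b = torch.load("checkpoint_b.pt", map_location="cpu")  # placeholder path
torch.save(average_state_dicts(sd_a, sd_b), "merged.pt")
```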
@JustinLin610
Junyang Lin
1 year
Finally got some time to chat about these new models. We started the project of Qwen2.5 at the moment we released Qwen2. Through this process we did realize a lot of problems and mistakes that we made. In terms of pretraining, we simply focus on leveling up the quality and
@Alibaba_Qwen
Qwen
1 year
Welcome to the party of Qwen2.5 foundation models! This time, we have the biggest release ever in the history of Qwen. In brief, we have: Blog: https://t.co/lih1QNWCVv Blog (LLM): https://t.co/XZGw7hLoD0 Blog (Coder): https://t.co/3msdDONsqJ Blog (Math): https://t.co/shJdCOx2pL
25
33
325
@Gaurav_vij137
Gaurav Vij
1 year
Super proud of the team behind the Llama-3.1-Storm-8B model, and it's now available on @monsterapis for fine-tuning and deployment in a single click! Why does it matter? This model blows the benchmarks through the roof! With their self-curation technique, it is helping the model
0
2
3
@akjindal53244
Ashvini Jindal
1 year
⭐ Today marks 1 month since the release of the Llama-3.1-Storm-8B model, with 52k+ collective downloads and availability on @huggingface 🤗, @ollama, and @UnslothAI. We are starting a Discord group for Storm: https://t.co/p2BL0QpJV1 🚀 From today onwards, the model is also
0
2
9
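Given the Hugging Face availability mentioned above, loading the model follows the standard transformers causal-LM path. The sketch below assumes the model id matches the tweet's naming; verify it against the actual Hugging Face listing.

```python
# Minimal sketch: loading Llama-3.1-Storm-8B with transformers.
# The model id is assumed from the tweet; verify against the Hugging Face listing.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "akjindal53244/Llama-3.1-Storm-8B"  # assumed id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Give one sentence on why 8B models are useful."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```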
@rasbt
Sebastian Raschka
1 year
Today is the day! 🎉 After 1.5 years of hard work, "Build A Large Language Model (From Scratch)" is finally out! Print or ebook copies are available on Manning's site: https://t.co/bOi8w6b57u 📚. And it will also be available on Amazon soon: https://t.co/fBR1J3otxG 📦.
89
334
2K
@krishnanrohit
rohit
1 year
This is really cool from Google. On-demand podcasts about your favourite papers and books.
36
142
2K
@evanzwangg
Evan Wang
1 year
A 20% boost on a metric is rare, especially when it's code generation 🥱 PlanSearch, our new search method based on diverse plans, outperforms baselines by huge margins. It's not just a search method, but also a philosophy. How are these numbers achieved? Can they be predicted?
7
25
227
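The tweet above frames PlanSearch as searching over diverse natural-language plans before writing any code. The sketch below illustrates only that control flow, with all model and judging calls stubbed out as assumed callables; it is not the authors' implementation.

```python
# Minimal sketch of search-over-plans for code generation (illustration, not the authors' code).
# `generate_plans`, `plan_to_code`, and `passes_tests` are stand-ins for model / judge calls.
from typing import Callable, List

def plan_search(problem: str,
                generate_plans: Callable[[str, int], List[str]],
                plan_to_code: Callable[[str, str], str],
                passes_tests: Callable[[str], int],
                num_plans: int = 8) -> str:
    plans = generate_plans(problem, num_plans)            # diverse high-level strategies
    candidates = [plan_to_code(problem, p) for p in plans]
    return max(candidates, key=passes_tests)              # keep the code passing the most tests
```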
@tsarnick
Tsarathustra
1 year
Andrej Karpathy says as we expand our brains into an exocortex on a computing substrate, we will be renting our brains, and open source will become more important because otherwise it's "not your weights, not your brain".
67
197
1K