Shashank Rajput
@shashank_r12
LLM Research @Meta · Joined October 2013
Followers: 856 · Following: 9K · Media: 12 · Statuses: 222
My student @AngelikiGiannou is on the job market and she's the one that wrote the OG Looped Transformer paper. I strongly encourage you to read it, it's a tour de force! While I won't claim it influenced work on test-time compute, it 100% anticipated directions the community is…
Can transformers follow instructions? We explore this in: "Looped Transformers as Programmable Computers" https://t.co/MAVjjOTDT1 led by Angeliki (@AngelikiGiannou) and Shashank (@shashank_r12) in collaboration with @Kangwook_Lee and @jasondeanlee Here is a 🧵
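Since the thread is about transformers executing programs, the looping idea is easy to sketch: one weight-tied block applied repeatedly, so iteration (rather than stacked distinct layers) supplies the compute. A minimal PyTorch toy with made-up dimensions, illustrative only and not the paper's actual construction:

```python
# Minimal sketch of a "looped transformer": the same weight-tied block
# is applied n_loops times, and the sequence acts as a scratchpad whose
# state is updated on every pass. Toy sizes; not the paper's construction.
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_loops=8):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.n_loops = n_loops  # iterations play the role of program steps

    def forward(self, x):
        for _ in range(self.n_loops):
            x = self.block(x)  # same weights every pass
        return x

h = LoopedTransformer()(torch.randn(1, 16, 64))  # (batch, seq, d_model)
```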
@jefrankle frantically asks us not to whisper the words that won him this very well deserved award!!! But here's one for you - "Lottery Tickets"!!!
1/ Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today @datologyai shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens 🧑🏼‍🍳 - 3B LLMs beat 8B models🚀 - Pareto frontier for performance…
🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference! Core components of NSA: • Dynamic hierarchical sparse strategy • Coarse-grained token compression • Fine-grained token selection 💡 With…
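As a rough illustration of two of the components named above, here is a toy single-head version: coarse scoring against block summaries picks the top blocks per query, then full attention runs only over tokens inside those blocks. Shapes and the mean-pooled summaries are assumptions made for clarity, not the paper's hardware-aligned kernel design:

```python
# Toy coarse-to-fine sparse attention: block summaries (coarse) route
# each query to a few blocks, then exact attention runs on just those
# tokens (fine). Single head, no batching, for readability.
import torch
import torch.nn.functional as F

def sparse_attn(q, k, v, block=16, topk=2):
    # q, k, v: (T, d); T divisible by block.
    T, d = k.shape
    kb = k.view(T // block, block, d).mean(1)   # coarse block summaries
    block_scores = q @ kb.T / d ** 0.5          # (T, n_blocks)
    sel = block_scores.topk(topk, dim=-1).indices
    out = torch.zeros_like(q)
    for i in range(T):
        # fine: attend only to tokens inside this query's top blocks
        idx = torch.cat([torch.arange(b * block, (b + 1) * block)
                         for b in sel[i].tolist()])
        w = F.softmax(q[i] @ k[idx].T / d ** 0.5, dim=-1)
        out[i] = w @ v[idx]
    return out

out = sparse_attn(torch.randn(64, 32), torch.randn(64, 32), torch.randn(64, 32))
```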
We are announcing Open Thoughts, our large-scale open-source effort to curate the best open reasoning datasets! DeepSeek-R1 is amazing but we still don't have access to high-quality open reasoning datasets. These datasets are crucial if you want to build your own reasoning models!
Introducing Bespoke-Stratos-32B, our reasoning model distilled from DeepSeek-R1 using Berkeley NovaSky's Sky-T1 recipe. The model outperforms Sky-T1 and o1-preview in reasoning (Math and Code) benchmarks and almost reaches the performance of DeepSeek-R1-Distill-Qwen-32B while…
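The recipe described here boils down to sampling reasoning traces from a teacher, keeping the ones with correct answers, and fine-tuning a student on the survivors. A hedged sketch of that loop; `call_teacher` and `check_answer` are hypothetical stand-ins, not the actual Sky-T1 pipeline:

```python
# Sketch of reasoning-trace distillation: generate traces from a teacher,
# rejection-sample for correctness, then use the pairs for supervised
# fine-tuning of the student. Stubs stand in for real model calls.
def call_teacher(problem: str) -> str:
    """Stub: would query the teacher (e.g. DeepSeek-R1) for a trace."""
    return "<think>...</think> answer"

def check_answer(trace: str, gold: str) -> bool:
    """Stub: would verify the final answer (math checker, unit tests)."""
    return gold in trace

def build_sft_data(problems):
    data = []
    for p in problems:
        trace = call_teacher(p["question"])
        if check_answer(trace, p["answer"]):       # rejection sampling
            data.append({"prompt": p["question"], "completion": trace})
    return data  # the student is then fine-tuned on these pairs

print(build_sft_data([{"question": "1+1?", "answer": "answer"}]))
```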
We are happy to announce Curator, an open-source library designed to streamline synthetic data generation! High-quality synthetic data generation is essential for training and evaluating LLMs/agents/RAG pipelines these days, but tooling around this is still entirely lacking! So…
Nice to see my previous work that I led at Google DeepMind covered by VentureBeat (in light of new work from Meta). Context: We had introduced the novel idea of Generative Retrieval for recommender systems to the world in our NeurIPS 2023 paper called TIGER (Transformer…
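The core move in generative retrieval is to represent each item as a short discrete code (a "semantic ID", which TIGER learns with an RQ-VAE) so a seq2seq model can generate the next item's code rather than score an entire catalog. A toy residual-quantization sketch; the random codebooks here are placeholders for the learned ones:

```python
# Toy semantic-ID construction via residual quantization: each level
# picks the nearest codeword and quantizes what's left over, yielding
# a short code sequence per item. Random codebooks, illustration only.
import torch

def semantic_id(emb, codebooks):
    # emb: (d,); codebooks: list of (K, d) tensors, one per code level.
    code, residual = [], emb
    for cb in codebooks:
        j = (residual - cb).pow(2).sum(-1).argmin()  # nearest codeword
        code.append(int(j))
        residual = residual - cb[j]                  # quantize the residual
    return code  # e.g. [4, 1, 7]: the item's discrete "semantic ID"

torch.manual_seed(0)
books = [torch.randn(8, 16) for _ in range(3)]       # 3 levels, 8 codes each
print(semantic_id(torch.randn(16), books))
```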
It's finally here! Excited to share the project I led with KRAFTON and NVIDIA. The future of gaming is here 🙌
Transform solo gameplay into a seamless team experience with PUBG Ally. KRAFTON & NVIDIA have teamed up to create the world’s first Co-Playable Character (CPC), built with NVIDIA ACE → https://t.co/JLcLQ8crrD
Databricks research scientist @shashank_r12 shares approaches in LLMs: - How RAG enhances accuracy - Evolution of attention mechanisms - Practical applications & trade-offs of Mamba architectures
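For the first bullet, a minimal sketch of how RAG enhances accuracy: retrieve the most relevant passages by embedding similarity and prepend them to the prompt so the model can ground its answer. `embed` is a hypothetical stub standing in for a real embedding model:

```python
# Minimal RAG sketch: score documents against the query with dot-product
# similarity over embeddings, take the top-k, and build a grounded prompt.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stub: would call a real embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(64)

def retrieve(query, docs, k=2):
    q = embed(query)
    sims = [(float(q @ embed(d)), d) for d in docs]   # similarity score
    return [d for _, d in sorted(sims, reverse=True)[:k]]

docs = ["Mamba is a state-space model.", "RAG adds retrieved context."]
context = "\n".join(retrieve("What is Mamba?", docs))
prompt = f"Context:\n{context}\n\nQuestion: What is Mamba?"
```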
I have three Ph.D. student openings in my research group at @RutgersCS starting in Fall 2025. If you are interested in working with me on efficient algorithms and systems for LLMs, foundation models, and AI4Science, please apply at: https://t.co/nFxw6cP4Oh The deadline is…
🧵 Super proud to finally share this work I led last quarter - the @Databricks Domain Intelligence Benchmark Suite (DIBS)! TL;DR: Academic benchmarks ≠ real performance and domain intelligence > general capabilities for enterprise tasks. 1/3
i'm somewhat confident that both the following properties will hold for language models in 2027: 1. tokenization will be gone, replaced with byte-level ingestion 2. all tokens that don't need to be read or written by a human will be continuous vectors luckily two interesting…
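The first prediction is easy to make concrete: with byte-level ingestion the "tokenizer" is just UTF-8, a fixed vocabulary of 256 ids with no merge rules or vocab file. A two-line sketch:

```python
# Byte-level "tokenization": the ids are just the UTF-8 bytes, vocab size 256.
ids = list("tokenization".encode("utf-8"))
print(ids)                          # [116, 111, 107, ...] - no merges needed
text = bytes(ids).decode("utf-8")   # lossless round trip back to the string
```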
At NeurIPS early? Like making GPUs go brrr? Join me at a luncheon tomorrow on LLM Scaling x Efficiency, 5 mins from the conference center... Note, folks need to have directly relevant work if not in the field. DM me for more info or for recs! Per the usual, I'll be doing 3…
I'll be at NeurIPS and would love to chat about anything AI. Also, visit the Databricks booth to check out some of the work we've been doing! https://t.co/8a36EwaWZj
databricks.com
Databricks is a platinum sponsor of NeurIPS 2024, held Dec 10-15 in Vancouver. Visit booth #591 (Dec 10-12) for demos on observability and GenAI tools like MLflow and Mosaic AI. Talks include Matei...
Introducing Llama 3.3 – a new 70B model that delivers the performance of our 405B model but is easier & more cost-efficient to run. By leveraging the latest advancements in post-training techniques including online preference optimization, this model improves core performance at…
🤔 How can we achieve GPT-3 175B-level performance with only 1.3B parameters? 🌟 New from #NVIDIAResearch: HYMBA (HYbrid Multi-head Bi-Attention) combines MLP and attention mechanisms to dramatically boost small language model capabilities. HYMBA could revolutionize NLP