
Nishanth Dikkala
@NishanthDikkala
Followers: 395 · Following: 2K · Media: 5 · Statuses: 136
Research Scientist @ Google Research, Ph.D. Computer Science, MIT.
Mountain View, CA
Joined July 2013
RT @GoogleDeepMind: We’re fully releasing Gemma 3n, which brings powerful multimodal AI capabilities to edge devices 🛠️. Here’s a snapshot…
0 · 449 · 0
Presenting this work @ ICLR tomorrow! Come talk to us about looped transformers and their inductive bias for reasoning tasks. Poster #272: Hall 3 + 2B.
*New ICLR paper* – We introduce a paradigm of *looped models for reasoning*. Main claims:
- Reasoning requires depth (via looping), not necessarily params.
- LLM reasoning predictably scales with more loops.
- Looped models generate “latent thoughts” & can simulate CoT reasoning.
1/n
1 · 2 · 8
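The core claim invites a short sketch: a single weight-tied transformer block applied in a loop gives depth that scales with loop count while the parameter count stays fixed. A minimal PyTorch illustration, with module choices and sizes that are illustrative rather than the paper's actual architecture:

```python
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    """A weight-tied block applied `n_loops` times: depth without extra params."""

    def __init__(self, d_model=256, n_heads=4, n_loops=8):
        super().__init__()
        # One shared block; looping reuses the same weights at every "layer".
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.n_loops = n_loops

    def forward(self, x):
        # Each pass refines the hidden state; these intermediate states are
        # the "latent thoughts" the thread compares to chain-of-thought steps.
        for _ in range(self.n_loops):
            x = self.block(x)
        return x

model = LoopedTransformer(n_loops=12)
tokens = torch.randn(2, 16, 256)  # (batch, seq, d_model)
print(model(tokens).shape)        # torch.Size([2, 16, 256])
```

Scaling `n_loops` is then the knob the thread describes: more loops give more effective reasoning depth at the same parameter count.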
RT @kazemi_sm: Check out our latest Gemma models, you'll love them :) Also check out the results on our BIG-Bench Extra Hard benchmark: https…
0 · 4 · 0
💡🚨 Check out our ICLR 2025 paper on the inductive bias looping offers for reasoning tasks!
*New ICLR paper* – We introduce a paradigm of *looped models for reasoning*. Main claims:
- Reasoning requires depth (via looping), not necessarily params.
- LLM reasoning predictably scales with more loops.
- Looped models generate “latent thoughts” & can simulate CoT reasoning.
1/n
0 · 0 · 6
As LLMs get stronger, we need harder benchmarks to continue tracking progress. Check out our new work in this direction.
Is BIG-Bench Hard too easy for your LLM?
We just unleashed BIG-Bench EXTRA Hard (BBEH)! 😈
Every task, harder! Every model, humbled! (Poem credit: Gemini 2.0 Flash)
Massive headroom for progress across various areas in general reasoning 🤯
0 · 2 · 8
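For readers unfamiliar with how such benchmarks are typically scored, here is a minimal exact-match harness. The {"input", "target"} example format follows the BIG-Bench Hard convention and is an assumption here, not BBEH's confirmed schema:

```python
def exact_match_accuracy(examples, model_fn):
    """Score a model on benchmark examples with strict exact-match accuracy.

    `examples` follows a BBH-style convention (assumed, not BBEH's
    confirmed schema): a list of {"input": prompt, "target": answer}.
    `model_fn` is any callable mapping a prompt string to an answer string.
    """
    correct = sum(
        model_fn(ex["input"]).strip() == ex["target"].strip()
        for ex in examples
    )
    return correct / len(examples)

# Usage with a stand-in "model" that always answers "7":
examples = [
    {"input": "What is 3 + 4?", "target": "7"},
    {"input": "What is 5 + 5?", "target": "10"},
]
print(exact_match_accuracy(examples, lambda prompt: "7"))  # 0.5
```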
RT @james_s_bedford: Found out students are using this website to have an AI-generated LeBron James summarise their study material. You can…
0 · 27 · 0
RT @jdeposicion: After apparently the entirety of the Indian population, I've come to also learn that:
* Y'all are wild with names
* Y'all…
0 · 46 · 0
RT @kazemi_sm: [1] ReMI: A Dataset for Reasoning with Multiple Images. Work done with @NishanthDikkala, @ankit_s_anand, @hardy_qr, @BahareF…
0 · 2 · 0
Check out our new work showing that causal language modeling alone is sufficient for a Transformer model to learn to solve Sudokus and other constraint satisfaction problems like Zebra puzzles! Led by @shahkulin98, to appear at @NeurIPSConf 2024!
📢 Excited to announce our new paper (accepted at @NeurIPSConf) showing that causal language modeling alone can teach a 50M-parameter transformer model to solve Sudoku and Zebra puzzles. Paper: A thread 🧵
0 · 1 · 12
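A toy sketch of what "causal language modeling alone" means in this setting: the puzzle's givens and a solver's fill-in order are flattened into one token stream, and the model is trained with the ordinary next-token objective on that stream. The serialization below is hypothetical, not the paper's exact format:

```python
def serialize_sudoku(givens, solution_steps):
    """Flatten a puzzle into one token stream for causal LM training.

    givens: dict {(row, col): digit}; solution_steps: list of (row, col, digit).
    Emits cell-by-cell tokens like "r3 c5 7": givens first, then the solver's
    fill-in order, so the LM learns to predict each next move.
    (Hypothetical schema for illustration, not the paper's actual one.)
    """
    tokens = ["<puzzle>"]
    for (r, c), d in sorted(givens.items()):
        tokens += [f"r{r}", f"c{c}", str(d)]
    tokens.append("<solve>")
    for r, c, d in solution_steps:
        tokens += [f"r{r}", f"c{c}", str(d)]
    tokens.append("<eos>")
    return tokens

seq = serialize_sudoku({(0, 0): 5, (1, 3): 2}, [(0, 1, 3), (2, 2, 8)])
print(" ".join(seq))
# <puzzle> r0 c0 5 r1 c3 2 <solve> r0 c1 3 r2 c2 8 <eos>
```

Under this schema, inference would prompt with the givens plus `<solve>` and decode the remaining cells one token at a time.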
RT @kiranvodrahalli: Happy to share Michelangelo, a long-context reasoning benchmark which measures performance b…
0 · 56 · 0
RT @francoisfleuret: Funny thing is that I am convinced intelligence is distilling system 1 into system 2.
0 · 1 · 0
RT @levelsio: ✨ I made a new site called. 🧳 💨 It's a live ranking of airlines by how much luggage they are losing…
0 · 813 · 0
Check out our new multi-image reasoning benchmark where a model needs to reason using information spread across text and multiple images! (An interesting insight we discover: even the mightiest models struggle to tell time!)
🚨 Benchmark Alert: Multi-Image Reasoning. While newer LLMs can reason across multiple, potentially disparate, information sources, their effectiveness remains uncertain. We introduce ReMI, a benchmark dedicated to measuring reasoning with multiple images interleaved with text.
0 · 0 · 2
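To make "multiple images interleaved with text" concrete, here is a hypothetical example in the spirit of the benchmark; the field names and file names are invented, not ReMI's actual schema:

```python
# Hypothetical interleaved text-and-image example (invented schema, for
# illustration only): string parts are text, dict parts reference images.
example = {
    "question": [
        "What time does the clock in ",
        {"image": "clock_a.png"},
        " show, and how much later is it than the clock in ",
        {"image": "clock_b.png"},
        "?",
    ],
    "answer": "3:15; 45 minutes later",
}

# A model must fuse information across both images and the text to answer;
# per the tweet above, even strong models often misread analog clocks.
for part in example["question"]:
    print(part if isinstance(part, str) else f"[IMAGE: {part['image']}]")
```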
Check out the blog post on our work on efficiently scaling up the embedding dimension of Transformer models! (NeurIPS 2023 Spotlight paper.) (Joint work with Cenk Baykal, Dylan Cutler, Nikhil Ghosh, @rinapy and Xin Wang.)
Introducing AltUp, a method that takes advantage of increasing scale in Transformer networks without increasing the computation cost. It’s easy to implement, widely applicable to Transformer architectures, and requires minimal hyperparameter tuning. More →
0 · 1 · 12
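A simplified sketch of the AltUp ("Alternating Updates") idea as I read it: keep the representation K sub-blocks wide, run the expensive transformer block on only one sub-block per layer, and update the rest with cheap learned predict and correct steps. The specifics below (scalar mixing matrix, per-block gates) are a simplification, not the exact published method:

```python
import torch
import torch.nn as nn

class AltUpLayer(nn.Module):
    """Simplified sketch of an Alternating Updates (AltUp) layer.

    The representation is K sub-blocks of width d_model. Each layer runs
    the expensive transformer block on ONE sub-block and updates the rest
    with cheap learned scalar mixing (predict) and a gated broadcast of
    the activated block's residual (correct).
    """

    def __init__(self, d_model=256, k=2):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        # Learned K x K mixing coefficients for the predict step.
        self.predict = nn.Parameter(torch.eye(k) + 0.01 * torch.randn(k, k))
        # Per-sub-block gates for the correct step.
        self.gate = nn.Parameter(torch.ones(k))

    def forward(self, xs, active=0):
        # xs: list of K tensors, each (batch, seq, d_model).
        stacked = torch.stack(xs)                             # (K, B, S, D)
        # Predict: cheap linear combination across sub-blocks.
        pred = torch.einsum("ij,jbsd->ibsd", self.predict, stacked)
        # Activate: run the full transformer block on one sub-block only.
        delta = self.block(pred[active]) - pred[active]
        # Correct: broadcast the activated block's update to every sub-block.
        corrected = pred + self.gate.view(-1, 1, 1, 1) * delta
        return list(corrected.unbind(0))

layer = AltUpLayer()
xs = [torch.randn(2, 16, 256) for _ in range(2)]  # 2x wider state, same compute
print(layer(xs, active=0)[0].shape)               # torch.Size([2, 16, 256])
```

The compute stays roughly that of the narrow model, since only one d_model-wide sub-block passes through attention and the MLP at each layer.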