Kalpesh Krishna
@kalpeshk2011
Followers: 3K
Following: 7K
Media: 39
Statuses: 1K
Staff Research Scientist @GoogleDeepMind (Gemini team), PhD from @umass_nlp. Former undergrad @IITBombay and intern @GoogleDeepMind, @Allen_AI, @TTIC_Connect.
New York, USA
Joined May 2013
Check out our new @GoogleAI paper: we curate a mixture of 5M human judgments to train general-purpose foundational autoraters. Strong LLM-as-judge scores on RewardBench (87.8%), and highest perf among baselines on LLMAggreFact + 6 other benchmarks! 📰 https://t.co/oUN4hDeNWx 👇
arxiv.org
As large language models (LLMs) advance, it becomes more challenging to reliably evaluate their output due to the high costs of human evaluation. To make progress towards better LLM autoraters, we...
🚨 New @GoogleDeepMind paper 🚨 We trained Foundational Large Autorater Models (FLAMe) on extensive human evaluations, achieving the best RewardBench perf. among generative models trained solely on permissive data, surpassing both GPT-4 & 4o. 📰: https://t.co/FIPFiHwXyt 🧵:👇
6
21
124
Our TPUs are headed to space! Inspired by our history of moonshots, from quantum computing to autonomous driving, Project Suncatcher is exploring how we could one day build scalable ML compute systems in space, harnessing more of the sun’s power (which emits more power than 100
830
2K
17K
Just posted Q3 earnings. We delivered our first-ever $100B quarter driven by double-digit growth across every major part of our business. (Five years ago, our quarterly revenue was at $50B🚀) Our full-stack approach to AI is driving real momentum and we’re shipping at speed.
617
1K
15K
It's 4:43am PT on a Thursday, and it's launch time on @GeminiApp! So excited and thankful for our partnership with Reliance - we've just started rolling out FREE Google AI Pro plans across India for eligible Jio customers! (total value ₹35,100) Grab it in the MyJio app or
178
246
3K
A thread of videos from today’s flight into Hurricane Melissa In this first one we are entering from the southeast just after sunrise and the bright arc on the far northwest eye wall is the light just beginning to make it over the top from behind us.
356
6K
24K
Third pass through Melissa. GoPro in side window as different camera looking forward shooting in ultra high res 8k. Not sure when that might get processed as the file turned out ridiculous. Barely had HD space for it and MacBook Pro promptly choked when I tried to edit it
91
2K
10K
Let’s set the record straight: President Trump’s updated H-1B visa requirement applies only to new, prospective petitions that have not yet been filed. Petitions submitted prior to September 21, 2025 are not affected. Any reports claiming otherwise are flat-out wrong and should
3K
2K
9K
An advanced version of Gemini 2.5 Deep Think has achieved gold-medal level performance at the ICPC 2025 - one of the world’s most prestigious programming contests. 🏅 Building on the model's success in math at the IMO, this marks another historic milestone for advanced AI. 🧵
123
346
2K
🚨🍌Breaking News: Gemini-2.5-Flash-Image-Preview (“nano-banana”) by @GoogleDeepMind now ranks #1 in Image Edit Arena. In just two weeks: 🟡“nano-banana” has driven over 5 million community votes in the Arena 🟡Record-breaking 2.5M+ votes cast for this model alone 🟡It has
36
155
1K
Image generation with Gemini just got a bananas upgrade and is the new state-of-the-art image generation and editing model. 🤯 From photorealistic masterpieces to mind-bending fantasy worlds, you can now natively produce, edit and refine visuals with new levels of reasoning,
184
539
3K
Last night, the 53rd Weather Reconnaissance Squadron flew into the eye of Hurricane Erin—and captured imagery of the breathtaking stadium effect. These missions provide critical data to the NHC to improve forecasts, helping keep communities safe before the storm makes
182
3K
10K
One word: relentless. just in the past two weeks, we’ve shipped: 🌐 Genie 3 - the most advanced world simulator ever 🤔 Gemini 2.5 Pro Deep Think available to Ultra subs 🎓 Gemini Pro free for uni students & $1B for US ed 🌍 AlphaEarth - a geospatial model of the entire planet
504
972
10K
Genie 3 is here - it can generate an entire world simulation that you can interact with in real-time, just from a text prompt! It's pretty mind-blowing really when you stop to think about it, and it's rapidly improving - one day we will be able to build the Holodeck for real!
251
735
5K
For researchers, scientists, and academics tackling hard problems: Gemini 2.5 Deep Think is here. 🤯 It doesn't just answer, it brainstorms using parallel thinking and reinforcement learning techniques. We put it into the hands of mathematicians who explored what it can do ↓
140
491
3K
🎉 BearCubs 🐻 is headed to #COLM2025! We benchmarked the newly released ChatGPT Agent and uncovered some exciting insights. Looking forward to sharing our findings in Montreal this October and giving a sneak peek at REALM @ACL2025NLP in Vienna! 🎻
ChatGPT Agent is a huge step up on BearCubs, esp on multimodal/interactive tasks (e.g., playing web games)! It gets 65.8% accuracy vs Deep Research's 36% and Operator's 23%. Humans are at ~85%, and clearly better/faster at fine control & complex filtering.
0
1
17
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
157
740
4K
AI now beats every single human in the hardest college entrance exam in India, the IIT JEE. ByteDance silently published this result this week. The top scorer was Rajit Gupta with 332/360, but Google's Gemini 2.5 Pro was at rank 1 with 336/360.
151
349
4K
Hot Gemini updates off the press. 🚀 Anyone can now use 2.5 Flash and Pro to build and scale production-ready AI applications. 🙌 We’re also launching 2.5 Flash-Lite in preview: the fastest model in the 2.5 family to respond to requests, with the lowest cost too. 🧵
48
134
1K
🔥Veo 3 keeps growing like crazy. To keep up, we’re introducing Veo 3 Fast in @GeminiApp and Flow. It’s >2x faster, has the same 720p resolution, and a bunch of serving optimizations. The big headline: we can serve more of it, even for the Yetis! How to get started: 1) Get a
96
111
849