Nithin GK
@NithinGK10
Followers: 73 · Following: 242 · Media: 0 · Statuses: 48
Learning to make good systems. Diffusion models, Multimodal LLMs, Agents, 3D reconstruction. @JohnsHopkins @iitmadras alum. Views are my own.
Seattle
Joined May 2021
Give all latex, figs & graphs for your paper to Nano Banana Pro, and it'll make a clear, fun, and high-resolution 4K poster for you. We might just present this as our @NeurIPSConf poster next month https://t.co/Vy7VKOrzMG Go to @YuyangHu_666's poster and he'll share the prompt
1
6
77
You went 🍌🍌 for Nano Banana. Now, meet Nano Banana Pro. It’s SOTA for image generation + editing with more advanced world knowledge, text rendering, precision + controls. Built on Gemini 3, it’s really good at complex infographics - much like how engineers see the world:)
824
2K
24K
@StefanoErmon @_inception_ai Diffusion will obviously work on any bitstream. With text, since humans read from first word to last, there is just the question of whether the delay to first sentence for diffusion is worth it. That said, the vast majority of AI workload will be video understanding and
131
205
2K
AGI won't know it's done training... like us, it might always feel incomplete, never satisfied... never realizing it's already good enough!
3
3
31
🥳🥳DiT w/o VAE, but with a Semantic Encoder, such as DINO! We introduce SVG (Self-supervised representation for Visual Generation). Paper: https://t.co/TL2gnTDCGL Code: https://t.co/fWEwVYeiKz
8
55
362
🧩 “It’s not AR vs diffusion… it’s AR through diffusion.” 👉Is it possible that Visual Autoregressive models are secretly a discrete diffusion? We show: with a Markovian attention mask, VAR becomes mathematically equivalent to discrete diffusion. Here's how 🧵👇
2
4
6
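The "Markovian attention mask" in the tweet above can be made concrete with a toy sketch. This is not the paper's code: the scale sizes, the helper name `markovian_scale_mask`, and the exact attention rule (each scale attends only to itself and the immediately preceding scale, rather than to the full history) are illustrative assumptions about what a scale-wise Markovian mask could look like in a VAR-style model.

```python
# Hedged sketch (not the paper's implementation): a "Markovian" attention
# mask over VAR-style token scales. Tokens at scale i may attend only to
# scale i and scale i-1, never to earlier scales, which is what makes the
# generation process Markovian over scales.
import numpy as np

def markovian_scale_mask(scale_sizes):
    """Boolean mask M[q, k] = True where query token q may attend to key k."""
    total = sum(scale_sizes)
    mask = np.zeros((total, total), dtype=bool)
    starts = np.cumsum([0] + list(scale_sizes))  # token offsets per scale
    for i in range(len(scale_sizes)):
        q0, q1 = starts[i], starts[i + 1]        # queries of scale i
        k0 = starts[max(i - 1, 0)]               # keys start at scale i-1
        mask[q0:q1, k0:q1] = True                # attend to scales i-1 and i
    return mask

# Illustrative scale sizes, e.g. 1x1, 2x2, 3x3 token maps:
m = markovian_scale_mask([1, 4, 9])
```

Under this mask, a token in the 3x3 scale can see the 2x2 scale but not the 1x1 root, so each denoising-style step depends only on the previous one.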
Did you know that when they say stuff like "The A18 uses TSMC's 3nm process" or "announced the 2nm node" The 3nm, 2nm actually doesn't mean anything?! It's just like a version number. They make it up. Literally nothing measures 2nm or 3nm. I certainly didn't know.
340
555
9K
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
237
1K
8K
🚀 Open Vision Reasoner (OVR) Transferring linguistic cognitive behaviors to visual reasoning via large-scale multimodal RL. SOTA on MATH500 (95.3%), MathVision, and MathVerse. 💻 Code: https://t.co/eB82Pawlue 🌐 Project: https://t.co/tMAFw1oe3N
#LLM @yanawei @HopkinsEngineer
0
5
21
Why you should stop working on RL research and instead work on product // The technology that unlocked the big scaling shift in AI is the internet, not transformers I think it's well known that data is the most important thing in AI, and also that researchers choose not to work
34
154
2K
Scalable EBM training?? Can’t believe I can see this keyword in my life 😭. This is amazing! On a second read of the algorithm, looks like it is combining EBM with a training algorithm closer to AR. But still, leveraging EBM as a way of latent thinking is such a cool idea 💗
4
52
435
For people looking for a diffusion-based video generator to finetune or post-train for their downstream physical AI applications, we just released our latest one. We have 2 models: 2B and 14B. 2B for fast prototyping and 14B for better quality. The license is fully open. Give it
github.com
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications. - nvidia-cosmos/cosmos-pr...
🚀 Introducing Cosmos-Predict2! Our most powerful open video foundation model for Physical AI. Cosmos-Predict2 significantly improves upon Predict1 in visual quality, prompt alignment, and motion dynamics—outperforming popular open-source video foundation models. It’s openly
2
11
46
We post-trained a reasoning model to reason whether a video is real or generated. It might be very useful as a critic to improve video generators. Take a look. @NVIDIAAI
Cosmos-Reason1 has exciting updates 💡 Now it understands physical reality — judging videos as real or fake! Check out the resources👇 Paper: https://t.co/TcqqvrhqAD Huggingface: https://t.co/hOLno2IyhW Code: https://t.co/UUg90bmcGW Project page: https://t.co/Dr6ZqnKM8o (1/n)
0
4
36
If you've read my latest blog post on generative modelling in latent space, this one is a great follow-up about putting things into practice. https://t.co/eTo1BfDkxk
In this blog post we will summarize some of our findings with training autoencoders for diffusion! We also share some null results we had with a slightly unconventional approach we tried. 1/2
1
27
208
OpenAI's Greg Brockman says the AGI future looks less like a monolith - and more like a menagerie of specialized agents. Models that call other models. “We're heading to a world where the economy is fundamentally powered by AI.” The goal is to unlock 10x more activity, output,
66
154
890
It's already the case that people's free will gets hijacked by screens for hours a day, with lots of negative consequences. AI video can make this worse, since it's directly optimizable. AI video has positive uses, but most of it will be fast food for the mind.
Very impressed with Veo 3 and all the things people are finding on r/aivideo etc. Makes a big difference qualitatively when you add audio. There are a few macro aspects to video generation that may not be fully appreciated: 1. Video is the highest bandwidth input to brain. Not
22
36
387
Wild numbers. If you plot the trajectory of non-stochastic diffusion sampling, 99.8% of the latent over the entire trajectory can be explained by the first two principal components. Roughly speaking, your entire diffusion trajectory is 99.8% two-dimensional.
19
30
413
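The claim above is easy to check on any sampler's latents: stack the trajectory into a matrix, center it, and measure how much variance the top two principal components capture. A minimal sketch, assuming `trajectory` is an array of flattened latents with one row per solver step; the synthetic near-linear path at the end stands in for real model latents, which you would collect from an actual deterministic sampler.

```python
# Sketch: how low-dimensional is a deterministic diffusion sampling
# trajectory? Compute the fraction of variance explained by the top-k
# principal components via SVD.
import numpy as np

def top_k_explained_variance(trajectory: np.ndarray, k: int = 2) -> float:
    """Fraction of total variance captured by the first k principal components."""
    X = trajectory - trajectory.mean(axis=0, keepdims=True)  # center rows
    s = np.linalg.svd(X, compute_uv=False)  # PCA variances are s_i**2
    return float((s[:k] ** 2).sum() / (s ** 2).sum())

# Synthetic stand-in: an almost-straight path between two random latents,
# with tiny noise, mimicking a near-linear ODE sampling trajectory.
steps, dim = 50, 1024
rng = np.random.default_rng(0)
start, end = rng.normal(size=dim), rng.normal(size=dim)
t = np.linspace(0.0, 1.0, steps)[:, None]
traj = (1 - t) * start + t * end + 1e-3 * rng.normal(size=(steps, dim))
ratio = top_k_explained_variance(traj)  # close to 1.0 for a near-linear path
```

For a real check, replace `traj` with the stacked latents saved at each step of a DDIM-style (non-stochastic) sampler; the tweet's claim is that `ratio` comes out around 0.998.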