Nithin GK Profile
Nithin GK

@NithinGK10

Followers
73
Following
242
Media
0
Statuses
48

Learning to make good systems. Diffusion models, Multimodal LLMs, Agents, 3D reconstruction. @JohnsHopkins @iitmadras alum. Views are my own.

Seattle
Joined May 2021
@docmilanfar
Peyman Milanfar
8 days
Give all the LaTeX, figs & graphs for your paper to Nano Banana Pro, and it'll make a clear, fun, and high-resolution 4K poster for you. We might just present this as our @NeurIPSConf poster next month. https://t.co/Vy7VKOrzMG Go to @YuyangHu_666's poster and he'll share the prompt
1
6
77
@sundarpichai
Sundar Pichai
8 days
You went 🍌🍌 for Nano Banana. Now, meet Nano Banana Pro.  It’s SOTA for image generation + editing with more advanced world knowledge, text rendering, precision + controls. Built on Gemini 3, it’s really good at complex infographics - much like how engineers see the world:)
824
2K
24K
@sainingxie
Saining Xie
10 days
☘️
@GoogleDeepMind
Google DeepMind
10 days
This is Gemini 3: our most intelligent model that helps you learn, build and plan anything. It comes with state-of-the-art reasoning capabilities, world-leading multimodal understanding, and enables new agentic coding experiences. 🧵
0
5
52
@elonmusk
Elon Musk
19 days
@hyhieu226 We’re working on them
112
89
2K
@elonmusk
Elon Musk
22 days
@StefanoErmon @_inception_ai Diffusion will obviously work on any bitstream. With text, since humans read from first word to last, there is just the question of whether the delay to first sentence for diffusion is worth it. That said, the vast majority of AI workload will be video understanding and
131
205
2K
@m__dehghani
Mostafa Dehghani
1 month
AGI won't know it's done training... like us, it might always feel incomplete, never satisfied... never realizing it's already good enough!
3
3
31
@xinntao
Xintao Wang
1 month
🥳🥳 DiT w/o VAE, but with a semantic encoder such as DINO! We introduce SVG (Self-supervised representation for Visual Generation). Paper: https://t.co/TL2gnTDCGL Code: https://t.co/fWEwVYeiKz
8
55
362
@Amandeep__kumar
Amandeep Kumar
2 months
🧩 “It’s not AR vs diffusion… it’s AR through diffusion.” 👉 Is it possible that Visual Autoregressive models are secretly a discrete diffusion? We show: with a Markovian attention mask, VAR becomes mathematically equivalent to discrete diffusion. Here's how 🧵👇
2
4
6
@giffmana
Lucas Beyer (bl16)
2 months
Did you know that when they say stuff like "The A18 uses TSMC's 3nm process" or "announced the 2nm node" The 3nm, 2nm actually doesn't mean anything?! It's just like a version number. They make it up. Literally nothing measures 2nm or 3nm. I certainly didn't know.
340
555
9K
@thinkymachines
Thinking Machines
3 months
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
237
1K
8K
@TheAppleDesign
Apple Design
4 months
Steve Jobs: how to design perfect products.
99
1K
10K
@vishalm_patel
Vishal Patel
5 months
🚀 Open Vision Reasoner (OVR) Transferring linguistic cognitive behaviors to visual reasoning via large-scale multimodal RL. SOTA on MATH500 (95.3%), MathVision, and MathVerse. 💻 Code: https://t.co/eB82Pawlue 🌐 Project: https://t.co/tMAFw1oe3N #LLM @yanawei @HopkinsEngineer
0
5
21
@_kevinlu
Kevin Lu
5 months
Why you should stop working on RL research and instead work on product // The technology that unlocked the big scaling shift in AI is the internet, not transformers I think it's well known that data is the most important thing in AI, and also that researchers choose not to work
34
154
2K
@RuiqiGao
Ruiqi Gao
5 months
Scalable EBM training?? Can’t believe I can see this keyword in my life 😭. This is amazing! On a second read of the algorithm, looks like it is combining EBM with a training algorithm closer to AR. But still, leveraging EBM as a way of latent thinking is such a cool idea 💗
@_akhaliq
AK
5 months
Energy-Based Transformers are Scalable Learners and Thinkers
4
52
435
@liu_mingyu
Ming-Yu Liu
6 months
For people looking for a diffusion-based video generator to finetune or post-train for their downstream physical AI applications, we just released our latest one. We have 2 models: 2B and 14B. 2B for fast prototyping and 14B for better quality. The license is fully open. Give it
github.com
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications. - nvidia-cosmos/cosmos-pr...
@qsh_zh
Qinsheng Zhang
6 months
🚀 Introducing Cosmos-Predict2! Our most powerful open video foundation model for Physical AI. Cosmos-Predict2 significantly improves upon Predict1 in visual quality, prompt alignment, and motion dynamics—outperforming popular open-source video foundation models. It’s openly
2
11
46
@liu_mingyu
Ming-Yu Liu
6 months
We post-trained a reasoning model to reason whether a video is real or generated. It might be very useful as a critic to improve video generators. Take a look. @NVIDIAAI
@mli0603
Max Li 李赵硕
6 months
Cosmos-Reason1 has exciting updates 💡 Now it understands physical reality — judging videos as real or fake! Check out the resources👇 Paper: https://t.co/TcqqvrhqAD Huggingface: https://t.co/hOLno2IyhW Code: https://t.co/UUg90bmcGW Project page: https://t.co/Dr6ZqnKM8o (1/n)
0
4
36
@sedielem
Sander Dieleman
6 months
If you've read my latest blog post on generative modelling in latent space, this one is a great follow-up about putting things into practice. https://t.co/eTo1BfDkxk
@wayfarerlabs
OWL
6 months
In this blog post we will summarize some of our findings with training autoencoders for diffusion! We also share some null results we had with a slightly unconventional approach we tried. 1/2
1
27
208
@vitrupo
vitrupo
6 months
OpenAI's Greg Brockman says the AGI future looks less like a monolith - and more like a menagerie of specialized agents. Models that call other models. “We're heading to a world where the economy is fundamentally powered by AI.” The goal is to unlock 10x more activity, output,
66
154
890
@dpkingma
Durk Kingma
6 months
It's already the case that people's free will gets hijacked by screens for hours a day, with lots of negative consequences. AI video can make this worse, since it's directly optimizable. AI video has positive uses, but most of it will be fast food for the mind.
@karpathy
Andrej Karpathy
6 months
Very impressed with Veo 3 and all the things people are finding on r/aivideo etc. Makes a big difference qualitatively when you add audio. There are a few macro aspects to video generation that may not be fully appreciated: 1. Video is the highest bandwidth input to brain. Not
22
36
387
@cloneofsimo
Simo Ryu
9 months
Wild numbers. If you plot the trajectory of non-stochastic diffusion sampling, 99.8% of the latents along the entire trajectory can be explained by the first two principal components. Roughly speaking, your entire diffusion trajectory is 99.8% two-dimensional.
19
30
413
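The check described in the tweet above is easy to reproduce: stack every latent along a deterministic sampling trajectory into a matrix, run PCA, and see how much variance the first two components explain. A minimal sketch, using a synthetic near-linear trajectory in place of real sampler latents (the step count, latent dimension, and noise level here are all made-up illustration values):

```python
import numpy as np

# Fake a "trajectory": 50 latents along a smooth path plus small noise.
# A real check would save the latents x_t from a DDIM/ODE sampler instead.
rng = np.random.default_rng(0)
T, D = 50, 4096                       # 50 steps, 4096-dim latent (assumed)
x0, x1 = rng.normal(size=D), rng.normal(size=D)
ts = np.linspace(0.0, 1.0, T)[:, None]
traj = (1 - ts) * x0 + ts * x1 + 0.01 * rng.normal(size=(T, D))

# PCA via SVD: center the rows, take singular values, convert to
# explained-variance ratios.
X = traj - traj.mean(axis=0)
s = np.linalg.svd(X, full_matrices=False, compute_uv=False)
var = s**2 / (s**2).sum()
print(f"first two PCs explain {100 * var[:2].sum():.1f}% of variance")
```

On real diffusion latents the claim is that this ratio comes out around 99.8%; the toy path here is nearly one-dimensional by construction, so it lands close to 100%.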