Bren Professor @caltech, Fmr Sr Director of #AI research @nvidia, Fmr Principal Scientist @awscloud, AI+Science, PDE, Neural operators. Views my own.
Our @NatRevPhys perspective article on neural operators and their ability to accelerate simulations and design is now out. @Nature
1. Neural operators learn mappings between functions, e.g. spatiotemporal processes and partial differential equations.…
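The idea of learning a mapping between functions can be sketched with a single Fourier layer, the building block of the Fourier neural operator. This is a minimal NumPy sketch, not the authors' implementation. The function name `spectral_conv_1d` is hypothetical, the grid is assumed to be uniform and periodic, and the "learned" weights are random stand-ins. Because the weights act on a fixed number of Fourier modes rather than on grid points, the same weights can be applied to inputs sampled at different resolutions.

```python
import numpy as np

def spectral_conv_1d(u, weights):
    """One Fourier layer: FFT -> keep the lowest modes -> mix channels -> inverse FFT.

    u:       (n_points, in_channels) real samples on a uniform periodic grid.
    weights: (modes, in_channels, out_channels) complex, one matrix per kept mode.
    """
    n = u.shape[0]
    modes, _, out_channels = weights.shape
    # norm="forward" makes the Fourier coefficients independent of the grid size n
    u_hat = np.fft.rfft(u, axis=0, norm="forward")
    out_hat = np.zeros((n // 2 + 1, out_channels), dtype=complex)
    # Multiply each retained mode by its own weight matrix; higher modes are dropped
    out_hat[:modes] = np.einsum("ki,kio->ko", u_hat[:modes], weights)
    # Evaluate the resulting function back on the same n-point grid
    return np.fft.irfft(out_hat, n=n, axis=0, norm="forward")
```

If you evaluate a band-limited input on 64 and on 128 grid points, the outputs match at the shared points. That is the discretization-invariance property in miniature.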
For the first time, we show that the Llama 7B LLM can be trained on a single consumer-grade GPU (RTX 4090) with only 24GB of memory. This represents a more than 82.5% reduction in memory for storing optimizer states during training.
Training LLMs from scratch currently requires huge…
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-rank
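The gradient low-rank projection idea can be sketched in three steps. First, project each weight gradient onto its top singular directions. Second, keep the Adam moments only in that small subspace. Third, map the update back to full size. This is a simplified NumPy sketch under stated assumptions, not the released GaLore optimizer. The class name `GaLoreAdamSketch` and the refresh interval `update_proj_gap` are illustrative, as is the choice to keep the moments when the projector is refreshed.

```python
import numpy as np

class GaLoreAdamSketch:
    """Adam whose moment estimates live in a rank-r projection of the gradient.

    Illustrative only: the projector is refreshed every `update_proj_gap` steps from
    the top-r left singular vectors of the current gradient.
    """

    def __init__(self, shape, rank, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, update_proj_gap=200):
        _, n = shape
        self.rank, self.lr, self.eps, self.gap = rank, lr, eps, update_proj_gap
        self.b1, self.b2 = betas
        self.P = None                    # (m, rank) projection matrix
        self.m1 = np.zeros((rank, n))    # first moment, stored in the projected space
        self.m2 = np.zeros((rank, n))    # second moment, stored in the projected space
        self.t = 0

    def step(self, W, G):
        """Return updated weights W (m, n) given gradient G (m, n)."""
        if self.t % self.gap == 0:
            U, _, _ = np.linalg.svd(G, full_matrices=False)
            self.P = U[:, : self.rank]
        self.t += 1
        R = self.P.T @ G                 # compact (rank, n) gradient
        self.m1 = self.b1 * self.m1 + (1 - self.b1) * R
        self.m2 = self.b2 * self.m2 + (1 - self.b2) * R**2
        m_hat = self.m1 / (1 - self.b1**self.t)
        v_hat = self.m2 / (1 - self.b2**self.t)
        N = m_hat / (np.sqrt(v_hat) + self.eps)
        return W - self.lr * (self.P @ N)   # project the Adam update back to (m, n)
```

Take a 64×32 weight with rank 4. The optimizer then stores two 4×32 moment matrices plus a 64×4 projector (512 numbers), instead of two full 64×32 moments (4,096 numbers).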
I have decided to leave my position at Nvidia to focus on starting something new, as some of you may already know. I look forward to scaling models with physical and scientific understanding to accelerate progress toward AGI.
I will share more soon. Excited to meet everyone at…
Launching Lean Co-pilot for LLM-human collaboration to write formal mathematical proofs that are 100% accurate. We use LLMs to suggest proof tactics in Lean and also allow humans to intervene and modify them seamlessly.
Automating theorem proving…
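For readers new to Lean, a tactic proof is a sequence of steps that Lean checks mechanically. That is why an accepted proof is fully correct. The small Lean 4 example below uses only the core lemma `Nat.add_comm`. It does not use Lean Co-pilot itself; the steps shown are just the kind of tactic suggestions such a tool would propose.

```lean
-- Goal: reorder a sum of natural numbers.
example (a b c : Nat) : a + b + c = c + (b + a) := by
  -- rewrite c + (b + a) into (b + a) + c, then b + a into a + b;
  -- the goal becomes a + b + c = a + b + c, which `rw` closes by rfl
  rw [Nat.add_comm c, Nat.add_comm b a]
```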
My mom was one of the first female engineers in my community. Initially, my grandpa refused to let her study engineering, since she would be too qualified and hence unmarriageable. My mom went on a hunger strike for three days, and my grandpa relented. I stand on the shoulders of giants. #womeninstem
Our big day came together so beautifully, surrounded by family and friends at the historic @caltech Athenaeum. More pictures to come soon!
@bjenik #wedding
Thank you @TheOfficialACM for recognizing me as an ACM Fellow! This honor belongs to my team and collaborators. Every day I am energized and inspired to work with them. Also thankful to my mentors and my family for supporting me all the way!
💐Meet the 2022 ACM Fellows! 57 of the ACM members have been selected for their wide-ranging and fundamental contributions in the #computing field. Please join us in congratulating these new inductees!
Learn more about their achievements here:
#ACMFellows
Reduce the training cost of diffusion models by ~70% through masked training of transformer backbones. Masked training is popular for self-supervised representation learning, but we are the first to show this for #GenerativeAI
@wn8_nie @Kay12400259 @ArashVahdat
How do we capture local features across multiple resolutions? While standard convolutional layers work only at a fixed input resolution, we design local neural operators that learn integral and differential kernels, which are principled ways to extend standard convolutions to…
How can we mix active and transfer learning in a few-shot learning setting with pre-trained LLMs? We show that you need to label only a few in-domain samples and can leverage transfer learning from out-of-domain data, along with the knowledge ingrained in pretrained models.
TLDR: We developed StrassenNet ~4 years ago. It already rediscovered Strassen's algorithm and reduced multiplications by 99.5% while maintaining #deeplearning accuracy.
There is no guarantee that #alphatensor is stable, while StrassenNet explicitly trains for it.
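For reference, here is the classical Strassen recursion, which multiplies 2×2 blocks with 7 multiplications instead of 8. This NumPy sketch is the textbook algorithm, not StrassenNet, which learns such decompositions. It assumes square matrices whose size is a power of two.

```python
import numpy as np

def strassen(A, B):
    """Multiply square matrices of power-of-two size with Strassen's 7-product recursion."""
    n = A.shape[0]
    if n == 1:
        return A * B  # scalar base case (1x1 blocks)
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven recursive products replace the eight of the naive block method
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    # Recombine the products into the four output blocks using only additions
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])
```

For n = 2^k, this performs 7^k scalar multiplications, versus 8^k for the naive method.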
I agree with @ylecun that open source has been the primary reason for AI innovation and growth. Do not let misinformation and hysteria kill this. Open source means we democratize AI, allow everyone to explore new ways to make AI reliable and safe, and allow for peer review.
The heretofore silent majority of AI scientists and engineers who
- do not believe in AI extinction scenarios or
- believe we have agency in making AI powerful, reliable, and safe and
- think the best way to do so is through open source AI platforms
NEED TO SPEAK UP!
My hypothesis is that more RLHF and fine-tuning destroy the calibration of pretrained models. It is a shallow fix that encourages the models to convince the user that they are following instructions, rather than actually following them.
This, from a VP at OpenAI, is from a few days ago. I wonder if degradation on some tasks can happen simply as an unintended consequence of fine-tuning (as opposed to messing with the mixture-of-experts setup in order to save costs, as has been speculated).
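Calibration here means that a model's stated confidence matches its accuracy. One common way to measure it is expected calibration error (ECE). The sketch below is a standard binned version with equal-width bins; the bin count is an assumption, and it is not tied to any specific evaluation the authors used.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average |accuracy - mean confidence|
    over bins, weighted by the fraction of predictions in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index in [0, n_bins - 1]; right=True puts a confidence equal to an edge in the lower bin
    idx = np.minimum(np.digitize(confidences, edges[1:-1], right=True), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece
```

A model that says 90% but is right half the time has an ECE of 0.4. A model that says 80% and is right 80% of the time has an ECE of 0.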
Join our AI algorithms team at @nvidia as an intern working on a broad range of topics, including generative models, reinforcement learning, neural operators, 3D vision, AI for science, and quantum algorithms.
Text understanding with #LLMs is useful but not enough for scientific understanding and discovery. In chemistry, in addition to text, chemical structure is essential to determine the properties of molecules.
We have created the first multimodal text-chemical structure model:…
Looking forward to my @TEDTalks talk on building #AI with universal physical understanding. Excited to announce our recent works building the foundations for such a model.
Language models have shown impressive universal text understanding capabilities, but they are…
Strongly disagree with the first one. "What I cannot create, I do not understand" - Richard Feynman. Generative AI is not just for creating fun art; it is at the core of concept learning.
Celebrating a personal milestone. Last year I married @bjenik, my amazing cheerleader, partner-in-crime, and soulmate, surrounded by family and friends as we came out of the pandemic. Can't believe the year has gone by!
#happyanniversary
AI with Fourier Neural Operators speeds up plasma modeling in nuclear fusion by a million times
Predicting plasma evolution within a Tokamak reactor is crucial to realizing the goal of sustainable fusion. A big challenge is the disruptions that occur when the plasma gets unstable…
#ValentinesDay ❤️Really lucky to have @bjenik in my life. It has transformed my perspective in so many ways. Having someone supportive and dependable allows me to be authentic and vulnerable. Looking forward to so many new adventures ❤️
I am so excited to see that the long-planned release of the Makani library for #AI-based weather and climate modeling has finally happened!
It started with our FourCastNet work almost three years ago using the Fourier neural operator, where we showed for…
Scientists of Twitter! The world needs more reminders that we're living, breathing human beings.
Quote tweet this with a picture of you doing not-science
What an amazing evening at the @TheOfficialACM awards reception yesterday! Honored to be selected as an ACM Fellow and privileged to be working with phenomenal students and colleagues! Wonderful to have @bjenik by my side cheering me on!
Very proud to see the FourCastNet model that I helped build at @nvidia in collaboration with @BerkeleyLab and other universities highlighted in Jensen's keynote at #GTC24
- FourCastNet was the very first high-resolution #AI model to show competitive performance for weather…
Coming off @nvidia's blowout earnings and company meeting, Jensen met our Earth-2 team. #ai + #science is a game-changer here. We are #hiring AI research scientists, deep learning engineers, numerical methods experts, and climate modelers. DM me.
I have missed conference dinners.
#NeurIPS2022 reunion. Great to see former students grow their research groups. Great to have students presenting at their first-ever conference. Great to connect everyone together.
On #WomeninScienceDay, I honor my grandma. While she didn't get a chance to formally study #STEM, she instilled in me an innate curiosity. She, along with my grandpa, allowed my mom to study engineering, something unprecedented in the community then, despite widespread opposition.
Researchers at the California Institute of Technology have developed a drone that can fight back against powerful wind gusts, powered by machine learning, AI and complex algorithms
RL is too difficult to learn from scratch. The future is large foundation models with world knowledge helping with automatic curriculum generation and online learning.
It’s becoming increasingly understood that RL is no longer the way to create agents that interact with their environment
Instead, it’s about using GPT as a processor, and writing cognitive programs the agents execute
@DeveloperHarris is one of my favorite follows in this space…
Our work on using Neural Operators to design a medical catheter just got published in Science Advances.
A catheter is a flexible tube to draw fluids out of a human body. Unfortunately, catheters frequently cause bacterial infections by bacteria swimming…
"Let us ban matrix multiplications for six months" is not a solution to dangers
#AI potentially poses, while ignoring its vast benefits. As with any technology, I believe in humanity's ability to embrace the positive benefits while figuring out safety guardrails. No need to stop progress.
2023 was the year that we took AI+Science mainstream. My research summary of an amazingly productive year.
In 2018, I co-founded the @caltech #AI + #Science initiative that spurred deep collaborations leading to many of these developments. Looking forward…
Tensors are back! I am so glad to see this! We have been advocating that tensor or multilinear operations are all you need. The benefits: universal approximation and expressivity. Higher-order BLAS operations for parallelism without the need for any transposition or data…
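As a small illustration of contracting a tensor directly rather than unfolding and transposing it, the NumPy sketch below computes a mode-n product with one `np.einsum` call. It then checks the result against the unfold, multiply, and fold route. This only illustrates the interface: NumPy may still reorder data internally, and the function names are hypothetical.

```python
import numpy as np

def mode_n_product(X, U, mode):
    """Contract tensor X with matrix U (J, X.shape[mode]) along `mode` in one einsum call."""
    letters = "abcdefgh"[: X.ndim]
    out = letters.replace(letters[mode], "z")   # output index z replaces the contracted one
    return np.einsum(f"{letters},z{letters[mode]}->{out}", X, U)

def mode_n_product_unfold(X, U, mode):
    """Same result via unfold -> matrix multiply -> fold, which needs explicit axis moves."""
    Xm = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)   # mode-n unfolding
    Y = U @ Xm
    new_shape = (U.shape[0],) + tuple(np.delete(X.shape, mode))
    return np.moveaxis(Y.reshape(new_shape), 0, mode)
```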
Proud of our new #ICLR2024 paper attempting to answer: Are activation functions required for all deep networks?
Can networks perform well on ImageNet recognition without activation functions, max pooling, etc? 🧐
🧵1/n
I am thankful for having an amazing group of students and collaborators both at @Caltech and @nvidia. Their passion continues to inspire me every day. I am also thankful to my family for being an amazing source of support.
#HappyThanksgiving
For natural language, reducing hallucinations is a tradeoff between utility and creativity. Not so for theorem proving and other applications that can be formally verified and certified. Check out our LeanDojo using LLMs including GPT4 interface that has…
Generative models are not just for generating plausible text, as in ChatGPT, but also samples in other domains such as chemistry and biology. An important problem is generating the configuration of how proteins and small molecules (ligands) bind with one another.
We designed…
Anima Anandkumar is leading research and institutional change that aim to reform the foundations of AI. In a conversation with @laxmevy, the machine learning scientist discusses her upbringing, tensor algebra and the ethical challenges facing her field.
Top 10 things that happened in 2022. It has been an action-packed year! So thankful for the opportunities I have been given and the amazing people I am surrounded with.
Really excited to see my @caltech group #website revamped; it is sleek and simple. Check it out! We will also be adding detailed project pages over the next month.
Physics-informed neural operator (PINO) for multi-body dynamics. The 3D PDE is challenging, but it decomposes nicely into simpler ODEs, and PINO can solve it effectively.
We have now posted an expanded version of our Lean Co-pilot paper and updated the code base.
The credit goes to @p_song1 for driving this effort.
Our latest experiments show that our co-pilot can automate >80% of the proof steps…
Super excited to share our paper on efficient training of Neural Operators with mixed precision, recently accepted at #ICLR2024! We show more than 50% gain in throughput with almost no loss in accuracy.
Neural operators are #AI methods for solving partial differential equations (#PDE). Unlike traditional…
See our new paper offering a new perspective on self-attention in #transformer models. We view the continuous extension of self-attention as kernel integration and use the #Fourier neural operator (#FNO) to solve it. This leads to an efficient #transformer with no loss in performance.
@NVIDIAAI @Caltech
Check out our new arxiv preprint on efficient token mixers for transformers via Fourier neural operators
Fourier meets attention, but more efficiently: O(n log n). With soft thresholding, it allows dynamic scaling to high/continuous-resolution vision.
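Here is a rough sketch of a Fourier token mixer with soft thresholding. It takes an FFT over the token axis (O(n log n)), mixes channels at each frequency, shrinks small coefficients toward zero, and then inverts the FFT. This is a simplified NumPy illustration, not the published model. A single complex matrix `W` shared across frequencies stands in for the learned channel-mixing network, and `lam` is a hypothetical threshold parameter.

```python
import numpy as np

def soft_threshold(z, lam):
    """Complex soft-shrinkage: reduce each magnitude by lam (floored at 0), keep the phase."""
    mag = np.abs(z)
    scale = np.maximum(mag - lam, 0.0) / np.maximum(mag, 1e-12)
    return z * scale

def fourier_token_mixer(x, W, lam):
    """x: (n_tokens, d) real features; W: (d, d) complex weights shared over frequencies."""
    x_hat = np.fft.rfft(x, axis=0)          # FFT along the token axis, O(n log n)
    y_hat = x_hat @ W                       # mix channels at every frequency
    y_hat = soft_threshold(y_hat, lam)      # zero out small frequency components
    return np.fft.irfft(y_hat, n=x.shape[0], axis=0)
```

With identity weights and no threshold, the mixer returns its input. A very large threshold zeroes everything, and values in between keep only the strongest frequency components.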
@linylinx
Allow me to apologize. This bullet item was posted in error. NVIDIA looks for the quality of publications. We are not concerned with quantity.
Scientists of Twitter! The world needs more reminders that we're living, breathing human beings with lives outside a lab. Quote tweet this with a picture of you doing not-science!
No water, no electricity, all home sweet home. I miss 137...
Like many others, I came to the US as an immigrant and made it my home. I benefitted immensely by being around smart and passionate people everywhere I went. Finding a path to capture the value of the immigrants we train here is really important to keep us competitive and innovative.
If this were a science paper, a country that picks its science workforce at random would be the “weak baseline”, and you would expect a leading nation like the US to actively experiment toward the state of the art, or at least beat the baseline.
Not providing a guaranteed path for…
It is hard to put into words my experience speaking at @TEDTalks. It was exhilarating, fulfilling, inspiring! Thank you @SCR10 for tirelessly helping me shape my talk and @TEDchris for all the feedback. Thank you @bjenik for being my cheerleader and a partner in making this talk…
Automotive aerodynamics aims to study the behavior of vehicles in motion, minimizing drag and detecting any undesirable behavior before the vehicle design is approved for production. This is typically done with a combination of simulation and wind tunnel testing, which are…
Geometry-informed neural operators (GINOs) can significantly accelerate traditional fluid dynamics simulations.
We extend GINO to solve partial differential equations on irregular geometries using physics-informed losses. Based on flexible sampling strategies and exact…
Thank you for the honor. The credit really goes to my team and my collaborators. It is a privilege to be working with so many amazing people!
@RealAAAI @Caltech
Lightning speed from the open-source community! The memory reduction of GaLore has been reproduced by a third party (LLaMA-Factory), with 100% GPU utilization and a good-looking loss curve during full-parameter fine-tuning :)
GaLore 8-bit AdamW requires 28G while standard AdamW requires >= 42G…
The first open data, model and benchmark for theorem proving under the MIT license. Our small LLM with a retriever is as good as the closed @OpenAI model. Democratization of #AI!
See more details in @KaiyuYang4's thread:
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However, existing methods are difficult to reproduce or build…
A great afternoon with fabulous #womeninai and strong allies like our own Jensen Huang doing a panel on a wide range of topics, from pursuing an AI career to the value of diverse teams for building trustworthy AI and promoting hybrid work.
@nvidia @Oracle @MosaicML #siggraph
A simple, lightweight approach to convert unconditional GANs into conditional models by training only a classifier, through energy-based models in the latent space of GANs. Zero-shot composition of attributes not present in the training data. At #NeurIPS
@wn8_nie @ArashVahdat @NVIDIAAI
📢 #LACE: Controllable & Compositional Generation with Latent-Space Energy-Based Models
We introduce an extremely simple method for converting unconditional GANs into conditional models by training only a classifier.
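The classifier-only recipe can be sketched as sampling from an energy that combines the GAN's Gaussian latent prior with classifier log-probabilities, using Langevin dynamics in latent space. Adding one term per attribute gives composition. This toy NumPy sketch assumes a 2-D latent space, with fixed linear logistic classifiers (`W`, `b`) standing in for trained attribute classifiers. In practice the sampled latents would be passed through the GAN generator, which is omitted here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def langevin_sample(W, b, n=2000, steps=400, step_size=0.05, seed=0):
    """Sample latents from p(z) ∝ N(z; 0, I) * prod_i sigmoid(W[i] @ z + b[i]).

    Energy: E(z) = 0.5 * ||z||^2 - sum_i log sigmoid(W[i] @ z + b[i]),
    so each attribute classifier adds one term and attributes compose by summation.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, W.shape[1]))   # start from the GAN's latent prior
    for _ in range(steps):
        p = sigmoid(z @ W.T + b)                # (n, k) attribute probabilities
        grad_E = z - (1.0 - p) @ W              # gradient of the energy dE/dz
        noise = rng.standard_normal(z.shape)
        z = z - 0.5 * step_size * grad_E + np.sqrt(step_size) * noise
    return z
```

Under the prior alone, each attribute probability averages about 0.5. After sampling, both attributes are markedly more likely at once, even though no conditional generator was trained.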
Thank you @RyanMaue for posting about our FourCastNet model. It was the first AI high-resolution weather model to show that accurate weather forecasting is possible while being tens of thousands of times faster than numerical weather models, which paved the way for other AI…
Want to see magic?
This is the Nvidia FourCastNet (v2) A.I. global weather model prediction (inference) of Hurricane Lee from the day it was a named storm.
Lee grows larger and more intense as it bends around the Bermuda high. Then, 12 days later, it makes landfall in NL.
“I never saw it as something weird for women to be interested in engineering … If my mom ever saw sexism, she would point it out and say, ‘No, don’t accept this.’ That really helped.” — Anima Anandkumar, machine learning scientist
A unique experience to visit the remotest continent, #Antarctica, on the invitation of @congresofuturo, organized by the Chilean Senate. Everything was planned immaculately, and we were accompanied by two Senators organizing the event and received by a military general. We got to…
🏆 Our paper "Deep Multimodal Fusion for Surgical Feedback Classification" won the best proceedings paper award at @SymposiumML4H! 🎉
We're revolutionizing #SurgicalTraining by automating feedback analysis using #MachineLearning.
We've developed the first #AI …
My talk on neural operators for science and engineering at the U.S. National Congress on Computational Mechanics. We have made exciting progress in applying neural operators to challenging multi-scale problems like weather forecasting, nuclear fusion
As Hurricane Lee tracked slowly westward in the middle of the Atlantic Ocean, three new weather models developed in the private sector predicted the storm would make landfall in Nova Scotia about a week later.
A short demo of Lean Co-pilot by @KaiyuYang4
We are here at #NeurIPS2023 to talk about AI for theorem proving
- Tutorial on Machine Learning for Theorem Proving: Monday 1:45–4:15 PM, Hall B2
- LeanDojo’s oral presentation: Tuesday 10 AM, Ballroom A-C…
Join us for a coffee chat at @icmlconf.
Learn more about both full-time and intern positions in our AI Algorithms group. DM me or @DrJimFan, @Azizzadenesheli, @JeanKossaifi, @ChaoweiX
1. AI + Science, Neural Operators
2. Foundation models + Decision Making
3. Fundamental AI research
I'm going to ICML in Hawaii!
My team pushes the research frontier in AI agents, multimodal LLMs, game AI, and robotics. If you're interested in joining NVIDIA or collaborating with me, please reach out by email! My contact info is at
If applicable,…
NVIDIA Modulus—a framework for developing physics-based machine learning models that can be used for everything from basic analysis to digital twins.
#GTC21
Thank you @BloombergTV for having me on #AI IRL talking about AI+Science and how AI is revolutionizing weather forecasting through faster and cheaper forecasts that allow us to create larger ensembles.
NVIDIA is using AI to create “very accurate digital twins of this planet.” The company's AI research director @AnimaAnandkumar explains why.
Watch the full episode of AI IRL here
I am excited about generative AI but also concerned that not everyone can benefit from it equally. I got mixed results from @PrismaAI: some bore a resemblance, but many didn't. I also could not control the hypersexualization of women. The bias in the models is so obvious.
Take a break from OpenAI drama. @bjenik encouraged me to try out the @leica_camera during our recent trip. A refreshing break from AI. Trying manual focus and settings makes photography a lot more enjoyable.
#seoul #market
Come work with us on exciting and cutting-edge #ai research on topics such as:
1. AI + Science and Neural Operators
2. Foundation models and Decision Making
3. Fundamental algorithmic research in AI.
We have internships open year-round at @nvidia.
Agree. Our work showed that using GPT4 interactively to generate an automatic curriculum and build a skill library can tackle complex long-horizon tasks in Minecraft. The future of RL will be generative modeling + online learning.
Excited to release the first genome-scale language models revealing the evolutionary dynamics of SARS-CoV-2. Our model rapidly and accurately identifies variants of concern. Largest biological language models: >1.4 zettaflops on @nvidia Selene and Polaris.
Bill Dally (@nvidia) presenting at @caltech on the role of number representation in #deeplearning: a 1000x speedup over the last decade. Logarithmic representation has the best efficiency. We propose multiplicative updates for low-precision training in the logarithmic number system.
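To make the log-domain idea concrete: if a weight is stored as a sign and a quantized base-2 exponent, a multiplicative update becomes an addition to the exponent. The NumPy sketch below is a simplified illustration under stated assumptions, not the authors' training method. It uses sign-of-gradient steps, 3 fractional exponent bits, and hypothetical function names, and it assumes nonzero weights.

```python
import numpy as np

FRAC_BITS = 3  # assumed exponent resolution: steps of 2**-3

def quantize_exp(e, frac_bits=FRAC_BITS):
    """Round exponents to the fixed grid of 2**-frac_bits steps."""
    return np.round(e * 2**frac_bits) / 2**frac_bits

def to_log(w, frac_bits=FRAC_BITS):
    """Encode nonzero weights as (sign, quantized base-2 exponent)."""
    return np.sign(w), quantize_exp(np.log2(np.abs(w)), frac_bits)

def from_log(sign, exp):
    """Decode (sign, exponent) back to ordinary floating-point weights."""
    return sign * np.exp2(exp)

def multiplicative_update(sign, exp, grad, lr=2**-3, frac_bits=FRAC_BITS):
    """w <- w * 2**(-lr * sign(w) * sign(grad)), applied as an addition to the exponent.

    |w| shrinks when w and grad share a sign and grows otherwise,
    so w always moves against the gradient.
    """
    return sign, quantize_exp(exp - lr * sign * np.sign(grad), frac_bits)
```

No multiplier is needed for the update itself: the exponent changes by a fixed step and stays on the quantization grid.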
It has been an absolute pleasure to work with @GuanyaShi at @Caltech. He has done groundbreaking work in the interdisciplinary area of "learning to fly", which requires building new foundations in #ai for safety and stability. See impressive demos on #drones. He is on the academic job market. Hire him!
I wrote a blog post: "Neural-Control Family: What Deep Learning + Control Enables in the Real World." This post discusses some key principles of our work on learning robotic agility in safety-critical systems (e.g., Neural-Swarm below).
An amazing experience going to the remotest continent, #Antarctica, on the invitation of the #chile #senate, including a flight on a military plane. It is an honor to be part of this special group!
#congresofuturo
Fundamental rights of women stripped. I never expected to see this day. Dogma and extremism instead of scientific foundations are dangerous. Guns now have more rights than women.
#AbortionIsHealthcare
#AbortionRightsAreHumanRights
Our StrassenNet had shown a 99.5% reduction in the number of multiplications in #deeplearning and automatically discovered Strassen's algorithm. Good to see this line of work being extended.
Since 1969 Strassen’s algorithm has famously stood as the fastest way to multiply 2 matrices - but with #AlphaTensor we’ve found a new algorithm that’s faster, with potential to improve efficiency by 10-20% across trillions of calculations per day!
NVIDIA Modulus has been used to create physics-informed #digitaltwins of wind farms and power plants, as well as for prognosis and health management. Learn more about this neural network framework.
#GTC22