Rajat Monga
@rajatmonga
Followers: 13K
Following: 1K
Media: 5
Statuses: 624
Inference @ Microsoft, Past: Founder TensorFlow, Inference IO
Joined November 2008
LSTMs were the early language models we scaled up with DistBelief back in 2013, well before TensorFlow. Great to see a retake on that, combining it with newer ideas. Evolution at work!
Sepp Hochreiter giving a keynote talk at #NeurIPS2024 about xLSTM having key structural advantages such as very fast inference speed and high parameter efficiency compared to flash attention transformers and state-space models. xLSTM resources: https://t.co/jgMY8j2xLe
0
0
10
Great work by ONNX Runtime Web team enabling Whisper in the browser!
It's finally possible: real-time in-browser speech recognition with OpenAI Whisper! 🤯 The model runs fully on-device using Transformers.js and ONNX Runtime Web, and supports multilingual transcription across 100 different languages! 🔥 Check out the demo (+ source code)! 👇
0
0
5
Each language has a place and time. Java brought value in speeding up project times, and enabling more developers. Spark in C++ (Photon) is a lot more performant, but not many devs can do that well. Now folks want to rewrite golang (new Java) code in Rust (new C++)!
When Java became popular, people (me included) claimed that it was massively better than C/C++. This was highly controversial and people mocked me for using Java. I was hammered by the referees during my first grant application for picking Java as my language of choice. In some
2
2
6
Is the era of massive AI model growth over? We got the last 1000X from better compute & smaller number formats. The path to the next 1000X isn't so clear... https://t.co/kHChv8cgIB
0
2
2
Awesome result with #AlphaTensor. Anything we can gamify, DeepRL is ready to go.
Today in @Nature: #AlphaTensor, an AI system for discovering novel, efficient, and exact algorithms for matrix multiplication - a building block of modern computations. AlphaTensor finds faster algorithms for many matrix sizes: https://t.co/E18DezRPTL & https://t.co/SvHgsa0SNV 1/
0
1
5
Small changes => big returns. Find the right leverage points.
1/ Expensive A100 GPUs being underutilized due to CPU bottlenecks? 💰 TensorRT speedup being held back by a busy Python thread? 🐍 Learn more about our journey getting a 20-40% boost by removing CPU as a bottleneck when applying LLMs to millions of pages in our web index. 🧵
0
1
7
AI enables entirely new experiences. Just as Cloud and Mobile did over the last two decades. This is the decade of AI.
At Google Ventures a decade ago we searched for AI-enabled companies and came up dry. That has changed. AI is going to eat software companies. Primarily because it creates entirely new UX that incumbents can’t adopt without breaking their product. 10-year hypercycle just started.
0
1
5
We thought we were onto something when we were building DistBelief and writing this paper, and we were. Amazing looking back a decade later. Great working with @JeffDean @AndrewYNg @quocleix and the whole Brain Team.
Honored that our 2012 paper "Building High-level Features Using Large Scale Unsupervised Learning" received an @icmlconf Test of Time Award honorable mention! Joint work with @quocleix, @MarcRanzato, @RajatMonga, Matthieu Devin, Kai Chen, @greg_corrado, myself, & @AndrewYNg.
2
0
34
The next bump up ↗️ in *maternal mortality* is here. Time to bring it down ↘️
0
0
1
"You have to be willing to open black boxes" True for all systems as you scale. Glad @Neeva is talking about what's under the hood.
1/ Building a search index requires processing lots of documents. Systems like @ApacheSpark are great, but require love and attention to detail at scale. You have to be willing to open black boxes to run the engines smoothly. A few Learnings 🧵
0
0
1
Data is the attractor here. Apps will be where the data is, not because it is the best solution, but because of the layers of governance and security that people get comfortable with. All data apps will run on one of 5 data platforms - 3 clouds + Snowflake + Databricks.
There’s still chatter of building apps on the data warehouse - but the DW still provides a suboptimal experience when doing so many forms of analyses (ML, scenario planning, graph analysis, causal inference).
0
0
1
Thoughtful and elucidating. Ideas are indeed Powerful, and hence, can also be Dangerous.
I've now been asked multiple times for my take on Elon's offer for Twitter. So fine, this is what I think about that. I will assume the takeover succeeds, and he takes Twitter private. (I have little knowledge/insight into how actual takeover battles work or play out) (long 🧵)
0
0
3
As Russians look to finally take Mariupol - amid unconfirmed claims of chemical weapons use - it's worth reflecting that a Facebook group for relatives looking for missing loved ones now has 140k members.
104
3K
8K
Agree with @levie on this one wholeheartedly. Call me a slow learner but I don't see enough real issues being solved with Web3 for all the hype.
@mmasnick @clamentjohn @jack @alphabreacher I just don’t agree with the main premise. We have more protocols than ever before. Maybe there’s an issue of identity portability, but I don’t think that practically solves that much (without introducing a new set of issues).
0
1
4
"Two Refugees, Both on Poland’s Border. But Worlds Apart." https://t.co/N300ohnJFa Albagir was punched in the face, called racial slurs ... Katya wakes up every day to a stocked fridge and fresh bread on the table ...
0
0
7