The story of Nigel Richards, the man from New Zealand who memorized every French word in the French scrabble dictionary and won the French Scrabble Championship without speaking any French
If you’ve ever wanted to take a grubby Python project and turn it into something that looks more like a well-run open-source project (👋 ML researchers), here’s a guide I wrote on how to do it.
I was frustrated after Googling for hours, so hope it helps!
Tech report coming soon!
SSMs are an amazing fit for audio: perplexity numbers with our new architecture blow Transformer baselines out of the water. Look at this giant gap on training loss
🚀Excited to release Robustness Gym, a new Python toolkit for evaluating the robustness of NLP models, as part of a collaboration between Stanford, Salesforce Research and UNC Chapel Hill.
Paper:
Code:
pip install!
(Thread) I finally got GPT-3 access last week (shout out to
@gdb
), and took a stab at an experiment that I've been curious about for a while.
TLDR: training a model on a dataset entirely generated by GPT-3.
You can read my blog at .
Incredibly excited to be releasing our first model,
@cartesia_ai
Sonic today.
Sonic is a voice model based on a new state space model architecture we've developed that's blazing fast, efficient and high quality.
It's the first of many models we're building to bring cheap
Today, we’re excited to release the first step in our mission to build real time multimodal intelligence for every device: Sonic, a blazing fast (🚀 135ms model latency), lifelike generative voice model and API.
Read and try Sonic
We built an interactive data frame powered by foundation models that can wrangle your unstructured data (images, videos, text docs...)
Introducing 🔮 Meerkat!
📃
💻
🌐
There’s a weird dichotomy where all the AI researchers I interact with think there’s a lot left to do on designing new architectures that improve over Transformers — but everyone else seems to be entirely unaware that this is even a possibility left to consider
Preprint alert!
"Model Patching: Closing the Subgroup Performance Gap with Data Augmentation" is now on arXiv!
📑Paper:
🧑‍💻Code:
📹Video:
✍️Blog:
Read on to learn more (1/9)
Writing a rebuttal for NeurIPS,
What I want to say 😏
“Your review is $%*€¥. Try again. 2/10.”
What I actually say 😒
“Thanks for the helpful feedback. Your wisdom and insight are truly wondrous and move my soul. I was touched that you think we don’t have enough baselines...”
Excited to release a new resource for Data Centric AI:
...with a great post by
@HazyResearch
about our lab's journey in this:
This is already a community effort with 20+ folks who have contributed discussion. Please send us PRs!
Excited to release Meerkat, a new data library for interactive machine learning! We've (
@jundesai
,
@EyubogluSabri
,
@HazyResearch
) been building this up over the last couple of months.
Read our blog post to learn more:
Awesome to see that our MLSys seminar series now has 3k subs on YouTube (and counting):
I’m constantly amazed by how many folks I interact with have watched, thanks for tuning in! (and subscribe)
@realDanFu
@w4nderlus7
@matei_zaharia
@HazyResearch
A while back, I wrote a Python library for handling YAML-based configuration in my ML projects.
I've been installing it (`pip install quinine`) for my own projects for a while; now you can use it too
README:
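quinine's actual API isn't shown here, so as a rough stdlib-only sketch of the pattern such YAML-config libraries provide (nested defaults merged with user overrides, plus dotted-key access; all function names below are made up):

```python
# Hypothetical sketch (not quinine's real API): merge a user config over
# defaults, then read values with dotted keys.
def merge(defaults, overrides):
    """Recursively merge `overrides` on top of `defaults`."""
    out = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

def get(cfg, dotted, default=None):
    """Look up a nested value, e.g. get(cfg, "model.lr")."""
    node = cfg
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node

defaults = {"model": {"lr": 3e-4, "layers": 12}, "seed": 0}
user = {"model": {"lr": 1e-3}}  # e.g. parsed from a YAML file
cfg = merge(defaults, user)
```

Here `get(cfg, "model.lr")` returns the override while `get(cfg, "model.layers")` falls back to the default.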
Really chuffed to see that we've crossed 5000 subs on our MLSys Seminar YouTube after 34 weeks of streaming ().
A big thanks to all our speakers and viewers, and the cast (
@realDanFu
,
@w4nderlus7
,
@HazyResearch
,
@matei_zaharia
, Fiodar)!
Want to use state space models (S4 -- ) and don't know where to start?
We just put up an example script () on how to build a simple S4 model backbone that surpasses the previous SOTA on sequential CIFAR (81%) in 30 minutes on a V100!
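As a toy illustration (not S4's actual parameterization or HiPPO initialization), the discrete linear state-space recurrence these models build on, x_k = A x_{k-1} + B u_k and y_k = C x_k, can be sketched in pure Python:

```python
# Toy version of the discrete state-space recurrence underlying S4-style
# layers: x_k = A @ x_{k-1} + B * u_k,  y_k = C @ x_k.
# The matrices below are made-up 2x2 examples, NOT S4's parameterization.
def ssm_scan(A, B, C, u):
    n = len(B)
    x = [0.0] * n                        # hidden state starts at zero
    ys = []
    for u_k in u:                        # sequential scan over the input
        x = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u_k
             for i in range(n)]
        ys.append(sum(C[j] * x[j] for j in range(n)))
    return ys

A = [[0.9, 0.0], [0.0, 0.5]]             # decaying diagonal state matrix
B = [1.0, 1.0]
C = [1.0, 1.0]
ys = ssm_scan(A, B, C, [1.0, 0.0, 0.0])  # impulse response
```

An impulse decays through the state (roughly [2.0, 1.4, 1.06] here); the real S4 layer computes this same scan efficiently as a long convolution.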
Quadratic attention has been indispensable for information-dense modalities such as language... until now.
Announcing Mamba: a new SSM arch. that has linear-time scaling, ultra long context, and most importantly--outperforms Transformers everywhere we've tried.
With
@tri_dao
1/
excited to finally release Mamba-2!! 8x larger states, 50% faster training, and even more S's 🐍🐍
Mamba-2 aims to advance the theory of sequence models, developing a framework of connections between SSMs and (linear) attention that we call state space duality (SSD)
w/
@tri_dao
Indian society is cursed. The trope of the “qualified woman” whose sole purpose is marriage is frankly infuriating.
These idiotic “traditions” permeate even the most liberal parts of India. If you’re Indian, your family probably has people who clutch onto these ideals.
We built a data exploration dashboard that we shipped with
@togethercompute
's new Red Pajama LLM data release!
We embedded the entire Github subset of Red Pajama (releasing indexes + embeddings soon!).
Built in 100 lines of Python with
@MeerkatML
🚀
🚀 ChatGPT / GPT-4 for querying and asking questions on codebases
Point to any GitHub repo, and get an index that is used to answer questions. Use --prompt-only mode to copy-paste if you can only access GPT-4 via ChatGPT.
Built with
@MeerkatML
!
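A hedged sketch of the "index a repo" step such a tool might use (real implementations typically use embeddings; this stdlib-only version uses bag-of-words cosine similarity, and all names are made up):

```python
# Toy retrieval index over a repo: score each file against the question
# with bag-of-words cosine similarity; the best match would then be
# pasted into the LLM prompt. Real tools typically use embeddings.
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def build_index(files):
    # files: {path: source text}
    return {path: Counter(tokenize(src)) for path, src in files.items()}

def top_file(index, question):
    q = Counter(tokenize(question))
    return max(index, key=lambda path: cosine(index[path], q))

files = {
    "db.py": "def connect_database(url):\n    return open_connection(url)",
    "ui.py": "def render_button(label):\n    return Button(label)",
}
index = build_index(files)
```

For example, `top_file(index, "how do I connect to the database")` picks `db.py`.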
We're bringing you the 2nd episode of the Stanford MLSys Seminar tomorrow.
@matei_zaharia
will talk about lessons from
@databricks
in building and deploying
@MLflow
.
Tune in at 3pm PT Thursday at (and join our mailing list at )!
A new tool in the Robustness Gym universe!
This work is prompted by a basic lesson we’re learning in the RG project: quantitative metrics are fuzzy measures of performance, and need to be complemented by interactive tools that enable deeper inspection. Both are important!
New blogpost on
@StanfordCRFM
:
What will it take to put models like GPT-X into software and not have to worry about insane behavior and bugs?
We discuss making foundation models a reliable software abstraction: new programming tools are going to be key!
Someone pointed me to this fragment from Jensen's Wired article -- amazing to see the support around SSMs (and really cool that he's so technically plugged in)
Come by our Model Patching poster at
@iclr_conf
today!
We describe how data augmentation with a domain-translation model, combined with robust training, can improve worst-case performance.
Talk/Poster Link:
Time: Today (Monday 5/3) 5-7pm PT [Spot C1]
Wow, this went randomly viral and seems to have struck a chord.
In the spirit of self-promotion: check out our work on making ML models more robust.
Video:
Arxiv preprint:
Very excited about the future of AI!
A very short blog post on 3 directions for data tools I’m personally excited about in the era of GPT-4.
We’re working on these in
@MeerkatML
(stay tuned for something cool coming soon!)
I miss the pre-mid-2022 days when my Twitter was a daily digest of ML research preprints
Now you can’t go 3 tweets without somebody trying to teach you a new incantation to yell into the magic box — it’s the AI equivalent of drugs and vegetables
We've crowdsourced a ton of contributions to so far!
You can now get a broad overview of Data Centric AI there -- we've got discussion on weak supervision, self supervision, robustness, data augmentation, privacy, data selection, and more.
Check it out!!
Excited to see the RedPajama dataset released: check out the
@MeerkatML
data exploration dashboard we put together in a collaboration with
@togethercompute
as part of this release 🚀
We’ll continue to update and add to that in the RedPajama repo!
Announcing RedPajama — a project to create leading, fully open-source large language models, beginning with the release of a 1.2 trillion token dataset that follows the LLaMA recipe, available today!
More in 🧵 …
@AstleDsa
SSMs generally crush on data derived from continuous signals -- we've observed this consistently across many applications and modalities (audio, video, EEG, EKG, other time series).
Lots more to learn and improve here
Delighted to announce that our paper (with
@AIforHI
) on “Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure” has been accepted to ICLR 2019!
ChatGPT is pretty cool. Braindump:
It might make mistakes in reasoning and knowledge retrieval, but this isn't worth overindexing on in my opinion. It's certain to improve quickly, although it's good to know what isn't working quite as well yet
We’ve got new work out, appearing at NeurIPS this year!
We extended S4 beyond sequences to handle images and videos.
Our new S4ND layer is a drop-in replacement for ConvND in any architecture!
With all the demos flowing for GPT-3, I thought it would be fun to speculate about what this means for the future of user interfaces.
I haven't blogged before, but I decided to take the plunge (it's short).
GPT-3 & The Natural Language Programmer
Amidst a barrage of great work released at frenetic pace, it's easy to feel like there's nothing left for you to do (esp. in academia).
I rarely worry about this now -- a trick I use is to imagine myself 3 years ago and then think about all the cool shit that has happened since.
Super excited to get this grant with
@HazyResearch
and Sharon Li on new directions for robust machine learning systems. Shout out to
@StephanZheng
and
@nazneenrajani
for their support!
We're thrilled to announce this year's
@SFResearch
Deep Learning Grant winners
@ChenhaoTan
@gregd_nlp
@pulkitology
Christopher Ré and Hung-yi Lee! 🎉👏 We're excited to work together to advance the state of AI. Read more about the winning proposals:
Committed to UC Berkeley over Duke. Hardest decision of my life thus far. Here’s to hoping I get out of this alive (and with all my limbs intact). Go Bears! 🐻
Check out Mistral, our code base for training large LMs.
We’ve also released multiple random seeds, 600+ checkpoints per run for GPT-2 Small and Medium
We're excited to open-source Mistral 🚀 - a codebase for accessible large-scale LM training, built as part of Stanford's CRFM ().
We're releasing 10 GPT-2 Small & Medium models with different seeds & 600+ checkpoints per run!
[1/4]
My (17 yr old) brother just released his first product! It’s a Chrome extension that improves your exposure to news stories from other points of view. The design is great and it runs the latest and greatest NLP models for news recs.
Download and review!
Unslant, my browser extension to surface ideologically contrasting takes on the political news you’re reading, is live on
@ProductHunt
! First product release ever, can’t wait to see where this goes 😃
Excited to see this report on foundation models go out today, where I co-authored the data section led by
@laurel_orr1
:
Huge credit to
@percyliang
and
@RishiBommasani
for orchestrating this and making sure each section hit a pretty stringent quality bar.
Our new ICML paper Mandoline, where we tell you how to calculate metrics on unlabeled data using importance weighting.
The key idea is using domain knowledge from users to simplify the weighting. Think of it as human-in-the-loop model monitoring!
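A rough sketch of the importance-weighting idea (not Mandoline's actual implementation): tag each example with a user-defined slice, reweight the labeled source examples so that slice proportions match the unlabeled target's, then average correctness under those weights.

```python
# Slice-based importance weighting, sketched. Assumes every target slice
# also appears in the source set.
from collections import Counter

def slice_weights(source_slices, target_slices):
    # w(g) = P_target(slice g) / P_source(slice g)
    src, tgt = Counter(source_slices), Counter(target_slices)
    return {g: (tgt[g] / len(target_slices)) / (src[g] / len(source_slices))
            for g in src}

def weighted_accuracy(correct, source_slices, weights):
    total = sum(weights[g] for g in source_slices)
    hits = sum(weights[g] for ok, g in zip(correct, source_slices) if ok)
    return hits / total

src_slices = ["short", "short", "long", "long"]
tgt_slices = ["short", "long", "long", "long"]  # target is "long"-heavy
correct = [True, True, True, False]             # model is weaker on "long"
w = slice_weights(src_slices, tgt_slices)
est = weighted_accuracy(correct, src_slices, w)
```

Unweighted source accuracy is 0.75, but the target skews toward the weaker "long" slice, so the weighted estimate drops to 0.625.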
At 10:35 on Wed (Dec 5)
@krandiash
, Tong Mu,
@turingmusician
and Emma Brunskill will have a demo on “Automatic Curriculum Generation Applied to Teaching Novices a Short Bach Piano Segment” in Room 510 ABCD # D10
It's been awesome working with
@vipulved
,
@ce_zhang
and all the folks at Together as we bring our new models to production. They have an incredible team building the new-age infrastructure for gen AI.
Looking forward to partnering more closely on future model releases.
Yesterday,
@Cartesia
released Sonic, a blazing fast lifelike generative voice model and API.
We are thrilled to partner with them to achieve the fastest text-to-voice generations in the market with 135 ms model latency 🚀 to provide real-time inference to their users with high
If you missed our Meerkat tweets, check out this sick demo from
@EyubogluSabri
to do flash fill in a notebook:
This takes GPT-3 and does excel-style flash fill on PDFs: mixes interaction, complex data (PDFs) and LLMs in one place!
Built in pure Python🐍
Incredibly excited about our work on structured state spaces!
@_albertgu
did a spectacular job bringing the theory together elegantly, give this thread a read to learn more about how it compares to other models eg. transformers
(1/n)
Excited to release 2 preprints that describe our progress on sequence modeling for long-range dependencies!
(NeurIPS ‘21)
We build a new class of state space models that improve perf. on the Long Range Arena by 20 points!
I’m bullish on voice design and mixing, like the one in
@cartesia_ai
Sonic
It’s super fun to create new voices and it solves a safety problem. You can imagine endless synthetic voices that don’t belong to anyone
This is going to be key to building great audio experiences
The voice mix feature is pretty sweet: with a few tweaks you can get any voice you want… with the clone feature it could get a bit wild quickly. Imagine calendar app reminders spoken by a voice clone of whoever scheduled the event: “looking forward to catching up tomorrow”
Cartesia Chief Scientist
@_albertgu
teamed up with Together Chief Scientist
@tri_dao
to release a new 3B Mamba text model trained on the SlimPajama dataset, in a close collaboration with Cartesia &
@togethercompute
.
Read more on our blog:
Playing with
@cartesia_ai
and I’m super impressed!
The voices feel natural and more human-like than anything I’ve seen before.
Check out the demo! There are two interesting moments around 1:06 and at the end – not sure what happened there 🤪
Still measuring things but
Check out our data-centric AI post on the SAIL blog!! We've had 30+ people contribute to the Github already!
Especially interested in gathering case studies on real applications where data-centric approaches may have helped -- so reach out to me if you have leads!
Now on the SAIL blog:
@hazyresearch
and
@krandiash
write a retrospective on the Hazy lab's journey in data-centric AI, and a new effort to build a community resource for data-centric AI on GitHub ().
Check it out!
Check out the Stanford AI blogpost by
@SharonYixuanLi
, featuring some of the recent work from
@HazyResearch
. Features a sneak preview of our most recent effort on “model patching” — making ML models robust with data augmentation!
Wrote a blog on automating the art of data augmentation, featuring latest works on the practice, theory and new direction of data augmentation from
@HazyResearch
. Check out on
@StanfordAILab
website:
SSMs go brrrr
super excited to announce the first model powered by our latest research into efficient architectures 👀
stay tuned for more details soon!
We’ve got a few explainers and preprints coming out around S4 soon — starting with this!
This blogpost walks through and explains S4 using basic concepts in calculus and signals
Read it and be rid of Transformers forever
S4 is an amazing sequence model - but has seemed mysterious. It doesn't have to be!
In this blog (originally an internal explainer for our group),
@HazyResearch
looks at S4 from first principles that are familiar to most sophomore engineering students.
🚀 Demo: 🖼️ Custom Visualizations in
@MeerkatML
This is super cool: a fully interactive viz in your notebook cell in 7 lines of code.
Plug in any embeddings and works on any data type out-of-the-box: images, text, audio, <your datatype>!
This demo shows this for audio data!
Foundation models have enabled amazing human-in-the-loop systems (ChatGPT, Copilot, ++). How can we bring them to bear on important batch computing tasks (like information extraction) - where we need efficiency and reliability at scale?
Early thoughts at:
Very happy to have received a seed grant from
@StanfordHAI
alongside Peyton Greenside (CompBio+ML), Dan Imler (EmergencyMed) and Sumit Bhargava (Pediatrics). My first time applying for a grant — look forward to seeing what we achieve!
The MLSys seminar series is back from a hiatus this Thursday at a new time (1.35-2.30pm PT) --
If you aren't on our mailing list already, here's the link:
If you're at Stanford, you can take it as a class for credit (CS528)
Something I’m very curious about is how existing technologies and systems will be redesigned to accommodate AIs interacting with them.
Example: writing websites so an AI can read and manipulate source code to fully customize the experience to every user.
🤗
@MeerkatML
now has super good
@huggingface
datasets integration!
Our data frames (arrow-backed) can load in any HF dataset (like audio in this gif!) and give you a much nicer interactive experience in a notebook -- we're multimodal from the ground up 🚀
Now I can take a picture of my fridge and automatically get Instacart groceries, with a tailored set of recipes delivered to my doorstep.
Only to spend the rest of the week manually doordashing food instead
Great thread! A reminder to assume good intentions, do your homework and listen/engage constructively.
The best researchers I’ve worked with weren’t just experts, but also struck me as extraordinarily empathetic and thoughtful. Lucky to be surrounded by them at Stanford.
Over the past few weeks, both
@timnitGebru
and
@adjiboussodieng
have taken considerable heat for challenging others in AI to be more fair and equitable. Perhaps I can share a brief story and analogy to help others understand the situation more clearly.
1/n Stop by our poster today at
#ICLR
(BC
#39
, 4.30pm on) to learn more about our work on "Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure".
Joint work with
@EmmaBrunskill
in the Stanford
@AIforHI
lab!
Slides from my invited talk at the Data Quality Assessment for ML workshop at KDD '21 are up:
Overviews our work from the past few months: check it out and give us feedback!
What if building a web app was a much more live, interactive process?
Here's Smoothie, my experiment in adapting notebooks to full-stack dev. It allows you to wire together inputs, buttons, Markdown, and "actions" (Python, JS, and LLM queries) to build apps. 🧋 (More below.)
Say hello to 🧠 MRI support in
@MeerkatML
!
You can now visualize 3D MRIs, segmentations, all that fun stuff in a few lines of Python!
And… you can even stick this into a dashboard and send it to your clinician for annotation 🧑🎨
Want to visualize your large medical imaging datasets? Now you can with
@MeerkatML
!
Check out this custom interface for visualizing *all* 3D 🧠 MRIs from
@BraTS_challenge
. Supports reformats, segmentation overlays 🎨, and more!
All with <10 lines of code in native Python 🐍
Generating Long Videos of Dynamic Scenes
Demonstrates SotA performance in producing long videos with realistic motion and changes in content by leveraging a two-phase training strategy.
proj:
abs:
Our work on SpaceTime, led by
@mzhangio
,
@KhaledSaab11
is out!
Brings a new architecture to time series data: an area that’s been a little bit neglected in the LLM era.
If you’re excited about applications of this model, talk to
@mzhangio
LLMs today are amazing, but how to get time series foundation models?
@ICLR2023
we share some ideas w SpaceTime🌌
➡️New architecture for SoTA forecasting, classification
➡️Comes w expressive modeling; fast + flexible decoding; long-context
Key ideas, paper, code, fun demo👇
The question I'm asking is: how hard is it to go from "I want to solve this classification task but have no data" -> "generate dataset for the task" -> "train model"?
I did this for NLI, and managed to get a pretty decent model using *only* examples generated by GPT-3.
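A minimal sketch of that pipeline, with `complete` as a stand-in for the real LLM API call (the prompt format and names here are illustrative, not the ones I actually used):

```python
# Sketch of the "generate a dataset with an LLM, then train on it" loop.
# `complete` stands in for a real LLM API call; prompts are illustrative.
import random

LABELS = ["entailment", "neutral", "contradiction"]  # NLI label set

def make_prompt(label):
    return (f"Write a premise and hypothesis whose relation is {label}. "
            f"Format: premise ||| hypothesis")

def generate_dataset(n, complete):
    data = []
    for _ in range(n):
        label = random.choice(LABELS)
        premise, hypothesis = complete(make_prompt(label)).split(" ||| ")
        data.append({"premise": premise, "hypothesis": hypothesis,
                     "label": label})
    return data  # then fine-tune a classifier on this
```

Conditioning the prompt on the label gives a balanced synthetic training set for free, which is most of what the experiment needed.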
Subtleties aside, the biggest takeaway for me from S4 was that people need to stop being attention maximalists. Maybe we’ll be strapping Transformer blocks to our brains in 20 years, who knows, but this other stuff out there may level the playing field in a year or two
"What Makes Convolutional Models Great on Long Sequence Modeling?"
CNNs—not transformers—now dominate the hardest sequence modeling benchmark.
Here's how this happened: [1/14]
After a brief hiatus, the Stanford MLSys seminar is back for the winter quarter! 🎇
Today, we'll be livestreaming with
@SongHan_MIT
on TinyML and reducing AI's carbon footprint ().
Join us at 12pm PT!
Fox News’ Peter Doocy uses all his time at the White House press briefing to ask about an assessment that “literally everyone on Earth will die” because of artificial intelligence:
“It sounds crazy, but is it?”