Jason Yim
@json_yim
Followers
2K
Following
624
Media
21
Statuses
317
PhD student @MIT_CSAIL. Generative models, protein design. 🦋 Bluesky handle: https://t.co/MPlEjog02q On X until the exodus is complete.
Cambridge, MA
Joined September 2017
Combining discrete and continuous data is an important capability for generative models. To address this for protein design, we introduce Multiflow, a generative model for structure and sequence generation. Preprint: https://t.co/wuj9l5sTLc Code: https://t.co/IwIoC74Odm 1/8
2
99
468
🚨🚨🚨 Now your Masked Diffusion Model can self-correct! We propose PRISM, a plug-and-play fine-tuning method that adds self-correction ability to any pretrained MDM! (1/N)
6
42
254
(1/5) Beyond Next-Token Prediction, introducing Next Semantic Scale Prediction! Our @NeurIPSConf NeurIPS 2025 paper HDLM is out! Check out the new language modeling paradigm: Next Semantic Scale Prediction via Hierarchical Diffusion Language Models. It largely generalizes
7
55
343
We introduce a new "rule" for understanding diffusion models: Selective Underfitting. It explains: 🚨 How diffusion models generalize beyond training data 🚨 Why popular training recipes (e.g., DiT, REPA) are effective and scale well Co-led with @kiwhansong0! (1/n)
8
64
419
It's odd that the performance is significantly worse than the AR base model's. Starting with a much more powerful AR model, dropping its performance just enough to beat all other diffusion LLMs, and then saying it's better than them is weird...
More on RND1 models: Blog: https://t.co/VGHEu7J98P Code: https://t.co/rqUmMDsC2Q Report: https://t.co/JlnejayKV2 Weights: https://t.co/3pc1NngnmF
1
0
16
New work: “GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models”. GLASS generates images by sampling stochastic Markov transitions with ODEs - allowing us to boost text-image alignment for large-scale models at inference time! https://t.co/unsuG3mYer [1/7]
4
60
248
We've cleaned up the story big time on flow maps. Check out @nmboffi's slick repo implementing all the many ways to go about them, and stay tuned for a bigger release 🤠 https://t.co/7WygKSpbZP
https://t.co/Juucy5l844
Consistency models, CTMs, shortcut models, align your flow, mean flow... What's the connection, and how should you learn them in practice? We show they're all different sides of the same coin connected by one central object: the flow map. https://t.co/QBp1kELVhF 🧵(1/n)
2
22
132
Very excited to share our preprint: Self-Speculative Masked Diffusions We speed up sampling of masked diffusion models by ~2x by using speculative sampling and a hybrid non-causal / causal transformer https://t.co/6e37sx8Cbu w/ @ValentinDeBort1 @thjashin @ArnaudDoucet1
2
39
186
🎉Personal update: I'm thrilled to announce that I'm joining Imperial College London @imperialcollege as an Assistant Professor of Computing @ICComputing starting January 2026. My future lab and I will continue to work on building better Generative Models 🤖, the hardest
98
34
625
We've open sourced Adjoint Sampling! It's part of a bundled release showcasing FAIR's research and open source commitment to AI for science. https://t.co/6oBTnael8p
https://t.co/rYmJ02KguC
github.com
code for "Adjoint Sampling: Highly Scalable Diffusion Samplers via Adjoint Matching" - facebookresearch/adjoint_sampling
Announcing the newest releases from Meta FAIR. We’re releasing new groundbreaking models, benchmarks, and datasets that will transform the way researchers approach molecular property prediction, language processing, and neuroscience. 1️⃣ Open Molecules 2025 (OMol25): A dataset
1
23
117
#FPIworkshop best paper award goes to @peholderrieth @msalbergo and Tommi Jaakkola. Congrats and great talk Peter!
0
9
55
I won't be at ICLR 🥲 but you can talk to these other cool people at my poster, Thursday 3-5:30 PM in Hall 3+2B #10!
Excited to share my #ICLR2025 paper, with JC Hütter and friends! Genetic perturbation screens allow biologists to manipulate and measure the genes in cells = discover causal relationships! BUT they are expensive to run, expensive to interpret. ... We use LLMs to help!
0
3
26
Had fun exploring guidance for backbone designability within this latent framework, excited to chat more about guidance with experimental data @gembioworkshop ICLR
I'll be at the ICLR @gembioworkshop workshop presenting latent and structure diffusion for protein backbone generation. Come by to talk all things latent for biology. https://t.co/r3Fqg47NZT
https://t.co/SkdJ82syqA
0
3
14
I'll be at the ICLR @gembioworkshop workshop presenting latent and structure diffusion for protein backbone generation. Come by to talk all things latent for biology. https://t.co/r3Fqg47NZT
https://t.co/SkdJ82syqA
arxiv.org
We propose a hierarchical protein backbone generative model that separates coarse and fine-grained details. Our approach called LSD consists of two stages: sampling latents which are decoded into...
1
12
71
I'll be at ICLR. Come check out our generative modeling work! Reach out if you want to chat. Proteina: https://t.co/ZZXeekrfqp Protcomposer: https://t.co/Ijb7d1VQ6p Generator matching:
New paper out! We introduce “Generator Matching” (GM), a method to build GenAI models for any data type (incl. multimodal) with any Markov process. GM unifies a range of state-of-the-art models and enables new designs of generative models. https://t.co/6BTkr3ukYc (1/5)
1
12
56
RFdiffusion => generative binder design. RFdiffusion2 => generative enzyme design. It's rare to find scientists with deep knowledge in chemistry, machine learning, and software engineering like Woody. The complexity of enzymes matches the complexity of his skills. Check out RFD2
New enzymes can unlock chemistry we never had access to before. Here, we introduce RFdiffusion2 (RFD2), a generative model that makes significant strides in de novo enzyme design. Preprint: https://t.co/cAWGkrSxBo Code: coming soon Animation credit: https://t.co/Th9ZjsYeX2 (1/n)
1
33
146
New enzymes can unlock chemistry we never had access to before. Here, we introduce RFdiffusion2 (RFD2), a generative model that makes significant strides in de novo enzyme design. Preprint: https://t.co/cAWGkrSxBo Code: coming soon Animation credit: https://t.co/Th9ZjsYeX2 (1/n)
13
76
260
Excited to share our preprint “BoltzDesign1: Inverting All-Atom Structure Prediction Model for Generalized Biomolecular Binder Design” — a collaboration with @MartinPacesa, @ZhidianZ , Bruno E. Correia, and @sokrypton. 🧬 Code will be released in a couple weeks
15
114
450
Protein dynamics was the first research to enchant me >10yrs ago, but I left it during my PhD because I couldn't find big experimental data to evaluate models. Today w @ginaelnesr, I'm thrilled to share the big dynamics data I've been dreaming of, and the model we trained: Dyna-1. https://t.co/ZawXaGdCVw
4
37
170
Combining prediction, generation, and modalities (sequence, structure, nucleic acids, small molecules, proteins) is the future. Congrats to the authors! Looking forward to the technical report.
Announcing Neo-1: the world’s most advanced atomistic foundation model, unifying structure prediction and all-atom de novo generation for the first time - to decode and design the structure of life 🧵(1/10)
1
8
74
Introducing All-atom Diffusion Transformers — towards Foundation Models for generative chemistry, from my internship with the FAIR Chemistry team @OpenCatalyst @AIatMeta There are a couple ML ideas which I think are new and exciting in here 👇
4
106
529