Soojung Yang Profile
Soojung Yang

@SoojungYang2

Followers: 960 · Following: 1K · Media: 3 · Statuses: 202

AI+simulation for protein dynamics! PhD student @ MIT @RGBLabMIT / prev intern @MSFTResearch / website: https://t.co/xHAeCEbNpO

Joined September 2020
@SoojungYang2
Soojung Yang
10 days
Check out our new preprint on converging representations in models of matter! https://t.co/gVmHlddq6e TLDR: Chemical foundation models (e.g., MLIPs) with very different architectures/tasks become increasingly **internally aligned** as they improve. See Sathya’s visual summary 👇
arxiv.org
Machine learning models of vastly different modalities and architectures are being trained to predict the behavior of molecules, materials, and proteins. However, it remains unclear whether they...
@sathyaedamadaka
Sathya Edamadaka
10 days
Scientific foundation models are converging to a universal representation of matter. Come chat with us at #NeurIPS! We (@SoojungYang2 @RGBLabMIT) have an oral spotlight at the #NeurIPS #UniReps workshop and will also poster at #AI4Mat. 🖇️: https://t.co/H8ZRlpvJUj 🧵(1/5)
@SoojungYang2
Soojung Yang
8 days
@unireps @sathyaedamadaka And posters: @ SPIGM, "Bayesian Optimization for Biochemical Discovery with LLMs"; @ AI4Mat, "Universally Converging Representations of Matter Across Scientific Foundation Models". Hit me up for a coffee chat!
@SoojungYang2
Soojung Yang
8 days
NeurIPS⛵️ Find me here on 12/6!
Oral @unireps, S1: "Universally Converging Representations of Matter Across Scientific Foundation Models"; @sathyaedamadaka will present our work!
Oral @AI4Mat, 4:10PM: "Guided Diffusion for A Priori Transition State Sampling"; I'll be presenting.
@SoojungYang2
Soojung Yang
10 days
This motivated my 2024 work ( https://t.co/YreWI84Wch), where I showed that models across both input modalities strongly align and share similar statistical fingerprints (e.g., cysteines are consistently encoded with the lowest intrinsic dimensions).
openreview.net
Protein foundation models produce embeddings that are valuable for various downstream tasks, yet the structure and information content of these embeddings remain poorly understood, particularly in...
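The "intrinsic dimension" fingerprint mentioned above can be estimated from embeddings alone. The excerpt doesn't name the estimator used in the paper; the two-nearest-neighbor (TwoNN) method of Facco et al. is one standard choice. A minimal sketch on synthetic data, with random matrices standing in for real residue embeddings:

```python
import numpy as np

def twonn_id(X):
    """TwoNN intrinsic dimension estimator (Facco et al. 2017).
    Uses only mu = r2/r1, the ratio of each point's distances to its
    two nearest neighbors; returns the MLE len(X) / sum(log mu)."""
    sq = (X ** 2).sum(axis=1)
    # squared pairwise distances via the Gram-matrix trick
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    d = np.sqrt(d2)
    d.sort(axis=1)                 # column 0 is the self-distance (0)
    mu = d[:, 2] / d[:, 1]         # r2 / r1 for every point
    return len(X) / np.log(mu).sum()

rng = np.random.default_rng(0)
# points on a 3-D patch, linearly embedded in 64 ambient dimensions
Z = rng.uniform(size=(1500, 3)) @ rng.normal(size=(3, 64))
print(twonn_id(Z))   # ≈ 3: the intrinsic, not the ambient, dimension
```

A low TwoNN estimate for cysteine embeddings would mean the model confines that residue type to a low-dimensional patch of the embedding space, which is the kind of statistical fingerprint the tweet describes.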
@SoojungYang2
Soojung Yang
10 days
@sathyaedamadaka It’s time to look more closely at what foundation models are actually learning internally. Representation will be the key to transferring, integrating, and interpreting knowledge across scientific foundation models. I’m heading to NeurIPS workshops. Excited for more discussion🤩
@SoojungYang2
Soojung Yang
10 days
Now with @sathyaedamadaka, we took this much further in one year. We examined 59 chemical models spanning molecules, polymers, and crystalline materials, and found even stronger evidence that models with vastly different architectures/tasks converge to aligned representations.
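The excerpt doesn't spell out how alignment across 59 models with different architectures is measured. A common metric for comparing representations across architectures is linear centered kernel alignment (CKA), which is invariant to orthogonal transforms and scaling of the features. A minimal numpy sketch, with random matrices standing in for real model embeddings of the same inputs:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between embedding matrices
    X (n, d1) and Y (n, d2) computed for the same n inputs.
    Returns a value in [0, 1]; higher means more aligned."""
    X = X - X.mean(axis=0)          # center each feature
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = (np.linalg.norm(X.T @ X, "fro") *
           np.linalg.norm(Y.T @ Y, "fro"))
    return num / den

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 64))       # "model 1" embeddings
B = A @ rng.normal(size=(64, 32))    # linear transform of A
C = rng.normal(size=(200, 32))       # unrelated embeddings
print(linear_cka(A, B))  # substantially higher than...
print(linear_cka(A, C))  # ...for independent random features
```

Comparing such scores between model pairs, as models improve, is one way to operationalize "increasingly internally aligned."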
@SoojungYang2
Soojung Yang
10 days
In the protein community, we’ve known that scaling up protein language models allows structural information to emerge from evolutionary sequence data. That led me to wonder: what if protein language and structure models are converging to the same internal statistical space?
@ginaelnesr
Gina El Nesr
26 days
looking forward to being in Boston this week & hope to see you at BPDMC 🤗🔬
@ProteinBoston
Boston Protein Design and Modeling Club
1 month
We can't wait for Wednesday, November 19th 2025 to arrive, because we have @ginaelnesr from @PossuHuangLab speaking at 7pm EST in Room 6055, Longwood Center, @DanaFarber: "Learning millisecond protein dynamics from what is missing in NMR spectra" https://t.co/E8bDyGzQgJ
@leonklein26
Leon Klein
1 month
(1/n) Can diffusion models simulate molecular dynamics instead of generating independent samples? In our NeurIPS2025 paper, we train energy-based diffusion models that can do both:
- Generate independent samples
- Learn the underlying potential 𝑼
🧵👇 https://t.co/TSurVY3YEl
@FrankNoeBerlin
Frank Noe
3 months
Thank you UW–Madison for this great honor! In 2016 I changed my group to AI4Science and went all-in on deep learning solutions for the molecular sciences. Best decision of my career in hindsight. Amazing what's become of that field - now AI for Science is a @MSFTResearch lab.
@MSFTResearch
Microsoft Research
3 months
Congratulations to Frank Noé, Partner Research Manager at Microsoft Research AI for Science, who received the 2025–2026 Joseph O. Hirschfelder Prize in Theoretical Chemistry for pioneering contributions at the interface of AI and chemistry. https://t.co/FMCPmbfAW1
@xie_tian
Tian Xie
5 months
Want to join our efforts @MSFTResearch AI for Science to push the frontier of AI for materials? We are the team behind MatterGen & MatterSim and we have 2 job openings! Each can be in Amsterdam, NL, Berlin, DE, or Cambridge, UK. It is a rare opportunity to join a highly talented,
@CecClementi
Cecilia Clementi
5 months
Our development of machine-learned transferable coarse-grained models is now in Nature Chemistry! https://t.co/HGngd8Vpop I am so proud of my group for this work! Particularly first authors Nick Charron, Klara Bonneau, @sayeg84, Andrea Guljas.
nature.com
Nature Chemistry - The development of a universal protein coarse-grained model has been a long-standing challenge. A coarse-grained model with chemical transferability has now been developed by...
@SoojungYang2
Soojung Yang
5 months
@genbio_workshop @LucasPinede @junonam_ @RGBLabMIT @sihyun_yu Unfortunately both @LucasPinede and I won't be at ICML in person this year. But our poster will be at the workshop 😂 Feel free to DM us or email our lead author (lucasp at https://t.co/Zjbm3tdReJ) about the paper! Preprint coming very soon, please tune in! (4/n, N=4)
@SoojungYang2
Soojung Yang
5 months
@genbio_workshop @LucasPinede @junonam_ @RGBLabMIT At low noise, emulator scores ≈ physical forces. Even w/o alignment loss, emulator embeddings drift toward MLIP's. Inspired by @sihyun_yu’s REPA, we add a loss to encourage this — and training becomes faster & better. (3/n)
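A minimal sketch of the REPA-style alignment idea described above: project the emulator's hidden states through a learned matrix and penalize low cosine similarity with the frozen MLIP's embeddings of the same atoms. All shapes and names here are illustrative placeholders, not the paper's actual implementation:

```python
import numpy as np

def alignment_loss(h_emulator, h_mlip, W):
    """REPA-style auxiliary loss (hypothetical sketch): project
    emulator hidden states (n_atoms, d_emu) with W (d_emu, d_mlip)
    and maximize cosine similarity with frozen MLIP embeddings."""
    proj = h_emulator @ W                       # (n_atoms, d_mlip)
    cos = (proj * h_mlip).sum(-1) / (
        np.linalg.norm(proj, axis=-1) * np.linalg.norm(h_mlip, axis=-1))
    return -cos.mean()     # minimized when projections align

rng = np.random.default_rng(0)
h_mlip = rng.normal(size=(32, 128))       # frozen MLIP embeddings
W = rng.normal(size=(256, 128))           # tall, full column rank
# emulator states constructed so that h_aligned @ W == h_mlip
h_aligned = h_mlip @ np.linalg.pinv(W)
h_random = rng.normal(size=(32, 256))
print(alignment_loss(h_aligned, h_mlip, W))  # near -1 (aligned)
print(alignment_loss(h_random, h_mlip, W))   # near 0 (unaligned)
```

In training, this term would be added to the emulator's main objective so gradients nudge its internal states toward the MLIP's, which is consistent with the drift the tweet reports happening even without the loss.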
@SoojungYang2
Soojung Yang
5 months
@genbio_workshop @LucasPinede @junonam_ @RGBLabMIT Boltzmann emulators can sample conformations cheaper than MD but are expensive to train. They need equilibrium data, which is very hard to get. But… MLIPs like MACE are trained on tons of noneq. data. What if we transfer knowledge from force models to generative models? (2/n)
@SoojungYang2
Soojung Yang
5 months
🚀 Come check our poster at ICML @genbio_workshop! We show that pretrained MLIPs can accelerate training of Boltzmann emulators — by aligning their internal representations. Coauthors @LucasPinede, @junonam_, @RGBLabMIT (1/n)
@satyanadella
Satya Nadella
5 months
Understanding protein motion is essential to understanding biology and advancing drug discovery. Today we’re introducing BioEmu, an AI system that emulates the structural ensembles proteins adopt, delivering insights in hours that would otherwise require years of simulation.
@MSFTResearch
Microsoft Research
5 months
Today in the journal Science: BioEmu from Microsoft Research AI for Science. This generative deep learning method emulates protein equilibrium ensembles – key for understanding protein function at scale. https://t.co/WwKjj5B0eb
@SoojungYang2
Soojung Yang
5 months
Happy to have contributed to this awesome project!
@MSFTResearch
Microsoft Research
5 months
Today in the journal Science: BioEmu from Microsoft Research AI for Science. This generative deep learning method emulates protein equilibrium ensembles – key for understanding protein function at scale. https://t.co/WwKjj5B0eb