Rex Ma
@RexMa9
Followers 112 · Following 778 · Media 8 · Statuses 50
CS PHD student @ UToronto | AI for biology
Toronto, Ontario
Joined April 2018
1/ Really excited to see that IntegrAO (Integrate Any Omics) 🧬, a multi-omics integration framework, has been published in Nature Machine Intelligence! 🚀 🗒️ Paper: https://t.co/StEwDfKpea 💻 Code: https://t.co/EPgepiFh3p
Excited to release BoltzGen which brings SOTA folding performance to binder design! The best part of this project has been collaborating with many leading biologists who tested BoltzGen at an unprecedented scale, showing success on many novel targets and pushing its limits! 🧵..
@xingyuchen67 Grateful to all my incredible collaborators @xingyuchen67 @LLawrenceLin @JasonLinjc and my supervisor @BoWang87 for their guidance, support, and inspiration throughout this journey! @VectorInst @UofTCompSci @UHN_Research
@xingyuchen67 • Evaluated on human enhancer & promoter datasets across 6 cell types. • Consistently outperforms evolutionary, generative, and RL baselines, improving specificity, motif correlation and diversity.
@xingyuchen67 Ctrl-DNA transforms gLMs into constraint-aware designers: enabling specificity targeting, minimizing off-target expression, and generating sequences aligned with true TF motifs. This contribution makes gLMs not just predictive, but practical tools for controllable DNA design!
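Conceptually, the constrained-RL objective behind a design like this can be approximated with a Lagrangian-style scalarization: reward on-target activity, and subtract a multiplier times any off-target activity above a tolerance. A toy sketch (function name, tolerance, and multiplier are illustrative, not Ctrl-DNA's actual reward):

```python
def constrained_reward(target_expr, offtarget_exprs, tol=0.2, lam=5.0):
    # Lagrangian-style scalarization: maximize predicted expression in the
    # target cell type while penalizing off-target expression above `tol`.
    violation = sum(max(0.0, e - tol) for e in offtarget_exprs)
    return target_expr - lam * violation
```

A sequence scoring 0.9 on-target with all off-target levels under the tolerance keeps its full reward; any excess is penalized in proportion to the violation.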
Excited to share that Ctrl-DNA, our constrained RL + Genomic Language Model system for cell-type–specific regulatory DNA design, co-led with @xingyuchen67, was accepted as NeurIPS 2025 Spotlight (top 3.2%) 🧬✨ Paper: https://t.co/kZHZ4YFcdA Code: https://t.co/wO42Qv2chY
Ctrl-DNA, our constrained RL + Genomic Language Model system for cell-type–specific regulatory DNA design, was accepted as a @NeurIPSConf 2025 Spotlight (top 3.2%) 🧬✨ Paper: https://t.co/anDOO3xg9m Code: https://t.co/CQ9FPisCEk TL;DR We fine-tune DNA GLMs with a constrained RL
🧬 We have many foundation models or language models for DNAs, but can we control them? We introduce Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL — a reinforcement learning framework for controllable cis-regulatory sequence generation.
We are excited to share GPN-Star, a cost-effective, biologically grounded genomic language modeling framework that achieves state-of-the-art performance across a wide range of variant effect prediction tasks relevant to human genetics. https://t.co/FTm3byYp67 (1/n)
Excited to see BioReason accepted to @NeurIPSConf 2025! Grateful to all my incredible collaborators!
Our BioReason was accepted at @NeurIPSConf 2025!! BioReason is the first reasoning model built on a biological foundation model (DNA-LLM) — an AI system that can reason across multimodal biological data through language 🧬🤖 This marks a key step toward AI systems that think
Latest genomic AI models report near-perfect prediction of pathogenic variants (e.g. AUROC>0.97 for Evo2). We ran extensive independent evals and found these figures are true, but very misleading. A breakdown of our new preprint: 🧵
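For context, AUROC is purely a ranking metric: the probability that a randomly chosen positive (pathogenic) variant scores above a randomly chosen negative (benign) one. A near-perfect value can coexist with a misleading benchmark when the negatives are easy to separate. A minimal rank-based sketch of the metric itself:

```python
def auroc(scores, labels):
    # AUROC via the Mann-Whitney U statistic: fraction of
    # positive/negative pairs where the positive outranks the
    # negative (ties count half).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```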
Between linear probing and full fine-tuning, I believe linear probing better reflects the intrinsic quality of learned embeddings. Full fine-tuning introduces many confounding factors (model size, learning rate) that can heavily influence final performance. What are your thoughts?
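Concretely, "linear probing" here means freezing the backbone, extracting embeddings once, and fitting only a linear head on top. A minimal ridge-style sketch in NumPy (the embedding arrays are stand-ins for a real gLM's output; the function name is illustrative):

```python
import numpy as np

def linear_probe(train_emb, train_y, test_emb, l2=1e-3):
    # Fit a ridge-regularized linear head on frozen embeddings
    # (the backbone is never updated) and predict labels by sign.
    X = np.asarray(train_emb, dtype=float)
    y = np.asarray(train_y, dtype=float) * 2 - 1  # {0,1} -> {-1,+1}
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + l2 * np.eye(d), X.T @ y)
    return (np.asarray(test_emb, dtype=float) @ w > 0).astype(int)
```

Because only `w` is trained, probe accuracy is driven almost entirely by how linearly separable the frozen embeddings already are, which is why it is often read as a measure of representation quality.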
For zero-shot benchmarks, tasks typically differ from tuning-based evaluations, often focusing on variant effect prediction. Models that excel in fine-tuning can struggle in zero-shot settings.
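A common zero-shot setup for variant effect prediction: mask the variant position, take the model's nucleotide distribution there, and score the variant as a log-likelihood ratio between the alternate and reference alleles. A sketch (the probability dict stands in for a real gLM forward pass):

```python
import math

def zero_shot_llr(probs, ref, alt):
    # probs: the model's distribution over nucleotides at the masked
    # variant position, e.g. {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}.
    # A negative score means the alt allele is disfavored by the model,
    # which is typically read as evidence of deleteriousness.
    return math.log(probs[alt]) - math.log(probs[ref])
```

No task-specific head is trained, which is exactly why models that shine under fine-tuning can rank differently here.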
There are three common benchmarking setups for genomic/protein language models: zero-shot, linear probing (adapter tuning), and full fine-tuning. I've noticed rankings can vary significantly depending on the chosen method. Which approach best reflects a model's true capabilities?
Benchmarking gLMs across the central dogma of bio: Genomic Touchstone: 34 genomic LMs, 36 tasks, 88 datasets spanning DNA → RNA → protein I was kinda shocked to see models trained only on DNA can predict RNA mods & protein features better* than RNA/Protein LMs🤔
Introducing the world's first reasoning model in biology! 🧬 BioReason enables AI to reason about genomics like a biology expert. A thread 🧵:
How can we make genomic foundation models actually useful to biology?! Teach them to REASON!! 🧬 Excited to share BioReason - the first model to successfully integrate DNA foundation models (eg, Evo 2) with LLMs (eg, Qwen3) for biological reasoning! 🔬 What we built: • Novel
Conditional generation? Nah. Controllable generation!
🧬 We have many foundation models or language models for DNAs, but can we control them? We introduce Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL — a reinforcement learning framework for controllable cis-regulatory sequence generation.
Excited to see BoneMarrowMap out in press! Please check out the R package (now capable of classifying 100,000 hematopoietic cells in 10 minutes) and consider giving the paper a read for some insights into how differentiation goes wrong in AML!
Online now concurrent with #AACR25 #hematooncology Minisymposium talk by @andygxzeng and Dr. John Dick team @PMResearch_UHN: Single-cell transcriptional mapping reveals genetic and hierarchy-based determinants of aberrant differentiation in AML. https://t.co/gKNyA5rQpK
I am super excited for AI for scientific innovation, a direction that will certainly grow in the next five years. I think there will be two flavors of it. The first is “deepmind style”, where there is a very specific, important problem to solve (e.g., protein-folding), and you
🚀 Our IntegrAO is trending! We're thrilled to see IntegrAO featured in the latest Nature Machine Intelligence "News & Views" article by Alexander Schönhuth! 🎉: https://t.co/1OUMOVk988 Multi-omics modeling is crucial for precision medicine, yet challenges like data
DeepSeek-R1 is great, but it still does not handle genomics yet :) 🚀 We’re thrilled to announce that IntegrAO (pronounced "integral") 🧬🔬—our new multi-omics integration framework—has been published in Nature Machine Intelligence! 📑 Paper: https://t.co/6aidWd5PND 💻 Code:
🚀 Introducing scGPT-spatial! 🧬🌍 A game-changing spatial-omic foundation model, built on the powerful scGPT framework with MoE (mixture of experts) and continually pretrained on a massive 30 million spatial single-cell profiles! 🧠 What’s the challenge? Spatial
New preprint claims that most existing DNA language models perform just as well with random weights, suggesting that pretraining does nothing (Mistral & DNABERT-2 look like exceptions). We need better DNA language models.