Michael Hla

@hla_michael

Followers
1K
Following
1K
Media
14
Statuses
125

bio + cs | prev @harvard @shv

Joined July 2020
@hla_michael
Michael Hla
6 months
I taught an LLM to optimize proteins. It proposed a better carbon capture enzyme. Introducing Pro-1, an 8B param reasoning model trained using GRPO towards a physics-based reward function for protein stability. It takes in a protein sequence + text description + previous
93
334
3K
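The announcement above says Pro-1 was trained with GRPO against a stability reward. As a rough illustration of the group-relative step that gives GRPO its name, the sketch below normalizes each sampled completion's reward against the other samples for the same prompt. The function name, the reward values, and the use of the population standard deviation are assumptions for illustration only; this is not taken from the Pro-1 codebase, and the real reward comes from a physics-based stability score that is not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar per completion sampled for the same prompt.
    `eps` (an assumed small constant) avoids division by zero when every
    completion in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical stability rewards for four sampled completions of one prompt.
rewards = [1.0, 2.0, 3.0, 2.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's mean get a positive advantage and are reinforced, and those below get a negative one, so no separate value model is needed to provide a baseline.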
@776foundation
776 Foundation
2 months
Meet #climatefellow @hla_michael. He’s an independent researcher using AI to accelerate biology, starting with a model that designs enzymes for better carbon capture. His work bridges machine learning & molecular engineering to tackle climate at the cellular level.
1
12
24
@alexisohanian
Alexis Ohanian 🗽
2 months
Thrilled to announce our 2025 @776foundation Fellows! I’m giving each fellow $100k to tackle one of the biggest threats to humanity: Climate Change. Over the next two years, my Foundation will support these young trailblazers as they come up with innovative and forward-thinking
17
16
161
@hla_michael
Michael Hla
3 months
Thanks again to the @adaptyvbio team! Uploaded a csv of sequences and got super detailed assay results with no overhead. Would highly recommend
@adaptyvbio
Adaptyv Bio
3 months
Pro-1, a protein design model by @hla_michael, doesn’t just propose mutations — it explains why it made them. We tested 19 of its FGF-1 designs in our lab and 3 of them improved thermostability while maintaining binding. In this protein designer spotlight we explain how
0
0
15
@andrewwhite01
Andrew White 🐦‍⬛
3 months
I want to point out that over the last few weeks there has been other great work on building reasoning models in biology. Don't want to get stuck on defining what is reasoning/verified rewards - this is just cool work to highlight: @hla_michael did some amazing early work on
@andrewwhite01
Andrew White 🐦‍⬛
3 months
At FutureHouse, we’ve noticed scientific agents are good at applying average intelligence across tasks. They always seem to make the obvious choices, which is good, but discovery sometimes requires more intuition and insight than average. We’ve made the first step today towards
3
10
97
@hla_michael
Michael Hla
4 months
Sequences and Thermostability Data: https://t.co/Ynaf5jd2Wo
Binding Affinity Data:
docs.google.com
2
1
2
@hla_michael
Michael Hla
4 months
Special thank you to @julian_englert @danielnzg85 and the @adaptyvbio team for sponsoring this validation. The entire process was seamless, and I would highly recommend their services!
1
0
7
@hla_michael
Michael Hla
4 months
Nevertheless, these sequences are the first-ever LLM-optimized proteins and serve as a valuable baseline validation. Looking forward to synthesizing the carbonic anhydrases and pushing the model’s capabilities.
1
0
3
@hla_michael
Michael Hla
4 months
None of the successful sequences had very interesting modifications (as indicated by the top performer being a single point mutation variant) or truly impressive reasoning. Most focused on point mutations and keywords similar to those provided in the prompt. This raises a valid
1
0
5
@hla_michael
Michael Hla
4 months
Compared to existing variants, the Pro-1 sequences are competitive with some of the most stable publicly available sequences in the literature. The K116E variant (v3) in particular showed an exceptional improvement in melting temperature, demonstrating a 23.9 degree
1
0
4
@hla_michael
Michael Hla
4 months
The successful variants all had similar reasoning traces, typically referencing generic properties of stable proteins such as low flexibility, solubility, etc. On occasion, the model would reference details more specific to FGF-1, such as integrin or heparin binding affinity.
1
0
2
@hla_michael
Michael Hla
4 months
Of the 19 variant sequences:
16/19 were able to be expressed
7/16 showed reliable thermal stability signal
3/7 had higher melting temperature
3/3 preserved binding affinity to FGFR1 (compared to 6/16 for all of the expressed variants)
1
0
2
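The counts in the tweet above form a screening funnel, where each stage's denominator is the number of sequences that passed the previous stage. A minimal sketch of the arithmetic follows; the counts are taken from the tweet, while the stage labels, variable names, and helper functions are illustrative assumptions.

```python
# (label, passed, total) for each stage, using the counts reported above.
stages = [
    ("expressed", 16, 19),
    ("reliable thermal stability signal", 7, 16),
    ("higher melting temperature", 3, 7),
    ("preserved FGFR1 binding", 3, 3),
]


def stage_rates(stages):
    """Return the pass rate of each stage relative to its own input."""
    return [(label, passed / total) for label, passed, total in stages]


def overall_yield(stages):
    """Fraction of the starting pool that survives every stage."""
    return stages[-1][1] / stages[0][2]


for label, rate in stage_rates(stages):
    print(f"{label}: {rate:.1%}")
print(f"overall: {overall_yield(stages):.1%}")
```

Reading the funnel this way separates where candidates were lost (expression, measurable signal, stability gain) from the end-to-end yield of 3 successful designs out of 19 tested.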
@hla_michael
Michael Hla
4 months
The prompt included general information about FGF-1 (function, known interactions), mutagenesis data from UniProt, and select excerpts from papers that have previously engineered more stable FGF-1 variants. The base and creative model instances were sampled 50 times each, with
1
0
3
@hla_michael
Michael Hla
4 months
Why FGF-1?
Human fibroblast growth factor 1 (FGF-1) is a 155 amino acid protein implicated in processes such as cell differentiation, tissue repair, and metabolic regulation. It has also been shown to have some therapeutic potential in Parkinson’s, type 2 diabetes, and
1
0
5
@hla_michael
Michael Hla
4 months
First Lab Validation for Reasoning Model Proteins
With @adaptyvbio, we tested 19 FGF-1 sequences optimized by Pro-1 for thermal stability and binding affinity to human FGFR-1. Pro-1 produced 3 novel sequences that maintained binding affinity and expression compared to wild
6
25
155
@hla_michael
Michael Hla
6 months
Full blog post: https://t.co/7TTfkGdHlL
Codebase: https://t.co/19GbxsWxtt
Model Weights:
huggingface.co
7
11
119
@hla_michael
Michael Hla
6 months
If you would like to contribute or have any feedback, don’t hesitate to reach out. This has been my pet project over the past 2 months, and I would love to hear your thoughts.
6
0
46
@hla_michael
Michael Hla
6 months
Pro-1 demonstrates the transferability of natural language models to sequence optimization tasks and presents a new possibility in leveraging language models for scientific discovery. With strong reward signals, language models can reason over complex scientific tasks and one
2
1
41
@hla_michael
Michael Hla
6 months
Looking forward, the biggest priority is to synthesize the model-generated sequences (actively looking for help with this). Wet lab validation is absolutely necessary for a project like this, and synthesis is the ultimate test for any model-designed sequences.
4
0
47
@hla_michael
Michael Hla
6 months
The creative model then reasoned through the insights from the literature provided and suggested novel modifications motivated by the themes of the papers provided. For example, in its best generation, the creative model reasoned that introducing a peptide tag would enhance
3
1
44
@hla_michael
Michael Hla
6 months
For the base model, I passed in the native HCA II sequence, effects of known mutations, excerpts from a review on the topic (Fiore, 2015), reaction mechanism, and residues that were known to be involved in the reaction. Out of 100 samples, the best proposal from the base model
2
1
39