
Michael Hla
@hla_michael
bio + cs | prev @harvard @shv
Joined July 2020
I taught an LLM to optimize proteins. It proposed a better carbon capture enzyme. Introducing Pro-1, an 8b param reasoning model trained using GRPO towards a physics based reward function for protein stability. It takes in a protein sequence + text description + previous
Meet #climatefellow @hla_michael. He’s an independent researcher using AI to accelerate biology, starting with a model that designs enzymes for better carbon capture. His work bridges machine learning & molecular engineering to tackle climate at the cellular level.
Thrilled to announce our 2025 @776foundation Fellows! I’m giving each fellow $100k to tackle one of the biggest threats to humanity: Climate Change. Over the next two years, my Foundation will support these young trailblazers as they come up with innovative and forward-thinking
Thanks again to the @adaptyvbio team! Uploaded a csv of sequences and got super detailed assay results with no overhead. Would highly recommend
Pro-1, a protein design model by @hla_michael, doesn’t just propose mutations — it explains why it made them. We tested 19 of its FGF-1 designs in our lab and 3 of them improved thermostability while maintaining binding. In this protein designer spotlight we explain how
I want to point out that over the last few weeks there has been other great work on building reasoning models in biology. Don't want to get stuck on defining what is reasoning/verified rewards - this is just cool work to highlight: @hla_michael did some amazing early work on
At FutureHouse, we’ve noticed scientific agents are good at applying average intelligence across tasks. They always seem to make the obvious choices, which is good, but discovery sometimes requires more intuition and insight than average. We’ve made the first step today towards
Sequences and Thermostability Data: https://t.co/Ynaf5jd2Wo
Binding Affinity Data: (link preview: docs.google.com)
Special thank you to @julian_englert @danielnzg85 and the @adaptyvbio team for sponsoring this validation. The entire process was seamless and would highly recommend their services!
Nevertheless, these sequences are the first ever LLM-optimized proteins and provide a valuable baseline validation. Looking forward to synthesizing the carbonic anhydrases and pushing the model’s capabilities.
None of the successful sequences had very interesting modifications (as indicated by the top performer being a single point mutation variant) or truly impressive reasoning. Most focused on point mutations and keywords similar to those provided in the prompt. This raises a valid
Compared to existing variants, the Pro-1 sequences are competitive with some of the most stable publicly available sequences in the literature. The K116E variant (v3) in particular showed an exceptional improvement in melting temperature, demonstrating a 23.9 degree
The successful variants all had similar reasoning traces, typically referencing generic properties of stable proteins such as low flexibility, solubility, etc. On occasion, the model would reference details more specific to FGF-1, such as integrin or heparin binding affinity.
Of the 19 variant sequences:
16/19 were able to be expressed
7/16 showed reliable thermal stability signal
3/7 had higher melting temperature
3/3 preserved binding affinity to FGFR1 (compared to 6/16 for all of the expressed variants)
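To make the funnel easier to read, here is a minimal sketch that turns the counts reported above into per-stage and overall pass rates. The stage labels, the `funnel_rates` helper, and the data layout are illustrative assumptions, not part of the Pro-1 codebase. Only the counts come from the thread.

```python
from fractions import Fraction

# Counts as reported in the thread: (stage label, passed, considered).
# Stage labels are paraphrased for illustration.
FUNNEL = [
    ("expressed", 16, 19),
    ("reliable thermal stability signal", 7, 16),
    ("higher melting temperature", 3, 7),
    ("preserved FGFR1 binding", 3, 3),
]


def funnel_rates(stages, total):
    """Return per-stage and overall pass rates as exact fractions."""
    rows = []
    for name, passed, considered in stages:
        rows.append(
            {
                "stage": name,
                # Share of sequences entering this stage that passed it.
                "stage_rate": Fraction(passed, considered),
                # Share of all tested sequences still remaining after this stage.
                "overall_rate": Fraction(passed, total),
            }
        )
    return rows


rows = funnel_rates(FUNNEL, total=19)
for row in rows:
    print(
        f"{row['stage']}: "
        f"{float(row['stage_rate']):.1%} of the previous stage, "
        f"{float(row['overall_rate']):.1%} of all 19 sequences"
    )
```

Read this way, 3 of the 19 tested designs (about 16%) cleared every stage. The 6/16 figure for binding among all expressed variants is a separate comparison and is not part of the funnel.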
The prompt included general information about FGF-1 (function, known interactions), mutagenesis data from UniProt, and select excerpts from papers that have previously engineered more stable FGF-1 variants. The base and creative model instances were sampled 50 times each, with
Why FGF-1? Human fibroblast growth factor 1 (FGF-1) is a 155 amino acid protein implicated in processes such as cell differentiation, tissue repair, and metabolic regulation. It has also been shown to have some therapeutic potential in Parkinson’s, type 2 diabetes, and
First Lab Validation for Reasoning Model Proteins
With @adaptyvbio, we tested 19 FGF-1 sequences optimized by Pro-1 for thermal stability and binding affinity to human FGFR-1. Pro-1 produced 3 novel sequences that maintained binding affinity and expression compared to wild
Full blog post: https://t.co/7TTfkGdHlL
Codebase: https://t.co/19GbxsWxtt
Model Weights: (link preview: huggingface.co)
If you would like to contribute or have any feedback, don’t hesitate to reach out. This has been my pet project over the past 2 months and I would love to hear your thoughts.
Pro-1 demonstrates the transferability of natural language models to sequence optimization tasks and presents a new possibility in leveraging language models for scientific discovery. With strong reward signals, language models can reason over complex scientific tasks and one
Looking forward, the biggest priority is to synthesize the model generated sequences (actively looking for help with this). Wet lab validation is absolutely necessary for a project like this, and synthesizing these sequences is the ultimate test for any model designed sequences.
The creative model then reasoned through the insights from the literature provided and suggested novel modifications motivated by the themes of the papers provided. For example, in its best generation, the creative model reasoned that introducing a peptide tag would enhance
For the base model, I passed in the native HCA II sequence, effects of known mutations, excerpts from a review on the topic (Fiore, 2015), reaction mechanism, and residues that were known to be involved in the reaction. Out of 100 samples, the best proposal from the base model