Michael Hla

@hla_michael

1K Followers, 1K Following, 14 Media, 117 Statuses

bio + cs | prev @harvard @shv

Joined July 2020
@hla_michael
Michael Hla
4 months
I taught an LLM to optimize proteins. It proposed a better carbon capture enzyme. Introducing Pro-1, an 8B-parameter reasoning model trained with GRPO toward a physics-based reward function for protein stability. It takes in a protein sequence + text description + previous…
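GRPO (Group Relative Policy Optimization) samples several completions for the same prompt, scores each one, and normalizes every reward against the group's mean and standard deviation, so no separate value model is needed. A minimal sketch of that advantage step, assuming the rewards already come from some stability scorer; the reward values are placeholders, not outputs of Pro-1's actual reward function.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward within its group: (r - mean) / (std + eps).

    `rewards` holds one scalar per completion sampled for the same prompt.
    Uses the population std; some implementations use the sample std instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every completion scored the same.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Placeholder rewards for four completions of one prompt (illustrative only).
    print(group_relative_advantages([1.0, 2.0, 3.0, 4.0]))
```

Completions scoring above the group average get a positive advantage and are reinforced; those below it are pushed down.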
@hla_michael
Michael Hla
10 days
RT @alexisohanian: Thrilled to announce our 2025 @776foundation Fellows! I’m giving each fellow $100k to tackle one of the biggest threats…
@hla_michael
Michael Hla
25 days
Thanks again to the @adaptyvbio team! I uploaded a CSV of sequences and got super detailed assay results with no overhead. Would highly recommend.
@adaptyvbio
Adaptyv Bio
25 days
Pro-1, a protein design model by @hla_michael, doesn’t just propose mutations — it explains why it made them. We tested 19 of its FGF-1 designs in our lab and 3 of them improved thermostability while maintaining binding. In this protein designer spotlight we explain how
@hla_michael
Michael Hla
1 month
RT @andrewwhite01: I want to point out that over the last few weeks there has been other great work on building reasoning models in biology…
@hla_michael
Michael Hla
2 months
Sequences and Thermostability Data:
Binding Affinity Data:
@hla_michael
Michael Hla
2 months
Special thank you to @julian_englert, @danielnzg85, and the @adaptyvbio team for sponsoring this validation. The entire process was seamless, and I would highly recommend their services!
@hla_michael
Michael Hla
2 months
Nevertheless, these sequences are the first-ever LLM-optimized proteins and serve as a valuable baseline validation. Looking forward to synthesizing the carbonic anhydrases and pushing the model’s capabilities further.
@hla_michael
Michael Hla
2 months
None of the successful sequences had very interesting modifications (as indicated by the top performer being a single-point-mutation variant) or truly impressive reasoning. Most focused on point mutations and on keywords similar to those provided in the prompt. This raises a valid…
@hla_michael
Michael Hla
2 months
Compared with existing variants, the Pro-1 sequences are competitive with some of the most stable publicly available sequences in the literature. The K116E variant (v3) in particular showed an exceptional improvement in melting temperature, demonstrating a 23.9 degree…
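A name like K116E follows the usual point-mutation notation: wild-type residue (K, lysine), 1-based position (116), and substituted residue (E, glutamate). A minimal sketch of applying such a mutation to a sequence string while checking that the wild-type residue matches; `apply_point_mutation` is a hypothetical helper, and since residue numbering depends on the reference sequence used, the toy sequence below is illustrative only and not FGF-1.

```python
import re

# Notation like "K116E": wild-type residue, 1-based position, new residue.
_MUTATION = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Return `sequence` with one substitution applied (hypothetical helper)."""
    match = _MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wild_type, position, new_residue = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1  # notation is 1-based; Python strings are 0-based
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != wild_type:
        # Guards against applying a mutation to the wrong numbering scheme.
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + new_residue + sequence[index + 1:]


if __name__ == "__main__":
    print(apply_point_mutation("MAKLE", "K3E"))  # toy sequence, not FGF-1
```

The wild-type check is the useful part in practice: it fails loudly if a mutation written against one numbering is applied to a sequence numbered differently.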
@hla_michael
Michael Hla
2 months
The successful variants all had similar reasoning traces, typically referencing generic properties of stable proteins such as low flexibility, solubility, etc. On occasion, the model would reference details more specific to FGF-1, such as integrin or heparin binding affinity.
@hla_michael
Michael Hla
2 months
Of the 19 variant sequences:
- 16/19 were expressed.
- 7/16 showed a reliable thermal stability signal.
- 3/7 had a higher melting temperature.
- 3/3 preserved binding affinity to FGFR1 (compared with 6/16 across all expressed variants).
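These counts form a funnel: each stage's denominator is the previous stage's numerator. A short sketch that turns the reported counts into per-stage and cumulative rates; only the counts come from the thread, and the rates are plain arithmetic on them.

```python
from fractions import Fraction

# (stage, passed, entered) as reported for the 19 tested FGF-1 variants.
FUNNEL = [
    ("expressed", 16, 19),
    ("reliable thermal stability signal", 7, 16),
    ("higher melting temperature", 3, 7),
    ("preserved FGFR1 binding", 3, 3),
]


def funnel_rates(funnel, total):
    """Yield (stage, stage_rate, cumulative_rate) for each funnel step."""
    for stage, passed, entered in funnel:
        # `passed` already counts only variants that cleared every earlier stage.
        yield stage, Fraction(passed, entered), Fraction(passed, total)


if __name__ == "__main__":
    for stage, step, overall in funnel_rates(FUNNEL, total=19):
        print(f"{stage}: {float(step):.1%} of stage, {float(overall):.1%} of all 19")
    # Baseline from the thread: 6 of the 16 expressed variants kept binding.
    print(f"binding retained across expressed variants: {6 / 16:.1%}")
```

On these numbers, 3 of 19 designs (about 15.8%) cleared every stage, and binding was kept by all 3 stabilized variants versus 37.5% of expressed variants overall.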
@hla_michael
Michael Hla
2 months
The prompt included general information about FGF-1 (function, known interactions), mutagenesis data from UniProt, and select excerpts from papers that have previously engineered more stable FGF-1 variants. The base and creative model instances were sampled 50 times each, with…
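The prompt bundles several context sources around the target sequence. A minimal sketch of assembling such a prompt; the function name, section titles, and ordering are all hypothetical, since the thread does not publish the actual template.

```python
def build_prompt(sequence, protein_info, mutagenesis_notes, paper_excerpts, objective):
    """Assemble a text prompt from context sections (hypothetical template)."""
    sections = [
        ("Protein background", protein_info),  # e.g. function, known interactions
        ("Known mutagenesis effects", "\n".join(f"- {n}" for n in mutagenesis_notes)),
        ("Literature excerpts", "\n\n".join(paper_excerpts)),
        ("Objective", objective),
        ("Sequence", sequence),
    ]
    # Skip empty sections so the model only sees context that was actually provided.
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)


if __name__ == "__main__":
    # Toy placeholder inputs, not real FGF-1 data.
    print(build_prompt(
        sequence="MAKLE",
        protein_info="Toy protein used for illustration.",
        mutagenesis_notes=["K3A reduces activity (placeholder note)"],
        paper_excerpts=[],
        objective="Propose mutations that increase thermal stability.",
    ))
```

Keeping each source in its own labeled section makes it easy to ablate one context type (for example, dropping the literature excerpts) when comparing sampled outputs.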
@hla_michael
Michael Hla
2 months
Why FGF-1? Human fibroblast growth factor 1 (FGF-1) is a 155-amino-acid protein implicated in processes such as cell differentiation, tissue repair, and metabolic regulation. It has also been shown to have some therapeutic potential in Parkinson’s disease, type 2 diabetes, and…
@hla_michael
Michael Hla
2 months
First Lab Validation for Reasoning Model Proteins. With @adaptyvbio, we tested 19 FGF-1 sequences optimized by Pro-1 for thermal stability and binding affinity to human FGFR-1. Pro-1 produced 3 novel sequences that maintained binding affinity and expression compared to wild…
@hla_michael
Michael Hla
4 months
Full blog post:
Codebase:
Model Weights:
@hla_michael
Michael Hla
4 months
If you would like to contribute or have any feedback, don’t hesitate to reach out. This has been my pet project over the past 2 months, and I would love to hear your thoughts.
@hla_michael
Michael Hla
4 months
Pro-1 demonstrates the transferability of natural language models to sequence optimization tasks and opens a new possibility for leveraging language models in scientific discovery. With strong reward signals, language models can reason over complex scientific tasks and one…
@hla_michael
Michael Hla
4 months
Looking forward, the biggest priority is to synthesize the model-generated sequences (I am actively looking for help with this). Wet-lab validation is absolutely necessary for a project like this, and synthesis is the ultimate test for any model-designed sequence.
@hla_michael
Michael Hla
4 months
The creative model then reasoned through insights from the provided literature and suggested novel modifications motivated by the papers’ themes. For example, in its best generation, the creative model reasoned that introducing a peptide tag would enhance…
@hla_michael
Michael Hla
4 months
For the base model, I passed in the native HCA II sequence, the effects of known mutations, excerpts from a review on the topic (Fiore, 2015), the reaction mechanism, and residues known to be involved in the reaction. Out of 100 samples, the best proposal from the base model…
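Picking the best proposal out of 100 samples is best-of-N selection: generate N candidates, score each, keep the top one. A minimal sketch, assuming `score` stands in for whatever stability or activity estimate was actually used; `placeholder_score` below is a deliberately trivial stand-in and not a physics-based reward or any real stability model.

```python
from typing import Callable, Sequence, Tuple


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> Tuple[str, float]:
    """Return the highest-scoring candidate and its score."""
    if not candidates:
        raise ValueError("no candidates to select from")
    best = max(candidates, key=score)  # max() keeps the first of any tied maxima
    return best, score(best)


def placeholder_score(sequence: str) -> float:
    """Trivial placeholder scorer for illustration only; carries no biological meaning."""
    return sequence.count("E") - sequence.count("K")


if __name__ == "__main__":
    samples = ["MAKLE", "MAELE", "MAKLK"]  # toy candidates, not model outputs
    print(best_of_n(samples, placeholder_score))
```

The quality of best-of-N depends almost entirely on the scorer: with a weak proxy, a larger N mostly finds sequences that exploit the proxy.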
@hla_michael
Michael Hla
4 months
Optimizing Human Carbonic Anhydrase II (HCA II): Enzymes have been an area of intense research for carbon capture due to their ability to catalyze CO₂ conversion with remarkable efficiency. Among these, HCA II is an exceptionally efficient candidate, speeding up the conversion…
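For reference, the conversion that carbonic anhydrases such as HCA II catalyze is the reversible hydration of carbon dioxide to bicarbonate and a proton (standard enzyme chemistry, stated here for context rather than taken from the thread):

```latex
\mathrm{CO_2} + \mathrm{H_2O} \rightleftharpoons \mathrm{HCO_3^{-}} + \mathrm{H^{+}}
```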