Tomas Hernando Kofman

@tomas_hk

Followers: 2K
Following: 686
Media: 94
Statuses: 551

¬◇

Joined April 2012
@tomas_hk
Tomas Hernando Kofman
6 months
Today we’re launching Prompt Adaptation, a state-of-the-art agentic system that automatically adapts prompts across LLMs. Prompt Adaptation outperforms all other methods and significantly improves accuracy over manual prompt engineering, saving you thousands of hours per year.
23
71
642
@tomas_hk
Tomas Hernando Kofman
9 days
(if you want to try out our prompt optimizer, dm me and I can whitelist you)
1
1
2
@tomas_hk
Tomas Hernando Kofman
9 days
Kimi K2 Thinking might be the world's most powerful model right now—but I will bet $100 that you're not using it correctly. Most people are just beginning to understand the fragile nature of prompts and how they drift over time. 30-60% of prompts will degrade when switching
2
2
7
@tomas_hk
Tomas Hernando Kofman
11 days
The prompt optimization presentation at @SAP TechEd hit the max occupancy limit and I was barely able to get in—standing room only. Great to see these incredible results from @Henkel's use of Prompt Optimizer!
0
3
9
@tomas_hk
Tomas Hernando Kofman
12 days
Excited to share that today we are deepening our partnership with SAP. Prompt optimization is now available through SAP’s Generative AI Hub, enabling developers building in SAP’s ecosystem to automatically optimize AI prompts across different models, dramatically enhancing
1
5
16
@acompa_
Alejandro Companioni
26 days
I had fully bought into GEPA's Pareto-frontier framing for prompts, but the ACE paper changed my thinking in two important ways. 🧵
1
2
8
@tomas_hk
Tomas Hernando Kofman
1 month
Rootly used Not Diamond to optimize their prompts on SRE tasks and *doubled* performance on Sonnet and nearly maxxed out GPT-5 performance (91.3% -> 97.4%). Hell yeah ♥️
@rootlyhq
Rootly
1 month
While Sonnet-4.5 remains a popular choice among developers, our benchmarks show it underperforms GPT-5 on SRE-related tasks when both are run with default parameters. However, using the @notdiamond_ai prompt adaptation platform, Sonnet-4.5 achieved up to a 2x performance
0
4
21
@tomas_hk
Tomas Hernando Kofman
1 month
Optimized prompts let smaller models deliver stronger results while reducing cost and latency. This matters even more in multi-prompt applications, where latency compounds at every step. dm me for access if you want to try it out 🤍
0
0
2
@tomas_hk
Tomas Hernando Kofman
1 month
With Prompt Adaptation, in ~30 minutes of background processing, we automatically generate and test many prompt variations and find the best-performing one. The resulting Gemini 2.5 Flash prompt scored 97.5%, outperforming the stronger Pro baseline by 4.5 percentage points.
1
0
2
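For readers who want the mechanics: below is a minimal Python sketch of the generate-and-test loop described in the tweet above. The helper names (`ask`, `rewrite`, `score_prompt`, `propose_variants`) and the brute-force candidate search are illustrative assumptions, not Not Diamond's actual pipeline.

```python
# A minimal sketch of a generate-and-test prompt optimization loop.
# `ask(prompt, x)` and `rewrite(instruction)` stand in for LLM calls;
# they, and the candidate count, are illustrative assumptions.

def score_prompt(prompt, examples, ask):
    """Exact-match accuracy of `prompt` over (input, expected_output) pairs."""
    hits = sum(ask(prompt, x).strip() == y for x, y in examples)
    return hits / len(examples)

def propose_variants(base_prompt, n, rewrite):
    """Ask a strong model to produce n rephrasings of the base prompt."""
    instruction = ("Rewrite this prompt to be clearer for a smaller model, "
                   "keeping the task identical:\n\n" + base_prompt)
    return [rewrite(instruction) for _ in range(n)]

def optimize(base_prompt, dev_set, ask, rewrite, n_variants=20):
    """Score the base prompt plus generated variants on a held-out
    dev set and keep whichever candidate performs best."""
    candidates = [base_prompt] + propose_variants(base_prompt, n_variants, rewrite)
    return max(candidates, key=lambda p: score_prompt(p, dev_set, ask))
```

Real optimizers typically search more cleverly (mutating failing cases, incorporating feedback, stopping early), but the skeleton — generate candidates, score on a held-out set, keep the winner — is the same.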
@tomas_hk
Tomas Hernando Kofman
1 month
Clinc150 is a dataset for intent classification in conversational assistants. A prompt written for Gemini 2.5 Pro scored 93% accuracy. On Gemini 2.5 Flash (a faster, cheaper model), the same prompt scored 86.75%.
1
0
2
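As a sketch of what this measurement looks like: one fixed prompt, two models, exact-match accuracy over labeled (utterance, intent) pairs. The `complete(model, system, user)` helper below is an assumed wrapper around whatever LLM client you use, and the model IDs are illustrative.

```python
# Score one fixed intent-classification prompt on two models.
# `complete(model, system, user)` is an assumed client wrapper.

PROMPT = ("Classify the user's utterance into one of the known intents. "
          "Reply with the intent label only.")

def accuracy(model, dataset, complete):
    """Exact-match accuracy of PROMPT on (utterance, intent) pairs."""
    correct = sum(
        complete(model, PROMPT, utterance).strip().lower() == intent.lower()
        for utterance, intent in dataset
    )
    return correct / len(dataset)

# Same prompt, two models; the gap between the two numbers is the
# portability loss the thread describes.
# accuracy("gemini-2.5-pro", dev_set, complete)    # ~0.93 in the tweet
# accuracy("gemini-2.5-flash", dev_set, complete)  # ~0.8675 in the tweet
```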
@tomas_hk
Tomas Hernando Kofman
1 month
Weaker models → stronger results

Strong vs weak model comparisons are normally a tradeoff between performance and cost/latency. But we can level the field with prompt adaptation.
1
3
8
@tomas_hk
Tomas Hernando Kofman
2 months
tldr: newer models don’t automatically guarantee better results. Without adaptation, migrations often lead to regressions, technical debt, and last-minute scrambles when models are deprecated. With Prompt Adaptation, prompts are automatically optimized so teams can improve
1
0
3
@tomas_hk
Tomas Hernando Kofman
2 months
The adapted prompt for Sonnet 4 reached 89% accuracy, not only reversing the regression but also surpassing both GPT-4o and Sonnet 4 with the original prompt.
1
0
3
@tomas_hk
Tomas Hernando Kofman
2 months
With Prompt Adaptation, the process is automated. In ~30 minutes of background processing, the system generates many prompt variations and identifies the best-performing one.
1
0
3
@tomas_hk
Tomas Hernando Kofman
2 months
Even though Sonnet 4 is stronger on benchmarks, it underperformed GPT-4o without adapting the prompt. Traditionally, fixing this requires a ton of manual trial and error. Many customers have estimated up to 40hrs of engineering work to rewrite and test prompts for new models.
1
0
3
@tomas_hk
Tomas Hernando Kofman
2 months
As an example: a prompt originally written for GPT-4o (released Nov ’24, now ~2 generations old) to perform intent classification on Banking77. The original prompt scored 82.5% accuracy on GPT-4o. Running the same prompt on Sonnet 4 (released May ’25) dropped accuracy to 80%.
1
0
3
@tomas_hk
Tomas Hernando Kofman
2 months
When teams migrate to a newer model with the same prompt, performance often regresses. Prompts aren't portable; each model version interprets instructions differently.
1
0
3
@tomas_hk
Tomas Hernando Kofman
2 months
Better models → worse results: why prompt adaptation matters

When a new model is released, the expectation is simple: stronger benchmarks should translate into stronger real-world performance. But we generally see the opposite.
2
6
10
@BEBischof
Bryan Bischof fka Dr. Donut
2 months
This debate has really captured the timeline. Sadly, most folks discussing it are missing the nuance. I think Swyx understands this a lot more deeply than the folks discussing this elsewhere, so I recommend his thread here beyond a lot of the branched ones. As one of the
@swyx
swyx 🗽 @aidotengineer AIE CODE
2 months
Claude Code: no evals
[well known code agent company]: no evals
[well known code agent company 2]: kinda halfassed evals
[leading vibe coding company]: no evals
[ceo of company selling you evals]: mmmmm yess all my top customers do evals, you should do evals
[vc's in love
9
8
139
@tomas_hk
Tomas Hernando Kofman
3 months
Sam is talking about personality here... but personality isn't just what intelligence sounds like—it is *part* of intelligence. It's the organizing system that filters, prioritizes, and directs cognition. So this quote marks a pretty different position from just a few years ago.
0
3
7