Tomas Hernando Kofman (@tomas_hk) · 2K Followers · 686 Following · 94 Media · 551 Statuses
Today we’re launching Prompt Adaptation, a state-of-the-art agentic system that automatically adapts prompts across LLMs. Prompt Adaptation outperforms all other methods and significantly improves accuracy over manual prompt engineering, saving you thousands of hours per year.
(if you want to try out our prompt optimizer, dm me and I can whitelist you)
Kimi K2 Thinking might be the world's most powerful model right now—but I will bet $100 that you're not using it correctly. Most people are just beginning to understand the fragile nature of prompts and how they drift over time. 30-60% of prompts will degrade when switching
Excited to share that today we are deepening our partnership with SAP. Prompt optimization is now available through SAP’s Generative AI Hub, enabling developers building in SAP’s ecosystem to automatically optimize AI prompts across different models, dramatically enhancing
I had fully bought into GEPA's Pareto-frontier framing for prompts, but the ACE paper changed my thinking in two important ways. 🧵
Rootly used Not Diamond to optimize their prompts on SRE tasks and *doubled* performance on Sonnet and nearly maxxed out GPT-5 performance (91.3% -> 97.4%). Hell yeah ♥️
While Sonnet-4.5 remains a popular choice among developers, our benchmarks show it underperforms GPT-5 on SRE-related tasks when both are run with default parameters. However, using the @notdiamond_ai prompt adaptation platform, Sonnet-4.5 achieved up to a 2x performance
Optimized prompts let smaller models deliver stronger results while reducing cost and latency. This matters even more in multi-prompt applications, where latency compounds at every step. dm me for access if you want to try it out 🤍
With Prompt Adaptation, in ~30 minutes of background processing, we automatically generate and test many prompt variations and find the best-performing one. The resulting Gemini 2.5 Flash prompt scored 97.5%, outperforming the stronger Pro baseline by 4.5%.
Clinc150 is a dataset for intent classification in conversational assistants. A prompt written for Gemini 2.5 Pro scored 93% accuracy. On Gemini 2.5 Flash (a faster, cheaper model), the same prompt scored 86.75%.
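To make the comparison concrete, here is a minimal sketch of how one might quantify that kind of cross-model regression. The helper and the hand-written predictions are illustrative stand-ins (this is not Not Diamond's API; a real run would send the same prompt to each model over a labeled eval set):

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Illustrative intent-classification outputs: the same prompt, run on two
# models, can disagree on identical Clinc150-style inputs.
labels            = ["transfer", "balance", "freeze_account", "pin_change"]
pro_predictions   = ["transfer", "balance", "freeze_account", "pin_change"]
flash_predictions = ["transfer", "balance", "card_declined",  "pin_change"]

drop = accuracy(pro_predictions, labels) - accuracy(flash_predictions, labels)
print(f"regression from Pro to Flash: {drop:.2%}")
```

The interesting number is the delta, not either absolute score: a prompt is "portable" only if that delta stays near zero when you swap models.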
Weaker models → stronger results
Strong vs weak model comparisons are normally a tradeoff between performance and cost/latency. But we can level the field with prompt adaptation.
Full blog post with original and adapted prompts:
notdiamond.ai
This post is part of an ongoing series where we share examples of Prompt Adaptation in practice. The goal is to highlight real scenarios, real results, and the insights that emerge from them.
tldr: newer models don’t automatically guarantee better results. Without adaptation, migrations often lead to regressions, technical debt, and last-minute scrambles when models are deprecated. With Prompt Adaptation, prompts are automatically optimized so teams can improve
The adapted prompt for Sonnet 4 reached 89% accuracy, not only reversing the regression but also surpassing both GPT-4o and Sonnet 4 with the original prompt.
With Prompt Adaptation, the process is automated. In ~30 minutes of background processing, the system generates many prompt variations and identifies the best-performing one.
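The generate-and-select loop described above can be sketched as a simple hill climb. Everything here is a toy stand-in for the real system: `toy_mutate` and `toy_evaluate` are hypothetical placeholders for an LLM-driven rewriter and a scored eval set, and the function names are mine, not Not Diamond's:

```python
import random

def best_prompt(base_prompt, evaluate, mutate, n_variants=20, seed=0):
    """Greedily generate prompt variants, keeping whichever scores highest
    on a held-out evaluation set."""
    rng = random.Random(seed)
    best, best_score = base_prompt, evaluate(base_prompt)
    for _ in range(n_variants):
        candidate = mutate(best, rng)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Toy stand-ins: a real system would call an LLM to rewrite the prompt
# and score each variant against labeled examples.
SUFFIXES = [" Think step by step.", " Answer with one label.", " Be concise."]

def toy_mutate(prompt, rng):
    return prompt + rng.choice(SUFFIXES)

def toy_evaluate(prompt):
    # Pretend longer, more structured prompts score a bit better, capped at 1.0.
    return min(1.0, 0.8 + 0.05 * prompt.count("."))

best, score = best_prompt("Classify the user's intent.", toy_evaluate, toy_mutate)
```

The real system presumably searches a much richer variant space, but the contract is the same: candidates in, scores out, best one kept.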
Even though Sonnet 4 is stronger on benchmarks, it underperformed GPT-4o without adapting the prompt. Traditionally, fixing this requires a ton of manual trial and error. Many customers have estimated up to 40hrs of engineering work to rewrite and test prompts for new models.
As an example: a prompt originally written for GPT-4o (released Nov ’24, now ~2 generations old) to perform intent classification on Banking77. The original prompt scored 82.5% accuracy on GPT-4o. Running the same prompt on Sonnet 4 (released May ’25) dropped accuracy to 80%.
When teams migrate to a newer model with the same prompt, performance often regresses. Prompts aren’t portable; each model version interprets instructions differently.
Better models → worse results: why prompt adaptation matters
When a new model is released, the expectation is simple: stronger benchmarks should translate into stronger real-world performance. But we generally see the opposite.
This debate has really captured the timeline. Sadly, most folks discussing it are missing the nuance. I think Swyx understands this a lot more deeply than the folks discussing it elsewhere, so I recommend his thread here over a lot of the branched ones. As one of the
Claude Code: no evals [well known code agent company]: no evals [well known code agent company 2]: kinda halfassed evals [leading vibe coding company]: no evals [ceo of company selling you evals]: mmmmm yess all my top customers do evals, you should do evals [vc's in love
Sam is talking about personality here... but personality isn’t just what intelligence sounds like—it is *part* of intelligence. It’s the organizing system that filters, prioritizes, and directs cognition. So his quote is a pretty different position from just a few years ago.