Tomas Hernando Kofman Profile
Tomas Hernando Kofman

@tomas_hk

Followers
2K
Following
649
Media
86
Statuses
500

¬◇

Joined April 2012
@tomas_hk
Tomas Hernando Kofman
9 days
Today we’re launching Prompt Adaptation, a state-of-the-art agentic system that automatically adapts prompts across LLMs. Prompt Adaptation outperforms all other methods and significantly improves accuracy over manual prompt engineering, saving you thousands of hours per year.
21
71
639
@tomas_hk
Tomas Hernando Kofman
1 year
1/9. We’re open-sourcing a lightweight preview of our router that sends queries to either GPT-3.5 or GPT-4, maximizing accuracy while drastically reducing costs and latency. Routing to Gemini, Mistral, Claude, and Llama coming soon. A few quick points:
13
63
482
@tomas_hk
Tomas Hernando Kofman
10 months
Today we're releasing Not Diamond… the world’s most powerful AI model router. Not Diamond maximizes LLM output quality by automatically recommending the best LLM on every request at lower cost and latency. And it takes <5m to set up. Watch this to see how to start using it:
23
76
394
@tomas_hk
Tomas Hernando Kofman
8 months
Today we're open-sourcing RoRF (Routing on Random Forests), a pairwise model router that beats all closed and open-source approaches, along with 12 pre-trained model routers: Hugging Face: GitHub: Blog:
7
56
294
@tomas_hk
Tomas Hernando Kofman
10 months
This is the last chatbot you’ll ever need. Yesterday, @mckaywrigley built an oss Not Diamond-powered chat app. We loved it. So today we’re releasing a hosted version. Get the best LLM on every message and hyper-personalize routing to your preferences with feedback. Watch how:
12
24
134
@tomas_hk
Tomas Hernando Kofman
11 months
Standing-room only @websim_ai hackathon last night—so happy we could do this and thanks to everyone for coming and hanging out. Some of the coolest highlights in thread:
Tweet media one
Tweet media two
5
10
58
@tomas_hk
Tomas Hernando Kofman
3 months
We're hiring engineers and researchers to build the future of multi-model AI infrastructure. We're a small, technically elite team backed by Jeff Dean, Julien Chaumond, Ion Stoica, + more. And we guarantee a $50K investment in your next startup for every year you work with us.
4
13
58
@tomas_hk
Tomas Hernando Kofman
7 months
You can now use Not Diamond in Raycast to get recommended the best LLM on every message you send. It's super simple to set up—and I love having this just a shortcut away. s/o to @raycastapp and @thomaspaulmann for building such an incredible interface 🩶Set up link below 👇
5
6
50
@tomas_hk
Tomas Hernando Kofman
7 months
New top rankings for models on over 500,000 samples of real-world human preference data: 1. Claude 2. Mistral(!) 3. Perplexity. Claude 3.5 Sonnet is preferred by users over all other models. Surprisingly, however, the next-best models are Mistral Large 2 and Perplexity(!),
Tweet media one
Tweet media two
Tweet media three
Tweet media four
3
10
40
@tomas_hk
Tomas Hernando Kofman
1 year
Hiring a founding engineer: Competitive salary, generous equity, robust benefits, and a guaranteed investment in a future startup if you ever decide to build something yourself down the line.
2
8
30
@tomas_hk
Tomas Hernando Kofman
9 months
Since launching Not Diamond chat a few weeks ago, we've had over 100,000 messages sent with 37.6% of messages receiving feedback(‼️). First image: rankings across all models (sonnet > gpt-4o). Second image: my personal ranking (perplexity 💜). Reply to get your personal ranking.
Tweet media one
Tweet media two
4
6
32
@tomas_hk
Tomas Hernando Kofman
5 months
You can now use Not Diamond in @OpenRouterAI! Access 293 top models with a single API key and automatically get routed to the best one for your use case. OpenRouter is one of the most important AI dev platforms out there. Excited to see what folks build! 🙌 @xanderatallah.
@OpenRouterAI
OpenRouter
5 months
5. New Auto Router. Will post more details in the coming weeks, but you can get a sneak peek here: 🧠 It now routes to 19 different models, optimizing for quality, powered by Not Diamond. 💬 The chatroom now more clearly shows which model was used. The.
1
8
31
@tomas_hk
Tomas Hernando Kofman
10 months
Not Diamond sets new SOTA standards on major benchmarks like GPQA, Arena Hard, MMLU, and HumanEval. We achieve this by ensembling every other model into a meta-model that learns when to call each LLM. Routing opens a new frontier for the performance and generalizability of LLMs.
Tweet media one
2
6
31
@tomas_hk
Tomas Hernando Kofman
8 months
We're hiring exceptional founding team members for Not Diamond: • Small, elite technical team over-indexed on emotional intelligence. • ($50K*years at Not Diamond) investment in your next startup. • $10K for a successful referral. JDs in thread, email me at t5@notdiamond.ai.
2
7
27
@tomas_hk
Tomas Hernando Kofman
8 months
o1 is insanely powerful. And insanely expensive: 60x more expensive than 4o, 1000x more than 4o-mini. And it's not actually better on all domains. We've put together a super simple repo that routes to o1 when it really matters. Watch this to learn how to use it:
2
6
23
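A back-of-the-envelope sketch of why selective routing matters at these price ratios. It takes the 60x figure from the post above at face value and treats one 4o request as one cost unit; the function name and the 10% routing share are illustrative assumptions, not measurements from Not Diamond.

```python
def blended_cost(share_to_o1: float, o1_multiple: float = 60.0) -> float:
    """Average cost per request, measured in units of one 4o request.

    share_to_o1 is the (assumed) fraction of traffic sent to o1;
    everything else is assumed to go to 4o at a cost of 1 unit.
    """
    if not 0.0 <= share_to_o1 <= 1.0:
        raise ValueError("share_to_o1 must be between 0 and 1")
    return share_to_o1 * o1_multiple + (1.0 - share_to_o1) * 1.0


always_o1 = blended_cost(1.0)   # every request goes to o1
routed = blended_cost(0.10)     # hypothetical: only 10% of requests need o1
savings = 1.0 - routed / always_o1

print(f"routed: {routed:.1f} units/request vs {always_o1:.0f} for o1-only")
print(f"relative savings: {savings:.1%}")
```

The same function can be rerun with other shares to see how quickly the average cost climbs as more traffic is sent to the expensive model.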
@tomas_hk
Tomas Hernando Kofman
9 days
The future is multi-model. You wouldn’t rewrite your codebase every week, and you shouldn’t have to rewrite your prompts either. If you want to automate your prompt engineering and save 1000 hours this year, I'd love for you to try it out. Sign up at
1
2
27
@tomas_hk
Tomas Hernando Kofman
1 year
3/9. The world won't have one single, giant model that everyone sends everything to; instead, there will be many foundation models, millions of fine-tuned variants of those models, and countless custom inference engines running on top of them.
2
3
25
@tomas_hk
Tomas Hernando Kofman
9 months
@reidhoffman on @theallinpod: “The mistake people make is they think there’s going to be one model to rule them all… You’re going to see networks of models, traffic control, escalation… The multi-model approach is going to be quickly universal.” Networks of computers > big
Tweet media one
2
6
25
@tomas_hk
Tomas Hernando Kofman
1 year
@southpkcommons @OpenAI Spent this weekend at the @southpkcommons / @OpenAI hackathon building Temper, a tool that surfaces divisive tweets on contentious political subjects and drafts replies using evidence-based de-escalation techniques to reduce polarization: With incredible.
4
6
23
@tomas_hk
Tomas Hernando Kofman
1 year
9/9. We believe a world of diverse models is not only a better future for AI, but a safer one as well. We're excited to be open-sourcing notdiamond-0001 and we're looking forward to seeing what everyone builds with it!
1
1
23
@tomas_hk
Tomas Hernando Kofman
9 days
Prompt Adaptation is a black-box prompt optimization (BPO) technique similar to that pioneered by DSPy. It takes your original prompt and a set of golden inputs and outputs from your application and then iterates over thousands of potential prompts to find the best prompt for
Tweet media one
1
1
24
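The post above describes the general shape of black-box prompt optimization: score candidate prompts against golden input/output pairs and keep the best one. A minimal sketch of that loop follows. It is not Not Diamond's implementation: the exact-match scoring, the fixed candidate list, and the `toy_model` stand-in are all assumptions made so the example runs without any API.

```python
from typing import Callable, Sequence, Tuple

Golden = Sequence[Tuple[str, str]]      # (input, expected output) pairs
Model = Callable[[str, str], str]       # (prompt, input) -> output


def score_prompt(prompt: str, model: Model, golden: Golden) -> float:
    """Fraction of golden examples the model reproduces exactly (assumed metric)."""
    hits = sum(1 for text, expected in golden if model(prompt, text) == expected)
    return hits / len(golden)


def best_prompt(candidates: Sequence[str], model: Model, golden: Golden) -> str:
    """Return the candidate with the highest score; ties go to the earliest."""
    return max(candidates, key=lambda p: score_prompt(p, model, golden))


def toy_model(prompt: str, text: str) -> str:
    """Hypothetical stand-in for an LLM call: uppercases only when asked to."""
    return text.upper() if "uppercase" in prompt.lower() else text


golden = [("abc", "ABC"), ("hello", "HELLO")]
candidates = ["Repeat the input.", "Return the input in uppercase."]
winner = best_prompt(candidates, toy_model, golden)
print(winner)
```

In a real setting the candidate list would be generated rather than hand-written, and the scoring function would be whatever evaluation metric the application already uses.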
@tomas_hk
Tomas Hernando Kofman
1 year
2/9. ◇ We outperform GPT-4 by a factor of 1.51x when used as a router. ◇ We determine which model to call in <10ms. ◇ Available on HF or for free through our API, where we also continuously monitor OpenAI for outages 24/7 and reroute to a fallback model of your choice.
1
0
21
@tomas_hk
Tomas Hernando Kofman
9 days
Prompt Adaptation addresses this critical issue by automatically adapting prompts across models, replacing 25+ hrs of manual prompt engineering with 30m of background processing. We outperform all other evaluated techniques including Meta DSPy and Bedrock Prompt Optimization.
Tweet media one
1
1
22
@tomas_hk
Tomas Hernando Kofman
10 months
I’m incredibly honored not only to launch Not Diamond today but also to announce our $2.3M pre-seed round led by @defyvc with backing from some of the greatest AI scientists, engineers, and executives on this planet: @JeffDean (Google), @julien_c (Hugging Face), @iamthezack.
1
5
21
@tomas_hk
Tomas Hernando Kofman
7 months
@bindureddy If anyone wants to integrate model routing into their app, we have an API at for our SOTA model router, along with a chatbot that learns your routing preferences in real-time. Backed by Jeff Dean, Julien Chaumond, etc.
1
3
20
@tomas_hk
Tomas Hernando Kofman
10 months
This is sick. @mckaywrigley built a personalized chatbot arena on top of Not Diamond. Fully open-source too, check it out:
@mckaywrigley
Mckay Wrigley
10 months
Meet AI Router Chat. It’s a personal chatbot arena I made that adapts to your model preferences over time. It uses Not Diamond’s new API to dynamically select the best LLM for a given query. Watch to see how it works - I’m obsessed. GitHub link below!
0
5
19
@tomas_hk
Tomas Hernando Kofman
10 months
This is a huge validation for routing. For any data distribution, no LLM will beat every other on every single query. And not only does routing achieve SOTA, it does so at a much lower cost. For example, Not Diamond beats GPT-4o on MMLU by 1.6% with 29.6% lower costs:
Tweet media one
1
2
19
@tomas_hk
Tomas Hernando Kofman
7 months
Not Diamond is now integrated into @weights_biases Weave 🎉 LLM routing can boost accuracy by 25% and reduce costs by 10x. Here’s how to train a custom router on your evals with w&b Weave and Not Diamond to route between LLMs: s/o @l2k @altryne 🙏 🖤
1
5
19
@tomas_hk
Tomas Hernando Kofman
10 months
New LLMs are released every day with evolving quality, cost, latency, and context windows across domains. Now, you don't have to guess when to use which model. Plus, we make it super easy to train your own router on your own data. Try it here:
1
2
18
@tomas_hk
Tomas Hernando Kofman
8 months
You can now use the Not Diamond chat app to generate images—your own personal chatbot arena for image gen:
Tweet media one
6
4
18
@tomas_hk
Tomas Hernando Kofman
10 months
Getting started with Not Diamond literally takes less than 5m. Go try it now. Let me know what you think. Not Diamond: n.b. This launch tweet was composed entirely without emojis.
3
1
18
@tomas_hk
Tomas Hernando Kofman
7 months
If you want to maximize RAG output quality while also saving tens of thousands of dollars (in a few lines of code)—you can't rely on a single LLM. Here's how to add SOTA model routing into your RAG apps:. Code: App: Built with.
0
4
16
@tomas_hk
Tomas Hernando Kofman
11 months
Cosmic fabric simulator by @cutiekatw:
1
3
16
@tomas_hk
Tomas Hernando Kofman
4 months
How to route between reasoning models like @deepseek_ai R1 and regular models like Claude 3.5 Sonnet 👇 This works out of the box. This is how you get all the reasoning firepower of R1 without burning up latency on every request!
0
3
17
@tomas_hk
Tomas Hernando Kofman
5 months
In my op-ed with a former OpenAI exec, we argue that LLMs are becoming fuzzy commodities. Around a core of abilities, models are commoditizing—leading to a race to the bottom. But at the edges, models are specializing. Both of these point to a multi-model future for 2025 🧵
Tweet media one
1
7
17
@tomas_hk
Tomas Hernando Kofman
9 days
I’m also excited to announce additional funding from @defyvc, @IBM, Fund, @MyriadVC, @deepwatermgmt, @dnxventures, and @AmbushCapital to continue building a world class team (have never worked with a better team in my life), and it’s an honor to have such.
1
3
17
@tomas_hk
Tomas Hernando Kofman
10 months
Go try it now: Chat app: API: We're #1 on Product Hunt right now:
1
2
17
@tomas_hk
Tomas Hernando Kofman
10 months
VCs often ask me to describe how we're different than our competitors. The honest answer is that I'm so happy that so many smart teams and researchers are working on this problem. So I put together an awesome-ai-model-routing list to make it easier to find them:
Tweet media one
2
3
17
@tomas_hk
Tomas Hernando Kofman
3 months
Really nice technical blog post from @JungMinki7 surveying the model routing landscape, from Automix to RouteLLM to Not Diamond. Cool to see we helped Minki cut his trip-planning AI's cost by 50% and latency by 30%:
Tweet media one
1
3
12
@tomas_hk
Tomas Hernando Kofman
10 months
@OriolVinyalsML Amazing work @OriolVinyalsML ! love this.
0
0
4
@tomas_hk
Tomas Hernando Kofman
2 years
After reading Outlive by @PeterAttiaMD, I struggled to think of anyone I *wouldn't* recommend it to—it's essential reading on living longer & better. But at 500 pages, it’s a big time investment. That's why I wrote summarizing all key points & suggestions.
Tweet media one
1
2
14
@tomas_hk
Tomas Hernando Kofman
1 year
4/9. We’ve talked to hundreds of developers building on top of LLMs. For nearly everyone, model routing sucks. Teams are using heuristics to route deterministically with if/else statements, regexes, and handwritten prompts. We decided there had to be a better way.
1
1
16
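For readers who have not seen one, the kind of deterministic heuristic router the post above describes might look like the sketch below. The keyword pattern, the 200-word threshold, and the model labels are all made up for illustration; they are not taken from any real team's setup.

```python
import re

# Hypothetical keyword rule: anything that "looks like code" goes to the big model.
CODE_PATTERN = re.compile(r"\b(def|class|select|traceback)\b", re.IGNORECASE)


def heuristic_route(prompt: str) -> str:
    """Pick a model label using hand-written if/else and regex rules."""
    if CODE_PATTERN.search(prompt):
        return "gpt-4"
    elif len(prompt.split()) > 200:   # assumed cutoff for "long" prompts
        return "gpt-4"
    else:
        return "gpt-3.5"


print(heuristic_route("Write a SQL SELECT that lists active users"))
print(heuristic_route("Say hi to my team"))
# A medical question that merely contains the word "class" also trips the code rule:
print(heuristic_route("Which class of antibiotics treats strep throat?"))
```

The last call shows the brittleness: the rule fires on a surface keyword rather than on how hard the query actually is.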
@tomas_hk
Tomas Hernando Kofman
10 months
We’re building the “meta-model” of AI. Robust routing infrastructure will be critical to effective AI, and by shifting the future towards networks of specialized models rather than using a single giant monolithic model for everything, we can create a much safer world to live in.
1
1
16
@tomas_hk
Tomas Hernando Kofman
6 months
Not Diamond is now available in @langflow_ai by @DataStax to enable developers to access LLM routing in their no-code workflows! With LLM routing you can maximize quality 📈, save costs 💰, and reduce latency 🏎️, with minimal effort. See how 👇
2
7
16
@tomas_hk
Tomas Hernando Kofman
10 months
Critically, Not Diamond is not a proxy so all requests go out client-side. Our router's inference speed is blazing fast (<100ms), and you can enable fuzzy data hashing on our API for increased data privacy, or seamlessly deploy to your own infrastructure for maximum security.
Tweet media one
1
2
16
@tomas_hk
Tomas Hernando Kofman
10 months
Another feature I’m excited about is the ability to jointly optimize prompts and model routing recommendations. Not Diamond integrates with auto-prompt frameworks like DSPy and SAMMO so you can translate prompts across models and always call the best model with the best prompt.
1
2
16
@tomas_hk
Tomas Hernando Kofman
9 days
Nine months ago, we released the world’s most powerful model router, outperforming every foundation model on every major benchmark. But choosing the right model is only half the battle—we also have to know how to prompt them. Doing this manually is extremely time-consuming.
1
2
17
@tomas_hk
Tomas Hernando Kofman
10 months
Not Diamond works out of the box with no setup, but it’s even more powerful when trained on your data. You literally just upload a dataset with your LLM inputs and eval scores and get back your own custom model router. Here’s how simple it is to train a router with Not Diamond:
Tweet media one
1
2
14
@tomas_hk
Tomas Hernando Kofman
10 months
30 years ago, Yahoo tried to build the everything website. Google took the opposite bet: that the future of the internet would be incredibly fragmented. So they built a router—from search queries to websites—and became the single “meta-website” for the entire internet.
1
2
15
@tomas_hk
Tomas Hernando Kofman
7 months
LLMs are incredibly powerful on about 80% of tasks—it’s the last 20% that prevents a project from reaching production. The “human-in-the-loop” approach is designed to close this gap by leveraging a human expert to review and correct any mistakes the LLM makes. But how do you
Tweet media one
1
3
15
@tomas_hk
Tomas Hernando Kofman
10 months
Small, specialized models can outperform larger models on narrow domains. Routing gives specialized models the robustness of general ones. This is not only more computationally efficient—we get huge interpretability and safety benefits as a free bonus. This is our trojan horse.
1
2
15
@tomas_hk
Tomas Hernando Kofman
8 months
When routing between two strong models, RoRF achieves higher accuracy than either individual model at significant cost reductions. When routing between a strong and a weak model pair, we outperform other routing approaches at a lower cost.
Tweet media one
Tweet media two
1
1
14
@tomas_hk
Tomas Hernando Kofman
5 months
@karpathy @EverydayAI_ Andrej, you should check out <- it's a general framework that can take any evaluation data over any set of models for any set of inputs and learn an optimal recommendation algorithm for it to predictively select the best model for each input.
1
2
14
@tomas_hk
Tomas Hernando Kofman
10 months
We train custom routers for each benchmark and report on test splits. While benchmarks only weakly correlate to real-world use cases, our results powerfully illustrate how for any distribution of data, Not Diamond can route between LLMs to outperform each of them individually.
1
2
14
@tomas_hk
Tomas Hernando Kofman
4 months
1/9. Don’t put all your eggs in one LLM basket. The stock predictions about R1 don’t matter, and neither do the conspiracy theories. Even the model itself doesn’t really matter. DeepSeek R1 really matters because it means the number of frontier AI models is about to explode.
1
7
14
@tomas_hk
Tomas Hernando Kofman
6 months
Not Diamond is now live on @Zapier! Watch this to learn how to build a Slack chatbot in under 3 minutes that dynamically routes between @AnthropicAI's Sonnet and Haiku to maximize quality while significantly reducing costs:
1
4
13
@tomas_hk
Tomas Hernando Kofman
8 months
Not Diamond now supports routing to custom models. You can route between proprietary models like GPT-4o, private fine-tunes, and agents. This is something a ton of people have asked for. Watch me quickly train a model router from scratch with both proprietary and custom LLMs:
1
4
13
@tomas_hk
Tomas Hernando Kofman
7 months
That feeling when you give someone their 50,000th star on github 🌟. Congrats @dify_ai. Amazing repo!
Tweet media one
Tweet media two
@dify_ai
Dify.AI
7 months
🎉 We’ve just hit 50k stars on GitHub! ❤️ A huge thanks to our incredible community for being part of this journey. We’re about to unveil something big in v1.0, something we’ve been working on day and night. It’s set to change the game, making Dify more open and accessible than
Tweet media one
0
3
13
@tomas_hk
Tomas Hernando Kofman
5 months
Interesting takeaways from @harjtaggar and @sdianahu: "Applications don't want to be beholden to a single model... a lot of companies in the Fall 24 batch have a multi-model architecture to use the best one for the best task."
1
4
13
@tomas_hk
Tomas Hernando Kofman
10 months
By default, Not Diamond maximizes quality above all else. But Not Diamond also allows you to make optimized tradeoffs that can drastically lower your costs and latency by routing to smaller models when doing so doesn’t impact quality.
1
2
12
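One way to picture the quality-versus-cost tradeoff in the post above: pick the cheapest model whose predicted quality is within some tolerance of the best prediction. This is a sketch of that idea only. The `Candidate` fields, the scores, and the `tolerance` parameter are hypothetical and do not describe Not Diamond's API or its actual selection logic.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    predicted_quality: float  # hypothetical router score in [0, 1]
    cost: float               # hypothetical relative cost per request


def pick_model(candidates: Sequence[Candidate], tolerance: float = 0.0) -> Candidate:
    """Cheapest candidate within `tolerance` of the best predicted quality.

    tolerance=0.0 reproduces "maximize quality above all else".
    """
    best = max(c.predicted_quality for c in candidates)
    good_enough = [c for c in candidates if c.predicted_quality >= best - tolerance]
    # Among acceptable candidates prefer lower cost, then higher quality.
    return min(good_enough, key=lambda c: (c.cost, -c.predicted_quality))


models = [
    Candidate("large", predicted_quality=0.92, cost=10.0),
    Candidate("small", predicted_quality=0.90, cost=1.0),
]

print(pick_model(models).name)                  # quality-first default
print(pick_model(models, tolerance=0.05).name)  # accept a small quality gap
```

With zero tolerance the larger model wins on quality alone; allowing a 0.05 gap lets the cheaper model through because its predicted score is close enough.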
@tomas_hk
Tomas Hernando Kofman
9 days
Our enterprise customers have already begun to see the benefits. At his Sapphire keynote today, SAP’s CTO Philipp Herzig previewed a new Prompt Optimization Service leveraging Not Diamond with the intent of driving accuracy improvements and accelerating engineering throughput.
2
1
13
@tomas_hk
Tomas Hernando Kofman
10 months
We’re a small cracked team (h/t @ilyasut) of researchers, engineers, and veteran ML founders. We’ve published in top AI research journals, grown companies from 0 to tens of millions in revenue, and built for billions of users. Send me a note if you want to join us.
1
1
12
@tomas_hk
Tomas Hernando Kofman
9 months
❤️❤️❤️
Tweet media one
1
1
12
@tomas_hk
Tomas Hernando Kofman
1 year
5/9. Unlike deterministic routers, notdiamond-0001 doesn't route based on simple categories or domains. Instead, routing decisions are far more fine-grained. Here are some examples of prompts that get routed to either GPT-3.5 or GPT-4:
Tweet media one
1
0
11
@tomas_hk
Tomas Hernando Kofman
1 year
Heavy firepower from Alibaba. Not surprising to see SOTA performance on multilingual benchmarks.
Tweet media one
@Alibaba_Qwen
Qwen
1 year
🔥Qwen2 has received a great deal of enthusiasm from the community. Qwen2 features five cutting-edge models of varying sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B (MoE), and Qwen2-72B. These models support 27 languages and have significantly enhanced capabilities in
Tweet media one
0
6
12
@tomas_hk
Tomas Hernando Kofman
1 year
We're hosting the world's shortest hackathon next week with @websim_ai—come hang with us! If you haven't messed with yet, block the next hour off and simulate the simulation.
@websim_ai
websim
1 year
Announcing the Websim Hackathon Boogaloo 6/20! 3 hackathons, 1 event: 1. World’s Shortest Hackathon - 10 minutes, one prompt. 2. Pass the Baton Hackathon - each team member gets 1 iterative prompt. 3. One Hour Wonder Hackathon - no gimmicks, 1 hour make anything
0
8
10
@tomas_hk
Tomas Hernando Kofman
8 months
The architecture for RoRF is based on a Random Forest Classifier and supports both open-source embedding models like Jina which you can run locally, or closed-source embedding models like Voyage or OpenAI for lower compute requirements.
Tweet media one
1
1
10
@tomas_hk
Tomas Hernando Kofman
1 year
A gem from @benthompson's interview with @natfriedman and @danielgross today: The most important agent in AI is going to be the local agent that decides where to dispatch jobs. It doesn’t need to be big, it doesn’t need to be complex, but it is at the linchpin and it will.
0
3
11
@tomas_hk
Tomas Hernando Kofman
1 year
6/9. If you’re using GPT-4, notdiamond-0001 will lead to an immediate and drastic reduction in your inference costs and latency without any degradation in quality. Or, if you’re using GPT-3.5, you can enjoy a much higher response quality without significantly increasing your bill.
1
0
10
@tomas_hk
Tomas Hernando Kofman
8 months
@bindureddy This is awesome! Hmu if you want to integrate Not Diamond—we support routing b/w 40 models through our API and can hyper-personalize routing in real-time based on user feedback (e.g . We also have several oss routers you can integrate.
0
0
11
@tomas_hk
Tomas Hernando Kofman
10 months
@chipro Super valuable writeup. Appreciated the section on routing, aligns with a lot of what I've been seeing as well.
0
0
6
@tomas_hk
Tomas Hernando Kofman
9 months
So cool to see this review of our model-routing chatbot by @saj_adibs ! Check it out:
3
2
11
@tomas_hk
Tomas Hernando Kofman
1 year
7/9. notdiamond-0001 is just the first step. We’ll soon be releasing the ability to dynamically route to Gemini, Claude, Mistral, Llama, Cohere, and many more models, as well as your own fine-tuned models and custom workflows, agents, RAG applications, and chains.
2
0
10
@tomas_hk
Tomas Hernando Kofman
10 months
New paper from Stanford: even on narrow domains like Math, LLM quality is "a function of which skills we choose to evaluate." Table 1: LLM performance on the whole benchmark. Table 2: How widely models vary on individual sub-skills: GPT-4o drops from first to last place!
Tweet media one
Tweet media two
2
3
9
@tomas_hk
Tomas Hernando Kofman
5 months
What do 1,629,706 human feedback ratings on AI model responses from real-world users tell us about which LLM is the best? Results in thread 👇
1
4
10
@tomas_hk
Tomas Hernando Kofman
1 year
Amazing work from @JunlinWang3, @jueseph, @ben_athi, @ce_zhang, and @james_y_zou. Reminds me a bit of emergent communication research from a few years ago in which agents with diverse perceptual capabilities learn to communicate and benefit from each others' abilities.
@togethercompute
Together AI
1 year
Mixture of Agents—a framework that leverages the collective strengths of multiple LLMs. Each layer contains multiple agents that refine responses using outputs from the preceding layer. Together MoA achieves a score of 65.1% on AlpacaEval 2.0.
Tweet media one
0
3
9
@tomas_hk
Tomas Hernando Kofman
11 months
is the most non-teleological AI tool that exists today. It doesn't solve any "problem". Instead, it lets you see the world through a new lens, and from that vantage point opens the possibility for something that couldn't have existed before.
1
0
9
@tomas_hk
Tomas Hernando Kofman
8 months
Will be livestreaming a conversation with @MatthewBerman in 30 minutes, please join us! Would love to see you there:
4
4
9
@tomas_hk
Tomas Hernando Kofman
8 months
As our chat app began blowing up, inference costs grew. So we dogfooded Not Diamond to save 51% on our LLM costs ($750K) with one line of code. By enabling cost tradeoffs, we send queries to cheaper models when doing so wouldn't affect quality. Details in the link below 👇
Tweet media one
2
3
10
@tomas_hk
Tomas Hernando Kofman
6 months
Interesting to see on the o1 model card that across OpenAI's four frontier models, there's huge variation on which model performs best across different agentic tasks:
Tweet media one
0
2
10
@tomas_hk
Tomas Hernando Kofman
6 months
LLMs struggle with reasoning for the same reason that they're bad at ASCII: Drawing is 2d, transformers are 1d (next token). Humans reason spatially (this is why we have memory palaces). Reasoning with transformers is like drawing an ASCII Mona Lisa when all you can see is the
Tweet media one
1
3
10
@tomas_hk
Tomas Hernando Kofman
6 months
Not Diamond is now available on AWS Marketplace(!), so you can now integrate AI model routing with Not Diamond into your existing AWS infrastructure. I'll be at re:Invent this week—send me a note if you'd like to meet up and talk about improving your multi-model workflows.
Tweet media one
Tweet media two
1
4
9
@tomas_hk
Tomas Hernando Kofman
4 months
Join me tomorrow at 9am PT for this conversation with Dagster on LLM routing and data orchestration!
@dagster
Dagster
4 months
Are you ready to get into the weeds on AI development best practices to reduce costs and improve accuracy? Join us on a Deep Dive on February 11 at 9 a.m. PT with the Not Diamond Team. We'll cover: - The Not Diamond-Dagster integration. - Why you should be leveraging AI model
Tweet media one
0
3
9
@tomas_hk
Tomas Hernando Kofman
4 months
Had such a fun time galaxy braining on model routing with @MarkMoyou on his AI podcast! Check out the episode here:
0
2
8
@tomas_hk
Tomas Hernando Kofman
9 days
@swyx @BEBischof @ankrgyl @latentspacepod Prompt adaptation improves model routing but can also be used independently of it. I would argue the #1 sign your company is serious about AI is that you've invested in the data-driven infrastructure to evaluate and optimize across any model instead of building on gut instinct.
0
3
9
@tomas_hk
Tomas Hernando Kofman
29 days
The TTL for AI models is now <1 year:
Tweet media one
0
1
9
@tomas_hk
Tomas Hernando Kofman
7 months
Routing between agents. Agent workflows require multiple specialized agents. But how do you send the right inputs to the right agent? We wrote a guide on how to route inputs to the correct agent with 65% lower latency than 4o can (and higher accuracy):
Tweet media one
0
5
8
@tomas_hk
Tomas Hernando Kofman
4 months
@dark_sando Hi! Not Diamond supports routing between r1 and non-reasoning models out of the box. Check it out:
1
5
9
@tomas_hk
Tomas Hernando Kofman
8 months
@lexfridman Daily user of Cursor—cool to see the question on routing. We've built a SOTA router that determines when to send queries to o1 vs when to use a weaker model: oss option also available ❤️. cc @mntruell, @amanrsanger, @sualehasif996, @ArVID220u.
@tomas_hk
Tomas Hernando Kofman
8 months
o1 is insanely powerful. And insanely expensive: 60x more expensive than 4o, 1000x more than 4o-mini. And it's not actually better on all domains. We've put together a super simple repo that routes to o1 when it really matters. Watch this to learn how to use it:
0
2
8
@tomas_hk
Tomas Hernando Kofman
10 months
3B Apple model scores "a win rate of more than 50% and a tie rate of 27.4% against GPT-3.5," beats Llama-70B on instruction following, and beats GPT-4T on tool use. Tons of super impressive results. Also, very on brand that the only author is "Apple".
Tweet media one
1
3
8
@tomas_hk
Tomas Hernando Kofman
10 months
Love to see Not Diamond discussed in @fpingham's session on how to choose the right LLM model ❤️.
@fpingham
Francisco
10 months
Upcoming Pampa Learning #6 (in English): Choosing a model. We will cover a few key topics when building an LLM-native application: - how to choose which model to start with? - how to decide if/when you need to change the model or finetune your own?
0
2
8
@tomas_hk
Tomas Hernando Kofman
5 months
"There is no single best model or paradigm. Different solutions excel in different scenarios." • SLMs were surprisingly strong, performing similarly or better than larger counterparts. • oss is off-the-shelf viable for Text2Json and function calling.
1
2
8
@tomas_hk
Tomas Hernando Kofman
11 months
Quantum fashion store with infinite scroll by @neilsonks and @notdavidhuang:
Tweet media one
1
2
8
@tomas_hk
Tomas Hernando Kofman
3 months
@abacaj @mayfer check out what we're building at
0
2
7
@tomas_hk
Tomas Hernando Kofman
9 months
Cursor is my daily IDE, and this is such a sweet and simple idea. Cool to see Not Diamond in the repo!
0
0
7
@tomas_hk
Tomas Hernando Kofman
11 months
Amazing work from @basetenco — thoughtful computational optimization for multi-model workflows is a deep need and will only become necessary as more and more applications leverage multiple models.
@basetenco
Baseten
11 months
0
2
7
@tomas_hk
Tomas Hernando Kofman
7 months
I'm at GenAI Summit this weekend! Hmu if you'll be there and we can hang.
@genaisummitsf
GenAI Summit
8 months
🎉 Exciting News! Tomás Hernando Kofman(@tomas_hk ), Co-Founder of will be speaking at GENAI Summit Silicon Valley 2024! 🎉. Tomás co-founded Not Diamond, a startup focused on AI model optimization and routing queries to the best AI models. Backed by
Tweet media one
0
2
7
@tomas_hk
Tomas Hernando Kofman
1 year
One of the coolest parts of the SWE-agent paper is the pass@k metrics. Huge improvements.
Tweet media one
1
5
7