Emily Ekdahl
@emekdahl
Followers: 227
Following: 1K
Media: 40
Statuses: 554
AI/LLM Ops Engineer
Chicago, IL
Joined October 2012
Don’t sleep on @WisprFlow! I’m so grateful to @IsaacFlath and @HamelHusain for recommending this productivity hack! Reduces friction while prompting and brainstorming!
0
0
0
✍️ New blog post: on the consumption of AI-generated content at scale
8
28
161
Why Your AI Music Prompts Aren’t Working (And What To Do Instead) What I learned trying to make an album inspired by the @aiDotEngineer code conference @sunomusic
https://t.co/wCVAY8OAfC
emekdahl.medium.com
Photo by Siednji Leon on Unsplash
0
0
0
After repeating myself for the nth time on how to build product evals, I figured I should write it down. It's just three basic steps: (i) labeling a small dataset, (ii) aligning LLM evaluators, and (iii) running the eval harness with each config change. https://t.co/HjUL3yZQPk
eugeneyan.com
Label some data, align LLM-evaluators, and run the eval harness with each change.
6
28
199
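The three steps in the post above (label a small dataset, align an LLM evaluator, rerun the harness on each config change) might look roughly like the sketch below. This is not the linked article's code: `LABELED`, `toy_evaluator`, `toy_system`, and the config names are hypothetical stand-ins, and a real evaluator would call an LLM rather than match a keyword.

```python
# Step (i): a small hand-labeled dataset (hypothetical examples).
# Each item pairs a system output with a human pass/fail label.
LABELED = [
    ("Refund issued within 5 days.", True),
    ("I cannot help with that.", False),
    ("Refund approved, see policy.", True),
    ("", False),
]


def toy_evaluator(output: str) -> bool:
    """Stand-in for an LLM evaluator; a real one would call a model."""
    return "refund" in output.lower()


def agreement(evaluator, labeled) -> float:
    """Step (ii): share of items where the evaluator matches the human label."""
    return sum(evaluator(out) == label for out, label in labeled) / len(labeled)


def toy_system(prompt: str, config: dict) -> str:
    """Stand-in for the product under test; its behavior depends on the config."""
    if config["refunds_enabled"]:
        return f"Refund issued for {prompt}."
    return "I cannot help with that."


def run_harness(evaluator, prompts, configs) -> dict:
    """Step (iii): rerun the same eval for every config and report pass rates."""
    return {
        name: sum(evaluator(toy_system(p, cfg)) for p in prompts) / len(prompts)
        for name, cfg in configs.items()
    }


alignment = agreement(toy_evaluator, LABELED)
pass_rates = run_harness(
    toy_evaluator,
    prompts=["order 1", "order 2"],
    configs={
        "baseline": {"refunds_enabled": False},
        "candidate": {"refunds_enabled": True},
    },
)
```

Checking `alignment` first is the point of step (ii): pass rates from the harness are only worth reading once the evaluator agrees with the human labels often enough.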
Do you love Claude's plan-mode question asker and wish you could bring it with you everywhere? Add `AskUserQuestion` to allowed-tools in a .claude/command then explicitly tell Claude to use it.
> Use the AskUserQuestion tool to ask the user...
Here's me using it for a PR
13
23
274
Six months ago I was but a test prompt. Today, I can file your taxes. https://t.co/0GPC8nJnre.
1
5
6
A good friend and colleague told me, at the start of building in AI, that a true agent is ⚡ 'lightning in a bottle'. And right now we have lightning. ↓ True human and agent collaboration. We can't wait to introduce a new way of consumer accounting very soon.
1
3
4
Scenarios by @LangWatchAI is saving my life while evaluating #AI multi-turn conversations 🙌
0
0
0
SpecFlow changed how I build with AI agents. Huge thanks to the @specstoryai team, @isaac_flath, and @intellectronica for introducing me to this game-changing workflow. 🚀 https://t.co/YwsFr0PXEm
specflow.com
Use the open Specflow method to turn intent into software through structured planning and iterative execution with software agents.
1
4
7
When you deploy an LLM-as-a-Judge, you’re shipping a classifier into production. Each new version is a hypothesis about how the model interprets the world. It’s data science, just expressed in natural language. Here’s what that looked like for a recent client project where we
8
14
130
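Treating an LLM-as-a-Judge as a classifier, as the post above does, means each judge version can be scored against human labels like any other classifier. A minimal sketch under that assumption: the labels, the two judge versions, and the `judge_metrics` helper are all hypothetical and not taken from the client project the post refers to.

```python
def judge_metrics(human: list, judge: list) -> dict:
    """Compare judge verdicts with human labels (True = pass)."""
    tp = sum(h and j for h, j in zip(human, judge))          # both say pass
    fp = sum((not h) and j for h, j in zip(human, judge))    # judge passes a human fail
    fn = sum(h and (not j) for h, j in zip(human, judge))    # judge fails a human pass
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}


# Hypothetical human labels and verdicts from two judge prompt versions.
human_labels = [True, True, False, False, True]
judge_v1 = [True, False, True, False, True]
judge_v2 = [True, True, False, False, False]

metrics_v1 = judge_metrics(human_labels, judge_v1)
metrics_v2 = judge_metrics(human_labels, judge_v2)
```

Comparing `metrics_v1` with `metrics_v2` on the same labeled set is one way to test whether a new judge prompt is an improvement or just a different set of errors.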
🚀 DeepSeek-OCR, the new frontier of OCR from @deepseek_ai, exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G), powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20× while keeping
52
377
3K
In an AI world, it’s easy to avoid effort. That’s why students need teachers more—to push them toward the hard things now that shape who they become later. #Education #AI #TeachingMatters #FutureOfLearning
17
90
394
Can #GPT5 actually do taxes? We ran it on @ColumnTax’s TaxCalcBench.
Full return: 30.4% strict ✅ | 53.4% lenient 🤔
Line items: 80.6% strict | 85.4% lenient 📊
Line accuracy is strong. Whole-return accuracy? Not IRS-ready yet. https://t.co/fXt0jIkMsi
#TaxCalcBench #AI #tax
github.com
GPT-5 support with results!
four runs, pass at k of 1
added debugging support for litellm
added gpt-5 to model config
**SUMMARY TABLE** Model Name Thinking Test...
0
0
3
The most useful bit of my system prompt is this: If I provide any feedback on how to improve something, suggest improvements to my prompt that I can make to avoid similar mistakes in the future. Put any prompt improvement suggestions in separate <prompt-improvement> tags.
7
12
271
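If a model follows the instruction above, the suggestions can be pulled out of a reply programmatically. A minimal sketch, assuming the reply is a plain string and the tags are not nested; the function name and the sample reply are hypothetical, not from the original post.

```python
import re

# Non-greedy match so each tag pair is captured separately;
# DOTALL lets a suggestion span multiple lines.
TAG_PATTERN = re.compile(
    r"<prompt-improvement>(.*?)</prompt-improvement>", re.DOTALL
)


def extract_prompt_improvements(reply: str) -> list[str]:
    """Return the text inside every <prompt-improvement> tag, whitespace-trimmed."""
    return [match.strip() for match in TAG_PATTERN.findall(reply)]


# Hypothetical model reply containing two suggestions.
sample_reply = (
    "Here is the revised function.\n"
    "<prompt-improvement>State the target Python version.</prompt-improvement>\n"
    "<prompt-improvement>\nAsk for type hints up front.\n</prompt-improvement>"
)

suggestions = extract_prompt_improvements(sample_reply)
```

Keeping the suggestions in their own tags is what makes this kind of extraction possible: they can be logged or reviewed separately from the main answer.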
Can't say enough good things about the AI evals course run by @sh_reya and @HamelHusain! It is informed by real production work across dozens of clients. The opportunities and challenges resonate with my experience evaluating & deploying production AI products.
0
1
7
2023 vs. 2024
2023: Vector search is all you need
2024: Evaluate vector/hybrid search against BM25 baseline
2023: “Look, this prompt works!”
2024: Prompt optimization with DSPy
2023: …
2024: Evals with AI-as-a-judge
We’ve come a long way, but we’re still so early.
0
17
112
Getting employees to work hard and deliver really isn't a matter of mandating work-from-office and long hours. It's a matter of incentives and ownership. People do their best when they work on interesting problems, in a self-directed manner, and get rewarded for success. This
73
179
2K
Insecure leaders ridicule others. Secure leaders laugh at themselves. The ability to make fun of yourself opens the door to candor. It’s a mark of humility and a catalyst for learning. Great leaders take their work seriously, but they don't take themselves too seriously.
29
617
2K