Mohak Sharma
@mohak__sharma
Followers: 615 · Following: 610 · Media: 12 · Statuses: 181
co-founder and ceo @honeyhiveai - hiring!
NYC
Joined June 2020
I'm excited to share we've raised $7.4M Seed from @gkm1 at @InsightPartners, along with @petesoder at @ZeroPrimeVC, @antgoldbloom at @AIXVentures, @flo at @468Capital, and others to bring evals & observability to AI agents! We're also going GA today! 🧵
The @honeyhiveAI team is so excited to be at @_odsc AI West today! If you are at the conference, stop by booth #22 to learn more about our best-in-class platform for designing, evaluating, and monitoring AI agents. 🍯🐝🚀
🍯 Build, Test, and Monitor AI Agents with HoneyHive at ODSC AI West! 📍 Meet HoneyHive at Booth #22 during ODSC AI West! 🔗 Learn more: https://t.co/S1dWWNQFNc
Tracking your eval scores across experiments just got a whole lot easier. Our new Experiments dashboard visualizes metric trends across all your experiments in one view — making it easy to see how changes affect your agent's quality. ✅ Spot performance regressions at a glance
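The regression-spotting idea above can be sketched in plain Python. This is a generic illustration with made-up experiment names and scores, not HoneyHive's actual API: compare each experiment's eval score to the previous run and flag drops beyond a tolerance.

```python
# Hypothetical eval-score history across experiments (illustrative data only).
experiments = [
    {"name": "exp-1", "accuracy": 0.81},
    {"name": "exp-2", "accuracy": 0.84},
    {"name": "exp-3", "accuracy": 0.78},  # a regression
    {"name": "exp-4", "accuracy": 0.85},
]

def find_regressions(history, metric, tolerance=0.02):
    """Flag experiments whose metric dropped more than `tolerance` vs. the previous run."""
    flagged = []
    for prev, curr in zip(history, history[1:]):
        if prev[metric] - curr[metric] > tolerance:
            flagged.append(curr["name"])
    return flagged

print(find_regressions(experiments, "accuracy"))  # → ['exp-3']
```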
If this interests you, feel free to DM me or apply here:
careers.honeyhive.ai
Join us in making production AI safe, robust and reliable. View open roles at HoneyHive.
One of the things I'm most proud of is the high talent density on our team. We've collectively built the first version of the Codex CLI agent, built core infra from 0->1->10 at multiple unicorns, and shipped consumer products used by millions!
@honeyhiveai is hiring in NYC 🍎 & SF 🌉 We’re looking for: • SWEs to build our core product and SDKs • FDE to help customers scale AI agents If you’re passionate about your craft, love working with customers, and aren’t afraid to solve complex technical problems then
Are you going to be in SF from Oct 28–30? Then come to @_odsc west to meet the @honeyhiveai Team! We will be at Booth 22 demoing our platform for designing, evaluating, and monitoring AI agents! For anyone working on the next generation of agentic AI, HoneyHive is the
At ScaleUp:AI — now less than two weeks away! — we’re looking beyond copilots. AI is evolving into autonomous Agents: digital teammates that can reason, decide, and execute. Moderated by Managing Director George Mathew, this session will unpack what that shift means for
🚨 Calling all AI engineers in SF 🚨 We are sponsoring the MCP - AI Agents Hackathon this Friday, Sep 19 at the AWS Builder Loft in San Francisco with over $50k in prizes! Sponsors include @anthropicai, @awscloud, @lovable_dev, @redisinc, and many others. Register below 👇
been fun watching this debate from the sidelines :) throwback to our first deck in 2022 - the why has obv changed a lot (eg: no one cares about rlhf anymore) but the core thesis holds - you need evals + observability + a/b testing for any real chance at alignment
@benhylak no dog in this fight but idt it's either or so much as it's you need both? idt people are "bragging" about evals except model providers. when i did recsys/trad ML we had both statsig + offline eval systems that helped us decide what to even put in prod in the first place
Introducing Alerts🔔 Alerts in @honeyhiveai give you real-time monitoring over everything that matters in your agent: ✅ Metric drift - Detect quality degradation over time ✅ Cost spikes - Stay within budget thresholds with usage alerts ✅ Guardrail violations - Monitor safety
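The kinds of alert rules listed above can be sketched as simple threshold checks over a rolling window of measurements. This is a minimal, hypothetical illustration of the concept (made-up rule names and metrics), not HoneyHive's Alerts API:

```python
# Minimal sketch of threshold-style alerting over a window of measurements.
def check_alerts(window, rules):
    """Return the names of rules whose threshold is crossed by the window's mean."""
    fired = []
    for rule in rules:
        value = sum(m[rule["metric"]] for m in window) / len(window)
        if rule["op"] == "above" and value > rule["threshold"]:
            fired.append(rule["name"])
        elif rule["op"] == "below" and value < rule["threshold"]:
            fired.append(rule["name"])
    return fired

window = [
    {"cost_usd": 0.9, "faithfulness": 0.72},
    {"cost_usd": 1.4, "faithfulness": 0.68},
]
rules = [
    {"name": "cost_spike", "metric": "cost_usd", "op": "above", "threshold": 1.0},
    {"name": "quality_drift", "metric": "faithfulness", "op": "below", "threshold": 0.75},
]
print(check_alerts(window, rules))  # → ['cost_spike', 'quality_drift']
```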
As agents become more complex, it's becoming harder than ever to debug and understand what's really happening. Excited to ship something that actually helps with this.
Today we're shipping some major quality-of-life improvements to traces 🎁 🔍 Session Summaries: Unified view of metrics, evals, and feedback across all spans in an agent session. No more jumping between individual spans. ⏱️ Timeline View: Flamegraph visualization to identify
Introducing Role-Based Access Control (RBAC) in @HoneyHiveAI! Built in partnership with our largest financial services and insurance customers, RBAC brings enterprise-grade security and access controls to your critical observability workflows.
Chasing benchmark leaderboards is the easiest way to build an AI product that fails in the real world. Most teams waste 90% of their eval effort on academic benchmarks instead of finding exactly where their system breaks with real users.
Particularly proud to share that a major Fortune 100 enterprise is already logging their PHI data to our HIPAA-compliant cloud. Learn more about our security policies here:
We're excited to announce that HoneyHive has officially achieved SOC 2 Type II, GDPR, and HIPAA compliance! Every LLM interaction today—from user prompts to contextual retrieval or tool-use data—contains potential PII and PHI, which is why we've built our platform with
Any product designers / people with good UX taste up for grading some UX mockups for a vibe coding eval? 💴 Paid opportunity!
Most recommendation systems just show you more of what you already like. The trouble is, we can't always describe the new things we might enjoy. This technique works differently. It learns what you like and don't like, then helps you discover truly new content.
Traditional vector search systems often struggle with nuanced user preferences that are difficult to articulate. Our latest collaboration with @qdrant_engine showcases an iterative optimization technique that dynamically adapts search results based on user preferences.
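The "learns what you like and don't like" idea above is in the spirit of a Rocchio-style relevance-feedback update: nudge the query vector toward liked items and away from disliked ones before re-searching. This is a pure-NumPy sketch of that general technique, not Qdrant's actual API or the specific method from the collaboration:

```python
import numpy as np

def refine_query(query, liked, disliked, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update: pull the query toward liked vectors, push away from disliked."""
    q = alpha * query
    if len(liked):
        q = q + beta * np.mean(liked, axis=0)
    if len(disliked):
        q = q - gamma * np.mean(disliked, axis=0)
    return q / np.linalg.norm(q)  # unit-length for cosine search

query = np.array([1.0, 0.0])
liked = np.array([[0.0, 1.0]])
disliked = np.array([[1.0, 0.0]])
refined = refine_query(query, liked, disliked)
# The refined vector now points partly toward the liked direction,
# so the next search round surfaces new-but-relevant content.
```

Iterating this update across feedback rounds is what lets the search drift toward preferences the user couldn't articulate up front.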
Evals are a means to an end (better product), not the end itself
Product evals are misunderstood. Many teams think that adding another tool, metric, or llm-as-judge will solve all their problems and save their product. But that just dodges the hard truth and avoids the real work. Here's how to fix your process instead. https://t.co/vG8XE5bait
Because we struggle even with a single agent, never mind the exponential complexity of n agents.
There is something about it. Vendors hype agents, tools, and MCP/A2A, but accuracy is so-so on benchmarks. IMHO most don't realize that tools and MCP are just stuffing tool descriptions into the context prompt. It's all just text stuffed into a prompt. No magic sauce.
Funny how this is being resurfaced 4yrs later after @ds3638 built the original version at Microsoft:
github.com
CLI tool that uses Codex to turn natural language commands into their Bash/ZShell/PowerShell equivalents - microsoft/Codex-CLI
Also released today is Codex CLI — an open-source lightweight coding agent that runs in your terminal: https://t.co/CXDz0aK2rX This is the first of a series of tools we'll be releasing over upcoming months, which we think show the future of programming.
At @honeyhiveai, we call them the "YOLO to prod" people. It's too easy to ship a half-baked app when no one knows how to really do ML. The core issue is that SWE doesn't prep you for ML's experimentation culture. ML is more science than code, and that's what most people get wrong.