Harsh Trivedi
@harsh3vedi
Followers
771
Following
3K
Media
35
Statuses
565
Research Scientist @allen_ai building AI agents & environments: AppWorld (https://t.co/dIawTLcI7a) Prev: #NLProc PhD @stonybrooku. On 🦋 same handle.
Seattle, Washington
Joined January 2014
Autonomous AI Assistants (e.g., #googleio2024, #WWDC24) and coding agents (e.g., #Devin, #SWEAgent) have garnered a lot of attention recently. We can envision coding agents autonomously completing complex day-to-day tasks across apps using APIs on our behalf. But how can we
4
28
83
We are excited to release AgentEvolver, an open-source, self-evolving agent system from Tongyi Lab. AgentEvolver integrates three synergistic mechanisms (Self-Questioning, Self-Navigating, and Self-Attributing) to systematically address critical bottlenecks in Agent RL
15
122
830
I'm on the job market for academic and industry positions, and would appreciate a repost. My research spans information systems, data science, and human-AI interaction, exploring how platforms can govern AI-mediated information exchange https://t.co/uPMvql0n55 Details in 🧵
1
8
20
I'm excited to announce my RLHF Book is now in pre-order for the Manning Early Access Program (MEAP), @ManningBooks, and for this milestone it's 50% off. Excited to land in print in early 2026! Lots of improvements coming soon. Link below & thanks for the support!
42
71
776
I'm recruiting PhD students at MPI!! Topics include: 1) LLM factuality, reliable info synthesis and reasoning, personalization + applications in real-world inc. education, science 2) Data-centric interpretability 3) Creativity in AI, esp scientific applications 🧵 1/2
9
107
444
OpenAI cookbook on leveraging GEPA to allow agents to learn from their mistakes.
Super Excited to share our new @OpenAI cookbook is now LIVE: Self-Evolving Agents - A Cookbook for Autonomous Agent Retraining https://t.co/QLeBaO1R1c cc @OpenAIDevs
6
27
249
We're releasing Gelato-30B-A3B, a state-of-the-art computer grounding model that delivers immediate performance gains for computer-use agents! Trained on our open-source Click-100k dataset, Gelato achieves 63.8% on ScreenSpot-Pro and 69.1% on OS-World-G. It outperforms
7
40
233
Go work on cool mechanistic interpretability problems with Sarah! She is an amazing person and will be a great mentor 🥰
I am recruiting 2 PhD students to work on LM interpretability at UMD @umdcs starting in fall 2026! We are #3 in AI and #4 in NLP research on @CSrankings. Come join us in our lovely building just a few miles from Washington, D.C. Details in 🧵
1
1
19
Grateful to be part of this team collaboration led by amazing @StellaLisy and @jiminmun_ ⚡️
We're excited to share our most recent collaboration, published at @COLM_conf 2025, the result of joint work between outstanding researchers at the @UW, @CarnegieMellon, @LTIatCMU, @allen_ai, and @CPHAI_Dartmouth, in partnership with @LavitaAI. At Medical Sphere, we're glad to
0
5
29
Junior students who have just started doing research? Check out the (75 and counting) awesome tips! https://t.co/5CTTuJm3Jg
13
178
1K
Congratulations to the amazing Hao @xuhaoxh and @liujc1998 for winning the best paper award at EMNLP. Hao is applying for grad schools; highly recommend her.
5
6
111
Congrats on the release @alexgshaw, @Mike_A_Merrill @lschmidt3, and many others! Terminal-bench is a really cool work! It was fun to help port AppWorld in the earlier version! Looking forward to trying out Harbor!
Today, we're announcing the next chapter of Terminal-Bench with two releases: 1. Harbor, a new package for running sandboxed agent rollouts at scale 2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification
1
0
10
Today, we're announcing the next chapter of Terminal-Bench with two releases: 1. Harbor, a new package for running sandboxed agent rollouts at scale 2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification
23
72
353
LLM CoT reasoning looks smart but can be logically flawed or... just made up. It's time to hold reasoning accountable! We built VeriCoT to do just that. VeriCoT extracts the core argument of the CoT using well-formed symbolic notions of logical support. It formalizes every CoT
1
9
24
@nileshtrivedi @AnthropicAI You might also want to check out our concurrent work: AppWorld (https://t.co/hxEWKQGsAV). We didn't name the ReAct agent differently because ours was a benchmark + env. paper. Our agent also made tool calls in code. CodeAct and AppWorld were submitted to ICML+Arxiv and ACL,
0
2
3
@Kimi_Moonshot > Executes up to 200-300 sequential tool calls @Kimi_Moonshot Great work! If you want to demonstrate K2 on a benchmark that requires seriously long-range tool calling, consider checking out
0
2
10
Agent benchmarks don't measure true *AI* advances. We built one that's hard & trustworthy. AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems. SOTA results across 22 agent *classes*. AgentBaselines agents suite. https://t.co/BFjdGCAp1w 🧵
arxiv.org
AI agents hold the potential to revolutionize scientific productivity by automating literature reviews, replicating experiments, analyzing data, and even proposing new directions of inquiry;...
4
21
29
Still a few spots left at our TB2.0 launch event tonight! https://t.co/iQkv3Aqkiy
luma.com
Come join us at Databricks for the announcement of Terminal-Bench 2.0 and something new we're excited to share with the community! Food and drinks provided…
0
3
13
Big news! We made the basic tier of the OpenHands Cloud FREE! This means that you can call state-of-the-art coding agents from your computer, phone, github, gitlab, slack, etc. for just the price of API credits or hosting your own language model! 🧵
1
10
42