Harsh Trivedi @harsh3vedi X Profile

Harsh Trivedi

@harsh3vedi

Followers: 771

Following: 3K

Media: 35

Statuses: 565

Research Scientist @allen_ai 🤖 building AI agents & environments: 🌍 AppWorld (https://t.co/dIawTLcI7a) Prev: #NLProc PhD @stonybrooku. On 🦋 same handle.

https://t.co/PFTdIjrG4x

Seattle, Washington

Joined January 2014

Harsh Trivedi

@harsh3vedi

1 year

🔥 Autonomous AI Assistants (e.g., #googleio2024, #WWDC24) and coding agents (e.g., #Devin, #SWEAgent) have garnered a lot of attention recently. We can envision coding agents autonomously completing complex day-to-day tasks across apps using APIs on our behalf. But how can we

Tongyi Lab

@Ali_TongyiLab

19 days

We are excited to release AgentEvolver, an open-source, self-evolving agent system from Tongyi Lab. AgentEvolver integrates three synergistic mechanisms (Self-Questioning, Self-Navigating, and Self-Attributing) to systematically address critical bottlenecks in Agent RL

Swapneel Mehta

@swapneel_mehta

25 days

I'm on the job market for academic and industry positions, and would appreciate a repost. My research spans information systems, data science, and human-AI interaction, exploring how platforms can govern AI-mediated information exchange https://t.co/uPMvql0n55 Details in 🧵👇

Nathan Lambert

@natolambert

23 days

I'm excited to announce my RLHF Book is now in pre-order for the Manning Early Access Program (MEAP), @ManningBooks, and for this milestone it's 50% off. Excited to land in print in early 2026! Lots of improvements coming soon. Link below & thanks for the support!

Abhilasha Ravichander

@lasha_nlp

25 days

📢 I'm recruiting PhD students at MPI!! Topics include: 1⃣ LLM factuality, reliable info synthesis and reasoning, personalization + applications in real-world inc. education, science 2⃣ Data-centric interpretability 3⃣Creativity in AI, esp scientific applications 🧵1/2

Omar Khattab

@lateinteraction

26 days

OpenAI cookbook on leveraging GEPA to allow agents to learn from their mistakes.

Shikhar Kwatra

@shikharkwatra

27 days

👋 Super Excited to share our new @OpenAI cookbook is now LIVE: Self-Evolving Agents - A Cookbook for Autonomous Agent Retraining 🔗 https://t.co/QLeBaO1R1c cc @OpenAIDevs

Anas Awadalla

@anas_awadalla

27 days

We're releasing🍨Gelato-30B-A3B, a state-of-the-art computer grounding model that delivers immediate performance gains for computer-use agents! Trained on our open-source🖱️Click-100k dataset, Gelato achieves 63.8% on ScreenSpot-Pro and 69.1% on OS-World-G. It outperforms

Abhilasha Ravichander

@lasha_nlp

1 month

Go work on cool mechanistic interpretability problems with Sarah! She is an amazing person and will be a great mentor 🥰

Sarah Wiegreffe

@sarahwiegreffe

1 month

I am recruiting 2 PhD students to work on LM interpretability at UMD @umdcs starting in fall 2026! We are #3 in AI and #4 in NLP research on @CSrankings. Come join us in our lovely building just a few miles from Washington, D.C. Details in 🧵

Faeze Brahman

@faeze_brh

30 days

Grateful to be part of this team collaboration led by amazing @StellaLisy and @jiminmun_ ⚡️

Medical Sphere

@MedicalSphereAI

1 month

We’re excited to share our most recent collaboration, published at @COLM_conf 2025, the result of joint work between outstanding researchers at the @UW, @CarnegieMellon, @LTIatCMU, @allen_ai, and @CPHAI_Dartmouth, in partnership with @LavitaAI. At Medical Sphere, we're glad to

Jia-Bin Huang

@jbhuang0604

1 month

Junior students who have just started doing research? Check out the (75 and counting) awesome tips! https://t.co/5CTTuJm3Jg

Hanna Hajishirzi

@HannaHajishirzi

30 days

Congratulations to the amazing Hao @xuhaoxh and @liujc1998 for winning the best paper award at EMNLP. Hao is applying for grad schools — highly recommend her.

Jiacheng Liu

@liujc1998

1 month

Our infini-gram mini paper received the Best Paper Award at #EMNLP2025 !! Really proud 🥹

Harsh Trivedi

@harsh3vedi

30 days

Congrats on the release @alexgshaw, @Mike_A_Merrill @lschmidt3, and many others! 🚀 Terminal-bench is a really cool work! It was fun to help port AppWorld in the earlier version! Looking forward to trying out Harbor!

Alex Shaw

@alexgshaw

30 days

Today, we’re announcing the next chapter of Terminal-Bench with two releases: 1. Harbor, a new package for running sandboxed agent rollouts at scale 2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification

Mike A. Merrill

@Mike_A_Merrill

30 days

It's here!

Alex Shaw

@alexgshaw

30 days

Today, we’re announcing the next chapter of Terminal-Bench with two releases: 1. Harbor, a new package for running sandboxed agent rollouts at scale 2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification

Alex Shaw

@alexgshaw

30 days

Today, we’re announcing the next chapter of Terminal-Bench with two releases: 1. Harbor, a new package for running sandboxed agent rollouts at scale 2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification

Yu Feng ✈️ NeurIPS

@AnnieFeng6

30 days

LLM CoT reasoning looks smart but can be logically flawed or... just made up. It's time to hold reasoning accountable! We built VeriCoT to do just that. VeriCoT extracts the core argument of the CoT using well-formed symbolic notions of logical support. It formalizes every CoT

Harsh Trivedi

@harsh3vedi

1 month

@nileshtrivedi @AnthropicAI You might also want to check out our concurrent work: 🌎AppWorld ( https://t.co/hxEWKQGsAV). We didn't name the ReAct agent differently because ours was a benchmark + env. paper. Our agent also made tool calls in code. CodeAct and AppWorld were submitted to ICML+Arxiv and ACL,

Niloofar

@niloofar_mire

2 months

I'm recruiting students for fall 2026 thru @LTIatCMU & @CMU_EPP, in: 1. Privacy & security of LLMs, coding, long horizon & embodied agents (robotics) 2. Tiny local llms 3. AI for scientific reasoning, esp. chemistry 4. Latent reasoning 5. anything YOU are passionate about!

Harsh Trivedi

@harsh3vedi

1 month

@Kimi_Moonshot > Executes up to 200 – 300 sequential tool calls @Kimi_Moonshot Great work! If you want to demonstrate K2 on a benchmark that requires a seriously long-range tool calling, consider checking out 🌍

Jonathan Bragg

@turingmusician

1 month

Agent benchmarks don't measure true *AI* advances We built one that's hard & trustworthy 👉AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems 👉SOTA results across 22 agent *classes* 👉AgentBaselines agents suite 🆕 https://t.co/BFjdGCAp1w 🧵👇

arxiv.org

AI agents hold the potential to revolutionize scientific productivity by automating literature reviews, replicating experiments, analyzing data, and even proposing new directions of inquiry;...

Mike A. Merrill

@Mike_A_Merrill

1 month

Still a few spots left at our TB2.0 launch event tonight! https://t.co/iQkv3Aqkiy

luma.com

Come join us at Databricks for the announcement of Terminal-Bench 2.0 and something new we're excited to share with the community! Food and drinks provided…

Graham Neubig

@gneubig

1 month

Big news! We made the basic tier of the OpenHands Cloud FREE! This means that you can call state-of-the-art coding agents from your computer, phone, github, gitlab, slack, etc. for just the price of API credits or hosting your own language model! 🧵👇
