Harsh Trivedi

@harsh3vedi

Followers
771
Following
3K
Media
35
Statuses
565

Research Scientist @allen_ai 🤖 building AI agents & environments: 🌍 AppWorld (https://t.co/dIawTLcI7a) Prev: #NLProc PhD @stonybrooku. On 🦋 same handle.

Seattle, Washington
Joined January 2014
@harsh3vedi
Harsh Trivedi
1 year
🔥 Autonomous AI Assistants (e.g., #googleio2024, #WWDC24) and coding agents (e.g., #Devin, #SWEAgent) have garnered a lot of attention recently. We can envision coding agents autonomously completing complex day-to-day tasks across apps using APIs on our behalf. But how can we
4
28
83
@Ali_TongyiLab
Tongyi Lab
19 days
We are excited to release AgentEvolver, an open-source, self-evolving agent system from Tongyi Lab. AgentEvolver integrates three synergistic mechanisms (Self-Questioning, Self-Navigating, and Self-Attributing) to systematically address critical bottlenecks in Agent RL
15
122
830
@swapneel_mehta
Swapneel Mehta
25 days
I'm on the job market for academic and industry positions, and would appreciate a repost. My research spans information systems, data science, and human-AI interaction, exploring how platforms can govern AI-mediated information exchange https://t.co/uPMvql0n55 Details in 🧵👇
1
8
20
@natolambert
Nathan Lambert
23 days
I'm excited to announce my RLHF Book is now in pre-order for the Manning Early Access Program (MEAP), @ManningBooks, and for this milestone it's 50% off. Excited to land in print in early 2026! Lots of improvements coming soon. Link below & thanks for the support!
42
71
776
@lasha_nlp
Abhilasha Ravichander
25 days
📢 I'm recruiting PhD students at MPI!! Topics include: 1️⃣ LLM factuality, reliable info synthesis and reasoning, personalization + applications in real-world inc. education, science 2️⃣ Data-centric interpretability 3️⃣ Creativity in AI, esp scientific applications 🧵1/2
9
107
444
@lateinteraction
Omar Khattab
26 days
OpenAI cookbook on leveraging GEPA to allow agents to learn from their mistakes.
@shikharkwatra
Shikhar Kwatra
27 days
👋 Super Excited to share our new @OpenAI cookbook is now LIVE: Self-Evolving Agents - A Cookbook for Autonomous Agent Retraining 🔗 https://t.co/QLeBaO1R1c cc @OpenAIDevs
6
27
249
@anas_awadalla
Anas Awadalla
27 days
We're releasing 🍨 Gelato-30B-A3B, a state-of-the-art computer grounding model that delivers immediate performance gains for computer-use agents! Trained on our open-source 🖱️ Click-100k dataset, Gelato achieves 63.8% on ScreenSpot-Pro and 69.1% on OS-World-G. It outperforms
7
40
233
@lasha_nlp
Abhilasha Ravichander
1 month
Go work on cool mechanistic interpretability problems with Sarah! She is an amazing person and will be a great mentor 🥰
@sarahwiegreffe
Sarah Wiegreffe
1 month
I am recruiting 2 PhD students to work on LM interpretability at UMD @umdcs starting in fall 2026! We are #3 in AI and #4 in NLP research on @CSrankings. Come join us in our lovely building just a few miles from Washington, D.C. Details in 🧵
1
1
19
@faeze_brh
Faeze Brahman
30 days
Grateful to be part of this team collaboration led by amazing @StellaLisy and @jiminmun_ ⚡️
@MedicalSphereAI
Medical Sphere
1 month
We're excited to share our most recent collaboration, published at @COLM_conf 2025, the result of joint work between outstanding researchers at the @UW, @CarnegieMellon, @LTIatCMU, @allen_ai, and @CPHAI_Dartmouth, in partnership with @LavitaAI. At Medical Sphere, we're glad to
0
5
29
@jbhuang0604
Jia-Bin Huang
1 month
Junior students who have just started doing research? Check out the (75 and counting) awesome tips! https://t.co/5CTTuJm3Jg
13
178
1K
@HannaHajishirzi
Hanna Hajishirzi
30 days
Congratulations to the amazing Hao @xuhaoxh and @liujc1998 for winning the best paper award at EMNLP. Hao is applying for grad schools โ€” highly recommend her.
@liujc1998
Jiacheng Liu
1 month
Our infini-gram mini paper received the Best Paper Award at #EMNLP2025 !! Really proud 🥹
5
6
111
@harsh3vedi
Harsh Trivedi
30 days
Congrats on the release @alexgshaw, @Mike_A_Merrill @lschmidt3, and many others! 🚀 Terminal-bench is a really cool work! It was fun to help port AppWorld in the earlier version! Looking forward to trying out Harbor!
@alexgshaw
Alex Shaw
30 days
Today, we're announcing the next chapter of Terminal-Bench with two releases: 1. Harbor, a new package for running sandboxed agent rollouts at scale 2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification
1
0
10
@Mike_A_Merrill
Mike A. Merrill
30 days
It's here!
@alexgshaw
Alex Shaw
30 days
Today, we're announcing the next chapter of Terminal-Bench with two releases: 1. Harbor, a new package for running sandboxed agent rollouts at scale 2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification
2
3
38
@alexgshaw
Alex Shaw
30 days
Today, we're announcing the next chapter of Terminal-Bench with two releases: 1. Harbor, a new package for running sandboxed agent rollouts at scale 2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification
23
72
353
@AnnieFeng6
Yu Feng ✈️ NeurIPS
30 days
LLM CoT reasoning looks smart but can be logically flawed or... just made up. It's time to hold reasoning accountable! We built VeriCoT to do just that. VeriCoT extracts the core argument of the CoT using well-formed symbolic notions of logical support. It formalizes every CoT
1
9
24
@harsh3vedi
Harsh Trivedi
1 month
@nileshtrivedi @AnthropicAI You might also want to check out our concurrent work: 🌎 AppWorld (https://t.co/hxEWKQGsAV). We didn't name the ReAct agent differently because ours was a benchmark + env. paper. Our agent also made tool calls in code. CodeAct and AppWorld were submitted to ICML+Arxiv and ACL,
0
2
3
@niloofar_mire
Niloofar
2 months
I'm recruiting students for fall 2026 thru @LTIatCMU & @CMU_EPP, in: 1. Privacy & security of LLMs, coding, long horizon & embodied agents (robotics) 2. Tiny local llms 3. AI for scientific reasoning, esp. chemistry 4. Latent reasoning 5. anything YOU are passionate about!
27
188
1K
@harsh3vedi
Harsh Trivedi
1 month
@Kimi_Moonshot > Executes up to 200–300 sequential tool calls @Kimi_Moonshot Great work! If you want to demonstrate K2 on a benchmark that requires a seriously long-range tool calling, consider checking out 🌍
0
2
10
@turingmusician
Jonathan Bragg
1 month
Agent benchmarks don't measure true *AI* advances We built one that's hard & trustworthy 👉 AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems 👉 SOTA results across 22 agent *classes* 👉 AgentBaselines agents suite 🆕 https://t.co/BFjdGCAp1w 🧵👇
arxiv.org
AI agents hold the potential to revolutionize scientific productivity by automating literature reviews, replicating experiments, analyzing data, and even proposing new directions of inquiry;...
4
21
29
@gneubig
Graham Neubig
1 month
Big news! We made the basic tier of the OpenHands Cloud FREE! This means that you can call state-of-the-art coding agents from your computer, phone, github, gitlab, slack, etc. for just the price of API credits or hosting your own language model! 🧵👇
1
10
42