Jiseung Hong
@jiseungh99
Followers: 68 · Following: 54 · Media: 12 · Statuses: 39
Master's student at CMU LTI (@LTIatCMU). Research interests: agents, LLM evaluation
Pittsburgh, PA
Joined May 2025
Introducing ⚔️PR Arena⚔️: free AI coding agents that fix real GitHub issues. Claude Sonnet 4 vs. Gemini 2.5 Pro… Who writes better pull requests? 👉 Install here: https://t.co/bk19LcnBVf Powered by @allhands_ai
Can LLM-simulated users replace expensive human evaluation for multi-turn conversations? Short answer: yes, if you model the user right. With our SimulatorArena, we find that detailed user profiles (knowledge + message style) improve alignment with real human evaluation by 26%.
🤖We’ll be adding more models as votes come in. Keep an eye on the Leaderboard!
👐PR Arena is an open-source, community-driven experiment to benchmark LLMs as coding agents. Each PR you choose helps reveal which LLMs code best. 👉Learn more:
github.com/neulab/pr-arena: ⚔️ OpenHands PR Arena ⚔️ is a platform for evaluating and benchmarking agentic coding assistants through paired pull request (PR) generations.
⚔️PR Arena is LIVE⚔️ We’re kicking off with 5 frontier LLMs tackling GitHub issue resolution. Resolve issues for free & vote for your preferred fix! 👉Leaderboard & Setup Guide: https://t.co/S1Oe3xWOhc
Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.
What if LLMs could forecast their own scores on unseen benchmarks from just a task description? We are the first to study text description→performance prediction, giving practitioners an early read on outcomes so they can plan what to build before paying full price 💸
🤝 PR Arena is an open-source, community-driven experiment to benchmark AI coding agents. Each PR you choose helps reveal which LLMs code best. We welcome your feedback to shape the project together with the community.
Ready to try it out? Here’s the step-by-step setup guide for the PR Arena GitHub App. Install it in minutes and let frontier models fix your GitHub issues at no cost 💸. 👉Setup Guide:
We are excited to launch the ⚔️PR Arena⚔️ leaderboard! Full results will be revealed once community votes reach a set milestone. Fix your GitHub issues for free and vote for the better fix! 👉Leaderboard & Setup Guide: https://t.co/S1Oe3xXm6K
🚀 Introducing Web-Shepherd: the first Process Reward Model (PRM) that guides web agents. 🌐 Current web browsing agents look cool, but they're not fully reliable! 😬They excel at simple tasks but struggle with complex ones. ❓ Can inference-time scaling help? Previous methods
3⃣ New preference added! You can now pick "neither" if you don’t like either fix, and no PR will be created. 👉 Check out the screenshot:
2⃣ Want to use PR Arena in forked repos? Just add one extra step! 👉 Details here:
1⃣ For your convenience, the pr-arena🏷️ label is added automatically once PR Arena is installed! Simply tag the issue with the label and you’re all set! 👉 Check out the screenshot:
Here are some tips for using ⚔️PR Arena⚔️: 1⃣ The pr-arena🏷️ label is added automatically to your issue labels for ease of use! 2⃣ You can use PR Arena in forked repositories. 3⃣ Don't like either fix? Select “neither” and no PR will be created. 👉Install here:
@allhands_ai 🤝 OpenHands PR Arena is an open-source, community-driven experiment to benchmark AI coding agents. Each PR you choose helps reveal which LLMs code best. 🔐 We do not access your codebase, read GitHub secrets, or release any user data. 👉 Learn more:
@allhands_ai Why you should try out ⚔️PR Arena⚔️: 👆 PR Arena is completely free; all the LLM API calls are on us! ✌️ See which LLMs make better coding agents in the wild. We will soon release the leaderboard.
❗️Vision-Language Models (VLMs) struggle with even basic perspective changes! ✏️ In our new preprint, we aim to extend the spatial reasoning capabilities of VLMs to ⭐️arbitrary⭐️ perspectives. 📄Paper: https://t.co/qq5s8jHtVN 🔗Project: https://t.co/sh5W8VLwZO 🧵[1/N]
You can view the paper, code, and dataset for our work below! 🧑💻Code & Benchmark https://t.co/cyU3YX9OvK 📖 Paper https://t.co/sJlm95KYZy Special thanks to the authors @gracebyun0411, @seungonekim, and Professor Kai Shu.