Jiseung Hong

@jiseungh99

Followers
68
Following
54
Media
12
Statuses
39

Master's student at CMU LTI (@LTIatCMU). Research interests: agents, LLM evaluation.

Pittsburgh, PA
Joined May 2025
@jiseungh99
Jiseung Hong
2 months
Introducing ⚔️PR Arena⚔️ - free AI coding agents to fix real GitHub issues. Claude Sonnet 4 vs Gemini 2.5 Pro… Who writes better pull requests? 👉 Install here: https://t.co/bk19LcnBVf Powered by @allhands_ai
4
12
78
@Yaooo01
Yao Dou
10 days
Can LLM-simulated users replace expensive human evaluation for multi-turn conversations? Short answer: yes, if you model the user right. With our SimulatorArena, we find that detailed user profiles (knowledge + message style) improve alignment with real human evaluation by 26%
4
26
134
@jiseungh99
Jiseung Hong
10 days
🤖We’ll be adding more models as votes come in. Keep an eye on the Leaderboard!
0
0
2
@jiseungh99
Jiseung Hong
10 days
👐PR Arena is an open-source, community-driven experiment to benchmark LLMs as coding agents. Each PR you choose helps reveal which LLMs code best. 👉Learn more:
github.com
⚔️ OpenHands PR Arena ⚔️ is a platform for evaluating and benchmarking agentic coding assistants through paired pull request (PR) generations. - neulab/pr-arena
1
0
2
@jiseungh99
Jiseung Hong
10 days
⚔️PR Arena is LIVE⚔️ We’re kicking off with 5 frontier LLMs tackling GitHub issue resolution. Resolve issues for free & vote for your preferred fix! 👉Leaderboard & Setup Guide: https://t.co/S1Oe3xWOhc
1
4
6
@claudeai
Claude
25 days
Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.
1K
3K
21K
@jungsoo___park
Jungsoo Park
28 days
What if LLMs can forecast their own scores on unseen benchmarks from just a task description? We are the first to study text description→performance prediction, giving practitioners an early read on outcomes so they can plan what to build—before paying full price 💸
3
9
27
@jiseungh99
Jiseung Hong
1 month
0
0
2
@jiseungh99
Jiseung Hong
1 month
🤝 PR Arena is an open-source, community-driven experiment to benchmark AI coding agents. Each PR you choose helps reveal which LLMs code best. We welcome your feedback to shape the project together with the community.
1
0
3
@jiseungh99
Jiseung Hong
1 month
Ready to try it out? Here’s the step-by-step setup guide for the PR Arena GitHub App. Install it in minutes and let frontier models fix your GitHub issues — 💸 at no cost. 👉Setup Guide:
1
0
2
@jiseungh99
Jiseung Hong
1 month
We are excited to launch the ⚔️PR Arena⚔️ leaderboard! Full results will be revealed after a certain milestone of community votes. Fix your GitHub issues for free and vote for the better fix! 👉Leaderboard & Setup Guide: https://t.co/S1Oe3xXm6K
1
9
24
@hyungjoochae
Hyungjoo Chae
5 months
🚀 Introducing Web-Shepherd: the first Process Reward Model (PRM) that guides web agents. 🌐 Current web browsing agents look cool, but they're not fully reliable! 😬They excel at simple tasks but struggle with complex ones. ❓ Can inference-time scaling help? Previous methods
@_akhaliq
AK
5 months
Web-Shepherd just dropped on Hugging Face Advancing PRMs for Reinforcing Web Agents
2
17
73
@jiseungh99
Jiseung Hong
2 months
3⃣ New preference added! You can now pick "neither" if you don’t like either fix, and no PR will be created. 👉 Check out the screenshot:
0
0
1
@jiseungh99
Jiseung Hong
2 months
2⃣ Want to use PR Arena in forked repos? Just add one extra step! 👉 Details here:
1
0
1
@jiseungh99
Jiseung Hong
2 months
1⃣ For your convenience, the pr-arena🏷️ label is added automatically once PR Arena is installed! Simply tag the issue with the label—and you’re all set! 👉 Check out the screenshot:
1
0
1
@jiseungh99
Jiseung Hong
2 months
Here are some tips for using ⚔️PR Arena⚔️ 1⃣ pr-arena🏷️ option is added automatically to Issue Labels for ease of use! 2⃣ You can use PR Arena in forked repositories. 3⃣ Don't like either fix? Select “neither” and no PR will be created. 👉Install here:
@jiseungh99
Jiseung Hong
2 months
Introducing ⚔️PR Arena⚔️ - free AI coding agents to fix real GitHub issues. Claude Sonnet 4 vs Gemini 2.5 Pro… Who writes better pull requests? 👉 Install here: https://t.co/bk19LcnBVf Powered by @allhands_ai
1
2
14
@jiseungh99
Jiseung Hong
2 months
@allhands_ai 🤝 OpenHands PR Arena is an open-source, community-driven experiment to benchmark AI coding agents. Each PR you choose helps reveal which LLMs code best. 🔐 We do not access your codebase, read GitHub secrets, or release any user data. 👉 Learn more:
github.com
⚔️ OpenHands PR Arena ⚔️ is a platform for evaluating and benchmarking agentic coding assistants through paired pull request (PR) generations. - neulab/pr-arena
0
0
7
@jiseungh99
Jiseung Hong
2 months
@allhands_ai Why you should try out ⚔️PR Arena⚔️ 👆 PR Arena is completely free - all the LLM API calls are on us! ✌️ See which LLMs perform better as coding agents in the wild. We will soon release the leaderboard.
0
0
8
@yuseungleee
Phillip (Yuseung) Lee
6 months
❗️Vision-Language Models (VLMs) struggle with even basic perspective changes! ✏️ In our new preprint, we aim to extend the spatial reasoning capabilities of VLMs to ⭐️arbitrary⭐️ perspectives. 📄Paper: https://t.co/qq5s8jHtVN 🔗Project: https://t.co/sh5W8VLwZO 🧵[1/N]
4
37
151
@jiseungh99
Jiseung Hong
5 months
You can view the paper, code, and dataset for our work below! 🧑‍💻Code & Benchmark https://t.co/cyU3YX9OvK 📖 Paper https://t.co/sJlm95KYZy Special thanks to the authors @gracebyun0411, @seungonekim, Professor Kai Shu.
0
1
1