Jiseung Hong @jiseungh99 X Profile

Jiseung Hong

@jiseungh99

Followers

68

Following

54

Media

12

Statuses

39

Master's student at CMU LTI (@LTIatCMU). Research interests: agents, LLM evaluation

https://t.co/1X6TmHEQmC

Pittsburgh, PA

Joined May 2025

Jiseung Hong

@jiseungh99

2 months

Introducing ⚔️PR Arena⚔️ - free AI coding agents to fix real GitHub issues. Claude Sonnet 4 vs Gemini 2.5 Pro… Who writes better pull requests? 👉 Install here: https://t.co/bk19LcnBVf Powered by @allhands_ai

4

12

78

Yao Dou

@Yaooo01

10 days

Can LLM-simulated users replace expensive human evaluation for multi-turn conversations? Short answer: yes, if you model the user right. With our SimulatorArena, we find that detailed user profiles (knowledge + message style) improve alignment with real human evaluation by 26%

4

26

134

Jiseung Hong

@jiseungh99

10 days

🤖We’ll be adding more models as votes come in. Keep an eye on the Leaderboard!

0

2

Jiseung Hong

@jiseungh99

10 days

👐PR Arena is an open-source, community-driven experiment to benchmark LLMs as coding agents. Each PR you choose helps reveal which LLMs code best. 👉Learn more:

github.com

⚔️ OpenHands PR Arena ⚔️ is a platform for evaluating and benchmarking agentic coding assistants through paired pull request (PR) generations. - neulab/pr-arena

1

0

2

Jiseung Hong

@jiseungh99

10 days

⚔️PR Arena is LIVE⚔️ We’re kicking off with 5 frontier LLMs tackling GitHub issue resolution. Resolve issues for free & vote for your preferred fix! 👉Leaderboard & Setup Guide: https://t.co/S1Oe3xWOhc

1

4

6

Claude

@claudeai

25 days

Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.

1K

3K

21K

Jungsoo Park

@jungsoo___park

28 days

What if LLMs can forecast their own scores on unseen benchmarks from just a task description? We are the first to study text description→performance prediction, giving practitioners an early read on outcomes so they can plan what to build—before paying full price 💸

3

9

27

Jiseung Hong

@jiseungh99

1 month

https://t.co/9YDDhqu2KD

0

2

Jiseung Hong

@jiseungh99

1 month

🤝 PR Arena is an open-source, community-driven experiment to benchmark AI coding agents. Each PR you choose helps reveal which LLMs code best. We welcome your feedback to shape the project together with the community.

1

0

3

Jiseung Hong

@jiseungh99

1 month

Ready to try it out? Here’s the step-by-step setup guide for the PR Arena GitHub App. Install it in minutes and let frontier models fix your GitHub issues — 💸 at no cost. 👉Setup Guide:

1

0

2

Jiseung Hong

@jiseungh99

1 month

We are excited to launch the ⚔️PR Arena⚔️ leaderboard! Full results will be revealed after a certain milestone of community votes. Fix your GitHub issues for free and vote for the better fix! 👉Leaderboard & Setup Guide: https://t.co/S1Oe3xXm6K

1

9

24

Hyungjoo Chae

@hyungjoochae

5 months

🚀 Introducing Web-Shepherd: the first Process Reward Model (PRM) that guides web agents. 🌐 Current web browsing agents look cool, but they're not fully reliable! 😬They excel at simple tasks but struggle with complex ones. ❓ Can inference-time scaling help? Previous methods

AK

@_akhaliq

5 months

Web-Shepherd just dropped on Hugging Face Advancing PRMs for Reinforcing Web Agents

2

17

73

Jiseung Hong

@jiseungh99

2 months

3⃣ New preference added! You can now pick "neither" if you don’t like either fix, and no PR will be created. 👉 Check out the screenshot:

0

1

Jiseung Hong

@jiseungh99

2 months

2⃣ Want to use PR Arena in forked repos? Just add one extra step! 👉 Details here:

1

0

1

Jiseung Hong

@jiseungh99

2 months

1⃣ For your convenience, the pr-arena🏷️ label is added automatically once PR Arena is installed! Simply tag the issue with the label—and you’re all set! 👉 Check out the screenshot:

1

0

1

Jiseung Hong

@jiseungh99

2 months

Here are some tips for using ⚔️PR Arena⚔️ 1⃣ pr-arena🏷️ option is added automatically to Issue Labels for ease of use! 2⃣ You can use PR Arena in forked repositories. 3⃣ Don't like either fix? Select “neither” and no PR will be created. 👉Install here:

1

2

14

Jiseung Hong

@jiseungh99

2 months

@allhands_ai 🤝 OpenHands PR Arena is an open-source, community-driven experiment to benchmark AI coding agents. Each PR you choose helps reveal which LLMs code best. 🔐 We do not access your codebase, read GitHub secrets, or release any user data. 👉 Learn more:

github.com

⚔️ OpenHands PR Arena ⚔️ is a platform for evaluating and benchmarking agentic coding assistants through paired pull request (PR) generations. - neulab/pr-arena

0

7

Jiseung Hong

@jiseungh99

2 months

@allhands_ai Why you should try out ⚔️PR Arena⚔️ 👆 PR Arena is completely free - all the LLM API calls are on us! ✌️ See which LLMs perform better as coding agents in the wild. We will soon release the leaderboard.

0

8

Phillip (Yuseung) Lee

@yuseungleee

6 months

❗️Vision-Language Models (VLMs) struggle with even basic perspective changes! ✏️ In our new preprint, we aim to extend the spatial reasoning capabilities of VLMs to ⭐️arbitrary⭐️ perspectives. 📄Paper: https://t.co/qq5s8jHtVN 🔗Project: https://t.co/sh5W8VLwZO 🧵[1/N]

4

37

151

Jiseung Hong

@jiseungh99

5 months

You can view the paper, code, and dataset for our work below! 🧑‍💻Code & Benchmark https://t.co/cyU3YX9OvK 📖 Paper https://t.co/sJlm95KYZy Special thanks to the authors @gracebyun0411, @seungonekim, Professor Kai Shu.

0

1