Kyochul Jang Profile
Kyochul Jang
@TigerKyo

Followers: 9 · Following: 26 · Media: 1 · Statuses: 28

Joined April 2023
@TigerKyo
Kyochul Jang
20 days
8/ 🚀 TL;DR: DICE-BENCH = a realistic multi-party + multi-round tool-use benchmark → harder, and already revealing big gaps in today's best models. Try it, beat it, and help push LLMs toward robust tool use in the wild! 🛠️💬 Accepted to ACL 2025! See you in Vienna! 🇦🇹
@TigerKyo
Kyochul Jang
20 days
7/ 🔗 Resources. 👩‍💻 Code: 📚 Dataset: 📄 Paper: Everything is fully open: recreate, extend, or benchmark your own models!
arxiv.org
Existing function-calling benchmarks focus on single-turn interactions and overlook the complexity of real-world scenarios. To quantify how existing benchmarks address practical...
@TigerKyo
Kyochul Jang
20 days
6/ 💡 Key takeaways:
- Multi-round + multi-party matters.
- Dispersion hurts: models need better long-context reasoning.
- Benchmarks must reflect messy reality to drive progress.
@TigerKyo
Kyochul Jang
20 days
5/ 📈 Does it work? We tested 19 LLMs: even GPT-4o reaches only 64% exact match (EM) on average, and performance drops as rounds ↑ or speakers ↑.
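Not the official DICE-BENCH scorer, but a minimal sketch of how exact match (EM) over predicted tool calls can be computed; all names below are illustrative:

```python
def exact_match(pred_calls, gold_calls):
    """Fraction of dialogues whose predicted tool call exactly matches the gold call.

    pred_calls / gold_calls: lists of per-dialogue call strings,
    e.g. "book_flight(date='2025-06-01', seats=2)".
    Illustrative sketch only, not the official DICE-BENCH scorer.
    """
    assert len(pred_calls) == len(gold_calls)
    hits = sum(p == g for p, g in zip(pred_calls, gold_calls))
    return hits / len(gold_calls)

# Toy example: 2 of 3 predictions match exactly.
preds = ["get_weather(city='Vienna')", "book_room(n=2)", "pay(amount=10)"]
golds = ["get_weather(city='Vienna')", "book_room(n=3)", "pay(amount=10)"]
print(round(exact_match(preds, golds), 2))  # 0.67
```

String-level EM is deliberately strict: any deviation in the call or its arguments counts as a miss, which is why averages stay low even for strong models.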
@TigerKyo
Kyochul Jang
20 days
4/ 🏗️ How we built it:
1️⃣ Tool Graph (124 functions)
2️⃣ Scenario Config (personas, domains, rounds)
3️⃣ Multi-agent simulation + rigorous 3-stage filtering (auto, rule, human)
Result: 1,607 high-quality dialogues covering 4 rounds & up to 4 participants.
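A minimal sketch of the scenario-config and three-stage-filtering idea described above; the field and function names are hypothetical, not DICE-BENCH's actual code:

```python
from dataclasses import dataclass

@dataclass
class ScenarioConfig:
    # Hypothetical fields mirroring the described pipeline stages;
    # the actual DICE-BENCH config schema may differ.
    personas: list
    domain: str
    rounds: int        # dialogues cover up to 4 rounds
    participants: int  # up to 4 speakers

def three_stage_filter(dialogues, auto_ok, rule_ok, human_ok):
    """Keep only dialogues that pass all three filters (auto, rule, human)."""
    return [d for d in dialogues
            if auto_ok(d) and rule_ok(d) and human_ok(d)]

# Toy run: the rule-based filter rejects the one-round dialogue.
cfgs = [ScenarioConfig(["Alice", "Bob"], "travel", r, 2) for r in (1, 4)]
kept = three_stage_filter(cfgs,
                          auto_ok=lambda d: True,
                          rule_ok=lambda d: d.rounds >= 2,
                          human_ok=lambda d: True)
print(len(kept))  # 1
```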
@TigerKyo
Kyochul Jang
20 days
3/ 🔍 DICE-SCORE: a new metric that quantifies how difficult a dialogue's context makes function calling for LLMs, by measuring the dispersion of tool clues across the dialogue. Higher score → harder: models must consider longer context & more speakers.
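The real DICE-SCORE is defined in the paper; purely as a hedged illustration of the dispersion idea, a toy score could scale with how widely clue-bearing turns are spread:

```python
def clue_dispersion(clue_turns, n_turns):
    """Toy dispersion score: fraction of the dialogue spanned by tool clues,
    weighted by the number of distinct clue-bearing turns.
    Illustrative only -- the actual DICE-SCORE formula is in the paper.
    """
    if not clue_turns:
        return 0.0
    span = (max(clue_turns) - min(clue_turns) + 1) / n_turns
    return span * len(set(clue_turns))

# Clues packed into one turn vs. spread across the whole dialogue:
print(clue_dispersion([3], 10))        # 0.1 -> easy
print(clue_dispersion([0, 4, 9], 10))  # 3.0 -> harder
```

Either way, the intuition matches the tweet: the farther apart the clues, the more context (and speakers) a model must integrate before calling the tool.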
@TigerKyo
Kyochul Jang
20 days
2/ 🤔 Why does this matter? Assistants must track info spread across multiple people and rounds, and existing datasets don't cover this case. DICE-BENCH measures how well LLMs gather, integrate, and execute tool calls in real conversations.
@TigerKyo
Kyochul Jang
20 days
🧵1/ ✨ New preprint ✨ LLMs can call external tools, but most benchmarks assume single-turn, single-user chats. Meet DICE-BENCH, the first benchmark to test tool use in realistic multi-party, multi-round dialogues. 🕸️ Page:
@TigerKyo
Kyochul Jang
7 months
The only long-context benchmark essential for this era.
@taewhoolee
Taewhoo Lee
7 months
🤔 Modern LLMs are known to support long text, but can they fully utilize the information available in these texts? 💡 Introducing ETHIC, a new long-context benchmark designed to assess LLMs' ability to leverage the entire given context.
@TigerKyo
Kyochul Jang
7 months
RT @taewhoolee: 🤔 Modern LLMs are known to support long text, but can they fully utilize the information available in these texts? 💡 Introdu…
arxiv.org
Recent advancements in large language models (LLMs) capable of processing extremely long texts highlight the need for a dedicated evaluation benchmark to assess their long-context capabilities....
@TigerKyo
Kyochul Jang
11 months
I want to visit the polar research station in Antarctica! This is on my bucket list.
@kopripr
Korea Polar Research Institute (KOPRI)
11 months
[#Event] Follow the Korea Polar Research Institute on X and keep up with all kinds of polar news! 🥺 ✨Quote a post you like✨ to cheer us on! 📍 Event period: through Mon 8/26. 📍 Winner announcement: Wed 8/28 (*winners contacted by individual DM). 🎁 Prizes: KOPRI souvenir set (2 winners), Americano gift voucher (10 winners). 🧊 How to enter
[Tweet includes 3 images]
@TigerKyo
Kyochul Jang
1 year
🧵[6/7] ‼️ If you've been interested in Graph Neural Networks or Deep Graph Generation but were hesitant to start, this paper details all the related content, so you should read it!
@TigerKyo
Kyochul Jang
1 year
🧵[5/7] 4️⃣ Finally, the authors outline the limitations of current Deep Graph Generation technology and suggest potential areas for future research, such as explainability and evaluation of Deep Graph Generation.
@TigerKyo
Kyochul Jang
1 year
🧵[4/7] 3️⃣ It also discusses how deep graph generation can be practically beneficial in the real world, for example in molecule design and protein design.
@TigerKyo
Kyochul Jang
1 year
🧵[3/7] 2️⃣ It then categorizes the generation pipeline into three processes (encoder, sampler, decoder) and provides examples for each part of the taxonomy, including autoregressive models, VAEs, normalizing flows, etc.
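A toy sketch of that encoder-sampler-decoder framing on adjacency-list graphs; this is purely illustrative and not taken from the survey:

```python
import random

def encode(graph):
    """Encoder: map a graph (adjacency list) to a toy latent code (its size stats)."""
    n_nodes = len(graph)
    n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    return (n_nodes, n_edges)

def sample(latent, rng):
    """Sampler: draw a latent near the encoded one (a stand-in for e.g. a VAE prior)."""
    n_nodes, n_edges = latent
    return (n_nodes, max(0, n_edges + rng.choice([-1, 0, 1])))

def decode(latent, rng):
    """Decoder: realize a random graph with the sampled node/edge counts."""
    n_nodes, n_edges = latent
    pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    edges = rng.sample(pairs, min(n_edges, len(pairs)))
    g = {i: [] for i in range(n_nodes)}
    for i, j in edges:
        g[i].append(j)
        g[j].append(i)
    return g

rng = random.Random(0)
g = {0: [1], 1: [0, 2], 2: [1]}  # path graph on 3 nodes
new_g = decode(sample(encode(g), rng), rng)
print(len(new_g))  # 3: same node count, edge count near the original
```

Real models replace each toy step with a learned component (e.g. a GNN encoder, a Gaussian prior, and an autoregressive or flow-based decoder), but the three-stage decomposition is the same.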
@TigerKyo
Kyochul Jang
1 year
🧵[2/7] 1️⃣ First, the authors define the deep graph generation problem and differentiate it across the various deep graph generation models.
@TigerKyo
Kyochul Jang
1 year
🧵[1/7] ☘️ Yann LeCun, VP of Meta AI, pointed out that current large language models lack the ability to generate based on logical reasoning. Consequently, the use of graph structures in generation is coming to the fore.
@TigerKyo
Kyochul Jang
1 year
🚀 Here comes a Deep Graph Generation survey paper: "A Survey on Deep Graph Generation: Methods and Applications." It comprehensively reviews everything about Deep Graph Generation, from the absolute basics to current trends. Paper:
[Tweet includes an image]