Kyochul Jang Profile
Kyochul Jang
@TigerKyo

Followers: 9 · Following: 26 · Media: 1 · Statuses: 28

Joined April 2023
@TigerKyo
Kyochul Jang
20 days
8/ 🚀 TL;DR: DICE-BENCH = a realistic multi-party + multi-round tool-use benchmark → harder, and already revealing big gaps in today's best models. Try it, beat it, and help push LLMs toward robust tool use in the wild! 🛠️💬 Accepted to ACL 2025! See you in Vienna! 🇦🇹
@TigerKyo
Kyochul Jang
20 days
7/ 🔗 Resources. 👩‍💻 Code: 📚 Dataset: 📄 Paper: Everything is fully open: recreate, extend, or benchmark your own models!
arxiv.org
Existing function-calling benchmarks focus on single-turn interactions and overlook the complexity of real-world scenarios. To quantify how existing benchmarks address practical...
@TigerKyo
Kyochul Jang
20 days
6/ 💡 Key takeaways:
- Multi-round + multi-party matters.
- Dispersion hurts: models need better long-context reasoning.
- Benchmarks must reflect messy reality to drive progress.
@TigerKyo
Kyochul Jang
20 days
5/ 📈 Does it work? We tested 19 LLMs: even GPT-4o reaches only 64% exact match (EM) on average, and performance drops as rounds ↑ or speakers ↑.
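Not the official DICE-BENCH scorer, but a minimal sketch of how exact match (EM) over predicted tool calls can be computed; all names below are illustrative:

```python
def exact_match(pred_calls, gold_calls):
    """Fraction of dialogues whose predicted tool call exactly matches the gold call.

    pred_calls / gold_calls: lists of per-dialogue call strings,
    e.g. "book_flight(date='2025-06-01', seats=2)".
    Illustrative sketch only, not the official DICE-BENCH scorer.
    """
    assert len(pred_calls) == len(gold_calls)
    hits = sum(p == g for p, g in zip(pred_calls, gold_calls))
    return hits / len(gold_calls)

# Toy example: 2 of 3 predictions match exactly.
preds = ["get_weather(city='Vienna')", "book_room(n=2)", "pay(amount=10)"]
golds = ["get_weather(city='Vienna')", "book_room(n=3)", "pay(amount=10)"]
print(round(exact_match(preds, golds), 2))  # 0.67
```

String-level EM is deliberately strict: any deviation in the call or its arguments counts as a miss, which is why averages stay low even for strong models.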
@TigerKyo
Kyochul Jang
20 days
4/ 🏗️ How we built it:
1️⃣ Tool Graph (124 functions)
2️⃣ Scenario Config (personas, domains, rounds)
3️⃣ Multi-agent simulation + rigorous 3-stage filtering (auto, rule, human)
Result: 1,607 high-quality dialogues covering 4 rounds & up to 4 participants.
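A minimal sketch of the scenario-config and three-stage-filtering idea described above; the field and function names are hypothetical, not DICE-BENCH's actual code:

```python
from dataclasses import dataclass

@dataclass
class ScenarioConfig:
    # Hypothetical fields mirroring the described pipeline stages;
    # the actual DICE-BENCH config schema may differ.
    personas: list
    domain: str
    rounds: int        # dialogues cover up to 4 rounds
    participants: int  # up to 4 speakers

def three_stage_filter(dialogues, auto_ok, rule_ok, human_ok):
    """Keep only dialogues that pass all three filters (auto, rule, human)."""
    return [d for d in dialogues
            if auto_ok(d) and rule_ok(d) and human_ok(d)]

# Toy run: the rule-based filter rejects the one-round dialogue.
cfgs = [ScenarioConfig(["Alice", "Bob"], "travel", r, 2) for r in (1, 4)]
kept = three_stage_filter(cfgs,
                          auto_ok=lambda d: True,
                          rule_ok=lambda d: d.rounds >= 2,
                          human_ok=lambda d: True)
print(len(kept))  # 1
```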
@TigerKyo
Kyochul Jang
20 days
3/ 🔍 DICE-SCORE: a new metric that quantifies how difficult a dialogue's context makes function calling for LLMs, by measuring the dispersion of tool clues across the dialogue. Higher score → harder: models must consider longer context & more speakers.
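The real DICE-SCORE is defined in the paper; purely as a hedged illustration of the dispersion idea, a toy score could scale with how widely clue-bearing turns are spread:

```python
def clue_dispersion(clue_turns, n_turns):
    """Toy dispersion score: fraction of the dialogue spanned by tool clues,
    weighted by the number of distinct clue-bearing turns.
    Illustrative only -- the actual DICE-SCORE formula is in the paper.
    """
    if not clue_turns:
        return 0.0
    span = (max(clue_turns) - min(clue_turns) + 1) / n_turns
    return span * len(set(clue_turns))

# Clues packed into one turn vs. spread across the whole dialogue:
print(clue_dispersion([3], 10))        # 0.1 -> easy
print(clue_dispersion([0, 4, 9], 10))  # 3.0 -> harder
```

Either way, the intuition matches the tweet: the farther apart the clues, the more context (and speakers) a model must integrate before calling the tool.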
@TigerKyo
Kyochul Jang
20 days
2/ 🤔 Why does this matter? Assistants must track info spread across multiple people and rounds, and existing datasets don't cover this case. DICE-BENCH measures how well LLMs gather, integrate, and execute tool calls in real conversations.
@TigerKyo
Kyochul Jang
20 days
🧵1/ ✨ New preprint ✨ LLMs can call external tools, but most benchmarks assume single-turn, single-user chats. Meet DICE-BENCH, the first benchmark to test tool use in realistic multi-party, multi-round dialogues. 🕸️ Page:
@TigerKyo
Kyochul Jang
7 months
The only long-context benchmark essential for this era.
@taewhoolee
Taewhoo Lee
7 months
🤔 Modern LLMs are known to support long text, but can they fully utilize the information available in these texts? 💡 Introducing ETHIC, a new long-context benchmark designed to assess LLMs' ability to leverage the entire given context.
@TigerKyo
Kyochul Jang
7 months
RT @taewhoolee: 🤔 Modern LLMs are known to support long text, but can they fully utilize the information available in these texts? 💡 Introdu…
arxiv.org
Recent advancements in large language models (LLMs) capable of processing extremely long texts highlight the need for a dedicated evaluation benchmark to assess their long-context capabilities....
@TigerKyo
Kyochul Jang
11 months
I want to visit the polar research station in Antarctica! This is on my bucket list.
@kopripr
Korea Polar Research Institute (KOPRI)
11 months
[#Event] Follow the Korea Polar Research Institute on X and keep up with all kinds of polar news! 🥺 ✨Quote a post you like✨ to cheer us on! 📍 Event period: through Mon 8/26. 📍 Winner announcement: Wed 8/28 (*winners contacted by individual DM). 🎁 Prizes: KOPRI souvenir set (2 winners), Americano gift voucher (10 winners). 🧊 How to enter
[Tweet includes 3 images]
@TigerKyo
Kyochul Jang
1 year
🧵[6/7] ‼️ If you've been interested in Graph Neural Networks or Deep Graph Generation but were hesitant to start, this paper details all the related content, so you should read it!
@TigerKyo
Kyochul Jang
1 year
🧵[5/7] 4️⃣ Finally, the authors outline the limitations of current Deep Graph Generation technology and suggest potential areas for future research, such as explainability and evaluation of Deep Graph Generation.
@TigerKyo
Kyochul Jang
1 year
🧵[4/7] 3️⃣ It also discusses how deep graph generation can be practically beneficial in the real world, for example in molecule design and protein design.
@TigerKyo
Kyochul Jang
1 year
🧵[3/7] 2️⃣ It then categorizes the generation pipeline into three processes (encoder, sampler, decoder) and provides examples for each part of the taxonomy, including autoregressive models, VAEs, normalizing flows, etc.
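A toy sketch of that encoder-sampler-decoder framing on adjacency-list graphs; this is purely illustrative and not taken from the survey:

```python
import random

def encode(graph):
    """Encoder: map a graph (adjacency list) to a toy latent code (its size stats)."""
    n_nodes = len(graph)
    n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    return (n_nodes, n_edges)

def sample(latent, rng):
    """Sampler: draw a latent near the encoded one (a stand-in for e.g. a VAE prior)."""
    n_nodes, n_edges = latent
    return (n_nodes, max(0, n_edges + rng.choice([-1, 0, 1])))

def decode(latent, rng):
    """Decoder: realize a random graph with the sampled node/edge counts."""
    n_nodes, n_edges = latent
    pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    edges = rng.sample(pairs, min(n_edges, len(pairs)))
    g = {i: [] for i in range(n_nodes)}
    for i, j in edges:
        g[i].append(j)
        g[j].append(i)
    return g

rng = random.Random(0)
g = {0: [1], 1: [0, 2], 2: [1]}  # path graph on 3 nodes
new_g = decode(sample(encode(g), rng), rng)
print(len(new_g))  # 3: same node count, edge count near the original
```

Real models replace each toy step with a learned component (e.g. a GNN encoder, a Gaussian prior, and an autoregressive or flow-based decoder), but the three-stage decomposition is the same.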
@TigerKyo
Kyochul Jang
1 year
🧵[2/7] 1️⃣ First, the authors define the deep graph generation problem and differentiate it across the various deep graph generation models.
@TigerKyo
Kyochul Jang
1 year
🧵[1/7] ☘️ Yann LeCun, VP of Meta AI, pointed out that current large language models lack the ability to generate based on logical reasoning. Consequently, the use of graph structures in generation is coming to the fore.
@TigerKyo
Kyochul Jang
1 year
🚀 Here comes a Deep Graph Generation survey paper: "A Survey on Deep Graph Generation: Methods and Applications." It comprehensively reviews everything about Deep Graph Generation, from the absolute basics to current trends. Paper:
[Tweet includes an image]