
Kai Zhang
@DrogoKhal4
Followers
2K
Following
4K
Media
35
Statuses
494
PhD-ing @osunlp with @ysu_nlp
Columbus, OH
Joined February 2019
Big WebDreamer update! We train Dreamer-7B, a small but strong world model for real-world web planning. Beats Qwen2-72B. Matches #GPT-4o. Trained on 3M synthetic examples, and yes, all data + models are open-sourced.
Wondering how to scale inference-time compute with advanced planning for language agents? Short answer: use your LLM as a world model. More detailed answer: using GPT-4o to predict the outcome of actions on a website can deliver strong performance with improved safety and…
1
24
80
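The tweet above pitches "your LLM as a world model": before the agent commits to a web action, the model is asked to predict the page state that action would produce, and the predicted states are scored against the task goal so the agent can pick the most promising action. Below is a minimal sketch of that planning loop, assuming a generic text-in/text-out `llm` callable; the helper names and prompts are illustrative assumptions, not the released WebDreamer / Dreamer-7B code.

```python
# Minimal sketch of "LLM as a world model" planning for web agents.
# Assumptions: `llm` is any text-in/text-out completion function; the prompt
# wording and helper names are illustrative, not the released WebDreamer API.
from typing import Callable, List

LLM = Callable[[str], str]

def simulate_outcome(llm: LLM, page: str, action: str) -> str:
    """Ask the LLM, acting as a world model, to predict the page after `action`."""
    prompt = (
        "You are simulating a web browser.\n"
        f"Current page (abridged):\n{page}\n\n"
        f"Proposed action: {action}\n"
        "Describe the page state that would most likely result."
    )
    return llm(prompt)

def score_outcome(llm: LLM, goal: str, predicted_page: str) -> float:
    """Ask the LLM to rate (0-1) how much the predicted state advances the goal."""
    prompt = (
        f"Task goal: {goal}\n"
        f"Predicted page state: {predicted_page}\n"
        "On a scale from 0 to 1, how much progress toward the goal does this "
        "state represent? Answer with a single number."
    )
    try:
        return float(llm(prompt).strip())
    except ValueError:
        return 0.0

def choose_action(llm: LLM, goal: str, page: str, candidates: List[str]) -> str:
    """Model-based action selection: simulate each candidate, keep the best one."""
    scored = [
        (score_outcome(llm, goal, simulate_outcome(llm, page, act)), act)
        for act in candidates
    ]
    return max(scored, key=lambda pair: pair[0])[1]
```

Because unpromising actions get filtered in simulation rather than executed on the live site, this style of planning is also where the improved safety mentioned in the tweet comes from.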
RT @hhsun1: Postdoc Hiring: I am looking for a postdoc to work on rigorously evaluating and advancing the capabilities and safety of comp…
0
27
0
RT @Benjamin_eecs: We've always been excited about self-play unlocking continuously improving agents. Our insight: RL selects generalizable…
0
52
0
RT @tzmhuang: We're already using AI search systems every day for more and more complex tasks, but how good are they really? Challenge: eva…
0
3
0
RT @yuting_ning: Agentic search is revolutionizing how we gather information, but how reliable is it? Can it really deliver accurate answe…
0
4
0
RT @ysu_nlp: Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis. Introducin…
0
47
0
RT @MuCai7: Impressed by V-JEPA 2's improvement on TemporalBench. Indeed, we need a better video encoder for the t…
0
3
0
RT @xiangyue96: Attending #CVPR2025 in #Nashville! We will have our multimodal LLM evaluation tutorial tmr afternoon! Feel free to ping me…
0
10
0
RT @YifeiLiPKU: Introducing AutoSDT, a fully automatic pipeline that collects data-driven scientific coding tasks at scale! We use AutoSD…
0
25
0
RT @HuggingPapers: Are we heading down the right path towards omni-modality? This new paper explores the effects of extending modality i…
0
22
0
RT @YuanshengNi: Introducing VisCoder: fine-tuned language models for Python-based visualization code generation and feedback-driven sel…
0
16
0
Had a blast working with @DarthZhu_! We analyze and use modality-specific models extended from the same #LLM backbone, e.g., Qwen2-VL, -Video, -Audio on #Qwen2, to create omni ones. Though most results are negative, we have some interesting findings here :)
Extending modality based on an LLM has been common practice for multimodal LLMs. But can it generalize to omni-modality? We study the effects of extending modality and ask three questions: #LLM #MLLM #OmniModality
1
4
15
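Since Qwen2-VL, Qwen2-Video, and Qwen2-Audio all extend the same Qwen2 backbone, one common baseline for building an "omni" model from such siblings is to merge the parameters they share while keeping each model's modality-specific modules. The sketch below shows that generic merging baseline only to illustrate the setup; it is an assumption for exposition, not necessarily the method studied in the thread above.

```python
# Generic parameter-merging baseline for sibling models sharing an LLM backbone
# (illustrative assumption, not necessarily the method in the linked paper).
from typing import Dict
import torch

def merge_shared_backbone(
    sd_a: Dict[str, torch.Tensor],
    sd_b: Dict[str, torch.Tensor],
    alpha: float = 0.5,
) -> Dict[str, torch.Tensor]:
    """Interpolate parameters with matching names/shapes; keep the rest as-is."""
    merged: Dict[str, torch.Tensor] = {}
    for name, tensor_a in sd_a.items():
        tensor_b = sd_b.get(name)
        if tensor_b is not None and tensor_b.shape == tensor_a.shape:
            # Shared backbone weight: simple linear interpolation.
            merged[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
        else:
            # Modality-specific weight (e.g., a vision tower): keep model A's copy.
            merged[name] = tensor_a
    # Carry over parameters that exist only in model B (e.g., an audio encoder).
    for name, tensor_b in sd_b.items():
        merged.setdefault(name, tensor_b)
    return merged
```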
RT @davidbau: Dear MAGA friends, I have been worrying about STEM in the US a lot, because right now the Senate is writing new laws that cu…
0
72
0
RT @yizhongwyz: Thrilled to announce that I will be joining @UTAustin @UTCompSci as an assistant professor in fall 2026! I will continue…
0
54
0
RT @hhsun1: Realistic adversarial testing of Computer-Use Agents (CUAs) to identify their vulnerabilities and make them safer and more secu…
0
24
0
RT @lateinteraction: Sigh, it's a bit of a mess. Let me just give you guys the full nuance in one stream of consciousness since I think we…
0
85
0
RT @LiaoZeyi: Can you really trust Computer-Use Agents (CUAs) to control your computer? Not yet: @AnthropicAI Opus 4 shows an alarming…
0
32
0
RT @vardaanpahuja: Thrilled to unveil the most exciting project of my PhD: Explorer, Scaling Exploration-driven Web Trajectory Synthesis…
0
24
0
RT @irenelizihui: Today, we release #MMLUProX, which upgrades MMLU-Pro to 29 languages across 14 disciplines: 11,829 reasoning-heavy Qs pe…
0
18
0