Tianshu Zhang
@Tianshu_OSU
Followers: 421
Following: 290
Media: 9
Statuses: 87
Ph.D. student @osunlp @OhioStateCSE. Ex-intern @IBMResearch, @Adobe. Lead author of TableLlama. #NLProc
Columbus, OH
Joined September 2022
Life update: I moved to Silicon Valley to tackle agents' biggest challenges: plasticity and reliability. Today's agents are smart but brittle. They lack plasticity (continual learning and adaptation) and reliability (stable, predictable behavior with bounded failures). These two
40
44
427
Computer Use: Modern Moravec's Paradox
A new blog post arguing why computer-use agents may be the biggest opportunity and challenge for AGI. https://t.co/6fZfTdx710
Table of Contents
> Moravec’s Paradox
> Moravec's Paradox in 2025
> Computer use may be the biggest opportunity
9
65
216
I am humbled and grateful to receive two grants from Open Philanthropy @open_phil to advance the safety of AI systems, co-led with my colleague @ysu_nlp. I'm also honored to be the first at @OhioState to receive Open Philanthropy funding. Most credit goes to the amazing students
Associate Prof. Huan Sun has been awarded two competitive research grants from Open Philanthropy focused on the rapidly evolving field of #AI safety: https://t.co/3UG0YbDbtJ
@hhsun1
4
17
79
🙏 Huge thanks to my amazing collaborators: @kunqian_us @sidthekidder @bestaskwisher @ShaddyGarg @hhsun1 @yunyao_li - couldn’t have done this without you! Also appreciate all discussions from @osunlp !
0
0
3
On average, open-source LLMs fine-tuned with EvoSchema outperform different baseline methods, highlighting a path towards more resilient NL2SQL systems that adapt as database schemas evolve over time.
1
0
3
💡 Why it matters: Database schemas are not static; they evolve 🔄.
🌍🌍 Big picture
🔸EvoSchema defines 10 schema perturbations (column- & table-level) and shows how schema shifts can break SOTA models.
🔸Column-level changes hurt a bit.
🔸Table-level schema changes hurt a lot.
1
0
4
🎉 Excited to share that our paper EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution was accepted at VLDB 2025! 🚀 📢 Reminder: join us at VLDB 2025 in London! 🗓️ Sept 2 (Tue), 10:45 AM – 12:15 PM 📍 Room Wordsworth 4F 📄 https://t.co/ZNAav4ZtoX
#VLDB2025 #LLMs
1
18
30
Remember “Son of Anton” from the Silicon Valley show (@SiliconHBO)? The experimental AI that “efficiently” orders 4,000 lbs of meat while looking for a cheap burger and “fixes” a bug by deleting all the code? It’s starting to look a lot like reality. Even 18 months ago, my own
As AI agents start taking real actions online, how do we prevent unintended harm? We teamed up with @OhioState and @UCBerkeley to create WebGuard: the first dataset for evaluating web agent risks and building real-world safety guardrails for online environments. 🧵
0
27
68
Announcing the @NeurIPSConf 2025 workshop on Imageomics: Discovering Biological Knowledge from Images Using AI! The workshop focuses on the interdisciplinary field between machine learning and biological science. We look forward to seeing you in San Diego! #NeurIPS2025
2
14
27
Attending #ICML2025 🇨🇦 this week! I’ll be co-organizing the Computer Use Agent Workshop @workshopcua on July 19th! Happy to chat about anything related to language agents — especially world modeling, scaling RL for agents, and multi-turn RL. Excited to meet old friends and
2
6
48
🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️ Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge - 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor -
3
47
224
If you care about building AI co-scientists for data-driven discovery, check out our recent work on automatically collecting large-scale, authentic, high-quality scientific coding tasks at a low cost, led by @YifeiLiPKU @HananeNMoussa @osunlp. 🌟AutoSDT: Scaling Data-Driven
arxiv.org
Despite long-standing efforts in accelerating scientific discovery with AI, building AI co-scientists remains challenging due to limited high-quality data for training and evaluation. To tackle...
📢 Introducing AutoSDT, a fully automatic pipeline that collects data-driven scientific coding tasks at scale! We use AutoSDT to collect AutoSDT-5K, enabling open co-scientist models that rival GPT-4o on ScienceAgentBench! Thread below ⬇️ (1/n)
0
4
16
Systematic reviews (SRs) drive evidence-based medicine, but months-long workflows can’t keep pace with today’s literature flood. Fully autonomous solutions promise speed, but the magic often fizzles - these models still skip pivotal trials, hallucinate findings, and bury the
1
15
21
⁉️Can you really trust Computer-Use Agents (CUAs) to control your computer⁉️ Not yet, @AnthropicAI Opus 4 shows an alarming 48% Attack Success Rate against realistic internet injection❗️ Introducing RedTeamCUA: realistic, interactive, and controlled sandbox environments for
4
31
84
Proud moment for @OhioStateCSE! Prof. @hhsun1 has been awarded funding from @SchmidtSciences' AI Safety initiative, a first for Ohio State. Her work will help defend AI agents from adversarial attacks.
engineering.osu.edu
Schmidt Sciences selected 27 projects for funding
0
8
13
I will unfortunately miss #NAACL2025, but please check out our work on chemistry agents, "ChemToolAgent: The Impact of Tools on Language Agents for Chemistry Problem Solving", today (May 1) from 2:00 to 3:30pm (local time) in Hall 3, Poster Session 5! Some updates: We have renamed
1
16
41
It's a great honor to give a keynote at the @Molecule_Maker symposium at UIUC! Many thanks to Prof. @hengjinlp and Prof. Jiawei Han for the invitation. The symposium’s theme this year is “AI scientist? What would it take?”, which I hold close to my heart, and I gave a talk titled “Language
2
18
69
LLMs exhibit the Reversal Curse, a basic generalization failure where they struggle to learn reversible factual associations (e.g., "A is B" -> "B is A"). But why? Our new work uncovers that it's a symptom of the long-standing binding problem in AI, and shows that a model design
25
126
863