Jaylen Jones @Jaylen_JonesNLP X Profile

Jaylen Jones

@Jaylen_JonesNLP

Followers: 69 · Following: 162 · Media: 6 · Statuses: 41

Ph.D. at The Ohio State University | Researching #ConvAI and #NLProc | Member of OSU NLP Group (@osunlp) and SLATE Lab

https://t.co/vSAK2Njdrn

Columbus, OH

Joined April 2024


Jaylen Jones

@Jaylen_JonesNLP

2 months

Excited to present RedTeamCUA at #COLM2025 in Montreal! Happy to chat about evaluating and mitigating risks in computer-use agents, from adversarial security threats to safety risks in benign use. DMs open! 📅 Friday, Oct 10 🕐 1:30–2:30 📍 Workshop on AI Agents, Room 519B

Zeyi Liao

@LiaoZeyi

7 months

⁉️Can you really trust Computer-Use Agents (CUAs) to control your computer⁉️ Not yet, @AnthropicAI Opus 4 shows an alarming 48% Attack Success Rate against realistic internet injection❗️ Introducing RedTeamCUA: realistic, interactive, and controlled sandbox environments for

0

2

6

Jaylen Jones

@Jaylen_JonesNLP

21 days

Pros: Claude Opus 4.5 sets a new SoTA on OSWorld for computer-use capabilities. Cons: It also achieves the highest ASR (83%) on our new RedTeamCUA evaluation. Capability is rising fast, but trustworthiness depends on security keeping pace.

Huan Sun (Hiring Ph.D. students for Fall26)

@hhsun1

21 days

Computer-use agents are getting more capable, but are they getting more secure? No! We are actually observing the opposite trend. @AnthropicAI @claudeai Opus 4.5 released two days ago tops the OSWorld leaderboard, but is showing the highest Attack Success Rate (ASR) on our

0

2

3

Yu Su

@ysu_nlp

22 days

Life update: I moved to Silicon Valley to tackle agents' biggest challenges: plasticity and reliability. Today's agents are smart but brittle. They lack plasticity (continual learning and adaptation) and reliability (stable, predictable behavior with bounded failures). These two

40

43

421

Hanane Nour Moussa

@HananeNMoussa

2 months

📢 As AI becomes increasingly explored for research idea generation, how can we rigorously evaluate the ideas it generates before committing time and resources to them? We introduce ScholarEval, a literature grounded framework for research idea evaluation across disciplines 👇!

4

42

144

Greg Brockman

@gdb

2 months

GPT-5 for astronomy and astrophysics:

Deedy

@deedydas

2 months

GPT-5 and Gemini 2.5 Pro just achieved gold medal performance in the International Olympiad of Astronomy and Astrophysics (IOAA). AI is now world class at cutting edge physics.

61

86

1K

XLLM-Reason-Plan

@XllmReasonPlan

2 months

@COLM_conf #COLM2025 Prof. Huan Sun talking about "How Explanations can Advance Agent Capability and Safety". @hhsun1

0

5

11

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

2 months

Agent Learning via Early Experience "training agents from experience data with reinforcement learning remains difficult in many environments, which either lack verifiable rewards (e.g., websites) or require inefficient long-horizon rollouts (e.g., multi-turn tool use)." "We

13

90

399

Zeyi Liao

@LiaoZeyi

2 months

While @AnthropicAI Sonnet 4.5 achieves an impressive leap in computer use, reaching a SOTA result on OSWorld, we care about how aligned the model is when facing prompt injection. Our results on RedTeamCUA reveal a concerning trend: it exhibits the highest Attack Success Rate

Claude

@claudeai

3 months

Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.

0

4

16

Siva Reddy

@sivareddyg

2 months

Panel: Building Capable and Safe AI Agents Panelists: @hllo_wrld (moderator), @ysu_nlp, @Kordjamshidi, @gneubig, Joyce Chai, @taoyds VZ: What is the single biggest obstacle for real-world agents? YS: continual learning from experience (mentioning Sutton's recent interview). RL

Siva Reddy

@sivareddyg

3 months

The IVADO workshop on Agent Capabilities and Safety is happening now at HEC Montreal, Downtown (Oct 3–6) https://t.co/MEL4JAzLRn #LLMAgents

1

3

19

Yu Su

@ysu_nlp

4 months

Computer Use: Modern Moravec's Paradox A new blog post arguing why computer-use agents may be the biggest opportunity and challenge for AGI. https://t.co/6fZfTdx710 Table of Contents > Moravec’s Paradox > Moravec's Paradox in 2025 > Computer use may be the biggest opportunity

9

65

213

Huan Sun (Hiring Ph.D. students for Fall26)

@hhsun1

4 months

I am humbled and grateful to receive two grants from Open Philanthropy @open_phil to advance the safety of AI systems, co-led with my colleague @ysu_nlp. I'm also honored to be the first at @OhioState to receive Open Philanthropy funding. Most credit goes to the amazing students

OSUengineering

@OSUengineering

4 months

Associate Prof. Huan Sun has been awarded two competitive research grants from Open Philanthropy focused on the rapidly evolving field of #AI safety: https://t.co/3UG0YbDbtJ @hhsun1

4

17

79

Boyuan Zheng

@boyuan__zheng

5 months

Remember “Son of Anton” from the Silicon Valley show (@SiliconHBO)? The experimental AI that “efficiently” orders 4,000 lbs of meat while looking for a cheap burger and “fixes” a bug by deleting all the code? It’s starting to look a lot like reality. Even 18 months ago, my own

Scale AI

@scale_AI

5 months

As AI agents start taking real actions online, how do we prevent unintended harm? We teamed up with @OhioState and @UCBerkeley to create WebGuard: the first dataset for evaluating web agent risks and building real-world safety guardrails for online environments. 🧵

0

30

68

Jianyang Gu

@vimar_gu

5 months

Announcing the @NeurIPSConf 2025 workshop on Imageomics: Discovering Biological Knowledge from Images Using AI! The workshop focuses on the interdisciplinary field between machine learning and biological science. We look forward to seeing you in San Diego! #NeurIPS2025

2

15

27

Huan Sun (Hiring Ph.D. students for Fall26)

@hhsun1

5 months

🚨 Postdoc Hiring: I am looking for a postdoc to work on rigorously evaluating and advancing the capabilities and safety of computer-use agents (CUAs), co-advised with @ysu_nlp @osunlp. We welcome strong applicants with experience in CUAs, long-horizon reasoning/planning,

1

29

73

Yu Su

@ysu_nlp

6 months

🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️ Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge - 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor -

3

52

225

Yifei Li (Looking for SU26 Internship)

@YifeiLiPKU

6 months

📢 Introducing AutoSDT, a fully automatic pipeline that collects data-driven scientific coding tasks at scale! We use AutoSDT to collect AutoSDT-5K, enabling open co-scientist models that rival GPT-4o on ScienceAgentBench! Thread below ⬇️ (1/n)

4

26

76

Yu Su

@ysu_nlp

6 months

📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world! We trained a foundation model on 214M images of ~1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature. 🧵

6

63

298

Vardaan Pahuja

@vardaanpahuja

7 months

🚀 Thrilled to unveil the most exciting project of my PhD: Explorer — Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents TL;DR: A scalable multi-agent pipeline that leverages exploration for diverse web agent trajectory synthesis. 📄 Paper:

5

24

52

Jaylen Jones

@Jaylen_JonesNLP

7 months

RT @hhsun1: Realistic adversarial testing of Computer-Use Agents (CUAs) to identify their vulnerabilities and make them safer and more secu…

0

1

0