Jaylen Jones

@Jaylen_JonesNLP

Followers: 69
Following: 162
Media: 6
Statuses: 41

Ph.D. at The Ohio State University | Researching #ConvAI and #NLProc | Member of OSU NLP Group (@osunlp) and SLATE Lab

Columbus, OH
Joined April 2024
@Jaylen_JonesNLP
Jaylen Jones
2 months
Excited to present RedTeamCUA at #COLM2025 in Montreal! Happy to chat about evaluating and mitigating risks in computer-use agents, from adversarial security threats to safety risks in benign use. DMs open! 📅 Friday, Oct 10 🕐 1:30–2:30 📍 Workshop on AI Agents, Room 519B
@LiaoZeyi
Zeyi Liao
7 months
⁉️Can you really trust Computer-Use Agents (CUAs) to control your computer⁉️ Not yet, @AnthropicAI Opus 4 shows an alarming 48% Attack Success Rate against realistic internet injection❗️ Introducing RedTeamCUA: realistic, interactive, and controlled sandbox environments for
0
2
6
@Jaylen_JonesNLP
Jaylen Jones
21 days
Pros: Claude 4.5 Opus sets a new SoTA on OSWorld for computer-use capabilities. Cons: It also achieves the highest ASR (83%) on our new RedTeamCUA evaluation. Capability is rising fast, but trustworthiness depends on security keeping pace.
@hhsun1
Huan Sun (Hiring Ph.D. students for Fall26)
21 days
Computer-use agents are getting more capable, but are they getting more secure? No! We are actually observing the opposite trend. @AnthropicAI @claudeai Opus 4.5 released two days ago tops the OSWorld leaderboard, but is showing the highest Attack Success Rate (ASR) on our
0
2
3
@ysu_nlp
Yu Su
22 days
Life update: I moved to silicon valley to tackle agents' biggest challenges: plasticity and reliability. Today's agents are smart but brittle. They lack plasticity (continual learning and adaptation) and reliability (stable, predictable behavior with bounded failures). These two
40
43
421
@HananeNMoussa
Hanane Nour Moussa
2 months
📢 As AI becomes increasingly explored for research idea generation, how can we rigorously evaluate the ideas it generates before committing time and resources to them? We introduce ScholarEval, a literature grounded framework for research idea evaluation across disciplines 👇!
4
42
144
@gdb
Greg Brockman
2 months
GPT-5 for astronomy and astrophysics:
@deedydas
Deedy
2 months
GPT-5 and Gemini 2.5 Pro just achieved gold medal performance in the International Olympiad of Astronomy and Astrophysics (IOAA). AI is now world class at cutting edge physics.
61
86
1K
@XllmReasonPlan
XLLM-Reason-Plan
2 months
@COLM_conf #COLM2025 Prof. Huan Sun talking about "How Explanations can Advance Agent Capability and Safety". @hhsun1
0
5
11
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
2 months
Agent Learning via Early Experience "training agents from experience data with reinforcement learning remains difficult in many environments, which either lack verifiable rewards (e.g., websites) or require inefficient long-horizon rollouts (e.g., multi-turn tool use)." "We
13
90
399
@LiaoZeyi
Zeyi Liao
2 months
While @AnthropicAI Sonnet 4.5 achieves an impressive leap in computer use, achieving a SOTA result on OSWorld, we care about how aligned the model is when facing prompt injection. Our results on RedTeamCUA reveal a concerning trend: it exhibits the highest Attack Success Rate
@claudeai
Claude
3 months
Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.
0
4
16
@sivareddyg
Siva Reddy
2 months
Panel: Building Capable and Safe AI Agents Panelists: @hllo_wrld (moderator), @ysu_nlp, @Kordjamshidi, @gneubig, Joyce Chai, @taoyds VZ: What is the single biggest obstacle for real-world agents? YS: continual learning from experience (mention of Sutton's recent interview). RL
@sivareddyg
Siva Reddy
3 months
The IVADO workshop on Agent Capabilities and Safety is happening now at HEC Montreal, Downtown (Oct 3--6) https://t.co/MEL4JAzLRn #LLMAgents
1
3
19
@ysu_nlp
Yu Su
4 months
Computer Use: Modern Moravec's Paradox A new blog post arguing why computer-use agents may be the biggest opportunity and challenge for AGI. https://t.co/6fZfTdx710 Table of Contents > Moravec’s Paradox > Moravec's Paradox in 2025 > Computer use may be the biggest opportunity
9
65
213
@hhsun1
Huan Sun (Hiring Ph.D. students for Fall26)
4 months
I am humbled and grateful to receive two grants from Open Philanthropy @open_phil to advance the safety of AI systems, co-led with my colleague @ysu_nlp. I'm also honored to be the first at @OhioState to receive Open Philanthropy funding. Most credit goes to the amazing students
@OSUengineering
OSUengineering
4 months
Associate Prof. Huan Sun has been awarded two competitive research grants from Open Philanthropy focused on the rapidly evolving field of #AI safety: https://t.co/3UG0YbDbtJ @hhsun1
4
17
79
@boyuan__zheng
Boyuan Zheng
5 months
Remember “Son of Anton” from the Silicon Valley show(@SiliconHBO)? The experimental AI that “efficiently” orders 4,000 lbs of meat while looking for a cheap burger and “fixes” a bug by deleting all the code? It’s starting to look a lot like reality. Even 18 months ago, my own
@scale_AI
Scale AI
5 months
As AI agents start taking real actions online, how do we prevent unintended harm? We teamed up with @OhioState and @UCBerkeley to create WebGuard: the first dataset for evaluating web agent risks and building real-world safety guardrails for online environments. 🧵
0
30
68
@vimar_gu
Jianyang Gu
5 months
Announcing the @NeurIPSConf 2025 workshop on Imageomics: Discovering Biological Knowledge from Images Using AI! The workshop focuses on the interdisciplinary field between machine learning and biological science. We look forward to seeing you in San Diego! #NeurIPS2025
2
15
27
@hhsun1
Huan Sun (Hiring Ph.D. students for Fall26)
5 months
🚨 Postdoc Hiring: I am looking for a postdoc to work on rigorously evaluating and advancing the capabilities and safety of computer-use agents (CUAs), co-advised with @ysu_nlp @osunlp. We welcome strong applicants with experience in CUAs, long-horizon reasoning/planning,
1
29
73
@ysu_nlp
Yu Su
6 months
🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️ Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge - 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor -
3
52
225
@YifeiLiPKU
Yifei Li (Looking for SU26 Internship)
6 months
📢 Introducing AutoSDT, a fully automatic pipeline that collects data-driven scientific coding tasks at scale! We use AutoSDT to collect AutoSDT-5K, enabling open co-scientist models that rival GPT-4o on ScienceAgentBench! Thread below ⬇️ (1/n)
4
26
76
@ysu_nlp
Yu Su
6 months
📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world! We trained a foundation model on 214M images of ~1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature. 🧵
6
63
298
@vardaanpahuja
Vardaan Pahuja
7 months
🚀 Thrilled to unveil the most exciting project of my PhD: Explorer — Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents TL;DR: A scalable multi-agent pipeline that leverages exploration for diverse web agent trajectory synthesis. 📄 Paper:
5
24
52
@Jaylen_JonesNLP
Jaylen Jones
7 months
RT @hhsun1: Realistic adversarial testing of Computer-Use Agents (CUAs) to identify their vulnerabilities and make them safer and more secu…
0
1
0