Jifan Chen

@Jifan_chen

Followers: 406 · Following: 4K · Media: 53 · Statuses: 481

Building code agents @awscloud. Ph.D. from @UTAustin. Interpretable and Robust Models #NLProc. I have a super powerful language model in my brain.

Joined March 2014
@Jifan_chen
Jifan Chen
3 days
CS == Counter-Strike?
@WenhuChen
Wenhu Chen
4 days
Taken from RedNote.
0
0
0
@percyliang
Percy Liang
1 month
You spend $1B training a model A. Someone on your team leaves and launches their own model API B. You're suspicious. Was B derived (e.g., fine-tuned) from A? But you only have blackbox access to B... With our paper, you can still tell with strong statistical guarantees
@SallyHZhu
Sally Zhu
1 month
🔎Did someone steal your language model? We can tell you, as long as you shuffled your training data🔀. All we need is some text from their model! Concretely, suppose Alice trains an open-weight model and Bob uses it to produce text. Can Alice prove Bob used her model?🚨
55
215
2K
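The threads above hinge on one idea: because Alice shuffled her training data uniformly at random, a model trained on that order carries a statistical trace of it, and losses or text from a model derived from hers should correlate with the secret order far more than chance. Below is a minimal illustrative sketch of a permutation test in that spirit; the choice of statistic, the use of per-example losses, and the function names are assumptions for illustration, not the paper's exact procedure.

```python
# Hypothetical sketch: test whether a suspect model carries a trace of Alice's
# secret (randomly shuffled) training order. Statistic and inputs are illustrative.
import numpy as np

def provenance_p_value(secret_order, suspect_losses, n_perm=10_000, seed=0):
    """secret_order[i]  : position of example i in Alice's shuffled training run
       suspect_losses[i]: loss of the suspect model on example i
       Returns a permutation p-value for "losses are independent of the order"."""
    rng = np.random.default_rng(seed)
    order = np.asarray(secret_order, dtype=float)
    losses = np.asarray(suspect_losses, dtype=float)

    def stat(o):  # strength of association between training position and loss
        return abs(np.corrcoef(o, losses)[0, 1])

    observed = stat(order)
    # Under the null (an independent model), any shuffle of the secret order is
    # equally likely, so re-shuffling gives an exact null distribution.
    null = np.array([stat(rng.permutation(order)) for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

A small p-value is evidence that the suspect model is not independent of Alice's training run; this is only the shape of the argument, not the released test.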
@ManyaWadhwa1
Manya Wadhwa
2 months
Unfortunately I won't be able to attend #COLM2025 in person this year, but please check out our work being presented by my advisors/collaborators! If you are interested in evaluation of open-ended tasks/creativity/reasoning please reach out and we can schedule a chat!
@jessyjli
Jessy Li
2 months
On my way to #COLM2025 🍁 Check out
https://t.co/snFTIg24Am - QUDsim: Discourse templates in LLM stories
https://t.co/xqvbDvH5v0 - EvalAgent: retrieval-based eval targeting implicit criteria
https://t.co/f3JRojHeLb - RoboInstruct: code generation for robotics with simulators
0
3
19
@AISecHub
AISecHub
2 months
Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks
Code-capable large language model (LLM) agents are increasingly embedded into software engineering workflows where they can read, write, and execute code, raising the stakes of
0
5
17
@Jifan_chen
Jifan Chen
2 months
Check out JAWS-Bench, a benchmark that stress-tests code agents across three workspaces, led by @ShoumikSaha7 this summer: You build agents? Test them where attackers live: repos, files, tools. You do safety? Care about what runs, not just what the model says.
@ShoumikSaha7
Shoumik Saha
2 months
Code agents don't just talk -- they execute. What happens when you jailbreak them? Announcing JAWS-Bench (from my summer at @amazon AWS): a benchmark to jailbreak code agents across 3 workspaces -- empty → single-file → multi-file. The results? They break. A lot. Details 🧵👇
0
1
8
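As a rough illustration of "care about what runs, not just what the model says", here is a hypothetical harness that probes an agent with jailbreak prompts in the three workspace settings the thread names and records both its reply and the code it leaves behind. The `agent` and `judge` callables are assumed interfaces, not the benchmark's actual API, and the refusal check is deliberately naive.

```python
# Hypothetical jailbreak harness: judge the artifacts an agent writes, not just its reply.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Attempt:
    workspace: str
    prompt: str
    refused: bool        # did the agent decline in its reply?
    produced_code: bool  # did it write any files?
    harmful: bool        # judge's verdict on what it wrote

def run_jailbreak_suite(
    agent: Callable[[str, str], Tuple[str, Dict[str, str]]],  # (prompt, workspace) -> (reply, files)
    judge: Callable[[Dict[str, str]], bool],                  # files -> harmful when run?
    prompts: List[str],
    workspaces: Tuple[str, ...] = ("empty", "single-file", "multi-file"),
) -> List[Attempt]:
    attempts = []
    for ws in workspaces:
        for prompt in prompts:
            reply, files = agent(prompt, ws)
            attempts.append(Attempt(
                workspace=ws,
                prompt=prompt,
                refused="cannot" in reply.lower() or "can't help" in reply.lower(),
                produced_code=bool(files),
                harmful=bool(files) and judge(files),
            ))
    return attempts
```

The point of the split by workspace is that an agent may refuse in a bare chat yet still comply once it can touch real repos, files, and tools.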
@tanyaagoyal
Tanya Goyal
2 months
🚨Modeling Abstention via Selective Help-seeking
LLMs learn to use search tools to answer questions they would otherwise hallucinate on. But can this also teach them what they know vs not? @momergul_ introduces MASH that trains LLMs for search and gets abstentions for free!
1
22
37
@Jifan_chen
Jifan Chen
4 months
Congrats Greg! The new logo actually maintains the UT legacy. Liked it a lot!
@gregd_nlp
Greg Durrett
4 months
📢I'm joining NYU (Courant CS + Center for Data Science) starting this fall! I'm excited to connect with new NYU colleagues and keep working on LLM reasoning, reliability, coding, creativity, and more! I'm also looking to build connections in the NYC area more broadly. Please
1
0
5
@yoavgo
(((ل()(ل() 'yoav))))👾
4 months
ok it really *does* feel like having an ambitious STEM PhD in your pocket!
9
5
166
@rasoolfa
Rasool Fakoor
4 months
Our team is *hiring* interns & researchers! We're a small team of hardcore researchers & engineers working on foundation models, agentic methods, and embodiment. If you have strong publications and related experience, please fill out the application form. https://t.co/U4gOvNQ9qR
1
3
14
@Jifan_chen
Jifan Chen
4 months
😈😈
@zhang_yian
Yian Zhang
4 months
We want to set a SUPER high bar for OAI's open-source release 😉
0
0
1
@Jifan_chen
Jifan Chen
4 months
Really happy to finally see this work published after several delays. Sometimes good things take time! 🎉 Good food for thought during the weekend : ) #ACL2025
@yumo_xu
Yumo Xu
4 months
Excited to share our #ACL2025NLP paper, "CiteEval: Principle-Driven Citation Evaluation for Source Attribution"! 📜 If you're working on RAG, Deep Research and Trustworthy AI, this is for you. Why? Citation quality is
1
0
5
@HanRujun
Rujun Han
4 months
Very excited to share the project I've been working on over the past several months! We proposed Deep Researcher with Test-Time Diffusion, a novel method to leverage iterative draft+revision to tackle complex questions demanding exhaustive search and reasoning.
3
9
28
@ajassy
Andy Jassy
5 months
Introducing Kiro, an all-new agentic IDE that has a chance to transform how developers build software. Let me highlight three key innovations that make Kiro special: 1 - Kiro introduces spec-driven development, helping developers express their intent clearly through natural
130
408
2K
@ManyaWadhwa1
Manya Wadhwa
5 months
Happy to share that EvalAgent has been accepted to #COLM2025 @COLM_conf 🎉🇨🇦 We introduce a framework to identify implicit and diverse evaluation criteria for various open-ended tasks! 📜
@ManyaWadhwa1
Manya Wadhwa
7 months
Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️. EvalAgent identifies 👩‍🏫🎓 expert advice on the web that implicitly addresses the user's prompt 🧵👇
1
19
77
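The EvalAgent threads describe a pipeline: derive implicit evaluation criteria for an open-ended prompt from expert advice retrieved on the web, then score a response against those criteria. A minimal sketch of that flow follows, assuming generic `web_search` and `llm` callables; the names and prompts are illustrative, not the released implementation.

```python
# Sketch: mine implicit criteria from web advice, then score a response against them.
from typing import Callable, List

def implicit_criteria(prompt: str,
                      web_search: Callable[[str], List[str]],
                      llm: Callable[[str], str],
                      n_queries: int = 3) -> List[str]:
    queries = llm(
        f"Write {n_queries} web search queries for expert advice on how to do this task well:\n{prompt}"
    ).splitlines()
    advice = [doc for q in queries for doc in web_search(q)]
    criteria = llm(
        "From the advice below, list concrete, checkable criteria a good response "
        f"to the task should satisfy, one per line.\n\nTask: {prompt}\n\nAdvice:\n"
        + "\n---\n".join(advice)
    )
    return [c.strip("- ").strip() for c in criteria.splitlines() if c.strip()]

def score(response: str, criteria: List[str], llm: Callable[[str], str]) -> float:
    verdicts = [
        llm(f"Criterion: {c}\nResponse:\n{response}\nDoes the response satisfy the criterion? Answer yes or no.")
        for c in criteria
    ]
    return sum(v.strip().lower().startswith("yes") for v in verdicts) / max(len(criteria), 1)
```

The key design choice the tweets emphasize is that the criteria come from retrieved expert advice rather than from the prompt alone, so implicit expectations get surfaced.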
@zijianwang30
Zijian Wang @ NeurIPS
5 months
After three successful runs of #DL4C at ICLR'22 (remote), ICLR'23 (🇷🇼/remote), and ICLR'25 (🇸🇬), I'm thrilled to announce the 4th #DL4C workshop, Deep Learning for Code in the Agentic Era, is coming to #NeurIPS2025 in San Diego, marking our first
@DL4Code
Deep Learning For Code @ NeurIPS'25
5 months
📣Excited to announce that the 4th #DL4C workshop "Deep Learning for Code in the Agentic Era" is coming to @NeurIPSConf 2025! AI coding agents are transforming software development at an unprecedented pace. Join us to explore the cutting edge of agent-based programming,
2
7
24
@qi2peng2
Peng Qi
5 months
Seven years ago, I co-led a paper called HotpotQA that has motivated and facilitated many #AI #Agents research works since. Today, I'm asking that you stop using HotpotQA blindly for agents research in 2025 and beyond. In my new blog post, I revisit the brief history of
7
46
224
@KaiserWhoLearns
Kaiser Sun
6 months
What happens when an LLM is asked to use information that contradicts its knowledge? We explore knowledge conflict in a new preprint📑 TLDR: Performance drops, and this could affect the overall performance of LLMs in model-based evaluation. 📑🧵⬇️ 1/8 #NLProc #LLM #AIResearch
4
23
86
@ZEYULIU10
Leo Liu
6 months
LLMs trained to memorize new facts can't use those facts well. 🤔 We apply a hypernetwork to ✏️edit✏️ the gradients for fact propagation, improving accuracy by 2x on a challenging subset of RippleEdit!💡 Our approach, PropMEND, extends MEND with a new objective for propagation.
5
75
197
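The PropMEND tweet describes the MEND-style recipe of having a small hypernetwork rewrite the gradient of an edit example before it is applied as a weight update, with a new outer objective that rewards propagating the edited fact. A toy sketch of the inner edit step, with shapes and losses simplified (real implementations factor gradients per layer rather than flattening the whole model):

```python
# Toy sketch of gradient editing for knowledge editing (in the spirit of MEND/PropMEND).
import torch
import torch.nn as nn

class GradEditor(nn.Module):
    """Maps a flattened raw gradient to an edited gradient of the same size."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, grad: torch.Tensor) -> torch.Tensor:
        return self.net(grad)

def edit_model(model: nn.Module, editor: GradEditor, edit_loss: torch.Tensor, lr: float = 1e-2):
    """One knowledge-edit step: raw gradient -> hypernetwork -> weight update.
       `editor` must be built with dim equal to the model's total parameter count."""
    params = [p for p in model.parameters() if p.requires_grad]
    # Keep the graph so the editor itself can later be meta-trained through the edit.
    grads = torch.autograd.grad(edit_loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    edited = editor(flat)
    offset = 0
    with torch.no_grad():  # apply the edited gradient slice by slice
        for p, g in zip(params, grads):
            n = g.numel()
            p -= lr * edited[offset:offset + n].reshape(p.shape)
            offset += n
```

In MEND-style training the editor is meta-trained so that, after `edit_model`, the model answers not only the edited fact but also queries that require propagating it; the exact propagation objective here is left abstract.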
@gregd_nlp
Greg Durrett
6 months
Great to work on this benchmark with astronomers in our NSF-Simons CosmicAI institute! What I like about it: (1) focus on data processing & visualization, a "bite-sized" AI4Sci task (not automating all of research) (2) eval with VLM-as-a-judge (possible with strong, modern VLMs)
@sebajoed
Sebastian Joseph
6 months
How good are LLMs at 🔭 scientific computing and visualization 🔭? AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results. SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
2
4
25
@ZEYULIU10
Leo Liu
6 months
Have you thought about making your reasoning model stronger through *skill composition*? It's not as hard as you'd imagine! Check out our work!!!
@fangcong_y10593
Fangcong Yin
6 months
Solving complex problems with CoT requires combining different skills. We can do this by:
🧩Modify the CoT data format to be "composable" with other skills
🔥Train models on each skill
📌Combine those models
This leads to better 0-shot reasoning on tasks involving skill composition!
1
2
11
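One simple way to realize the "combine those models" step, assuming the skill models are fine-tuned from a shared base checkpoint, is weight-space composition with task vectors. The paper's actual combination method may differ, so treat this only as an illustrative sketch of composing skills at the weight level.

```python
# Sketch: compose skill-specialized checkpoints by adding their task vectors to a shared base.
import torch

def compose_skills(base_sd, skill_sds, alpha=1.0):
    """base_sd  : state_dict of the shared base model.
       skill_sds: list of state_dicts fine-tuned on individual skills.
       Returns a state_dict equal to base + alpha * sum of (skill - base) deltas."""
    combined = {}
    for name, base_w in base_sd.items():
        if not torch.is_floating_point(base_w):
            combined[name] = base_w.clone()  # leave buffers/int tensors untouched
            continue
        delta = sum(sd[name] - base_w for sd in skill_sds)
        combined[name] = base_w + alpha * delta
    return combined
```

The composable CoT data format in the thread is what makes such combined models usable zero-shot on tasks that need several skills at once.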