Jifan Chen

@Jifan_chen

Followers: 406 · Following: 4K · Media: 53 · Statuses: 481

Building code agents @awscloud. Ph.D. from @UTAustin. Interpretable and Robust Models #NLProc. I have a super powerful language model in my brain.

Joined March 2014
@Jifan_chen
Jifan Chen
3 days
CS == Counter-Strike?
@WenhuChen
Wenhu Chen
4 days
Taken from RedNote.
0
0
0
@percyliang
Percy Liang
1 month
You spend $1B training a model A. Someone on your team leaves and launches their own model API B. You're suspicious. Was B derived (e.g., fine-tuned) from A? But you only have blackbox access to B... With our paper, you can still tell with strong statistical guarantees
@SallyHZhu
Sally Zhu
1 month
🔎Did someone steal your language model? We can tell you, as long as you shuffled your training data🔀. All we need is some text from their model! Concretely, suppose Alice trains an open-weight model and Bob uses it to produce text. Can Alice prove Bob used her model?🚨
55
215
2K
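The threads above hinge on one idea: because Alice shuffled her training data uniformly at random, a model trained on that order carries a statistical trace of it, and losses or text from a model derived from hers should correlate with the secret order far more than chance. Below is a minimal illustrative sketch of a permutation test in that spirit; the choice of statistic, the use of per-example losses, and the function names are assumptions for illustration, not the paper's exact procedure.

```python
# Hypothetical sketch: test whether a suspect model carries a trace of Alice's
# secret (randomly shuffled) training order. Statistic and inputs are illustrative.
import numpy as np

def provenance_p_value(secret_order, suspect_losses, n_perm=10_000, seed=0):
    """secret_order[i]  : position of example i in Alice's shuffled training run
       suspect_losses[i]: loss of the suspect model on example i
       Returns a permutation p-value for "losses are independent of the order"."""
    rng = np.random.default_rng(seed)
    order = np.asarray(secret_order, dtype=float)
    losses = np.asarray(suspect_losses, dtype=float)

    def stat(o):  # strength of association between training position and loss
        return abs(np.corrcoef(o, losses)[0, 1])

    observed = stat(order)
    # Under the null (an independent model), any shuffle of the secret order is
    # equally likely, so re-shuffling gives an exact null distribution.
    null = np.array([stat(rng.permutation(order)) for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

A small p-value is evidence that the suspect model is not independent of Alice's training run; this is only the shape of the argument, not the released test.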
@ManyaWadhwa1
Manya Wadhwa
2 months
Unfortunately I won't be able to attend #COLM2025 in person this year, but please check out our work being presented by my advisors/collaborators! If you are interested in evaluation of open-ended tasks/creativity/reasoning please reach out and we can schedule a chat!
@jessyjli
Jessy Li
2 months
On my way to #COLM2025 🍁 Check out
https://t.co/snFTIg24Am - QUDsim: Discourse templates in LLM stories
https://t.co/xqvbDvH5v0 - EvalAgent: retrieval-based eval targeting implicit criteria
https://t.co/f3JRojHeLb - RoboInstruct: code generation for robotics with simulators
0
3
19
@AISecHub
AISecHub
2 months
Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks
Code-capable large language model (LLM) agents are increasingly embedded into software engineering workflows where they can read, write, and execute code, raising the stakes of
0
5
17
@Jifan_chen
Jifan Chen
2 months
Check out JAWS-Bench, a benchmark that stress-tests code agents across three workspaces, led by @ShoumikSaha7 this summer: You build agents? Test them where attackers live: repos, files, tools. You do safety? Care about what runs, not just what the model says.
@ShoumikSaha7
Shoumik Saha
2 months
Code agents don't just talk -- they execute. What happens when you jailbreak them? Announcing JAWS-Bench (from my summer at @amazon AWS): a benchmark to jailbreak code agents across 3 workspaces -- empty → single-file → multi-file. The results? They break. A lot. Details 🧵👇
0
1
8
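As a rough illustration of "care about what runs, not just what the model says", here is a hypothetical harness that probes an agent with jailbreak prompts in the three workspace settings the thread names and records both its reply and the code it leaves behind. The `agent` and `judge` callables are assumed interfaces, not the benchmark's actual API, and the refusal check is deliberately naive.

```python
# Hypothetical jailbreak harness: judge the artifacts an agent writes, not just its reply.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Attempt:
    workspace: str
    prompt: str
    refused: bool        # did the agent decline in its reply?
    produced_code: bool  # did it write any files?
    harmful: bool        # judge's verdict on what it wrote

def run_jailbreak_suite(
    agent: Callable[[str, str], Tuple[str, Dict[str, str]]],  # (prompt, workspace) -> (reply, files)
    judge: Callable[[Dict[str, str]], bool],                  # files -> harmful when run?
    prompts: List[str],
    workspaces: Tuple[str, ...] = ("empty", "single-file", "multi-file"),
) -> List[Attempt]:
    attempts = []
    for ws in workspaces:
        for prompt in prompts:
            reply, files = agent(prompt, ws)
            attempts.append(Attempt(
                workspace=ws,
                prompt=prompt,
                refused="cannot" in reply.lower() or "can't help" in reply.lower(),
                produced_code=bool(files),
                harmful=bool(files) and judge(files),
            ))
    return attempts
```

The point of the split by workspace is that an agent may refuse in a bare chat yet still comply once it can touch real repos, files, and tools.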
@tanyaagoyal
Tanya Goyal
2 months
🚨Modeling Abstention via Selective Help-seeking
LLMs learn to use search tools to answer questions they would otherwise hallucinate on. But can this also teach them what they know vs not? @momergul_ introduces MASH that trains LLMs for search and gets abstentions for free!
1
22
37
@Jifan_chen
Jifan Chen
4 months
Congrats Greg! The new logo actually maintains the UT legacy. Liked it a lot!
@gregd_nlp
Greg Durrett
4 months
📢I'm joining NYU (Courant CS + Center for Data Science) starting this fall! I'm excited to connect with new NYU colleagues and keep working on LLM reasoning, reliability, coding, creativity, and more! I'm also looking to build connections in the NYC area more broadly. Please
1
0
5
@yoavgo
(((ل()(ل() 'yoav))))👾
4 months
ok it really *does* feel like having an ambitious STEM PhD in your pocket!
9
5
166
@rasoolfa
Rasool Fakoor
4 months
Our team is *hiring* interns & researchers! We're a small team of hardcore researchers & engineers working on foundation models, agentic methods, and embodiment. If you have strong publications and related experience, please fill out the application form. https://t.co/U4gOvNQ9qR
1
3
14
@Jifan_chen
Jifan Chen
4 months
😈😈
@zhang_yian
Yian Zhang
4 months
We want to set a SUPER high bar for OAI's open-source release 😉
0
0
1
@Jifan_chen
Jifan Chen
4 months
Really happy to finally see this work published after several delays. Sometimes good things take time! 🎉 Good food for thought during the weekend : ) #ACL2025
@yumo_xu
Yumo Xu
4 months
Excited to share our #ACL2025NLP paper, "CiteEval: Principle-Driven Citation Evaluation for Source Attribution"! 📜 If you're working on RAG, Deep Research and Trustworthy AI, this is for you. Why? Citation quality is
1
0
5
@HanRujun
Rujun Han
4 months
Very excited to share the project I've been working on over the past several months! We proposed Deep Researcher with Test-Time Diffusion, a novel method to leverage iterative draft+revision to tackle complex questions demanding exhaustive search and reasoning.
3
9
28
@ajassy
Andy Jassy
5 months
Introducing Kiro, an all-new agentic IDE that has a chance to transform how developers build software. Let me highlight three key innovations that make Kiro special: 1 - Kiro introduces spec-driven development, helping developers express their intent clearly through natural
130
408
2K
@ManyaWadhwa1
Manya Wadhwa
5 months
Happy to share that EvalAgent has been accepted to #COLM2025 @COLM_conf 🎉🇨🇦 We introduce a framework to identify implicit and diverse evaluation criteria for various open-ended tasks! 📜
@ManyaWadhwa1
Manya Wadhwa
7 months
Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️. EvalAgent identifies 👩‍🏫🎓 expert advice on the web that implicitly addresses the user's prompt 🧵👇
1
19
77
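The EvalAgent threads describe a pipeline: derive implicit evaluation criteria for an open-ended prompt from expert advice retrieved on the web, then score a response against those criteria. A minimal sketch of that flow follows, assuming generic `web_search` and `llm` callables; the names and prompts are illustrative, not the released implementation.

```python
# Sketch: mine implicit criteria from web advice, then score a response against them.
from typing import Callable, List

def implicit_criteria(prompt: str,
                      web_search: Callable[[str], List[str]],
                      llm: Callable[[str], str],
                      n_queries: int = 3) -> List[str]:
    queries = llm(
        f"Write {n_queries} web search queries for expert advice on how to do this task well:\n{prompt}"
    ).splitlines()
    advice = [doc for q in queries for doc in web_search(q)]
    criteria = llm(
        "From the advice below, list concrete, checkable criteria a good response "
        f"to the task should satisfy, one per line.\n\nTask: {prompt}\n\nAdvice:\n"
        + "\n---\n".join(advice)
    )
    return [c.strip("- ").strip() for c in criteria.splitlines() if c.strip()]

def score(response: str, criteria: List[str], llm: Callable[[str], str]) -> float:
    verdicts = [
        llm(f"Criterion: {c}\nResponse:\n{response}\nDoes the response satisfy the criterion? Answer yes or no.")
        for c in criteria
    ]
    return sum(v.strip().lower().startswith("yes") for v in verdicts) / max(len(criteria), 1)
```

The key design choice the tweets emphasize is that the criteria come from retrieved expert advice rather than from the prompt alone, so implicit expectations get surfaced.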
@zijianwang30
Zijian Wang @ NeurIPS
5 months
After three successful runs of #DL4C at ICLR'22 (remote), ICLR'23 (🇷🇼/remote), and ICLR'25 (🇸🇬), I'm thrilled to announce the 4th #DL4C workshop, Deep Learning for Code in the Agentic Era, is coming to #NeurIPS2025 in San Diego, marking our first
@DL4Code
Deep Learning For Code @ NeurIPS'25
5 months
📣Excited to announce that the 4th #DL4C workshop "Deep Learning for Code in the Agentic Era" is coming to @NeurIPSConf 2025! AI coding agents are transforming software development at an unprecedented pace. Join us to explore the cutting edge of agent-based programming,
2
7
24
@qi2peng2
Peng Qi
5 months
Seven years ago, I co-led a paper called HotpotQA that has motivated and facilitated many #AI #Agents research works since. Today, I'm asking that you stop using HotpotQA blindly for agents research in 2025 and beyond. In my new blog post, I revisit the brief history of
7
46
224
@KaiserWhoLearns
Kaiser Sun
6 months
What happens when an LLM is asked to use information that contradicts its knowledge? We explore knowledge conflict in a new preprint📑 TLDR: Performance drops, and this could affect the overall performance of LLMs in model-based evaluation. 📑🧵⬇️ 1/8 #NLProc #LLM #AIResearch
4
23
86
@ZEYULIU10
Leo Liu
6 months
LLMs trained to memorize new facts can't use those facts well. 🤔 We apply a hypernetwork to ✏️edit✏️ the gradients for fact propagation, improving accuracy by 2x on a challenging subset of RippleEdit!💡 Our approach, PropMEND, extends MEND with a new objective for propagation.
5
75
197
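The PropMEND tweet describes the MEND-style recipe of having a small hypernetwork rewrite the gradient of an edit example before it is applied as a weight update, with a new outer objective that rewards propagating the edited fact. A toy sketch of the inner edit step, with shapes and losses simplified (real implementations factor gradients per layer rather than flattening the whole model):

```python
# Toy sketch of gradient editing for knowledge editing (in the spirit of MEND/PropMEND).
import torch
import torch.nn as nn

class GradEditor(nn.Module):
    """Maps a flattened raw gradient to an edited gradient of the same size."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, grad: torch.Tensor) -> torch.Tensor:
        return self.net(grad)

def edit_model(model: nn.Module, editor: GradEditor, edit_loss: torch.Tensor, lr: float = 1e-2):
    """One knowledge-edit step: raw gradient -> hypernetwork -> weight update.
       `editor` must be built with dim equal to the model's total parameter count."""
    params = [p for p in model.parameters() if p.requires_grad]
    # Keep the graph so the editor itself can later be meta-trained through the edit.
    grads = torch.autograd.grad(edit_loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    edited = editor(flat)
    offset = 0
    with torch.no_grad():  # apply the edited gradient slice by slice
        for p, g in zip(params, grads):
            n = g.numel()
            p -= lr * edited[offset:offset + n].reshape(p.shape)
            offset += n
```

In MEND-style training the editor is meta-trained so that, after `edit_model`, the model answers not only the edited fact but also queries that require propagating it; the exact propagation objective here is left abstract.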
@gregd_nlp
Greg Durrett
6 months
Great to work on this benchmark with astronomers in our NSF-Simons CosmicAI institute! What I like about it: (1) focus on data processing & visualization, a "bite-sized" AI4Sci task (not automating all of research) (2) eval with VLM-as-a-judge (possible with strong, modern VLMs)
@sebajoed
Sebastian Joseph
6 months
How good are LLMs at 🔭 scientific computing and visualization 🔭? AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results. SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
2
4
25
@ZEYULIU10
Leo Liu
6 months
Have you thought about making your reasoning model stronger through *skill composition*? It's not as hard as you'd imagine! Check out our work!!!
@fangcong_y10593
Fangcong Yin
6 months
Solving complex problems with CoT requires combining different skills. We can do this by:
🧩Modify the CoT data format to be "composable" with other skills
🔥Train models on each skill
📌Combine those models
This leads to better 0-shot reasoning on tasks involving skill composition!
1
2
11
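One simple way to realize the "combine those models" step, assuming the skill models are fine-tuned from a shared base checkpoint, is weight-space composition with task vectors. The paper's actual combination method may differ, so treat this only as an illustrative sketch of composing skills at the weight level.

```python
# Sketch: compose skill-specialized checkpoints by adding their task vectors to a shared base.
import torch

def compose_skills(base_sd, skill_sds, alpha=1.0):
    """base_sd  : state_dict of the shared base model.
       skill_sds: list of state_dicts fine-tuned on individual skills.
       Returns a state_dict equal to base + alpha * sum of (skill - base) deltas."""
    combined = {}
    for name, base_w in base_sd.items():
        if not torch.is_floating_point(base_w):
            combined[name] = base_w.clone()  # leave buffers/int tensors untouched
            continue
        delta = sum(sd[name] - base_w for sd in skill_sds)
        combined[name] = base_w + alpha * delta
    return combined
```

The composable CoT data format in the thread is what makes such combined models usable zero-shot on tasks that need several skills at once.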