Philippe Laban Profile
Philippe Laban

@PhilippeLaban

Followers: 1K · Following: 3K · Media: 44 · Statuses: 330

Research Scientist @MSFTResearch. NLP/HCI Research.

New York City
Joined April 2022
@rkdsaakyan
Arkadiy Saakyan
4 days
N-gram novelty is widely used as a measure of creativity and generalization. But if LLMs produce highly n-gram novel expressions that don’t make sense or sound awkward, should they still be called creative? In a new paper, we investigate how n-gram novelty relates to creativity.
1
13
42
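For readers unfamiliar with the metric: n-gram novelty is typically the fraction of a text's n-grams that do not appear in a reference corpus. A minimal sketch of that computation (my illustration; the paper's exact definition may differ):

```python
# Minimal sketch of n-gram novelty: the fraction of a text's n-grams
# absent from a reference corpus. Illustrative only; the paper's exact
# definition (tokenization, n, corpus) may differ.

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_novelty(text, corpus_texts, n=4):
    corpus_ngrams = set()
    for doc in corpus_texts:
        corpus_ngrams.update(ngrams(doc.split(), n))
    candidate = ngrams(text.split(), n)
    if not candidate:
        return 0.0
    novel = [g for g in candidate if g not in corpus_ngrams]
    return len(novel) / len(candidate)

corpus = ["the cat sat on the mat", "a dog ran in the park"]
print(ngram_novelty("the cat ran in the fog", corpus, n=3))  # 0.75
```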
@tiancheng_hu
Tiancheng Hu
11 days
Can AI simulate human behavior? 🧠 The promise is revolutionary for science & policy. But there’s a huge "IF": Do these simulations actually reflect reality? To find out, we introduce SimBench: The first large-scale benchmark for group-level social simulation. (1/9)
3
21
52
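Group-level evaluation of this kind usually reduces to comparing the distribution of simulated answers against the distribution of real human answers. A hedged sketch using one common distance (total variation); SimBench's actual metrics may differ:

```python
# Sketch: compare a simulated group's answer distribution to a human
# survey distribution via total variation distance. Illustrative only;
# SimBench's actual evaluation protocol may use different metrics.
from collections import Counter

def answer_distribution(answers, options):
    counts = Counter(answers)
    total = len(answers)
    return {opt: counts[opt] / total for opt in options}

def total_variation(p, q):
    # TV distance: half the L1 distance between two distributions.
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)

options = ["agree", "neutral", "disagree"]
human = answer_distribution(["agree"] * 60 + ["neutral"] * 25 + ["disagree"] * 15, options)
simulated = answer_distribution(["agree"] * 70 + ["neutral"] * 20 + ["disagree"] * 10, options)
print(total_variation(human, simulated))  # 0.0 = perfect match, 1.0 = disjoint
```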
@jennajrussell
Jenna Russell
17 days
AI is already at work in American newsrooms. We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea. Here's what we learned about how AI is influencing local and national journalism:
4
52
143
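The headline number is a simple aggregate: classify each article, then report the flagged share. A sketch with a hypothetical `detect` function standing in for whatever detector the study actually used:

```python
# Sketch of the aggregate measurement: run a detector over articles and
# report the share flagged as fully or partially AI-generated.
# `detect` is a hypothetical stand-in for the study's actual classifier.

def detect(article_text):
    # Placeholder: a real detector returns a label such as
    # "human", "partial_ai", or "full_ai".
    return "human"

def ai_share(articles):
    flagged = sum(1 for a in articles if detect(a) in {"partial_ai", "full_ai"})
    return flagged / len(articles)

articles = ["some article text", "another article"]
print(f"{ai_share(articles):.1%} flagged")  # the study reports ~9% of 186k
```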
@TuhinChakr
Tuhin Chakrabarty
18 days
🚨New paper on AI and copyright. Several authors have sued LLM companies for allegedly using their books without permission for model training. 👩‍⚖️Courts, however, require empirical evidence of harm (e.g., market dilution). Our new pre-registered study addresses exactly this.
9
171
524
@alexisjross
Alexis Ross
25 days
Can LLMs reason like a student? 👩🏻‍🎓📚✏️ For educational tools like AI tutors, modeling how students make mistakes is crucial. But current LLMs are much worse at simulating student errors ❌ than performing correct ✅ reasoning. We try to fix that with our method MISTAKE 🤭👇
11
56
336
@alexisjross
Alexis Ross
1 month
New preprint on AI/CS education‼️ We ask what we can learn about both code & coders (students learning to code) by training on their full coding traces. Hint: we get richer models of *diverse student behavior* that are also more *generalizable & controllable*! Thread below ⬇️
@megha_byte
Megha Srivastava
1 month
New preprint on AI + Education! 🍎 “Modeling Student Learning with 3.8M Program Traces” 💻 When students code, their edits tell a story about their reasoning process: exploring, debugging, and tinkering 🧠 What can LMs learn from training on student edit sequences? 📚
1
14
83
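The core idea, as I read it: instead of training only on final programs, train on the sequence of intermediate program states, so the model learns the editing process itself. A hedged sketch of one way such traces could be serialized for training; the preprint's actual format is likely different:

```python
# Sketch: serialize a student's coding trace (successive program
# snapshots) into a single training sequence, so a language model sees
# the editing process rather than only the final program. The actual
# serialization format in the preprint is likely different.

def serialize_trace(snapshots):
    parts = []
    for step, code in enumerate(snapshots):
        parts.append(f"<step {step}>\n{code}")
    return "\n".join(parts)

trace = [
    "def avg(xs): return sum(xs)",            # initial buggy attempt
    "def avg(xs): return sum(xs) / len(xs)",  # student fixes the bug
]
print(serialize_trace(trace))
```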
@alexisjross
Alexis Ross
28 days
One of my takeaways from #COLM2025 was that people are thinking a lot about user simulation (I've been thinking about this myself in the context of tutoring!). Really exciting to see this work on the topic 🤩
@tareknaous
Tarek Naous
29 days
Simulating user–AI conversations helps us understand how LMs work in multi-turn settings. Prompting LMs like GPT-4o to simulate users is common, but their assistant nature makes it hard to replicate user behavior. We introduce User LMs - trained to be users, not assistants.
7
13
110
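The key design choice: train on the same conversation data but make the user turns the prediction targets, instead of the assistant turns. A minimal sketch of that role flip, assuming a standard role-tagged transcript format (not the paper's actual pipeline):

```python
# Sketch of the core idea behind user LMs: take ordinary user/assistant
# transcripts and make the *user* turns the prediction targets, instead
# of the assistant turns. Assumes a standard role-tagged format.

def to_user_lm_example(transcript):
    """Keep full dialogue as context; user turns become targets."""
    context, targets = [], []
    for turn in transcript:
        if turn["role"] == "user":
            targets.append(turn["content"])
        context.append(f'{turn["role"]}: {turn["content"]}')
    return {"context": "\n".join(context), "user_turns": targets}

transcript = [
    {"role": "user", "content": "Help me plan a trip to Montreal."},
    {"role": "assistant", "content": "Sure! When are you traveling?"},
    {"role": "user", "content": "Sometime in October, on a budget."},
]
print(to_user_lm_example(transcript))
```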
@victormustar
Victor M
30 days
Microsoft did something interesting here 👀 “Unlike typical LLMs that are trained to play the role of the "assistant" in conversation, we trained UserLM-8b to simulate the “user” role in conversation” https://t.co/mGgWZBvu7o
huggingface.co
49
180
2K
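For the curious, a hedged sketch of sampling a simulated user turn from the released checkpoint with Hugging Face transformers. I'm assuming the model id is "microsoft/UserLM-8b" and a standard chat template; check the model card on huggingface.co for the exact usage:

```python
# Sketch: sampling a simulated user turn with Hugging Face transformers.
# Assumes the model id "microsoft/UserLM-8b" and a standard chat
# template; consult the model card for the documented usage.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/UserLM-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Note the flipped roles: the *assistant's* message is the context, and
# the model generates the next *user* message.
messages = [{"role": "assistant", "content": "How can I help you today?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```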
@ssuri
Siddharth (Sid) Suri
1 month
The AI, Interaction, and Learning team at Microsoft Research is looking for interns! If you're working on your PhD in computer science, statistics, economics, computational social science, or related fields, apply: Research Intern MSR AI Interaction and Learning | Microsoft Careers
1
6
49
@kthai1618
Katherine Thai
1 month
As a case study, we built a dataset by applying 9 different Grammarly edits to the same text. According to EditLens, “Fix any mistakes” is the most mild change, while “Make it more detailed” and “Summarize it” are the most invasive. 7/
1
1
9
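Quantifying how "invasive" an edit is can be approximated with a normalized edit distance between the original and edited text. EditLens itself is a learned model, so treat this as a rough baseline for intuition, not their method:

```python
# Rough baseline for "edit magnitude": normalized word-level edit
# similarity between original and edited text. EditLens is a learned
# model, so this is only a crude stand-in for intuition.
import difflib

def edit_magnitude(original, edited):
    a, b = original.split(), edited.split()
    similarity = difflib.SequenceMatcher(None, a, b).ratio()
    return 1.0 - similarity  # 0 = unchanged, 1 = completely rewritten

src = "The cat sat on the mat."
print(edit_magnitude(src, "The cat sat on the mat"))    # mild fix
print(edit_magnitude(src, "A feline rested on a rug"))  # invasive rewrite
```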
@max_spero_
Max Spero
1 month
We can now quantify the magnitude of AI edits in a text! Coming soon to Pangram.
5
6
42
@PhilippeLaban
Philippe Laban
1 month
Come see us at COLM!! More importantly, if you're thinking of doing a PhD, go work with the wonderful Tuhin!
@TuhinChakr
Tuhin Chakrabarty
1 month
I am at @COLM_conf in Montreal. @PhilippeLaban and I will present work on #AISlop and Calibrated Reward Models for Writing. I will also be admitting 1 PhD student next fall at @sbucompsc to work on Human-Centered AI / AI detection / Copyright and Creative Labor. Reach out!!
1
5
34
@TuhinChakr
Tuhin Chakrabarty
7 months
Unlike math/code, writing lacks verifiable rewards, so all we get is slop. To solve this, we train reward models on expert edits that largely beat SOTA #LLMs on a new Writing Quality benchmark. We also reduce #AI slop by using our RMs at test time, boosting alignment with experts.
1
10
39
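Using a reward model "at test time" typically means best-of-n reranking: sample several drafts, score each with the RM, and keep the highest-scoring one. A hedged sketch with hypothetical `generate_drafts` and `reward_score` functions standing in for the actual sampler and trained RM:

```python
# Sketch of test-time use of a reward model: best-of-n reranking.
# `generate_drafts` and `reward_score` are hypothetical stand-ins for
# the real LLM sampler and the trained writing-quality reward model.

def generate_drafts(prompt, n=8):
    return [f"draft {i} for: {prompt}" for i in range(n)]  # placeholder

def reward_score(draft):
    return len(set(draft.split()))  # placeholder scoring function

def best_of_n(prompt, n=8):
    drafts = generate_drafts(prompt, n)
    return max(drafts, key=reward_score)  # keep the highest-reward draft

print(best_of_n("Write an essay on AI and craft."))
```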
@tanyaagoyal
Tanya Goyal
1 month
🚨Modeling Abstention via Selective Help-seeking. LLMs learn to use search tools to answer questions they would otherwise hallucinate on. But can this also teach them what they know vs. what they don't? @momergul_ introduces MASH, which trains LLMs for search and gets abstentions for free!
1
21
36
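The "abstentions for free" idea, as I understand it: a model trained to call a search tool only when it lacks the answer can be read as abstaining whenever it reaches for the tool. A hedged sketch of that inference-time reading, with a hypothetical `model_answer` interface (not MASH's actual training recipe):

```python
# Sketch of abstention-via-help-seeking: if a model trained to search
# only when uncertain emits a search call, interpret that as an
# abstention when no tool is available. `model_answer` is hypothetical.

def model_answer(question):
    # Placeholder: a real model returns either a direct answer or a
    # structured search request like {"action": "search", "query": ...}.
    return {"action": "search", "query": question}

def answer_or_abstain(question, search_available=False):
    out = model_answer(question)
    if out.get("action") == "search" and not search_available:
        return "ABSTAIN"  # the search call doubles as an "I don't know"
    return out

print(answer_or_abstain("Who won the 1997 Bandy World Championship?"))
```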
@max_spero_
Max Spero
2 months
Good news, @emollick! We finally got an independent study of FPR. @alexolegimas and @brian_jabarian studied Pangram alongside other AI detectors and found that Pangram had zero false positives (at a threshold of 0.5) among their dataset of 7,968 human writing samples.
@emollick
Ethan Mollick
6 months
Getting lots of replies pushing Pangram Labs here. They claim very low false positive rates on their website. I remain doubtful without independent assessment of false positives (this study was not meant to do that), & concerned that these detectors are used adversarially.
2
5
41
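For concreteness, the false positive rate here is just the share of human-written samples a detector scores above the decision threshold. A sketch of that arithmetic, using the 0.5 threshold from the tweet (the scores below are invented for illustration; the study used 7,968 human samples):

```python
# Sketch of the FPR computation behind the claim: among human-written
# samples, count those a detector scores above the decision threshold.
# Scores below are illustrative, not from the study.

def false_positive_rate(scores, threshold=0.5):
    flagged = sum(1 for s in scores if s >= threshold)
    return flagged / len(scores)

human_scores = [0.02, 0.10, 0.47, 0.31, 0.05]  # detector scores on human text
print(false_positive_rate(human_scores, threshold=0.5))  # 0.0 here
```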
@brian_jabarian
Brian Jabarian
2 months
AI-generated text is everywhere, making it hard for orgs to assess human performance. Can we detect it while minimizing false accusations? Yes! With @alexolegimas, we audit detectors and show incredible accuracy: ~0 (!!) false positives & negatives; and we offer a policy framework for evaluating trade-offs. 🧵
20
78
330
@yoonjoo_le2
Yoonjoo Lee @ COLM 2025
2 months
🎓I officially defended my PhD! Huge thanks to my amazing advisor @juhokim and committee @eytanadar @aliceoh @tongshuangwu @seo_minjoon. This fall, I'm excited to join @UMichCSE as a postdoc with @QVeraLiao to continue my research in human-centered AI and cognitive alignment!💙
48
11
325
@tae_skim
Tae Soo Kim
3 months
You ask ChatGPT to write a message for: 💼 Credit-stealing colleague → “Building on *my* idea…” 🏠 Messy roommate → “babe wake up new mold just dropped” Same you. Same task. Different context. Can LLMs learn this? 🤔 We built CUPID 🏹 to find out. 🔗 https://t.co/JrAdy2SbSV
4
23
68
@allen_ai
Ai2
3 months
With fresh support of $75M from @NSF and $77M from @NVIDIA, we’re set to scale our open model ecosystem, bolster the infrastructure behind it, and fast‑track reproducible AI research to unlock the next wave of scientific discovery. 💡
36
79
751