Arkadiy Saakyan
@rkdsaakyan
Followers: 174 · Following: 699 · Media: 19 · Statuses: 63
PhD student @ColumbiaCompSci @columbianlp working on human-AI collaboration, AI creativity and explainability. prev. intern @GoogleDeepMind, @AmazonScience
Manhattan, NY
Joined September 2021
N-gram novelty is widely used as a measure of creativity and generalization. But if LLMs produce highly n-gram novel expressions that don’t make sense or sound awkward, should they still be called creative? In a new paper, we investigate how n-gram novelty relates to creativity.
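N-gram novelty is commonly operationalized as the fraction of a text's n-grams that do not appear in a reference corpus. A minimal sketch of that idea (the paper's exact metric may differ; the corpus and texts below are illustrative):

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_novelty(text, corpus, n=4):
    """Fraction of the text's n-grams absent from the reference corpus.

    A common operationalization of n-gram novelty; the paper's
    exact definition may differ.
    """
    corpus_ngrams = set(ngrams(corpus.split(), n))
    text_ngrams = ngrams(text.split(), n)
    if not text_ngrams:
        return 0.0
    novel = sum(1 for g in text_ngrams if g not in corpus_ngrams)
    return novel / len(text_ngrams)

corpus = "the cat sat on the mat and the dog slept on the rug"
print(ngram_novelty("the cat slept on the rug", corpus, n=3))  # 0.5
```

Note that a high score only says the n-grams are unseen; it says nothing about whether the expression makes sense, which is exactly the gap the thread below examines.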
I am recruiting 2 PhD students to work on LM interpretability at UMD @umdcs starting in fall 2026! We are #3 in AI and #4 in NLP research on @CSrankings. Come join us in our lovely building just a few miles from Washington, D.C. Details in 🧵
See more details in the paper! Paper link: https://t.co/OA0f43WIBv GitHub link: github.com/asaakyan/ngram-creativity (repository for the paper "Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity")
On the OOD dataset StyleMirror, we find that LLM-judge novelty scores are associated with expert preferences to a larger extent than a previously proposed n-gram novelty metric, the Creativity Index, suggesting our operationalization yields a metric better aligned with textual creativity.
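"Associated to a larger extent" can be checked with a rank correlation between each metric and expert preferences. A self-contained sketch with a tie-free Spearman implementation (all scores below are made-up illustration, not the paper's data):

```python
def rank(xs):
    """1-based ranks of values, assuming no ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def spearman(a, b):
    """Spearman rank correlation for tie-free lists."""
    ra, rb = rank(a), rank(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Illustrative per-passage scores only.
expert_pref = [5, 3, 4, 1, 2]                # expert preference ratings
llm_judge = [4.8, 3.1, 4.2, 1.5, 2.0]        # LLM-judge novelty scores
creativity_ix = [0.9, 0.8, 0.2, 0.7, 0.1]    # n-gram-based Creativity Index

print(spearman(expert_pref, llm_judge))      # 1.0
print(spearman(expert_pref, creativity_ix))  # 0.5
```

The higher rank correlation for the LLM-judge scores in this toy example mirrors the kind of comparison the tweet describes.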
Writing-quality reward model scores are associated with both creativity and pragmaticality judgements, but are not interpretable. LLM judges can replicate some expert novelty judgements but struggle to identify non-pragmatic expressions.
In a follow-up study with GPT-5 and Claude, we observe that the rate of human-judged creative expression in AI-written text is significantly lower than in human-written text.
Further, we find that both open-source model families tested (OLMo 1 and 2, at 7B and 32B sizes) exhibit a negative relationship between n-gram novelty and pragmaticality: as open-source LLMs try to generate text not present in their training data, their expressions tend to make less sense in context.
N-gram novelty is not a reliable metric of creativity: over *90%* of top-quartile n-gram novelty expressions were not judged as creative. We find many examples of low n-gram novelty expressions rated creative and high n-gram novelty expressions rated as non-pragmatic.
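The top-quartile check above can be sketched as follows: take the 75th-percentile novelty cutoff, keep the expressions at or above it, and compute what fraction experts did not judge creative. Scores and labels here are made up for illustration, not the paper's data:

```python
import statistics

# Illustrative per-expression novelty scores and expert creativity labels.
novelty = [0.1, 0.9, 0.8, 0.2, 0.95, 0.3, 0.85, 0.05]
creative = [True, False, False, True, True, False, False, False]

q3 = statistics.quantiles(novelty, n=4)[2]   # 75th-percentile cutoff
top = [c for s, c in zip(novelty, creative) if s >= q3]
not_creative_rate = 1 - sum(top) / len(top)
print(f"top-quartile expressions not judged creative: {not_creative_rate:.0%}")
```

In the paper's data this rate exceeds 90%; the toy numbers here just show the mechanics of the computation.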
We recruited expert writers with MFA/MA/PhD backgrounds. They rated expressions in human- and AI-generated passages (the latter from fully open-source (code + data) OLMo models @allenai) for whether they make sense, are pragmatic, and are novel; they could also highlight any creative expressions.
The standard definition of creativity states that the product has to be both novel AND appropriate. Similarly, we operationalize textual creativity as human-judged expression novelty AND sensicality (making sense by itself) plus pragmaticality (making sense in context).
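This operationalization is a simple conjunction over expert judgments. A minimal sketch (the class and field names are illustrative, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class ExpressionJudgment:
    novel: bool      # judged novel by an expert
    sensical: bool   # makes sense by itself
    pragmatic: bool  # makes sense in context

def is_creative(j: ExpressionJudgment) -> bool:
    """Creative = novel AND appropriate (sensical + pragmatic)."""
    return j.novel and j.sensical and j.pragmatic

# A highly novel but non-pragmatic expression does not count as creative.
print(is_creative(ExpressionJudgment(novel=True, sensical=True, pragmatic=True)))   # True
print(is_creative(ExpressionJudgment(novel=True, sensical=True, pragmatic=False)))  # False
```

The second call is exactly the failure mode the thread highlights: high n-gram novelty alone cannot satisfy the appropriateness half of the definition.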
AI is already at work in American newsrooms. We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea. Here's what we learned about how AI is influencing local and national journalism:
🚨New paper on AI and copyright Several authors have sued LLM companies for allegedly using their books without permission for model training. 👩⚖️Courts, however, require empirical evidence of harm (e.g., market dilution). Our new pre-registered study addresses exactly this
Frontier model still worse than text-davinci-001. Who would have thought?
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at the University of Maryland @umdcs this August. I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)
🤔 What if you gave an LLM thousands of random human-written paragraphs and told it to write something new -- while copying 90% of its output from those texts? 🧟 You get what we call a Frankentext! 💡 Frankentexts are surprisingly coherent and tough for AI detectors to flag.
What does it mean for #LLM output to be novel? In work w/ @jcyhc_ai, @JanePan_, @valeriechen_, @hhexiy we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵
See more experiments and details in our paper: https://t.co/oWDjfHmdp3 And come see our poster at NAACL :) Joint work by Shreyas Kulkarni, @TuhinChakr, @SmaraMuresanNLP
arxiv.org: "Large Vision-Language Models (VLMs) have demonstrated strong capabilities in tasks requiring a fine-grained understanding of literal meaning in images and text, such as visual question-answering..."
Even powerful models achieve only a 50% explanation adequacy rate, suggesting difficulty reasoning about figurative inputs. Hallucination and unsound reasoning are the most prominent error categories.
We find that: 1. VLMs struggle to generalize from literal to figurative meaning understanding (training on e-ViL achieves only random F1 on our task) 2. Figurative meaning in the image is harder to explain than figurative meaning in the text 3. VLMs benefit from image data in fine-tuning