Kaiser Sun

@KaiserWhoLearns

1K Followers · 2K Following · 40 Media · 351 Statuses

Ph.D. student at @jhuclsp, human LM that hallucinates. Formerly @MetaAI, @uwnlp, and @AWS. they/them 🏳️‍🌈 #NLProc

My fantasea
Joined May 2021
@KaiserWhoLearns
Kaiser Sun
3 months
What happens when an LLM is asked to use information that contradicts its knowledge? We explore knowledge conflict in a new preprint 📑 TL;DR: Performance drops, and this could affect the overall performance of LLMs in model-based evaluation. 🧵⬇️ 1/8 #NLProc #LLM #AIResearch
@KaiserWhoLearns
Kaiser Sun
20 days
RT @JentseHuang: Think about a task like “do these two images show the same object?” Humans either nail it (≈100%) or, if they don’t unders…
@KaiserWhoLearns
Kaiser Sun
1 month
RT @zhang_yian: We want to set a SUPER high bar for OAI's open-source release 😉.
@KaiserWhoLearns
Kaiser Sun
2 months
Tokenization was most likely the reason whenever I had a bug in my model 🫠.
@_albertgu
Albert Gu
2 months
I converted one of my favorite talks I've given over the past year into a blog post: "On the Tradeoffs of SSMs and Transformers" (or: tokens are bullshit). In a few days, we'll release what I believe is the next major advance for architectures.
@KaiserWhoLearns
Kaiser Sun
2 months
RT @BafnaNiyati: 📢 When LLMs solve tasks with a mid-to-low resource input/target language, their output quality is poor. We know that. But c…
@KaiserWhoLearns
Kaiser Sun
2 months
RT @ChengleiSi: Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research…
@KaiserWhoLearns
Kaiser Sun
2 months
RT @nouhadziri: 📢 Can LLMs really reason outside the box in math? Or are they just remixing familiar strategies? Remember DeepSeek R1, o1…
@KaiserWhoLearns
Kaiser Sun
3 months
RT @chrome1996: Have you noticed… 🔍 Aligned LLM generations feel less diverse? 🎯 Base models are decoding-sensitive? 🤔 Generations get more…
@KaiserWhoLearns
Kaiser Sun
3 months
RT @mdredze: Our new paper explores knowledge conflict in LLMs. It also issues a word of warning to those using LLMs as a Judge: the model…
@KaiserWhoLearns
Kaiser Sun
3 months
🛠️ Interested in how your LLM behaves in this situation? We released the code to generate the diagnostic data for your own LLM. @mdredze @loadingfan 8/8
@KaiserWhoLearns
Kaiser Sun
3 months
🔗 Takeaways for practitioners:
1. Check for knowledge conflict before prompting.
2. Add further explanation to guide the model in following the context.
3. Monitor hallucinations even when context is supplied.
7/8
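The first takeaway (checking for knowledge conflict before prompting) can be sketched as a simple pre-check: ask the model the question closed-book, then with the context, and flag disagreement. A minimal sketch, assuming a hypothetical `ask_model` stand-in; the hard-coded fact is purely illustrative, not the paper's released code.

```python
# Hypothetical pre-check for knowledge conflict before prompting.
# `ask_model` is a hard-coded stand-in for a real LLM call; a real
# implementation would query the same model closed-book and in-context.

def ask_model(question: str, context: str = "") -> str:
    """Stand-in LLM: answers from parametric 'memory' unless the
    provided context pushes a different answer."""
    memory = {"What is the capital of Australia?": "Canberra"}
    if "Sydney" in context:
        return "Sydney"  # the model follows the (incorrect) context
    return memory.get(question, "unknown")

def has_knowledge_conflict(question: str, context: str) -> bool:
    """Flag prompts where the context-based answer disagrees with the
    model's closed-book (parametric) answer."""
    closed_book = ask_model(question)
    in_context = ask_model(question, context=context)
    return closed_book != in_context

print(has_knowledge_conflict(
    "What is the capital of Australia?",
    "The capital of Australia is Sydney.",
))  # True: the context contradicts the model's memory
```

If the check fires, a practitioner could follow the second takeaway and add an explicit rationale to the prompt, or route the example for human review.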
@KaiserWhoLearns
Kaiser Sun
3 months
📏 Implications:
⚡ When using an LLM as a judge, its parametric knowledge could lead to incorrect judgment :(
⚡ Retrieval systems need mechanisms to detect and resolve contradictions, not just shove text into the prompt.
6/8
@KaiserWhoLearns
Kaiser Sun
3 months
🧠 Key finding #3: “Just give them more explanation?” Providing rationales helps—it pushes models to lean more on the context—but it still can’t fully silence the stubborn parametric knowledge. 5/8
@KaiserWhoLearns
Kaiser Sun
3 months
⚖️ Key finding #2: Unsurprisingly, LLMs prefer their own memories. Even when we explicitly instruct them to rely on the provided document, traces of the “wrong” internal belief keep leaking into answers. 4/8
@KaiserWhoLearns
Kaiser Sun
3 months
⚠️ Key finding #1: If the task doesn’t require external knowledge (e.g., pure copy), conflict barely matters. However, as soon as knowledge is needed, accuracy tanks when context and memory disagree. 3/8
@KaiserWhoLearns
Kaiser Sun
3 months
🛠️ We create diagnostic data that…
- Agrees with or contradicts the model’s knowledge
- Contradicts at different levels of plausibility
- Covers tasks requiring different levels of knowledge
2/8
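The bullets above can be illustrated with a toy generator: for a fact the model is assumed to know, build one context that agrees with it and two that contradict it at different plausibility levels. The template and the Marie Curie fact are invented for illustration; this is not the released diagnostic-data code.

```python
# Toy sketch of the diagnostic-data idea: contexts that agree with or
# contradict an assumed model belief, at varying plausibility levels.
# Template and facts are invented for illustration only.

def make_contexts(subject: str, relation: str, true_obj: str,
                  plausible_alt: str, implausible_alt: str) -> dict:
    template = "{s} {r} {o}."
    return {
        "agree": template.format(s=subject, r=relation, o=true_obj),
        "plausible_conflict": template.format(s=subject, r=relation, o=plausible_alt),
        "implausible_conflict": template.format(s=subject, r=relation, o=implausible_alt),
    }

contexts = make_contexts(
    "Marie Curie", "was born in",
    true_obj="Warsaw",           # matches the model's assumed knowledge
    plausible_alt="Paris",       # believable contradiction
    implausible_alt="Atlantis",  # obviously false contradiction
)
print(contexts["plausible_conflict"])  # Marie Curie was born in Paris.
```

Pairing each context with tasks that need more or less external knowledge (copying vs. answering) then covers the third bullet.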
@KaiserWhoLearns
Kaiser Sun
3 months
RT @BafnaNiyati: We know speech LID systems flunk on accented speech. But why? And what to do about it? 🤔 Our work (I…
@KaiserWhoLearns
Kaiser Sun
3 months
RT @tpimentelms: A string may get 17 times less probability if tokenised as two symbols (e.g., ⟨he, llo⟩) than as one (e.g., ⟨hello⟩)—by an…
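The RT above rests on a simple mechanism that a tiny numeric example clarifies: a language model scores a string as the product of its per-token probabilities, so the same string can receive very different probability under different tokenisations. The probabilities below are invented for illustration.

```python
# Toy illustration: one string, two tokenisations, different probability.
# Unigram token probabilities are invented for illustration only.

token_prob = {"hello": 1e-4, "he": 1e-3, "llo": 5e-3}

p_single = token_prob["hello"]                  # tokenised as ⟨hello⟩
p_split = token_prob["he"] * token_prob["llo"]  # tokenised as ⟨he, llo⟩

ratio = p_single / p_split
print(f"single-token tokenisation is {ratio:.0f}x more probable")
```

With these made-up numbers the one-token path wins by a factor of 20; the RT reports a real model showing a comparable-scale gap.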
@KaiserWhoLearns
Kaiser Sun
3 months
RT @alex_gill_nlp: 𝐖𝐡𝐚𝐭 𝐇𝐚𝐬 𝐁𝐞𝐞𝐧 𝐋𝐨𝐬𝐭 𝐖𝐢𝐭𝐡 𝐒𝐲𝐧𝐭𝐡𝐞𝐭𝐢𝐜 𝐄𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧? I'm happy to announce that the preprint release of my first project is on…