Kaiser Sun

@KaiserWhoLearns

1K Followers · 2K Following · 40 Media · 351 Statuses

Ph.D. student at @jhuclsp, human LM that hallucinates. Formerly @MetaAI, @uwnlp, and @AWS. they/them 🏳️‍🌈 #NLProc

My fantasea
Joined May 2021
@KaiserWhoLearns
Kaiser Sun
3 months
What happens when an LLM is asked to use information that contradicts its knowledge? We explore knowledge conflict in a new preprint 📑 TL;DR: Performance drops, and this could affect the overall performance of LLMs in model-based evaluation. 🧵⬇️ 1/8 #NLProc #LLM #AIResearch
@KaiserWhoLearns
Kaiser Sun
20 days
RT @JentseHuang: Think about a task like “do these two images show the same object?” Humans either nail it (≈100%) or, if they don’t unders…
@KaiserWhoLearns
Kaiser Sun
1 month
RT @zhang_yian: We want to set a SUPER high bar for OAI's open-source release 😉.
@KaiserWhoLearns
Kaiser Sun
2 months
Tokenization was most likely the reason whenever I had a bug in my model 🫠.
@_albertgu
Albert Gu
2 months
I converted one of my favorite talks I've given over the past year into a blog post: "On the Tradeoffs of SSMs and Transformers" (or: tokens are bullshit). In a few days, we'll release what I believe is the next major advance for architectures.
@KaiserWhoLearns
Kaiser Sun
2 months
RT @BafnaNiyati: 📢 When LLMs solve tasks with a mid-to-low resource input/target language, their output quality is poor. We know that. But c…
@KaiserWhoLearns
Kaiser Sun
2 months
RT @ChengleiSi: Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research…
@KaiserWhoLearns
Kaiser Sun
2 months
RT @nouhadziri: 📢 Can LLMs really reason outside the box in math? Or are they just remixing familiar strategies? Remember DeepSeek R1, o1…
@KaiserWhoLearns
Kaiser Sun
3 months
RT @chrome1996: Have you noticed… 🔍 Aligned LLM generations feel less diverse? 🎯 Base models are decoding-sensitive? 🤔 Generations get more…
@KaiserWhoLearns
Kaiser Sun
3 months
RT @mdredze: Our new paper explores knowledge conflict in LLMs. It also issues a word of warning to those using LLMs as a Judge: the model…
@KaiserWhoLearns
Kaiser Sun
3 months
🛠️ Interested in how your LLM behaves in this situation? We released the code to generate the diagnostic data for your own LLM. @mdredze @loadingfan 8/8
@KaiserWhoLearns
Kaiser Sun
3 months
🔗 Takeaways for practitioners:
1. Check for knowledge conflict before prompting.
2. Add further explanation to guide the model in following the context.
3. Monitor hallucinations even when context is supplied.
7/8
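The first takeaway (checking for knowledge conflict before prompting) can be sketched as a simple pre-check: ask the model the question closed-book, then with the context, and flag disagreement. A minimal sketch, assuming a hypothetical `ask_model` stand-in; the hard-coded fact is purely illustrative, not the paper's released code.

```python
# Hypothetical pre-check for knowledge conflict before prompting.
# `ask_model` is a hard-coded stand-in for a real LLM call; a real
# implementation would query the same model closed-book and in-context.

def ask_model(question: str, context: str = "") -> str:
    """Stand-in LLM: answers from parametric 'memory' unless the
    provided context pushes a different answer."""
    memory = {"What is the capital of Australia?": "Canberra"}
    if "Sydney" in context:
        return "Sydney"  # the model follows the (incorrect) context
    return memory.get(question, "unknown")

def has_knowledge_conflict(question: str, context: str) -> bool:
    """Flag prompts where the context-based answer disagrees with the
    model's closed-book (parametric) answer."""
    closed_book = ask_model(question)
    in_context = ask_model(question, context=context)
    return closed_book != in_context

print(has_knowledge_conflict(
    "What is the capital of Australia?",
    "The capital of Australia is Sydney.",
))  # True: the context contradicts the model's memory
```

If the check fires, a practitioner could follow the second takeaway and add an explicit rationale to the prompt, or route the example for human review.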
@KaiserWhoLearns
Kaiser Sun
3 months
📏 Implications:
⚡ When using an LLM as a judge, its parametric knowledge could lead to incorrect judgment :(
⚡ Retrieval systems need mechanisms to detect and resolve contradictions, not just shove text into the prompt.
6/8
@KaiserWhoLearns
Kaiser Sun
3 months
🧠 Key finding #3: “Just give them more explanation?” Providing rationales helps—it pushes models to lean more on the context—but it still can’t fully silence the stubborn parametric knowledge. 5/8
@KaiserWhoLearns
Kaiser Sun
3 months
⚖️ Key finding #2: Unsurprisingly, LLMs prefer their own memories. Even when we explicitly instruct them to rely on the provided document, traces of the “wrong” internal belief keep leaking into answers. 4/8
@KaiserWhoLearns
Kaiser Sun
3 months
⚠️ Key finding #1: If the task doesn’t require external knowledge (e.g., pure copy), conflict barely matters. However, as soon as knowledge is needed, accuracy tanks when context and memory disagree. 3/8
@KaiserWhoLearns
Kaiser Sun
3 months
🛠️ We create diagnostic data that…
- Agrees with or contradicts the model’s knowledge
- Contradicts at different levels of plausibility
- Covers tasks requiring different levels of knowledge
2/8
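The bullets above can be illustrated with a toy generator: for a fact the model is assumed to know, build one context that agrees with it and two that contradict it at different plausibility levels. The template and the Marie Curie fact are invented for illustration; this is not the released diagnostic-data code.

```python
# Toy sketch of the diagnostic-data idea: contexts that agree with or
# contradict an assumed model belief, at varying plausibility levels.
# Template and facts are invented for illustration only.

def make_contexts(subject: str, relation: str, true_obj: str,
                  plausible_alt: str, implausible_alt: str) -> dict:
    template = "{s} {r} {o}."
    return {
        "agree": template.format(s=subject, r=relation, o=true_obj),
        "plausible_conflict": template.format(s=subject, r=relation, o=plausible_alt),
        "implausible_conflict": template.format(s=subject, r=relation, o=implausible_alt),
    }

contexts = make_contexts(
    "Marie Curie", "was born in",
    true_obj="Warsaw",           # matches the model's assumed knowledge
    plausible_alt="Paris",       # believable contradiction
    implausible_alt="Atlantis",  # obviously false contradiction
)
print(contexts["plausible_conflict"])  # Marie Curie was born in Paris.
```

Pairing each context with tasks that need more or less external knowledge (copying vs. answering) then covers the third bullet.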
@KaiserWhoLearns
Kaiser Sun
3 months
RT @BafnaNiyati: We know speech LID systems flunk on accented speech. But why? And what to do about it? 🤔 Our work (I…
@KaiserWhoLearns
Kaiser Sun
3 months
RT @tpimentelms: A string may get 17 times less probability if tokenised as two symbols (e.g., ⟨he, llo⟩) than as one (e.g., ⟨hello⟩)—by an…
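The RT above rests on a simple mechanism that a tiny numeric example clarifies: a language model scores a string as the product of its per-token probabilities, so the same string can receive very different probability under different tokenisations. The probabilities below are invented for illustration.

```python
# Toy illustration: one string, two tokenisations, different probability.
# Unigram token probabilities are invented for illustration only.

token_prob = {"hello": 1e-4, "he": 1e-3, "llo": 5e-3}

p_single = token_prob["hello"]                  # tokenised as ⟨hello⟩
p_split = token_prob["he"] * token_prob["llo"]  # tokenised as ⟨he, llo⟩

ratio = p_single / p_split
print(f"single-token tokenisation is {ratio:.0f}x more probable")
```

With these made-up numbers the one-token path wins by a factor of 20; the RT reports a real model showing a comparable-scale gap.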
@KaiserWhoLearns
Kaiser Sun
3 months
RT @alex_gill_nlp: 𝐖𝐡𝐚𝐭 𝐇𝐚𝐬 𝐁𝐞𝐞𝐧 𝐋𝐨𝐬𝐭 𝐖𝐢𝐭𝐡 𝐒𝐲𝐧𝐭𝐡𝐞𝐭𝐢𝐜 𝐄𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧? I'm happy to announce that the preprint release of my first project is on…