Eva Spiliopoulou
@EvaSpiliop
Followers: 378 · Following: 173 · Media: 3 · Statuses: 48
Applied Scientist in #NLProc @Amazon · finished PhD @LTIatCMU
Seattle, WA
Joined June 2018
LLMs: great at judging… until it's their own homework. So we built the math to call them out. To learn more, check out our new paper: "Play Favorites: A statistical method to quantify self-bias in LLM-as-a-judge". Paper:
2
1
14
Thanks to our co-authors Riccardo Fogliato, H. Burnsky, T. Soliman, J. Ma, G. Horwood and @migballesteros! Also thanks to @awscloud Bedrock for supporting our work! Paper: https://t.co/fWxcEhMsbE Code & data:
0
0
0
Self-bias is not a "fixed" quantity: it varies with the dimension and the dataset (although some overall trends can be observed)
0
0
0
Family-bias accounts for a large part of LLMs' self-bias. Negative self-bias is also possible: some LLMs are more "critical" of themselves!
1
0
0
In our paper, we also:
✅ Conduct an empirical study with 5k+ prompts & 9 LLM judges
✅ Release human annotations to support future research
✅ Find systematic self-bias (+ family-bias) in GPT-4o & Claude 3.5 Sonnet
2
0
0
Our framework:
✅ Explicitly models conditions under which self-bias can be detected
✅ Separates true quality differences from self-bias
✅ Accounts for consistent annotator differences
1
0
0
We introduce a statistical framework that isolates and quantifies self-bias in LLM-as-a-judge, while separating genuine quality differences (via independent human judges) from bias. Our study also reveals a strong family-bias problem: LLMs favoring models from their own family.
1
0
1
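For intuition, here is a minimal sketch (not the paper's actual estimator) of one way to quantify self-bias: regress judge scores on human quality scores, judge identity, and a flag marking cases where a judge scores its own output; the flag's coefficient is the self-bias estimate. The column names and data file are hypothetical.

```python
# Minimal sketch (not the paper's estimator) of quantifying self-bias.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical columns: judge, author, judge_score, human_score
df = pd.read_csv("judgments.csv")
df["is_self"] = (df["judge"] == df["author"]).astype(int)

# human_score proxies true quality; C(judge) absorbs each judge's overall
# leniency/harshness; is_self captures the extra credit a judge gives itself.
model = smf.ols("judge_score ~ human_score + C(judge) + is_self", data=df).fit()
print(model.params["is_self"], model.conf_int().loc["is_self"])
```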
Thrilled to share that our Byte Latent Transformer won an Outstanding Paper Award at ACL 2025!
Introducing the Byte Latent Transformer (BLT): an LLM architecture that scales better than Llama 3 using byte patches instead of tokens. Paper: https://t.co/5QGrlJdK0y Code: https://t.co/jCdDI5BXwe
16
31
283
Introducing the Byte Latent Transformer (BLT): an LLM architecture that scales better than Llama 3 using byte patches instead of tokens. Paper: https://t.co/5QGrlJdK0y Code: https://t.co/jCdDI5BXwe
17
145
727
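A toy sketch of the byte-patch idea (not the BLT implementation, which learns entropy-based patch boundaries): operate on raw UTF-8 bytes instead of tokens, group them into variable-length patches, and pool byte embeddings into one vector per patch before feeding a latent transformer. The whitespace/length patching rule below is a hypothetical stand-in.

```python
# Toy illustration of byte patching; the boundary rule is a placeholder.
import torch
import torch.nn as nn

def patch_boundaries(data: bytes, max_patch: int = 8) -> list[tuple[int, int]]:
    """Return (start, end) spans over the byte sequence."""
    spans, start = [], 0
    for i, b in enumerate(data):
        if b in b" \n\t" or i - start + 1 >= max_patch:
            spans.append((start, i + 1))
            start = i + 1
    if start < len(data):
        spans.append((start, len(data)))
    return spans

byte_embed = nn.Embedding(256, 64)           # one embedding per possible byte value
data = "Byte patches instead of tokens".encode("utf-8")
ids = torch.tensor(list(data))
patches = [byte_embed(ids[s:e]).mean(dim=0)   # mean-pool bytes within each patch
           for s, e in patch_boundaries(data)]
patch_seq = torch.stack(patches)              # (num_patches, 64): input to a latent transformer
print(patch_seq.shape)
```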
We will present QLoRA at NeurIPS! Come to our oral on Tuesday where @Tim_Dettmers will be giving a talk. If you have questions stop by our poster session!
4-bit QLoRA is here to equalize the playing field for LLM exploration. You can now fine-tune a state-of-the-art 65B chatbot on one GPU in 24h. Paper: https://t.co/7gX1oIUHEx Code and Demo:
7
37
314
Groundbreaking research! We need competitive, open-source models that one can fine-tune with limited resources!
4-bit QLoRA is here to equalize the playing field for LLM exploration. You can now fine-tune a state-of-the-art 65B chatbot on one GPU in 24h. Paper: https://t.co/7gX1oIUHEx Code and Demo:
0
0
2
4-bit QLoRA is here to equalize the playing field for LLM exploration. You can now fine-tune a state-of-the-art 65B chatbot on one GPU in 24h. Paper: https://t.co/7gX1oIUHEx Code and Demo:
github.com
QLoRA: Efficient Finetuning of Quantized LLMs (artidoro/qlora)
QLoRA: 4-bit finetuning of LLMs is here! With it comes Guanaco, a chatbot on a single GPU, achieving 99% ChatGPT performance on the Vicuna benchmark: Paper: https://t.co/J3Xy195kDD Code+Demo: https://t.co/SP2FsdXAn5 Samples: https://t.co/q2Nd9cxSrt Colab: https://t.co/Q49m0IlJHD
6
47
223
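A minimal sketch of QLoRA-style finetuning with the Hugging Face stack (transformers, peft, bitsandbytes): the base model is loaded frozen in 4-bit NF4 and only small LoRA adapters are trained. The model name and hyperparameters below are illustrative, not the paper's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "huggyllama/llama-7b"  # any causal LM; the paper scales this to 65B

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat quantization
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attach adapters to attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only the LoRA weights are trainable
```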
A NYT article on the debate around whether LLM base models should be closed or open. Meta argues for openness, starting with the release of LLaMA (for non-commercial use), while OpenAI and Google want to keep things closed and proprietary. They argue that openness can be…
182
480
2K
We are excited to announce QLoRA, a new method for LLM fine-tuning that uses only a fraction of the memory footprint of standard fine-tuning. Please consider joining our private beta to gain early access to QLoRA! Stay tuned for the paper and code release, coming soon.
The 4-bit bitsandbytes private beta is here! Our method, QLoRA, is integrated with the HF stack and supports all models. You can finetune a 65B model on a single 48 GB GPU. This beta will help us catch bugs and issues before our full release. Sign up: https://t.co/XBAQv76laa
0
19
123
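A back-of-envelope check (rough assumptions, not figures from the paper) of why a 65B model can fit on a single 48 GB GPU with QLoRA: the frozen base weights take roughly half a byte per parameter in 4-bit, and only the small LoRA adapters carry gradients and optimizer state. The adapter size used here is a guess.

```python
# Rough memory estimate for 4-bit QLoRA finetuning of a 65B model.
params = 65e9
base_4bit_gb = params * 0.5 / 1e9              # ~32.5 GB for 4-bit frozen weights
adapter_params = 0.4e9                         # assumed LoRA adapter size
adapter_gb = adapter_params * 2 / 1e9          # bf16 adapter weights
optimizer_gb = adapter_params * 8 / 1e9        # Adam states for adapters only
print(base_4bit_gb + adapter_gb + optimizer_gb)  # ~36.5 GB, leaving headroom for activations
```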
ICYMI since you have a social life and aren't perpetually online. I wrote for @Forbes on how @RepGallagher could lead the new Select Committee focused on CCP to succeed by addressing critical issues that are sometimes missing from the conversation. https://t.co/uql4kNAALM
1
4
9
Join our oral presentation @emnlpmeeting in the Commonsense Reasoning track, at 10am Sunday 12/11, Hall B. Our paper is available here:
2
0
3
The legendary fights of Physical Interaction vs Language-only models, their epic journeys through the valleys of Artificial Environments and Naturally Occurring text, and their challenges on in-domain & out-of-domain attributes ONLY in EvEntS ReaLM! @ArtidoroPagnoni @ybisk @ehovy
1
7
22
Very excited to present our work "Threat Scenarios and Best Practices for Neural Fake News Detection" at COLING 2022! https://t.co/ObCpMcDdQw with Yulia Tsvetkov and Martin Graciarena
aclanthology.org
Artidoro Pagnoni, Martin Graciarena, Yulia Tsvetkov. Proceedings of the 29th International Conference on Computational Linguistics. 2022.
2
6
35
1/n New Preprint: "Counterfactual Phenotyping with Censored Time-to-Events" w/ @MononitoGoswami @AutonLab and @DrDufendach @UPMC
https://t.co/88eqEJcJA3
#MachineLearning #EpiTwitter #MedTwitter #causalinference #DataScience
arxiv.org
Estimation of treatment efficacy of real-world clinical interventions involves working with continuous outcomes such as time-to-death, re-hospitalization, or a composite event that may be subject...
2
13
29
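As a baseline illustration of working with censored time-to-event outcomes (not the paper's counterfactual phenotyping method), here is a Cox proportional-hazards fit with a treatment indicator, assuming the lifelines package; the toy data and column names are hypothetical.

```python
# Baseline: Cox PH model on censored survival data with a treatment covariate.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "time":      [5.0, 12.0, 3.5, 20.0, 7.5, 15.0],  # time to event or censoring
    "event":     [1,   0,    1,   0,    1,   1],      # 1 = event observed, 0 = censored
    "treatment": [0,   1,    0,   1,    1,   0],
    "age":       [64,  58,   71,  49,   66,  60],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
print(cph.summary[["coef", "exp(coef)", "p"]])  # hazard ratio for treatment vs. control
```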