Eva Spiliopoulou Profile
Eva Spiliopoulou

@EvaSpiliop

Followers: 378 · Following: 173 · Media: 3 · Statuses: 48

Applied Scientist in #NLProc @Amazon. Finished PhD @LTIatCMU.

Seattle, WA
Joined June 2018
@EvaSpiliop
Eva Spiliopoulou
3 months
LLMs: great at judging… until it's their own homework. πŸ“šπŸ”₯ So we built the math to call them out πŸ€·β€β™€οΈ To learn more, check out our new paper, "Play Favorites: A statistical method to quantify self-bias in LLM-as-a-judge" 🎭 πŸ“„ Paper:
2 replies · 1 repost · 14 likes
@EvaSpiliop
Eva Spiliopoulou
3 months
Thanks to our co-authors Riccardo Fogliato, H. Burnsky, T. Soliman, J. Ma, G. Horwood and @migballesteros! Also thanks to @awscloud Bedrock for supporting our work! πŸ“„ Paper: https://t.co/fWxcEhMsbE πŸ’» Code & data:
0 replies · 0 reposts · 0 likes
@EvaSpiliop
Eva Spiliopoulou
3 months
Self-bias is not a "fixed" quantity; it varies with the evaluation dimension and dataset (although some overall trends can be observed).
0 replies · 0 reposts · 0 likes
@EvaSpiliop
Eva Spiliopoulou
3 months
Family-bias accounts for a large part of LLMs' self-bias. Negative self-bias is also possible: some LLMs are more "critical" of themselves!
1 reply · 0 reposts · 0 likes
@EvaSpiliop
Eva Spiliopoulou
3 months
πŸ“„ Paper: https://t.co/fWxcEhMsbE πŸ’» Code & data:
0 replies · 0 reposts · 0 likes
@EvaSpiliop
Eva Spiliopoulou
3 months
In our paper, we also:
βœ… Conduct an empirical study with 5k+ prompts & 9 LLM judges
βœ… Release human annotations to support future research
βœ… Find systematic self-bias (+ family-bias) in GPT-4o & Claude 3.5 Sonnet
(A sketch of this judging setup follows below.)
2 replies · 0 reposts · 0 likes
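Concretely, a study like this reduces to a judge × model × prompt table of scores carrying a self-judging flag. Here is a minimal sketch of that collection loop; the schema and the judge_score stub are hypothetical illustrations, not from the paper:

```python
# Illustrative judging loop for a self-bias study.
# Schema and the judge_score stub are hypothetical, not from the paper.
from itertools import product

judges = ["judge_a", "judge_b"]             # LLMs acting as evaluators
models = ["judge_a", "judge_b", "model_c"]  # LLMs whose outputs get scored
prompts = ["p1", "p2"]

def judge_score(judge: str, model: str, prompt: str) -> float:
    """Stub for an API call asking `judge` to rate `model`'s answer."""
    return 3.0  # replace with a real model call

records = [
    {
        "judge": j,
        "model": m,
        "prompt": p,
        "is_self": j == m,  # the flag the self-bias analysis keys on
        "score": judge_score(j, m, p),
    }
    for j, m, p in product(judges, models, prompts)
]
print(len(records))  # |judges| * |models| * |prompts| = 12 rows
```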
@EvaSpiliop
Eva Spiliopoulou
3 months
Our framework:
βœ… Explicitly models conditions under which self-bias can be detected
βœ… Separates true quality differences from self-bias
βœ… Accounts for consistent annotator differences
1 reply · 0 reposts · 0 likes
@EvaSpiliop
Eva Spiliopoulou
3 months
We introduce a statistical framework that isolates and quantifies self-bias in LLM-as-a-judge, while separating genuine quality differences (via independent human judges) from bias. Our study also reveals a strong family-bias problem β€” LLMs favoring models from their own family.
1 reply · 0 reposts · 1 like
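One way to picture what the framework separates: express each judge score relative to an independent human rating (so true item quality cancels), then attribute what remains to a per-judge leniency term plus a self-judging indicator whose coefficient is the self-bias. The toy simulation below illustrates that decomposition; it is a simplified stand-in, not the paper's actual estimator:

```python
# Toy simulation of separating self-bias from item quality and judge
# leniency. A simplified stand-in for illustration, NOT the paper's method.
import numpy as np

rng = np.random.default_rng(0)
n_judges, n_items = 3, 600

quality = rng.normal(0, 1, n_items)      # latent per-item quality
leniency = np.array([0.3, -0.1, 0.5])    # consistent per-judge offsets
true_self_bias = 0.4                     # judges inflate their own outputs
author = np.arange(n_items) % n_judges   # which judge authored each item

rows, y = [], []
for j in range(n_judges):
    for i in range(n_items):
        self_flag = float(author[i] == j)
        score = (quality[i] + leniency[j]
                 + true_self_bias * self_flag + rng.normal(0, 0.2))
        human = quality[i] + rng.normal(0, 0.2)  # independent human rating
        dummies = np.zeros(n_judges)
        dummies[j] = 1.0
        rows.append(np.concatenate([dummies, [self_flag]]))
        y.append(score - human)          # human rating cancels out quality

X = np.array(rows)
beta, *_ = np.linalg.lstsq(X, np.array(y), rcond=None)
print(f"estimated self-bias: {beta[-1]:.2f}")  # close to the true 0.4
```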
@ArtidoroPagnoni
Artidoro Pagnoni
4 months
Thrilled to share that our Byte Latent Transformer won an Outstanding Paper Award at ACL 2025! πŸ†
@ArtidoroPagnoni
Artidoro Pagnoni
1 year
πŸš€ Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead of tokens 🀯 Paper πŸ“„ https://t.co/5QGrlJdK0y Code πŸ› οΈ https://t.co/jCdDI5BXwe
16 replies · 31 reposts · 283 likes
@ArtidoroPagnoni
Artidoro Pagnoni
1 year
πŸš€ Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead of tokens 🀯 Paper πŸ“„ https://t.co/5QGrlJdK0y Code πŸ› οΈ https://t.co/jCdDI5BXwe
17 replies · 145 reposts · 727 likes
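As the tweet describes it, BLT groups bytes into variable-length patches, with boundaries placed where a small byte-level model finds the next byte hard to predict, so predictable stretches get long patches and compute concentrates on hard ones. The toy sketch below substitutes a crude unigram frequency model for BLT's learned byte LM, so it only illustrates the mechanism:

```python
# Toy byte patcher: open a new patch where the next byte is "surprising".
# BLT uses a small learned byte LM to measure surprise; the unigram
# frequency model here is a crude stand-in for illustration only.
import math
from collections import Counter

def surprisal(byte: int, counts: Counter, total: int) -> float:
    p = (counts[byte] + 1) / (total + 256)  # Laplace-smoothed unigram prob
    return -math.log2(p)

def patch(data: bytes, threshold: float = 7.0) -> list[bytes]:
    counts, total = Counter(data), len(data)
    patches, start = [], 0
    for i in range(1, len(data)):
        if surprisal(data[i], counts, total) > threshold:
            patches.append(data[start:i])   # rare byte starts a new patch
            start = i
    patches.append(data[start:])
    return patches

text = "the the the zebra the the".encode()
print(patch(text))  # frequent bytes merge into long patches; rare ones split
```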
@ArtidoroPagnoni
Artidoro Pagnoni
2 years
We will present QLoRA at NeurIPS! Come to our oral on Tuesday where @Tim_Dettmers will be giving a talk. If you have questions, stop by our poster session!
@ArtidoroPagnoni
Artidoro Pagnoni
3 years
4-bit QLoRA is here to equalize the playing field for LLM exploration. You can now fine-tune a state-of-the-art 65B chatbot on one GPU in 24h. Paper: https://t.co/7gX1oIUHEx Code and Demo:
7 replies · 37 reposts · 314 likes
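The recipe the tweet describes (a 4-bit quantized base model with trainable low-rank adapters) is now exposed through the Hugging Face transformers/peft/bitsandbytes stack. A minimal setup sketch, with the model name and LoRA hyperparameters as placeholders rather than the paper's exact configuration:

```python
# Minimal QLoRA-style setup via Hugging Face transformers + peft + bitsandbytes.
# Model name and LoRA hyperparameters below are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NormalFloat4, introduced by QLoRA
    bnb_4bit_use_double_quant=True,      # quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",               # placeholder; any causal LM works
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
lora = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)      # only LoRA adapters are trainable
model.print_trainable_parameters()
```

NF4 quantization and double quantization are the two QLoRA ingredients surfaced in the config; the frozen base weights stay in 4-bit while gradients flow only through the adapters.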
@EvaSpiliop
Eva Spiliopoulou
3 years
Groundbreaking research! We need competitive, open-source models that one can fine-tune with limited resources!
@ArtidoroPagnoni
Artidoro Pagnoni
3 years
4-bit QLoRA is here to equalize the playing field for LLM exploration. You can now fine-tune a state-of-the-art 65B chatbot on one GPU in 24h. Paper: https://t.co/7gX1oIUHEx Code and Demo:
0 replies · 0 reposts · 2 likes
@Tim_Dettmers
Tim Dettmers
3 years
QLoRA: 4-bit finetuning of LLMs is here! With it comes Guanaco, a chatbot on a single GPU, achieving 99% ChatGPT performance on the Vicuna benchmark: Paper: https://t.co/J3Xy195kDD Code+Demo: https://t.co/SP2FsdXAn5 Samples: https://t.co/q2Nd9cxSrt Colab: https://t.co/Q49m0IlJHD
6 replies · 47 reposts · 223 likes
@ylecun
Yann LeCun
3 years
A NYT article on the debate around whether LLM base models should be closed or open. Meta argues for openness, starting with the release of LLaMA (for non-commercial use), while OpenAI and Google want to keep things closed and proprietary. They argue that openness can be…
182 replies · 480 reposts · 2K likes
@ArtidoroPagnoni
Artidoro Pagnoni
3 years
We are excited to announce QLoRA, a new method for LLM fine-tuning that uses only a fraction of the memory footprint. Please consider joining our private beta to gain early access to QLoRA! Stay tuned for the paper and code release, coming soon.
@Tim_Dettmers
Tim Dettmers
3 years
The 4-bit bitsandbytes private beta is here! Our method, QLoRA, is integrated with the HF stack and supports all models. You can finetune a 65B model on a single 48 GB GPU. This beta will help us catch bugs and issues before our full release. Sign up: https://t.co/XBAQv76laa
0 replies · 19 reposts · 123 likes
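Back-of-the-envelope, the 48 GB figure checks out: 4-bit weights take about half a byte per parameter. A rough sketch, where the LoRA and optimizer fractions are assumptions and real runs add activation memory and paging overhead:

```python
# Rough memory arithmetic for 4-bit finetuning of a 65B-parameter model.
# The LoRA fraction below is an assumption for illustration.
params = 65e9
weights_gb = params * 4 / 8 / 1e9       # 4-bit base weights   ~ 32.5 GB
lora_params = 0.01 * params             # assume ~1% of params in adapters
lora_gb = lora_params * 2 / 1e9         # bf16 adapter weights ~ 1.3 GB
optim_gb = lora_params * 8 / 1e9        # Adam m+v in fp32     ~ 5.2 GB
print(f"{weights_gb + lora_gb + optim_gb:.1f} GB")  # ~39 GB, under 48 GB
```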
@dkaushik96
Divyansh Kaushik
3 years
ICYMI since you have a social life and aren’t perpetually online. I wrote for @Forbes on how @RepGallagher could lead the new Select Committee focused on CCP to succeed by addressing critical issues that are sometimes missing from the conversation. https://t.co/uql4kNAALM
1 reply · 4 reposts · 9 likes
@EvaSpiliop
Eva Spiliopoulou
3 years
Join our oral presentation @emnlpmeeting in the Commonsense Reasoning track, at 10am Sunday 12/11, Hall B. Our paper is available here:
2 replies · 0 reposts · 3 likes
@EvaSpiliop
Eva Spiliopoulou
3 years
The legendary fights of Physical Interaction vs Language-only models, their epic journeys through the valleys of Artificial Environments and Naturally Occurring text, and their challenges on in-domain & out-of-domain attributes ONLY in EvEntS ReaLM! @ArtidoroPagnoni @ybisk @ehovy
1 reply · 7 reposts · 22 likes
@ArtidoroPagnoni
Artidoro Pagnoni
3 years
Very excited to present our work β€œThreat Scenarios and Best Practices for Neural Fake News Detection” at COLING 2022! https://t.co/ObCpMcDdQw with Yulia Tsvetkov and Martin Graciarena
aclanthology.org: Artidoro Pagnoni, Martin Graciarena, Yulia Tsvetkov. Proceedings of the 29th International Conference on Computational Linguistics, 2022.
2 replies · 6 reposts · 35 likes
@nagpalchirag
Chirag Nagpal
4 years
1/n πŸ“£ New Preprint: "π‘ͺ𝒐𝒖𝒏𝒕𝒆𝒓𝒇𝒂𝒄𝒕𝒖𝒂𝒍 π‘·π’‰π’†π’π’π’•π’šπ’‘π’Šπ’π’ˆ π’˜π’Šπ’•π’‰ π‘ͺ𝒆𝒏𝒔𝒐𝒓𝒆𝒅 π‘»π’Šπ’Žπ’†-𝒕𝒐-𝑬𝒗𝒆𝒏𝒕𝒔" w/ @MononitoGoswami @AutonLab and @DrDufendach @UPMC https://t.co/88eqEJcJA3 #MachineLearning #EpiTwitter #MedTwitter #causalinference #DataScience
arxiv.org: Estimation of treatment efficacy of real-world clinical interventions involves working with continuous outcomes such as time-to-death, re-hospitalization, or a composite event that may be subject...
2 replies · 13 reposts · 29 likes