Eva Spiliopoulou
@EvaSpiliop
Followers: 378 · Following: 173 · Media: 3 · Statuses: 48
Applied Scientist in #NLProc @Amazon · finished PhD @LTIatCMU
Seattle, WA
Joined June 2018
LLMs: great at judging… until it's their own homework. So we built the math to call them out. To learn more, check out our new paper: "Play Favorites: A statistical method to quantify self-bias in LLM-as-a-judge". Paper:
2
1
14
Thanks to our co-authors Riccardo Fogliato, H. Burnsky, T. Soliman, J. Ma, G. Horwood and @migballesteros! Also thanks to @awscloud Bedrock for supporting our work! Paper: https://t.co/fWxcEhMsbE Code & data:
0
0
0
Self-bias is not a "fixed" quantity: it varies with the dimension and the dataset (although some overall trends can be observed)
0
0
0
Family-bias accounts for a large part of LLMs' self-bias. Negative self-bias is also possible: some LLMs are more "critical" of themselves!
1
0
0
In our paper, we also:
✅ Conduct an empirical study with 5k+ prompts & 9 LLM judges
✅ Release human annotations to support future research
✅ Find systematic self-bias (+ family-bias) in GPT-4o & Claude 3.5 Sonnet
2
0
0
Our framework:
✅ Explicitly models conditions under which self-bias can be detected
✅ Separates true quality differences from self-bias
✅ Accounts for consistent annotator differences
1
0
0
We introduce a statistical framework that isolates and quantifies self-bias in LLM-as-a-judge, while separating genuine quality differences (via independent human judges) from bias. Our study also reveals a strong family-bias problem: LLMs favoring models from their own family.
1
0
1
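For intuition, here is a minimal sketch (not the paper's actual estimator) of one way to quantify self-bias: regress judge scores on human quality scores, judge identity, and a flag marking cases where a judge scores its own output; the flag's coefficient is the self-bias estimate. The column names and data file are hypothetical.

```python
# Minimal sketch (not the paper's estimator) of quantifying self-bias.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical columns: judge, author, judge_score, human_score
df = pd.read_csv("judgments.csv")
df["is_self"] = (df["judge"] == df["author"]).astype(int)

# human_score proxies true quality; C(judge) absorbs each judge's overall
# leniency/harshness; is_self captures the extra credit a judge gives itself.
model = smf.ols("judge_score ~ human_score + C(judge) + is_self", data=df).fit()
print(model.params["is_self"], model.conf_int().loc["is_self"])
```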
Thrilled to share that our Byte Latent Transformer won an Outstanding Paper Award at ACL 2025!
Introducing the Byte Latent Transformer (BLT): an LLM architecture that scales better than Llama 3 using byte patches instead of tokens. Paper: https://t.co/5QGrlJdK0y Code: https://t.co/jCdDI5BXwe
16
31
283
Introducing the Byte Latent Transformer (BLT): an LLM architecture that scales better than Llama 3 using byte patches instead of tokens. Paper: https://t.co/5QGrlJdK0y Code: https://t.co/jCdDI5BXwe
17
145
727
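A toy sketch of the byte-patch idea (not the BLT implementation, which learns entropy-based patch boundaries): operate on raw UTF-8 bytes instead of tokens, group them into variable-length patches, and pool byte embeddings into one vector per patch before feeding a latent transformer. The whitespace/length patching rule below is a hypothetical stand-in.

```python
# Toy illustration of byte patching; the boundary rule is a placeholder.
import torch
import torch.nn as nn

def patch_boundaries(data: bytes, max_patch: int = 8) -> list[tuple[int, int]]:
    """Return (start, end) spans over the byte sequence."""
    spans, start = [], 0
    for i, b in enumerate(data):
        if b in b" \n\t" or i - start + 1 >= max_patch:
            spans.append((start, i + 1))
            start = i + 1
    if start < len(data):
        spans.append((start, len(data)))
    return spans

byte_embed = nn.Embedding(256, 64)           # one embedding per possible byte value
data = "Byte patches instead of tokens".encode("utf-8")
ids = torch.tensor(list(data))
patches = [byte_embed(ids[s:e]).mean(dim=0)   # mean-pool bytes within each patch
           for s, e in patch_boundaries(data)]
patch_seq = torch.stack(patches)              # (num_patches, 64): input to a latent transformer
print(patch_seq.shape)
```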
We will present QLoRA at NeurIPS! Come to our oral on Tuesday where @Tim_Dettmers will be giving a talk. If you have questions stop by our poster session!
4-bit QLoRA is here to equalize the playing field for LLM exploration. You can now fine-tune a state-of-the-art 65B chatbot on one GPU in 24h. Paper: https://t.co/7gX1oIUHEx Code and Demo:
7
37
314
Groundbreaking research! We need competitive, open-source models that one can fine-tune with limited resources!
4-bit QLoRA is here to equalize the playing field for LLM exploration. You can now fine-tune a state-of-the-art 65B chatbot on one GPU in 24h. Paper: https://t.co/7gX1oIUHEx Code and Demo:
0
0
2
4-bit QLoRA is here to equalize the playing field for LLM exploration. You can now fine-tune a state-of-the-art 65B chatbot on one GPU in 24h. Paper: https://t.co/7gX1oIUHEx Code and Demo:
github.com
QLoRA: Efficient Finetuning of Quantized LLMs (artidoro/qlora)
QLoRA: 4-bit finetuning of LLMs is here! With it comes Guanaco, a chatbot on a single GPU, achieving 99% ChatGPT performance on the Vicuna benchmark: Paper: https://t.co/J3Xy195kDD Code+Demo: https://t.co/SP2FsdXAn5 Samples: https://t.co/q2Nd9cxSrt Colab: https://t.co/Q49m0IlJHD
6
47
223
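A minimal sketch of QLoRA-style finetuning with the Hugging Face stack (transformers, peft, bitsandbytes): the base model is loaded frozen in 4-bit NF4 and only small LoRA adapters are trained. The model name and hyperparameters below are illustrative, not the paper's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "huggyllama/llama-7b"  # any causal LM; the paper scales this to 65B

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat quantization
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attach adapters to attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only the LoRA weights are trainable
```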
A NYT article on the debate around whether LLM base models should be closed or open. Meta argues for openness, starting with the release of LLaMA (for non-commercial use), while OpenAI and Google want to keep things closed and proprietary. They argue that openness can be…
182
480
2K
We are excited to announce QLoRA, a new method for LLM fine-tuning that uses only a fraction of the memory footprint of standard fine-tuning. Please consider joining our private beta to gain early access to QLoRA! Stay tuned for the paper and code release, coming soon.
The 4-bit bitsandbytes private beta is here! Our method, QLoRA, is integrated with the HF stack and supports all models. You can finetune a 65B model on a single 48 GB GPU. This beta will help us catch bugs and issues before our full release. Sign up: https://t.co/XBAQv76laa
0
19
123
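A back-of-envelope check (rough assumptions, not figures from the paper) of why a 65B model can fit on a single 48 GB GPU with QLoRA: the frozen base weights take roughly half a byte per parameter in 4-bit, and only the small LoRA adapters carry gradients and optimizer state. The adapter size used here is a guess.

```python
# Rough memory estimate for 4-bit QLoRA finetuning of a 65B model.
params = 65e9
base_4bit_gb = params * 0.5 / 1e9              # ~32.5 GB for 4-bit frozen weights
adapter_params = 0.4e9                         # assumed LoRA adapter size
adapter_gb = adapter_params * 2 / 1e9          # bf16 adapter weights
optimizer_gb = adapter_params * 8 / 1e9        # Adam states for adapters only
print(base_4bit_gb + adapter_gb + optimizer_gb)  # ~36.5 GB, leaving headroom for activations
```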
ICYMI since you have a social life and aren't perpetually online. I wrote for @Forbes on how @RepGallagher could lead the new Select Committee focused on CCP to succeed by addressing critical issues that are sometimes missing from the conversation. https://t.co/uql4kNAALM
1
4
9
Join our oral presentation @emnlpmeeting in the Commonsense Reasoning track, at 10am Sunday 12/11, Hall B. Our paper is available here:
2
0
3
The legendary fights of Physical Interaction vs Language-only models, their epic journeys through the valleys of Artificial Environments and Naturally Occurring text, and their challenges on in-domain & out-of-domain attributes ONLY in EvEntS ReaLM! @ArtidoroPagnoni @ybisk @ehovy
1
7
22
Very excited to present our work "Threat Scenarios and Best Practices for Neural Fake News Detection" at COLING 2022! https://t.co/ObCpMcDdQw with Yulia Tsvetkov and Martin Graciarena
aclanthology.org
Artidoro Pagnoni, Martin Graciarena, Yulia Tsvetkov. Proceedings of the 29th International Conference on Computational Linguistics. 2022.
2
6
35
1/n New Preprint: "Counterfactual Phenotyping with Censored Time-to-Events" w/ @MononitoGoswami @AutonLab and @DrDufendach @UPMC
https://t.co/88eqEJcJA3
#MachineLearning #EpiTwitter #MedTwitter #causalinference #DataScience
arxiv.org
Estimation of treatment efficacy of real-world clinical interventions involves working with continuous outcomes such as time-to-death, re-hospitalization, or a composite event that may be subject...
2
13
29
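As a baseline illustration of working with censored time-to-event outcomes (not the paper's counterfactual phenotyping method), here is a Cox proportional-hazards fit with a treatment indicator, assuming the lifelines package; the toy data and column names are hypothetical.

```python
# Baseline: Cox PH model on censored survival data with a treatment covariate.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "time":      [5.0, 12.0, 3.5, 20.0, 7.5, 15.0],  # time to event or censoring
    "event":     [1,   0,    1,   0,    1,   1],      # 1 = event observed, 0 = censored
    "treatment": [0,   1,    0,   1,    1,   0],
    "age":       [64,  58,   71,  49,   66,  60],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
print(cph.summary[["coef", "exp(coef)", "p"]])  # hazard ratio for treatment vs. control
```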