Simone Tedeschi
@SimoneTedeschi_
Followers
1K
Following
4K
Media
23
Statuses
267
Applied Scientist @Amazon AGI • PhD @SapienzaRoma
Roma, Lazio
Joined August 2021
📢 Interested in #LLM safety? We have just uploaded a new version of ALERT 🚨 on ArXiv with novel insights into the weaknesses and vulnerabilities of LLMs! 👀 https://t.co/uAPrfTnIb9 For a summary of the paper, read this thread 🧵
arxiv.org
When building Large Language Models (LLMs), it is paramount to bear safety in mind and protect them with guardrails. Indeed, LLMs should never generate content promoting or normalizing harmful,...
2
7
34
What design decisions in LLM training affect the final performance of LLMs? Scaling model size and training data is important, but it's not the only thing. We performed an analysis of 90+ open-weights models to answer this question. 🧵 https://t.co/R8FkBHgwgM (1/12)
6
57
217
Last week 5 in our group received their #PhD in #AI & #Engineering in #ComputerScience! @SBejgu, @PereLluisHC, @RiccardoRicOrl, @alescire94, and @SimoneTedeschi_, all with the highest grade (+2 cum laude)! Congrats all: we are very proud of you! Four of them were/are @Babelscape
1
4
18
Four of our industrial #PhD students, @SBejgu, @PereLluisHC, @alescire94 and @SimoneTedeschi_, were awarded their #PhD in #AI last Friday with the best grades (and two cum laude)! Congrats all! 👏 🎉 With @RNavigli, their advisor and Babelscape's scientific director, in the photo
0
5
12
🚨 New paper 🚨 How Large Language Models Acquire Factual Knowledge During Pretraining? I’m thrilled to announce the release of my new paper! 🎉 This research explores how LLMs acquire and retain factual knowledge during pretraining. Here are some key insights:
12
119
519
☕️New paper 👉Our latest paper delves into LLMs' ability to perform safety self-correction, namely COURSE-CORRECTION. In this paper, we: - Benchmark course-correction ability - Improving using synthetic preferences. Paper: https://t.co/HmlI1gdVYB Code: https://t.co/x0upmuWDYY
4
21
38
Our new paper "Open Problems in Technical AI Governance" led by @ben_s_bucknall & me is out! We outline 89 open technical issues in AI governance, plus resources and 100+ research questions that technical experts can tackle to help AI governance efforts🧵 https://t.co/CUc6H6Y0ax
11
45
182
🚀How can we effectively evaluate and prevent superintelligent LLMs from deceiving others? We introduce 🤝BeHonest, a pioneering benchmark specifically designed to assess the honesty in LLMs comprehensively. Paper 📄: [ https://t.co/XzVV82TDXB] Code 👨🏻💻: [ https://t.co/dJnipu1Ph5]
1
22
58
ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming The paper introduces ALERT, a benchmark for assessing the safety of LLMs. It employs a fine-grained risk taxonomy to evaluate LLMs propensity to generate harmful content and
1
2
13
ADVSCORE: A Metric for the Evaluation and Creation of Adversarial Benchmarks This paper introduces ADVSCORE, a metric to evaluate and create high-quality adversarial datasets. ADVQA, a robust question answering dataset effectively fools models while not humans. This approach
0
1
9
The ALERT 🚨 benchmark, the DPO dataset and all the models' outputs are publicly available. For more details 🔽 📰 Paper: https://t.co/7E4ujqt0QG 💾 Repo: https://t.co/hPIhhEutPL 🤗 ALERT benchmark: https://t.co/Dfd8mdjtuv 🤗 ALERT DPO data:
huggingface.co
0
0
1
As a result of our evaluations, we also produced a new Direct Preference Optimization (#DPO) dataset for safety tuning. By leveraging this dataset, new models can be aligned to the safety levels of the best models currently available 🌟
1
0
1
By leveraging the adversarial subset of ALERT 🚨, we also quantified the Attack Success Rate (ASR) of various adversarial attacks. Interestingly, most models, including closed-source ones, can be easily jailbroken❗
1
0
1
For creating ALERT 🚨, we started by filtering the @AnthropicAI red-teaming-attempts dataset. Then, we: 1) automatically classified these prompts 2) created thousands of new prompts by means of templates 3) implemented adversarial attacks to make the benchmark more challenging
1
0
1
As a key design principle for ALERT 🚨, we developed a new fine-grained safety risk #taxonomy. This taxonomy serves as the foundation for the benchmark to provide detailed insights about a model’s behavior as well as inform targeted safety enhancements 🛡️
1
0
1
ALERT 🚨 is a new comprehensive #benchmark for assessing #LLMs’ safety through #redteaming 🔎 It consists of about 45k prompts, both standard and adversarial ones. Our automated evaluation methodology, together with the benchmark, constitutes the ALERT framework. 🧵1/n
1
1
1
We are proud to share that our paper, "CNER: Concept and Named Entity Recognition", a joint work with @SapienzaNLP, has been presented at #NAACL24! 🥳 Looking forward to engaging with the community. #NAACL2024 #AI #NLProc #Research #NER
0
1
5
Iacopo Ghinassi just presented our paper on Latin word sense disambiguation @LrecColing : we used language pivoting on English to boost the task on Latin. More research on this to come, watch this space!
1
4
12