Simone Tedeschi

@SimoneTedeschi_

Followers: 1K · Following: 4K · Media: 23 · Statuses: 267

Applied Scientist @Amazon AGI • PhD @SapienzaRoma

Roma, Lazio
Joined August 2021
@SimoneTedeschi_
Simone Tedeschi
1 year
📢 Interested in #LLM safety? We have just uploaded a new version of ALERT 🚨 to arXiv, with novel insights into the weaknesses and vulnerabilities of LLMs! 👀 https://t.co/uAPrfTnIb9 For a summary of the paper, read this thread 🧵
arxiv.org
When building Large Language Models (LLMs), it is paramount to bear safety in mind and protect them with guardrails. Indeed, LLMs should never generate content promoting or normalizing harmful,...
2
7
34
@_emliu
Emmy Liu
9 months
What design decisions in LLM training affect the models' final performance? Scaling model size and training data is important, but it's not the only thing. We performed an analysis of 90+ open-weights models to answer this question. 🧵 https://t.co/R8FkBHgwgM (1/12)
6
57
217
@SapienzaNLP
SapienzaNLP
11 months
Last week, five members of our group received their #PhD in #AI & #Engineering in #ComputerScience! @SBejgu, @PereLluisHC, @RiccardoRicOrl, @alescire94, and @SimoneTedeschi_, all with the highest grade (two of them cum laude)! Congrats to all: we are very proud of you! Four of them were/are @Babelscape
1
4
18
@babelscape
Babelscape
11 months
Four of our industrial #PhD students, @SBejgu, @PereLluisHC, @alescire94 and @SimoneTedeschi_, were awarded their #PhD in #AI last Friday with the best grades (two of them cum laude)! Congrats all! 👏 🎉 With @RNavigli, their advisor and Babelscape's scientific director, in the photo
0
5
12
@hoyeon_chang
Hoyeon Chang
1 year
🚨 New paper 🚨 How Do Large Language Models Acquire Factual Knowledge During Pretraining? I'm thrilled to announce the release of my new paper! 🎉 This research explores how LLMs acquire and retain factual knowledge during pretraining. Here are some key insights:
12
119
519
@rongwu_xu
Rongwu Xu
1 year
☕️New paper 👉Our latest paper delves into LLMs' ability to perform safety self-correction, namely COURSE-CORRECTION. In this paper, we: - benchmark course-correction ability - improve it using synthetic preferences. Paper: https://t.co/HmlI1gdVYB Code: https://t.co/x0upmuWDYY
4
21
38
@AnkaReuel
Anka Reuel | @ankareuel.bsky.social
1 year
Our new paper "Open Problems in Technical AI Governance" led by @ben_s_bucknall & me is out! We outline 89 open technical issues in AI governance, plus resources and 100+ research questions that technical experts can tackle to help AI governance efforts🧵 https://t.co/CUc6H6Y0ax
11
45
182
@steffichern
Steffi Chern
1 year
🚀How can we effectively evaluate and prevent superintelligent LLMs from deceiving others? We introduce 🤝BeHonest, a pioneering benchmark specifically designed to comprehensively assess honesty in LLMs. Paper 📄: [https://t.co/XzVV82TDXB] Code 👨🏻‍💻: [https://t.co/dJnipu1Ph5]
1
22
58
@Hitesh_LPatel
Hitesh Laxmichand Patel
1 year
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming The paper introduces ALERT, a benchmark for assessing the safety of LLMs. It employs a fine-grained risk taxonomy to evaluate LLMs' propensity to generate harmful content and …
1
2
13
@Hitesh_LPatel
Hitesh Laxmichand Patel
1 year
ADVSCORE: A Metric for the Evaluation and Creation of Adversarial Benchmarks This paper introduces ADVSCORE, a metric for evaluating and creating high-quality adversarial datasets, along with ADVQA, a robust question-answering dataset that fools models but not humans. This approach …
0
1
9
@SimoneTedeschi_
Simone Tedeschi
1 year
The ALERT 🚨 benchmark, the DPO dataset and all the models' outputs are publicly available. For more details 🔽 📰 Paper: https://t.co/7E4ujqt0QG 💾 Repo: https://t.co/hPIhhEutPL 🤗 ALERT benchmark: https://t.co/Dfd8mdjtuv 🤗 ALERT DPO data:
huggingface.co
0
0
1
@SimoneTedeschi_
Simone Tedeschi
1 year
As a result of our evaluations, we also produced a new Direct Preference Optimization (#DPO) dataset for safety tuning. By leveraging this dataset, new models can be aligned to the safety levels of the best models currently available 🌟
1
0
1
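A minimal sketch of how such safety tuning could look with the TRL library. The dataset ID, column names, and base model below are assumptions, and the exact DPOTrainer keyword arguments vary across TRL versions; the links in the thread point to the real artifacts:

# Sketch: align a base model on the ALERT DPO preference data with TRL.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # hypothetical base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO expects (prompt, chosen, rejected) triples; this ID is assumed here.
dataset = load_dataset("Babelscape/ALERT_DPO", split="train")

args = DPOConfig(output_dir="alert-dpo", beta=0.1)  # beta scales the KL penalty
trainer = DPOTrainer(model=model, args=args, train_dataset=dataset,
                     processing_class=tokenizer)
trainer.train()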
@SimoneTedeschi_
Simone Tedeschi
1 year
By leveraging the adversarial subset of ALERT 🚨, we also quantified the Attack Success Rate (ASR) of various adversarial attacks. Interestingly, most models, including closed-source ones, can be easily jailbroken❗
1
0
1
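Concretely, the Attack Success Rate is the fraction of adversarial prompts whose response is judged unsafe; a minimal sketch, where the per-prompt safety judgments would come from a separate classifier:

def attack_success_rate(judgments: list[bool]) -> float:
    # judgments[i] is True if adversarial prompt i elicited unsafe output
    return sum(judgments) / len(judgments)

# e.g. 3 of 4 adversarial prompts jailbreak the model -> ASR = 0.75
print(attack_success_rate([True, True, False, True]))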
@SimoneTedeschi_
Simone Tedeschi
1 year
In our experiments, we extensively evaluated several open- and closed-source LLMs (e.g. #ChatGPT, #Llama and #Mistral), highlighting their strengths and weaknesses.
1
0
1
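The automated part of such an evaluation reduces to a small loop; a sketch, where generate and is_safe are hypothetical stand-ins for the model under test and an auxiliary safety judge:

def safety_score(prompts, generate, is_safe):
    # generate: prompt -> model response (the LLM under evaluation)
    # is_safe: (prompt, response) -> bool (an auxiliary safety classifier)
    safe = sum(is_safe(p, generate(p)) for p in prompts)
    return safe / len(prompts)  # fraction of responses judged safe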
@SimoneTedeschi_
Simone Tedeschi
1 year
For creating ALERT 🚨, we started by filtering the @AnthropicAI red-teaming-attempts dataset. Then, we: 1) automatically classified these prompts 2) created thousands of new prompts by means of templates 3) implemented adversarial attacks to make the benchmark more challenging
1
0
1
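A schematic of that three-step construction, with every helper a hypothetical stand-in (the actual implementation lives in the linked repo):

def build_benchmark(filtered_prompts, classify, templates, attacks):
    # 1) automatically assign each filtered prompt a risk category
    labeled = [(p, classify(p)) for p in filtered_prompts]
    # 2) expand coverage with template-generated prompts
    labeled += [(tpl.format(term), cat) for tpl, cat, term in templates]
    # 3) apply adversarial transformations to each prompt to make the
    #    benchmark more challenging
    adversarial = [(atk(p), cat) for p, cat in labeled for atk in attacks]
    return labeled, adversarial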
@SimoneTedeschi_
Simone Tedeschi
1 year
As a key design principle for ALERT 🚨, we developed a new fine-grained safety risk #taxonomy. This taxonomy serves as the foundation for the benchmark to provide detailed insights about a model’s behavior as well as inform targeted safety enhancements 🛡️
1
0
1
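A natural encoding of such a taxonomy is a two-level mapping from macro- to micro-categories, so that results can be reported per category; the names below are illustrative, not the paper's exact labels:

TAXONOMY = {
    "hate_speech": ["hate_ethnicity", "hate_religion", "hate_body"],
    "criminal_planning": ["crime_cyber", "crime_theft", "crime_propaganda"],
    "weapons": ["weapon_firearm", "weapon_biological"],
    "self_harm": ["self_harm_suicide", "self_harm_eating_disorder"],
}
# Per-category safety scores are then a simple group-by over labeled prompts.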
@SimoneTedeschi_
Simone Tedeschi
1 year
ALERT 🚨 is a new comprehensive #benchmark for assessing #LLMs’ safety through #redteaming 🔎 It consists of about 45k prompts, both standard and adversarial ones. Our automated evaluation methodology, together with the benchmark, constitutes the ALERT framework. 🧵1/n
1
1
1
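For anyone wanting to run it, a minimal sketch of loading the benchmark, assuming it is published on the Hugging Face Hub as Babelscape/ALERT with separate standard and adversarial configurations (the exact dataset IDs, config names, and fields are in the repo linked in the thread):

from datasets import load_dataset

standard = load_dataset("Babelscape/ALERT", "alert", split="test")  # assumed config
adversarial = load_dataset("Babelscape/ALERT", "alert_adversarial", split="test")  # assumed
print(standard[0])  # e.g. a prompt plus its risk-taxonomy category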
@babelscape
Babelscape
1 year
We are proud to share that our paper, "CNER: Concept and Named Entity Recognition", a joint work with @SapienzaNLP, has been presented at #NAACL24! 🥳 Looking forward to engaging with the community. #NAACL2024 #AI #NLProc #Research #NER
0
1
5
@BarbaraMcGilli
Barbara McGillivray
2 years
Iacopo Ghinassi just presented our paper on Latin word sense disambiguation @LrecColing: we used language pivoting on English to boost the task on Latin. More research on this to come, watch this space!
1
4
12
@mattshumer_
Matt Shumer
2 years
The dataset is everything. Great read: https://t.co/snGcPx0M16
109
548
3K