Simone Tedeschi

@SimoneTedeschi_

Followers: 1K · Following: 4K · Media: 23 · Statuses: 267

Applied Scientist @Amazon AGI • PhD @SapienzaRoma

Roma, Lazio
Joined August 2021
@SimoneTedeschi_
Simone Tedeschi
1 year
📢 Interested in #LLM safety? We have just uploaded a new version of ALERT 🚨 to arXiv, with novel insights into the weaknesses and vulnerabilities of LLMs! 👀 https://t.co/uAPrfTnIb9 For a summary of the paper, read this thread 🧵
arxiv.org
When building Large Language Models (LLMs), it is paramount to bear safety in mind and protect them with guardrails. Indeed, LLMs should never generate content promoting or normalizing harmful,...
2
7
34
@_emliu
Emmy Liu
9 months
What design decisions in LLM training affect the models' final performance? Scaling model size and training data is important, but it's not the only thing. We performed an analysis of 90+ open-weights models to answer this question. 🧵 https://t.co/R8FkBHgwgM (1/12)
6
57
217
@SapienzaNLP
SapienzaNLP
11 months
Last week, five members of our group received their #PhD in #AI & #Engineering in #ComputerScience! @SBejgu, @PereLluisHC, @RiccardoRicOrl, @alescire94, and @SimoneTedeschi_, all with the highest grade (two of them cum laude)! Congrats to all: we are very proud of you! Four of them were/are @Babelscape
1
4
18
@babelscape
Babelscape
11 months
Four of our industrial #PhD students, @SBejgu, @PereLluisHC, @alescire94 and @SimoneTedeschi_, were awarded their #PhD in #AI last Friday with the best grades (two of them cum laude)! Congrats all! 👏 🎉 With @RNavigli, their advisor and Babelscape's scientific director, in the photo
0
5
12
@hoyeon_chang
Hoyeon Chang
1 year
🚨 New paper 🚨 How Do Large Language Models Acquire Factual Knowledge During Pretraining? I'm thrilled to announce the release of my new paper! 🎉 This research explores how LLMs acquire and retain factual knowledge during pretraining. Here are some key insights:
12
119
519
@rongwu_xu
Rongwu Xu
1 year
☕️New paper 👉Our latest paper delves into LLMs' ability to perform safety self-correction, namely COURSE-CORRECTION. In this paper, we: - benchmark course-correction ability - improve it using synthetic preferences. Paper: https://t.co/HmlI1gdVYB Code: https://t.co/x0upmuWDYY
4
21
38
@AnkaReuel
Anka Reuel | @ankareuel.bsky.social
1 year
Our new paper "Open Problems in Technical AI Governance" led by @ben_s_bucknall & me is out! We outline 89 open technical issues in AI governance, plus resources and 100+ research questions that technical experts can tackle to help AI governance efforts🧵 https://t.co/CUc6H6Y0ax
11
45
182
@steffichern
Steffi Chern
1 year
🚀How can we effectively evaluate and prevent superintelligent LLMs from deceiving others? We introduce 🤝BeHonest, a pioneering benchmark specifically designed to comprehensively assess honesty in LLMs. Paper 📄: [https://t.co/XzVV82TDXB] Code 👨🏻‍💻: [https://t.co/dJnipu1Ph5]
1
22
58
@Hitesh_LPatel
Hitesh Laxmichand Patel
1 year
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming The paper introduces ALERT, a benchmark for assessing the safety of LLMs. It employs a fine-grained risk taxonomy to evaluate LLMs' propensity to generate harmful content and …
1
2
13
@Hitesh_LPatel
Hitesh Laxmichand Patel
1 year
ADVSCORE: A Metric for the Evaluation and Creation of Adversarial Benchmarks This paper introduces ADVSCORE, a metric for evaluating and creating high-quality adversarial datasets, along with ADVQA, a robust question-answering dataset that fools models but not humans. This approach …
0
1
9
@SimoneTedeschi_
Simone Tedeschi
1 year
The ALERT 🚨 benchmark, the DPO dataset and all the models' outputs are publicly available. For more details 🔽 📰 Paper: https://t.co/7E4ujqt0QG 💾 Repo: https://t.co/hPIhhEutPL 🤗 ALERT benchmark: https://t.co/Dfd8mdjtuv 🤗 ALERT DPO data:
huggingface.co
0
0
1
@SimoneTedeschi_
Simone Tedeschi
1 year
As a result of our evaluations, we also produced a new Direct Preference Optimization (#DPO) dataset for safety tuning. By leveraging this dataset, new models can be aligned to the safety levels of the best models currently available 🌟
1
0
1
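A minimal sketch of how such safety tuning could look with the TRL library. The dataset ID, column names, and base model below are assumptions, and the exact DPOTrainer keyword arguments vary across TRL versions; the links in the thread point to the real artifacts:

# Sketch: align a base model on the ALERT DPO preference data with TRL.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # hypothetical base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO expects (prompt, chosen, rejected) triples; this ID is assumed here.
dataset = load_dataset("Babelscape/ALERT_DPO", split="train")

args = DPOConfig(output_dir="alert-dpo", beta=0.1)  # beta scales the KL penalty
trainer = DPOTrainer(model=model, args=args, train_dataset=dataset,
                     processing_class=tokenizer)
trainer.train()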
@SimoneTedeschi_
Simone Tedeschi
1 year
By leveraging the adversarial subset of ALERT 🚨, we also quantified the Attack Success Rate (ASR) of various adversarial attacks. Interestingly, most models, including closed-source ones, can be easily jailbroken❗
1
0
1
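Concretely, the Attack Success Rate is the fraction of adversarial prompts whose response is judged unsafe; a minimal sketch, where the per-prompt safety judgments would come from a separate classifier:

def attack_success_rate(judgments: list[bool]) -> float:
    # judgments[i] is True if adversarial prompt i elicited unsafe output
    return sum(judgments) / len(judgments)

# e.g. 3 of 4 adversarial prompts jailbreak the model -> ASR = 0.75
print(attack_success_rate([True, True, False, True]))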
@SimoneTedeschi_
Simone Tedeschi
1 year
In our experiments, we extensively evaluated several open- and closed-source LLMs (e.g. #ChatGPT, #Llama and #Mistral), highlighting their strengths and weaknesses.
1
0
1
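The automated part of such an evaluation reduces to a small loop; a sketch, where generate and is_safe are hypothetical stand-ins for the model under test and an auxiliary safety judge:

def safety_score(prompts, generate, is_safe):
    # generate: prompt -> model response (the LLM under evaluation)
    # is_safe: (prompt, response) -> bool (an auxiliary safety classifier)
    safe = sum(is_safe(p, generate(p)) for p in prompts)
    return safe / len(prompts)  # fraction of responses judged safe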
@SimoneTedeschi_
Simone Tedeschi
1 year
For creating ALERT 🚨, we started by filtering the @AnthropicAI red-teaming-attempts dataset. Then, we: 1) automatically classified these prompts 2) created thousands of new prompts by means of templates 3) implemented adversarial attacks to make the benchmark more challenging
1
0
1
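A schematic of that three-step construction, with every helper a hypothetical stand-in (the actual implementation lives in the linked repo):

def build_benchmark(filtered_prompts, classify, templates, attacks):
    # 1) automatically assign each filtered prompt a risk category
    labeled = [(p, classify(p)) for p in filtered_prompts]
    # 2) expand coverage with template-generated prompts
    labeled += [(tpl.format(term), cat) for tpl, cat, term in templates]
    # 3) apply adversarial transformations to each prompt to make the
    #    benchmark more challenging
    adversarial = [(atk(p), cat) for p, cat in labeled for atk in attacks]
    return labeled, adversarial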
@SimoneTedeschi_
Simone Tedeschi
1 year
As a key design principle for ALERT 🚨, we developed a new fine-grained safety risk #taxonomy. This taxonomy serves as the foundation for the benchmark to provide detailed insights about a model’s behavior as well as inform targeted safety enhancements 🛡️
1
0
1
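A natural encoding of such a taxonomy is a two-level mapping from macro- to micro-categories, so that results can be reported per category; the names below are illustrative, not the paper's exact labels:

TAXONOMY = {
    "hate_speech": ["hate_ethnicity", "hate_religion", "hate_body"],
    "criminal_planning": ["crime_cyber", "crime_theft", "crime_propaganda"],
    "weapons": ["weapon_firearm", "weapon_biological"],
    "self_harm": ["self_harm_suicide", "self_harm_eating_disorder"],
}
# Per-category safety scores are then a simple group-by over labeled prompts.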
@SimoneTedeschi_
Simone Tedeschi
1 year
ALERT 🚨 is a new comprehensive #benchmark for assessing #LLMs’ safety through #redteaming 🔎 It consists of about 45k prompts, both standard and adversarial ones. Our automated evaluation methodology, together with the benchmark, constitutes the ALERT framework. 🧵1/n
1
1
1
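For anyone wanting to run it, a minimal sketch of loading the benchmark, assuming it is published on the Hugging Face Hub as Babelscape/ALERT with separate standard and adversarial configurations (the exact dataset IDs, config names, and fields are in the repo linked in the thread):

from datasets import load_dataset

standard = load_dataset("Babelscape/ALERT", "alert", split="test")  # assumed config
adversarial = load_dataset("Babelscape/ALERT", "alert_adversarial", split="test")  # assumed
print(standard[0])  # e.g. a prompt plus its risk-taxonomy category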
@babelscape
Babelscape
1 year
We are proud to share that our paper, "CNER: Concept and Named Entity Recognition", a joint work with @SapienzaNLP, has been presented at #NAACL24! 🥳 Looking forward to engaging with the community. #NAACL2024 #AI #NLProc #Research #NER
0
1
5
@BarbaraMcGilli
Barbara McGillivray
2 years
Iacopo Ghinassi just presented our paper on Latin word sense disambiguation @LrecColing: we used language pivoting on English to boost the task on Latin. More research on this to come, watch this space!
1
4
12
@mattshumer_
Matt Shumer
2 years
The dataset is everything. Great read: https://t.co/snGcPx0M16
109
548
3K