Mahed Mousavi @mahedmousavi X Profile

Mahed Mousavi

@mahedmousavi

Followers

151

Following

7K

Media

11

Statuses

365

Program Chair @IWSDSmeeting '26 Assistant Professor (RTD-a) @UniTrento_DISI Computational Linguistics, Conversational AI

Joined September 2013

Don't wanna be here? Send us removal request.

Mahed Mousavi

@mahedmousavi

3 days

RT @iwsdsmeeting: The Call for Papers and Sponsorship is now open.

0

3

0

Mahed Mousavi

@mahedmousavi

3 days

RT @hafezeh_tarikhi: علی خامنه‌ای، ۲۳شهریور۱۳۸۶:.مردم عراق مشکل برق دارند، مشکل آب سالم دارند، این‌ها جواب لازم دارد. اشغال‌گرها باید جواب….

0

3K

0

Mahed Mousavi

@mahedmousavi

10 days

RT @raffagbernardi: @iwsdsmeeting @UniTrento @UniTrento_DISI Enjoy your summer holiday and then get ready for the IWSD deadline: *October….

0

2

0

Mahed Mousavi

@mahedmousavi

10 days

RT @iwsdsmeeting: 🎉.We’re excited to announce IWSDS 2026 will take place Feb 26 – Mar 1, 2026 in Trento🇮🇹! 🏔️✨.📢 Call for Papers is now ope….

sites.google.com

Welcome

0

10

0

Mahed Mousavi

@mahedmousavi

14 days

RT @layerlens_ai: 🧠 High benchmark scores ≠ real-world reasoning. We teamed up with @mahedmousavi to explore why static evaluations miss th….

0

3

0

Mahed Mousavi

@mahedmousavi

19 days

RT @PahlaviReza: یک‌ هفته پس از رونمایی از پلتفرم #همکاری_ملی، به عنوان یک راه ارتباطی امن برای نیروهای نظامی، امنیتی، انتظامی و دولتی، نزد….

0

11K

0

Mahed Mousavi

@mahedmousavi

22 days

RT @layerlens_ai: A must-read from @mahedmousavi et al. just dropped on arXiv: It confirms what many in the evalua….

arxiv.org

We conduct a systematic audit of three widely used reasoning benchmarks, SocialIQa, FauxPas-EAI, and ToMi, and uncover pervasive flaws in both benchmark items and evaluation methodology. Using...

0

1

0

Mahed Mousavi

@mahedmousavi

22 days

🗑️Garbage In, #Reasoning Out?.High scores🎢 for reasoning in #LLMs? .BUT! What if your fav benchmark is just. BROKEN?.We audit popular reasoning benchmarks & studied how robust LLM reasoning is wrt input uncertainty. 📄 Paper:

arxiv.org

We conduct a systematic audit of three widely used reasoning benchmarks, SocialIQa, FauxPas-EAI, and ToMi, and uncover pervasive flaws in both benchmark items and evaluation methodology. Using...

Mahed Mousavi

@mahedmousavi

1 month

well, the "something" they do is too surface-level to be significant. Garbage in, Briliance Out?.gonna put the paper on arxiv soon btw

1

5

12

Mahed Mousavi

@mahedmousavi

1 month

well, the "something" they do is too surface-level to be significant. Garbage in, Briliance Out?.gonna put the paper on arxiv soon btw

(((ل()(ل() 'yoav))))👾

@yoavgo

2 months

of course LLMs don't "really reason". because if you ask people what is "reasoning" you will get many different answers, either very narrow or technical, or very broad and useless. and LLMs don't match the narrow ones, and the broad ones are too broad. the LLMs do. something.

0

Mahed Mousavi

@mahedmousavi

3 months

⁉️Does "Blocking" put a stop to harassment in VR?.❓Can "Blocking" be a harassment tool itself?.Check out our new paper!!.🗞️"Investigating the Use and Perception of Blocking Feature in Social Virtual Reality Spaces: A Study on Discussion Forums".

dl.acm.org

This study explores the use of 'blocking' as a safety tool in social VR platforms, a feature widely implemented to protect users from harassment. Despite its prevalence, little research investigates...

0

Mahed Mousavi

@mahedmousavi

3 months

RT @sislab7: 🎉 Congrats to @SimoneAlghisi for winning the Best Poster Award🥇 at #ICTDays for Dyknow🦕 @UniTrento_DISI ! 🏆👏💡🎉 .

0

4

0

Mahed Mousavi

@mahedmousavi

4 months

RT @maryamsajedinia: سال نو شد، ۱۴۰۴❤️

0

2

0

Mahed Mousavi

@mahedmousavi

5 months

❓ Can we fix the fragmented #knowledge within the #LLM? . 💡 We tried with a soft #neurosymbolic approach. 🗞️ New preprint:

0

2

Mahed Mousavi

@mahedmousavi

8 months

We will present 🦕Dyknow @ #EMNLP2024 🏖️ today! Come and chat with us! . 📍In-Person Poster Session D, Riverfront Hall.⏲️10:30 - 12:00

SISLab

@sislab7

9 months

SISLab will participate @ #EMNLP2024 with 2 Articles 🧵(2/2): .2. Is Your LLM Outdated? Can you fix it? .Look for "DyKnow🦕" to find out!!! .🗞️.📥 Repo: #knowledge #RAG #LLM #AI #NLP.

0

1

12

Mahed Mousavi

@mahedmousavi

9 months

RT @sislab7: SISLab will participate @ #EMNLP2024 with 2 Articles 🧵(2/2): .2. Is Your LLM Outdated? Can you fix it? .Look for "DyKnow🦕" t….

github.com

Repository for "DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs" presented @ EMNLP 2024 - sislab-unitn/DyKnow

0

1

0

Mahed Mousavi

@mahedmousavi

10 months

solving problems using BERT that can be solved by simple CRF is a skill issue.

merve

@mervenoyann

10 months

solving problems using LLMs that can be solved by fine-tuning BERT is a skill issue.

0

Mahed Mousavi

@mahedmousavi

10 months

📢DyKnow is gonna be presented @ EMNLP'24!!🎉.

Mahed Mousavi

@mahedmousavi

1 year

We present DyKnow🦕, a Dynamic Knowledge benchmark where we show the (high) percentage of outdated knowledge & inconsistent responses in 18 open and closed LLMs! .🗞 Paper: 📥 Repo:

0

1

9

Mahed Mousavi

@mahedmousavi

11 months

#instruction_tuned #LLMs demonstrate a comparatively higher #prompt agreement and output consistency

Mahed Mousavi

@mahedmousavi

1 year

We present DyKnow🦕, a Dynamic Knowledge benchmark where we show the (high) percentage of outdated knowledge & inconsistent responses in 18 open and closed LLMs! .🗞 Paper: 📥 Repo:

0

4

Mahed Mousavi

@mahedmousavi

11 months

🗻🏔️DOLOMITES @ #ACL2024NLP 🏔️🗻.🚀Our cost-effective approach for low-resource settings in #SMM4H'24 challenge ranked 2nd🥈 in T4 (EntityRecognition) and 3rd🥉 in T6 (InfoExtraction) @ACL2024! 🌟 .🗞️: #AI #LLMs #ACL2024.

0

7

Mahed Mousavi

@mahedmousavi

1 year

RT @m_guerini: This is crazy. At this point I think it is necessary to put a limitation on appendix pages.

0

1

0