Vyas Raina @vyasraina_nlp X Profile

Vyas Raina

@vyasraina_nlp

Followers

8

Following

9

Media

0

Statuses

6

Joined May 2023

Jonathan Roberts

@JRobertsAI

8 months

Is computer vision “solved”? Not yet. Current models score 0% on ZeroBench 🧵1/6

58

254

3K

Ivaxi Sheth

@ivakshi_s

1 year

Excited to share that our work on LLM Task Switch has been accepted at #EMNLP2024 @emnlpmeeting Main! Check out the paper: https://t.co/6r3GTGE7VV And the GitHub:

github.com

[EMNLP'24] Evaluating LLM performance and sensitivity when there is a "task-switch". Code for "LLM Task Interference: An Initial Study on the Impact of Task-Switc...

Ivaxi Sheth

@ivakshi_s

2 years

1/N 🧵🚀 Ask an LLM a maths question, and it does well. Ask it the same question after a conversation about sentiment, and it fails: we have a problem... Check out our work on task-switch: https://t.co/6r3GTGE7VV With @AkashGu30808281, @vyasraina_nlp, Mark Gales, @mariojfritz

2

5

36

Ivaxi Sheth

@ivakshi_s

1 year

Excited to share our new work! We explore how Large Language Models (LLMs) can be used to hypothesize missing causal variables in scientific discovery! Our study systematically evaluates hypothesis generation across different tasks and assumptions. 1/n

1

9

46

Vyas Raina

@vyasraina_nlp

2 years

As LLMs move from research into real-world deployment, it is more important than ever to ensure they operate as desired in situations not typically assessed by benchmarks. This work shows that LLMs are not yet ready to be all-in-one agents. #LLMs #GPT4 #NLP2024

Ivaxi Sheth

@ivakshi_s

2 years

1/N 🧵🚀 Ask an LLM a maths question, and it does well. Ask it the same question after a conversation about sentiment, and it fails: we have a problem... Check out our work on task-switch: https://t.co/6r3GTGE7VV With @AkashGu30808281, @vyasraina_nlp, Mark Gales, @mariojfritz

0

2

1

Ivaxi Sheth

@ivakshi_s

2 years

1/N 🧵🚀 Ask an LLM a maths question, and it does well. Ask it the same question after a conversation about sentiment, and it fails: we have a problem... Check out our work on task-switch: https://t.co/6r3GTGE7VV With @AkashGu30808281, @vyasraina_nlp, Mark Gales, @mariojfritz

5

11

42