
Amit Sharma
@amt_shrma
4K Followers · 2K Following · 54 Media · 1K Statuses
Researcher @MSFTResearch. Co-founder pywhy/dowhy. Work on causality & machine learning. Searching for a path to causal AI https://t.co/tn9kMAmlKw
Bengaluru
Joined October 2010
New paper: On the unreasonable effectiveness of LLMs for causal inference. GPT4 achieves new SoTA on a wide range of causal tasks: graph discovery (97%, 13 pts gain), counterfactual reasoning (92%, 20 pts gain) & actual causality. How is this possible? 🧵
arxiv.org
The causal capabilities of large language models (LLMs) are a matter of significant debate, with critical implications for the use of LLMs in societally impactful domains such as medicine,...
Overall, the benchmark is challenging & motivates improvements to the search and reasoning process. Joint w/ @naga86 @abhinav_java, @ashmitkx, Sukruta @MSFTResearch. Big Q: What would be the key to improving performance further?
huggingface.co
We also quantify the search, branching, and backtracking behavior using models' reasoning traces. Among DR models, the OpenAI model tends to have the highest number of branching and backtracking events. The number of searches averages between 20 and 40 per query.
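The paper's actual trace format is not shown here, so the bracketed event markers below are an assumption; this is a minimal sketch of how search, branching, and backtracking events could be counted from a reasoning trace:

```python
import re

def count_events(trace: str) -> dict:
    """Count occurrences of (hypothetical) event markers in a reasoning trace.

    Assumes events appear as literal tokens like [SEARCH], [BRANCH],
    [BACKTRACK]; real provider traces will need their own parsers.
    """
    return {
        kind: len(re.findall(rf"\[{kind}\]", trace))
        for kind in ("SEARCH", "BRANCH", "BACKTRACK")
    }

trace = "[SEARCH] q1 [BRANCH] [SEARCH] q2 [BACKTRACK] [SEARCH] q3"
print(count_events(trace))  # {'SEARCH': 3, 'BRANCH': 1, 'BACKTRACK': 1}
```

Aggregating these per-query counts across a model's runs gives the 20–40 searches-per-query statistic reported above.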
Finally, if you are curious, the OpenAI DR model performs best among SoTA models, with an F1 of 0.55, followed by Perplexity DR. Both needle-in-a-haystack and broad search tasks are challenging, the hardest being materials identification. Non-reasoning models are unable to do well.
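For intuition on what an F1 of 0.55 means at the claim level, here is a minimal sketch of claim-level F1 under a simplifying assumption of exact-match claims (the paper's actual matching procedure may be softer, e.g. LLM-judged):

```python
def claim_f1(predicted: set, reference: set) -> float:
    """F1 = harmonic mean of precision and recall over matched claims."""
    if not predicted or not reference:
        return 0.0
    tp = len(predicted & reference)  # claims the model got right
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)

# 5 predicted claims, 6 reference claims, 3 correct:
# precision = 0.6, recall = 0.5
print(round(claim_f1({"a", "b", "c", "d", "e"},
                     {"a", "b", "c", "x", "y", "z"}), 3))  # 0.545
```

So an F1 near 0.55 roughly corresponds to recovering about half the reference claims while keeping a comparable precision.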
Other tasks include prior art search--is a given idea novel?--and finding datasets that satisfy certain properties, another useful task for scientists. We also have general interest tasks such as cultural awards & flight incidents. Each task includes output claims for evaluation.
For example, a reasoning problem may ask for properties of a material mentioned in a scientific article. We invert the problem to ask: which material has exactly these properties? With some work, the properties can be extended so that the material(s) can be uniquely identified.
Our key idea: problem inversion. Take an existing long-context or document reasoning problem and invert it, turning its answer into the question! Now the task is to search the web to find this info. This allows easy addition of new DR problems: the 1st live benchmark for DR.
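A minimal sketch of the inversion idea, with a hypothetical helper and data format (the benchmark's real schema and pipeline are not shown in this thread):

```python
def invert_problem(document_qa: dict) -> dict:
    """Turn a document-grounded QA pair into a web-search (deep research) task.

    Hypothetical: the original problem asks about a known document; the
    inverted problem states the answer's identifying properties and asks
    the model to rediscover the entity from the open web.
    """
    return {
        "query": f"Find the entity such that: {document_qa['answer_properties']}",
        "reference_claims": [document_qa["answer"]],
    }

original = {
    "answer": "Graphene",
    "answer_properties": "a 2D carbon allotrope with very high tensile strength",
}
print(invert_problem(original)["query"])
```

Because new documents (and hence new QA pairs) appear continuously, inversion gives a steady supply of fresh problems, which is what makes the benchmark "live".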
But there’s one issue. Whether a task is DR depends on the corpus. “Oscar movies from books by women authors” may be a DR query, but if a webpage comes up providing exactly this info, it no longer requires research. So how to benchmark DR models as the web continually updates?
Thus, a deep research task can be defined as a <query, claims> tuple. "Claims" can be a nested list, with subclaims supporting each claim. Key Insight: As long as a model can generate the claims, writing the report is a long-form generation task that can be evaluated separately.
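The <query, claims> tuple with nested subclaims can be sketched as a simple data structure; the class and field names here are illustrative, not the benchmark's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    subclaims: list["Claim"] = field(default_factory=list)  # nested support

@dataclass
class DeepResearchTask:
    query: str
    claims: list[Claim]  # reference claims a correct report must contain

def flatten(claims):
    """Depth-first traversal of a nested claim list."""
    for c in claims:
        yield c.text
        yield from flatten(c.subclaims)

task = DeepResearchTask(
    query="Which Oscar-winning films were adapted from books by women authors?",
    claims=[
        Claim(
            "Film X won Best Picture",
            [Claim("Film X is adapted from a novel by Author Y")],
        )
    ],
)
print(list(flatten(task.claims)))
```

Evaluation then reduces to matching a model's generated claims against this flattened reference set, independently of how the final report is written up.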
Another perspective: Deep research is like an extreme form of multi-hop QA. Some problems require intensive searching, some demand deep reasoning. Deep research combines both, corresponding to important scientific & business tasks, e.g., material identification & prior art search.
Our core thesis: The defining element of deep research is not the report, but the *information synthesis* process used to generate the claims within a report. And we show how the claim synthesis process can be objectively evaluated. LiveDRBench:
Deep research has emerged as a popular task with many recently released models. But beyond lengthy reports, what exactly defines the task? And how to quantify progress? [New Paper!] We provide an objective defn. centered on claim discovery & a 100-problem benchmark spanning…
RT @abhinav_java: 🚀 Meet FrugalRAG at #ICML2025 in Vancouver 🇨🇦!
📍 July 18 – VecDB Workshop, West 208–209
📍 July 19 – ES-FoMO Workshop, Eas…
RT @AniketVashisht8: Extremely happy to have our work on Teaching Transformers Causal Reasoning through Axiomatic Training accepted at ICML….
RT @sirbayes: @amt_shrma Sounds very cool. Here is link to paper (hard to find since it seems to be a TMLR paper, not an official ICLR pape….
PywhyLLM: Creating an API for language models to interact with causal methods and vice versa. v0.1 out, welcome feedback. If you are at #iclr2025, come check out our poster today at 10am-12:30pm.
What changes for causality research in the age of LLMs and what does not? Enjoyed this conversation with Alex Molak on how LLMs are accelerating causal discovery, how diverse environments can help causal agents learn, and how causality is critical for verifying AI actions. Link 👇
Job Alert: @MSFTResearch India is hiring postdocs! A chance to work with some amazing colleagues while doing world-class research. Apply here: DM me if interested in ML/reasoning/causality.
Excited to present Axiomatic Training at #NeurIPS2024, a new paradigm to teach causal reasoning to language models! I try to summarize what LLM systems can do today and what new training paradigms we need to improve their causal reasoning. Slides:
RT @CaLM_Workshop: We are happy 😁 to announce 📢 the First Workshop on Causality and Large Models (C♥️LM) at #NeurIPS2024. 📜 Submission dea…