Amit Sharma Profile
Amit Sharma

@amt_shrma

Followers: 4K · Following: 2K · Media: 54 · Statuses: 1K

Researcher @MSFTResearch. Co-founder pywhy/dowhy. Work on causality & machine learning. Searching for a path to causal AI https://t.co/tn9kMAmlKw

Bengaluru
Joined October 2010
@amt_shrma
Amit Sharma
2 years
New paper: On the unreasonable effectiveness of LLMs for causal inference. GPT-4 achieves new SoTA on a wide range of causal tasks: graph discovery (97%, 13-pt gain), counterfactual reasoning (92%, 20-pt gain) & actual causality. How is this possible? 🧵
arxiv.org
The causal capabilities of large language models (LLMs) are a matter of significant debate, with critical implications for the use of LLMs in societally impactful domains such as medicine,...
@amt_shrma
Amit Sharma
25 days
Overall, the benchmark is challenging & motivates improvements to the search and reasoning process. Joint w/ @naga86 @abhinav_java, @ashmitkx, Sukruta @MSFTResearch. Big Q: what would be the key to improving performance further?
huggingface.co
@amt_shrma
Amit Sharma
25 days
We also quantify the search, branching, and backtracking behavior using models' reasoning traces. Among DR models, the OpenAI model tends to have the highest number of branching and backtracking events. The number of searches averages between 20 and 40 per query.
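The trace analysis above can be sketched as a simple event tally. The trace format and the `type` labels below are assumptions for illustration, not the paper's actual schema:

```python
# Hypothetical sketch: tallying search, branching, and backtracking
# events from a model's reasoning trace (assumed to be pre-labeled steps).

def count_trace_events(trace_steps):
    """Count each event type in a list of labeled reasoning steps."""
    counts = {"search": 0, "branch": 0, "backtrack": 0}
    for step in trace_steps:
        if step["type"] in counts:
            counts[step["type"]] += 1
    return counts

trace = [
    {"type": "search", "query": "oscar films adapted from novels"},
    {"type": "branch", "note": "also try filtering by author"},
    {"type": "search", "query": "novels by women adapted to film"},
    {"type": "backtrack", "note": "first filter too broad, revisit"},
    {"type": "search", "query": "best picture winners source material"},
]

print(count_trace_events(trace))
# {'search': 3, 'branch': 1, 'backtrack': 1}
```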
@amt_shrma
Amit Sharma
25 days
Finally, if you are curious, the OpenAI DR model performs best among SoTA models, with an F1 of 0.55, followed by Perplexity DR. Both needle-in-a-haystack and broad search tasks are challenging, the hardest being materials identification. Non-reasoning models perform poorly.
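As a rough sketch of how claim-level F1 could be computed, assuming exact matching between predicted and gold claim sets (a simplification; the benchmark's actual matching procedure may be more lenient):

```python
# Illustrative claim-level F1 over sets of claim strings.

def claim_f1(predicted, gold):
    """F1 between a predicted and a gold set of claims (exact match)."""
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)          # correctly recovered claims
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up example: 2 of 4 predicted claims match 3 gold claims.
gold = {"material: MoS2", "bandgap: 1.8 eV", "structure: monolayer"}
pred = {"material: MoS2", "bandgap: 1.8 eV", "structure: bulk", "year: 2010"}
print(round(claim_f1(pred, gold), 2))  # 0.57
```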
@amt_shrma
Amit Sharma
25 days
Other tasks include prior art search (is a given idea novel?) and finding datasets that satisfy certain properties, another useful task for scientists. We also include general-interest tasks such as cultural awards & flight incidents. Each task includes output claims for evaluation.
@amt_shrma
Amit Sharma
25 days
For example, a reasoning problem may ask for properties of a material mentioned in a scientific article. We invert the problem to ask: which material has exactly these properties? With some work, the properties can be extended so that the material(s) can be uniquely identified.
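The uniqueness requirement can be sketched as a filter over a property table: keep extending the property set until exactly one entity matches. The materials table and property names here are made up for demonstration:

```python
# Hypothetical materials table; real tasks would draw properties
# from the source scientific article.
MATERIALS = {
    "graphene": {"dimensionality": "2D", "bandgap_eV": 0.0, "element": "C"},
    "MoS2":     {"dimensionality": "2D", "bandgap_eV": 1.8, "element": "Mo,S"},
    "diamond":  {"dimensionality": "3D", "bandgap_eV": 5.5, "element": "C"},
}

def matches(properties):
    """Return all materials consistent with the given properties."""
    return [name for name, props in MATERIALS.items()
            if all(props.get(k) == v for k, v in properties.items())]

# Too few properties: ambiguous, so not yet a valid task.
print(matches({"dimensionality": "2D"}))                      # ['graphene', 'MoS2']
# Extended properties: uniquely identifying, so it can become a task.
print(matches({"dimensionality": "2D", "bandgap_eV": 1.8}))   # ['MoS2']
```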
@amt_shrma
Amit Sharma
25 days
Our key idea: problem inversion. Take an existing long-context or document reasoning problem and invert it, turning its answer into the question! Now the task is to search the web to find this info. This allows easy addition of new DR problems: the first live benchmark for DR.
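A minimal sketch of problem inversion as a data transformation: the original answer's distinguishing facts become the query, and the answer (plus its facts) becomes the gold claim set. The field names are assumptions for illustration, not the benchmark's schema:

```python
def invert(qa_item):
    """Turn a (document, answer) reasoning item into a web-search task."""
    facts = "; ".join(qa_item["answer_facts"])
    return {
        "query": f"Identify the entity with these properties: {facts}",
        "gold_claims": [qa_item["answer"]] + qa_item["answer_facts"],
        "source": qa_item["document_url"],
    }

# Made-up item; the bandgap facts are only illustrative.
item = {
    "document_url": "https://example.org/some-paper",
    "answer": "monolayer MoS2",
    "answer_facts": ["2D semiconductor", "direct bandgap of 1.8 eV"],
}
task = invert(item)
print(task["query"])
# Identify the entity with these properties: 2D semiconductor; direct bandgap of 1.8 eV
```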
@amt_shrma
Amit Sharma
25 days
But there’s one issue. Whether a task is DR depends on the corpus. “Oscar movies from books by women authors” may be a DR query, but if a webpage comes up providing exactly this info, it no longer requires research. So how do we benchmark DR models as the web continually updates?
@amt_shrma
Amit Sharma
25 days
Thus, a deep research task can be defined as a <query, claims> tuple. "Claims" can be a nested list, with subclaims supporting each claim. Key Insight: As long as a model can generate the claims, writing the report is a long-form generation task that can be evaluated separately.
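The <query, claims> tuple with nested subclaims could be represented as below; the dataclass layout is an assumption about how such a benchmark item might be stored, not the benchmark's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    subclaims: list = field(default_factory=list)  # nested list of Claim

@dataclass
class DRTask:
    query: str    # the deep research question
    claims: list  # top-level claims the report must establish

# Illustrative instance, reusing the earlier example query.
task = DRTask(
    query="Which Oscar-winning movies were adapted from books by women authors?",
    claims=[
        Claim(
            "Little Women (2019) qualifies",
            subclaims=[
                Claim("it won the Academy Award for Best Costume Design"),
                Claim("it is adapted from the novel by Louisa May Alcott"),
            ],
        ),
    ],
)
print(len(task.claims[0].subclaims))  # 2
```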
@amt_shrma
Amit Sharma
25 days
Another perspective: Deep research is like an extreme form of multi-hop QA. Some problems require intensive searching, some demand deep reasoning. Deep research combines both, corresponding to important scientific & business tasks, e.g., material identification & prior art search.
@amt_shrma
Amit Sharma
25 days
Our core thesis: The defining element of deep research is not the report, but the *information synthesis* process used to generate the claims within a report. And we show how the claim synthesis process can be objectively evaluated. LiveDRBench:
@amt_shrma
Amit Sharma
25 days
Deep research has emerged as a popular task with many recently released models. But beyond lengthy reports, what exactly defines the task? And how do we quantify progress? [New Paper!] We provide an objective definition centered on claim discovery & a 100-problem benchmark spanning…
@amt_shrma
Amit Sharma
2 months
RT @abhinav_java: 🚀 Meet FrugalRAG at #ICML2025 in Vancouver 🇨🇦! 📍 July 18 – VecDB Workshop, West 208–209. 📍 July 19 – ES-FoMO Workshop, Eas…
@amt_shrma
Amit Sharma
4 months
RT @AniketVashisht8: Extremely happy to have our work on Teaching Transformers Causal Reasoning through Axiomatic Training accepted at ICML…
@amt_shrma
Amit Sharma
4 months
PywhyLLM: creating an API for language models to interact with causal methods and vice versa. v0.1 is out; feedback welcome. If you are at #iclr2025, come check out our poster today from 10am to 12:30pm.
@amt_shrma
Amit Sharma
5 months
Podcast link:
@amt_shrma
Amit Sharma
5 months
What changes for causality research in the age of LLMs and what does not? Enjoyed this conversation with Alex Molak on how LLMs are accelerating causal discovery, how diverse environments can help causal agents learn, and how causality is critical for verifying AI actions. Link 👇
@amt_shrma
Amit Sharma
7 months
Job Alert: @MSFTResearch India is hiring postdocs! A chance to work with some amazing colleagues while doing world-class research. Apply here: DM me if interested in ML/reasoning/causality.
@amt_shrma
Amit Sharma
9 months
Excited to present Axiomatic Training at #NeurIPS2024, a new paradigm to teach causal reasoning to language models! I try to summarize what LLM systems can do today and what new training paradigms we need to improve their causal reasoning. Slides:
@amt_shrma
Amit Sharma
1 year
RT @CaLM_Workshop: We are happy 😁 to announce 📢 the First Workshop on Causality and Large Models (C♥️LM) at #NeurIPS2024. 📜 Submission dea…