Stanford OVAL
@StanfordOVAL
Followers: 2K · Following: 48 · Media: 13 · Statuses: 231
A research lab developing Expert AI, training large language models to prevent hallucination and enable knowledge-oriented, multilingual and multimodal tasks.
Stanford, CA
Joined October 2018
            
           Excited to share our EMNLP 2025 (Main) paper: "Detecting Corpus-Level Knowledge Inconsistencies in Wikipedia with LLMs." How consistent is English Wikipedia? With the help of LLMs, we estimate 80M+ internally inconsistent facts (~3.3%). Small in percentage, large at corpus scale. 
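The tweet's two figures imply a rough corpus size: if 80M+ inconsistent facts are ~3.3% of the total, English Wikipedia carries on the order of 2.4B facts. A back-of-the-envelope check (both inputs are taken from the tweet; the implied total is an extrapolation, not a number from the paper):

```python
# Back-of-the-envelope check of the tweet's figures (both taken as given).
inconsistent_facts = 80_000_000   # "80M+ internally inconsistent facts"
inconsistent_rate = 0.033         # "~3.3%"

# Implied total number of facts in the corpus.
total_facts = inconsistent_facts / inconsistent_rate
print(f"Implied corpus size: ~{total_facts / 1e9:.1f}B facts")  # ~2.4B
```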
          
                
              
Please register for the tutorial here:  https://t.co/Qf3t1pdbOe  Check out the workshop website:  https://t.co/5JdMYyMceS  Our pilot program, already embraced by over 400,000 users, generates Wikipedia-like articles through intelligent internet research:
          
            
            
                
              
Feb 14, 2025. Open & live-streamed tutorial: Transforming LLMs into Reliable Knowledge Assistants. Discover how to harness LLMs to create trustworthy and efficient knowledge assistants for various informational needs on your own knowledge corpus. This tutorial will discuss and
          
                
              
Announcing the first workshop on a Public AI Assistant to World Wide Knowledge (WWK), Feb 13-14, 2025 @Stanford, sponsored by the @SloanFoundation and @StanfordHAI. Feb 13, 2025. Invitation-only in-person and live-streamed: The Public AI Assistant Initiative. Join us in the
          
                
              
             Democratizing AI-Assisted Access to Knowledge! The Stanford OVAL Lab is leading an initiative to create a public AI Assistant that democratizes access to the world's knowledge. Our pilot program, already embraced by over 400,000 users, generates Wikipedia-like articles through 
          
                
              
             🌱Excited to introduce SPINACH, a Knowledge Base Question Answering agent & dataset on Wikidata, presented at EMNLP 2024! It combines LLMs, semantic parsing and graph traversal to set a new SOTA & is actively used by the Wikidata community. 
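SPINACH interleaves LLM calls with semantic parsing and Wikidata graph traversal. Purely as a toy illustration of the graph-traversal half, here is a minimal breadth-first search for a property path over a hypothetical Wikidata-style triple list (the triples and most IDs below are invented stand-ins, loosely following Wikidata's Q/P naming; this is not SPINACH's actual code):

```python
from collections import deque

# Toy Wikidata-style knowledge graph: (subject, property, object) triples.
# IDs are illustrative only, loosely echoing Wikidata's Q/P conventions.
TRIPLES = [
    ("Q30", "P36", "Q61"),     # country -> capital
    ("Q61", "P17", "Q30"),     # city -> country
    ("Q61", "P1376", "Q30"),   # capital of
    ("Q30", "P35", "Q1058"),   # head of state (placeholder object ID)
]

def neighbors(entity):
    """Edges reachable from `entity` in one hop."""
    return [(p, o) for s, p, o in TRIPLES if s == entity]

def find_path(start, goal):
    """Breadth-first search for a property path connecting two entities."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for prop, nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [prop]))
    return None

print(find_path("Q30", "Q1058"))  # ['P35']
```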
          
                
              
             Announcing WikiChat v2.0! 🌎Multilingual support for 🇺🇸🇨🇳🇪🇸🇵🇹🇷🇺🇩🇪🇮🇷🇯🇵🇫🇷🇮🇹 🔎Improved info retrieval with BGE-M3 embeddings & @qdrant_engine ⚡Optimized pipeline and expanded LLM support 🔗Compatible with @LangChainAI and @chainlit_io Code:  https://t.co/O76IHvygw0 
            #NLProc
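v2.0's retrieval upgrade swaps in dense vectors (BGE-M3 embeddings stored in Qdrant). As a toy sketch of the underlying nearest-neighbor step only, here is cosine-similarity search over hand-made 3-d vectors; the vectors and doc IDs are invented stand-ins for real embeddings:

```python
import math

# Toy dense retrieval: cosine similarity over hand-made 3-d vectors.
# WikiChat v2.0 uses BGE-M3 embeddings in Qdrant; the vectors below are
# invented stand-ins just to show the nearest-neighbor step.
DOCS = {
    "paris":   [0.9, 0.1, 0.0],
    "python":  [0.0, 0.8, 0.6],
    "everest": [0.1, 0.0, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query_vec, docs):
    """Return the doc id whose vector is most similar to the query."""
    return max(docs, key=lambda d: cosine(query_vec, docs[d]))

print(nearest([0.85, 0.05, 0.1], DOCS))  # 'paris'
```

A production system replaces the linear scan with an approximate-nearest-neighbor index, which is exactly what the vector database provides.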
          
          
                
              
Big congrats to the WikiChat team led by @sina_semnani!
           The @Wikimedia Research Award of the Year 2024 goes to "WikiChat: Stopping the hallucination of large language model chatbots by few-shot grounding on Wikipedia" ⚡ 📜  https://t.co/d2M8Qrarkw 
            
            
                
              
             3 OVAL projects are awarded 2024-2025 Magic Grants! “African History from the Bottom Up with LLM-Augmented Agents”, @sina_semnani et al. “Cross-Lingual Multi-Perspective News”, @liamjxu et al. “DataTalk: All Documents and Data, All at Once, All Verified”, @ShichengGLiu et al. 
           The happiest day of our year! Introducing the @BrownInstitute's 2024-2025 cohort of Magic Grant winners! 
            
                
              
             Can we teach LLMs to write long articles from scratch, grounded in trustworthy sources? Do Wikipedia editors think this can assist them? 📣Announcing STORM, a system that writes Wikipedia-like articles based on Internet search. I now use STORM in my daily research!🧵 
          
                
              
             We introduce WikiChat, an LLM-based chatbot that almost never hallucinates, has high conversationality and low latency. Read more in our #EMNLP2023 findings paper  https://t.co/F9clNBjgLb  Check out our demo:  https://t.co/XCMZJmT7vg  Or try our code:  https://t.co/O76IHvygw0 
            #NLProc
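WikiChat's actual pipeline chains several LLM stages over full Wikipedia retrieval. Purely to illustrate the few-shot-grounding idea (retrieve evidence first, then answer only from it), here is a toy sketch; the mini-corpus and the keyword-overlap scorer are invented for the example:

```python
import re

# Toy illustration of retrieval-grounded answering: score passages by
# keyword overlap, then answer only from the best-matching passage.
# The mini-corpus is invented; real WikiChat retrieves from Wikipedia
# and runs several LLM stages (claim extraction, fact-checking, etc.).
CORPUS = {
    "Mount Everest": "Mount Everest is Earth's highest mountain above sea level.",
    "Nile": "The Nile is a major river flowing through northeastern Africa.",
}

def retrieve(question, corpus):
    """Return the passage whose words overlap most with the question."""
    q_words = set(re.findall(r"\w+", question.lower()))
    def score(text):
        return len(q_words & set(re.findall(r"\w+", text.lower())))
    return max(corpus.values(), key=score)

def grounded_answer(question):
    passage = retrieve(question, CORPUS)
    # A real system would now prompt an LLM with the passage as evidence;
    # here we simply return the supporting passage verbatim.
    return passage

print(grounded_answer("What is the highest mountain?"))
```

Constraining the generation step to retrieved evidence is what lets the system refuse to state facts its corpus does not support, which is the core anti-hallucination move.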
          
          
                
              
             Stanford’s CS 224V is hosting the final project expo on Wed, Dec. 6th, 3:00 - 5:30pm in Gates CS Building. ~50 teams worked to create LLM-powered conversational assistants. This is a great chance to meet top students in conversational assistant technology!  https://t.co/mkBJTxKWDg 
          
          
                
              
"WikiChat: Combating Hallucination of Large Language Models by Few-Shot Grounding on @Wikipedia" (Semnani et al., 2023)  https://t.co/v8RT6CnZJE 
          
          
                
              
             "Wikidata, with its over 12 billion facts, can be used to ground LLMs to improve their factuality," reducing hallucinations  https://t.co/a1CjRxW2wJ 
             https://t.co/VDqrVG4DXx 
            #SPARQL
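The tweet ties grounding to Wikidata's facts, which are reached via SPARQL. Below is a minimal example of building (not executing) such a query; `P36` ("capital") and `Q30` ("United States") are real Wikidata identifiers, and the query uses the standard `wd:`/`wdt:` prefixes and label service of the public endpoint:

```python
# Building (not executing) a SPARQL query of the kind used to ground an
# LLM answer in Wikidata. P36 = "capital", Q30 = "United States".
def capital_query(country_qid):
    return (
        "SELECT ?capital ?capitalLabel WHERE {\n"
        f"  wd:{country_qid} wdt:P36 ?capital.\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )

query = capital_query("Q30")
print(query)
```

In practice the string would be POSTed to https://query.wikidata.org/sparql and the JSON bindings fed back to the model as evidence.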
          
          
                
              
Overall, our findings suggest that synthesized data can be used to effectively augment a small amount of manually annotated data, yielding much higher accuracy than previously possible. 
          
                
              
             We train a contextual semantic parser using our strategy, and obtain 79% turn-by-turn exact match accuracy on a test set manually reannotated by experts. 
          
                
              
             Evaluating on the MultiWOZ dataset, we find that ThingTalk can represent precisely 98% of the test turns, while the simulator can emulate 85% of the validation set. 
          
                
              
             The synthesized data is combined with a small amount of manually annotated data. As the manual annotation is limited, it can be performed by an expert, yielding much better quality in practice. 
          
                
              
             To tackle the annotation issue, we propose to synthesize a large dataset of dialogues, using the simulator followed by automatic paraphrasing from a large language model. 
          
                
              
As a formally executable representation with domain-independent semantics, ThingTalk is precise enough to build both an actual agent for MultiWOZ and a rule-based simulator that can generate realistic conversations across multiple domains.
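The thread above describes a three-step recipe: a rule-based simulator emits formal dialogues, an LLM paraphrases them for naturalness, and the synthetic pairs are mixed with a small expert-annotated set. A schematic sketch, where the templates and the paraphrase table are invented stand-ins (the real system emits ThingTalk programs and paraphrases with a large language model):

```python
import random

# Schematic sketch of the data-synthesis recipe from the thread:
# 1) a rule-based simulator emits (utterance, formal-representation) pairs,
# 2) an automatic paraphraser diversifies the utterances (stubbed here),
# 3) synthetic pairs are mixed with a small expert-annotated set.
# Templates and programs below are invented stand-ins, not ThingTalk.

def simulate(domain, value):
    """Rule-based simulator: template utterance + formal representation."""
    utterance = f"book a {domain} in {value}"
    program = f"@{domain}.make_booking(location={value!r})"
    return utterance, program

def paraphrase(utterance):
    """Stub for LLM paraphrasing: trivial lexical rewrites."""
    return utterance.replace("book", random.choice(["book", "reserve", "get"]))

random.seed(0)
synthetic = [(paraphrase(u), p)
             for u, p in (simulate("restaurant", city)
                          for city in ["Cambridge", "Palo Alto"])]
manual = [("find me a cheap hotel downtown",
           "@hotel.make_booking(location='downtown')")]  # expert-annotated

train_set = synthetic + manual
print(len(train_set), "training examples")
```

Keeping the manual slice small is what makes expert annotation affordable, while the simulator supplies coverage at scale.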
          
                
              