Stanford OVAL
@StanfordOVAL
Followers: 2K · Following: 48 · Media: 13 · Statuses: 231
A research lab developing Expert AI, training large language models to prevent hallucination and enable knowledge-oriented, multilingual and multimodal tasks.
Stanford, CA
Joined October 2018
            
           Excited to share our EMNLP 2025 (Main) paper: "Detecting Corpus-Level Knowledge Inconsistencies in Wikipedia with LLMs." How consistent is English Wikipedia? With the help of LLMs, we estimate 80M+ internally inconsistent facts (~3.3%). Small in percentage, large at corpus scale. 
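The tweet's two figures imply a rough corpus size: if 80M+ inconsistent facts are ~3.3% of the total, English Wikipedia carries on the order of 2.4B facts. A back-of-the-envelope check (both inputs are taken from the tweet; the implied total is an extrapolation, not a number from the paper):

```python
# Back-of-the-envelope check of the tweet's figures (both taken as given).
inconsistent_facts = 80_000_000   # "80M+ internally inconsistent facts"
inconsistent_rate = 0.033         # "~3.3%"

# Implied total number of facts in the corpus.
total_facts = inconsistent_facts / inconsistent_rate
print(f"Implied corpus size: ~{total_facts / 1e9:.1f}B facts")  # ~2.4B
```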
          
                
              
Please register for the tutorial here:  https://t.co/Qf3t1pdbOe  Check out the workshop website:  https://t.co/5JdMYyMceS  Our pilot program, already embraced by over 400,000 users, generates Wikipedia-like articles through intelligent internet research:
          
            
            
                
              
Feb 14, 2025. Open & live-streamed tutorial: Transforming LLMs into Reliable Knowledge Assistants. Discover how to harness LLMs to create trustworthy and efficient knowledge assistants for various informational needs on your own knowledge corpus. This tutorial will discuss and
          
                
              
Announcing the first workshop on a Public AI Assistant to World Wide Knowledge (WWK), Feb 13-14, 2025 @Stanford, sponsored by the @SloanFoundation and @StanfordHAI. Feb 13, 2025. Invitation-only in-person and live-streamed: The Public AI Assistant Initiative. Join us in the
          
                
              
             Democratizing AI-Assisted Access to Knowledge! The Stanford OVAL Lab is leading an initiative to create a public AI Assistant that democratizes access to the world's knowledge. Our pilot program, already embraced by over 400,000 users, generates Wikipedia-like articles through 
          
                
              
             🌱Excited to introduce SPINACH, a Knowledge Base Question Answering agent & dataset on Wikidata, presented at EMNLP 2024! It combines LLMs, semantic parsing and graph traversal to set a new SOTA & is actively used by the Wikidata community. 
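SPINACH interleaves LLM calls with semantic parsing and Wikidata graph traversal. Purely as a toy illustration of the graph-traversal half, here is a minimal breadth-first search for a property path over a hypothetical Wikidata-style triple list (the triples and most IDs below are invented stand-ins, loosely following Wikidata's Q/P naming; this is not SPINACH's actual code):

```python
from collections import deque

# Toy Wikidata-style knowledge graph: (subject, property, object) triples.
# IDs are illustrative only, loosely echoing Wikidata's Q/P conventions.
TRIPLES = [
    ("Q30", "P36", "Q61"),     # country -> capital
    ("Q61", "P17", "Q30"),     # city -> country
    ("Q61", "P1376", "Q30"),   # capital of
    ("Q30", "P35", "Q1058"),   # head of state (placeholder object ID)
]

def neighbors(entity):
    """Edges reachable from `entity` in one hop."""
    return [(p, o) for s, p, o in TRIPLES if s == entity]

def find_path(start, goal):
    """Breadth-first search for a property path connecting two entities."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for prop, nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [prop]))
    return None

print(find_path("Q30", "Q1058"))  # ['P35']
```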
          
                
              
             Announcing WikiChat v2.0! 🌎Multilingual support for 🇺🇸🇨🇳🇪🇸🇵🇹🇷🇺🇩🇪🇮🇷🇯🇵🇫🇷🇮🇹 🔎Improved info retrieval with BGE-M3 embeddings & @qdrant_engine ⚡Optimized pipeline and expanded LLM support 🔗Compatible with @LangChainAI and @chainlit_io Code:  https://t.co/O76IHvygw0 
            #NLProc
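v2.0's retrieval upgrade swaps in dense vectors (BGE-M3 embeddings stored in Qdrant). As a toy sketch of the underlying nearest-neighbor step only, here is cosine-similarity search over hand-made 3-d vectors; the vectors and doc IDs are invented stand-ins for real embeddings:

```python
import math

# Toy dense retrieval: cosine similarity over hand-made 3-d vectors.
# WikiChat v2.0 uses BGE-M3 embeddings in Qdrant; the vectors below are
# invented stand-ins just to show the nearest-neighbor step.
DOCS = {
    "paris":   [0.9, 0.1, 0.0],
    "python":  [0.0, 0.8, 0.6],
    "everest": [0.1, 0.0, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query_vec, docs):
    """Return the doc id whose vector is most similar to the query."""
    return max(docs, key=lambda d: cosine(query_vec, docs[d]))

print(nearest([0.85, 0.05, 0.1], DOCS))  # 'paris'
```

A production system replaces the linear scan with an approximate-nearest-neighbor index, which is exactly what the vector database provides.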
          
          
                
              
Big congrats to the WikiChat team led by @sina_semnani!
           The @Wikimedia Research Award of the Year 2024 goes to "WikiChat: Stopping the hallucination of large language model chatbots by few-shot grounding on Wikipedia" ⚡ 📜  https://t.co/d2M8Qrarkw 
            
            
                
              
             3 OVAL projects are awarded 2024-2025 Magic Grants! “African History from the Bottom Up with LLM-Augmented Agents”, @sina_semnani et al. “Cross-Lingual Multi-Perspective News”, @liamjxu et al. “DataTalk: All Documents and Data, All at Once, All Verified”, @ShichengGLiu et al. 
           The happiest day of our year! Introducing the @BrownInstitute's 2024-2025 cohort of Magic Grant winners! 
            
                
              
             Can we teach LLMs to write long articles from scratch, grounded in trustworthy sources? Do Wikipedia editors think this can assist them? 📣Announcing STORM, a system that writes Wikipedia-like articles based on Internet search. I now use STORM in my daily research!🧵 
          
                
              
             We introduce WikiChat, an LLM-based chatbot that almost never hallucinates, has high conversationality and low latency. Read more in our #EMNLP2023 findings paper  https://t.co/F9clNBjgLb  Check out our demo:  https://t.co/XCMZJmT7vg  Or try our code:  https://t.co/O76IHvygw0 
            #NLProc
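WikiChat's actual pipeline chains several LLM stages over full Wikipedia retrieval. Purely to illustrate the few-shot-grounding idea (retrieve evidence first, then answer only from it), here is a toy sketch; the mini-corpus and the keyword-overlap scorer are invented for the example:

```python
import re

# Toy illustration of retrieval-grounded answering: score passages by
# keyword overlap, then answer only from the best-matching passage.
# The mini-corpus is invented; real WikiChat retrieves from Wikipedia
# and runs several LLM stages (claim extraction, fact-checking, etc.).
CORPUS = {
    "Mount Everest": "Mount Everest is Earth's highest mountain above sea level.",
    "Nile": "The Nile is a major river flowing through northeastern Africa.",
}

def retrieve(question, corpus):
    """Return the passage whose words overlap most with the question."""
    q_words = set(re.findall(r"\w+", question.lower()))
    def score(text):
        return len(q_words & set(re.findall(r"\w+", text.lower())))
    return max(corpus.values(), key=score)

def grounded_answer(question):
    passage = retrieve(question, CORPUS)
    # A real system would now prompt an LLM with the passage as evidence;
    # here we simply return the supporting passage verbatim.
    return passage

print(grounded_answer("What is the highest mountain?"))
```

Constraining the generation step to retrieved evidence is what lets the system refuse to state facts its corpus does not support, which is the core anti-hallucination move.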
          
          
                
              
             Stanford’s CS 224V is hosting the final project expo on Wed, Dec. 6th, 3:00 - 5:30pm in Gates CS Building. ~50 teams worked to create LLM-powered conversational assistants. This is a great chance to meet top students in conversational assistant technology!  https://t.co/mkBJTxKWDg 
          
          
                
              
"WikiChat: Combating Hallucination of Large Language Models by Few-Shot Grounding on @Wikipedia" (Semnani et al., 2023)  https://t.co/v8RT6CnZJE 
          
          
                
              
             "Wikidata, with its over 12 billion facts, can be used to ground LLMs to improve their factuality," reducing hallucinations  https://t.co/a1CjRxW2wJ 
             https://t.co/VDqrVG4DXx 
            #SPARQL
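The tweet ties grounding to Wikidata's facts, which are reached via SPARQL. Below is a minimal example of building (not executing) such a query; `P36` ("capital") and `Q30` ("United States") are real Wikidata identifiers, and the query uses the standard `wd:`/`wdt:` prefixes and label service of the public endpoint:

```python
# Building (not executing) a SPARQL query of the kind used to ground an
# LLM answer in Wikidata. P36 = "capital", Q30 = "United States".
def capital_query(country_qid):
    return (
        "SELECT ?capital ?capitalLabel WHERE {\n"
        f"  wd:{country_qid} wdt:P36 ?capital.\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )

query = capital_query("Q30")
print(query)
```

In practice the string would be POSTed to https://query.wikidata.org/sparql and the JSON bindings fed back to the model as evidence.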
          
          
                
              
Overall, our findings suggest that synthesized data can be used to effectively augment a small amount of manually annotated data, yielding much higher accuracy than previously possible. 
          
                
              
             We train a contextual semantic parser using our strategy, and obtain 79% turn-by-turn exact match accuracy on a test set manually reannotated by experts. 
          
                
              
             Evaluating on the MultiWOZ dataset, we find that ThingTalk can represent precisely 98% of the test turns, while the simulator can emulate 85% of the validation set. 
          
                
              
             The synthesized data is combined with a small amount of manually annotated data. As the manual annotation is limited, it can be performed by an expert, yielding much better quality in practice. 
          
                
              
             To tackle the annotation issue, we propose to synthesize a large dataset of dialogues, using the simulator followed by automatic paraphrasing from a large language model. 
          
                
              
As a formally executable representation with domain-independent semantics, ThingTalk is precise enough to build both an actual agent for MultiWOZ and a rule-based simulator that can generate realistic conversations across multiple domains.
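The thread above describes a three-step recipe: a rule-based simulator emits formal dialogues, an LLM paraphrases them for naturalness, and the synthetic pairs are mixed with a small expert-annotated set. A schematic sketch, where the templates and the paraphrase table are invented stand-ins (the real system emits ThingTalk programs and paraphrases with a large language model):

```python
import random

# Schematic sketch of the data-synthesis recipe from the thread:
# 1) a rule-based simulator emits (utterance, formal-representation) pairs,
# 2) an automatic paraphraser diversifies the utterances (stubbed here),
# 3) synthetic pairs are mixed with a small expert-annotated set.
# Templates and programs below are invented stand-ins, not ThingTalk.

def simulate(domain, value):
    """Rule-based simulator: template utterance + formal representation."""
    utterance = f"book a {domain} in {value}"
    program = f"@{domain}.make_booking(location={value!r})"
    return utterance, program

def paraphrase(utterance):
    """Stub for LLM paraphrasing: trivial lexical rewrites."""
    return utterance.replace("book", random.choice(["book", "reserve", "get"]))

random.seed(0)
synthetic = [(paraphrase(u), p)
             for u, p in (simulate("restaurant", city)
                          for city in ["Cambridge", "Palo Alto"])]
manual = [("find me a cheap hotel downtown",
           "@hotel.make_booking(location='downtown')")]  # expert-annotated

train_set = synthetic + manual
print(len(train_set), "training examples")
```

Keeping the manual slice small is what makes expert annotation affordable, while the simulator supplies coverage at scale.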
          
                
              