
Pan Lu
@lupantech
Followers: 6K · Following: 2K · Media: 226 · Statuses: 978
Postdoc @Stanford | PhD @CS_UCLA @uclanlp | Amazon/Bloomberg/Qualcomm Fellows | Ex @Tsinghua_Uni @Microsoft @allen_ai | ML/NLP: AI4Math, AI4Science, LLM, Agents
Joined April 2016
🎉 Thrilled to share that OctoTools won the Best Paper Award at @knowledgenlp #NAACL! 🏆 OctoTools is a flexible, easy-to-use framework that equips LLMs with diverse tools for complex reasoning—just customize your agent by mixing modular “tool cards” like building with Lego 🧩
🐙 Introducing OctoTools: an agentic framework with extensible tools for complex reasoning! 🚀 🧵 🔗 Explore now: OctoTools tackles challenges in complex reasoning—including visual understanding, domain knowledge retrieval, numerical reasoning, and…
3 replies · 14 reposts · 87 likes
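To make the “tool card” idea concrete, here is a minimal Python sketch of how a card might bundle a tool with the metadata a planner reads when choosing tools. The names and structure are illustrative assumptions, not OctoTools' actual API.

```python
# Hypothetical sketch of Lego-style "tool cards"; not the OctoTools API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCard:
    name: str
    description: str           # what the LLM planner reads when picking tools
    run: Callable[[str], str]  # the tool's implementation

# Assemble an agent by mixing cards, Lego-style.
toolbox = [
    ToolCard("calculator", "Evaluates arithmetic expressions.",
             lambda expr: str(eval(expr))),  # demo only; avoid eval in production
    ToolCard("echo", "Returns its input unchanged.", lambda s: s),
]

def describe_toolbox(cards: list[ToolCard]) -> str:
    """Render the cards into a prompt fragment for the LLM planner."""
    return "\n".join(f"- {c.name}: {c.description}" for c in cards)

print(describe_toolbox(toolbox))
print(toolbox[0].run("2 + 3 * 4"))  # -> 14
```

Swapping a card in or out changes the agent's capabilities without touching the planner, which is what makes the framework feel modular.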
🔥 Excited to release LLaMA-Adapter! With only 1.2M learnable parameters and 52K instruction data, LLaMA-Adapter turns a #LLaMA into an instruction-following model within ONE hour, delivering high-quality responses! 🚀 Paper: 🚀 Code:
24 replies · 175 reposts · 809 likes
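For intuition on how so few parameters can steer a frozen LLM, here is a simplified PyTorch sketch of the core idea: learnable adaption prompts plus a zero-initialized gate, so training starts from the pretrained model's unchanged behavior. This is a toy approximation under my own assumptions, not the paper's exact formulation (which injects prompts into the upper transformer layers).

```python
import torch
import torch.nn as nn

class AdapterAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int, prompt_len: int = 10):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Learnable adaption prompt used as extra keys/values.
        self.prompt = nn.Parameter(torch.randn(1, prompt_len, dim))
        # Zero-initialized gate: at step 0 the adapter contributes nothing,
        # preserving the frozen pretrained model's behavior.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base, _ = self.attn(x, x, x)               # frozen self-attention path
        p = self.prompt.expand(x.size(0), -1, -1)  # (batch, prompt_len, dim)
        adapt, _ = self.attn(x, p, p)              # attention over the prompt
        return base + self.gate.tanh() * adapt     # gated residual injection

x = torch.randn(2, 16, 64)
layer = AdapterAttention(dim=64, n_heads=4)
print(layer(x).shape)  # torch.Size([2, 16, 64])
```

Only the prompt and gate would be trained; everything else stays frozen, which is how the parameter count stays around a million.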
🔥 Thrilled to release LLaMA-Adapter Multimodal! 🎯 Now supporting text, image, audio, and video inputs, powered by #ImageBind. 🧵6 💻 Code for inference, pretraining, and finetuning ➕ checkpoints: Demo: Abs:
16 replies · 147 reposts · 628 likes
🎉 Exciting news: LLaMA-Adapter is now fully unlocked! 🧵6 1⃣ As a general-purpose #multimodal foundation model, it integrates various inputs like images, audio, text, video, and 3D point clouds, while providing image, text-based, and detection outputs. It uniquely accepts the…
25 replies · 158 reposts · 589 likes
🚨 BREAKING: @OpenAI's new GPT-4o model outperforms humans on MathVista for the first time! 📊 Scores: Human avg: 60.3, GPT-4o: 63.8. 📖 Learn more: OpenAI: MathVista:
We're opening up access to our new flagship model, GPT-4o, and features like browse, data analysis, and memory to everyone for free (with limits).
8 replies · 89 reposts · 517 likes
🚀 Introducing #LLaMA2-Accessory, an advanced open-source toolkit for large language models. Evolved from LLaMA-Adapter, we now support more datasets, tasks, visual encoders, and efficient optimization methods. 🧠 🔗 Code: 💡 Key Features: 🎯 Pre-training…
13 replies · 131 reposts · 496 likes
🚀 Truly impressed by the remarkable progress from @xai! Grok-2 and Grok-2 mini now hold the top two spots on #MathVista! Even more impressive is the rapid boost by @xai, raising the Grok series' scores from 52.8% to 69% in just 4 months. Respect! 👏
12 replies · 41 reposts · 313 likes
🚀 65B LLaMA-Adapter-V2 code & checkpoint are NOW ready! 🛠️ Big update enhancing multimodality & chatbot. 🔥 LLaMA-Adapter-V2 surpasses #ChatGPT in response quality (102%:100%) & beats #Vicuna in win-tie-lose (50:14). ☕️ Thanks to Peng Gao & @opengvlab! 2/2
11 replies · 100 reposts · 397 likes
🎉 New paper! The survey of deep learning for mathematical reasoning (#DL4MATH) is now available. We've seen tremendous growth in this community since 2018, and this review covers the tasks, datasets, and methods from the past decade. Check it out now:
6 replies · 78 reposts · 331 likes
🚀 Excited to release our 112-page study on math reasoning in visual contexts via #MathVista. For the first time, we provide both quantitative and qualitative evaluations of #GPT4V, #Bard, & 10 other models. 📄✨ Full paper: 🔗 Proj:
16 replies · 79 reposts · 310 likes
Congrats, @JeffDean @GoogleDeepMind! Gemini 1.5 Pro has shown substantial improvements from Feb to May, scoring 63.9% on our #MathVista, outperforming humans and GPT-4o, which came out just 4 days ago! 🚀 AI progress has never been this rapid and impressive! 🌟
Gemini 1.5 Model Family: Technical Report updates now published. In the report we present the latest models of the Gemini family – Gemini 1.5 Pro and Gemini 1.5 Flash, two highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information…
7 replies · 61 reposts · 296 likes
🚀 Introducing #SPHINX: the next-gen #Multimodal_LLM, seamlessly blending tasks, embeddings & weights for advanced multimodal reasoning. 🧵 🔍 Demo: 💻 Code: What's new with #SPHINX compared to #LLaMA_Adapter? 🆕 ✅ Powered by the…
11 replies · 66 reposts · 265 likes
🚀 Meet Chameleon! An innovative plug-and-play framework (in the spirit of #AutoGPT) that enhances #GPT4 and #ChatGPT for compositional reasoning, blending off-the-shelf tools with tailored LLM modules 🔧✨🧠 New SOTA on #ScienceQA and TabMWP! 📈 🔗📜
12 replies · 72 reposts · 256 likes
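A hedged sketch of the planner-executor pattern the tweet describes: an LLM plans an ordered sequence of tools, then the tools run in order. The tool names and the llm() helper are placeholders, not Chameleon's released code.

```python
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "image_captioner": lambda ctx: "caption of the image",
    "knowledge_retrieval": lambda ctx: "retrieved facts",
    "solution_generator": lambda ctx: "final answer",
}

def llm(prompt: str) -> str:
    """Placeholder for a call to GPT-4/ChatGPT acting as the planner."""
    return "image_captioner -> knowledge_retrieval -> solution_generator"

def run(query: str) -> str:
    # 1) The LLM planner writes a "program": an ordered list of tool names.
    plan = [t.strip() for t in llm(f"Plan tools for: {query}").split("->")]
    # 2) Execute the tools sequentially, threading context through each step.
    context = query
    for name in plan:
        context = TOOLS[name](context)
    return context

print(run("Which object in the image is heaviest?"))
```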
🚀 Introducing the LLaMA-Adapter, now available on @huggingface! 🔗 🎉 Feel free to explore and experiment with our LLaMA-Adapter. We're eager to hear your feedback! 💥 Stay tuned for the upcoming second version, even more powerful and feature-packed!
3 replies · 40 reposts · 240 likes
🚀 o1 is now released by @OpenAI! It's trained to think slowly with a long chain of thought. It works impressively and may unlock hard tasks in science and math, setting a new SOTA with 73.2% on #MathVista! Leaderboard: Blog:
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.
6 replies · 41 reposts · 244 likes
🎉 Thrilled to have our MathVista work accepted at #ICLR2024 as an Oral presentation! Explore our work: 🔍 Project: 🤗 @huggingface Dataset @_akhaliq: 💻 Code: Deepest gratitude to our shining team: 👏🌟
7 replies · 33 reposts · 245 likes
I am thrilled to defend my PhD and finally earn the title of Doctor 🧑🎓. It's been a truly rewarding journey at @UCLAComSci. I'm so fortunate and grateful for the invaluable mentorship from Prof. @kaiwei_chang @uclanlp. He has always been incredibly encouraging, helpful, and…
Congrats 🎉 to the newly titled Dr. Lu @lupantech on defending his thesis, “Mathematical Reasoning with Language Models”! 🧮 Pan has published a series of works on quantifying and improving math and scientific reasoning ability in LLMs. Some highlights:…
42 replies · 2 reposts · 229 likes
I'm excited to join Prof. @james_y_zou's group as a postdoctoral scholar, aiming to push the boundaries of AI for scientific discovery #AI4Science. I've had an incredible and rewarding time with the @uclanlp group and the VCLA group @UCLAComSci. Deeply grateful to all my mentors…
16 replies · 2 reposts · 223 likes
🔥 Introducing #SPHINX 🦁: an all-in-one multimodal LLM with a unified interface that seamlessly integrates domains, tasks, & embeddings. 🧵 👋 Explore the @Gradio demo @_akhaliq: Dive into the open resources! 🤗 Model @huggingface:
13 replies · 52 reposts · 208 likes
🔍 Do Multi-modal LLMs Truly Understand Diagrams in Visual Math Problems? 🧐 Interest in visual math reasoning has surged in the era of Multi-modal LLMs (#MLLMs). Although they show promising potential, it remains uncertain whether MLLMs use visual or textual shortcuts to…
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? The remarkable progress of Multi-modal Large Language Models (MLLMs) has garnered unparalleled attention due to their superior performance in visual contexts. However, their capabilities in…
1 reply · 33 reposts · 208 likes
🤔 Ever wondered why foundation models like LLMs & LMMs are only tested on textual math reasoning benchmarks? 🔍 Dive into our #MathVista for a fresh perspective. 🌟 Introducing #MathVista: a groundbreaking benchmark for visual mathematical reasoning…
13 replies · 49 reposts · 185 likes
🌟 Last week, I was honored to present our latest work #Chameleon to the Reasoning Team at Google Brain @DeepMind. It's encouraging to witness tool-augmented LLMs like Transformers Agents @huggingface and Chameleon garnering significant attention. 🧵 Slides:
3 replies · 33 reposts · 160 likes
#TextGrad now features multimodal reasoning! 🔬 ScienceQA (multimodal scientific reasoning): error rate drops by 20%, achieving the highest zero-shot performance we know of. 📊 MathVista (multimodal math reasoning): boosting GPT-4o's score from 63.8% to 66.1%! Explore…
⚡️ This is the most fun project! We built PyTorch-for-text! 🔥 #TextGrad: automated “differentiation” via text to optimize AI systems by backpropagating LLM text feedback. TextGrad + GPT-4o: 💻 LeetCodeHard best score ❓ GPQA SOTA 🧬 designs new molecules 🩺 improves treatments 🧵
7 replies · 33 reposts · 160 likes
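For readers curious what “backpropagating LLM text feedback” looks like in practice, here is a short loop adapted from TextGrad's documented usage pattern; exact signatures may differ across versions, and an OpenAI API key is assumed to be configured.

```python
import textgrad as tg

tg.set_backward_engine("gpt-4o", override=True)  # LLM that produces textual "gradients"

model = tg.BlackboxLLM("gpt-4o")
question = tg.Variable("If 3x + 5 = 20, what is x?",
                       role_description="question to the LLM",
                       requires_grad=False)

answer = model(question)  # forward pass
answer.set_role_description("concise and accurate answer to the question")

# The "loss" is natural-language feedback from a critic prompt.
loss_fn = tg.TextLoss("Evaluate the answer for correctness; be very critical.")
loss = loss_fn(answer)

loss.backward()                      # backpropagate text feedback
tg.TGD(parameters=[answer]).step()   # textual gradient descent update
print(answer.value)
```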
Model editing has been an effective way to reduce hallucinations in LLMs without resource-intensive retraining. 🤯 However, our study, led by @JasonForJoy, @kaiwei_chang, & @VioletNPeng, reveals that current methods inadvertently impair the general skills of LLMs.
1 reply · 30 reposts · 155 likes
🚨 Thrilled to have one paper accepted to #NeurIPS2022! We construct a new benchmark, ScienceQA, and design language models that learn to generate lectures and explanations as a chain of thought, mimicking the multi-hop reasoning process. Data and code will be coming soon!
2 replies · 14 reposts · 142 likes
📢📢 Excited to have one paper accepted to #NeurIPS2022! We present a new dataset, ScienceQA, and develop large language models that learn to generate lectures and explanations as a chain of thought (CoT). Data and code are public now! Please check 👇👇
4 replies · 27 reposts · 144 likes
🔥 Exciting update! We've manually evaluated #GPT4V using the playground chatbot on #MathVista, our newest benchmark for visual mathematical reasoning. 🚀 #GPT4V soared with a 15.1% ⬆️ improvement over #Bard, setting a new record at 49.9%! 🎉 🌐 Yet…
3 replies · 28 reposts · 132 likes
Our #Chameleon ranked #1 among 1,682 AI papers last week by @alphasignalai, underscoring the significant impact our work has made. #Chameleon is a plug-and-play reasoning framework that enables LLMs to utilize diverse tools. 🔗 🎉 More:
1 reply · 35 reposts · 127 likes
🤖 Could #LLMs develop emotional intelligence to understand human social interactions? Introducing KokoMind 🦍: a benchmark to evaluate how #gpt4, #chatgpt, & #claude interpret conversations and relationships, and offer insightful advice. 💥 Demo:
Put ChatGPT at a cocktail party 🥂. Can it understand people's conversations and gestures, figure out their relations, and even chime in with social advice? 🦍 Announcing KokoMind. 🌟 Check out this demo! More at #AI #GPT4 #ChatGPT #OpenAI #Shrinking 🧵
4 replies · 25 reposts · 126 likes
Thrilled to be awarded the prestigious @Bloomberg #DataScience Ph.D. Fellowship! 🏆 Grateful for the support and mentorship from @TechAtBloomberg to advance my AI research, especially in LLMs. Heartfelt thanks to @kaiwei_chang @uclanlp & @UCLAComSci for their tremendous support!
Congratulations to @UCLAComSci / @UCLAengineering + @uclanlp's @lupantech on being one of the 2023-2024 @Bloomberg #DataScience Ph.D. Fellows! Learn more about Pan's research focus and our latest cohort of Ph.D. Fellows: #AI #ML #NLProc #LLMs
5 replies · 4 reposts · 110 likes
Introducing #STIC: a Self-Training method for large vision-language models (LVLMs)! 🌟 🧵 STIC empowers LVLMs to self-train and enhance reasoning abilities using self-constructed preference data on image descriptions, eliminating the need for labeled data! 🚀📈 Straightforward…
7 replies · 20 reposts · 102 likes
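As I read the tweet, the self-training loop builds preference pairs from the model's own outputs, so no human labels are needed. Below is a hypothetical sketch of that data-construction step; the describe() helper and its corrupt_image flag are invented for illustration and are not STIC's released code.

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    preferred: str     # model's description under a careful, detailed prompt
    dispreferred: str  # description elicited from a corrupted image / weak prompt

def build_pairs(images, describe) -> list[PreferencePair]:
    """describe(image, prompt, corrupt_image) is a hypothetical LVLM call."""
    pairs = []
    for img in images:
        good = describe(img, prompt="Describe the image accurately and in detail.",
                        corrupt_image=False)
        bad = describe(img, prompt="Describe the image.", corrupt_image=True)
        pairs.append(PreferencePair(preferred=good, dispreferred=bad))
    return pairs

# The resulting pairs would then drive DPO-style preference fine-tuning
# of the LVLM on its own image descriptions.
```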
🚀 Introducing MuirBench! 🌟 A groundbreaking benchmark for robust multi-image understanding, featuring:
📸 12 diverse tasks
🗂️ 10 categories of multi-image relations
🖼️ 11,264 images
❓ 2,600 multiple-choice questions
Even top models like GPT-4o and Gemini Pro find it…
Can GPT-4o and Gemini-Pro handle 𝐦𝐮𝐥𝐭𝐢𝐩𝐥𝐞 𝐢𝐦𝐚𝐠𝐞𝐬? Introducing MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding. 🌐 Explore here: 📄 Paper: 📊 Data:
2 replies · 14 reposts · 102 likes
Can machines answer multi-modal math word problems? We propose a new task, Icon Question Answering (#IconQA), to address it! Details are available below: Paper: Project: Code:
3 replies · 23 reposts · 93 likes
Excited to meet @ylecun with my @uclanlp labmates @JasonForJoy, @LiLiunian, and @ZiYiDou! 😝 #NeurIPS2023
0 replies · 3 reposts · 92 likes
🎉 Pleased to share that our paper on Multimodal Procedural Planning has been accepted at #EMNLP2024! We introduce Text-Image Prompting (TIP) ✨, a novel dual-modality framework combining LLMs 🧠 with image generation 🖼️ to create richer, more accurate plans. This is a key step…
Excited to share that our paper is in EMNLP 2024 Findings! We received positive reviews (4/3.5/3.5) and a tough critique from the AC. Thanks to a strong rebuttal, we made it through. Looking forward to meeting old friends and new ones in Miami! 📜 Paper:
4 replies · 15 reposts · 89 likes
🤖 In science and finance, we often engage in statistical and causal reasoning over structured data. Ever dreamed of #LLMs doing the heavy lifting, clearing a path through the maze of complex and error-prone tasks? 🤯 Hold that thought! 🛑 Our findings reveal that even GPT-4…
Are LLMs Capable of Data-based Statistical and Causal Reasoning? In this work, we propose a benchmark, QRData (Quantitative Reasoning with Data), to evaluate models' capability in statistical and causal reasoning with real-world data. 🌐:
0 replies · 21 reposts · 85 likes
📢 Can't wait to see you at the 3rd #MathAI Workshop in the LLM Era at #NeurIPS2023! ⏰ 8:55am - 5:00pm, Friday, Dec 15 📍 Room 217-219 🔗 📽️ Exciting lineup: ⭐️ six insightful talks by @KristinLauter, @BaraMoa, @noahdgoodman…
4 replies · 19 reposts · 83 likes
I am honored to win the @Qualcomm Innovation Fellowship! A heartfelt thank you to @kaiwei_chang for your kind words and encouragement. I am grateful to our team, including @liujc1998 and Professor @HannaHajishirzi. This achievement wouldn't have been possible without you all! ❤️
Congrats @lupantech for winning the 2023 Qualcomm Innovation Fellowship! 🐻 Pan is a rock star in math and scientific reasoning in NLP!
3 replies · 5 reposts · 86 likes
💥💥 Update alert! Radar graphs & the leaderboard on #MathVista now feature detailed scores for the #Gemini family models. 🚀 🔍 Insight: Gemini Ultra leads the pack, outperforming GPT-4V by 3.1%! Yet, each model shines uniquely in various math reasoning & visual contexts. 🙏 Big…
2 replies · 16 reposts · 82 likes
🔥 Thrilled to see our LLaMA-Adapter featured in Lit-LLaMA by @LightningAI 🦙🦙 🚀 Check out our LLaMA-Adapter here: ⚡️ Explore Lit-LLaMA on GitHub:
Progress update! 🦙🔥🤓 Lit-LLaMA now implements the LLaMA-Adapter method for efficient fine-tuning 🔧⚡️. The core idea can be implemented in about 11 lines of code 🤯 (see screenshot). Link to repo 👉 Link to Adapter paper 👉
2 replies · 12 reposts · 83 likes
Privileged to have the opportunity to guest lecture in the #NLP course @CS_UCLA, instructed by Prof. @kaiwei_chang. I really enjoyed it and am so glad to share recent advancements in mathematical reasoning and commonsense reasoning. 🧵 🔗 Check out the slides:
4 replies · 7 reposts · 76 likes
🦙 Please check out LLaMA-Adapter-V2, which follows open-ended multi-modal visual instructions by introducing merely 14M learnable parameters over the 65B #LLaMA. Abs: Repo: Weights: Video:
0 replies · 22 reposts · 76 likes
Hey friends! 🎉 Excited to be at #NeurIPS2023! 🚀 I'll be presenting a paper 📄, co-organizing the MATH-AI workshop 🧮, and sharing three collaborative projects. Can't wait to meet you in New Orleans 🎭 and explore the AI advancements in math, science, and more! 🤖🧪 👇1⃣2⃣3⃣4⃣
1 reply · 5 reposts · 75 likes
Excited to see the release of Gemini! It is even more exciting to see that Gemini @google features MathVista for evaluating math reasoning in visual contexts and Geometry3K for evaluating geometry reasoning! Congratulations and thanks, @GoogleDeepMind, @GoogleResearch, and @Google!
I'm very excited to share our work on Gemini today! Gemini is a family of multimodal models that demonstrate really strong capabilities across the image, audio, video, and text domains. Our most-capable model, Gemini Ultra, advances the state of the art in 30 of 32 benchmarks…
1 reply · 5 reposts · 73 likes
Today, we presented our #MathVista at #ICLR2024 in Vienna! 🌟 We are thrilled by the tremendous progress in math reasoning in the era of LLMs and VLMs. MathVista has become one of the most reliable benchmarks for probing their abilities in visual math…
5 replies · 9 reposts · 68 likes
We're organizing the 3rd #MathAI workshop at @NeurIPSConf #NeurIPS. 🚀 Excited for our speakers on AI for mathematical reasoning: @guyvdb, @noahdgoodman, @wtgowers, @BaraMoa, @KristinLauter, @TaliaRinger, @paul_smolensky, Armando Solar-Lezama, @Yuhu_ai_, @ericxing, @denny_zhou.
0 replies · 11 reposts · 65 likes
Spent a fantastic weekend at Lake Arrowhead with the @uclanlp group! ❄️🏔️ Enjoyed scenic drives, delicious meals, engaging conversations, and brainstorming sessions. Truly inspiring! 🚗🥘😋💬🖼️🧠💡
2 replies · 6 reposts · 67 likes
📢 Great news! Our #ScienceQA dataset is gaining significant attention lately. It is the primary benchmark for the next-gen #MultimodalCoT reasoning system by @AmazonScience, and it's now included in @huggingface. More details: 👉
1 reply · 15 reposts · 67 likes
It is my great honor to be awarded the #Bloomberg Data Science Ph.D. Fellowship! Many thanks for the tremendous support from @TechAtBloomberg, @UCLAComSci, and Professor @kaiwei_chang @uclanlp! Go Bruins 🐻✊!
2 replies · 1 repost · 62 likes
@_arohan_ @JeffDean @GoogleDeepMind Hi Rohan, thanks for pointing it out. We have updated the leaderboard with Flash. Congratulations to you and your team on the development of these impressive models! 🏆
3 replies · 7 reposts · 54 likes
📚 Traditionally, we focus on textual knowledge for AI tasks. Our study shows that visual info 🖼️, like images, can offer richer insights for question answering. Explore how retrieving images can enhance AI with MRAG. 🔗 📄
🚀 Introducing MRAG-Bench: How do Large Vision-Language Models utilize vision-centric multimodal knowledge? 🤔 Previous multimodal knowledge QA benchmarks can mainly be solved by retrieving text knowledge. 💥 We focus on scenarios where retrieving knowledge from an image corpus is more…
0 replies · 7 reposts · 55 likes
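A minimal sketch of vision-centric retrieval in this spirit: embed the question and an image corpus with CLIP and hand the top-k images to an LVLM. The model choice and the answer_with_images() helper are assumptions for illustration, not MRAG-Bench's actual pipeline.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def retrieve(query: str, images: list[Image.Image], k: int = 3) -> list[int]:
    """Return indices of the k corpus images most similar to the query."""
    inputs = processor(text=[query], images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_text: similarity of the query to every candidate image.
    scores = out.logits_per_text[0]
    return scores.topk(k).indices.tolist()

# Hypothetical downstream use:
# idx = retrieve("What does a capybara look like?", corpus_images)
# answer = answer_with_images(question, [corpus_images[i] for i in idx])
```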
🛠️🚀 Excited to share our latest paper, VDebugger! Discover how our novel framework debugs visual programs using execution feedback, boosting accuracy by up to 3.2% while improving interpretability! Project: Paper: Code:
Looking for a debugging algorithm for visual programming? Take a look at 𝗩𝗗𝗲𝗯𝘂𝗴𝗴𝗲𝗿 🔥🔥🔥 By tracking execution step by step, VDebugger boosts accuracy by up to 𝟯.𝟮% on 6 visual reasoning tasks!
2 replies · 12 reposts · 54 likes
🤯 So thrilled to have @AnthropicAI benchmark their latest, powerful Claude 3 models on our #MathVista for visual math reasoning! It's encouraging to see the rapid progress in (multimodal) LLMs, especially in the math and science fields! 💥 🤗 Our @huggingface data:
Today, we're announcing Claude 3, our next generation of AI models. The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.
1 reply · 7 reposts · 52 likes
It has been a wonderful day at the Open House @allen_ai 🍺🍖🌊. I met a lot of great people and got inspiring advice. Many thanks to the operations team for their great efforts in preparing all of it!
0 replies · 2 reposts · 50 likes
🚀 Our @Gradio demo now supports diverse vision-language tasks:
1️⃣ Visual Question Answering (VQA)
2️⃣ Multi-level Dense Caption
3️⃣ Referring Expression Comprehension
4️⃣ Relationship Grounding
5️⃣ Grounding Captions
6️⃣ Object Detection
7️⃣ Human Keypoint Detection
8️⃣ Text Detection
0 replies · 11 reposts · 48 likes
🔥 Thrilled to see our #LLaMA-Adapter featured in @HuggingFace's “Spaces of the Week”! 🎉 Introducing LLaMA-Adapter V2, our cutting-edge multi-modal instruction model! Explore demo examples here: 💡 🚀 Stay tuned for the technical report and model release!
0 replies · 10 reposts · 50 likes
🎉 Exciting news! Our #MathVista is keeping pace with the latest advances in vision-language models (VLMs). Grok-1.5V by @xai achieves a 52.8% score, surpassing leading models such as GPT-4V, Claude 3 Opus, and Gemini Pro 1.5! 🔗 Visit our project page: 👀
1 reply · 4 reposts · 46 likes
🚀 Thrilled to share that #textgrad is published in @Nature today! 🎉 It's been an incredible journey working with the amazing TextGrad team and the Zou Group @james_y_zou. 🙌 ✨ What is TextGrad? A groundbreaking framework that automates optimization of LLMs and compound…
⚡️ Really thrilled that #textgrad is published in @nature today! ⚡️ We present a general method for genAI to self-improve via our new *calculus of text*. We show how this optimizes agents 🤖, molecules 🧬, code 🖥️, treatments 💊, non-differentiable systems 🤯 + more!
0 replies · 5 reposts · 43 likes
📢 Attention, #NLProc community! Submit and showcase your research at the 4th Southern California Natural Language Symposium (SoCal NLP) 📜 🗓️ Submission deadline: Oct. 21, 2023, 11:59 PM PT 🔗 More info: #SoCalNLP #CallForPapers
1 reply · 13 reposts · 45 likes
Congratulations and thanks to @MistralAI for releasing the #MoE model to the community. Our LLaMA2-Accessory now features Mixtral-8x7B with a chatbot demo, available on @Gradio! Try the chatbot: http://106.14.127.192/ For more implementation details: 📖 Documentation:
0 replies · 10 reposts · 43 likes
🚀 Excited to see Claude 3.5 Sonnet by @AnthropicAI achieve a new SOTA on #MathVista with 67.7%, a 19.8% improvement over Claude 3 Sonnet! 📈🎉 Learn more: 📝 Blog: 🔢 MathVista:
Introducing Claude 3.5 Sonnet—our most intelligent model yet. This is the first release in our 3.5 model family. Sonnet now outperforms competitor models on key evaluations, at twice the speed of Claude 3 Opus and one-fifth the cost. Try it for free:
1 reply · 8 reposts · 42 likes
Gratitude to our esteemed speakers, insightful panelists, engaged attendees, and dedicated organizers (@LiangZhenwen, @AlbertQJiang, @katie_m_collins, @KaiyuYang4, @wellecks, and @JLMcClelland) for making the 3rd #MATHAI workshop at #NeurIPS2023 an extraordinary success!
1 reply · 4 reposts · 40 likes
Excited to co-organize the 4th MATH-AI workshop at #NeurIPS2024 @NeurIPSConf! Join us to discuss the latest progress in AI for Math. Discover more and get involved: 🌟 Workshop page: 📝 Call for papers: 👥 Reviewer opportunities:
📢 Excited to announce the 4th MATH-AI Workshop at #NeurIPS2024 to discuss some of the exciting recent advances in AI for math! 🏠 Homepage: ✍️ Help review! 📝 Submit:
1 reply · 2 reposts · 38 likes
🚀 We've just launched #SciBench, a sophisticated, college-level benchmark that uniquely evaluates the capabilities of LLMs in tackling scientific problem-solving.
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models. Paper page: Recent advances in large language models (LLMs) have demonstrated notable progress on many mathematical benchmarks. However, most of these…
1 reply · 8 reposts · 39 likes
Happy to receive the NeurIPS 2022 Scholar Award! I really appreciate all the support I get from the community, and I will devote myself to making contributions to the community! @NeurIPSConf 🍻 See you in New Orleans!
1 reply · 1 repost · 38 likes
Still buzzing from the #CopilotPCs launch yesterday, and now @Microsoft drops the efficient Phi-3-Vision model! 🚀 Thrilled to see three of our past projects featured in their benchmarks! Encouraged to continue pushing the boundaries of AI research! 💡📊🔍 ScienceQA…
Phi-3-Vision is looking enticing: 128K context, 4B params, and it performs exceptionally well on benchmarks. We'll have to see if this one translates well to real-world use, but I am excited to check it out.
2 replies · 5 reposts · 39 likes
In 2021, we explored early research in geometry: our Inter-GPS, a neuro-symbolic solver, reached the average human-level score for the first time. 🎉 Now, @GoogleDeepMind's AlphaGeometry marks a historic breakthrough: Olympiad-level skill! 🚀 🔎 For more: 🔗
Introducing AlphaGeometry: an AI system that solves Olympiad geometry problems at a level approaching a human gold-medalist. 📐 It was trained solely on synthetic data and marks a breakthrough for AI in mathematical reasoning. 🧵
1 reply · 8 reposts · 36 likes
⭐️ Awesome! @guyvdb from UCLA is presenting the talk “AI Can Learn from Data. But Can It Learn to Reason?”, offering insights from a logical and probabilistic perspective! #MATHAI #NeurIPS23 #Logic #Reasoning #AI
0 replies · 3 reposts · 36 likes
🚨 Attention! I'm presenting the 🦎 #Chameleon paper at Booth 320 from 10:45 to 12:45 at #NeurIPS23. You're welcome to stop by for a chat! ☕️😉🤖🧲💡 For more details, check out our project at
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models. Chameleon with GPT-4 achieves an 86.54% accuracy on ScienceQA, significantly improving upon the best published few-shot model by 11.37%; using GPT-4 as the underlying LLM, Chameleon achieves a 17.8%…
2 replies · 3 reposts · 34 likes
🧲 Please stop by our poster on deep learning for math reasoning at Poster Session 2 @aclmeeting #ACL2023NLP. ❤️ Thanks to my co-authors for their great contributions: @liangqiu_1994, @wyu_nd, @wellecks, & @kaiwei_chang. Abs: GitHub:
0 replies · 5 reposts · 34 likes
🚀 @google is introducing new updates to aid learning in math and science, especially in visual contexts: 💥 We're proud to spotlight our commitment to math and science over the past years, with projects like #MathVista, #Chameleon, and #ScienceQA. 1️⃣…
0 replies · 10 reposts · 33 likes
It is remarkable that Gemini achieves a new SOTA of 53.0% on MathVista, a challenging benchmark for math reasoning in visual contexts. We are honored that our proposed #MathVista is advancing the development of the newest and most capable AI models.
In image understanding, Gemini performs well across all the benchmarks we examined, with the Ultra model setting new state-of-the-art results in every benchmark.
0 replies · 3 reposts · 33 likes
🚀 OpenAI is releasing the latest function- and tool-calling update for #GPT4! Just two months back, we introduced #Chameleon 🦎, an innovative compositional reasoning framework. It uses LLMs as a planner to generate diverse programs, integrating various tools including LLMs…
0 replies · 6 reposts · 33 likes
It was great to attend the #NeurIPS2022 poster session and present our work @UCLA @ASU @allen_ai in person 🎉. I'm excited that I met many great people and received countless insightful comments and advice. Thanks to everyone for your interest in our work! 🍻
0 replies · 4 reposts · 32 likes
Pleased to share that we have two papers presented and one co-organized workshop at #ICML2024! 🎉 Huge thanks to all collaborators! 🙌 SciBench: Poster: Website: SPHINX-X: Poster: Paper:
0 replies · 6 reposts · 31 likes
📢 Join us at the New England NLP Meeting (NENLP) 2025! 🗓 Date: April 11, 2025 📍 Location: New Haven 🔹 Register now: 🔹 CFP: Don't miss this exciting event—connect with fellow NLP researchers and practitioners! See you there!
We're thrilled to welcome you to New England NLP 2025 at Yale University on April 11th in New Haven, CT 🎉 Join us for a full day of exciting talks and sparkling discussions with NLP researchers across the New England region and beyond. 👉 Register now
0 replies · 5 reposts · 30 likes
😜 Looking forward to seeing you at the 1st Tool-Augmented Vision (TAVI) Workshop at #CVPR2024 in Seattle. 🔍 For more details, please visit the website:
We will be organizing the 1st Tool-Augmented VIsion (TAVI) Workshop at #CVPR2024. We are looking forward to an exciting list of keynote speakers covering various topics in tool use and retrieval-augmented models. More details at:
0 replies · 4 reposts · 29 likes
🤔 Naming things is hard!! 🦎 #Meta's new work shares the same name as our NeurIPS 2023 paper from one year ago: Chameleon: Compositional Reasoning with LLMs. Coincidence, or great minds thinking alike? 😈 Dive into our work here:
Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models. This research presents a family of early-fusion, token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence. Paper ➡️
3 replies · 2 reposts · 29 likes
🚨 Call for Papers 🚨 Submissions to the #NeurIPS2022 MATH-AI Workshop are due on Sep 30, 11:59pm PT (2 days after ICLR 😆). The page limit is 4 pages (not much workload 🤩). Work both in progress and recently published is allowed. Act NOW and see you in #NewOrleans! 🥳🥳🍻
1 reply · 8 reposts · 25 likes
We're dedicated to #OpenSource, confident that it will profoundly enrich the community. 🌟 Thrilled to see our recent work, LLaMA-Adapter, and its subsequent developments positively impacting the community. 🚀 Stay updated with continuous improvements: 📌
It was a great month for open source: so many LLMs came out that it's become quite overwhelming to keep track of it all. So, in this month's Ahead of AI issue, I am sharing resources and research insights on the latest open-source LLMs & datasets!
0 replies · 7 reposts · 26 likes
🚨 Last call for papers! 🚨 The 4th Workshop on Math Reasoning and AI @NeurIPSConf 2024 is accepting submissions for one more week. 🗓️ Deadline: September 20, 2024 📍 See you in Vancouver! Details & submissions: #NeurIPS2024 #AI #MathReasoning #MATHAI
2 replies · 2 reposts · 25 likes
One model to align multiple modalities. Looking forward to seeing the live demo.
OneLLM: One Framework to Align All Modalities with Language. Paper page: Multimodal large language models (MLLMs) have gained significant attention due to their strong multimodal understanding capability. However, existing works rely heavily on…
0 replies · 4 reposts · 23 likes
An excellent blog post on Controllable Neural Text Generation from @lilianweng! It's important to consider ways to reduce the hallucinations of LLMs and better reflect human intentions, especially given their current success and limitations. 👉 #ChatGPT #LLM
0 replies · 3 reposts · 26 likes