Ao Zhang Profile
Ao Zhang

@zhanga6

Followers
666
Following
165
Media
10
Statuses
65

Ph.D. student @NUSingapore. Core contributor of GR00T N1, MiniCPM-V, and NExT-Chat. Research on MLLMs.

Central Region, Singapore
Joined January 2016
@zhanga6
Ao Zhang
2 years
🚀NExT-Chat🚀: An LMM for Chat, Detection and Segmentation. All of the demo code, training code, evaluation code, and model weights are released. This is a large multimodal model for chat, detection, and segmentation, as shown in the demo video:
3
15
49
@zhanga6
Ao Zhang
1 month
Like this interesting work❤️! Generate LLM params for new tasks in 1 sec!
@VictorKaiWang1
Victor.Kai Wang
1 month
Customizing your LLMs in seconds using prompts🥳! Excited to share our latest work with @HPCAILab, @VITAGroupUT, @k_schuerholt, @YangYou1991, @mmbronstein, @damianborth: Drag-and-Drop LLMs (DnD). Two features: tuning-free, and comparable or even better than full-shot tuning. (🧵1/8)
2
1
2
@zhanga6
Ao Zhang
2 months
Wow, a new release is coming 🥳!
@NVIDIARobotics
NVIDIA Robotics
2 months
#NVIDIAIsaac GR00T N1.5 is now accessible to #robotics developers working with a wide range of robot form factors, and available to download from @huggingface. 🎉 Dive into our step-by-step tutorial to learn how to easily post-train and adapt it to the LeRobot SO-101 arm, and…
0
0
1
@zhanga6
Ao Zhang
2 months
Big congrats to @OpenBMB and @xcjthu1! Speed of light😭.
@OpenBMB
OpenBMB
2 months
🔥MiniCPM4: Ultra-Efficient LLMs on End Devices. 🚀Technical Report: 💡Paper on HuggingFace: 📥Download Models:
0
0
3
@zhanga6
Ao Zhang
5 months
RT @yukez: Thrilled to announce GR00T N1, our open foundation model for generalist humanoid robots! GR00T N1 adopts a dual-system design,…
0
60
0
@zhanga6
Ao Zhang
5 months
🚀So excited to share our recent work! GR00T N1 is a 2B model for humanoid robots, validated on a series of sim and real robot benchmarks🙌! Try it out!
@DrJimFan
Jim Fan
5 months
Excited to announce GR00T N1, the world’s first open foundation model for humanoid robots! We are on a mission to democratize Physical AI. The power of general robot brain, in the palm of your hand - with only 2B parameters, N1 learns from the most diverse physical action dataset
0
0
10
@zhanga6
Ao Zhang
6 months
What is the next AI breakthrough after RL🤔? A friend from DeepSeek: AGI is coming. The question is not for you anymore😭. It's for AI.
0
0
0
@zhanga6
Ao Zhang
7 months
Why is the MiniCPM-V series so powerful😆?
@OpenBMB
OpenBMB
7 months
💥 Introducing MiniCPM-o 2.6: an 8B-size, GPT-4o-level omni model that runs on device. ✨ Highlights: ~Match GPT-4o-202405 in vision, audio, and multimodal live streaming ~End-to-end real-time bilingual audio conversation ~Voice cloning & emotion control ~Advanced OCR & video…
0
0
1
@zhanga6
Ao Zhang
11 months
RT @xcjthu1: 1/5 🚀 Excited to share our latest paper on Configurable Foundation Models! 🧠 Inspired by the human brain's functional special…
0
24
0
@zhanga6
Ao Zhang
1 year
Real-time video generation🤩🤩🤩. Congrats to Xuanlei and Kai.
@oahzxl
Xuanlei Zhao
1 year
Real-Time Video Generation: Achieved 🥳 Sharing our latest work with @JxlDragon, @VictorKaiWang1, and @YangYou1991: "Real-Time Video Generation with Pyramid Attention Broadcast." Three features: real-time, lossless quality, and training-free! Blog: (🧵1/6)
0
0
3
@zhanga6
Ao Zhang
1 year
RT @FuxiaoL: Thanks to my excellent collaborators @zhanga6, @imhaotian, Hao Fei, Yuan Yao, @zhangzhuosheng. Our @CVPR tutorial on "From Mu…
0
6
0
@zhanga6
Ao Zhang
1 year
I will be at #CVPR2024 from 16 Jun. to 22 Jun. Happy to meet old friends and make new ones😃. If you are interested in MLLMs, let's discuss!
0
2
18
@zhanga6
Ao Zhang
1 year
Since the HuggingFace page of Llama3-V has now been removed, we have uploaded both the Llama3-V and MiniCPM-V checkpoints for comparison. Since this model received several thousand downloads on HuggingFace, there should be independent copies to reproduce this.
0
0
2
@zhanga6
Ao Zhang
1 year
The same thing also happens with WebAgent, another unrevealed feature trained on in-house data. They even make identical errors in a WebAgent schema newly defined within our team.
1
1
25
@zhanga6
Ao Zhang
1 year
For quantitative results, we also tested several Llama3-based VLMs on 1K Bamboo Character images and compared the prediction exact match for each pair of models. The overlaps between every two models are zero, whereas the overlap between Llama3-V and MiniCPM-Llama3-V 2.5 achieves a…
1
2
29
@zhanga6
Ao Zhang
1 year
One of the experimental features of MiniCPM-Llama3-V 2.5 is recognizing Tsinghua Bamboo Characters (清华简), a very special and rare type of ancient Chinese character written on bamboo during China's Warring States Period (475 BC–221 BC). These training images were recently…
3
7
83
@zhanga6
Ao Zhang
1 year
After receiving the issue from @yangzhizheng1 on GitHub, we launched a serious investigation. We can obtain correct inference results using the Llama3-V checkpoint with MiniCPM-Llama3-V 2.5's code and config file, following @yangzhizheng1's instructions on GitHub. Even more, we also…
1
2
29
@zhanga6
Ao Zhang
1 year
So sad to hear the news 😰. The conclusion of our investigation: 1. Llama3-V can be run using MiniCPM-Llama3-V 2.5's code and config.json after changing param names. 2. It behaves similarly to MiniCPM-Llama3-V 2.5 in unrevealed experimental features.
12
71
491
@zhanga6
Ao Zhang
1 year
Comparable to GPT-4V with only 8B params😱. Welcome to check out our new MiniCPM-Llama3-V 2.5.
@OpenBMB
OpenBMB
1 year
🚀 Excited to introduce MiniCPM-Llama3-V 2.5! With 8B parameters, it's our latest breakthrough, outperforming top models like GPT-4V. 📈 💪 Superior OCR capabilities 🔑 Supports 30+ languages. HuggingFace: GitHub:
0
0
4
@zhanga6
Ao Zhang
1 year
Shocked by the performance💥! It also resolves my long-standing confusion about which version of ChatGPT to use for eval.
@RuohongZhang
Ruohong Zhang
1 year
[p1] 🐕Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward🐕. Paper link: Project page: How to effectively train video large multimodal model (LMM) alignment with preference modeling?
0
0
0
@zhanga6
Ao Zhang
1 year
RT @arankomatsuzaki: Some people criticize next-token prediction like "we should also predict more future tokens beyond the immediate futur…
0
69
0