Ao Zhang Profile
Ao Zhang

@zhanga6

Followers
666
Following
165
Media
10
Statuses
65

Ph.D. student @NUSingapore. Core contributor of GR00T N1, MiniCPM-V, and NExT-Chat. Research on MLLMs.

Central Region, Singapore
Joined January 2016
@zhanga6
Ao Zhang
2 years
🚀NExT-Chat🚀: An LMM for Chat, Detection and Segmentation. All of the demo code, training code, evaluation code, and model weights are released. This is a large multimodal model for chat, detection, and segmentation, as shown in the demo video:
3
15
49
@zhanga6
Ao Zhang
1 month
Like this interesting work❤️! Generate LLM params for new tasks in 1 sec!
@VictorKaiWang1
Victor.Kai Wang
1 month
Customizing your LLMs in seconds using prompts🥳! Excited to share our latest work with @HPCAILab, @VITAGroupUT, @k_schuerholt, @YangYou1991, @mmbronstein, @damianborth: Drag-and-Drop LLMs (DnD). Two features: tuning-free, and comparable or even better than full-shot tuning. (🧵1/8)
2
1
2
@zhanga6
Ao Zhang
2 months
Wow, a new release is coming 🥳!
@NVIDIARobotics
NVIDIA Robotics
2 months
#NVIDIAIsaac GR00T N1.5 is now accessible to #robotics developers working with a wide range of robot form factors, and available to download from @huggingface. 🎉 Dive into our step-by-step tutorial to learn how to easily post-train and adapt it to the LeRobot SO-101 arm, and…
0
0
1
@zhanga6
Ao Zhang
2 months
Big congrats to @OpenBMB and @xcjthu1! Speed of light😭.
@OpenBMB
OpenBMB
2 months
🔥MiniCPM4: Ultra-Efficient LLMs on End Devices. 🚀Technical Report: 💡Paper on HuggingFace: 📥Download Models:
0
0
3
@zhanga6
Ao Zhang
5 months
RT @yukez: Thrilled to announce GR00T N1, our open foundation model for generalist humanoid robots! GR00T N1 adopts a dual-system design,…
0
60
0
@zhanga6
Ao Zhang
5 months
🚀So excited to share our recent work! GR00T N1 is a 2B model for humanoid robots, validated on a series of sim and real robot benchmarks🙌! Try it out!
@DrJimFan
Jim Fan
5 months
Excited to announce GR00T N1, the world’s first open foundation model for humanoid robots! We are on a mission to democratize Physical AI. The power of general robot brain, in the palm of your hand - with only 2B parameters, N1 learns from the most diverse physical action dataset
0
0
10
@zhanga6
Ao Zhang
6 months
What is the next AI breakthrough after RL🤔? A friend from DeepSeek: AGI is coming. The question is not for you anymore😭. It's for AI.
0
0
0
@zhanga6
Ao Zhang
7 months
Why is the MiniCPM-V series so powerful😆?
@OpenBMB
OpenBMB
7 months
💥 Introducing MiniCPM-o 2.6: an 8B-size, GPT-4o-level omni model that runs on device. ✨ Highlights: ~Match GPT-4o-202405 in vision, audio, and multimodal live streaming ~End-to-end real-time bilingual audio conversation ~Voice cloning & emotion control ~Advanced OCR & video…
0
0
1
@zhanga6
Ao Zhang
11 months
RT @xcjthu1: 1/5 🚀 Excited to share our latest paper on Configurable Foundation Models! 🧠 Inspired by the human brain's functional special…
0
24
0
@zhanga6
Ao Zhang
1 year
Real-time video generation🤩🤩🤩. Congrats to Xuanlei and Kai.
@oahzxl
Xuanlei Zhao
1 year
Real-Time Video Generation: Achieved 🥳 Sharing our latest work with @JxlDragon, @VictorKaiWang1, and @YangYou1991: "Real-Time Video Generation with Pyramid Attention Broadcast." Three features: real-time, lossless quality, and training-free! Blog: (🧵1/6)
0
0
3
@zhanga6
Ao Zhang
1 year
RT @FuxiaoL: Thanks to my excellent collaborators @zhanga6, @imhaotian, Hao Fei, Yuan Yao, @zhangzhuosheng. Our @CVPR tutorial on "From Mu…
0
6
0
@zhanga6
Ao Zhang
1 year
I will be at #CVPR2024 from 16 Jun. to 22 Jun. Happy to meet old friends and make new ones😃. If you are interested in MLLMs, let's discuss!
0
2
18
@zhanga6
Ao Zhang
1 year
Since the HuggingFace page of Llama3-V has now been removed, we have uploaded both the Llama3-V and MiniCPM-V checkpoints for comparison. Since this model received several thousand downloads on HuggingFace, there should be independent copies to reproduce this.
0
0
2
@zhanga6
Ao Zhang
1 year
The same thing also happens with WebAgent, another unrevealed feature trained on in-house data. They even make identical errors in a WebAgent schema newly defined within our team.
1
1
25
@zhanga6
Ao Zhang
1 year
For quantitative results, we also tested several Llama3-based VLMs on 1K Bamboo Character images and compared the prediction exact match for each pair of models. The overlaps between every two models are zero, whereas the overlap between Llama3-V and MiniCPM-Llama3-V 2.5 achieves a…
1
2
29
@zhanga6
Ao Zhang
1 year
One of the experimental features of MiniCPM-Llama3-V 2.5 is recognizing Tsinghua Bamboo Characters (清华简), a very special and rare type of ancient Chinese character written on bamboo during China's Warring States Period (475 BC–221 BC). These training images were recently…
3
7
83
@zhanga6
Ao Zhang
1 year
After receiving the issue from @yangzhizheng1 on GitHub, we launched a serious investigation. We can obtain correct inference results using the Llama3-V checkpoint with MiniCPM-Llama3-V 2.5's code and config file, following @yangzhizheng1's instructions on GitHub. Even more, we also…
1
2
29
@zhanga6
Ao Zhang
1 year
So sad to hear the news 😰. The conclusion of our investigation: 1. Llama3-V can be run using MiniCPM-Llama3-V 2.5's code and config.json after changing param names. 2. It behaves similarly to MiniCPM-Llama3-V 2.5 in unrevealed experimental features.
12
71
491
@zhanga6
Ao Zhang
1 year
Comparable to GPT-4V with only 8B params😱. Welcome to check out our new MiniCPM-Llama3-V 2.5.
@OpenBMB
OpenBMB
1 year
🚀 Excited to introduce MiniCPM-Llama3-V 2.5! With 8B parameters, it's our latest breakthrough, outperforming top models like GPT-4V. 📈 💪 Superior OCR capabilities 🔑 Supports 30+ languages. HuggingFace: GitHub:
0
0
4
@zhanga6
Ao Zhang
1 year
Shocked by the performance💥! It also resolves my long-standing confusion about which version of ChatGPT to use for eval.
@RuohongZhang
Ruohong Zhang
1 year
[p1] 🐕Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward🐕. Paper link: Project page: How to effectively train video large multimodal model (LMM) alignment with preference modeling?
0
0
0
@zhanga6
Ao Zhang
1 year
RT @arankomatsuzaki: Some people criticize next-token prediction like "we should also predict more future tokens beyond the immediate futur…
0
69
0