Zhenxing Mi (@Mifucius1)
PhD student @ HKUST · Joined July 2021
137 Followers · 727 Following · 13 Media · 49 Statuses
Excited to share our new paper "One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control" on arXiv. With our novel designs of Unified Masked Conditioning (UMC) and Decoupled LoRA Control (DLC), One4D can seamlessly handle single-image-to-4D, sparse-frame-to-4D, …
1 reply · 4 reposts · 15 likes
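The UMC/DLC specifics live in the paper, not the tweet; as a rough, hypothetical sketch of what "decoupled LoRA control" could look like, here is a frozen linear layer with two independent low-rank adapters, one per task, switched at runtime. The module names, rank, and routing scheme are my assumptions, not One4D's code.

```python
# Hypothetical sketch of decoupled LoRA control: a frozen base linear layer
# with two independent low-rank adapters ("gen" vs. "rec"), selected by task.
# This is a guess at the general idea, NOT the One4D implementation.
import torch
import torch.nn as nn

class DecoupledLoRALinear(nn.Module):
    def __init__(self, dim_in, dim_out, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(dim_in, dim_out)
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        self.scale = alpha / rank
        # One LoRA pair per task: generation ("gen") and reconstruction ("rec").
        self.lora = nn.ModuleDict({
            task: nn.ModuleDict({
                "down": nn.Linear(dim_in, rank, bias=False),
                "up": nn.Linear(rank, dim_out, bias=False),
            })
            for task in ("gen", "rec")
        })
        for branch in self.lora.values():
            nn.init.zeros_(branch["up"].weight)  # adapters start as a no-op

    def forward(self, x, task="gen"):
        branch = self.lora[task]
        return self.base(x) + self.scale * branch["up"](branch["down"](x))

layer = DecoupledLoRALinear(64, 64)
x = torch.randn(2, 64)
y_gen, y_rec = layer(x, "gen"), layer(x, "rec")
```

Keeping the two adapters fully separate means each task can be tuned without interfering with the other while sharing the same frozen backbone.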
The source code and checkpoints of ThinkDiff are finally released! We also released a modified vLLM for embedding. Feel free to use them and open an issue if you run into any problems! GitHub: https://t.co/PgvHdytanj Checkpoints: https://t.co/NtPBk9Pkpb vLLM:
github.com · MiZhenxing/vllm — a fork of vLLM for extracting embeddings of generated tokens
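The fork's exact API isn't shown in the tweet; to illustrate the general idea of "extracting embeddings of generated tokens," here is how one can get per-step hidden states with plain Hugging Face transformers (the model name is just a placeholder, and the fork presumably does the equivalent with vLLM's faster serving stack).

```python
# Grab hidden states for each generated token with vanilla transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM; placeholder choice
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tok("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=5,
        return_dict_in_generate=True,
        output_hidden_states=True,  # hidden states for every generation step
    )

# out.hidden_states: one tuple per generated token; each holds all layers.
# Take the last layer's state at the newly generated position of each step.
gen_embeddings = torch.cat(
    [step[-1][:, -1, :] for step in out.hidden_states], dim=0
)
print(gen_embeddings.shape)  # (num_generated_tokens, hidden_dim)
```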
Cool! 3D generation is getting even more impressive!
0 replies · 0 reposts · 1 like
We propose a high-fidelity talking head generation framework that supports both single-modal and multi-modal driving signals. More details: arXiv: https://t.co/1eM9S0llym Project page: https://t.co/020eAwxMUj GitHub: https://t.co/fhIPEslNYU Hugging Face: https://t.co/xMiBtKYuff
2 replies · 7 reposts · 39 likes
The image generation of GPT-4o is amazing. It highlights image generation based on multimodal in-context learning. Our paper ThinkDiff investigates this direction and shows promising results, although far less powerful than GPT-4o and Gemini. Check out our paper and post!
↪ Quoting the #ThinkDiff post below
0 replies · 0 reposts · 7 likes
Test case is from …
0 replies · 0 reposts · 1 like
2. Instruction following and style transfer: input image is borrowed from X.
2 replies · 0 reposts · 3 likes
🔥🚀 Native image generation of Gemini is just on fire! Our #ThinkDiff paper bridges vision-language models with diffusion models to unlock native image generation from VLMs. While not yet as perfect as #Gemini, it offers a glimpse of the groundbreaking potential! 🔥 Check …
1 reply · 6 reposts · 18 likes
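ThinkDiff's actual design is in the paper; a minimal sketch of the general recipe the tweet alludes to, aligning a VLM's output features to a diffusion model's text-conditioning space via a learned projector, might look like the following. All dimensions and module names here are illustrative assumptions.

```python
# Schematic of bridging a VLM to a diffusion model with a learned projector.
# Sizes and modules are placeholders, not ThinkDiff's actual components.
import torch
import torch.nn as nn

VLM_DIM, DIFF_TEXT_DIM, NUM_TOKENS = 4096, 768, 77

class VLM2DiffusionProjector(nn.Module):
    """Maps VLM token features into the diffusion text-encoder space."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(VLM_DIM, DIFF_TEXT_DIM),
            nn.GELU(),
            nn.Linear(DIFF_TEXT_DIM, DIFF_TEXT_DIM),
        )

    def forward(self, vlm_tokens):      # (batch, seq, VLM_DIM)
        return self.proj(vlm_tokens)    # (batch, seq, DIFF_TEXT_DIM)

# Training-time alignment: make projected VLM features mimic the text
# encoder's embeddings for the same caption; at inference, feed them into
# the diffusion model's cross-attention in place of text embeddings.
projector = VLM2DiffusionProjector()
vlm_tokens = torch.randn(2, NUM_TOKENS, VLM_DIM)         # stand-in VLM output
text_embeds = torch.randn(2, NUM_TOKENS, DIFF_TEXT_DIM)  # stand-in targets
loss = nn.functional.mse_loss(projector(vlm_tokens), text_embeds)
loss.backward()
```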
Excited to share our new paper "ThinkDiff" on arXiv: "I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models". It can make diffusion models take "IQ tests"! It empowers diffusion models with multimodal in-context understanding and reasoning …
2 replies · 7 reposts · 30 likes
How to design generative models to help segmentation tasks? 🧐 Introducing SegGen, our innovative approach for generating training data for image segmentation tasks, which greatly pushes the boundaries of performance for cutting-edge segmentation models. We creatively propose a …
10 replies · 7 reposts · 58 likes
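The tweet cuts off before describing the pipeline, so here is only a toy outline of the general pattern it names, generative data augmentation for segmentation: synthesize a new image consistent with an existing mask, then add the pair to the training set. The generator below is a stub placeholder; SegGen's actual pipeline differs in detail.

```python
# Toy outline of generative data augmentation for segmentation: synthesize an
# image consistent with an existing mask, then train on the new (image, mask)
# pair. The generator is a stub, not SegGen's actual model.
import numpy as np

def mask_to_image_generator(mask: np.ndarray) -> np.ndarray:
    """Placeholder for a mask-conditioned image generator (e.g. a diffusion
    model conditioned on the segmentation layout)."""
    h, w = mask.shape
    rng = np.random.default_rng(0)
    return rng.random((h, w, 3), dtype=np.float32)  # dummy RGB image

def synthesize_pairs(masks):
    """Turn existing ground-truth masks into extra (image, mask) pairs."""
    return [(mask_to_image_generator(m), m) for m in masks]

real_masks = [np.zeros((64, 64), dtype=np.int64) for _ in range(4)]
extra_training_data = synthesize_pairs(real_masks)
print(len(extra_training_data), extra_training_data[0][0].shape)
```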
#ICLR2023 Updates to TaskPrompter's codebase for joint 2D-3D multi-task understanding on Cityscapes-3D! We now predict disparity instead of depth, aligning with prevalent practice on the dataset. Please check https://t.co/SlumVxZoTN… Thanks to Prof. @danxuhk for valuable guidance! 😃
4 replies · 7 reposts · 43 likes
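For a rectified stereo pair like Cityscapes', disparity and depth are interchangeable via depth = fx × baseline / disparity, which is why the codebase can switch targets. A quick sketch; the camera values below are roughly those of the Cityscapes rig and are used for illustration only:

```python
# Convert a predicted disparity map (in pixels) to metric depth for a
# rectified stereo rig: depth = fx * baseline / disparity.
import numpy as np

def disparity_to_depth(disparity, fx=2262.52, baseline=0.2093):
    """fx in pixels, baseline in meters; values are roughly those of the
    Cityscapes stereo setup, for illustration only."""
    depth = np.full_like(disparity, np.inf, dtype=np.float64)
    valid = disparity > 0  # zero/negative disparity has no finite depth
    depth[valid] = fx * baseline / disparity[valid]
    return depth

disp = np.array([[10.0, 0.0], [50.0, 100.0]])
print(disparity_to_depth(disp))  # meters; inf where disparity is invalid
```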
The code of our ICLR 2023 paper "Switch-NeRF: Learning Scene Decomposition with Mixture of Experts for Large-scale Neural Radiance Fields" has been released. @danxuhk Code: https://t.co/NFnHYu3kkS Paper: https://t.co/IEbSC5eyWA Project page: https://t.co/ZnDdDyW6v6
2 replies · 7 reposts · 38 likes
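The full architecture is in the released repo; as a compact illustration of the core idea the title names, a gating network that routes each 3D sample to one expert NeRF MLP, here is a hedged sketch. Top-1 routing on raw xyz and the tiny expert sizes are my simplifications, not Switch-NeRF's design.

```python
# Minimal mixture-of-experts routing over 3D points: a gate picks one expert
# MLP per sample. A simplified illustration, not Switch-NeRF's code.
import torch
import torch.nn as nn

class TinyMoENeRF(nn.Module):
    def __init__(self, num_experts=4, hidden=64):
        super().__init__()
        self.gate = nn.Linear(3, num_experts)  # routes each xyz point
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                          nn.Linear(hidden, 4))  # RGB + density
            for _ in range(num_experts)
        )

    def forward(self, xyz):  # (N, 3) sample points
        expert_idx = self.gate(xyz).argmax(dim=-1)  # top-1 routing
        out = torch.zeros(xyz.shape[0], 4)
        for i, expert in enumerate(self.experts):
            sel = expert_idx == i
            if sel.any():
                out[sel] = expert(xyz[sel])  # each expert owns its region
        return out

model = TinyMoENeRF()
print(model(torch.randn(8, 3)).shape)  # torch.Size([8, 4])
```

Routing each point to a single expert is what lets the scene be decomposed: every expert only ever has to model its own region of a large scene.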