Zhenxing Mi Profile
Zhenxing Mi

@Mifucius1

Followers
137
Following
727
Media
13
Statuses
49

PhD student @ HKUST

Joined July 2021
@Mifucius1
Zhenxing Mi
3 days
Excited to share our new paper "One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control" on arXiv. With our novel designs of Unified Masked Conditioning (UMC) and Decoupled LoRA Control (DLC), One4D can seamlessly handle single-image-to-4D, sparse-frame-to-4D,
1
4
15
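The paper's Decoupled LoRA Control (DLC) builds on low-rank adaptation. As a minimal sketch of the general LoRA mechanism only (not One4D's actual architecture), a frozen weight W is augmented with a trainable low-rank update B @ A, scaled by alpha / rank; all names here are illustrative:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """LoRA forward pass: frozen weight W plus low-rank update B @ A.

    x: (d_in,) input; W: (d_out, d_in) frozen base weight;
    A: (r, d_in), B: (d_out, r) trainable low-rank factors.
    """
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = np.zeros((d_out, r))  # B is zero-initialized, so the adapter starts as a no-op
x = rng.standard_normal(d_in)

# With B at zero, the LoRA branch contributes nothing to the output
assert np.allclose(lora_forward(x, W, A, B), W @ x)
```

Zero-initializing B is the standard LoRA trick: training starts exactly at the frozen base model, and the low-rank branch learns a delta on top of it.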
@Mifucius1
Zhenxing Mi
3 months
The source code and checkpoints of ThinkDiff are finally released! We also released a modified vLLM for embedding extraction. Feel free to use them and raise an issue if you run into any problems! Github: https://t.co/PgvHdytanj Checkpoints: https://t.co/NtPBk9Pkpb vLLM:
github.com
This is a fork of vllm for extracting embeddings of generated tokens - MiZhenxing/vllm
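The fork's purpose — returning the embeddings of generated tokens alongside the token ids — can be sketched with a toy greedy decoder. This is an illustration of the idea only, not the fork's actual API; `logits_fn`, `embed_fn`, and the toy embedding table are all hypothetical:

```python
import numpy as np

def greedy_decode_with_embeddings(logits_fn, embed_fn, start_token, steps):
    """Toy greedy decoder that records the hidden embedding used to produce
    each generated token, mirroring the idea of extracting per-token embeddings."""
    tokens, embeddings = [start_token], []
    for _ in range(steps):
        h = embed_fn(tokens)             # hidden state for the current prefix
        next_id = int(np.argmax(logits_fn(h)))
        tokens.append(next_id)
        embeddings.append(h)             # keep the embedding behind this token
    return tokens[1:], np.stack(embeddings)

rng = np.random.default_rng(1)
E = rng.standard_normal((5, 3))               # toy token-embedding table (vocab 5, dim 3)
Wo = rng.standard_normal((3, 5))              # toy output projection
embed_fn = lambda toks: E[toks].mean(axis=0)  # prefix embedding: mean of token embeddings
logits_fn = lambda h: h @ Wo

ids, embs = greedy_decode_with_embeddings(logits_fn, embed_fn, start_token=0, steps=3)
assert len(ids) == 3 and embs.shape == (3, 3)
```

The point is the return signature: generation yields both the ids and the hidden states that produced them, which is what a downstream model consuming "embeddings of generated tokens" needs.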
@Mifucius1
Zhenxing Mi
9 months
🔥🚀 Native image generation of Gemini is just on fire! Our #ThinkDiff paper bridges vision-language models with diffusion models to unlock native image generation from VLMs. While not yet as perfect as #Gemini, it offers a glimpse of the groundbreaking potential! 🔥 Check
0
4
8
@Mifucius1
Zhenxing Mi
5 months
Cool! 3D generation is getting even more fantastic!
@_akhaliq
AK
5 months
From One to More Contextual Part Latents for 3D Generation
0
0
1
@danxuhk
Dan Xu
8 months
We propose a high-fidelity talking head generation framework that supports both single-modal and multi-modal driving signals. More details: arXiv: https://t.co/1eM9S0llym Project page: https://t.co/020eAwxMUj Github: https://t.co/fhIPEslNYU HuggingFace: https://t.co/xMiBtKYuff
2
7
39
@Mifucius1
Zhenxing Mi
8 months
The image generation of GPT-4o is amazing. It highlights image generation based on multimodal in-context learning. Our paper ThinkDiff investigates this direction and shows promising results, although far less powerful than GPT-4o and Gemini. Check out our paper and post!
@Mifucius1
Zhenxing Mi
9 months
Test case is from
@robertriachi
Robert Riachi
9 months
some cool examples with Gemini 2.0 native image output 🧵
0
0
1
@Mifucius1
Zhenxing Mi
9 months
5. Image-conditioned video generation:
0
0
2
@Mifucius1
Zhenxing Mi
9 months
4. Composing images guided by text:
1
0
2
@Mifucius1
Zhenxing Mi
9 months
3. Composing images and texts coherently:
1
0
2
@Mifucius1
Zhenxing Mi
9 months
2. Instruction following and style transfer: Input image is borrowed from X.
2
0
3
@Mifucius1
Zhenxing Mi
9 months
🔥🚀 Native image generation of Gemini is just on fire! Our #ThinkDiff paper bridges vision-language models with diffusion models to unlock native image generation from VLMs. While not yet as perfect as #Gemini, it offers a glimpse of the groundbreaking potential! 🔥 Check
1
6
18
@Mifucius1
Zhenxing Mi
9 months
Excited to share our new paper "ThinkDiff" on arXiv: "I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models". It can make diffusion models take "IQ tests"! It empowers diffusion models with multimodal in-context understanding and reasoning
2
7
30
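ThinkDiff's core idea, bridging a VLM's representations into a diffusion model's condition space, can be pictured as feature alignment. As a hedged toy sketch only (the paper's actual aligner is a trained network, and all shapes and names below are made up), a linear projector fit on paired features stands in for that bridge:

```python
import numpy as np

# Toy alignment: fit a linear projector P mapping VLM token features into the
# diffusion model's condition space via least squares on paired features.
# Purely illustrative; ThinkDiff's real aligner is learned end to end.
rng = np.random.default_rng(0)
vlm = rng.standard_normal((64, 16))      # 64 tokens, 16-dim VLM features
target = rng.standard_normal((64, 8))    # paired 8-dim diffusion condition features

P, *_ = np.linalg.lstsq(vlm, target, rcond=None)  # (16, 8) projector
cond = vlm @ P                           # aligned features fed to the diffusion model
assert P.shape == (16, 8) and cond.shape == (64, 8)
```

Once such a bridge exists, whatever in-context reasoning the VLM performs over interleaved images and text is carried through to the image generator via the projected condition features.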
@leoyerrrr
HanRong YE
2 years
How to design generative models to help segmentation tasks?🧐Introducing SegGen, our innovative approach for generating training data for image segmentation tasks, which greatly pushes the boundaries of performance for cutting-edge segmentation models. We creatively propose a
10
7
58
@leoyerrrr
HanRong YE
3 years
#ICLR2023 Updates to TaskPrompter's codebase for joint 2D-3D multi-task understanding on Cityscapes-3D! We now predict disparity instead of depth, aligning with prevalent practices on the dataset. Please check https://t.co/SlumVxZoTN… Thanks to Prof @danxuhk for the valuable guidance!😃
4
7
43
@Mifucius1
Zhenxing Mi
3 years
The code of our ICLR2023 paper "Switch-NeRF: Learning Scene Decomposition with Mixture of Experts for Large-scale Neural Radiance Fields" has been released. @danxuhk Code: https://t.co/NFnHYu3kkS Paper: https://t.co/IEbSC5eyWA Project page: https://t.co/ZnDdDyW6v6
2
7
38
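Switch-NeRF's scene decomposition uses a mixture of experts: a gating network routes each 3D point to one expert sub-network. A minimal top-1 routing sketch in that spirit (linear "experts" and gate for illustration only; the paper's experts are NeRF MLPs):

```python
import numpy as np

def switch_route(points, gate_W, experts):
    """Top-1 mixture-of-experts routing: each 3D point is dispatched to the
    single expert chosen by the gating network (Switch-style, illustrative)."""
    logits = points @ gate_W                 # (N, num_experts) gating scores
    choice = logits.argmax(axis=1)           # top-1 expert index per point
    out = np.empty((points.shape[0], experts[0].shape[1]))
    for e, W_e in enumerate(experts):
        mask = choice == e
        out[mask] = points[mask] @ W_e       # each expert processes only its points
    return out, choice

rng = np.random.default_rng(0)
pts = rng.standard_normal((16, 3))                       # 3D sample points
gate_W = rng.standard_normal((3, 4))                     # gate over 4 experts
experts = [rng.standard_normal((3, 8)) for _ in range(4)]  # toy linear "experts"

feats, choice = switch_route(pts, gate_W, experts)
assert feats.shape == (16, 8) and choice.max() < 4
```

Because each point activates only one expert, compute stays roughly constant as experts are added — which is what makes the mixture practical for large-scale scenes.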