Zhenxing Mi (@Mifucius1)
PhD student @ HKUST · Joined July 2021
137 Followers · 727 Following · 13 Media · 49 Statuses
Excited to share our new paper "One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control" on arXiv. With our novel designs of Unified Masked Conditioning (UMC) and Decoupled LoRA Control (DLC), One4D can seamlessly handle single-image-to-4D, sparse-frame-to-4D, …
1 reply · 4 reposts · 15 likes
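The UMC/DLC specifics live in the paper, not the tweet; as a rough, hypothetical sketch of what "decoupled LoRA control" could look like, here is a frozen linear layer with two independent low-rank adapters, one per task, switched at runtime. The module names, rank, and routing scheme are my assumptions, not One4D's code.

```python
# Hypothetical sketch of decoupled LoRA control: a frozen base linear layer
# with two independent low-rank adapters ("gen" vs. "rec"), selected by task.
# This is a guess at the general idea, NOT the One4D implementation.
import torch
import torch.nn as nn

class DecoupledLoRALinear(nn.Module):
    def __init__(self, dim_in, dim_out, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(dim_in, dim_out)
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        self.scale = alpha / rank
        # One LoRA pair per task: generation ("gen") and reconstruction ("rec").
        self.lora = nn.ModuleDict({
            task: nn.ModuleDict({
                "down": nn.Linear(dim_in, rank, bias=False),
                "up": nn.Linear(rank, dim_out, bias=False),
            })
            for task in ("gen", "rec")
        })
        for branch in self.lora.values():
            nn.init.zeros_(branch["up"].weight)  # adapters start as a no-op

    def forward(self, x, task="gen"):
        branch = self.lora[task]
        return self.base(x) + self.scale * branch["up"](branch["down"](x))

layer = DecoupledLoRALinear(64, 64)
x = torch.randn(2, 64)
y_gen, y_rec = layer(x, "gen"), layer(x, "rec")
```

Keeping the two adapters fully separate means each task can be tuned without interfering with the other while sharing the same frozen backbone.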
The source code and checkpoints of ThinkDiff are finally released! We also released a modified vLLM for embedding. Feel free to use them and open an issue if you run into any problems! GitHub: https://t.co/PgvHdytanj Checkpoints: https://t.co/NtPBk9Pkpb vLLM:
github.com · MiZhenxing/vllm — a fork of vLLM for extracting embeddings of generated tokens
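The fork's exact API isn't shown in the tweet; to illustrate the general idea of "extracting embeddings of generated tokens," here is how one can get per-step hidden states with plain Hugging Face transformers (the model name is just a placeholder, and the fork presumably does the equivalent with vLLM's faster serving stack).

```python
# Grab hidden states for each generated token with vanilla transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM; placeholder choice
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tok("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=5,
        return_dict_in_generate=True,
        output_hidden_states=True,  # hidden states for every generation step
    )

# out.hidden_states: one tuple per generated token; each holds all layers.
# Take the last layer's state at the newly generated position of each step.
gen_embeddings = torch.cat(
    [step[-1][:, -1, :] for step in out.hidden_states], dim=0
)
print(gen_embeddings.shape)  # (num_generated_tokens, hidden_dim)
```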
Cool! 3D generation is getting even more impressive!
0 replies · 0 reposts · 1 like
We propose a high-fidelity talking head generation framework that supports both single-modal and multi-modal driving signals. More details: arXiv: https://t.co/1eM9S0llym Project page: https://t.co/020eAwxMUj GitHub: https://t.co/fhIPEslNYU Hugging Face: https://t.co/xMiBtKYuff
2 replies · 7 reposts · 39 likes
The image generation of GPT-4o is amazing. It highlights image generation based on multimodal in-context learning. Our paper ThinkDiff investigates this direction and shows promising results, although far less powerful than GPT-4o and Gemini. Check out our paper and post!
↪ Quoting the #ThinkDiff post below
0 replies · 0 reposts · 7 likes
Test case is from …
0 replies · 0 reposts · 1 like
2. Instruction following and style transfer: input image is borrowed from X.
2 replies · 0 reposts · 3 likes
🔥🚀 Native image generation of Gemini is just on fire! Our #ThinkDiff paper bridges vision-language models with diffusion models to unlock native image generation from VLMs. While not yet as perfect as #Gemini, it offers a glimpse of the groundbreaking potential! 🔥 Check …
1 reply · 6 reposts · 18 likes
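ThinkDiff's actual design is in the paper; a minimal sketch of the general recipe the tweet alludes to, aligning a VLM's output features to a diffusion model's text-conditioning space via a learned projector, might look like the following. All dimensions and module names here are illustrative assumptions.

```python
# Schematic of bridging a VLM to a diffusion model with a learned projector.
# Sizes and modules are placeholders, not ThinkDiff's actual components.
import torch
import torch.nn as nn

VLM_DIM, DIFF_TEXT_DIM, NUM_TOKENS = 4096, 768, 77

class VLM2DiffusionProjector(nn.Module):
    """Maps VLM token features into the diffusion text-encoder space."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(VLM_DIM, DIFF_TEXT_DIM),
            nn.GELU(),
            nn.Linear(DIFF_TEXT_DIM, DIFF_TEXT_DIM),
        )

    def forward(self, vlm_tokens):      # (batch, seq, VLM_DIM)
        return self.proj(vlm_tokens)    # (batch, seq, DIFF_TEXT_DIM)

# Training-time alignment: make projected VLM features mimic the text
# encoder's embeddings for the same caption; at inference, feed them into
# the diffusion model's cross-attention in place of text embeddings.
projector = VLM2DiffusionProjector()
vlm_tokens = torch.randn(2, NUM_TOKENS, VLM_DIM)         # stand-in VLM output
text_embeds = torch.randn(2, NUM_TOKENS, DIFF_TEXT_DIM)  # stand-in targets
loss = nn.functional.mse_loss(projector(vlm_tokens), text_embeds)
loss.backward()
```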
Excited to share our new paper "ThinkDiff" on arXiv: "I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models". It can make diffusion models take "IQ tests"! It empowers diffusion models with multimodal in-context understanding and reasoning …
2 replies · 7 reposts · 30 likes
How to design generative models to help segmentation tasks? 🧐 Introducing SegGen, our innovative approach for generating training data for image segmentation tasks, which greatly pushes the boundaries of performance for cutting-edge segmentation models. We creatively propose a …
10 replies · 7 reposts · 58 likes
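The tweet cuts off before describing the pipeline, so here is only a toy outline of the general pattern it names, generative data augmentation for segmentation: synthesize a new image consistent with an existing mask, then add the pair to the training set. The generator below is a stub placeholder; SegGen's actual pipeline differs in detail.

```python
# Toy outline of generative data augmentation for segmentation: synthesize an
# image consistent with an existing mask, then train on the new (image, mask)
# pair. The generator is a stub, not SegGen's actual model.
import numpy as np

def mask_to_image_generator(mask: np.ndarray) -> np.ndarray:
    """Placeholder for a mask-conditioned image generator (e.g. a diffusion
    model conditioned on the segmentation layout)."""
    h, w = mask.shape
    rng = np.random.default_rng(0)
    return rng.random((h, w, 3), dtype=np.float32)  # dummy RGB image

def synthesize_pairs(masks):
    """Turn existing ground-truth masks into extra (image, mask) pairs."""
    return [(mask_to_image_generator(m), m) for m in masks]

real_masks = [np.zeros((64, 64), dtype=np.int64) for _ in range(4)]
extra_training_data = synthesize_pairs(real_masks)
print(len(extra_training_data), extra_training_data[0][0].shape)
```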
#ICLR2023 Updates to TaskPrompter's codebase for joint 2D-3D multi-task understanding on Cityscapes-3D! We now predict disparity instead of depth, aligning with prevalent practice on the dataset. Please check https://t.co/SlumVxZoTN… Thanks to Prof. @danxuhk for valuable guidance! 😃
4 replies · 7 reposts · 43 likes
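For a rectified stereo pair like Cityscapes', disparity and depth are interchangeable via depth = fx × baseline / disparity, which is why the codebase can switch targets. A quick sketch; the camera values below are roughly those of the Cityscapes rig and are used for illustration only:

```python
# Convert a predicted disparity map (in pixels) to metric depth for a
# rectified stereo rig: depth = fx * baseline / disparity.
import numpy as np

def disparity_to_depth(disparity, fx=2262.52, baseline=0.2093):
    """fx in pixels, baseline in meters; values are roughly those of the
    Cityscapes stereo setup, for illustration only."""
    depth = np.full_like(disparity, np.inf, dtype=np.float64)
    valid = disparity > 0  # zero/negative disparity has no finite depth
    depth[valid] = fx * baseline / disparity[valid]
    return depth

disp = np.array([[10.0, 0.0], [50.0, 100.0]])
print(disparity_to_depth(disp))  # meters; inf where disparity is invalid
```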
The code of our ICLR 2023 paper "Switch-NeRF: Learning Scene Decomposition with Mixture of Experts for Large-scale Neural Radiance Fields" has been released. @danxuhk Code: https://t.co/NFnHYu3kkS Paper: https://t.co/IEbSC5eyWA Project page: https://t.co/ZnDdDyW6v6
2 replies · 7 reposts · 38 likes
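The full architecture is in the released repo; as a compact illustration of the core idea the title names, a gating network that routes each 3D sample to one expert NeRF MLP, here is a hedged sketch. Top-1 routing on raw xyz and the tiny expert sizes are my simplifications, not Switch-NeRF's design.

```python
# Minimal mixture-of-experts routing over 3D points: a gate picks one expert
# MLP per sample. A simplified illustration, not Switch-NeRF's code.
import torch
import torch.nn as nn

class TinyMoENeRF(nn.Module):
    def __init__(self, num_experts=4, hidden=64):
        super().__init__()
        self.gate = nn.Linear(3, num_experts)  # routes each xyz point
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                          nn.Linear(hidden, 4))  # RGB + density
            for _ in range(num_experts)
        )

    def forward(self, xyz):  # (N, 3) sample points
        expert_idx = self.gate(xyz).argmax(dim=-1)  # top-1 routing
        out = torch.zeros(xyz.shape[0], 4)
        for i, expert in enumerate(self.experts):
            sel = expert_idx == i
            if sel.any():
                out[sel] = expert(xyz[sel])  # each expert owns its region
        return out

model = TinyMoENeRF()
print(model(torch.randn(8, 3)).shape)  # torch.Size([8, 4])
```

Routing each point to a single expert is what lets the scene be decomposed: every expert only ever has to model its own region of a large scene.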