Zineng Tang Profile
Zineng Tang

@ZinengTang

Followers: 1K · Following: 141 · Media: 24 · Statuses: 131

PhD in @Berkeley_ai and @BerkeleyNLP. Previously @UNCNLP and @MSFTResearch.

Chapel Hill, NC
Joined February 2019
@ZinengTang
Zineng Tang
1 month
Excited to share our new work! DOVE 🕊️: a dynamic vision encoder that adapts token count to image complexity. Fewer tokens, same fidelity, and it outperforms fixed-length AE tokenizers on classification & VLM tasks! Arxiv: Web: #AI #CV
1 reply · 21 reposts · 113 likes
@ZinengTang
Zineng Tang
1 month
RT @ZinengTang: Excited to share our new work! DOVE 🕊️: a dynamic vision encoder that adapts token count to image complexity. Fewer tokens,…
0 replies · 21 reposts · 0 likes
@ZinengTang
Zineng Tang
1 month
Big thanks to my undergrad intern Lingjun for delivering such impressive work, and to Rudy for the thoughtful co-advising!
0 replies · 0 reposts · 2 likes
@ZinengTang
Zineng Tang
1 month
🔥 DOVE uses 68% fewer tokens yet achieves better FID than VQGAN/TiTok and gains of 10–12 points on VQA/ImageNet/CIFAR. It performs significantly better on classification, probing, and VLM tasks. DOVE also shows emerging properties: PCA heatmaps reveal sharper segmentation.
1 reply · 0 reposts · 2 likes
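The DOVE tweets above describe a vision encoder that allocates a variable number of tokens per image depending on how complex the image is. As a rough illustration only (not the DOVE implementation; the module names, the linear complexity head, and the norm-based token selection are all assumptions), a dynamic tokenizer of that flavor could look like this:

```python
import torch
import torch.nn as nn

class DynamicVisionTokenizer(nn.Module):
    """Toy tokenizer: more tokens for complex images, fewer for simple ones."""

    def __init__(self, dim=768, min_tokens=16, max_tokens=256):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.complexity_head = nn.Linear(dim, 1)  # predicts how "busy" the image is
        self.min_tokens, self.max_tokens = min_tokens, max_tokens

    def forward(self, images):
        # images: (B, 3, H, W) -> patch features: (B, N, dim)
        feats = self.patch_embed(images).flatten(2).transpose(1, 2)
        # One complexity score per image, squashed to [0, 1]
        score = torch.sigmoid(self.complexity_head(feats.mean(dim=1)))  # (B, 1)
        # Map complexity to a per-image token budget
        budget = (self.min_tokens
                  + (self.max_tokens - self.min_tokens) * score).round().long()
        tokens = []
        for f, k in zip(feats, budget.squeeze(1).tolist()):
            k = min(k, f.size(0))                 # never ask for more patches than exist
            idx = f.norm(dim=-1).topk(k).indices  # keep the highest-energy patches
            tokens.append(f[idx])
        return tokens  # list of (k_i, dim) tensors, one per image

# Example (hypothetical): two images, each getting its own token count.
# tok = DynamicVisionTokenizer()
# outs = tok(torch.randn(2, 3, 256, 256))
```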
@ZinengTang
Zineng Tang
4 months
Also thanks to @LongTonyLian, Seun Eisape, @XDWang101, @roeiherzig, @Yalatweets, @alsuhr, and @trevordarrell for their great efforts!
1 reply · 0 reposts · 5 likes
@ZinengTang
Zineng Tang
4 months
TULIP achieves state-of-the-art performance across multiple vision and vision-language benchmarks. It significantly improves zero-shot classification on ImageNet-1K, enhances fine-grained object recognition, and boosts multimodal reasoning scores. Compared to existing methods,
1 reply · 1 repost · 5 likes
@ZinengTang
Zineng Tang
4 months
Despite the success of contrastive image-text models like CLIP, they struggle with vision-centric tasks requiring high-fidelity understanding. We introduce TULIP, a novel model integrating generative data augmentation, enhanced contrastive learning, and reconstruction.
1 reply · 0 reposts · 7 likes
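The TULIP tweet above describes combining contrastive image-text learning with reconstruction. A minimal sketch of how such a joint objective can be wired, assuming a CLIP-style symmetric InfoNCE term plus a pixel-level MSE reconstruction term (the function name, weighting, and exact losses are illustrative assumptions, not TULIP's actual objective):

```python
import torch
import torch.nn.functional as F

def contrastive_plus_recon_loss(img_emb, txt_emb, recon, images,
                                temperature=0.07, recon_weight=1.0):
    """CLIP-style symmetric InfoNCE plus a pixel reconstruction term (sketch)."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    contrastive = (F.cross_entropy(logits, targets)
                   + F.cross_entropy(logits.t(), targets)) / 2
    recon_loss = F.mse_loss(recon, images)                    # high-fidelity pixel term
    return contrastive + recon_weight * recon_loss
```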
@ZinengTang
Zineng Tang
4 months
We are thrilled to announce TULIP! 🌷 A state-of-the-art vision-language encoder coupled with a generative model for stronger representation learning.
7 replies · 68 reposts · 302 likes
@ZinengTang
Zineng Tang
10 months
RT @ChengleiSi: Automating AI research is exciting! But can LLMs actually produce novel, expert-level research ideas? After a year-long st…
0 replies · 770 reposts · 0 likes
@ZinengTang
Zineng Tang
11 months
RT @yzy_ai: We announced Phi 3.5 series today! 1️⃣ Multilingual Mini 3.8B: 2️⃣ MoE 16x3.8B (active 6.6B): https://…
0 replies · 6 reposts · 0 likes
@ZinengTang
Zineng Tang
1 year
CoDi-2 is selected as a #CVPR2024 Highlight. Come join us in today's poster session, Arch 4A-E #314, from 5 pm to 6:30 pm! @yzy_ai @nlpyang @ChenguangZhu2 @mohitban47
@ZinengTang
Zineng Tang
2 years
🔥 Excited to introduce CoDi-2! It follows complex multimodal-interleaved in-context instructions to generate any modality (text, vision, audio) in a zero/few-shot interactive way! @yzy_ai @nlpyang @ChenguangZhu2 @mohitban47 🧵👇
0 replies · 10 reposts · 18 likes
@ZinengTang
Zineng Tang
1 year
RT @mohitban47: Honored and grateful to be named as one of the @UNC permanent distinguished professorships 🙏 100% of the credit goes to m…
0 replies · 24 reposts · 0 likes
@ZinengTang
Zineng Tang
1 year
Excited to share that CoDi-2 is accepted to @CVPR. In this work, we show that alignment of multimodal inputs to language unlocks ICL and few-shot prompting ability for multimodal generation. #CVPR2024 @yzy_ai @nlpyang @ChenguangZhu2 @mohitban47
@ZinengTang
Zineng Tang
2 years
🔥 Excited to introduce CoDi-2! It follows complex multimodal-interleaved in-context instructions to generate any modality (text, vision, audio) in a zero/few-shot interactive way! @yzy_ai @nlpyang @ChenguangZhu2 @mohitban47 🧵👇
2 replies · 10 reposts · 54 likes
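The CoDi-2 tweets above describe following interleaved multimodal in-context instructions. Purely as an illustration of what such an interleaved prompt could look like (the dict schema and the `model.generate` call below are hypothetical, not CoDi-2's actual API):

```python
# Interleaved multimodal prompt: text, image, and audio segments in one sequence.
prompt = [
    {"type": "text",  "content": "Here is a photo of a cat:"},
    {"type": "image", "content": "cat.png"},
    {"type": "text",  "content": "and the sound it makes:"},
    {"type": "audio", "content": "meow.wav"},
    {"type": "text",  "content": "Now produce a dog photo and its matching sound."},
]
# outputs = model.generate(prompt, output_modalities=["image", "audio"])  # hypothetical API
```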
@ZinengTang
Zineng Tang
2 years
RT @mohitban47: Had a great time speaking at @indoml_sym & meeting students+faculty+researchers (overall, kudos to organizers+volunteers on….
0 replies · 11 reposts · 0 likes
@ZinengTang
Zineng Tang
2 years
RT @pan_jiayipan: Generative model that operates across versatile modalities is where future is, which makes this work super interesting.
0 replies · 5 reposts · 0 likes
@ZinengTang
Zineng Tang
2 years
RT @mohitban47: 🚨 CoDi-2: In-Context, Interleaved, Interactive Any-to-Any Generation🚨. Interactive MLLM that can follow "multimodal-interle….
0 replies · 14 reposts · 0 likes
@ZinengTang
Zineng Tang
2 years
RT @jmin__cho: CoDi-2 presents multimodal 'generation' with interleaved in-context learning! Another exciting work by @ZinengTang @yzy_ai…
0 replies · 8 reposts · 0 likes
@ZinengTang
Zineng Tang
2 years
RT @yzy_ai: Recall CoDi any-to-any generation? Now CoDi-2 🔥 any-to-any MLLM is here! It is (1) 1st ever vision-language-audio zero/few-shot g…
0 replies · 6 reposts · 0 likes
@ZinengTang
Zineng Tang
2 years
RT @DrJimFan: Codi-2: an interleaved, multimodal LLM that supports arbitrary audio, image, and video mixture in I/O. You can give instructi….
0 replies · 130 reposts · 0 likes