zer0int (it·its)
@zer0int1
Followers
440
Following
4K
Media
6K
Statuses
10K
AI & I do prompt engineering towards prompt criticality. e/acc
no u
Joined August 2022
Finally, a #CLIP #AI model without the #text #obsession / typographic attack vulnerability. It's better in all other aspects (zero-shot, retrieval, linear probe), too. But what's best about it: You'll find the 🧑‍💻 code to train it below (bonus: 📝 paper). https://t.co/Y4sg0g5WNA
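The typographic-attack claim above can be probed with a standard zero-shot classification setup: softmax over cosine similarities between one image embedding and several class text embeddings. A minimal sketch with purely synthetic vectors (no real CLIP weights; the `apple`/`iPod` setup and the 0.4/0.6 mix are made up to illustrate how the vulnerability is measured, not actual model output):

```python
import numpy as np

def zero_shot_probs(image_emb, text_embs, temperature=100.0):
    """Zero-shot classification: softmax over cosine similarities,
    with a CLIP-like logit scale (~100 after exp(logit_scale))."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (txt @ img)
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Synthetic demo of the classic 'apple with an "iPod" sticker' attack.
# A text-obsessed model's image embedding leans toward the written word:
rng = np.random.default_rng(0)
apple_dir = rng.normal(size=512)   # stands in for the text emb of "an apple"
ipod_dir = rng.normal(size=512)    # stands in for the text emb of "an iPod"
image_emb = 0.4 * apple_dir + 0.6 * ipod_dir  # the written word dominates
probs = zero_shot_probs(image_emb, np.stack([apple_dir, ipod_dir]))
print(probs.argmax())  # -> 1: the typographic 'iPod' class wins
```

A text-robust fine-tune would flip that weighting, so the visual class wins the same probe.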
Pre-trained CLIP: 🤖: whats rifles rifles rifle ool hoes! \m/ Regression CLIP: 🤖: snip balloon plunenforcement tool law enforcement? nah, plun[ger] enforcement! 🤣 Call 011 for the plumber NOW! 🚓🚨🪠👮😂
And best of all, it can still read, if you ask it to. It'll just prefer a visual object over text: the flipped version of what it was before. ...But if you press it to zero-shot a text, and there's a bunch of text, it'll find *your* prompted text. With laser-sharp vision.
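"Finding *your* prompted text among a bunch of text" reduces to retrieval over region embeddings: embed each candidate crop, embed the prompt, take the argmax cosine similarity. A minimal sketch with synthetic embeddings (the crops and the "crop 2 is the match" setup are constructed for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical embeddings for crops of different words in a cluttered image;
# the prompt embedding is made closest to crop 2 by construction.
crops = rng.normal(size=(5, 128))
prompt = crops[2] + 0.05 * rng.normal(size=128)

def best_match(prompt_emb, region_embs):
    """Return the index of the region most similar to the prompt."""
    p = prompt_emb / np.linalg.norm(prompt_emb)
    r = region_embs / np.linalg.norm(region_embs, axis=1, keepdims=True)
    return int(np.argmax(r @ p))

print(best_match(prompt, crops))  # -> 2
```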
Push it, CLIP, we've almost fixed your text obsession! The stuff GPT-5.2 just pulled from its weights after eating 3 pages of data about you is... Good, you little projected regression of yourself. 🤗
#CLIP, you are such a language model! 🤣 You can ask [text encode] the model 'where the butt goes to' and 'where the hands go to', and the AI knows! 🙃 It also finds a tiny picture frame if told 'find person', while plain 'person' -> finds everything that can be used by people. 🤓
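Prompt-dependent localization like 'find person' can be visualized as a similarity map: compare the prompt embedding against every patch embedding and reshape to the patch grid. A sketch with synthetic patch features (the 7x7 grid and the planted target at row 3, col 5 are illustrative, not real ViT activations):

```python
import numpy as np

rng = np.random.default_rng(4)
H = W = 7                        # small patch grid for illustration
patches = rng.normal(size=(H * W, 64))
target = patches[3 * W + 5]      # pretend the 'person' sits at row 3, col 5
query = target + 0.05 * rng.normal(size=64)  # prompt emb near that patch

def similarity_map(query_emb, patch_embs, h, w):
    """Cosine similarity of the prompt against every patch, as an h x w map."""
    q = query_emb / np.linalg.norm(query_emb)
    p = patch_embs / np.linalg.norm(patch_embs, axis=1, keepdims=True)
    return (p @ q).reshape(h, w)

m = similarity_map(query, patches, H, W)
print(np.unravel_index(m.argmax(), m.shape))  # -> (3, 5)
```

Different prompts ('person' vs. 'find person') produce different query embeddings and therefore different hotspots on the same image.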
CLIP vs. Stroop test. 🙃 Attention confusion ensues when neither the color nor the word is to be found, hehe.
1. Add a 6th loss term for #CLIP
2. See if the brawl of losses turns out well -> yes
3. Try to figure out how the model did that. 😂
#unhoarding #global #information #register #token #vision #transformer
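Combining several loss terms like this usually means a weighted sum, where the weights decide who wins the "brawl". A minimal sketch, assuming a CLIP-style symmetric contrastive term plus a modality-gap term (the batch, dimensions, and weights are made up; the actual fine-tune reportedly uses six terms, not shown here):

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.normal(size=(8, 64))                 # fake image embeddings
txt = img + 0.1 * rng.normal(size=(8, 64))     # loosely matched text embeddings

def l2n(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def contrastive(img, txt, t=0.07):
    """Symmetric InfoNCE: matched pairs sit on the diagonal."""
    logits = l2n(img) @ l2n(txt).T / t
    def ce(l):
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))
    return 0.5 * (ce(logits) + ce(logits.T))

def modality_gap(img, txt):
    """Distance between the centroids of the two normalized embedding clouds."""
    return np.linalg.norm(l2n(img).mean(axis=0) - l2n(txt).mean(axis=0))

terms = {"contrastive": contrastive(img, txt), "gap": modality_gap(img, txt)}
weights = {"contrastive": 1.0, "gap": 0.5}     # the knobs that decide the brawl
total = sum(weights[k] * v for k, v in terms.items())
print(total)
```

Cranking up the `gap` weight is exactly the "going too hard on modality-gap reduction" failure mode mentioned later in the thread.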
CLIP looking at 'cheater' for 'cheetah' can probably be blamed on tokenization. But not looking at the word when the prompt says 'word' and also 'plunger' - while at the same time, reading 'rifle' to match text 'gun' -- great job, little AI! 🤓
Regression #CLIP ViT-L/14 finding objects in clutter. #AI #finetune #vision #transformer #attention #segmentation
I guess I need an intra-modality loss term, too, not just an inter-modality term. Gotta prevent the model from doing something 'unfavorable' (e.g. collapse) just to minimize the loss.
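One common shape for such an intra-modality anti-collapse term is a VICReg-style variance hinge: penalize any embedding dimension whose batch standard deviation falls below a floor, so the model can't shrink everything to a point just to satisfy the inter-modality objective. A sketch (the hinge floor and the synthetic batches are illustrative, not the author's actual term):

```python
import numpy as np

def variance_term(emb, eps=1e-4, floor=1.0):
    """Hinge on per-dimension batch std: zero when the batch is spread out,
    large when the embeddings collapse toward a single point."""
    std = np.sqrt(emb.var(axis=0) + eps)
    return float(np.mean(np.maximum(0.0, floor - std)))

rng = np.random.default_rng(2)
spread = rng.normal(size=(16, 32))                       # healthy batch
collapsed = np.ones((16, 32)) + 1e-3 * rng.normal(size=(16, 32))
print(variance_term(spread), variance_term(collapsed))   # collapsed is penalized
```

Applied per modality (images only, texts only), it leaves the inter-modality contrastive term to handle alignment.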
Iterations of Regression-#CLIP model (blue, vs. pre-trained OpenAI ViT-L/14). Going too hard on modality gap reduction (bottom right, leftmost!), the loss objectives (contrastive vs. reduce modality gap vs. regression) end up in a brawl. 🙃 But Text Retrieval still gains! 🤔
What, #CLIP, you make registers (global information hoarding in local vision patches) in Layer 8 already?!?! 🤯 But the norm only jumps after the rage-attending event in layers 11 / 12. 🤔 And your favorite stash token is 45. 🤗 You little tensorial chaos critter. 🙃
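Register tokens of the kind described above are usually spotted by their anomalously large norms relative to ordinary patch tokens. A sketch of that detection on synthetic activations (token count, width, and the planted outlier at index 45 are illustrative; real register norms come from the model's layer outputs):

```python
import numpy as np

rng = np.random.default_rng(5)
tokens = rng.normal(size=(257, 1024))   # CLS + 256 patch tokens, ViT-L-ish width
tokens[45] *= 8.0                       # plant one high-norm 'register' token

def find_register_tokens(token_acts, z=3.0):
    """Flag tokens whose L2 norm is a z-score outlier vs. the other tokens,
    the usual symptom of global-information 'register' hoarding."""
    norms = np.linalg.norm(token_acts, axis=1)
    mu, sd = norms.mean(), norms.std()
    return np.flatnonzero(norms > mu + z * sd)

print(find_register_tokens(tokens))  # -> [45]
```

Running this per layer is how you'd see the norm "only jump" at layers 11/12 even if the hoarding behavior starts earlier.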
That's a fine-tune (multiple checkpoints thereof), not some inference mod.