lowvram

@lowvram

Followers: 689 · Following: 6K · Media: 101 · Statuses: 1K

too dumb for research, too smart for langchain. senior dev @ nameless megacorp, I wrap llms for a living

Joined April 2024
@lowvram
lowvram
2 days
If you write an mcp server and plan to use it over stdio… do not write your logs to stdout/stderr. I know, no one could be stupid enough to make that mistake… certainly not me…
0
0
1
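(a minimal sketch of the fix, assuming a Python server and the standard logging module; the log filename and logger name are made up — the point is just that stdout has to stay reserved for the stdio transport's JSON-RPC frames:)

```python
import logging

# With the stdio transport, stdout carries the protocol messages, so any stray
# print/log there corrupts the framing. Route logs to a file instead of the
# default stream handler. "mcp_server.log" is just an illustrative name.
logging.basicConfig(
    filename="mcp_server.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

log = logging.getLogger("my_mcp_server")
log.info("started")  # goes to the file, not to stdout/stderr
```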
@lowvram
lowvram
3 days
What are the best cli tools to hand to coding agents? ripgrep, ast-grep, sed, etc. what else is there?
0
0
0
@lowvram
lowvram
9 days
I have reached the logical conclusion of the fragrance hyperfixation (triggered by that HackerNews post about luckyscent) - buying highly concentrated fragrance in bulk from the manufacturer
0
0
0
@lowvram
lowvram
10 days
I’m far more excited about inference hardware improvements (and cost) than training/model architecture improvements in the near future
0
0
0
@lowvram
lowvram
11 days
Seattle rain storms as an apartment-liver: “ah how nice, let’s enjoy the patter of rain and chill.” As a new homeowner: “oh shit oh fuck the gutters oh shit check the attic is the insulation soaked through ah shit”
1
0
13
@lowvram
lowvram
15 days
are we just rediscovering Guidance’s “token healing” now??
@vllm_project
vLLM
15 days
it’s tokenization again! 🤯 did you know tokenize(detokenize(token_ids)) ≠ token_ids? RL researchers from Agent Lightning coined the term Retokenization Drift — a subtle mismatch between what your model generated and what your trainer thinks it generated. why? because most
0
0
2
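(the quoted “retokenization drift” is easy to reproduce; a minimal sketch with the Hugging Face transformers tokenizer — gpt2 is just a convenient example, any BPE tokenizer shows it:)

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

# pretend the model emitted " cat" as three pieces instead of the canonical one
generated_ids = tok.encode(" c") + tok.encode("a") + tok.encode("t")

text = tok.decode(generated_ids)   # " cat"
reencoded_ids = tok.encode(text)   # canonical tokenization of the same text

# tokenize(detokenize(ids)) != ids — the trainer's view of the sequence
# diverges from what the model actually sampled
print(generated_ids, reencoded_ids, generated_ids == reencoded_ids)
```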
@lowvram
lowvram
16 days
Anecdotally, the 30BA3-VL-Instruct model is performing much better in my scenarios (agentic with many tools) than the non-VL version, in terms of correct tool selection and multi-turn execution. I just have to disable video and image in vLLM to make it fit 2x 3090 :3
@xeophon_
Xeophon
16 days
Qwen did it again! The new VL models (2B, 32B) absolutely CRUSH the old versions from April. If you were using Qwen3 32B, it might be time to upgrade 👀
1
0
5
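(a rough sketch of disabling the multimodal paths when serving across two GPUs with vLLM; the model id, the 32k context cap, and whether your vLLM version accepts a 0 limit in limit_mm_per_prompt are assumptions to verify, not a confirmed recipe:)

```python
from vllm import LLM

# Sketch only: assumes limit_mm_per_prompt accepts 0 to skip image/video
# processing, and that this model id matches the checkpoint being discussed.
llm = LLM(
    model="Qwen/Qwen3-VL-30B-A3B-Instruct",      # illustrative model id
    tensor_parallel_size=2,                       # split across the two 3090s
    limit_mm_per_prompt={"image": 0, "video": 0}, # text-only serving
    max_model_len=32768,                          # trim context to save VRAM
)

print(llm.generate(["Hello"])[0].outputs[0].text)
```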
@lowvram
lowvram
16 days
alias grep="echo how many times must I tell you, gpt-5, USE RIPGREP (rg)"
0
0
1
@lowvram
lowvram
19 days
I guess you could argue it’s also assuming the reader is both human *and* able-bodied (e.g. has two hands) so it was already narrowly focused on a subset of humans (though a large subset, to be sure)
0
0
0
@lowvram
lowvram
19 days
Some Wikipedia pages make the assumption that the reader is human, which is no longer guaranteed. E.g. the page for the right-hand rule says: “This can be seen by holding **your** hands together…” (emphasis mine)
1
0
0
@lowvram
lowvram
20 days
wait, THAT’S what it’s about?? Why didn’t anyone tell me, i would have read it sooner
@kalomaze
kalomaze
20 days
@teortaxesTex ^ guy who works at a company named after a novel that has incredibly graphic zombie sex scenes ^
0
0
1
@lowvram
lowvram
20 days
My microwave outperforms gpt-4o, Gemini-2.5, and Claude 4 Sonnet at warming butter. testing on Claude 4.5 and gpt-5-codex still underway
0
0
1
@lowvram
lowvram
27 days
If I had the means (and patience) to finetune any models right now, I’d do it with the Nemotron recipes and datasets. I’m no expert but just by perusing those sets and the nano-v2 training formula, it’s very well thought out.
0
0
0
@lowvram
lowvram
28 days
Trigger a millennial NLP guy who had to learn LSTM instead, in one image
@pranavgupta2603
Pranav Gupta
29 days
You know this is a real NLP class when you see this
0
0
3
@lowvram
lowvram
1 month
my favorite videos on Sora are by a late-middle-aged guy with no followers whose entire profile is him inserted into wish-fulfillment scenes, like playing guitar, cooking a steak, drinking wine on the beach. Strangely wholesome
0
0
1
@lowvram
lowvram
1 month
Anyone remember “token healing” from the guidance project? iirc it existed solely to solve the problem where spliced-together prompts, or grammar-guided outputs, result in alternate tokenizations that are out of distribution from training
@s_zhengbr
Brian Zheng
1 month
Can a LM that has only ever seen the word “cat” tokenized as ␣cat, understand the token sequence [␣, c, a, t]? In our NeurIPS spotlight ⭐, we show that the answer is surprisingly YES, and in fact, you can even modify the tokenization at inference-time for performance gains!🧵
1
0
3
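(for anyone who hasn’t seen it, a rough sketch of the token-healing idea; the tokenizer choice, example prompt, and brute-force vocab scan are illustrative, not guidance’s actual implementation:)

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # illustrative tokenizer

prompt = "The endpoint is https:"   # spliced prompt that ends mid-token
ids = tok.encode(prompt)

# back off the last prompt token and remember its text...
dropped = tok.decode([ids[-1]])
healed_ids = ids[:-1]

# ...then only allow first-step tokens whose text extends the dropped piece,
# so the model can regenerate e.g. "://" the way it saw it in training
allowed = [i for i in range(len(tok)) if tok.decode([i]).startswith(dropped)]
print(f"dropped {dropped!r}; {len(allowed)} tokens allowed at the next step")
```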
@lowvram
lowvram
1 month
Great interview question for any ML hire today: why did AlphaGo have both a policy AND a value network? Isn’t a value network just a policy network with different output shape? (secretly I don’t know the answer and I just want someone to tell me)
0
0
1