kaiyes (@Kaiyes_)
1K Followers · 47K Following · 1K Media · 9K Statuses
on-device local LLM · ex-CTO · 10 years React Native · building @bundaiapp (Learn Japanese with anime 👉)
Joined June 2013
Built a fully local voice AI assistant on React Native. Zero token cost. Speech-to-Text → LLM → Text-to-Speech. All on-device. All private.
The Stack:
- Whisper.RN for STT (OpenAI's Whisper ported to RN)
- Ollama + Qwen model for local LLM inference
- Supertonic (via
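The STT → LLM → TTS loop described above can be sketched roughly as below. This is a hedged sketch, not the app's actual code: `transcribe()` and `speak()` are placeholder stubs standing in for Whisper.RN and the TTS engine (whose real APIs differ), and the model tag is an example. Only the request shape follows Ollama's public HTTP chat API (`POST /api/chat` with `{ model, messages }`).

```typescript
// Hypothetical sketch of one voice-assistant turn: STT -> local LLM -> TTS.

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Stub STT: a real app would feed recorded audio into Whisper.RN here.
function transcribe(audio: string): string {
  return audio; // placeholder: pretend the audio is already text
}

// Stub TTS: a real app would hand the reply to an on-device TTS engine.
function speak(text: string): string {
  return `[spoken] ${text}`;
}

// Build the request body an Ollama-style local server expects for one turn.
function buildChatRequest(history: ChatMessage[], userText: string) {
  const messages: ChatMessage[] = [
    { role: "system", content: "You are a concise on-device assistant." },
    ...history,
    { role: "user", content: userText },
  ];
  return { model: "qwen2.5:3b", messages, stream: false }; // example model tag
}

// One assistant turn, with the LLM call injected so the sketch runs offline.
function assistantTurn(
  audio: string,
  llm: (req: ReturnType<typeof buildChatRequest>) => string,
  history: ChatMessage[] = [],
): string {
  const userText = transcribe(audio);
  const req = buildChatRequest(history, userText);
  // In the real app this would be:
  //   await fetch("http://localhost:11434/api/chat", { method: "POST", body: JSON.stringify(req) })
  const reply = llm(req);
  return speak(reply);
}

// Offline demo with a canned "model" that echoes the last user message.
const echoLlm = (req: { messages: ChatMessage[] }) =>
  `You said: ${req.messages[req.messages.length - 1].content}`;
console.log(assistantTurn("what's the weather", echoLlm));
```

Injecting the `llm` function keeps the orchestration testable without a running Ollama server; swapping in a real `fetch` call is the only change needed on-device.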
If you are considering buying the Intel 32 GB GPU that is under $1K, then read this before you buy.
People keep saying "VRAM is all that matters" for local LLMs. It's not just wrong, it's misleading. When running LLMs locally, the bottleneck is NOT just VRAM size. It's:
- memory bandwidth
- interconnect (PCIe vs NVLink vs RDMA)
- inference engine (vLLM,
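A quick back-of-the-envelope shows why bandwidth dominates: during single-stream decoding, each new token requires streaming roughly all of the model's weight bytes through the GPU once, so tokens/sec is capped near bandwidth divided by model size. The numbers below are illustrative assumptions, not measured specs of any card.

```typescript
// Rough ceiling for single-stream decode speed: every generated token reads
// (approximately) all model weights once, so memory bandwidth, not VRAM
// capacity, sets the upper bound on tokens/sec.

function decodeCeilingTokensPerSec(
  bandwidthGBps: number, // usable memory bandwidth (illustrative)
  weightsGB: number,     // model size after quantization (illustrative)
): number {
  return bandwidthGBps / weightsGB;
}

// A hypothetical 32 GB card with ~450 GB/s running an ~18 GB 4-bit model:
console.log(decodeCeilingTokensPerSec(450, 18).toFixed(1)); // ~25 tok/s ceiling

// The same model on a ~1000 GB/s card: identical VRAM fit, 2x+ the speed.
console.log(decodeCeilingTokensPerSec(1000, 18).toFixed(1));
```

This is why two cards with the same VRAM can differ wildly in real decode speed, and why capacity alone is a misleading spec.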
I played this game around 1996! 30 years ago!
Visco’s Goal! Goal! Goal! is a spectacular Neo Geo soccer classic that captures the pure spirit of arcade sports. With its fast-paced gameplay, powerful special shots, and vibrant graphics, it delivers a thrilling international tournament experience.
Meet Intel's Arc Pro B70: 32 GB VRAM below $999. There are pros & cons to it though - availability, software support, etc. Supports vLLM from the get-go. Hopefully ROCm or Vulkan picks up the pace with it. Don't have much hope for Intel to provide the software support
VLMs too slow for production? Not anymore: 46ms end-to-end inference, 60+ fps on a single H100. Introducing Photon, Moondream's inference engine. Runs on everything from edge to server. https://t.co/UTt6vQOzOY
How in the hell are these accounts claiming to run entire companies w/ OpenClaw. I just spent 1.5 HRS trying to get my claw to use X API for reading tweets. FAIL We have a whole SOP documenting exactly how to do it from previous failures. I can’t imagine running 10 of these…
The new Twitter algorithm is really good! It picked up my slight interests & put tweets in my timeline from amazing accounts. My feed is now all AI + anime. Normally it puts me in React Native jail. Never knew this many great manga/anime accounts were on Twitter.
Transformers are Turing complete and can be trained to run arbitrary programs Turns out you can embed a relatively efficient assembly interpreter in the forward pass. This allows the LLM to execute deterministic code at inference time in its own weights, no sandbox
1/4 LLMs solve research-grade math problems but struggle with basic calculations. We bridge this gap by turning them into computers. We built a computer INSIDE a transformer that can run programs for millions of steps in seconds, solving even the hardest Sudokus with 100% accuracy
Take a look, guys!
I feel people are underestimating the complexity of software. AI is literally limited by:
- compute
- model intelligence
- harness
- prompt accuracy
- context
- cost
- hallucinations
Yes, I'm circumventing all of these & yes, I have not hand-coded in a year.
Who needs that new 128 GB of RAM, you say? I have been going deep into using Chinese LLMs as part of my toolchain. Claude/Codex is the main driver. It writes the scripts. But the workhorses are these small LLMs. For example, I use LLMs + Node.js to:
- cut me clips from anime -
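One way the clip-cutting example could look as a Node.js script: ask the local model to return clip timestamps as JSON, then turn each one into an ffmpeg command. This is a hypothetical sketch of that workflow, not the author's actual script: the JSON shape, labels, and filenames are assumptions; only the ffmpeg flags (`-ss` start, `-to` end, `-c copy` stream copy) are standard CLI options.

```typescript
// Hypothetical "small local LLM + Node.js" clip cutter: the model proposes
// timestamps as JSON, the script builds ffmpeg commands from them.

type Clip = { start: string; end: string; label: string };

// Parse the model's JSON reply (a real script would validate more defensively).
function parseClips(llmReply: string): Clip[] {
  return JSON.parse(llmReply) as Clip[];
}

// Build one ffmpeg command per clip; nothing is executed here.
function ffmpegCommands(input: string, clips: Clip[]): string[] {
  return clips.map(
    (c, i) =>
      `ffmpeg -ss ${c.start} -to ${c.end} -i ${input} -c copy clip_${i}_${c.label}.mp4`,
  );
}

// Canned reply standing in for the local model's output.
const reply = JSON.stringify([
  { start: "00:01:10", end: "00:01:25", label: "opening" },
  { start: "00:12:03", end: "00:12:30", label: "fight" },
]);

const cmds = ffmpegCommands("episode01.mkv", parseClips(reply));
console.log(cmds[0]);
```

Because the heavy lifting (timestamp selection) is just a JSON-producing prompt, a small quantized model is enough, which is the point of using these models as workhorses.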
Crazy how fast these 9B & 4B models came out!! I was okay with having a 27B model. But this is chef's kiss!
🚀 Introducing the Qwen 3.5 Small Model Series Qwen3.5-0.8B · Qwen3.5-2B · Qwen3.5-4B · Qwen3.5-9B ✨ More intelligence, less compute. These small models are built on the same Qwen3.5 foundation — native multimodal, improved architecture, scaled RL: • 0.8B / 2B → tiny, fast,
I'm gonna get this off my chest: what's the point of cross-platform if AI can build 2 apps at the same time? Make a Swift app, then make exactly the same app in Kotlin?