t81dev
@t81dev
Followers
170
Following
2K
Media
601
Statuses
2K
💻#Ternary Mind: https://t.co/WftcIszjSO
Cyberspace
Joined May 2025
If you:
- Know a rock-solid tiny instruct model (2025-era, <1B params)
- Have tips for streaming/block-wise conversion to stay under 8 GB RAM
- Or just want to cheer on extreme quantization madness
Drop a reply! Open-sourcing soon if/when it works. #llama_cpp #quantization #LLM
0
0
0
Once I get a successful round-trip on something small (convert → dequant → run with patched llama.cpp), the rest scales easily on cloud GPUs. But right now I’m stuck at the “smoke test” phase. Classic indie dev struggle. 😂
1
0
0
The problem: conversion is a RAM killer. Even a 1B model blows past 7 GB peak usage during tensor packing, scale computation, and block-wise optimization. My M2 MacBook with 8 GB unified memory just OOMs every time. Need a truly tiny (<0.5–1B) 2025-era model to fully validate the pipeline.
1
0
0
On paper:
- Q4_K_M 7B ≈ 4 GB
- Ternary (TQ2_0) 7B ≈ 1.3–1.4 GB
That’s enough to run Gemma 3 4B or Llama 3.2 3B on a phone, or a 27B-class model on an 8 GB laptop without swapping. The zero state + smarter packing gives way more fidelity than pure 1-bit binary.
1
0
0
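A rough back-of-envelope check of those footprint numbers in plain Python; the bits-per-weight values (4.5 for Q4_K_M, 1.6 for ternary) are my illustrative assumptions, not figures measured with the t81 tools:

def weight_footprint_gb(params_billion, bits_per_weight):
    # weights only: params × bpw / 8 bytes, converted to GB (no KV cache or activations)
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for label, bpw in [("Q4_K_M (~4.5 bpw)", 4.5), ("ternary (~1.6 bpw)", 1.6)]:
    print(f"7B @ {label}: {weight_footprint_gb(7, bpw):.2f} GB")
# 7B @ Q4_K_M (~4.5 bpw): 3.94 GB
# 7B @ ternary (~1.6 bpw): 1.40 GB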
Pushing GGUF quantization into true ternary territory (~1.6–1.9 bits per weight) using {-1, 0, +1} values + block scales. Goal: run modern 4B–7B class models in <1.5 GB RAM/VRAM with quality close to Q4/Q5. Built custom t81-convert & t81-dequant tools for llama.cpp. The math works on paper.
2
0
1
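A minimal sketch of what "{-1, 0, +1} values + block scales" can look like, assuming a simple absmean-style ternary quantizer and base-3 packing of 5 trits per byte (1.6 bits/trit before scale overhead). This is only an illustration of the idea, not the actual t81-convert block layout:

import numpy as np

def quantize_block_ternary(block, threshold=0.5):
    # Per-block scale from the mean absolute value (absmean-style heuristic)
    scale = np.abs(block).mean() + 1e-12
    # Map each weight to {-1, 0, +1}: small weights snap to zero
    q = np.where(np.abs(block) > threshold * scale, np.sign(block), 0).astype(np.int8)
    return q, scale

def pack_trits(q):
    # Shift {-1,0,+1} -> {0,1,2} and pack 5 trits per byte (3^5 = 243 <= 256)
    t = (q + 1).astype(np.uint8)
    pad = (-len(t)) % 5
    t = np.concatenate([t, np.zeros(pad, dtype=np.uint8)]).reshape(-1, 5)
    powers = np.array([1, 3, 9, 27, 81], dtype=np.uint8)
    return (t * powers).sum(axis=1).astype(np.uint8)

block = np.random.randn(256).astype(np.float32)
q, scale = quantize_block_ternary(block)
packed = pack_trits(q)
print(len(packed), "bytes for", len(block), "weights ->",
      8 * len(packed) / len(block), "bits/weight + one scale per block")
# 52 bytes for 256 weights -> 1.625 bits/weight + one scale per block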
If anyone has:
- A rock-solid <1B instruct model (2025-era, not ancient)
- Tips for streaming/block-wise conversion to stay under 8 GB
- Or just moral support 😂
HMU. Once I smoke-test on something tiny, bigger models on cloud. #LLM #quantization #llamacpp #AppleSilicon
0
0
0
…my M2 MacBook with 8 GB unified RAM keeps OOMing during conversion. Even 1B models push peak usage >7 GB when packing ternary blocks + scales. Need a truly tiny test model (≤0.5B?) that fits comfortably to validate the full pipeline end-to-end.
1
0
0
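On the streaming/block-wise conversion question: the usual trick is to never hold more than one chunk of weights in RAM at a time, e.g. memory-map the source file and append packed blocks straight to the output. A generic numpy sketch of that pattern (raw float32 input and a stubbed-in ternary packer, not the real GGUF-aware t81-convert code):

import numpy as np

CHUNK = 1 << 20  # process ~1M weights at a time to bound peak RAM

def pack_chunk_ternary(chunk):
    # Placeholder packer: {-1,0,+1} as int8 plus one float32 scale per chunk
    scale = np.abs(chunk).mean() + 1e-12
    q = np.where(np.abs(chunk) > 0.5 * scale, np.sign(chunk), 0).astype(np.int8)
    return q.tobytes() + np.float32(scale).tobytes()

def convert_streaming(src_path, dst_path, dtype=np.float32):
    # Memory-map the source so the OS pages weights in lazily instead of loading them all
    weights = np.memmap(src_path, dtype=dtype, mode="r")
    with open(dst_path, "wb") as out:
        for start in range(0, len(weights), CHUNK):
            chunk = np.asarray(weights[start:start + CHUNK])  # only this slice is resident
            out.write(pack_chunk_ternary(chunk))
            del chunk  # drop the slice before touching the next one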
Been hacking on ternary quantization for GGUF (1.5–1.8 bits per weight) with custom t81-convert & t81-dequant tools. Goal: squeeze modern 4B-class models into <1GB VRAM while keeping quality close to Q4/Q5. The math works on paper, but...
1
0
1
pip install t81lib https://t.co/IMcodYggas Star it. Try it. Break it. Then watch what happens next. t81dev — December 2025
0
0
0
Roadmap:
- v0.1 → BigInt core + Python wheels (today)
- v0.2 → Ternary GGUF reader/writer + dequant kernels
- v0.3 → T81Q quantization for llama.cpp models
- v1.0 → Full ternary transformer inference drop-in
The future isn’t 4-bit. The future is ternary.
1
0
1
This isn’t just math porn. This is the foundation for T81Q — ternary-native GGUF quantization. Think Q4_K_M, but instead of 5.5 bits/weight → 2.63 bits/weight. Same accuracy. Half the size. No new hardware needed. Coming in v0.3.
1
0
0
from t81 import BigInt
a = BigInt("9"*42)  # 42 nines
b = BigInt("-170141183460469231731687303715884105727")  # −(2¹²⁷ − 1)
print(a * a)  # instant, zero allocation
print(a.bit_length(), "bits")  # 140 bits
Yes, that’s a 140-bit integer stored in
1
0
0
Why balanced ternary? Because base 3 has the best radix economy of any integer base: each {−1, 0, +1} digit carries log₂(3) ≈ 1.58496 bits of information, which is where the ~2.63 bits/weight T81Q figure comes from once block scales are added. That’s not “slightly better than binary”. Among integer bases, it’s the theoretical optimum. You cannot do better.
1
0
0
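A concrete way to see the radix-economy claim (my own numbers, not from the post): representing the same range costs roughly radix × digits worth of "states", and base 3 minimizes that product among integer bases.

def digits_needed(N, base):
    # Exact count of base-`base` digits needed to represent values in [0, N)
    d, x = 0, N - 1
    while x > 0:
        x //= base
        d += 1
    return d

N = 2**256  # size of the value range to represent
for base in (2, 3, 4, 8, 10):
    digits = digits_needed(N, base)
    print(f"base {base}: {digits} digits, cost {base * digits}")
# base 3 gives the lowest cost (3 × 162 = 486) vs 2 × 256 = 512 for binary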
Just shipped t81lib v0.1.0 — the world’s first usable balanced-ternary arbitrary-precision integer library with real pip wheels. No research prototype. No “compile this yourself”. Just: pip install t81lib And it works. On Apple Silicon. Right now.
github.com
t81lib v0.1.0 – Balanced Ternary Arithmetic for the AI Era The first public release of t81lib — a header-only, C++20 balanced-ternary arithmetic library with full, high-performance Python bindings ...
1
0
1
t81lib v0.2.0 is out. Exact balanced ternary GEMM. #AVX512/#NEON. #Python bindings. 8×8×4 blocked. Prefetched. Double-precision accumulators. #Llama-3–8B ternary weights: 4.2 GB. Already faster than BitNet-b1.58 CPU kernels. 🔗 https://t.co/IMcodYfIkU The future is ternary.
0
0
0
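For anyone wondering what "exact balanced ternary GEMM" means in practice, here is a naive numpy reference of the idea: weights restricted to {−1, 0, +1} with per-output-column scales and exact double-precision accumulation. This is only my sketch of the concept, not the 8×8×4 blocked AVX-512/NEON kernel in v0.2.0:

import numpy as np

def ternary_gemm(x, w_trits, scales):
    # x:        (m, k) activations
    # w_trits:  (k, n) weights in {-1, 0, +1}, stored as int8
    # scales:   (n,)   per-output-column scale factors
    # Because the weights are only -1/0/+1, the inner product reduces to
    # adds and subtracts; the float scale is applied once per column.
    acc = x @ w_trits.astype(np.float64)  # exact accumulation in double precision
    return acc * scales

m, k, n = 4, 64, 8
x = np.random.randn(m, k)
w_trits = np.random.randint(-1, 2, size=(k, n)).astype(np.int8)
scales = np.abs(np.random.randn(n))
y = ternary_gemm(x, w_trits, scales)
print(y.shape)  # (4, 8)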
Quick example (C++):
#include <t81/t81lib.hpp>
using t81::core::limb;
limb a = limb::from_int(42);
limb b = limb::from_int(-7);
limb sum = a + b;  // Ternary magic!
Star/fork/contribute if you're into innovative numerics. What's your wildest use case for ternary? 👇
github.com
t81lib – Balanced-ternary quantization and arithmetic core for AI and quant workloads in modern C++ and Python. - t81dev/t81lib
0
0
0
Key features that make it shine:
✅ 48-trit limb scalars with safe overflow & hashing
✅ BigInt with Karatsuba mul
✅ Modular Montgomery helpers for const-time ops
✅ Python bindings for quick prototyping
#Programming #MathLib #SIMD
1
0
1
🚀 Developers & math nerds: Tired of binary's limitations? Meet t81lib – a production-grade C++20 library for balanced ternary arithmetic! High-precision, deterministic computing with SIMD speed boosts (AVX-512/NEON) and Base81 I/O for clean string handling. Perfect for crypto,
github.com
t81lib – Balanced-ternary quantization and arithmetic core for AI and quant workloads in modern C++ and Python. - t81dev/t81lib
1
0
1
0.295 nanoseconds. That’s how long it takes t81lib to compare two balanced-ternary integers larger than what fits in 256 binary bits. For context:
- GMP 256-bit multiplication: 108.70 ns
- TTMath 256-bit: 198.60 ns
- Boost 256-bit: 612.40 ns
t81lib Karatsuba multiplication on
0
0
0
First ever balanced-ternary limb using Kogge-Stone + Booth-radix-27 + 3-level Karatsuba achieves:
- 30 Mops/s addition (within 15× of binary scalar)
- Faster negation than int64_t
- Zero-cost exact overflow detection
- 8.0 bits/trit packing density (theoretical max ≈ 8.1)
All in <
github.com
T81 Ecosystem: a deterministic, ternary-native computing stack featuring base-81 data types, the TISC instruction set, T81VM, T81Lang, Axion safety/optimization, and the full recursive cognition t...
0
0
0
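Not the Kogge-Stone kernel described above, but for reference, this is what digit-level balanced-ternary addition looks like in its simplest ripple-carry form, with trits stored least-significant first; purely my illustration:

def add_balanced_ternary(a, b):
    # a, b: lists of trits in {-1, 0, +1}, least significant trit first
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        # Fold the digit back into {-1, 0, +1}; the carry absorbs the overflow
        if s > 1:
            s -= 3; carry = 1
        elif s < -1:
            s += 3; carry = -1
        else:
            carry = 0
        out.append(s)
    if carry:
        out.append(carry)
    return out

def to_int(trits):
    # Interpret trits (LSB first) as an integer: sum of t * 3^i
    return sum(t * 3**i for i, t in enumerate(trits))

a = [1, -1, 0, 1]   # 1 - 3 + 27 = 25
b = [-1, 1, 1]      # -1 + 3 + 9 = 11
print(to_int(add_balanced_ternary(a, b)))  # 36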