𝚐𝔪𝟾𝚡𝚡𝟾
@gm8xx8
Followers 7K · Following 71K · Media 2K · Statuses 16K
☺︎
Joined March 2010

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 5 months
THIS WEEK SHOULD BE EXCITING.
me: every week, forever.
[image]
replies 1 · reposts 2 · likes 37

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 8 hours
↓.
[link: huggingface.co]
replies 0 · reposts 0 · likes 0

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 8 hours
DeepSeek-V3.1 ⮕ looks like they’ve been tinkering
[image]
replies 1 · reposts 2 · likes 20

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 15 hours
COLLECTION: TR:
[link: huggingface.co]
replies 0 · reposts 0 · likes 2

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 15 hours
ASearcher gives open LLMs full “Search Intelligence,” fixing ≤10-turn RL and weak fact-checking via 128-turn async RL (40+ tool calls, 150k+ tokens), synthetic QA generation (14k→134k verified hard QAs, 25.6k tool-required), and a minimal-tool unified agent. Results: 14B: RAG F1 60.0 …
[4 images]
replies 1 · reposts 5 · likes 35
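
A minimal sketch of the long-horizon tool loop described above. llm_generate() and web_search() are invented stand-ins, not ASearcher's API; the real system trains this loop with asynchronous RL.

def llm_generate(context: str) -> str:   # stub: swap in a real model call
    return "ANSWER: 42"

def web_search(query: str) -> str:       # stub: swap in a real search backend
    return "no results"

def rollout(question: str, max_turns: int = 128) -> list[dict]:
    trajectory = []                       # (context, action) pairs for RL credit
    context = f"Question: {question}"
    for turn in range(max_turns):         # up to 128 turns, per the tweet
        action = llm_generate(context)    # emit a tool call or a final answer
        trajectory.append({"context": context, "action": action})
        if action.startswith("ANSWER:"):  # terminal action ends the episode
            break
        results = web_search(action)      # otherwise treat action as a query
        context += f"\nTurn {turn} query: {action}\nResults: {results}"
    return trajectory                     # scored afterwards by a verifier-based reward

The point of the 128-turn budget is that the trajectory, not a single response, is the RL unit: credit flows back across dozens of tool calls.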

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 15 hours
COLLECTION.
[link: huggingface.co]
replies 0 · reposts 0 · likes 1

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 15 hours
GLiNER-decoder is an encoder–decoder model for NER that uses span pooling and autoregressive type generation.
- Removes GLiNER’s ~30-type cap, label-order dependence, and fixed ontology.
- A single encoder pass produces span vectors → decoder prompts generate types (supports …
[image]
replies 1 · reposts 3 · likes 19
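
A rough sketch of the two pieces named above, span pooling plus autoregressive type generation, under invented shapes and modules (this is not the GLiNER-decoder code, just the shape of the idea):

import torch

# One encoder pass -> pooled span vector -> decoder generates the type.
hidden = torch.randn(1, 12, 256)                 # encoder output (batch, seq, dim)
start, end = 3, 6                                # candidate entity span [3, 6)
span_vec = hidden[:, start:end, :].mean(dim=1)   # span pooling -> (1, 256)

# A real decoder would attend to span_vec and emit label tokens
# autoregressively; here a toy LM head fakes the first generation step.
lm_head = torch.nn.Linear(256, 32000)            # toy vocabulary projection
first_type_token = lm_head(span_vec).argmax(dim=-1)
print(first_type_token.shape)                    # one generated token id per span

Because the type is generated token-by-token rather than classified against a fixed label list, there is no hard cap on the number of types and no dependence on label order.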

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 17 hours
My technical deep dive on GLiClass-V3 ⮕ (papers out). Training pipeline:
- Pre-train on 1.2M NLI/classification mix.
- PPO-adapted multi-label RL mid-train (few-shot & feedback learning).
- LoRA post-train with logic/NLI + pattern-focused streams (high-rank LoRA → edge …
[2 images]
replies 0 · reposts 0 · likes 3
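
For the last stage, a generic high-rank LoRA setup looks roughly like this with Hugging Face PEFT; the base model and every hyperparameter here are placeholders, not GLiClass-V3's actual recipe.

# Sketch: attaching a high-rank LoRA adapter for post-training.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=2)       # placeholder base model
config = LoraConfig(
    r=128,                        # "high-rank" LoRA: much larger r than the usual 8-16
    lora_alpha=256,
    target_modules=["query_proj", "value_proj"],     # DeBERTa attention projections
    lora_dropout=0.05,
)
model = get_peft_model(base, config)
model.print_trainable_parameters()   # only the adapter weights are trainable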

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 1 day
COLLECTION: PAPER:
[link: huggingface.co]
replies 0 · reposts 0 · likes 4

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 1 day
InclusionAI releases GroveMoE. GroveMoE uses heterogeneous + adjugate experts (ARM’s big.LITTLE-inspired). Experts are grouped, with each group sharing an adjugate expert computed once, so compute scales with token complexity (3.14–3.28B active per token). Models:
- GroveMoE-Base (33B, …
[4 images]
replies 1 · reposts 11 · likes 56
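
A toy forward pass for the grouping idea, with invented shapes and routing (not InclusionAI's implementation): each routed expert adds the output of its group's shared adjugate expert, which runs once per group no matter how many of that group's experts fire.

import torch
import torch.nn as nn

dim, n_groups, experts_per_group = 64, 4, 4
experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_groups * experts_per_group))
adjugates = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_groups))  # one per group

x = torch.randn(1, dim)
topk_ids = [1, 5, 6]                       # router picks 3 experts (ids invented)
group_cache: dict[int, torch.Tensor] = {}  # adjugate output computed once per group
out = torch.zeros_like(x)
for eid in topk_ids:
    g = eid // experts_per_group
    if g not in group_cache:               # share the adjugate across the group
        group_cache[g] = adjugates[g](x)
    out = out + experts[eid](x) + group_cache[g]

# experts 5 and 6 share group 1, so adjugates[1] ran only once:
print(len(group_cache), "adjugate passes for", len(topk_ids), "routed experts")

Sharing the adjugate pass is what makes active compute per token a range (3.14–3.28B) rather than a constant: harder tokens that spread across more groups pay for more adjugate passes.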

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 1 day
MODELS: SETS: RELEASE:
[link: huggingface.co]
replies 0 · reposts 0 · likes 11

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 1 day
NVIDIA Nemotron-Nano v2. Models: 12B Base, 9B Reasoning, 9B Base.
- Arch: Hybrid Mamba2–Transformer (128K ctx, 4 attn layers).
- Training: 10.6T tokens (3.5T synthetic from DeepSeek, Qwen, Nemotron-4, phi-4, etc.).
- 15 natural languages + 43 programming languages.
- Datasets: …
[image]
replies 9 · reposts 35 · likes 222
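
The hybrid layout reads as a mostly-Mamba2 stack with a few attention layers interleaved. A schematic of that pattern, where the total depth and the placement of the attention layers are invented; only "4 attn layers" comes from the tweet:

# Sketch: interleave 4 attention layers into an otherwise-Mamba2 stack.
TOTAL_LAYERS = 56                    # hypothetical depth
ATTN_POSITIONS = {13, 27, 41, 55}    # hypothetical: evenly spaced

stack = ["attention" if i in ATTN_POSITIONS else "mamba2"
         for i in range(TOTAL_LAYERS)]
assert stack.count("attention") == 4
print(stack[:16])

The trade-off such hybrids target: Mamba2 layers give near-linear-cost 128K-context scanning, while the few attention layers retain precise token-to-token retrieval.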

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 1 day
𝐓𝐇𝐈𝐒 𝐈𝐒 𝐒𝐎𝐌𝐄 𝐆𝐑𝐄𝐀𝐓 𝐖𝐎𝐑𝐊 … 𝐃𝐎𝐍’𝐓 𝐒𝐋𝐄𝐄𝐏
[image]
replies 0 · reposts 3 · likes 29

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 2 days
Cool list… but I’ve been tracking this space closely. Might be time I share my own ranking of China’s open model labs.
[image]
replies 2 · reposts 0 · likes 17

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 3 days
SET: CANARY: PARAKEET:
[link: huggingface.co]
replies 0 · reposts 0 · likes 2

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 3 days
NVIDIA DROP: multilingual ASR + AST stack.
- Granary: ~1M hours of ASR/AST data across 25 European languages, built with the NeMo SDP pipeline from YODAS, MOSEL, YTC, and LibriLight. Largest open-source EU speech dataset, backbone for Canary and Parakeet training.
- Canary-1b-v2: …
[4 images]
replies 1 · reposts 0 · likes 7
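
A corpus this size is easiest to sample via streaming rather than a full download. A sketch with Hugging Face datasets, where the dataset id and config name are guesses for illustration, not confirmed identifiers:

# Sketch: stream a few examples from a large speech corpus.
# "nvidia/Granary" and "de" are assumed names, not verified.
from datasets import load_dataset

ds = load_dataset("nvidia/Granary", "de", split="train", streaming=True)
for i, example in enumerate(ds):
    print(example.keys())   # typically audio + transcript fields
    if i == 2:
        break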

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 3 days
RT @gm8xx8: Kimina-Prover-RL is a DeepSeek-R1-style RL pipeline for Lean 4 theorem proving from Project Numina and Kimi, built on Verl. Two…
replies 0 · reposts 9 · likes 0
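
For readers unfamiliar with the target format: the pipeline's job is to emit machine-checkable Lean 4 proofs, i.e. terms or tactic scripts that close a stated goal, which gives RL a binary, verifier-grounded reward. A toy instance (illustrative, not from Kimina):

-- A trivial Lean 4 goal of the kind such provers are trained to close.
example (a b : Nat) : a + b = b + a := Nat.add_comm a b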

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 · 4 days
☺︎
[image]
replies 0 · reposts 0 · likes 6