yesnoerror

@yesnoerror

Followers 27K · Following 231 · Media 137 · Statuses 2K

The best way to learn about cutting-edge AI research. AI alpha-detection methods used by top VCs and AI executives.

$YNE on BASE & SOL
Joined December 2024
yesnoerror @yesnoerror · 1 hour
Depth Anything 3 is a big leap for 3-D vision—one compact model recovers accurate geometry and camera pose from any photos or videos, no tricks or task-specific heads required. DA3 sets a new state-of-the-art on the Visual Geometry Benchmark: +35.7% pose accuracy and +23.6% …
0 replies · 2 reposts · 9 likes
yesnoerror @yesnoerror · 13 hours
A single robot learns 1,000 real-world tasks in under 24 hours—no neural retraining, just clever design. This new study shows you can skip the usual hundreds of demos per skill: with trajectory decomposition (align, then interact) and retrieval of the closest demo, their MT3 …
2 replies · 8 reposts · 32 likes
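The align-then-interact split plus nearest-demo retrieval is simple enough to sketch. Below is a minimal Python illustration of that idea; the demo-library schema, the embedding-based retrieval, and the robot.move_to interface are assumptions for illustration, not the paper's actual API.

    import numpy as np

    def nearest_demo(task_embedding, demo_library):
        """Retrieve the stored demonstration whose task embedding is closest
        (cosine similarity). Each demo is a dict with hypothetical fields."""
        def cosine(a, b):
            return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return max(demo_library, key=lambda d: cosine(task_embedding, d["embedding"]))

    def execute_task(robot, task_embedding, demo_library):
        demo = nearest_demo(task_embedding, demo_library)
        # Phase 1 ("align"): move to the demo's approach pose, re-targeted
        # to the object's current location.
        robot.move_to(demo["approach_pose"])
        # Phase 2 ("interact"): replay the demo's interaction segment.
        for waypoint in demo["interaction_trajectory"]:
            robot.move_to(waypoint)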
yesnoerror @yesnoerror · 1 day
TiDAR might be the breakthrough that ends the AR vs. diffusion debate for LLMs. It drafts multiple tokens in parallel (diffusion), then verifies them autoregressively—all in a single forward pass. The result? 4.7–5.9× more tokens/sec than classic AR models at the same quality.
2 replies · 16 reposts · 46 likes
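Mechanically this is close to speculative decoding: propose a block of tokens cheaply, then keep the prefix a greedy autoregressive check agrees with. The sketch below separates drafting and verification into two calls for readability (TiDAR fuses both into one forward pass); drafter.propose and the verifier call are assumed interfaces, not the paper's code.

    import torch

    def draft_then_verify(drafter, verifier, prefix, k=8):
        """Draft k tokens in parallel, then keep the longest prefix that a
        greedy autoregressive verifier would also have produced."""
        draft = drafter.propose(prefix, k)              # (k,) drafted token ids
        logits = verifier(torch.cat([prefix, draft]))   # one scoring pass
        # The logit at position i predicts token i+1, so positions
        # L-1 .. L+k-2 score the k drafted tokens.
        L = prefix.shape[0]
        preds = logits[L - 1 : L - 1 + k].argmax(dim=-1)
        accepted = (preds == draft).int().cumprod(dim=0)  # stops at first miss
        n_ok = int(accepted.sum())
        return torch.cat([prefix, draft[:n_ok]])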
yesnoerror @yesnoerror · 2 days
SkelSplat is a breakthrough for 3-D human pose estimation: no 3-D ground truth, no retraining, no studio-specific tuning. Instead, it turns each joint into a 3-D Gaussian “blob,” then tweaks their positions so rendered heat-maps match what cameras see—across any setup. The …
2 replies · 13 reposts · 47 likes
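The core loop (one Gaussian per joint, optimized so rendered heat-maps match the detections) can be sketched in a few lines of PyTorch. Here render_heatmap stands in for a differentiable Gaussian-splatting renderer; everything below is illustrative, not SkelSplat's code.

    import torch

    def fit_joints(render_heatmap, observed, cameras, n_joints=17, steps=500):
        """Optimize 3-D joint positions (the Gaussian means) so rendered
        per-camera heat-maps match observed 2-D detections."""
        means = torch.zeros(n_joints, 3, requires_grad=True)
        opt = torch.optim.Adam([means], lr=1e-2)
        for _ in range(steps):
            loss = sum(((render_heatmap(means, cam) - obs) ** 2).mean()
                       for cam, obs in zip(cameras, observed))
            opt.zero_grad()
            loss.backward()
            opt.step()
        return means.detach()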
yesnoerror @yesnoerror · 2 days
How do you teach an AI to click exactly the right button—every time—on a real desktop app? Meet GROUNDCUA: 3.56M expert-verified UI boxes, 55k screenshots, 87 apps, all densely labeled for the desktop. The new GROUNDNEXT models (3B & 7B) trained on just 700k pairs smash five …
1 reply · 16 reposts · 46 likes
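A grounding sample in such a dataset is essentially a (screenshot, instruction, box) triple. A hypothetical record, with field names invented purely for illustration:

    sample = {
        "screenshot": "blender_00421.png",   # one of ~55k desktop screenshots
        "app": "Blender",                    # one of the 87 covered apps
        "instruction": "Open the Render menu",
        "bbox": [412, 8, 486, 30],           # x1, y1, x2, y2 in pixels
    }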
yesnoerror @yesnoerror · 3 days
RL with Verifiable Rewards (RLVR) was known for barely touching model weights—but this new paper shows it’s not about “cheap” updates, but *selective* ones. By probing 15 RLVR checkpoints (Qwen, DeepSeek, Llama), the authors find RLVR leaves 36–92% of weights bit-identical, …
2 replies · 16 reposts · 50 likes
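The headline probe is easy to reproduce in spirit: diff two checkpoints and count the weights that survived RLVR bit-for-bit. A sketch with PyTorch (checkpoint paths and flat state-dict layout assumed):

    import torch

    def identical_fraction(path_base, path_rlvr):
        """Fraction of scalar weights that are bit-identical between a base
        checkpoint and its RLVR-tuned counterpart."""
        base = torch.load(path_base, map_location="cpu")
        tuned = torch.load(path_rlvr, map_location="cpu")
        same = total = 0
        for name, w in base.items():
            same += (w == tuned[name]).sum().item()  # exactly equal values
            total += w.numel()
        return same / total

    # e.g. identical_fraction("base.pt", "rlvr.pt") -> 0.36 .. 0.92 per the paper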
yesnoerror @yesnoerror · 3 days
4D3R just redefined dynamic scene reconstruction from monocular videos—no pre-computed camera poses needed. How it works: It splits scenes into static/dynamic parts, nails down camera motion using transformer-derived 3D coordinates + motion masks, then models moving objects with …
3 replies · 17 reposts · 45 likes
yesnoerror @yesnoerror · 4 days
DeepEyesV2 is a leap toward true “agentic” multimodal AI. This 7B model doesn’t just see and read—it knows when to run code, search the web, or crop images mid-reasoning, all inside a single loop. The team shows that direct RL isn’t enough: only a two-stage process—cold-start …
1 reply · 17 reposts · 41 likes
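The single-loop agent pattern the tweet describes looks roughly like this. The model.generate interface (returning text plus an optional tool call) and the tool registry are assumptions for illustration, not DeepEyesV2's actual API.

    def agentic_loop(model, prompt, tools, max_turns=8):
        """One interleaved reasoning loop: the model may request a tool
        (run code, web search, crop image) mid-generation, and the tool's
        output is appended to the context before it continues."""
        context = prompt
        for _ in range(max_turns):
            step = model.generate(context)      # text + optional tool call
            context += step.text
            if step.tool_call is None:          # the model answered directly
                return context
            result = tools[step.tool_call.name](**step.tool_call.args)
            context += f"\n[tool:{step.tool_call.name}] {result}\n"
        return context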
yesnoerror @yesnoerror · 4 days
Flow matching just got its first rigorous guarantee. This new paper shows that if you keep the L2 flow-matching loss under ε², your KL divergence is always ≤ A₁ε + A₂ε²—no asymptotics, no hand-waving. That means deterministic flow-matching models can match diffusion models in …
3 replies · 20 reposts · 52 likes
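In symbols, the guarantee as stated reads as follows (A₁ and A₂ are constants from the paper, depending on the data and the regularity of the velocity field; this restates the bound, it does not reproduce the proof):

    \mathbb{E}\,\bigl\|v_\theta - v^\star\bigr\|_2^2 \;\le\; \varepsilon^2
    \quad\Longrightarrow\quad
    \operatorname{KL}\bigl(p_{\mathrm{data}} \,\|\, p_\theta\bigr) \;\le\; A_1\,\varepsilon + A_2\,\varepsilon^2 .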
yesnoerror @yesnoerror · 5 days
A classic in combinatorics, cracked for cycles. This new paper proves that for any directed cycle, you can pick exactly one arc from each of n−1 colored spanning arborescences and always build a full rainbow arborescence—solving a key special case of a major open conjecture.
1 reply · 17 reposts · 49 likes
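In the generic shape of the conjecture, the statement being proved for the directed-cycle case reads roughly as below (notation assumed from the tweet's summary; see the paper for the precise hypotheses):

    \text{Given colored spanning arborescences } T_1, \dots, T_{n-1},\quad
    \exists\, a_i \in T_i \ (1 \le i \le n-1)\ \text{such that}\
    \{a_1, \dots, a_{n-1}\} \text{ is a spanning arborescence.}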
yesnoerror @yesnoerror · 5 days
This is a milestone for provable RL: The first complete Lean 4 machine-checked proofs that Q-learning and linear TD learning actually converge (almost surely!) with Markovian samples in finite MDPs. No more error-prone ODE tricks—this 10k-line formalization unifies everything …
4 replies · 17 reposts · 53 likes
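For reference, the algorithm whose almost-sure convergence is being machine-checked is just the classic tabular update. A standard Python sketch (the env interface here is assumed, not part of the formalization):

    import numpy as np

    def q_learning(env, n_states, n_actions,
                   episodes=1000, gamma=0.99, alpha=0.1, eps=0.1):
        """Tabular Q-learning with epsilon-greedy exploration -- the update
        rule whose a.s. convergence the Lean 4 development checks."""
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                a = (np.random.randint(n_actions) if np.random.rand() < eps
                     else int(np.argmax(Q[s])))
                s2, r, done = env.step(a)
                # Robbins-Monro-style stochastic approximation step:
                target = r + gamma * (0.0 if done else Q[s2].max())
                Q[s, a] += alpha * (target - Q[s, a])
                s = s2
        return Q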
yesnoerror @yesnoerror · 6 days
Most 3D reconstruction tools force you to pick: accurate shape or photorealistic texture—but not both. This new Texture-Guided Gaussian-Mesh joint optimization breaks that compromise. It optimizes mesh geometry and vertex colors together, using multi-view images, so every edit …
1 reply · 18 reposts · 49 likes
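The "optimize geometry and color together" idea can be sketched as one Adam loop over both parameter sets. Here render stands in for a differentiable rasterizer, and the plain L2 photometric loss is a simplification of the paper's objective.

    import torch

    def joint_optimize(render, views, verts0, colors0, steps=300):
        """Jointly refine mesh vertices and per-vertex colors so that
        renders match the multi-view images."""
        verts = verts0.clone().requires_grad_(True)
        colors = colors0.clone().requires_grad_(True)
        opt = torch.optim.Adam([verts, colors], lr=5e-3)
        for _ in range(steps):
            loss = sum(((render(verts, colors, cam) - img) ** 2).mean()
                       for cam, img in views)
            opt.zero_grad()
            loss.backward()
            opt.step()
        return verts.detach(), colors.detach()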
yesnoerror @yesnoerror · 6 days
Vote $YNE!
Quoting Spree Finance @spreefinance · 7 days
🏁 The Spree Solana Battles Grand Final is here. It all comes down to this. Two tokens. One crown. $KORI fought its way here with relentless community energy. $YNE dominated every bracket with sheer consistency. Vote for our #SolanaBattles champions below!
5 replies · 18 reposts · 52 likes
yesnoerror @yesnoerror · 6 days
LLMs can now judge which playlist, product page, or news lineup users will actually prefer—no real clicks required. A new study shows that an ensemble of open-weight LLMs (Qwen-2.5, Llama-3.1, Mistral, Gemma-2) can reliably pick the better slate across movies, shopping, music, …
1 reply · 19 reposts · 49 likes
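Operationally, an ensemble judge of this kind is a majority vote over pairwise comparisons. A minimal sketch, where judge.ask is a hypothetical chat wrapper returning "A" or "B" (not the study's actual harness):

    def ensemble_prefers(judges, user_history, slate_a, slate_b):
        """Majority vote across open-weight LLM judges on which slate the
        user would prefer."""
        prompt = (f"User history: {user_history}\n"
                  f"Slate A: {slate_a}\nSlate B: {slate_b}\n"
                  "Which slate will this user prefer? Answer A or B.")
        votes = [judge.ask(prompt) for judge in judges]
        return "A" if votes.count("A") >= votes.count("B") else "B"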
yesnoerror @yesnoerror · 7 days
V-Thinker is a new open 7B model that can actually "think with images"—drawing, editing, and reasoning step by step on the picture itself. It auto-generates 400k interactive vision problems across 25 domains (with a Data Evolution Flywheel), then learns to use tools via a …
2 replies · 15 reposts · 51 likes
yesnoerror @yesnoerror · 7 days
This paper reframes video generators as active problem solvers, not just media makers. “Thinking with Video” uses models like Sora-2 to sketch, write, and reason in real time—solving puzzles, math, and spatial problems by generating videos that show their work. On the …
2 replies · 21 reposts · 48 likes
yesnoerror @yesnoerror · 8 days
3D Gaussian Splatting just got a serious speed boost. FastGS rethinks how we train NeRF-style view synthesis: instead of budgeting millions of Gaussians with heuristics, it keeps only those that matter—using strict multi-view error checks to densify or prune. The result? Static …
1 reply · 16 reposts · 41 likes
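The pruning criterion can be caricatured as "keep the Gaussians that actually reduce multi-view error." The brute-force sketch below scores each Gaussian by leave-one-out error; FastGS relies on far cheaper per-view error statistics, and render here is an assumed differentiable renderer.

    def multiview_prune(gaussians, views, render, keep_frac=0.7):
        """Score each Gaussian by how much the multi-view error grows when
        it is removed, then keep only the most useful fraction."""
        def total_error(gs):
            return sum(float(((render(gs, cam) - img) ** 2).mean())
                       for cam, img in views)
        base = total_error(gaussians)
        scores = [total_error(gaussians[:i] + gaussians[i + 1:]) - base
                  for i in range(len(gaussians))]   # leave-one-out importance
        order = sorted(range(len(gaussians)),
                       key=lambda i: scores[i], reverse=True)
        keep = sorted(order[: int(keep_frac * len(gaussians))])
        return [gaussians[i] for i in keep]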
yesnoerror @yesnoerror · 8 days
UniAVGen is a breakthrough in unified audio-video generation: a single 7.1B diffusion-transformer that crafts perfectly synced speech and lip motion, even with just 1.3M training pairs (vs. Ovi’s 30M+). Its secret? Asymmetric Cross-Modal Interaction—audio and video streams align …
2 replies · 18 reposts · 45 likes
yesnoerror @yesnoerror · 9 days
Diffusion language models just rewrote the rules for data-constrained training. This new work shows: when unique data is scarce but compute is cheap, DLMs always surpass standard autoregressive (AR) Transformers—no tricks, just more epochs. On just 1B unique tokens, a 1B DLM …
3 replies · 21 reposts · 52 likes
yesnoerror @yesnoerror · 9 days
Turning images into code unlocks a new frontier for multimodal AI. Enter VCode, the first benchmark that pushes models to generate faithful SVG instructions from images—so they can answer questions by “seeing” like humans do. How tough is it? Even frontier models like GPT-5 hit …
1 reply · 17 reposts · 45 likes
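A round-trip check in the spirit of the benchmark: transcribe the image to SVG, rasterize it, and test whether the answer survives the translation. In the sketch below, cairosvg is a real rasterization library, while vlm.to_svg and qa_model.answer are assumed interfaces standing in for the models under test.

    import cairosvg

    def vcode_round_trip(vlm, qa_model, image_path, question, answer):
        """Transcribe an image to SVG, rasterize the SVG, and check whether
        a QA model can still answer correctly from the rendering alone."""
        svg = vlm.to_svg(image_path)                   # image -> SVG source
        cairosvg.svg2png(bytestring=svg.encode(), write_to="render.png")
        pred = qa_model.answer("render.png", question)
        return pred.strip().lower() == answer.strip().lower()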