StepFun (@StepFun_ai)
Followers: 4K · Following: 149 · Media: 34 · Statuses: 167
Scale-up possibilities for everyone. HuggingFace: https://t.co/Soz5ZuwCrd
Joined February 2025
🚀 Introducing GELab-Zero: the first complete "Model + Infra" stack for GUI agents.
🤓 We are open-sourcing the full stack. Think of it as the open-source answer to GUI Agent MCP.
👯 What's inside:
✅ SOTA 4B model: lightweight, fast, local execution.
✅ One-Click Infra:
PaCoRe 🔥 a new way to scale test-time compute, released by @StepFun_ai
✨ Breaks the context limit with parallel + coordinated reasoning
✨ Reaches million-token TTC without larger context windows
✨ Helps AI think in parallel, not just in long chains
✨ Models, data, inference
🤯 An 8B model > GPT-5 on math?
🌠 Introducing Parallel Coordinated Reasoning (PaCoRe): open-source deep think.
📚 Decouple reasoning from context limits → multi-million-token TTC.
🤩 With this new paradigm, even an 8B model can push the effective per-problem TTC context to
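The announcements above leave the mechanics to the release itself, but the core idea, many ordinary-context reasoning chains whose conclusions are reconciled by a coordination pass, can be sketched in a few lines. The snippet below is only an assumption of how such a scheme might be wired against an OpenAI-compatible endpoint; the endpoint URL, the model id, and the prompts are placeholders, not PaCoRe's actual protocol.

```python
# Hedged sketch of "parallel + coordinated" test-time compute, NOT the official
# PaCoRe implementation. Assumes an OpenAI-compatible server (e.g. a local
# vLLM endpoint) is running; model name and prompts are placeholders.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # hypothetical endpoint
MODEL = "pacore-8b"  # placeholder model id

def reason_once(problem: str, seed: int) -> str:
    """One independent reasoning chain, bounded by a normal context window."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": f"Solve step by step, then state the final answer.\n\n{problem}"}],
        temperature=1.0,
        seed=seed,
    )
    return resp.choices[0].message.content

def coordinate(problem: str, chains: list[str]) -> str:
    """Coordination pass: reads only the compressed tails of the chains, not full traces."""
    summaries = "\n\n".join(f"[Chain {i}] {c[-1500:]}" for i, c in enumerate(chains))
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": (f"Problem:\n{problem}\n\nIndependent attempts (tails only):\n"
                               f"{summaries}\n\nReconcile them and give one final answer.")}],
    )
    return resp.choices[0].message.content

def solve(problem: str, n_parallel: int = 16) -> str:
    # Each call stays within a normal context window; per-problem test-time
    # compute grows with the number of parallel chains.
    with ThreadPoolExecutor(max_workers=n_parallel) as pool:
        chains = list(pool.map(lambda s: reason_once(problem, s), range(n_parallel)))
    return coordinate(problem, chains)
```

The point of the sketch is the accounting: no single call exceeds the model's context window, while the effective per-problem token budget scales with the number of coordinated chains.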
⭐️ 1,000+ GitHub stars! Thanks for pushing us forward. More exciting updates coming soon. Onward to the next milestone! 🚀 GitHub: https://t.co/VzfsNprg6U
GELab-Zero: a fully open-source GUI Agent solution. It integrates the model and the infrastructure, supports local deployment, and gives you full privacy control. It cuts the engineering complexity of mobile GUI Agents and can handle on-device tasks such as looking things up and operating apps. Github: https://t.co/s4AVbDVl0r
We are excited to introduce Open Vision Reasoner (OVR) 🚀, transferring linguistic cognitive behavior to unlock advanced visual reasoning!
💡 Two-stage recipe
• Massive linguistic cold-start on Qwen-2.5-VL-7B sparks “mental imagery”
• ~1k-step multimodal RL refines & scales
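The thread only names the two stages, so the outline below simply writes that recipe down as data for readability; only the base model and the "~1k-step multimodal RL" figure come from the announcement, and the data descriptions are placeholders rather than OVR's actual mixture.

```python
# Outline of the announced two-stage OVR recipe as a training plan.
# Dataset descriptions are placeholders, not OVR's real configuration.
RECIPE = (
    {
        "stage": "linguistic cold-start (SFT)",
        "base_model": "Qwen2.5-VL-7B",
        "data": "large text-only reasoning corpus (placeholder)",
        "goal": "instill linguistic cognitive behaviors / 'mental imagery'",
    },
    {
        "stage": "multimodal RL",
        "steps": 1000,  # "~1k-step" per the announcement
        "data": "multimodal tasks with checkable answers (placeholder)",
        "goal": "refine and scale visual reasoning",
    },
)

for step in RECIPE:
    print(f"- {step['stage']}: {step['goal']}")
```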
🎉 Introducing Open Reasoner Zero 🚀
Performance: matches DeepSeek R1-Zero (32B) in just 1/30 of the training steps!
📚 Full training strategies & technical paper
💻 100% open-source: code + data + model
⚖️ MIT licensed - use it your way!
🌊 Let the Reasoner-Zero tide rise! 🚢
1/n
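The thread itself doesn't show the training signal, but reasoner-zero style RL on a base model is typically driven by a simple rule-based, verifiable reward. The sketch below illustrates that kind of reward; the exact answer-extraction and matching rules Open-Reasoner-Zero uses may differ, so treat the regexes as assumptions.

```python
# Minimal sketch of a rule-based, verifiable reward of the kind used for
# "reasoner-zero" style RL on a base model. Extraction and matching rules are
# illustrative assumptions, not necessarily ORZ's exact implementation.
import re

def extract_answer(completion: str) -> str | None:
    """Pull the model's final answer, preferring a \\boxed{...} span."""
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    if boxed:
        return boxed[-1].strip()
    # Fall back to the last number in the completion (assumption, not ORZ's rule).
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None

def rule_based_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference, else 0.0."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer.replace(" ", "") == reference.replace(" ", "") else 0.0

# This scalar is the only supervision the RL loop sees for each rollout.
print(rule_based_reward("... so the result is \\boxed{42}.", "42"))  # 1.0
```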
🥰
New drop: @StepFun_ai’s Step-Audio-EditX — the first open-source LLM-powered audio editing model. Control vibes, tone, emotions, even breaths + laughs. Multi-lingual. Wildly flexible. Super fun to play with. Try it now: https://t.co/6WxX2Uc49j
#AudioAI #LLM #AItools #GMICloud
🚨 We are pleased to announce our acceptance at NeurIPS 2025: Open-Reasoner-Zero (ORZ).
⚡️ Using the same base model (Qwen2.5-32B base), Open-Reasoner-Zero vs. DeepSeek-R1-Zero: superior performance across AIME/GPQA/MATH500 with only 1/10 of the training steps. https://t.co/lARMx3sqwx
🎉 Thrilled to share our work accepted to NeurIPS 2025: GUI Exploration Lab (GE-Lab)! 🤗 It's an open-source, flexible simulation engine designed specifically for GUI Agents to navigate and master complex app environments. 💡 We're enabling developers to build smarter, more
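The post doesn't expose the engine's interface, so the loop below is only a guess at what a GUI-agent simulation environment generally looks like: the environment serves an observation (screenshot plus UI elements), the agent emits tap/type/scroll actions, and the episode ends when the task is judged complete. The FakeGUIEnv class, its observation fields, and the action format are hypothetical and are not GE-Lab's actual API.

```python
# Hypothetical sketch of a GUI-agent simulation loop. Class and method names
# are invented for illustration and are NOT GE-Lab's real interface.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str                  # "tap" | "type" | "scroll" | "done"
    target: str | None = None  # UI element id, if any
    text: str | None = None    # text to type, if any

class FakeGUIEnv:
    """Stand-in for a GUI simulation engine: serves observations, applies actions."""
    def reset(self, task: str) -> dict:
        return {"task": task, "screenshot": b"", "ui_tree": ["search_box", "go_button"]}

    def step(self, action: Action) -> tuple[dict, bool]:
        done = action.kind == "done"
        return {"screenshot": b"", "ui_tree": ["results_list"]}, done

def scripted_agent(obs: dict) -> Action:
    # Trivial policy for illustration; a real agent would call a VLM here.
    if "search_box" in obs["ui_tree"]:
        return Action(kind="type", target="search_box", text="weather tomorrow")
    return Action(kind="done")

env = FakeGUIEnv()
obs = env.reset("find tomorrow's weather")
for _ in range(20):  # cap episode length
    obs, done = env.step(scripted_agent(obs))
    if done:
        break
```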
We just shipped a bunch of upgrades:
💁 1. Added full Japanese & Korean language support.
📊 2. Released the Step-Audio-Edit-Benchmark (https://t.co/stUUqfWZBP).
🤠 3. Rolled out a new version with polyphonic pronunciation control + major boosts in emotion, style, and
github.com · stepfun-ai/Step-Audio-Edit-Benchmark
🚀 Step-Audio-EditX is now open source!!
✨ Zero-shot TTS with high timbre similarity
✨ Iterative editing across dozens of audio emotions and speaking styles
✨ Fine-grained control over paralinguistic features
Whether for audio editing, interactive design, or personalized scenarios,
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation https://t.co/V6GzW7FpZ7
arxiv.org
In this work, we present a novel direction to build an image tokenizer directly on top of a frozen vision foundation model, which is a largely underexplored area. Specifically, we employ a frozen...
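The abstract describes building an image tokenizer directly on top of a frozen vision foundation model. As a rough illustration of that direction (not the paper's method), the sketch below quantizes frozen encoder features against a learnable codebook to produce discrete token ids for an autoregressive generator; the feature dimension, codebook size, and nearest-neighbour assignment are all assumptions.

```python
# Hedged sketch: discretizing frozen vision-foundation-model features into
# tokens for autoregressive generation. Generic VQ illustration, not the
# paper's tokenizer; sizes and the assignment rule are placeholders.
import torch
import torch.nn.functional as F

class FrozenFeatureQuantizer(torch.nn.Module):
    def __init__(self, dim: int = 768, codebook_size: int = 8192):
        super().__init__()
        self.codebook = torch.nn.Embedding(codebook_size, dim)

    def forward(self, feats: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # feats: (batch, num_patches, dim) from a frozen vision encoder
        flat = F.normalize(feats, dim=-1)
        codes = F.normalize(self.codebook.weight, dim=-1)
        # Nearest codebook entry per patch -> discrete token ids for an AR model
        token_ids = (flat @ codes.t()).argmax(dim=-1)  # (batch, num_patches)
        quantized = self.codebook(token_ids)           # (batch, num_patches, dim)
        return token_ids, quantized

# Usage: feats would come from a frozen ViT (e.g. a CLIP/DINO-style encoder);
# random data stands in here.
feats = torch.randn(2, 196, 768)
ids, q = FrozenFeatureQuantizer()(feats)
print(ids.shape, q.shape)  # torch.Size([2, 196]) torch.Size([2, 196, 768])
```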
RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving https://t.co/0KJwzqUS6v
arxiv.org
The ultimate goal of code agents is to solve complex tasks autonomously. Although large language models (LLMs) have made substantial progress in code generation, real-world tasks typically demand...
Perception-R1: Pioneering Perception Policy with Reinforcement Learning https://t.co/NsNtrNes3l
arxiv.org
Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learning (RL) in MLLM post-training for perception policy learning. While promising, our initial...
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model https://t.co/lARMx3sYm5
arxiv.org
We introduce Open-Reasoner-Zero, the first open source implementation of large-scale reasoning-oriented RL training on the base model focusing on scalability, simplicity and accessibility. Through...
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning https://t.co/95ndk4GMcc
arxiv.org
The remarkable reasoning capability of large language models (LLMs) stems from cognitive behaviors that emerge through reinforcement with verifiable rewards. This work investigates how to transfer...
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models https://t.co/TUlx5iQ3so
arxiv.org
Recent advances in multi-modal generative models have enabled significant progress in instruction-based image editing. However, while these models produce visually plausible outputs, their...