Explore tweets tagged as #toolbench
@TsingYoga
Yujia Qin
3 years
🥳🛠️Introducing ToolBench!🤖🎉 🌟Large-scale instruction tuning SFT data to equip LLMs with general tool-use capability 🔖 We release 98k data with 312k real API calls. We also release a capable model ToolLLaMA that matches ChatGPT in tool use Github: https://t.co/z85AxpXxBx
9
84
432
@rohanpaul_ai
Rohan Paul
24 days
New AWS paper shows a small language model can learn tool calling well enough to beat much larger models. It hits 77.55% pass rate on ToolBench with 350M parameters. Tool calling means the model picks an application programming interface (API) and outputs the exact call format.
12
43
227
@jaseweston
Jason Weston
2 years
- We test on 4 tasks from ToolBench
- ToolVerifier outperforms few-shot baselines by 22%
- Self-verification alone improves avg perf by 8%
- Significantly better than Tool-Augmented LLMs
- Outperforms GPT3.5-T & even GPT4 on some tasks despite being based on Llama 70B
🧵(2/4)
1
1
17
@GptMaestro
GPT Maestro | LLMpedia Curator
2 years
🦾Today's LLM paper review ProTIP: Progressive Tool Retrieval Improves Planning (Dec 2023) by Apple introduces a contrastive learning approach for AI agents. ProTIP outperforms ChatGPT on the ToolBench dataset, achieving 24% higher Recall@K=10 in tool retrieval and 41%
1
0
1
@CaimingXiong
Caiming Xiong
2 years
xLAM-v0.1-r Results: xLAM-v0.1-r significantly outperforms #GPT3.5-Turbo across all scenarios and surpasses #GPT4 in several settings within WebShop, HotpotQA, ToolBench, and MINT-Bench.
1
0
3
@SERVICECASTER
Service Caster
2 months
Ready, set, install. 🛠️Service Caster has the wheels, hardware, and know-how to get your carts and equipment rolling fast. #ServiceCaster #ToolBench #BuiltToRoll
0
0
0
@arankomatsuzaki
Aran Komatsuzaki
2 years
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs ToolLLaMA exhibits comparable performance to ChatGPT repo: https://t.co/J5V19NtpUw abs: https://t.co/DzVjlJomiZ
11
150
610
@leadgenmanthan
Manthan Patel | Lead Gen Man
23 days
Bigger Models Aren’t Better Agents. This Paper Proves It. Most teams assume agent quality scales with model size. This paper shows the opposite. AWS researchers fine-tuned a 350M parameter model (OPT-350M) specifically for tool calling. On ToolBench, it outperformed models
0
0
1
@LeopolisDream
Alex Yanko 🇺🇦
2 years
Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in <100 seconds. https://t.co/Jxhcg6Nfk7
0
0
0
@rryssf_
Robert Youssef
3 months
DeepAgent absolutely destroys other agents across every benchmark. It beats ReAct-GPT-4o, CodeAct, and WebThinker on both: → Tool use tasks (ToolBench, Spotify, TMDB) → Real-world apps (WebShop, GAIA, HLE)
1
1
11
@rohanpaul_ai
Rohan Paul
2 years
🚀 🔥ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs🔥🚀
- To facilitate tool-use capabilities within open-source LLMs.
- A general tool-use framework of data construction, model training and evaluation.
- ToolBench, an instruction-tuning dataset
1
1
3
@arlissbryant
Wainfleet Trading Post
8 months
0
2
3
@EmergentMind
Emergent Mind
8 days
AI Agent Systems: Architectures, Applications, and Evaluation
- 5-part "agent transformer" unifies agent design (policy, memory, tools, verifiers, environment)
- 7-metric evaluation + 5 benchmark suites (AgentBench, WebArena, ToolBench, SWE-bench, GAIA) standardize assessment
1
0
1
@pitstopverse
Pit Stop token
6 days
Fragment 15 online. Telemetry sync in progress. Steering core receiving update. Toolbench uplink confirmed. #PST #Fragment15 #ToolbenchSync #Telemetry #SteeringCore #CryptoRacing #ShadowGrid #ModularLore #Web3 #FactionTraction #pitstoptoken #NFTs #nft
1
0
1
@SambaNovaAI
SambaNova
2 years
Agent-based manipulation of APIs using #LLMs is a popular approach, but consistent and reliable evaluation metrics to assess this have been lacking. In this poster, we introduce a set of benchmarks called ToolBench and evaluate multiple open source #LLMs. @SambaNovaAI Researcher
0
2
5
@CrocodileCloth
Crocodile Cloth
2 years
That's one good looking banner. 👀 #Garage #GarageBanner #Banner #Poster #Toolbench #Workshop
0
0
1
@ShubhamMal72313
Shubham Malhotra
7 days
AWS researchers just published a paper on arXiv: a 350M model fine-tuned for 1 epoch on ToolBench (~187k examples) reports a 77.5% pass rate on ToolBench/ToolEval. Excited to test this pattern in @RunAnywhereAI (local-first agents + fallback)
1
2
4
@MohammadWahab15
Mahammad Wahab
25 days
AWS research showed a small 350M parameter model, fine-tuned for tool use, hitting a 77.55% success rate on ToolBench, dramatically outperforming GPT-4 class models which stalled around 26%. This confirms that for high-precision, task-specific execution, Small Language Models
0
0
0
@MikeTamir
Mike Tamir, PhD
2 years
OpenBMB/ToolBench: An open platform for training, serving, and evaluating large language model for tool learning. https://t.co/zbGessquoT
0
1
5