Explore tweets tagged as #toolbench
🥳🛠️Introducing ToolBench!🤖🎉 🌟Large-scale instruction-tuning (SFT) data to equip LLMs with general tool-use capability 🔖 We release 98k data instances with 312k real API calls. We also release a capable model, ToolLLaMA, that matches ChatGPT in tool use. GitHub: https://t.co/z85AxpXxBx
New AWS paper shows a small language model can learn tool calling well enough to beat much larger models. It hits a 77.55% pass rate on ToolBench with 350M parameters. Tool calling means the model picks an application programming interface (API) and outputs the exact call format.
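To make the tweet's definition of tool calling concrete, here is a minimal sketch: the model names an API and emits the exact call as structured JSON, which the host program parses and dispatches. The `get_weather` API and the JSON schema are hypothetical illustrations, not the paper's actual format.

```python
import json

# Hypothetical model output: the chosen API plus its arguments in the exact call format.
model_output = '{"api_name": "get_weather", "arguments": {"city": "Seattle", "unit": "celsius"}}'

def execute_tool_call(raw: str, registry: dict):
    """Parse the model's emitted call and dispatch it to the matching API."""
    call = json.loads(raw)
    fn = registry[call["api_name"]]       # pick the API the model named
    return fn(**call["arguments"])        # invoke it with the emitted arguments

# Toy API registry standing in for real tool endpoints.
registry = {"get_weather": lambda city, unit: f"22 degrees {unit} in {city}"}
print(execute_tool_call(model_output, registry))
```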
- We test on 4 tasks from ToolBench - ToolVerifier outperforms few-shot baselines by 22% - Self-verification alone improves avg perf by 8% - Significantly better than Tool-Augmented LLMs - Outperforms GPT3.5-T & even GPT4 on some tasks despite being based on Llama 70B 🧵(2/4)
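The self-verification step credited above can be pictured as a propose-then-check loop: the model picks a tool, then a second prompt asks it to confirm the choice before committing. This is only an illustrative outline, not ToolVerifier's actual prompts or code; `generate` is a placeholder for any LLM completion call and the tool names are invented.

```python
TOOLS = ["calendar_search", "weather_lookup", "unit_converter"]

def generate(prompt: str) -> str:
    # Placeholder LLM call; swap in a real model client here.
    return "yes" if "Is this correct" in prompt else "weather_lookup"

def choose_tool_with_verification(task: str, max_retries: int = 2) -> str:
    # Propose a tool, then let the model verify its own choice before committing.
    candidate = generate(f"Task: {task}\nTools: {TOOLS}\nPick one tool:")
    for _ in range(max_retries):
        verdict = generate(f"Task: {task}\nChosen tool: {candidate}\nIs this correct? yes/no:")
        if verdict.strip().lower().startswith("yes"):
            return candidate
        candidate = generate(f"Task: {task}\nTools: {TOOLS}\nPrevious pick {candidate} was rejected. Pick again:")
    return candidate

print(choose_tool_with_verification("What's the forecast in Paris tomorrow?"))
```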
🦾Today's LLM paper review ProTIP: Progressive Tool Retrieval Improves Planning (Dec 2023) by Apple introduces a contrastive learning approach for AI agents. ProTIP outperforms ChatGPT on the ToolBench dataset, achieving 24% higher Recall@K=10 in tool retrieval and 41%
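Recall@K=10, the retrieval metric cited above, is just the fraction of queries whose gold tool shows up among the top K retrieved candidates. A small sketch with made-up retrieval results (not ProTIP's outputs):

```python
def recall_at_k(retrieved: list[list[str]], gold: list[str], k: int = 10) -> float:
    """Fraction of queries whose gold tool appears in the top-k retrieved tools."""
    hits = sum(1 for ranked, target in zip(retrieved, gold) if target in ranked[:k])
    return hits / len(gold)

# Toy example: two queries, each with a ranked list of retrieved tools.
retrieved = [["tool_a", "tool_b", "tool_c"], ["tool_d", "tool_a"]]
gold = ["tool_b", "tool_x"]
print(recall_at_k(retrieved, gold, k=10))  # 0.5: only the first query's gold tool is retrieved
```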
Ready, set, install. 🛠️Service Caster has the wheels, hardware, and know-how to get your carts and equipment rolling fast. #ServiceCaster #ToolBench #BuiltToRoll
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs. ToolLLaMA exhibits comparable performance to ChatGPT. repo: https://t.co/J5V19NtpUw abs: https://t.co/DzVjlJomiZ
Bigger Models Aren’t Better Agents. This Paper Proves It. Most teams assume agent quality scales with model size. This paper shows the opposite. AWS researchers fine-tuned a 350M parameter model (OPT-350M) specifically for tool calling. On ToolBench, it outperformed models
Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in <100 seconds. https://t.co/Jxhcg6Nfk7
DeepAgent absolutely destroys other agents across every benchmark. It beats ReAct-GPT-4o, CodeAct, and WebThinker on both: → Tool use tasks (ToolBench, Spotify, TMDB) → Real-world apps (WebShop, GAIA, HLE)
🚀 🔥ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs🔥🚀 - Facilitates tool-use capabilities within open-source LLMs. - A general tool-use framework covering data construction, model training, and evaluation. - ToolBench, an instruction-tuning dataset
Record no 110 large vise benchtop https://t.co/WBjOAvGU3V
#chickencoop #signs #ChickenDaddy #Etsy #woodstove #Toolbench
AI Agent Systems: Architectures, Applications, and Evaluation - 5-part "agent transformer" unifies agent design (policy, memory, tools, verifiers, environment) - 7-metric evaluation + 5 benchmark suites (AgentBench, WebArena, ToolBench, SWE-bench, GAIA) standardize assessment
Fragment 15 online. Telemetry sync in progress. Steering core receiving update. Toolbench uplink confirmed. #PST #Fragment15
#ToolbenchSync #Telemetry
#SteeringCore #CryptoRacing
#ShadowGrid #ModularLore
#Web3 #FactionTraction #pitstoptoken #NFTs #nft
Agent-based manipulation of APIs using #LLMs is a popular approach, but consistent and reliable evaluation metrics to assess this have been lacking. In this poster, we introduce a set of benchmarks called ToolBench and evaluate multiple open-source #LLMs. @SambaNovaAI Researcher
AWS researchers just published a paper on arXiv - A 350M model fine-tuned for 1 epoch on ToolBench (~187k examples) reports a 77.5% pass rate on ToolBench/ToolEval. Excited to test this pattern in @RunAnywhereAI (local-first agents + fallback)
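The recipe referenced above, roughly one epoch of supervised fine-tuning of a ~350M causal LM on (instruction → API call) pairs, can be sketched as below. The training pair, hyperparameters, and text formatting are placeholders, not the paper's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# OPT-350M as a representative ~350M-parameter base model.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Hypothetical (instruction, API call) pair standing in for the full dataset.
pairs = [("Find flights from NYC to SFO on May 3.",
          'search_flights(origin="NYC", destination="SFO", date="2024-05-03")')]

model.train()
for instruction, api_call in pairs:                  # one pass over the toy set ~ "1 epoch"
    text = f"Instruction: {instruction}\nCall: {api_call}"
    batch = tokenizer(text, return_tensors="pt")
    out = model(**batch, labels=batch["input_ids"])  # causal LM loss on the full sequence
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```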
AWS research showed a small 350M parameter model, fine-tuned for tool use, hitting a 77.55% success rate on ToolBench, dramatically outperforming GPT-4-class models, which stalled around 26%. This confirms that for high-precision, task-specific execution, Small Language Models
OpenBMB/ToolBench: An open platform for training, serving, and evaluating large language models for tool learning. https://t.co/zbGessquoT