Xianjie Wu
@LittleTonyXJ
Joined June 2017
[9/n]🤗 TableBench Resources: Homepage: https://t.co/Pm1CWwlO7O arXiv: https://t.co/oQmo5yjiZO Code: https://t.co/OE74gOxR8O Leaderboard: https://t.co/I0RsLFSM7W Evaluation Data: https://t.co/l6YDx8gyRv Instruction Data: https://t.co/R0SOLx1P8o HF Paper:
huggingface.co
[8/n]🤔 The interesting trend between parsing ratio and overall score shows that this task assesses both table comprehension and instruction-following abilities. When these abilities diverge, results fall on the left side of the quadratic curve.
[7/n] Most models excel in fact-based tasks but encounter bottlenecks in numerical computation. Data analysis tasks require more complex and comprehensive analytical skills, and chart generation tasks necessitate precise coding skills and table comprehension abilities in LLMs.
[6/n] 📈 Extensive experiments are conducted on over 30 advanced LLMs with sizes ranging from 7B to 110B parameters, including general/code LLMs, open-source/proprietary models, and SFT models.
[5/n]🔥Complex🔥 We define the complexity of a dataset as the number of reasoning steps required to solve its problems. The figure indicates that the overall complexity of TableBench is significantly higher than that of existing datasets.
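A minimal sketch of the step-counting idea, assuming each sample carries a list of annotated reasoning steps. The `steps` field, the sample questions, and `dataset_complexity` are hypothetical illustrations, not TableBench's actual schema or scoring code.

```python
from statistics import mean

# Hypothetical annotated samples: each one lists the reasoning steps
# a solver needs to answer the question (field names are assumptions).
samples = [
    {"question": "Which row has the highest revenue?",
     "steps": ["scan the revenue column", "take the maximum"]},
    {"question": "What is the average growth rate?",
     "steps": ["select the rows", "compute growth per row", "average the values"]},
]

def dataset_complexity(samples):
    """Return the average number of reasoning steps per sample."""
    return mean(len(sample["steps"]) for sample in samples)

print(dataset_complexity(samples))
```

Averaging per-sample step counts gives one number per dataset, which is one plausible way to compare datasets on this measure.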
[4/n]🔥Comprehensive🔥 We delineate four primary question categories: fact verification, numerical reasoning, data analysis, and visualization. TableBench emphasizes data analysis and chart generation, areas that prior datasets largely lacked.
[3/n] We present an annotation framework that integrates both manual and automated techniques to improve annotation efficiency. We propose two high-quality corpora: TableBench, comprising 886 samples, and TableInstruct, an extensive instruction corpus containing 2k samples.
[2/n]💪Motivation💪 Despite the remarkable advances of LLMs in tabular data interpretation, LLMs still face significant challenges in industrial applications. To bridge the gap between academic benchmarks and practical applications, we introduce TableBench.
[1/n]🔥TableBench is coming!🚀 A comprehensive and complex benchmark covering 18 fields across four main TableQA categories, rigorously testing LLMs' performance in complex industrial TableQA scenarios. 📈 Dive into the details: https://t.co/Pm1CWwlO7O
#AI #LLM #TableQA