Unso Jo
@unsojo
2K Followers · 2K Following · 13 Media · 255 Statuses
CEO of Genabase; Teaching History of AI @Cornell
New York, USA
Joined November 2017
On episode 996 of "Is AI coming for your job too?" we present Agent Bain vs. Agent McKinsey, a new text-to-SQL benchmark for the business domain (CORGI): https://t.co/Cr7L40rJRk Joint work among Cornell Bowers CIS, Cornell Johnson School of Management, & Gena AI (@YueeeLi_
If for some Finger Lakey reason you happen to be in Ithaca, come to my talk at the Cornell Johnson School of Management: Sage Hall, noon, Oct 21. https://t.co/8HSSisgP9Q Data Access for Everyone with AI For everyday people, databases are a foreign concept. There is a
business.cornell.edu
Read our paper: https://t.co/lMdUnKHpjF
Contribute to our open-source benchmark and data: https://t.co/bbK7uS7zg1
Evaluate and submit online: https://t.co/Cr7L40rJRk
Send questions to: rt529@cornell.edu or unsojo@cornell.edu
github.com · corgibenchmark/CORGI
A few interesting insights: 1) AI is relatively bad at identifying latent patterns in data, like seasonal sales trends. 2) AI is worse at giving data-informed recommendations (future) than data-informed explanations (past). 3) AI is relatively bad at giving advice on planning. 4)
Our new CORGI benchmark features 10 hand-curated databases representing modern enterprises, spanning retail e-commerce such as Lululemon, DTC enablement like Shopify, and C2C marketplaces like Airbnb and TheRealReal. Our databases are jacked 😳 with more tables and relations than seen in
This is my lecture from 2 months ago at @Cornell “How do I increase my output?” One natural answer is "I will just work a few more hours." Working longer can help, but eventually you hit a physical limit. A better question is, “How do I increase my output without increasing
Our work serves as a case study of how applying domain expertise can reduce LLM cost: using generic LLM methods bluntly incurs unnecessary expense, while leveraging knowledge of SQL and databases can level the playing field for efficient applications.
Preprint is available here: https://t.co/F0kmxLIo4U Open-source package available here: https://t.co/umRgA9CdmX [soon to be integrated into PyPI txt2sql]
github.com · genaasia/N-rep: Implementation of N-rep.
With 14B Qwen + N-rep you can get better performance than o3-mini, going from 46 cents per query to 3.9 cents for comparable results.
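A back-of-envelope check of the quoted numbers. This is just arithmetic on the per-query prices stated in the tweet; the 1M-query volume is a hypothetical for illustration.

```python
# Per-query costs quoted in the tweet (USD)
o3_mini = 0.46      # o3-mini
qwen_nrep = 0.039   # 14B Qwen + N-rep

factor = o3_mini / qwen_nrep
print(f"{factor:.1f}x cheaper")  # ~11.8x

# Hypothetical savings at 1M queries
saved = round((o3_mini - qwen_nrep) * 1_000_000)
print(f"saved per 1M queries: ${saved:,}")
```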
Using multiple schema representations, you can cut the number of calls required for self-consistency, which is sometimes 100+ LLM calls!
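A minimal sketch of the idea, under my own assumptions: instead of sampling one prompt 100+ times and majority-voting, query the model once per schema representation (e.g., DDL, a natural-language summary, a column list) and vote over those few candidates. `generate_sql` is a stand-in for a real LLM call, and its canned outputs are made up for illustration.

```python
from collections import Counter

def generate_sql(question: str, schema_repr: str) -> str:
    # Placeholder for an LLM call. Here two representations happen
    # to agree and one diverges, so the vote picks the majority.
    fake = {
        "ddl": "SELECT month, SUM(sales) FROM orders GROUP BY month",
        "nl_summary": "SELECT month, SUM(sales) FROM orders GROUP BY month",
        "column_list": "SELECT * FROM orders",
    }
    return fake[schema_repr]

def n_rep_vote(question: str, schema_reprs: list[str]) -> str:
    # One call per representation instead of 100+ samples of one prompt
    candidates = [generate_sql(question, r) for r in schema_reprs]
    # Normalize whitespace before voting; a real system might compare
    # execution results rather than query strings.
    normalized = [" ".join(c.split()) for c in candidates]
    winner, _ = Counter(normalized).most_common(1)[0]
    return winner

print(n_rep_vote("total sales by month?", ["ddl", "nl_summary", "column_list"]))
```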
Is your text-to-SQL AI worth the cost? We introduce "N-Rep," a new, faster, and 10x cheaper approach to text-to-SQL without chain-of-thought reasoning or expensive fine-tuning. Joint work with @genadotco @andreawwenyi @dmimno 🧵
Special thanks to @Xianbao_QIAN for helping brainstorm and @tsmullaney for inspirational book!
China is a nation with over a hundred minority languages and many ethnic groups. What does this say about China’s 21st century AI policy?
This suggests a break from China’s past stance of using inclusive language policy as a way to build a multiethnic nation. We see no evidence of socio-political pressure or carrots for Chinese AI groups to dedicate resources for linguistic inclusivity.
In fact, many LLMs from China fail to even recognize some lower resource Chinese languages such as Uyghur.
LLMs from China are highly correlated with Western LLMs in multilingual performance (r = 0.93 - 0.99) on tasks such as reading comprehension.
Do Chinese AI Models Speak Chinese Languages? Not really. Chinese LLMs like DeepSeek are better at French than Cantonese. [https://t.co/22ZGZZI1Us] Joint work with @andreawwenyi @dmimno 🧵
Feels so odd to be teaching a history of AI when it's very actively being contested in 2025, though time will tell what gets written!