Ala Falaki

@NLPiation

Followers: 1K · Following: 3K · Media: 210 · Statuses: 1K

Ph.D. / NLP Researcher | Technical Editor @Towards_AI | Link to my monthly newsletter about NLP trends and blog👇

Joined July 2021
@NLPiation
Ala Falaki
2 days
How many times should I say it? The future will be smaller, highly specialized models that excel in one domain, especially for services like ChatGPT.
@NLPiation
Ala Falaki
2 days
Paper: Code:
@NLPiation
Ala Falaki
2 days
It can match GPT-5-medium’s performance at 27% lower cost, and achieves 90% of its performance at 63% lower cost. Additionally, it always gave the best accuracy for any given cost and the lowest cost for any given accuracy, outperforming every single model tested. 5/5.
@NLPiation
Ala Falaki
2 days
For a new query at inference time, it embeds the query, finds the nearest clusters, and routes it to the model with the best pre-computed score. This suggests that the future of LLMs may lie in orchestrating multiple specialized, cost-effective models instead of one big general-purpose model. 4/5.
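The train-then-route flow described in this thread (cluster queries, pre-compute per-cluster model scores, then route each new query to the best model in its nearest cluster) can be sketched roughly as follows. The centroids, model names, score values, and the exact score formula here are illustrative assumptions, not the paper's actual components.

```python
import math

# Hypothetical pre-computed state (illustrative values, not from the paper):
# cluster centroids in embedding space, plus per-(cluster, model) accuracy
# and cost, each normalized to [0, 1] within its cluster.
CENTROIDS = {
    "math":   [1.0, 0.0],
    "coding": [0.0, 1.0],
}
SCORES = {  # (performance, cost)
    ("math",   "big-model"):   (0.95, 0.90),
    ("math",   "small-model"): (0.80, 0.10),
    ("coding", "big-model"):   (0.90, 0.90),
    ("coding", "small-model"): (0.88, 0.15),
}

def route(query_embedding, alpha=0.5):
    """Route a query: nearest cluster first, then the model with the best
    performance-efficiency trade-off (alpha balances the two)."""
    # 1. Find the nearest cluster centroid (Euclidean distance).
    cluster = min(
        CENTROIDS,
        key=lambda c: math.dist(query_embedding, CENTROIDS[c]),
    )
    # 2. Pick the model maximizing alpha*performance - (1-alpha)*cost.
    best_model = max(
        (m for (c, m) in SCORES if c == cluster),
        key=lambda m: alpha * SCORES[(cluster, m)][0]
                      - (1 - alpha) * SCORES[(cluster, m)][1],
    )
    return cluster, best_model
```

With a balanced trade-off the cheap model wins the "math" cluster here, while pushing alpha toward pure performance flips the choice to the big model, which is exactly the knob the thread describes.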
@NLPiation
Ala Falaki
2 days
To train the system, they build clusters of queries, then evaluate each model’s accuracy (performance) and cost (efficiency) within each cluster. From this, they compute normalized "performance–efficiency scores" per model–cluster pair. 3/5.
@NLPiation
Ala Falaki
2 days
Instead of sending queries to just one of two models, Avengers-Pro embeds queries, clusters them by semantic similarity, and then routes each query to the model with the best performance-efficiency score, balancing accuracy and cost through a trainable trade-off parameter. 2/5.
@NLPiation
Ala Falaki
2 days
📝 Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing. GPT-5 used test-time routing to balance efficiency and performance. This paper extends that with Avengers-Pro, which combines multiple LLMs to optimize the same trade-off. 1/5
Tweet media one
@NLPiation
Ala Falaki
9 days
Interestingly, the optimal position varies: smaller models prefer ssp/esp, while very large models (like LLaMA3-70B) often benefit from demos closer to the query (sum = start of user message). The optimal demo placement is task- and model-dependent. 5/5.
@NLPiation
Ala Falaki
9 days
Placing demos early in the prompt (ssp = start of system prompt, or esp = end of system prompt) usually gives more accurate and stable results than later positions. Late placement (eum = end of user message) makes models unstable, flipping many answers and hurting accuracy. 4/5.
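The four placements this thread compares (ssp, esp, sum, eum) only move the same demo block around the prompt. A minimal sketch, assuming a simple chat-style message format; the template and function name are illustrative, not the paper's code:

```python
def build_prompt(demos, system, query, position):
    """Assemble a chat prompt with the demo block at one of the four
    positions studied: ssp/esp (system prompt) or sum/eum (user message)."""
    demo_block = "\n".join(f"Q: {q}\nA: {a}" for q, a in demos)
    if position == "ssp":      # start of system prompt
        sys_text, user_text = demo_block + "\n" + system, query
    elif position == "esp":    # end of system prompt
        sys_text, user_text = system + "\n" + demo_block, query
    elif position == "sum":    # start of user message
        sys_text, user_text = system, demo_block + "\n" + query
    elif position == "eum":    # end of user message
        sys_text, user_text = system, query + "\n" + demo_block
    else:
        raise ValueError(f"unknown position: {position}")
    return [{"role": "system", "content": sys_text},
            {"role": "user", "content": user_text}]
```

Note that the demo content is identical in all four cases; position is the only variable, which is exactly what lets the paper isolate positional bias.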
@NLPiation
Ala Falaki
9 days
They kept the demo content the same and only changed where it appeared in the prompt, then tested this across many tasks using models from 1.5B to 72B parameters. They measured accuracy gains, prediction flips, and confirmed the results with statistical significance tests. 3/5.
@NLPiation
Ala Falaki
9 days
A demo is an example input–output pair placed in the prompt to show the model how to solve a task (in-context learning). This paper shows that demo placement alone (independent of content) can change accuracy by up to 50% and flip nearly half of a model’s predictions. 2/5.
@NLPiation
Ala Falaki
9 days
📝 Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning. It's already known that the choice of demos (demonstrations) and their order affect how models perform. But no one had really studied how the placement of demos in the prompt changes results. 1/5
Tweet media one
@NLPiation
Ala Falaki
16 days
ARAG outperforms both Vanilla RAG and Recency-based ranking, with up to 42% NDCG@5 and 35% Hit@5 gains in Clothing, and strong improvements in Electronics (~38%) and Home (~26%). Also, because it produces reasoning traces, it offers more transparent recommendations. 6/6.
@NLPiation
Ala Falaki
16 days
They showed that each agent contributes to performance: the User Understanding Agent boosts context relevance, the Context Summary Agent especially helps in style-driven domains, and the full ARAG setup delivers the best overall results. 5/6.
@NLPiation
Ala Falaki
16 days
The authors built ARAG as a blackboard-style multi-agent system, where agents share information through structured memory. The process works in four steps: first, a standard RAG retrieves candidate items; then the UUA summarizes preferences while the NLI Agent checks alignment;.
@NLPiation
Ala Falaki
16 days
They introduced four LLM (GPT-3.5) agents. A User Understanding Agent (UUA): summarizes preferences, an NLI Agent: checks alignment of items, a Context Summary Agent (CSA): condenses relevant evidence, and an Item Ranker Agent (IRA): produces the final personalized list. 3/6.
@NLPiation
Ala Falaki
16 days
It extends standard Retrieval-Augmented Generation (RAG) by introducing a multi-agent collaboration mechanism that allows LLMs to better capture user preferences (both long-term and session-level) and generate context-aware, personalized recommendations. 2/6.