Ala Falaki

@NLPiation

Followers: 1K · Following: 3K · Media: 210 · Statuses: 1K

Ph.D. / NLP Researcher | Technical Editor @Towards_AI | Link to my monthly newsletter about NLP trends and blog👇

Joined July 2021
@NLPiation
Ala Falaki
2 days
How many times should I say it? The future will be smaller, highly specialized models that excel in one domain, especially for services like ChatGPT.
@NLPiation
Ala Falaki
2 days
Paper: Code:
@NLPiation
Ala Falaki
2 days
It can match GPT-5-medium’s performance at 27% lower cost, and achieves 90% of its performance at 63% lower cost. Additionally, it always gave the best accuracy for any given cost and the lowest cost for any given accuracy, outperforming every single model tested. 5/5.
@NLPiation
Ala Falaki
2 days
For a new query at inference time, it embeds the query, finds the nearest clusters, and routes it to the model with the best pre-computed score. This suggests that the future of LLMs may lie in orchestrating multiple specialized, cost-effective models instead of one big general-purpose model. 4/5.
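The train-then-route flow described in this thread (cluster queries, pre-compute per-cluster model scores, then route each new query to the best model in its nearest cluster) can be sketched roughly as follows. The centroids, model names, score values, and the exact score formula here are illustrative assumptions, not the paper's actual components.

```python
import math

# Hypothetical pre-computed state (illustrative values, not from the paper):
# cluster centroids in embedding space, plus per-(cluster, model) accuracy
# and cost, each normalized to [0, 1] within its cluster.
CENTROIDS = {
    "math":   [1.0, 0.0],
    "coding": [0.0, 1.0],
}
SCORES = {  # (performance, cost)
    ("math",   "big-model"):   (0.95, 0.90),
    ("math",   "small-model"): (0.80, 0.10),
    ("coding", "big-model"):   (0.90, 0.90),
    ("coding", "small-model"): (0.88, 0.15),
}

def route(query_embedding, alpha=0.5):
    """Route a query: nearest cluster first, then the model with the best
    performance-efficiency trade-off (alpha balances the two)."""
    # 1. Find the nearest cluster centroid (Euclidean distance).
    cluster = min(
        CENTROIDS,
        key=lambda c: math.dist(query_embedding, CENTROIDS[c]),
    )
    # 2. Pick the model maximizing alpha*performance - (1-alpha)*cost.
    best_model = max(
        (m for (c, m) in SCORES if c == cluster),
        key=lambda m: alpha * SCORES[(cluster, m)][0]
                      - (1 - alpha) * SCORES[(cluster, m)][1],
    )
    return cluster, best_model
```

With a balanced trade-off the cheap model wins the "math" cluster here, while pushing alpha toward pure performance flips the choice to the big model, which is exactly the knob the thread describes.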
@NLPiation
Ala Falaki
2 days
To train the system, they build clusters of queries, then evaluate each model’s accuracy (performance) and cost (efficiency) within each cluster. From this, they compute normalized "performance–efficiency scores" per model–cluster pair. 3/5.
@NLPiation
Ala Falaki
2 days
Instead of sending queries to just one of two models, Avengers-Pro embeds queries, clusters them by semantic similarity, and then routes each query to the model with the best performance-efficiency score, balancing accuracy and cost through a trainable trade-off parameter. 2/5.
@NLPiation
Ala Falaki
2 days
📝 Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing. GPT-5 used test-time routing to balance efficiency and performance. This paper extends that with Avengers-Pro, which combines multiple LLMs to optimize the same trade-off. 1/5
Tweet media one
@NLPiation
Ala Falaki
9 days
Interestingly, the optimal position varies: smaller models prefer ssp/esp, while very large models (like LLaMA3-70B) often benefit from demos closer to the query (sum = start of user message). The optimal demo placement is task- and model-dependent. 5/5.
@NLPiation
Ala Falaki
9 days
Placing demos early in the prompt (ssp = start of system prompt, or esp = end of system prompt) usually gives more accurate and stable results than later positions. Late placement (eum = end of user message) makes models unstable, flipping many answers and hurting accuracy. 4/5.
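The four placements this thread compares (ssp, esp, sum, eum) only move the same demo block around the prompt. A minimal sketch, assuming a simple chat-style message format; the template and function name are illustrative, not the paper's code:

```python
def build_prompt(demos, system, query, position):
    """Assemble a chat prompt with the demo block at one of the four
    positions studied: ssp/esp (system prompt) or sum/eum (user message)."""
    demo_block = "\n".join(f"Q: {q}\nA: {a}" for q, a in demos)
    if position == "ssp":      # start of system prompt
        sys_text, user_text = demo_block + "\n" + system, query
    elif position == "esp":    # end of system prompt
        sys_text, user_text = system + "\n" + demo_block, query
    elif position == "sum":    # start of user message
        sys_text, user_text = system, demo_block + "\n" + query
    elif position == "eum":    # end of user message
        sys_text, user_text = system, query + "\n" + demo_block
    else:
        raise ValueError(f"unknown position: {position}")
    return [{"role": "system", "content": sys_text},
            {"role": "user", "content": user_text}]
```

Note that the demo content is identical in all four cases; position is the only variable, which is exactly what lets the paper isolate positional bias.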
@NLPiation
Ala Falaki
9 days
They kept the demo content the same and only changed where it appeared in the prompt, then tested this across many tasks using models from 1.5B to 72B parameters. They measured accuracy gains, prediction flips, and confirmed the results with statistical significance tests. 3/5.
@NLPiation
Ala Falaki
9 days
A demo is an example input–output pair placed in the prompt to show the model how to solve a task (in-context learning). This paper shows that demo placement alone (independent of content) can change accuracy by up to 50% and flip nearly half of a model’s predictions. 2/5.
@NLPiation
Ala Falaki
9 days
📝 Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning. It's already known that the choice of demos (demonstrations) and their order affect how models perform. But no one had really studied how the placement of demos in the prompt changes results. 1/5
Tweet media one
@NLPiation
Ala Falaki
16 days
ARAG outperforms both Vanilla RAG and Recency-based ranking, with up to 42% NDCG@5 and 35% Hit@5 gains in Clothing, and strong improvements in Electronics (~38%) and Home (~26%). Also, because it produces reasoning traces, it offers more transparent recommendations. 6/6.
@NLPiation
Ala Falaki
16 days
They showed that each agent contributes to performance: the User Understanding Agent boosts context relevance, the Context Summary Agent especially helps in style-driven domains, and the full ARAG setup delivers the best overall results. 5/6.
@NLPiation
Ala Falaki
16 days
The authors built ARAG as a blackboard-style multi-agent system, where agents share information through structured memory. The process works in four steps: first, a standard RAG retrieves candidate items; then the UUA summarizes preferences while the NLI Agent checks alignment;.
@NLPiation
Ala Falaki
16 days
They introduced four LLM (GPT-3.5) agents. A User Understanding Agent (UUA): summarizes preferences, an NLI Agent: checks alignment of items, a Context Summary Agent (CSA): condenses relevant evidence, and an Item Ranker Agent (IRA): produces the final personalized list. 3/6.
@NLPiation
Ala Falaki
16 days
It extends standard Retrieval-Augmented Generation (RAG) by introducing a multi-agent collaboration mechanism that allows LLMs to better capture user preferences (both long-term and session-level) and generate context-aware, personalized recommendations. 2/6.