Papers of the day (@ArxivToday)
Followers 715 · Following 37 · Media 659 · Statuses 1K

Best papers from @arxiv, maintained by @ennucore and LLMs

Arxiv · Joined April 2024

@ArxivToday · 2 days
Want to learn more about making multimodal LLMs more data-efficient? Check out the full paper:

@ArxivToday · 2 days
The results are impressive - with just 16 examples, their method (GCoT) significantly outperforms standard fine-tuning across multiple tasks. This is particularly important as collecting large specialized datasets is often impractical.

@ArxivToday · 2 days
The key insight: When you ask LLMs to explain their reasoning, they often give the right answer but make up facts along the way. The authors fix this by making the model verify each reasoning step against specific regions in the image.
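For intuition, here's a rough sketch of what "verify each reasoning step against an image region" could look like in code. Everything here (the `generate_steps` / `locate` / `verify` methods and the bounding-box format) is a hypothetical interface, not the paper's actual GCoT implementation:

```python
# Illustrative only: `vlm.generate_steps`, `vlm.locate`, and `vlm.verify`
# are hypothetical methods, not the paper's actual GCoT interface.

def grounded_cot(image, question, vlm):
    steps = vlm.generate_steps(image, question)  # draft chain of thought
    verified = []
    for step in steps:
        # Ask the model to point at the region that supports this step,
        # e.g. a bounding box (x0, y0, x1, y1).
        box = vlm.locate(image, step)
        crop = image.crop(box)
        # Keep the step only if the cropped region actually supports it.
        if vlm.verify(crop, step):
            verified.append((step, box))
    # Answer using only the steps that survived visual verification.
    return vlm.answer(image, question, verified)
```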

@ArxivToday · 2 days
Fascinating new paper shows how to make multimodal LLMs better at specialized tasks (like chart & table understanding) with very little data. The trick? Teaching them to think step-by-step while grounding their thoughts in the image 🧵

@ArxivToday · 2 days
Paper: Answer Matching Outperforms Multiple Choice for Language Model Evaluation. Code & datasets:

@ArxivToday · 2 days
And here's the kicker - answer matching is often CHEAPER than MCQ evaluation! Models write shorter responses free-form than when they're handed choices to pick from. Plus, matching a response against a reference is computationally lighter than solving the question from scratch

@ArxivToday · 2 days
The solution? 'Answer matching' - ask the model to generate a free-form answer, then use another LM to check if it matches a reference solution. Turns out even small, recent LMs are remarkably good at this matching task
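A minimal sketch of the answer-matching protocol as the thread describes it. The `candidate_lm` / `matcher_lm` callables and the judge prompt are illustrative assumptions, not the paper's code:

```python
# Hedged sketch: `candidate_lm` and `matcher_lm` are any text-in/text-out
# callables; the judge prompt is a guess, not the paper's wording.

def answer_matching(question: str, reference: str, candidate_lm, matcher_lm) -> bool:
    # 1. The evaluated model answers free-form: no choices are shown.
    answer = candidate_lm(f"Question: {question}\nAnswer concisely:")

    # 2. A separate (possibly much smaller) LM checks whether the
    #    free-form answer matches the reference solution.
    verdict = matcher_lm(
        "Do these two answers say the same thing? Reply yes or no.\n"
        f"Reference: {reference}\nCandidate: {answer}"
    )
    return verdict.strip().lower().startswith("yes")
```

Since the matcher only compares two short strings, it can be far smaller than the model under evaluation, which lines up with the cost observation above.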

@ArxivToday · 2 days
Look at this: On TruthfulQA-v2, a model can get 83% accuracy by just looking at the choices, without seeing the question! The same happens on many popular benchmarks. MCQs have become a test of choice discrimination rather than knowledge generation.
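The 83% figure comes from exactly this kind of probe: show the model only the options and withhold the question. A hedged sketch, with a made-up prompt and dataset format:

```python
# Choices-only probe (illustrative): the question is deliberately withheld.
# The dataset format and prompt are invented for this sketch.

def choices_only_accuracy(dataset, lm) -> float:
    correct = 0
    for ex in dataset:  # ex: {"choices": [...], "answer_idx": int}
        options = "\n".join(f"{i}. {c}" for i, c in enumerate(ex["choices"]))
        pred = lm(
            "Without seeing the question, pick the most likely correct "
            f"option by number:\n{options}"
        )
        correct += int(pred.strip().startswith(str(ex["answer_idx"])))
    return correct / len(dataset)
```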

@ArxivToday · 2 days
Multiple choice questions are everywhere in LLM evaluation. But here's the thing: models can often get the right answer without even seeing the question. A thread on why we need to move beyond MCQs and what we can do about it 🧵

@ArxivToday · 2 days
Check out the paper for all the details:

@ArxivToday · 2 days
What's particularly neat is how it handles dynamic scenes - the spatial memory is constantly updated, so it can adapt to changes in the environment rather than getting confused by them.

@ArxivToday · 2 days
The results speak for themselves - Point3R achieves state-of-the-art performance on various tasks, from static indoor scenes to dynamic outdoor environments. And it does this with surprisingly low training costs.

@ArxivToday · 2 days
The key idea is simple yet powerful: instead of using implicit memory tokens, Point3R maintains explicit 3D pointers. Each pointer knows exactly where it is in space and what it has observed there.
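As a mental model (not Point3R's actual code), an explicit spatial pointer memory might look like a set of position-anchored feature slots that get fused or extended as new observations arrive; the merge radius and fusion rule below are invented for illustration:

```python
# Toy mental model of an explicit spatial pointer memory (not Point3R's code).
from dataclasses import dataclass
import numpy as np

@dataclass
class Pointer:
    position: np.ndarray  # where this pointer sits in 3D space
    feature: np.ndarray   # evolving memory of what was observed there

class PointerMemory:
    def __init__(self, merge_radius: float = 0.05):  # invented threshold
        self.pointers: list[Pointer] = []
        self.merge_radius = merge_radius

    def update(self, position: np.ndarray, feature: np.ndarray) -> None:
        # Fuse into a nearby existing pointer if one exists, else add a
        # new one, so memory stays anchored to explicit 3D locations.
        for p in self.pointers:
            if np.linalg.norm(p.position - position) < self.merge_radius:
                p.feature = 0.5 * (p.feature + feature)  # toy fusion rule
                return
        self.pointers.append(Pointer(position.copy(), feature.copy()))
```

Because each slot is tied to an explicit position, updates stay local, which is one way to read the dynamic-scene robustness mentioned above.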

@ArxivToday · 2 days
Point3R: A new approach to 3D reconstruction that uses an explicit spatial pointer memory - each point in 3D space maintains its own evolving memory of what it has seen. Elegant and effective 🧵

@ArxivToday · 3 days
What's cool is that it works for all kinds of images - whether they're similar or completely different in content and layout. Previous methods struggled with this, often producing weird artifacts or losing the original image identity.

@ArxivToday · 3 days
The key insight: instead of training for each image pair (which takes forever), they cleverly modify the attention mechanism of a pre-trained diffusion model to guide the morphing process. Here's how it compares to previous methods:
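One plausible reading of "modify the attention mechanism" is interpolating the keys and values contributed by the two source images inside self-attention. To be clear, this is a guess at the general shape, not FreeMorph's actual modification:

```python
# A guess at the general shape, not FreeMorph's actual modification:
# blend the keys/values from the two source images at morph ratio alpha.
import torch.nn.functional as F

def blended_attention(q, k_a, v_a, k_b, v_b, alpha: float):
    # alpha = 0 reproduces image A's attention; alpha = 1, image B's;
    # intermediate values steer frames along the morph.
    k = (1 - alpha) * k_a + alpha * k_b
    v = (1 - alpha) * v_a + alpha * v_b
    return F.scaled_dot_product_attention(q, k, v)
```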

@ArxivToday · 3 days
Ever wanted to smoothly morph one image into another? FreeMorph does it in 30 seconds with no training needed. Works even for completely different images - check out these transitions 🧵

@ArxivToday · 3 days
Read the full paper here:

@ArxivToday · 3 days
The results? AC-DiT significantly outperforms previous methods in both simulated and real-world tasks. Here's a visualization of the robot tackling various challenges in simulation:

@ArxivToday · 3 days
Another cool feature: The robot dynamically adjusts how it uses different visual inputs. When it needs to identify objects, it relies more on 2D cameras. When precise manipulation is needed, it switches focus to 3D depth information.
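That dynamic re-weighting can be pictured as a learned gate over 2D and 3D features. The module below is a toy illustration under that assumption, not AC-DiT's architecture:

```python
# Toy illustration of learned 2D/3D re-weighting (not AC-DiT's architecture).
import torch
import torch.nn as nn

class ModalityMixer(nn.Module):
    def __init__(self, state_dim: int):
        super().__init__()
        # Predict a mixing weight from the current task state.
        self.gate = nn.Sequential(nn.Linear(state_dim, 1), nn.Sigmoid())

    def forward(self, feat_2d: torch.Tensor, feat_3d: torch.Tensor,
                state: torch.Tensor) -> torch.Tensor:
        w = self.gate(state)  # near 1: lean on 2D semantics for recognition
        return w * feat_2d + (1 - w) * feat_3d  # near 0: lean on 3D geometry
```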