Daniel Israel Profile
Daniel Israel

@danielmisrael

Followers: 973 · Following: 167 · Media: 6 · Statuses: 43

PhD Student Studying AI/ML @UCLA

Joined October 2011
@danielmisrael
Daniel Israel
7 months
“That’s one small [MASK] for [MASK], a giant [MASK] for mankind.” – [MASK] Armstrong. Can autoregressive models predict the next [MASK]? It turns out yes, and quite easily… Introducing MARIA (Masked and Autoregressive Infilling Architecture).
1
8
23
@danielmisrael
Daniel Israel
2 months
RT @abeirami: Had the pleasure of learning about TRACE by Gwen Yidou-Weng, Benjie Wang, and @guyvdb at ICML! It views alignment/controlled…
0
12
0
@danielmisrael
Daniel Israel
2 months
RT @tungnd_13: 🚀 Introducing PhysiX: One of the first large-scale foundation models for physics simulations! PhysiX is a 4.5B parameter mo…
0
256
0
@danielmisrael
Daniel Israel
2 months
RT @li78658171: (1/6) Our work Reflect-DiT was accepted to #ICCV2025! Reflect-DiT allows the model to reflect on its past generations and t…
0
23
0
@danielmisrael
Daniel Israel
3 months
RT @LucasBandarkar: The unreasonable effectiveness of model merging for cross-lingual transfer! Our preprint evaluates a number of *modula…
arxiv.org
Large language models (LLMs) still struggle across tasks outside of high-resource languages. In this work, we investigate cross-lingual transfer to lower-resource languages where task-specific...
0
22
0
@danielmisrael
Daniel Israel
3 months
RT @li78658171: 📢 (1/11) Diffusion LMs are fast and controllable at inference time! But why restrict such benefits for processing text data?…
0
40
0
@danielmisrael
Daniel Israel
5 months
RT @hbXNov: 📢Scaling test-time compute via generative verification (GenRM) is an emerging paradigm and shown to be more efficient than self….
0
51
0
@danielmisrael
Daniel Israel
6 months
RT @zileishao: What happens if we tokenize cat as [ca, t] rather than [cat]? LLMs are trained on just one tokenization per word, but they…
0
3
0
@danielmisrael
Daniel Israel
6 months
RT @hbXNov: Video generative models hold the promise of being general-purpose simulators of the physical world 🤖 How far are we from this g….
0
23
0
@danielmisrael
Daniel Israel
6 months
RT @adityagrover_: A few months ago, we started Inception Labs, a new generative AI startup with a rockstar founding team. At Inception, w….
0
42
0
@danielmisrael
Daniel Israel
6 months
RT @siyan_zhao: Excited to release PrefEval (ICLR '25 Oral), a benchmark for evaluating LLMs’ ability to infer, memorize, and adhere to use….
0
27
0
@danielmisrael
Daniel Israel
7 months
Please check out the rest of the paper! We show how MARIA can be used for test-time scaling, how to initialize MARIA's weights for efficient training, how MARIA's representations differ, and more… Thanks to my advisors @adityagrover_ and @guyvdb.
arxiv.org
Historically, LLMs have been trained using either autoregressive (AR) or masked language modeling (MLM) objectives, with AR models gaining dominance in recent years. However, AR models are...
0
0
2
@danielmisrael
Daniel Israel
7 months
MARIA 1B achieves the best throughput, and MARIA 7B achieves throughput similar to DiffuLlama but with better samples, as previously noted. Here we see that ModernBERT, despite being much smaller, does not scale well for masked infilling because it cannot use a KV cache.
1
0
0
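A back-of-the-envelope sketch of the KV-cache point (rough accounting only, not the benchmark setup above): without a cache, an MLM re-encodes the entire sequence for every token it infills, whereas a KV-cached AR decoder pays for the prompt once and then one new token per step.

def mlm_token_forwards(T, K):
    # No KV cache: every one of the K infilling steps re-encodes all T tokens.
    return K * T

def ar_kv_cached_token_forwards(T, K):
    # KV cache: one prefill over the length-T sequence, then one new token per step.
    return T + K

T, K = 2048, 1024   # e.g. half of the tokens masked
print(mlm_token_forwards(T, K), ar_kv_cached_token_forwards(T, K))
# 2097152 vs 3072 token-forwards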
@danielmisrael
Daniel Israel
7 months
We perform infilling on downstream data with 50 percent of words masked. Using GPT-4o-mini as a judge, we compute Elo scores for each model. MARIA 7B and 1B have the highest Elo ratings under the Bradley-Terry model.
1
0
1
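For context on the rating scheme, a small illustrative sketch (made-up pairwise outcomes, not the paper's data): the Bradley-Terry model says model i beats model j with probability sigmoid(r_i - r_j), so fitting the ratings r by maximum likelihood on the judge's pairwise verdicts and rescaling the logits yields Elo-style scores.

import torch
import torch.nn.functional as F

# Illustrative judge outcomes: (winner_index, loser_index) into `models`. Every model
# has at least one win and one loss so the maximum-likelihood ratings stay finite.
models = ["MARIA-7B", "MARIA-1B", "baseline-A", "baseline-B"]
wins = [(0, 1), (0, 2), (0, 3), (1, 0), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]

ratings = torch.zeros(len(models), requires_grad=True)
opt = torch.optim.Adam([ratings], lr=0.1)
w = torch.tensor([p[0] for p in wins])
l = torch.tensor([p[1] for p in wins])

for _ in range(500):
    opt.zero_grad()
    # Bradley-Terry: P(w beats l) = sigmoid(r_w - r_l); minimize the negative log-likelihood.
    nll = -F.logsigmoid(ratings[w] - ratings[l]).sum()
    nll.backward()
    opt.step()

# Rescale to an Elo-like scale (400 / ln 10 points per logit), centered at 1000.
elo = 1000 + (400 / torch.log(torch.tensor(10.0))) * (ratings - ratings.mean())
for name, score in zip(models, elo.detach().tolist()):
    print(f"{name}: {score:.0f}")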
@danielmisrael
Daniel Israel
7 months
MARIA achieves far better perplexity than using ModernBERT autoregressively or than discrete diffusion models on downstream masked infilling test sets. Controlling for parameter count, MARIA is the most effective way to scale models for masked token infilling.
1
0
2
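For reference, a minimal sketch (not the paper's evaluation code) of what perplexity restricted to masked positions means: exponentiate the mean negative log-likelihood over the [MASK] positions only, ignoring the tokens that were given as context.

import math
import torch
import torch.nn.functional as F

def masked_infilling_perplexity(logits, targets, is_masked):
    # logits:    (B, T, V) model predictions for the masked sequence
    # targets:   (B, T)    ground-truth token ids
    # is_masked: (B, T)    True at the [MASK] positions being infilled
    nll = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                          targets.reshape(-1), reduction="none")
    nll = nll[is_masked.reshape(-1)]      # keep only the masked positions
    return math.exp(nll.mean().item())    # exp of the mean NLL over masked tokens

# Toy call with random logits standing in for a model; half the positions masked.
B, T, V = 2, 32, 100
print(masked_infilling_perplexity(torch.randn(B, T, V),
                                  torch.randint(0, V, (B, T)),
                                  torch.rand(B, T) < 0.5))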
@danielmisrael
Daniel Israel
7 months
We can get the best of both worlds with MARIA: train a linear decoder to combine the hidden states of an AR model and an MLM. This enables AR masked infilling with the advantages of the more scalable AR architecture, such as KV-cached inference. We combine OLMo and ModernBERT.
1
0
0
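A rough sketch of how such a linear decoder over combined hidden states could look (placeholder dimensions and a hypothetical class name; the exact MARIA architecture is in the paper): each backbone produces per-token hidden states, and a learned linear layer maps their concatenation to vocabulary logits.

import torch
import torch.nn as nn

class LinearFusionDecoder(nn.Module):
    # Hypothetical sketch: concatenate per-token hidden states from an AR backbone
    # (e.g. OLMo) and an MLM backbone (e.g. ModernBERT), then project to vocab logits.
    def __init__(self, d_ar, d_mlm, vocab_size):
        super().__init__()
        self.proj = nn.Linear(d_ar + d_mlm, vocab_size)

    def forward(self, h_ar, h_mlm):
        # h_ar:  (B, T, d_ar)  hidden states from the causal, KV-cacheable AR model
        # h_mlm: (B, T, d_mlm) hidden states from one MLM pass over the masked input
        return self.proj(torch.cat([h_ar, h_mlm], dim=-1))

# Toy usage with random tensors standing in for the two backbones' outputs.
B, T, d_ar, d_mlm, vocab = 2, 8, 2048, 768, 32000
decoder = LinearFusionDecoder(d_ar, d_mlm, vocab)
print(decoder(torch.randn(B, T, d_ar), torch.randn(B, T, d_mlm)).shape)  # (2, 8, 32000)

On this reading, the MLM pass over the masked input supplies bidirectional context once, and decoding then proceeds left to right with the AR backbone's KV cache, which is where the inference-time scalability comes from.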
@danielmisrael
Daniel Israel
7 months
Autoregressive (AR) LMs are more compute-efficient to train than masked LMs (MLMs), which compute a loss on only a fixed fraction of tokens (e.g., 30%) rather than on 100% of them as AR models do. Unlike MLMs, AR models can also use a KV cache at inference time, but they cannot infill masked tokens.
1
0
1
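To make the objective gap concrete, a minimal PyTorch sketch with a stand-in model (an illustration, not code from the paper): the AR loss supervises every position on its next token, while the MLM loss masks a fraction of positions, runs the model on the corrupted sequence, and supervises only those positions.

import torch
import torch.nn.functional as F

vocab_size, mask_id, IGNORE = 100, 0, -100
backbone = torch.nn.Linear(1, vocab_size)        # stand-in for a real LM

def forward(token_ids):                          # (B, T) ids -> (B, T, vocab) logits
    return backbone(token_ids.float().unsqueeze(-1))

def ar_loss(token_ids):
    # AR: every position t is supervised to predict token t+1.
    logits = forward(token_ids)[:, :-1]
    return F.cross_entropy(logits.reshape(-1, vocab_size), token_ids[:, 1:].reshape(-1))

def mlm_loss(token_ids, mask_ratio=0.3):
    # MLM: mask ~30% of positions, run the model on the corrupted sequence,
    # and compute the loss only at the masked positions.
    is_masked = torch.rand(token_ids.shape) < mask_ratio
    corrupted = torch.where(is_masked, torch.full_like(token_ids, mask_id), token_ids)
    logits = forward(corrupted)
    targets = torch.where(is_masked, token_ids, torch.full_like(token_ids, IGNORE))
    return F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1),
                           ignore_index=IGNORE)

tokens = torch.randint(1, vocab_size, (2, 16))
print(ar_loss(tokens).item(), mlm_loss(tokens).item())

Per forward pass, the AR objective draws supervision from every position while the MLM objective draws it only from the roughly 30% that were masked, which is the training-compute gap described above.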
@danielmisrael
Daniel Israel
7 months
RT @iScienceLuvr: Enabling Autoregressive Models to Fill In Masked Tokens. Hybrid autoregressive and masked language model for infilling by….
0
15
0
@danielmisrael
Daniel Israel
9 months
I really enjoyed contributing to this project and am excited to share what we have built!
@hbXNov
Hritik Bansal
9 months
Natively multimodal models unlock new possibilities for biomedical AI 🥼 assistants, from answering questions about images to generating them for decision-making. Thrilled to introduce MedMax, an open state-of-the-art multimodal model designed for diverse biomedical tasks and domains 🩻
0
1
13
@danielmisrael
Daniel Israel
9 months
RT @benjiewang_cs: You have some model/knowledge (e.g. Bayes Net, Probabilistic/Logic Program, DB) and some query (e.g. MAP, Causal Adjustm….
0
3
0