
Daniel Israel
@danielmisrael
Followers: 973 · Following: 167 · Media: 6 · Statuses: 43
“That’s one small [MASK] for [MASK], a giant [MASK] for mankind.” – [MASK] Armstrong. Can autoregressive models predict the next [MASK]? It turns out yes, and quite easily… Introducing MARIA (Masked and Autoregressive Infilling Architecture).
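As a toy illustration of the infilling task (not the MARIA architecture itself), the sketch below fills each [MASK] with a plain causal LM, predicting greedily from the left context only; `naive_infill` and the choice of gpt2 are illustrative assumptions. The gap this baseline leaves, never conditioning on the right context, is exactly what a hybrid AR+MLM model is meant to close.

```python
# Toy sketch (NOT the MARIA method): fill [MASK] slots with a vanilla
# causal LM, one greedy token per slot, using only the LEFT context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def naive_infill(text: str) -> str:
    left, *rest = text.split("[MASK]")
    for right in rest:
        ids = tok(left, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(input_ids=ids).logits[0, -1]  # next-token scores
        left = left + tok.decode([int(logits.argmax())]) + right  # splice in guess
    return left

print(naive_infill("That's one small [MASK] for man"))
```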
RT @tungnd_13: 🚀 Introducing PhysiX: One of the first large-scale foundation models for physics simulations! PhysiX is a 4.5B parameter mo…
RT @li78658171: (1/6) Our work Reflect-DiT was accepted to #ICCV2025! Reflect-DiT allows the model to reflect on its past generations and t…
RT @LucasBandarkar: The unreasonable effectiveness of model merging for cross-lingual transfer! Our preprint evaluates a number of *modula…
arxiv.org
Large language models (LLMs) still struggle across tasks outside of high-resource languages. In this work, we investigate cross-lingual transfer to lower-resource languages where task-specific...
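The thread above is truncated, but the simplest instance of model merging for cross-lingual transfer is plain weight interpolation between two fine-tuned checkpoints that share an architecture. A minimal sketch, with made-up toy modules standing in for a task expert and a language expert (the preprint evaluates more modular approaches than this):

```python
# Minimal sketch of weight-space model merging (linear interpolation).
# The two nn.Linear "experts" are toy stand-ins; in cross-lingual transfer,
# one might be tuned on the task in a high-resource language and the other
# on target-language text.
import torch.nn as nn

task_expert = nn.Linear(4, 2)
lang_expert = nn.Linear(4, 2)

def merge(sd_a, sd_b, alpha=0.5):
    # Element-wise interpolation of matching parameters.
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

merged = nn.Linear(4, 2)
merged.load_state_dict(merge(task_expert.state_dict(), lang_expert.state_dict()))
```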
RT @li78658171: 📢 (1/11) Diffusion LMs are fast and controllable at inference time! But why restrict such benefits to processing text data?…
RT @hbXNov: 📢 Scaling test-time compute via generative verification (GenRM) is an emerging paradigm and shown to be more efficient than self…
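For context on the paradigm named here: GenRM-style test-time scaling draws many candidate answers and lets a generative verifier pick among them, so accuracy can be traded for extra inference compute. A minimal sketch with hypothetical `generate`/`verify` stubs in place of real models:

```python
# Toy sketch of best-of-N selection with a generative verifier.
# `generate` and `verify` are hypothetical stand-ins, not a real API.
import random

def generate(prompt: str) -> str:
    # Stand-in for an LLM sampling one candidate answer.
    return f"{prompt} -> candidate #{random.randint(0, 9999)}"

def verify(prompt: str, answer: str) -> float:
    # Stand-in for a generative verifier; in GenRM this is itself an LM
    # that reasons about the answer and emits a correctness score.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # More test-time compute = larger n = more candidates to choose from.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: verify(prompt, a))

print(best_of_n("What is 17 * 24?"))
```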
RT @zileishao: What happens if we tokenize cat as [ca, t] rather than [cat]? LLMs are trained on just one tokenization per word, but they…
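A toy illustration of the question in that thread, with a made-up three-entry vocabulary: the same surface string admits multiple token-ID sequences, but training only ever shows the model the canonical one.

```python
# Toy illustration: one string, two valid tokenizations. Vocabulary is invented.
toy_vocab = {"cat": 0, "ca": 1, "t": 2}

canonical = [toy_vocab["cat"]]                    # segmentation seen in training
alternative = [toy_vocab["ca"], toy_vocab["t"]]   # valid but essentially unseen

# Both sequences decode to "cat", yet a model trained only on the canonical
# form has (almost) never conditioned on the alternative IDs.
inv = {v: k for k, v in toy_vocab.items()}
assert "".join(inv[i] for i in canonical) == "".join(inv[i] for i in alternative) == "cat"
print(canonical, alternative)
```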
RT @hbXNov: Video generative models hold the promise of being general-purpose simulators of the physical world 🤖 How far are we from this g…
RT @adityagrover_: A few months ago, we started Inception Labs, a new generative AI startup with a rockstar founding team. At Inception, w…
RT @siyan_zhao: Excited to release PrefEval (ICLR '25 Oral), a benchmark for evaluating LLMs’ ability to infer, memorize, and adhere to use…
Please check out the rest of the paper! We cover how MARIA can be used for test-time scaling, how to initialize MARIA weights for efficient training, how MARIA representations differ, and more… Thanks to my advisors @adityagrover_ and @guyvdb.
arxiv.org
Historically, LLMs have been trained using either autoregressive (AR) or masked language modeling (MLM) objectives, with AR models gaining dominance in recent years. However, AR models are...
RT @iScienceLuvr: Enabling Autoregressive Models to Fill In Masked Tokens. Hybrid autoregressive and masked language model for infilling by…
I really enjoyed contributing to this project and am excited to share what we have built!
Natively multimodal models unlock new possibilities for AI biomedical 🥼 assistants, from answering questions about images to generating them for decision-making. Thrilled to introduce MedMax, an open state-of-the-art multimodal model designed for diverse biomedical tasks and domains 🩻
RT @benjiewang_cs: You have some model/knowledge (e.g. Bayes Net, Probabilistic/Logic Program, DB) and some query (e.g. MAP, Causal Adjustm…
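The retweet above is cut off, but the setup it names, a probabilistic model plus a query such as MAP, can be made concrete with a two-variable Bayes net; all numbers below are invented for illustration.

```python
# Toy example of a MAP query: exact inference by enumeration on a
# two-variable Bayes net (Rain -> WetGrass). Probabilities are made up.
from itertools import product

p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {True: {True: 0.9, False: 0.1},
                    False: {True: 0.2, False: 0.8}}

def joint(rain, wet):
    # P(rain, wet) = P(rain) * P(wet | rain)
    return p_rain[rain] * p_wet_given_rain[rain][wet]

# MAP query: the single most likely joint assignment of (rain, wet).
best = max(product([True, False], repeat=2), key=lambda rw: joint(*rw))
print(best, joint(*best))  # (False, False) 0.64
```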