
Ilia Kulikov
@uralik1
Followers: 520 · Following: 696 · Media: 23 · Statuses: 118
RT @jaseweston: 🥥🌪️ Introducing CoCoMix - an LLM pretraining framework that predicts concepts and mixes them into its hidden state to improv…
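From the tweet alone, the mechanism sounds like: a head predicts concept logits from the hidden state, and a mixture of concept embeddings is added back into that state. Below is a toy sketch under that reading; the class, names, and mixing rule are guesses for illustration, not CoCoMix's actual architecture.

```python
import torch
import torch.nn as nn

class ConceptMixer(nn.Module):
    """Toy reading of 'predicts concepts and mixes them into its hidden
    state'. This is an illustrative guess from the tweet alone, not
    CoCoMix's actual design."""

    def __init__(self, d_model: int, n_concepts: int):
        super().__init__()
        self.predict = nn.Linear(d_model, n_concepts)       # concept prediction head
        self.concepts = nn.Embedding(n_concepts, d_model)   # concept vectors
        self.alpha = nn.Parameter(torch.tensor(0.1))        # mixing weight

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        probs = self.predict(hidden).softmax(dim=-1)  # (..., n_concepts)
        mixed = probs @ self.concepts.weight          # expected concept embedding
        return hidden + self.alpha * mixed            # mix back into hidden state
```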
We are using fairseq2 for LLM post-training research in our team. This release comes with decent documentation 😅 My favorite feature of the library is the runtime extension support: one can develop research code without forking the entire repo!
👋 Hello world! We’re thrilled to announce the v0.4 release of fairseq2 — an open-source library from FAIR powering many projects at Meta. pip install fairseq2 and explore our trainer API, instruction & preference finetuning (up to 70B), and native vLLM integration.
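On the runtime extension support: fairseq2 discovers extensions through Python entry points, so a research package can hook into the library without forking it. A minimal sketch of that pattern is below; the entry-point group name and the hook's argument are assumptions for illustration, not verified v0.4 API.

```python
# my_fairseq2_ext/__init__.py -- hypothetical extension package.
#
# The extension is declared as an entry point in the package's
# pyproject.toml, e.g.:
#
#   [project.entry-points."fairseq2.extension"]  # group name: an assumption
#   my_ext = "my_fairseq2_ext:setup_my_extension"
#
# fairseq2 then calls the hook at setup time, letting the package register
# custom models, datasets, or recipes without forking the library.


def setup_my_extension(container) -> None:
    # `container` stands in for fairseq2's dependency/registry object; the
    # exact type and registration calls are assumptions, not verified API.
    ...
```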
Interested in LLM inference algorithms? Please come and watch our tutorial next week!
Curious about inference-time scaling, the #1 trending topic in LLMs? Come to our NeurIPS tutorial: Beyond Decoding: Meta-Generation Algorithms for LLMs (Tue. @ 1:30)!
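For a flavor of what "meta-generation" means: algorithms that wrap a base generator and spend extra inference compute to pick a better output. Best-of-N reranking is the simplest instance; here is a self-contained toy sketch, where the generator and scorer are stand-ins, not anything from the tutorial.

```python
import random

def generate(prompt: str, rng: random.Random) -> str:
    """Toy stand-in for sampling one completion from an LLM."""
    words = ["good", "great", "bad", "fine", "excellent"]
    return " ".join(rng.choice(words) for _ in range(3))

def score(prompt: str, completion: str) -> float:
    """Toy stand-in for a reward model or verifier."""
    return float(sum(completion.split().count(w)
                     for w in ("good", "great", "excellent")))

def best_of_n(prompt: str, n: int, seed: int = 0) -> str:
    """Best-of-N: sample n candidates, return the one the scorer prefers.
    More samples = more inference compute = (usually) a better output."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("How was the movie?", n=16))
```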
@kchonyc The proposed regularization expands the dynamic range of the probability and rank of EOS when the sequence is not supposed to end. We see improvements in translation quality when large beam sizes (up to 1000) are used, but the gap between performance with smaller and larger beams is still there! 2/n
🚨 New research! The probability of short sequences tends to be too high with autoregressive NMT (and beyond). We quantify this tendency and define the oversmoothing rate. We minimize its upper bound, the oversmoothing loss, and present our findings! w/ Maksim Eremeev and @kchonyc. 1/n
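The gist, as a sketch: at every position where the target sequence has not ended, penalize the model whenever it ranks EOS above the ground-truth next token. The hinge formulation below illustrates that idea; it follows the spirit of the oversmoothing loss but is not a line-by-line reproduction of the paper's definition.

```python
import torch
import torch.nn.functional as F

def oversmoothing_penalty(logits: torch.Tensor,
                          targets: torch.Tensor,
                          eos_id: int,
                          margin: float = 1e-4) -> torch.Tensor:
    """Hinge-style penalty discouraging the model from putting EOS above
    the ground-truth next token at non-final positions.

    logits:  (batch, time, vocab) next-token logits
    targets: (batch, time) ground-truth next tokens; last position is EOS
    """
    logp = F.log_softmax(logits, dim=-1)
    logp_target = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    logp_eos = logp[..., eos_id]
    # Only penalize positions where the sequence should *not* end yet.
    nonfinal = targets != eos_id
    hinge = torch.clamp(logp_eos - logp_target + margin, min=0.0)
    return (hinge * nonfinal).sum() / nonfinal.sum().clamp(min=1)
```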
Apparently ancestral sampling yields high-quality translations if we sample enough times, but how do we choose one of them in the end? @BryanEikema shows how to scale utility computations over large hypothesis spaces efficiently! Very cool.
Check out our latest work on minimum Bayes risk decoding for NMT! We show that MBR is a robust decision rule and sampling-based approximations scale well with more computation. Unlike MAP, more computation always improves translation quality. paper: 1/4
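Sampling-based MBR in a nutshell: draw N samples from the model, estimate each candidate's expected utility against the other samples, and output the argmax; more samples give a better Monte Carlo estimate, which is why quality keeps improving with computation. A toy sketch, with unigram F1 standing in for a real utility such as BEER or BLEU:

```python
from collections import Counter

def utility(hyp: str, ref: str) -> float:
    """Toy utility: unigram F1 overlap (real MBR uses BLEU/BEER/etc.)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)

def mbr_decode(samples: list[str]) -> str:
    """Pick the sample with the highest average utility against all
    samples: a Monte Carlo estimate of expected utility under the model."""
    def expected_utility(hyp: str) -> float:
        return sum(utility(hyp, other) for other in samples) / len(samples)
    return max(samples, key=expected_utility)

samples = ["the cat sat", "the cat sat down", "a dog ran", "the cat sat"]
print(mbr_decode(samples))  # -> "the cat sat"
```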
Poster #90 in Gather Town. I am there *now*!
🚨 Our new research at the SPNLP21 workshop! 🚨 Done with @wellecks & @kchonyc. We worked out a way to measure sequence-level mode mismatch between the distributions along the sequence modeling pipeline, but it is not easy to use with big models on real-world tasks. (1/6)
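One way to read "mode mismatch": the most probable sequence under the learned model disagrees with the most probable sequence under the data distribution. The toy illustration below shows the concept on a fully enumerable space; it is not the paper's measurement, which (as the thread says) is hard to apply to big models.

```python
# Tiny, fully enumerable sequence space, so exact modes are computable.
# The "learned" model leaks probability mass toward a different sequence.
data_dist = {"a b <eos>": 0.50, "a <eos>": 0.30, "b a <eos>": 0.20}
model_dist = {"a b <eos>": 0.25, "a <eos>": 0.35, "b a <eos>": 0.40}

data_mode = max(data_dist, key=data_dist.get)     # "a b <eos>"
model_mode = max(model_dist, key=model_dist.get)  # "b a <eos>"
print("mode mismatch:", data_mode != model_mode)  # True
```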