sileod Profile
sileod

@dmnsl1

Followers
205
Following
13K
Media
12
Statuses
452

- Tenured Researcher @ Inria Lille - Reasoning data artisan - Author of reasoning_core, tasksource, and logic datasets

Lille
Joined September 2020
@dmnsl1
sileod
26 days
3/3 Instead of games/puzzles, we generate problems in: PDDL planning (random domains!), expressive first-order logic, grammar parsing/analysis, TPTP math reasoning, solving systems of equations, and more. Data: 📊 https://t.co/ahzqtfnyXW + prime intellect environment hub @willccbb
2
1
17
@dmnsl1
sileod
26 days
2/ Meet Reasoning Core ◉: a scalable suite of RL environments for LLM symbolic reasoning. Procedurally generated, solver‑verified, and built to feed RLVR with a steady stream of diverse, rigorous problems. 📜
arxiv.org
We introduce Reasoning Core, a new scalable environment for Reinforcement Learning with Verifiable Rewards (RLVR), designed to advance foundational symbolic reasoning in Large Language Models...
2
7
56
@dmnsl1
sileod
26 days
1/ Envs are all the rage for LLM training. They promise virtually unlimited data. But they need to teach general capabilities, not specific games. RLVR environments need to be general and expressive, backbones for reasoning. 🧵👇
3
17
101
@TroncheBiais
La Tronche en Biais
1 month
Special broadcast on Wednesday, September 9, at 8 p.m. Luc JULIA, presented as the "co-creator of Siri", is regarded as a "world-renowned expert", sometimes even as "the pope of artificial intelligence". https://t.co/wO7XqPGCsr
9
15
109
@dmnsl1
sileod
2 months
I read Luc Julia's latest book. Expert or not an expert? Make up your own mind, with 15 annotated passages https://t.co/3sD0346mgS #LucJulia #IA The reference: IA génératives, pas créatives. L'intelligence artificielle n'existe (toujours) pas
10
6
51
@rohanpaul_ai
Rohan Paul
1 year
Gemini 1.5 Pro is awesome with the MASSIVE 2 million tokens. But the context window does not mean that the model can see everything. 📌 LLMs can suggest missing elements from item lists, useful for recommendations, but performance degrades with too many items (around 100 for
3
33
213
@ShayneRedford
Shayne Longpre
1 year
✨New Preprint ✨ How are shifting norms on the web impacting AI? We find: 📉 A rapid decline in the consenting data commons (the web) ⚖️ Differing access to data by company, due to crawling restrictions (e.g.🔻26% OpenAI, 🔻13% Anthropic) ⛔️ Robots.txt preference protocols
12
94
234
@dmnsl1
sileod
1 year
Hi everyone. Today I’m releasing Tasksource-DPO-pairs, a DPO dataset built from the tasksource collection of expert-constructed (not LLM-generated) data, with a focus on capabilities (reasoning, linguistics, NLI, emotion...) https://t.co/sRdryQE1m2
huggingface.co
1
0
3
@nihit_desai
Nihit Desai
1 year
Huge shoutout (apologies if I'm missing anyone) to creators of FLAN and Tasksource dataset collections (@EnricoShippole, @ShayneRedford, Damien), the open source community (@huggingface, @AIatMeta, @NousResearch, Axolotl, @predibase lorax, @vllm_project, @MSFTDeepSpeed), and our
1
3
14
@dmnsl1
sileod
2 years
It should be already possible with UL2
1
0
3
@ShayneRedford
Shayne Longpre
2 years
📢 We're planning to rapidly expand the coverage of finetuning datasets in the Data Provenance Initiative. ➡️Our Goal: provide detailed source+license annotations for all popular instruct/align datasets. 🌟If you'd like to contribute, reach out!🌟 1/
@ShayneRedford
Shayne Longpre
2 years
📢Announcing the🌟Data Provenance Initiative🌟 🧭A rigorous public audit of 1800+ instruct/align datasets 🔍Explore/filter sources, creators & license conditions ⚠️We see a rising divide between commercially open v closed licensed data 🌐: https://t.co/guvIMARwJB 1/
4
11
31
@nihit_desai
Nihit Desai
2 years
Huge shoutout (apologies if I'm missing anyone) to creators of FLAN and Tasksource dataset collections (@EnricoShippole, @ShayneRedford, Damien), the open source LLM community (@huggingface, @woosuk_k @Nazeri2010, @AIatMeta, @NousResearch), and our infra partners @MosaicML ,
0
1
14
@ShayneRedford
Shayne Longpre
2 years
📢Announcing the🌟Data Provenance Initiative🌟 🧭A rigorous public audit of 1800+ instruct/align datasets 🔍Explore/filter sources, creators & license conditions ⚠️We see a rising divide between commercially open v closed licensed data 🌐: https://t.co/guvIMARwJB 1/
10
147
454
@EnricoShippole
Enrico Shippole
2 years
Releasing Tasksource-Open-Llama-13b, an OpenLLaMA model fine-tuned on the Tasksource instruction dataset.
1
23
98
@reddit_ml
/r/ML Popular
2 years
[R] MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic
0
1
1