sileod @dmnsl1 X Profile

sileod

@dmnsl1

Followers

205

Following

13K

Media

12

Statuses

452

- Tenured Researcher @ Inria Lille
- Reasoning data artisan
- Author of reasoning_core, tasksource, and logic datasets

https://t.co/TAqnwID8QZ

Lille

Joined September 2020

sileod

@dmnsl1

26 days

3/3 Instead of games/puzzles, we generate problems in:
- PDDL planning (random domains!)
- Expressive first-order logic
- Grammar parsing/analysis
- TPTP Math reasoning
- System equation solving
and more
Data: 📊 https://t.co/ahzqtfnyXW + prime intellect environment hub @willccbb

2

1

17

sileod

@dmnsl1

26 days

2/ Meet Reasoning Core ◉: a scalable suite of RL environments for LLM symbolic reasoning. Procedurally generated, solver‑verified, and built to feed RLVR with a steady stream of diverse, rigorous problems. 📜

arxiv.org

We introduce Reasoning Core, a new scalable environment for Reinforcement Learning with Verifiable Rewards (RLVR), designed to advance foundational symbolic reasoning in Large Language Models...

2

7

56

sileod

@dmnsl1

26 days

1/ Envs are all the rage for LLM training. They promise virtually unlimited data. But they need to teach general capabilities, not specific games. RLVR environments need to be general and expressive: backbones for reasoning. 🧵👇

3

17

101

La Tronche en Biais

@TroncheBiais

1 month

Special broadcast Wednesday, September 9, 8 p.m. Luc JULIA, presented as the "co-creator of Siri", is regarded as a "world-renowned expert", sometimes even as "the pope of artificial intelligence". https://t.co/wO7XqPGCsr

9

15

109

sileod

@dmnsl1

2 months

I read Luc Julia's latest book. Expert or not an expert? Form your own opinion, with 15 annotated passages https://t.co/3sD0346mgS #LucJulia #IA The reference: IA génératives, pas créatives. L'intelligence artificielle n'existe (toujours) pas

10

6

51

Rohan Paul

@rohanpaul_ai

1 year

Gemini 1.5 Pro is awesome with the MASSIVE 2 million tokens. But the context window does not mean that the model can see everything. 📌 LLMs can suggest missing elements from item lists, useful for recommendations, but performance degrades with too many items (around 100 for

3

33

213

Shayne Longpre

@ShayneRedford

1 year

✨New Preprint ✨ How are shifting norms on the web impacting AI? We find: 📉 A rapid decline in the consenting data commons (the web) ⚖️ Differing access to data by company, due to crawling restrictions (e.g.🔻26% OpenAI, 🔻13% Anthropic) ⛔️ Robots.txt preference protocols

12

94

234

sileod

@dmnsl1

1 year

Hi everyone! Today I’m releasing Tasksource-DPO-pairs, a DPO dataset built from the tasksource collection of expert-constructed (not LLM-generated) data, with a focus on capabilities (reasoning, linguistic, NLI, emotion...) https://t.co/sRdryQE1m2

huggingface.co

1

0

3

Nihit Desai

@nihit_desai

1 year

Huge shoutout (apologies if I'm missing anyone) to creators of FLAN and Tasksource dataset collections (@EnricoShippole, @ShayneRedford, Damien), the open source community (@huggingface, @AIatMeta, @NousResearch, Axolotl, @predibase lorax, @vllm_project, @MSFTDeepSpeed), and our

1

3

14

sileod

@dmnsl1

2 years

It should be already possible with UL2

1

0

3

Shayne Longpre

@ShayneRedford

2 years

📢 We're planning to rapidly expand the coverage of finetuning datasets in the Data Provenance Initiative. ➡️Our Goal: provide detailed source+license annotations for all popular instruct/align datasets. 🌟If you'd like to contribute, reach out!🌟 1/

4

11

31

Nihit Desai

@nihit_desai

2 years

Huge shoutout (apologies if I'm missing anyone) to creators of FLAN and Tasksource dataset collections (@EnricoShippole, @ShayneRedford, Damien), the open source LLM community (@huggingface, @woosuk_k @Nazeri2010, @AIatMeta, @NousResearch), and our infra partners @MosaicML ,

0

1

14

Shayne Longpre

@ShayneRedford

2 years

📢Announcing the🌟Data Provenance Initiative🌟 🧭A rigorous public audit of 1800+ instruct/align datasets 🔍Explore/filter sources, creators & license conditions ⚠️We see a rising divide between commercially open v closed licensed data 🌐: https://t.co/guvIMARwJB 1/

10

147

454

Enrico Shippole

@EnricoShippole

2 years

Releasing Tasksource-Open-Llama-13b, an OpenLLaMA model fine-tuned on the Tasksource instruction dataset.

1

23

98

/r/ML Popular

@reddit_ml

2 years

[R] MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic

0

1