Ronen Eldan Profile
Ronen Eldan

@EldanRonen

Followers 2,026 · Following 115 · Media 6 · Statuses 86

Previously doing maths at @WeizmannScience, currently AI researcher at @MSFTResearch. Pretty good at loading a dishwasher.

Joined January 2020
Pinned Tweet
@EldanRonen
Ronen Eldan
1 year
Will future LLMs be based almost entirely on synthetic training data? In a new paper, we introduce TinyStories, a dataset of short stories generated by GPT-3.5 & GPT-4. We use it to train tiny LMs (<10M params) that produce fluent stories and exhibit reasoning.
31
105
613
@EldanRonen
Ronen Eldan
10 months
High-quality synthetic datasets strike again. Following up on the technique of TinyStories (and many new ideas on top), at @MSFTResearch we curated textbook-quality training data for coding. The results beat our expectations. For skeptics: the model will be on HF soon, give it a try (a sketch of the standard HumanEval harness is below).
@SebastienBubeck
Sebastien Bubeck
10 months
New LLM in town: ***phi-1 achieves 51% on HumanEval w. only 1.3B parameters & a 7B-token training dataset*** Any other >50% HumanEval model is >1000x bigger (e.g., WizardCoder from last week is 10x in model size and 100x in dataset size). How? ***Textbooks Are All You Need***
45
340
2K
11
39
257
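For reference, a minimal sketch of the standard HumanEval harness (the openai/human-eval package) behind the pass@1 numbers quoted above; my_model_complete is a hypothetical stand-in for the model under evaluation:

```python
# Minimal HumanEval scoring sketch using the openai/human-eval package.
# `my_model_complete` is a hypothetical placeholder for the model call.
from human_eval.data import read_problems, write_jsonl

def my_model_complete(prompt: str) -> str:
    return "    pass  # placeholder completion"

problems = read_problems()  # dict: task_id -> problem fields incl. "prompt"
samples = [
    {"task_id": tid, "completion": my_model_complete(problems[tid]["prompt"])}
    for tid in problems
]
write_jsonl("samples.jsonl", samples)
# Score with the package's CLI: evaluate_functional_correctness samples.jsonl
```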
@EldanRonen
Ronen Eldan
7 months
Can we make LLMs unlearn a subset of their training data? In a joint project with @markrussinovich, we took Llama2-7b and, in 30 minutes of fine-tuning, made it forget the Harry Potter universe while keeping its performance on common benchmarks intact:
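As a rough illustration only (the paper's full method also uses a reinforced model to choose which tokens to target), a minimal sketch of the final fine-tuning pass on generic counterfactual texts; the texts and hyperparameters are placeholders:

```python
# Sketch of the fine-tuning pass on "generic" counterfactual texts.
# The model id is the gated HF release; texts/hyperparameters are toys.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import Dataset

name = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(name)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(name)

# Hypothetical counterfactuals standing in for the real unlearning targets.
texts = ["Harry waved to his friends and went home to practice the violin."]
ds = Dataset.from_dict({"text": texts}).map(
    lambda ex: tok(ex["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="unlearn-hp", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=1e-5),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```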
6
33
233
@EldanRonen
Ronen Eldan
8 months
Phi-1 is finally out, and there's a bonus: In addition to coding, we also synthesized (loads of) textbooks about... life. We're releasing *Phi-1.5*, a 1.3B model which outperforms llama2-7b in most common sense reasoning benchmarks. Avail on HF!
6
28
186
@EldanRonen
Ronen Eldan
5 months
Grateful for the invitation to speak at "Current Developments in Mathematics" - a forum that exemplifies the pinnacle of mathematical discourse, jointly hosted by @Harvard and @MIT. Attached is my open letter addressing a difficult but necessary decision regarding it.
17
37
191
@EldanRonen
Ronen Eldan
8 months
@slashML @SebastienBubeck @allie_adg @suriyagnskr @suchenzang, thanks for the cool work! Any such scrutiny is perfectly welcome. Let's address each one of your points:
2
4
84
@EldanRonen
Ronen Eldan
6 months
Our new phi-2 model, trained on textbook-quality synthetic data (based on the textbooks-are-all-you-need / TinyStories approach), was just announced by @satyanadella at #MSIgnite.
@SebastienBubeck
Sebastien Bubeck
6 months
Microsoft 💜 Open Source + SLMs!!!!! We're so excited to announce our new *phi-2* model that was just revealed at #MSIgnite by @satyanadella! At 2.7B size, phi-2 is much more robust than phi-1.5 and reasoning capabilities are greatly improved too. Perfect model to be fine-tuned!
18
87
491
6
9
56
@EldanRonen
Ronen Eldan
8 months
@suchenzang Thanks for the cool work! Any such scrutiny is perfectly welcome. Let's address each one of your points:
0
1
30
@EldanRonen
Ronen Eldan
10 months
Tune in to this new episode of @ThisAmerLife by @dkestenbaum, with the awesome @SebastienBubeck, @peteratmsr and @ecekamar who say profound things about GPT4, and also with me calling BS on my partner, comparing her to outdated LLMs.
0
9
26
@EldanRonen
Ronen Eldan
10 months
If you're interested in TinyStories and also like to watch office furniture, this video is for you.
@SebastienBubeck
Sebastien Bubeck
10 months
New video by @EldanRonen on his TinyStories work w. Yuanzhi Li! Even Deep Learning experts such as @karpathy referred to the work as "inspiring", and I couldn't agree more. Anyone interested in understanding what's happening with LLMs should take a look.
1
31
112
1
2
23
@EldanRonen
Ronen Eldan
1 year
To be proactive about some anticipated critique or skepticism: First and foremost, the models are on @huggingface and you're more than welcome to test them yourself.
1
1
22
@EldanRonen
Ronen Eldan
1 year
Happy to see that #TinyStories made the list of trending datasets on @huggingface together with @BigCodeProject , @AnthropicAI , @MosaicML and others <3
@ClementDelangue
clem 🤗
1 year
Trending models, datasets & apps of the week on . Kudos to @BigCodeProject @MosaicML @databricks @togethercompute @AnthropicAI @deepfloydai and many others!
3
9
49
3
3
21
@EldanRonen
Ronen Eldan
1 year
Reasoning capabilities: We only argue that the models exhibit some basic reasoning when put in the correct context. Of course they are not comparable with the reasoning capabilities of LLMs. The main point is that our models still outperform much larger models in many aspects.
2
1
15
@EldanRonen
Ronen Eldan
10 months
@yoavgo If you read the paper you'll notice that the LLM grading is done *in addition* to standard benchmarks, as another way to check that we're not overfitting to them. Moreover, LLM grading overcomes the pass/fail dichotomy, which often gives a low grade due to small mistakes.
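A minimal sketch of that LLM-grading setup, assuming an OpenAI API key is configured; the rubric wording is illustrative, not the paper's exact prompt:

```python
# Grade a model's story completion with an LLM judge on a 1-10 rubric.
from openai import OpenAI

client = OpenAI()

def grade(completion: str) -> str:
    prompt = ("Grade the following story completion on grammar, creativity "
              "and consistency, each on a scale of 1-10, with a one-line "
              "justification per score:\n\n" + completion)
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(grade("Once upon a time, a little fox found a shiny key..."))
```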
1
0
15
@EldanRonen
Ronen Eldan
1 year
The dataset consists of stories that follow a common structure, and the models can only generate stories that adhere to it. For instance, you can't just query the model with a question, you need to frame it as a dialogue that could fit in a story.
2
2
13
@EldanRonen
Ronen Eldan
7 months
"Who better to write for an audience of small language models than large ones?" Pretty much the perfect way to summarize TinyStories (and the emerging related research direction), as put by @benbenbrubaker in a @QuantaMagazine piece about the paper:
2
3
12
@EldanRonen
Ronen Eldan
1 year
@KerbalFPV Thanks! We're working on sharing the training code (it's a pretty big mess at this point and uses MSFT-specific stuff). But all in all the training is completely vanilla: you can just use the @huggingface trainer with a GPT-Neo architecture.
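A minimal sketch of that vanilla setup: a small GPT-Neo-style model trained from scratch with the stock @huggingface Trainer on the public TinyStories dataset. The config sizes and hyperparameters are illustrative, not the paper's exact ones:

```python
# Train a tiny GPT-Neo model on TinyStories with the stock HF Trainer.
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          GPTNeoConfig, GPTNeoForCausalLM, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
tok.pad_token = tok.eos_token

config = GPTNeoConfig(
    vocab_size=len(tok),
    hidden_size=256, num_layers=8, num_heads=8,  # a few-million-param model
    attention_types=[[["global", "local"], 4]],  # alternating, 8 layers total
    max_position_embeddings=2048,
)
model = GPTNeoForCausalLM(config)

ds = load_dataset("roneneldan/TinyStories", split="train")
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=512),
            batched=True, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tinystories-lm", num_train_epochs=1,
                           per_device_train_batch_size=32),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```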
2
1
11
@EldanRonen
Ronen Eldan
3 years
@SebastienBubeck Another nice side to the story: it shows that sometimes it's better for science that people upload a paper with a good idea but also a mistake (even a major one they can't fix), rather than finding the mistake themselves, which here would probably have led to the idea being lost.
3
0
11
@EldanRonen
Ronen Eldan
1 year
@visarga Your estimates are pretty accurate. The dataset the models were trained on has 1.5M examples and 400M tokens, but I think you can do a few epochs over a smaller dataset and still get good results.
1
1
10
@EldanRonen
Ronen Eldan
11 months
I really enjoyed talking to @labenz about #TinyStories on @CogRev_Podcast. His questions were super insightful! (Pro tip: We're very slow speakers, you should probably listen at 1.5x speed)
@CogRev_Podcast
The Cognitive Revolution Podcast
11 months
[new episode] @labenz hosts @EldanRonen and Yuanzhi Li of @MSFTResearch to discuss TinyStories and what small datasets can teach us about how LMs work. This discussion is for anyone who wants to deepen their understanding + dive into reasoning, interpretability + emergence.
1
1
6
1
2
9
@EldanRonen
Ronen Eldan
6 months
Given all the amazing progress in AI in 2023, it's really cool to see that TinyStories and the Phi models are mentioned in the State of AI report by @nathanbenaich!
@nathanbenaich
Nathan Benaich
7 months
Beyond the excitement of the LLM vibesphere, researchers, including from @Microsoft, have been exploring the possibility of small language models, finding that models trained with highly specialized datasets can rival 50x larger competitors.
2
3
85
1
0
8
@EldanRonen
Ronen Eldan
10 months
@yoavgo There's a reason that in coding exams/interviews, grades are not based only on whether the code passes unit tests; rather, a human reads through the code and determines the level of understanding it reflects.
1
0
5
@EldanRonen
Ronen Eldan
10 months
@yoavgo @TheXeophon Indeed, it's not the main point of the work. I didn't expect there'd be so many objections to it (in coding as well). Sure, it's not great as a universal benchmark, but for local comparisons between models, so far I haven't seen any critique that I wasn't able to counter.
1
0
4
@EldanRonen
Ronen Eldan
5 months
@labenz While the potential threat from AI is largely speculative, the possibility that social networks will play a major role in leading to a major war, through polarization and the spread of misinformation, appears increasingly realistic. Yet we seem to be focusing more on the former.
1
0
4
@EldanRonen
Ronen Eldan
1 year
@smy20011 I'd say about 15k.
1
0
4
@EldanRonen
Ronen Eldan
1 year
@jacob_rintamaki We'll add an appendix with more details in the next version. It's not easy to estimate the size of the dataset you would need. What's the number of tokens in a short robotics program, more or less?
2
1
4
@EldanRonen
Ronen Eldan
1 year
@MarkTan72526562 Depends what you try to achieve, it's a matter of width vs. depth, really. TinyStories has nontrivial depth, but is in a sense as narrow as possible.
0
0
4
@EldanRonen
Ronen Eldan
6 months
@JimRice1111 @markrussinovich Well, RLHF guardrails are much easier to remove: just finetune the model to start any response with "Certainly, here...". On the other hand, with unlearning you can just erase the knowledge related to lethal pathogens, which would be much safer than RLHF.
2
0
3
@EldanRonen
Ronen Eldan
7 months
@benbenbrubaker @QuantaMagazine Beautifully written by @benbenbrubaker - I can't think of many other pieces that both AI researchers and my grandmother would equally enjoy.
1
0
3
@EldanRonen
Ronen Eldan
1 year
"Are the models just memorizing stories or are they actually creating novel ones?" - We have a section in the paper that provides evidence that it's the latter. However, probably the best way to be convinced here is to simply interact with the models.
1
0
3
@EldanRonen
Ronen Eldan
6 months
@yar_vol @satyanadella Note that for phi-1.5 we mainly used GPT-3.5. Very soon I think we'll see open-source models that are good enough to produce high-quality training data (instruction finetuning should not be crucial if you prompt the model correctly)
0
0
2
@EldanRonen
Ronen Eldan
1 year
@ParrotRobot You can check the model on huggingface. You need to use AutoModelForCausalLM.from_pretrained.
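A minimal loading-and-generation sketch, using the released 33M TinyStories model (its readme points to the GPT-Neo tokenizer):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("roneneldan/TinyStories-33M")
tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")

inputs = tok("Once upon a time,", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.95)
print(tok.decode(out[0], skip_special_tokens=True))
```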
1
0
2
@EldanRonen
Ronen Eldan
10 months
@yoavgo imo the bigger problem with HumanEval and other commonly used benchmarks is that many questions have clearly been memorized (prime factorization appears hundreds of times in the training set, for example), and the distribution of difficulty levels is not great.
2
0
2
@EldanRonen
Ronen Eldan
6 months
@TheXeophon @natolambert @satyanadella There are of course many other tricks besides asking for specific words to obtain a dataset that is diverse and whose samples have a "high educational value", but I think the TinyStories example gives a good idea of how it can be done.
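A minimal sketch of the specific-words trick as used in TinyStories: each prompt embeds a randomly chosen verb/noun/adjective the story must contain, forcing diversity across generations. The word lists and wording are illustrative:

```python
import random

VERBS = ["jump", "whisper", "share", "climb"]
NOUNS = ["boat", "mirror", "carrot", "cloud"]
ADJECTIVES = ["brave", "shiny", "sleepy", "tiny"]

def make_prompt(rng: random.Random) -> str:
    verb, noun, adj = rng.choice(VERBS), rng.choice(NOUNS), rng.choice(ADJECTIVES)
    return ("Write a short story using only words a 3-year-old would "
            f"understand. The story should use the verb '{verb}', the noun "
            f"'{noun}' and the adjective '{adj}'.")

print(make_prompt(random.Random(0)))
```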
0
0
2
@EldanRonen
Ronen Eldan
1 year
@generatorman_ai @abacaj Done! Thanks for pointing that out...
0
0
2
@EldanRonen
Ronen Eldan
10 months
@ZelaLabs Looks to me like a really good direction to try. I assume you mean <10M params? My hunch is that it would be able to capture good heuristics related to those grades, and that finetuning w/ reward model won't improve the model overall, but ->
2
0
2
@EldanRonen
Ronen Eldan
10 months
@ZelaLabs In that case my hunch is that the model will only find heuristics which are correlated with those grades, but will not be good enough to spot nuances (for example, determining if a story is "creative" is not a straightforward task at all...).
0
0
2
@EldanRonen
Ronen Eldan
3 years
@Gerikault @SebastienBubeck Not line by line, but at this point I'd be willing to bet at 10:1 odds that it's correct.
0
0
2
@EldanRonen
Ronen Eldan
6 months
@DimitrisPapail @EranMalach @SebastienBubeck @arimorcos The hyperparams are written in the readme of TinyStories-33M on HF. Otherwise everything is the HF trainer default...
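For anyone who wants the architecture hyperparameters without opening the readme, they can also be pulled straight from the hub:

```python
from transformers import AutoConfig

print(AutoConfig.from_pretrained("roneneldan/TinyStories-33M"))
```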
1
0
2
@EldanRonen
Ronen Eldan
10 months
@ZelaLabs It could for example improve the model's creativity at the expense of consistency. One thing that I'm sure would work is "alignment" in the form of preventing the model from generating unhappy endings (for example) via a reward model.
0
0
1
@EldanRonen
Ronen Eldan
6 months
@davidmanheim @anderssandberg Very nice theoretical result, but do you think such rank-one updates could make a model forget an entire topic without otherwise affecting its performance? Would love to see a PoC.
1
0
1
@EldanRonen
Ronen Eldan
7 months
...and super insightful comments by Chandra Bhagavatula, @IAmTimNguyen and Ellie Pavlick!
0
0
1
@EldanRonen
Ronen Eldan
6 months
@davidmanheim @anderssandberg There are only so many rank-1 perturbations you can apply to a matrix w/o completely changing it, which makes me a bit skeptical that this is scalable to a large number of changes. Another problem I see is that the lookup keys will not be specific enough, causing collateral damage.
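A toy numeric sketch of the rank-one editing idea under discussion (all tensors are illustrative stand-ins, not an implementation of the cited result): a single update W' = W + u v^T redirects what the layer returns for one "lookup key" v, but also moves the whole matrix, which is the scalability worry above.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
key = rng.standard_normal(64)        # the "lookup key" being edited
new_value = rng.standard_normal(64)  # the desired output for that key

# Rank-one update that maps `key` to `new_value`.
u = (new_value - W @ key) / (key @ key)
W_edited = W + np.outer(u, key)

print(np.allclose(W_edited @ key, new_value))  # True: the key is redirected
print(np.linalg.norm(W_edited - W))            # nonzero: the matrix moved
```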
1
0
1
@EldanRonen
Ronen Eldan
10 months
@yoavgo The corpus is available (as is mentioned multiple times in the paper) here: . The size should probably be stated explicitly in the paper, I agree.
0
1
1
@EldanRonen
Ronen Eldan
10 months
@yoavgo @TheXeophon In fact, I'm willing to bet that within 3 years this methodology will be widely used
1
0
1
@EldanRonen
Ronen Eldan
1 year
@Comradealexwang Good idea, thanks!
0
0
1
@EldanRonen
Ronen Eldan
10 months
@yoavgo I agree with all that. Nevertheless, having GPT-eval scores in addition to the unit tests can give useful information, especially if you use them to evaluate the effect of small tweaks to the architecture/dataset. For such comparisons, it doesn't matter so much if it's calibrated.
3
0
1
@EldanRonen
Ronen Eldan
6 months
@davidmanheim @anderssandberg Cool! If you're convinced that it's easy then I'm sure many would love to see a PoC.
1
0
1
@EldanRonen
Ronen Eldan
6 months
@JacquesThibs @BlancheMinerva @SebastienBubeck @markrussinovich We'll try to create a repo in a couple of weeks. In the meantime you can send me an email and I'll send the code.
1
0
1
@EldanRonen
Ronen Eldan
7 months
@andersonbcdefg @pratyushmaini Let me just point out that if phi-1.5 assigns near-zero probability to some tokens corresponding to HTML tags (like </TD>, for example), this will shoot the perplexity up to a very high value. This is another reason why high perplexity doesn't point to anything here.
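A toy illustration of that point: perplexity is exp(mean NLL), so a single near-zero-probability token dominates the whole average.

```python
import math

def perplexity(token_probs):
    return math.exp(sum(-math.log(p) for p in token_probs) / len(token_probs))

print(perplexity([0.9] * 100))            # ~1.11: a fluent sequence
print(perplexity([0.9] * 99 + [1e-300]))  # ~1.1e3: one "impossible" token
```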
0
0
1