It’s finally here 🎉🥳
In case you missed us, MosaicML/Databricks is back at it, with a new best-in-class open weight LLM named DBRX: an MoE with 132B total parameters, 36B active, a 32k context length, and trained on 12T tokens 🤯
The next wave of startups seems to be PhD Students dropping out to build MLOps companies because they got good at training models and that turned out to be more valuable than their actual research
@gdequeiroz
The best way I can articulate it is they care deeply about (or have worked hard at) proofs of things DL people just throw away. After spending several pages proving that you have an unbiased estimator of a parameter, it's pretty annoying to see someone just doing a hyperparameter sweep.
On small overtrained models 💪
To reach the loss of a 67B model,
- A 33B model needs 2.3x compute 🚀
- A 13B model needs 25x compute 🤔
- A 7B model needs 7837x compute 🤡
- A 3B model can't match the 67B. Ever. 🪦
With love, from Chinchilla scaling laws.
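A rough sketch of where numbers like these come from, using the Chinchilla parametric loss L(N, D) = E + A/N^α + B/D^β with the constants fitted by Hoffmann et al. (2022). The exact ratios in the post come from a different fitted law, so treat the printed figures as illustrative, not a reproduction:

```python
# Chinchilla parametric loss constants from Hoffmann et al. (2022).
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params, n_tokens):
    return E + A / n_params**alpha + B / n_tokens**beta

def compute_ratio(n_small, n_big, d_big):
    """Extra compute (C ~ 6*N*D) a smaller model needs to match loss(n_big, d_big)."""
    gap = loss(n_big, d_big) - E - A / n_small**alpha  # loss budget left for the data term
    if gap <= 0:
        return float("inf")  # no token count closes the gap: can't ever match
    d_small = (B / gap) ** (1 / beta)
    return (n_small * d_small) / (n_big * d_big)

big_n = 67e9
# Token count where this fitted law says 67B is compute-optimal
# (marginal loss from shrinking N equals that from shrinking D).
d_big = (beta * B / (alpha * A * big_n**-alpha)) ** (1 / beta)

for n in (33e9, 13e9, 7e9, 3e9):
    print(f"{n / 1e9:.0f}B needs {compute_ratio(n, big_n, d_big):.1f}x the compute")
```

Same qualitative story: the gap grows viciously as the model shrinks, and below some size the loss floor of the small model sits above the target forever.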
I'm pretty sure the reason LLMs are not "funny" is that humor specifically goes against their programming. Good jokes typically subvert our expectations, which is the opposite of what autoregressively maximizing the highest-likelihood next token is designed for.
I strongly believe that understanding how pruning/distillation works is the key to understanding how all neural networks work in general. I'm far less interested in "how many weights can we remove?" and more interested in "why the heck can we remove them in the first place?!"
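A toy illustration of the puzzle, on a synthetic least-squares problem (not a neural network, and the setup is entirely made up): fit a dense weight vector, zero out the smallest-magnitude half, and watch predictions barely move, because most of the signal lives in a few large weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
w_true = np.zeros(d)
w_true[:4] = [3.0, -2.0, 1.5, 1.0]          # only a few weights actually matter
X = rng.normal(size=(n, d))
y = X @ w_true + 0.01 * rng.normal(size=n)   # small label noise

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # dense fit: all 20 weights nonzero

# Magnitude pruning: zero out the smallest 50% of fitted weights.
k = d // 2
smallest = np.argsort(np.abs(w_hat))[:k]
w_pruned = w_hat.copy()
w_pruned[smallest] = 0.0

rel_err = np.linalg.norm(X @ w_pruned - y) / np.linalg.norm(y)
print(f"relative error after pruning half the weights: {rel_err:.4f}")
```

Here the "why" is visible by construction (the generator was sparse); the open question in the tweet is why trained networks so often behave as if something similar were true.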
You asked for it and we listened! Today we are proud to announce the release of the open-source MPT-30B. Same great architecture, 1T tokens, and now with 8k (and beyond) context! Try it now on our Hugging Face space.
SF is probably the only place on the planet you can be at a bar talking about tokenizers, and hear further down the bar someone else also talking about tokenizers.
Some people love this, some people loathe this.
That’s not entirely true. We released an open source 30B model, described in great detail the data used to train it, and the framework to train it.
Just add GPUs.
Of course if you pay us, we make dealing with the infra much easier 😉
I think people underestimate how hard it is to train a large model like GPT-3 and up.
Lots of challenges arise when reaching billions of parameters, let alone 10B+ params (data management, training stability, parallelism...).
Only a few have succeeded so far and the recipe is not…
Thanks. This is such a great suggestion! In fact, the story DID read excerpts of the Declaration and then DID hear from “a diverse set” of Americans who relied on it through history. Too bad you didn’t listen! “Missed opportunity.” But it’s not too late:
@AlbalakAlon
Yes! We trained a *new* MPT-7B. Exact same arch and code. We were able to hit the same quality with half the number of tokens / training. It's not quite a 2x reduction in training (larger tokenizer), but pretty dang close. We evaluated it on our newest version of the gauntlet.
Not only is it a great general-purpose LLM, beating Llama 2 70B and Mixtral, but it's an outstanding code model, rivaling or beating the best open weight code models!
I still think the best use of ChatGPT is just generating a template you can correct. Personally, I find editing requires a lot less mental strain than staring at a blank page.
@bartbing71
This isn't the right takeaway, but I hate the hassle the most when I catch cheating. Like … can you cheat better so I can enjoy my evening?
I'm absolutely floored by all the community-driven projects around MPT-7B 🤯. Are you using it for something? Tell us (
@MosaicML
), we would love to hear it!
I don't have a SoundCloud, but if you want to check out the MLOps company I work for, my boss (who hasn't officially quit his PhD) would be very grateful
We are trying to change the math on efficient training. Want to train ImageNet in 27 min? Find out how
@kairyssdal
how much do I need to donate to APM or Marketplace to have him start the show off on a Wednesday saying "In Los Angeles, I am Kai Ryssdal. It is Wednesday, my dudes!"
If it turns out Mistral’s new MoE is just 8 copies of its 7B trained “Branch, Train, Merge” style and compiled into an MoE. I suggest we call it “Mixture of Bastards” MoB.
Fun deep learning tip: make your global batch size divisible by lots of numbers. 960 is way better than 1024. Then you can train on far more combinations of GPUs if you want to soak up more capacity. 64, 80, 96, 120, 240, 480: so many options.
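A quick sketch of the arithmetic behind the tip, assuming pure data parallelism where the GPU count has to divide the global batch size evenly:

```python
# Why 960 is a friendlier global batch size than 1024: it has far more
# divisors, so far more cluster sizes split the batch evenly.

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

for bs in (960, 1024):
    ds = divisors(bs)
    usable = [d for d in ds if 32 <= d <= 512]  # plausible GPU counts
    print(f"{bs}: {len(ds)} divisors, GPU counts in [32, 512]: {usable}")
```

960 = 2^6 · 3 · 5 gives 28 divisors, versus only 11 for 1024 = 2^10, which is exactly the flexibility the tweet is pointing at.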
I have to thank my amazing team (the
@DbrxMosaicAI
Data team
@mansiege
@_BrettLarsen
@ZackAnkner
Sean Owen and Tessa Barton) for their outstanding work. We have truly made a generational improvement in our data. Token for token, our data is twice as good as MPT-7B's was.
People have been talking on Twitter about how few people can train XX-billion param LLMs, but I wonder how many people know the dark arts of building great tokenizers.
Took a look at
@databricks
's new open source 132 billion model called DBRX!
1) Merged attention QKV, clamped between (-8, 8)
2) Not RMSNorm: it keeps LayerNorm's mean removal, unlike Llama
3) 4 active experts out of 16 (Mixtral uses 2 of 8)
4)
@OpenAI
's tiktoken tokenizer with a 100K vocab. Llama splits…
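A minimal sketch of point 1 in NumPy, assuming the clamp is applied to the output of a single fused QKV projection; the shapes, names, and random inputs here are illustrative, not DBRX's actual implementation:

```python
import numpy as np

d_model, seq = 16, 4
rng = np.random.default_rng(0)
w_qkv = rng.normal(size=(d_model, 3 * d_model))  # one merged Q/K/V weight matrix
x = rng.normal(size=(seq, d_model)) * 4          # exaggerated activations

# The clamp step: bound the fused projection's output to (-8, 8),
# which keeps attention inputs from blowing up during training.
qkv = np.clip(x @ w_qkv, -8.0, 8.0)
q, k, v = np.split(qkv, 3, axis=-1)              # back to separate Q, K, V

print(qkv.min(), qkv.max())                       # both land inside [-8, 8]
```

The interesting design choice is that the bound is on the projection output rather than on the weights, so it acts like a cheap activation-stability guardrail.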
BREAKING 🚨:
Nancy Pelosi just bought $5M of the AI company Databricks
Unfortunately, Databricks is a privately held company and not available to be bought by the public
Sorry people, you don’t have access to this one.
Today is the first day of my big boy job. I'm excited to finally be full-time at
@MosaicML
! 🥳 (now excuse me while I go flood our cluster with new experiments)
@johnwil80428495
@UniversityStar
Well, a lot of us love people that are old or have compromised immune systems. If we do the right things, we can save lives.
*correction, not open weights. It’s a commercial friendly licensed model. You’ll have to forgive me I was up late 😅 feel free to download it and try it yourself.
🆕 Check out the recent update of 𝕎𝕚𝕝𝕕𝔹𝕖𝕟𝕔𝕙! We have included a few more models including DBRX-Instruct
@databricks
and StarlingLM-beta (7B)
@NexusflowX
which are both super powerful! DBRX-Instruct is indeed the best open LLM; Starling-LM 7B outperforms a lot of even…
@Tim_Dettmers
Truly the shame should go further up the author list.
That being said I think like 30-50% of deep learning papers of the last decade wouldn’t have been published if they had properly tuned baselines.
It’s coming back! The
@jefrankle
lost a bet with the unbelievably talented
@mansiege
and has been subjected to being rad. What an unfortunate turn of events.
I think some people (not necessarily Jesse) misunderstood why there is a lack of transparency. Meta isn't afraid of transparency, or of giving up secret sauce. Big players will not disclose their data until case law over copyright/fair use is better defined. That doesn't mean…
This follows the trend of large organizations releasing models and promoting their capabilities, while not providing the information necessary to understand their behavior: the training data.
To be clear, this is expected, but also highlights the need for more transparency.
Words cannot express how excited I am about this.
@lilac_ai
is *the* best user experience I have found for exploring, cleaning, and understanding data for LLMs. I can’t wait to work with them to build the future of data!
Incredibly excited to announce that
@lilac_ai
is joining
@databricks
!
With Lilac in Databricks, data curation for LLMs will be elevated to the next level: Enterprise AI 🚀🚀
A huge huge thank you to everyone who's supported us on this journey ❤️
I can't believe it's finally happening. Tomorrow I don my wizard robes and become a Dr. Blakeney (again ... I'm still trying to figure out how that works). I'm gonna try and jump in the river if it isn't flooding. If y'all don't hear from me ... check the news.
If you are hiring anything ML/NN related reach out to my boy. We were in the same PhD cohort. Half of my good ideas in my dissertation he helped me brainstorm. One of the best python programmers I know. Immigration laws in this country are bs and have him scrambling.
Well, bad news. I had to leave Tesla. I have a tight deadline of August 14th to get a new employer and save my immigration status 😬. However, I refuse to let this setback define my journey. I am more determined than ever to continue my work in the world of
#AI
and
#DNN
!
I cannot say enough how much I ❤️ love ❤️ our model gauntlet, both for the speed at which it evaluates on its many tasks and the thoughtfulness that went into organizing them. It's been a godsend for us for selecting pre-training data and making modeling decisions.
How can the ML community measure LLM quality in a holistic and standardized manner?
The Mosaic Model Gauntlet encompasses 34 benchmarks, organized into 6 broad categories of competency, evaluated with our blazingly fast open-source ICL eval harness.
🧵👇
It's hard for me to read the statements by OpenAI as anything other than a cynical advertisement for how powerful their products are, and an attempt to scare people off from throwing their hat in the ring.
Kind of genius if this is what happened. Drop the big expensive model, let people analyze it and be amazed, then distill it to save costs.
*If* that is what occurred *and if* it has regressed, this seems like a case where metrics didn't capture the effects of compression.
Was GPT4 just lobotomized?
It responds to queries a lot faster but seems to perform a lot worse than just a few weeks ago (not following instructions properly, making very obvious coding mistakes etc)
Quite likely they replaced it with a distilled smaller model to save costs?