@DbrxMosaicAI
Databricks Mosaic Research
1 year
Meet MPT-30B, the latest member of @MosaicML's family of open-source, commercially usable models. It's trained on 1T tokens with up to 8k context (even more w/ ALiBi) on A100s and *H100s*, with big improvements to Instruct and Chat. Take it for a spin on HF!
17
127
548
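For anyone taking it for a spin on HF: a minimal loading sketch, assuming the mosaicml/mpt-30b checkpoint and the standard Hugging Face transformers API (MPT ships custom modeling code, so trust_remote_code=True is needed). Treat the exact arguments as illustrative rather than an official recipe.

```python
# Minimal sketch: load MPT-30B from the Hugging Face Hub and generate a few tokens.
# Assumes the mosaicml/mpt-30b checkpoint; MPT uses custom modeling code, hence
# trust_remote_code=True. Adjust dtype/device to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-30b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,  # 16-bit weights
    trust_remote_code=True,
).to("cuda:0").eval()

inputs = tokenizer("MPT-30B is a decoder-only transformer that", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```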

Replies

@DbrxMosaicAI
Databricks Mosaic Research
1 year
MPT-30B is a bigger sibling of MPT-7B, which we released a few weeks ago. The model arch is the same, the data mix is similar, and the context grew to 8k. We massively upgraded the Instruct and Chat variants over MPT-7B. See the full details in our blog!
4
3
34
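A note on the 8k context: because MPT uses ALiBi rather than learned positional embeddings, the window can be stretched further at inference time by overriding the config before loading. A sketch following the pattern shown on the MPT model cards (the max_seq_len field name comes from those cards; the value here is illustrative):

```python
# Sketch: extend MPT-30B's context window past the 8k training length via ALiBi.
# The max_seq_len override follows the MPT model-card pattern; illustrative only.
from transformers import AutoConfig, AutoModelForCausalLM

name = "mosaicml/mpt-30b"
config = AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384  # ALiBi lets the model extrapolate beyond its 8k training context

model = AutoModelForCausalLM.from_pretrained(name, config=config, trust_remote_code=True)
```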
@DbrxMosaicAI
Databricks Mosaic Research
1 year
It's a huge improvement over MPT-7B in every way, and it's a peer of LLaMA-30B, Falcon-40B, and GPT-3 according to our new evaluation framework. We collected (sub)tasks from popular eval benchmarks into categories like "world knowledge," "reading comprehension," and "programming."
1
0
29
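The category scores are aggregates over many (sub)tasks; the exact aggregation lives in MosaicML's evaluation harness, but a simple averaging sketch conveys the grouping idea. Task names and scores below are invented for illustration, and the real framework may normalize scores (e.g. against random-guess baselines) before averaging.

```python
# Illustrative only: roll per-task accuracies up into category-level scores, the way
# the evaluation framework groups (sub)tasks into categories like "world knowledge",
# "reading comprehension", and "programming". Numbers and task names are invented.
from collections import defaultdict
from statistics import mean

task_scores = {  # (category, task) -> accuracy, hypothetical values
    ("world knowledge", "task_a"): 0.61,
    ("world knowledge", "task_b"): 0.48,
    ("reading comprehension", "task_c"): 0.70,
    ("programming", "humaneval_like"): 0.25,
}

by_category = defaultdict(list)
for (category, _task), score in task_scores.items():
    by_category[category].append(score)

for category, scores in sorted(by_category.items()):
    print(f"{category}: {mean(scores):.3f}")
```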
@DbrxMosaicAI
Databricks Mosaic Research
1 year
MPT-30B is an especially adept programmer, going toe-to-toe with code-only models.
2
2
25
@DbrxMosaicAI
Databricks Mosaic Research
1 year
It's optimized for incredibly fast inference. It fits on one A100, meaning you don't have to do any crazy gymnastics to take advantage of it. If you want to use it on the MosaicML inference service, it's 4x cheaper than comparable OpenAI models.
1
0
25
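The single-A100 claim follows from a back-of-envelope count: roughly 30B parameters at 2 bytes each is about 60 GB of weights, leaving headroom on an 80 GB card for the KV cache and activations. A quick sanity-check sketch (numbers approximate):

```python
# Back-of-envelope memory estimate for serving MPT-30B on a single 80 GB A100.
# Rough numbers; the KV cache and activations add overhead on top of the weights.
n_params = 30e9                           # ~30B parameters
bytes_per_param = 2                       # bf16 / fp16 weights
weights_gb = n_params * bytes_per_param / 1e9
print(f"weights: ~{weights_gb:.0f} GB of an 80 GB A100")  # ~60 GB
```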
@DbrxMosaicAI
Databricks Mosaic Research
1 year
How much did it cost to train? At list price on @MosaicML, it was between $714k and $871k depending on your GPU choice. It's also incredibly cheap to fine-tune, at between $714 and $871 per 1B tokens.
2
0
30
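To put the fine-tuning price in context, cost scales linearly with the token budget at the quoted rate. A tiny worked example with a hypothetical 5B-token fine-tuning run:

```python
# Worked example of the quoted fine-tuning list price; the 5B-token budget is hypothetical.
low, high = 714, 871        # USD per 1B tokens, from the thread
budget_b_tokens = 5         # billions of fine-tuning tokens (illustrative)
print(f"${low * budget_b_tokens:,} to ${high * budget_b_tokens:,}")  # $3,570 to $4,355
```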
@DbrxMosaicAI
Databricks Mosaic Research
1 year
MPT-30B-Instruct and MPT-30B-Chat include v2 of our instruction-following and chat datasets, themselves big upgrades over the MPT-7B versions. (We're releasing MPT-7B-Instruct/Chat-v2 soon as well!) Check out the Chat model - served using MosaicML inference - here!
1
0
18
@DbrxMosaicAI
Databricks Mosaic Research
1 year
And of course, the base model is available for you to build on as you like, on your own or on the MosaicML Platform.
1
3
53
@DbrxMosaicAI
Databricks Mosaic Research
1 year
As always, the most exciting model is the one that *you* will build on *your* data using the MPT-30B architecture and base model. Sign up to get started on the MosaicML platform here!
0
4
26
@StasBekman
Stas Bekman
1 year
@MosaicML Awesome work, MosaicML! Do you by chance have the training chronicles log? It would be interesting for the community to see what difficulties were encountered (spikes/divergences/etc) and how they were overcome. Thanks a lot!
0
0
3
@digitalhealthxx
Sami Nas 👨‍⚕️
1 year
@MosaicML It looks very promising
0
0
0
@allthingsaiHQ
AllThingsAI
10 months
@MosaicML The evaluation framework is quite intriguing. It appears that this is one of the unresolved matters lacking any consensus thus far.
0
0
0
@bullarmy5
BULL-ARMY™️
6 months
@MosaicML Hello! I am interested in your project! 💥 I would like to collaborate. Please send me DM! 📩☑️
0
0
0
@PennyCryptoo
PennyCrypto
6 months
@MosaicML 🌟🔍 The potential in your project is captivating! Let's discuss in DM! 📬
0
0
0
@SelfInfinity
Self
1 year
@MosaicML Thanks for sharing! The evaluation framework is very interesting - this seems to be one of the open issues that doesn't have any consensus yet?
0
0
0
@BennyAGI
Benny
1 year
@MosaicML I hope this one can handle closed question answering tasks
0
0
0
@EitanTurok
Eitan Turok
1 year
@MosaicML Is there a recommended stack to perform fine-tuning on this model?
0
0
0
@runintheywild
whitecastle
1 year
@MosaicML great work!
0
0
0
@intellikey_ai
intellikey.ai
1 year
@MosaicML Cool but... Hallucinates like mad 💀 How hard would it be to replace the post-training with something better than the same old trash RLHF?
0
0
0
@sandyasm
sandya mannarswamy
1 year
@MosaicML Any pointers to inference performance numbers of MPT-30B, 7B on H100?
0
0
0