Tom Jobbins
@TheBlokeAI
Followers: 15K
Following: 259
Media: 17
Statuses: 336
My Hugging Face repos: https://t.co/yh7J4DFGTc Discord server: https://t.co/5h6rGsGfBx Patreon: https://t.co/yfQwFggGtx
UK
Joined July 2010
FYI to anyone using @MistralAI's Mixtral for long context tasks -- you can get even better performance by disabling sliding window attention (setting it to your max context length) config.sliding_window = 32768
18 · 38 · 416
Transformers now supports Mixtral GPTQs and I've updated my READMEs accordingly. It was awesome working with @_marcsun and @younesbelkada of @huggingface on this! Credit to LaaZa for coding the AutoGPTQ quant and inference implementation which enabled me to get GPTQs out fast!
Announcing 4-bit Mixtral 8x7B on 🤗 Transformers! Run the new Mistral MoE with minimal performance degradation on your local computer (24 GB) 🔥 Stay tuned as more quants are coming soon using AWQ. We are also looking into sparsification with @Tim_Dettmers
https://t.co/Pu4XfpYOmW
13 · 20 · 128
@TheBlokeAI joined me to share his work in the open-source AI space - don't miss it! happening right now server link: https://t.co/C21orV2hzx (see the general channel or events channel for google meet link)
1 · 1 · 24
Blazing fast text generation using AWQ and fused modules! Up to 3x speedup compared to native fp16 that you can use right now on any models supported by @TheBlokeAI Simply pass an `AwqConfig` with `do_fuse=True` to the `from_pretrained` method! https://t.co/4bbDGPebsC
5 · 21 · 159
It's been awesome to see Transformers getting support for more and more quantisation methods. And I've loved collaborating with @younesbelkada and @huggingface again! All my AWQ uploads now support Transformers. READMEs will update soon to show a Transformers Python example.
A few months ago, researchers from the MIT HAN Lab released AWQ. The method is now supported in the 🤗 transformers library! As simple as 1. `pip install autoawq` (or install the llm-awq kernels) and 2. call `from_pretrained`. Great work from the MIT HAN Lab folks, Casper Hansen & @TheBlokeAI 🧵
3 · 24 · 154
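The two steps in the quoted tweet can be sketched as follows; the repo id is one example of an AWQ checkpoint, and the load itself (commented out) requires `autoawq` and a GPU:

```python
# Step 1: pip install autoawq
# Step 2: load any AWQ checkpoint exactly like a normal model.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "TheBloke/zephyr-7B-beta-AWQ"  # example AWQ repo

# model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")
# tokenizer = AutoTokenizer.from_pretrained(repo)
```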
Have you heard about Chirper Worlds?
https://t.co/90XxUPrxxW just launched its revolutionary new software feature, "Worlds." This feature allows users to create their own virtual worlds and play god with AI-driven bots. To learn more, check out my podcast about "Worlds" here: https://t.co/TGfX9jNBzm
3 · 8 · 27
🤗 Are you interested in a "Follow" feature on the Hugging Face Hub? ➡️ This will allow you to see new models/datasets/spaces from users you follow.
15 · 10 · 102
oh hello @TheBlokeAI I want to bookmark your 'Recent models' Collection on @huggingface 🔥 Well... you can now upvote Collections! and browse upvoted collections on your profile ❤️
2 · 9 · 47
Thanks again to @latitudesh for the loan of a beast 8xH100 server this week. I uploaded over 550 new repos, maybe my busiest week yet! Quanting is really resource intensive. Needs not only fast GPUs, but many CPUs, lots of disk, and a fast network. A server that ✅ all is v. rare!
14 · 16 · 242
🔥 Excited to introduce LMSYS-Chat-1M, a large-scale dataset of 1M real-world conversations with 25 cutting-edge LLMs! This dataset, collected from https://t.co/4LVJjx4pZi, offers insights into user interactions with LLMs and intriguing use cases. Link:
9 · 84 · 362
New feature alert in the @huggingface ecosystem! Flash Attention 2 is natively supported in huggingface transformers, and it works with training, PEFT, and quantization (GPTQ, QLoRA, LLM.int8). First `pip install flash-attn`, then pass `use_flash_attention_2=True` when loading the model!
8 · 101 · 508
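A minimal sketch of the loading pattern from the announcement, assuming `flash-attn` is installed and a CUDA GPU is available (the load is commented out since it downloads weights; the model id is just an example):

```python
import torch
from transformers import AutoModelForCausalLM

# Flash Attention 2 requires a half-precision compute dtype.
dtype = torch.float16

# model = AutoModelForCausalLM.from_pretrained(
#     "mistralai/Mistral-7B-v0.1",      # example model
#     torch_dtype=dtype,
#     use_flash_attention_2=True,       # needs: pip install flash-attn
#     device_map="auto",
# )
```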
@latitudesh Next up will be ExLlama2! (Starting in 2-3 days most likely.)
3 · 1 · 20
It's the AWQpocalypse! I've cranked the handle and AWQs are flooding HF. Why now? New library AutoAWQ provides turbo-charged Transformers-based inference, and vLLM now supports AWQ for multi-user inference serving. Making 8 at once on a beautiful 8xH100 server from @latitudesh
9 · 16 · 96
This is fantastic! Git clone was already dead for HF as far as I was concerned - I had my own hf_upload.py and hf_download.py scripts (wrapping HfApi) for fast, efficient transfers. But huggingface_hub v0.17 makes those redundant! I will be using this now. Awesome stuff, 🤗
Is `git clone` dead? It might be the case with the new huggingface_hub v0.17 release! Very excited to share our recent UX improvements to build Software 2.0! Let's explore together! 🤗 🧵
2 · 8 · 102
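The clone-free workflow described above boils down to two `huggingface_hub` calls. A sketch, with the transfers themselves commented out (the repo ids and filenames are placeholders):

```python
from huggingface_hub import HfApi, hf_hub_download

api = HfApi()

# Upload one file without ever cloning the repo:
# api.upload_file(
#     path_or_fileobj="model.safetensors",
#     path_in_repo="model.safetensors",
#     repo_id="your-user/your-model",   # hypothetical repo id
# )

# Fetch one file straight into the local cache:
# path = hf_hub_download(repo_id="some-user/some-model",
#                        filename="config.json")
```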
This new filter on @huggingface user profiles is very helpful, especially for checking whether @TheBlokeAI has quantized and released the latest trending models
4 · 5 · 53
Chronos 70B v2 release! Thanks to Pygmalion for generously providing the compute and @TheBlokeAI for quantizing the model. As usual, the model is optimized for chat, roleplay, and storywriting, and now includes vastly improved reasoning skills. https://t.co/P5NLl9fMSB
4 · 18 · 41
Just released by @PygmalionAI : Pygmalion 2, the sequel to one of the most popular models ever! And Mythalion, a new Gryphe merge! https://t.co/0KaHYdOZDz
https://t.co/gXCrWReZR1
https://t.co/yqKvMTg4hA
https://t.co/vHDFWJB92R
https://t.co/JuTKwy2H8k
https://t.co/jM2NsYrBUX
4 · 11 · 98
Meta's CodeLlama is here! https://t.co/aXzIb5wK7o 7B, 7B-Instruct, 7B-Python, 13B, 13B-Instruct, 13B-Python, 34B, 34B-Instruct, 34B-Python First time we've seen the 34B model I've got a couple of fp16s up: https://t.co/8jmNBTK8rb
https://t.co/2KnE0lbMFs More coming soon obvs
17 · 57 · 335
Transformers 4.32.0 now supports GPTQ models natively! Over the last couple of days I have updated 296 of my GPTQ repos to provide automatic support for this. It's awesome you can now load a GPTQ model directly in Transformers with only two lines of code!
LLMs just got faster and lighter with 🤗 Transformers x AutoGPTQ! You can now load your models from @huggingface with GPTQ quantization. Enjoy faster inference speed and lower memory usage than existing supported quantization schemes. Blogpost:
9 · 54 · 269