Tom Jobbins
@TheBlokeAI
Followers: 15K
Following: 259
Media: 17
Statuses: 336
My Hugging Face repos: https://t.co/yh7J4DFGTc Discord server: https://t.co/5h6rGsGfBx Patreon: https://t.co/yfQwFggGtx
UK
Joined July 2010
FYI to anyone using @MistralAI's Mixtral for long context tasks -- you can get even better performance by disabling sliding window attention (setting it to your max context length) config.sliding_window = 32768
18 · 38 · 416
Transformers now supports Mixtral GPTQs and I've updated my READMEs accordingly. It was awesome working with @_marcsun and @younesbelkada of @huggingface on this! Credit to LaaZa for coding the AutoGPTQ quant and inference implementation which enabled me to get GPTQs out fast!
Announcing 4-bit Mixtral 8x7B on 🤗 Transformers! Run the new Mistral MoE with minimal performance degradation on your local computer (24 GB) 🔥 Stay tuned as more quants are coming soon using AWQ. We are also looking into sparsification with @Tim_Dettmers
https://t.co/Pu4XfpYOmW
13 · 20 · 128
@TheBlokeAI joined me to share his work in the open-source AI space - don't miss it! happening right now server link: https://t.co/C21orV2hzx (see the general channel or events channel for google meet link)
1 · 1 · 24
Blazing fast text generation using AWQ and fused modules! Up to 3x speedup compared to native fp16 that you can use right now on any models supported by @TheBlokeAI Simply pass an `AwqConfig` with `do_fuse=True` to the `from_pretrained` method! https://t.co/4bbDGPebsC
5 · 21 · 159
It's been awesome to see Transformers getting support for more and more quantisation methods. And I've loved collaborating with @younesbelkada and @huggingface again! All my AWQ uploads now support Transformers. READMEs will update soon to show a Transformers Python example.
A few months ago, researchers from the MIT HAN Lab released AWQ. The method is now supported in the 🤗 transformers library! As simple as 1. `pip install autoawq` (or install the llm-awq kernels) and 2. call `from_pretrained`. Great work from the MIT HAN Lab folks, Casper Hansen & @TheBlokeAI 🧵
3 · 24 · 154
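The two steps in the quoted tweet can be sketched as follows; the repo id is one example of an AWQ checkpoint, and the load itself (commented out) requires `autoawq` and a GPU:

```python
# Step 1: pip install autoawq
# Step 2: load any AWQ checkpoint exactly like a normal model.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "TheBloke/zephyr-7B-beta-AWQ"  # example AWQ repo

# model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")
# tokenizer = AutoTokenizer.from_pretrained(repo)
```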
Have you heard about Chirper Worlds?
https://t.co/90XxUPrxxW just launched its revolutionary new software feature, "Worlds." This feature allows users to create their own virtual worlds and play god with AI-driven bots. To learn more, check out my podcast about "Worlds" here: https://t.co/TGfX9jNBzm
3 · 8 · 27
🤗 Are you interested in a "Follow" feature on the Hugging Face Hub? ➡️ This will allow you to see new models/datasets/spaces from users you follow.
15 · 10 · 102
oh hello @TheBlokeAI I want to bookmark your 'Recent models' Collection on @huggingface 🔥 Well... you can now upvote Collections! and browse upvoted collections on your profile ❤️
2 · 9 · 47
Thanks again to @latitudesh for the loan of a beast 8xH100 server this week. I uploaded over 550 new repos, maybe my busiest week yet! Quanting is really resource intensive. Needs not only fast GPUs, but many CPUs, lots of disk, and a fast network. A server that ✅ all is v. rare!
14 · 16 · 242
🔥 Excited to introduce LMSYS-Chat-1M, a large-scale dataset of 1M real-world conversations with 25 cutting-edge LLMs! This dataset, collected from https://t.co/4LVJjx4pZi, offers insights into user interactions with LLMs and intriguing use cases. Link:
9 · 84 · 362
New feature alert in the @huggingface ecosystem! Flash Attention 2 is natively supported in huggingface transformers, and it works with training, PEFT, and quantization (GPTQ, QLoRA, LLM.int8). First `pip install flash-attn`, then pass `use_flash_attention_2=True` when loading the model!
8 · 101 · 508
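A minimal sketch of the loading pattern from the announcement, assuming `flash-attn` is installed and a CUDA GPU is available (the load is commented out since it downloads weights; the model id is just an example):

```python
import torch
from transformers import AutoModelForCausalLM

# Flash Attention 2 requires a half-precision compute dtype.
dtype = torch.float16

# model = AutoModelForCausalLM.from_pretrained(
#     "mistralai/Mistral-7B-v0.1",      # example model
#     torch_dtype=dtype,
#     use_flash_attention_2=True,       # needs: pip install flash-attn
#     device_map="auto",
# )
```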
@latitudesh Next up will be ExLlama2! (Starting in 2-3 days most likely.)
3 · 1 · 20
It's the AWQpocalypse! I've cranked the handle and AWQs are flooding HF. Why now? New library AutoAWQ provides turbo-charged Transformers-based inference, and vLLM now supports AWQ for multi-user inference serving. Making 8 at once on a beautiful 8xH100 server from @latitudesh
9 · 16 · 96
This is fantastic! Git clone was already dead for HF as far as I was concerned - I had my own hf_upload.py and hf_download.py scripts (wrapping HfApi) for fast, efficient transfers. But huggingface_hub v0.17 makes those redundant! I will be using this now. Awesome stuff, 🤗
Is `git clone` dead? It might be the case with the new huggingface_hub v0.17 release! Very excited to share our recent UX improvements to build Software 2.0! Let's explore together! 🤗 🧵
2 · 8 · 102
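The clone-free workflow described above boils down to two `huggingface_hub` calls. A sketch, with the transfers themselves commented out (the repo ids and filenames are placeholders):

```python
from huggingface_hub import HfApi, hf_hub_download

api = HfApi()

# Upload one file without ever cloning the repo:
# api.upload_file(
#     path_or_fileobj="model.safetensors",
#     path_in_repo="model.safetensors",
#     repo_id="your-user/your-model",   # hypothetical repo id
# )

# Fetch one file straight into the local cache:
# path = hf_hub_download(repo_id="some-user/some-model",
#                        filename="config.json")
```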
This new filter on @huggingface user profiles is very helpful, especially for checking whether @TheBlokeAI has quantized and released the latest trending models
4 · 5 · 53
Chronos 70B v2 release! Thanks to Pygmalion for generously providing the compute and @TheBlokeAI for quantizing the model. As usual, the model is optimized for chat, roleplay, and storywriting, and now includes vastly improved reasoning skills. https://t.co/P5NLl9fMSB
4 · 18 · 41
Just released by @PygmalionAI : Pygmalion 2, the sequel to one of the most popular models ever! And Mythalion, a new Gryphe merge! https://t.co/0KaHYdOZDz
https://t.co/gXCrWReZR1
https://t.co/yqKvMTg4hA
https://t.co/vHDFWJB92R
https://t.co/JuTKwy2H8k
https://t.co/jM2NsYrBUX
4 · 11 · 98
Meta's CodeLlama is here! https://t.co/aXzIb5wK7o 7B, 7B-Instruct, 7B-Python, 13B, 13B-Instruct, 13B-Python, 34B, 34B-Instruct, 34B-Python First time we've seen the 34B model I've got a couple of fp16s up: https://t.co/8jmNBTK8rb
https://t.co/2KnE0lbMFs More coming soon obvs
17 · 57 · 335
Transformers 4.32.0 now supports GPTQ models natively! Over the last couple of days I have updated 296 of my GPTQ repos to provide automatic support for this. It's awesome you can now load a GPTQ model directly in Transformers with only two lines of code!
LLMs just got faster and lighter with 🤗 Transformers x AutoGPTQ! You can now load your models from @huggingface with GPTQ quantization. Enjoy faster inference speed and lower memory usage than existing supported quantization schemes. Blogpost:
9 · 54 · 269