1/n Introducing Mamba-Chat: the best non-transformer based chat model!
Mamba-Chat is mainly based on
@tri_dao
's and
@_albertgu
's awesome work on state-space models and Mamba. We've just added some fine-tuning on top.
Code:
2023 recap:
1. started a company with
@hkonsti_
2. got into
@ycombinator
, dropped out of university
3. spent 3 months in SF, met incredibly smart and ambitious people
4. moved to Berlin, turned 21, now live on my own and work full-time
This was clearly the highest-impact year
🚨 Exciting News:
@haven_run
is part of
@ycombinator
!
As of now, developers that want to train & deploy LLMs on their own infrastructure have to deal with ML code, CUDA, and manual resource scaling.
We want to make it as easy as calling the OpenAI API
Introducing Haven's finetuning platform: You can now train LLMs with LoRA adapters, test them with <1s cold starts, and export them to Huggingface to run on your own terms.
Below you can find a demo video - when signing up to , you get $5 in credits :)
After four months in Berlin, we are back in the arena!
Feel free to DM if you want to grab a coffee. My goal is to speak to as many fellow devs and founders as possible :)
We're live on Launch YC (
@ycombinator
) with Haven's managed offering!
Haven works like Replicate, but in your private GCP / AWS environment. You can select any LLM - Haven will deploy and scale it on a Kubernetes cluster running on your infrastructure.
Was great to share what we've been building over the last few weeks at AI Tinkerers Berlin. Thanks for the invite!
@con5di
@vietdle
This weekend, we'll run final tests - afterwards, we'll launch something exciting for everyone building with open source LLMs. Stay tuned🫡
I'm on my way to EMNLP! 🇦🇪
You can find me at our poster on "Differentially Private Language Models for Secure Data Sharing" - I'm also available for internship opportunities in 2023.
Also, feel free to DM if you'd like to explore Abu Dhabi or go to the beach on Tuesday!
Our work "Differentially Private Language Models for Secure Data Sharing", which proposes a simple, but highly effective method to generate textual datasets with DP guarantees, has been accepted at EMNLP 2022 🥳
w/
@ZhijingJin
, Benjamin Weggenmann,
@mrinmayasachan
,
@bschoelkopf
2/n When I saw the release of Mamba, I got super excited, so I basically took a day off to fine-tune it.
The original implementation was very easy to work with. I just had to change a few lines of code of
@huggingface
's Trainer class, and was immediately able to start training.
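For reference, the training change boils down to using the standard causal language-modeling objective. Here's a minimal numpy sketch of that loss - illustrative only, not the actual mamba-chat Trainer code:

```python
import numpy as np

def causal_lm_loss(logits, labels, ignore_index=-100):
    """Standard next-token cross-entropy: position t predicts token t+1."""
    logits = logits[:, :-1, :]          # drop the last position's prediction
    labels = labels[:, 1:]              # drop the first token as a target
    # numerically stable log-softmax over the vocab dimension
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    mask = labels != ignore_index       # ignore padding positions
    idx = np.where(mask, labels, 0)
    token_ll = np.take_along_axis(log_probs, idx[..., None], axis=-1)[..., 0]
    return -(token_ll * mask).sum() / mask.sum()
```

With uniform logits this returns log(vocab_size), and it goes to zero as the model becomes confident in the correct next tokens.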
Just moved to SF - exciting times ahead! Can’t wait to share what I’m working on here.
PS: If you’re in the Bay Area and want to have a coffee, feel free to DM me :)
4/n In terms of results, I'm actually positively surprised about the model! Even though it just has 2.8B parameters and the base model was only trained on the Pile, it performs quite well. Chatting with it reminds me of chatting with Alpaca when it was released in March.
🦙 Many of our users have asked us for help with fine-tuning Llama V2 on their own chat datasets. That's why we've built llamatune, which allows you to do so without writing any code!
Check it out on GitHub:
1/n With membership inference attacks (MIA), one can detect the training data of an LLM and compromise privacy! 🔍
While SOTA attacks need prior knowledge about the training distribution, our new paper shows that this is not necessary - we thus need to rethink our threat model!
Introducing Haven v0.2 🔮
Haven now lets you deploy almost any LLM in your own VPC! We've added support for custom models from
@huggingface
, and users are already deploying their own fine-tuned models!
Setting up a production ready LLM server takes just a few lines of code:
3/n We trained on 16k samples of the 200k filtered Ultrachat dataset, on a single A100 (40GB) GPU.
Please note that the implementation is definitely not optimized - this was just a fun hack. I'm sure you can get way faster and more efficient training with some optimizations.
Still can’t wrap my head around the fact that there are actually ChatGPT detectors making a bunch of money by selling to schools. I’m really glad that I’m not a student anymore
Mira Murati reached out to me in 2022 for a one-hour zoom call. Sam Altman never essayed any such contact. Also, I don't think Murati has made any jokes about how funny it would be if the world ended. I'm tentatively 8.5% more cheerful about OpenAI going forward.
@tallinzen
On top of that, most exam dates are during the holidays, not within the actual semester. This makes it virtually impossible to participate in structured summer internship programs outside of the country
@cloud11665
If you want to host fine-tuned models cheaply, we’ll have something for you in a few weeks :) we hotswap lora adapters, so pricing is usage-based, and cold start times are just as long as it takes to load an adapter onto the GPU (i.e. ~1-2s)
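For context on why the swap is cheap: in LoRA, the effective weight is W + (α/r)·BA, so only the tiny A and B matrices differ between fine-tuned models while the base weights stay shared. A minimal numpy sketch (dimensions made up for illustration):

```python
import numpy as np

# LoRA: effective weight is W + (alpha / r) * B @ A, so hotswapping a
# fine-tuned model only means loading the small A and B matrices.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16   # illustrative dimensions

W = rng.standard_normal((d_out, d_in))      # frozen base weight (shared)
A = rng.standard_normal((r, d_in)) * 0.01   # adapter-specific, tiny
B = np.zeros((d_out, r))                    # zero init: adapter starts as a no-op

def forward(x, W, A, B, scale=alpha / r):
    # base path + low-rank adapter path
    return x @ W.T + scale * (x @ A.T) @ B.T

x = rng.standard_normal((1, d_in))
# with B = 0 the adapter contributes nothing, so output equals the base model
assert np.allclose(forward(x, W, A, B), x @ W.T)
```

Here the adapter holds 1,024 parameters against 4,096 in the base weight, and the gap only widens at real model sizes - which is what makes per-adapter cold starts so fast.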
@theaiengineerco
I optimized for grad school applications, that’s why I was able to churn out a lot of papers that were good, but not great.
I don't think it's possible to publish that much when you actually work on high-impact, high-risk stuff.
We've just released v0.1 - to get started, you can head to our repository and deploy Haven in a couple of minutes.
Alternatively, write me a DM if you have questions or want me to personally onboard you :)
If reputational concerns really stop companies from building products with huge potential such as ChatGPT, then I feel very optimistic about the future impact that research in "trustworthy" ML/NLP (privacy, bias, adversarial robustness, etc.) can have
4/n Obviously, the assumption that an attacker has access to i.i.d. data in order to train a reference model is not always realistic. Therefore, we decided to investigate whether attacks can be accurate without access to such data.
People say that open source LLMs can't compete with OpenAI and haven't even tried the best open models.
@huggingface
's zephyr-7b-beta is incredible, and also works great for finetuning. Try that, and then imagine what's possible when
@MistralAI
releases even better base models
2/n Membership inference attacks exploit that models tend to exhibit higher confidence on training samples.
Therefore, a common baseline attack simply classifies a data sample as training data if the loss under the model's distribution is below a certain threshold.
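A sketch of this baseline attack, with purely illustrative loss values (a real attack would measure these under the target model):

```python
import numpy as np

def loss_attack(losses, threshold):
    """Baseline MIA: flag a sample as training data if its loss is low."""
    return losses < threshold

# purely illustrative numbers - members tend to have lower loss:
member_losses = np.array([1.2, 1.5, 1.1, 1.8])      # seen during training
nonmember_losses = np.array([2.9, 3.4, 2.6, 3.1])   # unseen
preds_members = loss_attack(member_losses, threshold=2.2)
preds_nonmembers = loss_attack(nonmember_losses, threshold=2.2)
```

In this toy setting the threshold separates the two groups perfectly; in practice the loss distributions overlap heavily, which is exactly why the attack needs regularization.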
3/n This attack is not highly accurate in practice - therefore, existing SOTA attacks regularize this loss by comparing it to the loss of a reference model trained on data from the same distribution as the training set.
Happy to share that the projects I contributed to while interning at
@SAP
Security Research have been published!
1) DP-VAE: Human-Readable Text Anonymization for Online Reviews with Differentially Private Variational Autoencoders (
@TheWebConf
2022, )
/1
5/n We found that this is the case! Concretely, we designed an alternative regularization function that compares the target model's confidence for a given sample to its confidence for perturbed samples that were generated through word replacements.
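A sketch of the idea - the loss numbers below are made up for illustration; the real attack computes them under the target model:

```python
import numpy as np

def neighbourhood_score(target_loss, neighbour_losses):
    """Compare the model's loss on a sample to its average loss on perturbed
    'neighbours' generated via word replacements. Training samples tend to
    score far below their neighbours; the score is then thresholded."""
    return target_loss - np.mean(neighbour_losses)

# purely illustrative numbers:
# a training sample - the model is much more confident on it than on its neighbours
member_score = neighbourhood_score(1.0, [2.4, 2.6, 2.5])
# a non-member - roughly as (un)likely as its neighbours
nonmember_score = neighbourhood_score(2.5, [2.6, 2.4, 2.5])
```

The key point is that no reference model (and hence no i.i.d. training-distribution data) is needed: the sample's own perturbations play that role.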
Do you consider it bad practice when investors book calls through the "Book a Call" button on your landing page?
Our button doesn’t explicitly state that it’s for customer demos only, but I thought that this would be clear
We furthermore show that simply training GPT-2 to generate paraphrases and adjusting the softmax temperature to balance privacy and utility grants better protection against deanonymization attacks, yields more fluent text, and is in fact a differentially private mechanism
/end
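For intuition on the temperature knob (a generic softmax sketch, not the paper's code): lower temperature concentrates probability on the most likely wording, preserving utility, while higher temperature flattens the distribution toward uniform, adding the randomness that privacy needs.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over next-token logits."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()                     # numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [3.0, 1.0, 0.0]                  # illustrative next-token logits
sharp = softmax(logits, temperature=0.5)  # sharper: stays close to the source text
base = softmax(logits)                    # temperature = 1
flat = softmax(logits, temperature=2.0)   # flatter: more randomness, more privacy
```

The top token's probability shrinks monotonically as temperature rises, which is the dial the paper tunes to trade utility against privacy.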
Multi-Node Training 🤝 Haven LLM Fine-Tuning Platform
If your company wants to train LLMs with more than 8 A100s, feel free to reach out! Also, do take a look at this wonderful graphic I made for LinkedIn
2) The Limits of Word Level Differential Privacy (Findings of
@naaclmeeting
2022, )
Here, we examine word embedding perturbations for private text sharing and find strong limitations w.r.t. the mathematical privacy guarantee and language quality.
/3
@HJCH0
@cloud11665
Having all adapters on the GPU limits how many adapters we can serve for a single base model (above a few hundred adapters you get CUDA OOM errors like this).
If we kept them all loaded, we couldn't afford usage-based pricing, since most users don't query their models very often.