Imanol Schlag Profile
Imanol Schlag

@ImanolSchlag

Followers: 117 · Following: 19 · Media: 0 · Statuses: 8

Apertus Lead. AI Research Scientist at the ETH AI Center.

Switzerland
Joined December 2015
@ImanolSchlag
Imanol Schlag
2 months
Developing Apertus, I learned how difficult evals can be. So it's always good to have a second opinion. This recent work evaluated our fully transparent and compliant Apertus 8B model. Here, Apertus is third, beating Llama, Mistral, Olmo, and others!
arxiv.org
We present Llama-GENBA-10B, a trilingual foundation model addressing English-centric bias in large language models. Built on Llama 3.1-8B and scaled to 10B parameters, Llama-GENBA-10B is...
@ImanolSchlag
Imanol Schlag
2 months
Would you like to know the details? Well, you can! Today, we published the first official version of our technical report with a total of 119 pages covering all sorts of details that you will find important. Which part is your favorite or least favorite? https://t.co/iKF6bdXoVU
arxiv.org
We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual...
@ImanolSchlag
Imanol Schlag
2 months
Our Apertus 8B model performs very well on our benchmarks, outperforming popular big-tech models such as Llama 3.1-8B and GPT-OSS-20B. Furthermore, our 70B model is among the largest developed by a public institution and is competitive with lesser-known open models of similar size.
@ImanolSchlag
Imanol Schlag
2 months
Can we develop AI responsibly? Yes, and we prove it by example. Two weeks ago, we released our Apertus models, which set a new standard in transparency, inclusivity, and compliance while achieving competitive performance. 🧵
@ImanolSchlag
Imanol Schlag
7 months
We released our first work on the "compliance gap" when pre-training LLMs. We find that AI opt-outs have a relatively small effect on performance. 👇
@dyfan22
Dongyang Fan
7 months
🚨 AI is in legal hot water. Lawsuits over copyrighted training data are mounting — and content owners are pulling out fast. Top opt-outs? 📰 News & Media 🔬 Science & Tech 🏥 Health Info But here’s the thing: How much do those datasets actually matter for model performance? 🧵👇
@rupspace
Rupesh Srivastava
10 months
Wrote a post about Highway networks, ResNets, and the subtleties of architecture comparisons.
@robert_csordas
Csordás Róbert
11 months
Come visit our poster "MoEUT: Mixture-of-Experts Universal Transformers" on Friday at 4:30 pm in East Exhibit Hall A-C #1907 at #NeurIPS2024. With Kazuki Irie, @SchmidhuberAI, @ChrisGPotts and @chrmanning.
@arankomatsuzaki
Aran Komatsuzaki
1 year
MoEUT: Mixture-of-Experts Universal Transformers. Their UT model, for the first time, slightly outperforms standard Transformers on LM tasks such as BLiMP and PIQA, while using significantly less compute and memory. repo: https://t.co/QudGYNLDBb abs: https://t.co/CmvJNRBtcT