Konrad Staniszewski

@CStanKonrad

Followers: 107
Following: 57
Media: 2
Statuses: 28

PhD student at @UniWarszawski. Working on long-context LLMs.

Warsaw, Poland
Joined March 2023
@CStanKonrad
Konrad Staniszewski
1 month
RT @mic_nau: We wondered if off-policy RL could transfer to real robots on-par with on-policy PPO. Turns out it works surprisingly well!….
0
9
0
@CStanKonrad
Konrad Staniszewski
1 month
RT @PontiEdoardo: 🚀 By *learning* to compress the KV cache in Transformer LLMs, we can generate more tokens for the same compute budget.….
0
28
0
@CStanKonrad
Konrad Staniszewski
3 months
RT @p_nawrot: Sparse attention is one of the most promising strategies to unlock long-context processing and long generation reasoning in L….
0
112
0
@CStanKonrad
Konrad Staniszewski
4 months
In a moment, I am going to present our work SPLiCe at AAAI 2025 in room 119A. Thanks to @UniWarszawski, @IDEAS_NCBR, and co-authors @s_tworkowski, @S_Jaszczur, @yuzhaouoe, @hmichalewski, @PiotrRMilos and @LukeKucinski for the amazing work.
0
6
27
@CStanKonrad
Konrad Staniszewski
10 months
RT @gracjan_goral: 🚨New research alert!🚨 Ever wondered if AI can imagine how the world looks through others’ eyes? In our latest preprint:….
0
14
0
@CStanKonrad
Konrad Staniszewski
1 year
RT @AIatMeta: Introducing Meta Llama 3: the most capable openly available LLM to date. Today we’re releasing 8B & 70B models that deliver….
0
1K
0
@CStanKonrad
Konrad Staniszewski
1 year
RT @OlkoMateusz: Wondering about unraveling causal relationships? 💭 Why go all out when you can be strategic? 🚀 Say hello to GIT: the novel….
0
11
0
@CStanKonrad
Konrad Staniszewski
1 year
RT @p_nawrot: The memory in Transformers grows linearly with the sequence length at inference time. In SSMs it is constant, but often at t….
0
73
0
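The retweet above contrasts the linearly growing Transformer KV cache with the constant-size state of SSMs. A back-of-the-envelope sketch of that growth; the 7B-class configuration below (32 layers, 32 heads, head dim 128, fp16) is an illustrative assumption, not a figure from the thread:

```python
# Rough KV-cache size for a Transformer at inference time.
# Config values are illustrative assumptions for a 7B-class model.
def kv_cache_bytes(seq_len, layers=32, heads=32, head_dim=128, bytes_per_elem=2):
    # 2x for keys and values, stored per layer, per head, per position.
    return 2 * layers * heads * head_dim * bytes_per_elem * seq_len

for n in (4_096, 32_768, 262_144):
    print(f"{n:>7} tokens -> {kv_cache_bytes(n) / 2**30:.1f} GiB")
# ~0.5 MiB per token here, so the cache grows linearly with sequence length,
# while an SSM keeps a fixed-size recurrent state regardless of length.
```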
@CStanKonrad
Konrad Staniszewski
1 year
RT @CupiaBart: 🚀Excited to share our latest work on fine-tuning RL models! By integrating fine-tuning with knowledge retention methods, we'….
0
24
0
@CStanKonrad
Konrad Staniszewski
2 years
RT @S_Jaszczur: Introducing 🔥 MoE-Mamba 🔥 combining two exciting LLM techniques, Mixture of Experts and State Space Models. It matches Mamb….
0
29
0
@CStanKonrad
Konrad Staniszewski
2 years
RT @OlkoMateusz: Excited to be at #NeurIPS2023. Looking forward to discussions about causality and automated reasoning. I'll be presenting….
0
4
0
@CStanKonrad
Konrad Staniszewski
2 years
Thanks to all authors for the great work: @s_tworkowski, @MikolajPacek, @Yuhu_ai_, @hmichalewski, @PiotrRMilos, and supporters: @IDEAS_NCBR, the TPU Research Cloud program, and @UniWarszawski.
1
0
2
@CStanKonrad
Konrad Staniszewski
2 years
I will present Focused Transformer (FoT/LongLLaMA) at #NeurIPS2023. Poster session 5, Thu 14 Dec, 10:45 a.m. CST, Great Hall & Hall B1+B2 (level 1), Poster #326. Come to talk about scaling context to 256K. Check our paper: and models:
[Tweet image]
1
5
13
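For readers who want to try the models mentioned above, a minimal loading sketch; the checkpoint id "syzymon/long_llama_3b" and the custom-code loading options are assumptions based on the public LongLLaMA release, since the tweet's own paper and model links are not reproduced here:

```python
# Minimal sketch of loading a LongLLaMA (Focused Transformer) checkpoint.
# The repo id "syzymon/long_llama_3b" and the loading options are assumptions
# based on the public LongLLaMA release, not links from the tweet above.
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_3b")
model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_3b",
    torch_dtype=torch.float32,
    trust_remote_code=True,  # the FoT memory layers ship as custom code in the repo
)

inputs = tokenizer("The Focused Transformer extends context by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```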
@CStanKonrad
Konrad Staniszewski
2 years
RT @Simontwice2: ✨ Introducing 🍹 Mixture of Tokens 🍹, a stable alternative to existing Mixture of Experts techniques for LLMs, providing si….
0
32
0
@CStanKonrad
Konrad Staniszewski
2 years
There is still room for improvement. The model can potentially benefit from:
* more high-quality chat data
* debugging data
* long-context instruction tuning data
0
0
6
@CStanKonrad
Konrad Staniszewski
2 years
Datasets used for instruction tuning:
* TIGER-Lab/MathInstruct 🐯
* Open-Orca/OpenOrca
* zetavg/ShareGPT-Processed
Thanks to my awesome collaborators @s_tworkowski, @MikolajPacek, @PiotrRMilos, @Yuhu_ai_ and @hmichalewski.
1
0
8
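A quick way to inspect the three instruction-tuning sources listed above with the Hugging Face datasets library; the "train" split name is an assumption, and each corpus keeps its own schema, so examples are typically mapped into a shared prompt/response format before fine-tuning:

```python
# Load and inspect the instruction-tuning sources named in the tweet above.
# Assumes each dataset exposes a "train" split; schemas differ per source.
from datasets import load_dataset

sources = [
    "TIGER-Lab/MathInstruct",
    "Open-Orca/OpenOrca",
    "zetavg/ShareGPT-Processed",
]
for name in sources:
    ds = load_dataset(name, split="train")
    print(f"{name}: {len(ds):,} examples, columns = {ds.column_names}")
```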
@CStanKonrad
Konrad Staniszewski
2 years
PoT allows the model to express the solution as Python code; in the CoT setting, the calculations are done manually by the model.
📔 Colab with paper and code QA:
🤗 HF checkpoint:
>_ Code:
1
1
23
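To make the PoT vs. CoT distinction concrete, a toy GSM8K-style example; the question and both "model outputs" are invented for illustration:

```python
# Chain-of-Thought (CoT): the model does the arithmetic itself, in text.
# Program-of-Thoughts (PoT): the model emits Python; the interpreter computes.
question = "A shop sells pens in packs of 12 for $3. How much do 60 pens cost?"

cot_answer = "60 pens is 60 / 12 = 5 packs; 5 packs * $3 = $15. Answer: 15"

pot_program = """
packs = 60 / 12
answer = packs * 3
"""

scope = {}
exec(pot_program, scope)  # execute the model-generated program
print("CoT:", cot_answer)
print("PoT answer:", scope["answer"])  # -> 15.0
```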
@CStanKonrad
Konrad Staniszewski
2 years
🎇 Introducing LongLLaMA-Code 7B Instruct 🦙! 🎇
A step towards an open-source alternative to Claude 2. Runs in 🆓 Colab (8-bit). 🗨 Answers questions about 📑 papers and >_ code.
SOTA 7B reasoning:
🎓 GSM8K: 65% with 🐍 PoT 0-shot, 42% in the standard CoT 8-shot setting
>_ HumanEval: 37%
[Tweet image]
3
42
254
@CStanKonrad
Konrad Staniszewski
2 years
RT @s_tworkowski: ✨Announcing LongLLaMA-Code 7B!✨ Have you wondered how GPT3.5 obtained its capability? Are base models of code better rea….
0
55
0
@CStanKonrad
Konrad Staniszewski
2 years
RT @s_tworkowski: 🎇Introducing LongLLaMA-Instruct 32K!🎇 Inspired by @p_nawrot #nanoT5, we fine-tune LongLLaMA- on a *single GPU* for ~48h….
0
66
0