
Konrad Staniszewski (@CStanKonrad)
Followers: 107 · Following: 57 · Media: 2 · Statuses: 28
PhD student at @UniWarszawski. Working on long-context LLMs.
Warsaw, Poland · Joined March 2023
RT @mic_nau: We wondered if off-policy RL could transfer to real robots on par with on-policy PPO. Turns out it works surprisingly well! …
RT @PontiEdoardo: 🚀 By *learning* to compress the KV cache in Transformer LLMs, we can generate more tokens for the same compute budget. …
RT @p_nawrot: Sparse attention is one of the most promising strategies to unlock long-context processing and long generation reasoning in L…
In a moment, I am going to present our work SPLiCe at AAAI 2025 in room 119A. Thanks to @UniWarszawski, @IDEAS_NCBR, and co-authors @s_tworkowski, @S_Jaszczur, @yuzhaouoe, @hmichalewski, @PiotrRMilos, and @LukeKucinski for the amazing work.
RT @gracjan_goral: 🚨New research alert!🚨 Ever wondered if AI can imagine how the world looks through others’ eyes? In our latest preprint: …
RT @AIatMeta: Introducing Meta Llama 3: the most capable openly available LLM to date. Today we’re releasing 8B & 70B models that deliver…
RT @OlkoMateusz: Wondering about unraveling causal relationships? 💭 Why go all out when you can be strategic? 🚀 Say hello to GIT: the novel…
RT @p_nawrot: The memory in Transformers grows linearly with the sequence length at inference time. In SSMs it is constant, but often at t…
RT @CupiaBart: 🚀Excited to share our latest work on fine-tuning RL models! By integrating fine-tuning with knowledge retention methods, we'…
RT @S_Jaszczur: Introducing 🔥 MoE-Mamba 🔥, combining two exciting LLM techniques, Mixture of Experts and State Space Models. It matches Mamb…
RT @OlkoMateusz: Excited to be at #NeurIPS2023. Looking forward to discussions about causality and automated reasoning. I'll be presenting…
Thanks to all authors for the great work: @s_tworkowski, @MikolajPacek, @Yuhu_ai_, @hmichalewski, @PiotrRMilos, and supporters: @IDEAS_NCBR, the TPU Research Cloud program, and @UniWarszawski.
I will present Focused Transformer (FoT/LongLLaMA) at #NeurIPS2023. Poster session 5, Thu 14 Dec, 10:45 a.m. CST, Great Hall & Hall B1+B2 (level 1), Poster #326. Come talk about scaling context to 256K. Check our paper: … and models: …
RT @Simontwice2: ✨ Introducing 🍹 Mixture of Tokens 🍹, a stable alternative to existing Mixture of Experts techniques for LLMs, providing si…
Datasets used for instruction tuning:
* TIGER-Lab/MathInstruct 🐯
* Open-Orca/OpenOrca
* zetavg/ShareGPT-Processed
Thanks to my awesome collaborators @s_tworkowski, @MikolajPacek, @PiotrRMilos, @Yuhu_ai_, and @hmichalewski.
RT @s_tworkowski: ✨Announcing LongLLaMA-Code 7B!✨ Have you wondered how GPT3.5 obtained its capability? Are base models of code better rea…
RT @s_tworkowski: 🎇Introducing LongLLaMA-Instruct 32K!🎇 Inspired by @p_nawrot #nanoT5, we fine-tune LongLLaMA- on a *single GPU* for ~48h…