Konrad Staniszewski

@CStanKonrad

Followers: 107
Following: 57
Media: 2
Statuses: 28

PhD student at @UniWarszawski. Working on long-context LLMs.

Warsaw, Poland
Joined March 2023
@CStanKonrad
Konrad Staniszewski
1 month
RT @mic_nau: We wondered if off-policy RL could transfer to real robots on-par with on-policy PPO. Turns out it works surprisingly well!….
0
9
0
@CStanKonrad
Konrad Staniszewski
1 month
RT @PontiEdoardo: 🚀 By *learning* to compress the KV cache in Transformer LLMs, we can generate more tokens for the same compute budget.….
0
28
0
@CStanKonrad
Konrad Staniszewski
3 months
RT @p_nawrot: Sparse attention is one of the most promising strategies to unlock long-context processing and long generation reasoning in L….
0
112
0
@CStanKonrad
Konrad Staniszewski
4 months
In a moment, I am going to present our work SPLiCe at AAAI 2025 in room 119A. Thanks to @UniWarszawski, @IDEAS_NCBR, and co-authors @s_tworkowski, @S_Jaszczur, @yuzhaouoe, @hmichalewski, @PiotrRMilos and @LukeKucinski for the amazing work.
0
6
27
@CStanKonrad
Konrad Staniszewski
10 months
RT @gracjan_goral: 🚨New research alert!🚨 Ever wondered if AI can imagine how the world looks through others’ eyes? In our latest preprint:….
0
14
0
@CStanKonrad
Konrad Staniszewski
1 year
RT @AIatMeta: Introducing Meta Llama 3: the most capable openly available LLM to date. Today we’re releasing 8B & 70B models that deliver….
0
1K
0
@CStanKonrad
Konrad Staniszewski
1 year
RT @OlkoMateusz: Wondering about unraveling causal relationships? 💭 Why go all out when you can be strategic? 🚀 Say hello to GIT: the novel….
0
11
0
@CStanKonrad
Konrad Staniszewski
1 year
RT @p_nawrot: The memory in Transformers grows linearly with the sequence length at inference time. In SSMs it is constant, but often at t….
0
73
0
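The retweet above contrasts the linearly growing Transformer KV cache with the constant-size state of SSMs. A back-of-the-envelope sketch of that growth; the 7B-class configuration below (32 layers, 32 heads, head dim 128, fp16) is an illustrative assumption, not a figure from the thread:

```python
# Rough KV-cache size for a Transformer at inference time.
# Config values are illustrative assumptions for a 7B-class model.
def kv_cache_bytes(seq_len, layers=32, heads=32, head_dim=128, bytes_per_elem=2):
    # 2x for keys and values, stored per layer, per head, per position.
    return 2 * layers * heads * head_dim * bytes_per_elem * seq_len

for n in (4_096, 32_768, 262_144):
    print(f"{n:>7} tokens -> {kv_cache_bytes(n) / 2**30:.1f} GiB")
# ~0.5 MiB per token here, so the cache grows linearly with sequence length,
# while an SSM keeps a fixed-size recurrent state regardless of length.
```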
@CStanKonrad
Konrad Staniszewski
1 year
RT @CupiaBart: 🚀Excited to share our latest work on fine-tuning RL models! By integrating fine-tuning with knowledge retention methods, we'….
0
24
0
@CStanKonrad
Konrad Staniszewski
2 years
RT @S_Jaszczur: Introducing 🔥 MoE-Mamba 🔥 combining two exciting LLM techniques, Mixture of Experts and State Space Models. It matches Mamb….
0
29
0
@CStanKonrad
Konrad Staniszewski
2 years
RT @OlkoMateusz: Excited to be at #NeurIPS2023. Looking forward to discussions about causality and automated reasoning. I'll be presenting….
0
4
0
@CStanKonrad
Konrad Staniszewski
2 years
Thanks to all authors for the great work: @s_tworkowski, @MikolajPacek, @Yuhu_ai_, @hmichalewski, @PiotrRMilos, and supporters: @IDEAS_NCBR, the TPU Research Cloud program, and @UniWarszawski.
1
0
2
@CStanKonrad
Konrad Staniszewski
2 years
I will present Focused Transformer (FoT/LongLLaMA) at #NeurIPS2023. Poster session 5, Thu 14 Dec, 10:45 a.m. CST, Great Hall & Hall B1+B2 (level 1), Poster #326. Come to talk about scaling context to 256K. Check our paper: and models:
[Tweet image]
1
5
13
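For readers who want to try the models mentioned above, a minimal loading sketch; the checkpoint id "syzymon/long_llama_3b" and the custom-code loading options are assumptions based on the public LongLLaMA release, since the tweet's own paper and model links are not reproduced here:

```python
# Minimal sketch of loading a LongLLaMA (Focused Transformer) checkpoint.
# The repo id "syzymon/long_llama_3b" and the loading options are assumptions
# based on the public LongLLaMA release, not links from the tweet above.
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_3b")
model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_3b",
    torch_dtype=torch.float32,
    trust_remote_code=True,  # the FoT memory layers ship as custom code in the repo
)

inputs = tokenizer("The Focused Transformer extends context by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```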
@CStanKonrad
Konrad Staniszewski
2 years
RT @Simontwice2: ✨ Introducing 🍹 Mixture of Tokens 🍹, a stable alternative to existing Mixture of Experts techniques for LLMs, providing si….
0
32
0
@CStanKonrad
Konrad Staniszewski
2 years
There is still room for improvement. The model can potentially benefit from:
* more high-quality chat data
* debugging data
* long-context instruction tuning data
0
0
6
@CStanKonrad
Konrad Staniszewski
2 years
Datasets used for instruction tuning:
* TIGER-Lab/MathInstruct 🐯
* Open-Orca/OpenOrca
* zetavg/ShareGPT-Processed
Thanks to my awesome collaborators @s_tworkowski, @MikolajPacek, @PiotrRMilos, @Yuhu_ai_ and @hmichalewski.
1
0
8
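A quick way to inspect the three instruction-tuning sources listed above with the Hugging Face datasets library; the "train" split name is an assumption, and each corpus keeps its own schema, so examples are typically mapped into a shared prompt/response format before fine-tuning:

```python
# Load and inspect the instruction-tuning sources named in the tweet above.
# Assumes each dataset exposes a "train" split; schemas differ per source.
from datasets import load_dataset

sources = [
    "TIGER-Lab/MathInstruct",
    "Open-Orca/OpenOrca",
    "zetavg/ShareGPT-Processed",
]
for name in sources:
    ds = load_dataset(name, split="train")
    print(f"{name}: {len(ds):,} examples, columns = {ds.column_names}")
```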
@CStanKonrad
Konrad Staniszewski
2 years
PoT allows the model to express the solution as Python code; in the CoT setting, the calculations are done manually by the model.
📔 Colab with paper and code QA:
🤗 HF checkpoint:
>_ Code:
1
1
23
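To make the PoT vs. CoT distinction concrete, a toy GSM8K-style example; the question and both "model outputs" are invented for illustration:

```python
# Chain-of-Thought (CoT): the model does the arithmetic itself, in text.
# Program-of-Thoughts (PoT): the model emits Python; the interpreter computes.
question = "A shop sells pens in packs of 12 for $3. How much do 60 pens cost?"

cot_answer = "60 pens is 60 / 12 = 5 packs; 5 packs * $3 = $15. Answer: 15"

pot_program = """
packs = 60 / 12
answer = packs * 3
"""

scope = {}
exec(pot_program, scope)  # execute the model-generated program
print("CoT:", cot_answer)
print("PoT answer:", scope["answer"])  # -> 15.0
```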
@CStanKonrad
Konrad Staniszewski
2 years
🎇 Introducing LongLLaMA-Code 7B Instruct 🦙! 🎇
A step towards an open-source alternative to Claude 2. Runs in 🆓 Colab (8-bit). 🗨 Answers questions about 📑 papers and >_ code.
SOTA 7B reasoning:
🎓 GSM8K: 65% with 🐍 PoT 0-shot, 42% in the standard CoT 8-shot setting
>_ HumanEval: 37%
[Tweet image]
3
42
254
@CStanKonrad
Konrad Staniszewski
2 years
RT @s_tworkowski: ✨Announcing LongLLaMA-Code 7B!✨ Have you wondered how GPT3.5 obtained its capability? Are base models of code better rea….
0
55
0
@CStanKonrad
Konrad Staniszewski
2 years
RT @s_tworkowski: 🎇Introducing LongLLaMA-Instruct 32K!🎇 Inspired by @p_nawrot #nanoT5, we fine-tune LongLLaMA- on a *single GPU* for ~48h….
0
66
0