Zihao Li

@realzihaolee

Followers: 71 · Following: 4K · Media: 50 · Statuses: 626

Doctoral Researcher @HelsinkiNLP | MSc @UnivHelsinkiCS | Multilingual NLP

Helsinki, Finland
Joined June 2019
@hplt_eu
HPLT
16 days
Wrapping up the @emnlpmeeting main conf. #HPLT, funded by the EU and UKRI, supported it as a silver sponsor, disseminating HPLT results from our booth and through several papers. We'll keep shaping the future of multilingual datasets and models here and in @OpenEuroLLM. Stay tuned!
@hplt_eu
HPLT
17 days
Describing HPLT datasets in depth is an essential part of our commitment as data curators: 🆕 HPLT 3.0: Very Large-Scale Multilingual Resources for LLM and MT. Mono- and Bi-lingual Data, Multilingual Evaluation, and Pre-Trained Models: https://t.co/uN2zoSF251 We are on 🔥 at #HPLT
arxiv.org
We present an ongoing initiative to provide open, very large, high-quality, and richly annotated textual datasets for almost 200 languages. At 30 trillion tokens, this is likely the largest...
@realzihaolee
Zihao Li
21 days
Helsinki-NLP @ #EMNLP2025
@realzihaolee
Zihao Li
2 months
I couldn't make it to Montreal because my visa has been "under review" since forever. Shoutout to IRCC for pushing the limits of bureaucracy! @CitImmCanada
@realzihaolee
Zihao Li
2 months
🧠 We explore how monolingual, bilingual, and code-augmented data shape multilingual continual pretraining across high- to low-resource languages. Big thanks to my supervisor @TiedemannJoerg, who will present our poster. Come chat with him!
@realzihaolee
Zihao Li
2 months
🎉 Excited to share our #COLM2025 work: Rethinking Multilingual Continual Pretraining: Data Mixing for Adapting LLMs Across Languages and Resources. 📍 Poster Session 3 - Wednesday 11–13
@realzihaolee
Zihao Li
2 months
Agree
@nazevice
nazevice
2 months
@zephyr_z9 My favourite European AI labs tier list:
S: Mistral
A:
B:
C: Stability AI
Left the race: Aleph Alpha
@realzihaolee
Zihao Li
3 months
Camera-ready version of our work: Rethinking Multilingual Continual Pretraining: Data Mixing for Adapting LLMs Across Languages and Resources https://t.co/XpfzScKnJi
@COLM_conf
Conference on Language Modeling
3 months
COLM 2025 accepted submissions are now public: https://t.co/yWL007rSU7 Congratulations to all the authors, and see you all in Montreal!
@realzihaolee
Zihao Li
5 months
Accepted by #COLM2025 😋
@realzihaolee
Zihao Li
8 months
Rethinking Multilingual Continual Pretraining: Data Mixing for Adapting LLMs Across Languages and Resources https://t.co/gUpyQ36rvX #LLMs
@realzihaolee
Zihao Li
7 months
Improvements in multilingual translation capabilities are noticeable. Flores-200 X-Eng 3-shot BLEU scores 👇
@Alibaba_Qwen
Qwen
7 months
Qwen3 models support 119 languages and dialects. This extensive multilingual capability opens up new possibilities for international applications, enabling users worldwide to benefit from the power of these models.
@realzihaolee
Zihao Li
8 months
Too big!
@AIatMeta
AI at Meta
8 months
Today is the start of a new era of natively multimodal AI innovation. Today, we're introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick, our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model
@realzihaolee
Zihao Li
10 months
Finally
@EU_Commission
European Commission
10 months
AI made in 🇪🇺 OpenEuroLLM, the first family of open source Large Language Models covering all EU languages, has earned the first STEP Seal for its excellence. It brings together EU startups, research labs and supercomputing hosts to train AI on European supercomputers ↓