Rosinality

@rosinality

Followers
2K
Following
22K
Media
355
Statuses
32K

no side-effects

Seoul, Korea
Joined October 2008
@rosinality
Rosinality
1 day
This is my summary of a podcast interview with Xiangyu Zhang. I found it very insightful. Because I used Whisper to transcribe it and Gemini to translate it, it could contain errors. Though, considering the overall flow of the content, I think it would be.
0
8
20
@rosinality
Rosinality
2 days
rStar2-Agent: Agentic Reasoning Technical Report. Agentic RL. To enhance the quality of interactions, they oversample the rollouts and subsample positive samples to retain high quality ones (instead of penalizing with rewards).
1
2
11
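A minimal sketch of the oversample-then-subsample idea summarized in the rStar2-Agent post above. The post only says that positive rollouts are subsampled to keep high-quality ones; the `Rollout` fields, the `tool_errors` quality signal, and the even positive/negative split are assumptions made for illustration, not the report's exact procedure.

```python
import random
from dataclasses import dataclass


@dataclass
class Rollout:
    text: str
    correct: bool     # hypothetical: whether the final answer was rewarded
    tool_errors: int  # hypothetical quality signal, lower is cleaner


def subsample_rollouts(rollouts, group_size, seed=0):
    """Reduce an oversampled group of rollouts to group_size items."""
    rng = random.Random(seed)
    positives = [r for r in rollouts if r.correct]
    negatives = [r for r in rollouts if not r.correct]

    # Assumption: aim for an even split, then fill from the side with spare items.
    n_pos = min(len(positives), group_size // 2)
    n_neg = min(len(negatives), group_size - n_pos)
    n_pos = min(len(positives), group_size - n_neg)

    # Positives: keep the cleanest ones rather than penalizing noisy ones via reward.
    kept_pos = sorted(positives, key=lambda r: r.tool_errors)[:n_pos]
    # Negatives: plain uniform subsample.
    kept_neg = rng.sample(negatives, n_neg)
    return kept_pos + kept_neg
```

The filtering happens before any policy update, so the reward itself stays a plain correctness signal in this sketch.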
@rosinality
Rosinality
2 days
MERIT: Maximum-normalized Element-wise Ratio for Language Model Large-batch Training. Taming attention logits in large batch training by rescaling updates using the ratio between weights and updates. How would this compare with MuonClip?
1
0
7
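A rough sketch of what "rescaling updates using the ratio between weights and updates" can look like, in the style of a LAMB-like trust ratio but with the max norm. This is not the MERIT update rule itself (the post does not spell it out); `max_norm`, `rescale_update`, and `step` are hypothetical helpers operating on a flat list of parameters.

```python
def max_norm(values):
    """Largest absolute entry of a flat parameter list."""
    return max(abs(v) for v in values)


def rescale_update(weights, update, eps=1e-12):
    """Scale the update so its max norm matches that of the weights."""
    w_norm = max_norm(weights)
    u_norm = max_norm(update)
    if w_norm == 0.0 or u_norm == 0.0:
        # Nothing sensible to rescale against; pass the update through.
        return list(update)
    ratio = w_norm / (u_norm + eps)
    return [ratio * u for u in update]


def step(weights, update, lr):
    """One optimizer step using the rescaled update."""
    scaled = rescale_update(weights, update)
    return [w - lr * s for w, s in zip(weights, scaled)]
```

With this scaling, the largest per-step change to any weight is bounded by `lr` times the largest weight magnitude, which is one way such a ratio can keep individual entries (and therefore attention logits) from growing abruptly.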
@rosinality
Rosinality
2 days
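Something I have been thinking about lately too. Perhaps the root cause of problems like the STEM brain drain is that Korea is currently not an environment where technology creates value, whether that comes from the size of the domestic market or from a lack of urgency.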
@nooptodaks
눕다
2 days
Wow. I assumed it would only be about Go, but that was my own prejudice.
0
7
4
@rosinality
Rosinality
3 days
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks. Increasing the number of activated experts or the total number of experts can lead to a decrease in downstream task (GSM) performance even when train/valid loss itself decreases. It is also more sensitive
1
1
17
@rosinality
Rosinality
3 days
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning. Improved product key memory + tensor decomposition with FFN and initialization. Competitive with MoE.
1
27
138
@rosinality
Rosinality
4 days
Predicting the Order of Upcoming Tokens Improves Language Modeling. Multiple token prediction could be too hard, so instead let the model predict the distance between tokens in the window.
1
3
26
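A small sketch of the kind of target the token-order post above describes: for each position, record how far away each upcoming token first appears within a window. How the paper turns these distances into a training loss is not stated in the post, so only the target construction is shown; `distance_targets` and its arguments are hypothetical names.

```python
def distance_targets(tokens, window):
    """For each position t, map each token seen in the next `window`
    positions to the offset of its first occurrence after t."""
    targets = []
    for t in range(len(tokens)):
        dist = {}
        for offset in range(1, window + 1):
            j = t + offset
            if j >= len(tokens):
                break
            # setdefault keeps the nearest (first) occurrence only.
            dist.setdefault(tokens[j], offset)
        targets.append(dist)
    return targets
```

Compared with multiple token prediction, the model is asked only which tokens are coming up and in what order, not the exact token at each future position.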
@rosinality
Rosinality
4 days
StepWiser: Stepwise Generative Judges for Wiser Reasoning. Reasoning-based Process Reward Model.
1
0
8
@rosinality
Rosinality
4 days
Patch size = 32
0
0
0
@rosinality
Rosinality
4 days
Patch size = 16
1
0
0
@rosinality
Rosinality
4 days
Noise images, with patch size = 8
1
0
0
@rosinality
Rosinality
4 days
Maybe the pixel-perfectness is because I used generated images to edit.
1
0
0
@rosinality
Rosinality
4 days
Gemini 2.5 Flash Image Preview or nano banana. It is quite fast (about 8 - 9 seconds to generate a 1024px image, 1290 tokens). Editing is almost pixel perfect. (I found sometimes shifting happens.) It also lightens the cast shadow from the front gear, due to the change of
1
1
10