wing.nus

@wing_nus

Followers 594 · Following 400 · Media 115 · Statuses 528

Web IR / NLP Group at the National University of Singapore

Singapore
Joined July 2012
@wing_nus
wing.nus
14 days
πŸ“’ Excited to share our accepted EMNLP 2025 papers from the NUS WING group! πŸŽ‰ See you in Suzhou! #EMNLP2025
0
0
7
@wing_nus
wing.nus
15 days
High-quality knowledge can be "distilled"! We used GPT-4o to generate a knowledge base for a smaller Llama3.1-8B. This "distillation" significantly boosted its performance, enabling efficient, high-quality narration. 🧡[4/n]
1
0
0
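A minimal sketch of the distillation setup described in the post above, assuming hypothetical prompts and helper names; the OpenAI and Hugging Face calls are standard, but the knowledge-base format is not the paper's.

```python
# Sketch of the knowledge "distillation" step: a strong teacher model (GPT-4o)
# writes down reusable domain knowledge once, and the smaller Llama-3.1-8B
# student conditions on it at inference time. Prompts and the knowledge-base
# format here are illustrative, not the paper's.
from openai import OpenAI
from transformers import pipeline

teacher = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def build_knowledge_base(domain: str) -> str:
    """Ask the teacher model to write down domain knowledge as plain text."""
    resp = teacher.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"List key rules of thumb for interpreting {domain} data, one per line.",
        }],
    )
    return resp.choices[0].message.content

def narrate_with_knowledge(knowledge: str, table_summary: str) -> str:
    """Let the smaller student model narrate, conditioned on the distilled knowledge."""
    student = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")
    prompt = (
        f"Domain knowledge:\n{knowledge}\n\n"
        f"Data summary:\n{table_summary}\n\n"
        "Write a short analytical narrative:"
    )
    return student(prompt, max_new_tokens=300)[0]["generated_text"]
```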
@wing_nus
wing.nus
15 days
Key finding: Hierarchy is critical. Our ablation study shows that narrative quality progressively increases as we add each level of analysisβ€”from entity-only to the full KAHAN system. More structure = better insights. 🧡[3/n]
1
0
0
@wing_nus
wing.nus
15 days
KAHAN's 3-stage process: Entity Analysis: asks domain-specific questions ("Nasdaq trend?") and generates code for metrics. Insight Synthesis: builds insights hierarchically from individual entities to system-wide. Narrative Generation: turns the insights into a coherent report. 🧡[2/n]
1
0
0
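A minimal sketch of the three-stage pipeline described in the post above; the function names, prompts, and metrics are illustrative placeholders rather than the paper's implementation.

```python
# Illustrative skeleton of a KAHAN-style pipeline: entity-level analysis,
# hierarchical insight synthesis, then narrative generation. `ask_llm` is a
# hypothetical wrapper around any chat-completion API; metrics are placeholders.
from typing import Callable
import pandas as pd

def entity_analysis(df: pd.DataFrame, entity_col: str, ask_llm: Callable[[str], str]) -> dict:
    """Stage 1: ask domain-specific questions per entity and compute simple metrics."""
    findings = {}
    for entity, group in df.groupby(entity_col):
        question = ask_llm(f"What is the most informative question to ask about {entity}?")
        metrics = {
            "mean": group.select_dtypes("number").mean().to_dict(),
            "trend": group.select_dtypes("number").diff().mean().to_dict(),
        }
        findings[entity] = {"question": question, "metrics": metrics}
    return findings

def insight_synthesis(findings: dict, ask_llm: Callable[[str], str]) -> str:
    """Stage 2: build insights bottom-up, from individual entities to the whole system."""
    per_entity = [
        ask_llm(f"Summarize the key insight for {e}: {f['metrics']}")
        for e, f in findings.items()
    ]
    return ask_llm("Combine these entity-level insights into system-wide insights:\n" + "\n".join(per_entity))

def narrative_generation(insights: str, ask_llm: Callable[[str], str]) -> str:
    """Stage 3: turn the hierarchical insights into a coherent report."""
    return ask_llm(f"Write a coherent analytical report based on these insights:\n{insights}")
```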
@wing_nus
wing.nus
15 days
Thrilled to share that our paper, "KAHAN: Knowledge-Augmented Hierarchical Analysis and Narration," is accepted at #EMNLP2025 Findings! In this work, we built a framework that uses LLMs as domain experts to hierarchically extract insights from tables. 🧡[1/n]
1
0
2
@wing_nus
wing.nus
22 days
@mki028 @AiBarid @knmnyn Welcome to our poster presentation at #EMNLP2025! We will present our poster at Hall C on Nov 7 at 12:30-13:30. See you there! 🧡[6/n] πŸ“„ arXiv: https://t.co/wviGuF8ZQA ⌨️ Repo:
arxiv.org
Cross-lingual consistency should be considered to assess cross-lingual transferability, maintain the factuality of the model knowledge across languages, and preserve the parity of language model...
0
0
1
@wing_nus
wing.nus
22 days
Lastly, we also tried several methods to alleviate the inconsistency bottleneck. Among them, we found that a training objective that promotes cross-lingual alignment gives the best improvement and most effectively alleviates the bottleneck. 🧡[5/n]
1
0
0
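One generic form such a cross-lingual alignment objective can take, shown as a sketch under our own assumptions; this is not necessarily the exact objective used in the paper.

```python
# Illustrative cross-lingual alignment term: penalize the distance between pooled
# representations of translation pairs so knowledge is stored more
# language-agnostically. Not the paper's exact objective.
import torch
import torch.nn.functional as F

def alignment_loss(hidden_src: torch.Tensor, hidden_tgt: torch.Tensor) -> torch.Tensor:
    """hidden_src / hidden_tgt: (batch, hidden) pooled states of parallel sentences."""
    src = F.normalize(hidden_src, dim=-1)
    tgt = F.normalize(hidden_tgt, dim=-1)
    return (1.0 - (src * tgt).sum(dim=-1)).mean()  # mean (1 - cosine similarity)

# Typically combined with the usual language-modeling loss:
# total_loss = lm_loss + lambda_align * alignment_loss(h_pivot, h_other)
```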
@wing_nus
wing.nus
22 days
We could see that larger models don't give substantial consistency improvements, and we explored why. Examining cross-lingual consistency across layers, we discovered that there is no monotonic improvement with depth, which could explain this. 🧡[4/n]
1
0
0
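A minimal sketch of the kind of layer-wise check described above, assuming a placeholder multilingual model and mean pooling; the paper's actual consistency metric may differ.

```python
# Sketch of a layer-wise consistency check: compare hidden states for the same
# factual query written in two languages, layer by layer, to see whether
# similarity improves monotonically with depth. Model choice and mean pooling
# are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

name = "xlm-roberta-base"  # placeholder multilingual model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_hidden_states=True).eval()

def layerwise_similarity(query_a: str, query_b: str) -> list[float]:
    with torch.no_grad():
        ha = model(**tok(query_a, return_tensors="pt")).hidden_states
        hb = model(**tok(query_b, return_tensors="pt")).hidden_states
    sims = []
    for la, lb in zip(ha, hb):
        va, vb = la.mean(dim=1).squeeze(0), lb.mean(dim=1).squeeze(0)  # mean-pool tokens
        sims.append(torch.cosine_similarity(va, vb, dim=0).item())
    return sims  # one value per layer (embeddings + each block)

print(layerwise_similarity("Where was Marie Curie born?", "Di mana Marie Curie lahir?"))
```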
@wing_nus
wing.nus
22 days
We discovered that a query in a language distinct from the pivot language can elicit an answer referring to a different entity. This finding is substantially more pronounced when the writing script differs from that of the pivot language. 🧡[3/n]
1
0
0
@wing_nus
wing.nus
22 days
We evaluated on code-switched sentences, expecting that in this setting the model aligns its knowledge in a more language-agnostic fashion. We limited the scope to English as the pivot language. 🧡[2/n]
1
0
0
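A minimal sketch of the evaluation idea in the thread above, with a placeholder model, prompts, and a crude string-match entity check; the paper's actual benchmark and matcher are not reproduced here.

```python
# Sketch of the consistency evaluation: ask the same factual question with
# English as the pivot and in another (or code-switched) language, then check
# whether the two answers name the same entity. Model, prompts, and the crude
# string-match entity check are placeholders.
from transformers import pipeline

generate = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")

def answer(question: str) -> str:
    return generate(question, max_new_tokens=20, do_sample=False)[0]["generated_text"]

def consistent(question_en: str, question_xx: str, entity_aliases: set[str]) -> bool:
    """Do the pivot-language and other-language answers agree on the gold entity?"""
    a_en = answer(question_en).lower()
    a_xx = answer(question_xx).lower()
    hit_en = any(alias in a_en for alias in entity_aliases)
    hit_xx = any(alias in a_xx for alias in entity_aliases)
    return hit_en == hit_xx

# Aggregate consistency = fraction of questions where the two answers agree.
```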
@wing_nus
wing.nus
22 days
🚨 New paper at #EMNLP25 Findings! If we ask a multilingual language model a factual question written in different languages, do the answers always refer to the same entity? Well... not quite. We dive deep into this issue in multilingual language models in our work. 🧡[1/n]
1
0
2
@wing_nus
wing.nus
22 days
Key Takeaway ❌ Stop asking "Which is better: Transformer or SSM?" ✅ Start asking "How do they propagate information, and how can that optimize new architectures?" Check out our paper: 🔗 Paper: https://t.co/UpjfOcBpRg 🧑‍💻 @NhatHoang2002, @dxlong2000, @CongDuyNguyen3, Luu Anh Tuan, @knmnyn
0
0
3
@wing_nus
wing.nus
22 days
πŸ€” Any theoretical proof for justification? ✍️ We formalize representation stability with mathematical bounds. πŸ”Ž SSMs are provably more stable in propagation under practical conditions! This explains their resilience at deeper depths and longer contexts.
1
0
2
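As a rough illustration of what a representation-stability bound can look like (a generic Lipschitz-style statement under our own assumptions, not the paper's actual theorem):

```latex
% Generic Lipschitz-style stability notion (illustrative only): a layer map
% f_\ell is C_\ell-stable if it cannot amplify input perturbations by more
% than C_\ell, and the factors compose across depth.
\[
\lVert f_\ell(h + \delta) - f_\ell(h) \rVert \le C_\ell \,\lVert \delta \rVert,
\qquad
\lVert h_L - \tilde h_L \rVert \le \Big(\prod_{\ell=1}^{L} C_\ell\Big) \lVert h_0 - \tilde h_0 \rVert .
\]
```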
@wing_nus
wing.nus
22 days
πŸ€” Does the final layer still contain the most task-relevant representations? πŸ”Ž We find that intermediate layers consistently outperform final layers across tasks, model scales, and context lengths, with Mamba showing the smallest drop to the final layer.
1
0
1
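A minimal sketch of a layer-wise probing setup like the one described above, assuming a placeholder model, task, and linear probe; the paper's evaluation protocol may differ.

```python
# Sketch of a layer-wise probe: fit a simple classifier on frozen hidden states
# from each layer and compare accuracy across depth. Model, task, and mean
# pooling are placeholder choices; swap in a Mamba/SSM checkpoint to compare.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

name = "gpt2"  # placeholder Transformer baseline
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_hidden_states=True).eval()

def features(texts: list[str], layer: int) -> torch.Tensor:
    with torch.no_grad():
        pooled = [
            model(**tok(t, return_tensors="pt")).hidden_states[layer].mean(dim=1)
            for t in texts
        ]
    return torch.cat(pooled)  # (n_texts, hidden)

def probe_accuracy(train_x, train_y, test_x, test_y, layer: int) -> float:
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features(train_x, layer).numpy(), train_y)
    return clf.score(features(test_x, layer).numpy(), test_y)

# Sweeping `layer` from 0 to the last layer is what reveals a mid-stack peak.
```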
@wing_nus
wing.nus
22 days
πŸ€” What about layer-level global manifolds? πŸ”Ž The overall patterns mirror the above token-level trends. Noticeably, a 32-layer Transformer-based model keep early/mid layers highly similar (e.g. 5th and 15th), suggesting minimal change compared to gradual evolution of SSMs.
1
0
2
@wing_nus
wing.nus
22 days
πŸ€” Do these behaviors arise from architectural biases or training dynamics? πŸ”Ž Oversmoothing in Transformers is architectural bias; while in SSMs it is training-dependent!
1
0
2
@wing_nus
wing.nus
22 days
πŸ€” Do tokens keep their distinctiveness? πŸ”Ž Oversmoothing reverses: Transformers homogenize early then recover late, SSMs preserve diversity early but homogenize deeper.
1
0
2
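A minimal sketch of one way to quantify token distinctiveness per layer (mean pairwise cosine similarity of token representations), under our own assumptions about model and pooling:

```python
# Sketch of a token-distinctiveness measure: within each layer, the mean pairwise
# cosine similarity between token representations. Values near 1 indicate
# oversmoothing (tokens collapsing onto nearly identical vectors). Model choice
# is a placeholder.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

name = "gpt2"  # placeholder; repeat with an SSM checkpoint to contrast
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_hidden_states=True).eval()

def token_similarity_per_layer(text: str) -> list[float]:
    with torch.no_grad():
        hidden = model(**tok(text, return_tensors="pt")).hidden_states
    sims = []
    for h in hidden:                              # h: (1, seq_len, hidden)
        x = F.normalize(h.squeeze(0), dim=-1)
        sim = x @ x.T                             # pairwise cosine similarities
        n = sim.size(0)
        sims.append(((sim.sum() - n) / (n * (n - 1))).item())  # mean off-diagonal
    return sims
```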
@wing_nus
wing.nus
22 days
πŸ€” How smoothly representations evolve? πŸ”Ž Opposite trajectories: Transformers are stable early then shift late, while SSMs vary early then converge deep.
1
0
2
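A minimal sketch of one way to measure how smoothly representations evolve (cosine similarity between the same token's representations at consecutive layers), again with placeholder choices:

```python
# Sketch of a layer-to-layer drift measure: cosine similarity between the same
# token's representation at consecutive layers. High values = smooth evolution,
# low values = an abrupt shift at that depth. Model choice is a placeholder.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

name = "gpt2"  # placeholder; repeat with an SSM checkpoint to contrast trajectories
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_hidden_states=True).eval()

def layer_drift(text: str) -> list[float]:
    with torch.no_grad():
        hidden = model(**tok(text, return_tensors="pt")).hidden_states
    drifts = []
    for prev, curr in zip(hidden[:-1], hidden[1:]):
        sim = F.cosine_similarity(prev.squeeze(0), curr.squeeze(0), dim=-1)  # per token
        drifts.append(sim.mean().item())
    return drifts  # one value per layer transition
```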
@wing_nus
wing.nus
22 days
πŸ€” Why do Transformers and Mamba (SSMs) fail differently on long context? πŸ”Ž How do they mix and reshape context across depth? πŸš€ No one had a unified, token + layer-level view β€” until now! πŸ”— Paper: https://t.co/UpjfOcBpRg 🧡 πŸ‘‡ More in thread #Transformers #Mamba #NLP
1
3
6