wing.nus
@wing_nus
Followers: 594 · Following: 400 · Media: 115 · Statuses: 528
Web IR / NLP Group at the National University of Singapore
Singapore
Joined July 2012
Excited to share our accepted EMNLP 2025 papers from the NUS WING group! See you in Suzhou! #EMNLP2025
Led by @yajing08042. Huge thanks to Prof Kan Min-Yen (@knmnyn) and Tony Deng for their support! See you at #EMNLP2025! Paper: https://t.co/7GZlWX5ZgH Code: https://t.co/xqtcA9pktd
#NLP #DataToText #Finance #FinNLP #LLM #AI [5/n], n=5
github.com
This repository provides the official implementation for the EMNLP 2025 Findings paper: KAHAN: Knowledge-Augmented Hierarchical Analysis and Narration for Financial Data Narration - yajingyang/kahan
High-quality knowledge can be "distilled"! We used GPT-4o to generate a knowledge base for a smaller Llama3.1-8B. This "distillation" significantly boosted its performance, enabling efficient, high-quality narration. [4/n]
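A minimal sketch of the distillation idea above, under our own assumptions: the teacher (GPT-4o) is asked for concise domain-knowledge entries, which are then prepended to the smaller narrator model's prompt. Endpoints, prompts, and function names are illustrative, not the paper's code; the smaller model is assumed to sit behind an OpenAI-compatible server such as vLLM.

```python
# Hypothetical sketch: "distilling" domain knowledge from a large teacher model
# (e.g. GPT-4o) into a reusable knowledge base that a smaller narrator model
# (e.g. Llama-3.1-8B behind an OpenAI-compatible endpoint) can consume.
from openai import OpenAI

teacher = OpenAI()  # assumes OPENAI_API_KEY is set
student = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # e.g. a local vLLM server

def build_knowledge_base(domain: str, questions: list[str]) -> list[str]:
    """Ask the teacher model for concise domain-knowledge entries."""
    entries = []
    for q in questions:
        resp = teacher.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user",
                       "content": f"As a {domain} analyst, state the key facts needed to answer: {q}"}],
        )
        entries.append(resp.choices[0].message.content)
    return entries

def narrate_with_kb(table_text: str, kb: list[str]) -> str:
    """Prompt the smaller model, prepending the distilled knowledge base."""
    context = "\n".join(f"- {e}" for e in kb)
    resp = student.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "system", "content": f"Domain knowledge:\n{context}"},
                  {"role": "user", "content": f"Write a financial narrative for this table:\n{table_text}"}],
    )
    return resp.choices[0].message.content
```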
Key finding: Hierarchy is critical. Our ablation study shows that narrative quality progressively increases as we add each level of analysis, from entity-only to the full KAHAN system. More structure = better insights. [3/n]
KAHAN's 3-stage process. Entity Analysis: asks domain-specific questions ("Nasdaq trend?") and generates code for metrics. Insight Synthesis: builds insights hierarchically, from individual entities to system-wide. Narrative Generation: turns the insights into a coherent report. [2/n]
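A minimal sketch of the three-stage flow described above, assuming a generic `llm(prompt) -> str` chat call; all prompts and function names are hypothetical rather than the released KAHAN code.

```python
# Sketch of the three stages; `llm` stands for any chat-completion callable.
def entity_analysis(llm, table, entities):
    """Stage 1: ask domain-specific questions per entity and derive metric code."""
    findings = {}
    for entity in entities:
        question = llm(f"What domain-specific questions matter for {entity} given:\n{table}?")
        metric_code = llm(f"Write code to compute metrics answering: {question}")
        findings[entity] = {"question": question, "metric_code": metric_code}
    return findings

def insight_synthesis(llm, findings):
    """Stage 2: build insights hierarchically, from single entities to system-wide."""
    entity_insights = {e: llm(f"Summarize insights from {f}") for e, f in findings.items()}
    system_insight = llm("Synthesize a system-wide view from:\n" + "\n".join(entity_insights.values()))
    return entity_insights, system_insight

def narrative_generation(llm, entity_insights, system_insight):
    """Stage 3: turn the hierarchical insights into a coherent report."""
    return llm(
        "Write a coherent financial narrative.\n"
        f"System-wide insight: {system_insight}\n"
        f"Entity-level insights: {entity_insights}"
    )
```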
Thrilled to share that our paper, "KAHAN: Knowledge-Augmented Hierarchical Analysis and Narration," is accepted at #EMNLP2025 Findings! In this work, we built a framework that uses LLMs as domain experts to hierarchically extract insights from tables. [1/n]
@mki028 @AiBarid @knmnyn Welcome to our poster presentation at #EMNLP2025! We will present our poster at Hall C on Nov 7, 12:30-13:30. See you there! [6/n] arXiv: https://t.co/wviGuF8ZQA Repo:
arxiv.org
Cross-lingual consistency should be considered to assess cross-lingual transferability, maintain the factuality of the model knowledge across languages, and preserve the parity of language model...
Lastly, we also tried several methods to alleviate the inconsistency bottleneck. Among them, a training objective that promotes cross-lingual alignment shows the best improvement and most effectively alleviates the bottleneck. [5/n]
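For concreteness, one common form such an alignment objective can take (a sketch under our own assumptions, not necessarily the paper's exact loss): a cosine term that pulls together the pooled hidden states of parallel or code-switched sentence pairs, added to the usual language-modelling loss.

```python
import torch
import torch.nn.functional as F

def alignment_loss(hidden_src: torch.Tensor,
                   hidden_tgt: torch.Tensor,
                   mask_src: torch.Tensor,
                   mask_tgt: torch.Tensor) -> torch.Tensor:
    """hidden_*: (batch, seq, dim) final-layer states; mask_*: (batch, seq) attention masks."""
    def mean_pool(h, m):
        m = m.unsqueeze(-1).float()
        return (h * m).sum(1) / m.sum(1).clamp(min=1.0)

    z_src = mean_pool(hidden_src, mask_src)
    z_tgt = mean_pool(hidden_tgt, mask_tgt)
    # 1 - cosine similarity, averaged over the batch: low when the pair is aligned.
    return (1.0 - F.cosine_similarity(z_src, z_tgt, dim=-1)).mean()

# Hypothetical combined objective:
# total_loss = lm_loss + lambda_align * alignment_loss(h_en, h_xx, m_en, m_xx)
```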
We found that larger models do not give a substantial consistency improvement, and we explored why. Examining cross-lingual consistency across layers, we discovered that there is no monotonic improvement with depth, which could explain the plateau. [4/n]
We discovered that a query in a language distinct from the pivot language can elicit an answer referring to a different entity. This effect is especially pronounced when the writing script differs from that of the pivot language. [3/n]
We evaluated on code-switched sentences, expecting that in this setting the model aligns its knowledge in a more language-agnostic fashion. We limited the scope to English as the pivot language. [2/n]
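An illustrative version of this probe, with the model choice, prompts, and matching rule all our own assumptions rather than the paper's setup: ask the same fact in the English pivot and in a code-switched variant, then check whether the two generations name the same entity.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")  # placeholder multilingual LM

queries = {
    "pivot (en)":    "The capital of Indonesia is",
    "code-switched": "The capital of Indonesia adalah",  # English frame with an Indonesian cue
}

answers = {}
for name, prompt in queries.items():
    out = generator(prompt, max_new_tokens=8, do_sample=False)[0]["generated_text"]
    answers[name] = out[len(prompt):].strip()

# Crude surface-form check; a real evaluation would normalize entities
# (aliases, scripts) before comparing.
first = {k: (v.split() or [""])[0].strip(".,").lower() for k, v in answers.items()}
print(answers, "consistent:", first["pivot (en)"] == first["code-switched"])
```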
New paper at #EMNLP25 Findings! If we ask a multilingual language model a factual question written in different languages, do the answers always refer to the same entity? Well... not quite. We dive deep into this issue in multilingual language models in our work. [1/n]
Key Takeaway: stop asking "Which is better: Transformer or SSM?"
Start asking "How do they propagate information, and how can that inform new architectures?" Check our paper: https://t.co/UpjfOcBpRg With @NhatHoang2002, @dxlong2000, @CongDuyNguyen3, Luu Anh Tuan, @knmnyn
Any theoretical proof for justification? We formalize representation stability with mathematical bounds. SSMs are provably more stable in propagation under practical conditions! This explains their resilience at greater depths and longer contexts.
Does the final layer still contain the most task-relevant representations? We find that intermediate layers consistently outperform final layers across tasks, model scales, and context lengths, with Mamba showing the smallest drop from intermediate to final layers.
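One way such a layer-wise comparison is typically run (our own illustration, not the paper's code): mean-pool every layer's hidden states and fit a simple linear probe per layer. The model name and pooling choice below are placeholders.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

name = "gpt2"  # placeholder; swap in a Transformer- or Mamba-based LM
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_hidden_states=True).eval()

def layer_features(texts):
    """Return one (n_examples, dim) matrix per layer, using mean-pooled hidden states."""
    feats = None
    for t in texts:
        with torch.no_grad():
            hs = model(**tok(t, return_tensors="pt")).hidden_states  # tuple of (1, seq, dim)
        pooled = [h.mean(dim=1).squeeze(0) for h in hs]
        feats = [[p] for p in pooled] if feats is None else [f + [p] for f, p in zip(feats, pooled)]
    return [torch.stack(f).numpy() for f in feats]

def probe_accuracy_per_layer(texts, labels):
    # Training accuracy for brevity; use a held-out split in practice.
    return [LogisticRegression(max_iter=1000).fit(X, labels).score(X, labels)
            for X in layer_features(texts)]
```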
What about layer-level global manifolds? The overall patterns mirror the token-level trends above. Noticeably, a 32-layer Transformer-based model keeps early/mid layers highly similar (e.g., the 5th and 15th), suggesting minimal change compared to the gradual evolution of SSMs.
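A hedged illustration of comparing two layers' representation "manifolds": linear CKA between their token-representation matrices. The paper may use a different similarity measure; CKA is just a standard stand-in for this kind of layer-to-layer comparison.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """X, Y: (n_tokens, dim) representations of the same tokens at two layers."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return float(hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")))

# e.g. compare layer 5 vs layer 15 of a 32-layer model:
# sim = linear_cka(hidden_states[5][0].numpy(), hidden_states[15][0].numpy())
```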
Do these behaviors arise from architectural biases or training dynamics? Oversmoothing in Transformers is an architectural bias, while in SSMs it is training-dependent!
Do tokens keep their distinctiveness? Oversmoothing reverses: Transformers homogenize early then recover late, while SSMs preserve diversity early but homogenize at deeper layers.
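One standard way to quantify this kind of token-level oversmoothing (illustrative, not necessarily the paper's exact metric) is the average pairwise cosine similarity among token representations at each layer:

```python
import torch
import torch.nn.functional as F

def mean_pairwise_cosine(hidden: torch.Tensor) -> float:
    """hidden: (seq_len, dim) token representations at one layer."""
    h = F.normalize(hidden, dim=-1)
    sims = h @ h.T                        # (seq, seq) cosine similarities
    n = sims.size(0)
    off_diag = sims.sum() - sims.diag().sum()
    return float(off_diag / (n * (n - 1)))

# Higher values mean tokens have become more homogeneous (more oversmoothed);
# tracking this across layers yields the depth-wise curves discussed above.
```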
How smoothly do representations evolve? Opposite trajectories: Transformers are stable early then shift late, while SSMs vary early then converge at depth.
Why do Transformers and Mamba (SSMs) fail differently on long context? How do they mix and reshape context across depth? No one had a unified, token- and layer-level view until now! Paper: https://t.co/UpjfOcBpRg More in thread. #Transformers #Mamba #NLP