Joseph Jeesung Suh (@JosephJSSuh)
CS Grad student @ BAIR, UC Berkeley · Berkeley, CA · Joined June 2024
Followers: 57 · Following: 16 · Media: 7 · Statuses: 25
(11/11) For people who are interested, here are the links: Paper: https://t.co/WvMRy4DdjR GitHub: https://t.co/wEYPxMH4TU Huge thanks to my amazing PI @serinachang5 and collaborator @SuhongMoon.
github.com
GEMS: Rethinking LLM Human Simulation, When a Graph is What You Need - schang-lab/gems
(10/11) Takeaway 🥡 If your simulation task is a discrete choice with relational structure, try GEMS 💎 before spinning up a 70B-parameter model. You might get similar (or better!) accuracy with a fraction of the compute and better debuggability!
(9/11) This builds on our earlier work SubPOP 🍭 (ACL 2025 main), where fine-tuning LLMs on scaled survey data reduced human-LLM gaps by up to half and generalized to new subpopulations & topics. Now we ask: when is a graph what you need? SubPOP:
aclanthology.org
Joseph Suh, Erfan Jahanparast, Suhong Moon, Minwoo Kang, Serina Chang. Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025.
(8/11) Interpretability & transparency matter. Node embeddings from GEMS reveal latent dimensions, from public opinion ideologies to pricing sensitivities. 🔍 Unlike LLMs, GEMS is trained in-house from scratch, 🪟 removing the risk of data leakage and bias from opaque pretraining.
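A minimal sketch of the kind of inspection this describes, assuming access to the trained individual-node embeddings (random placeholders below) and using PCA as the probe; this is illustrative, not the paper's exact analysis:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
individual_emb = rng.normal(size=(1000, 64))  # placeholder for learned GEMS embeddings

pca = PCA(n_components=2)
coords = pca.fit_transform(individual_emb)

# coords[:, 0] and coords[:, 1] are candidate latent axes; correlating them with
# known attributes (e.g., party ID, price sensitivity) suggests what they encode.
print(pca.explained_variance_ratio_)
```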
(7/11) Efficiency matters. Smaller models mean faster iteration, lower cost, and easier deployment for survey design, policy analysis, and decision support. 🚀 It is also much easier to scale up to larger datasets with 1000× fewer parameters and 100× less compute!
(6/11) Our datasets and settings: We test 3 settings - predicting missing responses (i.e., imputation), new individuals, and new questions - and 3 datasets, spanning public opinion, personality traits, economics experiments, and grammar skills.
(5/11) Key finding: A GNN that’s ~1000× smaller than LLMs matches or surpasses them on predicting human behaviors consistently across datasets and settings — while being far more interpretable and transparent. 💡
(4/11) Why graphs? Relational structure is the signal for many human behaviors: for example, a person who is ‘worried’ about the ‘health effects of COVID-19’ would likely ‘often’ ‘watch public health news’. GEMS learns these relations directly on graphs.
(3/11) Meet GEMS 💎 — Graph-basEd Models for human Simulation. We cast human simulation as link prediction on a heterogeneous graph: nodes = individuals, subgroups, choices; edges = individual ↔ subgroup, individual ↔ choice. Simple, transparent, and fast. ⚡
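A minimal sketch of that graph structure, assuming PyTorch Geometric's HeteroData as the container (the actual GEMS code is in the linked repo; node counts and edges here are illustrative placeholders):

```python
import torch
from torch_geometric.data import HeteroData

data = HeteroData()

# Illustrative node counts, not taken from the paper.
data['individual'].num_nodes = 1000
data['subgroup'].num_nodes = 20
data['choice'].num_nodes = 200

# individual <-> subgroup edges: subpopulation membership.
indiv = torch.arange(1000)
group = torch.randint(0, 20, (1000,))
data['individual', 'belongs_to', 'subgroup'].edge_index = torch.stack([indiv, group])

# individual <-> choice edges: observed responses; predicting the held-out
# ones as missing links is human simulation cast as link prediction.
resp_indiv = torch.randint(0, 1000, (5000,))
resp_choice = torch.randint(0, 200, (5000,))
data['individual', 'selected', 'choice'].edge_index = torch.stack([resp_indiv, resp_choice])
```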
(2/11) Why discrete‑choice? A lot of “human simulation” with LLMs is predicting which choice an individual would pick from a small set:
• Respondents in opinion polls
• Customers choosing one item over another
• Game players with finite next actions
• Students answering MCQs
LLMs have dominated recent work on simulating human behaviors. But do you really need them? In discrete‑choice settings, our answer is: not necessarily. A lightweight graph neural network (GNN) can match or beat strong LLM-based methods. Paper: https://t.co/WvMRy4DdjR 🧵👇
🤔 Do LLMs exhibit in-group↔out-group perceptions like us? ❓ Can they serve as faithful virtual subjects of human political partisans? Excited to share our paper on taking LLM virtual personas to the *next level* of depth! 🔗 https://t.co/LzeDAMtrEV 🧵
💡New preprint & Python package: We use sparse autoencoders to generate hypotheses from large text datasets. Our method, HypotheSAEs, produces interpretable text features that predict a target variable, e.g. features in news headlines that predict engagement. 🧵1/
New Paper: We unlock AI evaluation with explanatory and predictive power through general ability scales!
- Explains what common benchmarks really measure
- Extracts explainable ability profiles of AI systems
- Predicts performance for new task instances, in & out-of-distribution 🧵
For people who are interested, here are the links: Paper: https://t.co/zQ2klONwCM GitHub: https://t.co/r1v8QqSw0C This work would not have been possible without our amazing PI @serinachang5 and collaborators @erfan_jp, @SuhongMoon, @joshminwookang, and Prof. John Canny.
github.com
[ACL 2025 Long Main] Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions - JosephJeesungSuh/subpop
Why does this matter? Researchers often need to estimate responses for unseen subpopulations or newly formulated questions (or both), especially in the early stages of survey design. Our approach helps fill these gaps when immediate large-scale human polling isn't available.
Beyond accuracy, generalization is crucial. Fine-tuned models exhibit stable prediction improvements for:
• Unseen subpopulations (not in the fine-tuning data)
• New survey topics
• Different survey families (American Trends Panel → General Social Survey)
Key finding: Fine-tuning our LLMs drastically narrows the human-LLM opinion gap—by up to 46%. Even better, every subgroup sees consistent improvement, addressing previous concerns that LLM-based methods might favor certain demographics' opinions over others.
Meet SubPOP! 🍭 SubPOP is a dataset of 70K subpopulation-response pairs (6.5× larger than past work), curated from two major opinion survey families. We fine-tune LLMs on SubPOP to match their response distributions to those of human subjects.
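A minimal sketch of distribution matching, assuming the model assigns a score to each answer option (e.g., option-token logits) and the target is the human response distribution for a subpopulation; the KL-divergence loss below is an illustrative stand-in, not necessarily SubPOP's exact training objective:

```python
import torch
import torch.nn.functional as F

def distribution_matching_loss(option_logits: torch.Tensor,
                               human_dist: torch.Tensor) -> torch.Tensor:
    # option_logits: [batch, n_options] model scores for each answer option.
    # human_dist:    [batch, n_options] observed human response distribution.
    log_p_model = F.log_softmax(option_logits, dim=-1)
    return F.kl_div(log_p_model, human_dist, reduction='batchmean')

# Illustrative 4-option question: the human distribution leans toward option 1.
logits = torch.tensor([[2.0, 0.5, -1.0, 0.1]])
human = torch.tensor([[0.55, 0.25, 0.15, 0.05]])
loss = distribution_matching_loss(logits, human)
```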
However, there hasn't been a survey dataset that is:
1. large-scale, with expansive sets of survey data sufficient for fine-tuning LLMs
2. high quality, with careful filtering and curation
3. capable of evaluating model generalization across topics & styles