Dr. Zahra Ashktorab
@zashktorab
Followers: 523 · Following: 2K · Media: 10 · Statuses: 226
Research Scientist @IBMResearch. Opinions/tweets are my own.
New York, USA
Joined April 2012
Design takeaway: Nudges that demand user effort may backfire, leading users to offload more onto AI later. Check out the paper here: https://t.co/p86WcPySP5
#HumanAI #LLM #HCI #CHIWORK2025
arxiv.org
We investigate the impact of hallucinations and Cognitive Forcing Functions in human-AI collaborative content-grounded data generation, focusing on the use of Large Language Models (LLMs) to...
Key findings: Hallucinations significantly reduced data quality. Users sometimes blended hallucinated and correct info. CFFs shifted behavior but didn’t reliably mitigate harm. We identify new hybrid overreliance patterns in human-AI content-grounded data generation.
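Cognitive Forcing Functions add friction before AI output is revealed, nudging the user to think first. A minimal hypothetical sketch of the idea (not the paper's implementation; `cognitive_forcing_gate`, its parameters, and the `ack` callback are all illustrative):

```python
import time

def cognitive_forcing_gate(ai_suggestion, delay_s=0.0, ack=None):
    """Withhold an AI suggestion until the user has engaged first.

    delay_s forces a pause before the suggestion appears; ack, if given,
    is a callable that must return True (e.g. "I drafted my own answer")
    before the suggestion is revealed.
    """
    time.sleep(delay_s)              # friction: forced pause before reveal
    if ack is not None and not ack():
        return None                  # user declined to engage; withhold output
    return ai_suggestion

# Example: the user confirms they wrote their own draft first.
shown = cognitive_forcing_gate("AI draft text", delay_s=0.0, ack=lambda: True)
print(shown)  # -> AI draft text
```

The finding above suggests the catch: gates like this demand effort, and users may respond by offloading more onto the AI later rather than engaging more.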
🧵 Just presented our CHIWORK paper: “Emerging Reliance Behaviors in Human-AI Content Grounded Data Generation.” We studied how hallucinations and Cognitive Forcing Functions shape user behavior during LLM-assisted data creation #CHIWORK2025
Huge thanks to our team: Elizabeth Daly, Michael Desmond, Hyo Gina Do, @wernergeyer, Erik Miehling, Rahul Nair, Qian Pan, Tejaswini Pedapati, @krvarshney, and all who contributed to EvalAssist!
🛠️ Try the tool: https://t.co/rxLqKj7uz7 📂 Code & contributions: https://t.co/hV9W59TpZ2 📖 AI Alliance article: https://t.co/TjHaQ8SMay If you’re working on evaluation, we’d love your feedback. ⭐ the repo or share with others!
github.com
EvalAssist is an open-source project that simplifies using large language models as evaluators (LLM-as-a-Judge) of the output of other large language models by supporting users in iteratively refin...
We built this alongside a body of research on human-centered LLM evaluation; many of those findings are now featured on the site: https://t.co/rxLqKj7uz7 Proud of the collaboration that brought this to life!
EvalAssist lets you: ✅ Define/refine evaluation criteria interactively ✅ Generate edge cases to stress-test criteria ✅ Check for bias & get model explanations ✅ Export to Jupyter via Unitxt ✅ Use with multiple models (GPT-4, Llama 3, Granite, Mixtral…)
🚨 We’ve open-sourced EvalAssist, a tool to make LLM-as-a-judge evaluations easier, more structured, and scalable. Built by our team @IBMResearch, it's now available for the community. 🧵👇 🔗
ibm.github.io
EvalAssist simplifies LLM-as-a-Judge by supporting users in iteratively refining evaluation criteria in a web-based user experience.
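The LLM-as-a-judge pattern behind EvalAssist can be sketched generically: define a criterion, build a rubric prompt, and constrain the evaluator's verdict to allowed options. A hypothetical sketch under stated assumptions — this is not EvalAssist's actual API; `Criterion`, `judge`, and the stubbed `model_call` are all illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Criterion:
    name: str
    options: list = field(default_factory=lambda: ["Yes", "No"])

def build_judge_prompt(criterion, output):
    # Rubric-style prompt asking an evaluator model for a constrained verdict.
    opts = " / ".join(criterion.options)
    return (
        f"Evaluate the text below against the criterion '{criterion.name}'.\n"
        f"Text: {output}\n"
        f"Answer with exactly one of: {opts}."
    )

def judge(criterion, output, model_call):
    # model_call: any callable mapping a prompt string to raw model text.
    verdict = model_call(build_judge_prompt(criterion, output)).strip()
    # Verdicts outside the rubric's options are flagged, not silently kept.
    return verdict if verdict in criterion.options else "Unparseable"

# Stubbed evaluator for demonstration; a real setup would call an LLM endpoint.
faithfulness = Criterion("faithfulness to the source document")
result = judge(faithfulness, "The sky is blue.", model_call=lambda p: "Yes")
print(result)  # -> Yes
```

Constraining and validating the verdict is the key design choice: it makes evaluator output machine-checkable rather than free-form.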
New milestone: cited for a hallucinated paper 😅 Got an alert that I'd been cited in a subject area I don't work in... Checked it out and found a citation for work that definitely doesn’t exist!
Remember #CHI2024 while writing for #CHI2025? We did and picked some favourites to reflect on, in our new "Editors' Choice" article: https://t.co/FRx5KJd0mb
@gratefulspam @wernergeyer @watkins_welcome @qiaosi_wang @alansaid #hci #ai #hcai
We from #ibmresearch had a wonderful time at @acm_chi #CHI2024!! Thank you, organizers! It was great to see old and new friends, discuss our work, and learn about new perspectives!
#CHI2024 @hcil_umd Alumni Dinner... Because it's (once again) about the people you meet along the way #AcademicJourney @BrennaMcNally @zashktorab @kotarohara @duto_guerra and more. Always fun to see the family.
@MireiaYurrita The TREW workshop wraps up with some very engaged group brainstorming of new experiments, including medical advice and Overcooked child-AI collaboration. Now the crew is off to the Diamond Head festival, the "Woodstock of Hawaii"! Huge thanks to the organizers! #CHI2024
Workshop attendees designing experiments around trust and reliance in human AI workflows at the TREW workshop 🤩 #chi2024
Kicking off the workshop with a full room!
Working on human-AI collaboration? Interested in evaluating the short- and long-term impact of AI in human-AI workflows? Please consider submitting to TREW@CHI this year. @zashktorab @bansalg_ @d19fe8 @JessicaHullman @alison_m_smith @tongshuangwu
Trust and Reliance in Evolving Human-AI Workflows (TREW) workshop #CHI2024 kicks off with @AdamFourney from @MSFTResearch presenting lessons learned from Copilot and other AI code-completion tools. When first released, users accepted only 21-24% of completions and still edited most...
I'll be at #chi2024 next week! If you'll be there too and want to chat research or catch up, let me know!🏖️
There’s more time to submit to TREW! Due to popular demand, we’ve extended the submission deadline to March 1st.