Sauvik Das Profile
Sauvik Das

@scyrusk

Followers
3K
Following
5K
Media
178
Statuses
2K

Associate Professor (w/o tenure) at @cmuhcii. Director of the SPUD Lab. Join me elsewhere 🦣 [email protected] | 🦋 @sauvik.me

Pittsburgh, PA
Joined October 2010
@scyrusk
Sauvik Das
5 months
As of today, I am officially Associate Professor* at @cmuhcii. Surreal — I was out of my depth when I started. Couldn't have done it without my incredible students — the true brains of the operation — and the support of my mentors. Thank you all! * not yet tenured, because CMU
17
4
177
@YuxuanL_
Yuxuan Li
14 days
Had the privilege of giving 3 talks this week @SCSatCMU and @GeorgiaTech on "Examine Machine Behavior and Simulate Society with Social Agents"! In these talks, I shared our latest work on advancing socially intelligent LLM agents. On the micro level, we have been using social
4
10
75
@hankhplee
Hao-Ping (Hank) Lee
23 days
Excited and grateful to share that I’ve been named a Presidential Fellow by @CyLab ! Endless thanks to my advisors @scyrusk and @mizjodi, and all my collaborators for their constant inspiration as we work toward shaping the future of privacy-preserving AI.
@CyLab
CyLab
24 days
Congratulations to CyLab's 2025 Presidential Fellows! Each year, CyLab recognizes high-achieving Ph.D. students pursuing security and/or privacy-related research with a CyLab Presidential Fellowship that covers one year of tuition.
1
3
26
@niloofar_mire
Niloofar
27 days
I'm recruiting students for fall 2026 thru @LTIatCMU & @CMU_EPP, in: 1. Privacy & security of LLMs, coding, long horizon & embodied agents (robotics) 2. Tiny local llms 3. AI for scientific reasoning, esp. chemistry 4. Latent reasoning 5. anything YOU are passionate about!
27
188
1K
@scyrusk
Sauvik Das
27 days
Say hi to both if you're there! Check out the harm reporting paper here: https://t.co/uDocUykS2c Check out the AI self-disclosure assistance paper here: https://t.co/PsawDEoQKg
0
0
0
@scyrusk
Sauvik Das
27 days
Two spuddies will be at #CSCW2025: @yuxiwu will present our work on designing citizen harm reporting interfaces for privacy (and is on the job market!). @IsadoraKrsek will present our work on user reactions to AI-identified self-disclosure risks online.
1
1
16
@scyrusk
Sauvik Das
30 days
Honored to be one of the recipients of the Google Academic Research Awards this year! @alan_ritter and I will further our work on exploring how AI can be used to help users make safe disclosure decisions online. https://t.co/AqKKqjcEcd
research.google
1
2
59
@TobyJLi
Toby J. Li😺 (he/him)
1 month
My 1st time at @acm_ccs (and any security conference)! Please say hi if you are around! I've been having lots of Taiwanese food🍲🥘🇹🇼 We (@tianshi_li @yaxingyao @scyrusk) will host a workshop on Human-Centered AI Privacy and Security (HAIPS) (https://t.co/q2XyMWPDwD) this Friday!
0
5
33
@scyrusk
Sauvik Das
1 month
The 1st Human-Centered AI, Privacy, and Security (#HAIPS2025) workshop at CCS will happen on Oct 17th! We have two amazing keynote speakers (@patrickgage and @jas0nh0ng), and a slate of insightful and provocative papers. Agenda here: https://t.co/nl1XL7VqvF #CCS2025 @acm_ccs
0
2
20
@parker_lyman
Parker Lyman
1 month
Would love to see more work like this. Who else is out there researching Manus?
@PradyumnaShome
Pradyumna Shome
2 months
Why Johnny Can't Use Agents with Sashreek Krishnan and Sauvik Das (@scyrusk) 🚨 New pre-print: Users find AI agents to be impressive, but have trouble actually using them. We reviewed 102 AI agents and watched 31 people try to actually use Operator and Manus for real tasks.
1
2
5
@scyrusk
Sauvik Das
1 month
#chi2026 reviewing season is here! Friendly reminder that it is possible to write thoughtful reviews without nitpicking and without it taking a lot of time. I've actually completed my seven 2AC reviews already. Some (updated) tips I wrote about this: https://t.co/eqyefUteKC
sauvik-das.medium.com
Last year, the world was on fire [1]. While it burned, I reviewed. A lot. More than I care to review again in a single year.
1
1
25
@scyrusk
Sauvik Das
2 months
3) Finally, there is... very, very little documentation associated with these datasets, which made this audit much harder than it needed to be. To help improve documentation practices, we extended datasheets for datasets w/ audio-specific questions.
0
0
1
@scyrusk
Sauvik Das
2 months
2) Most datasets pay little attention to representation — with the exception being Mozilla Common Voice. So, unsurprisingly, most audio data is in English and there is little attempt to ensure vocal representation from a broad set of individuals.
1
0
1
@scyrusk
Sauvik Das
2 months
1) While there is a lot of data that may be copyrighted, to circumvent copyright issues some datasets just comprise a lot of "old" audio data, e.g., sentences read from old newspapers and books that are now in the public domain.
1
0
1
@scyrusk
Sauvik Das
2 months
Our audit was broad: we included sound, voice, and music. We explored content, audio quality, language representation, toxicity, bias, and licensing adherence. Lots to unpack but three key findings:
1
0
1
@scyrusk
Sauvik Das
2 months
ML models are only as good as the data they are trained on, and there is understandably a lot of concern around how the data that powers these models is sourced. Through a broad review of recent gen audio papers, we identified the most commonly used datasets and audited them.
1
0
1
@scyrusk
Sauvik Das
2 months
Large audio models power a broad suite of new applications: they can continue unfinished audio, clone voices, provide an expressive range of text-to-speech voices, and can even create entire songs from simple text-based prompts. But what are they trained on?
1
0
1
@scyrusk
Sauvik Das
2 months
📣 Accepted to #AIES2025: What do the audio datasets powering generative audio models actually contain? (led by @willie_agnew) Answer: Lots of old audio content that is mostly English, often biased, and of dubious copyright / permissioning status. Paper: https://t.co/vFbJdDxSYe
1
1
7
@scyrusk
Sauvik Das
2 months
🔐 New #UIST2025 paper by @kyzylmonteiro : Imago Obscura uses #vision #language #models to understand user #privacy concerns, improve their awareness of image privacy risks, and their ability to address these risks. 📜: https://t.co/uZmWgsslFO 🔗: https://t.co/xsK9MWDI82
2
4
15
@scyrusk
Sauvik Das
2 months
@kyzylmonteiro will be presenting this at the Privacy session at #UIST2025 next Wednesday! https://t.co/DDHwro2rIO Please check it out if you'll be there :)
programs.sigchi.org
0
0
1