Sauvik Das @scyrusk X Profile

Sauvik Das

@scyrusk

Followers

3K

Following

5K

Media

178

Statuses

2K

Associate Professor (w/o tenure) at @cmuhcii. Director of the SPUD Lab. Join me elsewhere 🦣 [email protected] | 🦋 @sauvik.me

https://t.co/Ju2H59Cjvr

Pittsburgh, PA

Joined October 2010

Sauvik Das

@scyrusk

5 months

As of today, I am officially Associate Professor* at @cmuhcii. Surreal — I was out of my depth when I started. Couldn't have done it without my incredible students — the true brains of the operation — and the support of my mentors. Thank you all! * not yet tenured, because CMU

17

4

177

Yuxuan Li

@YuxuanL_

14 days

Had the privilege of giving 3 talks this week @SCSatCMU and @GeorgiaTech on "Examine Machine Behavior and Simulate Society with Social Agents"! In these talks, I shared our latest work on advancing socially intelligent LLM agents. On the micro level, we have been using social

4

10

75

Hao-Ping (Hank) Lee

@hankhplee

23 days

Excited and grateful to share that I’ve been named a Presidential Fellow by @CyLab! Endless thanks to my advisors @scyrusk and @mizjodi, and all my collaborators for their constant inspiration as we work toward shaping the future of privacy-preserving AI.

CyLab

@CyLab

24 days

Congratulations to CyLab's 2025 Presidential Fellows! Each year, CyLab recognizes high-achieving Ph.D. students pursuing security and/or privacy-related research with a CyLab Presidential Fellowship that covers one year of tuition.

1

3

26

Niloofar

@niloofar_mire

27 days

I'm recruiting students for fall 2026 thru @LTIatCMU & @CMU_EPP, in: 1. Privacy & security of LLMs, coding, long horizon & embodied agents (robotics) 2. Tiny local llms 3. AI for scientific reasoning, esp. chemistry 4. Latent reasoning 5. anything YOU are passionate about!

27

188

1K

Sauvik Das

@scyrusk

26 days

Congrats to my student @hankhplee for being recognized as one of the 2025 CyLab Presidential fellows! https://t.co/wihBfcIN6M

cylab.cmu.edu

Each year, CyLab recognizes high-achieving Ph.D. students pursuing security and/or privacy-related research, with a CyLab Presidential Fellowship, covering an entire year of tuition.

0

2

22

Sauvik Das

@scyrusk

27 days

Say hi to both if you're there! Check out the harm reporting paper here: https://t.co/uDocUykS2c Check out the AI self-disclosure assistance paper here: https://t.co/PsawDEoQKg

0

Sauvik Das

@scyrusk

27 days

Two spuddies will be at #CSCW2025: @yuxiwu will present our work on designing citizen harm reporting interfaces for privacy (and is on the job market!). @IsadoraKrsek will present our work on user reactions to AI-identified self-disclosure risks online.

1

16

Sauvik Das

@scyrusk

30 days

Honored to be one of the recipients of the Google Academic Research Awards this year! @alan_ritter and I will further our work on exploring how AI can be used to help users make safe disclosure decisions online. https://t.co/AqKKqjcEcd

research.google

1

2

59

Toby J. Li😺 (he/him)

@TobyJLi

1 month

My 1st time at @acm_ccs (and any security conference)! Please say hi if you are around! I've been having lots of Taiwanese food🍲🥘🇹🇼 We (@tianshi_li @yaxingyao @scyrusk) will host a workshop on Human-Centered AI Privacy and Security (HAIPS) (https://t.co/q2XyMWPDwD) this Friday!

0

5

33

Sauvik Das

@scyrusk

1 month

The 1st Human-Centered AI, Privacy, and Security (#HAIPS2025) workshop at CCS will happen on Oct 17th! We have two amazing keynote speakers (@patrickgage and @jas0nh0ng), and a slate of insightful and provocative papers. Agenda here: https://t.co/nl1XL7VqvF #CCS2025 @acm_ccs

0

2

20

Parker Lyman

@parker_lyman

1 month

Would love to see more work like this. Who else is out there researching Manus?

Pradyumna Shome

@PradyumnaShome

2 months

Why Johnny Can't Use Agents with Sashreek Krishnan and Sauvik Das (@scyrusk) 🚨 New pre-print: Users find AI agents to be impressive, but have trouble actually using them. We reviewed 102 AI agents and watched 31 people try to actually use Operator and Manus for real tasks.

1

2

5

Sauvik Das

@scyrusk

1 month

#chi2026 reviewing season is here! Friendly reminder that it is possible to write thoughtful reviews without nitpicking and without it taking a lot of time. I've actually completed my seven 2AC reviews already. Some (updated) tips I wrote about this: https://t.co/eqyefUteKC

sauvik-das.medium.com

Last year, the world was on fire [1]. While it burned, I reviewed. A lot. More than I care to review again in a single year.

1

25

Sauvik Das

@scyrusk

2 months

3) Finally, there is...very very little documentation associated with these datasets which made this audit much harder than it needed to be. To help improve documentation practices, we extended datasheets for datasets w/ audio-specific questions

0

1

Sauvik Das

@scyrusk

2 months

2) Most datasets pay little attention to representation — with the exception being Mozilla Common Voice. So, unsurprisingly, most audio data is in English and there is little attempt to ensure vocal representation from a broad set of individuals.

1

0

1

Sauvik Das

@scyrusk

2 months

1) While there is a lot of data that may be copyrighted, to circumvent copyright issues some datasets just comprise a lot of "old" audio data, e.g., sentences read from old newspapers and books that are now in the public domain.

1

0

1

Sauvik Das

@scyrusk

2 months

Our audit was broad: we included sound, voice, and music. We explored content, audio quality, language representation, toxicity, bias, and licensing adherence. Lots to unpack but three key findings:

1

0

1

Sauvik Das

@scyrusk

2 months

ML models are only as good as the data they are trained on, and there is understandably a lot of concern around how the data that powers these models is sourced. Through a broad review of recent gen audio papers, we identified the most commonly used datasets and audited them.

1

0

1

Sauvik Das

@scyrusk

2 months

Large audio models power a broad suite of new applications: they can continue unfinished audio, clone voices, provide an expressive range of text-to-speech voices, and can even create entire songs from simple text-based prompts. But what are they trained on?

1

0

1

Sauvik Das

@scyrusk

2 months

📣 Accepted to #AIES2025: What do the audio datasets powering generative audio models actually contain? (led by @willie_agnew) Answer: Lots of old audio content that is mostly English, often biased, and of dubious copyright / permissioning status. Paper: https://t.co/vFbJdDxSYe

1

7

Sauvik Das

@scyrusk

2 months

🔐 New #UIST2025 paper by @kyzylmonteiro : Imago Obscura uses #vision #language #models to understand user #privacy concerns, improve their awareness of image privacy risks, and their ability to address these risks. 📜: https://t.co/uZmWgsslFO 🔗: https://t.co/xsK9MWDI82

2

4

15

Sauvik Das

@scyrusk

2 months

@kyzylmonteiro will be presenting this at the Privacy session at #UIST2025 next Wednesday! https://t.co/DDHwro2rIO Please check it out if you'll be there :)

programs.sigchi.org

0

1