David Robinson

@drob

Followers
51,840
Following
646
Media
2,106
Statuses
13,068

Director of Data Science at @Contentsquare . #rstats fan. Dad x2. He/him

New York, NY
Joined June 2009
@drob
David Robinson
4 years
Timeline of 2020, to scale
Tweet media one
110
13K
54K
@drob
David Robinson
3 years
Tweet media one
105
3K
20K
@drob
David Robinson
5 years
Today the youngest millennial in the world turns 22 and tomorrow the oldest one turns 38, so keep that in mind when someone uses “millennial” to mean “kids these days”
Tweet media one
162
6K
10K
@drob
David Robinson
7 years
When you’ve written the same code 3 times, write a function
When you’ve given the same in-person advice 3 times, write a blog post
43
1K
4K
@drob
David Robinson
3 years
gradient descent
Tweet media one
21
493
4K
@drob
David Robinson
4 years
It’s never too early
Tweet media one
52
386
3K
@drob
David Robinson
7 years
Lord, grant me the confidence of a Hacker News commenter criticizing an article they haven't read
18
513
2K
@drob
David Robinson
6 years
Roses are red, Violets are blue, Here is a heart Made with ggplot2 #rstats
Tweet media one
18
580
2K
@drob
David Robinson
5 years
I think “even senior developers have to Google things!” is true and important but misses the point
It’s like saying “even concert pianists have to practice!”
The lesson isn’t “We’re all faking it,” it’s “Searching/practicing is necessary to get better and that never stops”
34
391
2K
@drob
David Robinson
8 years
Every linear algebra class
Me: What are eigenvectors
Teacher: You can think of them as an n-dimensional kernel subspace
Me: No I can't
29
647
2K
@drob
David Robinson
6 years
Conference organizer: Please upload your slides 30 days in advance Me:
24
267
2K
@drob
David Robinson
3 years
My favorite tidyr function
Tweet media one
20
112
1K
@drob
David Robinson
4 years
Let's do this ONE LIKE = ONE INSULT OF A DISTRIBUTION
@ChelseaParlett
Chelsea Parlett-Pelleriti
4 years
What's the most overrated distribution, #statsTwitter ?
39
11
125
35
339
1K
@drob
David Robinson
4 years
Excited to announce the latest project we've been working on 📦👶
Tweet media one
58
49
1K
@drob
David Robinson
8 years
New post: Analysis of Trump tweets confirms he writes only the angrier Android half #rstats
Tweet media one
63
1K
1K
@drob
David Robinson
6 years
I have some exciting news: today I'm joining @DataCamp as their Chief Data Scientist 🎉📊📈
Tweet media one
83
64
1K
@drob
David Robinson
4 years
Communication tip: When you're writing for an audience of varying experience levels, explain concepts using language that to experts doesn't feel like an explanation
Tweet media one
12
159
1K
@drob
David Robinson
6 years
if you don’t love me at my
then you don’t deserve me at my
Tweet media one
Tweet media two
11
244
989
@drob
David Robinson
7 years
I promise, once US isn't in a constitutional crisis with a madman dictator abusing rights I'll go straight back to tweeting base vs ggplot2
13
165
974
@drob
David Robinson
6 years
When you fix a major flaw in your statistical methodology but the results turn out the same
Tweet media one
4
201
852
@drob
David Robinson
6 years
New blog post: "What's the difference between data science, machine learning, and artificial intelligence?"
Tweet media one
14
394
820
@drob
David Robinson
7 years
Everyone around the world giving and receiving answers to each other's @StackOverflow questions. Just yesterday. #rstats
33
534
793
@drob
David Robinson
7 years
I love Xiao Li-Meng’s radical proposal- each time your result turns out to be wrong, your salary gets cut by your p-value #SSI2017
Tweet media one
36
365
788
@drob
David Robinson
6 years
Data scientist: *eats a sandwich*
Other fields: That’s literally a sandwich. We’ve known about sandwiches for years. Look at data science thinking they invented sandw
6
97
744
@drob
David Robinson
3 years
#rstats tip of the day: The tayloRswift package from @adastephenson now offers a ggplot2 palette for Red (Taylor's Version)
Tweet media one
Tweet media two
20
159
739
@drob
David Robinson
5 years
There are three kinds of lies: lies, damned lies, and "I'll finish it on the plane"
8
124
732
@drob
David Robinson
7 years
New blog post: "The Incredible Growth of Python"
Tweet media one
15
468
676
@drob
David Robinson
4 years
This store sure has strong opinions about writing academic papers with Markdown
Tweet media one
10
118
679
@drob
David Robinson
8 years
Me: Git makes it easy to revert your local changes
Them: Great! So what command do I use?
Me: I said it was easy not that I knew how
13
293
650
@drob
David Robinson
7 years
Pro tip: If you can't afford a data scientist, just tweet the work you need done and say: "See, here's a problem #rstats could never solve"
9
230
652
@drob
David Robinson
5 years
Some exciting news: This week I've joined @flatironhealth on the Data Insights Engineering team! Flatiron's at the frontier of using data science in the fight against cancer, and I'm thrilled to see how I can contribute.
Tweet media one
47
20
628
@drob
David Robinson
4 years
Wow while we’ve been messing around on #rstats Reddit has been getting it done
Tweet media one
13
52
611
@drob
David Robinson
8 years
When teaching, be careful not to mix up "I learned this a long time ago" with "This is simple" #rstats
15
331
615
@drob
David Robinson
7 years
Fun fact: the more a county uses R (measuring by % of Stack Overflow traffic), the less it voted for Trump #rstats
Tweet media one
34
491
602
@drob
David Robinson
5 years
Advice for fellow caffeine addicts: use coffee only as a reward
Pull request accepted? Celebrate with a coffee
Unit tests pass? That calls for a coffee
Finished a whole coffee? Coffee time
18
96
607
@drob
David Robinson
6 years
Them: I usually prefer ggplot2, but sometimes when I'm in a hurry I just use plot() Me:
@realDonaldTrump
Donald J. Trump
6 years
TREASON?
73K
22K
86K
10
88
600
@drob
David Robinson
6 years
What scikit-learn has done to give machine learning methods in Python a consistent API is revolutionary. @amuellerml and the rest of the contributors deserve so much 👏 👏 👏 #rstatsnyc
Tweet media one
Tweet media two
Tweet media three
7
132
578
@drob
David Robinson
4 years
Why do we even have conspiracy theorists if they’re on the side of the secret government police in the unmarked vans You had one job, people
4
147
544
@drob
David Robinson
7 years
R and Python are both negatively correlated with Trump support by county Correlated with Trump support? C# and PHP #rstats
Tweet media one
31
412
534
@drob
David Robinson
7 years
New blog post: "Advice to aspiring data scientists: start a blog" #rstats
Tweet media one
10
185
556
@drob
David Robinson
7 years
You can't have Stack Overflow run on AWS If AWS went down, they'd never be able to fix it
2
408
523
@drob
David Robinson
6 years
Candidate I'm interviewing: I've wanted to get into Python. Did you see that report from Stack Overflow that showed how fast Python is growing in data science? Me: 🙂
Tweet media one
13
42
542
@drob
David Robinson
6 years
Anyone in #rstats who uses = for assignment is a cop
38
95
537
@drob
David Robinson
6 years
I sometimes miss my PhD research; working with yeast data was so easy Yeast have literally no rights You can starve them, boil them, then publish their genome online You don't even have to fill out a form
9
81
536
@drob
David Robinson
6 years
Why start a course with datasets on day one? “I’ve heard students say ‘I hate math’ and ‘I hate stats.’ I’ve never heard one say ‘I hate data.’” - @minebocek at #SDSS18
11
128
528
@drob
David Robinson
6 years
. @tanyacash21 digs up the first known job listing for a Data Scientist, from 2008. “No specific technical skills are required.” Times have changed!
Tweet media one
15
146
529
@drob
David Robinson
6 years
🤔
Tweet media one
8
53
502
@drob
David Robinson
6 years
Junior Engineer: I found the problem!
Senior Engineer: I found *a* problem.
Principal Engineer: I may have found a problem.
Manager: Hey how’s the search for that problem going?
9
118
508
@drob
David Robinson
7 years
If you haven't seen it before, the CRAPL- an open source license for hacked-together academic software- is an absolute gem @mattmight
Tweet media one
5
267
495
@drob
David Robinson
6 years
. @juliasilge : “When people are introduced to PCA, they usually feel something like this” #rstudioconf
Tweet media one
10
108
495
@drob
David Robinson
4 years
#rstats "tip" of the day: You can use fct_reorder() and as.character() to reorder a date variable based on the y-axis Congratulations- now you can get a job at @GaDPH
Tweet media one
9
65
494
@drob
David Robinson
6 years
Saying "no one should analyze data without consulting a statistician, they could misinterpret the results" is like saying "no one should exercise without a personal trainer, they could injure themselves."
25
88
486
@drob
David Robinson
6 years
Snark towards beginner programmers is not just mean, it’s lazy It lets us excuse poor teaching, or poor usability, as being the beginners’ fault
6
128
479
@drob
David Robinson
3 years
Wife: are you going to tweet about our daughter being born?
Me: once I think of a way to make it an #rstats joke
Wife: …
Me: it’s called a personal brand, Dana
26
6
481
@drob
David Robinson
7 years
Announcing the release of my e-book! "Introduction to Empirical Bayes: Examples from Baseball Statistics" #rstats
Tweet media one
18
147
478
@drob
David Robinson
4 years
I'm on the job market! 🔎 I'd love to find a data science role at a small (<200) company where I can help build DS infrastructure and a team. I really like R. New York/Remote. DMs are open if you have questions or suggestions! #rstats
23
125
460
@drob
David Robinson
6 years
The ACLU is hiring data scientists, reporting to their new Chief Analytics Officer. Looks like a dream role for someone in Python/ #rstats with a passion for civil liberties! 🗽
Senior Data Scientist, Data Scientist
5
344
435
@drob
David Robinson
3 years
NEW BLOG POST: Machine learning in a hurry: what I've learned from the #SLICED ML competition #rstats
Tweet media one
7
75
454
@drob
David Robinson
6 years
When it comes to preparing talks I have only two modes: 1. It’s a talk I’ve given before and I change nothing except the date 2. I start it the night before and I’m still tweaking slide 14 as I’m being introduced
14
33
447
@drob
David Robinson
7 years
New blog post: "Developers who use spaces make more money than those who use tabs" #rstats
Tweet media one
41
339
433
@drob
David Robinson
6 years
I hadn’t heard of the humanize library before Alex Samuel introduced it but usefulness for UI is immediately obvious: format time as “10 seconds ago” or “yesterday” Do we have a 📦 for this in #rstats ? #PyDataNYC
Tweet media one
9
94
426
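To make the "10 seconds ago" / "yesterday" idea concrete, here is a minimal sketch of relative-time formatting using only the Python standard library. It does not reproduce the humanize library's actual API or wording; the function name `time_ago`, the helper `_plural`, and the cutoffs between units are assumptions chosen for illustration.

```python
from datetime import timedelta


def _plural(count, unit):
    """Hypothetical helper: '1 minute ago' vs. '5 minutes ago'."""
    suffix = "" if count == 1 else "s"
    return f"{count} {unit}{suffix} ago"


def time_ago(delta):
    """Format a non-negative timedelta as a short, human-friendly phrase.

    The unit cutoffs are illustrative assumptions, not the humanize
    library's rules.
    """
    seconds = int(delta.total_seconds())
    if seconds < 60:
        return _plural(seconds, "second")
    if seconds < 3600:
        return _plural(seconds // 60, "minute")
    if seconds < 86400:
        return _plural(seconds // 3600, "hour")
    days = seconds // 86400
    if days == 1:
        return "yesterday"
    return _plural(days, "day")


print(time_ago(timedelta(seconds=10)))
print(time_ago(timedelta(days=1)))
```

In a UI, the timedelta would typically come from subtracting an event timestamp from the current time.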
@drob
David Robinson
5 years
A design of our geom_slat which is totally effective while at the same time beautiful! #rstats
Tweet media one
15
49
431
@drob
David Robinson
6 years
For this week's #tidytuesday , I've recorded a screencast where I analyze data on college major and income, without looking at the data in advance. Excited to try this experiment! #rstats
Tweet media one
21
70
426
@drob
David Robinson
6 years
I've thought about recording a screencast of an example data analysis in #rstats . I'd do it on a dataset I'm unfamiliar with so that I can show and narrate my live thought process. Any suggestions for interesting datasets to use?
48
28
419
@drob
David Robinson
5 years
I don't know who needs to hear this, but the reason fct_reorder isn't working in your graph is that your data was still grouped
12
34
407
@drob
David Robinson
6 years
“I comment my code as if at any moment I might get a traumatic brain injury” @dataandme at #rstatsnyc
7
87
412
@drob
David Robinson
6 years
Almost all our engineers are Belgian If @DataCamp goes down during these 90 minutes, it stays down
Tweet media one
9
40
402
@drob
David Robinson
7 years
“We don’t have to teach data science, it’s just a fancy word for statistics” “Why would we teach programming? This is a statistics course”
12
114
406
@drob
David Robinson
4 years
The dm package from @krlmlr provides a grammar of joined tables: you can filter, mutate, or select on particular variables within a combined set of tables (local or remote) and the changes propagate through 🤯🎉 #rstats #rstatsnyc
Tweet media one
7
77
401
@drob
David Robinson
4 years
What tools should an R data scientist master to write performant code? My top three:
1. dplyr/data.table
2. Vector/matrix operations
3. dbplyr (or another DB ORM)
First 2 get you 80/20 (respectively) for fast data transformations, and 3rd gets you scale
@drob
David Robinson
4 years
Something often missed in discussions of programming language performance is that Python/R written by an expert can often be faster than Java/C/C++ written by a beginner Performance lies along a boundary, not a spectrum, and you don’t get speed “for free” w/language choice
9
41
261
14
56
392
@drob
David Robinson
6 years
It's official: @thomasp85 has taken over gganimate! This is a complete rewrite for a better grammar of animated graphics, which he'll be keynoting about next week at #UseR2018 Thanks to everyone who has used & contributed to gganimate; I'm excited for its future! #rstats
Tweet media one
4
66
394
@drob
David Robinson
6 years
My proposed distinction:
* Data science produces insights
* Machine learning produces predictions
* Artificial intelligence produces actions
@drob
David Robinson
6 years
New blog post: "What's the difference between data science, machine learning, and artificial intelligence?"
Tweet media one
14
394
820
11
145
381
@drob
David Robinson
5 years
Just learned about a new feature in dplyr 0.8.0: you can pass a name = argument to count() if you don't want it to be n No more count() %>% rename()! 🥳 #rstats
Tweet media one
8
67
385
@drob
David Robinson
4 years
Some professional news: I've joined Heap as their first data scientist! I love helping people get valuable product insights out of data, and Heap is a chance to help thousands of companies at once. Very excited!
Tweet media one
22
16
382
@drob
David Robinson
5 years
When you avoid teaching statistical programming to beginners because “they might misuse it”
Tweet media one
5
83
377
@drob
David Robinson
6 years
Excellent summary of ML performance metrics from @mohammedsunasra : distinguishing accuracy, precision, recall, sensitivity, and specificity! #datablog
Tweet media one
Tweet media two
Tweet media three
Tweet media four
5
142
379
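For readers who want the distinction spelled out, the five metrics named in the tweet above can be sketched from the four cells of a binary confusion matrix. This uses the standard textbook definitions, not the linked summary's own code; the function name and the example counts are made up for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives.
    """
    total = tp + fp + tn + fn
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "precision": tp / (tp + fp),     # share of predicted positives that are real
        "recall": recall,                # share of real positives that are found
        "sensitivity": recall,           # another name for recall
        "specificity": tn / (tn + fp),   # share of real negatives that are found
    }


# Hypothetical counts, chosen only to illustrate the calculation.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and recall are the same quantity under two names, which is why they share one calculation here.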
@drob
David Robinson
7 years
New blog post: "Teach the tidyverse to beginners" #rstats
19
153
372
@drob
David Robinson
6 years
If you don't talk to your kids about the tidyverse, someone else will #rstats
11
78
360
@drob
David Robinson
6 years
New blog post: "Who wrote the anti-Trump New York Times op-ed? A tidytext analysis of document similarity" #rstats
Tweet media one
12
150
352
@drob
David Robinson
3 months
Academia has higher standards than industry for “is this true?” Industry has higher standards than academia for “is this useful?”
26
54
367
@drob
David Robinson
6 years
My wife told me she had a surprise and later I go to my bar and
Tweet media one
22
23
344
@drob
David Robinson
7 years
Today we're launching Stack Overflow Trends, to track the rise and fall of programming technologies
Tweet media one
25
247
352
@drob
David Robinson
6 years
Whenever I want to cheer up I just think about what a crummy year this January Reddit commenter has been having
Tweet media one
16
58
351
@drob
David Robinson
6 years
Whenever I install something that takes a lot of configuration, I always wish I'd documented my process publicly to save other people time Props to @logart1995 for documenting his 40 hour (!) process of installing Tensorflow w/ GPU on Ubuntu! #datablog
Tweet media one
4
67
355
@drob
David Robinson
6 years
This is my least favorite kind of snarky tweet. When someone appreciates a common approach in your field, you should say “Great! From our experience, here’s our advice...” Imagine if an anthropologist praised R and I’d responded “Lol anthropologists just discovered programming”
16
42
353
@drob
David Robinson
5 years
In this #tidytuesday screencast, I analyze a dataset of pizza ratings and discover the best and worst 🍕 in NYC and other cities 📊 #rstats
Tweet media one
7
65
350
@drob
David Robinson
5 years
In this #tidytuesday screencast, I analyze a dataset of horror movie ratings, and use lasso regression to predict ratings based on genre, cast, and plot. What's 😱👍: Indian, animated, and drama films What's 🙄👎: Sharks and Eric Roberts #rstats 🧛‍♂️👻
Tweet media one
6
61
344
@drob
David Robinson
5 years
Calling data science "AI" is like calling a crane or drill a "construction robot" Tools might make it possible for us to do remarkable, previously-impossible things, but they're still tools
11
57
345
@drob
David Robinson
5 years
In this week's #tidytuesday screencast, I use linear models and lasso regression to predict wine ratings based on price, country, & description🍷 This was a fun example of using tidytext + glmnet together, as well as interpreting models with broom #rstats
Tweet media one
6
64
338
@drob
David Robinson
4 years
I've become a huge fan of the sf package's tidy approach to spatial data, and @emdodwell is giving a terrific tutorial + case study 🗺️🌎 #rstatsnyc
Tweet media one
5
48
342
@drob
David Robinson
3 years
Tweet media one
3
77
334
@drob
David Robinson
7 years
New blog post: "Don't teach students the hard way first" #rstats
Tweet media one
17
151
334
@drob
David Robinson
7 years
New blog post: Announcing "Introduction to the Tidyverse", my new @DataCamp course #rstats
Tweet media one
10
92
332
@drob
David Robinson
6 years
@beeonaposy
Underrated tech: SQL
Underrated model: extensions to linear (glm, gam, LASSO, etc)
Underrated skill: writing/communication
6
71
330
@drob
David Robinson
3 years
dplyr's killer app is that you can write code that generates SQL, so you don't have to switch your approach halfway through an analysis The siuba package brings this to Python! @chowthedog #rstudioglobal
Tweet media one
5
49
320
@drob
David Robinson
3 years
Tweet media one
4
37
322
@drob
David Robinson
3 years
In my #rstatsnyc talk, I introduced dbcooper, which turns any database into an #rstats package 📦 Great for creating internal company packages, or exploring a public database Package: Slides:
Tweet media one
5
59
316
@drob
David Robinson
5 years
In this #tidytuesday screencast, I import and analyze data on US PhDs granted by field If you spend a lot of time importing Excel spreadsheets don't miss this episode: it focuses on the process of importing, cleaning and tidying messy data 😎 #rstats
Tweet media one
7
64
312
@drob
David Robinson
6 years
A useful paradox for data scientists to keep in mind: Most cities are small, but most people live in large cities This relates to analyses of e.g. user engagement: most of your users probably don't do much, but most of your engagement is from frequent users
7
75
312
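The paradox in the tweet above can be checked with a few lines of arithmetic. The sketch below uses entirely made-up numbers (90 towns of 1,000 people and 10 cities of 100,000) and an arbitrary 50,000 cutoff for "large"; it only illustrates how both statements can hold at once and says nothing about real population data.

```python
# Hypothetical populations: many small towns, a few large cities.
populations = [1_000] * 90 + [100_000] * 10

# Arbitrary cutoff for what counts as a "large" city (an assumption).
LARGE_THRESHOLD = 50_000

large = [p for p in populations if p >= LARGE_THRESHOLD]

# Share of cities that are small: counts every city equally.
share_small_cities = 1 - len(large) / len(populations)

# Share of people living in large cities: weights each city by its size.
share_people_in_large = sum(large) / sum(populations)

print(f"Cities that are small: {share_small_cities:.0%}")
print(f"People living in large cities: {share_people_in_large:.0%}")
```

The same two ways of counting apply to user engagement: averaging per user weights every user equally, while averaging per event weights users by how active they are.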