Today the youngest millennial in the world turns 22 and tomorrow the oldest one turns 38, so keep that in mind when someone uses “millennial” to mean “kids these days”
Wow: NLP analysis discovers more than a million fake anti-net-neutrality comments. Brilliant data science detective work by @thisismetis student Jeff Kao. #datablog
I think “even senior developers have to Google things!” is true and important but misses the point
It’s like saying “even concert pianists have to practice!”
The lesson isn’t “We’re all faking it,” it’s “Searching/practicing is necessary to get better and that never stops”
Communication tip: When you're writing for an audience of varying experience levels, explain concepts using language that doesn't feel like an explanation to experts
Data scientist: *eats a sandwich*
Other fields: That’s literally a sandwich. We’ve known about sandwiches for years. Look at data science thinking they invented sandw
Some exciting news: This week I've joined @flatironhealth on the Data Insights Engineering team!
Flatiron's at the frontier of using data science in the fight against cancer, and I'm thrilled to see how I can contribute.
Advice for fellow caffeine addicts: use coffee only as a reward
Pull request accepted? Celebrate with a coffee
Unit tests pass? That calls for a coffee
Finished a whole coffee? Coffee time
What scikit-learn has done to give machine learning methods in Python a consistent API is revolutionary.
@amuellerml and the rest of the contributors deserve so much 👏 👏 👏 #rstatsnyc
Candidate I'm interviewing: I've wanted to get into Python. Did you see that report from Stack Overflow that showed how fast Python is growing in data science?
Me: 🙂
I sometimes miss my PhD research; working with yeast data was so easy
Yeast have literally no rights
You can starve them, boil them, then publish their genome online
You don't even have to fill out a form
Why start a course with datasets on day one?
“I’ve heard students say ‘I hate math’ and ‘I hate stats.’ I’ve never heard one say ‘I hate data.’” - @minebocek at #SDSS18
Junior Engineer: I found the problem!
Senior Engineer: I found *a* problem.
Principal Engineer: I may have found a problem.
Manager: Hey how’s the search for that problem going?
#rstats "tip" of the day:
You can use fct_reorder() and as.character() to reorder a date variable based on the y-axis
Congratulations, now you can get a job at @GaDPH
Saying "no one should analyze data without consulting a statistician, they could misinterpret the results" is like saying "no one should exercise without a personal trainer, they could injure themselves."
Wife: are you going to tweet about our daughter being born?
Me: once I think of a way to make it an #rstats joke
Wife: …
Me: it’s called a personal brand, Dana
I'm on the job market! 🔎
I'd love to find a data science role at a small (<200) company where I can help build DS infrastructure and a team. I really like R. New York/Remote.
DMs are open if you have questions or suggestions!
#rstats
The ACLU is hiring data scientists, reporting to their new Chief Analytics Officer. Looks like a dream role for someone in Python/#rstats with a passion for civil liberties! 🗽
Senior Data Scientist
Data Scientist
When it comes to preparing talks I have only two modes:
1. It’s a talk I’ve given before and I change nothing except the date
2. I start it the night before and I’m still tweaking slide 14 as I’m being introduced
I hadn’t heard of the humanize library before Alex Samuel introduced it, but its usefulness for UI is immediately obvious: format time as “10 seconds ago” or “yesterday”
Do we have a 📦 for this in #rstats? #PyDataNYC
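To make the idea concrete, here's a rough standard-library sketch of that kind of formatting. It is not the humanize library's code or API: `natural_time` and `_ago` are hypothetical names of my own, and treating anything between 24 and 48 hours old as "yesterday" is a simplifying assumption.

```python
from datetime import timedelta


def _ago(count: int, unit: str) -> str:
    """Hypothetical helper: pluralize the unit and append 'ago'."""
    suffix = "" if count == 1 else "s"
    return f"{count} {unit}{suffix} ago"


def natural_time(delta: timedelta) -> str:
    """Hypothetical helper: describe an elapsed time in friendly words."""
    seconds = int(delta.total_seconds())
    if seconds < 1:
        return "just now"
    if seconds < 60:
        return _ago(seconds, "second")
    if seconds < 3600:
        return _ago(seconds // 60, "minute")
    if seconds < 86400:
        return _ago(seconds // 3600, "hour")
    days = seconds // 86400
    # Assumption: one full elapsed day is rendered as "yesterday".
    if days == 1:
        return "yesterday"
    return _ago(days, "day")


print(natural_time(timedelta(seconds=10)))  # 10 seconds ago
print(natural_time(timedelta(days=1)))      # yesterday
```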
For this week's #tidytuesday, I've recorded a screencast where I analyze data on college major and income, without looking at the data in advance. Excited to try this experiment!
#rstats
I've thought about recording a screencast of an example data analysis in #rstats. I'd do it on a dataset I'm unfamiliar with so that I can show and narrate my live thought process.
Any suggestions for interesting datasets to use?
The dm package from @krlmlr provides a grammar of joined tables: you can filter, mutate, or select on particular variables within a combined set of tables (local or remote) and the changes propagate through 🤯🎉
#rstats #rstatsnyc
What tools should an R data scientist master to write performant code? My top three:
1. dplyr/data.table
2. Vector/matrix operations
3. dbplyr (or another DB ORM)
First 2 get you 80/20 (respectively) for fast data transformations, and 3rd gets you scale
Something often missed in discussions of programming language performance is that Python/R written by an expert can often be faster than Java/C/C++ written by a beginner
Performance lies along a boundary, not a spectrum, and you don’t get speed “for free” w/language choice
It's official: @thomasp85 has taken over gganimate!
This is a complete rewrite for a better grammar of animated graphics, which he'll be keynoting about next week at #UseR2018
Thanks to everyone who has used & contributed to gganimate; I'm excited for its future!
#rstats
Just learned about a new feature in dplyr 0.8.0: you can pass a name = argument to count() if you don't want it to be n
No more count() %>% rename()! 🥳
#rstats
Some professional news: I've joined Heap as their first data scientist!
I love helping people get valuable product insights out of data, and Heap is a chance to help thousands of companies at once. Very excited!
Excellent summary of ML performance metrics from @mohammedsunasra: distinguishing accuracy, precision, recall, sensitivity, and specificity!
#datablog
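For anyone who mixes these up, here's a small sketch of how the five metrics fall out of a binary confusion matrix. The function name and the counts are made up for illustration and are not taken from the linked summary; it also assumes every denominator is non-zero.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Hypothetical helper: standard metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total,
        # Of everything predicted positive, how much really was positive.
        "precision": tp / (tp + fp),
        # Of everything really positive, how much was found.
        "recall": tp / (tp + fn),
        # Sensitivity is another name for recall (true positive rate).
        "sensitivity": tp / (tp + fn),
        # Of everything really negative, how much was correctly rejected.
        "specificity": tn / (tn + fp),
    }


# Illustrative counts only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that recall and sensitivity are the same quantity under two names, which is part of why the terminology gets confusing.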
Whenever I install something that takes a lot of configuration, I always wish I'd documented my process publicly to save other people time
Props to @logart1995 for documenting his 40 hour (!) process of installing Tensorflow w/ GPU on Ubuntu!
#datablog
This is my least favorite kind of snarky tweet.
When someone appreciates a common approach in your field, you should say “Great! From our experience, here’s our advice...”
Imagine if an anthropologist praised R and I’d responded “Lol anthropologists just discovered programming”
In this #tidytuesday screencast, I analyze a dataset of horror movie ratings, and use lasso regression to predict ratings based on genre, cast, and plot.
What's 😱👍: Indian, animated, and drama films
What's 🙄👎: Sharks and Eric Roberts
#rstats 🧛‍♂️👻
Calling data science "AI" is like calling a crane or drill a "construction robot"
Tools might make it possible for us to do remarkable, previously-impossible things, but they're still tools
In this week's #tidytuesday screencast, I use linear models and lasso regression to predict wine ratings based on price, country, & description 🍷
This was a fun example of using tidytext + glmnet together, as well as interpreting models with broom
#rstats
dplyr's killer app is that you can write code that generates SQL, so you don't have to switch your approach halfway through an analysis
The siuba package brings this to Python! @chowthedog #rstudioglobal
In my #rstatsnyc talk, I introduced dbcooper, which turns any database into an #rstats package 📦
Great for creating internal company packages, or exploring a public database
In this #tidytuesday screencast, I import and analyze data on US PhDs granted by field
If you spend a lot of time importing Excel spreadsheets don't miss this episode: it focuses on the process of importing, cleaning and tidying messy data 😎
#rstats
A useful paradox for data scientists to keep in mind:
Most cities are small, but most people live in large cities
This relates to analyses of e.g. user engagement: most of your users probably don't do much, but most of your engagement is from frequent users
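A quick way to see the paradox is with made-up numbers. The sketch below assumes 1,000 hypothetical cities whose populations follow a Zipf-like pattern (the city ranked r has about 1,000,000 / r people) and calls a city "large" at 10,000 people or more. Both choices are illustrative assumptions, not real data.

```python
# Hypothetical populations: the city ranked r has about 1,000,000 / r people.
city_sizes = [1_000_000 // rank for rank in range(1, 1001)]

# Assumed cutoff for calling a city "large".
LARGE_CITY_THRESHOLD = 10_000

large_cities = [size for size in city_sizes if size >= LARGE_CITY_THRESHOLD]

# Share of cities that fall below the cutoff.
share_of_cities_small = 1 - len(large_cities) / len(city_sizes)

# Share of all people who live in a city at or above the cutoff.
share_of_people_in_large = sum(large_cities) / sum(city_sizes)

print(f"Small cities: {share_of_cities_small:.0%} of all cities")
print(f"People in large cities: {share_of_people_in_large:.0%} of all people")
```

Under these assumptions most cities are small while most people live in large ones. Swap "cities" for "users" and "population" for "events per user" and you get the engagement version.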