Elena Chen
@codingboo
Learning Data Science and Data Analytics!
Joined July 2022
For FacetGrid, pass in the arguments according to the plot type. https://t.co/aUxEIIa3ue(plot_type, arguments_needed_for_the_plot_type) Eg for scatterplot, 2 arguments needed:
2. FacetGrid - mapping a plot type and separating the results based on the column names (the variables you want to play around with) eg row 1 represents smokers, row 2 represents non-smokers, and 1st column represents time=Lunch, 2nd column represents time=Dinner
#Day16 of #DataAnalytics #Seaborns Grids are general types of plots that allow you to map plot types to rows and columns of a grid 1. PairGrid: similar to pairplot for plotting pairwise relationships, but gives more control over customising specific plots
#Day15 of #DataAnalytics #Seaborns Place data in matrix form with .pivot_table(), then use .heatmap() to plot it as a color-encoded matrix. annot=True annotates each cell of the grid with its value. cmap changes the color map. VS .clustermap(), where the data is grouped based on similarity
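A small sketch of the pivot-then-heatmap step. The `flights`-style frame and its values are invented for illustration, and `fmt="g"` is an added choice so the annotations print as plain numbers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical long-format data: one row per (year, month) observation.
flights = pd.DataFrame({
    "year": [2020, 2020, 2021, 2021],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [100, 120, 130, 150],
})

# Matrix form: months as rows, years as columns, passenger counts as cell values.
matrix = flights.pivot_table(index="month", columns="year", values="passengers")

# Color-encoded matrix with each cell's value written on top of it.
ax = sns.heatmap(matrix, annot=True, fmt="g", cmap="coolwarm")
```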
#Day15 of #DataAnalytics #Seaborns Categorical data: - stripplot (scatterplot, but points are stacked together. To separate them: jitter=True) - swarmplot (similar to stripplot, but points are adjusted so that they don't overlap, giving a violin-like shape. *can be combined with a violinplot)
violinplot is similar to boxplot, but it features a kernel density estimation of the underlying distribution - harder to interpret but gives more information regarding distribution. - possible to add a hue parameter - split=True to combine 2 violin plots of same category into 1.
boxplot is a box-and-whisker plot that shows distribution of quantitative data, across the category. Adding a hue=' ' parameter allows the dataset to be split by another categorical column, eg distribution of total bill per day (1st cat), by smokers and non-smokers (2nd cat)
For categorical data, simplest generic form is the barplot. Default statistical function to estimate within each categorical bin is mean/average. Can change to other functions by changing 'estimator' parameter:
#Day14 of #DataAnalytics #Seaborns kdeplot - kernel density estimation. Idea is to replace each data point (represented by a dash mark in rugplot) with a small Gaussian (Normal) distribution centered around that value, then sum the Gaussians for a smooth estimate of the distribution
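The idea can be sketched by hand with NumPy alone. This is an illustration of the principle, not seaborn's actual implementation; the data points and the fixed `bandwidth` are assumptions picked for the example.

```python
import numpy as np

# Illustrative data points (each would be one dash in a rugplot).
data = np.array([1.0, 2.0, 2.5, 4.0, 7.0])
bandwidth = 0.8  # assumed kernel width; real tools usually estimate this from the data

# Evaluation grid extending a few bandwidths past the data on both sides.
grid = np.linspace(data.min() - 4 * bandwidth, data.max() + 4 * bandwidth, 500)

def gaussian(x, center, width):
    """Normal density with mean `center` and standard deviation `width`."""
    return np.exp(-0.5 * ((x - center) / width) ** 2) / (width * np.sqrt(2 * np.pi))

# One small Gaussian per data point, centered on that point.
kernels = np.array([gaussian(grid, point, bandwidth) for point in data])

# Sum the Gaussians and divide by the number of points so the total area is about 1.
density = kernels.sum(axis=0) / len(data)

# Numerical check: area under the estimated curve.
area = np.sum(density) * (grid[1] - grid[0])
```

Plotting `grid` against `density` gives the smooth curve; a larger bandwidth gives a smoother, flatter estimate.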
Default for .jointplot is kind='scatter'. There is also 'hex' for hexagonal distribution, and 'reg' for a regression line on top of the scatter plot with pearson r value. sns.pairplot(dataframe_name) will plot every pairwise relationship across the entire dataframe (for the numerical columns)
#Day13 of #DataAnalytics #Seaborns another visualization tool, a popular statistical plotting library. - .load_dataset() for built-in datasets - .distplot() shows a histogram/distribution of univariate data (deprecated since seaborn 0.11 in favor of .histplot()/.displot()) - .jointplot(x='', y='', data=, kind=) to match 2 distplots for bivariate data
Lastly, to show only a specific range of x or y values, you can configure the axis limits using .set_xlim([lowerbound,upperbound]) or .set_ylim([lowerbound,upperbound]) (i.e. zoom into that range of the axes)
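A short sketch of zooming into part of a plot with axis limits. The data and the chosen bounds are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)

fig, ax = plt.subplots()
ax.plot(x, x ** 2)

# Show only x between 0 and 1 and y between 0 and 2.
ax.set_xlim([0, 1])
ax.set_ylim([0, 2])
```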
Add a legend to the plot by specifying label=' ' in the plot method, then calling axes.legend(). (view pic) Can specify the position of the legend by: axes.legend(loc=n) where the number signifies a specific position (view documentation). loc=0 to let matplotlib decide the optimal location.
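A minimal legend sketch: each line gets a `label`, and `legend(loc=0)` leaves the placement to matplotlib. The label texts are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)

fig, ax = plt.subplots()
ax.plot(x, x ** 2, label="x squared")
ax.plot(x, x ** 3, label="x cubed")

# loc=0 is the numeric code for 'best': matplotlib picks the least obstructive spot.
legend = ax.legend(loc=0)
```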
Many customizations: - figsize=( , ) to adjust width and height of figure - colour of line (can be hex codes. alpha to indicate transparency of line) - linewidths (lw=) - linestyles (ls=) eg dashed/dotted eg 'b.-' blue line with dots - marker symbols 'o' '+' - marker size
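The options listed above, put together in one sketch. The specific colour, sizes and data are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)

# figsize is (width, height) in inches.
fig, ax = plt.subplots(figsize=(8, 3))

# Hex colour, half-transparent, thick dashed line with round markers.
ax.plot(x, x ** 2, color="#FF8C00", alpha=0.5, lw=3, ls="--", marker="o", markersize=8)

# Format-string shorthand: blue ('b'), point markers ('.'), solid line ('-').
ax.plot(x, x + 1, "b.-")
```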
#Day12 of #DataAnalytics #Matplotlib Creating figures through the object-oriented method: create an empty canvas, then just call methods or attributes off of that object. - plt.figure() - plt.subplots(nrows=,ncols=)
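A sketch of both object-oriented routes, assuming the second bullet refers to `plt.subplots()` (the plural form, which accepts `nrows`/`ncols` keywords). `fig.add_axes()` is not mentioned in the post; it is used here as one way to put axes on an empty canvas.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)
y = x ** 2

# 1) Empty canvas, then add axes by hand.
#    The list is [left, bottom, width, height] as fractions of the figure.
fig = plt.figure()
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])
axes.plot(x, y)
axes.set_xlabel("x")
axes.set_ylabel("y")
axes.set_title("y = x squared")

# 2) plt.subplots() creates the figure and a grid of axes in one call.
fig2, axes2 = plt.subplots(nrows=1, ncols=2)
axes2[0].plot(x, y)
axes2[1].plot(y, x)
fig2.tight_layout()  # avoid overlapping labels between the two panels
```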
#Day11 of #DataAnalytics I'm struggling with #Matplotlib because my kernel keeps restarting/dying whenever I try to import matplotlib... this was the same problem I faced the previous time when I was learning this too...
#Day10 of #DataAnalytics Started #Matplotlib visualization tool for Python! View: https://t.co/769KeYr5ZZ to see the whole list of figures that can be done + source code (eg statistical plots & scientific figures) import matplotlib.pyplot as plt %matplotlib inline plt.plot()
#Day9 of #DataAnalytics Finished the last section of learning #Pandas, and did extracting data with: - str.contains(' ', case=False) to make it case-insensitive - .head(n) to get the first n rows, usually paired with .value_counts() - len(df['col2'].unique()) / df['col2'].nunique()
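A quick sketch of those three extraction patterns on a made-up frame. The column names `col1` and `col2` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "col1": ["Data Analyst", "data engineer", "Chef", "DATA scientist", "Chef"],
    "col2": [1, 2, 2, 3, 1],
})

# Case-insensitive substring match gives a boolean mask for filtering rows.
mask = df["col1"].str.contains("data", case=False)
data_jobs = df[mask]

# value_counts() sorts by frequency, so head(2) keeps the two most common values.
top_two = df["col1"].value_counts().head(2)

# Two ways to count distinct values (they agree when there are no missing values).
n_unique_a = len(df["col2"].unique())
n_unique_b = df["col2"].nunique()
```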
.merge() for merging DataFrames based on values of specified columns, and handles non-matching rows using how=' ' parameter. Default: 'inner', can change to outer/left/right .join() mainly used for merging DataFrames based on the index rather than column values. Default: 'left', can change to inner/outer/right
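A small side-by-side sketch of the two methods. The frames, the `key` column and the values are invented for illustration. Note that `.join()` also takes a `how=` argument, with `'left'` as its default.

```python
import pandas as pd

# Hypothetical example frames sharing a 'key' column.
left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": [1, 2, 3]})
right = pd.DataFrame({"key": ["K1", "K2", "K3"], "B": [10, 20, 30]})

# merge() matches rows on column values; how="inner" is the default.
inner = pd.merge(left, right, on="key")                 # only keys present in both
outer = pd.merge(left, right, on="key", how="outer")    # all keys, NaN where missing

# join() matches rows on the index; how="left" is its default.
left_idx = left.set_index("key")
right_idx = right.set_index("key")
joined_left = left_idx.join(right_idx)                  # keeps every key of left_idx
joined_inner = left_idx.join(right_idx, how="inner")    # only keys present in both
```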