Python And R Data science skills: 86 rugplot and kdeplot

Sunday, 18 February 2018

86 rugplot and kdeplot

https://vlrtraining.com/courses/python-data-science-beginner-tutorial
In [1]:
import seaborn as sns
%matplotlib inline
tips = sns.load_dataset('tips')
tips.head(1)
Out[1]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2

rugplot

A rugplot is a very simple concept: it draws a dash mark for every point in a univariate distribution. Rugplots are the building block of a KDE plot:

In [2]:
sns.rugplot(tips['total_bill'])
Out[2]:
<matplotlib.axes._subplots.AxesSubplot at 0xb262128>
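To make the "one dash per point" idea concrete, here is a minimal sketch that draws a rug by hand with plain matplotlib instead of seaborn. The sample values, the dash height of 0.05 and the Agg backend are assumptions chosen for illustration. This is not a description of how seaborn implements rugplot internally.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample standing in for a column such as tips['total_bill']
values = np.array([16.99, 10.34, 21.01, 23.68, 24.59])

fig, ax = plt.subplots()

# One short vertical dash per observation, anchored at the bottom of the axes
rug = ax.vlines(values, ymin=0, ymax=0.05)
ax.set_ylim(0, 1)

# The rug contains exactly as many dashes as there are data points
n_dashes = len(rug.get_segments())
print(n_dashes)
```

Inside a notebook that already uses %matplotlib inline, the matplotlib.use("Agg") line can simply be dropped.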
In [4]:
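# Note: distplot is deprecated since seaborn 0.11;
# sns.histplot(tips['total_bill'], kde=True) draws a similar histogram with a KDE curve.
sns.distplot(tips['total_bill'])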
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0xba83438>

kdeplot

A kdeplot is a Kernel Density Estimation plot. A KDE plot replaces every single observation with a Gaussian (Normal) distribution centered on that value, then sums those curves to estimate the overall density. For example:

In [4]:
# Don't worry about understanding this code!
# It's just for the diagram below
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

#Create dataset
dataset = np.random.randn(25)

# Create another rugplot
sns.rugplot(dataset);

# Set up the x-axis for the plot
x_min = dataset.min() - 2
x_max = dataset.max() + 2

# 100 equally spaced points from x_min to x_max
x_axis = np.linspace(x_min,x_max,100)

# Set up the bandwidth with a rule-of-thumb estimate; for info on this see:
# http://en.wikipedia.org/wiki/Kernel_density_estimation#Practical_estimation_of_the_bandwidth
bandwidth = ((4*dataset.std()**5)/(3*len(dataset)))**.2


# Create an empty kernel list
kernel_list = []

# Plot each basis function
for data_point in dataset:
    
    # Create a kernel for each point and append to list
    kernel = stats.norm(data_point,bandwidth).pdf(x_axis)
    kernel_list.append(kernel)
    
    #Scale for plotting
    kernel = kernel / kernel.max()
    kernel = kernel * .4
    plt.plot(x_axis,kernel,color = 'grey',alpha=0.5)

plt.ylim(0,1)
Out[4]:
(0, 1)
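The bandwidth line in the cell above is a rule-of-thumb estimate of the form h = (4σ⁵ / 3n)^(1/5), which is the same as roughly 1.06 · σ · n^(-1/5). Here is a small sketch that wraps it in a function so the effect of sample size is easier to see. The function name rule_of_thumb_bandwidth, the seed and the sample sizes are assumptions made for this illustration only.

```python
import numpy as np

def rule_of_thumb_bandwidth(data):
    """Hypothetical helper: the same bandwidth formula as in the cell above."""
    data = np.asarray(data, dtype=float)
    sigma = data.std()   # population standard deviation (ddof=0), matching dataset.std()
    n = data.size
    return ((4 * sigma ** 5) / (3 * n)) ** 0.2

rng = np.random.default_rng(1)  # assumption: fixed seed for repeatability

# Bandwidth for standard-normal samples of increasing size
bandwidths = {}
for n in (25, 250, 2500):
    sample = rng.standard_normal(n)
    bandwidths[n] = rule_of_thumb_bandwidth(sample)
    print(n, round(bandwidths[n], 3))
```

With more data points the kernels can afford to be narrower, so the estimated bandwidth tends to shrink as n grows.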
In [5]:
# To get the kde plot we can sum these basis functions.

# Sum the basis functions (divide by len(dataset) to get a normalized density)
sum_of_kde = np.sum(kernel_list,axis=0)

# Plot figure
fig = plt.plot(x_axis,sum_of_kde,color='indianred')

# Add the initial rugplot
sns.rugplot(dataset,color='indianred')

# Get rid of y-tick marks
plt.yticks([])

# Set title
plt.suptitle("Sum of the Basis Functions")
Out[5]:
Text(0.5,0.98,'Sum of the Basis Functions')
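The plot above shows the raw sum of the basis functions, which is fine for seeing the shape. To get an actual density, the sum is divided by the number of points so that the area under the curve is 1. The following is a minimal NumPy-only sketch of that step. The helper name gaussian_kde_manual, the seed and the grid size are assumptions for illustration, and it is not meant to reproduce exactly what sns.kdeplot does internally.

```python
import numpy as np

def gaussian_kde_manual(data, x, bandwidth):
    """Hypothetical helper: average of Gaussian kernels centered on each data point."""
    data = np.asarray(data, dtype=float)
    x = np.asarray(x, dtype=float)
    # z has shape (len(x), len(data)): the distance of every grid point to every observation
    z = (x[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    # The mean over observations equals the sum of the kernels divided by n
    return kernels.mean(axis=1)

rng = np.random.default_rng(0)  # assumption: fixed seed for repeatability
data = rng.standard_normal(25)

# Same rule-of-thumb bandwidth as in the notebook cell
bandwidth = ((4 * data.std() ** 5) / (3 * len(data))) ** 0.2

# Wide, fine grid so the tails of the kernels are covered
x = np.linspace(data.min() - 5, data.max() + 5, 2000)
density = gaussian_kde_manual(data, x, bandwidth)

# Trapezoidal estimate of the area under the curve; it should be very close to 1
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(x))
print(area)
```

Multiplying density by len(data) gives back the unnormalized sum that was plotted in the cell above.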
In [6]:
sns.kdeplot(tips['tip'])
sns.rugplot(tips['tip'])
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x7d323c8>
In [6]:
tips.head(1)
Out[6]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
In [7]:
sns.kdeplot(tips['total_bill'])
sns.rugplot(tips['total_bill'])
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0xbdd15c0>