Python And R Data science skills: 02/10/18

Saturday 10 February 2018

General Python Videos For Beginners

64 csv file read home work 05

https://vlrtraining.com/courses/python-data-science-beginner-tutorial

63 csv file read home work 04

https://vlrtraining.com/courses/python-data-science-beginner-tutorial
In [6]:
import pandas as pd
sal = pd.read_csv('Salaries.csv')

What was the average (mean) BasePay of all employees per year (2011-2014)?

sal.groupby('Year')['BasePay'].mean()

sal['BasePay']

In [12]:
sal.groupby('Year')['BasePay'].mean()
Out[12]:
Year
2011    63595.956517
2012    65436.406857
2013    69630.030216
2014    66564.421924
Name: BasePay, dtype: float64
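
The same per-year aggregation can be tried without Salaries.csv. What follows is a minimal sketch on a small made-up frame: the name toy and all its values are invented for illustration and are not from the dataset. Selecting the column before calling mean() keeps the average restricted to BasePay instead of every column in the frame.

```python
import pandas as pd

# Toy data, invented for illustration only (not taken from Salaries.csv)
toy = pd.DataFrame({
    'Year': [2011, 2011, 2012, 2012],
    'BasePay': [100.0, 200.0, 300.0, 500.0],
})

# Group rows by year, then average only the BasePay column
mean_by_year = toy.groupby('Year')['BasePay'].mean()
print(mean_by_year)
```

The result is a Series indexed by Year, the same shape as the output above.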

How many unique job titles are there?

sal['JobTitle'].nunique()

The same count can be obtained by taking len() of the array returned by unique():

In [16]:
len(sal['JobTitle'].unique())
Out[16]:
2159
In [17]:
sal['JobTitle'].nunique()
Out[17]:
2159
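
Both approaches give 2159 here, but they are not always interchangeable: nunique() skips missing values by default, while unique() returns NaN as one of its values. A minimal sketch of the difference, using a small made-up Series (the name titles and its contents are invented for illustration):

```python
import numpy as np
import pandas as pd

# Toy titles, invented for illustration; one entry is missing
titles = pd.Series(['Clerk', 'Nurse', 'Clerk', np.nan])

n_with_len = len(titles.unique())          # unique() keeps NaN as a value
n_default = titles.nunique()               # nunique() skips NaN by default
n_keep_nan = titles.nunique(dropna=False)  # count NaN as its own value
print(n_with_len, n_default, n_keep_nan)
```

When a column has no missing values, as the matching counts above suggest for JobTitle, the two forms agree.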

What are the top 5 most common jobs?

sal['JobTitle'].value_counts().head(5)

In [19]:
sal['JobTitle'].value_counts().head(5)
Out[19]:
Transit Operator                7036
Special Nurse                   4389
Registered Nurse                3736
Public Svc Aide-Public Works    2518
Police Officer 3                2421
Name: JobTitle, dtype: int64
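
value_counts() returns the counts already sorted from most to least frequent, which is why head(5) is enough to get the top five. A small sketch on made-up titles (the names jobs, counts, top2 and shares are invented for illustration) also shows the normalize=True option, which reports fractions instead of raw counts:

```python
import pandas as pd

# Toy titles, invented for illustration
jobs = pd.Series(['Nurse', 'Clerk', 'Nurse', 'Driver', 'Nurse', 'Clerk'])

counts = jobs.value_counts()                # most frequent title first
top2 = counts.head(2)                       # the two most common titles
shares = jobs.value_counts(normalize=True)  # fractions instead of counts
print(top2)
print(shares)
```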

How many Job Titles were represented by only one person in 2013? (i.e. Job Titles with only one occurrence in 2013)

sum(sal[sal['Year']==2013]['JobTitle'].value_counts() == 1)
In [22]:
sum(sal[sal['Year']==2013]['JobTitle'].value_counts()==1)
Out[22]:
202
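
The one-liner works because comparing the counts with 1 gives a boolean Series, and True counts as 1 when summed. A step-by-step sketch on made-up data (the frame toy and the intermediate names are invented for illustration) makes each stage visible and also lists which titles occur once:

```python
import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({
    'Year': [2013, 2013, 2013, 2013, 2014],
    'JobTitle': ['Clerk', 'Clerk', 'Nurse', 'Driver', 'Pilot'],
})

# Keep only the 2013 rows, then count each title
counts_2013 = toy[toy['Year'] == 2013]['JobTitle'].value_counts()

is_single = counts_2013 == 1     # True where a title occurs exactly once
n_single = int(is_single.sum())  # True counts as 1 when summed
single_titles = sorted(counts_2013[is_single].index)
print(n_single, single_titles)
```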

How many people have the word Chief in their job title? (This is pretty tricky)

In [26]:
def chie(title):
    # True when the title contains 'chief', ignoring case
    return 'chief' in title.lower()

# apply() yields one boolean per row; summing counts the True values
sum(sal['JobTitle'].apply(chie))
Out[26]:
627
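
A vectorized alternative to the chie helper is the pandas string accessor. This is a sketch under the assumption that no title is missing, since str.contains returns NaN for missing entries; the Series titles below is made up for illustration.

```python
import pandas as pd

# Toy titles, invented for illustration
titles = pd.Series(['POLICE CHIEF', 'Chief of Staff', 'Nurse', 'Deputy chief'])

# case=False makes the match case-insensitive, like calling lower() first
has_chief = titles.str.contains('chief', case=False)
n_chief = int(has_chief.sum())
print(n_chief)
```

On the real data the equivalent call would be sal['JobTitle'].str.contains('chief', case=False).sum().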

Is there a correlation between length of the Job Title string and Salary?

sal['title_len'] = sal['JobTitle'].apply(len)
sal[['title_len','TotalPayBenefits']].corr() # No correlation.
In [27]:
 
Out[27]:
   Id    EmployeeName                                        JobTitle  \
0   1  NATHANIEL FORD  GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY
1   2    GARY JIMENEZ                 CAPTAIN III (POLICE DEPARTMENT)

     BasePay  OvertimePay   OtherPay  Benefits   TotalPay  TotalPayBenefits  \
0  167411.18         0.00  400184.25       NaN  567595.43         567595.43
1  155966.02    245131.88  137811.38       NaN  538909.28         538909.28

   Year  Notes         Agency  Status
0  2011    NaN  San Francisco     NaN
1  2011    NaN  San Francisco     NaN
In [28]:
def chie(title):
    if 'chief' in title.lower():
        return True
    else:
        return False
def ram(x):
    if 'chief' in x.lower():
        return True
    else:
        return False
In [33]:
ram("kumar ChIef ramesh")
Out[33]:
True
In [35]:
sum(sal['JobTitle'].apply(lambda x: ram(x)))
Out[35]:
627
In [37]:
sal['title_len'] = sal['JobTitle'].apply(len)
In [76]:
#sal[["TotalPayBenefits",'title_len']]
sal[['title_len','TotalPayBenefits']].corr() # r = -0.037: essentially no linear correlation
Out[76]:
                  title_len  TotalPayBenefits
title_len          1.000000         -0.036878
TotalPayBenefits  -0.036878          1.000000
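
corr() reports the Pearson correlation coefficient by default: values near +1 or -1 indicate a strong linear relationship, and values near 0, such as the -0.037 above, indicate essentially none. For contrast, here is a sketch on made-up data that is deliberately perfectly linear (the frame toy and its values are invented for illustration), so the coefficient comes out at the other extreme:

```python
import pandas as pd

# Toy data, invented so that pay grows exactly with title length
toy = pd.DataFrame({
    'JobTitle': ['A', 'BB', 'CCC', 'DDDD'],
    'TotalPayBenefits': [10.0, 20.0, 30.0, 40.0],
})

# str.len() is the vectorized equivalent of apply(len)
toy['title_len'] = toy['JobTitle'].str.len()

# Pearson correlation between the two columns
r = toy['title_len'].corr(toy['TotalPayBenefits'])
print(r)
```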