Python And R Data science skills: 44 python data frames

Monday 5 February 2018

44 python data frames

A DataFrame is a two-dimensional data structure: the data is arranged as a table of labelled rows and columns. Source: https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm
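To make the "rows and columns" idea concrete, here is a minimal sketch. The column names and values are made up for illustration and do not come from the notebook below.

```python
import pandas as pd

# Hypothetical toy data: two columns, three labelled rows.
df = pd.DataFrame(
    {"height": [1.2, 3.4, 5.6], "width": [7, 8, 9]},
    index=["a", "b", "c"],
)

print(df.shape)          # (number of rows, number of columns)
print(list(df.index))    # row labels
print(list(df.columns))  # column labels
```

The row labels live in `df.index` and the column labels in `df.columns`; the cells below build a frame the same way from a NumPy array.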

In [1]:
import pandas as pd
import numpy as np
In [2]:
from numpy.random import randn
np.random.seed(3)
In [7]:
m1=randn(5,4)
In [8]:
# Arguments: data, row labels (index), column labels. Note that the label 2019 appears twice.
df1=pd.DataFrame(m1,["x","y","z","m","n"],[2008,2019,2018,2019])
In [9]:
df1
Out[9]:
       2008      2019      2018      2019
x  1.788628  0.436510  0.096497 -1.863493
y -0.277388 -0.354759 -0.082741 -0.627001
z -0.043818 -0.477218 -1.313865  0.884622
m  0.881318  1.709573  0.050034 -0.404677
n -0.545360 -1.546477  0.982367 -1.101068
In [16]:
# Same data with unique column labels: 2020 replaces the second 2019.
df1=pd.DataFrame(m1,["x","y","z","m","n"],[2008,2019,2018,2020])
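pandas accepts repeated column labels, but selecting a repeated label then returns every matching column. The sketch below shows the difference on a small made-up array. The names `dup` and `uniq` are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd

values = np.arange(6).reshape(2, 3)  # hypothetical 2x3 data

# The label 2019 is used for two different columns.
dup = pd.DataFrame(values, index=["x", "y"], columns=[2008, 2019, 2019])
print(dup.columns.is_unique)  # False
print(type(dup[2019]))        # a DataFrame holding both 2019 columns

# With unique labels the same selection gives a single column.
uniq = pd.DataFrame(values, index=["x", "y"], columns=[2008, 2019, 2020])
print(type(uniq[2019]))       # a Series
```

Keeping labels unique means `df[label]` always refers to exactly one column.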
In [101]:
# Labels can also be built by splitting a string on whitespace.
df1=pd.DataFrame(m1,' a b c d e '.split(),"r a dsj sk ".split())
In [104]:
df1
Out[104]:
       2018      2019      2018      2019
x  0.985851 -1.472624 -1.874000  0.593433
y -1.466441 -0.637149 -1.168911  0.284104
z  1.187989  0.357807  0.483570 -0.709170
m  1.388116 -0.455038  0.582384  0.618823
n  1.722225  0.966258 -0.954509  1.881297
In [119]:
# Passing a list of labels returns a DataFrame rather than a Series.
df1[[2018]]
Out[119]:
       2018
x -1.874000
y -1.168911
z  0.483570
m  0.582384
n -0.954509
In [10]:
# Label-based row slice: every row up to and including 'x'.
df1[:'x']
Out[10]:
       2008     2019      2018      2019
x  1.788628  0.43651  0.096497 -1.863493
In [17]:
df1
Out[17]:
       2008      2019      2018      2020
x  1.788628  0.436510  0.096497 -1.863493
y -0.277388 -0.354759 -0.082741 -0.627001
z -0.043818 -0.477218 -1.313865  0.884622
m  0.881318  1.709573  0.050034 -0.404677
n -0.545360 -1.546477  0.982367 -1.101068
In [19]:
# Assigning to a new label adds a column; here it holds the row-wise sum of the four year columns.
df1["sum"]=df1[2018]+df1[2019]+df1[2020]+df1[2008]
In [20]:
df1
Out[20]:
       2008      2019      2018      2020       sum
x  1.788628  0.436510  0.096497 -1.863493  0.458143
y -0.277388 -0.354759 -0.082741 -0.627001 -1.341889
z -0.043818 -0.477218 -1.313865  0.884622 -0.950279
m  0.881318  1.709573  0.050034 -0.404677  2.236247
n -0.545360 -1.546477  0.982367 -1.101068 -2.210537
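Adding the columns one by one works, but it gets tedious as the number of columns grows. A possible alternative, sketched here on a frame rebuilt with the same seed and labels (the variable names `explicit` and `row_sum` are mine), is `DataFrame.sum(axis=1)`, which sums across the columns of each row.

```python
import numpy as np
import pandas as pd

np.random.seed(3)
df = pd.DataFrame(
    np.random.randn(5, 4),
    index=["x", "y", "z", "m", "n"],
    columns=[2008, 2019, 2018, 2020],
)

# Column-by-column addition, as in the cell above.
explicit = df[2018] + df[2019] + df[2020] + df[2008]

# axis=1 sums across the columns, giving one value per row.
row_sum = df.sum(axis=1)

# The two agree up to floating-point rounding.
print(np.allclose(explicit, row_sum))
```

Compute the sum before assigning it to a new column. Once a "sum" column exists, `df.sum(axis=1)` would include it in the total.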
In [22]:
# axis=1 drops a column. This returns a new DataFrame; df1 itself is unchanged.
df1.drop("sum",axis=1)
Out[22]:
       2008      2019      2018      2020
x  1.788628  0.436510  0.096497 -1.863493
y -0.277388 -0.354759 -0.082741 -0.627001
z -0.043818 -0.477218 -1.313865  0.884622
m  0.881318  1.709573  0.050034 -0.404677
n -0.545360 -1.546477  0.982367 -1.101068
In [23]:
df1
Out[23]:
       2008      2019      2018      2020       sum
x  1.788628  0.436510  0.096497 -1.863493  0.458143
y -0.277388 -0.354759 -0.082741 -0.627001 -1.341889
z -0.043818 -0.477218 -1.313865  0.884622 -0.950279
m  0.881318  1.709573  0.050034 -0.404677  2.236247
n -0.545360 -1.546477  0.982367 -1.101068 -2.210537
In [26]:
# inplace=True modifies df1 directly and returns None.
df1.drop("sum",axis=1,inplace=True)
In [27]:
df1
Out[27]:
       2008      2019      2018      2020
x  1.788628  0.436510  0.096497 -1.863493
y -0.277388 -0.354759 -0.082741 -0.627001
z -0.043818 -0.477218 -1.313865  0.884622
m  0.881318  1.709573  0.050034 -0.404677
n -0.545360 -1.546477  0.982367 -1.101068
In [28]:
# axis=0 is the default, so this drops the row labelled 'n' (again without changing df1).
df1.drop('n')
Out[28]:
       2008      2019      2018      2020
x  1.788628  0.436510  0.096497 -1.863493
y -0.277388 -0.354759 -0.082741 -0.627001
z -0.043818 -0.477218 -1.313865  0.884622
m  0.881318  1.709573  0.050034 -0.404677
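The drop cells above show that `drop` leaves the original frame alone unless `inplace=True` is passed. A common alternative is to reassign the result. The sketch below uses a small made-up frame; the names `trimmed`, `still_there` and `rows_left` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2], "b": [3, 4], "total": [4, 6]},
    index=["x", "y"],
)

# drop returns a new frame; df still has the "total" column afterwards.
trimmed = df.drop("total", axis=1)
still_there = "total" in df.columns
print(still_there)

# Reassigning keeps the change without using inplace=True.
df = df.drop(columns="total")  # columns="total" is equivalent to ("total", axis=1)

# With the default axis=0, drop removes rows by label.
rows_left = df.drop("y")
print(rows_left)
```

Reassignment and `inplace=True` end up with the same frame; which one to use is mostly a matter of style.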
In [29]:
df1[2008]
Out[29]:
x    1.788628
y   -0.277388
z   -0.043818
m    0.881318
n   -0.545360
Name: 2008, dtype: float64
In [30]:
df1[x]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-30-fc78bdb93add> in <module>()
----> 1 df1[x]

NameError: name 'x' is not defined
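The NameError above occurs because `x` without quotes is read as a Python variable. Quoting it would not help with plain brackets either: `df['x']` looks for a column called 'x', and here 'x' is a row label. A small sketch on made-up data (the frame and the `raised` flag are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({2008: [1.0, 2.0], 2019: [3.0, 4.0]}, index=["x", "y"])

raised = False
try:
    df["x"]  # plain brackets look up a column; there is no column "x"
except KeyError:
    raised = True
print(raised)

# Row labels go through .loc instead.
row = df.loc["x"]
print(row)
```

In short: `df[label]` is for columns, `df.loc[label]` is for rows.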
In [32]:
# .loc selects a row by its label and returns it as a Series.
df1.loc['x']
Out[32]:
2008    1.788628
2019    0.436510
2018    0.096497
2020   -1.863493
Name: x, dtype: float64
In [34]:
# .iloc selects by integer position; position 0 is the same row as label 'x'.
df1.iloc[0]
Out[34]:
2008    1.788628
2019    0.436510
2018    0.096497
2020   -1.863493
Name: x, dtype: float64
In [39]:
# Row label first, then column label: returns a single value.
df1.loc["x",2008]
Out[39]:
1.7886284734303186
In [40]:
df1
Out[40]:
       2008      2019      2018      2020
x  1.788628  0.436510  0.096497 -1.863493
y -0.277388 -0.354759 -0.082741 -0.627001
z -0.043818 -0.477218 -1.313865  0.884622
m  0.881318  1.709573  0.050034 -0.404677
n -0.545360 -1.546477  0.982367 -1.101068
In [41]:
# Lists of row and column labels select a sub-table, in the order given.
df1.loc[['x','y'],[2018,2019]]
Out[41]:
       2018      2019
x  0.096497  0.436510
y -0.082741 -0.354759
In [43]:
type(df1.loc[['x','y'],[2018,2019]])
Out[43]:
pandas.core.frame.DataFrame
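The selections above return different types depending on what is passed in. As a rough summary, the sketch below rebuilds a frame with the same seed and labels and checks each case. The variable names are mine, not the notebook's.

```python
import numpy as np
import pandas as pd

np.random.seed(3)
df = pd.DataFrame(
    np.random.randn(5, 4),
    index=["x", "y", "z", "m", "n"],
    columns=[2008, 2019, 2018, 2020],
)

single_value = df.loc["x", 2008]             # one label each: a scalar
one_row = df.loc["x"]                        # one row label: a Series
one_row_frame = df.loc[["x"]]                # a list with one label: a DataFrame
block = df.loc[["x", "y"], [2018, 2019]]     # lists of labels: a DataFrame

# The same block by position: rows 0-1, columns at positions 2 and 1.
block_by_position = df.iloc[:2, [2, 1]]

print(type(one_row).__name__, type(one_row_frame).__name__)
print(block.equals(block_by_position))
```

Passing a list always keeps the result two-dimensional, even when the list has a single label.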
