Understand the concept of attributes, methods and functions in the context of a dataframe
Attributes
Attributes are values that belong to an object and describe it. They are accessed by writing the object's name, then a dot, then the name of the attribute, with no parentheses.
- For example: person.age, person.height
Here, age and height are attributes of the person object.
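As a minimal sketch of the example above, the snippet below defines a hypothetical Person class (the class and its values are made up for illustration and come from no library) and reads its attributes with dot notation.

```python
# Hypothetical class used only to illustrate attribute access.
class Person:
    def __init__(self, age, height):
        self.age = age        # attribute: data stored on the object
        self.height = height  # attribute: data stored on the object


person = Person(age=30, height=172)

# Attributes are read with a dot followed by the attribute name, no parentheses.
print(person.age)
print(person.height)
```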
Methods and Functions
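Methods are always associated with an object, whereas functions do not depend on any object. In simple terms, a method is called on an object, whereas a function is called on its own and receives the object as an argument.
- For example: dataframe.describe() and "text".upper() are methods, whereas sum() and len() are Python built-in functions. Note that math.ceil() is also written with a dot, but it is usually described as a function provided by the math module rather than a method of a data object.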
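The difference in calling style can be sketched with plain Python objects. The variable names below are illustrative only.

```python
text = "pandas"
numbers = [3, 1, 2]

# Methods: called on an object with dot notation; they act on that object.
upper_text = text.upper()  # returns a new string
numbers.append(4)          # modifies the list in place

# Built-in functions: called on their own, with the object passed as an argument.
n_chars = len(text)
total = sum(numbers)

print(upper_text, n_chars, total)
```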
Let's see some examples of attributes, methods and functions in the context of a pandas dataframe:
# Load the pandas package, import data and pass column names in names parameter
import pandas as pd

# The file has no header row and its fields are separated by whitespace.
# sep=r"\s+" splits on any run of whitespace (equivalent to the older delim_whitespace=True).
data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data",
                   header=None,
                   sep=r"\s+",
                   names=['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
                          'acceleration', 'model year', 'origin', 'car name'])
Some of the attributes associated with this dataframe:
.dtypes
data.dtypes
mpg             float64
cylinders         int64
displacement    float64
horsepower       object
weight          float64
acceleration    float64
model year        int64
origin            int64
car name         object
dtype: object
.columns
data.columns
Index(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'car name'], dtype='object')
.shape
data.shape
(398, 9)
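The same attributes can be tried without downloading anything. The sketch below assumes a small made-up dataframe named cars (its values are illustrative and are not taken from the auto-mpg file).

```python
import pandas as pd

# Small made-up frame so the example runs offline.
cars = pd.DataFrame({
    "mpg": [18.0, 15.0, 36.0],
    "cylinders": [8, 8, 4],
    "car name": ["car a", "car b", "car c"],
})

# Attributes are read without parentheses.
n_rows, n_cols = cars.shape        # (number of rows, number of columns)
column_names = list(cars.columns)  # column labels
mpg_dtype = cars.dtypes["mpg"]     # dtype of a single column

print(n_rows, n_cols)
print(column_names)
print(mpg_dtype)
```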
Some of the methods associated with this dataframe:
.describe()
data.describe()
| | mpg | cylinders | displacement | weight | acceleration | model year | origin |
---|---|---|---|---|---|---|---|
count | 398.000000 | 398.000000 | 398.000000 | 398.000000 | 398.000000 | 398.000000 | 398.000000 |
mean | 23.514573 | 5.454774 | 193.425879 | 2970.424623 | 15.568090 | 76.010050 | 1.572864 |
std | 7.815984 | 1.701004 | 104.269838 | 846.841774 | 2.757689 | 3.697627 | 0.802055 |
min | 9.000000 | 3.000000 | 68.000000 | 1613.000000 | 8.000000 | 70.000000 | 1.000000 |
25% | 17.500000 | 4.000000 | 104.250000 | 2223.750000 | 13.825000 | 73.000000 | 1.000000 |
50% | 23.000000 | 4.000000 | 148.500000 | 2803.500000 | 15.500000 | 76.000000 | 1.000000 |
75% | 29.000000 | 8.000000 | 262.000000 | 3608.000000 | 17.175000 | 79.000000 | 2.000000 |
max | 46.600000 | 8.000000 | 455.000000 | 5140.000000 | 24.800000 | 82.000000 | 3.000000 |
.head() and .tail()
data.head()
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin | car name |
---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307.0 | 130.0 | 3504.0 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
1 | 15.0 | 8 | 350.0 | 165.0 | 3693.0 | 11.5 | 70 | 1 | buick skylark 320 |
2 | 18.0 | 8 | 318.0 | 150.0 | 3436.0 | 11.0 | 70 | 1 | plymouth satellite |
3 | 16.0 | 8 | 304.0 | 150.0 | 3433.0 | 12.0 | 70 | 1 | amc rebel sst |
4 | 17.0 | 8 | 302.0 | 140.0 | 3449.0 | 10.5 | 70 | 1 | ford torino |
data.tail()
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin | car name |
---|---|---|---|---|---|---|---|---|---|
393 | 27.0 | 4 | 140.0 | 86.00 | 2790.0 | 15.6 | 82 | 1 | ford mustang gl |
394 | 44.0 | 4 | 97.0 | 52.00 | 2130.0 | 24.6 | 82 | 2 | vw pickup |
395 | 32.0 | 4 | 135.0 | 84.00 | 2295.0 | 11.6 | 82 | 1 | dodge rampage |
396 | 28.0 | 4 | 120.0 | 79.00 | 2625.0 | 18.6 | 82 | 1 | ford ranger |
397 | 31.0 | 4 | 119.0 | 82.00 | 2720.0 | 19.4 | 82 | 1 | chevy s-10 |
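Unlike attributes, methods have to be called with parentheses and can take arguments. A minimal sketch, again assuming a small made-up dataframe named cars rather than the downloaded data:

```python
import pandas as pd

# Small made-up frame so the example runs offline.
cars = pd.DataFrame({
    "mpg": [18.0, 15.0, 36.0, 27.0],
    "cylinders": [8, 8, 4, 4],
    "car name": ["car a", "car b", "car c", "car d"],
})

# Methods are called with parentheses.
summary = cars.describe()  # summary statistics for the numeric columns
first_two = cars.head(2)   # methods can take arguments: first 2 rows
last_one = cars.tail(1)    # last row

# Without parentheses you get the method object itself, not its result.
is_method = callable(cars.describe)

print(summary)
print(first_two)
print(last_one)
```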
Some of the functions that can be applied:
len()
len(data)
398
range(), list() and type()
x = range(6)
list(x)
[0, 1, 2, 3, 4, 5]
type(x)
range
Applying a combination of attribute, method and function on the pandas dataframe
.loc (attribute) and .aggregate() (method) with sum() (function) on the data object
data.loc[:,'mpg': 'displacement'].aggregate(sum)
mpg              9358.8
cylinders        2171.0
displacement    76983.5
dtype: float64
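The same combination can be sketched on a small made-up dataframe (named cars here, with illustrative values). This variant passes the string "sum" to .aggregate() instead of the built-in function, because some recent pandas versions emit a warning when the built-in is passed directly; that substitution is an assumption about your installed version, not part of the original example.

```python
import pandas as pd

# Small made-up frame so the example runs offline.
cars = pd.DataFrame({
    "mpg": [18.0, 15.0, 36.0],
    "cylinders": [8, 8, 4],
    "displacement": [307.0, 350.0, 79.0],
    "weight": [3504.0, 3693.0, 1800.0],
})

# .loc (attribute) selects all rows and the columns from 'mpg' through
# 'displacement'. Label slices include both end points.
subset = cars.loc[:, "mpg":"displacement"]

# .aggregate() (method) applies the named reduction to every selected column.
totals = subset.aggregate("sum")

# len() (function) works on the dataframe and on the selected columns.
n_rows = len(cars)
n_selected = len(subset.columns)

print(totals)
print(n_rows, n_selected)
```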