Indexing and Sorting a dataframe using iloc and loc

  08 Apr 2019
  python

There are multiple ways in pandas by which a dataframe can be indexed i.e, selecting particular set of rows and columns from a dataframe. For a detailed description over this topic, once can refer official pandas documentation - Indexing and Selecting Data

We’ll discuss the following -

Let’s begin with loading a sample dataset and required python packages.

import pandas as pd
import numpy as np
#Fetching data from url as csv by mentioning values of various paramters
data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",
                   header = None,
                   index_col = False,
                   names = ['sepal_length','sepal_width','petal_length','petal_width','iris_class'])
# visualising first five rows of sample dataset (Iris)
data.head()

The sample first five rows of data looks like (can be viewed using data.head())-

sepal_lengthsepal_widthpetal_lengthpetal_widthiris_class
05.13.51.40.2Iris-setosa
14.93.01.40.2Iris-setosa
24.73.21.30.2Iris-setosa
34.63.11.50.2Iris-setosa
45.03.61.40.2Iris-setosa

Integer based indexing using iloc

To select some fixed no. of column and a fixed no. of rows from this data, one way is to achieve it by using iloc operation. The first part of indexing will be for rows and another will be columns (indexes starting from 0 to total no. of rows/columns).
For example, first 10 rows for last three columns can be extracted by -
to pass a range : can be used while indexing [start:end], start being inclusive and end being exclusive

data.iloc[0:10,2:5]
petal_lengthpetal_widthiris_class
01.40.2Iris-setosa
11.40.2Iris-setosa
21.30.2Iris-setosa
31.50.2Iris-setosa
41.40.2Iris-setosa
51.70.4Iris-setosa
61.40.3Iris-setosa
71.50.2Iris-setosa
81.40.2Iris-setosa
91.50.1Iris-setosa

Similarly, specific rows and columns can be extracted using indexes of the corresponding elements.
The following command, will return 1st and 2nd row of 2nd and 4th column.

data.iloc[[0,1],[1,3]]
sepal_widthpetal_width
03.50.2
13.00.2

Another example, to extracting first 10 rows and all columns but first two.

data.iloc[:10,2:]
petal_lengthpetal_widthiris_class
01.40.2Iris-setosa
11.40.2Iris-setosa
21.30.2Iris-setosa
31.50.2Iris-setosa
41.40.2Iris-setosa
51.70.4Iris-setosa
61.40.3Iris-setosa
71.50.2Iris-setosa
81.40.2Iris-setosa
91.50.1Iris-setosa

Labels based indexing using loc

To index a dataframe based on column names, loc can be used.
For example, to get all the columns between petal_length till iris class and records from 2nd to 10th, can be extracted by using -

data.loc[2:10,'petal_length':'iris_class']
petal_lengthpetal_widthiris_class
21.30.2Iris-setosa
31.50.2Iris-setosa
41.40.2Iris-setosa
51.70.4Iris-setosa
61.40.3Iris-setosa
71.50.2Iris-setosa
81.40.2Iris-setosa
91.50.1Iris-setosa
101.50.2Iris-setosa

Similarly, specific column names can be passed in a list for which we want to fetch the data.

data.loc[2:10,['petal_length','iris_class']]
petal_lengthiris_class
21.3Iris-setosa
31.5Iris-setosa
41.4Iris-setosa
51.7Iris-setosa
61.4Iris-setosa
71.5Iris-setosa
81.4Iris-setosa
91.5Iris-setosa
101.5Iris-setosa

Setting values using loc

.loc can be used for setting values of particular records based on some predefined filter queries.

For example, there is a need to set petal_length of for first 11 records, except first two equals to 30 can be achieved by -
data.loc[2:10,'petal_length'] = 30

# Current state of those records -
data.loc[2:10,['petal_length']]
petal_length
21.3
31.5
41.4
51.7
61.4
71.5
81.4
91.5
101.5


data.loc[2:10,'petal_length'] = 30
data.loc[2:10]
sepal_lengthsepal_widthpetal_lengthpetal_widthiris_class
24.73.230.00.2Iris-setosa
34.63.130.00.2Iris-setosa
45.03.630.00.2Iris-setosa
55.43.930.00.4Iris-setosa
64.63.430.00.3Iris-setosa
75.03.430.00.2Iris-setosa
84.42.930.00.2Iris-setosa
94.93.130.00.1Iris-setosa
105.43.730.00.2Iris-setosa

Another example of using .loc for setting values, make all records equal to null where ever sepal_width is greater than 3

# current state of data
data.head()
sepal_lengthsepal_widthpetal_lengthpetal_widthiris_class
05.13.51.40.2Iris-setosa
14.93.01.40.2Iris-setosa
24.73.230.00.2Iris-setosa
34.63.130.00.2Iris-setosa
45.03.630.00.2Iris-setosa


data.loc[data['sepal_width']>3] = np.nan
data.head()
sepal_lengthsepal_widthpetal_lengthpetal_widthiris_class
0NaNNaNNaNNaNNaN
14.93.01.40.2Iris-setosa
2NaNNaNNaNNaNNaN
3NaNNaNNaNNaNNaN
4NaNNaNNaNNaNNaN

Sorting

Sorting a dataframe in python can be done in multiple ways. We’ll be looking two of those here -

  • sort_values
  • argsort

Let’s begin with reloading the dataset again and have a look at first six rows of sepal_width column.

#Fetching data from url as csv by mentioning values of various paramters
data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",
                   header = None,
                   index_col = False,
                   names = ['sepal_length','sepal_width','petal_length','petal_width','iris_class'])
sample_data = data.iloc[0:6,[1]]
sample_data
sepal_width
03.5
13.0
23.2
33.1
43.6
53.9

Using sort_values

Simples way to sort a dataframe can be done using sort_values function of pandas dataframe, which take the column name argument on which the sorting is to be done.

sample_data.sort_values(by='sepal_width')
sepal_width
13.0
33.1
23.2
03.5
43.6
53.9


Looking at the indexes of the sorted values, gives us the following results.

sample_data.sort_values(by='sepal_width').index.values

array([1, 3, 2, 0, 4, 5], dtype=int64)


Using argsort

Another way to sort a dataframe can be acheived by using argsort, which basically returns the list of indexes which will sort the values. That list of indexes can be passed to .iloc indexing and the output will return an sorted column.
To read more about argsort, please follow - https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html

sample_data['sepal_width'].values.argsort(axis=0)

array([1, 3, 2, 0, 4, 5], dtype=int64)

sample_data.iloc[sample_data['sepal_width'].values.argsort(axis=0)]
sepal_width
13.0
33.1
23.2
03.5
43.6
53.9

Notebook Link - Indexing and Sorting a dataframe using iloc and loc