Filtering using mask and where in pandas

  03 Apr 2019
  python

Filtering a dataframe can be achieved in multiple ways using pandas. There are times when you simply need to update a column based on a condition which is true or vice-versa. In pandas dataframe there are some inbuilt methods to achieve the same using .where() and .mask().

  • df.where - Replace value when condition is false
  • df.mask - Replace value when condition is true

Initiating a dummy dataframe with some columns to understand the same -

import pandas as pd
import numpy as np
dummy_data = pd.DataFrame(
    np.array([[1, 2, 3], [4, 5, np.nan], [7, 8, 9], [3, 2, np.nan], [5, 6, np.nan]]),
    columns=['Column 1', 'Column 2', 'Column 3'])
dummy_data
Column 1Column 2Column 3
01.02.03.0
14.05.0NaN
27.08.09.0
33.02.0NaN
45.06.0NaN

Operations over the data frame

  • Using df.where - Replace values where values in Column 3 is not null by null across all the columns
dummy_data.where(dummy_data['Column 3'].isnull(),np.nan)
Column 1Column 2Column 3
0NaNNaNNaN
14.05.0NaN
2NaNNaNNaN
33.02.0NaN
45.06.0NaN


  • Using df.mask - Replace values where Column 3 values are null with 0 across the dataframe
dummy_data.mask(dummy_data['Column 3'].isnull(),0)
Column 1Column 2Column 3
01.02.03.0
10.00.00.0
27.08.09.0
30.00.00.0
40.00.00.0

Operations over a particular columns

  • Using df.mask - Replace values in Column 3 by 0 where values are null.
    The following code results in a list with previous value in Column 3 and the value obtained after using .mask()
list(zip(dummy_data['Column 3'],dummy_data['Column 3'].mask(dummy_data['Column 3'].isnull(),0)))

[(3.0, 3.0), (nan, 0.0), (9.0, 9.0), (nan, 0.0), (nan, 0.0)]

  • Using df.where - Replace values in Column 3 by null where values are not null.
    The following code results in a list with previous value in Column 3 & the value obtained after using .where()
list(zip(dummy_data['Column 3'],dummy_data['Column 3'].where(dummy_data['Column 3'].isnull(),np.nan)))

[(3.0, nan), (nan, nan), (9.0, nan), (nan, nan), (nan, nan)]


Notebook Link - Filtering using mask and where in pandas