Filtering a dataframe can be achieved in multiple ways using pandas. There are times when you simply need to update a column based on a condition which is true or vice-versa. In pandas dataframe there are some inbuilt methods to achieve the same using .where() and .mask().
Initiating a dummy dataframe with some columns to understand the same -
import pandas as pd
import numpy as np
dummy_data = pd.DataFrame(
np.array([[1, 2, 3], [4, 5, np.nan], [7, 8, 9], [3, 2, np.nan], [5, 6, np.nan]]),
columns=['Column 1', 'Column 2', 'Column 3'])
dummy_data
Column 1 | Column 2 | Column 3 | |
---|---|---|---|
0 | 1.0 | 2.0 | 3.0 |
1 | 4.0 | 5.0 | NaN |
2 | 7.0 | 8.0 | 9.0 |
3 | 3.0 | 2.0 | NaN |
4 | 5.0 | 6.0 | NaN |
Operations over the data frame
- Using df.where - Replace values where values in Column 3 is not null by null across all the columns
dummy_data.where(dummy_data['Column 3'].isnull(),np.nan)
Column 1 | Column 2 | Column 3 | |
---|---|---|---|
0 | NaN | NaN | NaN |
1 | 4.0 | 5.0 | NaN |
2 | NaN | NaN | NaN |
3 | 3.0 | 2.0 | NaN |
4 | 5.0 | 6.0 | NaN |
- Using df.mask - Replace values where Column 3 values are null with 0 across the dataframe
dummy_data.mask(dummy_data['Column 3'].isnull(),0)
Column 1 | Column 2 | Column 3 | |
---|---|---|---|
0 | 1.0 | 2.0 | 3.0 |
1 | 0.0 | 0.0 | 0.0 |
2 | 7.0 | 8.0 | 9.0 |
3 | 0.0 | 0.0 | 0.0 |
4 | 0.0 | 0.0 | 0.0 |
Operations over a particular columns
- Using df.mask - Replace values in Column 3 by 0 where values are null.
The following code results in a list with previous value in Column 3 and the value obtained after using .mask()
list(zip(dummy_data['Column 3'],dummy_data['Column 3'].mask(dummy_data['Column 3'].isnull(),0)))
[(3.0, 3.0), (nan, 0.0), (9.0, 9.0), (nan, 0.0), (nan, 0.0)]
- Using df.where - Replace values in Column 3 by null where values are not null.
The following code results in a list with previous value in Column 3 & the value obtained after using .where()
list(zip(dummy_data['Column 3'],dummy_data['Column 3'].where(dummy_data['Column 3'].isnull(),np.nan)))
[(3.0, nan), (nan, nan), (9.0, nan), (nan, nan), (nan, nan)]
Notebook Link - Filtering using mask and where in pandas