Pandas ffill with condition


The ffill method in pandas has no parameter for a condition, but you can restrict forward filling to certain rows by combining it with boolean indexing, or by applying a lambda function to the column.
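The general pattern can be sketched as a small helper. This is a minimal sketch, assuming the condition is given as a boolean Series aligned with the column; `conditional_ffill` is a hypothetical name used only for illustration, not a pandas function. It forward-fills the whole column, then uses the real `Series.where` method to keep the filled value only where the mask is True.

```python
import numpy as np
import pandas as pd


def conditional_ffill(s: pd.Series, mask: pd.Series) -> pd.Series:
    """Forward-fill missing values of `s`, but only in rows where `mask` is True.

    Hypothetical helper for illustration; assumes `mask` is a boolean Series
    with the same index as `s`.
    """
    filled = s.ffill()
    # Where mask is False, keep the original value (NaN stays NaN);
    # where mask is True, take the forward-filled value.
    return s.where(~mask, filled)


s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])
mask = pd.Series([True, False, True, True, True])
result = conditional_ffill(s, mask)
print(result.tolist())
```

Row 1 stays missing because its mask entry is False, while rows 2 and 4 receive the last valid value above them.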

Let’s take a look at some examples to understand it better:

Example 1: Filling missing values based on a condition

Suppose we have a DataFrame with a column called ‘age’, and we want to fill missing values in ‘age’ with the previous valid value in the column, but only in rows where the ‘name’ column equals ‘John’.


import pandas as pd
import numpy as np

data = {'name': ['John', 'John', 'Mike', 'Mike', 'John'],
        'age': [25, np.nan, np.nan, 30, np.nan]}

df = pd.DataFrame(data)

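# Using boolean indexing: forward-fill the whole column, then write the
# filled values back only into the rows where name == 'John'
df.loc[df['name'] == 'John', 'age'] = df['age'].ffill()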

print(df)

Output:

   name   age
0  John  25.0
1  John  25.0
2  Mike   NaN
3  Mike  30.0
4  John  30.0
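If instead each person's gaps should be filled only from that same person's earlier rows, one option is to forward-fill within groups. This is a sketch of that alternative reading, using the real `DataFrame.groupby` and `GroupBy.ffill` methods on the same data.

```python
import numpy as np
import pandas as pd

data = {'name': ['John', 'John', 'Mike', 'Mike', 'John'],
        'age': [25, np.nan, np.nan, 30, np.nan]}
df = pd.DataFrame(data)

# Forward-fill 'age' separately within each name: John's gaps are filled
# only from John's earlier rows, Mike's only from Mike's
df['age'] = df.groupby('name')['age'].ffill()

print(df)
```

Here row 4 gets 25.0 from John's row 0 rather than 30.0 from Mike's row 3, and row 2 stays NaN because Mike has no earlier value to carry forward.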

Example 2: Filling missing values based on a lambda function

In this example, suppose we have a DataFrame with a column called ‘rating’, and we want to fill a missing value in ‘rating’ with the previous valid value only if that previous value is greater than or equal to 3.


data = {'name': ['John', 'Mike', 'John', 'Mike', 'John'],
        'rating': [4.5, np.nan, 2.0, np.nan, 3.5]}

df = pd.DataFrame(data)

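# Carry the previous valid rating forward to every row
prev = df['rating'].ffill()

# Using apply and a lambda function: keep a carried-forward rating only if
# it is >= 3, then use it to fill the gaps (existing ratings are untouched)
df['rating'] = df['rating'].fillna(prev.apply(lambda x: x if x >= 3 else np.nan))

print(df)

Output:

   name  rating
0  John     4.5
1  Mike     4.5
2  John     2.0
3  Mike     NaN
4  John     3.5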
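The condition can also be expressed without apply, which avoids calling a Python function for every element. This is a minimal sketch using the real `Series.isna`, `Series.ge` and `Series.mask` methods: a value is replaced only where it is missing and the carried-forward value is at least 3.

```python
import numpy as np
import pandas as pd

data = {'name': ['John', 'Mike', 'John', 'Mike', 'John'],
        'rating': [4.5, np.nan, 2.0, np.nan, 3.5]}
df = pd.DataFrame(data)

prev = df['rating'].ffill()  # previous valid rating for every row

# Fill a row only if it is missing AND the carried-forward rating is >= 3
fill_here = df['rating'].isna() & prev.ge(3)
df['rating'] = df['rating'].mask(fill_here, prev)

print(df)
```

Row 1 is filled with 4.5, while row 3 stays NaN because the rating carried forward to it is 2.0.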
