Pandas conditional sum

Pandas is a popular Python library for data manipulation and analysis. It provides various functions and methods to work with dataframes, which are two-dimensional, size-mutable, and heterogeneous tabular data structures.

A common task in pandas is a conditional sum: adding up the values in one column for only those rows that meet a condition. This is done by filtering the dataframe with a boolean expression and then calling the sum() method on the result.

Let’s consider an example to understand this better:

import pandas as pd

# Create a sample dataframe
data = {'Name': ['John', 'Peter', 'Sarah', 'Emily', 'Michael'],
        'Age': [25, 30, 28, 22, 35],
        'Salary': [50000, 60000, 55000, 45000, 70000]}

df = pd.DataFrame(data)

# Conditionally sum the salaries for individuals below the age of 30
total_salary = df[df['Age'] < 30]['Salary'].sum()

print("Total salary for individuals below the age of 30:", total_salary)

In this example, we have a dataframe with three columns: 'Name', 'Age', and 'Salary'. We want to calculate the total salary for individuals below the age of 30.

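The conditional expression df['Age'] < 30 compares every value in the 'Age' column against 30 and returns a boolean Series (a mask) that is True where the condition holds and False elsewhere. Indexing the dataframe with this mask, as in df[df['Age'] < 30], keeps only the rows where the mask is True.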
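To make the mask concrete, the short sketch below rebuilds the sample dataframe and prints the mask and the rows it selects. The variable names mask and selected are introduced here only for illustration.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Name': ['John', 'Peter', 'Sarah', 'Emily', 'Michael'],
    'Age': [25, 30, 28, 22, 35],
    'Salary': [50000, 60000, 55000, 45000, 70000],
})

# Element-wise comparison: one True/False value per row
mask = df['Age'] < 30
print(mask.tolist())  # [True, False, True, True, False]

# Indexing with the mask keeps only the rows marked True
selected = df[mask]
print(selected['Name'].tolist())  # ['John', 'Sarah', 'Emily']
```

Note that Peter, who is exactly 30, is excluded because the comparison is strict.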

After filtering the dataframe using the boolean mask, we select the 'Salary' column and apply the sum() method to get the sum of salaries for the selected rows.
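The filter and the column selection can also be written as a single .loc call, which takes the row mask and the column label together. This is a sketch of an equivalent alternative, not a correction of the approach above; the names total_chained and total_loc are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Peter', 'Sarah', 'Emily', 'Michael'],
    'Age': [25, 30, 28, 22, 35],
    'Salary': [50000, 60000, 55000, 45000, 70000],
})

# Two-step form: filter the rows, then pick the column
total_chained = df[df['Age'] < 30]['Salary'].sum()

# One-step form: row mask and column label in the same .loc indexer
total_loc = df.loc[df['Age'] < 30, 'Salary'].sum()

print(total_chained, total_loc)  # 150000 150000
```

Both forms give the same result for reading values. The .loc form is generally preferred when the selection is later used for assignment, since it avoids chained indexing.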

The resulting sum is stored in the variable total_salary, which we can then output.

The output of the above code will be:

Total salary for individuals below the age of 30: 150000

This demonstrates how to perform a conditional sum using pandas. You can modify the conditional expression based on your specific requirements.
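As a sketch of how the condition might be varied, the example below combines several conditions using & (and) and | (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. The thresholds and variable names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Peter', 'Sarah', 'Emily', 'Michael'],
    'Age': [25, 30, 28, 22, 35],
    'Salary': [50000, 60000, 55000, 45000, 70000],
})

# AND: under 30 and earning at least 50000 (John and Sarah)
both = (df['Age'] < 30) & (df['Salary'] >= 50000)
total_and = df.loc[both, 'Salary'].sum()
print(total_and)  # 105000

# OR: aged 30 or over, or earning less than 50000 (Peter, Emily, Michael)
either = (df['Age'] >= 30) | (df['Salary'] < 50000)
total_or = df.loc[either, 'Salary'].sum()
print(total_or)  # 175000

# A condition that matches no rows sums to 0
total_none = df.loc[df['Age'] > 100, 'Salary'].sum()
print(total_none)  # 0
```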
