Pandas groupby Without Aggregation
In pandas, the groupby method splits a DataFrame into groups based on the values in one or more columns. It is usually followed by an aggregation such as sum() or mean(), which produces a new DataFrame (or Series) indexed by the group keys, with one aggregated value per group for each column. Calling groupby on its own, however, does not aggregate anything: it returns a GroupBy object that describes the split. So if you only need to divide the data into groups without performing any aggregation, you can call groupby and simply skip the aggregation step.
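A minimal sketch of the difference, using a small made-up DataFrame: calling groupby alone returns a DataFrameGroupBy object and computes no values, while chaining an aggregation such as sum() collapses each group to a single value indexed by the group key.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data)
df = pd.DataFrame({
    "Department": ["HR", "IT", "HR"],
    "Salary": [5000, 6000, 4500],
})

# groupby on its own: no aggregation happens, we only get a GroupBy object
grouped = df.groupby("Department")
print(type(grouped).__name__)

# Chaining an aggregation collapses each group to one value,
# indexed by the group key (HR, IT)
totals = grouped["Salary"].sum()
print(totals)
```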
Let’s consider an example to understand how to use groupby without aggregation. Suppose we have a DataFrame with information about employees in a company, including their names, departments, and salaries.
import pandas as pd
data = {
    'Name': ['John', 'Lisa', 'Peter', 'Maria', 'David'],
    'Department': ['HR', 'IT', 'HR', 'IT', 'Finance'],
    'Salary': [5000, 6000, 4500, 5500, 7000]
}
df = pd.DataFrame(data)
print(df)
Output:
    Name Department  Salary
0   John         HR    5000
1   Lisa         IT    6000
2  Peter         HR    4500
3  Maria         IT    5500
4  David    Finance    7000
Suppose we want to group the employees by department without performing any aggregation. We can do this by calling groupby on the Department column and not chaining an aggregation method onto the result.
grouped_df = df.groupby('Department')
The result is a DataFrameGroupBy object rather than a new DataFrame. It holds the information needed to split the rows into groups but does not compute any per-group results by itself. To access the groups, you can iterate over the GroupBy object or use its get_group method.
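To inspect the split without iterating, a GroupBy object also exposes the groups attribute (a dict mapping each key to the index labels of its rows) and ngroups (the number of groups). A minimal sketch that rebuilds the same employee DataFrame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Lisa", "Peter", "Maria", "David"],
    "Department": ["HR", "IT", "HR", "IT", "Finance"],
    "Salary": [5000, 6000, 4500, 5500, 7000],
})
grouped_df = df.groupby("Department")

# Mapping from each department to the index labels of its rows
label_map = {key: list(labels) for key, labels in grouped_df.groups.items()}
print(label_map)

# Number of distinct groups
print(grouped_df.ngroups)
```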
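Iterating over the GroupBy object yields (group key, sub-DataFrame) pairs. Because groupby sorts the group keys by default, the departments appear in alphabetical order: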
for department, group in grouped_df:
    print(f"Department: {department}")
    print(group)
    print()
Output:
Department: Finance
    Name Department  Salary
4  David    Finance    7000

Department: HR
    Name Department  Salary
0   John         HR    5000
2  Peter         HR    4500

Department: IT
    Name Department  Salary
1   Lisa         IT    6000
3  Maria         IT    5500
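If you would rather get the groups in the order their keys first appear in the data, groupby accepts sort=False. The sketch below assumes you also want to keep the groups around for later processing, so it collects them into a plain dict of DataFrames; the name groups_by_dept is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Lisa", "Peter", "Maria", "David"],
    "Department": ["HR", "IT", "HR", "IT", "Finance"],
    "Salary": [5000, 6000, 4500, 5500, 7000],
})

# sort=False keeps groups in first-appearance order (HR, IT, Finance)
grouped_unsorted = df.groupby("Department", sort=False)

# Collect each group into a dict: {department: sub-DataFrame}
groups_by_dept = {dept: group for dept, group in grouped_unsorted}

print(list(groups_by_dept))
print(groups_by_dept["IT"])
```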
Using the get_group method:
hr_group = grouped_df.get_group('HR')
print(hr_group)
Output:
    Name Department  Salary
0   John         HR    5000
2  Peter         HR    4500
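get_group returns the same rows you would get from a boolean mask on the original DataFrame, original index labels included, and raises a KeyError for a key that has no rows. A short check of both behaviours, where 'Sales' is simply a hypothetical department that does not exist in the data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Lisa", "Peter", "Maria", "David"],
    "Department": ["HR", "IT", "HR", "IT", "Finance"],
    "Salary": [5000, 6000, 4500, 5500, 7000],
})
grouped_df = df.groupby("Department")

hr_group = grouped_df.get_group("HR")

# Equivalent boolean-mask selection on the original DataFrame
hr_mask = df[df["Department"] == "HR"]
print(hr_group.equals(hr_mask))

# Asking for a department that is not present raises KeyError
try:
    grouped_df.get_group("Sales")
except KeyError:
    print("no such group: Sales")
```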
As the examples above show, calling groupby without an aggregation function splits the data into groups based on a specific column (in this case, Department). Each group is an ordinary DataFrame that keeps its original index labels, so it can be accessed separately for further analysis or processing.
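One kind of further processing that still keeps every row is a per-group calculation broadcast back to the original rows. The sketch below uses transform, one option among several, to compute each employee's share of their department's total salary; the column name SalaryShare is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Lisa", "Peter", "Maria", "David"],
    "Department": ["HR", "IT", "HR", "IT", "Finance"],
    "Salary": [5000, 6000, 4500, 5500, 7000],
})

# transform("sum") returns one value per original row (that row's
# department total), aligned on df's index, so all five rows are kept
dept_total = df.groupby("Department")["Salary"].transform("sum")

# Hypothetical new column: each salary as a fraction of its department total
df["SalaryShare"] = df["Salary"] / dept_total
print(df)
```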