Pandas groupby without aggregation

In pandas, the groupby function splits a DataFrame into groups based on specific criteria, such as the values in one or more columns. Calling groupby on its own does not aggregate anything: it returns a GroupBy object that records how the rows are split. Aggregation only happens when you call a method such as sum() or mean() on that object, which produces a new DataFrame with the group keys as the index and the aggregated values as the columns. If you only need to split the data into groups, you can stop after groupby and work with the groups directly.
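To make the distinction concrete, here is a minimal sketch using a small throwaway DataFrame (the columns key and value are purely illustrative). It checks that groupby by itself only returns a GroupBy object, and that a DataFrame of aggregated values appears only once a reducing method such as sum() is called.

```python
import pandas as pd

# Illustrative data; the column names "key" and "value" are hypothetical.
df = pd.DataFrame({
    "key": ["a", "b", "a"],
    "value": [1, 2, 3],
})

# groupby alone only describes how the rows are split; nothing is aggregated yet.
grouped = df.groupby("key")
print(type(grouped).__name__)  # DataFrameGroupBy

# Aggregation happens only when a reducing method is called on the GroupBy object.
totals = grouped.sum()
print(totals)
```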

Let’s consider an example to understand how to use groupby without aggregation. Suppose we have a DataFrame with information about employees in a company, including their names, departments, and salaries.

    
      import pandas as pd

      data = {
          'Name': ['John', 'Lisa', 'Peter', 'Maria', 'David'],
          'Department': ['HR', 'IT', 'HR', 'IT', 'Finance'],
          'Salary': [5000, 6000, 4500, 5500, 7000]
      }

      df = pd.DataFrame(data)
      print(df)
    
  

Output:

    
          Name Department  Salary
      0   John         HR    5000
      1   Lisa         IT    6000
      2  Peter         HR    4500
      3  Maria         IT    5500
      4  David    Finance    7000
    
  

Suppose we want to group the employees based on their departments without performing any aggregation. We can achieve this by using the groupby function without specifying an aggregation function.

    
      grouped_df = df.groupby('Department')
    
  

The result of the above code is a DataFrameGroupBy object, not a new DataFrame. It holds the information needed to split the data into groups, but printing it shows only the object's type, not the grouped rows. To access the groups, you can iterate over the GroupBy object or use its get_group method.
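A GroupBy object also exposes a couple of attributes that describe the split without aggregating anything. A minimal sketch, rebuilding the same employee DataFrame so it runs on its own: ngroups gives the number of groups, and groups maps each department to the row labels that belong to it.

```python
import pandas as pd

# Same employee data as in the example above.
df = pd.DataFrame({
    "Name": ["John", "Lisa", "Peter", "Maria", "David"],
    "Department": ["HR", "IT", "HR", "IT", "Finance"],
    "Salary": [5000, 6000, 4500, 5500, 7000],
})

grouped_df = df.groupby("Department")

# Number of distinct groups (one per department).
print(grouped_df.ngroups)  # 3

# Mapping of each group key to the row labels that belong to it.
for department, labels in grouped_df.groups.items():
    print(department, labels.tolist())
```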

Iterating over the GroupBy object:

    
      for department, group in grouped_df:
          print(f"Department: {department}")
          print(group)
          print()
    
  

Output:

    
      Department: Finance
          Name Department  Salary
      4  David    Finance    7000

      Department: HR
          Name Department  Salary
      0   John         HR    5000
      2  Peter         HR    4500

      Department: IT
          Name Department  Salary
      1   Lisa         IT    6000
      3  Maria         IT    5500
    
  
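Because iterating yields (key, DataFrame) pairs, one straightforward way to keep every group around is a dict comprehension that maps each department to its own DataFrame. A minimal sketch; the name frames is only illustrative. Each group keeps the row labels it had in the original DataFrame.

```python
import pandas as pd

# Same employee data as in the example above.
df = pd.DataFrame({
    "Name": ["John", "Lisa", "Peter", "Maria", "David"],
    "Department": ["HR", "IT", "HR", "IT", "Finance"],
    "Salary": [5000, 6000, 4500, 5500, 7000],
})

# Map each department to its own DataFrame; "frames" is a hypothetical name.
frames = {department: group for department, group in df.groupby("Department")}

print(sorted(frames))
print(frames["IT"])
```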

Using the get_group method:

    
      hr_group = grouped_df.get_group('HR')
      print(hr_group)
    
  

Output:

    
          Name Department  Salary
      0   John         HR    5000
      2  Peter         HR    4500
    
  
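As a sanity check, the group returned by get_group contains exactly the rows a plain boolean filter would select. The sketch below rebuilds the employee data so it is self-contained, compares the two with DataFrame.equals, and shows that asking for a department that does not exist (here the made-up key 'Sales') raises a KeyError.

```python
import pandas as pd

# Same employee data as in the example above.
df = pd.DataFrame({
    "Name": ["John", "Lisa", "Peter", "Maria", "David"],
    "Department": ["HR", "IT", "HR", "IT", "Finance"],
    "Salary": [5000, 6000, 4500, 5500, 7000],
})
grouped_df = df.groupby("Department")

hr_group = grouped_df.get_group("HR")

# The same rows selected with a plain boolean mask.
hr_mask = df[df["Department"] == "HR"]
print(hr_group.equals(hr_mask))

# "Sales" is a hypothetical key with no matching rows, so get_group raises KeyError.
try:
    grouped_df.get_group("Sales")
except KeyError:
    print("No group named 'Sales'")
```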

As you can see from the examples above, by using groupby without an aggregation function, we can split the data into groups based on a specific column (in this case, the department), and then access each group separately for further analysis or processing.
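As one example of such per-group processing, the sketch below (an illustration, not the only approach) subtracts each department's mean salary from every employee's salary inside the loop, then stitches the groups back together with pd.concat. The column name DiffFromDeptMean is hypothetical. For comparison, it computes the same values with groupby(...).transform("mean"), which returns a result aligned with the original rows rather than one row per group.

```python
import pandas as pd

# Same employee data as in the example above.
df = pd.DataFrame({
    "Name": ["John", "Lisa", "Peter", "Maria", "David"],
    "Department": ["HR", "IT", "HR", "IT", "Finance"],
    "Salary": [5000, 6000, 4500, 5500, 7000],
})
grouped_df = df.groupby("Department")

# Process each group on its own; the result keeps one row per employee.
parts = []
for _, group in grouped_df:
    group = group.copy()  # work on a copy so we can add a column safely
    # "DiffFromDeptMean" is a hypothetical column name.
    group["DiffFromDeptMean"] = group["Salary"] - group["Salary"].mean()
    parts.append(group)

# Reassemble the groups in the original row order.
result = pd.concat(parts).sort_index()
print(result)

# Equivalent computation with transform, aligned to the original rows.
vectorized = df["Salary"] - grouped_df["Salary"].transform("mean")
print(result["DiffFromDeptMean"].tolist() == vectorized.tolist())
```

Both versions keep one row per employee; transform simply avoids the explicit loop.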
