Pandas loc with regex

The loc indexer in pandas lets you select rows from a DataFrame by label or by a boolean condition. Combined with regular expressions through str.contains(), it can filter rows on text patterns that a simple equality check cannot express.

Syntax

df.loc[df[column_name].str.contains(regex_pattern)]

Explanation

The loc indexer filters rows of the DataFrame based on a condition. The str.contains() method supplies that condition by testing whether a pattern occurs in each value of a string column.

Here, column_name is the name of the column to search, and regex_pattern is the regular expression to match against its values.
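As a rough sketch of how these two placeholders fit together, the syntax can be wrapped in a small helper. The function name filter_rows, the "Name" column, and the sample values are hypothetical and chosen only for illustration.

```python
import pandas as pd

def filter_rows(df, column_name, regex_pattern):
    """Hypothetical helper: return the rows whose column matches the pattern."""
    return df.loc[df[column_name].str.contains(regex_pattern)]

# Illustrative data, not taken from the example in this article.
fruit = pd.DataFrame({'Name': ['apple', 'banana', 'grape', 'cherry']})

# The alternation "apple|grape" matches values containing either word.
result = filter_rows(fruit, 'Name', r'apple|grape')
print(result['Name'].tolist())  # ['apple', 'grape']
```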

The str.contains() method returns a boolean Series indicating whether each element in the column contains a match for the regex pattern anywhere in the string. The search is case-sensitive by default. When you pass this boolean Series to loc, the DataFrame is filtered to include only the rows where the condition is True.
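To make the boolean Series concrete, the sketch below builds the mask on its own before handing it to loc. The sample values are assumptions made for illustration. The na=False argument is included so that a missing value counts as a non-match instead of appearing as a missing entry in the mask.

```python
import pandas as pd

# Hypothetical sample data; None stands in for a missing value.
df = pd.DataFrame({'Text': ['red apple', 'green pear', None, 'apple pie']})

# str.contains() returns one boolean per row. na=False treats missing
# values as "no match" so the mask contains only True and False.
mask = df['Text'].str.contains(r'apple', na=False)
print(mask.tolist())  # [True, False, False, True]

# Passing the mask to loc keeps only the rows where it is True.
matches = df.loc[mask]
print(matches['Text'].tolist())  # ['red apple', 'apple pie']
```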

Example

Let’s say we have a DataFrame called df with a column named "Text", and we want to select all the rows where the text contains "apple" in any letter case:

import pandas as pd

# Sample data
data = {'Text': ['I like apples', 'Only oranges here', 'Applesauce is great', 'I prefer grapes']}
df = pd.DataFrame(data)

# Filter rows with loc; case=False makes the match case-insensitive,
# so "Applesauce" is matched as well as "apples"
filtered_df = df.loc[df['Text'].str.contains('apple', case=False)]

print(filtered_df)

# Output:
#                   Text
# 0        I like apples
# 2  Applesauce is great

In the example above, we import pandas and create a DataFrame called df with some sample data. We then use df['Text'].str.contains('apple', case=False) as the condition. Because str.contains() is case-sensitive by default, case=False is needed for "Applesauce is great" to match.

This condition returns a boolean Series in which True means the pattern 'apple' was found in the corresponding row. Passing it to loc as df.loc[condition] filters the DataFrame down to the rows where the condition is True.

The result, filtered_df, contains only the rows whose text contains "apple": rows 0 and 2.
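The example above uses a plain substring as its pattern. As a sketch of a pattern that relies on actual regex syntax, the following uses word boundaries and an optional "s", so that "apple" and "apples" match as whole words while "Applesauce" does not. The data mirrors the example; the names pattern and whole_word are illustrative.

```python
import re
import pandas as pd

df = pd.DataFrame({'Text': ['I like apples', 'Only oranges here',
                            'Applesauce is great', 'I prefer grapes']})

# \b marks a word boundary and s? makes the plural optional, so the
# pattern matches "apple" or "apples" only as a whole word.
pattern = r'\bapples?\b'

# flags=re.IGNORECASE makes the search case-insensitive.
whole_word = df.loc[df['Text'].str.contains(pattern, flags=re.IGNORECASE)]
print(whole_word['Text'].tolist())  # ['I like apples']
```

"Applesauce is great" is excluded here because the letters that follow "Apples" are part of the same word, so no word boundary is found after the match.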
