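In PySpark, you can use the startswith() method of a Column to check whether a string column starts with a specific value. The method accepts a single prefix (a literal string or another column) and returns a boolean column. To check against several values, build one startswith() condition per value and combine them with the | (OR) operator, so that the result is true when an element starts with any of the specified values.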
Here is an example of checking multiple values with startswith() in PySpark:
from functools import reduce
from operator import or_
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
# Create a SparkSession
spark = SparkSession.builder.getOrCreate()
# Create a DataFrame with a string column
data = [("apple",), ("banana",), ("orange",), ("grape",)]
df = spark.createDataFrame(data, ["fruit"])
# Define the multiple values to check
values = ["app", "ban"]
# startswith() takes a single prefix, so build one condition per value and OR them together
starts_with_any = reduce(or_, [col("fruit").startswith(value) for value in values])
result = df.withColumn("starts_with_values", starts_with_any)
# Show the result
result.show()
The output of the above code will be:
+------+------------------+
| fruit|starts_with_values|
+------+------------------+
| apple|              true|
|banana|              true|
|orange|             false|
| grape|             false|
+------+------------------+
As you can see, the starts_with_values column indicates whether each fruit name starts with any of the specified values (“app” or “ban”). It is a boolean column with true for fruits that start with any of the values, and false otherwise.
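When the list of prefixes is long, one alternative is to fold them into a single anchored regular expression and pass it to the rlike() method of a Column. The following is a minimal sketch rather than part of the original example. build_prefix_pattern is a hypothetical helper name, and the sketch assumes the prefixes are plain text whose escaped form is also valid in the Java regex syntax that Spark uses. The check at the end relies on Python's re module only to illustrate what the pattern matches.

```python
import re


def build_prefix_pattern(prefixes):
    """Return a regex that matches strings starting with any of the prefixes."""
    # Escape each prefix so characters such as "." or "+" are matched literally
    escaped = [re.escape(p) for p in prefixes]
    # "^" anchors the match at the start; "(?:...)" groups the alternatives
    return "^(?:" + "|".join(escaped) + ")"


pattern = build_prefix_pattern(["app", "ban"])
fruits = ["apple", "banana", "orange", "grape"]
# Illustrate the pattern locally with Python's own regex engine
matches = [bool(re.match(pattern, fruit)) for fruit in fruits]
print(pattern)
print(matches)
```

In PySpark, the same pattern could then be applied as df.withColumn("starts_with_values", col("fruit").rlike(pattern)), assuming a DataFrame df with a string column named fruit.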