PySpark: multiply a column by a constant

To multiply a column by a constant in PySpark, you can use the “withColumn” function along with the multiplication operator. The “withColumn” function is used to add a new column or replace an existing column in a DataFrame.

Here’s an example of how to multiply a column by a constant:

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    # Create (or reuse) a SparkSession
    spark = SparkSession.builder.getOrCreate()

    # Assume you have a DataFrame called "df" with a column called "value"
    df = spark.createDataFrame([(1,), (2,), (3,)], ["value"])

    # Multiply the "value" column by a constant, e.g., 2
    df = df.withColumn("multiplied_value", F.col("value") * 2)

    df.show()

In this example, we import the “pyspark.sql.functions” module as “F” to access the built-in functions and obtain a SparkSession with “SparkSession.builder.getOrCreate()”.
We then create a DataFrame called “df” with a single column “value” containing the values 1, 2, and 3.

Then, we use the “withColumn” function to create a new column called “multiplied_value”.
Inside the “withColumn”, we provide the name of the new column as the first argument, and the expression for the new column as the second argument.
In the expression, we use “F.col()” to reference the existing “value” column, and then multiply it by the constant 2.
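
The column expression can be written in a few equivalent ways. As a minimal sketch (assuming the same “df” and “F” as above), the alternatives below use direct DataFrame indexing and an explicit “F.lit()” literal instead of the bare constant:

    # Equivalent alternatives to F.col("value") * 2

    # Index the DataFrame directly instead of using F.col()
    df_a = df.withColumn("multiplied_value", df["value"] * 2)

    # Wrap the constant in F.lit(); a bare Python number is promoted
    # to a literal column automatically, so this is equivalent
    df_b = df.withColumn("multiplied_value", F.col("value") * F.lit(2))

All of these forms produce the same “multiplied_value” column.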

Finally, we call the “show()” function, which prints the contents of the DataFrame to the console.

The output of this code will be as follows:

    +-----+----------------+
    |value|multiplied_value|
    +-----+----------------+
    |    1|               2|
    |    2|               4|
    |    3|               6|
    +-----+----------------+

As shown in the output, a new column called “multiplied_value” is added to the DataFrame “df”.
The values in this column are the result of multiplying the corresponding values in the “value” column by 2.
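
Because “withColumn” replaces an existing column when the given name already exists, the same pattern can also overwrite “value” in place instead of adding a new column. A minimal sketch, assuming the same “df”:

    # Overwrite the existing "value" column rather than adding a new one
    df = df.withColumn("value", F.col("value") * 2)
    df.show()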
