In this PySpark tutorial, we will discuss how to use the lit() function to add a column of values to a PySpark DataFrame.
Introduction:
A DataFrame in PySpark is a two-dimensional data structure: one dimension is the rows and the other is the columns, so the data is stored in rows and columns.
Install the pyspark module before continuing. Python modules are installed with the pip command.
Syntax:
pip install module_name
Installing PySpark:
pip install pyspark
Steps to create a DataFrame in PySpark:
1. Import the modules below
import pyspark
from pyspark.sql import SparkSession
2. Create a Spark app named tutorialsinhand using the getOrCreate() method
Syntax:
spark = SparkSession.builder.appName('tutorialsinhand').getOrCreate()
3. Create a list of values for the DataFrame
4. Pass this list to the createDataFrame() method to create the PySpark DataFrame
Syntax:
spark.createDataFrame(list_of_values)
lit()
Before using lit(), we have to import it from the pyspark.sql.functions module.
lit() wraps a constant (literal) value in a Column object, which can then be selected as a new column of an existing PySpark DataFrame.
Syntax:
from pyspark.sql.functions import lit
It can be used with the select() method to create a column that holds the same constant value in every row.
Syntax:
dataframe.select(lit(constant).alias("new"))
where new is the name of the new column and constant is the value placed in every row of that column.
Example:
In this example, we create a DataFrame with 5 rows and 3 columns, then select a new column named AGE that holds the constant value 12 in every row.
# import the below modules
import pyspark
from pyspark.sql import SparkSession
# create an app
spark = SparkSession.builder.appName('tutorialsinhand').getOrCreate()
#create a list of data
values = [{'rollno': 1, 'student name': 'Gottumukkala Sravan','marks': 98},
{'rollno': 2, 'student name': 'Gottumukkala Bobby','marks': 89},
{'rollno': 3, 'student name': 'Lavu Ojaswi','marks': 90},
{'rollno': 4, 'student name': 'Lavu Gnanesh','marks': 78},
{'rollno': 5, 'student name': 'Chennupati Rohith','marks': 100}]
# create the dataframe from the values
data = spark.createDataFrame(values)
# display the dataframe
data.show()
# import the lit function
from pyspark.sql.functions import lit
# select a new AGE column that holds the constant value 12
data.select(lit(12).alias("AGE")).show()
Output:
The first table is the original DataFrame. The second is the result of select(), which contains only the new AGE column with 12 in every row.
+-----+------+-------------------+
|marks|rollno|       student name|
+-----+------+-------------------+
|   98|     1|Gottumukkala Sravan|
|   89|     2| Gottumukkala Bobby|
|   90|     3|        Lavu Ojaswi|
|   78|     4|       Lavu Gnanesh|
|  100|     5|  Chennupati Rohith|
+-----+------+-------------------+
+---+
|AGE|
+---+
| 12|
| 12|
| 12|
| 12|
| 12|
+---+
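We can also fill the new column with the values of an existing column.
We do this by passing the column to the lit() function. When lit() receives a column instead of a constant, it returns that column unchanged, so the new column is a copy of the existing one under a new name.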
Syntax:
dataframe.select(lit(dataframe.column_name).alias("new"))
where column_name is the existing column whose values are copied into the new column.
Example:
In this example, we add a new column named Percentage that takes its values from the marks column.
# import the below modules
import pyspark
from pyspark.sql import SparkSession
# create an app
spark = SparkSession.builder.appName('tutorialsinhand').getOrCreate()
#create a list of data
values = [{'rollno': 1, 'student name': 'Gottumukkala Sravan','marks': 98},
{'rollno': 2, 'student name': 'Gottumukkala Bobby','marks': 89},
{'rollno': 3, 'student name': 'Lavu Ojaswi','marks': 90},
{'rollno': 4, 'student name': 'Lavu Gnanesh','marks': 78},
{'rollno': 5, 'student name': 'Chennupati Rohith','marks': 100}]
# create the dataframe from the values
data = spark.createDataFrame(values)
# import the lit function
from pyspark.sql.functions import lit
# create the Percentage column and assign it the values from the marks column
# display the rollno, marks and Percentage columns
data.select(data.rollno,data.marks,lit(data.marks).alias("Percentage")).show()
Output:
The output shows the rollno, marks and Percentage columns.
+------+-----+----------+
|rollno|marks|Percentage|
+------+-----+----------+
|     1|   98|        98|
|     2|   89|        89|
|     3|   90|        90|
|     4|   78|        78|
|     5|  100|       100|
+------+-----+----------+
About the Author
Gottumukkala Sravan Kumar 171FA07058
B.Tech (Hon's) - IT from Vignan's University.
Published 1400+ Technical Articles on Python, R, Swift, Java, C#, LISP, PHP - MySQL and Machine Learning
Published Date: Jun 12, 2023