Rename columns in PySpark DataFrame

In this PySpark tutorial, we will discuss how to rename a single column or multiple columns in a PySpark DataFrame.

Introduction:

A DataFrame in PySpark is a two-dimensional data structure. One dimension corresponds to rows and the other to columns, so the data is stored in rows and columns.

Let's install the pyspark module before going further. Python modules are installed with the "pip" command.

Syntax:

pip install module_name

Installing PySpark:

pip install pyspark

Steps to create dataframe in PySpark:

1. Import the below modules

      import pyspark
      from pyspark.sql import SparkSession

2. Create spark app named tutorialsinhand using getOrCreate() method

     Syntax:

     spark = SparkSession.builder.appName('tutorialsinhand').getOrCreate()

3. Create list of values for dataframe

4. Pass this list to createDataFrame() method to create pyspark dataframe

    Syntax:
    spark.createDataFrame(list of values)
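The four steps above can be sketched end to end as follows. This is a minimal sketch rather than the only way to build a DataFrame: the helper names to_rows and build_dataframe are made up for illustration, and a list of column names is passed as the second argument of createDataFrame() so that the column order is fixed explicitly. pyspark is imported inside the function so that the plain-Python part can be run on its own.

```python
def to_rows(values, columns):
    """Turn a list of dicts into a list of tuples ordered by `columns`."""
    return [tuple(record[name] for name in columns) for record in values]


def build_dataframe(values, columns, app_name="tutorialsinhand"):
    """Steps 1-4 in one place (hypothetical helper).

    Requires pyspark and a Java runtime to be installed.
    """
    from pyspark.sql import SparkSession  # step 1: import

    spark = SparkSession.builder.appName(app_name).getOrCreate()  # step 2: app
    rows = to_rows(values, columns)  # step 3: list of values
    return spark.createDataFrame(rows, columns)  # step 4: DataFrame


values = [
    {"rollno": 1, "student name": "Gottumukkala Sravan", "marks": 98},
    {"rollno": 2, "student name": "Gottumukkala Bobby", "marks": 89},
]
columns = ["rollno", "student name", "marks"]

# Pure-Python part: runs without Spark
rows = to_rows(values, columns)
print(rows)
```

With PySpark installed, build_dataframe(values, columns).show() would display the table with the columns in the order given.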

Using withColumnRenamed()

This method renames one column per call. Calls can be chained to rename several columns.

Syntax:

dataframe.withColumnRenamed("old_name","new_name")

where,

old_name is the existing column name and new_name is the name that replaces it. The method returns a new DataFrame; the original DataFrame is not modified.
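One behaviour worth knowing: if old_name is not a column of the DataFrame, withColumnRenamed() does nothing and raises no error, so a typo in the old name can go unnoticed. The sketch below puts a check in front of the call. The names check_rename and rename_column_strict are hypothetical helpers, not part of PySpark; they rely only on the DataFrame's columns attribute and on withColumnRenamed(). Rejecting a new name that already exists is an extra policy chosen for this sketch.

```python
def check_rename(columns, old_name, new_name):
    """Validate a rename against the existing column names."""
    if old_name not in columns:
        raise ValueError(f"column {old_name!r} not found in {list(columns)}")
    if new_name != old_name and new_name in columns:
        # Assumption of this sketch: duplicate column names are unwanted
        raise ValueError(f"column {new_name!r} already exists")


def rename_column_strict(df, old_name, new_name):
    """Rename a column, but fail loudly when old_name is missing."""
    check_rename(df.columns, old_name, new_name)
    return df.withColumnRenamed(old_name, new_name)


existing = ["marks", "rollno", "student name"]
check_rename(existing, "marks", "percentage")  # valid: passes silently

try:
    check_rename(existing, "mark", "percentage")  # typo in the old name
except ValueError as error:
    print(error)
```

With a real DataFrame, rename_column_strict(data, 'marks', 'percentage') would behave like withColumnRenamed() for valid names and raise ValueError otherwise.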

Let's create a PySpark DataFrame and get the columns.

We can get the columns by using the printSchema() method.

Syntax:

dataframe.printSchema()

It prints each column name along with its data type.

# import the below modules
import pyspark
from pyspark.sql import SparkSession

# create an app
spark = SparkSession.builder.appName('tutorialsinhand').getOrCreate()

# create a list of data
values = [{'rollno': 1, 'student name': 'Gottumukkala Sravan', 'marks': 98},
          {'rollno': 2, 'student name': 'Gottumukkala Bobby', 'marks': 89},
          {'rollno': 3, 'student name': 'Lavu Ojaswi', 'marks': 90},
          {'rollno': 4, 'student name': 'Lavu Gnanesh', 'marks': 78},
          {'rollno': 5, 'student name': 'Chennupati Rohith', 'marks': 100}]


# create the dataframe from the values
data = spark.createDataFrame(values)


# get the columns (printSchema() prints directly and returns None,
# so it should not be wrapped in print())
data.printSchema()

# display
data.show()

Output:

We can see that the actual columns are marks, rollno and student name.

root
 |-- marks: long (nullable = true)
 |-- rollno: long (nullable = true)
 |-- student name: string (nullable = true)

+-----+------+-------------------+
|marks|rollno|       student name|
+-----+------+-------------------+
|   98|     1|Gottumukkala Sravan|
|   89|     2| Gottumukkala Bobby|
|   90|     3|        Lavu Ojaswi|
|   78|     4|       Lavu Gnanesh|
|  100|     5|  Chennupati Rohith|
+-----+------+-------------------+
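printSchema() only prints. When the names and types are needed as Python values, the DataFrame's columns and dtypes attributes can be used instead. The sketch below is illustrative: column_summary is a made-up helper, and the SimpleNamespace object is only a stand-in that mirrors the dtypes of the DataFrame above, so the snippet runs without starting Spark.

```python
from types import SimpleNamespace


def column_summary(df):
    """Return 'name: type' strings from df.dtypes, a list of (name, type) pairs."""
    return [f"{name}: {dtype}" for name, dtype in df.dtypes]


# Stand-in for the DataFrame above; a real PySpark DataFrame
# exposes the same `dtypes` attribute.
stand_in = SimpleNamespace(
    dtypes=[("marks", "bigint"), ("rollno", "bigint"), ("student name", "string")]
)

summary = column_summary(stand_in)
print(summary)
```

Note that dtypes reports a long column as bigint, whereas printSchema() shows it as long.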

Example:

In this example, we will rename marks as percentage and rollno as college_roll in two separate statements.

# import the below modules
import pyspark
from pyspark.sql import SparkSession

# create an app
spark = SparkSession.builder.appName('tutorialsinhand').getOrCreate()

# create a list of data
values = [{'rollno': 1, 'student name': 'Gottumukkala Sravan', 'marks': 98},
          {'rollno': 2, 'student name': 'Gottumukkala Bobby', 'marks': 89},
          {'rollno': 3, 'student name': 'Lavu Ojaswi', 'marks': 90},
          {'rollno': 4, 'student name': 'Lavu Gnanesh', 'marks': 78},
          {'rollno': 5, 'student name': 'Chennupati Rohith', 'marks': 100}]


# create the dataframe from the values
data = spark.createDataFrame(values)


# rename marks as percentage
data = data.withColumnRenamed('marks', 'percentage')

# rename rollno as college_roll
data = data.withColumnRenamed('rollno', 'college_roll')

# display
data.show()

# display the schema
data.printSchema()

Output:

The column names are modified:

+----------+------------+-------------------+
|percentage|college_roll|       student name|
+----------+------------+-------------------+
|        98|           1|Gottumukkala Sravan|
|        89|           2| Gottumukkala Bobby|
|        90|           3|        Lavu Ojaswi|
|        78|           4|       Lavu Gnanesh|
|       100|           5|  Chennupati Rohith|
+----------+------------+-------------------+

root
 |-- percentage: long (nullable = true)
 |-- college_roll: long (nullable = true)
 |-- student name: string (nullable = true)
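The remaining column, student name, contains a space, which is awkward to reference in later expressions. One possible approach, sketched here under stated assumptions, is to compute cleaned names in plain Python and rename every column at once with toDF(). clean_name and clean_columns are hypothetical helpers, and the cleaning rule (trim, then replace runs of other characters with an underscore) is just one choice.

```python
import re


def clean_name(name):
    """Trim whitespace, then replace runs of characters other than
    letters, digits and underscores with a single '_'."""
    return re.sub(r"[^0-9A-Za-z_]+", "_", name.strip())


def clean_columns(df):
    """Rename all columns positionally; toDF() takes the new names in column order."""
    return df.toDF(*[clean_name(name) for name in df.columns])


cleaned = [clean_name(name) for name in ["percentage", "college_roll", "student name"]]
print(cleaned)
```

With a real DataFrame, clean_columns(data) would return a new DataFrame whose columns carry the cleaned names.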

To rename multiple columns in one statement, chain withColumnRenamed() calls with ".", one call per column.

Syntax:

dataframe.withColumnRenamed("old_name1","new_name1").withColumnRenamed("old_name2","new_name2")

Example:

In this example, we will rename marks as percentage and rollno as college_roll in a single chained statement.

# import the below modules
import pyspark
from pyspark.sql import SparkSession

# create an app
spark = SparkSession.builder.appName('tutorialsinhand').getOrCreate()

# create a list of data
values = [{'rollno': 1, 'student name': 'Gottumukkala Sravan', 'marks': 98},
          {'rollno': 2, 'student name': 'Gottumukkala Bobby', 'marks': 89},
          {'rollno': 3, 'student name': 'Lavu Ojaswi', 'marks': 90},
          {'rollno': 4, 'student name': 'Lavu Gnanesh', 'marks': 78},
          {'rollno': 5, 'student name': 'Chennupati Rohith', 'marks': 100}]


# create the dataframe from the values
data = spark.createDataFrame(values)


# rename marks as percentage and rollno as college_roll
data = data.withColumnRenamed('marks', 'percentage').withColumnRenamed('rollno', 'college_roll')

# display
data.show()

# display the schema
data.printSchema()

Output:

The column names are modified:

+----------+------------+-------------------+
|percentage|college_roll|       student name|
+----------+------------+-------------------+
|        98|           1|Gottumukkala Sravan|
|        89|           2| Gottumukkala Bobby|
|        90|           3|        Lavu Ojaswi|
|        78|           4|       Lavu Gnanesh|
|       100|           5|  Chennupati Rohith|
+----------+------------+-------------------+

root
 |-- percentage: long (nullable = true)
 |-- college_roll: long (nullable = true)
 |-- student name: string (nullable = true)

About the Author
Gottumukkala Sravan Kumar 171FA07058
B.Tech (Hon's) - IT from Vignan's University. Published 1400+ Technical Articles on Python, R, Swift, Java, C#, LISP, PHP - MySQL and Machine Learning
Published Date: Jun 12, 2023