PySpark - asc() and desc()
In this PySpark tutorial, we will discuss how to use the asc() and desc() methods with sort() or orderBy() to sort an entire PySpark DataFrame in ascending or descending order based on one or more columns.
Introduction:
A DataFrame in PySpark is a two-dimensional data structure that stores data in rows and columns: one dimension refers to the rows and the other to the columns.
Let's install the pyspark module before we begin. The command to install any module in Python is pip.
Syntax:
pip install module_name
Installing PySpark:
pip install pyspark
Steps to create a DataFrame in PySpark:
1. Import the modules below
import pyspark
from pyspark.sql import SparkSession
2. Create a Spark app named tutorialsinhand using the getOrCreate() method
Syntax:
spark = SparkSession.builder.appName('tutorialsinhand').getOrCreate()
3. Create a list of values for the DataFrame
4. Pass this list to the createDataFrame() method to create the PySpark DataFrame
Syntax:
spark.createDataFrame(list of values)
Let's create a PySpark DataFrame with 5 rows and 3 columns.
# import the below modules
import pyspark
from pyspark.sql import SparkSession
# create an app
spark = SparkSession.builder.appName('tutorialsinhand').getOrCreate()
#create a list of data
values = [{'rollno': 1, 'student name': 'Gottumukkala Sravan','marks': 98},
{'rollno': 2, 'student name': 'Gottumukkala Bobby','marks': 89},
{'rollno': 3, 'student name': 'Lavu Ojaswi','marks': 90},
{'rollno': 4, 'student name': 'Lavu Gnanesh','marks': 78},
{'rollno': 5, 'student name': 'Chennupati Rohith','marks': 100}]
# create the dataframe from the values
data = spark.createDataFrame(values)
#display dataframe
data.show()
Output:
+-----+------+-------------------+
|marks|rollno| student name|
+-----+------+-------------------+
| 98| 1|Gottumukkala Sravan|
| 89| 2| Gottumukkala Bobby|
| 90| 3| Lavu Ojaswi|
| 78| 4| Lavu Gnanesh|
| 100| 5| Chennupati Rohith|
+-----+------+-------------------+
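If you prefer to control the column order yourself, the DataFrame can also be built from a list of tuples with the column names passed explicitly to createDataFrame(). The snippet below is a minimal sketch of that alternative; the variable names rows and data2 are only illustrative.
# alternative: create the dataframe from tuples with explicit column names
rows = [(1, 'Gottumukkala Sravan', 98),
(2, 'Gottumukkala Bobby', 89),
(3, 'Lavu Ojaswi', 90),
(4, 'Lavu Gnanesh', 78),
(5, 'Chennupati Rohith', 100)]
data2 = spark.createDataFrame(rows, ['rollno', 'student name', 'marks'])
data2.show()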
asc() is used with the sort() or orderBy() methods to sort the entire PySpark DataFrame in ascending order based on one or more columns.
desc() is used with the sort() or orderBy() methods to sort the entire PySpark DataFrame in descending order based on one or more columns.
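Because asc() and desc() are Column methods, they can also be called on columns referenced through pyspark.sql.functions.col() instead of the data.column attribute style used below. A minimal sketch, assuming the same data DataFrame created above:
from pyspark.sql.functions import col
# sort by marks in ascending order, referencing the column with col()
data.orderBy(col('marks').asc()).show()
# sort by marks in descending order, referencing the column with col()
data.sort(col('marks').desc()).show()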
Syntax for asc():
data.orderBy(data.column1.asc(), ..., data.columnN.asc())
(or)
data.sort(data.column1.asc(), ..., data.columnN.asc())
We can specify any number of columns.
Example:
In this example, we are going to sort the entire DataFrame in ascending order.
# import the below modules
import pyspark
from pyspark.sql import SparkSession
# create an app
spark = SparkSession.builder.appName('tutorialsinhand').getOrCreate()
#create a list of data
values = [{'rollno': 1, 'student name': 'Gottumukkala Sravan','marks': 98},
{'rollno': 2, 'student name': 'Gottumukkala Bobby','marks': 89},
{'rollno': 3, 'student name': 'Lavu Ojaswi','marks': 90},
{'rollno': 4, 'student name': 'Lavu Gnanesh','marks': 78},
{'rollno': 5, 'student name': 'Chennupati Rohith','marks': 100}]
# create the dataframe from the values
data = spark.createDataFrame(values)
#sort by marks in ascending order with orderBy
data.orderBy(data.marks.asc()).show()
#sort by marks in ascending order with sort
data.sort(data.marks.asc()).show()
#sort by rollno and marks in ascending order with orderBy
data.orderBy(data.rollno.asc(),data.marks.asc()).show()
#sort by rollno and marks in ascending order with sort
data.sort(data.rollno.asc(),data.marks.asc()).show()
Output:
In the first output, the DataFrame is sorted in ascending order based on the marks column using the orderBy() function.
In the second output, the DataFrame is sorted in ascending order based on the marks column using the sort() function.
In the third output, the DataFrame is sorted in ascending order based on the rollno and marks columns using the orderBy() function.
In the last output, the DataFrame is sorted in ascending order based on the rollno and marks columns using the sort() function.
+-----+------+-------------------+
|marks|rollno| student name|
+-----+------+-------------------+
| 78| 4| Lavu Gnanesh|
| 89| 2| Gottumukkala Bobby|
| 90| 3| Lavu Ojaswi|
| 98| 1|Gottumukkala Sravan|
| 100| 5| Chennupati Rohith|
+-----+------+-------------------+
+-----+------+-------------------+
|marks|rollno| student name|
+-----+------+-------------------+
| 78| 4| Lavu Gnanesh|
| 89| 2| Gottumukkala Bobby|
| 90| 3| Lavu Ojaswi|
| 98| 1|Gottumukkala Sravan|
| 100| 5| Chennupati Rohith|
+-----+------+-------------------+
+-----+------+-------------------+
|marks|rollno| student name|
+-----+------+-------------------+
| 98| 1|Gottumukkala Sravan|
| 89| 2| Gottumukkala Bobby|
| 90| 3| Lavu Ojaswi|
| 78| 4| Lavu Gnanesh|
| 100| 5| Chennupati Rohith|
+-----+------+-------------------+
+-----+------+-------------------+
|marks|rollno| student name|
+-----+------+-------------------+
| 98| 1|Gottumukkala Sravan|
| 89| 2| Gottumukkala Bobby|
| 90| 3| Lavu Ojaswi|
| 78| 4| Lavu Gnanesh|
| 100| 5| Chennupati Rohith|
+-----+------+-------------------+
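The two directions can also be mixed in a single call, for example sorting by marks in descending order and breaking ties by rollno in ascending order. A short sketch, assuming the same data DataFrame:
# sort by marks in descending order, then by rollno in ascending order to break ties
data.orderBy(data.marks.desc(), data.rollno.asc()).show()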
Syntax for desc():
data.orderBy(data.column1.desc(), ..., data.columnN.desc())
(or)
data.sort(data.column1.desc(), ..., data.columnN.desc())
We can specify any number of columns.
Example:
In this example, we are going to sort the entire DataFrame in descending order.
# import the below modules
import pyspark
from pyspark.sql import SparkSession
# create an app
spark = SparkSession.builder.appName('tutorialsinhand').getOrCreate()
#create a list of data
values = [{'rollno': 1, 'student name': 'Gottumukkala Sravan','marks': 98},
{'rollno': 2, 'student name': 'Gottumukkala Bobby','marks': 89},
{'rollno': 3, 'student name': 'Lavu Ojaswi','marks': 90},
{'rollno': 4, 'student name': 'Lavu Gnanesh','marks': 78},
{'rollno': 5, 'student name': 'Chennupati Rohith','marks': 100}]
# create the dataframe from the values
data = spark.createDataFrame(values)
#sort by marks in descending order with orderBy
data.orderBy(data.marks.desc()).show()
#sort by marks in descending order with sort
data.sort(data.marks.desc()).show()
#sort by rollno and marks in descending order with orderBy
data.orderBy(data.rollno.desc(),data.marks.desc()).show()
#sort by rollno and marks in descending order with sort
data.sort(data.rollno.desc(),data.marks.desc()).show()
Output:
In the first output, the DataFrame is sorted in descending order based on the marks column using the orderBy() function.
In the second output, the DataFrame is sorted in descending order based on the marks column using the sort() function.
In the third output, the DataFrame is sorted in descending order based on the rollno and marks columns using the orderBy() function.
In the last output, the DataFrame is sorted in descending order based on the rollno and marks columns using the sort() function.
+-----+------+-------------------+
|marks|rollno| student name|
+-----+------+-------------------+
| 100| 5| Chennupati Rohith|
| 98| 1|Gottumukkala Sravan|
| 90| 3| Lavu Ojaswi|
| 89| 2| Gottumukkala Bobby|
| 78| 4| Lavu Gnanesh|
+-----+------+-------------------+
+-----+------+-------------------+
|marks|rollno| student name|
+-----+------+-------------------+
| 100| 5| Chennupati Rohith|
| 98| 1|Gottumukkala Sravan|
| 90| 3| Lavu Ojaswi|
| 89| 2| Gottumukkala Bobby|
| 78| 4| Lavu Gnanesh|
+-----+------+-------------------+
+-----+------+-------------------+
|marks|rollno| student name|
+-----+------+-------------------+
| 100| 5| Chennupati Rohith|
| 78| 4| Lavu Gnanesh|
| 90| 3| Lavu Ojaswi|
| 89| 2| Gottumukkala Bobby|
| 98| 1|Gottumukkala Sravan|
+-----+------+-------------------+
+-----+------+-------------------+
|marks|rollno| student name|
+-----+------+-------------------+
| 100| 5| Chennupati Rohith|
| 78| 4| Lavu Gnanesh|
| 90| 3| Lavu Ojaswi|
| 89| 2| Gottumukkala Bobby|
| 98| 1|Gottumukkala Sravan|
+-----+------+-------------------+
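When a sort column contains null values, PySpark also provides asc_nulls_first()/asc_nulls_last() and desc_nulls_first()/desc_nulls_last() to control where the nulls appear. A minimal sketch, assuming a hypothetical DataFrame data_with_nulls that has nulls in its marks column:
# data_with_nulls is a hypothetical DataFrame with null values in the marks column
# place rows with null marks at the end of an ascending sort
data_with_nulls.orderBy(data_with_nulls.marks.asc_nulls_last()).show()
# place rows with null marks at the top of a descending sort
data_with_nulls.orderBy(data_with_nulls.marks.desc_nulls_first()).show()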