In this PySpark tutorial, we will discuss how to use the show() method to display a PySpark DataFrame.
Introduction:
A DataFrame in PySpark is a two-dimensional data structure. One dimension refers to rows and the other to columns, so the data is stored in rows and columns.
Let's install the pyspark module before going further. The command used to install any module in Python is "pip".
Syntax:
pip install module_name
Installing PySpark:
pip install pyspark
Steps to create a DataFrame in PySpark:
1. Import the modules below
import pyspark
from pyspark.sql import SparkSession
2. Create a Spark app named tutorialsinhand using the getOrCreate() method
Syntax:
spark = SparkSession.builder.appName('tutorialsinhand').getOrCreate()
3. Create a list of values for the DataFrame
4. Pass this list to the createDataFrame() method to create the PySpark DataFrame
Syntax:
spark.createDataFrame(list of values)
The show() method displays the DataFrame in a tabular format.
It accepts three optional parameters.
Syntax:
show(n=20, truncate=True, vertical=False)
1. n is an integer giving the number of rows to display, counted from the top. The default is 20.
2. truncate controls how long each displayed value can be. If it is True (the default), strings longer than 20 characters are shortened. If it is False, values are shown in full. If it is an integer, values are shortened to that many characters.
3. vertical displays the DataFrame in horizontal format if it is False (the default) and in vertical format, one field per line, if it is True.
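The effect of truncate on a single value can be sketched in plain Python. This is only an illustration of the shortening rule and not Spark's own code; the helper name truncate_cell is hypothetical. It assumes the behaviour seen in the outputs of this tutorial: a value longer than the limit is cut to limit - 3 characters followed by "...", except that limits below 4 simply keep the first characters.

```python
def truncate_cell(value, truncate=True):
    """Shorten one displayed value in the style of show(); illustration only."""
    # Booleans select the default limit (True -> 20) or no limit (False -> 0).
    if truncate is True:
        limit = 20
    elif truncate is False:
        limit = 0
    else:
        limit = int(truncate)

    text = str(value)
    if limit <= 0 or len(text) <= limit:
        return text  # nothing to shorten
    if limit < 4:
        return text[:limit]  # too short for an ellipsis, keep the first characters
    return text[:limit - 3] + "..."


print(truncate_cell("Gottumukkala Sravan kumar"))         # default limit of 20
print(truncate_cell("Gottumukkala Sravan kumar", False))  # full value
print(truncate_cell(98, 1))                               # first character only
```

Numbers are converted to text before they are shortened, which is why a small integer limit also cuts numeric columns.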
Example 1:
In this example, we will display the top 3 rows of the PySpark DataFrame.
# import the below modules
import pyspark
from pyspark.sql import SparkSession
# create an app
spark = SparkSession.builder.appName('tutorialsinhand').getOrCreate()
# create a list of data
values = [{'rollno': 1, 'student name': 'Gottumukkala Sravan kumar', 'marks': 98},
          {'rollno': 2, 'student name': 'Gottumukkala Bobby', 'marks': 89},
          {'rollno': 3, 'student name': 'Lavu Ojaswi', 'marks': 90},
          {'rollno': 4, 'student name': 'Lavu Gnanesh', 'marks': 78},
          {'rollno': 5, 'student name': 'Chennupati Rohith', 'marks': 100}]
# create the dataframe from the values
data = spark.createDataFrame(values)
# display the top 3 rows
data.show(n=3)
Output:
Here, we set the n parameter to 3 to get the top 3 rows. The first name is shortened to 20 characters because truncate is True by default.
+-----+------+--------------------+
|marks|rollno|        student name|
+-----+------+--------------------+
|   98|     1|Gottumukkala Srav...|
|   89|     2|  Gottumukkala Bobby|
|   90|     3|         Lavu Ojaswi|
+-----+------+--------------------+
only showing top 3 rows
Example 2:
In this example, we will display the PySpark DataFrame in vertical format.
By default, show() uses the horizontal format.
# import the below modules
import pyspark
from pyspark.sql import SparkSession
# create an app
spark = SparkSession.builder.appName('tutorialsinhand').getOrCreate()
# create a list of data
values = [{'rollno': 1, 'student name': 'Gottumukkala Sravan kumar', 'marks': 98},
          {'rollno': 2, 'student name': 'Gottumukkala Bobby', 'marks': 89},
          {'rollno': 3, 'student name': 'Lavu Ojaswi', 'marks': 90},
          {'rollno': 4, 'student name': 'Lavu Gnanesh', 'marks': 78},
          {'rollno': 5, 'student name': 'Chennupati Rohith', 'marks': 100}]
# create the dataframe from the values
data = spark.createDataFrame(values)
# display in vertical format
data.show(vertical=True)
Output:
Here, we set the vertical parameter to True, so each row is printed as a record with one field per line.
-RECORD 0----------------------------
 marks        | 98
 rollno       | 1
 student name | Gottumukkala Srav...
-RECORD 1----------------------------
 marks        | 89
 rollno       | 2
 student name | Gottumukkala Bobby
-RECORD 2----------------------------
 marks        | 90
 rollno       | 3
 student name | Lavu Ojaswi
-RECORD 3----------------------------
 marks        | 78
 rollno       | 4
 student name | Lavu Gnanesh
-RECORD 4----------------------------
 marks        | 100
 rollno       | 5
 student name | Chennupati Rohith
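To see how the vertical layout is put together, the same kind of output can be mimicked in plain Python. This is a rough sketch for illustration, not Spark's implementation; the function name render_vertical and the sample list students are hypothetical, and the alphabetical column order is an assumption taken from the output above.

```python
def render_vertical(rows):
    """Lay out a list of dicts one field per line, in the style of show(vertical=True).

    A plain-Python illustration only; it is not Spark's implementation.
    """
    if not rows:
        return ""
    names = sorted(rows[0])  # assumption: alphabetical column order, as in the output above
    name_width = max(len(name) for name in names)
    value_width = max(len(str(row[name])) for row in rows for name in names)
    # Width of one data line: " name | value "
    line_width = 1 + name_width + 3 + value_width + 1
    lines = []
    for index, row in enumerate(rows):
        # Each record starts with a dashed header carrying its 0-based position.
        lines.append(f"-RECORD {index}".ljust(line_width, "-"))
        for name in names:
            lines.append(f" {name.ljust(name_width)} | {row[name]}")
    return "\n".join(lines)


students = [
    {"rollno": 1, "student name": "Lavu Ojaswi", "marks": 90},
    {"rollno": 2, "student name": "Lavu Gnanesh", "marks": 78},
]
print(render_vertical(students))
```

Each record takes one header line plus one line per column, so this layout is mostly useful when rows have many columns or long values that would not fit on one horizontal line.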
Example 3:
In this example, we will display the PySpark DataFrame with every value shortened to its first character.
# import the below modules
import pyspark
from pyspark.sql import SparkSession
# create an app
spark = SparkSession.builder.appName('tutorialsinhand').getOrCreate()
# create a list of data
values = [{'rollno': 1, 'student name': 'Gottumukkala Sravan kumar', 'marks': 98},
          {'rollno': 2, 'student name': 'Gottumukkala Bobby', 'marks': 89},
          {'rollno': 3, 'student name': 'Lavu Ojaswi', 'marks': 90},
          {'rollno': 4, 'student name': 'Lavu Gnanesh', 'marks': 78},
          {'rollno': 5, 'student name': 'Chennupati Rohith', 'marks': 100}]
# create the dataframe from the values
data = spark.createDataFrame(values)
# display only the first character of each value
data.show(truncate=1)
Output:
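We set the truncate parameter to 1, so every value, including the numeric ones, is cut to its first character. The column names are shown in full.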
+-----+------+------------+
|marks|rollno|student name|
+-----+------+------------+
|    9|     1|           G|
|    8|     2|           G|
|    9|     3|           L|
|    7|     4|           L|
|    1|     5|           C|
+-----+------+------------+
About the Author
Gottumukkala Sravan Kumar 171FA07058
B.Tech (Hon's) - IT from Vignan's University.
Published 1400+ Technical Articles on Python, R, Swift, Java, C#, LISP, PHP - MySQL and Machine Learning
Published Date: Jun 12, 2023