How to get variance and standard deviation in pandas DataFrame? | var() & std() in pandas
In this pandas tutorial, we will discuss:
- how to get the variance in pandas
- how to get the standard deviation in pandas
- the var method in pandas
- the std method in pandas
We will learn how to use the var and std methods in pandas to calculate the variance and the standard deviation.
A DataFrame in pandas is a two-dimensional data structure that stores data in rows and columns.
We can create a DataFrame using the DataFrame() constructor. It is part of the pandas module, so we have to import pandas first.
Syntax:
pandas.DataFrame(data)
where data is the input data. It can be, for example, a dictionary that maps each column name to a list of values.
Example: Create DataFrame
In this example, we will create a dataframe with 4 rows and 4 columns of building data and assign row labels through the index parameter.
import pandas as pd
#create dataframe from the building data
data= pd.DataFrame({'building-id':['c-001','c-021','c-002','c-004'],
'length':[5.6,7.8,4.5,5.3],
"breadth":[12.9,4.5,21.5,6.0],
"area":[20,56,43,45]
},index=['one','two','three','four'])
#display the dataframe
print(data)
Output: the dataframe is created
      building-id  length  breadth  area
one         c-001     5.6     12.9    20
two         c-021     7.8      4.5    56
three       c-002     4.5     21.5    43
four        c-004     5.3      6.0    45
Now let's see how to get the variance and the standard deviation in pandas.
var in pandas
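We can get the variance by using the var() method in pandas.
Syntax: var method in pandas
dataframe.var(axis)
where dataframe is the input dataframe and
- axis=1 computes the variance across the columns, returning one value per row.
- axis=0 (the default) computes the variance down the rows, returning one value per column.
By default, var() returns the sample variance (ddof=1). In pandas 2.0 and later, non-numeric columns are no longer skipped automatically and may cause an error, so pass numeric_only=True when the dataframe also holds text columns.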
Example 1: calculate variance in pandas
Let's calculate the variance across the columns, which gives one value per row.
import pandas as pd
#create dataframe from the building data
data= pd.DataFrame({'building-id':['c-001','c-021','c-002','c-004'],
'length':[5.6,7.8,4.5,5.3],
"breadth":[12.9,4.5,21.5,6.0],
"area":[20,56,43,45]
},index=['one','two','three','four'])
#calculate the variance of each row; numeric_only=True skips the text column 'building-id'
print(data.var(axis=1, numeric_only=True))
Output: the variance of each row of the above dataframe is given below
one 51.843333
two 831.063333
three 372.250000
four 516.263333
dtype: float64
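To see where the first value in the output above (row 'one') comes from, here is a small check done by hand. It is only a sketch, assuming the pandas default of sample variance (ddof=1, so the sum of squared deviations is divided by n - 1). The names row_one and manual_var are illustrative.

```python
import pandas as pd

# Numeric values of row 'one': length, breadth, area
row_one = pd.Series([5.6, 12.9, 20], index=['length', 'breadth', 'area'])

n = len(row_one)
mean = row_one.sum() / n

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var = ((row_one - mean) ** 2).sum() / (n - 1)

print(manual_var)      # value computed by hand
print(row_one.var())   # value computed by pandas
```

Both printed values should agree with the 51.843333 reported for row 'one'.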
Example 2: calculate variance in pandas
Let's calculate the variance down the rows, which gives one value per column.
import pandas as pd
#create dataframe from the building data
data= pd.DataFrame({'building-id':['c-001','c-021','c-002','c-004'],
'length':[5.6,7.8,4.5,5.3],
"breadth":[12.9,4.5,21.5,6.0],
"area":[20,56,43,45]
},index=['one','two','three','four'])
#calculate the variance of each column; numeric_only=True skips the text column 'building-id'
print(data.var(axis=0, numeric_only=True))
Output: the variance of each numeric column is given below
length 1.993333
breadth 60.302500
area 228.666667
dtype: float64
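The values above are sample variances. As a hedged illustration of the ddof parameter of var(), the sketch below compares the default (ddof=1) with the population variance (ddof=0) for the length column, and with NumPy's var, whose default is ddof=0. The variable names are made up for this example.

```python
import numpy as np
import pandas as pd

length = pd.Series([5.6, 7.8, 4.5, 5.3], name='length')

# pandas default is ddof=1: divide by n - 1 (sample variance)
sample_var = length.var()

# ddof=0: divide by n (population variance)
population_var = length.var(ddof=0)

# NumPy's var defaults to ddof=0, so it matches the population variance
numpy_var = np.var(length.to_numpy())

print(sample_var, population_var, numpy_var)
```

This difference in defaults is a common reason why pandas and NumPy results disagree on the same data.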
Note - to get the variance of a particular column, select it by name. The result is a Series, which has only one axis (axis 0), so the axis argument can be omitted.
Example 3: calculate variance in pandas
import pandas as pd
#create dataframe from the building data
data= pd.DataFrame({'building-id':['c-001','c-021','c-002','c-004'],
'length':[5.6,7.8,4.5,5.3],
"breadth":[12.9,4.5,21.5,6.0],
"area":[20,56,43,45]
},index=['one','two','three','four'])
#get the variance for only the length column
print(data['length'].var())
Output: the variance of the specified column is given below
1.9933333333333332
Thus we have seen three different examples of how to get the variance in pandas using the var method.
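The same idea extends to more than one column. The sketch below, which reuses the building dataframe from the examples, selects columns with a list of names and, as an alternative, keeps only the numeric columns with select_dtypes before calling var(). The names subset_var and numeric_var are illustrative.

```python
import pandas as pd

data = pd.DataFrame({'building-id': ['c-001', 'c-021', 'c-002', 'c-004'],
                     'length': [5.6, 7.8, 4.5, 5.3],
                     'breadth': [12.9, 4.5, 21.5, 6.0],
                     'area': [20, 56, 43, 45]},
                    index=['one', 'two', 'three', 'four'])

# A list of column names selects a sub-DataFrame, so var() returns one value per column
subset_var = data[['length', 'breadth']].var()

# select_dtypes keeps only the numeric columns, leaving out the text column
numeric_var = data.select_dtypes(include='number').var()

print(subset_var)
print(numeric_var)
```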
std method in pandas
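We can get the standard deviation by using the std() method in pandas.
Syntax: std method in pandas
dataframe.std(axis)
where dataframe is the input dataframe and
- axis=1 computes the standard deviation across the columns, returning one value per row.
- axis=0 (the default) computes the standard deviation down the rows, returning one value per column.
By default, std() returns the sample standard deviation (ddof=1). In pandas 2.0 and later, non-numeric columns are no longer skipped automatically and may cause an error, so pass numeric_only=True when the dataframe also holds text columns.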
Example 1: calculate standard deviation in pandas
Let's calculate the standard deviation across the columns, which gives one value per row.
import pandas as pd
#create dataframe from the building data
data= pd.DataFrame({'building-id':['c-001','c-021','c-002','c-004'],
'length':[5.6,7.8,4.5,5.3],
"breadth":[12.9,4.5,21.5,6.0],
"area":[20,56,43,45]
},index=['one','two','three','four'])
#get the standard deviation of each row; numeric_only=True skips the text column 'building-id'
print(data.std(axis=1, numeric_only=True))
Output: the standard deviation of each row is given below
one 7.200231
two 28.828169
three 19.293781
four 22.721429
dtype: float64
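The standard deviation is the square root of the variance, so these values can be cross-checked against the row variances from the var section. A minimal sketch, assuming a dataframe named numeric (a hypothetical name) that holds only the numeric columns of the building data:

```python
import numpy as np
import pandas as pd

numeric = pd.DataFrame({'length': [5.6, 7.8, 4.5, 5.3],
                        'breadth': [12.9, 4.5, 21.5, 6.0],
                        'area': [20, 56, 43, 45]},
                       index=['one', 'two', 'three', 'four'])

row_var = numeric.var(axis=1)   # variance of each row
row_std = numeric.std(axis=1)   # standard deviation of each row

# Taking the square root of each variance should reproduce the standard deviations
print(np.allclose(row_std, np.sqrt(row_var)))
```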
Example 2: calculate standard deviation in pandas
Let's calculate the standard deviation down the rows, which gives one value per column.
import pandas as pd
#create dataframe from the building data
data= pd.DataFrame({'building-id':['c-001','c-021','c-002','c-004'],
'length':[5.6,7.8,4.5,5.3],
"breadth":[12.9,4.5,21.5,6.0],
"area":[20,56,43,45]
},index=['one','two','three','four'])
#calculate the standard deviation of each column; numeric_only=True skips the text column 'building-id'
print(data.std(axis=0, numeric_only=True))
Output: the standard deviation of each numeric column is given below
length 1.411855
breadth 7.765468
area 15.121728
dtype: float64
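The same per-column standard deviations also appear in the 'std' row of describe(), which can be a convenient cross-check. This is a sketch only, again assuming a hypothetical dataframe named numeric with just the numeric columns of the building data:

```python
import pandas as pd

numeric = pd.DataFrame({'length': [5.6, 7.8, 4.5, 5.3],
                        'breadth': [12.9, 4.5, 21.5, 6.0],
                        'area': [20, 56, 43, 45]},
                       index=['one', 'two', 'three', 'four'])

# describe() returns summary statistics; its 'std' row uses the same ddof=1 default
summary = numeric.describe()

print(summary.loc['std'])
```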
Note - to get the standard deviation of a particular column, select it by name. The result is a Series, which has only one axis (axis 0), so the axis argument can be omitted.
Example 3: calculate standard deviation in pandas
Let's calculate the standard deviation of a single column selected by name.
import pandas as pd
#create dataframe from the building data
data= pd.DataFrame({'building-id':['c-001','c-021','c-002','c-004'],
'length':[5.6,7.8,4.5,5.3],
"breadth":[12.9,4.5,21.5,6.0],
"area":[20,56,43,45]
},index=['one','two','three','four'])
#get the standard deviation for only the length column
print(data['length'].std())
Output: the standard deviation of the specified column is given below
1.411854572303158
Thus we have learned how to get the standard deviation in pandas using the std method.
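When both statistics are needed together, one option is to pass both method names to agg(). The sketch below is one possible way to do this, not the only one. The dataframe name numeric and the result name both are illustrative.

```python
import pandas as pd

numeric = pd.DataFrame({'length': [5.6, 7.8, 4.5, 5.3],
                        'breadth': [12.9, 4.5, 21.5, 6.0],
                        'area': [20, 56, 43, 45]},
                       index=['one', 'two', 'three', 'four'])

# agg with a list of method names returns one row per statistic and one column per input column
both = numeric.agg(['var', 'std'])

print(both)
```

The result is a dataframe with rows 'var' and 'std', so a single value can be read with, for example, both.loc['std', 'length'].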
About the Author
Gottumukkala Sravan Kumar 171FA07058
B.Tech (Hon's) - IT from Vignan's University.
Published 1400+ Technical Articles on Python, R, Swift, Java, C#, LISP, PHP - MySQL and Machine Learning
Published Date: Apr 02, 2022