Merge two pandas DataFrames - INNER JOIN
In this pandas tutorial we will discuss how to perform an inner join on two pandas DataFrames.
Introduction
A DataFrame in pandas is a two-dimensional data structure. One dimension is the rows and the other is the columns, so it stores data in rows and columns.
We create a DataFrame using the DataFrame() constructor. It is part of the pandas module, so we have to import pandas first.
Syntax:
pandas.DataFrame(data)
where data is the input data. It can be, for example, a dictionary that maps each column name (key) to a list of values.
Example 1: Create first pandas dataframe
In this example, we will create a dataframe with 4 rows and 4 columns of college data and assign row labels through the index parameter.
import pandas as pd

# create a dataframe from the college data
data = pd.DataFrame({'college_id': [1, 2, 3, 4],
                     'college_name': ["vignan university", "vvit", "RVR - JC", "Andhra University"],
                     "college_address": ["guntur", "guntur", "guntur", "guntur"],
                     "Total Staff": ['1200', '3422', '5644', '670']
                     }, index=['one', 'two', 'three', 'four'])

# display the dataframe
print(data)
Output:
       college_id       college_name  college_address  Total Staff
one             1  vignan university           guntur         1200
two             2               vvit           guntur         3422
three           3           RVR - JC           guntur         5644
four            4  Andhra University           guntur          670
Example 2: Create second pandas dataframe
In this example, we will create a dataframe with 5 rows and 4 columns of college data and assign row labels through the index parameter.
import pandas as pd

# create a dataframe from the college data
data2 = pd.DataFrame({'college_id': [1, 2, 3, 5, 7],
                      'college_name': ["vignan university", "vvit", "RVR - JC", "VIT", "Andhra University"],
                      "college_address": ["guntur", "guntur", "guntur", "guntur", "hyderabad"],
                      "Total Staff": ['1200', '3422', '5644', '670', '5663']
                      }, index=['one', 'two', 'three', 'five', 'seven'])

# display the dataframe
print(data2)
Output:
       college_id       college_name  college_address  Total Staff
one             1  vignan university           guntur         1200
two             2               vvit           guntur         3422
three           3           RVR - JC           guntur         5644
five            5                VIT           guntur          670
seven           7  Andhra University        hyderabad         5663
Comparing the two dataframes, the first three rows are the same in both. The last row of the first dataframe (college_id 4) and the last two rows of the second dataframe (college_id 5 and 7) differ.
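The comparison above can also be checked in code rather than by eye. The following is a minimal sketch that keeps only the college_id column from the two examples; the variable names common_ids, only_in_first and only_in_second are introduced here for illustration and are not part of the original tutorial.

```python
import pandas as pd

# Reduced copies of the two dataframes: only the key column is needed here.
data = pd.DataFrame({'college_id': [1, 2, 3, 4]},
                    index=['one', 'two', 'three', 'four'])
data2 = pd.DataFrame({'college_id': [1, 2, 3, 5, 7]},
                     index=['one', 'two', 'three', 'five', 'seven'])

# Convert the key columns to plain Python sets for comparison.
ids_first = set(data['college_id'].tolist())
ids_second = set(data2['college_id'].tolist())

# college_id values present in both dataframes
common_ids = sorted(ids_first & ids_second)
print(common_ids)       # [1, 2, 3]

# college_id values present in only one of the dataframes
only_in_first = sorted(ids_first - ids_second)
only_in_second = sorted(ids_second - ids_first)
print(only_in_first)    # [4]
print(only_in_second)   # [5, 7]
```

The values in common_ids are the keys an inner join on college_id would be expected to keep.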
The merge() function is used to join two dataframes.
Syntax:
pandas.merge(dataframe1, dataframe2, how='inner', on='column_name')
where,
- dataframe1 is the first dataframe
- dataframe2 is the second dataframe
- on is the column (or list of columns) on which the two dataframes are joined
- how selects the type of join. Here it is 'inner' for INNER JOIN.
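To make the role of the on parameter concrete, here is a small sketch with two toy dataframes. The names left and right and their contents are hypothetical and chosen only for illustration; they are not the college dataframes used in this tutorial.

```python
import pandas as pd

# Hypothetical toy dataframes, used only to illustrate the parameters.
left = pd.DataFrame({'college_id': [1, 2, 3],
                     'city': ['guntur', 'guntur', 'guntur'],
                     'staff': [10, 20, 30]})
right = pd.DataFrame({'college_id': [2, 3, 4],
                      'city': ['guntur', 'hyderabad', 'guntur'],
                      'students': [100, 200, 300]})

# Join on a single column: rows match when college_id is equal.
by_id = pd.merge(left, right, how='inner', on='college_id')
print(by_id['college_id'].tolist())           # [2, 3]

# Join on several columns: rows match only when every listed column is equal.
by_id_and_city = pd.merge(left, right, how='inner', on=['college_id', 'city'])
print(by_id_and_city['college_id'].tolist())  # [2]
```

With a single key, college_id 3 matches even though the city differs. With both keys it does not, so only college_id 2 remains.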
What is an inner join?
An inner join returns a new dataframe that contains only the rows whose join-column values are present in both dataframes.
Example:
In this example, we will join both dataframes on the college_id column.
import pandas as pd

# create the first dataframe from the college data
data = pd.DataFrame({'college_id': [1, 2, 3, 4],
                     'college_name': ["vignan university", "vvit", "RVR - JC", "Andhra University"],
                     "college_address": ["guntur", "guntur", "guntur", "guntur"],
                     "Total Staff": ['1200', '3422', '5644', '670']
                     }, index=['one', 'two', 'three', 'four'])

# create the second dataframe from the college data
data2 = pd.DataFrame({'college_id': [1, 2, 3, 5, 7],
                      'college_name': ["vignan university", "vvit", "RVR - JC", "VIT", "Andhra University"],
                      "college_address": ["guntur", "guntur", "guntur", "guntur", "hyderabad"],
                      "Total Staff": ['1200', '3422', '5644', '670', '5663']
                      }, index=['one', 'two', 'three', 'five', 'seven'])

# inner join on college_id; print the result so it is shown when run as a script
result = pd.merge(data, data2, how="inner", on='college_id')
print(result)
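Output:
The result has three rows, one for each college_id present in both dataframes (1, 2 and 3), so the inner join returns only these three rows. The columns other than college_id exist in both dataframes, so they appear twice with the default suffixes _x (from the first dataframe) and _y (from the second). The original row labels are replaced by a default integer index.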
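When both dataframes share column names besides the join key, merge() has to tell the two copies apart. The sketch below uses reduced copies of the two dataframes (only college_id and college_name, to keep it short) and shows two options: custom suffixes through the suffixes parameter, and joining on every shared column by omitting on. The suffix strings _first and _second are arbitrary choices made for this illustration.

```python
import pandas as pd

# Reduced copies of the tutorial dataframes: key column plus one shared column.
data = pd.DataFrame({'college_id': [1, 2, 3, 4],
                     'college_name': ["vignan university", "vvit", "RVR - JC", "Andhra University"]},
                    index=['one', 'two', 'three', 'four'])
data2 = pd.DataFrame({'college_id': [1, 2, 3, 5, 7],
                      'college_name': ["vignan university", "vvit", "RVR - JC", "VIT", "Andhra University"]},
                     index=['one', 'two', 'three', 'five', 'seven'])

# Option 1: keep both copies of the shared column, with readable suffixes.
joined = pd.merge(data, data2, how='inner', on='college_id',
                  suffixes=('_first', '_second'))
print(joined.columns.tolist())

# Option 2: omit `on` so pandas joins on all columns the dataframes share.
# Rows must then agree on college_id and college_name, and no suffixes are added.
all_shared = pd.merge(data, data2, how='inner')
print(all_shared.columns.tolist())
print(len(all_shared))
```

Option 2 only suits cases where the shared columns are expected to hold identical values in both dataframes, as they do for the three common rows here.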
About the Author
Gottumukkala Sravan Kumar 171FA07058
B.Tech (Hon's) - IT from Vignan's University.
Published 1400+ Technical Articles on Python, R, Swift, Java, C#, LISP, PHP - MySQL and Machine Learning
Published Date: Apr 21, 2023