Articles

Merge two pandas DataFrame - INNER JOIN

Merge two pandas DataFrame - INNER JOIN


In this pandas tutorial we will discuss how to perform inner join on pandas DataFrames.

 

Introduction

DataFrame in pandas is two dimensional data structure that will store data in two dimensional format. One dimension refers to a row and second dimension refers to a column, So It will store the data in rows and columns.

 

We create DataFrame using DataFrame() method. But this is available in pandas module, so we have to import pandas module.

Syntax:

pandas.DataFrame(data)

Where, data is the input dataframe , The data can be a dictionary that stores list of values with specified key

 

Example 1Create first pandas dataframe

In this example, we will create dataframe with 4 rows and 4 columns with college data and assign indices through index parameter.

import pandas as pd

#create dataframe from the college data
data= pd.DataFrame({'college_id':[1,2,3,4],

                    'college_name':["vignan university","vvit","RVR - JC","Andhra University"],

                   "college_address":["guntur","guntur","guntur","guntur"],

                    "Total Staff":['1200','3422','5644','670']

                   },index=['one','two','three','four'])

#display the dataframe
print(data)

Output:

       college_id       college_name college_address Total Staff
one             1  vignan university          guntur        1200
two             2               vvit          guntur        3422
three           3           RVR - JC          guntur        5644
four            4  Andhra University          guntur         670

Example 2Create second pandas dataframe

In this example, we will create dataframe with 5 rows and 4 columns with college data and assign indices through index parameter.

import pandas as pd

#create dataframe from the college data
data2= pd.DataFrame({'college_id':[1,2,3,5,7],

                    'college_name':["vignan university","vvit","RVR - JC","VIT","Andhra University"],

                   "college_address":["guntur","guntur","guntur","guntur","hyderabad"],

                    "Total Staff":['1200','3422','5644','670','5663']

                   },index=['one','two','three','five','seven'])

#display the dataframe
print(data2)

Output:

       college_id       college_name college_address Total Staff
one             1  vignan university          guntur        1200
two             2               vvit          guntur        3422
three           3           RVR - JC          guntur        5644
five            5                VIT          guntur         670
seven           7  Andhra University       hyderabad        5663

From the two dataframes we observed that in both the dataframes first three rows are common (same). and last row in the first dataframe and last two rows in the second dataframe is not different.

 

merge() is used to perform join.

Syntax:

pandas.merge(dataframe1, dataframe2, how='inner',on)

where,

  1. dataframe1 is the first dataframe
  2. dataframe2 is the second dataframe
  3. on is the parameter which joins both the dataframes based on the column taken
  4. how is the parameter that performs the type of join . Here it is inner for INNER JOIN.

Let's see what is inner join?

inner join returns the dataframe - it will return all the rows that are matched/similar in both the dataframes

 

Example:

In this example, we will join both the dataframes based on college_id column.

import pandas as pd

#create dataframe from the college data
data= pd.DataFrame({'college_id':[1,2,3,4],

                    'college_name':["vignan university","vvit","RVR - JC","Andhra University"],

                   "college_address":["guntur","guntur","guntur","guntur"],

                    "Total Staff":['1200','3422','5644','670']

                   },index=['one','two','three','four'])


#create dataframe from the college data
data2= pd.DataFrame({'college_id':[1,2,3,5,7],

                    'college_name':["vignan university","vvit","RVR - JC","VIT","Andhra University"],

                   "college_address":["guntur","guntur","guntur","guntur","hyderabad"],

                    "Total Staff":['1200','3422','5644','670','5663']

                   },index=['one','two','three','five','seven'])

#inner join
pd.merge(data,data2,how="inner",on='college_id')

Output:

 

From the above example, There are only three rows matched among two dataframes IE college_id - 1, 2 and 3 . So It returned these three rows.


Pandas

Would you like to see your article here on tutorialsinhand. Join Write4Us program by tutorialsinhand.com

About the Author
Gottumukkala Sravan Kumar 171FA07058
B.Tech (Hon's) - IT from Vignan's University. Published 1400+ Technical Articles on Python, R, Swift, Java, C#, LISP, PHP - MySQL and Machine Learning
Page Views :    Published Date : Apr 21,2023  
Please Share this page

Related Articles

Like every other website we use cookies. By using our site you acknowledge that you have read and understand our Cookie Policy, Privacy Policy, and our Terms of Service. Learn more Got it!