Articles

Merge two pandas DataFrame - LEFT OUTER JOIN

Merge two pandas DataFrame - LEFT OUTER JOIN


In this pandas tutorial we will discuss how to perform left outer join on pandas DataFrames.

Introduction

DataFrame in pandas is two dimensional data structure that will store data in two dimensional format. One dimension refers to a row and second dimension refers to a column, So It will store the data in rows and columns.

 

We create DataFrame using DataFrame() method. But this is available in pandas module, so we have to import pandas module.

Syntax:

pandas.DataFrame(data)

Where, data is the input dataframe , The data can be a dictionary that stores list of values with specified key

 

Example 1Create first pandas dataframe

In this example, we will create dataframe with 4 rows and 4 columns with college data and assign indices through index parameter.

import pandas as pd

#create dataframe from the college data
data= pd.DataFrame({'college_id':[1,2,3,4],

                    'college_name':["vignan university","vvit","RVR - JC","Andhra University"],

                   "college_address":["guntur","guntur","guntur","guntur"],

                    "Total Staff":['1200','3422','5644','670']

                   },index=['one','two','three','four'])

#display the dataframe
print(data)

Output:

       college_id       college_name college_address Total Staff
one             1  vignan university          guntur        1200
two             2               vvit          guntur        3422
three           3           RVR - JC          guntur        5644
four            4  Andhra University          guntur         670

Example 2Create second pandas dataframe

In this example, we will create dataframe with 5 rows and 4 columns with college data and assign indices through index parameter.

import pandas as pd

#create dataframe from the college data
data2= pd.DataFrame({'college_id':[1,2,3,5,7],

                    'college_name':["vignan university","vvit","RVR - JC","VIT","Andhra University"],

                   "college_address":["guntur","guntur","guntur","guntur","hyderabad"],

                    "Total Staff":['1200','3422','5644','670','5663']

                   },index=['one','two','three','five','seven'])

#display the dataframe
print(data2)

Output:

       college_id       college_name college_address Total Staff
one             1  vignan university          guntur        1200
two             2               vvit          guntur        3422
three           3           RVR - JC          guntur        5644
five            5                VIT          guntur         670
seven           7  Andhra University       hyderabad        5663

From the two dataframes we observed that in both the dataframes first three rows aer common (same). and last row in the first dataframe and last two rows in the second dataframe is not different.

 

merge() is used to perform join.

Syntax:

pandas.merge(dataframe1, dataframe2, how='left',on)

where,

  1. dataframe1 is the first dataframe
  2. dataframe2 is the second dataframe
  3. on is the parameter which joins both the dataframes based on the column taken
  4. how is the parameter that performs the type of join . Here it is left for LEFT OUTER JOIN.

Let's see what is left join?

Left join returns the dataframe - it will return all the rows from the first dataframe return the rows in the second dataframe that are matched with the rows in the first dataframe.

 

Example:

In this example, we will join both the dataframes based on college_id column

import pandas as pd

#create dataframe from the college data
data= pd.DataFrame({'college_id':[1,2,3,4],

                    'college_name':["vignan university","vvit","RVR - JC","Andhra University"],

                   "college_address":["guntur","guntur","guntur","guntur"],

                    "Total Staff":['1200','3422','5644','670']

                   },index=['one','two','three','four'])


#create dataframe from the college data
data2= pd.DataFrame({'college_id':[1,2,3,5,7],

                    'college_name':["vignan university","vvit","RVR - JC","VIT","Andhra University"],

                   "college_address":["guntur","guntur","guntur","guntur","hyderabad"],

                    "Total Staff":['1200','3422','5644','670','5663']

                   },index=['one','two','three','five','seven'])

#left outer join
pd.merge(data,data2,how="left",on='college_id')

Output:

 

From the above example, Among 4 rows in the first dataframe - only three rows in the second dataframe is matching with the second dataframe. So NaN values are filled under second dataframe with unmatched rows.


Pandas

Would you like to see your article here on tutorialsinhand. Join Write4Us program by tutorialsinhand.com

About the Author
Gottumukkala Sravan Kumar 171FA07058
B.Tech (Hon's) - IT from Vignan's University. Published 1400+ Technical Articles on Python, R, Swift, Java, C#, LISP, PHP - MySQL and Machine Learning
Page Views :    Published Date : Apr 18,2023  
Please Share this page

Related Articles

Like every other website we use cookies. By using our site you acknowledge that you have read and understand our Cookie Policy, Privacy Policy, and our Terms of Service. Learn more Got it!