Merge two pandas DataFrame - LEFT OUTER JOIN
In this pandas tutorial we will discuss how to perform left outer join on pandas DataFrames.
Introduction
DataFrame in pandas is two dimensional data structure that will store data in two dimensional format. One dimension refers to a row and second dimension refers to a column, So It will store the data in rows and columns.
We create DataFrame using DataFrame() method. But this is available in pandas module, so we have to import pandas module.
Syntax:
pandas.DataFrame(data)
Where, data is the input dataframe , The data can be a dictionary that stores list of values with specified key
Example 1: Create first pandas dataframe
In this example, we will create dataframe with 4 rows and 4 columns with college data and assign indices through index parameter.
import pandas as pd
#create dataframe from the college data
data= pd.DataFrame({'college_id':[1,2,3,4],
'college_name':["vignan university","vvit","RVR - JC","Andhra University"],
"college_address":["guntur","guntur","guntur","guntur"],
"Total Staff":['1200','3422','5644','670']
},index=['one','two','three','four'])
#display the dataframe
print(data)
Output:
college_id college_name college_address Total Staff
one 1 vignan university guntur 1200
two 2 vvit guntur 3422
three 3 RVR - JC guntur 5644
four 4 Andhra University guntur 670
Example 2: Create second pandas dataframe
In this example, we will create dataframe with 5 rows and 4 columns with college data and assign indices through index parameter.
import pandas as pd
#create dataframe from the college data
data2= pd.DataFrame({'college_id':[1,2,3,5,7],
'college_name':["vignan university","vvit","RVR - JC","VIT","Andhra University"],
"college_address":["guntur","guntur","guntur","guntur","hyderabad"],
"Total Staff":['1200','3422','5644','670','5663']
},index=['one','two','three','five','seven'])
#display the dataframe
print(data2)
Output:
college_id college_name college_address Total Staff
one 1 vignan university guntur 1200
two 2 vvit guntur 3422
three 3 RVR - JC guntur 5644
five 5 VIT guntur 670
seven 7 Andhra University hyderabad 5663
From the two dataframes we observed that in both the dataframes first three rows aer common (same). and last row in the first dataframe and last two rows in the second dataframe is not different.
merge() is used to perform join.
Syntax:
pandas.merge(dataframe1, dataframe2, how='left',on)
where,
-
dataframe1 is the first dataframe
-
dataframe2 is the second dataframe
-
on is the parameter which joins both the dataframes based on the column taken
-
how is the parameter that performs the type of join . Here it is left for LEFT OUTER JOIN.
Let's see what is left join?
Left join returns the dataframe - it will return all the rows from the first dataframe return the rows in the second dataframe that are matched with the rows in the first dataframe.
Example:
In this example, we will join both the dataframes based on college_id column
import pandas as pd
#create dataframe from the college data
data= pd.DataFrame({'college_id':[1,2,3,4],
'college_name':["vignan university","vvit","RVR - JC","Andhra University"],
"college_address":["guntur","guntur","guntur","guntur"],
"Total Staff":['1200','3422','5644','670']
},index=['one','two','three','four'])
#create dataframe from the college data
data2= pd.DataFrame({'college_id':[1,2,3,5,7],
'college_name':["vignan university","vvit","RVR - JC","VIT","Andhra University"],
"college_address":["guntur","guntur","guntur","guntur","hyderabad"],
"Total Staff":['1200','3422','5644','670','5663']
},index=['one','two','three','five','seven'])
#left outer join
pd.merge(data,data2,how="left",on='college_id')
Output:
From the above example, Among 4 rows in the first dataframe - only three rows in the second dataframe is matching with the second dataframe. So NaN values are filled under second dataframe with unmatched rows.
Would you like to see your article here on tutorialsinhand.
Join
Write4Us program by tutorialsinhand.com
About the Author
Gottumukkala Sravan Kumar 171FA07058
B.Tech (Hon's) - IT from Vignan's University.
Published 1400+ Technical Articles on Python, R, Swift, Java, C#, LISP, PHP - MySQL and Machine Learning
Page Views :
Published Date :
Apr 18,2023