
Match two columns in Excel file and get other column values – Python Pandas

Posted by: admin May 14, 2020

Question:

I have two Excel files, say, wb1.xlsx and wb2.xlsx.

wb1.xlsx

adsl    svc_no    port_stat    adsl.1    Comparison result
2/17
2/24
2/27
2/33
2/37
3/12

wb2.xlsx

caller_id    status    adsl    Comparison result
n/a          SP        2/37    Not Match
n/a          RE        2/24    Not Match
n/a          SP        2/27    Match
n/a          SP        2/33    Not Match
n/a          SP        2/17    Match

What I want to do is match the adsl column of wb2.xlsx against the adsl column of wb1.xlsx and copy the corresponding values into the other columns of wb1.xlsx.

My expected output is wb1.xlsx updated with the values from wb2.xlsx:

adsl    svc_no    port_stat    adsl.1    Comparison result
2/17    n/a       SP           2/17      Match
2/24    n/a       RE           2/24      Not Match
2/27    n/a       SP           2/27      Match
2/33    n/a       SP           2/33      Not Match
2/37    n/a       SP           2/37      Not Match
3/12 

After some searching, I found that pd.merge() can do the matching.

I tried it this way:

result = pd.merge(df2, pri_df, on=['adsl', 'adsl'])

Unfortunately, it creates new columns instead of updating the existing ones. Also, it only keeps the rows it was able to match and discards the others.
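On the pd.merge attempt: the default is an inner join, which is why the unmatched rows disappear, and the key only needs to be named once (on='adsl' rather than on=['adsl', 'adsl']). A minimal sketch, using in-memory frames as stand-ins for the two sheets (assumed data copied from the tables above, not read from the files), shows that how='left' keeps every wb1 row. It still adds wb2's columns as new columns rather than filling the existing ones; if df1 already had columns with the same names, merge would suffix them with _x and _y.

```python
import pandas as pd

# Assumed in-memory stand-ins for wb1.xlsx and wb2.xlsx (values copied from the tables above)
df1 = pd.DataFrame({'adsl': ['2/17', '2/24', '2/27', '2/33', '2/37', '3/12']})
df2 = pd.DataFrame({
    'caller_id': [None] * 5,
    'status': ['SP', 'RE', 'SP', 'SP', 'SP'],
    'adsl': ['2/37', '2/24', '2/27', '2/33', '2/17'],
    'Comparison result': ['Not Match', 'Not Match', 'Match', 'Not Match', 'Match'],
})

# Default how='inner': only keys found in both frames survive, so 3/12 is dropped
inner = pd.merge(df1, df2, on='adsl')

# how='left': every df1 row is kept in df1's order; unmatched rows get NaN
left = pd.merge(df1, df2, on='adsl', how='left')
print(left)
```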

I also tried getting the columns of wb2.xlsx by index and assigning them to the columns of wb1.xlsx, but it just copied the values over as they were, without matching on adsl.

Any reference that would help is welcome.

Answer:

You can use the pandas isin function:

result = df2.loc[df2['adsl'].isin(pri_df['adsl'])]

Hope this works for you.
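Note that isin only produces a boolean mask of which adsl values appear in the other frame; the line above keeps the matching rows of df2 but does not copy anything into wb1's columns. One way to actually pull the values across, swapping in Series.map (not part of the answer above), is to index df2 by adsl and look each key up. A sketch with assumed in-memory data, which also assumes adsl is unique in wb2 (map raises on a duplicated lookup index):

```python
import pandas as pd

# Assumed in-memory stand-ins for wb1.xlsx and wb2.xlsx
df1 = pd.DataFrame({'adsl': ['2/17', '2/24', '2/27', '2/33', '2/37', '3/12']})
df2 = pd.DataFrame({
    'caller_id': [None] * 5,
    'status': ['SP', 'RE', 'SP', 'SP', 'SP'],
    'adsl': ['2/37', '2/24', '2/27', '2/33', '2/17'],
    'Comparison result': ['Not Match', 'Not Match', 'Match', 'Not Match', 'Match'],
})

# isin only yields a True/False mask: does each df1 adsl appear in df2?
mask = df1['adsl'].isin(df2['adsl'])

# Index df2 by adsl so values can be looked up by key (assumes adsl is unique in df2)
lookup = df2.set_index('adsl')
df1['port_stat'] = df1['adsl'].map(lookup['status'])
df1['Comparison result'] = df1['adsl'].map(lookup['Comparison result'])

# Copy adsl into adsl.1 only where a match was found; 3/12 becomes NaN
df1['adsl.1'] = df1['adsl'].where(mask)
print(df1)
```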

Answer:

I suggest using Index.intersection together with DataFrame.combine_first:

print (df1)
   adsl  svc_no  port_stat  adsl.1  Comparison result
0  2/17     NaN        NaN     NaN                NaN
1  2/24     NaN        NaN     NaN                NaN
2  2/27     NaN        NaN     NaN                NaN
3  2/33     NaN        NaN     NaN                NaN
4  2/37     NaN        NaN     NaN                NaN
5  3/12     NaN        NaN     NaN                NaN

print (df2)
   caller_id status  adsl Comparison result
0        NaN     SP  2/37         Not Match
1        NaN     RE  2/24         Not Match
2        NaN     SP  2/27             Match
3        NaN     SP  2/33         Not Match
4        NaN     SP  2/17             Match

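# rename wb2's status column so it matches port_stat in wb1
df2 = df2.rename(columns={'status':'port_stat'})
# add an adsl.1 column holding a copy of adsl
d = {'adsl.1': lambda x: x['adsl']}
df2 = df2.assign(**d)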
print (df2)
   caller_id port_stat  adsl Comparison result adsl.1
0        NaN        SP  2/37         Not Match   2/37
1        NaN        RE  2/24         Not Match   2/24
2        NaN        SP  2/27             Match   2/27
3        NaN        SP  2/33         Not Match   2/33
4        NaN        SP  2/17             Match   2/17

# keep only the columns of df2 that also exist in df1
df22 = df2[df2.columns.intersection(df1.columns)]
print (df22)
  port_stat  adsl Comparison result adsl.1
0        SP  2/37         Not Match   2/37
1        RE  2/24         Not Match   2/24
2        SP  2/27             Match   2/27
3        SP  2/33         Not Match   2/33
4        SP  2/17             Match   2/17
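The combine_first step relies on a simple rule: both frames are aligned on their index (here adsl), non-null values from the calling frame are kept, and its gaps are filled from the other frame; rows and columns present in only one of the frames are carried into the result. A toy illustration with made-up values (not taken from the question):

```python
import numpy as np
import pandas as pd

# Made-up frames, both indexed by adsl
left = pd.DataFrame({'port_stat': ['SP', np.nan]}, index=['2/17', '2/24'])
right = pd.DataFrame(
    {'port_stat': ['XX', 'RE', np.nan], 'svc_no': [1.0, 2.0, 3.0]},
    index=['2/17', '2/24', '3/12'],
)

out = left.combine_first(right)
# 2/17: left's 'SP' wins over right's 'XX'
# 2/24: left's NaN is filled with right's 'RE'
# the 3/12 row and the svc_no column exist only in right, so they are carried over
print(out)
```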

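# index both frames by adsl; df22's values take priority and its gaps
# (plus rows/columns that exist only in df1) are filled from df1,
# then adsl is restored as a column and df1's column order is reapplied
result = (df22.set_index('adsl')
              .combine_first(df1.set_index('adsl'))
              .reset_index()
              .reindex(columns=df1.columns))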
print (result)
   adsl  svc_no port_stat adsl.1 Comparison result
0  2/17     NaN        SP   2/17             Match
1  2/24     NaN        RE   2/24         Not Match
2  2/27     NaN        SP   2/27             Match
3  2/33     NaN        SP   2/33         Not Match
4  2/37     NaN        SP   2/37         Not Match
5  3/12     NaN       NaN    NaN               NaN
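Since the question asks to update the existing columns of wb1.xlsx, DataFrame.update is another option (a swapped-in technique, not the answer above): it modifies the frame in place, aligning on index and column names, and only non-NA values from the other frame overwrite. A sketch with in-memory stand-ins for the sheets; mapping caller_id to svc_no is an assumption read off the expected output.

```python
import pandas as pd

# Assumed in-memory stand-ins for the two sheets; the empty wb1 columns are
# created with None (object dtype) so string values can be written into them
df1 = pd.DataFrame({'adsl': ['2/17', '2/24', '2/27', '2/33', '2/37', '3/12']})
for col in ['svc_no', 'port_stat', 'adsl.1', 'Comparison result']:
    df1[col] = None
df2 = pd.DataFrame({
    'caller_id': [None] * 5,
    'status': ['SP', 'RE', 'SP', 'SP', 'SP'],
    'adsl': ['2/37', '2/24', '2/27', '2/33', '2/17'],
    'Comparison result': ['Not Match', 'Not Match', 'Match', 'Not Match', 'Match'],
})

# Align wb2's column names with wb1's (caller_id -> svc_no is an assumption)
src = df2.rename(columns={'status': 'port_stat', 'caller_id': 'svc_no'})
src['adsl.1'] = src['adsl']
src = src.set_index('adsl')

# update works in place, matching on index (adsl) and column names;
# missing values in src never overwrite, and unmatched rows (3/12) are left alone
df1 = df1.set_index('adsl')
df1.update(src)
df1 = df1.reset_index()
print(df1)
```

With the real files, the frames would come from pd.read_excel('wb1.xlsx') and pd.read_excel('wb2.xlsx') (which turns the literal n/a cells into NaN by default), and the result could be written back with df1.to_excel('wb1.xlsx', index=False); both need an Excel engine such as openpyxl installed.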