excel – Index match equivalent in Python

Posted by: admin April 23, 2020

Questions:

I have a large dataset I’m trying to manipulate for further analysis. Below is what the relevant parts of the dataframe would look like.

Loan   Closing Balance Date
1      175,000         2010-10-31
1      150,000         2010-11-30
1      125,000         2010-12-31
2      275,000         2010-10-31
2      250,000         2010-11-30
2      225,000         2010-12-31
3      375,000         2010-10-31
3      350,000         2010-11-30
3      320,000         2010-12-31

I would like to create a new column called Opening Balance, which is the Closing Balance at the previous month end for the same loan. For the second row, for example, the Opening Balance would be 175,000, which is the Closing Balance of the first row.

As the dataset starts at 2010-10-31, I won’t be able to look up a balance for 2010-09-30, so for any row dated 2010-10-31 I want the Opening Balance to equal that row’s Closing Balance.

Here’s what it should look like:

Loan   Closing Balance Date         Opening Balance
1      175,000         2010-10-31   175,000
1      150,000         2010-11-30   175,000
1      125,000         2010-12-31   150,000
2      275,000         2010-10-31   275,000
2      250,000         2010-11-30   275,000
2      225,000         2010-12-31   250,000
3      375,000         2010-10-31   375,000
3      350,000         2010-11-30   375,000
3      320,000         2010-12-31   350,000

In Excel I would normally do this with a compound INDEX/MATCH with an EOMONTH function thrown in, but I’m not quite sure how to do it in Python (I’m still very new to it).

Any help appreciated.

Thanks, I tried the approach suggested by Santhosh and end up getting the following:

    Closing Balance_x     Date_x  Closing Balance_y
0              175000 2010-09-30           150000.0
1              175000 2010-09-30           250000.0
2              175000 2010-09-30           350000.0
3              150000 2010-10-31           125000.0
4              150000 2010-10-31           225000.0
5              150000 2010-10-31           320000.0
6              125000 2010-11-30                NaN
7              275000 2010-09-30           150000.0
8              275000 2010-09-30           250000.0
9              275000 2010-09-30           350000.0
10             250000 2010-10-31           125000.0
11             250000 2010-10-31           225000.0
12             250000 2010-10-31           320000.0
13             225000 2010-11-30                NaN
14             375000 2010-09-30           150000.0
15             375000 2010-09-30           250000.0
16             375000 2010-09-30           350000.0
17             350000 2010-10-31           125000.0
18             350000 2010-10-31           225000.0
19             350000 2010-10-31           320000.0
20             320000 2010-11-30                NaN

I then amended that code to do a merge based off of the Loan ID and Date/pDate:

final_df = pd.merge(df, df, how="left", left_on=['Loan','Date'], right_on=['Loan','pDate'])

      Loan  Closing Balance_x     Date_x    Opening Balance
    0    1             175000 2010-09-30           150000.0
    1    1             150000 2010-10-31           125000.0
    2    1             125000 2010-11-30                NaN
    3    2             275000 2010-09-30           250000.0
    4    2             250000 2010-10-31           225000.0
    5    2             225000 2010-11-30                NaN
    6    3             375000 2010-09-30           350000.0
    7    3             350000 2010-10-31           320000.0
    8    3             320000 2010-11-30                NaN

Now in this case I’m not sure why I get NaN on every November observation. The Opening Balance for Loan 1 in November should be 150,000. The October Opening Balance should be 175,000. And the September Opening Balance should just default to the September Closing Balance, since I do not have an August Closing Balance to refer to.

Update

I think I resolved the issue by changing the merge code to:

final_df = pd.merge(df, df, how="left", left_on=['Loan','pDate'], right_on=['Loan','Date'])

This still gets me NaN for the September observations, but that is fine as I can manually replace those values.
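
For reference, a minimal sketch of the whole lookup, including the replacement of the missing first-month values. It uses the sample data from the table at the top of the question and assumes that every Date is a month end, so pd.offsets.MonthEnd(1) is used to step back to the previous month end (roughly what EOMONTH(date, -1) does in Excel). The helper name prev and the use of fillna for the fallback are illustrative choices, not part of the original code.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Loan": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Closing Balance": [175000, 150000, 125000,
                        275000, 250000, 225000,
                        375000, 350000, 320000],
    "Date": pd.to_datetime(["2010-10-31", "2010-11-30", "2010-12-31"] * 3),
})

# Assumption: every Date is a month end, so subtracting one MonthEnd
# offset lands on the previous month end.
df["pDate"] = df["Date"] - pd.offsets.MonthEnd(1)

# Lookup table (hypothetical name): the closing balance of each
# (Loan, Date) pair, relabelled so it can be joined on pDate.
prev = df[["Loan", "Date", "Closing Balance"]].rename(
    columns={"Date": "pDate", "Closing Balance": "Opening Balance"}
)

final_df = df.merge(prev, how="left", on=["Loan", "pDate"])

# The first month of each loan has no earlier row, so the merge leaves NaN
# there; fall back to that row's own closing balance.
final_df["Opening Balance"] = (
    final_df["Opening Balance"].fillna(final_df["Closing Balance"]).astype("int64")
)

print(final_df[["Loan", "Closing Balance", "Date", "Opening Balance"]])
```

Joining on a real month-end date rather than on "same day, one month earlier" avoids mismatches such as 2010-11-30 stepping back to 2010-10-30 instead of 2010-10-31.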

Answers:

I suggest adding another column that holds Date minus one month, and then joining on the month fields (together with the loan) to get the opening balance.

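import pandas as pd

# Assumes df has the columns Loan, Date (datetime64) and "Closing Balance".
df["cmonth"] = df["Date"].dt.year * 100 + df["Date"].dt.month     # current month as YYYYMM
df["pDate"] = df["Date"] - pd.DateOffset(months=1)                # same day, one month earlier
df["pmonth"] = df["pDate"].dt.year * 100 + df["pDate"].dt.month   # previous month as YYYYMM
# Match each row's previous month to the same loan's row for that month.
final_df = pd.merge(df, df, how="left", left_on=["Loan", "pmonth"], right_on=["Loan", "cmonth"])
print(final_df[["Loan", "Closing Balance_x", "Date_x", "Closing Balance_y"]])
# Closing Balance_y is your opening balance (NaN for the first month of each loan)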
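
As an alternative to the self-merge, the same column can be sketched with a per-loan shift. This is a different technique from the one suggested above, and it assumes each loan has exactly one row per month with no gaps, so that the previous row within a loan is always the previous month end. The sample frame is rebuilt from the table in the question so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Loan": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Closing Balance": [175000, 150000, 125000,
                        275000, 250000, 225000,
                        375000, 350000, 320000],
    "Date": pd.to_datetime(["2010-10-31", "2010-11-30", "2010-12-31"] * 3),
})

# Sort so that, within each loan, the previous row is the previous month.
# Assumption: no loan skips a month.
df = df.sort_values(["Loan", "Date"]).reset_index(drop=True)

# Shift the closing balance down one row within each loan.
prev_close = df.groupby("Loan")["Closing Balance"].shift(1)

# The first row of each loan has nothing before it, so use its own
# closing balance as the opening balance.
df["Opening Balance"] = prev_close.fillna(df["Closing Balance"]).astype("int64")

print(df)
```

If months can be missing for some loans, the shift would silently pick up an older balance, so the merge on Loan and month is the safer choice in that case.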