I have a large dataset I’m trying to manipulate for further analysis. Below is what the relevant parts of the dataframe would look like.
Loan  Closing Balance  Date
1     175,000          2010-10-31
1     150,000          2010-11-30
1     125,000          2010-12-31
2     275,000          2010-10-31
2     250,000          2010-11-30
2     225,000          2010-12-31
3     375,000          2010-10-31
3     350,000          2010-11-30
3     320,000          2010-12-31
I would like to create a new column called Opening Balance which is basically the Closing Balance for the previous month’s month end, so for the second row, Opening Balance would just be equal to 175,000, which is the Closing Balance for the first row.
As the dataset starts on 2010-10-31, I won’t be able to look up a balance for 2010-09-30, so for any row with a date of 2010-10-31, I want the Opening Balance for that observation to equal its Closing Balance.
Here’s what it should look like:
Loan  Closing Balance  Date        Opening Balance
1     175,000          2010-10-31  175,000
1     150,000          2010-11-30  175,000
1     125,000          2010-12-31  150,000
2     275,000          2010-10-31  275,000
2     250,000          2010-11-30  275,000
2     225,000          2010-12-31  250,000
3     375,000          2010-10-31  375,000
3     350,000          2010-11-30  375,000
3     320,000          2010-12-31  350,000
In Excel I would normally do a compound INDEX/MATCH with an EOMONTH function thrown in, but I’m not quite sure how to do this in Python (I’m still very new to it).
Any help appreciated.
Thanks. I tried the approach suggested by Santhosh and ended up with the following:
    Closing Balance_x  Date_x      Closing Balance_y
0   175000             2010-09-30  150000.0
1   175000             2010-09-30  250000.0
2   175000             2010-09-30  350000.0
3   150000             2010-10-31  125000.0
4   150000             2010-10-31  225000.0
5   150000             2010-10-31  320000.0
6   125000             2010-11-30  NaN
7   275000             2010-09-30  150000.0
8   275000             2010-09-30  250000.0
9   275000             2010-09-30  350000.0
10  250000             2010-10-31  125000.0
11  250000             2010-10-31  225000.0
12  250000             2010-10-31  320000.0
13  225000             2010-11-30  NaN
14  375000             2010-09-30  150000.0
15  375000             2010-09-30  250000.0
16  375000             2010-09-30  350000.0
17  350000             2010-10-31  125000.0
18  350000             2010-10-31  225000.0
19  350000             2010-10-31  320000.0
20  320000             2010-11-30  NaN
I then amended that code to do a merge based off of the Loan ID and Date/pDate:
final_df = pd.merge(df, df, how="left", left_on=['Loan','Date'], right_on=['Loan','pDate'])
   Loan  Closing Balance_x  Date_x      Opening Balance
0  1     175000             2010-09-30  150000.0
1  1     150000             2010-10-31  125000.0
2  1     125000             2010-11-30  NaN
3  2     275000             2010-09-30  250000.0
4  2     250000             2010-10-31  225000.0
5  2     225000             2010-11-30  NaN
6  3     375000             2010-09-30  350000.0
7  3     350000             2010-10-31  320000.0
8  3     320000             2010-11-30  NaN
Now in this case I’m not sure why I get NaN on every November observation. The Opening Balance for Loan 1 in November should be 150,000. The October Opening Balance should be 175,000. And the September Opening Balance should just default to the September Closing Balance, since I do not have an August Closing Balance to refer to.
Update
I think I resolved the issue. I changed the merge code to:
final_df = pd.merge(df, df, how="left", left_on=['Loan','pDate'], right_on=['Loan','Date'])
This still gets me NaN for the September observations, but that is fine as I can manually replace those values.
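One way to fill those NaN values without a manual replace is sketched below. It is a minimal sketch under stated assumptions, not the exact code from this thread: the sample frame is rebuilt from the table in the question, every Date is assumed to be a month end, Closing Balance is assumed to be numeric, and `pd.offsets.MonthEnd(1)` is swapped in for `pd.DateOffset(months=1)` so that 2010-11-30 maps back to 2010-10-31 rather than 2010-10-30.

```python
import pandas as pd

# Sample data mirroring the table in the question (balances as plain integers).
df = pd.DataFrame({
    "Loan": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Closing Balance": [175000, 150000, 125000,
                        275000, 250000, 225000,
                        375000, 350000, 320000],
    "Date": pd.to_datetime(["2010-10-31", "2010-11-30", "2010-12-31"] * 3),
})

# Previous month end for each row (assumes each Date is itself a month end).
df["pDate"] = df["Date"] - pd.offsets.MonthEnd(1)

# Self-merge: match each row's previous month end to the same loan's row
# for that date. Non-key columns get the _x (left) and _y (right) suffixes.
final_df = pd.merge(df, df, how="left",
                    left_on=["Loan", "pDate"], right_on=["Loan", "Date"])

# Rows in the first month have no match (NaN), so fall back to the row's
# own closing balance.
final_df["Opening Balance"] = (
    final_df["Closing Balance_y"]
    .fillna(final_df["Closing Balance_x"])
    .astype("int64")
)

print(final_df[["Loan", "Closing Balance_x", "Date_x", "Opening Balance"]])
```

The `fillna` call takes a Series, so each missing value is replaced by the Closing Balance on the same row rather than by a single constant.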
I suggest adding another column that holds Date minus one month, and then joining on the month fields (per loan) to get the opening balance.
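import pandas as pd
# Assumes df["Date"] is already a datetime column (e.g. via pd.to_datetime).
# Year-month key of each row, e.g. 201011 for 2010-11-30.
df["cmonth"] = df["Date"].dt.year * 100 + df["Date"].dt.month
# Year-month key of the previous month.
df["pDate"] = df["Date"] - pd.DateOffset(months=1)
df["pmonth"] = df["pDate"].dt.year * 100 + df["pDate"].dt.month
# Match each row's previous month to the same loan's row for that month.
final_df = pd.merge(df, df, how="left", left_on=["Loan", "pmonth"], right_on=["Loan", "cmonth"])
print(final_df[["Loan", "Closing Balance_x", "Date_x", "Closing Balance_y"]])
# Closing Balance_y is your opening balance (NaN where there is no prior month)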
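As an alternative to the self-merge, the same result can probably be reached with a per-loan shift. This is a different technique from the merge above, shown only as a sketch: it assumes each loan has exactly one row per month with no gaps (a shift takes the previous row, not the previous calendar month), and the sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample data mirroring the table in the question (balances as plain integers).
df = pd.DataFrame({
    "Loan": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Closing Balance": [175000, 150000, 125000,
                        275000, 250000, 225000,
                        375000, 350000, 320000],
    "Date": pd.to_datetime(["2010-10-31", "2010-11-30", "2010-12-31"] * 3),
})

# Order rows so that, within each loan, the previous row is the previous month.
df = df.sort_values(["Loan", "Date"]).reset_index(drop=True)

# Previous row's closing balance within each loan; the first row of each
# loan has no predecessor, so it falls back to its own closing balance.
df["Opening Balance"] = (
    df.groupby("Loan")["Closing Balance"]
    .shift(1)
    .fillna(df["Closing Balance"])
    .astype("int64")
)

print(df)
```

If some loans skip months, the merge on Loan and month is the safer choice, since it only matches a row that really is the prior month.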
Tags: excel, python