I have a pandas DataFrame like this:
document id  document version  version date
101          1                 2020-01-01
101          2                 2020-01-02
102          1                 2020-01-01
103          1                 2019-05-02
101          3                 2019-12-03
102          2                 2020-01-02
I can’t figure out how to identify rows that have a newer document version with a version date before or equal to the date of the previous version.
So in this example, I want to identify row 5 (document 101, version 3, date 2019-12-03), whose date is before the date of version 2 of this document.
Thanks a lot!
I tried @Allen's option and wasn't quite getting the desired outcome.
Try sorting by document id and document version, then take groupby.diff of the date and filter the negative values.
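import pandas as pd
df1 = df.sort_values(['document id', 'document version'])
df1['version date'] = pd.to_datetime(df1['version date'])
# Date difference to the previous version of the same document; a negative value
# means the newer version is dated earlier (use <= to also catch equal dates)
df1[df1.groupby('document id')['version date'].diff() < pd.Timedelta(0)]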
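For anyone who wants to try this end to end, here is a minimal self-contained sketch of the sort-then-diff idea. It rebuilds the sample data from the question; the names `df1` and `flagged` are just illustrative, and it uses `<=` on the assumption that equal dates should be flagged too, as the question asks.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'document id': [101, 101, 102, 103, 101, 102],
    'document version': [1, 2, 1, 1, 3, 2],
    'version date': ['2020-01-01', '2020-01-02', '2020-01-01',
                     '2019-05-02', '2019-12-03', '2020-01-02'],
})
df['version date'] = pd.to_datetime(df['version date'])

# Order the versions within each document
df1 = df.sort_values(['document id', 'document version'])

# Date gap to the immediately preceding version of the same document
# (NaT for the first version of each document, which never matches the filter)
gap = df1.groupby('document id')['version date'].diff()

# Keep rows whose date is before or equal to the previous version's date
flagged = df1[gap <= pd.Timedelta(0)]
print(flagged)
```

Note that this only compares each version with the one directly before it, not with every earlier version.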
Answer:
You can use apply:
(
    df.apply(lambda x: ((df['document id'] == x['document id']) &
                        (df['document version'] < x['document version']) &
                        (df['version date'] > x['version date'])).any(), axis=1)
      .pipe(lambda x: df.loc[x])
)
   document id  document version  version date
4          101                 3    2019-12-03
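The row-wise apply compares every row against the whole frame, which can get slow on large data. One possible alternative, sketched here as an assumption rather than a tested replacement, is a self-merge on document id that does the same "any earlier version with a later date" check in a vectorized way. The `_other` suffix and the names `pairs`, `bad` and `flagged` are made up for this example, and `>=` is used so that equal dates are also caught.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'document id': [101, 101, 102, 103, 101, 102],
    'document version': [1, 2, 1, 1, 3, 2],
    'version date': ['2020-01-01', '2020-01-02', '2020-01-01',
                     '2019-05-02', '2019-12-03', '2020-01-02'],
})
df['version date'] = pd.to_datetime(df['version date'])

# Pair every row with every row of the same document;
# reset_index keeps the original row label in an 'index' column
pairs = df.reset_index().merge(df, on='document id', suffixes=('', '_other'))

# A pair is a violation when the other row is an earlier version
# but its date is the same or later
bad = pairs[(pairs['document version_other'] < pairs['document version']) &
            (pairs['version date_other'] >= pairs['version date'])]

# Map the violating pairs back to the original rows, once each
flagged = df.loc[bad['index'].unique()]
print(flagged)
```

The merge still builds all version pairs per document, so it trades memory for speed when a document has many versions.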
Tags: exception, pandas, python