python – How to verify that entry A with version X is newer than entry A with version X-1 in a pandas DataFrame?

Posted by: admin, February 24, 2020

Questions:

I have a pandas DataFrame like this:

document id   document version   version date
101            1                  2020-01-01
101            2                  2020-01-02
102            1                  2020-01-01
103            1                  2019-05-02
101            3                  2019-12-03
102            2                  2020-01-02

I can’t figure out how to identify rows that have a newer document version but a version date on or before the date of the previous version.

In this example, I want to identify row 5 (document 101, version 3, dated 2019-12-03), whose date is before that of version 2 of the same document.

Thanks a lot!

Answers:

I tried @Allen's option and wasn't quite getting the desired outcome.

Try sorting by document id and document version, then take the per-document diff of the dates with groupby and filter for the negative values:

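import pandas as pd
# Sort so each document's versions are in order, then parse the dates
df1 = df.sort_values(['document id', 'document version'])
df1['version date'] = pd.to_datetime(df1['version date'])
# Keep rows dated before the previous version (use <= to also catch equal dates)
df1[df1.groupby('document id')['version date'].diff() < pd.Timedelta(0)]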

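For anyone who wants to run the sort-and-diff idea end to end, here is a minimal sketch that rebuilds the sample data from the question. The helper name out_of_order_versions and its allow_equal flag are made up for illustration; allow_equal=True is my reading of the question's "before or equal" requirement.

```python
import pandas as pd

# Sample data copied from the question (dates still plain strings)
df = pd.DataFrame({
    "document id": [101, 101, 102, 103, 101, 102],
    "document version": [1, 2, 1, 1, 3, 2],
    "version date": [
        "2020-01-01", "2020-01-02", "2020-01-01",
        "2019-05-02", "2019-12-03", "2020-01-02",
    ],
})


def out_of_order_versions(frame, allow_equal=False):
    """Hypothetical helper: rows dated before (or on) the previous version."""
    ordered = frame.sort_values(["document id", "document version"]).copy()
    ordered["version date"] = pd.to_datetime(ordered["version date"])
    # Gap between each version's date and the previous version's date,
    # computed separately for each document; the first version gets NaT
    gap = ordered.groupby("document id")["version date"].diff()
    zero = pd.Timedelta(0)
    # Comparisons against NaT are False, so first versions are never flagged
    mask = (gap <= zero) if allow_equal else (gap < zero)
    return ordered[mask]


result = out_of_order_versions(df, allow_equal=True)
print(result)
```

Note that this only compares each version with the one immediately before it, so a version that is older than some earlier version but newer than its direct predecessor would not be flagged.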

Answer:

You can use apply:

(
    df.apply(lambda x: ((df['document id']==x['document id']) & 
                        (df['document version']<x['document version']) &
                        (df['version date']>x['version date'])).any(), axis=1)
    .pipe(lambda x: df.loc[x])
)

    document id document version    version date
4   101         3                   2019-12-03
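The apply call above compares every row against the whole frame, which may get slow on large data. Below is a rough vectorized sketch of the same check, assuming each version number appears only once per document: sort, track the latest date among the earlier versions of each document, and flag rows dated before it. This swaps in cummax and shift rather than using the apply approach from the answer, and the names ordered, running_max, prev_max and flagged are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "document id": [101, 101, 102, 103, 101, 102],
    "document version": [1, 2, 1, 1, 3, 2],
    "version date": pd.to_datetime([
        "2020-01-01", "2020-01-02", "2020-01-01",
        "2019-05-02", "2019-12-03", "2020-01-02",
    ]),
})

# Put each document's versions in ascending order
ordered = df.sort_values(["document id", "document version"])

# Running maximum of the dates within each document, including the current row
running_max = ordered.groupby("document id")["version date"].cummax()

# Shift by one inside each document so every row sees only earlier versions;
# the first version of a document gets NaT
prev_max = running_max.groupby(ordered["document id"]).shift()

# A row is flagged when some earlier version carries a later date
# (comparisons against NaT are False, so first versions are skipped)
flagged = ordered[ordered["version date"] < prev_max]
print(flagged)
```

Unlike a plain diff against the previous version, this compares against all earlier versions of the document, which is the same condition the apply lambda checks.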