
Convert multiple columns in Python dataframe to yyyy/mm/dd with both excel numerical and normal datetime values

Posted by: admin, May 14, 2020

Questions:

I need to select a couple of columns from an Excel file loaded into a dataframe and apply a standard date format (yyyy/mm/dd) to them. The data is (unfortunately) a mixture of Excel serial numbers (e.g. 43799) and standard date strings (e.g. 11/30/2019). I am using the read_excel method from pandas, and would prefer not to use alternative methods of opening the file (e.g. xlrd's open_workbook).

An example of what the data will look like when I import it:

import xlrd
import pandas as pd
import numpy as np
from datetime import datetime as dt

data = [
    ['test', 43799, '11/30/2019', '11/30/2019'],
    ['test 2', '11/30/2019', '11/30/2019', '11/30/2019'],
    ['test 3', 43799, '11/30/2019', 43799],
]
df = pd.DataFrame(data, columns=['Name', 'Date_1', 'Date_2', 'Date_3'])
print(df)

So, as stated in the intro, how do I select the three date columns (Date_1, Date_2, Date_3) and apply the same date format to all of them (YYYY-MM-DD)? Any help would be greatly appreciated!

Answers:

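You will need to parse each column once per format and then merge the results; combine_first keeps the value from whichever parse succeeded. Excel stores a date as a day count in which serial 1 is 1900-01-01, and it also counts a non-existent 1900-02-29, so for any date after February 1900 the matching pandas origin is 1899-12-30, not 1900-01-01 (which would shift every result two days late). The text dates are not numeric, so pd.to_numeric turns them into NaN and only the serial numbers are converted in the second pass.

for col in ['Date_1', 'Date_2', 'Date_3']:
    # Text dates such as '11/30/2019'; anything else becomes NaT
    d1 = pd.to_datetime(df[col], format='%m/%d/%Y', errors='coerce')
    # Excel serials such as 43799; text becomes NaN, then NaT
    d2 = pd.to_datetime(pd.to_numeric(df[col], errors='coerce'), unit='D', origin='1899-12-30')
    # Use d1 where it parsed, otherwise fall back to d2
    df[col] = d1.combine_first(d2)

     Name     Date_1     Date_2     Date_3
0    test 2019-11-30 2019-11-30 2019-11-30
1  test 2 2019-11-30 2019-11-30 2019-11-30
2  test 3 2019-11-30 2019-11-30 2019-11-30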
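
The question also asks for a specific text format (yyyy/mm/dd), whereas parsing alone leaves datetime64 values that display as YYYY-MM-DD. Below is a minimal sketch of one way to wrap the two-pass parse in a helper and format the result as strings. The function name normalize_excel_dates and its parameters are made up for illustration and are not part of pandas. It assumes the workbook uses Excel's 1900 date system (so serial 43799 corresponds to 2019-11-30, i.e. a pandas origin of 1899-12-30) and that the text dates are all month/day/year.

```python
import pandas as pd


def normalize_excel_dates(series, text_format="%m/%d/%Y", out_format="%Y/%m/%d"):
    """Hypothetical helper: parse a column that mixes Excel serial numbers
    and date strings, and return the dates as formatted strings."""
    # Pass 1: text dates. Casting to str makes serials like 43799 fail the
    # format match, so they are coerced to NaT instead of being misread.
    from_text = pd.to_datetime(series.astype(str), format=text_format, errors="coerce")

    # Pass 2: Excel serials. Non-numeric values become NaN, which
    # to_datetime turns into NaT.
    serials = pd.to_numeric(series, errors="coerce")
    # Assumption: 1900 date system, dates after February 1900.
    from_serial = pd.to_datetime(serials, unit="D", origin="1899-12-30")

    # Keep the text parse where it succeeded, otherwise the serial parse.
    parsed = from_text.combine_first(from_serial)

    # Values that matched neither pass stay missing (NaN) in the output.
    return parsed.dt.strftime(out_format)


df = pd.DataFrame(
    [
        ["test", 43799, "11/30/2019", "11/30/2019"],
        ["test 2", "11/30/2019", "11/30/2019", "11/30/2019"],
        ["test 3", 43799, "11/30/2019", 43799],
    ],
    columns=["Name", "Date_1", "Date_2", "Date_3"],
)

date_cols = ["Date_1", "Date_2", "Date_3"]
# DataFrame.apply calls the helper once per selected column.
formatted = df[date_cols].apply(normalize_excel_dates)
df[date_cols] = formatted
print(df)
```

Note that strftime produces plain strings, so it is usually best to do this as the last step (for example just before writing the file back out) and keep the datetime64 columns for any sorting or date arithmetic. Passing out_format="%Y-%m-%d" gives the dashed form instead.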