I need to select a couple of columns from an Excel file loaded into a DataFrame and apply a standard date format (YYYY-MM-DD) to them. The data is (unfortunately) a mixture of Excel serial numbers (e.g. 43799) and standard date strings (e.g. 11/30/2019). I am using pandas' read_excel method and would prefer not to open the file any other way (e.g. with xlrd's open_workbook).
An example of what the data will look like when I import it:
    import xlrd
    import pandas as pd
    import numpy as np
    from datetime import datetime as dt

    data = [['test', 43799, '11/30/2019', '11/30/2019'],
            ['test 2', '11/30/2019', '11/30/2019', '11/30/2019'],
            ['test 3', 43799, '11/30/2019', 43799]]
    df = pd.DataFrame(data, columns=['Name', 'Date_1', 'Date_2', 'Date_3'])
    print(df)
So, as stated in the intro, how do I select columns 1-3 (Date_1, Date_2, Date_3) and apply the same date format (YYYY-MM-DD) to all of them? Any help would be greatly appreciated!
You will need to parse each column multiple times, once per format.
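A minimal sketch of why a single pass is not enough, using a hypothetical Series `s` that mirrors the Date_1 column from the question: parsing with the text format turns the serial numbers into NaT, and `pd.to_numeric` turns the text dates into NaN, so each pass leaves gaps exactly where the other format sits.

```python
import pandas as pd

# Hypothetical column mirroring Date_1 from the question: an object Series
# holding both Excel serial numbers (ints) and m/d/Y text.
s = pd.Series([43799, '11/30/2019', 43799], dtype=object)

# Pass 1: parse as text. The serial numbers don't match the format,
# so errors='coerce' turns them into NaT.
as_text = pd.to_datetime(s, format='%m/%d/%Y', errors='coerce')

# Pass 2: keep only the numeric values. The text dates can't be
# converted to numbers, so errors='coerce' turns them into NaN.
as_number = pd.to_numeric(s, errors='coerce')

print(as_text.isna().tolist())    # [True, False, True]
print(as_number.isna().tolist())  # [False, True, False]
```

Because the two missing-value masks are complementary, filling the gaps of one result from the other recovers every row.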
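combine_first then lets you keep the date from whichever parse matched. The Excel values first have to be converted to numbers with pd.to_numeric so they can be read as a count of days. Be careful with the origin: Excel labels serial 1 as 1900-01-01 and also counts a nonexistent 1900-02-29, so for dates from March 1900 onward the serial number is the number of days since 1899-12-30. Using origin='1900-01-01' shifts every serial date two days late (43799 comes out as 2019-12-02 instead of 2019-11-30).

    for col in ['Date_1', 'Date_2', 'Date_3']:
        # Parse text dates; anything else (e.g. serial numbers) becomes NaT
        d1 = pd.to_datetime(df[col], format='%m/%d/%Y', errors='coerce')
        # Parse Excel serial numbers; text becomes NaN and then NaT
        d2 = pd.to_datetime(pd.to_numeric(df[col], errors='coerce'),
                            unit='D', origin='1899-12-30')
        # Fill each NaT from the text parse with the serial-number parse
        df[col] = d1.combine_first(d2)

Output of print(df); the columns are now datetime64 and display as YYYY-MM-DD:

         Name     Date_1     Date_2     Date_3
    0    test 2019-11-30 2019-11-30 2019-11-30
    1  test 2 2019-11-30 2019-11-30 2019-11-30
    2  test 3 2019-11-30 2019-11-30 2019-11-30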
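Since the question asks for YYYY-MM-DD specifically, here is a hedged sketch that wraps the same two-pass parse in a helper (`parse_mixed_excel_dates` is a hypothetical name, not a pandas function), applies it to all three columns at once with `DataFrame.apply`, and then uses `Series.dt.strftime` to produce text. It assumes the workbook uses Excel's default 1900 date system; a file saved with the 1904 date system would need `origin='1904-01-01'` instead.

```python
import pandas as pd

# Sample data from the question.
data = [['test', 43799, '11/30/2019', '11/30/2019'],
        ['test 2', '11/30/2019', '11/30/2019', '11/30/2019'],
        ['test 3', 43799, '11/30/2019', 43799]]
df = pd.DataFrame(data, columns=['Name', 'Date_1', 'Date_2', 'Date_3'])


def parse_mixed_excel_dates(s: pd.Series) -> pd.Series:
    """Hypothetical helper: parse m/d/Y text and Excel serials (1900 system)."""
    as_text = pd.to_datetime(s, format='%m/%d/%Y', errors='coerce')
    # Assumption: 1900 date system, where serials from 61 onward count
    # days since 1899-12-30 (Excel also counts a fake 1900-02-29).
    as_serial = pd.to_datetime(pd.to_numeric(s, errors='coerce'),
                               unit='D', origin='1899-12-30')
    # Keep the text parse where it succeeded, else the serial parse.
    return as_text.combine_first(as_serial)


date_cols = ['Date_1', 'Date_2', 'Date_3']
# DataFrame.apply calls the helper once per selected column.
df[date_cols] = df[date_cols].apply(parse_mixed_excel_dates)

# Optional: a copy with the dates rendered as 'YYYY-MM-DD' strings.
out = df.copy()
for col in date_cols:
    out[col] = out[col].dt.strftime('%Y-%m-%d')
print(out)
```

Keeping the columns as datetime64 is usually preferable; converting to strings mainly helps for display or export, since string columns no longer support date arithmetic or the `.dt` accessor.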