I have created a program to remove duplicate rows from an Excel file using pandas. After doing so, I exported the new data from pandas to Excel; however, the new Excel file seems to have missing data (specifically in columns involving dates). Instead of showing the actual data, it just shows ‘#####’ on the rows.
Code:
import pandas as pd
data = pd.read_excel('test.xlsx')
data.sort_values("Serial_Nbr", inplace=True)
data.drop_duplicates(subset="Serial_Nbr", keep="first", inplace=True)
data.to_excel(r'test_updated.xlsx')
Before and after exporting:
date          date
2018-07-01    #####
2018-08-01    #####
2018-08-01    #####
Answer:
It means the cell is too narrow to display the data. Try widening the column.
[Screenshot: cell’s width is too narrow]
[Screenshot: after expanding the cell’s width]
To export datetimes to Excel with a specific display format, pass the format codes to the Excel writer:
import pandas as pd
data = pd.read_excel('Book1.xlsx')
# inplace=True is needed here; with inplace=False the sorted result is discarded
data.sort_values("date", inplace=True)
data.drop_duplicates(subset="date", keep="first", inplace=True)
# Writer with date and datetime display formats
writer = pd.ExcelWriter("test_updated.xlsx",
                        datetime_format='mm dd yyyy',
                        date_format='mmm dd yyyy')
# Write the dataframe to the workbook through the writer.
data.to_excel(writer, sheet_name='Sheet1')
writer.close()  # saves the file; writer.save() was removed in pandas 2.0
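For reference, here is a minimal sketch of the same export written with a context manager, which saves and closes the file automatically. The sample DataFrame, the temporary output path, and the choice of engine="openpyxl" are assumptions made only so the example is self-contained; they are not from the original post. Note that the format codes only control how the dates are displayed, so a narrow column can still show ##### until it is widened.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the asker's workbook.
data = pd.DataFrame(
    {
        "Serial_Nbr": [101, 101, 102],
        "date": pd.to_datetime(["2018-07-01", "2018-07-01", "2018-08-01"]),
    }
)

# sort_values returns a new frame unless inplace=True, so keep the result.
data = data.sort_values("Serial_Nbr")
data = data.drop_duplicates(subset="Serial_Nbr", keep="first")

# Hypothetical output location in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "test_updated.xlsx")

# Leaving the "with" block saves and closes the workbook.
with pd.ExcelWriter(
    out_path,
    engine="openpyxl",  # assumption: openpyxl is installed
    datetime_format="mmm dd yyyy",
    date_format="mmm dd yyyy",
) as writer:
    data.to_excel(writer, sheet_name="Sheet1", index=False)
```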
Answer:
##### is displayed when a cell’s width is too small to display its contents. You need to increase the cell’s width or reduce its content.
Answer:
Regarding the original question, I agree with the answer from ALFAFA.
Here I resize the column in code, so that the end user does not need to do it manually in the xls.
Steps would be:
- Get the column name (in the xls, column names start with ‘A’, ‘B’, ‘C’ etc.)
# Get the column position in the dataframe
colPosn = data.columns.get_loc('col#3')
# Get the xls column name (not the column header of the dataframe); it is used to set attributes of the xls column.
# This only works for the first 26 columns and when the index is not written (index=False).
xlsColName = chr(ord('A') + colPosn)
- Get the resizing width of the column ‘col#3’ by getting the length of the longest string in the column
# Length of the longest value in 'col#3', +1 for some buffer space so the data is visible in the xls column.
# astype(str) lets this work for non-string values such as dates.
maxColWidth = 1 + data['col#3'].astype(str).map(len).max()
- Use the column_dimensions[colName].width attribute to increase the width of the xls column
# Use index=False if you don't need the extra index column in the file
data.to_excel(writer, sheet_name='Sheet1', index=False)
# writer.book[...] and column_dimensions are openpyxl APIs, so the writer must use engine='openpyxl'
sheet = writer.book['Sheet1']
# Increase the width of the column to match the longest string in the column
sheet.column_dimensions[xlsColName].width = maxColWidth
writer.close()  # writer.save() was removed in pandas 2.0
- Replace the last two lines of ALFAFA’s post with the blocks above (all sections above) to get the column width adjusted for ‘col#3’
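The chr(ord('A') + colPosn) step only covers columns ‘A’ to ‘Z’, and map(len) assumes every value is a string. The sketch below shows one way both steps could be generalized in plain Python. The helper names excel_column_letter and column_width are hypothetical and not part of pandas or openpyxl. If openpyxl is available, its openpyxl.utils.get_column_letter function (which takes a 1-based index) does the same job as the first helper.

```python
def excel_column_letter(position):
    """Convert a 0-based column position to an Excel column name (0 -> 'A', 26 -> 'AA')."""
    if position < 0:
        raise ValueError("position must be non-negative")
    letters = ""
    n = position + 1  # Excel column numbering is 1-based
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def column_width(values, padding=1):
    """Length of the longest value once converted to text, plus some buffer space."""
    return padding + max((len(str(value)) for value in values), default=0)


# Hypothetical usage with values like those in the question.
print(excel_column_letter(2))                      # third column
print(column_width(["2018-07-01", "2018-08-01"]))  # width for a date column
```

With a dataframe, the first helper would take data.columns.get_loc('col#3') and the second would take data['col#3'], assuming the sheet is written with index=False so the positions line up.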
Tags: excel, excelpython, pandas