We are in the design phase for a product. The idea is that the code will read a list of values from an Excel workbook and load them into a SQL database.
The requirements are as follows:
- The workbook may be accessed by multiple users outside of our program
- The workbook must remain accessible (i.e., not be corrupted) if something bad happens while our program is running
- The program will be executed when no users have the file open
Right now we are considering using pandas in a simple way, as follows:
```python
import pandas as pd

# sheet_name (not the old, removed `sheetname`) selects the worksheet
df = pd.read_excel('File.xlsx', sheet_name='Sheet1')
# ... some code to write df into SQL ...
```
If this code goes down while the Excel file is still open, is there ANY possibility that the file will remain locked by my program or become corrupted?
To clarify, we have in mind something catastrophic, such as the server crashing or losing power.
I searched around but couldn't find a similar question; please redirect me if necessary.
I also read through the pandas read_excel documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
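With the code you provided, from my reading of the pandas and xlrd code, the file is only ever opened in read mode. To the best of my knowledge, that means there is no more risk in what you're doing than in reading the file any other way, and you have to read it to use it, after all. Nothing is ever written to the file, so a crash cannot leave it half-written, and when a process terminates, even abnormally, the operating system releases the file handles it held. (Note that pandas 1.2 and later read .xlsx files with openpyxl rather than xlrd, because xlrd 2.0 dropped .xlsx support; the file is still only read.)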
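To check the read-only argument empirically, here is a stdlib-only sketch. It starts a child Python process that opens a file in read mode and then kills itself with os._exit before closing the handle, and afterwards confirms that the file's SHA-256 digest is unchanged and that the file can still be replaced. Assumptions: os._exit is only a rough stand-in for a crash (it does not simulate a power loss), the test file is arbitrary bytes rather than a real workbook, and sha256_of and crash_reader_and_check are hypothetical helper names.

```python
import hashlib
import os
import subprocess
import sys
import tempfile

# Child process body: open the file read-only, read a little, then exit
# abruptly. os._exit skips close() and all Python cleanup, which is only a
# rough stand-in for a crash; it does NOT simulate a power loss.
CHILD = """
import os, sys
f = open(sys.argv[1], "rb")
f.read(10)
os._exit(1)
"""


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def crash_reader_and_check(path):
    """Kill a reader mid-read; return True if the file is intact afterwards."""
    before = sha256_of(path)
    result = subprocess.run([sys.executable, "-c", CHILD, path])
    if result.returncode != 1:
        raise RuntimeError("child did not exit the way we expected")
    unchanged = sha256_of(path) == before
    # Replace the file with a copy of itself. On Windows this would typically
    # raise PermissionError if a process still held the file open.
    tmp = path + ".tmp"
    with open(path, "rb") as src, open(tmp, "wb") as dst:
        dst.write(src.read())
    os.replace(tmp, path)
    return unchanged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "dummy.xlsx")  # arbitrary bytes, not a real workbook
        with open(path, "wb") as f:
            f.write(os.urandom(4096))
        print("file intact:", crash_reader_and_check(path))
```

The replace step is only informative on Windows, where open handles block replacing a file; Linux and macOS have no such mandatory lock, so there the digest comparison is the meaningful check.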
If this doesn't sufficiently reassure you, you could minimize the time the file is open and, more importantly, avoid exposing the file itself to external code by handing pandas a BytesIO object instead of a path:
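```python
import io
import pandas as pd

# Read the whole file into memory; the with-block closes the handle right away.
with open('File.xlsx', 'rb') as f:
    data = io.BytesIO(f.read())

df = pd.read_excel(data, sheet_name='Sheet1')
# etc
```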
This way your file will only be open for the time it takes to read it into memory, and pandas (and whichever Excel engine it uses) will only ever work with an in-memory copy of the data.
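If you want that snapshot step as a reusable helper, a minimal sketch follows. It reads the file once inside a with-block, so the handle is closed before any parsing begins, and it also returns a SHA-256 digest of the bytes, which you could log to record exactly which version of the workbook was loaded. snapshot_file is a hypothetical name, and the digest is an optional extra rather than something the approach above requires.

```python
import hashlib
import io
from typing import Tuple


def snapshot_file(path: str) -> Tuple[io.BytesIO, str]:
    """Copy the file at `path` into memory.

    Returns (buffer, sha256_hex). The with-block closes the OS handle before
    the function returns, even if the read raises, so code that later parses
    the buffer never touches the file itself.
    """
    with open(path, "rb") as f:
        raw = f.read()
    return io.BytesIO(raw), hashlib.sha256(raw).hexdigest()
```

With pandas, usage would then be along the lines of buf, digest = snapshot_file('File.xlsx') followed by df = pd.read_excel(buf, sheet_name='Sheet1').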