I downloaded several CSV files from a finance site. These files are inputs to a Python script that I wrote. The rows in the CSV files don’t all have the same number of values (i.e., columns). In fact, blank lines have no values at all.
This is what the first few lines of the downloaded file look like:
Performance Report
Date Produced,14-Feb-2020
When I attempt to add a row to a pandas DataFrame, the script fails with a “mismatched columns” error.
I got around this by opening the files in Numbers on Mac OS X and manually exporting each file to CSV. However, I don’t want to do this each time I download a CSV file from the finance site. I have googled for ways to automate this but have not been successful.
This is what the first few lines of the “Numbers” exported CSV file look like:
,,,,,,,
Performance Report,,,,,,
Date Produced,14-Feb-2020,,,,,
,,,,,,,
I have tried playing with the dialect value of the csv module’s reader but have not been successful.
I also tried appending the columns manually in the Python script, but that did not work either.
Essentially, midway down the CSV file is the table that I place into the DataFrame. Below is an example.
Asset Name,Opening Balance $,Purchases $,Sales $,Change in Value $,Closing Balance $,Income $,% Return for Period
Asset A,0.00,35.25,66.00,26.51,42.74,5.25,-6.93
...
...
Sub Total,48.86,26,12.29,-16.7,75.82,29.06,
That table prior to exporting via “Numbers” looks like so:
Asset Name,Opening Balance $,Purchases $,Sales $,Change in Value $,Closing Balance $,Income $,% Return for Period
Asset A,0.00,35.25,66.00,26.51,42.74,5.25,-6.93
...
...
Sub Total,48.86,26,12.29,-16.7,75.82,29.06
Above, the Sub Total row does not have a value in the last column, and the file does not represent the missing value as ,"", which would give every row an equal number of columns.
Does anyone have any ideas on how I can automate the Numbers export process? Any help would be appreciated. I presume they’re varying formats of CSV.
With read_csv you can skip rows. If the number of header rows is consistent, pass that number as skiprows.
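As a rough sketch of the fixed-offset case: the sample text below is a made-up, shortened stand-in for the report described in the question (three lines above the table, one of them blank), so the value 3 is an assumption about that layout rather than something taken from the real file.

```python
import io
import pandas as pd

# Hypothetical, shortened stand-in for the downloaded report.
sample = """Performance Report
Date Produced,14-Feb-2020

Asset Name,Opening Balance $,Closing Balance $
Asset A,0.00,42.74
Sub Total,48.86,75.82
"""

# An integer skiprows counts physical lines from the top of the file,
# blank lines included, so 3 lands on the "Asset Name" header row.
df = pd.read_csv(io.StringIO(sample), skiprows=3)
print(df)
```

For a file on disk, the same keyword works with a path in place of the `io.StringIO` object.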
If the first few lines are not consistent, or the problem is actually deeper within the file, then you might experiment with try: and except:. Without more information on what the data file looks like, I can’t come up with a more specific example.
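One possible shape for that experiment, offered only as a sketch: try a plain read first and, if pandas raises a parse error, retry from the line where the table starts. The helper name `read_report`, the `header_prefix` argument, and the sample text are all hypothetical, and catching `pandas.errors.ParserError` instead of using a bare `except:` is my own choice here.

```python
import io
import pandas as pd

def read_report(text, header_prefix="Asset Name"):
    """Try a plain read; on a parse error, retry starting at the table header."""
    try:
        return pd.read_csv(io.StringIO(text))
    except pd.errors.ParserError:
        # Find the first line that looks like the table header.
        # (Raises StopIteration if no such line exists; fine for a sketch.)
        lines = text.splitlines()
        start = next(i for i, line in enumerate(lines)
                     if line.startswith(header_prefix))
        return pd.read_csv(io.StringIO(text), skiprows=start)

# Hypothetical, shortened stand-in for the downloaded report.
sample = """Performance Report
Date Produced,14-Feb-2020

Asset Name,Opening Balance $,Closing Balance $,% Return for Period
Asset A,0.00,42.74,-6.93
Sub Total,48.86,75.82
"""

df = read_report(sample)
print(df)
```

Note that the short Sub Total row is not a problem once the header is found: read_csv fills the missing trailing field with NaN.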
There are many ways to do this in your script rather than adding commas by means of separate programs.
One way is to preprocess the file in memory in your script before using pandas.
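A minimal sketch of such in-memory preprocessing, using only the standard library: every row is padded with empty fields up to the width of the widest row, which is roughly what the Numbers export appears to do. The function name `pad_rows` and the sample text are hypothetical.

```python
import csv
import io

def pad_rows(text):
    """Return CSV text in which every row has as many fields as the widest row."""
    rows = list(csv.reader(io.StringIO(text)))
    width = max((len(row) for row in rows), default=0)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # Blank lines come back as empty lists and are padded like any other row.
        writer.writerow(row + [""] * (width - len(row)))
    return out.getvalue()

# Hypothetical, shortened stand-in for the downloaded report.
sample = (
    "Performance Report\n"
    "Date Produced,14-Feb-2020\n"
    "\n"
    "Asset A,0.00,35.25\n"
    "Sub Total,48.86\n"
)

padded = pad_rows(sample)
print(padded)
```

The padded text never has to touch the disk; it can be wrapped in `io.StringIO(padded)` and handed to whatever reads the CSV next.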
Now when you are using pandas you should use the built-in power of pandas.
You have not shared what the actual data rows look like, and without that no one can actually help you.
I would look into using the following 2 kwargs of ‘read_csv’ to get the job done.
skiprows as a callable,
i.e. make your own function and use it as a filter to filter unwanted rows away
error_bad_lines=False to just ignore errors and deal with it after it’s in the DataFrame (pandas 1.3 and later spell this on_bad_lines="skip")
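To illustrate the callable form of skiprows, here is a small sketch. One detail worth knowing is that pandas calls the function with the row index, not the row text, so any decision based on content has to be made beforehand. The sample text, the "Asset Name" marker, and the function name `is_unwanted` are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical, shortened stand-in for the downloaded report.
sample = """Performance Report
Date Produced,14-Feb-2020

Asset Name,Opening Balance $,Closing Balance $
Asset A,0.00,42.74
Sub Total,48.86,75.82
"""

# Pre-scan the text once to find where the table header sits.
header_idx = next(i for i, line in enumerate(sample.splitlines())
                  if line.startswith("Asset Name"))

def is_unwanted(row_index):
    """Return True for rows read_csv should skip (everything above the header)."""
    return row_index < header_idx

df = pd.read_csv(io.StringIO(sample), skiprows=is_unwanted)
print(df)
```

The same filter could also drop known footer rows by adding their indices to the test inside `is_unwanted`.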