Python–Read dat file rows, rewrite to columns in Excel. csv/numpy/openpyxl

Posted by: admin May 14, 2020

Questions:

I ran into a problem using csv/numpy/openpyxl. I have a .dat file containing:

a,a,a,a
b,b,b,b
c,c,c,c

I want to take each row of the .dat file and put it into its own column in Excel, so the Excel file should look like:

a b c
a b c
a b c

Here is what I have so far:

import csv
import openpyxl
import numpy as np


wb = openpyxl.Workbook()
ws = wb.active

with open('Shari10.dat') as f:
    dat_reader = csv.reader(f, delimiter = ",")

    for header in csv.reader(f):
        break

    for dat_line in f:
        line = dat_line.split(",")

        data = np.vstack(line[1:8])

        for row in data:
            ws.append(row)
            print(row)
        #wb.save("coffee.xlsx")

Here is the error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-a07e6ac6842f> in <module>
     20         print(data)
     21         for row in data:
---> 22             ws.append(row)
     23         #wb.save("coffee.xlsx")

~\AppData\Local\Continuum\anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py in append(self, iterable)
    665 
    666         else:
--> 667             self._invalid_row(iterable)
    668 
    669         self._current_row = row_idx

~\AppData\Local\Continuum\anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py in _invalid_row(self, iterable)
    792     def _invalid_row(self, iterable):
    793         raise TypeError('Value must be a list, tuple, range or generator, or a dict. Supplied value is {0}'.format(
--> 794             type(iterable))
    795                         )
    796 

TypeError: Value must be a list, tuple, range or generator, or a dict. Supplied value is <class 'str'>

For reference, I was trying to follow this example:

data = [
    ['A', 100, 1.0],
    ['B', 200, 2.0],
    ['C', 300, 3.0],
    ['D', 400, 4.0],
]
for row in data:
    ws.append(row)

I just started learning Python, so please forgive my messy code structure. I am trying to write the code as explicitly as possible rather than making it shorter.

Answers:

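It looks like the problem is that iterating over a NumPy array gives you NumPy arrays, not lists, and openpyxl's ws.append() only accepts a list, tuple, range, generator, or dict (as the error message says). You can fix that with NumPy's tolist() method by changing this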

for row in data:
    ws.append(row)
    print(row)

to this

for row in data:
    ws.append(row.tolist())
    print(row.tolist())

Just changing those lines will make the code run successfully, but it does not provide your desired output. Running the code with the input file

a,a,a,a
b,b,b,b
c,c,c,c

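results in a spreadsheet that looks like this. The first line (a,a,a,a) is consumed as the header, line[1:8] drops the first field of every remaining line, and np.vstack turns each row into a column array. Those columns are then stacked on top of each other, because ws.append adds rows to the bottom of your worksheet. The trailing newline also stays attached to the last field:

b
b
b\n
c
c
c\n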
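The vertical layout comes from np.vstack. This short sketch uses one line of the sample file (the string below stands in for a line read from Shari10.dat, and assumes NumPy is installed) to show that np.vstack on a list of strings builds a column array of shape (n, 1), so each iteration yields a single value:

```python
import numpy as np

# One line of the sample file, split the same way as in the question
line = "b,b,b,b\n".split(",")  # ['b', 'b', 'b', 'b\n']

# line[1:8] keeps only the last three fields; np.vstack stacks them vertically
data = np.vstack(line[1:8])
print(data.shape)  # (3, 1): three rows, one column

# Each iteration yields a one-element row, so ws.append would write one cell per row
for row in data:
    print(row.tolist())
```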

If you want the entire CSV (including the header row) to be transposed, a simple way to do that is with NumPy's transpose function. It swaps the rows and columns of the whole array at once, and then you can iterate through every row to write each of them to the worksheet. This also simplifies how you read in the file, as shown below. Keep in mind that transposing only works cleanly on rectangular arrays (every row the same length), so I've added a bit of code to pad any jagged rows first.

import csv
import openpyxl
import numpy as np

# Create a new workbook and grab its default worksheet
wb = openpyxl.Workbook()
ws = wb.active

with open('input.dat', newline='') as f:
    # Read in all the data as a list of rows (each row is a list of strings)
    data = list(csv.reader(f))

## If your CSV isn't rectangular, pad it first
# Length of the longest row
longest = len(max(data, key=len))
# Pad every shorter row with empty strings up to that length
for row in data:
    row.extend((longest - len(row)) * [''])

## Once every row has the same length, continue as normal
# Transpose the array: rows become columns
data = np.transpose(data)

# Write all rows to the worksheet; tolist() converts each NumPy row
# into a plain list, which ws.append() accepts
for row in data:
    ws.append(row.tolist())

# Save the workbook
wb.save('test.xlsx')
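If you would rather not depend on NumPy for this step, the standard library can do the padding and the transpose in one go. This is a minimal sketch, not part of the answer above: transpose_rows is a hypothetical helper name, and the io.StringIO sample stands in for the real input.dat. itertools.zip_longest groups the i-th field of every row together and fills missing fields in jagged rows with fill.

```python
import csv
import io
from itertools import zip_longest


def transpose_rows(rows, fill=""):
    """Turn rows into columns, padding short (jagged) rows with `fill`."""
    return [list(col) for col in zip_longest(*rows, fillvalue=fill)]


# Stand-in for open('input.dat'); the contents are made up for illustration
sample = io.StringIO("a,a,a,a\nb,b,b\nc,c,c,c\n")
rows = list(csv.reader(sample))

for new_row in transpose_rows(rows):
    print(new_row)
```

Each list it returns can be passed straight to ws.append(), so the padding loop and np.transpose above could be replaced by a loop over transpose_rows(data).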

Answer:

Let’s say we have a file example.dat with the following:

a1,a2,a3,a4
b1,b2,b3,b4
c1,c2,c3,c4

This is easier with pandas: load the data as a DataFrame, transpose it, and save the resulting DataFrame to an Excel file like this:

import pandas as pd

df_in = pd.read_csv("example.dat", header=None)  # header=None because the file has no header row

data_out = df_in.transpose()  # swap rows and columns

data_out.to_excel("example.xlsx", index=False, header=False)  # index and header False so no row or column labels are written to the Excel file

Output:

a1  b1  c1
a2  b2  c2
a3  b3  c3
a4  b4  c4

Pros: simple and clean. Cons: it needs pandas plus an Excel writer engine such as openpyxl to save .xlsx files.

Install openpyxl with: pip install openpyxl