3 million lines of port data in Excel from txt

Posted by: admin May 14, 2020

Question:

I need to import 3 million lines of port data from a text file into Excel, starting a new sheet after every 1 million lines. What's wrong with my code?

Sub ReadFile()
    Dim I, J
    I = 1
    J = 1
    File = "\bla.bla.bla\Desktop\vsr.txt"
    Open File For Input As #1
    Do While Not EOF(1)
        Line Input #1, MyString
        Sheets("LIST" & J).Cells(I, 1) = MyString
        If I = 1000000 Then
            I = 1
            J = J + 1
        Else
            I = I + 1
        End If
    Loop
    Close #1
End Sub
Answer:

@ashleedawg already wrote most of the important stuff; however, I would make another correction to the code.

The way you handle the data now, you read a line from the vsr.txt file, copy it into a string, and finally write it into a cell. This takes a lot of processing power and will be quite slow (especially the part where you access the cells). It is far more efficient to read the text into an array and then paste a whole block at once.
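The buffered approach described above can be sketched outside of VBA as well. The following Python snippet is only an illustration of the idea (block sizes are shrunk for readability, and the "sheets" are simulated with an in-memory dict rather than real Excel objects):

```python
GROUPSIZE = 3          # block size (10000 in the VBA version below)
ENTRIES_PER_PAGE = 6   # rows per sheet (1000000 in the VBA version below)

def import_lines(lines):
    """Distribute lines over numbered sheets, flushing GROUPSIZE rows at a time."""
    sheets = {}                      # sheet number -> list of rows
    block, sheet, offset = [], 1, 0
    for line in lines:
        block.append(line)
        if len(block) == GROUPSIZE:
            # one bulk "paste" instead of one write per line
            sheets.setdefault(sheet, []).extend(block)
            offset += GROUPSIZE
            if offset >= ENTRIES_PER_PAGE:
                sheet, offset = sheet + 1, 0
            block = []
    if block:                        # leftover partial block at end of file
        sheets.setdefault(sheet, []).extend(block)
    return sheets
```

The VBA code below follows exactly this structure, with the bulk paste done via a Range assignment.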

WorksheetFunction.Transpose() is necessary because arrays in Excel use columns as the first dimension, so a one-dimensional array must be transposed to paste down a column.
Inserting the data every 10,000 lines prevents the Transpose function from running into issues. Since it is part of the WorksheetFunction code, it may rely on older code, or even be made this way specifically to support legacy code; in some Excel versions Transpose fails on arrays longer than about 65,536 elements (an Integer-length limit on the return value, or something similar).
Using myWB makes the Range references fully qualified. Not strictly necessary, but it can save you a lot of trouble down the line.
With GROUPSIZE you can adjust the size of the array block you import at once; a bigger number may give you faster processing.

Option Explicit

Sub ReadFile()

    Const GROUPSIZE As Long = 10000 'ENTRIESPERPAGE must be evenly divisible by this to fit exactly
    Const ENTRIESPERPAGE As Long = 1000000 
    Const fName As String = "\bla.bla.bla\Desktop\vsr.txt"

    Dim rowNum As Long
    Dim rowOffset As Long
    Dim shtNum As Long
    Dim myString(1 To GROUPSIZE) As String
    Dim myWB As Workbook
    Dim Range2Fill As String

    Set myWB = ThisWorkbook

    Open fName For Input As #1
    shtNum = 1
    rowNum = 0
    rowOffset = 0
    Do While Not EOF(1)
        rowNum = rowNum + 1
        Line Input #1, myString(rowNum)

        If rowNum = GROUPSIZE Then
            Range2Fill = "A" & rowOffset + 1 & ":A" & rowOffset + rowNum
            myWB.Worksheets(shtNum).Range(Range2Fill) = WorksheetFunction.Transpose(myString)
            Debug.Print "Sheet: " & shtNum, "Offset: " & rowOffset

            If rowOffset >= ENTRIESPERPAGE - rowNum Then
                rowOffset = 0
                shtNum = shtNum + 1
            Else
                rowOffset = rowOffset + GROUPSIZE
            End If

            rowNum = 0
        End If
    Loop

    'write the last partial block in case the total is not an exact multiple of GROUPSIZE
    '(skip it when rowNum = 0, otherwise the range "A1:A0" would raise an error)
    If rowNum > 0 Then
        Range2Fill = "A" & rowOffset + 1 & ":A" & rowOffset + rowNum
        myWB.Worksheets(shtNum).Range(Range2Fill) = WorksheetFunction.Transpose(myString)
    End If
    Close #1

End Sub

Additional notes:

  • Excel supports up to 1,048,576 rows per worksheet, but expect performance to drop drastically with that many data points, especially once you start calculating on them or using them in graphs.
  • There are better options for working on huge datasets like this. Origin, MATLAB, and DIAdem are a few that I know of and have used so far; a free Python-based environment would be Spyder.

Answer:

Perhaps something like this would be better. I couldn't test it since you didn't include sample data and I don't know what port data is, but it should work.

If there’s a problem with it (or anytime you have a code problem) try using F8 to step through the code to find out exactly what’s happening on each line. (More on that from Chip Pearson here.)

Option Explicit

Sub ReadFile()
    Const fName = "\bla.bla.bla\Desktop\vsr.txt"
    Dim rowNum As Long, shtNum As Long, myString As String

    Open fName For Input As #1
    shtNum = 1
    Do While Not EOF(1)
        rowNum = rowNum + 1
        Line Input #1, myString
        Sheets("LIST" & shtNum).Cells(rowNum, 1) = myString
        If rowNum = 1000000 Then
            shtNum = shtNum + 1
            rowNum = 0
        End If
    Loop
    Close #1
End Sub

Some notes:

  • Use Option Explicit at the top of [every] module [always] to help locate variables that are not properly declared or handled. (i.e., MyString was undeclared.)
  • Personally, I try to avoid using common words like File for variables, as they may or may not be a "reserved word" (i.e., a constant or function name built into Excel).
  • Personally, I usually make the first letter of my variables lowercase, to help differentiate my variables from built-in keywords at a glance.
  • Avoid meaningless variable names like I and J (especially when posting code for help from others). A couple of extra characters make the code much easier to understand and to modify/debug later.
  • When declaring a variable, specify a data type whenever possible.
  • You didn’t mention anything about creating worksheets, and your code doesn’t attempt it, so this code assumes all the required output worksheets exist and are blank.
  • An alternative to tracking shtNum (or J in your example) would be to compute the sheet number from a zero-based line count: Int(lineIndex / 1000000) + 1. This formula returns 1 for 0 to 999999, 2 for 1000000 to 1999999, etc. (With the one-based rowNum used above, that would be Int((rowNum - 1) / 1000000) + 1.)
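The arithmetic in that last note is easy to sanity-check. This tiny Python sketch mirrors VBA's Int() with integer division (valid here because the indices are non-negative):

```python
ENTRIES_PER_PAGE = 1_000_000

def sheet_number(line_index):
    """Sheet number for a zero-based line index: Int(lineIndex / 1000000) + 1."""
    return line_index // ENTRIES_PER_PAGE + 1
```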

Answer:

Here is an alternative solution: save this code as a .bat file and it will split your file into chunks of the size (in MB) you select. You can then open the pieces in Excel and build your 3 × 1M-row worksheets:

@REM Using How to run a PowerShell script from a batch file
@REM and How can I split a text file using PowerShell?

@PowerShell  ^

$upperBound =  100MB;  ^
$rootName = 'ENTER YOUR FILE PATH HERE. csv file';  ^
$from = $rootName;  ^
$fromFile = [io.file]::OpenRead($from);  ^
$buff = new-object byte[] $upperBound;  ^
$count = $idx = 0;  ^
try {  ^
    do {  ^
        'Reading ' + $upperBound;  ^
        $count = $fromFile.Read($buff, 0, $buff.Length);  ^
        if ($count -gt 0) {  ^
            $to = '{0}.{1}' -f ($rootName, $idx);  ^
            $toFile = [io.file]::OpenWrite($to);  ^
            try {  ^
                'Writing ' + $count + ' to ' + $to;  ^
                $tofile.Write($buff, 0, $count);  ^
            } finally {  ^
                $tofile.Close();  ^
            }  ^
        }  ^
        $idx ++;  ^
    } while ($count -gt 0);  ^
}  ^
finally {  ^
    $fromFile.Close();  ^
}  ^
%End PowerShell%

Note that the code splits on byte boundaries, so a split can fall in the middle of a line; you will need to fix the first and last row of each file that is produced. I understand this is not a VBA workaround, but it will work… Modify $upperBound to change the chunk size, e.g., 50MB.