Home » excel » excel – Parsing downloaded data to a simpler structure

excel – Parsing downloaded data to a simpler structure

Posted by: admin May 14, 2020 Leave a comment

Questions:

Every month I download data from a vendor of ours which is small but it is in a format that is not easy to use lookup formulas on. I then read it with a mess of cell references and hope they are looking to the right spot. What would be the best way to read the data and structure it like in the image below. I need to read columns A:G one month and then next month it will be A:H but will only have 12 months Max ever, and then structure it to work in my report as pictured in I2:K10,

The “Location” may not have data in the download from the vendor. So the Locations are changing. Also I need to download about 30 of these small data ranges from them to put together in a bigger report. Also the data will be pasted to its own sheet and the pulled data on another.
I am open to VBA suggestions as well as just cell formulas.

The different colors are there to show what I am trying to read and where I need it written.

Thanks,

-Scheballs
enter image description here

How to&Answers:

This is Part 2 of the answer which introduces the solution and contains the main routine. Part 3 contains the sub routines. Part 1 introduces techniques that I use in the solution.

My solution requires the macro’s workbook contain two worksheets: one for errors and one for the consolidated data. The names of these workbooks are defined as constants so can be changed as required.

I created a number of CSV files that I believe match the format of those you download. A typical example is:

 1   Caution: Rates Have Not Been Adjusted For Patient Mix
 2   St Anthony's Hospital
 3   Jan 2013 - April 2013 Location Comparison Based on 6 Locations
 4   CMS Qualified HCAHPS Data from All Service Lines
 5   Communications about Medications Composite Results
 6   Location,Jan 2013,Feb 2013,Mar 2013,Apr 2013,Composite Rate,Percentile
 7   2E,70,72.22,64.62,81.82,72.17,49th
 8   2S,60,62.22,54.62,71.82,62.17,39th
 9   3N,78.57,83.33,66.67,NR,76.19,74th
10   3S,50,90,50,100,72.5,56th
11   4N,88.89,75,77.27,100,85.29,85th
12   ICU/PCU,72.73,50,80,100,75.68,54th
13
14   St Anthony's Hospital,73.5,73.28,67.89,84.21,74.72,59th
15   Vendor DB % Top Box,72.29,72.86,73.58,75.17,73.48

The hospital names are real although it is a coincidence if they are ones of interest to you. The questions are I believe correct. The locations and the data are imaginary.

My code thoroughly checks the format of a CSV file because I have been caught by authors changing the format of such files without warning. Gross changes might crash a macro but minor changes can go unnoticed for months.

Checks include matching the date range of row 3 against the idnividal dates on row 6. A failed check will result in a message in the Error worksheet. Most checks simply result in that file being rejected. However, it is a fatal error for two CSV files to have different date ranges.

I had planned to create the consolidated worksheet based on the data I found. However, you use absolute addresses to copy the values into the reporting worksheets so you do not want the data to move from month to month depending on which locations are included in the CSV file. Instead I have created a fixed layout:

Consolidate worksheet before updating

Hospital names are in column 1. The name must be against the first location for the hospital but is optional for susequent rows. No doubt you will pick one style or the other but I mixed styles for my tests. A CSV file with a hospital name other than one of those in this listed here will be rejected.

Locations are in column 2. There is no significance to the sequence of location except that the final row must be for the total/average/summary. I have used the “Total” as the row title but you can change it to anything. Not every location listed here need appear in a CSV file but if a CSV file contains an unexpected location, it will be rejected.

The questions are listed from A3 onwards. A CSV file containing a question not listed here will be rejected.

The initial contents of the data area of this worksheet do not matter because they are cleared by the macro.

After running the macro, the worksheet might look this. Gaps mean that I have no test data for that hospital/questions:

Consolidate worksheet after updating

I believe the comments within my code are sufficient for you to change it to match the format of the CSV files if they are different from my guess.

This code is designed to be in its own module. This code does not rely on anything in the demo macros. Good luck.

Option Explicit ' Constants are a convenient way of defining values that will not change ' during a run of the macro. They are particular suitable for: ' (1) Replacing numbers by meaningful name. If column 5 is used for ' names, say, using ColName instead of 5 helps document the macro. ' (2) Values that are used in several places and might change. When they ' do change, one amendment is sufficient to fully update the macro. Const ColConsolHosp As Long = 1 '\ Const ColConsolLocn As Long = 2 '| If the columns of the consolidate Const ColConsolQuestFirst As Long = 3 '| worksheet are rearranged, these Const ColConsolQuestLast As Long = 12 '/ valuesmust be ajusted to match. Const ColErrorTime As Long = 1 Const ColErrorFile As Long = 2 Const ColErrorRow As Long = 3 Const ColErrorCol As Long = 4 Const ColErrorMsg As Long = 5 Const FmtDate As String = "dmmmyy" Const FmtDateTime As String = "dmmmyy hh:mm" Const WkShtNameConsol As String = "Consolidate" '\ Change if require output to Const WkShtNameError As String = "Error" '/ different worksheets. Sub Consolidate() Dim CellValueConsol() As Variant ' Cell values from used range ' of consoldate worksheet Dim ColSrcCompositeRate As Long ' Column hold composite rate Dim ColConsolCrnt As Long Dim DateStartAll As Date Dim DateStartCrnt As Date Dim DateEndAll As Date Dim DateEndCrnt As Date Dim ErrMsg As String Dim FileCellValueSrc() As Variant ' Value of UsedRange for each CSV file Dim FileError() As Boolean ' Error state for each file Dim FileInxHosp() As Long ' Hospital for each CSV file Dim FileInxQuest() As Long ' Question for each CSV file Dim FileName() As String ' Name for each CSV file Dim FileSysObj As Object Dim FileObj As Object Dim FolderObj As Object Dim Found As Boolean Dim HospName() As Variant ' Names of hospitals Dim HospNameCrnt As String Dim InxFileCrnt As Long Dim InxFileDate As Long Dim InxHospCrnt As Long Dim InxLocnCrnt As Long Dim InxQuestCrnt As Long Dim Locn() As Variant ' Locations for each hosital Dim NumCSVFile As Long ' Number of CSV files Dim NumHosps As Long Dim NumMonthsData As Long Dim PathName As String Dim Quest As Variant ' Array of questions Dim RowConsolCrnt As Long Dim RowConsolHospFirst() As Long ' First row for each hospital ' within consolidate worksheet Dim RowConsolTemp As Long Dim RowErrorCrnt As Long Dim RowSrcCrnt As Long Dim WkBkSrc As Workbook Application.ScreenUpdating = False ' Reduces screen flash and increases speed ' Load CSV files ' ============== PathName = Application.ThisWorkbook.Path Set FileSysObj = CreateObject("Scripting.FileSystemObject") Set FolderObj = FileSysObj.GetFolder(PathName) NumCSVFile = 0 ' Loop through files to count number of CSV files For Each FileObj In FolderObj.Files If LCase(Right(FileObj.Name, 4)) = ".csv" Then NumCSVFile = NumCSVFile + 1 End If Next ' Size arrays holding data per file ReDim FileCellValueSrc(1 To NumCSVFile) ReDim FileError(1 To NumCSVFile) ReDim FileInxHosp(1 To NumCSVFile) ReDim FileInxQuest(1 To NumCSVFile) ReDim FileName(1 To NumCSVFile) InxFileCrnt = 0 ' Loop through files to save names and cell values. For Each FileObj In FolderObj.Files If LCase(Right(FileObj.Name, 4)) = ".csv" Then InxFileCrnt = InxFileCrnt + 1 FileName(InxFileCrnt) = FileObj.Name Set WkBkSrc = Workbooks.Open(PathName & "\" & FileObj.Name) FileCellValueSrc(InxFileCrnt) = WkBkSrc.ActiveSheet.UsedRange WkBkSrc.Close ' Close the CSV file End If Next ' Release resources Set FileSysObj = Nothing Set FolderObj = Nothing ' Extract controlling values from consolidate worksheet ' ===================================================== With Worksheets(WkShtNameConsol) CellValueConsol = .UsedRange.Value End With 'Debug.Print UBound(CellValueConsol, 1) 'Debug.Print UBound(CellValueConsol, 2) ' This code assumes a single header row consisting of: ' Hospital Location Question1 Question2 ... ' with appropriate names in the first two columns. The cells under the ' questions will all be overwritten. ' These columns are accessed using constants. Limited variation could ' be achieved within amending the code by changing constants. ' Execution will stop at a Debug.assert statement if the expression has a ' value of False. This is an easy way of confirming the worksheet is as ' expected. If a user might change the format of the output worksheet, ' this should be replaced by a MsgBox statement. Debug.Assert CellValueConsol(1, ColConsolHosp) = "Hospital" Debug.Assert CellValueConsol(1, ColConsolLocn) = "Location" ' Count number of hospitals. ' This code assumes all locations for a hospital are together and start at ' row 2. The hospital name may be repeated or may be blank on the second and ' subsequent rows for a hospital. That is, the following is acceptable: ' HospitalA X ' HospitalA Y ' HospitalA Z ' HospitalB X ' Y ' Z ' Count number of hospitals HospNameCrnt = CellValueConsol(2, ColConsolHosp) NumHosps = 1 For RowConsolCrnt = 3 To UBound(CellValueConsol, 1) If CellValueConsol(RowConsolCrnt, ColConsolHosp) <> HospNameCrnt And _ CellValueConsol(RowConsolCrnt, ColConsolHosp) <> "" Then NumHosps = NumHosps + 1 HospNameCrnt = CellValueConsol(RowConsolCrnt, ColConsolHosp) End If Next 'Debug.Print NumHosps ' Size HospName, Locn and RowConsolHospFirst for the number of hospitals ReDim HospName(1 To NumHosps) ReDim Locn(1 To NumHosps) ReDim RowConsolHospFirst(1 To NumHosps) ' Load Hospital and Location arrays InxHospCrnt = 1 HospNameCrnt = CellValueConsol(2, ColConsolHosp) HospName(InxHospCrnt) = HospNameCrnt RowConsolHospFirst(InxHospCrnt) = 2 For RowConsolCrnt = 3 To UBound(CellValueConsol, 1) If CellValueConsol(RowConsolCrnt, ColConsolHosp) <> HospNameCrnt And _ CellValueConsol(RowConsolCrnt, ColConsolHosp) <> "" Then ' Load locations from worksheet to Location array Call ExtractSubArray(CellValueConsol, Locn(InxHospCrnt), _ RowConsolHospFirst(InxHospCrnt), ColConsolLocn, _ RowConsolCrnt - 1, ColConsolLocn) HospNameCrnt = CellValueConsol(RowConsolCrnt, ColConsolHosp) InxHospCrnt = InxHospCrnt + 1 HospName(InxHospCrnt) = HospNameCrnt RowConsolHospFirst(InxHospCrnt) = RowConsolCrnt End If Next ' Load locations for final hospital from worksheet to Location array Call ExtractSubArray(CellValueConsol, Locn(InxHospCrnt), _ RowConsolHospFirst(InxHospCrnt), ColConsolLocn, _ UBound(CellValueConsol, 1), ColConsolLocn) ' Load questions Call ExtractSubArray(CellValueConsol, Quest, _ 1, ColConsolQuestFirst, _ 1, ColConsolQuestLast) ' Clear data area of Consolidate worksheet ' ======================================= For RowConsolCrnt = 2 To UBound(CellValueConsol, 1) For ColConsolCrnt = ColConsolQuestFirst To ColConsolQuestLast CellValueConsol(RowConsolCrnt, ColConsolCrnt) = "" Next Next ' Prepare error worksheet '======================== With Worksheets(WkShtNameError) .Cells.EntireRow.Delete .Cells(1, ColErrorTime).Value = "Time" With .Cells(1, ColErrorFile) .Value = "File" .ColumnWidth = 71.71 End With With .Cells(1, ColErrorRow) .Value = "Row" .HorizontalAlignment = xlRight .ColumnWidth = 4 End With With .Cells(1, ColErrorCol) .Value = "Col" .HorizontalAlignment = xlRight .ColumnWidth = 4 End With With .Cells(1, ColErrorMsg) .Value = "Error" .ColumnWidth = 71.71 End With End With RowErrorCrnt = 1 ' Validate the CSV files and extract key information ' ================================================== InxFileDate = -1 'Date range not yet found NumMonthsData = 0 For InxFileCrnt = 1 To UBound(FileName) FileError(InxFileCrnt) = False ' No error found for this file If IsEmpty(FileCellValueSrc(InxFileCrnt)) Then ' The CSV file was empty Call RecordError(FileName(InxFileCrnt), 0, 0, _ "Empty CSV file", RowErrorCrnt) FileError(InxFileCrnt) = True ' This CSV file to be ignored ElseIf VarType(FileCellValueSrc(InxFileCrnt)) = vbString Then ' The CSV file contained a single value Call RecordError(FileName(InxFileCrnt), 0, 0, _ "CSV file contains a single string", RowErrorCrnt) FileError(InxFileCrnt) = True ' This CSV file to be ignored Else ' The only remaining format that could be returned from a range ' is an array ' Check that cells contain the values expected. ' Most checking code has been placed in subroutines. This keeps the code ' in the main routine clean and simple and allows the subroutines to be ' copied easily to new workbooks with macros performing similar tasks. ' Check Cell A1 = "Caution: Rates Have Not Been Adjusted For Patient Mix" Call CheckCellValueSingle(FileName(InxFileCrnt), _ FileCellValueSrc(InxFileCrnt), FileError(InxFileCrnt), _ 1, 1, _ "Caution: Rates Have Not Been Adjusted For Patient Mix", _ RowErrorCrnt) ' Check Cell A2 is a known hospital. Save InxHosp against file Call CheckCellValueMultiple(FileName(InxFileCrnt), _ FileCellValueSrc(InxFileCrnt), _ FileError(InxFileCrnt), 2, 1, HospName, _ FileInxHosp(InxFileCrnt), RowErrorCrnt) ' Check Cell A3 is: Date - Date Location Comparison Based on N Locations Call CheckDateRangeLocn(FileName(InxFileCrnt), _ FileCellValueSrc(InxFileCrnt), _ FileError(InxFileCrnt), 3, 1, _ DateStartCrnt, DateEndCrnt, RowErrorCrnt) ' Save DateStartCrnt and DatEndCrnt or check they are the same as the ' previously saved values If InxFileDate = -1 Then ' First set of dates DateStartAll = DateStartCrnt DateEndAll = DateEndCrnt InxFileDate = InxFileCrnt ' The first file found with these dates Else If DateStartAll = DateStartCrnt And DateEndAll = DateEndCrnt Then ' The date range for this CSV file matches those of previous files Else Call RecordError(FileName(InxFileCrnt), 3, 1, _ "**FATAL ERROR**: Date ranges do not match:" & vbLf & _ Format(DateStartAll, FmtDate) & " - " & _ Format(DateEndAll, FmtDate) & " " & _ FileName(InxFileDate) & vbLf & _ Format(DateStartCrnt, FmtDate) & " - " & _ Format(DateEndCrnt, FmtDate) & " " & _ FileName(InxFileCrnt), RowErrorCrnt) ' There are incompatible CSV files. This is a fatal error. Give up. Exit Sub End If End If ' Check Cell A4 = "CMS Qualified HCAHPS Data from All Service Lines" Call CheckCellValueSingle(FileName(InxFileCrnt), _ FileCellValueSrc(InxFileCrnt), _ FileError(InxFileCrnt), 4, 1, _ "CMS Qualified HCAHPS Data from All Service Lines", _ RowErrorCrnt) ' Check Cell A5 = Question " Composite Results" If Not CheckBound(FileCellValueSrc(InxFileCrnt), 5, 1, ErrMsg) Then Call RecordError(FileName(InxFileCrnt), 5, 1, ErrMsg, RowErrorCrnt) FileError(InxFileCrnt) = True Else FileInxQuest(InxFileCrnt) = -1 ' No match against question For InxQuestCrnt = 1 To UBound(Quest) If FileCellValueSrc(InxFileCrnt)(5, 1) = _ Quest(InxQuestCrnt) & " Composite Results" Then FileInxQuest(InxFileCrnt) = InxQuestCrnt Exit For End If Next If FileInxQuest(InxFileCrnt) = -1 Then ' No match found FileError(InxFileCrnt) = True Call RecordError(FileName(InxFileCrnt), 5, 1, """" & _ FileCellValueSrc(InxFileCrnt)(5, 1) & _ """ does not match a known question", RowErrorCrnt) End If End If ' Check cell A6 is: "Location" Call CheckCellValueSingle(FileName(InxFileCrnt), _ FileCellValueSrc(InxFileCrnt), _ FileError(InxFileCrnt), 6, 1, "Location", _ RowErrorCrnt) ' Check cells B6 to X6 are the 1st day of month ' from DateStartAll to DateEndAll Call CheckDateSequence(FileName(InxFileCrnt), _ FileCellValueSrc(InxFileCrnt), _ FileError(InxFileCrnt), 6, 2, DateStartAll, _ DateEndAll, "a", "m", RowErrorCrnt) ' Check cells Y6 is "Composite Rate" If Not FileError(InxFileCrnt) Then ' The data range is not guaranteed until the file is error free NumMonthsData = DateDiff("m", DateStartAll, DateEndAll) + 1 ColSrcCompositeRate = NumMonthsData + 2 Call CheckCellValueSingle(FileName(InxFileCrnt), _ FileCellValueSrc(InxFileCrnt), _ FileError(InxFileCrnt), 6, ColSrcCompositeRate, _ "Composite Rate", RowErrorCrnt) End If If Not FileError(InxFileCrnt) Then ' For row 7 down to the first empty column A, check column A contains ' a known location and ColSrcCompositeRate is numeric. RowSrcCrnt = 7 InxHospCrnt = FileInxHosp(InxFileCrnt) Do While True If Not CheckBound(FileCellValueSrc(InxFileCrnt), _ RowSrcCrnt, 1, ErrMsg) Then ' Row not present Call RecordError(FileName(InxFileCrnt), RowSrcCrnt, 1, _ ErrMsg, RowErrorCrnt) FileError(InxFileCrnt) = True Exit Do End If If Not CheckBound(FileCellValueSrc(InxFileCrnt), _ RowSrcCrnt, ColSrcCompositeRate, ErrMsg) Then ' Composite rate missing Call RecordError(FileName(InxFileCrnt), RowSrcCrnt, _ ColSrcCompositeRate, ErrMsg, RowErrorCrnt) FileError(InxFileCrnt) = True Exit Do ElseIf Not IsNumeric(FileCellValueSrc(InxFileCrnt)(RowSrcCrnt, _ ColSrcCompositeRate)) Then ' Composite rate is not numeric Call RecordError(FileName(InxFileCrnt), RowSrcCrnt, _ ColSrcCompositeRate, "Composite rate """ & _ FileCellValueSrc(InxFileCrnt)(RowSrcCrnt, _ ColSrcCompositeRate) & """ is not numeric", _ RowErrorCrnt) End If If FileCellValueSrc(InxFileCrnt)(RowSrcCrnt, 1) = "" Then ' End of location list within file Exit Do End If Found = False For InxLocnCrnt = 1 To UBound(Locn(InxHospCrnt)) If FileCellValueSrc(InxFileCrnt)(RowSrcCrnt, 1) = _ Locn(InxHospCrnt)(InxLocnCrnt) Then ' Location from CSV file found in list from consolidate worksheet Found = True Exit For End If Next If Not Found Then Call RecordError(FileName(InxFileCrnt), RowSrcCrnt, 1, _ "Location """ & _ FileCellValueSrc(InxFileCrnt)(RowSrcCrnt, 1) & _ """ not found in list from worksheet """ & _ WkShtNameConsol & """", RowErrorCrnt) FileError(InxFileCrnt) = True End If RowSrcCrnt = RowSrcCrnt + 1 Loop End If If Not FileError(InxFileCrnt) Then ' Row RowSrcCrnt will have a blank column 1 RowSrcCrnt = RowSrcCrnt + 1 ' Check column A is the total line for the hospital Call CheckCellValueSingle(FileName(InxFileCrnt), _ FileCellValueSrc(InxFileCrnt), _ FileError(InxFileCrnt), RowSrcCrnt, 1, _ HospName(FileInxHosp(InxFileCrnt)), _ RowErrorCrnt) If Not CheckBound(FileCellValueSrc(InxFileCrnt), _ RowSrcCrnt, ColSrcCompositeRate, ErrMsg) Then ' Composite rate missing Call RecordError(FileName(InxFileCrnt), RowSrcCrnt, _ ColSrcCompositeRate, ErrMsg, RowErrorCrnt) FileError(InxFileCrnt) = True ElseIf Not IsNumeric(FileCellValueSrc(InxFileCrnt)(RowSrcCrnt, _ ColSrcCompositeRate)) Then ' Composite rate is not numeric Call RecordError(FileName(InxFileCrnt), RowSrcCrnt, _ ColSrcCompositeRate, "Composite rate """ & _ FileCellValueSrc(InxFileCrnt)(RowSrcCrnt, _ ColSrcCompositeRate) & """ is not numeric", _ RowErrorCrnt) End If End If End If Next InxFileCrnt ' If get here there has not been a fatal error although one or more ' individual files may have been rejected. For InxFileCrnt = 1 To UBound(FileName) If Not FileError(InxFileCrnt) Then ' No error has been found in this file InxHospCrnt = FileInxHosp(InxFileCrnt) InxQuestCrnt = FileInxQuest(InxFileCrnt) ColConsolCrnt = 2 + InxQuestCrnt RowSrcCrnt = 7 ' First location row Do While True If FileCellValueSrc(InxFileCrnt)(RowSrcCrnt, 1) = "" Then ' End of location list within file Exit Do End If For InxLocnCrnt = 1 To UBound(Locn(InxHospCrnt)) If FileCellValueSrc(InxFileCrnt)(RowSrcCrnt, 1) = _ Locn(InxHospCrnt)(InxLocnCrnt) Then ' Location from CSV file found in list from consolidate worksheet RowConsolCrnt = RowConsolHospFirst(InxHospCrnt) + InxLocnCrnt - 1 CellValueConsol(RowConsolCrnt, ColConsolCrnt) = _ FileCellValueSrc(InxFileCrnt)(RowSrcCrnt, ColSrcCompositeRate) Exit For End If Next RowSrcCrnt = RowSrcCrnt + 1 Loop RowSrcCrnt = RowSrcCrnt + 1 ' Advance to hospital total line ' Assume last location row is for total RowConsolCrnt = RowConsolHospFirst(InxHospCrnt) + _ UBound(Locn(InxHospCrnt)) - 1 CellValueConsol(RowConsolCrnt, ColConsolCrnt) = _ FileCellValueSrc(InxFileCrnt)(RowSrcCrnt, ColSrcCompositeRate) End If Next ' Write new values back to consolidate worksheet ' ============================================== With Worksheets(WkShtNameConsol) .UsedRange.Value = CellValueConsol End With End Sub 

Answer:

This is Part 3 which contains subroutines for the code in Part 2. Introduction in Part 1.

Function CheckBound(ByRef CellValue As Variant, _ ByVal RowFile As Long, ByVal ColFile As Long, _ ByRef Msg As String) ' Return True if CellValue(RowFile, ColFile) exists If RowFile > UBound(CellValue, 1) Then ' Row not present in file CheckBound = False Msg = "No such row within file" Exit Function End If If ColFile > UBound(CellValue, 2) Then ' Column not present in file CheckBound = False Msg = "No such column within file" Exit Function End If CheckBound = True End Function Sub CheckCellValueMultiple(ByRef FileNameCrnt As String, _ ByRef CellValue As Variant, _ ByRef CellError As Boolean, _ ByVal RowFile As Long, ByVal ColFile As Long, _ ByRef ValueReq() As Variant, _ ByRef InxValue As Long, _ ByRef RowErrorCrnt As Long) ' Check that a specified cell of a CSV file has one of a number of permitted ' values. ' Set CellError is True if the cell does not have any of the permitted ' required value. ' CellError is unchanged if the cell does have the required value. This means ' that several calls can be made to perform different checks and any failure ' will result in CellValue ending with a value of True. ' FileNameCrnt The name of the current file. Used in error message if any. ' CellValue The array of cell contents from the current file. ' CellError Set to True if an error is found. ' RowFile The row to be checked ' ColFile The column to be checked ' ValueReq An array containing all permitted values for the cell. ' InxValue If the cell value is matched against one of the permitted ' values, the index into ValueReq of that permitted value. ' RowErrorCrnt The last used row of the error worksheet. Any error message ' will be written to the next row. Dim CellValueCrnt As Variant Dim ErrMsg As String If Not CheckBound(CellValue, RowFile, ColFile, ErrMsg) Then Call RecordError(FileNameCrnt, RowFile, ColFile, ErrMsg, RowErrorCrnt) CellError = True Exit Sub End If CellValueCrnt = CellValue(RowFile, ColFile) For InxValue = LBound(ValueReq) To UBound(ValueReq) If CellValueCrnt = ValueReq(InxValue) Then ' Cell value matched against a permitted value Exit Sub End If Next Call RecordError(FileNameCrnt, RowFile, ColFile, _ """" & CellValue(RowFile, ColFile) & _ """ not matched against any of the permitted values", _ RowErrorCrnt) CellError = True End Sub Sub CheckCellValueSingle(ByRef FileNameCrnt As String, _ ByRef CellValue As Variant, _ ByRef CellError As Boolean, _ ByVal RowFile As Long, ByVal ColFile As Long, _ ByVal ValueReq As String, ByRef RowErrorCrnt As Long) ' Check that a specified cell of a CSV file has a required value. ' Set CellError is True if the cell does not have the required value. ' CellError is unchanged if the cell does have the required value. This means ' that several calls can be made to perform different checks and any failure ' will result in CellValue ending with a value of True. ' FileNameCrnt The name of the current file. Used in error message if any. ' CellValue The array of cell contents from the current file. ' CellError Set to True if an error is found. ' RowFile The row to be checked ' ColFile The column to be checked ' ValueReq The required value for the cell ' RowErrorCrnt The last used row of the error worksheet. Any error message ' will be written to the next row. Dim ErrMsg As String If Not CheckBound(CellValue, RowFile, ColFile, ErrMsg) Then Call RecordError(FileNameCrnt, RowFile, ColFile, ErrMsg, RowErrorCrnt) CellError = True Exit Sub End If If CellValue(RowFile, ColFile) = ValueReq Then ' Required value found Exit Sub End If Call RecordError(FileNameCrnt, RowFile, ColFile, """" & ValueReq & _ """ expected but """ & CellValue(RowFile, ColFile) _ & """ found", RowErrorCrnt) CellError = True End Sub Sub CheckDateRangeLocn(ByVal FileNameCrnt As String, _ ByRef CellValue As Variant, _ ByRef CellError As Boolean, ByVal RowFile As Long, _ ByVal ColFile As Long, ByRef DateStart As Date, _ ByRef DateEnd As Date, ByRef RowErrorCrnt As Long) ' Check a specified cell of a CSV file has the format: ' Date "-" Date "Location Comparison Based on" N "Locations" ' Set CellError = True if the cell does not have this value. ' The values of DateStartCrnt and DateEndCrnt are not defined ' if CellError is set to True, ' Note: the value of N is not returned ' FileNameCrnt The name of the current file. Used in error message if any. ' CellValue The array of cell contents from the current file. ' CellError Set to True if an error is found. ' RowFile The row to be checked ' ColFile The column to be checked ' DateStartCrnt The value of the first date. Only guaranteed if CellError ' not set to True ' DateEndCrnt The value of the last date. Only guaranteed if CellError ' not set to True ' RowErrorCrnt The last used row of the error worksheet. Any error message ' will be written to the next row. Dim ErrMsg As String Dim Pos As Long Dim Stg As String If Not CheckBound(CellValue, RowFile, ColFile, ErrMsg) Then Call RecordError(FileNameCrnt, RowFile, ColFile, ErrMsg, RowErrorCrnt) CellError = True Exit Sub End If Stg = CellValue(3, 1) Pos = InStr(1, Stg, "-") If Pos = 0 Then ' No hypen in string. CellError = True Exit Sub End If If Not IsDate(Mid(Stg, 1, Pos - 1)) Then ' Value before hyphen is not a date CellError = True Exit Sub End If DateStart = DateValue(Mid(Stg, 1, Pos - 1)) Stg = Mid(Stg, Pos + 1) Pos = InStr(1, Stg, "Location Comparison Based on") If Pos = 0 Then ' Important sub-string missing CellError = True Exit Sub End If If Not IsDate(Mid(Stg, 1, Pos - 1)) Then ' Value after hyphen is not a date CellError = True Exit Sub End If DateEnd = DateValue(Mid(Stg, 1, Pos - 1)) Stg = Mid(Stg, Pos + Len("Location Comparison Based on")) If Not Right(Stg, Len("Locations")) = "Locations" Then ' Important sub-string missing CellError = True Exit Sub End If Stg = Mid(Stg, 1, Len(Stg) - Len("Locations")) If Not IsNumeric(Stg) Then ' N is not numeric CellError = True Exit Sub End If ' CellError unchanged. DateStart and DateEnd set End Sub Sub CheckDateSequence(ByVal FileNameCrnt As String, _ ByRef CellValue As Variant, ByRef RangeError As Boolean, _ ByVal RowFileStart As Long, ByVal ColFileStart As Long, _ ByVal DateStart As Date, ByVal DateEnd As Date, _ ByVal Direction As String, ByVal Interval As String, _ ByRef RowErrorCrnt As Long) ' Check a sequence of cells to hold a sequence of dates. ' FileNameCrnt The name of the current file. Used in error message if any. ' CellValue An array of cell contents from the current file. ' RangeError Set to True if an error is found. ' RowFileStart \ Identify the first cell of the sequence ' ColFileStart / ' DateStart The value of the first date in the sequence. ' DateEnd The value of the last date in the sequence. ' Direction Permitted values are "a" for across and "d" for down. ' Interval Permitted values are as for the Interval parameter of the ' function DateAdd. Each cell in the sequence must be one ' date interval more than the previous cell until DateEnd is ' reached. ' RowErrorCrnt The last used row of the error worksheet. Any error message ' will be written to the next row. Dim ColFileCrnt As Long Dim DateCrnt As Date Dim DateTemp As Date Dim ErrMsg As String Dim RowFileCrnt As Long DateCrnt = DateStart RowFileCrnt = RowFileStart ColFileCrnt = ColFileStart Do While True If Not CheckBound(CellValue, RowFileCrnt, ColFileCrnt, ErrMsg) Then Call RecordError(FileNameCrnt, RowFileCrnt, _ ColFileCrnt, ErrMsg, RowErrorCrnt) RangeError = True Exit Sub End If If Not IsDate(CellValue(RowFileCrnt, ColFileCrnt)) Then ' Value is not a date nor is it a string that can be converted to a date Call RecordError(FileNameCrnt, RowFileCrnt, ColFileCrnt, _ "Value should be """ & Format(DateCrnt, FmtDate) & _ """ but found """ & CellValue(RowFileCrnt, ColFileCrnt) _ & """", RowErrorCrnt) RangeError = True Exit Sub End If DateTemp = DateValue(CellValue(RowFileCrnt, ColFileCrnt)) If DateTemp = DateCrnt Then ' Cell has expected value Else ' Cell does not have the expected value ' Excel corrupts "mmm-yy" to Day=yy, Month=mmm, Year=Current year DateTemp = DateSerial(Day(DateTemp), Month(DateTemp), 1) If DateTemp = DateCrnt Then ' Decorrupted value is the expected value ' Correct worksheet CellValue(RowFileCrnt, ColFileCrnt) = DateTemp Else Call RecordError(FileNameCrnt, RowFileCrnt, ColFileCrnt, _ "Value should be """ & Format(DateCrnt, FmtDate) & _ """ but found """ & CellValue(RowFileCrnt, ColFileCrnt) _ & """", RowErrorCrnt) RangeError = True Exit Sub End If End If If DateCrnt = DateEnd Then ' Successful check. Leave RangeError unchanged. Exit Sub End If DateCrnt = DateAdd(Interval, 1, DateCrnt) If Direction = "a" Then ColFileCrnt = ColFileCrnt + 1 ElseIf Direction = "d" Then RowFileCrnt = RowFileCrnt + 1 Else Debug.Assert False ' Invalid value. Only "a" or "d" allowed End If Loop End Sub Sub ExtractSubArray(ByRef ArraySrc() As Variant, ByRef ArrayDest As Variant, _ ByVal RowSrcTop As Long, ByVal ColSrcLeft As Long, _ ByVal RowSrcBot As Long, ByVal ColSrcRight As Long) ' ArraySrc An array loaded from a worksheet ' ArrayDest A variant which will be set to an array to which selected ' entries from ArraySrc are to be copied. If either ' RowTop = RowBot or Colleft = ColRight it will be a 1D array. ' Otherwise it will be a 2D array with rows as the first ' dimension. ' RowSrcTop \ Specify the rectangle to be extracted from ArraySrc. ' ColSrcLeft | ' RowSrcBot | It is the callers responsibility to ensure the ' ColSrcRight / these values are valid indices for ArraySrc. Dim ArrayDestLocal() As Variant Dim ColDestCrnt As Long Dim ColSrcCrnt As Long Dim NumColsDest As Long Dim NumRowsDest As Long Dim RowDestCrnt As Long Dim RowSrcCrnt As Long NumColsDest = ColSrcRight - ColSrcLeft + 1 NumRowsDest = RowSrcBot - RowSrcTop + 1 If NumColsDest = 1 Then ' The selected rectangle is a column ReDim ArrayDestLocal(1 To NumRowsDest) RowDestCrnt = 1 For RowSrcCrnt = RowSrcTop To RowSrcBot ArrayDestLocal(RowDestCrnt) = ArraySrc(RowSrcCrnt, ColSrcLeft) RowDestCrnt = RowDestCrnt + 1 Next ArrayDest = ArrayDestLocal ElseIf NumRowsDest = 1 Then ' The selected rectangle is a row ReDim ArrayDestLocal(1 To NumColsDest) ColDestCrnt = 1 For ColSrcCrnt = ColSrcLeft To ColSrcRight ArrayDestLocal(ColDestCrnt) = ArraySrc(RowSrcTop, ColSrcCrnt) ColDestCrnt = ColDestCrnt + 1 Next ArrayDest = ArrayDestLocal Else ' The selected rectangle is a rectangle ReDim ArrayDestLocal(1 To NumRowsDest, 1 To NumColsDest) RowDestCrnt = 1 For RowSrcCrnt = RowSrcTop To RowSrcBot ColDestCrnt = 1 For ColSrcCrnt = ColSrcLeft To ColSrcRight ArrayDestLocal(RowDestCrnt, ColDestCrnt) = _ ArraySrc(RowSrcCrnt, ColSrcCrnt) ColDestCrnt = ColDestCrnt + 1 Next RowDestCrnt = RowDestCrnt + 1 Next ArrayDest = ArrayDestLocal End If End Sub Sub RecordError(ByRef FileName As String, ByRef RowFile As Long, _ ByRef ColFile As Long, ByRef Msg As String, _ ByRef RowError As Long) ' Outputs an error to the error worksheet Debug.Assert Not IsNumeric(FileName) With Worksheets(WkShtNameError) RowError = RowError + 1 With .Cells(RowError, ColErrorTime) .Value = Now() .NumberFormat = FmtDateTime End With .Cells(RowError, ColErrorFile).Value = FileName If RowFile <> 0 Then .Cells(RowError, ColErrorRow).Value = RowFile If ColFile <> 0 Then .Cells(RowError, ColErrorCol).Value = ColFile With .Cells(RowError, ColErrorMsg) .Value = Msg .WrapText = True End With End With End Sub 

Answer:

This is Part 1 of the answer which introduces techniques I require for the solution but which might not be familar to someone new to VBA. The main routine of the solution is in Part 2 and its subroutines in Part 3.

The question does not fully describe the problem. The first step is the downloading of 40 CSV files from a remote site. No attempt will be made to automate that step until later. The second step is to identify the CSV files that have been downloaded to the folder containing the workbook to which the data will written.

Create a new Excel workbook, open the Visual Basic Editor, create a module and copy this code to that module. The macro Demo01 lists the names of the files in the same folder as the workbook.

' Option Explicit means every variable must be defined ' If omitted a misspelt variable become a declaration. For example: ' Dim Count As Long ' Cuont = Count + 1 ' declares a new variable Cuont and sets its value to Count+1. Such ' errors can be very difficult to spot. With Option Explicit, the ' compiler reports that Cuont is undefined. Option Explicit Sub Demo01() Dim Fl As Object Dim FlSysObj As Object Dim FldObj As Object ' When assigning a value to an object, you must have "Set" ' at the beginning of the statement ' This creates a file system object which gives you access to the file system Set FlSysObj = CreateObject("Scripting.FileSystemObject") ' This creates a folder object which gives access to all properties of the folder ' includes details of the files within it. Set FldObj = FlSysObj.GetFolder(Application.ActiveWorkbook.Path) ' Loop for each file in the folder and output its name to the Immediate Window For Each Fl In FldObj.Files Debug.Print Fl.Name Next End Sub 

We now need to ignore non-CSV files and to open the CSV files so we can access their contents.

Copy and paste the following macro, Demo02, to the same module as Demo01 and run it. Macro Demo02 opens every CSV file and outputs identifying information to the Immediate Window to prove it has done so.

Sub Demo02() Dim Fl As Object Dim FlSysObj As Object Dim FldObj As Object Dim PathName As String Dim WkBkSrc As Workbook PathName = Application.ThisWorkbook.Path Set FlSysObj = CreateObject("Scripting.FileSystemObject") Set FldObj = FlSysObj.GetFolder(PathName) For Each Fl In FldObj.Files ' I only want to load the CSV files so check the extension If LCase(Right(Fl.Name, 4)) = ".csv" Then ' There should be some error handling here to ensure the macro does not ' stop if a file fails to open. However, Excel's ability to open any old ' junk as a workbook never ceases to amaze me so for the sake of ' simplicity I have omitted it. If you do experience errors, consider ' something like this: ' Err.Clear ' Clear any record of previous error ' On Error Resume Next ' Continue if error ' Set WkBkSrc = Workbooks.Open(PathName & "\" & Fl.Name) ' On Error GoTo 0 ' Revert to normal error processing ' If Err.Number = 0 Then ' Debug.Print Fl.Name & " loaded successfully" ' Else ' Debug.Print Fl.Name & " failed to load." ' Debug.Print "Error Number = "; Err.Number & _ ' " Description = " & Err.Description ' End If Set WkBkSrc = Workbooks.Open(PathName & "\" & Fl.Name) ' The CSV file will be in the active worksheet of the active workbook ' If I understand your screen shot, columns 1 of rows 2 and 3 are the ' hospital name and the date range. The sole purpose of this debug ' print statement is to output something unique from each CSV file ' to prove each has been loaded. Change this statement as necessary ' if I have misunderstood the arrangement of the CSV files. Debug.Print Fl.Name & ": " & Cells(2, 1).Value & " " & Cells(3, 1).Value WkBkSrc.Close ' Close the CSV file ' The original workbook is again the active workbook. End If Next End Sub 

Having successfully read the CSV files into memory, it would appear the next step would be to locate and extract the required data ready for transfer to the destination. However, there is a prior step.

Before locating and extracting data, it is essential to check that the files are what you think they are. Attempting to process the wrong file will probably result in a crash when a value assumed to be a number is in fact a string. This will be embarrassing if others see it but will not be a disaster. What you must check for are subtle changes in the format that the author has forgotten to tell you about. An extra column or a change in sequence leading to extraction of the wrong data can go unnoticed for months and be very difficult to correct. Archiving all intermediate workbooks and all CSV files may allow you to recode and repeat the updates to create the correct current workbook but can you undo any decisions based on the faulty data?

All this checking, locating, extracting and storing requires that you consider how best to access the data. You can access the data in the loaded worksheet directly at the level of the single cell or the range. However, this can be slow. The volumes you are processing may mean this is not too important. However, even if performance is not an issue, moving data from one worksheet can get messy. I have decided to introduce more advanced techniques since there will be useful for future projects if not for this.

Variants can hold anything and the nature of their contents can be changed. The following is valid if not very sensible VBA:

Dim MyValue as Variant MyValue = 5 MyValue = "String" MyValue = Array("abc", 10, "def", 12) 

More interestingly, you can do the following:

MyValue = Range1.Value 

This will size MyValue as a 2D array just big enough to hold every value in Range1 and copy those values to MyValue. Accessing the cell values from memory is much faster than accessing them from the worksheet. The opposite is also possible:

 Range2.Value = DestValue 

This means, you can build up the data you wish to save in memory and then write it to the required range in a single statement.

It is conventional with 2D arrays to have columns as the first dimension and rows as the second. However, arrays created from a range are the opposite with rows as the first dimension. This may seem strange but matches the syntax for accessing worksheets:

 Cells(Row, Col).Value = 5 ' worksheet cell MyValue(Row, Col) = 5 ' array cell 

It is easy and convenient to load a single worksheet to a variant but I really need to load all the CSV files to memory before processing them.

Dim CellValueSrc() As Variant ' Define an array of variants ReDim CellValue(1 to CountCSVFile) ' Define the size of CellValueSrc ' UsedRange is a property of a worksheet so this loads the used part of ' the active worksheet (the CSV file) to CellValue(InxFile) CellValue(InxFile) = WkBkSrc.ActiveSheet.UsedRange 

The above statements are taken from macro Demo03. Instead of loading a worksheet to a variant, I have defined an array of variants and have then loaded each worksheet to a different element.

This is know as a jagged array. Not all languages have this functionality and it can be difficult to get one’s head around the idea. I have designed macro Demo03 to demonstrate their use. I first load all the worksheets to elements of a jagged array then copy the cell values to a new array and load that array to a new worksheet. Run Demo03 to see what it does then work through the code to see how it achieves this effect. Warning: the macro overwrite worksheet “Sheet1”. A comment near the bottom of the macro tells you what to change if this is unacceptable.

Sub Demo03() Dim CellValueDest() As Variant Dim CellValueSrc() As Variant Dim ColCrntSrc As Long Dim CountCell As Long Dim CountCSVFile As Long Dim Fl As Object Dim FlName() As String Dim FlSysObj As Object Dim FldObj As Object Dim InxFile As Long Dim PathName As String Dim RowCrntDest As Long Dim RowCrntSrc As Long Dim WkBkSrc As Workbook PathName = Application.ThisWorkbook.Path Set FlSysObj = CreateObject("Scripting.FileSystemObject") Set FldObj = FlSysObj.GetFolder(PathName) CountCSVFile = 0 ' Loop through files to count number of CSv files For Each Fl In FldObj.Files If LCase(Right(Fl.Name, 4)) = ".csv" Then CountCSVFile = CountCSVFile + 1 End If Next ' It is possible to use ReDim Preserve to enlarge an array. ' However, there is a lot of work behind a ReDim Preserve so ' I avoid them if I can. ' You can omit the lower bound but that means the lower bound depends of ' the Option Base statement. I prefer to be explicit. I also prefer ' lower bounds of 1 for most purposes. Many do not agree and most languages ' do not give the programmer a choice. My code so my choice. ReDim CellValueSrc(1 To CountCSVFile) ReDim FlName(1 To CountCSVFile) InxFile = 0 CountCell = 0 ' Loop through files to save names and cell values. ' Count number of cells at same time For Each Fl In FldObj.Files If LCase(Right(Fl.Name, 4)) = ".csv" Then InxFile = InxFile + 1 FlName(InxFile) = Fl.Name Set WkBkSrc = Workbooks.Open(PathName & "\" & Fl.Name) CellValueSrc(InxFile) = WkBkSrc.ActiveSheet.UsedRange If IsEmpty(CellValueSrc(InxFile)) Then ' The worksheet is empty ' Count as one cell CountCell = CountCell + 1 Else ' UBound(A,N) returns the upper bound of the Nth dimension of array A. ' An array loaded from a worksheet will always have lower bounds of 1. CountCell = CountCell + UBound(CellValueSrc(InxFile), 1) * _ UBound(CellValueSrc(InxFile), 2) End If WkBkSrc.Close ' Close the CSV file End If Next ' Release resources Set FlSysObj = Nothing Set FldObj = Nothing ' Prepare to create an output worksheet containing all the data loaded ReDim CellValueDest(1 To CountCell + 1, 1 To 4) CellValueDest(1, 1) = "File" CellValueDest(1, 2) = "Row" CellValueDest(1, 3) = "Column" CellValueDest(1, 4) = "Value" RowCrntDest = 1 For InxFile = 1 To UBound(FlName) If IsEmpty(CellValueSrc(InxFile)) Then RowCrntDest = RowCrntDest + 1 CellValueDest(RowCrntDest, 1) = FlName(InxFile) CellValueDest(RowCrntDest, 4) = "Empty CSV file" Else For RowCrntSrc = 1 To UBound(CellValueSrc(InxFile), 1) For ColCrntSrc = 1 To UBound(CellValueSrc(InxFile), 2) RowCrntDest = RowCrntDest + 1 CellValueDest(RowCrntDest, 1) = FlName(InxFile) CellValueDest(RowCrntDest, 2) = RowCrntSrc CellValueDest(RowCrntDest, 3) = ColCrntSrc ' Note the syntax for accessing cell value. ' CellValueSrc is a 1D array so CellValueSrc(InxFile) accessing ' an element within it. CellValueSrc(InxFile) is a 2D array so ' CellValueSrc(InxFile)(RowCrntSrc, ColCrntSrc) accessing an element ' within it. CellValueDest(RowCrntDest, 4) = _ CellValueSrc(InxFile)(RowCrntSrc, ColCrntSrc) Next Next End If Next ' 

Answer:

# This assumes that the workbook contains a worksheet "Sheet1" and that '

Answer:

# I can overwrite that worksheet. Change as necessary. With Worksheets("Sheet1") .Cells.EntireRow.Delete ' Delete any existing data ' Note that you have to specify the size of the output range. ' If the output range is not the same size as the array, the array will ' be truncated or repeated. .Range(.Cells(1, 1), .Cells(CountCell + 1, 4)).Value = CellValueDest .Columns(4).AutoFit End With End Sub