Garbage Data Checker Using Excel VBA

Posted by: admin May 14, 2020

Questions:

My team and I have created a consolidator tool that consolidates data from worksheets uploaded using a button. However, there’s an added enhancement that our leads would like to have.

I don’t know if it’s possible in VBA, but what they want is a button that checks for and highlights “garbage data” (for example, a First Name cell that is blank or that contains ajsajdj or something similar), then prompts the user and asks whether they want to delete it.

We already have the code for the consolidator tool (and it’s working perfectly). However, this feature is giving me a headache, as I don’t know if it’s even possible. I would really appreciate suggestions, as I’m new to VBA and programming.

Hopefully someone can point me to the next step I should take.

Answers:

I do a lot of automated auditing of this type, so I would approach it like this:

Your ‘audit macro’ is basically an iterator with many quality checks (is ‘name’ = “”, and so on). You can run this audit macro either:

  1. On a single line of data at each step of the consolidation, or
  2. After the consolidation has completed.

The first is easiest to use, and works like this:

  1. After each consolidation step, run the ‘auditing macro’ on the line of data just brought in.
  2. If an issue is found, write the line of data to a separate tab, leaving column A blank, instead of to your consolidation. At the end of the consolidation, give the user a warning message if any issue lines were found.
  3. The user skim-reads the data on the separate tab and puts a 1 for ‘keep’ in column A.
  4. The user clicks a button to run a macro which adds the rows with a ‘1’ against them back to your full data set (e.g. on the end if order doesn’t matter).
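
Step 4 could be sketched roughly as follows. This is only an illustration under assumptions that are not in the original question: the sheet names `Issues` and `Consolidated` are hypothetical, issue rows are assumed to hold their data from column B onwards (column A being the keep marker), and the consolidated data is assumed to start in column A.

```vba
Option Explicit

' Hypothetical sheet names; rename to match your workbook.
Private Const ISSUE_SHEET As String = "Issues"
Private Const DATA_SHEET As String = "Consolidated"

' Row number of the last cell holding anything, or 0 for an empty sheet.
Private Function LastUsedRow(ws As Worksheet) As Long
    Dim c As Range
    Set c = ws.Cells.Find(What:="*", LookIn:=xlFormulas, _
                          SearchOrder:=xlByRows, SearchDirection:=xlPrevious)
    If Not c Is Nothing Then LastUsedRow = c.Row
End Function

' Moves every issue row marked with 1 in column A onto the end of the
' consolidated data, then removes it from the issue sheet.
Public Sub RestoreKeptRows()
    Dim wsIssues As Worksheet, wsData As Worksheet
    Dim lastCol As Long, width As Long, destRow As Long, r As Long

    Set wsIssues = ThisWorkbook.Worksheets(ISSUE_SHEET)
    Set wsData = ThisWorkbook.Worksheets(DATA_SHEET)

    lastCol = wsIssues.UsedRange.Column + wsIssues.UsedRange.Columns.Count - 1
    If lastCol < 2 Then Exit Sub          ' nothing beyond the marker column
    width = lastCol - 1                   ' data lives in columns B..lastCol

    ' Walk upwards so deleting a row does not shift rows still to be visited.
    For r = LastUsedRow(wsIssues) To 1 Step -1
        If CStr(wsIssues.Cells(r, 1).Value) = "1" Then
            destRow = LastUsedRow(wsData) + 1
            wsData.Cells(destRow, 1).Resize(1, width).Value = _
                wsIssues.Cells(r, 2).Resize(1, width).Value
            wsIssues.Rows(r).Delete
        End If
    Next r
End Sub
```

Because the loop runs bottom-up, restored rows arrive in reverse order, which matches the "if order doesn’t matter" caveat above.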

Equally you could approach this by running the audit when the consolidation is totally complete; in this case you’d need to delete or otherwise track rows which may be removed if a user chooses not to keep them.
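
For the audit-after-consolidation variant, a minimal sketch might highlight suspect rows first and only delete them once the user confirms. The assumptions here are mine, not the asker's: the check is just "column 1 is blank", row 1 is a header, and a yellow fill is used as the marker (so rows that were already yellow for other reasons would also be deleted).

```vba
Option Explicit

Private Const FLAG_COLOR As Long = vbYellow   ' fill used to mark suspect rows

' Highlights every row in firstRow..lastRow whose cell in checkCol is blank.
' Returns the number of rows highlighted.
Public Function HighlightBlankRows(ws As Worksheet, ByVal checkCol As Long, _
        ByVal firstRow As Long, ByVal lastRow As Long) As Long
    Dim r As Long
    For r = firstRow To lastRow
        If Len(Trim$(CStr(ws.Cells(r, checkCol).Value))) = 0 Then
            ws.Rows(r).Interior.Color = FLAG_COLOR
            HighlightBlankRows = HighlightBlankRows + 1
        End If
    Next r
End Function

' Deletes highlighted rows, walking upwards so row numbers stay valid.
Public Sub DeleteHighlightedRows(ws As Worksheet, ByVal checkCol As Long, _
        ByVal firstRow As Long, ByVal lastRow As Long)
    Dim r As Long
    For r = lastRow To firstRow Step -1
        If ws.Cells(r, checkCol).Interior.Color = FLAG_COLOR Then
            ws.Rows(r).Delete
        End If
    Next r
End Sub

' Button handler: audit the active sheet, then ask before deleting anything.
Public Sub AuditAndPrompt()
    Dim ws As Worksheet, lastRow As Long, found As Long
    Set ws = ActiveSheet
    lastRow = ws.UsedRange.Row + ws.UsedRange.Rows.Count - 1
    found = HighlightBlankRows(ws, 1, 2, lastRow)
    If found = 0 Then Exit Sub
    If MsgBox(found & " suspect row(s) highlighted. Delete them?", _
              vbYesNo + vbQuestion) = vbYes Then
        DeleteHighlightedRows ws, 1, 2, lastRow
    End If
End Sub
```

If the user answers No, the highlighted rows simply stay in place as a visual record of what was flagged.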

I like this approach because it is non-blocking: the user can leave your consolidator to run unsupervised and then deal with exceptions at their convenience. You can also write or edit as many tests as you want without fundamentally changing your consolidator at all. From there you have options to extend it, for example by counting the number of each issue per import and putting this into a report for continuous improvement.
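
The per-issue counting idea could look something like the sketch below. The issue codes and the two example tests (blank, shorter than two characters) are hypothetical placeholders; real tests would come from whatever rules your team agrees on.

```vba
Option Explicit

' Hypothetical issue codes; 0 means the value passed every test.
Public Const ISSUE_NONE As Long = 0
Public Const ISSUE_BLANK As Long = 1
Public Const ISSUE_TOO_SHORT As Long = 2

' Returns the code of the first test that fails for a value.
' Chaining with ElseIf means later tests are skipped once one has failed.
Public Function AuditValue(ByVal v As String) As Long
    Dim s As String
    s = Trim$(v)
    If Len(s) = 0 Then
        AuditValue = ISSUE_BLANK
    ElseIf Len(s) < 2 Then
        AuditValue = ISSUE_TOO_SHORT
    Else
        AuditValue = ISSUE_NONE
    End If
End Function

' Tallies how many values fall under each issue code.
' The result is indexed by issue code (0 To 2).
Public Function CountIssues(values As Variant) As Long()
    Dim counts() As Long
    Dim i As Long, code As Long
    ReDim counts(0 To 2)
    For i = LBound(values) To UBound(values)
        code = AuditValue(CStr(values(i)))
        counts(code) = counts(code) + 1
    Next i
    CountIssues = counts
End Function
```

Keeping the tests in `AuditValue` and the tallying in `CountIssues` means new tests can be added without touching the consolidator or the report code.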

In terms of pseudocode, it’s an iterator full of if-else blocks with a single ‘there is an issue’ flag which, if it’s 1, causes the row to be treated as an issue:

Dim rowCount As Long
Dim issueFlag As Long
issueFlag = 0   ' 0 = no issue found yet

' startRow and endRow correspond to the lines of data you just imported
For rowCount = startRow To endRow
   ' Test 1 (some condition, e.g. column 2 is blank)
   If Cells(rowCount, 2).Value = "" Then
     issueFlag = 1
   End If

   ' Test n... (any other condition on the same row)
   If Cells(rowCount, 2).Value = "" Then
     issueFlag = 1
   End If

Next rowCount

If issueFlag = 1 Then
   ' CODE TO PASTE DATA

   ' Set some flag/variable which then triggers an error message at the end of the whole consolidation or audit
End If

You can put an Exit For at the end of each If block, so that as soon as issueFlag is set you leave the loop immediately and skip all further tests.

Answer:

Let’s see how a human would validate that.

He would look at the name ajsajdj and think “I have never seen this name in my life, so it must be nonsense data”. But he might be wrong, because never having come across a name doesn’t mean it isn’t a real one (parents can be inventive sometimes).

So what a human actually does is compare ajsajdj with a list of names (the one he has in mind from his experience in life).

Now a program can do the same.

You can write code that compares ajsajdj with a list of valid names. But here we are at the same point where the human can fail too. The list will never be complete, because tomorrow parents will invent a new name that you don’t have in that list.
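
As a minimal sketch of that list comparison, assuming the valid names are supplied as an array (the function name `IsKnownName` and the sample list in the usage line are made up for illustration):

```vba
Option Explicit

' Returns True when candidate matches an entry in knownNames,
' ignoring case and surrounding spaces.
Public Function IsKnownName(ByVal candidate As String, _
                            knownNames As Variant) As Boolean
    Dim i As Long
    candidate = Trim$(candidate)
    For i = LBound(knownNames) To UBound(knownNames)
        If StrComp(candidate, CStr(knownNames(i)), vbTextCompare) = 0 Then
            IsKnownName = True
            Exit Function
        End If
    Next i
    ' Falls through with the default return value False.
End Function
```

For example, `IsKnownName("ajsajdj", Array("Anna", "Ben"))` returns False, but so would any genuine name that happens to be missing from the list.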

Conclusion

This task cannot be coded unless you define a rule for what counts as valid or invalid data. Some programs look like they do magic, but they only work because of the rules you give them.
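
To make that concrete, here is a sketch of what explicit rules might look like. The three rules (blank, contains a digit, contains no vowel) are invented purely for illustration and are not a recommendation; a real name such as "Ng" would be wrongly flagged by the last one.

```vba
Option Explicit

' Returns a short reason when nameText breaks one of the example rules,
' or an empty string when it passes all of them.
Public Function GarbageReason(ByVal nameText As String) As String
    Dim s As String
    s = Trim$(nameText)
    If Len(s) = 0 Then
        GarbageReason = "blank"
    ElseIf s Like "*#*" Then                        ' # matches any single digit
        GarbageReason = "contains a digit"
    ElseIf Not (LCase$(s) Like "*[aeiouy]*") Then   ' no vowel anywhere
        GarbageReason = "no vowel"
    Else
        GarbageReason = ""
    End If
End Function
```

Note that `GarbageReason("ajsajdj")` returns an empty string: the value contains letters only and has a vowel, so it passes every rule. The code can only catch what the rules describe.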