regex – Finding incorrectly-formatted email addresses in a CSV file

Posted by: admin May 14, 2020

Question:

I’ve got a CSV file full of contact information and I’m trying to load it into Gmail contacts, but some of the e-mail addresses aren’t correctly formatted, so Gmail doesn’t recognise the field as an e-mail field. I can open the CSV file in Excel (Mac), but I don’t know whether Excel can do pattern matching. Also, some of the fields (such as address fields) may contain commas, so using sed/awk could be difficult. Is there any way I can quickly find which e-mail addresses are incorrectly formatted?

Answer:

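Dump the addresses to a text file, one per line, then run the file through findstr (a built-in Windows cmd command). The /V switch makes it print only the lines that do not match the pattern, i.e. the badly formatted addresses.

For example:

findstr /V /I /R /X "[A-Z0-9._%+-][A-Z0-9._%+-]*@[A-Z0-9.-][A-Z0-9.-]*\.[A-Z][A-Z][A-Z]*" input_emails.txt > bad_emails.txt

findstr’s regular expressions have no + or {n,m}, so “one or more” is written as a character class followed by the same class with *, and the last part must be at least two letters. /X requires the whole line to match and /I makes the match case-insensitive. (Typed at the prompt this works as is; in a .bat file, double each % to %%.)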
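findstr only exists on Windows, and the question is about a Mac. As a cross-platform alternative, here is a minimal Python sketch of the same two steps: pull out the address column, then keep the values that don’t match. It assumes the CSV has a header row; `find_bad_emails` is a hypothetical helper, and the column name you pass in is whatever your file uses. Reading the file with the standard `csv` module, rather than splitting on commas, keeps quoted fields such as street addresses intact, which is the sed/awk concern raised in the question. The pattern is the one from the findstr answer, written in Python syntax.

```python
import csv
import re

# The pattern from the findstr answer, in Python syntax. The {2,4} limit on the
# last part is kept from the original; it rejects longer endings such as ".museum".
EMAIL_RE = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}", re.IGNORECASE)


def find_bad_emails(csv_lines, column):
    """Yield (row_number, value) for each row whose `column` is not a well-formed address.

    csv_lines: an open CSV file or any iterable of CSV lines, with a header row.
    column: the header of the e-mail column (whatever your file calls it).
    Row numbers count the header as row 1, matching the rows Excel displays.
    """
    reader = csv.DictReader(csv_lines)  # handles quoted fields containing commas
    if column not in (reader.fieldnames or []):
        raise KeyError(f"no column named {column!r}")
    for row_number, row in enumerate(reader, start=2):
        # Blank or missing cells become "", so contacts with no address are reported too.
        # Values are not stripped, so stray spaces around an address count as bad.
        value = row.get(column) or ""
        if not EMAIL_RE.fullmatch(value):
            yield row_number, value
```

To run it on a real file, open it with `newline=""` (as the `csv` documentation recommends) and pass the file object; a CSV saved by Excel on a Mac may need an `encoding` other than UTF-8.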

Answer:

digdb is an Excel add-in for validating email addresses that might be useful to you. It isn’t free, but it does have a time-limited free demo that you could use to see whether it does what you want. I haven’t tried it myself; it came up in my Google search while I was trying to find the add-in I used a long time ago (and can no longer find).

Answer:

Open your CSV file in a text editor that supports regular expressions, such as TextWrangler (free, and available on the Mac). Then run a regex search and fix the matches by hand, because the possible problems can be quite convoluted. This is simpler if the email column comes first.

Assuming the email address is the first field on each line and is not enclosed in quotes:

^(?![A-Z0-9\._%\+\-]+@[A-Z0-9.-]{2,}\.[A-Z]{2,4},)

Note:

  • this uses a negative look-ahead to match lines that do not start with a valid address followed by a comma; I hope your editor supports it (I’m not on my Mac right now to check)
  • if all email addresses are quoted, add a double quote (") after the first ! and before the last ,
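To preview what the look-ahead search will flag before running it in an editor, the same two patterns can be tried in Python, whose `re` module supports `(?!...)`. This is only a sketch: `flag_lines` is a hypothetical helper, it assumes (like the answer) that the address is the first field on each line, and `re.IGNORECASE` stands in for the editor’s case-insensitive option, because the character classes only list `A-Z`.

```python
import re

# Unquoted form from the answer: a line matches (is flagged) when it does NOT
# start with a well-formed address immediately followed by a comma.
UNQUOTED = r'^(?![A-Z0-9\._%\+\-]+@[A-Z0-9.-]{2,}\.[A-Z]{2,4},)'
# Quoted variant from the second note: a double quote after the "!" and before the last ",".
QUOTED = r'^(?!"[A-Z0-9\._%\+\-]+@[A-Z0-9.-]{2,}\.[A-Z]{2,4}",)'


def flag_lines(lines, quoted=False):
    """Return the 1-based numbers of the lines the look-ahead pattern matches."""
    pattern = re.compile(QUOTED if quoted else UNQUOTED, re.IGNORECASE)
    # The match is zero-length, but a match object is still truthy.
    return [n for n, line in enumerate(lines, start=1) if pattern.search(line)]
```

Because each pattern only accepts one quoting style, a file where some addresses are quoted and others are not will have every line of the other style flagged; such a file needs both passes.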