This seems like such a simple requirement that I feel I must be missing something obvious.
I have an Excel spreadsheet containing “dirty” text data: the cell text has unwanted leading and trailing spaces, commas and newlines. I would like to TRIM all of those characters from references to these cells.
Note: I don’t want to replace every occurrence of those characters, since they legitimately appear within the cell text. It is only at the start or end of the cell text (i.e. the value) that I want to trim them off.
The text data consists of names of people and schools, for cleaning and importing into a CRM.
So, is there a function built in, or do I need to write one? I feel spoiled by the number of string filtering functions in PHP 😉
This is well suited to a regexp.
The code below, adapted from this article, uses the regexp
    [,\s]*(.+?)[,\s]*$
to remove any leading and/or trailing whitespace or commas while leaving any such characters within the text body intact.
It will replace your existing data in situ.
    Sub RemoveDirt()
        Dim rng1 As Range
        Dim rngArea As Range
        Dim lngRow As Long
        Dim lngCol As Long
        Dim lngCalc As Long
        Dim objReg As Object
        Dim X()

        On Error Resume Next
        Set rng1 = Application.InputBox("Select range for the removal of leading/trailing spaces and commas", "User select", Selection.Address, , , , , 8)
        If rng1 Is Nothing Then Exit Sub
        On Error GoTo 0

        'See Patrick Matthews' excellent article on using Regular Expressions with VBA
        Set objReg = CreateObject("vbscript.regexp")
        objReg.MultiLine = True
        objReg.Pattern = "[,\s]*(.+?)[,\s]*$"

        'Speed up the code by turning off screenupdating and setting calculation to manual
        'Disable any code events that may occur when writing to cells
        With Application
            lngCalc = .Calculation
            .ScreenUpdating = False
            .Calculation = xlCalculationManual
            .EnableEvents = False
        End With

        'Test each area in the user selected range
        'Non contiguous range areas are common when using SpecialCells to define specific cell types to work on
        For Each rngArea In rng1.Areas
            'The most common outcome is used for the True outcome to optimise code speed
            If rngArea.Cells.Count > 1 Then
                'If there is more than one cell then set the variant array to the dimensions of the range area
                'Using Value2 provides a useful speed improvement over Value. On my testing it was 2% on blank cells, up to 10% on non-blanks
                X = rngArea.Value2
                For lngRow = 1 To rngArea.Rows.Count
                    For lngCol = 1 To rngArea.Columns.Count
                        'Strip the leading/trailing whitespace and commas
                        X(lngRow, lngCol) = objReg.Replace(X(lngRow, lngCol), "$1")
                    Next lngCol
                Next lngRow
                'Dump the updated array sans dirt over the initial range
                rngArea.Value2 = X
            Else
                'Caters for a single cell range area. No variant array required
                rngArea.Value = objReg.Replace(rngArea.Value, "$1")
            End If
        Next rngArea

        'Clean up the Application settings
        With Application
            .ScreenUpdating = True
            .Calculation = lngCalc
            .EnableEvents = True
        End With

        Set objReg = Nothing
    End Sub
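One thing to check before relying on the macro: with MultiLine = True and Global left at its default of False, the pattern appears to stop at the first line end, so a cell with an internal line break would only have its first line cleaned. Below is a minimal sketch of an alternative, assuming cells may contain line breaks. CleanEnds and DemoCleanEnds are hypothetical names, not part of the macro above.

```vba
' Sketch only: CleanEnds is a hypothetical helper, not part of RemoveDirt.
Function CleanEnds(ByVal strIn As String) As String
    Dim objReg As Object

    Set objReg = CreateObject("vbscript.regexp")
    With objReg
        .Global = True      ' replace both the leading and the trailing run
        .MultiLine = False  ' ^ and $ anchor to the whole cell value, not each line
        .Pattern = "^[,\s]+|[,\s]+$"
    End With

    ' Matched runs are deleted, so no capture group is needed
    CleanEnds = objReg.Replace(strIn, "")
End Function

Sub DemoCleanEnds()
    ' The leading " ," and the trailing "," plus line break are removed;
    ' the internal comma, space and line break are kept
    Debug.Print CleanEnds(" ,Smith, John" & vbLf & "Hill School," & vbLf)
End Sub
```

Inside the macro, the equivalent change would be to set Global to True and MultiLine to False, use this pattern, and replace with "" instead of "$1".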
I have found this code, which I pasted in as a module into my spreadsheet:
    Option Explicit

    Function ReReplace(ReplaceIn, _
        ReplaceWhat As String, ReplaceWith As String, Optional IgnoreCase As Boolean = False)
        Dim RE As Object
        Set RE = CreateObject("vbscript.regexp")
        RE.IgnoreCase = IgnoreCase
        RE.Pattern = ReplaceWhat
        RE.Global = True
        ReReplace = RE.Replace(ReplaceIn, ReplaceWith)
    End Function
This provides a replace function that supports regular expressions (why doesn’t Excel do that itself? It has only been around since 1987 – I had it on my Atari ST, not that you could add more than ten cells before it crashed!). With it, a cell formula can do the trimming I need:
=ReReplace('source worksheet'!cell_reference, "^[\s,]+|[\s,]+$", "")
This works beautifully.
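If the formula is filled down many rows, creating a new RegExp object on every call may become noticeable. Here is a sketch of a single-purpose variant that keeps one object alive between calls. TrimDirt is a hypothetical name, and the speed benefit is an assumption that has not been measured here.

```vba
' Hypothetical helper: trims leading/trailing whitespace and commas.
Function TrimDirt(ByVal strIn As String) As String
    ' Static keeps the object between calls, so it is created only once
    Static RE As Object

    If RE Is Nothing Then
        Set RE = CreateObject("vbscript.regexp")
        RE.Global = True                  ' handle both ends in one pass
        RE.Pattern = "^[\s,]+|[\s,]+$"    ' same pattern as the formula above
    End If

    TrimDirt = RE.Replace(strIn, "")
End Function
```

In a cell this would be called as =TrimDirt('source worksheet'!cell_reference), with no pattern to retype in each formula.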
(Note: this answer was moved from the question text, where it really should not have been.)
A recursive function to remove trailing commas and spaces. Pure VBA.
    Function removetrailcomma(txt As String) As String
        If Right(txt, 1) = " " Or Right(txt, 1) = "," Then
            removetrailcomma = removetrailcomma(Left(txt, Len(txt) - 1))
        Else
            removetrailcomma = txt
        End If
    End Function
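The function above only trims the right-hand end, and only spaces and commas. Here is a loop-based sketch in the same pure-VBA spirit that handles both ends and also treats tabs and line breaks as dirt. TrimChars, its optional strChars argument and the default character set are assumptions for illustration.

```vba
' Hypothetical helper: strips any of the characters in strChars
' from both ends of txt, leaving the middle of the text untouched.
Function TrimChars(ByVal txt As String, Optional ByVal strChars As String = "") As String
    Dim lngStart As Long
    Dim lngEnd As Long

    ' Default set (an assumption): space, comma, tab, CR and LF
    If Len(strChars) = 0 Then strChars = " ," & vbTab & vbCr & vbLf

    lngStart = 1
    lngEnd = Len(txt)

    ' Walk forward past unwanted leading characters
    Do While lngStart <= lngEnd
        If InStr(1, strChars, Mid$(txt, lngStart, 1), vbBinaryCompare) = 0 Then Exit Do
        lngStart = lngStart + 1
    Loop

    ' Walk backward past unwanted trailing characters
    Do While lngEnd >= lngStart
        If InStr(1, strChars, Mid$(txt, lngEnd, 1), vbBinaryCompare) = 0 Then Exit Do
        lngEnd = lngEnd - 1
    Loop

    ' Returns "" when the text is empty or consists only of unwanted characters
    TrimChars = Mid$(txt, lngStart, lngEnd - lngStart + 1)
End Function
```

Used from a cell as =TrimChars(A1), this avoids both recursion and the regexp dependency. A different character set can be passed as the second argument.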
I did this in two steps:
- Removing spaces
- Removing commas
For removing leading and trailing spaces:
Use the built-in TRIM function directly.
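For removing leading and trailing commas (note that this turns every space left in the text into a comma, so it only suits values that contain no spaces of their own):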
=SUBSTITUTE(TRIM(SUBSTITUTE(A1,","," "))," ",",")