Substitute multiple whitespace with single whitespace in Python [duplicate]

Posted by: admin, November 1, 2017

Questions:

This question already has an answer here:

Answers:

A simple possibility (if you’d rather avoid REs) is

' '.join(mystring.split())

The split and join perform the task you’re explicitly asking about, plus the extra one you don’t mention but that shows in your example: removing leading and trailing spaces ;-).
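A minimal sketch wrapping the split/join idiom in a helper (the name `collapse_whitespace` is hypothetical). Because `str.split()` with no argument splits on any run of whitespace and discards empty pieces, the ends are trimmed as a side effect.

```python
def collapse_whitespace(text: str) -> str:
    """Collapse runs of whitespace to one space and trim both ends."""
    # str.split() with no separator splits on runs of any whitespace
    # (spaces, tabs, newlines) and drops empty pieces, so the ends vanish too.
    return " ".join(text.split())


print(repr(collapse_whitespace("  The   fox\tjumped \n over  ")))
# → 'The fox jumped over'
```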

Answers:
import re

re.sub(r'\s+', ' ', mystring).strip()

This also substitutes tabs, newlines and other “whitespace-like” characters.

The strip() at the end cuts off any leading or trailing whitespace, as you requested.
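A short sketch of the same regex approach. The pattern is written as a raw string so that `\s` reaches the `re` module unchanged (a plain `'\s+'` triggers an invalid-escape warning in recent Python versions). Precompiling with `re.compile` is an optional choice, not part of the answer above, and the helper name `collapse_whitespace_re` is hypothetical.

```python
import re

# In str patterns, \s matches spaces, tabs, newlines and other Unicode whitespace.
_WHITESPACE_RUN = re.compile(r"\s+")


def collapse_whitespace_re(text: str) -> str:
    # Replace every run of whitespace with one space, then trim the ends,
    # since a leading or trailing run would otherwise survive as a single space.
    return _WHITESPACE_RUN.sub(" ", text).strip()


print(repr(collapse_whitespace_re("\tline one\n\n  line   two  ")))
# → 'line one line two'
```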

Answers:

For completeness, you can also use:

mystring = mystring.strip()  # the loop below would leave a single leading or
                             # trailing space, so strip before (or after) it
# Collapse runs of spaces only; tabs and newlines are left untouched.
while '  ' in mystring:
    mystring = mystring.replace('  ', ' ')

which will work quickly on strings with relatively few spaces (faster than re in these situations).
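A sketch packaging that loop as a function (the name `collapse_spaces` is hypothetical) to make one limitation explicit: it only searches for the two-space sequence, so tabs and newlines inside the string pass through unchanged, unlike the split/join and `\s+` versions.

```python
def collapse_spaces(text: str) -> str:
    # Trim first; otherwise a leading/trailing run would shrink to one space but remain.
    text = text.strip()
    # Each pass roughly halves every run of spaces; stop when no double space is left.
    while "  " in text:
        text = text.replace("  ", " ")
    return text


print(repr(collapse_spaces("  a    b  ")))  # → 'a b'
print(repr(collapse_spaces("a \t b")))      # → 'a \t b' (the tab is not collapsed)
```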

In any case, Alex Martelli’s split/join solution is at least as fast (usually significantly faster).

In your example, using the default values of timeit.Timer.repeat(), I get the following times (in seconds, each covering timeit’s default of 1,000,000 executions):

str.replace: [1.4317800167340238, 1.4174888149192384, 1.4163512401715934]
re.sub:      [3.741931446594549,  3.8389395858970374, 3.973777672860706]
split/join:  [0.6530919432498195, 0.6252146571700905, 0.6346594329726258]
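A minimal sketch for rerunning this comparison with `timeit.repeat`. The sample string is an assumption (the question’s example is not reproduced on this page), `number` is lowered from timeit’s default of 1,000,000 to keep the run short, and absolute times will differ by machine and Python version.

```python
import re
import timeit
from functools import partial

# Hypothetical sample input; substitute your own data.
SAMPLE = "The   fox jumped   over    the log.   "


def with_replace(s: str) -> str:
    s = s.strip()
    while "  " in s:
        s = s.replace("  ", " ")
    return s


def with_re(s: str) -> str:
    return re.sub(r"\s+", " ", s).strip()


def with_split_join(s: str) -> str:
    return " ".join(s.split())


CANDIDATES = {
    "str.replace": with_replace,
    "re.sub": with_re,
    "split/join": with_split_join,
}

if __name__ == "__main__":
    for name, func in CANDIDATES.items():
        # repeat() returns one total time (in seconds) per repetition,
        # each covering `number` calls of func(SAMPLE).
        times = timeit.repeat(partial(func, SAMPLE), number=10_000, repeat=3)
        print(f"{name:12} {min(times):.4f} s")
```

Reporting `min()` of the repetitions follows the advice in the `timeit` documentation: higher values are usually caused by other processes interfering, not by the code being measured.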

EDIT:

Just came across this post which provides a rather long comparison of the speeds of these methods.

Answers:
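mystring.replace("  ", "")

Every pair of spaces is removed outright: a run of an even number of spaces disappears completely (joining the neighbouring words), and a run of an odd number shrinks to a single space.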
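A quick demonstration of that behaviour on a hypothetical input: replacing each pair of spaces with nothing only yields single spaces where a run had an odd length, and glues words together elsewhere.

```python
text = "a  b   c    d"  # runs of 2, 3 and 4 spaces

result = text.replace("  ", "")
print(repr(result))  # → 'ab cd'
```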