Home » Python » Python replace multiple strings

Python replace multiple strings

Posted by: admin November 1, 2017 Leave a comment

Questions:

I would like to use the .replace function to replace multiple strings.

I currently have

string.replace("condition1", "")

but would like to have something like

string.replace("condition1", "").replace("condition2", "text")

although that does not feel like good syntax

what is the proper way to do this? kind of like how in grep/regex you can do \1 and \2 to replace fields to certain search strings

Answers:

Here is a short example that should do the trick with regular expressions:

import re

rep = {"condition1": "", "condition2": "text"} # define desired replacements here

# use these three lines to do the replacement
rep = dict((re.escape(k), v) for k, v in rep.iteritems())
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)

For example:

>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'

Questions:
Answers:

You could just make a nice little looping function.

def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text

where text is the complete string and dic is a dictionary — each definition is a string that will replace a match to the term.

Note: in Python 3, iteritems() has been replaced with items()

Careful: Note that this answer requires that:

  • order is irrelevant for each replacement
  • it’s ok for each replacement to change the results of the previous replacements

This is because python dictionaries don’t have a reliable order for iteration.

For instance, if a dictionary has:

{ "cat": "dog", "dog": "pig"}

and the string is:

"This is my cat and this is my dog."

we don’t necessarily know which dictionary entry will be used first and whether the result will be:

"This is my pig and this is my pig."

or

"This is my dog and this is my pig."

It’s also good to keep in mind how big the text string is and how many pairs are in the dictionary for efficiency.

Questions:
Answers:

Here is a variant of the first solution using reduce, in case you like being functional. 🙂

repls = {'hello' : 'goodbye', 'world' : 'earth'}
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls.iteritems(), s)

martineau’s even better version:

repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)

Questions:
Answers:

I built this upon F.J.s excellent answer:

import re

def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
    return lambda string: pattern.sub(replacement_function, string)

def multiple_replace(string, *key_values):
    return multiple_replacer(*key_values)(string)

One shot usage:

>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print multiple_replace(u"Do you like café? No, I prefer tea.", *replacements)
Do you love tea? No, I prefer café.

Note that since replacement is done in just one pass, “café” changes to “tea”, but it does not change back to “café”.

If you need to do the same replacement many times, you can create a replacement function easily:

>>> my_escaper = multiple_replacer(('"','\"'), ('\t', '\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
                       u'Does this work?\tYes it does',
                       u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
...     print my_escaper(line)
... 
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"

Improvements:

  • turned code into a function
  • added multiline support
  • fixed a bug in escaping
  • easy to create a function for a specific multiple replacement

Enjoy! 🙂

Questions:
Answers:

I would like to propose the usage of string templates. Just place the string to be replaced in a dictionary and all is set! Example from docs.python.org

>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 10
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'

Questions:
Answers:

This is just a more concise recap of F.J and MiniQuark great answers. All you need to achieve multiple simultaneous string replacements is the following function:

import re
def multiple_replace(string, rep_dict):
    pattern = re.compile("|".join([re.escape(k) for k in rep_dict.keys()]), re.M)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)

Usage:

>>>multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe':'tea', 'tea':'cafe', 'like':'prefer'})
'Do you prefer tea? No, I prefer cafe.'

If you wish, you can make your own dedicated replacement functions starting from this simpler one.

Questions:
Answers:

I needed a solution where the strings to be replaced can be a regular expressions,
for example to help in normalizing a long text by replacing multiple whitespace characters with a single one. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with:

def multiple_replace(string, reps, re_flags = 0):
    """ Transforms string, replacing keys from re_str_dict with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        reps = reps.items()
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)

It works for the examples given in other answers, for example:

>>> multiple_replace("(condition1) and --condition2--",
...                  {"condition1": "", "condition2": "text"})
'() and --text--'

>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'

>>> multiple_replace("Do you like cafe? No, I prefer tea.",
...                  {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'

The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize white space:

>>> s = "I don't want to change this name:\n  Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"

If you want to use the dictionary keys as normal strings,
you can escape those before calling multiple_replace using e.g. this function:

def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())

>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n  Philip II of Spain"

The following function can help in finding erroneous regular expressions among your dictionary keys (since the error message from multiple_replace isn’t very telling):

def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed. """
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))

>>> check_re_list(re_str_dict.keys())

Note that it does not chain the replacements, instead performs them simultaneously. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more string-replacement pairs and ensure the expected ordering of the pairs:

>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
...                             ("but", "mut"), ("mutton", "lamb")])
'lamb'

Questions:
Answers:

In my case, I needed a simple replacing of keys with names, so I thought this up:

a = 'this is a test string'
b = {'i': 'Z', 's': 'Y'}
for x,y in b.items():
    a = a.replace(x, y)
>>> a
'thZY ZY a teYt YtrZng'

Questions:
Answers:

Here my $0.02. It is based on Andrew Clark’s answer, just a little bit clearer, and it also covers the case when a string to replace is a substring of another string to replace (longer string wins)

def multireplace(string, replacements):
    """
    Given a string and a replacement map, it returns the replaced string.

    :param str string: string to execute replacements on
    :param dict replacements: replacement dictionary {value to find: value to replace}
    :rtype: str

    """
    # Place longer ones first to keep shorter substrings from matching
    # where the longer ones should take place
    # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against 
    # the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
    substrs = sorted(replacements, key=len, reverse=True)

    # Create a big OR regex that matches any of the substrings to replace
    regexp = re.compile('|'.join(map(re.escape, substrs)))

    # For each match, look up the new string in the replacements
    return regexp.sub(lambda match: replacements[match.group(0)], string)

It is in this this gist, feel free to modify it if you have any proposal.

Questions:
Answers:

You should really not do it this way, but I just find it way too cool:

>>> replacements = {'cond1':'text1', 'cond2':'text2'}
>>> cmd = 'answer = s'
>>> for k,v in replacements.iteritems():
>>>     cmd += ".replace(%s, %s)" %(k,v)
>>> exec(cmd)

Now, answer is the result of all the replacements in turn

again, this is very hacky and is not something that you should be using regularly. But it’s just nice to know that you can do something like this if you ever need to.

Questions:
Answers:

Here’s a sample which is more efficient on long strings with many small replacements.

source = "Here is foo, it does moo!"

replacements = {
    'is': 'was', # replace 'is' with 'was'
    'does': 'did',
    '!': '?'
}

def replace(source, replacements):
    finder = re.compile("|".join(re.escape(k) for k in replacements.keys())) # matches every string we want replaced
    result = []
    pos = 0
    while True:
        match = finder.search(source, pos)
        if match:
            # cut off the part up until match
            result.append(source[pos : match.start()])
            # cut off the matched part and replace it in place
            result.append(replacements[source[match.start() : match.end()]])
            pos = match.end()
        else:
            # the rest after the last match
            result.append(source[pos:])
            break
    return "".join(result)

print replace(source, replacements)

The point is in avoiding many concatenations of long strings. We chop the source string to fragments, replacing some of the fragments as we form the list, and then join the whole thing back into a string.

Questions:
Answers:

Or just for a fast hack:

for line in to_read:
    read_buffer = line              
    stripped_buffer1 = read_buffer.replace("term1", " ")
    stripped_buffer2 = stripped_buffer1.replace("term2", " ")
    write_to_file = to_write.write(stripped_buffer2)

Questions:
Answers:

Here is another way of doing it with a dictionary:

listA="The cat jumped over the house".split()
modify = {word:word for number,word in enumerate(listA)}
modify["cat"],modify["jumped"]="dog","walked"
print " ".join(modify[x] for x in listA)

Questions:
Answers:

Starting from the precious answer of Andrew i developed a script that loads the dictionary from a file and elaborates all the files on the opened folder to do the replacements. The script loads the mappings from an external file in which you can set the separator. I’m a beginner but i found this script very useful when doing multiple substitutions in multiple files. It loaded a dictionary with more than 1000 entries in seconds. It is not elegant but it worked for me

import glob
import re

mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")

rep = {} # creation of empy dictionary

with open(mapfile) as temprep: # loading of definitions in the dictionary using input file, separator is prompted
    for line in temprep:
        (key, val) = line.strip('\n').split(sep)
        rep[key] = val

for filename in glob.iglob(mask): # recursion on all the files with the mask prompted

    with open (filename, "r") as textfile: # load each file in the variable text
        text = textfile.read()

        # start replacement
        #rep = dict((re.escape(k), v) for k, v in rep.items()) commented to enable the use in the mapping of re reserved characters
        pattern = re.compile("|".join(rep.keys()))
        text = pattern.sub(lambda m: rep[m.group(0)], text)

        #write of te output files with the prompted suffice
        target = open(filename[:-4]+"_NEW.txt", "w")
        target.write(text)
        target.close()

Questions:
Answers:

this is my solution to the problem. I used it in a chatbot to replace the different words at once.

def mass_replace(text, dct):
    new_string = ""
    old_string = text
    while len(old_string) > 0:
        s = ""
        sk = ""
        for k in dct.keys():
            if old_string.startswith(k):
                s = dct[k]
                sk = k
        if s:
            new_string+=s
            old_string = old_string[len(sk):]
        else:
            new_string+=old_string[0]
            old_string = old_string[1:]
    return new_string

print mass_replace("The dog hunts the cat", {"dog":"cat", "cat":"dog"})

this will become The cat hunts the dog

Questions:
Answers:

Another example :
Input list

error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']

The desired output would be

words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']

Code :

[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e,"") for e in error_list if e in w],w] for w in words]] 

Questions:
Answers:

I don’t know about speed but this is my workaday quick fix:

reduce(lambda a, b: a.replace(*b)
    , [('o','W'), ('t','X')] #iterable of pairs: (oldval, newval)
    , 'tomato' #The string from which to replace values
    )

… but I like the #1 regex answer above. Note – if one new value is a substring of another one then the operation is not commutative.

Questions:
Answers:

Why not one solution like this?

s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(*r)