Home » Python » Which is the preferred way to concatenate a string in Python?

Which is the preferred way to concatenate a string in Python?

Posted by: admin January 29, 2018 Leave a comment

Questions:

Since Python’s string can’t be changed, I was wondering how to concatenate a string more efficiently?

I can write like it:

s += stringfromelsewhere

or like this:

s = []
s.append(somestring)

later

s = ''.join(s)

While writing this question, I found a good article talking about the topic.

http://www.skymind.com/~ocrow/python_string/

But it’s in Python 2.x., so the question would be did something change in Python 3?

Answers:

The best way of appending a string to a string variable is to use + or +=. This is because it’s readable and fast. They are also just as fast, which one you choose is a matter of taste, the latter one is the most common. Here are timings with the timeit module:

a = a + b:
0.11338996887207031
a += b:
0.11040496826171875

However, those who recommend having lists and appending to them and then joining those lists, do so because appending a string to a list is presumably very fast compared to extending a string. And this can be true, in some cases. Here, for example, is one
million appends of a one-character string, first to a string, then to a list:

a += b:
0.10780501365661621
a.append(b):
0.1123361587524414

OK, turns out that even when the resulting string is a million characters long, appending was still faster.

Now let’s try with appending a thousand character long string a hundred thousand times:

a += b:
0.41823482513427734
a.append(b):
0.010656118392944336

The end string, therefore, ends up being about 100MB long. That was pretty slow, appending to a list was much faster. That that timing doesn’t include the final a.join(). So how long would that take?

a.join(a):
0.43739795684814453

Oups. Turns out even in this case, append/join is slower.

So where does this recommendation come from? Python 2?

a += b:
0.165287017822
a.append(b):
0.0132720470428
a.join(a):
0.114929914474

Well, append/join is marginally faster there if you are using extremely long strings (which you usually aren’t, what would you have a string that’s 100MB in memory?)

But the real clincher is Python 2.3. Where I won’t even show you the timings, because it’s so slow that it hasn’t finished yet. These tests suddenly take minutes. Except for the append/join, which is just as fast as under later Pythons.

Yup. String concatenation was very slow in Python back in the stone age. But on 2.4 it isn’t anymore (or at least Python 2.4.7), so the recommendation to use append/join became outdated in 2008, when Python 2.3 stopped being updated, and you should have stopped using it. 🙂

(Update: Turns out when I did the testing more carefully that using + and += is faster for two strings on Python 2.3 as well. The recommendation to use ''.join() must be a misunderstanding)

However, this is CPython. Other implementations may have other concerns. And this is just yet another reason why premature optimization is the root of all evil. Don’t use a technique that’s supposed “faster” unless you first measure it.

Therefore the “best” version to do string concatenation is to use + or +=. And if that turns out to be slow for you, which is pretty unlikely, then do something else.

So why do I use a lot of append/join in my code? Because sometimes it’s actually clearer. Especially when whatever you should concatenate together should be separated by spaces or commas or newlines.

Questions:
Answers:

If you are concatenating a lot of values, then neither. Appending a list is expensive. You can use StringIO for that. Especially if you are building it up over a lot of operations.

from cStringIO import StringIO
# python3:  from io import StringIO

buf = StringIO()

buf.write('foo')
buf.write('foo')
buf.write('foo')

buf.getvalue()
# 'foofoofoo'

If you already have a complete list returned to you from some other operation, then just use the ''.join(aList)

From the python FAQ: What is the most efficient way to concatenate many strings together?

str and bytes objects are immutable, therefore concatenating many
strings together is inefficient as each concatenation creates a new
object. In the general case, the total runtime cost is quadratic in
the total string length.

To accumulate many str objects, the recommended idiom is to place them
into a list and call str.join() at the end:

chunks = []
for s in my_strings:
    chunks.append(s)
result = ''.join(chunks)

(another reasonably efficient idiom is to use io.StringIO)

To accumulate many bytes objects, the recommended idiom is to extend a
bytearray object using in-place concatenation (the += operator):

result = bytearray()
for b in my_bytes_objects:
    result += b

Edit: I was silly and had the results pasted backwards, making it look like appending to a list was faster than cStringIO. I have also added tests for bytearray/str concat, as well as a second round of tests using a larger list with larger strings. (python 2.7.3)

ipython test example for large lists of strings

try:
    from cStringIO import StringIO
except:
    from io import StringIO

source = ['foo']*1000

%%timeit buf = StringIO()
for i in source:
    buf.write(i)
final = buf.getvalue()
# 1000 loops, best of 3: 1.27 ms per loop

%%timeit out = []
for i in source:
    out.append(i)
final = ''.join(out)
# 1000 loops, best of 3: 9.89 ms per loop

%%timeit out = bytearray()
for i in source:
    out += i
# 10000 loops, best of 3: 98.5 µs per loop

%%timeit out = ""
for i in source:
    out += i
# 10000 loops, best of 3: 161 µs per loop

## Repeat the tests with a larger list, containing
## strings that are bigger than the small string caching 
## done by the Python
source = ['foo']*1000

# cStringIO
# 10 loops, best of 3: 19.2 ms per loop

# list append and join
# 100 loops, best of 3: 144 ms per loop

# bytearray() +=
# 100 loops, best of 3: 3.8 ms per loop

# str() +=
# 100 loops, best of 3: 5.11 ms per loop

Questions:
Answers:

The recommended method is still to use append and join.

Questions:
Answers:

If the strings you are concatenating are literals, use String literal concatenation

re.compile(
        "[A-Za-z_]"       # letter or underscore
        "[A-Za-z0-9_]*"   # letter, digit or underscore
    )

This is useful if you want to comment on part of a string (as above) or if you want to use raw strings or triple quotes for part of a literal but not all.

Since this happens at the syntax layer it uses zero concatenation operators.

Questions:
Answers:

While somewhat dated, Code Like a Pythonista: Idiomatic Python recommends join() over + in this section. As does PythonSpeedPerformanceTips in its section on string concatenation, with the following disclaimer:

The accuracy of this section is disputed with respect to later
versions of Python. In CPython 2.5, string concatenation is fairly
fast, although this may not apply likewise to other Python
implementations. See ConcatenationTestCode for a discussion.

Questions:
Answers:

Using in place string concatenation by ‘+’ is THE WORST method of concatenation in terms of stability and cross implementation as it does not support all values. PEP8 standard discourages this and encourages the use of format(), join() and append() for long term use.

Questions:
Answers:

You can use this(more efficient) too. (https://softwareengineering.stackexchange.com/questions/304445/why-is-s-better-than-for-concatenation)

s += "%s" %(stringfromelsewhere)

Questions:
Answers:

You write this function

def str_join(*args):
    return ''.join(map(str, args))

Then you can call simply wherever you want

str_join('Pine')  # Returns : Pine
str_join('Pine', 'apple')  # Returns : Pineapple
str_join('Pine', 'apple', 3)  # Returns : Pineapple3