How to add an extra column to a NumPy array

Questions:

Let's say I have a NumPy array `a`:

``````a = np.array([[1,2,3], [2,3,4]])
``````

And I would like to add a column of zeros to get array `b`:

``````b = np.array([[1,2,3,0], [2,3,4,0]])
``````

How can I do this easily in NumPy?

I think a more straightforward solution, and a faster one to boot, is the following:

``````import numpy as np
N = 10
a = np.random.rand(N,N)
b = np.zeros((N,N+1))
b[:,:-1] = a
``````

And timings:

``````In [23]: N = 10

In [24]: a = np.random.rand(N,N)

In [25]: %timeit b = np.hstack((a,np.zeros((a.shape[0],1))))
10000 loops, best of 3: 19.6 us per loop

In [27]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
100000 loops, best of 3: 5.62 us per loop
``````
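
The preallocation approach above can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function name `append_zero_column` is made up for illustration, and passing `dtype=a.dtype` is an added assumption so that an integer input stays integer.

```python
import numpy as np

def append_zero_column(a):
    """Return a copy of the 2-D array `a` with one extra column of zeros on the right."""
    # Allocate the result once, one column wider than the input, same dtype.
    b = np.zeros((a.shape[0], a.shape[1] + 1), dtype=a.dtype)
    # Copy the original data into every column except the last one.
    b[:, :-1] = a
    return b

a = np.array([[1, 2, 3], [2, 3, 4]])
b = append_zero_column(a)
print(b)
```

The input array is left untouched; `b` is a new array.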

Answers:

`np.r_[ ... ]` and `np.c_[ ... ]`
are useful alternatives to `vstack` and `hstack`,
with square brackets [] instead of round ().
A couple of examples:

``````: import numpy as np
: N = 3
: A = np.eye(N)

: np.c_[ A, np.ones(N) ]              # add a column
array([[ 1.,  0.,  0.,  1.],
       [ 0.,  1.,  0.,  1.],
       [ 0.,  0.,  1.,  1.]])

: np.c_[ np.ones(N), A, np.ones(N) ]  # or two
array([[ 1.,  1.,  0.,  0.,  1.],
       [ 1.,  0.,  1.,  0.,  1.],
       [ 1.,  0.,  0.,  1.,  1.]])

: np.r_[ A, [A[1]] ]              # add a row
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.]])
: # not np.r_[ A, A[1] ]

: np.r_[ A[0], 1, 2, 3, A[1] ]    # mix vecs and scalars
array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

: np.r_[ A[0], [1, 2, 3], A[1] ]  # lists
array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

: np.r_[ A[0], (1, 2, 3), A[1] ]  # tuples
array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

: np.r_[ A[0], 1:4, A[1] ]        # same, 1:4 == arange(1,4) == 1,2,3
array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])
``````

(The reason for square brackets [] instead of round ()
is that Python accepts slice syntax such as 1:4 only inside square brackets.)
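
To illustrate the slice expansion, the following sketch compares `np.r_` with an explicit `np.arange` call. The variable names are illustrative only.

```python
import numpy as np

# Inside the square brackets of np.r_, the slice 1:4 is expanded like np.arange(1, 4).
from_slice = np.r_[0, 1:4, 9]

# The same result spelled out with ordinary function calls.
from_arange = np.concatenate(([0], np.arange(1, 4), [9]))

print(from_slice)
print(np.array_equal(from_slice, from_arange))
```

Writing `1:4` inside round brackets would be a syntax error, which is why `np.r_` and `np.c_` are indexed rather than called.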

Answers:

Use `numpy.append`:

``````>>> a = np.array([[1,2,3],[2,3,4]])
>>> a
array([[1, 2, 3],
       [2, 3, 4]])

>>> z = np.zeros((2,1), dtype=np.int64)
>>> z
array([[0],
       [0]])

>>> np.append(a, z, axis=1)
array([[1, 2, 3, 0],
       [2, 3, 4, 0]])
``````
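
The `axis` argument matters here. As a small sketch (variable names are illustrative): with `axis=1` the column is attached to the 2-D array, while leaving `axis` out makes `np.append` flatten both inputs first. In both cases the result is a new array and `a` itself is unchanged.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])
z = np.zeros((2, 1), dtype=a.dtype)

# With axis=1, z is appended as an extra column.
with_axis = np.append(a, z, axis=1)

# Without axis, both arrays are flattened before being joined.
without_axis = np.append(a, z)

print(with_axis.shape, without_axis.shape)
```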

Answers:

While writing the question I came up with one way, using `hstack`:

``````b = np.hstack((a, np.zeros((a.shape[0], 1), dtype=a.dtype)))
``````

Any other (more elegant) solutions are welcome!

Answers:

I think:

``````np.column_stack((a, np.zeros(a.shape[0])))
``````

is more elegant.
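
One thing to watch with this version: `np.zeros` defaults to floating point, so an integer `a` is upcast in the result. A short sketch of the difference follows; passing `dtype=a.dtype` is an addition that is not part of the answer above.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

# The default float zeros promote the whole result to a float dtype.
b_float = np.column_stack((a, np.zeros(a.shape[0])))

# Matching the dtype of a keeps the result an integer array.
b_int = np.column_stack((a, np.zeros(a.shape[0], dtype=a.dtype)))

print(b_float.dtype, b_int.dtype)
```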

Answers:

What I find most elegant is the following:

``````b = np.insert(a, 3, values=0, axis=1) # insert values before column 3
``````

An advantage of `insert` is that it also lets you insert columns (or rows) at other positions in the array. Also, instead of inserting a single value, you can easily insert a whole vector, for instance to duplicate the last column:

``````insert_index = 3  # assumed here: insert before column 3, i.e. at the end
b = np.insert(a, insert_index, values=a[:,2], axis=1)
``````

``````array([[1, 2, 3, 3],
       [2, 3, 4, 4]])
``````
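
As a sketch of inserting at other positions, the example below puts a zero column at the front and a vector in the middle. The values 9 and 8 are arbitrary illustration data.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

# Insert a column of zeros before column 0.
front = np.insert(a, 0, values=0, axis=1)

# Insert the vector [9, 8] as a new column before column 1.
middle = np.insert(a, 1, values=[9, 8], axis=1)

print(front)
print(middle)
```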

As for timing, `insert` is slower than JoshAdel’s solution in this test:

``````In [1]: N = 10

In [2]: a = np.random.rand(N,N)

In [3]: %timeit b = np.hstack((a,np.zeros((a.shape[0],1))))
100000 loops, best of 3: 7.5 us per loop

In [4]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
100000 loops, best of 3: 2.17 us per loop

In [5]: %timeit b = np.insert(a, 3, values=0, axis=1)
100000 loops, best of 3: 10.2 us per loop
``````

Answers:

`np.concatenate` also works:

``````>>> a = np.array([[1,2,3],[2,3,4]])
>>> a
array([[1, 2, 3],
       [2, 3, 4]])
>>> z = np.zeros((2,1))
>>> z
array([[ 0.],
       [ 0.]])
>>> np.concatenate((a, z), axis=1)
array([[ 1.,  2.,  3.,  0.],
       [ 2.,  3.,  4.,  0.]])
``````
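
Note that `np.concatenate` requires all inputs to have the same number of dimensions, which is why `z` has shape `(2, 1)` rather than `(2,)`. A hedged sketch of what happens otherwise, with illustrative variable names and an integer `z` so that the dtype of `a` is kept:

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])
z = np.zeros(a.shape[0], dtype=a.dtype)  # 1-D, shape (2,)

# Joining a 2-D array with a 1-D array along axis 1 is rejected.
try:
    np.concatenate((a, z), axis=1)
    failed = False
except ValueError:
    failed = True

# Adding a second axis to z turns it into a (2, 1) column that can be joined.
b = np.concatenate((a, z[:, np.newaxis]), axis=1)

print(failed)
print(b)
```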

Answers:

I like JoshAdel’s answer because of its focus on performance. A minor improvement is to avoid the overhead of initializing the whole array with zeros, only for most of it to be overwritten. The difference is measurable when N is large: use `empty` instead of `zeros`, and write the column of zeros as a separate step:

``````In [1]: import numpy as np

In [2]: N = 10000

In [3]: a = np.ones((N,N))

In [4]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
1 loops, best of 3: 492 ms per loop

In [5]: %timeit b = np.empty((a.shape[0],a.shape[1]+1)); b[:,:-1] = a; b[:,-1] = np.zeros((a.shape[0],))
1 loops, best of 3: 407 ms per loop
``````
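
The same idea as a reusable function. This is a sketch with a hypothetical name, `append_zero_column_empty`. It assigns the scalar `0` to the last column instead of building a temporary `np.zeros` vector, which is a small variation on the timed code and has not been benchmarked here.

```python
import numpy as np

def append_zero_column_empty(a):
    """Return a copy of the 2-D array `a` with a trailing column of zeros."""
    # np.empty skips initialization, so every element must be written below.
    b = np.empty((a.shape[0], a.shape[1] + 1), dtype=a.dtype)
    b[:, :-1] = a   # fill all but the last column with the input data
    b[:, -1] = 0    # fill the last column with zeros
    return b

a = np.ones((4, 3))
b = append_zero_column_empty(a)
print(b)
```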

Answers:

Assuming `M` is a (100,3) ndarray and `y` is a (100,) ndarray, `append` can be used as follows:

``````M = numpy.append(M, y[:, None], 1)
``````

The trick is to use

``````y[:, None]
``````

This converts `y` to a (100, 1) 2D array.

``````M.shape
``````

now gives

``````(100, 4)
``````
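
For reference, `y[:, None]` can be spelled in a few equivalent ways. The sketch below uses made-up data of the shapes described above; `M4` is an illustrative name for the result.

```python
import numpy as np

M = np.zeros((100, 3))
y = np.arange(100.0)

# Three equivalent ways to turn the (100,) vector into a (100, 1) column.
col_a = y[:, None]
col_b = y[:, np.newaxis]   # np.newaxis is an alias for None
col_c = y.reshape(-1, 1)

M4 = np.append(M, col_a, axis=1)
print(col_a.shape, M4.shape)
```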

Answers:

I was also interested in this question and compared the speed of

``````numpy.c_[a, a]
numpy.stack([a, a]).T
numpy.vstack([a, a]).T
numpy.ascontiguousarray(numpy.stack([a, a]).T)
numpy.ascontiguousarray(numpy.vstack([a, a]).T)
numpy.column_stack([a, a])
numpy.concatenate([a[:,None], a[:,None]], axis=1)
numpy.concatenate([a[None], a[None]], axis=0).T
``````

which all do the same thing for any input vector `a`. Timings for growing `a`:

[Figure: runtime of each variant versus `len(a)`, on logarithmic axes]

Note that all non-contiguous variants (in particular `stack`/`vstack`) are eventually faster than all contiguous variants. `column_stack` (for its clarity and speed) appears to be a good option if you require contiguity.
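
Whether a given result is contiguous can be checked through the array's `flags` rather than assumed. A small sketch, with illustrative variable names:

```python
import numpy as np

a = np.random.rand(5)

# Transposing the (2, 5) result of vstack gives a (5, 2) view, not a copy.
transposed = np.vstack([a, a]).T

# ascontiguousarray makes an explicit C-ordered copy of that view.
copied = np.ascontiguousarray(transposed)

stacked = np.column_stack([a, a])

# Print the shape and C-contiguity flag of each variant.
for name, arr in [("vstack(...).T", transposed),
                  ("ascontiguousarray", copied),
                  ("column_stack", stacked)]:
    print(name, arr.shape, arr.flags['C_CONTIGUOUS'])
```

All three hold the same values; only the memory layout can differ.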

Code to reproduce the plot:

``````import numpy
import perfplot

perfplot.show(
    setup=lambda n: numpy.random.rand(n),
    kernels=[
        lambda a: numpy.c_[a, a],
        lambda a: numpy.ascontiguousarray(numpy.stack([a, a]).T),
        lambda a: numpy.ascontiguousarray(numpy.vstack([a, a]).T),
        lambda a: numpy.column_stack([a, a]),
        lambda a: numpy.concatenate([a[:, None], a[:, None]], axis=1),
        lambda a: numpy.ascontiguousarray(numpy.concatenate([a[None], a[None]], axis=0).T),
        lambda a: numpy.stack([a, a]).T,
        lambda a: numpy.vstack([a, a]).T,
        lambda a: numpy.concatenate([a[None], a[None]], axis=0).T,
    ],
    labels=[
        'c_', 'ascont(stack)', 'ascont(vstack)', 'column_stack', 'concat',
        'ascont(concat)', 'stack (non-cont)', 'vstack (non-cont)',
        'concat (non-cont)'
    ],
    n_range=[2**k for k in range(20)],
    xlabel='len(a)',
    logx=True,
    logy=True,
)
``````

Answers:

A bit late to the party, but nobody has posted this answer yet, so for the sake of completeness: you can do this with a list comprehension on a plain Python list:

``````source = a.tolist()
result = [row + [0] for row in source]
b = np.array(result)
``````

Answers:

In my case, I had to add a column of ones to a NumPy array:

``````# X is a 1-D array: array([  6.1101,   5.5277, ... ]), with X.shape == (97,)
m = X.shape[0]
# np.int was removed in NumPy 1.24; the builtin int is equivalent
X = np.concatenate((np.ones((m, 1), dtype=int), X.reshape(m, 1)), axis=1)
``````

Afterwards, `X.shape` is `(97, 2)`:

``````array([[  1.    ,   6.1101],
       [  1.    ,   5.5277],
       ...
``````
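
A runnable version of the same idea, using only the two values shown above as stand-in data (the real array has 97 entries). `np.column_stack` is shown as a shorter alternative; that substitution is a suggestion, not part of the original answer.

```python
import numpy as np

# Stand-in data: only the first two values from the example above.
X = np.array([6.1101, 5.5277])
m = X.shape[0]

# Prepend a column of ones by joining two (m, 1) columns along axis 1.
with_ones = np.concatenate((np.ones((m, 1)), X.reshape(m, 1)), axis=1)

# column_stack accepts the 1-D vectors directly and treats each as a column.
alt = np.column_stack((np.ones(m), X))

print(with_ones)
```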