Home » Python » Dictionaries are ordered in Python 3.6+

Dictionaries are ordered in Python 3.6+

Posted by: admin November 1, 2017 Leave a comment

Questions:

Dictionaries are ordered in Python 3.6 (under the CPython implementation at least) unlike in previous incarnations. This seems like a substantial change, but it’s only a short paragraph in the documentation. It is described as a CPython implementation detail rather than a language feature, but also implies this may become standard in the future.

How does the new dictionary implementation perform better than the older one while preserving element order?

Here is the text from the documentation:

dict() now uses a “compact” representation pioneered by PyPy. The memory usage of the new dict() is between 20% and 25% smaller compared to Python 3.5. PEP 468 (Preserving the order of **kwargs in a function.) is implemented by this. The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon (this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5). (Contributed by INADA Naoki in issue 27350. Idea originally suggested by Raymond Hettinger.)

EDIT: Raymond Hettinger recently tweeted that “[having dicts ordered] is so convenient that a guarantee for [Python] 3.7 is almost inevitable”.

EDIT 2: Two examples of how CPython 3.6 dicts are not always displayed in the insertion order, with pprint and with IPython’s pprint.

Answers:

How does the Python 3.6 dictionary implementation perform better* than the older one while preserving element order?

Essentially, by keeping two arrays.

  • The first array, dk_entries, holds the entries (of type PyDictKeyEntry) for the dictionary in the order that they were inserted. Preserving order is achieved by this being an append only array where new items are always inserted at the end (insertion order).

  • The second, dk_indices, holds the indices for the dk_entries array (that is, values that indicate the position of the corresponding entry in dk_entries). This array acts as the hash table. When a key is hashed it leads to one of the indices stored in dk_indices and the corresponding entry is fetched by indexing dk_entries. Since only indices are kept, the type of this array depends on the overall size of the dictionary (ranging from type int8_t(1 byte) to int32_t/int64_t (4/8 bytes) on 32/64 bit builds)

In the previous implementation, a sparse array of type PyDictKeyEntry and size dk_size had to be allocated; unfortunately, it also resulted in a lot of empty space since that array was not allowed to be more than 2/3 * dk_size full for performance reasons. (and the empty space still had PyDictKeyEntry size!).

This is not the case now since only the required entries are stored (those that have been inserted) and a sparse array of type intX_t (X depending on dict size) 2/3 * dk_sizes full is kept. The empty space changed from type PyDictKeyEntry to intX_t.

So, obviously, creating a sparse array of type PyDictKeyEntry is much more memory demanding than a sparse array for storing ints.

You can see the full conversation on Python-Dev regarding this feature if interested, it is a good read.


In the original proposal made by Raymond Hettinger, a visualization of the data structures used can be seen which captures the gist of the idea.

For example, the dictionary:

d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

is currently stored as:

entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]

Instead, the data should be organized as follows:

indices =  [None, 1, None, None, None, 0, None, 2]
entries =  [[-9092791511155847987, 'timmy', 'red'],
            [-8522787127447073495, 'barry', 'green'],
            [-6480567542315338377, 'guido', 'blue']]

As you can visually now see, in the original proposal, a lot of space is essentially empty to reduce collisions and make look-ups faster. With the new approach, you reduce the memory required by moving the sparseness where it’s really required, in the indices.


*The new dictionary implementations performs better memory wise by being designed more compactly; that’s the main benefit here. Speed wise, the difference isn’t so drastic, there’s places where the new dict might introduce slight regressions (key-lookups, for example) while in others (iteration and resizing come to mind) a performance boost should be present.


Overall, the performance of the dictionary, especially in real-life situations, improves due to the compactness introduced.


Should you depend on it and/or use it?

No! Anything characterized as an implementation detail (as noted in the documentation) should be something you shouldn’t depend on. If you do it is going to be your fault if code you’ve written breaks in previous versions/different implementation and/or stops working in future releases.

Different implementations of Python aren’t required to make the dictionary ordered, rather, just support an ordered mapping where that is required (Notable examples are PEP 520: Preserving Class Attribute Definition Order and PEP 468: Preserving Keyword Argument Order)

If you want to write code that preserves the ordering and want it to not break on previous versions/different implementations you should use OrderedDict. Besides, OrderedDict will most likely eventually become a thin-wrapper around the new dict implementation.

Note: This might be elevated from implementation detail to feature at some point (as alluded by this tweet by RDH). If that happens the previous, of course, doesn’t apply.

Questions:
Answers:

Below is answering the original first question:

Should I use dict or OrderedDict in Python 3.6?

I think this sentence from the documentation is actually enough to answer your question

The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon

dict is not explicitly meant to be an ordered collection, so if you want to stay consistent and not rely on a side effect of the new implementation you should stick with OrderedDict.

Make your code future proof 🙂

There’s a debate about that here.