Home » Php » performance – Preferred method to store PHP arrays (json_encode vs serialize)

performance – Preferred method to store PHP arrays (json_encode vs serialize)

Posted by: admin February 22, 2020 Leave a comment

Questions:

I need to store a multi-dimensional associative array of data in a flat file for caching purposes. I might occasionally come across the need to convert it to JSON for use in my web app but the vast majority of the time I will be using the array directly in PHP.

Would it be more efficient to store the array as JSON or as a PHP serialized array in this text file? I’ve looked around and it seems that in the newest versions of PHP (5.3), json_decode is actually faster than unserialize.

I’m currently leaning towards storing the array as JSON as I feel its easier to read by a human if necessary, it can be used in both PHP and JavaScript with very little effort, and from what I’ve read, it might even be faster to decode (not sure about encoding, though).

Does anyone know of any pitfalls? Anyone have good benchmarks to show the performance benefits of either method?

How to&Answers:

Depends on your priorities.

If performance is your absolute driving characteristic, then by all means use the fastest one. Just make sure you have a full understanding of the differences before you make a choice

  • Unlike serialize() you need to add extra parameter to keep UTF-8 characters untouched: json_encode($array, JSON_UNESCAPED_UNICODE) (otherwise it converts UTF-8 characters to Unicode escape sequences).
  • JSON will have no memory of what the object’s original class was (they are always restored as instances of stdClass).
  • You can’t leverage __sleep() and __wakeup() with JSON
  • By default, only public properties are serialized with JSON. (in PHP>=5.4 you can implement JsonSerializable to change this behavior).
  • JSON is more portable

And there’s probably a few other differences I can’t think of at the moment.

A simple speed test to compare the two

<?php

ini_set('display_errors', 1);
error_reporting(E_ALL);

// Make a big, honkin test array
// You may need to adjust this depth to avoid memory limit errors
$testArray = fillArray(0, 5);

// Time json encoding
$start = microtime(true);
json_encode($testArray);
$jsonTime = microtime(true) - $start;
echo "JSON encoded in $jsonTime seconds\n";

// Time serialization
$start = microtime(true);
serialize($testArray);
$serializeTime = microtime(true) - $start;
echo "PHP serialized in $serializeTime seconds\n";

// Compare them
if ($jsonTime < $serializeTime) {
    printf("json_encode() was roughly %01.2f%% faster than serialize()\n", ($serializeTime / $jsonTime - 1) * 100);
}
else if ($serializeTime < $jsonTime ) {
    printf("serialize() was roughly %01.2f%% faster than json_encode()\n", ($jsonTime / $serializeTime - 1) * 100);
} else {
    echo "Impossible!\n";
}

function fillArray( $depth, $max ) {
    static $seed;
    if (is_null($seed)) {
        $seed = array('a', 2, 'c', 4, 'e', 6, 'g', 8, 'i', 10);
    }
    if ($depth < $max) {
        $node = array();
        foreach ($seed as $key) {
            $node[$key] = fillArray($depth + 1, $max);
        }
        return $node;
    }
    return 'empty';
}

Answer:

JSON is simpler and faster than PHP’s serialization format and should be used unless:

  • You’re storing deeply nested arrays:
    json_decode(): “This function will return false if the JSON encoded data is deeper than 127 elements.”
  • You’re storing objects that need to be unserialized as the correct class
  • You’re interacting with old PHP versions that don’t support json_decode

Answer:

I’ve written a blogpost about this subject: “Cache a large array: JSON, serialize or var_export?“. In this post it is shown that serialize is the best choice for small to large sized arrays. For very large arrays (> 70MB) JSON is the better choice.

Answer:

You might also be interested in https://github.com/phadej/igbinary – which provides a different serialization ‘engine’ for PHP.

My random/arbitrary ‘performance’ figures, using PHP 5.3.5 on a 64bit platform show :

JSON :

  • JSON encoded in 2.180496931076 seconds
  • JSON decoded in 9.8368630409241 seconds
  • serialized “String” size : 13993

Native PHP :

  • PHP serialized in 2.9125759601593 seconds
  • PHP unserialized in 6.4348418712616 seconds
  • serialized “String” size : 20769

Igbinary :

  • WIN igbinary serialized in 1.6099879741669 seconds
  • WIN igbinrary unserialized in 4.7737920284271 seconds
  • WIN serialized “String” Size : 4467

So, it’s quicker to igbinary_serialize() and igbinary_unserialize() and uses less disk space.

I used the fillArray(0, 3) code as above, but made the array keys longer strings.

igbinary can store the same data types as PHP’s native serialize can (So no problem with objects etc) and you can tell PHP5.3 to use it for session handling if you so wish.

See also http://ilia.ws/files/zendcon_2010_hidden_features.pdf – specifically slides 14/15/16

Answer:

Y just tested serialized and json encode and decode, plus the size it will take the string stored.

JSON encoded in 0.067085981369 seconds. Size (1277772)
PHP serialized in 0.12110209465 seconds. Size (1955548)
JSON decode in 0.22470498085 seconds
PHP serialized in 0.211947917938 seconds
json_encode() was roughly 80.52% faster than serialize()
unserialize() was roughly 6.02% faster than json_decode()
JSON string was roughly 53.04% smaller than Serialized string

We can conclude that JSON encodes faster and results a smaller string, but unserialize is faster to decode the string.

Answer:

If you are caching information that you will ultimately want to “include” at a later point in time, you may want to try using var_export. That way you only take the hit in the “serialize” and not in the “unserialize”.

Answer:

I augmented the test to include unserialization performance. Here are the numbers I got.

Serialize

JSON encoded in 2.5738489627838 seconds
PHP serialized in 5.2861361503601 seconds
Serialize: json_encode() was roughly 105.38% faster than serialize()


Unserialize

JSON decode in 10.915472984314 seconds
PHP unserialized in 7.6223039627075 seconds
Unserialize: unserialize() was roughly 43.20% faster than json_decode() 

So json seems to be faster for encoding but slow in decoding. So it could depend upon your application and what you expect to do the most.

Answer:

Really nice topic and after reading the few answers, I want to share my experiments on the subject.

I got a use case where some “huge” table needs to be queried almost every time I talk to the database (don’t ask why, just a fact). The database caching system isn’t appropriate as it’ll not cache the different requests, so I though about php caching systems.

I tried apcu but it didn’t fit the needs, memory isn’t enough reliable in this case. Next step was to cache into a file with serialization.

Table has 14355 entries with 18 columns, those are my tests and stats on reading the serialized cache:

JSON:

As you all said, the major inconvenience with json_encode/json_decode is that it transforms everything to an StdClass instance (or Object). If you need to loop it, transforming it to an array is what you’ll probably do, and yes it’s increasing the transformation time

average time: 780.2 ms; memory use: 41.5MB; cache file size: 3.8MB

Msgpack

@hutch mentions msgpack. Pretty website. Let’s give it a try shall we?

average time: 497 ms; memory use: 32MB; cache file size: 2.8MB

That’s better, but requires a new extension; compiling sometimes afraid people…

IgBinary

@GingerDog mentions igbinary. Note that I’ve set the igbinary.compact_strings=Offbecause I care more about reading performances than file size.

average time: 411.4 ms; memory use: 36.75MB; cache file size: 3.3MB

Better than msg pack. Still, this one requires compiling too.

serialize/unserialize

average time: 477.2 ms; memory use: 36.25MB; cache file size: 5.9MB

Better performances than JSON, the bigger the array is, slower json_decode is, but you already new that.

Those external extensions are narrowing down the file size and seems great on paper. Numbers don’t lie*. What’s the point of compiling an extension if you get almost the same results that you’d have with a standard PHP function?

We can also deduce that depending on your needs, you will choose something different than someone else:

  • IgBinary is really nice and performs better than MsgPack
  • Msgpack is better at compressing your datas (note that I didn’t tried the igbinary
    compact.string option).
  • Don’t want to compile? Use standards.

That’s it, another serialization methods comparison to help you choose the one!

*Tested with PHPUnit 3.7.31, php 5.5.10 – only decoding with a standard hardrive and old dual core CPU – average numbers on 10 same use case tests, your stats might be different

Answer:

Seems like serialize is the one I’m going to use for 2 reasons:

  • Someone pointed out that unserialize is faster than json_decode and a ‘read’ case sounds more probable than a ‘write’ case.

  • I’ve had trouble with json_encode when having strings with invalid UTF-8 characters. When that happens the string ends up being empty causing loss of information.

Answer:

I’ve tested this very thoroughly on a fairly complex, mildly nested multi-hash with all kinds of data in it (string, NULL, integers), and serialize/unserialize ended up much faster than json_encode/json_decode.

The only advantage json have in my tests was it’s smaller ‘packed’ size.

These are done under PHP 5.3.3, let me know if you want more details.

Here are tests results then the code to produce them. I can’t provide the test data since it’d reveal information that I can’t let go out in the wild.

JSON encoded in 2.23700618744 seconds
PHP serialized in 1.3434419632 seconds
JSON decoded in 4.0405561924 seconds
PHP unserialized in 1.39393305779 seconds

serialized size : 14549
json_encode size : 11520
serialize() was roughly 66.51% faster than json_encode()
unserialize() was roughly 189.87% faster than json_decode()
json_encode() string was roughly 26.29% smaller than serialize()

//  Time json encoding
$start = microtime( true );
for($i = 0; $i < 10000; $i++) {
    json_encode( $test );
}
$jsonTime = microtime( true ) - $start;
echo "JSON encoded in $jsonTime seconds<br>";

//  Time serialization
$start = microtime( true );
for($i = 0; $i < 10000; $i++) {
    serialize( $test );
}
$serializeTime = microtime( true ) - $start;
echo "PHP serialized in $serializeTime seconds<br>";

//  Time json decoding
$test2 = json_encode( $test );
$start = microtime( true );
for($i = 0; $i < 10000; $i++) {
    json_decode( $test2 );
}
$jsonDecodeTime = microtime( true ) - $start;
echo "JSON decoded in $jsonDecodeTime seconds<br>";

//  Time deserialization
$test2 = serialize( $test );
$start = microtime( true );
for($i = 0; $i < 10000; $i++) {
    unserialize( $test2 );
}
$unserializeTime = microtime( true ) - $start;
echo "PHP unserialized in $unserializeTime seconds<br>";

$jsonSize = strlen(json_encode( $test ));
$phpSize = strlen(serialize( $test ));

echo "<p>serialized size : " . strlen(serialize( $test )) . "<br>";
echo "json_encode size : " . strlen(json_encode( $test )) . "<br></p>";

//  Compare them
if ( $jsonTime < $serializeTime )
{
    echo "json_encode() was roughly " . number_format( ($serializeTime / $jsonTime - 1 ) * 100, 2 ) . "% faster than serialize()";
}
else if ( $serializeTime < $jsonTime )
{
    echo "serialize() was roughly " . number_format( ($jsonTime / $serializeTime - 1 ) * 100, 2 ) . "% faster than json_encode()";
} else {
    echo 'Unpossible!';
}
    echo '<BR>';

//  Compare them
if ( $jsonDecodeTime < $unserializeTime )
{
    echo "json_decode() was roughly " . number_format( ($unserializeTime / $jsonDecodeTime - 1 ) * 100, 2 ) . "% faster than unserialize()";
}
else if ( $unserializeTime < $jsonDecodeTime )
{
    echo "unserialize() was roughly " . number_format( ($jsonDecodeTime / $unserializeTime - 1 ) * 100, 2 ) . "% faster than json_decode()";
} else {
    echo 'Unpossible!';
}
    echo '<BR>';
//  Compare them
if ( $jsonSize < $phpSize )
{
    echo "json_encode() string was roughly " . number_format( ($phpSize / $jsonSize - 1 ) * 100, 2 ) . "% smaller than serialize()";
}
else if ( $phpSize < $jsonSize )
{
    echo "serialize() string was roughly " . number_format( ($jsonSize / $phpSize - 1 ) * 100, 2 ) . "% smaller than json_encode()";
} else {
    echo 'Unpossible!';
}

Answer:

I made a small benchmark as well. My results were the same. But I need the decode performance. Where I noticed, like a few people above said as well, unserialize is faster than json_decode. unserialize takes roughly 60-70% of the json_decode time. So the conclusion is fairly simple:
When you need performance in encoding, use json_encode, when you need performance when decoding, use unserialize. Because you can not merge the two functions you have to make a choise where you need more performance.

My benchmark in pseudo:

  • Define array $arr with a few random keys and values
  • for x < 100; x++; serialize and json_encode a array_rand of $arr
  • for y < 1000; y++; json_decode the json encoded string – calc time
  • for y < 1000; y++; unserialize the serialized string – calc time
  • echo the result which was faster

On avarage: unserialize won 96 times over 4 times the json_decode. With an avarage of roughly 1.5ms over 2.5ms.

Answer:

Before you make your final decision, be aware that the JSON format is not safe for associative arrays – json_decode() will return them as objects instead:

$config = array(
    'Frodo'   => 'hobbit',
    'Gimli'   => 'dwarf',
    'Gandalf' => 'wizard',
    );
print_r($config);
print_r(json_decode(json_encode($config)));

Output is:

Array
(
    [Frodo] => hobbit
    [Gimli] => dwarf
    [Gandalf] => wizard
)
stdClass Object
(
    [Frodo] => hobbit
    [Gimli] => dwarf
    [Gandalf] => wizard
)

Answer:

First, I changed the script to do some more benchmarking (and also do 1000 runs instead of just 1):

<?php

ini_set('display_errors', 1);
error_reporting(E_ALL);

// Make a big, honkin test array
// You may need to adjust this depth to avoid memory limit errors
$testArray = fillArray(0, 5);

$totalJsonTime = 0;
$totalSerializeTime = 0;
$totalJsonWins = 0;

for ($i = 0; $i < 1000; $i++) {
    // Time json encoding
    $start = microtime(true);
    $json = json_encode($testArray);
    $jsonTime = microtime(true) - $start;
    $totalJsonTime += $jsonTime;

    // Time serialization
    $start = microtime(true);
    $serial = serialize($testArray);
    $serializeTime = microtime(true) - $start;
    $totalSerializeTime += $serializeTime;

    if ($jsonTime < $serializeTime) {
        $totalJsonWins++;
    }
}

$totalSerializeWins = 1000 - $totalJsonWins;

// Compare them
if ($totalJsonTime < $totalSerializeTime) {
    printf("json_encode() (wins: $totalJsonWins) was roughly %01.2f%% faster than serialize()\n", ($totalSerializeTime / $totalJsonTime - 1) * 100);
} else {
    printf("serialize() (wins: $totalSerializeWins) was roughly %01.2f%% faster than json_encode()\n", ($totalJsonTime / $totalSerializeTime - 1) * 100);
}

$totalJsonTime = 0;
$totalJson2Time = 0;
$totalSerializeTime = 0;
$totalJsonWins = 0;

for ($i = 0; $i < 1000; $i++) {
    // Time json decoding
    $start = microtime(true);
    $orig = json_decode($json, true);
    $jsonTime = microtime(true) - $start;
    $totalJsonTime += $jsonTime;

    $start = microtime(true);
    $origObj = json_decode($json);
    $jsonTime2 = microtime(true) - $start;
    $totalJson2Time += $jsonTime2;

    // Time serialization
    $start = microtime(true);
    $unserial = unserialize($serial);
    $serializeTime = microtime(true) - $start;
    $totalSerializeTime += $serializeTime;

    if ($jsonTime < $serializeTime) {
        $totalJsonWins++;
    }
}

$totalSerializeWins = 1000 - $totalJsonWins;


// Compare them
if ($totalJsonTime < $totalSerializeTime) {
    printf("json_decode() was roughly %01.2f%% faster than unserialize()\n", ($totalSerializeTime / $totalJsonTime - 1) * 100);
} else {
    printf("unserialize() (wins: $totalSerializeWins) was roughly %01.2f%% faster than json_decode()\n", ($totalJsonTime / $totalSerializeTime - 1) * 100);
}

// Compare them
if ($totalJson2Time < $totalSerializeTime) {
    printf("json_decode() was roughly %01.2f%% faster than unserialize()\n", ($totalSerializeTime / $totalJson2Time - 1) * 100);
} else {
    printf("unserialize() (wins: $totalSerializeWins) was roughly %01.2f%% faster than array json_decode()\n", ($totalJson2Time / $totalSerializeTime - 1) * 100);
}

function fillArray( $depth, $max ) {
    static $seed;
    if (is_null($seed)) {
        $seed = array('a', 2, 'c', 4, 'e', 6, 'g', 8, 'i', 10);
    }
    if ($depth < $max) {
        $node = array();
        foreach ($seed as $key) {
            $node[$key] = fillArray($depth + 1, $max);
        }
        return $node;
    }
    return 'empty';
}

I used this build of PHP 7:

PHP 7.0.14 (cli) (built: Jan 18 2017 19:13:23) ( NTS ) Copyright (c)
1997-2016 The PHP Group Zend Engine v3.0.0, Copyright (c) 1998-2016
Zend Technologies
with Zend OPcache v7.0.14, Copyright (c) 1999-2016, by Zend Technologies

And my results were:

serialize() (wins: 999) was roughly 10.98% faster than json_encode()
unserialize() (wins: 987) was roughly 33.26% faster than json_decode()
unserialize() (wins: 987) was roughly 48.35% faster than array
json_decode()

So clearly, serialize/unserialize is the fastest method, while json_encode/decode is the most portable.

If you consider a scenario where you read/write serialized data 10x or more often than you need to send to or receive from a non-PHP system, you are STILL better off to use serialize/unserialize and have it json_encode or json_decode prior to serialization in terms of time.

Answer:

Check out the results here (sorry for the hack putting the PHP code in the JS code box):

http://jsfiddle.net/newms87/h3b0a0ha/embedded/result/

RESULTS: serialize() and unserialize() are both significantly faster in PHP 5.4 on arrays of varying size.

I made a test script on real world data for comparing json_encode vs serialize and json_decode vs unserialize. The test was run on the caching system of an in production e-commerce site. It simply takes the data already in the cache, and tests the times to encode / decode (or serialize / unserialize) all the data and I put it in an easy to see table.

I ran this on PHP 5.4 shared hosting server.

The results were very conclusive that for these large to small data sets serialize and unserialize were the clear winners. In particular for my use case, the json_decode and unserialize are the most important for the caching system. Unserialize was almost an ubiquitous winner here. It was typically 2 to 4 times (sometimes 6 or 7 times) as fast as json_decode.

It is interesting to note the difference in results from @peter-bailey.

Here is the PHP code used to generate the results:

<?php

ini_set('display_errors', 1);
error_reporting(E_ALL);

function _count_depth($array)
{
    $count     = 0;
    $max_depth = 0;
    foreach ($array as $a) {
        if (is_array($a)) {
            list($cnt, $depth) = _count_depth($a);
            $count += $cnt;
            $max_depth = max($max_depth, $depth);
        } else {
            $count++;
        }
    }

    return array(
        $count,
        $max_depth + 1,
    );
}

function run_test($file)
{
    $memory     = memory_get_usage();
    $test_array = unserialize(file_get_contents($file));
    $memory     = round((memory_get_usage() - $memory) / 1024, 2);

    if (empty($test_array) || !is_array($test_array)) {
        return;
    }

    list($count, $depth) = _count_depth($test_array);

    //JSON encode test
    $start            = microtime(true);
    $json_encoded     = json_encode($test_array);
    $json_encode_time = microtime(true) - $start;

    //JSON decode test
    $start = microtime(true);
    json_decode($json_encoded);
    $json_decode_time = microtime(true) - $start;

    //serialize test
    $start          = microtime(true);
    $serialized     = serialize($test_array);
    $serialize_time = microtime(true) - $start;

    //unserialize test
    $start = microtime(true);
    unserialize($serialized);
    $unserialize_time = microtime(true) - $start;

    return array(
        'Name'                   => basename($file),
        'json_encode() Time (s)' => $json_encode_time,
        'json_decode() Time (s)' => $json_decode_time,
        'serialize() Time (s)'   => $serialize_time,
        'unserialize() Time (s)' => $unserialize_time,
        'Elements'               => $count,
        'Memory (KB)'            => $memory,
        'Max Depth'              => $depth,
        'json_encode() Win'      => ($json_encode_time > 0 && $json_encode_time < $serialize_time) ? number_format(($serialize_time / $json_encode_time - 1) * 100, 2) : '',
        'serialize() Win'        => ($serialize_time > 0 && $serialize_time < $json_encode_time) ? number_format(($json_encode_time / $serialize_time - 1) * 100, 2) : '',
        'json_decode() Win'      => ($json_decode_time > 0 && $json_decode_time < $serialize_time) ? number_format(($serialize_time / $json_decode_time - 1) * 100, 2) : '',
        'unserialize() Win'      => ($unserialize_time > 0 && $unserialize_time < $json_decode_time) ? number_format(($json_decode_time / $unserialize_time - 1) * 100, 2) : '',
    );
}

$files = glob(dirname(__FILE__) . '/system/cache/*');

$data = array();

foreach ($files as $file) {
    if (is_file($file)) {
        $result = run_test($file);

        if ($result) {
            $data[] = $result;
        }
    }
}

uasort($data, function ($a, $b) {
    return $a['Memory (KB)'] < $b['Memory (KB)'];
});

$fields = array_keys($data[0]);
?>

<table>
    <thead>
    <tr>
        <?php foreach ($fields as $f) { ?>
            <td style="text-align: center; border:1px solid black;padding: 4px 8px;font-weight:bold;font-size:1.1em"><?= $f; ?></td>
        <?php } ?>
    </tr>
    </thead>

    <tbody>
    <?php foreach ($data as $d) { ?>
        <tr>
            <?php foreach ($d as $key => $value) { ?>
                <?php $is_win = strpos($key, 'Win'); ?>
                <?php $color = ($is_win && $value) ? 'color: green;font-weight:bold;' : ''; ?>
                <td style="text-align: center; vertical-align: middle; padding: 3px 6px; border: 1px solid gray; <?= $color; ?>"><?= $value . (($is_win && $value) ? '%' : ''); ?></td>
            <?php } ?>
        </tr>
    <?php } ?>
    </tbody>
</table>

Answer:

just an fyi — if you want to serialize your data to something easy to read and understand like JSON but with more compression and higher performance, you should check out messagepack.

Answer:

JSON is better if you want to backup Data and restore it on a different machine or via FTP.

For example with serialize if you store data on a Windows server, download it via FTP and restore it on a Linux one it could not work any more due to the charachter re-encoding, because serialize stores the length of the strings and in the Unicode > UTF-8 transcoding some 1 byte charachter could became 2 bytes long making the algorithm crash.

Answer:

THX – for this benchmark code:

My results on array I use for configuration are as fallows:
JSON encoded in 0.0031511783599854 seconds
PHP serialized in 0.0037961006164551 seconds
json_encode() was roughly 20.47% faster than serialize()
JSON encoded in 0.0070841312408447 seconds
PHP serialized in 0.0035839080810547 seconds
unserialize() was roughly 97.66% faster than json_encode()

So – test it on your own data.

Answer:

If to summ up what people say here, json_decode/encode seems faster than serialize/unserialize BUT
If you do var_dump the type of the serialized object is changed.
If for some reason you want to keep the type, go with serialize!

(try for example stdClass vs array)

serialize/unserialize:

Array cache:
array (size=2)
  'a' => string '1' (length=1)
  'b' => int 2
Object cache:
object(stdClass)[8]
  public 'field1' => int 123
This cache:
object(Controller\Test)[8]
  protected 'view' => 

json encode/decode

Array cache:
object(stdClass)[7]
  public 'a' => string '1' (length=1)
  public 'b' => int 2
Object cache:
object(stdClass)[8]
  public 'field1' => int 123
This cache:
object(stdClass)[8]

As you can see the json_encode/decode converts all to stdClass, which is not that good, object info lost… So decide based on needs, especially if it is not only arrays…

Answer:

I would suggest you to use Super Cache, which is a file cache mechanism which won’t use json_encode or serialize. It is simple to use and really fast compared to other PHP Cache mechanism.

https://packagist.org/packages/smart-php/super-cache

Ex:

<?php
require __DIR__.'/vendor/autoload.php';
use SuperCache\SuperCache as sCache;

//Saving cache value with a key
// sCache::cache('<key>')->set('<value>');
sCache::cache('myKey')->set('Key_value');

//Retrieving cache value with a key
echo sCache::cache('myKey')->get();
?>