
Python+MySQL – Bulk Insert

Posted by: admin, November 29, 2017

Questions:

I’m working with the MySQLdb module in Python to interact with a database. I have a situation where there is a very large list (tens of thousands of elements) which I need to insert as rows into a table.

My solution right now is to generate a large INSERT statement as a string and execute it.

Is there a smarter way?

Answers:

There is a smarter way.

The problem with bulk insertion is that, by default, autocommit is enabled, so each INSERT statement is committed to stable storage before the next one can begin.

As the manual page notes:

By default, MySQL runs with autocommit mode enabled. This means that as soon as you execute a statement that updates (modifies) a table, MySQL stores the update on disk to make it permanent. To disable autocommit mode, use the following statement:

SET autocommit=0;

After disabling autocommit mode by setting the autocommit variable to zero, changes to transaction-safe tables (such as those for InnoDB, BDB, or NDBCLUSTER) are not made permanent immediately. You must use COMMIT to store your changes to disk or ROLLBACK to ignore the changes.

This is a pretty common feature of RDBMSs, which presume that database integrity is paramount. It can make bulk inserts take on the order of one second per insert instead of one millisecond. The alternative, building one overlarge INSERT statement, tries to achieve that single commit but risks overloading the SQL parser.
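
As a concrete illustration (a minimal sketch, not the answerer's code: the table and column names my_table, col1, and col2 are placeholders), you can batch the rows through executemany() and pay for only one commit at the end:

import MySQLdb

# Hypothetical connection parameters -- adjust for your setup.
con = MySQLdb.connect(host="localhost", user="user",
                      passwd="**", db="db name")
cur = con.cursor()

rows = [("a", 1), ("b", 2)]  # in practice: tens of thousands of tuples

# One parameterized statement, run for every tuple in the batch;
# nothing is made permanent until the single commit below.
cur.executemany(
    "INSERT INTO my_table (col1, col2) VALUES (%s, %s)",
    rows,
)

con.commit()  # a single commit for the whole batch
con.close()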

Answers:

If you have to insert a very large amount of data, why are you trying to insert all of it in one single INSERT? Building that huge statement string unnecessarily loads your memory, both while constructing it and while executing it, and it does not scale if the data grows even larger.

Why not issue one INSERT per row inside a for loop and commit all the changes at the end?

import MySQLdb

con = MySQLdb.connect(
    host="localhost",
    user="user",
    passwd="**",
    db="db name"
)
cur = con.cursor()

for data in your_data_list:
    # Parameterized INSERT; "my_table"/"col" are placeholders for your schema.
    cur.execute("INSERT INTO my_table (col) VALUES (%s)", (data,))

con.commit()
con.close()

(Believe me, this is really fast. If you are getting slower results, it means autocommit is on; set it to False, as msw says.)
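
With MySQLdb specifically, the connection object exposes an autocommit() method, so you can make that setting explicit right after connecting (a one-line sketch):

con.autocommit(False)  # each execute() stays in the open transaction until con.commit()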

Answers:

As long as you're doing it as a single INSERT and not thousands of individual ones, then yes, this is the best way to do it. Watch out that you don't exceed MySQL's maximum packet size, and adjust it if necessary. For example, this sets the server's packet maximum to 32MB; you need to do the same on the client too.

mysqld --max_allowed_packet=32M
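
To make the setting persistent, and to cover the "same on the client" advice, the option can also go in the MySQL configuration file (commonly my.cnf); a minimal sketch with both the server and command-line-client sections:

[mysqld]
max_allowed_packet=32M

[mysql]
max_allowed_packet=32M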