I have to insert 8000+ records into a SQLite database using Django’s ORM. This operation needs to be run as a cronjob about once per minute.
At the moment I’m using a for loop to iterate through all the items and then insert them one by one.
    for item in items:
        entry = Entry(a1=item.a1, a2=item.a2)
        entry.save()
What is an efficient way of doing this?
Edit: A little comparison between the two insertion methods.
Without commit_manually decorator (11245 records):
    $ time python manage.py insrec
    real    1m50.288s
    user    0m6.710s
    sys     0m23.445s
Using commit_manually decorator (11245 records):
    $ time python manage.py insrec
    real    0m18.464s
    user    0m5.433s
    sys     0m10.163s
Note: The test script also does some other operations besides inserting into the database (downloads a ZIP file, extracts an XML file from the ZIP archive, parses the XML file) so the time needed for execution does not necessarily represent the time needed to insert the records.
You want to check out transaction.commit_manually.
So it would be something like:
    from django.db import transaction

    @transaction.commit_manually
    def viewfunc(request):
        ...
        for item in items:
            entry = Entry(a1=item.a1, a2=item.a2)
            entry.save()
        transaction.commit()
This commits only once, instead of at each save().
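To make the difference concrete, here is a minimal sketch that uses Python's standard sqlite3 module rather than the Django ORM. The table name `entry` and the helpers `make_db` and `insert_rows` are made up for illustration. Both strategies store the same rows; what changes is how many transactions SQLite has to complete.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical `entry` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entry (a1 TEXT, a2 TEXT)")
    return conn


def insert_rows(conn, rows, commit_each):
    """Insert rows, committing after every row or once at the end."""
    cur = conn.cursor()
    for a1, a2 in rows:
        cur.execute("INSERT INTO entry (a1, a2) VALUES (?, ?)", (a1, a2))
        if commit_each:
            conn.commit()  # one transaction per row
    if not commit_each:
        conn.commit()  # a single transaction for the whole batch


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]


rows = [("a%d" % i, "b%d" % i) for i in range(1000)]

per_row = make_db()
insert_rows(per_row, rows, commit_each=True)

single = make_db()
insert_rows(single, rows, commit_each=False)
```

An in-memory database will not show much of a timing gap. The cost of per-row commits shows up mainly with an on-disk database file, where each commit typically has to be flushed to storage.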
In Django 1.3, the transaction functions became usable as context managers.
So now you can use transaction.commit_on_success() in a similar way:
    from django.db import transaction

    def viewfunc(request):
        ...
        with transaction.commit_on_success():
            for item in items:
                entry = Entry(a1=item.a1, a2=item.a2)
                entry.save()
In Django 1.4, bulk_create was added, allowing you to create lists of your model objects and then commit them all at once.
NOTE: the model's save() method will not be called when using bulk_create.
    >>> Entry.objects.bulk_create([
    ...     Entry(headline="Django 1.0 Released"),
    ...     Entry(headline="Django 1.1 Announced"),
    ...     Entry(headline="Breaking: Django is awesome")
    ... ])
In Django 1.6, transaction.atomic was introduced, intended to replace the now-legacy functions commit_on_success and commit_manually.
From the Django documentation on atomic:
atomic is usable both as a decorator:
    from django.db import transaction

    @transaction.atomic
    def viewfunc(request):
        # This code executes inside a transaction.
        do_stuff()
and as a context manager:
    from django.db import transaction

    def viewfunc(request):
        # This code executes in autocommit mode (Django's default).
        do_stuff()

        with transaction.atomic():
            # This code executes inside a transaction.
            do_more_stuff()
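The commit-on-success, roll-back-on-error behaviour of such a block can be tried out without a Django project. The sketch below is only an analogy: it uses the connection context manager from Python's standard sqlite3 module, not Django's atomic, and the table `entry` and the helper `insert_all` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (a1 TEXT NOT NULL)")


def insert_all(conn, values):
    """Insert all values in one transaction: all of them or none."""
    # `with conn:` commits if the block finishes and rolls back if it raises.
    with conn:
        for value in values:
            conn.execute("INSERT INTO entry (a1) VALUES (?)", (value,))


insert_all(conn, ["x", "y"])  # succeeds, both rows are committed

try:
    insert_all(conn, ["z", None])  # None violates the NOT NULL constraint
except sqlite3.IntegrityError:
    pass  # the whole batch, including "z", has been rolled back

stored = [row[0] for row in conn.execute("SELECT a1 FROM entry ORDER BY a1")]
```

After the failed second batch, only the rows from the first batch remain, which is the property that makes a single wrapping transaction safe for a periodic import job.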
Bulk creation is available in Django 1.4:
Have a look at this. It’s meant for use out-of-the-box with MySQL only, but there are pointers on what to do for other databases.
You might be better off bulk-loading the items – prepare a file and use a bulk load tool. This will be vastly more efficient than 8000 individual inserts.
You should check out DSE. I wrote DSE to solve these kinds of problems (massive inserts or updates). Using the Django ORM for this is a dead end; you have to do it in plain SQL, and DSE takes care of much of that for you.
To answer the question specifically with regard to SQLite, as asked: I have just confirmed that bulk_create does provide a tremendous speedup, but there is a limitation with SQLite: “The default is to create all objects in one batch, except for SQLite where the default is such that at maximum 999 variables per query is used.”
The quote is from the docs; A-IV provided a link.
What I have to add is that this djangosnippets entry by alpar also seems to be working for me. It’s a little wrapper that breaks the big batch that you want to process into smaller batches, managing the 999 variables limit.
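The batching idea can be sketched in a few lines of plain Python. This is not the djangosnippets code, just an illustration under stated assumptions: the helper name `batches` and its parameters are hypothetical, and `fields_per_row` stands for the number of bound variables each inserted row needs (2 for a model with only a1 and a2).

```python
def batches(objs, fields_per_row, max_vars=999):
    """Yield slices of objs small enough to stay under max_vars bound variables."""
    # Each row consumes fields_per_row variables, so this many rows fit in one query.
    size = max(1, max_vars // fields_per_row)
    for start in range(0, len(objs), size):
        yield objs[start:start + size]


# Stand-in for 8000 unsaved model instances.
rows = list(range(8000))
chunks = list(batches(rows, fields_per_row=2))
```

In a Django project, each chunk would then be passed to Entry.objects.bulk_create(chunk). Later Django versions also accept a batch_size argument to bulk_create, which makes a manual wrapper unnecessary.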
I recommend using plain SQL (not the ORM). You can insert multiple rows with a single insert:
    insert into A select * from B;
The "select * from B" portion of your SQL can be as complicated as you want, as long as the results match the columns in table A and there are no constraint conflicts.
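A small runnable illustration of that statement, using Python's standard sqlite3 module with two hypothetical tables named a and b. The column names and the WHERE filter are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (a1 TEXT, a2 TEXT);
    CREATE TABLE a (a1 TEXT, a2 TEXT);
""")

# Fill the source table b with a few sample rows.
conn.executemany(
    "INSERT INTO b (a1, a2) VALUES (?, ?)",
    [("x1", "y1"), ("x2", "y2"), ("x3", "y3")],
)

# One statement copies every matching row from b into a.
with conn:
    conn.execute("INSERT INTO a (a1, a2) SELECT a1, a2 FROM b WHERE a1 <> 'x2'")

copied = conn.execute("SELECT a1, a2 FROM a ORDER BY a1").fetchall()
```

Listing the columns explicitly on both sides keeps the statement valid even if the two tables do not have identical column order.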
I’ve run into the same problem and I can’t figure out a way to do it without so many inserts.
I agree that using transactions is probably the right way to solve it, but here is my hack:
    def viewfunc(request):
        ...
        to_save = []
        for item in items:
            entry = Entry(a1=item.a1, a2=item.a2)
            to_save.append(entry)
        # Save in an explicit loop: in Python 3, map() is lazy and would never call save().
        for entry in to_save:
            entry.save()