
Django custom migrations don't have to be scary

In the past few months, I've had the chance to write a couple of my own Django custom migrations. Up until then, I had seen this as a bit of a frightening prospect; any change to a migration can mean loss or corruption of data, after all.

You can think of a Django app's migrations as a sort of timeline of the shape of its data. If a system runs in multiple environments, running the same migrations in each one ensures that every database is up to speed with the latest changes.

Here are a couple of use cases. Note to anyone from work who may be reading this: obviously I've adapted/obfuscated important details 🙂

Use Case #1: revert incorrect data

In this example, we have a pretty straightforward ticketing system for our users. Unfortunately, some data has been corrupted, meaning for some of the tickets, the opened_at date is later than the closed_at date — to be precise, it has been offset by one week into the future. This causes a bug elsewhere (as well as being generally incorrect), so we need to revert the existing data.

from datetime import timedelta
from django.db import migrations

def revert_opened_at(apps, schema_editor):
	Ticket = apps.get_model("ticket", "Ticket")

	for ticket in Ticket.objects.all():
		opened_at = ticket.opened_at
		# not all tickets have been closed yet, so closed_at may still be None:
		closed_at = ticket.closed_at

		if closed_at and closed_at < opened_at:
			# shift the corrupted date back by a week and save only that field
			ticket.opened_at = opened_at - timedelta(days=7)
			ticket.save(update_fields=["opened_at"])
			
class Migration(migrations.Migration):
	dependencies = [
		("ticket", "0024_previous_migration_name"),
	]

	operations = [
		migrations.RunPython(
			# run the function we defined:
			code=revert_opened_at,
			# there is no reverse in this case, so we leave it "empty":
			reverse_code=migrations.RunPython.noop,
		),
	]
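
As an aside, looping over every ticket and saving them one by one is fine for small tables, but it can get slow. The same fix could be written as a single bulk update using an F expression. This is just a sketch, and it assumes your database backend supports duration arithmetic on datetime fields (PostgreSQL does, for example):

from datetime import timedelta
from django.db.models import F

def revert_opened_at(apps, schema_editor):
	Ticket = apps.get_model("ticket", "Ticket")
	# only touch the affected tickets, and fix them all in one query
	Ticket.objects.filter(
		closed_at__isnull=False,
		closed_at__lt=F("opened_at"),
	).update(opened_at=F("opened_at") - timedelta(days=7))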

Use Case #2: schedule a task

There is a bit of overlap here with Django Q, a package that manages scheduled tasks. In this case, we are working with an external SQL database from which we will continuously pull information.

The task itself is defined elsewhere; let's call it update_book_data. The exact logic is not relevant here: the task runs every day and checks whether the user has finished reading a new book and added it to the external database. If there is a new book, the task picks up the change, processes the data according to the schema of our own database, and imports it there.
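
To make the setup a little more concrete, here is a rough sketch of what such a task could look like. The "external_books" database alias and the local Book model are made up for this example; the real task obviously depends on both schemas:

from django.db import connections

from app.models import Book  # hypothetical local model

def update_book_data():
	# read finished books from the external database
	# (the alias would be configured in settings.DATABASES)
	with connections["external_books"].cursor() as cursor:
		cursor.execute("SELECT title, author, finished_at FROM finished_books")
		rows = cursor.fetchall()

	# map each external row onto our own schema and import it
	for title, author, finished_at in rows:
		Book.objects.update_or_create(
			title=title,
			author=author,
			defaults={"finished_at": finished_at},
		)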

Since the application runs in various environments, the task has to be added via a migration, which will then run automatically upon deployment.

from django.db import migrations

def add_book_data_import_task(apps, schema_editor):
	Schedule = apps.get_model("django_q", "Schedule")

	# check for an existing task, and if there isn't one, create it
	# (the time of day that this task will run is user-defined in the Django admin)
	if not Schedule.objects.filter(func="app.tasks.update_book_data").exists():
		Schedule.objects.create(
			name="Daily Book Update Task",
			schedule_type="D",
			func="app.tasks.update_book_data",
		)

def delete_book_data_import_task(apps, schema_editor):
	Schedule = apps.get_model("django_q", "Schedule")
	# remove the scheduled task if the migration is ever reversed
	Schedule.objects.filter(func="app.tasks.update_book_data").delete()

class Migration(migrations.Migration):

	dependencies = [
		("some_app_name", "0005_previous_migration_name"),
		("django_q", "0017_task_cluster_alter"),
	]

	operations = [
		migrations.RunPython(
			add_book_data_import_task,
			# in this case we actually do have a reverse operation:
			reverse_code=delete_book_data_import_task,
		),
	]
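
Once this has been deployed, it's easy to confirm that the schedule really exists in a given environment, either in the Django admin or from a shell. Running something like this inside python manage.py shell should show the task and when it will next run:

from django_q.models import Schedule

Schedule.objects.filter(func="app.tasks.update_book_data").values(
	"name", "schedule_type", "next_run",
)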

I do find it less scary now, but you still have to look at everything with an eagle eye before you deploy. I made a mistake in one that both a human reviewer and an AI reviewer managed to miss 😉