How To Write A Delete Function For Django Blog
How to use Django Rest Framework to safely delete your data
The past few years of rapid growth have left Button with unused data that cluttered our internal dashboards. In particular, there were a lot of accounts representing companies with no verification. One of my first projects at Button was to remove these accounts and everything related to them.

In a perfect world, we'd be able to write a script that iterates through each account, checks whether it is verified, and simply deletes it if it isn't. But what if one day we needed to restore an account, or worse, what if someone accidentally ran the script and it permanently deleted one of our partners' data? To proceed safely, we needed a mechanism to undo our deletions. We call this soft deletes.

In this post, I'll show you how to safely delete your data and bring it back to life using Django Rest Framework. If you're not familiar, Django, the web framework that Django Rest Framework is built on, provides an ORM (object-relational mapper), which gives you a fancy way to talk to your database by letting you think in terms of models instead of SQL tables, columns, and rows.

It may be an obvious feature, but inheriting from other Django model classes lets you share functionality. We already had a `BaseModel` with shared metadata fields. Using the same principle of inheriting fields, we created a `SoftDeleteModel`.

To use this effectively, every query would need to explicitly filter out objects which are deleted. This on its own is not very useful and is extremely prone to bugs. Expecting my coworkers (and especially myself) to remember to add an `is_deleted=False` filter to every query is unreasonable.

Luckily this isn't necessary! Every model in Django has a Manager, which is responsible for providing an interface for querying that particular model. The default manager is referred to by `User.objects`, but you can easily override `objects` with a custom Manager that is more tailored to your needs. In our case we wanted our manager to filter out anything that's been deleted.
A first approach to doing this might be creating a Manager that simply excludes anything where `is_deleted` is `True`. The query to get the first Ian (who isn't deleted) then no longer needs an explicit filter, and if we wanted the query to include deleted Ians, we'd just use a second manager that skips the exclusion.

To recap so far, we've added a new class, `SoftDeleteModel`.

NOTE: The following sections borrow logic from each other. Certain things might not be explained fully until you read all three sections.

Django lets you specify relationships between models in a straightforward way. For example, let's say we have many-to-one relationships between cars and their owners, and between cars and their manufacturers. In that fictional world, when a manufacturer is deleted, all cars produced by that manufacturer are deleted (just like real life!). But when a car's owner is deleted, any cars that refer to that user as the owner will set their `owner` field to null.

To keep this behavior consistent in our soft-delete model, we'll need to do some surgery. The first thing we'll do is add an `_on_delete` method. After a LOT of poking around in debugging sessions and reading the source, I came up with an implementation that works by looking at all the models that point to the object we're currently deleting (`self`).

The beginning of this post listed snapshotting and recovery as a requirement. After all, what good is a soft delete if there's no way to undo it? Snapshotting can be done multiple ways. One approach is to copy the data into a different location (maybe a different table, or even a different database; it doesn't matter). With this method, to restore a record, you'd get all the data contained in the snapshot and reinsert it. This poses a few issues though. First, you can't look up deleted data (e.g. for audits or debugging). It also makes maintaining your schema a nightmare. Say you run a migration on your database that adds a new field with some default value.
There's no way to cleanly apply those changes to your snapshotted data without writing custom adapters for your external, deleted records. You'll also lose some metadata (`created_at`, `modified_at`, etc.) when reinserting. Your primary key could also be reused!

A nicer way to do the same thing (in my opinion) is to maintain snapshots in the form of tombstones with some extra data. The main idea is that your data never actually leaves the database; it's just marked as deleted. To do this, we'll add a foreign key, `snapshot_id`, to our `SoftDeleteModel`.

A common pattern is to operate on collections of objects without actually loading them. A statement such as `User.objects.filter(name="Ian").delete()` would bypass our custom `delete` method. To fix this, we can create our own `QuerySet`.

This was my first time using Python in a production setting and it was a good way to jump straight into the deep end. I learned a lot about Button's internal systems and Django in general while solving this task, and I hope that this post can be thought-provoking or useful for others who find themselves in a similar position of needing to delete their data and bring it back to life. If you're interested in tackling Django and many of the other challenges at Button, visit our opportunities page. We're hiring!

What we started with
`BaseModel` contained useful metadata fields such as `created_at` and `modified_at`. Any model that inherits from `BaseModel` gets these fields as columns on its respective database table. Our `BaseModel` also provides some additional functionality, like emitting useful metrics on record creation.

```python
from django.db import models


class BaseModel(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    modified_at = models.DateTimeField(auto_now=True)

    class Meta:
        abstract = True

    # other fields / methods...
```
`SoftDeleteModel` inherits from `BaseModel`. In its simplest and most naive form, `SoftDeleteModel` adds a single `is_deleted` field which specifies whether the record has been deleted. It also overrides Django's default `delete` behavior.

```python
class SoftDeleteModel(BaseModel):
    # a flag, so a BooleanField rather than a DateTimeField
    is_deleted = models.BooleanField(null=False, default=False)

    class Meta:
        abstract = True

    def delete(self):
        # mark the row instead of removing it
        self.is_deleted = True
        self.save()

    def restore(self):
        self.is_deleted = False
        self.save()
```
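To make the effect on the database concrete, here is a minimal sketch of what a soft delete amounts to at the SQL level, using Python's built-in `sqlite3` module instead of Django. The `account` table and its columns are made up for illustration. The point is only that `delete` becomes an `UPDATE`, so the row is still there to restore.

```python
import sqlite3

# In-memory database with a hypothetical "account" table (illustrative names).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "is_deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO account (name) VALUES (?)", [("Ian",), ("Pichael",)])


def soft_delete(account_id):
    # A soft delete is an UPDATE: the row stays in the table.
    conn.execute("UPDATE account SET is_deleted = 1 WHERE id = ?", (account_id,))


def restore(account_id):
    # Restoring just flips the flag back.
    conn.execute("UPDATE account SET is_deleted = 0 WHERE id = ?", (account_id,))


def visible_names():
    # Every read has to remember the is_deleted filter.
    rows = conn.execute(
        "SELECT name FROM account WHERE is_deleted = 0 ORDER BY id"
    ).fetchall()
    return [name for (name,) in rows]


soft_delete(1)
after_delete = visible_names()   # Ian is hidden, but his row still exists
total_rows = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
restore(1)
after_restore = visible_names()  # Ian is visible again
```

The `WHERE is_deleted = 0` clause in `visible_names` is the filter that every query needs to carry, so it makes sense to keep it in a single place.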
```python
# horrible!
User.objects.filter(first_name='Ian', is_deleted=False).first()
```

Relying on an `is_deleted=False` filter in every single query is unreasonable and may not even be effective. A custom manager can apply the filter in one place:

```python
class SoftDeleteManager(models.Manager):
    def __init__(self, *args, **kwargs):
        # deleted=True builds a manager that also returns deleted rows
        self.with_deleted = kwargs.pop('deleted', False)
        super().__init__(*args, **kwargs)

    def get_queryset(self):
        qs = super().get_queryset()
        if self.with_deleted:
            return qs
        return qs.filter(is_deleted=False)


class SoftDeleteModel(BaseModel):
    is_deleted = models.BooleanField(null=False, default=False)

    # default manager hides deleted rows; the second one shows everything
    objects = SoftDeleteManager()
    objects_with_deleted = SoftDeleteManager(deleted=True)

    class Meta:
        abstract = True

    def delete(self):
        self.is_deleted = True
        self.save()

    def restore(self):
        self.is_deleted = False
        self.save()
```
```python
# the first Ian who isn't deleted
User.objects.filter(first_name='Ian').first()

# the first Ian, including deleted ones
User.objects_with_deleted.filter(first_name='Ian').first()
```
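The manager idea does not depend on Django. The sketch below is a plain-Python stand-in, not Django's `Manager` API: `FakeManager` and `Record` are hypothetical names, and the "table" is just a list. It shows how a default manager can hide deleted rows while a second manager still exposes them.

```python
from dataclasses import dataclass


@dataclass
class Record:
    first_name: str
    is_deleted: bool = False


class FakeManager:
    """Hypothetical stand-in for a Django Manager over an in-memory list."""

    def __init__(self, rows, with_deleted=False):
        self.rows = rows
        self.with_deleted = with_deleted

    def get_queryset(self):
        # The soft-delete filter lives in exactly one place.
        if self.with_deleted:
            return list(self.rows)
        return [r for r in self.rows if not r.is_deleted]

    def filter(self, **conditions):
        return [
            r for r in self.get_queryset()
            if all(getattr(r, key) == value for key, value in conditions.items())
        ]


rows = [Record("Ian", is_deleted=True), Record("Ian"), Record("Pichael")]
objects = FakeManager(rows)
objects_with_deleted = FakeManager(rows, with_deleted=True)

live_ians = objects.filter(first_name="Ian")
all_ians = objects_with_deleted.filter(first_name="Ian")
```

Callers of `objects.filter(...)` never mention `is_deleted`, which is the property we want from the real manager.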
To recap, `SoftDeleteModel` adds a new field, `is_deleted`, replaces the `delete` method, and changes the default manager to `SoftDeleteManager` in order to exclude "deleted" records. This is a decent start, but there are a few major issues, which I'll list first and then discuss in more detail, along with how we solved them.

Issues (in no particular order):

- Foreign keys (models can specify `on_delete` for their foreign keys)
- Batch deletes (`User.objects.all().delete()` will bypass soft deletes and permanently remove all of our users, probably not good!)
Foreign keys
```python
class Car(models.Model):
    # CASCADE: when this car's manufacturer is deleted,
    # this car will also be deleted
    manufacturer = models.ForeignKey(Manufacturer, on_delete=models.CASCADE)

    # SET_NULL makes it so when this car's owner gets deleted, it will continue
    # to exist without an owner. SET_NULL requires a nullable column.
    owner = models.ForeignKey(User, null=True, on_delete=models.SET_NULL)
```
When a car's owner is deleted, the car's `owner` field is set to null. I like the interface for this and really don't want to change it.

To keep it, `SoftDeleteModel` gets an `_on_delete` method that will be called after a soft-deletable record is deleted. The goal of this method is to traverse the graph of objects related to the record we just deleted, and recursively update or delete each record.

```python
from django.db import models
from django.db.models import ProtectedError


class SoftDeleteModel(BaseModel):
    # same fields / methods from before

    def _on_delete(self):
        # _relation_tree is a private Django API: the fields on other
        # models that point at this model
        for relation in self._meta._relation_tree:
            # remote_field is the Django 1.9+ name for the relation object
            on_delete = getattr(
                relation.remote_field, 'on_delete', models.DO_NOTHING)
            if on_delete in [None, models.DO_NOTHING]:
                continue

            # soft-deletable models are tied to the same snapshot
            snapshot_kwargs = {}
            if issubclass(relation.model, SoftDeleteModel):
                snapshot_kwargs['snapshot_id'] = self.snapshot_id

            lookup = {relation.name: self}
            related_queryset = relation.model.objects.filter(**lookup)

            if on_delete == models.CASCADE:
                related_queryset.delete(**snapshot_kwargs)
            elif on_delete == models.SET_NULL:
                for r in related_queryset.all():
                    # We'll define SnapshotRecord later in this post
                    SnapshotRecord.objects.get_or_create(
                        snapshot=self.snapshot,
                        record_id=r.pk,
                        fk_field=relation.attname,
                        fk_value=self.pk)
                related_queryset.update(**{relation.name: None})
            elif on_delete == models.PROTECT:
                if related_queryset.exists():
                    raise ProtectedError(
                        'Cannot delete: other records still reference it',
                        related_queryset)
            else:
                raise NotImplementedError()
```
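The traversal is easier to follow outside the ORM. The following sketch models the same dispatch in plain Python under simplifying assumptions: relations are declared in a hand-written `RELATIONS` list rather than discovered through `_meta._relation_tree`, and `Node`, `RELATIONS`, `related_nodes`, and `soft_delete` are hypothetical names, not Django APIs.

```python
CASCADE, SET_NULL, PROTECT, DO_NOTHING = "CASCADE", "SET_NULL", "PROTECT", "DO_NOTHING"


class Node:
    """Hypothetical in-memory record with a kind, an id, and foreign keys."""

    def __init__(self, kind, pk, **fks):
        self.kind, self.pk, self.fks = kind, pk, dict(fks)
        self.is_deleted = False


# (referencing kind, field name, on_delete rule): stands in for the relation tree.
RELATIONS = [("car", "manufacturer", CASCADE), ("car", "owner", SET_NULL)]


def related_nodes(target, table, kind, field):
    # Live records of `kind` whose `field` points at `target`.
    return [
        n for n in table
        if n.kind == kind and n.fks.get(field) is target and not n.is_deleted
    ]


def soft_delete(target, table, log):
    """Soft-delete `target`, then apply each relation's on_delete rule."""
    # Check PROTECT first so nothing is changed if the delete is refused.
    for kind, field, rule in RELATIONS:
        if rule == PROTECT and related_nodes(target, table, kind, field):
            raise RuntimeError("protected by related records")

    target.is_deleted = True
    log.append(("deleted", target.kind, target.pk))

    for kind, field, rule in RELATIONS:
        for node in related_nodes(target, table, kind, field):
            if rule == CASCADE:
                soft_delete(node, table, log)  # recurse through the graph
            elif rule == SET_NULL:
                log.append(("nulled", node.kind, node.pk, field, target.pk))
                node.fks[field] = None


ian = Node("user", 1)
acme = Node("manufacturer", 1)
car = Node("car", 1, manufacturer=acme, owner=ian)
table = [ian, acme, car]

log = []
soft_delete(ian, table, log)   # owner is SET_NULL: the car survives, ownerless
owner_after = car.fks["owner"]
car_deleted_after_owner = car.is_deleted
soft_delete(acme, table, log)  # manufacturer is CASCADE: the car goes too
```

The `log` list plays the role that snapshot records play in the post: it remembers which records were deleted and which foreign keys were nulled, which is the information a restore needs.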
Here `self` is the object we're currently deleting. We use `self._meta._relation_tree` to get a list of all the relationships in which `self` is involved. We can then filter those records using the name by which the related model refers to `self`. The `on_delete` behavior defined for that relationship determines what we do next:

- If it is `DO_NOTHING`, we can skip this field and move on to the next one.
- If it is `PROTECT`, we need to verify that no records exist in that relationship that are still pointing to `self`.
- If the `on_delete` behavior is set to `CASCADE`, we can simply delete those records. The only caveat is that we need to determine whether the referencing model is soft-deletable. If it is, we need to pass in `snapshot_id` so that we can tie all the deleted records to the same snapshot. If it's not, we can simply delete all those records. I'll cover snapshots in the next section.
- If `on_delete` is `SET_NULL`, we can update the field (which references `self` in the related record) to be `NULL`. We also need to explicitly create a new `SnapshotRecord` which describes this change.

Recovery
Solution
We'll add a `snapshot_id` foreign key to our `SoftDeleteModel`, and whenever a record is deleted, we'll create a new `Snapshot`. Each `Snapshot` will have a list of all the records affected by the delete that created it. As mentioned earlier, we don't want to break any existing Django interfaces. In particular, we want to preserve the `on_delete` behavior of foreign keys. Let's start by defining two models which will represent a snapshot.

```python
from django.db import models, transaction


class Snapshot(models.Model):
    # unique identifier for snapshots
    id = models.CharField(max_length=32, primary_key=True)
    # the record which initiated the snapshot
    root_id = models.CharField(max_length=32)

    @transaction.atomic
    def restore(self):
        for record in self.snapshot_records.all():
            record.restore()
        self.delete()


class SnapshotRecord(models.Model):
    # on_delete is required since Django 2.0; CASCADE is an assumption here
    snapshot = models.ForeignKey(
        Snapshot, null=False, on_delete=models.CASCADE,
        related_name='snapshot_records')
    # this field is used to represent the record affected by the delete
    record_id = models.CharField(max_length=32)
    # these fields are used to represent links to another record.
    # if they are set, it means a foreign key was set to NULL
    # rather than a record being deleted
    fk_field = models.CharField(max_length=64, null=True)
    fk_value = models.CharField(max_length=64, null=True)

    @transaction.atomic
    def restore(self):
        # we'll implement BaseModel.get later
        record = BaseModel.get(self.record_id, include_deleted=True)
        if self.fk_field is None:
            record.is_deleted = False
        else:
            setattr(record, self.fk_field, self.fk_value)
        record.save()
        self.delete()
```
The `Snapshot` model itself is straightforward enough. It has a unique `id` and a `root_id` which references the original record that was removed. By original I mean the record whose deletion started the snapshot: if other records reference it, we want to preserve Django's original behavior by cascading the soft delete through those references or setting foreign keys to `NULL` where appropriate.

`SnapshotRecord` is a little more interesting. It contains a reference to the `Snapshot` to which it belongs and a `record_id` field which refers to a single record that was affected. In order to store records of any model type, values in this field will be of the form `<model_name>.<primary_key>`. SnapshotRecords representing records that were deleted will simply have a `snapshot_id` and a `record_id`. The fields `fk_field` and `fk_value` will be used to indicate links to records that were deleted, for models that exhibit `on_delete=SET_NULL` behavior. Here's an example of how it works:

```python
class User(SoftDeleteModel):
    name = models.CharField(max_length=64, null=False, blank=False)


class Cat(SoftDeleteModel):
    name = models.CharField(max_length=64, null=False, blank=False)
    owner = models.ForeignKey(
        User, null=True, blank=True, on_delete=models.SET_NULL)


ian = User.objects.create(name='Ian')
pichael = Cat.objects.create(name='Pichael', owner=ian)

ian.delete()
pichael.refresh_from_db()  # reload the row changed by the delete
print(pichael.owner)  # should print None

# Now there should be a Snapshot that looks like this:
# {
#   id: 1,
#   root_id: ian.id,
#   snapshot_records: [
#     { record_id: ian.id, fk_field: null, fk_value: null },
#     { record_id: pichael.id, fk_field: "owner_id", fk_value: ian.id }
#   ]
# }

ian.restore()
pichael.refresh_from_db()
print(pichael.owner)  # should be ian
```
Batch deletes
The statement `User.objects.filter(name="Ian").delete()` will bypass the custom `delete` method we added in `SoftDeleteModel` and permanently remove all of the users named Ian without leaving a trace! This is because the Manager we're using returns a QuerySet that uses the default delete behavior.

The fix is a custom `QuerySet` that our custom `Manager` (defined earlier in this post) can use. Although slow, we can simply change the behavior of `delete` to iterate over the set of records contained in the query set, and individually delete each one.

```python
from django.db import models, transaction


class SoftDeleteQuerySet(models.QuerySet):
    @transaction.atomic
    def delete(self, snapshot_id=None):
        # slow but safe: run each record's own soft delete
        for record in self:
            record.delete(snapshot_id=snapshot_id)


class SoftDeleteManager(models.Manager):
    def __init__(self, *args, **kwargs):
        self.with_deleted = kwargs.pop('deleted', False)
        super().__init__(*args, **kwargs)

    # same fields and methods as before, except for get_queryset

    def get_queryset(self):
        qs = SoftDeleteQuerySet(self.model, using=self._db)
        if self.with_deleted:
            return qs
        return qs.filter(is_deleted=False)
```

Now when we call `User.objects.filter(name="Ian").delete()`, we can rest assured knowing that each Ian will be removed safely and softly.
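The difference between a bulk delete and the per-record version can be shown with a plain-Python stand-in. `FakeQuerySet` and `SoftRecord` below are hypothetical and only mimic the shape of the Django classes. The point is that `delete` on the collection delegates to each record's own `delete`, so any custom behavior there still runs.

```python
class SoftRecord:
    def __init__(self, name):
        self.name = name
        self.is_deleted = False
        self.snapshot_id = None

    def delete(self, snapshot_id=None):
        # Custom per-record behavior: mark instead of removing.
        self.is_deleted = True
        self.snapshot_id = snapshot_id


class FakeQuerySet:
    """Hypothetical stand-in for a QuerySet backed by a shared list."""

    def __init__(self, table, rows=None):
        self.table = table
        self.rows = list(table) if rows is None else rows

    def filter(self, **conditions):
        matches = [
            r for r in self.rows
            if all(getattr(r, k) == v for k, v in conditions.items())
        ]
        return FakeQuerySet(self.table, matches)

    def hard_delete(self):
        # What a plain bulk delete amounts to: rows are gone for good.
        for r in self.rows:
            self.table.remove(r)

    def delete(self, snapshot_id=None):
        # Soft version: one call per record, slower but reversible.
        for r in self.rows:
            r.delete(snapshot_id=snapshot_id)


table = [SoftRecord("Ian"), SoftRecord("Ian"), SoftRecord("Pichael")]
FakeQuerySet(table).filter(name="Ian").delete(snapshot_id="snap-1")
rows_kept = len(table)
flags = [(r.name, r.is_deleted) for r in table]

hard_table = [SoftRecord("Ian"), SoftRecord("Pichael")]
FakeQuerySet(hard_table).filter(name="Ian").hard_delete()
rows_after_hard = len(hard_table)
```

The soft version costs one call per record instead of one statement for the whole set, which is the slowness the post accepts in exchange for being able to undo the delete.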
Summary
Source: https://blog.usebutton.com/cascading-soft-deletion-in-django
Posted by: cammackreamost.blogspot.com