AWS Lambda HTML Sanitizer for Django

Diagram Overview

I recently built an application that allowed users to submit content. I wanted to make sure that whatever users sent in was safe before showing it to other users on the site.

I had a couple of choices. Since the backend was written in Django I could've used a Python library to clean the content. However, the content coming from the frontend could include a range of tags, and the Python libraries that I could find didn't have support for content like SVG.

Upon investigation I came across DOMPurify, which supports cleaning a wide range of content and is actively developed. The only snag in using it is that my application is written in Python, and DOMPurify is JavaScript.

There are a number of ways to handle this kind of situation. Since I was already using AWS, I opted to build a Lambda function that could accept any type of HTML content, clean it using DOMPurify, and return the cleaned version.

This diagram shows how my solution works. You can click each component of the diagram to see the code that was used. Please explore and let me know if you have any questions!

The Comment Model

User-submitted comments are ultimately saved to the Comments table in our database, which we define with Django's models module. In this example the table is very simple: it has a single comment field that stores the comment text.

from django.db import models


class Comments(models.Model):
    comment = models.TextField()

Code Review

Line 1: Import Django's built-in models module.

Line 4: Define Comments as the name of our model, which Django maps to a database table.

Line 5: Define the comment field as a text field for storing the comment.
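
To make the mapping from model to table a little more concrete, here is a rough sketch of the kind of table Django would create for this model on SQLite, reproduced with the standard library's sqlite3 module. The table name comments_comments assumes the app is called comments, and the exact DDL varies by database backend and Django version, so treat this as an approximation rather than Django's actual output.

```python
import sqlite3

# Approximate schema for a model with one text field. Django adds the
# auto-incrementing "id" primary key on its own. The table name is an
# assumption (app label "comments" + model name "comments").
SCHEMA = """
CREATE TABLE comments_comments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    comment TEXT NOT NULL
)
"""


def save_comment(conn, text):
    """Insert one comment and return its generated id."""
    cursor = conn.execute(
        "INSERT INTO comments_comments (comment) VALUES (?)", (text,)
    )
    conn.commit()
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
first_id = save_comment(conn, "<p>Hello</p>")
stored = conn.execute(
    "SELECT comment FROM comments_comments WHERE id = ?", (first_id,)
).fetchone()[0]
```

In the real project the ORM issues the equivalent statements for us, so none of this SQL has to be written by hand.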

The Comment Form

The project uses Django's forms library to display the form on the web page and to validate incoming form submissions. To keep things simple, we use Django's ModelForm class to base the form on the model we just created.

from django.forms import ModelForm

from .models import Comments


class CommentForm(ModelForm):
    class Meta:
        model = Comments
        fields = ('comment',)

Code Review

Line 1: Import the built-in ModelForm class.

Line 3: Import the project's Comments model.

Line 6: Define the name of our form class, and subclass it from the built-in ModelForm.

Lines 7-9: Define the Meta details, and set the model value to Comments. Then, define which fields we want the form to display (hint: we only have one).

The URL Conf

The first thing we want to do is set up a URL endpoint. In our example project we want to allow users to open /create-comment/ to create comments. Upon opening this URL the user should be presented with a web page that has a comment form.

from django.conf.urls import url

from comments.views import create_comment


urlpatterns = [
    url(r'^create-comment/', create_comment),
]

Code Review

Line 1: This imports the built-in function used to create URL configurations.

Line 3: This imports the create_comment view function from the comments app.

Line 6: This creates the required urlpatterns list. All of our URL routes should be configured here.

Line 7: This sets up our URL configuration. The first argument is a regular expression that creates an endpoint at /create-comment/. The second argument specifies the view that requests should be routed to, which is create_comment.
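
Since the route is a regular expression, it can help to see which paths it accepts. The sketch below uses Python's re module to approximate the matching: regex-based routes are tested against the request path with its leading slash removed. The ANCHORED pattern and the matches helper are my own additions for comparison and are not part of the project.

```python
import re

# The pattern from the URL conf, plus a stricter variant ending in "$".
PATTERN = re.compile(r'^create-comment/')
ANCHORED = re.compile(r'^create-comment/$')


def matches(pattern, path):
    """Approximate URL conf matching: drop one leading slash, then search."""
    if path.startswith('/'):
        path = path[1:]
    return pattern.search(path) is not None


exact = matches(PATTERN, '/create-comment/')
longer = matches(PATTERN, '/create-comment/extra/')
longer_anchored = matches(ANCHORED, '/create-comment/extra/')
other_prefix = matches(PATTERN, '/comments/create-comment/')
```

Without the trailing $, the pattern also accepts longer paths that merely start with create-comment/. That is harmless in a one-route project, but worth remembering once more routes are added.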

The View Function

Our URL configuration was set up to pass requests coming in to the URL /create-comment/ to the view function named create_comment. This view function's purpose is to verify that the incoming data is valid, clean it with our Lambda function, and then save it to the database.

from django.shortcuts import render

from .models import Comments
from .forms import CommentForm
from .utilities.dom_purify_lambda import dom_purify


def create_comment(request):

    if request.method == 'POST':
        form = CommentForm(request.POST)
        if form.is_valid():
            dirty_dom = form.cleaned_data['comment']
            clean_dom = dom_purify(dirty_dom)
            instance = Comments.objects.create(comment=clean_dom)
            return render(request, 'comment.html', {'obj': instance})
    else:
        form = CommentForm()

    context = {
        'form': form,
    }

    return render(request, 'comment.html', context)

Code Review

Line 1: Import the Django render utility. This is used to send the response to the user's browser.

Line 3: This imports the Comments model. We will save user comments to this model.

Line 4: This imports the CommentForm form. We will use this to validate user submitted content.

Line 5: This imports the DOMPurify Lambda utility. You will learn how this is created a little later.

Line 8: This defines the create_comment function. This is the function we use to process the incoming "create comment" requests. Note that this is the function name that we imported into the urls.py file above.

Line 10: This checks to see if the incoming request is a POST request. If so, this means that the user is submitting a comment form and we need to process it.

Line 11: Here we create our form object by passing the request.POST data to the CommentForm. This allows us to run the form validation on the incoming request.

Line 12: Here we call form.is_valid to validate the incoming request. This compares the data in request.POST to what we have defined in our form object.

Line 13: This extracts the user comment and assigns it to the dirty_dom variable.

Line 14: This is where things get exciting! We're passing the user comment, now assigned to the variable dirty_dom, into our Lambda DOMPurify utility. When we do this, the utility takes the user comment and sends it to the AWS Lambda service to be cleaned. The cleaned comment is then returned to this function.

Line 15: This saves the cleaned comment to our database model.

Line 16: This uses the render utility to return the template and instance to the user.

Lines 17-24: If this is not a POST request, we create an empty form object. In that case, and also when a submitted form fails validation, the form is rendered to the end user.
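
One detail worth knowing when a view branches on the request type: checking the HTTP method and checking whether any POST data arrived are not quite the same test. The sketch below uses a hypothetical FakeRequest class with a plain dict standing in for Django's QueryDict (which, like a dict, is falsy when empty), so it only illustrates the truthiness behaviour and is not real Django code.

```python
class FakeRequest:
    """Hypothetical stand-in for django.http.HttpRequest (illustration only)."""

    def __init__(self, method, post=None):
        self.method = method
        # Django stores form data in a QueryDict; an empty one is falsy,
        # just like the empty dict used here.
        self.POST = post or {}


def is_submission_by_data(request):
    """True only when the request carries at least one POST field."""
    return bool(request.POST)


def is_submission_by_method(request):
    """True for every POST request, even one with an empty body."""
    return request.method == 'POST'


empty_post = FakeRequest('POST')
filled_post = FakeRequest('POST', {'comment': '<p>Hi</p>'})
plain_get = FakeRequest('GET')
```

The two checks agree for filled_post and plain_get but disagree for empty_post, which is why testing request.method is the more common idiom in Django views.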

The Comment Template

This simple project has but one template file. This template file is configured to display a comment form by default. It allows an end user to type in a comment and submit it for display. The user comment will then be submitted to the AWS Lambda service to be cleaned via DOMPurify.

<html>
    <body>
        <div>
            {% if form %}
                <form action="." method="POST">
                    {% csrf_token %}
                    {{ form }}
                    <button type="submit">Save Comment</button>
                </form>
            {% endif %}
            {% if obj %}
                <div>
                    <p>Cleaned Comment</p>
                    <p>{{ obj.comment }}</p>
                </div>
            {% endif %}
        </div>
    </body>
</html>

Code Review

Lines 4-10: This block is displayed only if the form object exists. In our project that means it renders when a user first accesses the /create-comment/ URL (or resubmits after a validation error), showing the comment form.

Lines 11-16: This block is displayed after the user submits a comment to be saved. It shows the cleaned comment.
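
One thing to keep in mind when reading the rendered page: Django templates escape variables by default, so {{ obj.comment }} shows the cleaned markup as literal text rather than rendering it as HTML. The sketch below uses the standard library's html.escape to approximate that behaviour. It is an illustration of the effect, not Django's own code path, and the autoescape helper name is mine.

```python
import html


def autoescape(value):
    """Approximate what template auto-escaping does to a string value."""
    return html.escape(value)


cleaned = '<p>Hello <b>world</b></p>'
rendered = autoescape(cleaned)
```

To have the browser render the markup itself, the value would need to be explicitly marked as safe (for example with Django's safe template filter), which is only reasonable for content that has already been sanitized.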

The Lambda Service Function

We also need to create the actual Lambda function: the code that the AWS Lambda service executes every time the function is invoked. It is a JavaScript function that uses DOMPurify to clean HTML content. It accepts the content to be cleaned, cleans it, and returns the cleaned content.

const createDOMPurify = require('dompurify');
const jsdom = require('jsdom');
const window = jsdom.jsdom('', {
  features: {
    FetchExternalResources: false, // disables resource loading over HTTP / filesystem
    ProcessExternalResources: false // do not execute JS within script blocks
  }
}).defaultView;
const DOMPurify = createDOMPurify(window);

exports.handler = (event, context, callback) => {
    const clean = DOMPurify.sanitize(event.dirty);
    callback(null, clean);
}

Code Review

Lines 1-9: This code sets up the DOMPurify utility according to the project's docs.

Lines 11-14: This code defines the Lambda handler, the function that the Lambda service executes every time it is invoked. The handler accepts the Lambda event object, context, and callback function. It assigns the output of DOMPurify.sanitize to a variable named clean, then returns the cleaned comment via the callback function.
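
It can help to see what actually travels over the wire. The handler reads event.dirty, so the caller must send a JSON object with a dirty key, and because the handler passes a plain string to the callback, the result comes back as a JSON-encoded string. Here is a small Python sketch of both directions. The helper names are mine, and the reply is hard-coded for illustration (it mirrors the example in the DOMPurify README) rather than produced by a real invocation.

```python
import json


def encode_event(dirty_html):
    """Build the JSON event body; the handler reads it as event.dirty."""
    return json.dumps({"dirty": dirty_html})


def decode_result(raw_payload):
    """Decode the handler's result, which arrives as a JSON-encoded string."""
    return json.loads(raw_payload)


event_body = encode_event("<img src=x onerror=alert(1)//>")

# Hard-coded stand-in for the bytes Lambda would send back. The real value
# depends on what DOMPurify returns for the input.
example_reply = json.dumps('<img src="x">').encode("utf-8")
cleaned = decode_result(example_reply)
```

Note that the reply includes the surrounding JSON quotes, so it has to be decoded before it is saved.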

The Lambda Initiator Function

We need a way to communicate with AWS' Lambda service. We need to send "dirty" user comments to the service, and receive a "clean" version back. To do this we can create a Python utility in our project using the boto3 library. The boto3 library is maintained by AWS and is used to provide a Python interface to their cloud services.

import boto3
import json


def dom_purify(dirty_dom, region='us-west-2', lambda_name='DOMPurify'):

    client = boto3.client(
        'lambda',
        region_name=region,
        aws_access_key_id='<access_key_here>',
        aws_secret_access_key='<secret_key_here>'
    )

    ready_json = {"dirty": dirty_dom}

    payload3 = json.dumps(ready_json)

    response = client.invoke(
        FunctionName=lambda_name,
        InvocationType="RequestResponse",
        Payload=payload3
    )

    return json.loads(response['Payload'].read())

Code Review

Lines 1-2: We start off by importing the boto3 and json libraries.

Line 5: This defines the name of our function, and sets default values that will be used in the function. The dirty_dom value is the content to be cleaned. The region value is the AWS region that our Lambda function was created in, and lambda_name is the name of that function.

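Lines 7-12: This code creates the boto3 Lambda client. When creating an instance of the client, you need to specify several values, including the service to be used (lambda), the AWS region, and the keys needed to access your AWS account. The keys shown here are placeholders; if you omit them, boto3 falls back to its default credential lookup (environment variables, shared configuration files, or an IAM role), which avoids keeping secrets in source code. You can refer to the boto3 docs for more details.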

Lines 14-16: This creates a properly formatted JSON payload from the value that was passed into the function. We need it to have this format to pass to the Lambda service.

Lines 18-22: This invokes our Lambda function using the boto3 client we created previously. To invoke Lambda we need to set the FunctionName, InvocationType, and Payload values. This is the code that actually runs the Lambda function.

Line 24: This statement returns the cleaned user comment!
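
One caveat about that last step: if the Lambda function itself fails, the invoke call still returns normally, but the response carries a FunctionError key and the payload holds error details instead of cleaned HTML. The sketch below shows one way to guard against saving such a payload as a comment. The parse_lambda_response and fake_response helpers and the LambdaInvocationError class are my own names, and the responses are hand-built dictionaries that imitate the shape of boto3's response so the example runs without AWS.

```python
import io
import json


class LambdaInvocationError(Exception):
    """Raised when the response does not contain cleaned content."""


def parse_lambda_response(response):
    """Return the cleaned string, or raise if the invocation failed."""
    body = json.loads(response["Payload"].read())
    # "FunctionError" is present only when the function raised an error;
    # the payload then describes the error instead of the cleaned HTML.
    if response.get("FunctionError"):
        raise LambdaInvocationError(body)
    if not isinstance(body, str):
        raise LambdaInvocationError("unexpected payload type: %r" % type(body))
    return body


def fake_response(body, function_error=None):
    """Build a dictionary shaped like an invoke() response (testing only)."""
    response = {
        "StatusCode": 200,
        "Payload": io.BytesIO(json.dumps(body).encode("utf-8")),
    }
    if function_error:
        response["FunctionError"] = function_error
    return response
```

In the utility, the final return statement could pass the response through a check like this one, and the view could then decide how to handle the exception, for example by showing the form again instead of saving anything.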

The Client

Applications exist to serve people! This icon identifies a user who is submitting their comment into the system.

NGINX & uWSGI

In order to receive and respond to user requests, we need some basic web and Python infrastructure plumbing. This application uses Nginx as the web server: it is the component that listens on port 80 on the server and handles incoming HTTP requests.

While Nginx can handle HTTP traffic, it alone can't interface with our Python/Django application. In order to route the HTTP requests to that application, we need a special interface that can perform some translations for us. In our case we are using uWSGI.

Nginx and uWSGI work together to receive and respond to user web requests for our Django application.
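
The interface that uWSGI speaks to a Python application is WSGI: the application is a callable that receives a dictionary describing the request plus a start_response function, and returns the response body. Django exposes such a callable for the project. The sketch below is a minimal hand-written WSGI application, not Django's, together with a hypothetical call_app helper that plays the role of the server using the standard library's wsgiref utilities.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI callable: the same interface a Django project exposes."""
    body = ("You requested " + environ["PATH_INFO"]).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path):
    """Play the server's role: build an environ, call the app, collect output."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": "GET"}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application, "/create-comment/")
```

In the deployed setup, Nginx hands each HTTP request to uWSGI, and uWSGI makes a call like the one in call_app against the Django application.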
