---
title: "Security with Python"
date: 2018-05-15
description: "Building a Python encryption library using scrypt for key derivation, covering KDF, PBKDF2, and digest generation."
tags: [python, security]
---

## Introduction

I recently implemented a Python library which acts as an abstraction layer on top of an existing security algorithm (in this case [scrypt](https://www.tarsnap.com/scrypt.html)).

The motivation was for allowing teams to have a consistent experience utilising encryption (and hashing) in their applications and services without necessarily having to know the ins-and-outs of what's important with regards to salts, key lengths etc.

> [!INFO]
> I always encourage people to understand what it is they're doing, but in some cases that's not always a practical mindset.

The library provides three functions:

1. `generate_digest`
1. `decrypt_digest`
1. `validate_digest`

{{\< adverts/pythonforprogrammers >}}

Before we start looking at the three functions provided by this library/interface, let's very briefly talk about KDF and PBKDF2.

A [KDF](https://en.wikipedia.org/wiki/Key_derivation_function) (Key Derivation Function) accepts a message + a key, and produces a digest for its output. They are designed to be more computationally intensive than standard hashing functions, and so they make it harder to use dictionary or rainbow table style attacks (as they would require a lot of extra memory resources and become more unfeasible as an attack vector).

By default the KDF will generate a random salt (thus output is non-deterministic) and have a maximum computational time of `0.5` (although this can be overridden using a `maxtime` argument, as we'll see later).

A [PBKDF2](https://en.wikipedia.org/wiki/PBKDF2) on the other hand is able to provide deterministic output (as well as the ability to specify an explicit salt value). The internal implementation will repeat its process multiple times, thus reducing the feasibility of automated password cracking attempts (similar to a KDF).

I mention both of these (KDF and PBKDF2) because the `generate_digest` function I've written is a multi-arity function that will switch implementation based upon the provided arguments in the method signature.

Originally I had two separate functions to distinguish them a bit more clearly but realised if this library is to make life easier for developers who don't understand encryption or hashing concepts, then I need to provide a single function that intelligently handles things internally.

Because KDF accepts a key and is able to return the original message (given the same key) it's acting as a form of symmetrical encryption, whereas a PBKDF2 is more like a one-way hash function. Hence I named the function in this library `generate_digest` rather than something like `encrypt_message` which wouldn't have made sense when dealing with PBKDF2.

## generate_digest

This is a multi-arity function that will generate a digest using either a password-based key derivation function ([KDF](https://en.wikipedia.org/wiki/Key_derivation_function)) or a [PBKDF2](https://en.wikipedia.org/wiki/PBKDF2) depending on the input given.

If a `password` argument is provided, then KDF will be used (along with a random salt) to generate a _non-deterministic_ digest.

If a `salt` is provided, then a PBKDF2 will be used to generate a _deterministic_ digest.

> [!IMPORTANT]
> salts should be a minimum of 128bits (~16 characters) in length. Also, when specifying a maxtime with `generate_digest`, ensure you include that same value when decrypting with `decrypt_digest` or validating via `validate_digest`.

## decrypt_digest and validate_digest

The `decrypt_digest` and `validate_digest` functions only apply to digests that have been generated using a password (i.e. KDF). Given the right password `decrypt_digest` will return the original message, and thus is considered more a form of symmetrical encryption than a straight one-way hash function. The `validate_digest` function will return a boolean true or false if the given password was able to decrypt the message.

## Dependencies

This abstraction library requires `scrypt`, which itself requires the following dependencies to be installed within the context of your service: `build-essential`, `libssl-dev`, and `python-dev`. If your service has a Dockerfile, adding these dependencies should be as simple as adding a line like the following:

```
RUN apt-get update && apt-get install -y build-essential libssl-dev python-dev
```

## Usage

I suggest looking at the test suite (see below) to get an idea of how you would use the functions in this library.

> [!INFO]
> for a glossary of security terms, refer to [this document](https://docs.google.com/document/d/1qs3jEIQvocdVhSxCSPLF1BoLnp91aLnuUIasvl-maYo/edit?usp=sharing).

## Tests

Before we look at the implementation of the library, let's take a moment to sift through its test suite.

> [!INFO]
> I named the library `secure` and have it running on a private PyPy instance. This code is [made available via GitHub](https://github.com/Integralist/Python-Encryption).

```
import pytest

from secure.interface import ArgumentError, generate_digest, validate_digest, decrypt_digest


message = "my-message"
password = "my-password"
salt = "my-salt-is-long-enough"


def test_generate_digest_with_both_a_password_and_a_salt():
    """Providing both a password and a salt should raise an exception."""

    with pytest.raises(ArgumentError):
        generate_digest(message, salt=salt, password=password)


def test_generate_digest_with_a_password():
    """Generating a digest with a password should be non-deterministic."""

    digest1 = generate_digest(message, password=password)
    digest2 = generate_digest(message, password=password)
    digest3 = generate_digest(message, password=password, maxtime=1.5)
    digest4 = generate_digest(message, password=password, maxtime=1.5)
    digest5 = generate_digest(message, password=password, maxtime=int(1))
    digest6 = generate_digest(message, password=password, maxtime=int(1))

    assert digest1 != digest2
    assert digest3 != digest4
    assert digest5 != digest6


def test_generate_digest_without_a_password():
    """Generating a digest without a password should be deterministic."""

    digest1 = generate_digest(message)
    digest2 = generate_digest(message)
    digest3 = generate_digest(message, salt=salt)
    digest4 = generate_digest(message, salt=salt)
    digest5 = generate_digest(message, length=128)
    digest6 = generate_digest(message, length=128)

    assert digest1 == digest2
    assert digest3 == digest4
    assert len(digest5) == len(digest6)


def test_generate_digest_with_different_salt_lengths():
    """Salts should be at least 128bits (~16 characters) in length."""

    generate_digest(message, salt=salt)

    with pytest.raises(ArgumentError):
        generate_digest(message, salt="too-short")

def test_validate_digest():
    """Validation only applies to digests generated with a password."""

    digest1 = generate_digest(message, password=password)
    digest2 = generate_digest(message, password=password)
    digest3 = generate_digest(message, password=password, maxtime=1.5)
    digest4 = generate_digest(message, password=password, maxtime=1.5)
    digest5 = generate_digest(message, password=password, maxtime=int(1))
    digest6 = generate_digest(message, password=password, maxtime=int(1))

    assert not validate_digest(digest1, 'incorrect-password')
    assert validate_digest(digest1, password)
    assert validate_digest(digest3, password, maxtime=1.5)
    assert validate_digest(digest5, password, maxtime=int(1))


def test_decrypt_digest():
    """Decryption is possible given the right password."""

    digest = generate_digest(message, password=password)

    assert decrypt_digest(digest, password) == message
```

## Implementation

OK, time to see the library code itself.

> [!TIP]
> I like to use [MyPy](http://mypy-lang.org/) for type hinting.

```
import scrypt

from typing import Union


class ArgumentError(Exception):
    pass


def generate_digest(message: str,
                    password: str = None,
                    maxtime: Union[float, int] = 0.5,
                    salt: str = "",
                    length: int = 64) -> bytes:
    """Multi-arity function for generating a digest.

    Use KDF symmetric encryption given a password.
    Use deterministic hash function given a salt (or lack of password).
    """

    if password and salt:
        raise ArgumentError("only provide a password or a salt, not both")

    if salt != "" and len(salt) < 16:
        raise ArgumentError("salts need to be minimum of 128bits (~16 characters)")

    if password:
        return scrypt.encrypt(message, password, maxtime=maxtime)
    else:
        return scrypt.hash(message, salt, buflen=length)


def decrypt_digest(digest: bytes,
                   password: str,
                   maxtime: Union[float, int] = 0.5) -> bytes:
    """Decrypts digest using given password."""

    return scrypt.decrypt(digest, password, maxtime)


def validate_digest(digest: bytes,
                    password: str,
                    maxtime: Union[float, int] = 0.5) -> bool:
    """Validate digest using given password."""

    try:
        scrypt.decrypt(digest, password, maxtime)
        return True
    except scrypt.error:
        return False
```

## Conclusion

Let me know what you think on twitter. Have fun.
