I recently implemented a Python library which acts as an abstraction layer on top of an existing security algorithm (in this case scrypt).
The motivation was to give teams a consistent experience when using encryption (and hashing) in their applications and services, without them necessarily having to know the ins and outs of what’s important with regard to salts, key lengths, etc.
Note: I always encourage people to understand what it is they’re doing, but in some cases that’s not always a practical mindset.
The library provides three functions: generate_digest, decrypt_digest and validate_digest.
KDF or PBKDF2?
Before we start looking at the three functions provided by this library/interface, let’s very briefly talk about KDF and PBKDF2.
A KDF (Key Derivation Function) accepts a message and a key, and produces a digest as its output. KDFs are designed to be more computationally intensive than standard hashing functions, which makes dictionary and rainbow table attacks harder: each guess costs far more time and memory, so the attack becomes much less feasible.
By default the KDF will generate a random salt (thus the output is non-deterministic) and have a maximum computational time of 0.5 seconds (although this can be overridden using a maxtime argument, as we’ll see later).
PBKDF2 (Password-Based Key Derivation Function 2), on the other hand, provides deterministic output and lets you specify an explicit salt value. Internally it repeats its process many times, which reduces the feasibility of automated password cracking attempts. Strictly speaking PBKDF2 is itself a KDF; in this post “KDF” refers to the password-based mode and “PBKDF2” to the salt-based mode.
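To make the deterministic versus non-deterministic distinction concrete, here is a minimal sketch. It uses the standard library’s hashlib.pbkdf2_hmac rather than the scrypt package this post is built on, and the derive helper, the iteration count and the salt values are all illustrative assumptions rather than anything the library does.

```python
import hashlib
import os


def derive(message: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte digest with PBKDF2-HMAC-SHA256.

    `derive` is a hypothetical helper used only for this illustration.
    """
    return hashlib.pbkdf2_hmac("sha256", message.encode("utf-8"), salt, iterations)


fixed_salt = b"my-salt-is-long-enough"

# Same message and same explicit salt: the digest is reproducible.
deterministic_a = derive("my-message", fixed_salt)
deterministic_b = derive("my-message", fixed_salt)

# A fresh random 128-bit salt on every call: the digest changes each time.
random_a = derive("my-message", os.urandom(16))
random_b = derive("my-message", os.urandom(16))

print(deterministic_a == deterministic_b)
print(random_a == random_b)
```

The first comparison holds because nothing about the inputs changed; the second fails because the salt did, which is the behaviour a random salt is meant to give you.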
I mention both of these (KDF and PBKDF2) because the generate_digest function I’ve written is a multi-arity function that switches implementation based on the arguments it is given.
Originally I had two separate functions to distinguish them more clearly, but I realised that if this library is to make life easier for developers who don’t understand encryption or hashing concepts, then I need to provide a single function that intelligently handles things internally.
Because the KDF accepts a key and is able to return the original message (given the same key), it acts as a form of symmetric encryption, whereas PBKDF2 is more like a one-way hash function. Hence I named the function in this library generate_digest rather than something like encrypt_message, which wouldn’t have made sense when dealing with PBKDF2.
- If a password argument is provided, then KDF will be used (along with a random salt) to generate a non-deterministic digest.
- If a salt is provided (or no password is given), then PBKDF2 will be used to generate a deterministic digest.
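The dispatch rules can be sketched in isolation as follows. This is only an illustration: select_mode is a hypothetical helper that returns a label for the chosen code path, whereas the real generate_digest performs the work itself. ArgumentError is the exception the library raises when both arguments are supplied.

```python
from typing import Optional


class ArgumentError(Exception):
    pass


def select_mode(password: Optional[str] = None, salt: str = "") -> str:
    """Return a label for the code path the arguments would select.

    `select_mode` is a hypothetical helper for illustration only.
    """
    if password and salt:
        raise ArgumentError("only provide a password or a salt, not both")
    if password:
        # Password given: random salt, non-deterministic digest.
        return "kdf"
    # No password (with or without an explicit salt): deterministic digest.
    return "pbkdf2"


print(select_mode(password="my-password"))
print(select_mode(salt="my-salt-is-long-enough"))
print(select_mode())
```

Keeping the decision in one place means callers only ever need to remember a single function name, which was the point of merging the two original functions.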
Note: salts should be a minimum of 128 bits (~16 characters) in length. Also, when specifying a maxtime with generate_digest, ensure you include that same value when decrypting with decrypt_digest or validating via validate_digest.
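If you need a salt that meets the 128-bit minimum, one option is to generate it with the standard library’s secrets module. The sketch below is an assumption about how you might do that, not part of the library: new_salt and salt_is_long_enough are hypothetical helper names.

```python
import secrets

MIN_SALT_BITS = 128


def new_salt(num_bytes: int = 16) -> str:
    """Return a random salt as a hex string (two characters per byte)."""
    return secrets.token_hex(num_bytes)


def salt_is_long_enough(salt: str) -> bool:
    """Check that the salt is at least 128 bits once encoded as UTF-8."""
    return len(salt.encode("utf-8")) * 8 >= MIN_SALT_BITS


salt = new_salt()
print(len(salt))
print(salt_is_long_enough(salt))
print(salt_is_long_enough("too-short"))
```

Because a deterministic digest depends on reusing the same salt, you would generate the salt once and store it alongside your configuration rather than calling new_salt on every request.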
decrypt_digest and validate_digest
The decrypt_digest and validate_digest functions only apply to digests that have been generated using a password (i.e. KDF). Given the right password, decrypt_digest will return the original message, and so it is more a form of symmetric encryption than a straight one-way hash function. The validate_digest function returns a boolean: True if the given password was able to decrypt the message, False otherwise.
This abstraction library requires scrypt, which itself requires the following dependencies to be installed within the context of your service:
- build-essential
- libssl-dev
- python-dev
If your service has a Dockerfile, adding these dependencies should be as simple as adding a line like the following:
RUN apt-get update && apt-get install -y build-essential libssl-dev python-dev
I suggest looking at the test suite (see below) to get an idea of how you would use the functions in this library.
Note: for a glossary of security terms, refer to this document.
Before we look at the implementation of the library, let’s take a moment to sift through its test suite.
Note: I named the library secure and have it running on a private PyPI instance. This code is made available via GitHub.
import pytest

from secure.interface import ArgumentError, generate_digest, validate_digest, decrypt_digest

message = "my-message"
password = "my-password"
salt = "my-salt-is-long-enough"


def test_generate_digest_with_both_a_password_and_a_salt():
    """Providing both a password and a salt should raise an exception."""
    with pytest.raises(ArgumentError):
        generate_digest(message, salt=salt, password=password)


def test_generate_digest_with_a_password():
    """Generating a digest with a password should be non-deterministic."""
    digest1 = generate_digest(message, password=password)
    digest2 = generate_digest(message, password=password)
    digest3 = generate_digest(message, password=password, maxtime=1.5)
    digest4 = generate_digest(message, password=password, maxtime=1.5)
    digest5 = generate_digest(message, password=password, maxtime=int(1))
    digest6 = generate_digest(message, password=password, maxtime=int(1))

    assert digest1 != digest2
    assert digest3 != digest4
    assert digest5 != digest6


def test_generate_digest_without_a_password():
    """Generating a digest without a password should be deterministic."""
    digest1 = generate_digest(message)
    digest2 = generate_digest(message)
    digest3 = generate_digest(message, salt=salt)
    digest4 = generate_digest(message, salt=salt)
    digest5 = generate_digest(message, length=128)
    digest6 = generate_digest(message, length=128)

    assert digest1 == digest2
    assert digest3 == digest4
    assert len(digest5) == len(digest6)


def test_generate_digest_with_different_salt_lengths():
    """Salts should be at least 128bits (~16 characters) in length."""
    generate_digest(message, salt=salt)

    with pytest.raises(ArgumentError):
        generate_digest(message, salt="too-short")


def test_validate_digest():
    """Validation only applies to digests generated with a password."""
    digest1 = generate_digest(message, password=password)
    digest2 = generate_digest(message, password=password)
    digest3 = generate_digest(message, password=password, maxtime=1.5)
    digest4 = generate_digest(message, password=password, maxtime=1.5)
    digest5 = generate_digest(message, password=password, maxtime=int(1))
    digest6 = generate_digest(message, password=password, maxtime=int(1))

    assert not validate_digest(digest1, 'incorrect-password')
    assert validate_digest(digest1, password)
    assert validate_digest(digest3, password, maxtime=1.5)
    assert validate_digest(digest5, password, maxtime=int(1))


def test_decrypt_digest():
    """Decryption is possible given the right password."""
    digest = generate_digest(message, password=password)

    assert decrypt_digest(digest, password) == message
OK, time to see the library code itself.
Note: I like to use MyPy for type hinting.
import scrypt
from typing import Optional, Union


class ArgumentError(Exception):
    pass


def generate_digest(message: str,
                    password: Optional[str] = None,
                    maxtime: Union[float, int] = 0.5,
                    salt: str = "",
                    length: int = 64) -> bytes:
    """Multi-arity function for generating a digest.

    Use KDF symmetric encryption given a password.
    Use deterministic hash function given a salt (or lack of password).
    """
    # The two modes are mutually exclusive.
    if password and salt:
        raise ArgumentError("only provide a password or a salt, not both")

    # An empty salt is allowed (the default); an explicit one must be long enough.
    if salt != "" and len(salt) < 16:
        raise ArgumentError("salts need to be minimum of 128bits (~16 characters)")

    if password:
        return scrypt.encrypt(message, password, maxtime=maxtime)
    else:
        return scrypt.hash(message, salt, buflen=length)


def decrypt_digest(digest: bytes,
                   password: str,
                   maxtime: Union[float, int] = 0.5) -> str:
    """Decrypts digest using given password."""
    # Annotated as str because the test suite compares the result to a str message.
    return scrypt.decrypt(digest, password, maxtime)


def validate_digest(digest: bytes,
                    password: str,
                    maxtime: Union[float, int] = 0.5) -> bool:
    """Validate digest using given password."""
    try:
        scrypt.decrypt(digest, password, maxtime)
        return True
    except scrypt.error:
        # Raised when the password is wrong or the digest cannot be decrypted.
        return False
Let me know what you think on Twitter. Have fun.