Hashing, Encryption and Encoding ⋆ Mark McDonnell

Introduction

I’ve written previously (and in-depth) on the subject of security basics, using tools such as GPG, OpenSSH, OpenSSL, and Keybase. But this time I wanted to focus in on the differences between encryption and hashing, whilst also providing a slightly more concise reference point for those already familiar with these concepts.

Before we get started, let’s see what we’ll be covering:

Terminology
Hashing vs Encryption
Base64 Encoding
MAC vs HMAC
Random Password Generation
Hash Functions
- shasum
- hashlib
- cksum
OpenSSH
OpenSSL
- Generating a key pair
- Encrypting and Decrypting
- Randomness
GPG
- Generating a key pair
- Automate
- Asymmetrical Encryption and Decryption
- Symmetrical Encryption and Decryption
- Signing keys
- Signing encrypted files
Keybase

Terminology

OK, so using the correct terminology is essential and helps us to be explicit and clear with what we really mean.

hash function:
calculates a deterministic, irreversible, fixed-size alphanumeric string (based on input).
message:
a message is the data (e.g. the ‘input’ provided to a hash function).
digest:
the hexadecimal output generated by a hash function (contextually referred to as “checksum” or “fingerprint”).
symmetric algorithm:
a cryptographic algorithm that uses the same key to encrypt and decrypt data.
asymmetric algorithm:
a form of encryption where keys come in pairs (what one key encrypts, only the other can decrypt).
integrity:
the message transported has not been tampered with or altered.
confidentiality:
the communication between trusted parties is confidential.
authenticity:
the communication is with who you expect it to be (not a man-in-the-middle).

For a longer “Security Glossary”, please see this Google doc I created.

Hashing vs Encryption

In essence:

hashing: provides integrity.
encryption: provides confidentiality.

Cryptographic primitives often need to be combined. For example, a typical public-key system uses RSA (a slow, but very secure algorithm) to exchange a shared key securely, then uses AES (a faster algorithm that is harder to deploy safely on its own †) to encrypt the data with that shared key, and finally uses a hash function to generate a message digest so both parties can verify the integrity of the payload sent/received.

† harder in the sense that you have to share a secret key with the person you wish to communicate with, which is the problem public-key cryptography helps to solve.

Why use a hash function?

Hash functions (or more specifically their output: digests) can be used for many things, like indexing data in a hash table, fingerprinting (i.e. detecting duplicate data or uniquely identifying files), or as a checksum (i.e. detecting data corruption).
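
As a rough illustration of the fingerprinting use case, here is a minimal Python sketch that groups data by digest to spot duplicates. The file names and contents are hypothetical and held in memory, and SHA-256 is just my choice of hash function for the example:

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(files: dict) -> dict:
    """Group names by digest and keep only digests seen more than once."""
    seen = {}
    for name, content in files.items():
        seen.setdefault(fingerprint(content), []).append(name)
    return {digest: names for digest, names in seen.items() if len(names) > 1}


# Hypothetical file names and contents, for illustration only.
files = {"a.txt": b"foobar", "b.txt": b"foobar", "c.txt": b"foobaz"}
duplicates = find_duplicates(files)
print(list(duplicates.values()))  # [['a.txt', 'b.txt']]
```

The same fingerprint function works as a checksum: store the digest alongside the data, recompute it later, and a mismatch tells you the data changed.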

Message authentication (i.e. message integrity) involves hashing the message to produce a digest and encrypting the digest with the private key to produce a digital signature.

In order to verify this ‘signature’ the recipient would compute a hash of the message, decrypt the signature using the signer’s public key, and compare the computed digest against the decrypted digest.

If the digest you generated is the same as the decrypted digest, then you can be sure the message was not modified whilst in transit (e.g. by a ‘man-in-the-middle’).
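
The hash-then-sign flow can be sketched in Python with a deliberately tiny, textbook RSA key. This is a toy under loud assumptions: the key values (3233, 17, 2753) are the classic classroom example, the digest is reduced to fit the toy modulus, and there is no padding, so it only illustrates the idea and is nowhere near secure:

```python
import hashlib

# Textbook toy key built from p=61 and q=53: far too small for real use.
N, E, D = 3233, 17, 2753  # modulus, public exponent, private exponent


def digest_as_int(message: bytes) -> int:
    """Hash the message, then squeeze the digest into the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """'Encrypt' the digest with the private exponent to get a signature."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """'Decrypt' the signature with the public exponent and compare digests."""
    return pow(signature, E, N) == digest_as_int(message)


message = b"message to be signed"
signature = sign(message)
print(verify(message, signature))            # True
print(verify(message, (signature + 1) % N))  # False: the signature was altered
```

In practice you would let a tool such as gpg or openssl do this with full-size keys and proper padding.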

Base64 Encoding

Base64 is a way of taking binary data and transforming it into a text-based format. It is commonly used when there is a need to transfer the binary data over a medium that only supports textual data (e.g. you can Base64 encode images so they can be inlined into HTML).

How it works: Base64 encoding takes three bytes, each consisting of eight bits, and represents them as four printable characters in the ASCII standard.

Note: Base64 encoded strings are NOT secure.
Remember, it encodes data; it does not encrypt it.
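
To make the three-bytes-to-four-characters mapping concrete, here is a small sketch using Python’s standard base64 module (the input strings are just illustrative):

```python
import base64

raw = b"Man"                      # three bytes = 24 bits
encoded = base64.b64encode(raw)   # four 6-bit groups -> four characters
print(encoded)                    # b'TWFu'
print(base64.b64decode(encoded))  # b'Man'

# Input that isn't a multiple of three bytes is padded with '='.
print(base64.b64encode(b"Ma"))    # b'TWE='
```

Notice that decoding needs no key at all, which is exactly why Base64 offers no confidentiality.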

MAC vs HMAC

A ‘MAC’ (Message Authentication Code) is a short tag, computed from a message and a shared secret key, that lets the recipient verify the integrity of that message. Some MACs are built on a symmetric encryption algorithm (such as AES †), whereas a ‘HMAC’ is a MAC that uses a hash function (such as SHA256) internally instead of an encryption algorithm.

† encryption algorithms: AES (Advanced Encryption Standard), Blowfish, DES (Data Encryption Standard), Triple DES, Serpent, and Twofish.

Below is an example HMAC written in Bash and using the OpenSSL command-line tool.

function hmac {
  digest="$1"
  data="$2"
  key="$3"
  shift 3
  echo -n "$data" | openssl dgst "-$digest" -hmac "$key" "$@"
}

The way you would use it is as follows:

hmac sha256 "message to be hashed" secret-key

Note: you can swap sha256 for any supported digest algorithm (see openssl dgst -h for details).

Which would generate the digest output:

44db14fe496c4bc4af5e8e6e3683e5db7acffa555897cf4b2b4345abaaf1ace3

Because the implementation uses the openssl command, you can also ask for the raw binary digest (instead of the hexadecimal output) and then Base64 encode that binary output, like so:

hmac sha256 "message to be hashed" secret-key -binary | base64

Which outputs:

RNsU/klsS8SvXo5uNoPl23rP+lVYl89LK0NFq6rxrOM=

You don’t have to use an abstraction around the command, obviously. To generate a plain (unkeyed) digest of a file you can just use:

cat plaintext.txt | openssl dgst -sha512 -binary | base64

Note: base64 could be replaced with openssl’s base64 encoding command: openssl enc -base64 -A
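
For comparison, the same HMAC idea can be sketched with Python’s standard hmac module. The key and message mirror the Bash example; the is_valid helper is a name I’ve made up for illustration:

```python
import base64
import hashlib
import hmac

key = b"secret-key"
message = b"message to be hashed"

mac = hmac.new(key, message, hashlib.sha256)
hex_digest = mac.hexdigest()                                  # hexadecimal form
b64_digest = base64.b64encode(mac.digest()).decode("ascii")   # Base64 of raw bytes
print(hex_digest)
print(b64_digest)


def is_valid(key: bytes, message: bytes, received_hex: str) -> bool:
    """Recompute the HMAC and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_hex)


print(is_valid(key, message, hex_digest))         # True
print(is_valid(b"wrong-key", message, hex_digest))  # False
```

Using hmac.compare_digest rather than == avoids leaking timing information when checking a digest received from someone else.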

Random Password Generation

Generating random passwords that are complex enough to make automated attacks difficult can be a bit tedious, yet important. But if you install a program such as pwgen (brew install pwgen) you’ll be able to generate random and complex passwords very easily.

Once installed, add the following alias to your shell:

alias psw="pwgen -sy 20 1"

Now when you execute psw you’ll get output that looks something like the following:

|93<3(M;r?~40c$A@>{\
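
If you’d rather not install anything, a similar result can be sketched with Python’s standard secrets module. This is not how pwgen works internally; the generate_password name and the choice of alphabet are my own assumptions, picked to roughly match the -sy 20 flags above:

```python
import secrets
import string

# Letters, digits and symbols: roughly what `pwgen -sy` draws from.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Build a password from cryptographically secure random choices."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password())
```

The secrets module draws from the operating system’s secure random source, which is what you want for passwords (the random module is not suitable).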

Hash Functions

There are many different ways of accessing a hash function. The options we’ll look at are the shasum executable (provided by macOS), the hashlib package provided by the Python programming language, and the cksum command.

shasum

Let’s generate a hexadecimal digest of the message foobar using the SHA512 hash algorithm:

echo -n foobar | shasum -a 512

Note: see shasum -h for all available algorithms.

Which outputs:

0a50261ebd1a390fed2bf326f2673c145582a6342d523204973d0219337f81616a8069b012587cf5635f6925f1b56c360230c19b273500ee013e030601bf2425

hashlib

Let’s again generate a hexadecimal digest of the message foobar using the SHA512 hash algorithm, now using Python:

import hashlib
message = hashlib.sha512()
message.update(b"foobar")
print(message.hexdigest())

Which outputs the same digest as the shasum command produced:

0a50261ebd1a390fed2bf326f2673c145582a6342d523204973d0219337f81616a8069b012587cf5635f6925f1b56c360230c19b273500ee013e030601bf2425

cksum

Remember hash functions generate a digest of some message input, and one such use of that digest output is detecting data corruption (i.e. a checksum).

macOS also provides a cksum command which lets you generate a checksum, like so:

echo foobar | cksum

Which outputs:

857691210 7

The first number is the checksum and the second number is the amount of data in bytes (seven here, because echo appends a newline to foobar).

OpenSSH

OpenSSH provides secure and encrypted tunneling capabilities and is typically used to enable secure shell connections from your machine to external servers.

In order to generate a cryptographically secure key pair, execute the following command:

ssh-keygen -t rsa -b 4096 -C "your.email@domain.com"

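This explicitly selects the RSA algorithm along with a key size of 4096 bits. The defaults depend on your OpenSSH version (older releases default to RSA with 2048 bits, while newer ones default to larger RSA keys or to Ed25519), so being explicit avoids surprises.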

The output of this command will be a public and private key pair.

It’s usually best to generate these keys (or at least move them when generated) within the ~/.ssh directory.

SSH Agent

One thing that catches me out all the time is when I open a new terminal tab or shell instance and I go to push up some code changes to a remote server only to discover an error saying I’m not authenticated. This is because the new terminal/shell instance doesn’t have the SSH agent running which is what makes my SSH key pair available.

This happens so often I’ve created an alias to make starting up the SSH agent and loading my SSH private key very quick and easy:

alias sshagent='eval "$(ssh-agent -s)" && ssh-add -K ~/.ssh/github_rsa'

Note: the -K flag is macOS specific; it adds the key to the macOS keychain. Recent macOS releases replace it with --apple-use-keychain.

OpenSSL

OpenSSL is designed to provide a method for securing web based communication (think HTTPS/TLS/SSL).

Note: for a full list of commands see: openssl -h and openssl <command> -h.

Key Exchanges

There are two popular key exchange algorithms:

RSA
Diffie-Hellman

For the specific details of each I recommend you read this post on the differences. In short RSA uses the person’s public key to encrypt the secret, while Diffie-Hellman uses a mathematical function to ensure only those two people communicating can calculate the secret based on the information that’s publicly available.
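
To give a feel for the Diffie-Hellman side of that comparison, here is a toy sketch in Python. The numbers are the small classroom values (p=23, g=5) and the private values are fixed rather than random, so treat it purely as an illustration of the maths, not as something to deploy:

```python
# Public parameters, known to everyone (including an eavesdropper).
p = 23  # a tiny prime modulus; real exchanges use very large primes
g = 5   # the generator

# Private values, never sent over the wire (fixed here for illustration).
alice_private = 6
bob_private = 15

# Each side publishes g^private mod p.
alice_public = pow(g, alice_private, p)  # 8
bob_public = pow(g, bob_private, p)      # 19

# Each side combines the other's public value with its own private value.
alice_secret = pow(bob_public, alice_private, p)
bob_secret = pow(alice_public, bob_private, p)

print(alice_secret, bob_secret)  # 2 2
```

Both sides arrive at the same secret even though only p, g and the two public values ever crossed the network.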

Generating a key pair

In order to generate an RSA based public/private key pair, execute the following commands:

# generate a private key
openssl genrsa -out private_key.pem 4096

# generate a public key, from the private key
openssl rsa -pubout -in private_key.pem -out public_key.pem

Encrypting and Decrypting

The following examples use symmetric encryption, and so you’ll be asked for a secret key when encrypting and decrypting (although you could also use the -pass flag like so -pass pass:<your_password>, yeah the syntax is odd and it’s the same for decrypting):

# symmetric encryption (you'll be asked for a key)
echo foobar | openssl enc -aes-256-cbc -out message.enc

# decrypt that encrypted message
openssl enc -aes-256-cbc -in message.enc -d

Note: .enc is a commonly used extension to indicate a file is encrypted (.asc usually indicates ASCII-armored output, such as that produced by gpg --armor).

In the encryption example I’m passing in the message via stdin and specifying a file for the output, while the decryption example reads from a file and prints to stdout. You could use a file for both input and output by explicitly specifying the -in and -out flags.

Annoyingly with openssl the same thing can be done a million different ways, so (for example) you might also find that you can do the above without the enc portion of the command (and thus removing the - prefix from the selected algorithm):

# symmetric encryption
echo foobar | openssl aes-256-cbc -out message.enc

# decrypt that encrypted message
openssl aes-256-cbc -in message.enc -d

Encoding

You can also generate Base64 output of the encrypted data, by using the -a flag like so:

$ echo foobar | openssl aes-256-cbc -a

U2FsdGVkX19/L0WtkvCNlpMiQnvD1SWGM19lm4m6xK4=

Note: see man enc for details

Salts

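It’s also worth mentioning that the default behaviour for OpenSSL is to use a ‘salt’ when encrypting the message: random data that is mixed with your passphrase when deriving the encryption key, so the same passphrase doesn’t produce the same key twice. Salts are best known from password storage, where random data is combined with the (already hashed) password and the result is hashed again. In pseudo-code it would look like this: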

$pwd = hash(hash($password) + salt)

You would then store the value of $pwd in your database along with the salt itself.

The security doesn’t come from obfuscating the salt, but from the fact that a rainbow table attack can no longer simply look up hashes in its precomputed collection of hashed passwords. An attacker would need to incorporate your (per-user) unique salt value into their check against a predetermined list of hashes, and they also wouldn’t know if the salt was prefixed or suffixed to the password itself, which makes the attack computationally very expensive and time consuming to attempt.
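
Here is a minimal Python sketch of per-user salting for password storage. Note that I’ve swapped the double-hash pseudo-code for PBKDF2 (hashlib.pbkdf2_hmac), which is a standard salted, deliberately slow construction; the function names and the iteration count are assumptions for illustration, not recommendations:

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor only


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_hash); a fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-derive the hash with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, stored)


# Store both values; the salt is not a secret.
salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

Because every user gets a different salt, two users with the same password end up with different stored hashes.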

You can also see that a salt is used by trying to read an encrypted file (cat message.enc):

Salted__MJin¨MàÍ£?è,random¡:~randomW!5µõ

Asymmetrical Encryption

If you need to, you can encrypt data with a public key (i.e. asymmetrical encryption) by utilising the openssl rsautl command, which stands for “RSA Utility” and is commonly used to sign, verify, encrypt and decrypt data using the RSA algorithm. Note that OpenSSL 3.0 deprecates rsautl in favour of openssl pkeyutl.

In the following example we have a file plaintext.txt we encrypt using a public key. It will now only be possible to decrypt the secret.enc file if you have the corresponding private key:

# encrypting
openssl rsautl -encrypt -pubin -inkey public_key.pem -in plaintext.txt -out secret.enc

# decrypting
openssl rsautl -decrypt -inkey private_key.pem -in secret.enc

Randomness

OpenSSL also offers a way to generate random binary data which you can then export in either hexadecimal or Base64 format:

Note: in the following examples, 64 is the number of bytes to be generated.

$ openssl rand 64

RR_wK[=q5}VrdMܾj{8(Ty]7;file://Integralist-MBPr/tmp

$ openssl rand -hex 64

660baf33c189ced722a07c6a29d35a7e4584bb954c8c86f2cfd4ea8d892bff32fc188b0c56cbe0a56d60b628cdee697308b0cf3806cd95052b743bec5ccc5240

$ openssl rand -base64 64

JIPU5SiCgKP3XVrnef1gY+PxjBvjdQgSN+OJoBAdWmCa/cRvDdFl01GQiSwFimQ5
1lVa/7hfYIK6Z5jjHNauaQ==
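
The equivalent in Python can be sketched with the standard secrets module (again using 64 bytes; the variable names are my own):

```python
import base64
import secrets

raw = secrets.token_bytes(64)                      # 64 random bytes
as_hex = raw.hex()                                 # 128 hexadecimal characters
as_b64 = base64.b64encode(raw).decode("ascii")     # 88 Base64 characters

print(as_hex)
print(as_b64)
```

Unlike openssl rand -base64, b64encode doesn’t wrap its output across multiple lines.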

GPG

GPG is a tool which provides encryption and signing capabilities. It supports both symmetrical and asymmetrical encryption, plus digital signing of your encrypted content to ensure its integrity.

Generating a key pair

To generate a new GPG key pair you would execute the following command and interactively fill in the details:

gpg --gen-key

Automate

If you prefer to automate this you can create a file to contain the details and pass that into the command-line instead. The following code generates a new batch_file that will contain the information we would otherwise have to enter manually:

$ cat > batch_file <<EOF
     %echo Generating a basic OpenPGP key
     Key-Type: RSA
     Key-Length: 4096
     Subkey-Type: Default
     Name-Real: Your Name
     Name-Comment: Integralist testing
     Name-Email: foo@example.com
     Expire-Date: 0
     Passphrase: foobar
     %commit
     %echo done
EOF

Once we have this file we can pass it along with the --gen-key command:

$ gpg --gen-key --batch batch_file

gpg: Generating a basic OpenPGP key
gpg: key 4BCAEAAD199B5FE8 marked as ultimately trusted
gpg: directory '/Users/Integralist/.gnupg/openpgp-revocs.d' created
gpg: revocation certificate stored as '/Users/Integralist/.gnupg/openpgp-revocs.d/CFE96536285D83C990567BF64BCAEAAD199B5FE8.rev'
gpg: done

Now if we check our list of keys we’ll see the new one we just generated:

$ gpg --list-keys

/Users/Integralist/.gnupg/pubring.gpg
---------------------------------------
pub   rsa4096 2018-02-17 [SCEA]
      CFE96536285D83C990567BF64BCAEAAD199B5FE8
uid           [ultimate] Your Name (Integralist testing) <foo@example.com>
sub   rsa2048 2018-02-17 [E]

Revocation

When you generate a new key pair, if you intend on publishing your public key online, then you’ll want to generate a revocation certificate. Doing this will mean you can revoke your original key pair if your private key becomes compromised (or you just want to decommission it):

gpg --output revocation.cert --gen-revoke your.email@domain.com

When you’re ready to decommission it, just import the certificate into your keyring:

gpg --import revocation.cert

You can then also push up your key identifier to a key server to force it to recognise the key has been revoked:

gpg --keyserver pgp.mit.edu --send-keys <key_id>

Asymmetrical Encryption and Decryption

In order to encrypt some data using someone else’s public key (i.e. so only they can decrypt the data) you first need access to their public key and have it imported to your gpg keyring:

gpg --import public.key

If you want to verify the integrity of the public key you have acquired, then you should speak securely with the recipient who owns the public key and ask them to give you their digital ‘fingerprint’. You can then verify it matches what you have using the following command:

gpg --fingerprint <pub_key_id>

You’ll then look for the fingerprint in the gpg output. The fingerprint should look something like this:

FDFB E9B5 24BA 6972 A3AA 44B9 A1B1 7E6F DD86 E7F5

The command for encrypting a file plaintext.txt using their public key would be:

gpg --encrypt -u "Sender User Name" -r "Receiver User Name" plaintext.txt

As you’ve encrypted the file using that person’s public key, it means they can decrypt the file simply with:

gpg -d plaintext.txt.gpg

Symmetrical Encryption and Decryption

By default gpg uses the AES algorithm for its symmetrical encryption. The command to use is (you’ll be asked to provide a passphrase):

gpg --symmetric plaintext.txt

You can specify a different algorithm if the default isn’t as secure as you’d like. Let’s use a 256-bit encryption key:

gpg --symmetric --cipher-algo AES256 plaintext.txt

Note: see gpg --version for all available ciphers

Signing keys

If you want to explicitly trust a public key you have imported, you can ‘sign’ it. You do this using the --sign-key flag. Doing this can also be beneficial for the owner of that public key (Bob), because if a friend of yours (Alice) trusts you and they see you’ve signed Bob’s public key, then Alice is more likely to trust Bob as well.

In order for Bob to benefit from this ‘web of trust’ you need to send him back his public key which you signed. Bob would need to import that version of his public key back into his gpg keyring, so that he can then republish it online for others to see that you trust him.

The following example demonstrates how you would export Bob’s public key, which you previously imported and signed:

gpg --export --armor bob@example.org

Note: --armor simply outputs the binary data as ASCII

Signing encrypted files

It can be useful to sign a file that you encrypt, so that the person who will decrypt the file can verify it was you who sent it to them, and also check that the integrity of the file is still intact.

Note: this provides a combination of authenticity and integrity (as defined within the terminology section)

You do this by using the --sign flag:

gpg --local-user Bob --encrypt --recipient Alice --sign plaintext.txt

Note: I’m using --local-user because I have many different key pairs setup for testing.

This will generate a plaintext.txt.gpg encrypted file.

The recipient (Alice) can either decrypt the file using her own private key, in which case gpg will both decrypt the file and verify the signature against Bob’s public key, or Alice could just use the --verify flag if she only wanted to check the signature.

$ gpg --verify plaintext.txt.gpg

gpg: Signature made Mon Feb 19 10:16:38 2018 GMT
gpg:                using RSA key F2G91BE243E405E5B64B08A1CB5EBDB2561C861B
gpg: Good signature from "Bob <bob@example.com>" [ultimate]

Keybase

Keybase is a public-key directory that maps social media identities to encryption keys in a publicly auditable manner. Keybase offers an end-to-end encrypted chat and cloud storage system, called Keybase Chat and the Keybase filesystem.

In order to use the command-line tool keybase you’ll need to register for an account on their website.

To install keybase on macOS:

brew install keybase

Once installed you’ll need to login:

keybase login

At this point you can either generate a fresh key pair or select an existing gpg key pair:

# generate new key pair
keybase pgp gen

# select existing key pair
keybase pgp select

You can search for other keybase users:

keybase search sthulb

You can then encrypt data for another keybase user, like so:

keybase encrypt -i info.txt -o info.txt.asc sthulb

If you receive an encrypted file you can decrypt it, like so:

keybase decrypt -i info.txt.asc -o info.txt

If you receive an encrypted file (info.txt.gpg) that was encrypted using your keybase public key, but the sender isn’t using keybase (e.g. they’ve signed the file using their own gpg private key), then you’ll need to have their public key in your gpg keyring:

keybase pgp decrypt -i info.txt.gpg