---
title: Hashing, Encryption and Encoding
date: 2018-02-16
description: A concise reference explaining the differences between hashing, encryption, and encoding with practical examples.
tags: [security]
---

## Introduction

I've [written previously](/posts/security-basics/) (and in-depth) on the subject
of security basics, using tools such as GPG, OpenSSH, OpenSSL, and Keybase. But
this time I wanted to focus in on the differences between encryption and
hashing, whilst also providing a slightly more concise reference point for those
already familiar with these concepts.

## Terminology

OK, so using the correct terminology is essential and helps us to be explicit
and clear with what we really mean.

- **hash function**:\
  calculates a deterministic, irreversible, fixed-size alphanumeric string (based on input).

- **message**:\
  a message _is_ the data (e.g. the 'input' provided to a hash function).

- **digest**:\
  the output generated by a hash function, usually displayed as hexadecimal (contextually referred to as "checksum" or "fingerprint").

- **symmetric algorithm**:\
  a cryptographic algorithm that uses the same key to encrypt and decrypt data.

- **asymmetric algorithm**:\
  a form of encryption where keys come in pairs (what one key encrypts, only the other can decrypt).

- **integrity**:\
  the message transported has not been tampered with or altered.

- **confidentiality**:\
  only the intended parties can read the communication.

- **authenticity**:\
  the communication is with who you expect it to be (not a man-in-the-middle).

> [!TIP]
> For a longer "Security Glossary", please see [this Google
> doc](https://docs.google.com/document/d/1qs3jEIQvocdVhSxCSPLF1BoLnp91aLnuUIasvl-maYo/edit?usp=sharing)
> I created.

## Hashing vs Encryption

In essence:

- **hashing**: provides _integrity_.
- **encryption**: provides _confidentiality_.

Often cryptographic primitives need to be combined. For example, [public-key
cryptography](/posts/security-basics/#public-key-cryptography) uses RSA (a slow
asymmetric algorithm) to securely exchange a shared key, then uses AES (a much
faster symmetric algorithm †) to encrypt the actual data with that shared key,
and finally uses a hash function to generate a message digest so both parties
can verify the _integrity_ of the payload sent/received.

> [!INFO]
> † AES itself isn't weaker than RSA; the weakness of any symmetric algorithm is
> that you have to share a secret key with the person you wish to communicate
> with, and that's the problem public-key cryptography helps to solve.

### Why use a hash function?

Hash functions (or more specifically their output: _digests_) can be used for
many things, like indexing data in a hash table, fingerprinting (i.e. detecting
duplicate data or uniquely identifying files), or as a checksum (i.e. detecting
data corruption).

Message authentication (i.e. message _integrity_) involves hashing the message
to produce a digest and encrypting the digest with the private key to produce a
digital signature.

In order to verify this 'signature' the recipient of the message would need to
compute a hash of the message, decrypt the signature using the signer's public
key, and then compare the computed digest against the decrypted digest.

If the digest you generated is the same as the decrypted digest, then we can be
sure the message was delivered unmodified (e.g. it wasn't altered by a
'man-in-the-middle' whilst in transit).
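
The sign-then-verify flow described above can be sketched with textbook RSA in
Python. This is only a toy under loud assumptions: the two primes, the helper
names (`digest_as_int`, `sign`, `verify`) and the example messages are all mine,
the key is far too small to be secure, and real signatures also use a padding
scheme that is left out here.

```python
import hashlib

# Toy RSA parameters, for illustration only: these primes are far too small
# (and too well known) to be secure. All names here are hypothetical.
P = 2**61 - 1                        # a Mersenne prime
Q = 2**89 - 1                        # another Mersenne prime
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce the digest so it fits under the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """'Encrypt' the digest with the private key to produce a signature."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """'Decrypt' the signature with the public key and compare the digests."""
    return pow(signature, E, N) == digest_as_int(message)


signature = sign(b"pay Bob 10 pounds")
print(verify(b"pay Bob 10 pounds", signature))   # True
print(verify(b"pay Bob 99 pounds", signature))   # False
```

Changing even one character of the message changes its digest, so the
comparison inside `verify` fails and the tampering is detected.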

## Base64 Encoding

Base64 is a way of taking binary data and transforming it into a text-based
format. It is commonly used when there is a need to transfer the binary data
over a medium that only supports textual data (e.g. you can Base64 encode images
so they can be inlined into HTML).

How it works: Base64 encoding takes three bytes, each consisting of eight bits,
and represents them as four printable characters in the ASCII standard.

> [!WARNING]
> Base64 encoded strings are NOT secure.\
> Remember, it _encodes_ data, it doesn't _encrypt_ it.
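
To make the three-bytes-to-four-characters idea concrete, here is a small
sketch using Python's standard `base64` module (the sample inputs are just
illustrative):

```python
import base64

# Three bytes (24 bits) become four 6-bit groups, each mapped to one character.
print(base64.b64encode(b"Man"))   # b'TWFu'

# Inputs that are not a multiple of three bytes are padded with '='.
print(base64.b64encode(b"Ma"))    # b'TWE='
print(base64.b64encode(b"M"))     # b'TQ=='

# Decoding needs no key, which is why Base64 offers no confidentiality.
print(base64.b64decode("TWFu"))   # b'Man'
```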

## MAC vs HMAC

A 'MAC' (Message Authentication Code) uses a shared secret key to verify the
integrity of a message. MAC is the general term: some constructions (such as
CMAC) are built on a symmetric encryption algorithm (such as AES †), whereas a
'HMAC' uses a hash function (such as SHA256) internally instead of an
encryption algorithm.

> [!INFO]
> † encryption algorithms: AES (Advanced Encryption Standard), Blowfish, DES
> (Data Encryption Standard), Triple DES, Serpent, and Twofish.

Below is an example HMAC written in [Bash](https://www.gnu.org/software/bash/)
and using the [OpenSSL](https://www.openssl.org/) command-line tool.

```
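# usage: hmac <digest> <data> <key> [extra openssl flags...]
function hmac {
  local digest="$1"
  local data="$2"
  local key="$3"
  shift 3
  # printf avoids echo's portability quirks; "$@" forwards flags such as -binary
  printf '%s' "$data" | openssl dgst "-$digest" -hmac "$key" "$@"
}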
```

The way you would use it is as follows:

```
hmac sha256 "message to be hashed" secret-key
```

> [!TIP]
> you can swap `sha256` for any supported digest algorithm (see `openssl dgst -h` for details).

Which would generate the digest output:

```
44db14fe496c4bc4af5e8e6e3683e5db7acffa555897cf4b2b4345abaaf1ace3
```

Now because the implementation is using the `openssl` command, you can also
choose to output the raw binary digest (instead of hexadecimal) and then Base64
encode that binary output, like so:

```
hmac sha256 "message to be hashed" secret-key -binary | base64
```

Which outputs:

```
RNsU/klsS8SvXo5uNoPl23rP+lVYl89LK0NFq6rxrOM=
```

You don't have to use an abstraction around the command obviously. For example,
to generate a plain (unkeyed) SHA512 digest of a file and Base64 encode it, you
can just use:

```
cat plaintext.txt | openssl dgst -sha512 -binary | base64
```

> [!TIP]
> `base64` could be replaced with openssl's base64 encoding command:
> `openssl enc -base64 -A`
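
If you'd rather not shell out to `openssl`, roughly the same thing can be done
with Python's standard `hmac` module. This is a sketch rather than a drop-in
replacement: the `hmac_digest` helper is a name I've made up, and I haven't
included the expected digest here, so compare it against the OpenSSL output
yourself.

```python
import base64
import hashlib
import hmac


def hmac_digest(digest: str, data: str, key: str) -> bytes:
    """Return the raw HMAC of `data` under `key` using the named hash."""
    return hmac.new(key.encode(), data.encode(), digest).digest()


raw = hmac_digest("sha256", "message to be hashed", "secret-key")
print(raw.hex())                        # hexadecimal form
print(base64.b64encode(raw).decode())   # Base64 form of the same bytes

# When checking a received MAC, compare in constant time rather than with ==.
expected = hmac_digest("sha256", "message to be hashed", "secret-key")
print(hmac.compare_digest(raw, expected))  # True
```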

## Random Password Generation

Generating random passwords that are complex enough to make automated attacks
difficult can be a bit tedious, yet important. But if you install a program such
as `pwgen` (`brew install pwgen`) you'll be able to generate random and complex
passwords very easily.

Once installed, add the following alias to your shell:

```
alias psw="pwgen -sy 20 1"
```

Now when you execute `psw` you'll get output that looks something like the
following:

```
|93<3(M;r?~40c$A@>{\
```
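
If you can't install `pwgen`, a rough equivalent can be sketched with Python's
standard `secrets` module. The `generate_password` helper and its alphabet are
my own assumptions and won't match pwgen's exact character rules.

```python
import secrets
import string

# Letters, digits and symbols, loosely mirroring what `pwgen -sy` draws from.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Build a password from a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password())
```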

## Hash Functions

There are many different ways of accessing a hash function. The two options
we'll look at are the executable `shasum` (provided by macOS) and the `hashlib`
package provided by the [Python](https://www.python.org/) programming language.

### shasum

Let's generate a hexadecimal digest of the message `foobar` using the SHA512
hash algorithm:

```
echo -n foobar | shasum -a 512
```

> [!TIP]
> see `shasum -h` for all available algorithms.

Which outputs:

```
0a50261ebd1a390fed2bf326f2673c145582a6342d523204973d0219337f81616a8069b012587cf5
635f6925f1b56c360230c19b273500ee013e030601bf2425
```

### hashlib

Let's again generate a hexadecimal digest of the message `foobar` using the
SHA512 hash algorithm, now using Python:

```
import hashlib
message = hashlib.sha512()
message.update(b"foobar")
print(message.hexdigest())
```

Which outputs the same digest as the `shasum` command produced:

```
0a50261ebd1a390fed2bf326f2673c145582a6342d523204973d0219337f81616a8069b012587cf5
635f6925f1b56c360230c19b273500ee013e030601bf2425
```

### cksum

Remember hash functions generate a digest of some message input, and one such
use of that digest output is detecting data corruption (i.e. a checksum).

macOS also provides a `cksum` command which lets you generate a checksum (a CRC
rather than a cryptographic hash), like so:

```
echo foobar | cksum
```

Which outputs:

```
857691210 7
```

The first number is the checksum and the second number is the amount of data in
bytes (seven here, because `echo` appends a newline to `foobar`).
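
The same corruption check can be sketched in Python. Note that I've swapped in
SHA-256 from `hashlib` rather than the CRC that `cksum` uses, and that the
helper names and the throwaway demo file are assumptions of mine.

```python
import hashlib
import hmac
import os
import tempfile


def file_digest(path: str, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_checksum(path: str, expected_hex: str) -> bool:
    """Report whether the file's digest equals the published checksum."""
    return hmac.compare_digest(file_digest(path), expected_hex.lower())


# Demo with a throwaway file (the contents are made up).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"foobar")

published = file_digest(tmp.name)
intact = matches_checksum(tmp.name, published)      # True

with open(tmp.name, "ab") as handle:
    handle.write(b"!")                              # simulate corruption
corrupted = matches_checksum(tmp.name, published)   # False

os.remove(tmp.name)
print(intact, corrupted)
```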

## OpenSSH

OpenSSH provides secure and encrypted tunneling capabilities and is typically
used to enable secure shell connections from your machine to external servers.

In order to generate a cryptographically secure key pair, execute the following
command:

```
ssh-keygen -t rsa -b 4096 -C "your.email@domain.com"
```

This explicitly selects the RSA algorithm with a key size of 4096 bits. The
default algorithm and key size vary between OpenSSH versions (RSA with 2048 bits
at the time of writing), so it's safest to pass `-t` and `-b` explicitly.

The output of this command will be a public and private key pair.

It's usually best to generate these keys (or at least move them when generated)
within the `~/.ssh` directory.

### SSH Agent

One thing that catches me out all the time is when I open a new terminal tab or
shell instance and I go to push up some code changes to a remote server only to
discover an error saying I'm not authenticated. This is because the new
terminal/shell instance doesn't have the SSH agent running which is what makes
my SSH key pair available.

This happens so often I've created an alias to make starting up the SSH agent
and loading my SSH private key very quick and easy:

```
alias sshagent='eval "$(ssh-agent -s)" && ssh-add -K ~/.ssh/github_rsa'
```

> [!INFO]
> the use of the `-K` flag is macOS specific; it means the key's passphrase
> will also be stored in the macOS keychain.

## OpenSSL

OpenSSL is designed to provide a method for securing web based communication
(think HTTPS/TLS/SSL).

> [!TIP]
> for a full list of commands see: `openssl -h` and `openssl <command> -h`.

### Key Exchanges

There are two popular key exchange algorithms:

1. RSA
1. Diffie-Hellman

For the specific details of each I recommend you read [this post on the
differences](https://technet.microsoft.com/en-us/library/cc962035.aspx). In
short RSA uses the person's public key to encrypt the secret, while
Diffie-Hellman uses a mathematical function to ensure only those two people
communicating can calculate the secret based on the information that's publicly
available.
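
To give a feel for the 'mathematical function' Diffie-Hellman relies on, here
is a toy sketch in Python. The prime (23) and generator (5) are textbook-sized
values I've assumed purely for readability, and the helper names are mine; real
exchanges use vetted groups of 2048 bits or more.

```python
import secrets

# Public parameters, known to everyone (toy-sized for readability).
PRIME = 23
GENERATOR = 5


def public_value(private: int) -> int:
    """Compute the value a party sends over the open channel."""
    return pow(GENERATOR, private, PRIME)


def shared_secret(their_public: int, my_private: int) -> int:
    """Combine the other party's public value with our own private value."""
    return pow(their_public, my_private, PRIME)


# Each party picks a private number in the range 1..21 and never shares it.
alice_private = secrets.randbelow(PRIME - 2) + 1
bob_private = secrets.randbelow(PRIME - 2) + 1

alice_public = public_value(alice_private)
bob_public = public_value(bob_private)

# Each side derives the same secret without it ever being transmitted.
assert shared_secret(bob_public, alice_private) == shared_secret(alice_public, bob_private)
```

An eavesdropper sees `PRIME`, `GENERATOR` and both public values, but with
realistically sized numbers recovering either private value from them is
believed to be infeasible.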

### Generating a key pair

In order to generate an RSA based public/private key pair, execute the following
commands:

```
# generate a private key
openssl genrsa -out private_key.pem 4096

# generate a public key, from the private key
openssl rsa -pubout -in private_key.pem -out public_key.pem
```

### Encrypting and Decrypting

The following examples use symmetric encryption, and so you'll be asked for a
passphrase (from which the secret key is derived) when encrypting and
decrypting. You could instead use the `-pass` flag like so
`-pass pass:<your_password>` (yeah the syntax is odd, and it's the same for
decrypting):

```
# symmetric encryption (you'll be asked for a key)
echo foobar | openssl enc -aes-256-cbc -out message.enc

# decrypt that encrypted message
openssl enc -aes-256-cbc -in message.enc -d
```

> [!TIP]
> `.enc` is a commonly used extension to indicate a file is encrypted (`.asc`
> usually indicates ASCII-armoured output, e.g. from GPG or Keybase).

When encrypting I'm passing in the message via stdin and writing the output to a
file, and when decrypting I'm reading from that file and printing to stdout. You
could use files for both by explicitly specifying the `-in` and `-out` flags.

Annoyingly with `openssl` the same thing can be done a million different ways,
so (for example) you might also find that you can do the above _without_ the
`enc` portion of the command (and thus removing the `-` prefix from the selected
algorithm):

```
# symmetric encryption
echo foobar | openssl aes-256-cbc -out message.enc

# decrypt that encrypted message
openssl aes-256-cbc -in message.enc -d
```

#### Encoding

You can also generate Base64 output of the encrypted data, by using the `-a`
flag like so:

```
$ echo foobar | openssl aes-256-cbc -a

U2FsdGVkX19/L0WtkvCNlpMiQnvD1SWGM19lm4m6xK4=
```

> [!INFO]
> see `man enc` for details

#### Salts

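It's also worth mentioning that the default behaviour for OpenSSL is to use a
'salt' when encrypting the message (it's mixed with your passphrase when the
encryption key is derived). Salts are best known from password storage, where a
salt is random data combined with the password before hashing, so identical
passwords produce different digests. One such scheme appends the salt to the
already hashed password and then hashes the result. In pseudo-code it would
look like this: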

```
$pwd = hash(hash($password) + salt)
```

You would then store the value of `$pwd` in your database along with the salt
itself.

The security doesn't come from obfuscating the salt, but from the fact that a
rainbow table attack can no longer look up digests in a precomputed collection
of hashed passwords. An attacker would need to incorporate your (per-user)
unique salt value and recompute their list of candidate hashes for every single
user, which makes the attack computationally very expensive and time consuming.
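
Here's a minimal sketch of per-user salting in Python. To be clear about the
swap: instead of the double-hash pseudo-code above, it uses the standard
library's `hashlib.pbkdf2_hmac`, which mixes in the salt for you. The helper
names, the 16-byte salt and the iteration count are all assumptions of mine.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor; tune it for your own hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def check_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)


# Store both values in your database, as described above.
salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))  # True
print(check_password("wrong guess", salt, stored))                   # False
```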

You can also see that a salt is used by trying to read an encrypted file (`cat message.enc`):

```
Salted__MJin¨MàÍ£?è,random¡:~randomW!5µõ
```

#### Asymmetrical Encryption

If you need to, you can encrypt data with a public key (i.e. asymmetrical
encryption) by utilising the `openssl rsautl` command, which stands for "RSA
Utility" and is commonly used to sign, verify, encrypt and decrypt data using
the RSA algorithm (OpenSSL 3.0 deprecates `rsautl` in favour of
`openssl pkeyutl`).

In the following example we have a file `plaintext.txt` we encrypt using a
public key. It will now only be possible to decrypt the `secret.enc` file if you
have the corresponding private key:

```
# encrypting
openssl rsautl -encrypt -pubin -inkey public_key.pem -in plaintext.txt -out secret.enc

# decrypting
openssl rsautl -decrypt -inkey private_key.pem -in secret.enc
```

### Randomness

OpenSSL also offers a way to generate random binary data which you can then
export as either hexadecimal or Base64 formats:

> [!TIP]
> in the following examples, `64` is the number of bytes to be generated.

```
$ openssl rand 64

RR_wK[=q5}VrdMܾj{8(Ty

$ openssl rand -hex 64

660baf33c189ced722a07c6a29d35a7e4584bb954c8c86f2cfd4ea8d892bff32fc188b0c56cbe0a5
6d60b628cdee697308b0cf3806cd95052b743bec5ccc5240

$ openssl rand -base64 64

JIPU5SiCgKP3XVrnef1gY+PxjBvjdQgSN+OJoBAdWmCa/cRvDdFl01GQiSwFimQ5
1lVa/7hfYIK6Z5jjHNauaQ==
```
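
For comparison, the same three outputs can be sketched with Python's standard
`secrets` and `base64` modules (the variable names are mine):

```python
import base64
import secrets

raw = secrets.token_bytes(64)                    # 64 random bytes
hex_form = raw.hex()                             # 128 hexadecimal characters
base64_form = base64.b64encode(raw).decode()     # 88 Base64 characters

print(hex_form)
print(base64_form)
```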

## GPG

GPG is a tool which provides encryption and signing capabilities. It supports
both symmetrical and asymmetrical encryption, as well as digital signing of your
encrypted content to ensure its integrity.

### Generating a key pair

To generate a new GPG key pair you would execute the following command and
interactively fill in the details:

```
gpg --gen-key
```

### Automate

If you prefer to automate this you can create a file to contain the details and
pass that into the command-line instead. The following code generates a new
`batch_file` that will contain the information we would otherwise have to enter
manually:

```
$ cat > batch_file <<EOF
     %echo Generating a basic OpenPGP key
     Key-Type: RSA
     Key-Length: 4096
     Subkey-Type: Default
     Name-Real: Your Name
     Name-Comment: Integralist testing
     Name-Email: foo@example.com
     Expire-Date: 0
     Passphrase: foobar
     %commit
     %echo done
EOF
```

Once we have this file we can pass it along with the `--gen-key` command:

```
$ gpg --gen-key --batch batch_file

gpg: Generating a basic OpenPGP key
gpg: key 4BCAEAAD199B5FE8 marked as ultimately trusted
gpg: directory '/Users/Integralist/.gnupg/openpgp-revocs.d' created
gpg: revocation certificate stored as '/Users/Integralist/.gnupg/openpgp-revocs.d/\
  CFE96536285D83C990567BF64BCAEAAD199B5FE8.rev'
gpg: done
```

Now if we check our list of keys we'll see the new one we just generated:

```
$ gpg --list-keys

/Users/Integralist/.gnupg/pubring.gpg
---------------------------------------
pub   rsa4096 2018-02-17 [SCEA]
      CFE96536285D83C990567BF64BCAEAAD199B5FE8
uid           [ultimate] Your Name (Integralist testing) <foo@example.com>
sub   rsa2048 2018-02-17 [E]
```

### Revocation

When you generate a new key pair, if you intend on publishing your public key
online, then you'll want to generate a revocation certificate. Doing this will
mean you can revoke your original key pair if your private key becomes
compromised (or you just want to decommission it):

```
gpg --output revocation.cert --gen-revoke your.email@domain.com
```

When you're ready to decommission it, just import the certificate into your
keyring:

```
gpg --import revocation.cert
```

You can then also push up your key identifier to a key server to force it to
recognise the key has been revoked:

```
gpg --keyserver pgp.mit.edu --send-keys <key_id>
```

### Asymmetrical Encryption and Decryption

In order to encrypt some data using someone else's public key (i.e. so only they
can decrypt the data) you first need access to their public key and have it
imported to your gpg keyring:

```
gpg --import public.key
```

If you want to verify the integrity of the public key you have acquired, then
you should speak securely with the recipient who owns the public key and ask
them to give you their digital 'fingerprint'. You can then verify it matches
what you have using the following command:

```
gpg --fingerprint <pub_key_id>
```

You'll then look for the fingerprint in the gpg output. The fingerprint should
look something like this:

```
FDFB E9B5 24BA 6972 A3AA 44B9 A1B1 7E6F DD86 E7F5
```
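
Comparing forty hexadecimal characters by eye is error-prone, so here's a small
sketch of doing it in Python. The helper names are my own, and the second
fingerprint simply stands in for whatever the key's owner reads out to you.

```python
import hmac


def normalise(fingerprint: str) -> str:
    """Strip whitespace and upper-case so formatting differences don't matter."""
    return "".join(fingerprint.split()).upper()


def fingerprints_match(from_gpg: str, from_owner: str) -> bool:
    """Compare two fingerprints regardless of spacing or letter case."""
    return hmac.compare_digest(normalise(from_gpg), normalise(from_owner))


print(fingerprints_match(
    "FDFB E9B5 24BA 6972 A3AA 44B9 A1B1 7E6F DD86 E7F5",
    "fdfbe9b524ba6972a3aa44b9a1b17e6fdd86e7f5",
))  # True
```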

The command for encrypting a file `plaintext.txt` using their public key would
be:

```
gpg --encrypt -u "Sender User Name" -r "Receiver User Name" plaintext.txt
```

As you've encrypted the file using that person's public key, it means they can
decrypt the file simply with:

```
gpg -d plaintext.txt.gpg
```

### Symmetrical Encryption and Decryption

By default gpg uses the AES algorithm for its symmetrical encryption. The
command to use is (you'll be asked to provide a passphrase):

```
gpg --symmetric plaintext.txt
```

You can specify a different algorithm, as the default isn't always as secure as
it could be. Let's use AES with a 256-bit encryption key:

```
gpg --symmetric --cipher-algo AES256 plaintext.txt
```

> [!TIP]
> see `gpg --version` for all available ciphers

### Signing keys

If you want to explicitly trust a public key you have imported, you can 'sign'
it. You do this using the `--sign-key` flag. Doing this can also be beneficial
for the owner of that public key (Bob), because if a friend of yours (Alice)
trusts _you_ and they see you've signed Bob's public key, then Alice is more
likely to trust Bob as well.

In order for Bob to benefit from this 'web of trust' you need to send him back
his public key which you signed. Bob would need to import that version of his
public key back into his gpg keyring, so that he can then republish it online
for others to see that _you_ trust him.

The following example demonstrates how you would export Bob's public key, which
you previously imported and signed:

```
gpg --export --armor bob@example.org
```

> [!INFO]
> `--armor` simply outputs the binary data as ASCII

### Signing encrypted files

It can be useful to sign a file that you encrypt, so that the person who will
decrypt the file can verify it was you who sent it to them, and also check that
the integrity of the file is still intact.

> [!INFO]
> this provides a combination of _authenticity_ and _integrity_ (as
> defined within the [terminology section](#terminology))

You do this by using the `--sign` flag:

```
gpg --local-user Bob --encrypt --recipient Alice --sign plaintext.txt
```

> [!INFO]
> I'm using `--local-user` because I have many different key pairs setup
> for testing.

This will generate a `plaintext.txt.gpg` encrypted file.

The recipient (Alice) can either decrypt the file using her own private key,
which will both decrypt it and verify the signature against Bob's public key, or
Alice could just use the `--verify` flag if she only wanted to check the
signature.

```
$ gpg --verify plaintext.txt.gpg

gpg: Signature made Mon Feb 19 10:16:38 2018 GMT
gpg:                using RSA key F2G91BE243E405E5B64B08A1CB5EBDB2561C861B
gpg: Good signature from "Bob <bob@example.com>" [ultimate]
```

## Keybase

[Keybase](https://keybase.io/) is a public-key directory that maps social media
identities to encryption keys in a publicly auditable manner. Keybase offers an
end-to-end encrypted chat and cloud storage system, called Keybase Chat and the
Keybase filesystem.

In order to use the command-line tool `keybase` you'll need to register for an
account on their website.

To install keybase on macOS:

```
brew install keybase
```

Once installed you'll need to log in:

```
keybase login
```

At this point you can either generate a fresh key pair or select an existing gpg
key pair:

```
# generate new key pair
keybase pgp gen

# select existing key pair
keybase pgp select
```

You can search for other keybase users:

```
keybase search sthulb
```

You can then encrypt data for another keybase user, like so:

```
keybase encrypt -i info.txt -o info.txt.asc sthulb
```

If you receive an encrypted file you can decrypt it, like so:

```
keybase decrypt -i info.txt.asc -o info.txt
```

If you receive a file (`info.txt.gpg`) that was encrypted using your keybase
public key but the sender's _not_ using keybase (e.g. they've encrypted it with
gpg and signed it using their own private key), then you'll need to have their
public key in your gpg keyring:

```
keybase pgp decrypt -i info.txt.gpg
```