
Before we get started, here’s a diagram to give you an overview of what we will be covering in this post:

One thing I want to clarify in the above image is that each box is nested within its parent box. For example, the “Program: terminal” box contains a shell, and the shell in turn runs either a “Built-in” or a “Program”.

NOTE: Nearly four years after this post was published, someone else published https://www.poor.dev/blog/terminal-anatomy/ which is an excellent write-up with a really good animation of how the terminal interacts with the shell. It goes into much deeper explanation than I do here, so I recommend reading that too.

Kernel

A computer has a kernel.

The kernel is responsible for managing the computer’s system.

The kernel has no user interface.

To interact with the kernel you use an intermediary “program”.

Program

A “program” is a structured collection of instructions (machine code) that a computer can execute.

Your computer has many programs. One such example would be the ‘terminal emulator’ program.

Note: see next section for explanation of what a “terminal” is.

Depending on the programming language used to create the program, either the program is compiled ahead of time into machine code that the computer can execute directly, or its human-readable source is read and executed by another program (an interpreter).

Executables

Executables (or ‘executable binaries’) are programs.

More specifically, an ‘executable’ is a file that contains a program.

Note: these are also often referred to as just ‘binaries’.

Executables are generally the result of a program being turned into something that can be ‘executed’ by the computer.

Executables can be found in multiple locations, e.g. look at the $PATH environment variable in a terminal.

$ echo $PATH

/usr/local/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

Note: split the output on : and you’ll see there are six directories.
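
To make the colon-separated structure concrete, here is a minimal Bash sketch that splits a PATH-style string into its directories. The example_path variable is a hypothetical stand-in copied from the sample output above; on a real system you would use "$PATH" instead.

```shell
#!/usr/bin/env bash
# example_path is a hypothetical value copied from the sample output above;
# substitute "$PATH" to inspect your own system.
example_path="/usr/local/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"

# Split on ':' into a Bash array, one directory per element.
IFS=':' read -r -a path_dirs <<< "$example_path"

printf '%s\n' "${path_dirs[@]}"
echo "directories: ${#path_dirs[@]}"
```

The shell searches these directories from left to right, so an executable in an earlier directory takes precedence over one with the same name in a later directory.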

Terminal

A terminal is an input/output device.

Traditionally terminals would have been a real ‘hardware’ device that you used to interact with a computer.

e.g. the computer would be a large box in a server room, and the terminal would be a monitor/keyboard connected to the computer.

In modern computing the terminal is usually a piece of software rather than a separate hardware device.

The modern equivalent of a terminal is known as a ‘terminal emulator’.

Ironically (or confusingly), terminal emulators run on the very computer that a hardware terminal would previously have been plugged into.

If you don’t want to use a GUI (graphical user interface) to interact with your computer, you can use a terminal emulator.

Shell

A shell is a program which is accessed via a terminal emulator.

The terminal accepts input, passes it to the shell, and the shell’s output is sent back to the terminal to be displayed.

The shell accepts input as a set of commands.

The available commands vary depending on the shell (e.g. different shells have different commands).

In order to fulfil your instructions, commands are interpreted by the shell and the shell determines if it should either:

  • load another program

or

  • execute a ‘builtin’ function

Shell Builtins

A ‘builtin’ function is one that is provided by the shell.

If a command is given and the shell has no builtin of that name, it will look up the command among the available external ‘executables’ (i.e. by searching the directories in $PATH).

A builtin command can affect the internal state of the shell.

This is why a command such as cd must be part of the shell (i.e. a builtin), because an external program can’t change the current directory of the shell.

Other commands, like echo, might be (and are in this case) built into the shell for the sake of performance (it’s quicker to call the builtin echo than it is to load and manage the external executable echo).
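
The point about cd can be checked with a small experiment. This is only a sketch: it uses a second shell started with sh -c as a stand-in for "an external program", and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
start_dir="$PWD"

# A child process (here a second shell) changes only its own directory.
sh -c 'cd /'
after_child="$PWD"

# The cd builtin runs inside the current shell, so the change sticks.
cd /
after_builtin="$PWD"

echo "after child process: $after_child"
echo "after builtin cd:    $after_builtin"

cd "$start_dir"   # restore the original directory
```

The first line of output should still show the starting directory, while the second shows /, because only the builtin ran inside the current shell process.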

Documentation

Most people are aware of ‘manuals’.

e.g. man bash returns the documentation for the Bash shell.

Individual shell builtins generally don’t have manual pages of their own.

The exit command is a shell builtin, so what happens when looking up a manual for it?

e.g. man exit returns a generic ‘BUILTIN’ documentation page.

If you see that page, then you know the command is a shell builtin.

Another way to tell if a command is a builtin vs an executable is to use the type command.

$ type exit

exit is a shell builtin

One other reason I like to use type is to figure out what a shell alias is set to (in case you’re unfamiliar, in most shells you can assign a long or hard-to-remember command to a short alias name). Imagine I’ve created an alias like so:

alias gb="git branch --list 'integralist*'"

I can find out later what I assigned to the alias using the type command:

$ type gb

gb is aliased to `git branch --list 'integralist*''

To read the documentation for a builtin, you need to use the help command:

$ help exit

exit: exit [n]
    Exit the shell.

    Exits the shell with a status of N.  If N is omitted, the exit status
    is that of the last command executed.

The help command is itself a builtin (hence it knows about builtins, unlike man which isn’t a builtin).

$ type help

help is a shell builtin

You can use the help command to read its documentation:

$ help help

help: help [-dms] [pattern ...]
    Display information about builtin commands.

    Displays brief summaries of builtin commands.  If PATTERN is
    specified, gives detailed help on all commands matching PATTERN,
    otherwise the list of help topics is printed.

    Options:
      -d        output short description for each topic
      -m        display usage in pseudo-manpage format
      -s        output only a short usage synopsis for each topic matching
                PATTERN

    Arguments:
      PATTERN   Pattern specifying a help topic

    Exit Status:
    Returns success unless PATTERN is not found or an invalid option is given.
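
As a quick sketch of the options listed above (assuming Bash 4 or newer, where -d is available), you can capture the short forms of a builtin’s documentation in variables. The variable names are only illustrative.

```shell
#!/usr/bin/env bash
# -d prints a one-line description, -s prints only the usage synopsis.
short_description=$(help -d exit)
synopsis=$(help -s exit)

echo "$short_description"
echo "$synopsis"
```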

If you run the help command by itself you’ll see a list of commands that can be passed to help (you’ll see in the list exit, hence why we could run help exit earlier):

$ help

GNU bash, version 5.0.18(1)-release (x86_64-apple-darwin19.5.0)
These shell commands are defined internally.  Type `help' to see this list.
Type `help name' to find out more about the function `name'.
Use `info bash' to find out more about the shell in general.
Use `man -k' or `info' to find out more about commands not in this list.

A star (*) next to a name means that the command is disabled.

 job_spec [&]                                                                                                           history [-c] [-d offset] [n] or history -anrw [filename] or history -ps arg [arg...]
 (( expression ))                                                                                                       if COMMANDS; then COMMANDS; [ elif COMMANDS; then COMMANDS; ]... [ else COMMANDS; ] fi
 . filename [arguments]                                                                                                 jobs [-lnprs] [jobspec ...] or jobs -x command [args]
 :                                                                                                                      kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
 [ arg... ]                                                                                                             let arg [arg ...]
 [[ expression ]]                                                                                                       local [option] name[=value] ...
 alias [-p] [name[=value] ... ]                                                                                         logout [n]
 bg [job_spec ...]                                                                                                      mapfile [-d delim] [-n count] [-O origin] [-s count] [-t] [-u fd] [-C callback] [-c quantum] [array]
 bind [-lpsvPSVX] [-m keymap] [-f filename] [-q name] [-u name] [-r keyseq] [-x keyseq:shell-command] [keyseq:readlin>  popd [-n] [+N | -N]
 break [n]                                                                                                              printf [-v var] format [arguments]
 builtin [shell-builtin [arg ...]]                                                                                      pushd [-n] [+N | -N | dir]
 caller [expr]                                                                                                          pwd [-LP]
 case WORD in [PATTERN [| PATTERN]...) COMMANDS ;;]... esac                                                             read [-ers] [-a array] [-d delim] [-i text] [-n nchars] [-N nchars] [-p prompt] [-t timeout] [-u fd] [name ...]
 cd [-L|[-P [-e]] [-@]] [dir]                                                                                           readarray [-d delim] [-n count] [-O origin] [-s count] [-t] [-u fd] [-C callback] [-c quantum] [array]
 command [-pVv] command [arg ...]                                                                                       readonly [-aAf] [name[=value] ...] or readonly -p
 compgen [-abcdefgjksuv] [-o option] [-A action] [-G globpat] [-W wordlist]  [-F function] [-C command] [-X filterpat>  return [n]
 complete [-abcdefgjksuv] [-pr] [-DEI] [-o option] [-A action] [-G globpat] [-W wordlist]  [-F function] [-C command]>  select NAME [in WORDS ... ;] do COMMANDS; done
 compopt [-o|+o option] [-DEI] [name ...]                                                                               set [-abefhkmnptuvxBCHP] [-o option-name] [--] [arg ...]
 continue [n]                                                                                                           shift [n]
 coproc [NAME] command [redirections]                                                                                   shopt [-pqsu] [-o] [optname ...]
 declare [-aAfFgilnrtux] [-p] [name[=value] ...]                                                                        source filename [arguments]
 dirs [-clpv] [+N] [-N]                                                                                                 suspend [-f]
 disown [-h] [-ar] [jobspec ... | pid ...]                                                                              test [expr]
 echo [-neE] [arg ...]                                                                                                  time [-p] pipeline
 enable [-a] [-dnps] [-f filename] [name ...]                                                                           times
 eval [arg ...]                                                                                                         trap [-lp] [[arg] signal_spec ...]
 exec [-cl] [-a name] [command [arguments ...]] [redirection ...]                                                       true
 exit [n]                                                                                                               type [-afptP] name [name ...]
 export [-fn] [name[=value] ...] or export -p                                                                           typeset [-aAfFgilnrtux] [-p] name[=value] ...
 false                                                                                                                  ulimit [-SHabcdefiklmnpqrstuvxPT] [limit]
 fc [-e ename] [-lnr] [first] [last] or fc -s [pat=rep] [command]                                                       umask [-p] [-S] [mode]
 fg [job_spec]                                                                                                          unalias [-a] name [name ...]
 for NAME [in WORDS ... ] ; do COMMANDS; done                                                                           unset [-f] [-v] [-n] [name ...]
 for (( exp1; exp2; exp3 )); do COMMANDS; done                                                                          until COMMANDS; do COMMANDS; done
 function name { COMMANDS ; } or name () { COMMANDS ; }                                                                 variables - Names and meanings of some shell variables
 getopts optstring name [arg]                                                                                           wait [-fn] [id ...]
 hash [-lr] [-p pathname] [-dt] [name ...]                                                                              while COMMANDS; do COMMANDS; done
 help [-dms] [pattern ...]                                                                                              { COMMANDS ; }

If we want to see the documentation for the type builtin, use help type:

$ help type

type: type [-afptP] name [name ...]
    Display information about command type.

    For each NAME, indicate how it would be interpreted if used as a
    command name.

    Options:
      -a        display all locations containing an executable named NAME;
                includes aliases, builtins, and functions, if and only if
                the `-p' option is not also used
      -f        suppress shell function lookup
      -P        force a PATH search for each NAME, even if it is an alias,
                builtin, or function, and returns the name of the disk file
                that would be executed
      -p        returns either the name of the disk file that would be executed,
                or nothing if `type -t NAME' would not return `file'
      -t        output a single word which is one of `alias', `keyword',
                `function', `builtin', `file' or `', if NAME is an alias,
                shell reserved word, shell function, shell builtin, disk file,
                or not found, respectively

    Arguments:
      NAME      Command name to be interpreted.

    Exit Status:
    Returns success if all of the NAMEs are found; fails if any are not found.
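
The -t option is handy in scripts because it prints a single word. Here is a minimal sketch, assuming Bash; kind_of is a hypothetical helper name and no_such_command_xyz is a deliberately nonexistent command used for illustration.

```shell
#!/usr/bin/env bash
# kind_of is a hypothetical helper: it prints the single-word classification
# from `type -t`, or "not found" when the shell cannot resolve the name.
kind_of() {
  type -t "$1" || echo "not found"
}

for name in cd if ls no_such_command_xyz; do
  printf '%-20s %s\n' "$name" "$(kind_of "$name")"
done
```

In a shell with no aliases or functions defined for those names, cd should be reported as a builtin, if as a keyword, and ls as a file.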

Explicit Requests

When we used the type command earlier on the exit command it returned a single response (exit is a shell builtin).

Let’s try again with a different command (echo):

$ type echo

echo is a shell builtin

But if we also apply the -a flag we get more output:

$ type -a echo

echo is a shell builtin
echo is /bin/echo

This indicates that the shell found a builtin first, but that there was also an external executable called echo.

If you were to execute echo foo you would be calling the builtin echo command.

You could be explicit by executing it via the builtin command:

$ builtin echo foo

foo

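You might expect the command builtin to request the executable instead, but it only suppresses shell function lookup, so command echo foo still runs the builtin echo.

To explicitly request the executable and not the builtin, call it by its full path:

$ /bin/echo foo

foo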
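
As a sketch of how lookup order plays out in Bash, the following defines a throwaway function named echo (purely for illustration) and then calls echo in several ways. Per Bash’s own help text, command suppresses shell function lookup but does not skip builtins, so the only call here that is guaranteed to reach the external program is the one that uses the path reported by type -P.

```shell
#!/usr/bin/env bash
# A shell function named echo shadows both the builtin and the executable.
echo() { builtin echo "function: $*"; }

via_function=$(echo foo)         # runs the function
via_builtin=$(builtin echo foo)  # runs the builtin, skipping the function
via_command=$(command echo foo)  # skips the function; the builtin still wins
external_echo=$(type -P echo)    # PATH search only, e.g. /bin/echo
via_external=$("$external_echo" foo)

unset -f echo                    # remove the demo function again

printf '%s\n' "$via_function" "$via_builtin" "$via_command" "$via_external"
```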

Locating programs

To locate a program you use the which executable.

We can confirm it’s an executable by checking it with the type builtin:

$ type -a which

which is /usr/bin/which

If we use which to look up the location of the echo command, will it find the builtin or the external executable?

$ which echo

/bin/echo

We can see it only found the external executable.

The which command isn’t a builtin, so it knows nothing about builtins: by nature, they exist only inside the shell itself, whereas which can only search the directories in $PATH.
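
If you want an answer that reflects what the shell itself would run, a shell-aware lookup is possible using builtins. This is a small sketch assuming Bash: command -v reports how a name would be resolved (for a builtin it prints just the name), while type -P searches only $PATH, much like which. The variable names are illustrative.

```shell
#!/usr/bin/env bash
shell_view=$(command -v echo)  # how the shell resolves the name
path_view=$(type -P echo)      # PATH search only, like which

echo "command -v echo: $shell_view"
echo "type -P echo:    $path_view"
```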

Hashed Types

If you open a fresh terminal screen and execute type man you would see the response man is /usr/bin/man.

If you now execute the man command (e.g. man echo) and try type man again you’ll see:

man is hashed (/usr/bin/man)

The reason is that, to locate an executable, the shell has to search for it in various locations.

These locations are defined in the $PATH (as we saw earlier).

To avoid having to do that lookup every time, it caches the result in a hash table.

If you read the Bash manual (man bash) you’ll see the following comment:

Bash uses a hash table to remember the full pathnames of executable files 
(see hash under SHELL BUILTIN COMMANDS below). 

A full search of the directories in PATH is performed only if the command is not found in the hash table.

So it seems there is a hash builtin command, let’s take a look at that:

$ help hash

hash: hash [-lr] [-p pathname] [-dt] [name ...]
    Remember or display program locations.
    
    Determine and remember the full pathname of each command NAME.  If
    no arguments are given, information about remembered commands is displayed.
    
    Options:
      -d	forget the remembered location of each NAME
      -l	display in a format that may be reused as input
      -p pathname	use PATHNAME as the full pathname of NAME
      -r	forget all remembered locations
      -t	print the remembered location of each NAME, preceding
    		each location with the corresponding NAME if multiple
    		NAMEs are given
    Arguments:
      NAME	Each NAME is searched for in $PATH and added to the list
    		of remembered commands.
    
    Exit Status:
    Returns success unless NAME is not found or an invalid option is given.

So the documentation informs us of how we can look inside of the shell’s hash table by using the -l flag:

$ hash -l

builtin hash -p /usr/bin/which which
builtin hash -p /usr/bin/man man
builtin hash -p /usr/bin/clear clear

From this you can see I’ve already executed the which, man and clear executables, hence they’re now cached.
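
The caching behaviour can be reproduced in a few lines. This is a sketch that assumes Bash with its default hashall option enabled and an ls executable somewhere on $PATH; the variable names are illustrative.

```shell
#!/usr/bin/env bash
hash -r                      # forget all remembered locations
ls > /dev/null               # first run triggers a full PATH search
cached_path=$(hash -t ls)    # location now remembered for ls
type_report=$(type ls)       # reports the command as hashed

echo "$cached_path"
echo "$type_report"
```

Running hash -r is also a practical fix when you have moved or reinstalled an executable and the shell keeps trying the old, cached location.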

Also in the Bash manual is the following comment:

BASH_CMDS
    An associative array variable whose members correspond to the internal hash table of commands as maintained by the hash builtin.
    Elements added to this array appear in the hash table; 
    however, unsetting array elements currently does not cause command names to be removed from the hash table.  
    If BASH_CMDS is unset, it loses its special properties, even if it is subsequently reset.

This informs us that there is another way to view the hash table contents.

In this case we can view the internal array the hash builtin appends to:

$ declare -p BASH_CMDS

declare -A BASH_CMDS=([which]="/usr/bin/which" [man]="/usr/bin/man" [clear]="/usr/bin/clear" )

Note: it’s not as clear to read as the hash output, but it’s probably more useful to work with programmatically.
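
As an example of that programmatic use, here is a minimal sketch that walks the array. It assumes Bash 4 or newer (BASH_CMDS is an associative array) and that ls and cat exist on $PATH; both commands are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
hash -r        # start from an empty table
hash ls cat    # remember both locations without running the commands

# BASH_CMDS maps each remembered command name to its full pathname.
for name in "${!BASH_CMDS[@]}"; do
  printf '%s -> %s\n' "$name" "${BASH_CMDS[$name]}"
done
```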

List of all builtins vs executables

For a list of builtins you can use (in the Bash shell at least):

$ enable -a

enable .
enable :
enable [
enable alias
enable bg
enable bind
enable break
enable builtin
enable caller
enable cd
enable command
enable compgen
enable complete
enable compopt
enable continue
enable declare
enable dirs
enable disown
enable echo
enable enable
enable eval
enable exec
enable exit
enable export
enable false
enable fc
enable fg
enable getopts
enable hash
enable help
enable history
enable jobs
enable kill
enable let
enable local
enable logout
enable mapfile
enable popd
enable printf
enable pushd
enable pwd
enable read
enable readarray
enable readonly
enable return
enable set
enable shift
enable shopt
enable source
enable suspend
enable test
enable times
enable trap
enable true
enable type
enable typeset
enable ulimit
enable umask
enable unalias
enable unset
enable wait

Note: an online reference can be found here
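
If you want the builtin names without the enable prefix, for example to use them in a script, Bash’s compgen builtin offers an alternative. This is a sketch assuming Bash 4 or newer (for mapfile) with programmable completion compiled in; builtin_names is just an illustrative variable name.

```shell
#!/usr/bin/env bash
# compgen -b prints one builtin name per line, without the "enable" prefix.
mapfile -t builtin_names < <(compgen -b)

echo "builtins found: ${#builtin_names[@]}"
printf '%s\n' "${builtin_names[@]:0:5}"   # show the first five
```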

To list out all available executables is a little more tricky.

First you need to access only those directories you’re interested in:

$ echo $PATH | tr ':' '\n' | sort | egrep '^/(usr|bin)'

/bin
/usr/bin
/usr/local/bin
/usr/local/sbin
/usr/local/sbin
/usr/local/sbin
/usr/sbin

Note: tweak the regex as you see fit

Then you need to list all the commands within those directories.

The following aliases give you an idea of how you might approach doing that.

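# Print the directories from $PATH that start with /usr or /bin, one per line.
alias commands_dir='echo $PATH | tr ":" "\n" | sort | egrep "^/(usr|bin)"'
# List the contents of each of those directories.
alias commands='for i in $(commands_dir); do ls -l "$i"; done'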
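
An alternative sketch that avoids parsing ls output is to walk every directory in $PATH and print the names of the executable files found there. The function name list_executables is hypothetical, and the approach assumes Bash.

```shell
#!/usr/bin/env bash
# list_executables is a hypothetical helper: it prints the unique names of
# all executable regular files found in the directories listed in $PATH.
list_executables() {
  local dir file
  local -a dirs
  IFS=':' read -r -a dirs <<< "$PATH"
  for dir in "${dirs[@]}"; do
    [ -d "$dir" ] || continue            # skip missing or empty entries
    for file in "$dir"/*; do
      [ -f "$file" ] && [ -x "$file" ] && printf '%s\n' "${file##*/}"
    done
  done | sort -u
}

list_executables | head -n 10            # show the first ten names
```

Bash also has compgen -c, which prints every command name the shell could run, but that list mixes in aliases, builtins, functions and keywords, so it is less useful when you only want executables.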

Does anyone know of a better way? I’d ❤️ to hear about it

