Posted on 21 mins read

Summary

In this post I cover what CI/CD means and how you can achieve it across multiple different environments (e.g. dev, stage, prod) using the GitHub Actions platform. The infrastructure is managed using Terraform and the state is coordinated using Terraform Cloud.

I detail specific issues I experienced while implementing CI/CD for a real high-scale/high-profile public API, and the workarounds I needed to implement to resolve them.

What is CI/CD?

Let’s start off by answering the question:
What is CI/CD?

CI is short for Continuous Integration.
CD refers to two separate concepts:

  1. Continuous Delivery
  2. Continuous Deployment

People often make the mistake of thinking they are interchangeable.
They’re not. But what these terms mean is actually not that complicated:

  • Continuous Integration: you’re testing every change (i.e. commit)
  • Continuous Delivery: you’re able to ship every change on demand
  • Continuous Deployment: you’re shipping every change automatically

Now we have a bit of an understanding of what CI/CD is,
let’s see how I’ve been able to use each of these individual concepts.

API Gateway

I needed to implement a complete CI/CD pipeline for my employer.

They had an API Gateway service which required a mixture of continuous integration, delivery and deployment across three separate ’environments’ (i.e. a dev environment, a stage environment, and a production environment).

Here is the summary of their usage per environment:

  • Dev: Continuous Integration/Deployment
  • Stage: Continuous Integration/Deployment
  • Production: Continuous Delivery

Let’s dig into each environment to understand how they use CI/CD.

Dev Environment

The development environment is spun-up automatically when pushing to a PR.
It is automatically spun-down when the associated PR is merged.

NOTE: We control which files will trigger the dev environment creation.

The service name is api-gateway-dev-<BRANCH>.
The service domain is api-gateway-dev-<BRANCH>.edgecompute.app.
The <BRANCH> is calculated from github.head_ref.

The Terraform state that manages this environment is stored in a Terraform Cloud (TFC) project under a dynamically created, and managed, workspace called api-gateway-dev-<BRANCH>.

The dev CI workflow has some important steps it must take in order to spin-up a development environment that is isolated between different PR authors:

  • Call the TFC API to create a new workspace in the TFC project.
  • Cache the result of the workspace creation to avoid API rate limits.
  • Also create a workspace environment variable for FASTLY_API_KEY.

The last step is required as we’re deploying the service to Fastly.

NOTE: You’ll notice shortly that our stage and production flows use a single service and workspace, where as our dev environment is a bit more involved and requires its own dynamically created workspace and services. This is because we have multiple developers working on multiple features at any one time. So we use the ‘branch’ name as the unique identifier, and by having a different workspace per branch it means a developer can modify the Terraform infrastructure without affecting anyone else.

Stage Environment

There are two scenarios where the staging environment is deployed to:

  1. Whenever a PR is merged
  2. Whenever a commit is pushed directly to the main branch

NOTE: Again, we control which files will trigger the environment creation.

The service name is api-gateway-stg.
The service domain is api-gateway-stg.edgecompute.app.

Once the deployment is complete, the Compute package (i.e. the application code)
is uploaded as an artifact to GitHub and is deleted after 14 days.

The name of the artifact is the hash of the Compute package.
A git tag is pushed for each stage deployment, and the hash is appended to it.

During that 14 day time frame, the artifact can be promoted to production.

The Terraform state that manages this environment is stored in the Terraform Cloud project under the “api-gateway-stage” workspace.

Production Environment

The production environment is deployed to manually (i.e. Continuous Delivery) by triggering the relevant GitHub Actions workflow. The person triggering the deployment will first need to identify the Compute package ‘hash’ (which is appended to each staging tag).

The reason for ‘promoting’ the staging Compute package to production,
rather than recompiling the application code, is to ensure consistency
between the staging and production environments.

We require the person triggering the deployment to manually locate the relevant Compute package hash. We can’t rely on automation (e.g git tag --sort=-creatordate) because at the moment of checking the git tags, another person might have merged another PR and thus we would pick up that new staging tag and deploy the wrong artifact to the production environment.

The reason that is a concern, is because a newly merged PR should have some time
on the staging environment for any unexpected bugs to be identified. At some
point in the future we might have complete confidence in our test suite to move
from Continuous Delivery to Continuous Deployment, but that time isn’t now.

The service name is api-gateway-prd.
The service domain is api-gateway-prd.edgecompute.app.

The Terraform state that manages this environment is stored in the Terraform Cloud project under the “api-gateway-production” workspace.

Implementation

So here is where I’m going to display the complete set of GitHub Actions workflow configuration. We’ll review the code and I’ll make comments about specific sections of interest.

Let’s start with dev, then we’ll see stage, and then production.

Dev Config

# We deploy to dev environment when PR is pushed to.
# We name the environment after the branch to avoid collisions.
# When PR is merged we automatically tear-down the dev environment.
name: "API Gateway TF Dev"

# Stop any in-flight CI jobs when a new commit is pushed.
concurrency:
  group: api-gateway-dev-${{ github.ref_name }}
  cancel-in-progress: true

on:
  workflow_dispatch:
  pull_request:
    types: [opened, edited, closed, reopened, synchronize]
    paths:
      - .github/workflows/api-gateway-*
      - cmd/gateway/**
      - internal/gateway/**
      - infrastructure/fastly/compute/api-gateway/**

# IMPORTANT: for TF_VAR_* wrap double-quotes in single-quotes.
env:
  APP_DIRECTORY: "./cmd/gateway"
  CONFIG_DIRECTORY: "./infrastructure/fastly/compute/api-gateway"
  PACKAGE_PATH: "/pkg/gateway.tar.gz"
  TF_API_TOKEN: "${{ secrets.TF_API_TOKEN }}"
  TF_CLOUD_ORGANIZATION: "example"
  TF_VAR_backend: '"https://api-dev.example.com"'
  TF_VAR_env: '"dev"'

jobs:
  create-workspace:
    name: "Create Terraform Cloud Workspace"
    if: ${{ github.event.action != 'closed' && github.actor != 'dependabot[bot]' }} # don't run this when the PR is closed
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Debug Information
        uses: ./.github/actions/debug

      - name: Cache/Restore Workspace ID
        id: cache-workspace-id
        uses: actions/cache@v4
        with:
          path: tfc-workspace-id
          key: ${{ runner.os }}-cache-workspace-id-${{github.head_ref}}

      # IMPORTANT: We don't want to run the following steps if we get a cache-hit.
      # That is because a cache-hit indiciates we've already created the workspace.
      # This means we're forced to add a `if:` check on each following step.

      - name: Set TF_VAR_branch
        if: steps.cache-workspace-id.outputs.cache-hit != 'true'
        uses: ./.github/actions/tf-var-branch

      - name: Set WORKSPACE_NAME
        if: steps.cache-workspace-id.outputs.cache-hit != 'true'
        uses: ./.github/actions/tfc-dev-workspace-name

      - name: Modify Workspace JSON Payload
        if: steps.cache-workspace-id.outputs.cache-hit != 'true'
        run: |
          payload=./infrastructure/fastly/compute/api-gateway/tfc-api/payloads/workspace.json
          jq '.data.attributes.name = "${{ env.WORKSPACE_NAME }}"' "$payload" > temp.json && mv temp.json "$payload"          

      - name: Modify Workspace Variable JSON Payload
        if: steps.cache-workspace-id.outputs.cache-hit != 'true'
        run: |
          payload=./infrastructure/fastly/compute/api-gateway/tfc-api/payloads/workspace-variables.json
          jq '.data.attributes.value = "${{ secrets.FASTLY_API_KEY }}"' "$payload" > temp.json && mv temp.json "$payload"          

      # NOTE: A cache is volatile/ephemeral.
      # If the cache is invalidated/cleared, then the following step could still be run.
      # This would mean the curl request to create the workspace would fail (because it could already exist).
      # This is OK, because we check the API response for errors and the last step reacts to the failure status.
      # The intention is to avoid trying to read from a response.json that doesn't contain the required structure (i.e. doesn't contain a .data.id field).
      # As well as avoid trying to call the TFC API to create a workspace variable when there was an error initially creating the workspace.

      - name: Create Dev Workspace
        if: steps.cache-workspace-id.outputs.cache-hit != 'true'
        run: |
          curl -s \
            --header "Authorization: Bearer ${{ secrets.TF_API_TOKEN }}" \
            --header "Content-Type: application/vnd.api+json" \
            --request POST \
            --data @infrastructure/fastly/compute/api-gateway/tfc-api/payloads/workspace.json \
            https://app.terraform.io/api/v2/organizations/example/workspaces > response.json
          if [[ $(jq '.errors | length == 0' response.json) == false ]]; then
            jq '.errors' response.json
            echo "STEP_STATUS=failed" >> "${GITHUB_ENV}"
          fi          

      - name: Create FASTLY_API_KEY as Workspace Variable
        if: ${{ steps.cache-workspace-id.outputs.cache-hit != 'true' && env.STEP_STATUS != 'failed' }}
        run: |
          workspace_id=$(jq -r '.data.id' response.json)
          curl -s \
            --header "Authorization: Bearer ${{ secrets.TF_API_TOKEN }}" \
            --header "Content-Type: application/vnd.api+json" \
            --request POST \
            --data @infrastructure/fastly/compute/api-gateway/tfc-api/payloads/workspace-variables.json \
            "https://app.terraform.io/api/v2/workspaces/$workspace_id/vars"
          echo "$workspace_id" > tfc-workspace-id          

  terraform-plan:
    name: "Plan"
    runs-on: ubuntu-latest
    permissions: # https://docs.github.com/en/actions/using-jobs/assigning-permissions-to-jobs
      contents: read
      pull-requests: write
    needs: create-workspace
    if: ${{ github.event.action != 'closed' && github.actor != 'dependabot[bot]' && success() }} # don't run this when the PR is closed (and also only if the create-workspace was successful)
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Debug Information
        uses: ./.github/actions/debug

      - name: Install Go and Fastly CLI
        uses: ./.github/actions/fastly-cli

      - name: Build Compute Package
        run: fastly compute build --dir ${{ env.APP_DIRECTORY }} --metadata-show

      - name: Calculate Compute Package Hash
        run: fastly compute hash-files --skip-build --dir ${{ env.APP_DIRECTORY }}

      - name: Copy Compute Package to Terraform workspace
        run: cp ${{ env.APP_DIRECTORY }}${{ env.PACKAGE_PATH }} ${{ env.CONFIG_DIRECTORY }}

      - name: Set TF_VAR_branch
        uses: ./.github/actions/tf-var-branch

      - name: Set WORKSPACE_NAME
        uses: ./.github/actions/tfc-dev-workspace-name

        # NOTE: This job (and all jobs that follow) requires a TFC Workspace.
        # The name of the workspace is expected to match WORKSPACE_NAME.
        # The creation of the workspace happens in the `create-workspace` job.
        #
        # But there are scenarios where the workspace might not exist!
        # This can happen when there is an error in the `create-workspace` job.
        # For example, when calling the TFC API.
        #
        # Although the API might error, we don't want to mark the job as failed.
        # This is because the API could be called and fail for multiple reasons.
        # e.g. the cached API result wasn't found and we got a 422 from the API.
        #
        # A 422 code response from the TFC API thus indicates we tried to create
        # a workspace that already exists. So in this scenario we still want the
        # remaining jobs to run (as the workspace does exist).
        #
        # Ultimately, if the error was something else and the workspace thus
        # wasn't created, then at worst we'll get an error back from TFC when
        # trying to upload the TF configuration.

      - name: Upload Configuration
        uses: hashicorp/tfc-workflows-github/actions/upload-configuration@v1.2.0
        id: plan-upload
        with:
          workspace: ${{ env.WORKSPACE_NAME }}
          directory: ${{ env.CONFIG_DIRECTORY }}
          speculative: true

      - name: Create Plan Run
        uses: hashicorp/tfc-workflows-github/actions/create-run@v1.2.0
        id: plan-run
        with:
          workspace: ${{ env.WORKSPACE_NAME }}
          configuration_version: ${{ steps.plan-upload.outputs.configuration_version_id }}
          plan_only: true

      - name: Get Plan Output
        uses: hashicorp/tfc-workflows-github/actions/plan-output@v1.2.0
        id: plan-output
        with:
          plan: ${{ fromJSON(steps.plan-run.outputs.payload).data.relationships.plan.data.id }}

      - name: Update PR
        uses: actions/github-script@v7
        id: plan-comment
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            // Retrieve existing bot comments for the PR
            const { data: comments } = await github.rest.issues.listComments({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
            });
            const botComment = comments.find(comment => {
              return comment.user.type === 'Bot' && comment.body.includes('Terraform Cloud Plan Output')
            });
            const output = `#### Terraform Cloud Plan Output
               \`\`\`
               Plan: ${{ steps.plan-output.outputs.add }} to add, ${{ steps.plan-output.outputs.change }} to change, ${{ steps.plan-output.outputs.destroy }} to destroy.
               \`\`\`
               [Terraform Cloud Plan](${{ steps.plan-run.outputs.run_link }})
               `;
            if (botComment) {
              github.rest.issues.deleteComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                comment_id: botComment.id,
              });
            }
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            });            

  terraform-apply:
    name: "Apply"
    needs: terraform-plan
    runs-on: ubuntu-latest
    permissions:
      contents: read  # for Terraform Actions
      statuses: write # for Status Action (myrotvorets/set-commit-status-action)
    if: ${{ github.event.action != 'closed' && github.actor != 'dependabot[bot]' && success() }} # don't run this when the PR is closed (and also only if the terraform-plan was successful)
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Debug Information
        uses: ./.github/actions/debug

      - name: Install Go and Fastly CLI
        uses: ./.github/actions/fastly-cli

      - name: Build Compute Package
        run: fastly compute build --dir ${{ env.APP_DIRECTORY }} --metadata-show

      - name: Calculate Compute Package Hash
        run: fastly compute hash-files --skip-build --dir ${{ env.APP_DIRECTORY }}

      - name: Copy Compute Package to Terraform workspace
        run: cp ${{ env.APP_DIRECTORY }}${{ env.PACKAGE_PATH }} ${{ env.CONFIG_DIRECTORY }}

      - name: Set TF_VAR_branch
        uses: ./.github/actions/tf-var-branch

      - name: Set WORKSPACE_NAME
        uses: ./.github/actions/tfc-dev-workspace-name

      - name: Terraform Apply
        id: tf-apply
        uses: ./.github/actions/tf-apply
        with:
          workspace: ${{ env.WORKSPACE_NAME }}

      - name: "Terraform Apply Run URL"
        uses: myrotvorets/set-commit-status-action@master
        if: success()
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          status: "success"
          description: "click 'Details' to view"
          context: Terraform Apply Results
          targetUrl: "https://app.terraform.io/app/example/workspaces/api-gateway/runs/${{ steps.tf-apply.outputs.run_id }}"

  # Remove any dev services once the corresponding PR has been merged.
  # https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#running-your-pull_request-workflow-when-a-pull-request-merges
  terraform-destroy:
    name: "Cleanup Dev Environment"
    runs-on: ubuntu-latest
    if: ${{ github.event.pull_request.merged == true && github.actor != 'dependabot[bot]' }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Debug Information
        uses: ./.github/actions/debug

      # NOTE: We still need to install the CLI and compile the Compute package.
      # This is because we need to do a two-step apply (see IMPORTANT below).

      - name: Install Go and Fastly CLI
        uses: ./.github/actions/fastly-cli

      - name: Build Compute Package
        run: fastly compute build --dir ${{ env.APP_DIRECTORY }} --metadata-show

      - name: Calculate Compute Package Hash
        run: fastly compute hash-files --skip-build --dir ${{ env.APP_DIRECTORY }}

      - name: Copy Compute Package to Terraform workspace
        run: cp ${{ env.APP_DIRECTORY }}${{ env.PACKAGE_PATH }} ${{ env.CONFIG_DIRECTORY }}

      - name: Set TF_VAR_branch
        uses: ./.github/actions/tf-var-branch

      - name: Set WORKSPACE_NAME
        uses: ./.github/actions/tfc-dev-workspace-name

        # IMPORTANT: Deleting a service with a resource-link is a two-step apply.
        # We can't pass a -target to the Terraform GH Action as a prerequisite step.
        # So this means we'll need to clear any `resource_link` blocks from the config.
        # Then, after we've applied the change, we'll need to run a TF destroy.

      - name: Remove resource_link blocks from TF config
        run: |
          while IFS= read -r line; do
              if [[ "$line" =~ "resource_link {" || "$clear_line" == true ]]; then
                if [[ "$line" =~ "}" ]]; then
                  clear_line=false
                else
                  clear_line=true
                fi
                echo ""
              else
                echo "$line"
              fi
          done < ${{ env.CONFIG_DIRECTORY }}/main.tf > ./main.tf && mv ./main.tf ${{ env.CONFIG_DIRECTORY }}/main.tf          

      - name: Terraform Apply
        uses: ./.github/actions/tf-apply
        with:
          workspace: ${{ env.WORKSPACE_NAME }}

        # NOTE: Destroy config once Resource Links have been removed.

      - name: Terraform Apply
        if: success()
        uses: ./.github/actions/tf-apply
        with:
          workspace: ${{ env.WORKSPACE_NAME }}
          destroy: true

        # NOTE: Once all Fastly resources are deleted, we delete the TFC workspace.

      - name: Delete Dev Workspace
        run: |
          curl -s \
            --header "Authorization: Bearer ${{ secrets.TF_API_TOKEN }}" \
            --header "Content-Type: application/vnd.api+json" \
            --request POST \
            https://app.terraform.io/api/v2/organizations/example/workspaces/${{ env.WORKSPACE_NAME }}/actions/safe-delete > response.json
          if [[ $(jq '.errors | length == 0' response.json) == false ]]; then
            jq '.errors' response.json
            exit 1
          fi          

OK, the first thing I tripped up on was…

# Stop any in-flight CI jobs when a new commit is pushed.
concurrency:
  group: api-gateway-dev-${{ github.ref_name }}
  cancel-in-progress: true

I’m used to naming the group with ${{ github.ref_name }}. This has historically always worked fine for me, until I started developing a more complex CI/CD pipeline where there are lots of different events causing a workflow file to be triggered …and more importantly, causing a workflow file to be STOPPED!

Specifically, I ran into an issue where merging a PR should cause the dev environment to be spun-down, but I had a stage workflow also running and that caused by dev workflow to be unexpectedly cancelled, meaning I had infrastructure resources orphaned, and it was a pain to have to go back and manually clean them up.

So to resolve that I had to make sure to prefix each group with a unique name.

The next thing of interest is…

on:
  workflow_dispatch:
  pull_request:
    types: [opened, edited, closed, reopened, synchronize]
    paths:
      - .github/workflows/api-gateway-*
      - cmd/gateway/**
      - internal/gateway/**
      - infrastructure/fastly/compute/api-gateway/**

It took me a while to figure out the types to use. I have a very specific set of events I want to react to and I discovered that edited on a pull_request doesn’t mean “code pushed to the PR”. Nope, for that you need synchronize.

Also, I wanted to avoid having workflow files triggered unnecessarily, hence I restrict this particular workflow file to changes made to relevant API gateway files. Now, this isn’t perfect because the application code could change in the future to depend on other packages in the codebase (as it’s a mono-repo) and that would mean having to update this list (which as you can imagine is likely to get forgotten about).

The next thing of interest is…

# IMPORTANT: for TF_VAR_* wrap double-quotes in single-quotes.
env:
  APP_DIRECTORY: "./cmd/gateway"
  CONFIG_DIRECTORY: "./infrastructure/fastly/compute/api-gateway"
  PACKAGE_PATH: "/pkg/gateway.tar.gz"
  TF_API_TOKEN: "${{ secrets.TF_API_TOKEN }}"
  TF_CLOUD_ORGANIZATION: "example"
  TF_VAR_backend: '"https://api-dev.example.com"'
  TF_VAR_env: '"dev"'

When passing a value to Terraform via an environment variable, you need the value itself to be a string, but doing export FOO=bar doesn’t assign "bar" to the variable, it only assigns bar without the quotes and that causes Terraform to treat the value like an undefined variable. So it’s very important to wrap the string in single quotes.

The next thing of interest is…

if: ${{ github.event.action != 'closed' && github.actor != 'dependabot[bot]' }} # don't run this when the PR is closed

Specifically, the bit I want to call out is the check for dependabot[bot]. This is because when this automated-bot opens a PR on your repo, it won’t have access to your secrets and so it won’t be able to make TFC API calls (and do other things) so you might as well skip the CI/CD pipeline workflow.

The next difficulty I ran into was figuring out how best to avoid API rate limits from the Terraform Cloud API. This particular set of workflow files would be run many multiple times a day, and using the TFC API each time would quickly cause us to hit a rate-limit threshold.

To avoid that situation I needed to figure out clever caching patterns. But then, I needed to also account for the fact that a cache is volatile/ephemeral and can cause unexpected error scenarios.

If you take a look at the create-workspace job, you’ll see that once I successfully create a new workspace, I’ll then call the API to create an environment variable that contains a required secret value (for interacting with the Fastly API) and when that is successful I create a file called tfc-workspace-id and it’s that file that I cache and restore on each job run.

When the workflow calls the terraform-plan job you’ll see I’ve added a comment about handling error scenarios. Originally, I would depend on the create-workspace job and only run terraform-plan if the other job had completed successfully. But that didn’t always work and ties back to my comment about a cache being volatile/ephemeral.

To explain: the terraform-plan job (and all jobs that follow it) requires a TFC Workspace. The name of the workspace is expected to match WORKSPACE_NAME. The creation of the workspace happens in the create-workspace job.

Part of the create-workspace is making API requests, and to make that easier I pass a JSON file in to be used as the data input rather than manually configuring it in a single curl call. But first I have to modify the JSON data to include the values required (as I don’t want them hardcoded into the JSON):

  - name: Modify Workspace JSON Payload
    if: steps.cache-workspace-id.outputs.cache-hit != 'true'
    run: |
      payload=./infrastructure/fastly/compute/api-gateway/tfc-api/payloads/workspace.json
      jq '.data.attributes.name = "${{ env.WORKSPACE_NAME }}"' "$payload" > temp.json && mv temp.json "$payload"      

  - name: Modify Workspace Variable JSON Payload
    if: steps.cache-workspace-id.outputs.cache-hit != 'true'
    run: |
      payload=./infrastructure/fastly/compute/api-gateway/tfc-api/payloads/workspace-variables.json
      jq '.data.attributes.value = "${{ secrets.FASTLY_API_KEY }}"' "$payload" > temp.json && mv temp.json "$payload"      

Here’s the payload JSON for both API calls:

{
  "data": {
    "type": "workspaces",
    "attributes": {
      "name": "api-gateway-dev-"
    },
    "relationships": {
      "project": {
        "data": {
          "type": "projects",
          "id": "prj-<REDACTED>"
        }
      }
    }
  }
}
{
  "data": {
    "type": "vars",
    "attributes": {
      "key": "FASTLY_API_KEY",
      "value": "",
      "description": "API key used by the Fastly Terraform Provider",
      "category": "env",
      "sensitive": true
    }
  }
}

But there are scenarios where the workspace might not exist! This can happen when there is an error in the create-workspace job. For example, when calling the TFC API. Although the API might error, we don’t want to mark the job as not being successful. This is because the API could be called and fail for multiple reasons. e.g. the cached API result wasn’t found and we got a 422 from the API.

A 422 code response from the TFC API thus indicates we tried to create a workspace that already exists. So in this scenario we still want the remaining jobs to run (as the workspace does exist).

Ultimately, if the error was something else and the workspace thus wasn’t created, then at worst we’ll get an error back from TFC when trying to upload the TF configuration to it.

The last issue worth noting is related to tearing-down the infrastructure when the PR is merged. There was an issue with the Fastly Terraform provider (or more specifically a Fastly API issue) where the removal of a Fastly ‘Config Store’ (which was needed by the application code) has to be done as a two-part terraform apply.

i.e. you have to remove the resource_link block from the fastly_service_compute resource and run terraform apply, and then you can trigger a terraform apply -destroy via the HashiCorp GitHub Action using the is_destroy field.

Stage Config

# We deploy to staging environment when PR is merged or we push to `main`.
name: "API Gateway TF Stage"

# Stop any in-flight CI jobs when a new commit is pushed.
concurrency:
  group: api-gateway-stg-${{ github.ref_name }}
  cancel-in-progress: true

on:
  workflow_dispatch:
  push:
    branches:
      - main
    paths:
      - .github/workflows/api-gateway-*
      - cmd/gateway/**
      - internal/gateway/**
      - infrastructure/fastly/compute/api-gateway/**

# IMPORTANT: for TF_VAR_* wrap double-quotes in single-quotes.
env:
  APP_DIRECTORY: "./cmd/gateway"
  CONFIG_DIRECTORY: "./infrastructure/fastly/compute/api-gateway"
  PACKAGE_PATH: "/pkg/gateway.tar.gz"
  TF_API_TOKEN: "${{ secrets.TF_API_TOKEN }}"
  TF_CLOUD_ORGANIZATION: "example"
  TF_VAR_backend: '"https://api-stage.example.com"'
  TF_VAR_env: '"stg"'
  TF_WORKSPACE: "api-gateway-stage"

jobs:
  terraform-apply:
    name: "Apply"
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Debug Information
        uses: ./.github/actions/debug

      - name: Install Go and Fastly CLI
        uses: ./.github/actions/fastly-cli

      - name: Build Compute Package
        run: fastly compute build --dir ${{ env.APP_DIRECTORY }}

      - name: Calculate Compute Package Hash
        id: compute-package-hash
        run: echo "compute_package_hash=$(fastly compute hash-files --skip-build --dir ${{ env.APP_DIRECTORY }} --quiet)" >> $GITHUB_OUTPUT

      - name: Copy Compute Package to Terraform workspace
        run: cp ${{ env.APP_DIRECTORY }}${{ env.PACKAGE_PATH }} ${{ env.CONFIG_DIRECTORY }}

      - name: Terraform Apply
        uses: ./.github/actions/tf-apply
        with:
          workspace: ${{ env.TF_WORKSPACE }}

      - name: Upload file as artifact
        uses: actions/upload-artifact@v4
        with:
          name: ${{ steps.compute-package-hash.outputs.compute_package_hash }}
          path: ${{ env.APP_DIRECTORY }}${{ env.PACKAGE_PATH }}
          retention-days: 14
          overwrite: true

      - name: Create Git Tag
        run: echo "GIT_STAGE_TAG=stg/$(date +'%Y-%m-%d/%H-%M-%SZ')/$(echo ${{ github.triggering_actor }} | tr '[:upper:]' '[:lower:]' | tr -d ' ')/${{ steps.compute-package-hash.outputs.compute_package_hash }}" >> "${GITHUB_ENV}"

      - name: Check Git Tag
        run: echo "$GIT_STAGE_TAG"

      - name: Apply Git Tag
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.git.createRef({
              owner: context.repo.owner,
              repo: context.repo.repo,
              ref: 'refs/tags/${{ env.GIT_STAGE_TAG }}',
              sha: context.sha
            })            

The stage config isn’t as complex as the dev workflow, but there are some things of interest to point out, such as uploading our Compute package (which is our application code) to GitHub, which is for the purposes of later promoting that same package to production.

We then use the hash of the Compute package as part of a git tag for staging deploys. Again, this hash is required for the promoting of the staging artifact to the production environment.

Production Config

# We deploy to production environment when workflow is triggered manually.
# Person triggering workflow must provide hash of Compute package to deploy.
name: "API Gateway TF Prod"

# Stop any in-flight CI jobs when a new commit is pushed.
concurrency:
  group: api-gateway-prd-${{ github.ref_name }}
  cancel-in-progress: true

on:
  workflow_dispatch:
    inputs:
      compute_package_hash:
        type: string
        required: true
        description: hash of compute package (see `git tag`)

# IMPORTANT: for TF_VAR_* wrap double-quotes in single-quotes.
env:
  APP_DIRECTORY: "./cmd/gateway"
  CONFIG_DIRECTORY: "./infrastructure/fastly/compute/api-gateway"
  GH_TOKEN: ${{ github.token }} # Required to use the `gh` CLI
  PACKAGE_PATH: "/pkg/gateway.tar.gz"
  TF_API_TOKEN: "${{ secrets.TF_API_TOKEN }}"
  TF_CLOUD_ORGANIZATION: "example"
  TF_VAR_backend: '"https://api.example.com"'
  TF_VAR_env: '"prd"'
  TF_WORKSPACE: "api-gateway-production"

jobs:
  terraform-apply:
    name: "Apply"
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: write
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Debug Information
        uses: ./.github/actions/debug

      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version-file: go.mod

      - name: Download Compute Package to Terraform workspace
        run: |
          gh run download -n ${{ inputs.compute_package_hash }}
          mv gateway.tar.gz ${{ env.CONFIG_DIRECTORY }}          

      - name: Terraform Apply
        uses: ./.github/actions/tf-apply
        with:
          workspace: ${{ env.TF_WORKSPACE }}

      - name: Create Git Tag
        run: echo "GIT_PROD_TAG=prd/$(date +'%Y-%m-%d/%H-%M-%SZ')/$(echo ${{ github.triggering_actor }} | tr '[:upper:]' '[:lower:]' | tr -d ' ')/${{ inputs.compute_package_hash }}" >> "${GITHUB_ENV}"

      - name: Check Git Tag
        run: echo "$GIT_PROD_TAG"

      - name: Apply Git Tag
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.git.createRef({
              owner: context.repo.owner,
              repo: context.repo.repo,
              ref: 'refs/tags/${{ env.GIT_PROD_TAG }}',
              sha: context.sha
            })            

Again, this workflow isn’t as complex as the dev one, so the only interesting part here is that we’re downloading the artifact from GitHub and then running a Terraform apply to upload the Compute package to our production environment.

Custom Actions

The only other files probably worth showing you are the .github/actions/tf-var-branch, .github/actions/tfc-dev-workspace-name and .github/actions/tf-apply steps which I moved into separate action files to make them easier to reuse.

Here is tf-var-branch:

name: "Terraform Variable Branch Calculation"
description: "Generates a value for TF_VAR_branch"
runs:
  using: composite
  steps:
    - name: Set TF_VAR_branch
      shell: bash
      run: echo "TF_VAR_branch=\"-$(echo ${{ github.head_ref }} | tr '[:upper:]' '[:lower:]' | tr '/' '-' | tr '.' '-' | tr '_' '-')\"" >> "${GITHUB_ENV}"
      # NOTE: We only set the branch variable for dev (stg/prd will use an empty string).

NOTE: I had to update this action twice. The first was to replace dot characters with a hyphen as TFC complained. The second was to replace underscores with a hyphen as Fastly complained (i.e. an underscore is not a valid character in a domain).

Here is tfc-dev-workspace-name:

name: "Terraform Cloud Dev Workspace Name"
description: "Generates a workspace name for the PR dev environment"
runs:
  using: composite
  steps:
    - name: Set WORKSPACE_NAME
      shell: bash
      run: |
        branch_quoted='${{ env.TF_VAR_branch }}' # e.g. `"-example"` (set in ../tf-var-branch/action.yml)
        branch_strip_leading_quote="${branch_quoted#\"}" # strip the leading quote (e.g. turn `"-example"` to `-example"`)
        branch="${branch_strip_leading_quote%\"}" # strip the trailing quote (e.g. turn `-example"` to `-example`)
        echo "WORKSPACE_NAME=api-gateway-dev${branch}" >> $GITHUB_ENV        

Here is tf-apply:

# yamllint disable rule:line-length

name: "Terraform Apply"
description: "Uploads TF config, creates a run and applies it"
inputs:
  workspace:
    description: "The TFC Workspace"
    required: true
  destroy:
    description: "When true, this uses terraform to DESTROY the managed infrastructure"
    default: "false"
outputs:
  run_id:
    description: "The ID of the created run"
    value: ${{ steps.apply.outputs.run_id }}
runs:
  using: composite
  steps:
    - name: Upload Configuration
      uses: hashicorp/tfc-workflows-github/actions/upload-configuration@v1.2.0
      id: apply-upload
      with:
        workspace: ${{ inputs.workspace }}
        directory: ${{ env.CONFIG_DIRECTORY }}

    - name: Create Apply Run
      uses: hashicorp/tfc-workflows-github/actions/create-run@v1.2.0
      id: apply-run
      with:
        workspace: ${{ inputs.workspace }}
        configuration_version: ${{ steps.apply-upload.outputs.configuration_version_id }}
        is_destroy: ${{ inputs.destroy == true || inputs.destroy == 'true' }}  # IMPORTANT: This will DESTROY the dev infrastructure.

    - name: Apply
      uses: hashicorp/tfc-workflows-github/actions/apply-run@v1.2.0
      if: fromJSON(steps.apply-run.outputs.payload).data.attributes.actions.IsConfirmable
      id: apply
      with:
        run: ${{ steps.apply-run.outputs.run_id }}
        comment: "Apply Run from GitHub Actions CI ${{ github.sha }}"

Conclusion

I’ve not covered every single line of code in the GitHub configuration. I’ve also not included the Terraform config files because they are specific to my use case and so would be considered ’noise’ as far as understanding the fundamental CI/CD flow we’re trying to learn about here.

But hopefully the explanation of each environment workflow, and then being able to compare that to the actual implementation files for GitHub Actions is enough to help you see how you might be able to implement your own CI/CD workflow across multiple environments.

Good luck out there!


But before we wrap up... time (once again) for some self-promotion 🙊