Rack Middleware

What we'll cover

reading time: approx. 5mins

Introduction

I want to quickly cover how Rack Middleware works. It's actually much simpler than you probably realise to write your own middleware to handle all sorts of functional requirements.

If you're unsure of what Rack is then this post probably isn't the best starting point. I'll give a very brief description of it (along with a basic example), but I want to focus more on Rack middleware.

To quote the official site (rack.github.io):

Rack provides a minimal interface between webservers supporting Ruby and Ruby frameworks. To use Rack, provide an "app": an object that responds to the call method, taking the environment hash as a parameter, and returning an Array with three elements: the HTTP response code, a Hash of headers, the response body, which must respond to each

Typically you'll have a config.ru file with content like:

run -> env {[200, {"Content-Type" => "text/html"}, ["<h1>Hello World</h1>"]]}

...and to run the webserver (WEBrick by default, but you can use many others) you need to execute rackup via your terminal's shell/command line interface.

If I wanted to use the Puma webserver, instead of the default WEBrick, then I would run rackup -s Puma (as long as I had run gem install puma prior).

The above example demonstrates the most minimalistic implementation by using a Lambda function (that link is to a Gist I wrote a while ago demonstrating Lambda shorthand syntax and other concepts such as partial application and currying).

This works because Ruby's Proc has an internal call method (remember in the above description it stated the object that handles incoming requests must have a call method)

How middleware works

Middleware is a Ruby class whose initialize method is passed the web request handler (in this case our Lambda). The class also needs to have a call method. The call method (when executed) takes the Lambda function we've stored off inside our middleware class, and executes it.

The Lambda is passed the relevant "environment" details of the request as a Hash. The response that's returned by our request handler Lambda is then saved in a variable; and the rest of the call function will consist of code that manipulates that content somehow, before returning it to the client who made the request.

To use our middleware class, we pass the name of the class to Rack's use method like so: use MyMiddleware.

An example will demonstrate this better. The following is a hypothetical config.ru file...

class A
  def self.call(env)
    [200, {"Content-Type" => "text/html"}, ["<h1>Hello World</h1>"]]
  end
end
use D # pass C into this middleware third  (D runs C and manipulates the response from C)
use C # pass B into this middleware second (C runs B and manipulates the response from B)
use B # pass A into this middleware first  (B runs A and manipulates the response from A)
run A

...in this example we can see we have three middleware classes: B, C and D.

When we execute rackup the Rack interface passes our request handler A (this time we're using a Class with a call method rather than the minimalistic Lambda) into the B middleware (remember, middleware is just a Class that stores off the request handler inside the class and has a call method to manipulate the content).

The B middleware will run A from inside its call method, then manipulate the response from A and return that manipulated response; which is then passed into the C middleware and so on up.

We could theorectically inline the use and run calls into the following example...

D(
  C(
    B(A)
  )
)

...the run and use functions work together to create a nested set of calls like seen above.

This means we execute B(A) first and the result is passed to C. From there the result of C is passed into D.

Rack::Builder

The reason we have functions such as use is because the built-in rackup command you use to execute the config.ru is a standalone Rack::Builder runner, which provides use and other methods as a custom DSL.

If we didn't want to use a config.ru file, we could have a Ruby script that uses Rack::Builder directly, like so:

class A
  def self.call(env)
    [200, {"Content-Type" => "text/html"}, ["<h1>Hello World</h1>"]]
  end
end

app = Rack::Builder.new do
  use D
  use C
  use B
  run A
end

Simple example

use Rack::Lint # gives more descriptive error messages when responses aren't valid

class LogIt
  def initialize(app)
    @app = app
  end
  def call(env)
    status, headers, body  = @app.call(env)
    body.map { |chunk| p "LogIt: #{chunk}" }
    [status, headers, body]
  end
end

class Upper
  def initialize(app)
    @app = app
  end
  def call(env)
    # Execute our main application (store off its return value)
    status, headers, body  = @app.call(env)

    # Transform the returned value(s)
    # Remember the body is an Array
    upcased_body = body.map { |chunk| chunk.upcase }

    # Return the newly filtered response
    [status, headers, upcased_body]
  end
end

use LogIt # Does nothing with uppercase'd response, just logs it to stdout
use Upper # Modifies response to be uppercase

run -> env {[200, {"Content-Type" => "text/html"}, ["<h1>Hello World</h1>"]]}

In this example, we're back to using a Lambda, which is passed into our Upper class and the response from that is passed into our LogIt class.

The response from LogIt (which in our example is the same response that came from Upper as we didn't manipulate it in any way, we simply used the LogIt class as a proxy for logging to stdout) is then passed into the 3rd party Rack middleware Rack::Lint which will let us know if our final response is valid or not.

If we were to run this application using rackup -s Puma -p 9293 (or bundle exec puma --port 9293 up to you) then we could use Curl to make a request to the locally running server like so: curl 0.0.0.0:9293.

The response we would get back is: <H1>HELLO WORLD</H1> (and if you were to look at your terminal's stdout you should also see something like:

"LogIt: <H1>HELLO WORLD</H1>"
127.0.0.1 - - [15/Nov/2014 11:21:07] "GET / HTTP/1.1" 200 - 0.0006

Custom Cache Middleware

The inspiration for this section came from @vhokstad who wrote a post about creating your own static asset caching middleware way back in 2008 (http://www.hokstad.com/rack-middleware-adding-cache-headers).

I cleaned up the code a bit and merged it into some test Rack code I have on GitHub.

Fundamentally it comes down to registering the use of the Cache middleware (we'll take a look at that in a moment):

cache_headers = {
  :cache_control => "max-age=2678400, public", # 86400 seconds == 1 day * 31
  :expires => 2678400
}

use Cache, { :assets => cache_headers }

In this example we're passing a Hash into our Middleware class that it'll use as a pattern for matching resources that should be cached. In this case I only have a single key :assets because all my static assets (images, stylesheets and scripts) are kept within that directory:

├── Assets
   ├── Images
      └── rack-logo.png
   ├── Scripts
      └── test.js
   └── Styles
       └── base.css

But imagine for a moment that you have a different structure (no Assets parent directory):

├── Images
   ├── rack-logo.png
├── Scripts
   ├── test.js
├── Styles
   ├── base.css

...then you would need to change the Hash to:

{
  :images  => cache_headers,
  :scripts => cache_headers,
  :styles  => cache_headers
}

...admittedly we have duplication here because we're specifying the cache_headers variable multiple times (hence why I have all my static assets inside an Assets folder) but it does give you the freedom to apply different caching strategies depending on the type of asset being requested.

Note: you could modify the key to use the Regular Expression alternator character and then reduce the Hash back down to a single key. Something along the lines of :"images|scripts|styles" (you would need to modify the generated_pattern function slightly)

The Cache middleware looks like this...

class Cache
  attr_reader :app, :pattern

  def initialize(app, pattern)
    @app = app
    @pattern = pattern
  end

  def call(env)
    app.call(env).tap do |res|
      pattern.each do |p, config|
        if env["REQUEST_PATH"] =~ generated_pattern(p)
          res[1]["Cache-Control"] = config[:cache_control] if config.has_key? :cache_control
          res[1]["Expires"] = (Time.now + config[:expires]).utc.rfc2822 if config.has_key? :expires
        end
      end
    end
  end

  private

  def generated_pattern(pattern)
    Regexp.new pattern.to_s.capitalize
  end
end

It's the same pattern we've seen before: we store off the app (but this time we also store off the Hash as pattern) and then the call method executes the app and returns it.

But we use tap to access the app object and to modify the headers for each asset requested (i.e. for each image, stylesheet and javascript file).

Comments

Understanding recursion in functional JavaScript programming

What we'll cover

reading time: approx. 13mins

Introduction

This post isn't about Functional Programming per se, as there are many parts to pure FP (seen in Lisp, Clojure, Haskell etc) which we can't achieve completely in the JavaScript language (for example, implementing fundamental principles of FP such as "immutable data" is very difficult in a language designed around allowing data structures to be mutable. Although immutable data would be a lot easier with ES5 getters/setters and ES6 proxies).

What we do cover is recursion and a problem known as "tail call optimisation" and how to work around it in JavaScript.

The problem

In JavaScript, if a function calls itself recursively then the JavaScript engine has to create what's called a new 'stack'. A stack is a chunk of memory allocated to help keep track of all the information related to the function at the point of execution (such as its arguments and their initialised values).

Here in lies the problem: creating stacks is expensive, as the JavaScript engine can only create as many stacks as it has memory available. If we write a function that recursively calls itself many times then we'll find that we can exhaust the memory allocation and trigger an error.

Let's look at some code that demonstrates this.

The following function sum simply adds two numbers together; but for the purpose of demonstrating recursion (a potentially complex topic) and making it easier to understand, we have written the sum function in a very convoluted way:

function sum(x, y) {
    if (y > 0) {
      return sum(x + 1, y - 1);
    } else {
      return x;
    }
}

sum(1, 10); // => 11

Let's quickly explain what the code does:

As we can see, in the above code the JavaScript engine has to create a new stack for each recursive call. For a small sum of 1 and 10 this is fine; but if we try to sum 1 and 100000 then that'll require more stacks than we have memory to allocate. This will cause an error like so:

sum(1, 100000);
// => `RangeError: Maximum call stack size exceeded`

Recursion in the form we've written it above requires many stacks to be created because of how the function is calling itself.

The first call to the sum function (i.e. sum(1, 10)) doesn't complete until the very last call to sum returns the value of x (i.e. sum(0, 99999)). When the final call to sum occurs and we discover that y is no longer greater than x we return the accumulated value which has for each recursive call been stored inside the argument x. That returned value needs to then be passed back through each function execution context (effectively closing each stack) until we reach the very first function execution context that was opened when we ran sum(1, 10).

If we create a stack deep enough (e.g. sum(1, 100000)) then the JavaScript engine will throw a Maximum call stack size exceeded error.

This problem can occur very easily (as we've seen in this simple code example).

Tail Call Optimisation

In other programming languages the recursion could be rewritten in such a way that the engine would recognise a recursive execution was happening and optimise the code internally into a loop form. This is called a "tail call optimisation" (TCO).

Unfortunately the JavaScript language doesn't implement this optimisation.

Note: apparently ECMAScript 6 has plans to implement TCO

The following is an example of a tail optimised version of the previous code...

function sum(x, y) {
    function recur(a, b) {
        if (b > 0) {
            return recur(a + 1, b - 1);
        } else {
            return a;
        }
    }

    return recur(x, y);
}

sum(1, 10); // => 11

This works because the recur function ends up being the last invocation of the sum function. Now the engine is able to construct internal code that executes recur inside of a loop instead of via recursion.

As mentioned above, this optimisation has no effect in the current version of the JavaScript language (fingers crossed ES6 will implement TCO so the above code would work fine).

For now, we need an alternative solution.

The solution

The fix to this problem (in JavaScript at least) is to consume fewer stacks.

One way we could do this is to rewrite the code to not be recursive; so in other words: use a loop! Effectively we would be producing our own implementation of TCO (remember that TCO recognises a recursion and internally implements it as a loop).

The problem with using a loop is that it's not as elegant as the recursive style we associate with functional programming.

Another solution is to use a type of functional pattern called "trampolining". Let's take a look at one implementation of it...

Replace Recursion with a Loop

Note: the following code (and my explanation of it) assumes an understanding of the this keyword and changing its context. If you're unsure about what this means then read more about it here):

function trampoline(f) {
    while (f && f instanceof Function) {
        f = f();
    }
    return f;
}

function sum(x, y) {
    function recur(x, y) {
        if (y > 0) {
          return recur.bind(null, x + 1, y - 1);
        } else {
          return x;
        }
    }

    return trampoline(recur.bind(null, x, y));
}

sum(1, 10); // => 11

The reason the above code works is because we've replaced our functional recursion style with a loop.

In the above code we don't create a deep nest of stacks because the sum function only gets called once. The trampoline function also is only called once; and the recur function inside of sum, although called multiple times, is called via a loop (again, no more than one stack is required at any one time).

The code breaks down like so...

A more advanced solution

The previous code we just looked at works fine, but it required us to modify our code to work with the 'trampoline' pattern. This is a bit of a pain and means if we have lots of recursive code then it means each one might need subtle changes to accommodate this pattern.

The following code is an abstraction around that concept and it'll allow us to keep our code exactly the same, with no modifications, and the abstraction will handle all of the work for us!

Let's take a look at the implementation...

function tco(f) {
    var value;
    var active = false;
    var accumulated = [];

    return function accumulator() {
        accumulated.push(arguments);

        if (!active) {
            active = true;

            while (accumulated.length) {
                value = f.apply(this, accumulated.shift());
            }

            active = false;

            return value;
        }
    }
}

var sum = tco(function(x, y) {
    if (y > 0) {
      return sum(x + 1, y - 1)
    }
    else {
      return x
    }
});

sum(1, 100000) // => 100001

Here we've written a tco function which simply wraps around our original code (so no modification required to our code).

Let's now take a look at how it works.

Explanation

Be warned that understanding this code could take a while (lord knows it took me long enough to figure it out; that's what prompted me to write this post, so I wouldn't forget!).

If you don't understand how the code works after the first time reading through these notes then it's probably best to execute the above code via your browser of choice's developer tools and use a debugger statement to step through the code whilst reading this explanation...

Note: the above code was written by Irakli Gozalishvili (an engineer at Mozilla): https://gist.github.com/Gozala/1697037. As you can see above, it makes tail call optimising any function really easy.

The secret sauce to this entire chunk of code is coming up!

Conclusion

There you have it, a quick run through of TCO, what it means and how potentially to work around it in JavaScript. Go forth and be productive!

Update 1

I've since discovered a nice explanation of trampolining in the JS Drip Newsletter (#65).

The following code is an example that is NOT tail-call optimised:

function isEvenNaive (num) {
    if (num === 0) {
        return true;
    }

    if (num === 1) {
        return false;
    }

    return isEvenNaive(Math.abs(num) - 2);
}

isEvenNaive(10);
// => true

isEvenNaive(9);
// => false

isEvenNaive(99999);
// => InternalError: too much recursion

The following code snippet demonstrates the first step towards trampolining, which is to flatten the stack by requiring the user to manually execute the function over and over:

function isEvenInner (num) {
    if (num === 0) {
        return true;
    }

    if (num === 1) {
        return false;
    }

    return function() {
        return isEvenInner(Math.abs(num) - 2);
    };
}

isEvenInner(8);
// => function() {
//        return isEvenInner(Math.abs(num) - 2);
//    };

isEvenInner(8)()()()();
// => true

The first thing to notice about our isEvenInner function is that instead of directly calling itself again, it returns an anonymous function. That means each call to isEvenInner gets resolved immediately, and doesn't increase the size of the stack. It also means that we need a way to automatically invoke all of those anonymous functions that will get returned along the way. That's where trampoline comes in.

The trampoline function effectively turns this recursive algorithm into something that is executed by a while loop. As long as isEvenInner keeps returning functions, trampoline will keep executing them. When we finally reach a non-function value, trampoline will return the result.

Now, finally, let's see an example that automates the execution of the flatterned stack:

function trampoline (func, arg) {
    var value = func(arg);

    while(typeof value === "function") {
        value = value();
    }

    return value;
}

trampoline(isEvenInner, 99999);
// => false

trampoline(isEvenInner, 99998);
// => true

var isEven = trampoline.bind(null, isEvenInner);

isEven(99999);
// => false
Comments