Matryoshka 1: Composable Storage in Elixir

I recently came across the paper Storage Combinators, which proposes an approach to designing storage systems by composing together modular components.

The code in the paper is given in Objective Smalltalk, but I decided to build Matryoshka (inspired by Storage Combinators, but not an exact replica) in Elixir.

In the Storage Combinators paper, the composition of different stores and store combinators is done via message passing.

Originally I chose Elixir because I thought this message passing style mapped nicely to the message passing and handling that GenServers offer.

After an early false start, I realised this approach had some issues:

I ended up refactoring so that store composition was done with regular structs and functions.

We first compose the business logic of a store together, before wrapping it in a Server and interacting with it via the Client module, so we only ever have 1 process per store, no matter how complex it is under the hood.

Regardless, the ideas around concurrency that the BEAM VM and BEAM languages (Erlang, Elixir, Gleam, Lisp Flavoured Erlang, etc.) offer are very interesting. A few examples of the design choices that come to mind:

OK, so why the name “Matryoshka”?

I eventually elected to engineer this composition functionality by wrapping stores with store combinators (more on this later). This struct composition reminded me of Russian nesting dolls, which recursively store ever smaller Russian nesting dolls inside themselves.

The protocols

Storage Combinators defines two protocols: one is a reference protocol, which identifies the location of a resource, while the second is a storage protocol, which models the insertion, deletion, and retrieval of resources.

The reference protocol

The reference protocol looks like this:

protocol Reference {
  -<Array>pathComponents.
  -<String>path.
  -<String>scheme.
}

References are supposed to have a path and a scheme, which can (for convenience) be accessed as a list of path components.

I’ve elected to simplify the Reference protocol in Elixir by removing the scheme attribute, and enforcing that the only required method is path_segments/1; the full path is just the type itself. In cases where we would use a scheme in a URI, we can simply use the first path segment, which simplifies things for our implementation.

/lib/matryoshka/reference.ex

 1defprotocol Matryoshka.Reference do
 2  @typedoc """
 3  A type that implements the Matryoshka.Reference protocol.
 4  """
 5  @type t() :: any
 6
 7  @doc """
 8  Splits a Reference into the list of underlying path components.
 9  """
10  @spec path_segments(t()) :: list(String.t())
11  def path_segments(reference)
12end
13
14defimpl Matryoshka.Reference, for: BitString do
15  def path_segments(reference) do
16    String.split(reference, "/")
17  end
18end
19
20defimpl Matryoshka.Reference, for: Atom do
21  def path_segments(reference) do
22    [Atom.to_string(reference)]
23  end
24end

In Matryoshka we’ll predominantly be using strings as the references. I expect paths to be strings like "/first/second/key", so to create the list of path segments, we simply splitthe string using forward slashes as a delimiter. I’ve also added an implementation for atoms to show that we could use any other types as references.

The storage protocol

The storage protocol is defined as:

protocol Storage {
  -at:ref.
  -<void>at:ref put:object.
  -<void>at:ref merge:object
  -<void>deleteAt:ref;
}

Any object that implements the Storage protocol is a store.

Some stores handle their storage calls (at/put/merge/deleteAt) by referring to other stores then processing those stores’ results in some way (e.g. by combining, filtering, mapping). These stores are store combinators.

This is the main novelty of the paper.

So, there are four methods that have to be defined on an object for it to be a store:

  1. at takes a reference and returns the data at that reference (if any)
  2. put puts an object at the reference ref, and returns nothing
  3. merge merges a provided object with the object at the reference ref, and returns nothing
  4. deleteAt deletes the object at the reference ref

These four methods are equivalent to the 4 HTTP methods:

Objective Smalltalk HTTP
store at:ref GET <URI>
store at:ref put:data PUT <URI> <data>
store at:ref merge:data PATCH <URI> <data>
store deleteAt:ref DELETE <URI>

The paper’s barely started but I’ve already decided to diverge in my implementation:

Removing merge

I removed merge as I want the storage combinators to be as generic as possible. The issue with patching-style methods is that not all types make sense to be patched. For data formats like JSON and XML, there’s JSON Patch and XML Patch. But what merge operations are there for an atom or a string?

They’re such small types it makes more sense to just put a new value there instead.

Also, you can create the patch functionality by first getting the value, updating the value, and then putting it in, so it’s arguably not even a primitive function anyways.

However, I may end up adding an update function to the Storage protocol like Map.update/4 which updates the value with a function.

Then if we want to apply a JSON Patch to a Map or List, we can create a 1-argument lambda by partially applying a JSON patch function with the desired patch operations, which will then be passed to update.

Adding fetch

There are a few lookup methods in Map, but the main basic methods are get/2 and fetch/2, which differ in the shape of their return values:

Method Return on success Returns on failure
get/2 value nil
fetch/2 {:ok, value} :error

One of the nice things about fetch/2 is that you can distinguish between no value being stored at a key (which is returned as :error) and the value nil being stored at a key (which is returned as {:ok, nil}).

get/2 on the other hand returns nil in both cases.

I also decided to augment the return type of the errors in fetch with a reason, so the fetch in Matryoshka is intended to return either {:ok, value} when a value is stored at the requested key, or {:error, reason} if there’s an error.

Storage Protocol in Elixir

There’s one last change that needs to be made when porting over the storage protocol to Elixir.

Elixir, as a functional programming language, has immutable data. To deal with changing state, we need a pure function which creates a new state from the old state.

The Storage protocol will specify these pure functions.

Later, we’ll use a GenServer to track the changes in state.

As such, we write the storage protocol:

/lib/matryoshka/storage.ex

 1defprotocol Matryoshka.Storage do
 2  alias Matryoshka.Reference
 3
 4  @typedoc """
 5  A type that implements the Matryoshka.Storage protocol.
 6  """
 7  @type store :: any
 8
 9  @type value :: any
10
11  @doc """
12  Fetches the value for a specific `ref` in `store`.
13
14  If `store` contains the given `ref` then its value is returned in the 
15  shape of `{:ok, value}`.
16  If `store` doesn't contain `ref`, then the reason why is returned in the 
17  shape of `{:error, reason}`.
18  """
19
20  def fetch(store, ref)
21
22  @doc """
23  Gets the value for a specific `ref` in `store`.
24
25  If `store` contains the given `ref` then its value `value` is returned.
26  If `store` doesn't contain `ref`, `nil` is returned.
27  """
28  @spec get(store(), Reference.t()) :: value()
29  def get(store, ref)
30
31  @doc """
32  Puts the given `value` under `ref` in `store`.
33  """
34  @spec put(store(), Reference.t(), value()) :: store()
35  def put(store, ref, value)
36
37  @doc """
38  Deletes the entry in `store` for a specific `ref`.
39
40  If the `ref` does not exist, returns `store` unchanged.
41  """
42  @spec delete(store(), Reference.t()) :: store()
43  def delete(store, ref)
44end

I’ve also created a small module used for typespecs where I expect a struct implementing Storage:

/lib/matryoshka/is_storage.ex

1defmodule Matryoshka.IsStorage do
2  @typedoc """
3  A struct that implements the Matryoshka.Storage protocol.
4  """
5  @type t :: any()
6end

And now we’re ready to start putting together implementations for Storage.

As a matter of taste, I like a tripartite approach to building out processes in Elixir:

  1. A Client module, which provides utility functions that wrap sending call and cast messages to the Server
  2. A Server module, which handles calls and casts, but uses the business logic in Impl
  3. An Impl module, which provides the actual business logic in pure functions

The Client and Server modules are therefore thin wrappers providing the concurrency, with the bulk of the code in Impl.

We’ll start with the Client and Server code first, before diving into the nitty-gritty Impl variations.

Implementation

Client

The Matryoshka.Client module will abstract over sending messages to Matryoshka.Server by exposing functions to start the server and send storage messages:

/lib/matryoshka/client.ex

 1defmodule Matryoshka.Client do
 2  @server Matryoshka.Server
 3
 4  def start_link(store) do
 5    @server.start_link(store)
 6  end
 7
 8  def get(ref) do
 9    GenServer.call(@server, {:get, ref})
10  end
11
12  def fetch(ref) do
13    GenServer.call(@server, {:fetch, ref})
14  end
15
16  def put(ref, value) do
17    GenServer.cast(@server, {:put, ref, value})
18  end
19
20  def delete(ref) do
21    GenServer.cast(@server, {:delete, ref})
22  end
23end

The :fetch and :get messages will need to return the value (or an error/nil) so these are sent as synchronous calls.

Next we need to send the :put and :delete messages. I’ve elected to send these as casts rather than as calls.

I’m aware that calls in Elixir provide useful backpressure since they’re synchronous, whereas the casts (asynchronous) don’t, but I’m not planning on using Matryoshka in a production context anyways.

Now it’s time to handle the storage messages.

Server

The MapStore.Server module only needs to be a thin GenServer wrapper around the business logic in MapStore.Impl, so we start with start_link/1 to and init/1 to initialize the server:

/lib/matryoshka/mapstore/server.ex

 1defmodule Matryoshka.Server do
 2  use GenServer
 3
 4  alias Matryoshka.Storage
 5
 6  def start_link(default) do
 7    GenServer.start_link(__MODULE__, default, name: __MODULE__)
 8  end
 9
10  @impl true
11  def init(store) do
12    {:ok, store}
13  end
14  ...

The store provided in init/1 here is any struct which implements the Storage protocol.

You’ll also notice that the name is always set to Server when we start the store server.

Initially I thought that we’d only ever have 1 store running in the application (since there are store combinators which can deal with an arbitrarily large list of underlying stores), but I’ll probably end up changing this (and Client) to request a name keyword eventually.

That way, if this were to run as an actual storage backend, we can do things like having a new store process per each user connected to the frontend application.

OK, so now we need to handle the different storage messages.

These just call out to the business logic in the provided store, using the Storage protocol to dispatch the function calls:

/lib/matryoshka/mapstore/server.ex

14  ...
15  @impl true
16  def handle_call({:get, ref}, _from, store) do
17    value = Storage.get(store, ref)
18    {:reply, value, store}
19  end
20
21  @impl true
22  def handle_call({:fetch, ref}, _from, store) do
23    value = Storage.fetch(store, ref)
24    {:reply, value, store}
25  end
26
27  @impl true
28  def handle_cast({:put, ref, value}, store) do
29    {:noreply, Storage.put(store, ref, value)}
30  end
31
32  @impl true
33  def handle_cast({:delete, ref}, store) do
34    {:noreply, Storage.delete(store, ref)}
35  end
36end

Again, Server and Client are just very thin wrappers around handling state and storage requests.

Now it’s time to write some actual business logic.

MapStore

MapStore is the simplest store we can make; it’s an in-memory, map-backed store.

First we create a MapStore struct in the MapStore module with only one key, map, which will contain the underlying map:

/lib/matryoshka/impl/map_store.ex

1defmodule Matryoshka.Impl.MapStore do
2  @enforce_keys [:map]
3  defstruct [:map]
4  ...

Along with a helper function to create the MapStore struct from a map:

/lib/matryoshka/impl/map_store.ex

4  ...
5  def map_store(), do: map_store(Map.new())
6  def map_store(map), do: %__MODULE__{map: map}
7  ...

Then we need to define the four methods to implement the Storage protocol.

Map.fetch/2 only returns an :error in the case of the key not being found, but as mentioned earlier, I want to add a reason when there’s a failure. I provide that in the shape {:no_ref, ref}, which means “No reference in the store for the provided ref”:

/lib/matryoshka/impl/map_store.ex

 7  ...
 8  alias __MODULE__
 9
10  defimpl Matryoshka.Storage do
11    def fetch(store, ref) do
12      case Map.fetch(store.map, ref) do
13        {:ok, value} -> {:ok, value}
14        :error -> {:error, {:no_ref, ref}}
15      end
16    end
17    ...

get/2 is a simple call to Map.get/2:

/lib/matryoshka/impl/map_store.ex

17    ...
18    def get(store, ref), do: Map.get(store.map, ref)
19    ...

put/3 and delete/2 are similarly simple calls to Map.put/3 and Map.delete/2, although we need to wrap the underlying map back into a MapStore:

/lib/matryoshka/impl/map_store.ex

19# /lib/matryoshka/impl/map_store.ex
20    ...
21    def put(store, ref, value) do
22      map = Map.put(store.map, ref, value)
23      Impl.map_store(map)
24    end
25
26    def delete(store, ref) do
27      map = Map.delete(store.map, ref)
28      Impl.map_store(map)
29    end
30  end
31end

…and that completes the business logic for MapStore!

PassThrough

Storage Combinators defines a pass-through store PassThrough, which is the simplest store combinator; it just passes all its requests to its source store, unchanged.

In Storage Combinators, PassThrough is a useful object as other store combinators can inherit from it, reusing the functionality of passing storage requests through to other stores.

In Matryoshka, it’s completely useless, as there’s no such thing as class inheritance in Elixir. We’ll need to write the code for storage combinators passing requests to their inner stores by hand.

But we’ll build it anyways just to get a feel for what a storage combinator looks like.

Just like with MapStore, we start with a struct and a helper function to construct the struct:

/lib/matryoshka/impl/pass_through.ex

 1defmodule Matryoshka.Impl.PassThrough do
 2  alias Matryoshka.Storage
 3  
 4  @enforce_keys [:inner]
 5  defstruct [:inner]
 6
 7  @typedoc """
 8  A struct that implements the Matryoshka.Storage protocol.
 9  """
10  @type impl_storage :: any()
11
12  @type t :: %__MODULE__{
13          inner: impl_storage
14        }
15
16  def pass_through(storage) when is_struct(storage) do
17    %__MODULE__{inner: storage}
18  end
19  ...

and then implement the Storage protocol methods.

fetch/2 and get/2 just call into the inner store:

/lib/matryoshka/impl/pass_through.ex

19  ...
20  alias __MODULE__
21
22  defimpl Storage do
23    def fetch(store, ref), do: Storage.fetch(store.inner, ref)
24
25    def get(store, ref), do: Storage.get(store.inner, ref)
26    ...

While put/3 and delete/2 call into the inner store, then re-wrap it in a PassThrough struct.

/lib/matryoshka/impl/pass_through.ex

26    ...
27    def put(store, ref, value) do
28      new_inner = Storage.put(store.inner, ref, value)
29      PassThrough.pass_through(new_inner)
30    end
31
32    def delete(%PassThrough{inner: inner}, ref) do
33      new_inner = Storage.delete(inner, ref)
34      PassThrough.pass_through(new_inner)
35    end
36  end
37end

This pattern of calling into the inner structs, then re-wrapping the changed stores, is the defining characteristic of the storage combinators.

LoggingStore

Now it’s time to construct a useful storage combinator.

Storage Combinators defines a logging store that sends a log message when a storage call is made, but delegates the actual getting/fetching/putting/deleting of values to the inner store.

We’ll do the same.

Once again we start with a struct and a helper function to create the struct:

/lib/matryoshka/impl/logging_store.ex

 1defmodule Matryoshka.Impl.LoggingStore do
 2  alias Matryoshka.Storage
 3  @enforce_keys [:inner]
 4  defstruct [:inner]
 5
 6  @typedoc """
 7  A struct that implements the Matryoshka.Storage protocol.
 8  """
 9  require Logger
10  @type impl_storage :: any()
11
12  @type t :: %__MODULE__{
13          inner: impl_storage
14        }
15
16  def logging_store(storage) when is_struct(storage) do
17    %__MODULE__{inner: storage}
18  end
19  ...

And then we’ll define the four Storage functions: fetch/2, get/2, put/3, and delete/2.

fetch/2 and get/2 will once again compute their results by just calling to the inner store using Storage. We’ll then log the results using structured log messages.

/lib/matryoshka/impl/logging_store.ex

19  ...
20  alias __MODULE__
21
22  defimpl Storage do
23    def fetch(store, ref) do
24      value = Storage.fetch(store.inner, ref)
25      Logger.info([request: "FETCH", ref: ref, value: value])
26      value
27    end
28
29    def get(store, ref) do
30      value = Storage.get(store.inner, ref)
31      Logger.info([request: "GET", ref: ref, value: value])
32      value
33    end
34    ...

We implement put/3 and delete/2 to:

  1. Log a message
  2. Change the inner store using the function dispatch of Storage
  3. Wrap the changed inner store in a LoggingStore

/lib/matryoshka/impl/logging_store.ex

34    ...
35    def put(store, ref, value) do
36      inner = Storage.put(store.inner, ref, value)
37      Logger.info([request: "PUT", ref: ref, value: value])
38      LoggingStore.logging_store(inner)
39    end
40
41    def delete(store, ref) do
42      inner = Storage.delete(store.inner, ref)
43      Logger.info([request: "DELETE", ref: ref])
44      LoggingStore.logging_store(inner)
45    end
46  end
47end

And LoggingStore is done.

Exposing functionality via an external module

As a convenience, I re-export all the functionality from Client, Server, and the various implementation modules in the Matryoshka module via the defdelegate macro:

/lib/matryoshka.ex

 1defmodule Matryoshka do
 2
 3  # Implementation logic
 4  alias Matryoshka.Impl.LoggingStore
 5  alias Matryoshka.Impl.MapStore
 6  alias Matryoshka.Impl.PassThrough
 7
 8  # Client to interact with store
 9  alias Matryoshka.Client
10
11  # Client
12  defdelegate start_link(store), to: Client
13  defdelegate get(ref), to: Client
14  defdelegate fetch(ref), to: Client
15  defdelegate put(ref, value), to: Client
16  defdelegate delete(ref), to: Client
17
18  # Business logic
19  defdelegate logging_store(store), to: LoggingStore
20  defdelegate map_store(), to: MapStore
21  defdelegate map_store(map), to: MapStore
22  defdelegate pass_through(store), to: PassThrough
23end

This just allows developers to use Matryoshka in a more ergonomic way, by importing or aliasing only this one module, hiding the implementation logic.

Matryoshka in action

Now that we’ve got some stores and store combinators defined, we can finally try composing a store, and then updating and querying it:

alias Matryoshka

{:ok, client} =
  Matryoshka.map_store()
  |> Matryoshka.logging_store()
  |> Matryoshka.start_link()
#=> {:ok, #PID<0.152.0>}

Matryoshka.put("key", :value)
#=> 10:20:30.000 [info] [request: "PUT", ref: "key", value: :value]
#=> :ok

Matryoshka.fetch("key")
#=> 10:20:35.000 [info] [request: "FETCH", ref: "key", value: {:ok, :value}]
#=> {:ok, :value}

Matryoshka.delete("key")
#=> 10:20:40.000 [info] [request: "DELETE", ref: "key"]
#=> :ok

Matryoshka.get("key")
#=> 10:20:45.000 [info] [request: "GET", ref: "key", value: nil]
#=> nil

In the next post in this series, I’ll add some more stores and store combinators to Matryoshka to make it more useful.

You can see the latest version of Matryoshka at my GitHub.