I recently came across the paper Storage Combinators, which proposes an approach to designing storage systems by composing together modular components.
The code in the paper is given in Objective Smalltalk, but I decided to build Matryoshka (inspired by Storage Combinators, but not an exact replica) in Elixir.
In the Storage Combinators paper, the composition of different stores and store combinators is done via message passing.
Originally I chose Elixir because I thought this message passing style mapped nicely to the message passing and handling that GenServers offer.
After an early false start, I realised this approach had some issues:
I ended up refactoring so that store composition was done with regular structs and functions.
We first compose the business logic of a store together, before wrapping it in a Server and interacting with it via the Client module, so we only ever have 1 process per store, no matter how complex it is under the hood.
Regardless, the ideas around concurrency that the BEAM VM and BEAM languages (Erlang, Elixir, Gleam, Lisp Flavoured Erlang, etc.) offer are very interesting. A few examples of the design choices that come to mind:
OK, so why the name “Matryoshka”?
I eventually elected to engineer this composition functionality by wrapping stores with store combinators (more on this later). This struct composition reminded me of Russian nesting dolls, which recursively store ever smaller Russian nesting dolls inside themselves.
Storage Combinators defines two protocols: one is a reference protocol, which identifies the location of a resource, while the second is a storage protocol, which models the insertion, deletion, and retrieval of resources.
The reference protocol looks like this:
protocol Reference {
-<Array>pathComponents.
-<String>path.
-<String>scheme.
}
References are supposed to have a path and a scheme, which can (for convenience) be accessed as a list of path components.
I’ve elected to simplify the Reference protocol in Elixir by removing the scheme attribute, and enforcing that the only required method is path_segments/1
; the full path
is just the type itself. In cases where we would use a scheme in a URI, we can simply use the first path segment, which simplifies things for our implementation.
/lib/matryoshka/reference.ex
1defprotocol Matryoshka.Reference do
2 @typedoc """
3 A type that implements the Matryoshka.Reference protocol.
4 """
5 @type t() :: any
6
7 @doc """
8 Splits a Reference into the list of underlying path components.
9 """
10 @spec path_segments(t()) :: list(String.t())
11 def path_segments(reference)
12end
13
14defimpl Matryoshka.Reference, for: BitString do
15 def path_segments(reference) do
16 String.split(reference, "/")
17 end
18end
19
20defimpl Matryoshka.Reference, for: Atom do
21 def path_segments(reference) do
22 [Atom.to_string(reference)]
23 end
24end
In Matryoshka we’ll predominantly be using strings as the references. I expect paths to be strings like "/first/second/key"
, so to create the list of path segments, we simply splitthe string using forward slashes as a delimiter. I’ve also added an implementation for atoms to show that we could use any other types as references.
The storage protocol is defined as:
protocol Storage {
-at:ref.
-<void>at:ref put:object.
-<void>at:ref merge:object
-<void>deleteAt:ref;
}
Any object that implements the Storage protocol is a store.
Some stores handle their storage calls (at/put/merge/deleteAt) by referring to other stores then processing those stores’ results in some way (e.g. by combining, filtering, mapping). These stores are store combinators.
This is the main novelty of the paper.
So, there are four methods that have to be defined on an object for it to be a store:
at
takes a reference and returns the data at that reference (if any)put
puts an object at the reference ref
, and returns nothingmerge
merges a provided object with the object at the reference ref
, and returns nothingdeleteAt
deletes the object at the reference ref
These four methods are equivalent to the 4 HTTP methods:
Objective Smalltalk | HTTP |
---|---|
store at:ref | GET <URI> |
store at:ref put:data | PUT <URI> <data> |
store at:ref merge:data | PATCH <URI> <data> |
store deleteAt:ref | DELETE <URI> |
The paper’s barely started but I’ve already decided to diverge in my implementation:
merge
methodfetch
methodI removed merge
as I want the storage combinators to be as generic as possible. The issue with patching-style methods is that not all types make sense to be patched. For data formats like JSON and XML, there’s JSON Patch and XML Patch. But what merge operations are there for an atom or a string?
They’re such small types it makes more sense to just put a new value there instead.
Also, you can create the patch functionality by first getting the value, updating the value, and then putting it in, so it’s arguably not even a primitive function anyways.
However, I may end up adding an update
function to the Storage protocol like Map.update/4 which updates the value with a function.
Then if we want to apply a JSON Patch to a Map or List, we can create a 1-argument lambda by partially applying a JSON patch function with the desired patch operations, which will then be passed to update
.
There are a few lookup methods in Map, but the main basic methods are get/2
and fetch/2
, which differ in the shape of their return values:
Method | Return on success | Returns on failure |
---|---|---|
get/2 |
value |
nil |
fetch/2 |
{:ok, value} |
:error |
One of the nice things about fetch/2
is that you can distinguish between no value being stored at a key (which is returned as :error
) and the value nil
being stored at a key (which is returned as {:ok, nil}
).
get/2
on the other hand returns nil
in both cases.
I also decided to augment the return type of the errors in fetch with a reason, so the fetch in Matryoshka is intended to return either {:ok, value}
when a value is stored at the requested key, or {:error, reason}
if there’s an error.
There’s one last change that needs to be made when porting over the storage protocol to Elixir.
Elixir, as a functional programming language, has immutable data. To deal with changing state, we need a pure function which creates a new state from the old state.
The Storage protocol will specify these pure functions.
Later, we’ll use a GenServer to track the changes in state.
As such, we write the storage protocol:
/lib/matryoshka/storage.ex
1defprotocol Matryoshka.Storage do
2 alias Matryoshka.Reference
3
4 @typedoc """
5 A type that implements the Matryoshka.Storage protocol.
6 """
7 @type store :: any
8
9 @type value :: any
10
11 @doc """
12 Fetches the value for a specific `ref` in `store`.
13
14 If `store` contains the given `ref` then its value is returned in the
15 shape of `{:ok, value}`.
16 If `store` doesn't contain `ref`, then the reason why is returned in the
17 shape of `{:error, reason}`.
18 """
19
20 def fetch(store, ref)
21
22 @doc """
23 Gets the value for a specific `ref` in `store`.
24
25 If `store` contains the given `ref` then its value `value` is returned.
26 If `store` doesn't contain `ref`, `nil` is returned.
27 """
28 @spec get(store(), Reference.t()) :: value()
29 def get(store, ref)
30
31 @doc """
32 Puts the given `value` under `ref` in `store`.
33 """
34 @spec put(store(), Reference.t(), value()) :: store()
35 def put(store, ref, value)
36
37 @doc """
38 Deletes the entry in `store` for a specific `ref`.
39
40 If the `ref` does not exist, returns `store` unchanged.
41 """
42 @spec delete(store(), Reference.t()) :: store()
43 def delete(store, ref)
44end
I’ve also created a small module used for typespecs where I expect a struct implementing Storage:
/lib/matryoshka/is_storage.ex
1defmodule Matryoshka.IsStorage do
2 @typedoc """
3 A struct that implements the Matryoshka.Storage protocol.
4 """
5 @type t :: any()
6end
And now we’re ready to start putting together implementations for Storage.
As a matter of taste, I like a tripartite approach to building out processes in Elixir:
Client
module, which provides utility functions that wrap sending call and cast messages to the ServerServer
module, which handles calls and casts, but uses the business logic in ImplImpl
module, which provides the actual business logic in pure functionsThe Client
and Server
modules are therefore thin wrappers providing the concurrency, with the bulk of the code in Impl
.
We’ll start with the Client
and Server
code first, before diving into the nitty-gritty Impl variations.
The Matryoshka.Client
module will abstract over sending messages to Matryoshka.Server
by exposing functions to start the server and send storage messages:
/lib/matryoshka/client.ex
1defmodule Matryoshka.Client do
2 @server Matryoshka.Server
3
4 def start_link(store) do
5 @server.start_link(store)
6 end
7
8 def get(ref) do
9 GenServer.call(@server, {:get, ref})
10 end
11
12 def fetch(ref) do
13 GenServer.call(@server, {:fetch, ref})
14 end
15
16 def put(ref, value) do
17 GenServer.cast(@server, {:put, ref, value})
18 end
19
20 def delete(ref) do
21 GenServer.cast(@server, {:delete, ref})
22 end
23end
The :fetch
and :get
messages will need to return the value (or an error/nil
) so these are sent as synchronous calls.
Next we need to send the :put
and :delete
messages. I’ve elected to send these as casts rather than as calls.
I’m aware that calls in Elixir provide useful backpressure since they’re synchronous, whereas the casts (asynchronous) don’t, but I’m not planning on using Matryoshka in a production context anyways.
Now it’s time to handle the storage messages.
The MapStore.Server
module only needs to be a thin GenServer wrapper around the business logic in MapStore.Impl
, so we start with start_link/1
to and init/1
to initialize the server:
/lib/matryoshka/mapstore/server.ex
1defmodule Matryoshka.Server do
2 use GenServer
3
4 alias Matryoshka.Storage
5
6 def start_link(default) do
7 GenServer.start_link(__MODULE__, default, name: __MODULE__)
8 end
9
10 @impl true
11 def init(store) do
12 {:ok, store}
13 end
14 ...
The store
provided in init/1
here is any struct which implements the Storage protocol.
You’ll also notice that the name is always set to Server
when we start the store server.
Initially I thought that we’d only ever have 1 store running in the application (since there are store combinators which can deal with an arbitrarily large list of underlying stores), but I’ll probably end up changing this (and Client
) to request a name keyword eventually.
That way, if this were to run as an actual storage backend, we can do things like having a new store process per each user connected to the frontend application.
OK, so now we need to handle the different storage messages.
These just call out to the business logic in the provided store, using the Storage protocol to dispatch the function calls:
/lib/matryoshka/mapstore/server.ex
14 ...
15 @impl true
16 def handle_call({:get, ref}, _from, store) do
17 value = Storage.get(store, ref)
18 {:reply, value, store}
19 end
20
21 @impl true
22 def handle_call({:fetch, ref}, _from, store) do
23 value = Storage.fetch(store, ref)
24 {:reply, value, store}
25 end
26
27 @impl true
28 def handle_cast({:put, ref, value}, store) do
29 {:noreply, Storage.put(store, ref, value)}
30 end
31
32 @impl true
33 def handle_cast({:delete, ref}, store) do
34 {:noreply, Storage.delete(store, ref)}
35 end
36end
Again, Server
and Client
are just very thin wrappers around handling state and storage requests.
Now it’s time to write some actual business logic.
MapStore is the simplest store we can make; it’s an in-memory, map-backed store.
First we create a MapStore struct in the MapStore
module with only one key, map
, which will contain the underlying map:
/lib/matryoshka/impl/map_store.ex
1defmodule Matryoshka.Impl.MapStore do
2 @enforce_keys [:map]
3 defstruct [:map]
4 ...
Along with a helper function to create the MapStore struct from a map:
map_store/0
creates a MapStore from a new mapmap_store/1
creates a MapStore from the provided map/lib/matryoshka/impl/map_store.ex
4 ...
5 def map_store(), do: map_store(Map.new())
6 def map_store(map), do: %__MODULE__{map: map}
7 ...
Then we need to define the four methods to implement the Storage protocol.
Map.fetch/2
only returns an :error
in the case of the key not being found, but as mentioned earlier, I want to add a reason when there’s a failure. I provide that in the shape {:no_ref, ref}
, which means “No reference in the store for the provided ref
”:
/lib/matryoshka/impl/map_store.ex
7 ...
8 alias __MODULE__
9
10 defimpl Matryoshka.Storage do
11 def fetch(store, ref) do
12 case Map.fetch(store.map, ref) do
13 {:ok, value} -> {:ok, value}
14 :error -> {:error, {:no_ref, ref}}
15 end
16 end
17 ...
get/2
is a simple call to Map.get/2
:
/lib/matryoshka/impl/map_store.ex
17 ...
18 def get(store, ref), do: Map.get(store.map, ref)
19 ...
put/3
and delete/2
are similarly simple calls to Map.put/3
and Map.delete/2
, although we need to wrap the underlying map back into a MapStore:
/lib/matryoshka/impl/map_store.ex
19# /lib/matryoshka/impl/map_store.ex
20 ...
21 def put(store, ref, value) do
22 map = Map.put(store.map, ref, value)
23 Impl.map_store(map)
24 end
25
26 def delete(store, ref) do
27 map = Map.delete(store.map, ref)
28 Impl.map_store(map)
29 end
30 end
31end
…and that completes the business logic for MapStore
!
Storage Combinators defines a pass-through store PassThrough, which is the simplest store combinator; it just passes all its requests to its source store, unchanged.
In Storage Combinators, PassThrough is a useful object as other store combinators can inherit from it, reusing the functionality of passing storage requests through to other stores.
In Matryoshka, it’s completely useless, as there’s no such thing as class inheritance in Elixir. We’ll need to write the code for storage combinators passing requests to their inner stores by hand.
But we’ll build it anyways just to get a feel for what a storage combinator looks like.
Just like with MapStore
, we start with a struct and a helper function to construct the struct:
/lib/matryoshka/impl/pass_through.ex
1defmodule Matryoshka.Impl.PassThrough do
2 alias Matryoshka.Storage
3
4 @enforce_keys [:inner]
5 defstruct [:inner]
6
7 @typedoc """
8 A struct that implements the Matryoshka.Storage protocol.
9 """
10 @type impl_storage :: any()
11
12 @type t :: %__MODULE__{
13 inner: impl_storage
14 }
15
16 def pass_through(storage) when is_struct(storage) do
17 %__MODULE__{inner: storage}
18 end
19 ...
and then implement the Storage protocol methods.
fetch/2
and get/2
just call into the inner store:
/lib/matryoshka/impl/pass_through.ex
19 ...
20 alias __MODULE__
21
22 defimpl Storage do
23 def fetch(store, ref), do: Storage.fetch(store.inner, ref)
24
25 def get(store, ref), do: Storage.get(store.inner, ref)
26 ...
While put/3
and delete/2
call into the inner store, then re-wrap it in a PassThrough
struct.
/lib/matryoshka/impl/pass_through.ex
26 ...
27 def put(store, ref, value) do
28 new_inner = Storage.put(store.inner, ref, value)
29 PassThrough.pass_through(new_inner)
30 end
31
32 def delete(%PassThrough{inner: inner}, ref) do
33 new_inner = Storage.delete(inner, ref)
34 PassThrough.pass_through(new_inner)
35 end
36 end
37end
This pattern of calling into the inner structs, then re-wrapping the changed stores, is the defining characteristic of the storage combinators.
Now it’s time to construct a useful storage combinator.
Storage Combinators defines a logging store that sends a log message when a storage call is made, but delegates the actual getting/fetching/putting/deleting of values to the inner store.
We’ll do the same.
Once again we start with a struct and a helper function to create the struct:
/lib/matryoshka/impl/logging_store.ex
1defmodule Matryoshka.Impl.LoggingStore do
2 alias Matryoshka.Storage
3 @enforce_keys [:inner]
4 defstruct [:inner]
5
6 @typedoc """
7 A struct that implements the Matryoshka.Storage protocol.
8 """
9 require Logger
10 @type impl_storage :: any()
11
12 @type t :: %__MODULE__{
13 inner: impl_storage
14 }
15
16 def logging_store(storage) when is_struct(storage) do
17 %__MODULE__{inner: storage}
18 end
19 ...
And then we’ll define the four Storage functions: fetch/2
, get/2
, put/3
, and delete/2
.
fetch/2
and get/2
will once again compute their results by just calling to the inner store using Storage. We’ll then log the results using structured log messages.
/lib/matryoshka/impl/logging_store.ex
19 ...
20 alias __MODULE__
21
22 defimpl Storage do
23 def fetch(store, ref) do
24 value = Storage.fetch(store.inner, ref)
25 Logger.info([request: "FETCH", ref: ref, value: value])
26 value
27 end
28
29 def get(store, ref) do
30 value = Storage.get(store.inner, ref)
31 Logger.info([request: "GET", ref: ref, value: value])
32 value
33 end
34 ...
We implement put/3
and delete/2
to:
/lib/matryoshka/impl/logging_store.ex
34 ...
35 def put(store, ref, value) do
36 inner = Storage.put(store.inner, ref, value)
37 Logger.info([request: "PUT", ref: ref, value: value])
38 LoggingStore.logging_store(inner)
39 end
40
41 def delete(store, ref) do
42 inner = Storage.delete(store.inner, ref)
43 Logger.info([request: "DELETE", ref: ref])
44 LoggingStore.logging_store(inner)
45 end
46 end
47end
And LoggingStore is done.
As a convenience, I re-export all the functionality from Client
, Server
, and the various implementation modules in the Matryoshka
module via the defdelegate
macro:
/lib/matryoshka.ex
1defmodule Matryoshka do
2
3 # Implementation logic
4 alias Matryoshka.Impl.LoggingStore
5 alias Matryoshka.Impl.MapStore
6 alias Matryoshka.Impl.PassThrough
7
8 # Client to interact with store
9 alias Matryoshka.Client
10
11 # Client
12 defdelegate start_link(store), to: Client
13 defdelegate get(ref), to: Client
14 defdelegate fetch(ref), to: Client
15 defdelegate put(ref, value), to: Client
16 defdelegate delete(ref), to: Client
17
18 # Business logic
19 defdelegate logging_store(store), to: LoggingStore
20 defdelegate map_store(), to: MapStore
21 defdelegate map_store(map), to: MapStore
22 defdelegate pass_through(store), to: PassThrough
23end
This just allows developers to use Matryoshka in a more ergonomic way, by importing or aliasing only this one module, hiding the implementation logic.
Now that we’ve got some stores and store combinators defined, we can finally try composing a store, and then updating and querying it:
alias Matryoshka
{:ok, client} =
Matryoshka.map_store()
|> Matryoshka.logging_store()
|> Matryoshka.start_link()
#=> {:ok, #PID<0.152.0>}
Matryoshka.put("key", :value)
#=> 10:20:30.000 [info] [request: "PUT", ref: "key", value: :value]
#=> :ok
Matryoshka.fetch("key")
#=> 10:20:35.000 [info] [request: "FETCH", ref: "key", value: {:ok, :value}]
#=> {:ok, :value}
Matryoshka.delete("key")
#=> 10:20:40.000 [info] [request: "DELETE", ref: "key"]
#=> :ok
Matryoshka.get("key")
#=> 10:20:45.000 [info] [request: "GET", ref: "key", value: nil]
#=> nil
In the next post in this series, I’ll add some more stores and store combinators to Matryoshka to make it more useful.
You can see the latest version of Matryoshka at my GitHub.