Matryoshka 2: More Storage Combinators

Let’s add some more stores and store combinators to Matryoshka, my composable storage library in Elixir.

More store combinators

MappingStore

Storage Combinators proposes a Mapping Store which applies transformations on references and values:

The Mapping Store is an abstract superclass modelled after a map() function or a Unix filter, applying simple transformations to its inputs to yield its outputs when communicating with its source. Due to the fact that stores have a slightly richer protocol than functions or filters, the mapping store has to perform three separate mappings:

  1. Map the reference before passing it to the source.
  2. Map the data that is read from the source after it is read.
  3. Map the data that is written to the source, before it is written.

This would be pretty useful for all sorts of deserialization / serialization stores: we could use regular functions to translate Elixir types back and forth into JSON (or XML, or S-expressions, or CSV…, ad infinitum) to store data on disk.
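
To make the serialization idea concrete, here is a minimal sketch that runs without Matryoshka. MappingSketch and the fns keyword list are hypothetical stand-ins, a plain map plays the inner store, and Erlang's built-in term_to_binary/binary_to_term stands in for a JSON encoder so that no extra dependency is needed.

```elixir
defmodule MappingSketch do
  # Hypothetical stand-in for a mapping store: `inner` is a plain map and
  # the three mapping functions arrive in the keyword list `fns`.
  def put(inner, ref, value, fns) do
    # Map the reference, and map the value before it is written.
    Map.put(inner, fns[:map_ref].(ref), fns[:map_to_store].(value))
  end

  def fetch(inner, ref, fns) do
    # Map the reference, and map the value after it is read.
    case Map.fetch(inner, fns[:map_ref].(ref)) do
      {:ok, stored} -> {:ok, fns[:map_retrieved].(stored)}
      :error -> {:error, :not_found}
    end
  end
end

fns = [
  map_ref: &String.downcase/1,
  map_to_store: &:erlang.term_to_binary/1,
  map_retrieved: &:erlang.binary_to_term/1
]

# The value is stored as a binary under the lower-cased reference.
inner = MappingSketch.put(%{}, "Users/Alice", %{age: 30}, fns)
result = MappingSketch.fetch(inner, "USERS/ALICE", fns)
```

Here result comes back as the original Elixir map, even though the inner map only ever holds a binary.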

First things first, we’ll define the module and a type to store the three mapping functions:

  1. map_ref is a function that maps references (ref -> mapped_ref) before using them to locate values
  2. map_retrieved is a function (stored_value -> value) that maps values when retrieved (get/fetch) from the store
  3. map_to_store is a function (value -> stored_value) that maps values before storing them

I’ve only enforced the inner field for the struct. If the function isn’t provided in the struct, I’ll default to the identity function (which returns its input value unchanged), so the reference/value won’t be transformed. This is a bit more convenient when defining mapping stores where we only want to map the reference, or only want to map the values on storage and retrieval.

/lib/matryoshka/impl/mapping_store.ex

defmodule Matryoshka.Impl.MappingStore do
  alias Matryoshka.IsStorage
  alias Matryoshka.Storage
  alias Matryoshka.Reference

  @identity &Function.identity/1

  @enforce_keys [:inner]
  defstruct [
    :inner,
    map_ref: @identity,
    map_retrieved: @identity,
    map_to_store: @identity
  ]

  @type t :: %__MODULE__{
          inner: IsStorage.t(),
          map_ref: (Reference.t() -> Reference.t()),
          map_retrieved: (any() -> any()),
          map_to_store: (any() -> any())
        }
  ...

I’ll also add a helper function mapping_store/2 to create the struct. We can change the mapping functions in the MappingStore by providing the functions as keywords.

/lib/matryoshka/impl/mapping_store.ex

  ...
  def mapping_store(inner, opts \\ []) do
    map_ref = Keyword.get(opts, :map_ref, @identity)
    map_retrieved = Keyword.get(opts, :map_retrieved, @identity)
    map_to_store = Keyword.get(opts, :map_to_store, @identity)

    %__MODULE__{
      inner: inner,
      map_ref: map_ref,
      map_retrieved: map_retrieved,
      map_to_store: map_to_store
    }
  end
  ...

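Now we just need to implement the Storage protocol for the module. Like all our storage combinators, we’re calling the Storage functions on the inner store, but we also call the mapping functions where necessary, i.e.:

  1. map_ref on the reference in all four functions
  2. map_retrieved on the values returned by fetch/2 and get/2
  3. map_to_store on the value passed to put/3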

/lib/matryoshka/impl/mapping_store.ex

  ...
  alias __MODULE__

  defimpl Storage do
    def fetch(store, ref) do
      case Storage.fetch(store.inner, store.map_ref.(ref)) do
        {:ok, value} -> {:ok, store.map_retrieved.(value)}
        error -> error
      end
    end

    def get(store, ref) do
      case Storage.get(store.inner, store.map_ref.(ref)) do
        nil -> nil
        value -> store.map_retrieved.(value)
      end
    end

    def put(store, ref, value) do
      inner_new =
        Storage.put(
          store.inner,
          store.map_ref.(ref),
          store.map_to_store.(value)
        )

      %{store | inner: inner_new}
    end

    def delete(store, ref) do
      inner_new = Storage.delete(store.inner, store.map_ref.(ref))
      %{store | inner: inner_new}
    end
  end
end

SwitchingStore

Storage Combinators defines a switching store which distributes requests to subsidiary stores. In the first post of this series, I mentioned that:

In cases where we would use a scheme in a URI, we can simply use the first path segment

So the idea behind my implementation is as follows:

  1. We’ll peel off the first path segment of a reference
  2. Then use that segment to choose which underlying store to access
  3. Then hand the rest of the reference to that store to act as the key to store and retrieve values
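
The three steps above can be sketched with nothing but plain maps. SwitchingSketch is a hypothetical module for illustration only: it splits the reference with String.split/3 rather than Matryoshka’s Reference module, and each substore is just a map.

```elixir
defmodule SwitchingSketch do
  # Steps 1 and 2: peel off the first path segment and use it to pick a substore.
  def route(stores, ref) do
    case String.split(ref, "/", parts: 2) do
      [first, rest] when rest != "" ->
        case Map.fetch(stores, first) do
          {:ok, sub} -> {:ok, first, sub, rest}
          :error -> {:error, :no_substore}
        end

      _ ->
        {:error, {:ref_path_too_short, ref}}
    end
  end

  # Step 3: the rest of the reference is the key inside the chosen substore.
  def put(stores, ref, value) do
    case route(stores, ref) do
      {:ok, first, sub, rest} -> Map.put(stores, first, Map.put(sub, rest, value))
      {:error, _reason} -> stores
    end
  end

  def get(stores, ref) do
    case route(stores, ref) do
      {:ok, _first, sub, rest} -> Map.get(sub, rest)
      {:error, _reason} -> nil
    end
  end
end

stores = %{"users" => %{}, "posts" => %{}}
stores = SwitchingSketch.put(stores, "users/alice/profile", "hi")
```

After the put, the "users" substore holds the value under the key "alice/profile", and the "posts" substore is untouched.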

We’ll keep the stores in a map of strings to stores underneath a struct:

/lib/matryoshka/impl/switching_store.ex

defmodule Matryoshka.Impl.SwitchingStore do
  alias Matryoshka.Impl.SwitchingStore
  alias Matryoshka.Reference
  alias Matryoshka.Storage
  alias Matryoshka.IsStorage

  @enforce_keys [:path_store_map]
  defstruct @enforce_keys

  @type t :: %__MODULE__{
          path_store_map: %{String.t() => IsStorage.t()}
        }

  def switching_store(path_store_map) when is_map(path_store_map) do
    %__MODULE__{
      path_store_map: path_store_map
    }
  end
  ...

Before defining the implementations for storage, let’s get some helper functions defined.

I want a function to update the path store map whenever we update an inner store. This just needs to reach into the underlying path_store_map and put the updated store there:

/lib/matryoshka/impl/switching_store.ex

  ...
  alias __MODULE__

  def update_substore(store, sub_store, sub_store_ref) do
    store.path_store_map
    |> Map.put(sub_store_ref, sub_store)
    |> SwitchingStore.switching_store()
  end
  ...

I’d also like a function to split a reference into two references:

  1. The first path segment
  2. The rest of the path (i.e., the remaining path segments concatenated with a /)

/lib/matryoshka/impl/switching_store.ex

  ...
  def split_reference(ref) do
    case Reference.path_segments(ref) do
      # At least two segments: one to pick the substore, the rest as its key.
      [path_head | [_ | _] = path_tail] ->
        {:ok, path_head, Enum.join(path_tail, "/")}

      # Zero or one segment: nothing is left over to hand to a substore.
      _too_short ->
        {:error, {:ref_path_too_short, ref}}
    end
  end
  ...

To make life more convenient, I also want a function which:

  1. Retrieves the substore
  2. Returns the split path for me

/lib/matryoshka/impl/switching_store.ex

  ...
  def locate_substore(store, ref) do
    with {:split_ref, {:ok, path_first, path_rest}} <-
           {:split_ref, SwitchingStore.split_reference(ref)},
         {:fetch_substore, {:ok, sub_store}} <-
           {:fetch_substore, Map.fetch(store.path_store_map, path_first)} do
      {:ok, sub_store, path_first, path_rest}
    else
      {:split_ref, error} -> error
      {:fetch_substore, :error} -> {:error, :no_substore}
    end
  end
  ...
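
locate_substore/2 wraps each step of its with in a {tag, result} tuple so that the else clauses can tell which step failed. Here is the same idiom in isolation, as a self-contained sketch; TaggedWith and its config lookup are hypothetical and not part of Matryoshka.

```elixir
defmodule TaggedWith do
  # Both steps call Map.fetch/2, which returns a bare :error on failure.
  # Tagging each step lets the else block map that :error to a specific reason.
  def lookup(config, section, key) do
    with {:section, {:ok, values}} <- {:section, Map.fetch(config, section)},
         {:key, {:ok, value}} <- {:key, Map.fetch(values, key)} do
      {:ok, value}
    else
      {:section, :error} -> {:error, :no_section}
      {:key, :error} -> {:error, :no_key}
    end
  end
end

config = %{"db" => %{"port" => 5432}}
found = TaggedWith.lookup(config, "db", "port")
missing_section = TaggedWith.lookup(config, "cache", "port")
missing_key = TaggedWith.lookup(config, "db", "host")
```

Without the tags, both failures would arrive in else as the same :error atom and could not be told apart.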

Now with those helper functions out of the way, we can implement the Storage functions. The basic gist is the same across all of them:

  1. Locate the substore using the first part of the path
  2. Direct the Storage calls to the substore
  3. If the function updates a substore (i.e. put and delete), we update the substore and then update the map in the struct

/lib/matryoshka/impl/switching_store.ex

  ...
  defimpl Storage do
    def fetch(store, ref) do
      # path_first is only needed when a substore is written back, so it is ignored here.
      with {:locate, {:ok, sub_store, _path_first, path_rest}} <-
             {:locate, SwitchingStore.locate_substore(store, ref)},
           {:fetch, {:ok, value}} <-
             {:fetch, Storage.fetch(sub_store, path_rest)} do
        {:ok, value}
      else
        {:locate, error} -> error
        {:fetch, error} -> error
      end
    end

    def get(store, ref) do
      with {:ok, sub_store, _path_first, path_rest} <-
             SwitchingStore.locate_substore(store, ref) do
        Storage.get(sub_store, path_rest)
      else
        _error -> nil
      end
    end

    def put(store, ref, value) do
      with {:ok, sub_store, path_first, path_rest} <-
             SwitchingStore.locate_substore(store, ref) do
        new_sub_store = Storage.put(sub_store, path_rest, value)
        SwitchingStore.update_substore(store, new_sub_store, path_first)
      else
        _ -> store
      end
    end

    def delete(store, ref) do
      with {:ok, sub_store, path_first, path_rest} <-
             SwitchingStore.locate_substore(store, ref) do
        new_sub_store = Storage.delete(sub_store, path_rest)
        SwitchingStore.update_substore(store, new_sub_store, path_first)
      else
        _ -> store
      end
    end
  end
end

BackupStore

With SwitchingStore, we’ve broken ground on store combinators that compose an arbitrarily large number of stores. Let’s continue with a BackupStore, which will retrieve values only from a main store, but store values in both the main store and a list of target stores.

Once again we start with a struct and a helper function to construct the struct:

/lib/matryoshka/impl/backup_store.ex

defmodule Matryoshka.Impl.BackupStore do
  alias Matryoshka.IsStorage
  alias Matryoshka.Storage

  @enforce_keys [:source_store, :target_stores]
  defstruct @enforce_keys

  @type t :: %__MODULE__{
          source_store: IsStorage.t(),
          target_stores: list(IsStorage.t())
        }

  def backup_store(source_store, target_stores)
      when is_struct(source_store) and is_list(target_stores) do
    %__MODULE__{
      source_store: source_store,
      target_stores: target_stores
    }
  end
  ...

Now let’s define the Storage functionality. fetch/2 and get/2 just delegate their calls to the inner source store:

/lib/matryoshka/impl/backup_store.ex

  ...
  alias __MODULE__

  defimpl Storage do
    def fetch(store, ref) do
      Storage.fetch(store.source_store, ref)
    end

    def get(store, ref) do
      Storage.get(store.source_store, ref)
    end
    ...

put/3 and delete/2, on the other hand, apply the operation to the source store and to every target store, then wrap up the updated stores into the BackupStore struct:

/lib/matryoshka/impl/backup_store.ex

    ...
    def put(store, ref, value) do
      source_store = Storage.put(store.source_store, ref, value)

      target_stores =
        Enum.map(
          store.target_stores,
          fn target -> Storage.put(target, ref, value) end
        )

      BackupStore.backup_store(source_store, target_stores)
    end

    def delete(store, ref) do
      source_store = Storage.delete(store.source_store, ref)

      target_stores =
        Enum.map(
          store.target_stores,
          fn target -> Storage.delete(target, ref) end
        )

      BackupStore.backup_store(source_store, target_stores)
    end
  end
end
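
To see the fan-out behaviour in isolation, here is a small sketch that uses plain maps in place of Matryoshka stores. BackupSketch and its {source, targets} tuple are hypothetical and only mirror the shape of the struct above.

```elixir
defmodule BackupSketch do
  # Reads only ever consult the source.
  def get({source, _targets}, ref), do: Map.get(source, ref)

  # Writes and deletes are applied to the source and to every target.
  def put({source, targets}, ref, value) do
    {Map.put(source, ref, value), Enum.map(targets, &Map.put(&1, ref, value))}
  end

  def delete({source, targets}, ref) do
    {Map.delete(source, ref), Enum.map(targets, &Map.delete(&1, ref))}
  end
end

store = {%{}, [%{}, %{}]}
store = BackupSketch.put(store, "a", 1)
{source, targets} = store
after_delete = BackupSketch.delete(store, "a")
```

After the put, the source and both targets contain the same entry; after the delete, all three are empty again.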

CachingStore

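The BackupStore is useful for keeping auxiliary stores updated with values, but we can never actually use those backup stores to retrieve values. It would be nice to use those alternate stores when the first store we check doesn’t have the value, so let’s create a CachingStore that caches data with the following requirements:

  1. Reads check the cache store first, and only fall back to the main store on a miss
  2. A value found in the main store is copied into the cache store before it’s returned
  3. put and delete are applied to both the main store and the cache store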

Ah, but we run into an issue: fetch/2 and get/2 only return a value; they don’t return the store. That means we can’t update the CachingStore on gets and fetches, as we’d need to update the cache store inside. But we can fix that pretty easily by requiring fetch/2 and get/2 to return a tuple of {store, value} instead of just value. That way, we can have CachingStore update the cache store when the value is retrieved from the main store.
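
Here is a minimal sketch of why returning {store, value} makes caching possible. It uses plain maps rather than Matryoshka stores, and CacheSketch with its :main and :cache keys is hypothetical.

```elixir
defmodule CacheSketch do
  # Returns {store, result} so that a cache fill survives the read.
  def fetch(%{main: main, cache: cache} = store, ref) do
    case Map.fetch(cache, ref) do
      {:ok, value} ->
        # Cache hit: nothing changes.
        {store, {:ok, value}}

      :error ->
        case Map.fetch(main, ref) do
          {:ok, value} ->
            # Cache miss, main hit: copy the value into the cache.
            {%{store | cache: Map.put(cache, ref, value)}, {:ok, value}}

          :error ->
            {store, {:error, :not_found}}
        end
    end
  end
end

store = %{main: %{"a" => 1}, cache: %{}}
{store, first} = CacheSketch.fetch(store, "a")
{store, second} = CacheSketch.fetch(store, "a")
{_store, missing} = CacheSketch.fetch(store, "b")
```

The first fetch fills the cache and the second is served from it. If fetch returned only the value, the filled cache would be thrown away as soon as the call returned.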

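There are some drawbacks to this update: for one, every store we’ve written so far, and every caller, now has to unpack a {store, value} tuple on reads. But I think it’s well worth it for caching.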

Once more, we start with a struct and a helper function caching_store/2 to build the struct. I’ve also specialised the constructor function into caching_store/1, which defaults to using a MapStore as the fast cache store.

/lib/matryoshka/impl/caching_store.ex

defmodule Matryoshka.Impl.CachingStore do
  alias Matryoshka.IsStorage
  alias Matryoshka.Storage
  import Matryoshka.Impl.MapStore, only: [map_store: 0]

  @enforce_keys [:main_store, :cache_store]
  defstruct [:main_store, :cache_store]

  @type t :: %__MODULE__{
          main_store: IsStorage.t(),
          cache_store: IsStorage.t()
        }

  def caching_store(main_storage), do: caching_store(main_storage, map_store())

  def caching_store(main_storage, fast_storage)
      when is_struct(main_storage) and is_struct(fast_storage) do
    %__MODULE__{main_store: main_storage, cache_store: fast_storage}
  end
  ...

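Both fetch/2 and get/2 follow the same general idea:

  1. Look for the value in the cache store, and return it if it’s there
  2. Otherwise, look for the value in the main store
  3. If the main store has the value, put it into the cache store and return it
  4. If neither store has the value, return the miss ({:error, reason} for fetch/2, nil for get/2)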

/lib/matryoshka/impl/caching_store.ex

  ...
  alias __MODULE__

  defimpl Storage do
    def fetch(store, ref) do
      {cache_store_new, val_fast} = Storage.fetch(store.cache_store, ref)

      case val_fast do
        {:ok, _value} ->
          new_store = %{store | cache_store: cache_store_new}
          {new_store, val_fast}

        {:error, _reason_fast} ->
          {main_store_new, val_main} = Storage.fetch(store.main_store, ref)

          case val_main do
            {:ok, value} ->
              cache_store_new = Storage.put(cache_store_new, ref, value)
              new_store = CachingStore.caching_store(main_store_new, cache_store_new)
              {new_store, val_main}

            {:error, reason} ->
              # Keep the stores returned by both fetches, as get/2 does on a miss.
              new_store = CachingStore.caching_store(main_store_new, cache_store_new)
              {new_store, {:error, reason}}
          end
      end
    end

    def get(store, ref) do
      {cache_store_new, val_fast} = Storage.get(store.cache_store, ref)

      case val_fast do
        nil ->
          {main_store_new, val_main} = Storage.get(store.main_store, ref)

          case val_main do
            nil ->
              store_new = CachingStore.caching_store(main_store_new, cache_store_new)
              {store_new, nil}

            value ->
              cache_store_new = Storage.put(cache_store_new, ref, value)
              store_new = CachingStore.caching_store(main_store_new, cache_store_new)
              {store_new, value}
          end

        value ->
          store_new = %{store | cache_store: cache_store_new}
          {store_new, value}
      end
    end
    ...

The code for put/3 and delete/2, on the other hand, is much simpler. We update both stores (main and cache), then wrap them into a CachingStore struct:

/lib/matryoshka/impl/caching_store.ex

    ...
    def put(store, ref, value) do
      main_store = Storage.put(store.main_store, ref, value)
      cache_store = Storage.put(store.cache_store, ref, value)
      CachingStore.caching_store(main_store, cache_store)
    end

    def delete(store, ref) do
      main_store = Storage.delete(store.main_store, ref)
      cache_store = Storage.delete(store.cache_store, ref)
      CachingStore.caching_store(main_store, cache_store)
    end
  end
end

…and that’s CachingStore done.

Exposing to the outside world

Great, we’ve defined the business logic for a few new useful store combinators, which means that it’s time to expose them in the Matryoshka module:

/lib/matryoshka.ex

  ...
  # Business logic
  defdelegate backup_store(source_store, target_stores), to: BackupStore
  defdelegate caching_store(main_store), to: CachingStore
  defdelegate caching_store(main_store, cache_store), to: CachingStore
  defdelegate logging_store(store), to: LoggingStore
  defdelegate map_store(), to: MapStore
  defdelegate map_store(map), to: MapStore
  defdelegate mapping_store(store, opts), to: MappingStore
  defdelegate pass_through(store), to: PassThrough
  defdelegate switching_store(path_store_map), to: SwitchingStore
end

Next steps

There’s a glaring issue when it comes to using Matryoshka as a storage backend that I’ve not discussed yet.

We’ve got a bunch of storage combinators to add all sorts of functionality, which is great, but all our stores so far have been in-memory only, so we lose all the data when the store closes (e.g. because the BEAM process holding the store terminates).

Now that we’ve implemented CachingStore, we have the ability to cache data using a fast store (which we can keep in-memory) in front of a main store (which we’ll keep on disk). So I think it’s high time we add stores that persist data to disk.

We’ll be doing that in the next post in this series.

You can see the latest version of Matryoshka at my GitHub.