Matryoshka 4: Remote Stores
In the last post, I wrote some stores which persist their key-value pairs to disk.
In the spirit of gross overcomplication, let’s add a remote store that reads and writes by calling out to an external server. Because there’s nothing like adding an extra 100 ms of latency for no reason other than upgrading your architecture to the paradigm du jour. (I suppose there is a reasonable use-case here: using a remote store, e.g. S3, as a backup.)
Remember, your app isn’t Real Software unless it uses Microservices™.
Originally, I wanted to write a HttpStore that would map get/fetch/put/delete calls to HTTP GET/POST/DELETE requests. (Speaking of S3, I probably should’ve built my HttpStore to expect a server that exposes the same interface as the S3 REST API. Perhaps I’ll implement Matryoshka.S3Store one day.) But I don’t think it’s possible to make it generic, since you’d have to write the implementation against the specific API requirements of your key-value server:
- Does the KV server expose a version? How so?
  - Is it /v1/?
  - Or /2.0/?
  - Or /20251128/?
  - Or /api/v4/?
  - Or something entirely different?
- Which routes does the KV server expose? What’s the namespace? Is it /kv/, or is it /store/, or is it something else?
- Does the KV server even expose the difference between a key being set to a null value (which should be returned as {:ok, nil} by fetch/2) and a key not being set at all (which should be returned as {:error, {:no_ref, ref}} by fetch/2)?
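To make the problem concrete, here is a minimal sketch of the knobs a generic HttpStore would need. Everything in it is hypothetical: the module name HttpStoreSketch, the option names, the default values, and the convention that a 404 means “no such key” are assumptions for illustration, not part of Matryoshka or of any real server’s API.

```elixir
defmodule HttpStoreSketch do
  # Hypothetical defaults; every key-value server would need different ones.
  @defaults [base_url: "http://localhost:4000", version: "v1", namespace: "kv"]

  # Build the URL for a reference such as "foo/bar".
  def url(ref, opts \\ []) do
    opts = Keyword.merge(@defaults, opts)

    [opts[:base_url], opts[:version], opts[:namespace], ref]
    # Strip stray slashes so "/api/v4/" and "api/v4" behave the same.
    |> Enum.map(&String.trim(&1, "/"))
    # Servers without a version or namespace can pass an empty string.
    |> Enum.reject(&(&1 == ""))
    |> Enum.join("/")
  end

  # Translate a (status, decoded body) pair into a fetch-style result.
  # Assumption: the server answers 404 for an unset key, and 200 with a
  # null body for a key that was explicitly set to null.
  def fetch_result(200, body, _ref), do: {:ok, body}
  def fetch_result(404, _body, ref), do: {:error, {:no_ref, ref}}
  def fetch_result(status, _body, _ref), do: {:error, {:unexpected_status, status}}
end

IO.puts(HttpStoreSketch.url("foo/bar", version: "/api/v4/", namespace: "store"))
# Prints: http://localhost:4000/api/v4/store/foo/bar
```

Each of those options and each of those status-code conventions is a decision that belongs to the server, not the store, which is exactly why a one-size-fits-all implementation is awkward.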
So HttpStore has been left as an exercise for the reader.
Instead, I’ll write a remote store backed by an SFTP client—a SftpStore.
Why SFTP?
For starters, I deal with it all the time at my job at S&P Financial Risk Analytics. For some of our clients, we offer a risk pricing service where they ship us their data (their portfolio, counterparties, and other reference data) over SFTP, we run the pricing, and then we upload the risk results back to SFTP for them to use. This is typically driven either by A) regulatory requirements, or B) wanting a more accurate understanding of market exposures so they can trade better.
Also, since SFTP is a file-storage protocol, an SFTP client-backed Matryoshka store will behave identically across different backend servers, so I don’t have to worry about the same bikeshedding issues that HttpStore would’ve brought up.
Implementing SftpStore
The SftpStore is basically the same as a FilesystemStore in that:
- It reads and writes values to their own files
- References are treated as file location paths
Of course, the big difference is that instead of writing to a local filesystem, we’ll be writing to a remote one over SFTP.
I’ve had a minor goal of writing Matryoshka without external dependencies, and pleasantly, we can still achieve it: Erlang has a built-in ssh application which implements both clients and daemons (servers, which we’ll need for testing later).
So first things first, I’ll need to update the Matryoshka mix project script to start up the :ssh application:
/mix.exs
  def application do
    [
      extra_applications: [:logger, :ssh]
    ]
  end

And then we can get into implementing the actual store. That means we’ll need to be able to:
- Connect to an SFTP server
- Read and write files over the connection
The :ssh_sftp module (which implements an SFTP client) documentation tells us that we’re looking for the function start_channel to connect to an SFTP server:
-spec start_channel(ssh:host(), inet:port_number(), [ssh:client_option() | sftp_option()]) -> {ok, pid(), ssh:connection_ref()} | {error, reason()}.

Starts new ssh connection and channel for communicating with the SFTP server.
The returned pid for this process is to be used as input to all other API functions in this module.
OK, so we’ll need to provide this with a host and port number to connect to an SFTP server. Easy enough.
But it’d also be nice to be able to log into private servers using a username and password. (Authentication via private key is ALSO left as an exercise for the reader.)
It turns out one of the ssh:client_option() is authentication_client_options(), which is exactly what we’re looking for:
-type authentication_client_options() :: {user, string()} | {password, string()}.
user - Provides the username. If this option is not given, ssh reads from the environment (LOGNAME or USER on UNIX, USERNAME on Windows).
password - Provides a password for password authentication. If this option is not given, the user is asked for a password, if the password authentication method is attempted.
And so first things first, we’ll want to initialise an SftpStore by starting a connection to the SFTP server (passing username and password), then saving the PID and connection in a struct.
/lib/matryoshka/impl/sftp_store.ex
defmodule Matryoshka.Impl.SftpStore do
  alias Matryoshka.Reference
  @enforce_keys [:pid, :connection]
  defstruct [:pid, :connection]
  alias __MODULE__

  @type t :: %SftpStore{
          pid: pid(),
          connection: :ssh.connection_ref()
        }

  def sftp_store(host, port, username, password) do
    # Since we're dealing with `:ssh`, which is an Erlang
    # module, we'll need to convert the username and password
    # from Strings to charlists.
    username = String.to_charlist(username)
    password = String.to_charlist(password)

    :ssh.start()

    {:ok, pid, connection} =
      :ssh_sftp.start_channel(
        host,
        port,
        silently_accept_hosts: true,
        user: username,
        password: password
      )

    %SftpStore{pid: pid, connection: connection}
  end

Just like with FilesystemStore, we need to deal with nested keys like foo/bar, which we’ll do by treating the prefixes as directories, and the final path segment as a filename. So we’ll need to recursively make directories – but annoyingly, the :ssh_sftp function make_dir/2 requires that “the directory can only be created in an existing directory”, so I’ll need to write the recursion myself.
Given some key like "foo/bar/baz", we want to convert it into a list of nested directories, where each item in the list is the child of the previous item:
iex> Reference.path_segments("foo/bar/baz") |> parent_dirs()
["foo", "foo/bar"]

/lib/matryoshka/impl/sftp_store.ex
  def parent_dirs(path_segments) do
    # This function lets us pull all the parents from a
    # path reference, so that we can make them in the
    # underlying SFTP directory.
    {paths, _acc} =
      path_segments
      # We don't want to make the last path segment
      # as a directory, since that'll be the filename.
      |> Enum.drop(-1)
      |> Enum.map_reduce([], fn segment, acc ->
        # The same (reversed) path is both emitted as this
        # step's result and carried forward as the accumulator.
        path = [segment | acc]
        {path, path}
      end)

    paths
    # Reverse the paths since we've been prepending
    # the children to their parents
    |> Enum.map(&Enum.reverse/1)
    # Then recombine them into paths with
    # forward-slash delimiters
    |> Enum.map(&Enum.join(&1, "/"))
  end

And after that, implementing the rest of the Storage protocol is a breeze.
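Before moving on, here is a quick, self-contained sanity check of the directory logic. It is a sketch rather than the library’s code: it swaps in Enum.scan/2 for the map_reduce above (a different technique that should produce the same list), and it uses String.split/3 as a stand-in for Reference.path_segments/1, whose real implementation isn’t shown in this post. The module name ParentDirsSketch is made up for the example.

```elixir
defmodule ParentDirsSketch do
  # Stand-in for Reference.path_segments/1 (an assumption):
  # split a reference on forward slashes.
  def path_segments(ref), do: String.split(ref, "/", trim: true)

  # Every ancestor directory of the final segment, shallowest first.
  def parent_dirs(segments) do
    segments
    # The last segment is the filename, not a directory.
    |> Enum.drop(-1)
    # Join each segment onto the path built so far; the first
    # segment is emitted unchanged.
    |> Enum.scan(fn segment, parent -> parent <> "/" <> segment end)
  end
end

"foo/bar/baz"
|> ParentDirsSketch.path_segments()
|> ParentDirsSketch.parent_dirs()
|> IO.inspect()
# Prints: ["foo", "foo/bar"]
```

Because the list comes out shallowest first, creating the directories in order satisfies make_dir/2’s requirement that each parent already exists.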
fetch/2 and get/2 just need to read the file from the SFTP server and return the results:
/lib/matryoshka/impl/sftp_store.ex
  defimpl Matryoshka.Storage do
    def fetch(store, ref) do
      value =
        case :ssh_sftp.read_file(store.pid, String.to_charlist(ref)) do
          {:ok, bin} -> {:ok, :erlang.binary_to_term(bin)}
          # A missing file means the key was never set
          {:error, :no_such_file} -> {:error, {:no_ref, ref}}
          {:error, other} -> {:error, other}
        end

      {store, value}
    end

    def get(store, ref) do
      ref = String.to_charlist(ref)

      value =
        case :ssh_sftp.read_file(store.pid, ref) do
          {:ok, bin} -> :erlang.binary_to_term(bin)
          # Unlike fetch, get collapses every error into nil
          {:error, _reason} -> nil
        end

      {store, value}
    end

put/3 needs to ensure that the parent directories exist before writing the value to the SFTP server as a file:
/lib/matryoshka/impl/sftp_store.ex
    def put(store, ref, value) do
      # Make sure that parent directories exist
      segments = Reference.path_segments(ref)

      if length(segments) > 1 do
        dirs = SftpStore.parent_dirs(segments)

        Enum.each(dirs, fn dir ->
          # The result is ignored, so a directory that already
          # exists doesn't stop the write. The path is converted to
          # a charlist, as with the other :ssh_sftp calls.
          :ssh_sftp.make_dir(store.pid, String.to_charlist(dir))
        end)
      end

      # Write value
      :ssh_sftp.write_file(
        store.pid,
        String.to_charlist(ref),
        :erlang.term_to_binary(value)
      )

      store
    end

And delete/2 just needs to ask the SFTP server to delete the file:
/lib/matryoshka/impl/sftp_store.ex
    def delete(store, ref) do
      :ssh_sftp.delete(
        store.pid,
        String.to_charlist(ref)
      )

      store
    end
  end
end

And with that, the implementation of SftpStore is finished.
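One detail worth isolating is the serialisation step that fetch, get, and put rely on. Values are written with :erlang.term_to_binary/1 and read back with :erlang.binary_to_term/1, which is why a stored nil can be told apart from a missing file. The sketch below pulls that logic into a hypothetical TermCodecSketch module (not part of Matryoshka) and feeds it the same result shapes the store matches on from :ssh_sftp.read_file/2. It also passes the :safe option to :erlang.binary_to_term/2. That is my suggestion rather than what the store above does, on the assumption that bytes coming back from a remote server deserve a little suspicion.

```elixir
defmodule TermCodecSketch do
  # Serialise any Elixir term to a binary, as put does before writing.
  def encode(value), do: :erlang.term_to_binary(value)

  # Turn a read_file-style result into a fetch-style result.
  # The :safe option refuses binaries that would create new atoms.
  def decode_fetch({:ok, bin}, _ref), do: {:ok, :erlang.binary_to_term(bin, [:safe])}
  def decode_fetch({:error, :no_such_file}, ref), do: {:error, {:no_ref, ref}}
  def decode_fetch({:error, other}, _ref), do: {:error, other}
end

# A key explicitly set to nil round-trips as {:ok, nil}...
IO.inspect(TermCodecSketch.decode_fetch({:ok, TermCodecSketch.encode(nil)}, "foo"))
# Prints: {:ok, nil}

# ...while a missing file is reported as a missing reference.
IO.inspect(TermCodecSketch.decode_fetch({:error, :no_such_file}, "foo"))
# Prints: {:error, {:no_ref, "foo"}}
```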
Testing SftpStore
Let’s briefly discuss the test suite for SftpStore. :ssh exposes a daemon/3 function that lets us start an SSH server, which can in turn serve SFTP:
-spec daemon(any | inet:ip_address(), inet:port_number(), daemon_options()) -> {ok, daemon_ref()} | {error, term()};
            (socket, open_socket(), daemon_options()) -> {ok, daemon_ref()} | {error, term()}.

Starts a server listening for SSH connections on the given port. If the Port is 0, a random free port is selected. See daemon_info/1 about how to find the selected port number.
So in testing, I initialise an SFTP server with username “user”, password “password”, and a subsystem specification of :ssh_sftpd, which tells the :ssh daemon to act as an SFTP filesystem. Then we connect to it with a SftpStore and test get/fetch/put/delete.
This is a real SFTP server that we’re connecting to.
No mocking necessary.
As an aside, an :ssh daemon uses both generic SSH channel functionality (e.g. flow control, close messages) provided by :ssh_server_channel and application-specific functionality (here, reading and writing files), which gets used via a callback API. This is classic Erlang-style programming. Another good example is GenServers, which implement generic server functionality, and which you specialise by writing callback handlers (via handle_call/3 and handle_cast/2) for application-specific functionality.
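For readers who haven’t met that callback style before, here is a minimal GenServer sketch. It has nothing to do with SSH; CounterSketch is a made-up module whose only job is to show the split between the generic server machinery (supplied by GenServer) and the application-specific callbacks (supplied by us).

```elixir
defmodule CounterSketch do
  use GenServer

  # Client API: thin wrappers around the generic GenServer functions.
  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)
  def increment(pid), do: GenServer.cast(pid, :increment)
  def value(pid), do: GenServer.call(pid, :value)

  # Callbacks: the only application-specific code in the module.
  @impl true
  def init(initial), do: {:ok, initial}

  # Asynchronous request: update the state, send no reply.
  @impl true
  def handle_cast(:increment, count), do: {:noreply, count + 1}

  # Synchronous request: reply with the state, leave it unchanged.
  @impl true
  def handle_call(:value, _from, count), do: {:reply, count, count}
end

{:ok, counter} = CounterSketch.start_link(0)
CounterSketch.increment(counter)
CounterSketch.increment(counter)
IO.inspect(CounterSketch.value(counter))
# Prints: 2
```

The process loop, message ordering, and reply plumbing all come from GenServer; :ssh_sftpd plays the role of the callback module in the same way for the SSH daemon.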
/test/impl/sftp_store_test.exs
defmodule MatryoshkaTest.SftpStoreTest do
  alias Matryoshka.Impl.SftpStore
  alias Matryoshka.Storage

  use ExUnit.Case, async: true

  # When the port is zero, the ssh daemon picks a random free port
  @random_port 0
  @user "user"
  @password "password"

  @moduletag :tmp_dir

  setup context do
    # Set up SFTP server options

    # Where the server's host keys are saved
    {:ok, cwd} = File.cwd()
    system_dir = to_charlist(Path.join([cwd, "test", "ssh"]))

    user = String.to_charlist(@user)
    password = String.to_charlist(@password)
    # Serve files out of the per-test temporary directory
    root = String.to_charlist(context.tmp_dir)

    options = [
      system_dir: system_dir,
      user_passwords: [
        {user, password}
      ],
      subsystems: [
        :ssh_sftpd.subsystem_spec(root: root)
      ]
    ]

    # Start SFTP server
    :ssh.start()
    {:ok, server_ref} = :ssh.daemon(:loopback, @random_port, options)

    # Ask the daemon which address and port it actually bound to
    {:ok, daemon_info} = :ssh.daemon_info(server_ref)
    ip = Keyword.get(daemon_info, :ip)
    port = Keyword.get(daemon_info, :port)

    # Start SftpStore (SFTP Client)
    sftp_store = SftpStore.sftp_store(ip, port, @user, @password)

    # Close SFTP server when test is done
    on_exit(fn ->
      :ssh.stop_daemon(server_ref)
    end)

    {:ok, store: sftp_store}
  end

Matryoshka is Published
There’s plenty of extra work that I could do to improve Matryoshka.
I could write a version of SftpStore that used append-only writes, like my LogStore.
I could add a more performant, idiomatic in-memory store backed by ETS.
I could add stores backed by SQL databases—in both SQLite and Postgres flavours!
I could add a bunch more functions to the stores: keys/1 to return all the keys in a store, filter/2 to return all the key-value pairs where a predicate function returns a truthy value, or update/4 to update a key in the store with a given function.
Implementing almost all the functions in Map across the different Stores would be a great idea.
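To give a feel for what those extra functions might look like, here is a rough sketch over a plain map. It is purely illustrative: MapStoreSketch is a hypothetical module, the “store” is just a map rather than a Matryoshka store, and the real versions would presumably need to follow the Storage protocol’s conventions rather than return bare values.

```elixir
defmodule MapStoreSketch do
  # keys/1: every key currently in the store.
  def keys(store), do: Map.keys(store)

  # filter/2: the key-value pairs for which the predicate is truthy.
  # The predicate receives each pair as a {key, value} tuple.
  def filter(store, predicate) do
    store |> Enum.filter(predicate) |> Map.new()
  end

  # update/4: apply fun to the existing value, or insert default
  # if the key is absent (mirrors Map.update/4).
  def update(store, ref, default, fun), do: Map.update(store, ref, default, fun)
end

store = %{"a" => 1, "b" => 2}
IO.inspect(MapStoreSketch.filter(store, fn {_key, value} -> value > 1 end))
# Prints: %{"b" => 2}
```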
And there’s an infinite variety of storage combinators to implement, especially specialisations of MappingStore that translate terms into data transport formats: like a JSON mapping store, an XML mapping store, or even a Universal Binary Format mapping store.
But I’m at a stage where I’m happy with Matryoshka, so I’m pleased to announce that I’ve finally pulled the trigger and published it to Hex, so now you too can use my utterly ridiculous, not production-ready, experimental, innovative key-value storage tech in your projects.
As always, you can see the latest version of Matryoshka at my GitHub.