Let's spec asynchronous cache
So initially the resolver was built with two clearly separated components - a library doing the DNS logic, and the daemon providing I/O. This was partly because I didn't know yet what the best I/O model for a resolver workload would be (lots of concurrent inbound/outbound requests), so I wanted to keep those two components separate. Later on we added a very crude mechanism for specific asynchronous operations using the `YIELD` state. This was done specifically for the validator, which used it to push more subrequests onto the query stack and yield. The engine would then go and solve these subrequests first before resuming the validator. There are three downsides to this approach:
- it's only possible to resume at the beginning of a callback (not anywhere in the middle of the callback)
- it's only possible to wait for DNS subrequests (nothing else)
- it's explicit (not hidden away as with coroutines) and error-prone (the caller has to check the state on each resumption)
Why would it be better?
So the only potentially asynchronous operation the library can do is to ask the caller to "send this DNS query to one of these hosts, and call me back with the response". There are, however, more potentially blocking operations that don't fit this model:
- Shared or remote cache backends (this is really important for bigger deployments like Cloudflare; the cache hit ratio is a lot better when the cache can be shared among hundreds of nodes)
- Better proxy and load-balancing modules (for example when you want to proxy to the least loaded upstream first, or control the load-balancing policy)
The obvious workaround for blocking shared caches is to start X forks, so that up to X queries can be blocked on the shared cache concurrently, but that's ugly and still slow unless X is very high.
Proposals
So since you're refactoring the cache in https://gitlab.labs.nic.cz/knot/knot-resolver/issues/108, I'd like to spec out (or at least think about) better support for asynchronous operations as well. Here we go:
Write daemon in a language with better support for asynchronous I/O
The obligatory "let's rewrite X in Rust". The engine itself doesn't do that much - it initializes the library, exposes some hooks, calls LuaJIT to load the configuration, and then waits for events on sockets to read/write data. It doesn't have to do any intensive computation or low-level work, so it doesn't have to be C. Other languages can call C library functions while offering more ergonomic asynchronous I/O. Off the top of my head: Rust with tokio, C++ with futures, Go with goroutines, JS with promises, LuaJIT with coroutines, and more.
The daemon would have to expose C functions for making asynchronous cache operations from the existing C modules; these would return control to the daemon's language runtime to continue with I/O, and then resume execution of the C function once the operation completes.
Write daemon with "C" coroutines
The idea is that instead of an event loop + state + callbacks, both the library and the daemon could use something like http://libdill.org to create a coroutine for each request. Library functions like the cache would then be able to perform I/O without blocking other requests, and modules like the validator would be able to spawn coroutines for subrequests without having to use `YIELD`. The downside is that the I/O implementation would leak into the library code, but a lot of operations would become much more ergonomic.
Move cache from library to daemon
This would basically need a state similar to `YIELD`, but for caches. The caller would place the records requested from the cache into `kr_request` and then return a special state (let's call it `CACHE_YIELD`). The engine would then make the request to the cache and return to the event loop. The request would be reactivated on the response from the cache: the engine would read the response, fill it into `kr_request`, and resume execution.
Any other ideas?