Cache problems with DRBD-backed systems
We have a VM running Knot Resolver (5.0.1). The VM is managed by Ganeti and runs on top of DRBD, with LVM logical volumes hosted on SSDs below that. DRBD mirrors the disk across servers so Ganeti can live-migrate VMs from server to server.
This VM is a moderately loaded webserver, so it is doing quite a few DNS lookups, but overall traffic on the public interface is under 0.4 Mbit/sec. When we look at the network bandwidth associated with the DRBD device, however, we see around 400 Mbit/sec (50 MBytes/sec). As an experiment we put the cache in a tmpfs, as suggested in the docs, and that completely eliminated the DRBD traffic.
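For reference, a minimal sketch of the tmpfs setup we mean, assuming the default cache path; the mount size and ownership shown here are illustrative, not exactly ours:

```lua
-- /etc/knot-resolver/kresd.conf (excerpt)
-- Assumes a tmpfs is already mounted over the cache directory, e.g. via
-- an fstab entry along these lines (size/uid/gid are illustrative):
--   tmpfs /var/cache/knot-resolver tmpfs rw,size=2G,uid=knot-resolver,gid=knot-resolver 0 0
-- Open the cache on the tmpfs-backed path, keeping the cache size
-- below the size of the mount.
cache.open(1024 * MB, 'lmdb:///var/cache/knot-resolver')
```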
So something about the way the disk caching works results in I/O patterns that don't play well with DRBD. A couple of things I thought of:
- DRBD only sends writes across the wire; reads are served locally. So this must be some sort of write I/O.
- This is not just lookup-associated I/O being written to the cache: as I mentioned, the traffic into the system is much lower, so there is some sort of write amplification going on. Maybe large sections of the cache get rewritten every time there is a small update?
- How often is data in RAM flushed to the on-disk cache? Does it flush for every query, on a timer, or when some size threshold is hit? Say it flushes every 10 seconds: does that mean that after a server reboot the cache that gets loaded would only be missing the last 10 seconds of queries?
- We have other VMs running Knot Resolver that are not seeing this issue, but they don't do as much web traffic. We suspect they are also generating amplified I/O, just not enough to notice, but we don't have good numbers for this yet (a measurement sketch follows below).
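To get those numbers, here is a minimal standalone Lua sketch for sampling a block device's write rate from /sys. The device name `drbd0` is an assumption; it relies on the standard kernel stat layout (field 7 is sectors written, in 512-byte units):

```lua
-- watch_writes.lua: sample a block device's "sectors written" counter
-- and print the write rate over an interval.
local function sectors_written(dev)
  local f = assert(io.open('/sys/block/' .. dev .. '/stat', 'r'))
  local line = f:read('*l')
  f:close()
  local fields = {}
  for n in line:gmatch('%S+') do fields[#fields + 1] = tonumber(n) end
  return fields[7]  -- field 7: sectors written (512-byte sectors)
end

local dev, interval = 'drbd0', 10    -- device and sample window (seconds)
local before = sectors_written(dev)
os.execute('sleep ' .. interval)     -- stock Lua has no sleep(), shell out
local after = sectors_written(dev)
print(string.format('%s: %.2f MB/sec written',
                    dev, (after - before) * 512 / interval / 1e6))
```

Running the same thing against the volumes under the cache on the quieter VMs would let us compare like with like.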
I think a good test would be to set up DRBD between two bare hosts, put the cache on that (which would also help rule out qemu/Ganeti/LVM), and then hit it with a bunch of lookups.
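Roughly what I have in mind for the resolver side of that test, as a kresd config sketch; the mount point and query names are made up, and resolve() is only used here to force cache writes:

```lua
-- Hypothetical test config: point the cache at a filesystem on the bare
-- DRBD device, with no qemu/Ganeti/LVM in the path.
cache.open(1024 * MB, 'lmdb:///mnt/drbd-test')

-- Generate cache-miss traffic: unique names, so every answer gets
-- written to the cache. Watch the DRBD replication link while it runs.
for i = 1, 1000 do
  resolve('test' .. i .. '.example.com.', kres.type.A)
end
```

An external load generator aimed at the listening address would give a more realistic query pattern, but this should be enough to show whether the amplification follows the cache.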
Let me know if you have questions, ideas of things to try, or need more details to reproduce. Thanks.