Knot Resolver issues
https://gitlab.nic.cz/knot/knot-resolver/-/issues
2020-11-04T11:53:33+01:00

https://gitlab.nic.cz/knot/knot-resolver/-/issues/604
cache: zero-downtime restart is not supported across versions which change cache format/version
2020-11-04T11:53:33+01:00, Petr Špaček

Currently we do not handle the case where cache format differs between two versions which are running in parallel.
- Such changes happen very rarely, so it is questionable whether we need to support it.
- At least we should make a note in the release notes when it is necessary to stop all instances before starting new ones.
See rest of the [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_169683).
The following discussion from !1042 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_169683): (+1 comment)
> I wonder how this magic would work in situation where:
> - kresd instances 1+2 are running version 5.y.z with cache in /var/cache/knot-resolver
> - kresd binary gets updated to version 6.0.0
> - admin restarts instance 1 first (according to https://knot-resolver.readthedocs.io/en/v5.1.2/systemd-multiinst.html#zero-downtime-restarts) and restarts instance 2 later
> I guess instance 2 would not detect this unless cache overflows, so most likely instance 2 will write data in old format into cache versioned by version 6.0.0.
>
> Am I correct?
>
> If so I think we should open issue and keep it in mind for future cache rewrite/migration to custom data structure.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/603
cache: get rid of mdb_env_sync()
2020-09-07T17:52:07+02:00, Petr Špaček

Explicit cache sync does not seem necessary and might be counterproductive, see other comments in the thread:
The following discussion from !1042 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_169608): (+1 comment)
> Out of curiosity, why the sync is necessary here?

https://gitlab.nic.cz/knot/knot-resolver/-/issues/602
cache size exposed in Lua API can get out of sync
2020-11-04T11:53:33+01:00, Petr Špaček

This is a minor nit.
Lua call `cache.current_size` does not read the cache size from file/LMDB environment so the value reported in Lua can be out-of-sync if another process changed cache size.
The following discussion from !1042 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_168309): (+1 comment)
> I wonder if `cache.current_size` returns correct size if some rounding took place inside the backend.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/598
ability to reload ssl certificate on certificate change
2020-11-25T09:26:51+01:00, TomVnz

I was looking into doing this automatically but seems there is no cohesive way within knot-resolver.
Played around with using the control socket options, but it's a bit messy...e.g. use:
<code>net.close('0.0.0.0')
http.config({tls = true, cert = "\<CERT\>", key = "\<KEY\>"}, '<webmgmt|doh>') --for DoH|webmgmt
net.listen('0.0.0.0', 53, { kind = 'dns' })
net.listen('0.0.0.0', 443, { kind = 'doh' })
net.listen('0.0.0.0', 853, { kind = 'tls' })
net.listen('0.0.0.0', 8453, { kind = 'webmgmt' })
net.tls("\<CERT\>", "\<KEY\>") --for DoT
</code>
But, if knot-resolver is running as unprivileged user then it can't rebind to privileged ports. And this needs to be scripted somehow.
An alternative way would be for the process that creates the new SSL certificates to restart knot-resolver but then that process would need to run as root.
So for now, I'm using a custom systemd path / service combo to monitor certificate file for any changes and then reload knot-resolver that way.
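The "watch the certificate file and react" idea can also be sketched outside systemd. The following is a minimal Python illustration, not part of knot-resolver: the class name `FileChangeWatcher` and the `on_change` callback are hypothetical, and the callback is only the place where a reload (for example re-issuing the `net.tls(...)` call shown above over the control socket) would be triggered.

```python
import os


class FileChangeWatcher:
    """Detect changes to one file by comparing stat() snapshots (hypothetical helper)."""

    def __init__(self, path, on_change):
        self.path = path
        self.on_change = on_change  # called with the path when a change is seen
        self._last = self._stamp()

    def _stamp(self):
        st = os.stat(self.path)
        # mtime, size and inode together also catch "write new file + rename".
        return (st.st_mtime_ns, st.st_size, st.st_ino)

    def poll(self):
        """Return True and fire the callback if the file changed since the last poll."""
        current = self._stamp()
        if current == self._last:
            return False
        self._last = current
        self.on_change(self.path)
        return True
```

Calling `poll()` from a timer keeps the watcher unprivileged; only the action inside the callback needs whatever rights the reload itself requires.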
Would be keen to know of any thoughts to simplify this, or even the ability to reload the certificate could be added into knot-resolver itself - I know rpz files are monitored and reloaded when changed so this seems somewhat similar.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/593
64-bit ARM: remaining issues
2020-10-01T10:53:36+02:00, Santiago

(EDITed)
It's still possible to run into `bad light userdata pointer` errors, possibly hidden under
`missing luajit package: cqueues`. For summary see this post below: https://gitlab.nic.cz/knot/knot-resolver/-/issues/593#note_165359
- - -
#### Original post
Hi there,
It seems to be known that kresd doesn't work on arm64, but I haven't found this particular build error documented (so sorry for the possible noise). knot-resolver 5.1.x doesn't build on Debian due to a luajit error (bad light userdata pointer). The full build log is in https://buildd.debian.org/status/fetch.php?pkg=knot-resolver&arch=arm64&ver=5.1.2-1&stamp=1596037546&raw=0
And this is the relevant part:
````
...
Message: --- config_tests dependencies ---
Running command: /usr/bin/luajit -l cqueues -e os.exit(0)
--- stdout ---
--- stderr ---
/usr/bin/luajit: bad light userdata pointer
stack traceback:
[C]: at 0xffffb6342ad0
[C]: in function 'require'
/usr/share/lua/5.1/cqueues.lua:2: in function </usr/share/lua/5.1/cqueues.lua:1>
[C]: at 0xaaaae1757d08
[C]: at 0xaaaae170a4c0
../tests/meson.build:27:4: ERROR: Problem encountered: missing luajit package: cqueues
````
Cheers,
-- Santiago

https://gitlab.nic.cz/knot/knot-resolver/-/issues/590
document bug reporting procedure
2020-07-10T14:10:23+02:00, Petr Špaček

- test on latest version
- mention relevant system information
- how to capture GDB traceback
- how to limit logging to problematic names
- how to capture network traffic + keys (TLS, DoH)
...

https://gitlab.nic.cz/knot/knot-resolver/-/issues/589
document threat model
2020-07-11T22:10:59+02:00, Petr Špaček

- inputs
- trusted (config, control socket, cache, files on disk)
- untrusted (network traffic)
- decide: prefill? hints? ...
- DoS is always possible (network overload, hijack etc.)
- integrity - DNSSEC
- confidentiality - do not count on it, encrypting only DNS traffic does not hide it

https://gitlab.nic.cz/knot/knot-resolver/-/issues/588
control socket drops long outputs
2020-09-17T13:22:45+02:00, Petr Špaček

Control socket randomly cuts long outputs. It seems to be caused by incorrect use of fprintf inside daemon/io.c function `io_tty_process_input()`.
Version: 5.1.2
Steps to reproduce:
```
$ echo -e "string.rep('a', 1024*1024*10)\n" | socat - unix-connect:$(ls control/*) | wc -c
223362
```
I.e. the output is truncated after 223362 bytes. This value is not a constant, it varies. Expected output should be 1024*1024*10 bytes `a` + 2x2 bytes of prompt `> `.
Strace:
```
read(23, "__binary\nstring.rep('a', 1024*10"..., 65536) = 40
dup(23) = 24
fcntl(24, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fstat(24, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
write(24, "\0\240\0\1aaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096) = 4096
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 10481664) = 219264
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 10262400) = 109632
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 10152768) = 219264
write(24, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 9933504) = -1 EAGAIN (Resource temporarily unavailable)
close(24) = 0
```
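The strace shows short `write()` results followed by `EAGAIN` on a non-blocking socket, after which the descriptor is closed and the rest of the output is lost. As an illustration of the general technique only (this is Python, not the kresd C code, and `send_all_nonblocking` and `demo` are hypothetical names), a writer has to keep the unsent remainder and wait for the socket to become writable again:

```python
import selectors
import socket
import threading


def send_all_nonblocking(sock, data, timeout=5.0):
    """Write all of `data` to a non-blocking socket, tolerating short writes."""
    view = memoryview(data)
    sent_total = 0
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_WRITE)
        while sent_total < len(view):
            try:
                # send() may accept only part of the buffer; keep the offset.
                sent_total += sock.send(view[sent_total:])
            except BlockingIOError:
                # EAGAIN: kernel buffer is full, wait for writability instead of giving up.
                if not sel.select(timeout):
                    raise TimeoutError("peer did not drain the socket in time")
    return sent_total


def demo(size=1024 * 1024):
    """Push `size` bytes through a socketpair while a thread drains the other end."""
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    received = bytearray()

    def drain():
        while len(received) < size:
            chunk = reader.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    thread = threading.Thread(target=drain)
    thread.start()
    sent = send_all_nonblocking(writer, b"a" * size)
    thread.join()
    writer.close()
    reader.close()
    return sent, len(received)
```

In an event-loop based daemon the waiting would be done by the loop's own write callbacks rather than by a local selector.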
The whole `io_tty_process_input()` function is a mess and should be refactored into smaller pieces, and most importantly rewritten to use libuv for writes as well.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/583
new statistics for encrypted transports
2020-06-19T14:17:50+02:00, Petr Špaček

It would be interesting to see statistics for:
- [ ] number of TLS handshakes
- [ ] TLS versions
- [ ] HTTP versions
- [ ] HTTP request methods
- [ ] HTTP status codes
Question: Are these stats sufficient to gather details about connection reuse?

https://gitlab.nic.cz/knot/knot-resolver/-/issues/578
test aggressive cache on NSEC3PARAM rotation
2020-08-20T10:05:40+02:00, Vladimír Čunát (vladimir.cunat@nic.cz)

I don't think we have any tests on that in particular, though the code's been deployed for a long time. Still, most of possible failures I can imagine should only lead to insufficient caching.
Hints around how the implementation works:
- NSEC3PARAM is the [data collected](https://tools.ietf.org/html/rfc5155#section-4.2) but it's taken from NSEC3 records directly.
- For this purpose, using NSEC is like one more possible NSEC3PARAM configuration.
- Reading from cache is designed to consider the last two NSEC3PARAMs that have been written for that zone.
- Code reference: identifiers containing `nsec_p`.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/573
net.tls() allow usage of multiple certificates
2020-10-08T11:43:59+02:00, Tomas Krizek

ECC certificates provide superior performance to RSA keys of comparable security. Supporting multiple certificate files in `net.tls()` could lead to improved DNS-over-TLS performance without sacrificing compatibility with older clients, if both ECC and RSA certificates could be used simultaneously.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/569
clarify respdiff job names in CI
2020-10-19T11:16:35+02:00, Petr Špaček

Mostly note for myself:
especially forwarding scenarios have confusing names
Find better naming structure and fix it.
Rename will break a lot of stuff so schedule this when we have time for it.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/568
Some cases of DNS resolution from lua fail if OS provides only IPv6 resolvers
2020-04-24T10:04:07+02:00, Vladimír Čunát (vladimir.cunat@nic.cz)

Conditions:
- `resolv.conf` only containing IPv6 nameservers. Mix works OK. I believe that very few people have IPv6-only there, luckily.
- Use DNS resolution based on `lua-cqueues`, e.g. `prefill` module or root trust anchors bootstrapping – both only after !894 (kresd >= 5.0.0).
Result example:
```
[prefill] fetch of `https://www.internic.net/domain/root.zone` failed: HTTP client library error: A non-recoverable error occurred when attempting to resolve the name (-1684960053)), will retry root zone download in 09 minutes 59 seconds
```
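To check whether a machine meets the first condition, the `nameserver` lines can be classified by address family. This is a small Python sketch for diagnosis only, not something kresd does; the function names are hypothetical and the parsing covers just the plain `nameserver <address>` form.

```python
import ipaddress


def nameserver_families(resolv_conf_text):
    """Return the set of IP versions (4 and/or 6) used by `nameserver` lines."""
    versions = set()
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            # Drop an optional zone index such as "%eth0" on link-local addresses.
            address = fields[1].split("%", 1)[0]
            try:
                versions.add(ipaddress.ip_address(address).version)
            except ValueError:
                continue  # skip entries this sketch cannot parse
    return versions


def is_ipv6_only(resolv_conf_text):
    """True when at least one nameserver is configured and all of them are IPv6."""
    return nameserver_families(resolv_conf_text) == {6}
```

For example, `is_ipv6_only(open("/etc/resolv.conf").read())` reports whether the affected configuration is present.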
This is a problem in lua libraries that we've chosen to use: https://github.com/wahern/dns/issues/23

https://gitlab.nic.cz/knot/knot-resolver/-/issues/551
client retry logic on TCP/TLS connection closure
2020-10-22T13:58:57+02:00, Vladimír Čunát (vladimir.cunat@nic.cz)

When remote server closes a connection without answering a part of our queries, the corresponding requests get failed too aggressively (perhaps? TODO: details, etc.)
The most interesting part of the standards is [RFC 7766](https://tools.ietf.org/html/rfc7766#section-6.2.4):
> DNS clients SHOULD retry unanswered queries if the connection closes before receiving all outstanding responses.
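
The quoted requirement can be sketched as bookkeeping per connection: remember which queries are still unanswered and, on closure, hand them back for a retry instead of failing them right away. This is an illustrative Python model only, not the kresd implementation; the class names and the `max_attempts=2` limit are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PendingQuery:
    name: str
    attempts: int = 0  # how many times the query was sent on any connection


@dataclass
class Connection:
    outstanding: dict = field(default_factory=dict)  # message ID -> PendingQuery

    def send(self, msg_id, query):
        query.attempts += 1
        self.outstanding[msg_id] = query

    def on_answer(self, msg_id):
        self.outstanding.pop(msg_id, None)

    def on_close(self, max_attempts=2):
        """Split unanswered queries into ones to retry and ones to fail."""
        retry, failed = [], []
        for query in self.outstanding.values():
            (retry if query.attempts < max_attempts else failed).append(query)
        self.outstanding.clear()
        return retry, failed
```

The attempt limit keeps a server that always closes early from causing endless reconnects.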
On the other hand servers SHOULD not close the connections early, without reasons for the particular case... so hopefully this won't happen that often in practice; [FRITZ!](https://forum.turris.cz/t/dns-over-tcp-just-a-single-transaction/12003/11) seems a notable case. _I'll keep copying the important points from that discussion to here._

https://gitlab.nic.cz/knot/knot-resolver/-/issues/548
Support for DoQ | DNS over QUIC
2023-11-15T09:26:55+01:00, Gaspard d'Hautefeuille

Hello,
DoQ is IMHO the upgrade of DoT and is not bloated compared to DoH & DoH3.
https://tools.ietf.org/html/draft-huitema-quic-dnsoquic-07
Do you consider supporting this Internet Draft or would you rather wait for an RFC?
Thanks,
HLFH

https://gitlab.nic.cz/knot/knot-resolver/-/issues/537
module API redesign
2020-11-30T17:52:59+01:00, Petr Špaček

Problem statement
-----------------
- Current module API is not well defined and does not provide sufficient abstraction
- As a result, modules are not isolated and must know about internals of other modules (e.g. modules resetting request state must also reset `req.*_selected` arrays)
- Mixing wire-format-generating modules with modules relying on `req.*_selected` arrays leads to weird bugs (one example: !842, !851, !859)
- Lua modules seem to be slow (because of the way how C code calls Lua?)
Related tickets
---------------
- #363 Modules need generic way to persist own state
- #432 Modules need ability to not respond at all (for response rate limiting)
- #483 Modules currently cannot generate answer if no NS is responding
- #447 New server selection system should expose and use API instead of being hard-wired
- #396 SERVFAIL answer can still contain bogus RRsets
- #471 low-level protocol stuff is hard-coded (incorrectly)
- #36 make sure new API does not get in the way when implementing parallel queries
- #527 modules need a way to cooperate with fine-grained logging
- #418 engine object access - I don't know if this requirement will be still valid after redesign, but let's think about it
- #264 error reporting from modules sucks
- #234 a way to cooperate between modules??? e.g. for DNAME support???
- attempt to move `reorder_RR()` into module, ideally in a form of policy action so it can be triggered on per-client basis - what API would be necessary?
Objective
---------
Design a new API for modules in a way which prevents bugs stemming from bad API usage from ever repeating again.
Implementation is expected to be a long-term project, but we need proper design first. Hopefully #447, #535 and other tasks planned for 2020 will provide us sufficient experience for better API design.

2020 Q4

https://gitlab.nic.cz/knot/knot-resolver/-/issues/535
declarative policy module and other user-supplied DNS data
2024-02-28T12:15:39+01:00, Petr Špaček

Current problem
---------------
Our current imperative policy module is using chain of Lua functions: This is quite slow and hard to use for non-programmers.
Proposal
--------
Design a new method to configure "policies", preferably in a declarative way. By "policies" I mean a generic way to influence resolving and inject user-supplied data into DNS tree or block other stuff.
A declarative way should be more intuitive to use than writing Lua functions, and also faster if we design it right.
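To make the contrast with a chain of functions concrete, here is a rough Python sketch of what "declarative" could mean: rules are plain data and a single generic matcher picks the most specific one. The rule fields (`suffix`, `action`, `servers`) and the longest-suffix tie-break are assumptions for illustration, not a proposed or existing Knot Resolver configuration format.

```python
# Hypothetical rule set: data only, no user-written functions.
RULES = [
    {"suffix": "example.com.", "action": "forward", "servers": ["192.0.2.53"]},
    {"suffix": "ads.example.com.", "action": "block"},
]


def labels(name):
    """Split a domain name into lowercase labels, ignoring the trailing dot."""
    return [label for label in name.lower().rstrip(".").split(".") if label]


def match(rules, qname):
    """Return the rule whose suffix matches `qname` with the most labels, or None."""
    qlabels = labels(qname)
    best, best_len = None, -1
    for rule in rules:
        rlabels = labels(rule["suffix"])
        n = len(rlabels)
        # Compare whole labels so "badexample.com" does not match "example.com".
        if n <= len(qlabels) and qlabels[len(qlabels) - n:] == rlabels and n > best_len:
            best, best_len = rule, n
    return best
```

Because the rules are data, they can be validated, listed and indexed (for example in a trie) without executing user code, which is where the speed and usability gains would have to come from.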
Here is an incomplete list of stuff we might want to express.
- [x] ability to also block sub-queries, e.g. when following CNAMEs (#217)
- [ ] ability to block RR data - e.g. rebinding protection, blacklist of NS names etc. (#523)
- [x] ACLs (including negative ACLs, #370)
- [x] merge views with other policies (see also #445)?
- [x] redirecting specific zones to user-configured servers (#428, !651)
- [ ] beware that we need also port number, not just IP address
- [x] theoretical "helper" NS+glue records from kresd config should not be retrievable from outside
- FORWARDing
- TLS forwarding has many knobs and might need even more: #481
- do we still need STUB policy? If so see #218
- FORWARDing might need exceptions for some subtrees (see e.g. https://lists.nlnetlabs.nl/pipermail/unbound-users/2019-December/006560.html)
- generally special EDNS tricks: #314, #303; also improve #657
- special cache semantics (do not cache this sub-tree, limit TTL in this sub-tree)
- maybe DNS64 module should be merged with policies and ACLs: #368
- [x] maybe hints module should be merged in as well (see also #205, #349)
- [x] maybe also a way to provide other user-supplied data - #540
* (well, more ways can always be added)
- maybe prefill module should be merged as well (see also #417)
- think of interaction with daf module (beware of #183)
* `@vcunat` would prefer to deprecate DAF,
but theoretically we could think of translating DAF rules into the new policy rules
- design should be able to support full strength of RPZ (example of a problem: #194)
* the most common features are in 6.0.x – CNAME redirection in particular, and interacting well with other rules (multiple rules of different kinds can trigger when jumping through CNAME chains)
- design needs to support efficient mechanism which mimics RPZ with zone transfer including IXFR(!) (#195)
- build mechanism for better visibility into policies (#364)
- it needs to work with huge lists (apparently users want to have long block lists, see https://lists.nlnetlabs.nl/pipermail/unbound-users/2019-December/006559.html)
* improved in 6.0.x: shared inside LMDB across all processes, but efficiency of restarts/reloads/updates could be significantly improved (as of 6.0.6)
- [x] open question: at which stage should the module kick in? Can it be e.g. used to implement `ignore-cd-flag` policy as seen in Unbound?
* the `view:` part can be used to set such options, though there's no ignore-cd in particular so far
- per-domain setting for rate-limits e.g. like `ratelimit-below-domain`, `ratelimit-for-domain` etc. like in Unbound
* [ ] first per-user changes in rate-limits in `views:` (when we have any rate-limiting)
- [x] special handling for reserved and local-only names: see #205 and think it through

2020 Q2

https://gitlab.nic.cz/knot/knot-resolver/-/issues/534
CI: test server selection algorithm
2019-12-18T19:41:28+01:00, Petr Špaček

Implement https://gitlab.labs.nic.cz/knot/maze/ into Knot Resolver's CI.
Ideas:
- Gitlab shell executor in a VM with sudo access (yuck!)
- shell executor to a VM with a systemd build which contains https://github.com/systemd/systemd/pull/13823

2020 Q1
Štěpán Balážik

https://gitlab.nic.cz/knot/knot-resolver/-/issues/532
OCSP stapling for client side
2019-12-18T15:28:32+01:00, Petr Špaček

For client side (TLS_FORWARD) we could get inspired by [`kdig +tls-ocsp-stapling`](https://github.com/CZ-NIC/knot/pull/13).

https://gitlab.nic.cz/knot/knot-resolver/-/issues/517
OCSP stapling for server side
2019-12-18T15:28:32+01:00, Vladimír Čunát (vladimir.cunat@nic.cz)

OCSP stapling seems to make much sense for server side as well, at least at a quick look.