Knot Resolver issues
https://gitlab.nic.cz/knot/knot-resolver/-/issues
Updated: 2021-08-25T13:32:55+02:00

DNS64 for subnets
https://gitlab.nic.cz/knot/knot-resolver/-/issues/368
Updated: 2021-08-25T13:32:55+02:00
Author: Petr Špaček

RIPE NCC and Ondřej Caletka expressed demand for the ability to limit which subnets are covered by DNS64.
We need to think about whether this should be generalized to some form of ACL like in BIND, or whether another one-off for DNS64 is okay.
RIPE NCC wanted the equivalent of these BIND ACL options:
```
acl nat64-users {
    2001:67c:64:49::/64;
};
options {
    dns64 64:ff9b::/96 {
        clients { nat64-users; };
    };
};
```
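
Whichever configuration form is chosen, the `clients` restriction above comes down to a prefix match on the client's source address. A minimal sketch of that check in C, assuming POSIX `inet_pton`; `in6_in_subnet` and `client_gets_dns64` are hypothetical names used for illustration, not Knot Resolver functions:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a Knot Resolver API): returns true when
 * `addr` lies inside the subnet `prefix`/`prefix_len`. */
static bool in6_in_subnet(const struct in6_addr *addr,
		const struct in6_addr *prefix, unsigned prefix_len)
{
	if (prefix_len > 128)
		return false;
	const unsigned full_bytes = prefix_len / 8;
	const unsigned rest_bits = prefix_len % 8;
	/* Compare the whole bytes covered by the prefix first. */
	if (memcmp(addr->s6_addr, prefix->s6_addr, full_bytes) != 0)
		return false;
	if (rest_bits == 0)
		return true;
	/* Then compare only the leading bits of the following byte. */
	const uint8_t mask = (uint8_t)(0xff << (8 - rest_bits));
	return (addr->s6_addr[full_bytes] & mask)
		== (prefix->s6_addr[full_bytes] & mask);
}

/* Hypothetical wrapper working on textual addresses; anything that
 * fails to parse is treated as "not covered". */
static bool client_gets_dns64(const char *client, const char *subnet,
		unsigned prefix_len)
{
	struct in6_addr a, p;
	if (inet_pton(AF_INET6, client, &a) != 1
	    || inet_pton(AF_INET6, subnet, &p) != 1)
		return false;
	return in6_in_subnet(&a, &p, prefix_len);
}
```

With the subnet from the example, `client_gets_dns64("2001:67c:64:49::1", "2001:67c:64:49::", 64)` is true, while a client in `2001:67c:64:4a::/64` is not covered.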
Their own hack used this config:
```
-- RIPE NCC DNS64 config
ripencc.dns64_proxy('64:ff9b::')
ripencc.dns64_subnet('2001:67c:64:49::/64')
```

Assignee: Vladimír Čunát (vladimir.cunat@nic.cz)

Handling of PTR records in DNS64 module
https://gitlab.nic.cz/knot/knot-resolver/-/issues/478
Updated: 2021-08-25T13:32:54+02:00
Author: Ondřej Caletka

I know this omission is [documented](https://knot-resolver.readthedocs.io/en/stable/modules.html?highlight=PTR+synthesis#dns64), but [RFC 6147](https://tools.ietf.org/html/rfc6147#section-5.3.1) still requires proper handling of PTR records for the DNS64 translation prefix.
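
One of the options described in RFC 6147 section 5.3.1 is to answer a PTR query under the translation prefix with a synthesized CNAME that points at the matching `in-addr.arpa` name. A rough sketch of that name mapping in C, assuming the well-known prefix `64:ff9b::/96`; `dns64_ptr_target` is a hypothetical helper, not part of the dns64 module:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The well-known NAT64 prefix 64:ff9b::/96 as raw bytes. */
static const uint8_t PREF64[12] = {
	0x00, 0x64, 0xff, 0x9b, 0, 0, 0, 0, 0, 0, 0, 0
};

/* Hypothetical helper: if the 16-byte IPv6 address lies in PREF64/96,
 * write the reverse name of the embedded IPv4 address into `out` and
 * return 0.  Return -1 if the address is outside the prefix or the
 * buffer is too small. */
static int dns64_ptr_target(const uint8_t addr6[16], char *out, size_t out_len)
{
	if (memcmp(addr6, PREF64, sizeof(PREF64)) != 0)
		return -1; /* not under the translation prefix */
	/* With a /96 prefix the IPv4 address is the last four bytes;
	 * in-addr.arpa lists them in reverse order. */
	const int n = snprintf(out, out_len, "%u.%u.%u.%u.in-addr.arpa.",
		(unsigned)addr6[15], (unsigned)addr6[14],
		(unsigned)addr6[13], (unsigned)addr6[12]);
	return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}
```

For `64:ff9b::c000:201` (which embeds 192.0.2.1) the helper produces `1.2.0.192.in-addr.arpa.`, the name such a CNAME would point to.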
Since DNS64-enabled instances of Knot Resolver are being deployed both by [Cloudflare](https://developers.cloudflare.com/1.1.1.1/support-nat64/) and [RIPE NCC](https://ripe78.ripe.net/on-site/tech-info/ipv6-only-network/), it would help a lot, especially during tracerouting, to have the PTR handling implemented properly.

Assignee: Vladimír Čunát (vladimir.cunat@nic.cz)

fine-grained logging
https://gitlab.nic.cz/knot/knot-resolver/-/issues/527
Updated: 2021-07-29T14:01:56+02:00
Author: Petr Špaček

Current logging configuration is just one bit: verbosity on/off. This makes it hard to monitor and debug large instances.
Let's collect ideas for improvement in this ticket:
- [x] per-request logging - the ability to run a single request with verbose logging is very handy for debugging. We have a prototype in the `/trace` endpoint of the HTTP module, but it does not log everything for a given request, and it should also be much easier to use than full HTTP.
- [x] per-type logging - it might be handy to enable/disable certain types of logging, e.g. the control socket log might be too noisy if there is enough API traffic.
- [x] fine-grained log levels exposed to the logging system - external log collectors need to know whether a given message is debug/info/error etc.
- [ ] structured logging? log some rudimentary metadata in structured form - e.g. query name + type + rcode? This might be very handy for network operations centers etc.

improve error reporting and handling
https://gitlab.nic.cz/knot/knot-resolver/-/issues/495
Updated: 2021-06-01T11:02:38+02:00
Author: Tomas Krizek

Currently, some assertions seem to be used as a way to report unlikely events, and when these are used in production, they can cause needless crashes (even though they're then handled by systemd's `Restart=on-abnormal` facility).
I propose the following changes:
- The code should not rely on assertions; if it does, it's a bug that should be fixed.
- Errors, even unlikely ones (currently handled by assertions), should be logged properly.
- ~~There could be an option (off by default) to enable reporting these remotely.~~

Assignee: Tomas Krizek

doh2: process input headers
https://gitlab.nic.cz/knot/knot-resolver/-/issues/616
Updated: 2021-05-25T14:44:37+02:00
Author: Tomas Krizek

As of 5.2.0, the DoH(2) implementation ignores all headers (including some pseudoheaders).
At least some (pseudo)headers should be processed, e.g.:
- `content-type`
- `:path` (currently, any endpoint answers to DoH queries)
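
As an illustration of the two checks listed above, here is a minimal sketch in C. The function names are hypothetical and the expected endpoint is supplied by the caller; `application/dns-message` is the media type defined by RFC 8484, and `strncasecmp` is assumed to be available from POSIX `<strings.h>`:

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* Accept only the RFC 8484 media type; media types compare
 * case-insensitively and may be followed by parameters. */
static bool doh_content_type_ok(const char *value)
{
	static const char want[] = "application/dns-message";
	const size_t len = sizeof(want) - 1;
	if (!value || strncasecmp(value, want, len) != 0)
		return false;
	return value[len] == '\0' || value[len] == ';' || value[len] == ' ';
}

/* Accept the configured endpoint itself, optionally followed by a
 * query string (as in GET requests carrying ?dns=...). */
static bool doh_path_ok(const char *path, const char *endpoint)
{
	if (!path || !endpoint)
		return false;
	const size_t len = strlen(endpoint);
	if (strncmp(path, endpoint, len) != 0)
		return false;
	return path[len] == '\0' || path[len] == '?';
}
```

Assuming an endpoint of `/dns-query`, a request for `/dns-query?dns=...` would pass the path check, while `/` or `/dns-queryx` would be rejected instead of being answered like any other path.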
There could also be a mechanism for modules to request certain headers that would be passed along with a request.

Assignee: Tomas Krizek

trust_anchors.set_insecure may miss some names
https://gitlab.nic.cz/knot/knot-resolver/-/issues/673
Updated: 2021-05-21T01:52:53+02:00
Author: Vladimír Čunát (vladimir.cunat@nic.cz)

If the same authoritative server IPs serve names both above and below the configured negative trust anchors, the downgrade to insecure may not happen in some cases.

Assignee: Vladimír Čunát (vladimir.cunat@nic.cz)

Replace potentially zero-length VLAs in selection_iter.c with arrays from lib/generic
https://gitlab.nic.cz/knot/knot-resolver/-/issues/668
Updated: 2021-05-20T13:20:57+02:00
Author: Štěpán Balážik

Over the weekend I was playing with undefined behavior sanitizer (i.e. compiling with `-fsanitize=undefined`) and ran Deckard with it.
While most of the errors point to `member access within misaligned address type '(const)? struct entry_h', which requires 4 byte alignment` in `lib/cache` (which are false positives I suppose, I don't understand the cache implementation enough), there is also this one:
`lib/selection_iter.c:243:16: runtime error: variable length array bound evaluates to non-positive value 0`
The code in question is in the `iter_choose_transport` function and prepares a VLA for flattening of a trie for easier manipulation.
```c
struct choice choices[trie_weight(local_state->addresses)];
/* We may try to resolve A and AAAA record for each name, so therefore
* 2*trie_weight(…) is here. */
struct to_resolve resolvable[2 * trie_weight(local_state->names)];
```
`trie_weight`, however, can be 0, which leads to undefined behavior.
Replacing these with arrays from `lib/generic` should be easy and would maybe even lead to nicer code, since they include a length field which is needed later down the line.
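
To illustrate the idea, here is a small self-contained sketch of a heap-backed array that carries its own length and accepts a zero count. It only mimics the concept; the type and function names are hypothetical and are not the `lib/generic` API, and `struct choice_stub` merely stands in for the real `struct choice`:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for `struct choice`; the real structure in
 * the Knot Resolver sources has different members. */
struct choice_stub {
	int score;
};

/* A heap-backed array that remembers how many items it holds. */
struct choice_array {
	struct choice_stub *at;
	size_t len;
};

/* Reserve zero-initialized room for `count` items.  A zero count is
 * valid and yields an empty array without calling the allocator, so
 * no zero-sized object is ever created.
 * Returns 0 on success, -1 on allocation failure. */
static int choice_array_init(struct choice_array *arr, size_t count)
{
	arr->at = NULL;
	arr->len = 0;
	if (count == 0)
		return 0;
	arr->at = calloc(count, sizeof(*arr->at));
	if (!arr->at)
		return -1;
	arr->len = count;
	return 0;
}

/* Release the storage and leave the array empty and reusable. */
static void choice_array_clear(struct choice_array *arr)
{
	free(arr->at);
	arr->at = NULL;
	arr->len = 0;
}
```

Code that later iterates over the choices can loop up to `arr.len`, so the empty case needs no special handling.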
Furthermore, coverage from Deckard probably isn't that great, so we may consider running more tests with `-fsanitize=undefined`.

Assignee: Štěpán Balážik

daemon: RPC interface (json-based, possibly unbound-rpc/rndc wrapper)
https://gitlab.nic.cz/knot/knot-resolver/-/issues/9
Updated: 2021-04-16T19:20:19+02:00
Author: Ghost User

Milestone: 2015 Q1

daemon: configuration parser/interface
https://gitlab.nic.cz/knot/knot-resolver/-/issues/8
Updated: 2021-04-16T19:20:19+02:00
Author: Ghost User

Milestone: 2015 Q1

cache: garbage collection scheme / aging
https://gitlab.nic.cz/knot/knot-resolver/-/issues/7
Updated: 2021-04-16T19:20:19+02:00
Author: Ghost User

Milestone: 2015 Q1

tests: CMocka-based unit tests for current APIs
https://gitlab.nic.cz/knot/knot-resolver/-/issues/6
Updated: 2021-04-16T19:20:19+02:00
Author: Ghost User

library:
* resolution
* cache
* zone cuts
* utils
daemon:
* tcp
* udp
* worker

Milestone: 2015 Q1

tests: test binary using socket_wrapper (cwrap)
https://gitlab.nic.cz/knot/knot-resolver/-/issues/5
Updated: 2021-04-16T19:20:19+02:00
Author: Ghost User

Things missing:
* [x] Wrap I/O syscalls instead of libknot library calls (more portable, generic)
* [ ] Make Python test server listen on all addresses listed in the test
* [ ] use socket_wrapper to isolate it in a test environment https://cwrap.org/socket_wrapper.html
* [ ] isolate the binary as well and test if it connects to the faked servers
* [ ] prepare configuration for binary in the test cases
* [ ] check that all tests pass on the binary!
* [ ] Documentation (may refer to https://www.unbound.net/documentation/doxygen/replay_8h.html#details)
* [ ] Publish this as a tool to test recursive/auth DNS compliance

Milestone: 2015 Q3
Assignee: Grigorii Demidov

cache using namedb api
https://gitlab.nic.cz/knot/knot-resolver/-/issues/4
Updated: 2021-04-16T19:20:18+02:00
Author: Ghost User

The cache should use the generic namedb api, but it's not possible right now for a couple of reasons:
* No single key - multiple values paradigm, but we probably shouldn't implement it in the API as it's too complex
* Current node serialization is expensive; it would be best if the "node" was stored in linear memory, and the "rrdata" as well. This way, pickling/unpickling could be as simple as memory mapping. This is important, as node access is potentially a very frequent operation. Unless we implement this, direct access + SKMV is the best thing.

lib: basic query-response implementation, based on requestor
https://gitlab.nic.cz/knot/knot-resolver/-/issues/3
Updated: 2021-04-16T19:20:17+02:00
Author: Ghost User

Needs imported requestor and list of root hints.

Mockup tests for synchronous name resolution api
https://gitlab.nic.cz/knot/knot-resolver/-/issues/2
Updated: 2021-04-16T19:20:17+02:00
Author: Ghost User

Import libknot, dummy interface for synchronous resolving
https://gitlab.nic.cz/knot/knot-resolver/-/issues/1
Updated: 2021-04-16T19:20:17+02:00
Author: Ghost User

SIGBUS on ARM
https://gitlab.nic.cz/knot/knot-resolver/-/issues/426
Updated: 2021-04-16T11:10:40+02:00
Author: Vladimír Čunát (vladimir.cunat@nic.cz)

@dkg wrote: fwiw, i think we're having a problem just running the armhf (32-bit arm with hard-float) build of knot-resolver on top of an arm64 kernel (despite the kernel otherwise running fine with an entirely 32-bit userland).
you can see the [build logs for knot-resolver on armhf](https://buildd.debian.org/status/logs.php?pkg=knot-resolver&arch=armhf&suite=sid) -- the machine named `arm-arm-01` is an arm64 kernel and armhf userland, and the test suite was fully re-enabled on all platforms in version 3.0.0-4.

TLS_FORWARD can get stuck on broken addresses (v5.3.0)
https://gitlab.nic.cz/knot/knot-resolver/-/issues/671
Updated: 2021-03-24T16:09:15+01:00
Author: Vladimír Čunát (vladimir.cunat@nic.cz)

With normal TLS-forwarding config, e.g.:
```lua
policy.add(policy.all(policy.TLS_FORWARD({
	{ '8.8.8.8', hostname='dns.google' },
	{ '8.8.4.4', hostname='dns.google' },
	{ '2001:4860:4860::8888', hostname='dns.google' },
	{ '2001:4860:4860::8844', hostname='dns.google' },
})))
```
but with some of the addresses disabled, e.g.:
```bash
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
```
some queries get stuck in a very long "loop" of attempting to connect to the non-working IPs, even though half of the addresses work. Example log snippet: [tls_forward.log](/uploads/a5716360f9a3e6879160ff0766e37add/tls_forward.log)
_!1143 doesn't trigger here; it wasn't meant for forwarding and individual addresses might be broken for other reasons anyway._

Milestone: 5.3.1

server selection: consider switching to TCP instead of backing off the timeouts to high values
https://gitlab.nic.cz/knot/knot-resolver/-/issues/649
Updated: 2021-02-18T16:56:41+01:00
Author: Štěpán Balážik

The following discussion from !1030 should be addressed:
- [ ] @sbalazik started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1030#note_184303): (+1 comment)
> `config.hints` test [is timing out sometimes](https://gitlab.nic.cz/knot/knot-resolver/-/jobs/463522) on this branch and so far, I have no idea why.
>
> ```
> 22/36 knot-resolver:postinstall+config+skip_asan / config.hints TIMEOUT 120.05 s
> --- command ---
> KRESD_NO_LISTEN='1' PATH='/builds/knot/knot-resolver/.local/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' TEST_FILE='/builds/knot/knot-resolver/modules/hints/tests/hints.test.lua' SOURCE_PATH='/builds/knot/knot-resolver/tests/config' /builds/knot/knot-resolver/tests/config/../../scripts/test-config.sh -c /builds/knot/knot-resolver/build_ci/../tests/config/test.cfg -n
> --- stdout ---
> /builds/knot/knot-resolver/.local/sbin/kresd
> processing test file /builds/knot/knot-resolver/modules/hints/tests/hints.test.lua
> ok 1 - has IP address for a.root-servers.net.
> ok 2 - load root hints from file
> ok 3 - can retrieve root hints
> ok 4 - real IP address for a.root-servers.net. is replaced
> ok 5 - real IP address for a.root-servers.net. is correct
> [65536.00][rplan] [qry tree] badname.lan. A (0) <-
> [65536.00][rplan] [push] pending 1; badname.lan. A (0) | resolved 0
> [65536.03][rplan] [qry tree] . DNSKEY (3) <- badname.lan. A (2) <-
> [65536.03][rplan] [push] pending 2; . DNSKEY (3); badname.lan. A (2) | resolved 0
> ```
This is because the `iter_ns_badip.rpl` workaround allows the same query to be pushed to `rplan` twice in a row, which leads to multiple tries, with back-off of the timeout, to resolve `. DNSKEY` or `a.root-servers.net AAAA` (if DNSSEC is turned off). The old selection implementation switches to TCP after a few tries; there the connection fails and the NS address is `flagged as 'bad'`.
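
For illustration, the two strategies mentioned here (growing the timeout on every retry versus leaving UDP after a few tries) can be contrasted in a small sketch. All names and constants below are made up for the example and are not kresd defaults or APIs:

```c
enum transport { TRANSPORT_UDP, TRANSPORT_TCP };

struct next_try {
	enum transport proto;
	unsigned timeout_ms;
};

/* Illustration values only. */
enum {
	BASE_TIMEOUT_MS = 400,
	MAX_TIMEOUT_MS = 10000,
	UDP_TRIES_BEFORE_TCP = 3
};

/* Pure back-off: stay on UDP and double the timeout for every
 * consecutive timeout seen so far, up to a cap. */
static struct next_try plan_backoff(unsigned timeouts)
{
	unsigned t = BASE_TIMEOUT_MS;
	for (unsigned i = 0; i < timeouts && t < MAX_TIMEOUT_MS; ++i)
		t *= 2;
	if (t > MAX_TIMEOUT_MS)
		t = MAX_TIMEOUT_MS;
	return (struct next_try){ TRANSPORT_UDP, t };
}

/* Alternative: after a few UDP timeouts, try TCP with the base
 * timeout instead of waiting ever longer on UDP. */
static struct next_try plan_tcp_switch(unsigned timeouts)
{
	if (timeouts >= UDP_TRIES_BEFORE_TCP)
		return (struct next_try){ TRANSPORT_TCP, BASE_TIMEOUT_MS };
	return plan_backoff(timeouts);
}
```

With these numbers, pure back-off waits 3200 ms on the fourth attempt, whereas the second plan would already have moved to TCP, where a failed connection gives a quicker signal that the address is unusable.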
Switching to TCP instead of backing off into big timeouts might be a good idea, which might even help with the pathological cases that appear in `respdiff` now.

Milestone: 5.3.0

remove SAFEMODE
https://gitlab.nic.cz/knot/knot-resolver/-/issues/640
Updated: 2021-02-09T13:54:01+01:00
Author: Štěpán Balážik

I have no real solution in mind, I'll just keep a running list of what `SAFEMODE` does here, since I have been bitten in the backparts by it multiple times and the documentation really doesn't cut it (“Don’t use fancy stuff (EDNS, 0x20, …)”).
* turns off `0x20` randomization
* turns off server selection (to be changed in !1030)
* turns off some EDNS stuff that I don't understand
* ensures that there is a retry after REFUSED (see code below; this means that if you overwrite `query->SAFEMODE` after this, the resolver may cycle on REFUSED)
```c
static int resolve_badmsg(knot_pkt_t *pkt, struct kr_request *req, struct kr_query *query)
{
#ifndef STRICT_MODE
	/* Work around broken auths/load balancers */
	if (query->flags.SAFEMODE) {
		return resolve_error(pkt, req);
	} else if (query->flags.NO_MINIMIZE) {
		query->flags.SAFEMODE = true;
		return KR_STATE_DONE;
	} else {
		query->flags.NO_MINIMIZE = true;
		return KR_STATE_DONE;
	}
#else
	return resolve_error(pkt, req);
#endif
}
```
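
The escalation in the function above can be modelled in isolation to see why clearing the flag afterwards is risky. This is a simplified sketch with stand-in types, assuming (as the list above describes) that returning `KR_STATE_DONE` here results in another attempt; `flags_stub`, `on_badmsg` and the enum are hypothetical names:

```c
#include <stdbool.h>

/* Stand-ins for the two query flags used by resolve_badmsg(). */
struct flags_stub {
	bool NO_MINIMIZE;
	bool SAFEMODE;
};

enum badmsg_action { BADMSG_RETRY, BADMSG_GIVE_UP };

/* Mirrors the non-STRICT_MODE branch: first disable QNAME
 * minimization, then enable SAFEMODE, and only then report an error. */
static enum badmsg_action on_badmsg(struct flags_stub *f)
{
	if (f->SAFEMODE)
		return BADMSG_GIVE_UP;
	if (f->NO_MINIMIZE)
		f->SAFEMODE = true;
	else
		f->NO_MINIMIZE = true;
	return BADMSG_RETRY;
}
```

Three bad answers in a row walk the ladder retry, retry, give up. If some other code resets `SAFEMODE` to false between calls, the last step is never reached and every bad answer triggers another retry, which is the cycling described above.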
Removing it is probably a better idea: especially with the new server selection error reporting, we could probably make the workarounds more granular than they are now.

Assignee: Štěpán Balážik