Knot Resolver issues
https://gitlab.nic.cz/knot/knot-resolver/-/issues
2021-03-11T11:16:16+01:00

https://gitlab.nic.cz/knot/knot-resolver/-/issues/669
For STUB/FORWARD add an option to select servers in the order of appearance on the policy.STUB/FORWARD list
2021-03-11T11:16:16+01:00
Štěpán Balážik

Currently the choice of forwarding target is always left to the server selection algorithm, which breaks the following setup
```
zone = policy.todnames({'example.com'})
policy.add(policy.suffix(policy.FLAGS({'NO_CACHE'}), zone))
policy.add(policy.suffix(policy.STUB({'<local-resolver>', '<public-resolver>'}), zone))
```
for users who relied on the undocumented behavior that `<local-resolver>`, being first on the list, was chosen almost exclusively for queries when available.
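The behavior those setups relied on amounts to strict ordered failover. A minimal Python sketch of that idea follows; the function name and the `is_available` callback are hypothetical and are not part of the Knot Resolver API:

```python
from typing import Callable, Optional, Sequence


def pick_in_list_order(
    targets: Sequence[str],
    is_available: Callable[[str], bool],
) -> Optional[str]:
    """Return the first available target, ignoring RTT estimates entirely."""
    for target in targets:
        if is_available(target):
            return target
    return None  # no target is currently usable


# Hypothetical availability map; the first entry wins whenever it is up.
up = {"<local-resolver>": True, "<public-resolver>": True}
targets = ["<local-resolver>", "<public-resolver>"]
first_choice = pick_in_list_order(targets, lambda t: up[t])

# Once the first entry goes down, selection falls through to the next one.
up["<local-resolver>"] = False
fallback_choice = pick_in_list_order(targets, lambda t: up[t])
```

With this rule the public resolver is only contacted while the local one is unavailable, regardless of which of the two answers faster.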
I suggest adding an option to turn off the choice based on RTT estimates and instead select servers in the order in which they appear on the list.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/659
Validation still happens in zones set as insecure, sometimes
2020-12-20T18:15:44+01:00
Štěpán Balážik

While fixing Deckard tests for !1030 I encountered this weird behavior:
Even with `trust_anchors.set_insecure({"net.",})` in the configuration, Resolver still tries to validate records for `k.root-server.net`.
Attached are the artifacts from the Deckard run (including a `rpl` file and log). [lidovky_insecure_net.zip](/uploads/9b37a29f4874e74d281f721d17dc9a6b/lidovky_insecure_net.zip)
This is _probably_ related to #626.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/658
Fetch NS names and glue from both parent and child zones (in some way)
2021-01-04T11:28:06+01:00
Štěpán Balážik

After !1097, Knot Resolver is properly parent-centric in the resolution.
I recently fixed `iter_pcnamech.rpl` in deckard!207 to actually test something and it requires a query to the child zone to discover an NS name/address to pass.
Moreover https://tools.ietf.org/html/draft-ietf-dnsop-ns-revalidation-00#section-3 points in the direction of querying the child zone as well.
Blocks deckard!207.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/651
dnstap module spawns a thread
2020-12-07T11:00:36+01:00
Vladimír Čunát (vladimir.cunat@nic.cz)

That's not consistent with kresd architecture, though I can't think of a particular reason why it might cause a problem. Note that this thread will get spawned for each kresd process, so it might be a bit wasteful.
We might prefer to rewrite the module by utilizing the shared libuv loop (to know when the socket is ready to receive more data), but maybe the [fstrm tools](https://farsightsec.github.io/fstrm/overview.html) don't provide good support for that. If we drop the thread, this library might not be worth depending on anymore (as the framing is trivial).

https://gitlab.nic.cz/knot/knot-resolver/-/issues/648
server selection: implement a way to do asynchronous NS name resolution
2020-11-30T14:11:28+01:00
Štěpán Balážik

The following discussion from !1030 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1030#note_184348): (+6 comments)
> I do not see this flag in use. Is it intentional?

https://gitlab.nic.cz/knot/knot-resolver/-/issues/647
server selection: collect and use TCP connection information
2021-11-08T13:39:08+01:00
Štěpán Balážik

The following discussion from !1030 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1030#note_184337): (+3 comments)
> I'm either blind or it is not used anywhere. Can you point me to the place where it gets used, please?
`tcp_waiting` and `tcp_connected`, along with the respective function and its calls, have been commented out (in 6ef74faf922c5962401747b5aa3a9e01e92e50ff) until we use this information in the server selection process.
This will ultimately be related to #629, for example.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/638
[discussion] cache backend redesign
2020-12-04T16:34:21+01:00
Petr Špaček

Let's discuss the problems we have with the current LMDB-based cache backend. We need to analyze whether these are fixable or whether we need to redesign the cache backend.
Problems with LMDB itself
- Database overfill leads to an irrecoverable state in which the whole DB practically becomes read-only, and the only ways forward are to either enlarge the database or delete it. Together with the inability to detect whether committing a transaction will lead to this state, this prevents us from reliably keeping the cache at a constant size, leading to race conditions in overflow handling etc. (#605)
- Transactions have [undefined limits](https://lists.openldap.org/hyperkitty/list/openldap-technical@openldap.org/message/VI7K5NWV46J6DACITXVS7X2SM3HZIXVB/) on them, forcing us to [jump through hoops](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042/diffs?commit_id=c651fbf24017f26435b86e69e9ce73c7f5976b97).
- LMDB depends on unique PID values - this assumption does not hold when sharing cache across containers (#637).
Other cache-related problems: #602, #604

https://gitlab.nic.cz/knot/knot-resolver/-/issues/635
ci: add respdiff tests for XDP
2020-10-30T15:21:26+01:00
Tomas Krizek

XDP should be tested on real interfaces, which requires some changes to respdiff configuration (using real interface instead of loopback, root privileges, ...). This might be easier to achieve once we simplify our testing infrastructure. (https://gitlab.nic.cz/knot/knot-resolver-ansible/-/issues/3)

https://gitlab.nic.cz/knot/knot-resolver/-/issues/631
remove deprecated -f/--forks option
2020-10-27T17:13:01+01:00
Tomas Krizek

Problems with `--forks` feature:
- Does not support dynamic restart (related: #268)
- Does not support watchdog
- First process is single point of failure
- Per-instance configuration via environment variables is harder
- Fixing this practically means re-implementing systemd or supervisord, which is obviously a bad idea.
Related: #529
Task list:
- [ ] remove `-f` option and related forking code
- [ ] `worker.count` should also be removed
- [ ] remove -f usage from all testing scripts, deckard, respdiff etc.
- [ ] update our benchmarking docker image to be able to run multiple kresd instances without `-f`

6.0.0

https://gitlab.nic.cz/knot/knot-resolver/-/issues/630
daf: improve multi-instance support
2020-10-23T12:02:33+02:00
Tomas Krizek

Currently, the DAF module can work when using multiple instances, but only as long as:
- all the instances are started before any rules are configured
- no instance is ever separately restarted (or crashes)
This could be improved by:
- using deterministic IDs that are tied to the rule (e.g. a hash)
- having some mechanism that can be used to pull/push the entire current configuration instead of a single update (to sync an instance state with others after restart)

https://gitlab.nic.cz/knot/knot-resolver/-/issues/629
early detection for dropped answers over TCP connection
2021-12-08T10:24:06+01:00
Petr Špaček

Problem
=======
Currently, individual DNS queries over a TCP connection do not have a per-query timer, and we leave it to the TCP stack to handle packet loss. This works fine for network-level problems but does not work for queries dropped at the application level.
Issue seen in the field: #551
I.e. queries are dropped on the server side and clients get SERVFAIL once the whole TCP connection times out.
Another instance of this problem is Unbound's default limit on the number of queries resolved in parallel over a single TCP connection: before commit https://github.com/NLnetLabs/unbound/commit/f81d0ac0474cc8904e1240a512b935c8e466f81b Unbound would process only 32 queries in parallel and keep other queries on the same TCP connection hanging, potentially leading to long periods without responses.
Vague proposal
==============
- Use per-query timeout also for queries over TCP/TLS/HTTPS and evaluate if the query should be resent using other transport if it times out.
- Detect "suspicious" TCP connection states when deduplicating connections and skip over "suspicious" connections. For example, do not reuse connection if it has queries hanging on it for longer than 3 seconds.
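The second point can be illustrated with a small Python sketch. The class and function names are hypothetical bookkeeping invented for this example, not kresd structures; only the 3-second threshold comes from the proposal above:

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional

HANG_LIMIT_S = 3.0  # threshold taken from the example in the proposal


@dataclass
class TcpConnection:
    """Hypothetical bookkeeping for one upstream TCP connection."""

    name: str
    # query id -> monotonic timestamp at which the query was sent
    pending: Dict[int, float] = field(default_factory=dict)

    def is_suspicious(self, now: float, limit: float = HANG_LIMIT_S) -> bool:
        """True if any query has waited for an answer longer than `limit`."""
        return any(now - sent > limit for sent in self.pending.values())


def pick_reusable(
    conns: Iterable[TcpConnection], now: Optional[float] = None
) -> Optional[TcpConnection]:
    """Return the first connection without long-hanging queries, else None."""
    if now is None:
        now = time.monotonic()
    for conn in conns:
        if not conn.is_suspicious(now):
            return conn
    return None  # the caller would open a new connection or switch transport


stuck = TcpConnection("stuck", pending={1: 0.0})      # waiting since t=0
healthy = TcpConnection("healthy", pending={2: 9.5})  # waiting since t=9.5
chosen = pick_reusable([stuck, healthy], now=10.0)
```

At `now=10.0` the first connection has a query that has been hanging for 10 seconds, so it is skipped and the second one is reused.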
TODO: Is there some other TCP-level tuning we can do?
Related: #447

https://gitlab.nic.cz/knot/knot-resolver/-/issues/624
Graph not shown in web management (webmgmt)
2020-10-12T09:33:09+02:00
Ghost User

I am running the web management service on Knot Resolver, but the graph is not shown. I inspected the page and found the following errors:
**Screenshot of Error:**
![knot-webmgmt-0](/uploads/ae028abbd19e8a33a6544c70f393970e/knot-webmgmt-0.png)
![knot-webmgmt-1](/uploads/8d37984831fc956075084c5588472ad9/knot-webmgmt-1.png)
**Error Log:**
```
DevTools failed to load SourceMap: Could not load content for http://127.0.0.1:8053/dist/dygraph.min.js.map: HTTP error: status code 404, net::ERR_HTTP_RESPONSE_CODE_FAILURE
dygraph.min.js:5 Can't plot empty data set
Q.parseArray_ @ dygraph.min.js:5
Q.start_ @ dygraph.min.js:5
Q.__init__ @ dygraph.min.js:4
Q @ dygraph.min.js:4
(anonymous) @ kresd.js:89
mightThrow @ jquery.js:2
process @ jquery.js:2
setTimeout (async)
(anonymous) @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
fire @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
ready @ jquery.js:2
completed @ jquery.js:2
jquery.js:2 jQuery.Deferred exception: chartElement is not defined ReferenceError: chartElement is not defined
at HTMLDocument.<anonymous> (http://127.0.0.1:8053/kresd.js:357:2)
at mightThrow (http://127.0.0.1:8053/jquery.js:2:15044)
at process (http://127.0.0.1:8053/jquery.js:2:15698) undefined
jQuery.Deferred.exceptionHook @ jquery.js:2
process @ jquery.js:2
setTimeout (async)
(anonymous) @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
fire @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
ready @ jquery.js:2
completed @ jquery.js:2
jquery.js:2 Uncaught ReferenceError: chartElement is not defined
at HTMLDocument.<anonymous> (kresd.js:357)
at mightThrow (jquery.js:2)
at process (jquery.js:2)
(anonymous) @ kresd.js:357
mightThrow @ jquery.js:2
process @ jquery.js:2
setTimeout (async)
jQuery.readyException @ jquery.js:2
(anonymous) @ jquery.js:2
mightThrow @ jquery.js:2
process @ jquery.js:2
setTimeout (async)
(anonymous) @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
fire @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
process @ jquery.js:2
setTimeout (async)
(anonymous) @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
fire @ jquery.js:2
fire @ jquery.js:2
fireWith @ jquery.js:2
ready @ jquery.js:2
completed @ jquery.js:2
```
**Knot Resolver Configuration:**
```
-- Network interface configuration
net.listen('127.0.0.1', 53, { kind = 'dns' })
net.listen('127.0.0.1', 853, { kind = 'tls' })
net.listen('127.0.0.1', 8053, { kind = 'webmgmt' })
-- Load useful modules
modules = {
    'policy',
    'http'
}
-- Cache size
cache.size = 1 * GB
-- Forward to upstream servers (8.8.8.8 and 1.1.1.1) using DoT
policy.add(policy.all(policy.TLS_FORWARD({
    {'8.8.8.8', hostname='dns.google'},
    {'1.1.1.1', hostname='cloudflare-dns.com'}
})))
```
**Knot Resolver Version:**
```
root@engine:/etc/knot-resolver# apt-cache policy knot-resolver
knot-resolver:
Installed: 5.1.3-2
Candidate: 5.1.3-2
Version table:
*** 5.1.3-2 500
500 http://download.opensuse.org/repositories/home:/CZ-NIC:/knot-resolver-latest/xUbuntu_20.04 Packages
100 /var/lib/dpkg/status
3.2.1-3ubuntu2 500
500 http://kambing.ui.ac.id/ubuntu focal/universe amd64 Packages
```
Thank You.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/621
always keep RRSIG and its RRset in single data structure
2020-10-07T18:04:01+02:00
Petr Špaček

Problem: At the moment RRset and its RRSIG are two independent `knot_rrset_t` structures.
This leads to problems like !1072 where things get mixed and weird things happen after that.
Idea: Refactor code so RRset is always tied to all associated RRSIGs (multiple of them!).
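One possible shape for such a structure, as a rough Python sketch. The real code is C built around `knot_rrset_t`; the `SignedRRset` class and its fields below are hypothetical and only illustrate keeping the signatures attached to the data they cover:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SignedRRset:
    """Hypothetical container: one RRset plus every RRSIG that covers it."""

    owner: str
    rrtype: str
    rdata: List[str]
    rrsigs: List[str] = field(default_factory=list)

    def add_rrsig(self, covered_type: str, signature: str) -> None:
        """Attach a signature, refusing one that covers a different type."""
        if covered_type != self.rrtype:
            raise ValueError(f"RRSIG covers {covered_type}, not {self.rrtype}")
        self.rrsigs.append(signature)


rrset = SignedRRset("example.com.", "A", ["192.0.2.1"])
rrset.add_rrsig("A", "sig-from-key-1")
rrset.add_rrsig("A", "sig-from-key-2")  # several RRSIGs per RRset are allowed
```

Because the signatures travel with the RRset, a mismatched pairing is rejected at the point of insertion rather than discovered later.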
An investigation into how this could be done in the most efficient way is needed.
Maybe this approach could be beneficial also to libknot/Knot DNS so let's not forget to talk to them.
Cc @lpeltan @dsalzman and gang.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/606
incorporate DNS Shotgun into kresd CI
2020-10-30T11:55:49+01:00
Petr Špaček

The following discussion from !1054 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1054#note_169587): (+1 comment)
> @tkrizek Do you see a way to add this scenario into pytests/connection tests?

https://gitlab.nic.cz/knot/knot-resolver/-/issues/605
cache: explore better ways to detect cache changes made by other processes
2020-11-04T11:53:32+01:00
Petr Špaček

kresd 5.2.0 does a periodic check which might take too long on very busy systems. Maybe we could use some event-based mechanism?
See [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_168310).
The following discussion from !1042 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_168310): (+1 comment)
> Why not use https://docs.libuv.org/en/v1.x/guide/filesystem.html#file-change-events ?

https://gitlab.nic.cz/knot/knot-resolver/-/issues/603
cache: get rid of mdb_env_sync()
2020-09-07T17:52:07+02:00
Petr Špaček

Explicit cache sync does not seem necessary and might be counterproductive, see other comments in the thread:
The following discussion from !1042 should be addressed:
- [ ] @pspacek started a [discussion](https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1042#note_169608): (+1 comment)
> Out of curiosity, why the sync is necessary here?

https://gitlab.nic.cz/knot/knot-resolver/-/issues/598
ability to reload ssl certificate on certificate change
2020-11-25T09:26:51+01:00
TomVnz

I was looking into doing this automatically, but it seems there is no cohesive way within knot-resolver.
I played around with using the control socket options, but it's a bit messy, e.g. use:
<code>net.close('0.0.0.0')
http.config({tls = true, cert = "\<CERT\>", key = "\<KEY\>"}, '<webmgmt|doh>') --for DoH|webmgmt
net.listen('0.0.0.0', 53, { kind = 'dns' })
net.listen('0.0.0.0', 443, { kind = 'doh' })
net.listen('0.0.0.0', 853, { kind = 'tls' })
net.listen('0.0.0.0', 8453, { kind = 'webmgmt' })
net.tls("\<CERT\>", "\<KEY\>") --for DoT
</code>
But if knot-resolver is running as an unprivileged user, it can't rebind to privileged ports. And this needs to be scripted somehow.
An alternative way would be for the process that creates the new SSL certificates to restart knot-resolver but then that process would need to run as root.
So for now, I'm using a custom systemd path / service combo to monitor certificate file for any changes and then reload knot-resolver that way.
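The watch-and-reload idea itself is simple. As a rough illustration only (this is not the systemd setup described above), here is a polling sketch in Python; `CertWatcher` and the `on_change` callback are hypothetical names, and what the callback does to reload the resolver is left open:

```python
import os
import tempfile
from typing import Callable, Tuple

Fingerprint = Tuple[int, int]  # (mtime in nanoseconds, size in bytes)


def fingerprint(path: str) -> Fingerprint:
    """Cheap change indicator built from file metadata only."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


class CertWatcher:
    """Calls `on_change(path)` when the file's fingerprint changes."""

    def __init__(self, path: str, on_change: Callable[[str], None]) -> None:
        self.path = path
        self.on_change = on_change
        self.last = fingerprint(path)

    def poll(self) -> bool:
        """Check once; return True if a change was detected and reported."""
        current = fingerprint(self.path)
        if current == self.last:
            return False
        self.last = current
        self.on_change(self.path)
        return True


# Demonstration on a temporary file standing in for the certificate.
reloads = []
with tempfile.TemporaryDirectory() as tmp:
    cert = os.path.join(tmp, "cert.pem")
    with open(cert, "w") as f:
        f.write("old certificate")
    watcher = CertWatcher(cert, reloads.append)
    unchanged = watcher.poll()
    with open(cert, "w") as f:
        f.write("new, longer certificate")
    changed = watcher.poll()
```

An event-based mechanism such as a systemd path unit avoids the polling interval, at the cost of depending on the service manager.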
I would be keen to hear any thoughts on simplifying this, or perhaps the ability to reload the certificate could be added to knot-resolver itself. I know rpz files are monitored and reloaded when changed, so this seems somewhat similar.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/593
64-bit ARM: remaining issues
2020-10-01T10:53:36+02:00
Santiago

(EDITed)
It's still possible to run into `bad light userdata pointer` errors, possibly hidden under
`missing luajit package: cqueues`. For summary see this post below: https://gitlab.nic.cz/knot/knot-resolver/-/issues/593#note_165359
- - -
#### Original post
Hi there,
It seems to be known that kresd doesn't work on arm64, but I haven't found this particular build error documented (so sorry for the possible noise). knot-resolver 5.1.x doesn't build on Debian due to a luajit error (bad light userdata pointer). The full build log is at https://buildd.debian.org/status/fetch.php?pkg=knot-resolver&arch=arm64&ver=5.1.2-1&stamp=1596037546&raw=0
And this is the relevant part:
````
...
Message: --- config_tests dependencies ---
Running command: /usr/bin/luajit -l cqueues -e os.exit(0)
--- stdout ---
--- stderr ---
/usr/bin/luajit: bad light userdata pointer
stack traceback:
[C]: at 0xffffb6342ad0
[C]: in function 'require'
/usr/share/lua/5.1/cqueues.lua:2: in function </usr/share/lua/5.1/cqueues.lua:1>
[C]: at 0xaaaae1757d08
[C]: at 0xaaaae170a4c0
../tests/meson.build:27:4: ERROR: Problem encountered: missing luajit package: cqueues
````
Cheers,
-- Santiago

https://gitlab.nic.cz/knot/knot-resolver/-/issues/578
test aggressive cache on NSEC3PARAM rotation
2020-08-20T10:05:40+02:00
Vladimír Čunát (vladimir.cunat@nic.cz)

I don't think we have any tests on that in particular, though the code's been deployed for a long time. Still, most of the possible failures I can imagine should only lead to insufficient caching.
Hints around how the implementation works:
- NSEC3PARAM is the [data collected](https://tools.ietf.org/html/rfc5155#section-4.2) but it's taken from NSEC3 records directly.
- For this purpose, using NSEC is like one more possible NSEC3PARAM configuration.
- Reading from cache is designed to consider the last two NSEC3PARAMs that have been written for that zone.
- Code reference: identifiers containing `nsec_p`.

https://gitlab.nic.cz/knot/knot-resolver/-/issues/573
net.tls() allow usage of multiple certificates
2020-10-08T11:43:59+02:00
Tomas Krizek

ECC certificates provide superior performance to RSA keys of comparable security. Supporting multiple certificate files in `net.tls()` could lead to improved DNS-over-TLS performance without sacrificing compatibility with older clients, if both ECC and RSA certificates could be used simultaneously.
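The selection logic this would need can be sketched in a few lines of Python. This is only an illustration of the idea: `choose_certificate` and the file names are hypothetical, and in practice the TLS library would normally make this choice during the handshake based on what the client advertises:

```python
from typing import Dict, Iterable


def choose_certificate(client_sig_algs: Iterable[str], certs: Dict[str, str]) -> str:
    """Prefer the ECC certificate when the client can verify ECDSA signatures."""
    supports_ecdsa = any(alg.startswith("ecdsa_") for alg in client_sig_algs)
    if supports_ecdsa and "ecdsa" in certs:
        return certs["ecdsa"]
    return certs["rsa"]  # fallback that keeps older clients working


# Hypothetical certificate files; algorithm names follow TLS 1.3 naming.
certs = {"ecdsa": "server-ecdsa.crt", "rsa": "server-rsa.crt"}
modern = choose_certificate(["ecdsa_secp256r1_sha256", "rsa_pss_rsae_sha256"], certs)
legacy = choose_certificate(["rsa_pkcs1_sha256"], certs)
```

Clients that advertise ECDSA get the cheaper ECC certificate, while RSA-only clients are still served.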