SERVFAIL when serving from cache, don't know how to debug
I'm running knot-resolver 3.2.1-3~bpo9+1, Debian stretch-backports.
From time to time, resolving a random domain name returns SERVFAIL, and that failure is stored in knot-resolver's cache.
Running dig +trace on such domains usually returns lookup errors even earlier in the chain.
If I clear the cache with cache.clear(), DNS works again as expected.
I don't know how to debug this issue or what the cause could be. How can I provide more logs to help fix it?
My configuration:
user('knot-resolver','knot-resolver')
cache.size = 300 * MB
modules = { 'workarounds < iterate', 'stats', 'bogus_log' }
dofile("/etc/knot-resolver/knot-aliases-alt.conf")
policy.add(
    policy.suffix(
        policy.STUB({'127.0.0.4'}),
        policy.todnames(blocked_hosts)
    )
)
Here /etc/knot-resolver/knot-aliases-alt.conf is a file containing a single blocked_hosts={} table with lots of hosts. It shouldn't affect DNS lookups or this issue.
Before clearing the cache:
# dig jprosto.ru
; <<>> DiG 9.10.3-P4-Debian <<>> jprosto.ru
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 8684
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;jprosto.ru. IN A
;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Apr 28 14:40:51 CEST 2019
;; MSG SIZE rcvd: 39
# dig +trace jprosto.ru
; <<>> DiG 9.10.3-P4-Debian <<>> +trace jprosto.ru
;; global options: +cmd
. 120086 IN NS a.root-servers.net.
. 120086 IN NS b.root-servers.net.
. 120086 IN NS c.root-servers.net.
. 120086 IN NS d.root-servers.net.
. 120086 IN NS e.root-servers.net.
. 120086 IN NS f.root-servers.net.
. 120086 IN NS g.root-servers.net.
. 120086 IN NS h.root-servers.net.
. 120086 IN NS i.root-servers.net.
. 120086 IN NS j.root-servers.net.
. 120086 IN NS k.root-servers.net.
. 120086 IN NS l.root-servers.net.
. 120086 IN NS m.root-servers.net.
. 120086 IN RRSIG NS 8 0 518400 20190506170000 20190423160000 25266 . tRFeXF0ccHkCHTB11jEKDzXtoQtiSrCDX3GRzqyLvl2D5+ML6yqEkYTc e9Bs2sKYmXFk2pdldVbub3n0IQTXAW5MSuWDWqv/WtCA5v6FCCJTXCm+ mGDSKEbTdfLJDfzxYunWUKo1sYCs2d8im5LFs0RJMY/1EIngrJK1ujkj JrSXZjdmlaUv1cTBIXuV/Xn3CansYP3wOwIY3W4fOVYgfLAE1MEvnAUR 0xxjFj1eXNuv3wYE5mYGtumYL1fPHiU/XAIACZj3FWdWiG2loDz/u+ty zGPB6t+Ms7DKbaFp7EiWskWL60zWzxHcd3vxOUL0o0Ic+8csLqL6tO1h zJA3nA==
;; Received 717 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms
ru. 172800 IN NS a.dns.ripn.net.
ru. 172800 IN NS b.dns.ripn.net.
ru. 172800 IN NS d.dns.ripn.net.
ru. 172800 IN NS e.dns.ripn.net.
ru. 172800 IN NS f.dns.ripn.net.
ru. 86400 IN DS 15506 8 2 331CBB1932E7CF201F81AB299EF8711AD7175E8812508679E475930C 2B145C97
ru. 86400 IN RRSIG DS 8 1 86400 20190511050000 20190428040000 25266 . nmGftS2ztiLhDImmEPgPAOnoBKrwOpARMkP03EJ4kyIGgGOESH5ePJDX bKiU74vp68hBetKPC8toxtBCD4Q6s7cYxelSKpuuchAvbT1V+6KQMdMp mhuLc9ix1A0PsmWr78ZrjngKSqmgg4lFW1Kgy1wxnHXicdGeyK4Gk0Tm Fb1AivBjgjnMY/KaV2ylocCKePIW+fT666ReFf2RteIdSTPHwqFfBj3s QuoZS+lSlMPrwM+Npj60hv/BE+B8tTzJxCQuTZf4talUND10ySUuEJqa GuSngvz8UY9HznZTHSyUn21orZggJcdTLFS3CpYsxU6tee4NjHlBG3sT hkvz1Q==
couldn't get address for 'a.dns.ripn.net': failure
couldn't get address for 'b.dns.ripn.net': failure
couldn't get address for 'd.dns.ripn.net': failure
couldn't get address for 'e.dns.ripn.net': failure
couldn't get address for 'f.dns.ripn.net': failure
dig: couldn't get address for 'a.dns.ripn.net': no more
After clearing the cache:
> cache.clear()
[count] => 675199
# dig jprosto.ru
; <<>> DiG 9.10.3-P4-Debian <<>> jprosto.ru
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29704
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;jprosto.ru. IN A
;; ANSWER SECTION:
jprosto.ru. 300 IN A 5.101.152.156
;; Query time: 750 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Apr 28 14:44:18 CEST 2019
;; MSG SIZE rcvd: 55
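Since the failures only show up from time to time, one way to gather more data might be to log the status that dig reports for a few names at regular intervals. Below is a rough sketch of that idea, not something I have running in production. The helper names parse_status and query_status are my own, the resolver is assumed to listen on 127.0.0.1 as in the output above, and the script only reads the "status:" field from dig's header line.

```python
import re
import subprocess
import sys
import time

# Matches the header line dig prints, e.g.
# ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 8684"
STATUS_RE = re.compile(r"->>HEADER<<-.*?status:\s*([A-Z]+)")


def parse_status(dig_output):
    """Return the status name from dig's header line, or None if absent."""
    match = STATUS_RE.search(dig_output)
    return match.group(1) if match else None


def query_status(name, server="127.0.0.1"):
    """Run dig against the given server and return the reported status.

    The server address is an assumption taken from the dig output above.
    """
    result = subprocess.run(
        ["dig", "@" + server, name],
        capture_output=True,
        text=True,
        timeout=30,
        check=False,
    )
    return parse_status(result.stdout)


if __name__ == "__main__":
    # Print one timestamped line per name given on the command line.
    for name in sys.argv[1:]:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, name, query_status(name))
```

Run from cron with a handful of names (for example jprosto.ru), this would at least show when a name flips to SERVFAIL and whether it recovers without cache.clear().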