Incoming IXFR with on-slave signing sometimes leads to memory corruption
Consider Knot 2.6.7-1+020180710153240.24+stretch1.gbpfa6f52 (upstream Debian package) configured as an on-slave signer that pulls zones from BIND9 on loopback port 53535:
server:
    listen: 0.0.0.0@53
    listen: ::@53
log:
  - target: syslog
    any: info
remote:
  - id: master
    address: ::1@53535
acl:
  - id: acl_slave
    address: ::1
    action: transfer
  - id: acl_master
    address: ::1
    action: notify
policy:
  - id: ecdsa_fast
    zsk-lifetime: 1h
    propagation-delay: 10s
    rrsig-lifetime: 2h
    rrsig-refresh: 1h
    nsec3: on
template:
  - id: default
    master: master
    dnssec-signing: on
    dnssec-policy: ecdsa_fast
    acl: acl_slave
    acl: acl_master
zone:
  - domain: "example.com."
Configuration of BIND looks like this:
options {
    directory "/var/cache/bind";
    listen-on { none; };
    listen-on-v6 port 53535 { ::1; };
    ixfr-from-differences yes;
    allow-transfer { localhost; };
    also-notify { ::1; };
};
zone "example.com" { type master; file "example.com.zone"; };
The zone file example.com.zone looks like this:
$TTL 60
@       IN SOA ns hostmaster 20 120 60 3600 10
        IN NS  ns
ns      IN A   192.0.2.0
test    IN TXT "test"
test2   IN TXT "test2"
test3   IN TXT "test3"
Everything works as expected:
# knotc zone-status example.com
[example.com.] role: slave | serial: 21 | transaction: none | freeze: no | refresh: +1m22s | update: not scheduled | expiration: +59m22s | journal flush: not scheduled | notify: not scheduled | DNSSEC re-sign: +59m22s | NSEC3 resalt: +29D23h59m22s | parent DS query: not scheduled
After a while, or after issuing knotc zone-sign example.com, the zone gets re-signed and its serial number increases.
But then you delete a record from the example.com zone file, say test2, and increase the serial of the unsigned zone to 21. After issuing rndc reload example.com, bad things start to happen:
named[19402]: received control channel command 'reload example.com'
named[19402]: zone example.com/IN: loaded serial 21
named[19402]: zone example.com/IN: sending notifies (serial 21)
knotd[19444]: info: [example.com.] notify, incoming, ::1@37079: received, serial 21
knotd[19444]: info: [example.com.] refresh, outgoing, ::1@53535: remote serial 21, zone is outdated
named[19402]: client ::1#59582 (example.com): transfer of 'example.com/IN': IXFR started (serial 20 -> 21)
knotd[19444]: info: [example.com.] IXFR, incoming, ::1@53535: starting
named[19402]: client ::1#59582 (example.com): transfer of 'example.com/IN': IXFR ended
knotd[19444]: info: [example.com.] IXFR, incoming, ::1@53535: finished, 0.00 seconds, 1 messages, 222 bytes
knotd[19444]: error: [example.com.] DNSSEC, failed to fix NSEC3 chain (no such record in zone found)
knotd[19444]: info: [example.com.] DNSSEC, next signing at 2018-07-23T11:19:30
knotd[19444]: info: [example.com.] refresh, outgoing, ::1@53535: zone updated, serial 22 -> 1929864568
knotd[19444]: warning: [example.com.] failed to update zone file (not enough space provided)
knotd[19444]: error: [example.com.] zone event 'journal flush' failed (not enough space provided)
At this point the zone data in the memory of the knotd process has probably been corrupted. The server responds with SERVFAIL to any DNS query, and the zone status returns seemingly random numbers:
# knotc zone-status example.com
[example.com.] role: slave | serial: 1929864568 | transaction: none | freeze: no | refresh: +51Y10M3D12h54m45s | update: not scheduled | expiration: +57Y12M17h25m37s | journal flush: not scheduled | notify: not scheduled | DNSSEC re-sign: +50m43s | NSEC3 resalt: +29D23h50m43s | parent DS query: not scheduled
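When repeating the test many times, it can help to compare status lines automatically instead of by eye. The following is only a sketch in Python: the function names are hypothetical, it assumes the `|`-separated output format shown above, and the "timer measured in years" check is merely a heuristic for this zone, whose SOA timers are all at most one hour.

```python
def parse_zone_status(line):
    """Split one `knotc zone-status` output line into (zone, fields)."""
    zone, _, rest = line.partition("] ")
    zone = zone.lstrip("[")
    fields = {}
    for part in rest.split(" | "):
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()
    return zone, fields


def has_implausible_timers(fields, keys=("refresh", "expiration")):
    """Heuristic (assumption): a timer expressed in years ('Y') is bogus here."""
    return any("Y" in fields.get(key, "") for key in keys)
```

Feeding it the two status lines from this report should flag only the second one.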
Finally, forcing a zone reload with knotc zone-reload example.com crashes the server due to an invalid pointer. I guess this is just a consequence of the memory corruption.
Please note that this issue is not 100% reproducible. While writing this report, I saw a few cases where everything went smoothly. On the other hand, it reproduces too regularly to be considered a random glitch.
Disabling outgoing IXFR in the BIND process is a workaround to this issue.
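Since the problem is intermittent, the triggering edit (delete one record, bump the SOA serial) may need to be repeated several times. Below is a minimal sketch of that edit in Python. The function name is hypothetical, and it assumes the zone file keeps the SOA record on a single line, as in the example above.

```python
import re

# Matches a single-line SOA record and isolates its serial number.
SOA_RE = re.compile(
    r"^(?P<head>\S+\s+IN\s+SOA\s+\S+\s+\S+\s+)(?P<serial>\d+)(?P<tail>\s.*)$"
)


def drop_record_and_bump_serial(zone_text, owner):
    """Remove all lines whose owner name is `owner` and add 1 to the SOA serial."""
    out = []
    for line in zone_text.splitlines():
        fields = line.split()
        # A line starting with whitespace inherits the previous owner; keep it.
        if fields and not line[0].isspace() and fields[0] == owner:
            continue
        match = SOA_RE.match(line)
        if match:
            serial = int(match["serial"]) + 1
            line = f"{match['head']}{serial}{match['tail']}"
        out.append(line)
    return "\n".join(out) + "\n"
```

After writing the result back to example.com.zone, running rndc reload example.com as described above makes BIND send the NOTIFY that starts the IXFR.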