Commit ec46da16 authored by Petr Špaček
EDNS compliance scanner for DNS zones
=====================================
This repo contains a set of scripts that scan all domains delegated from a single
zone file for EDNS compliance problems and evaluate the practical impact of
DNS Flag Day 2019 (see https://dnsflagday.net/) on a particular zone.
Example
-------
Please see the methodology in doc/methodology.rst for a full understanding of the
output from evalzone.py::
Mode | Permissive (<= 2018) | Strict (2019+)
---------------------------------------------------------------
Ok | 917898 71.07 % | 917892 71.07 %
Compatible | 171619 13.29 % | 171614 13.29 %
High latency | 101727 7.88 % | 96388 7.46 %
Dead | 100287 7.76 % | 105637 8.18 %
Methodology
===========
This section roughly describes the algorithm used to categorize domains.
Please note that categorization depends on the "mode", i.e. results differ
between the situation before DNS Flag Day 2019 and after it. See below.
Assumptions
-----------
Beware that the algorithm is optimized using the following assumption:
EDNS support on a given IP address does not depend on the domain name
used for the test, as long as the IP address is authoritative
for the domain.
E.g. if two zones example.com. and example.net. are hosted at
the same IP address 192.0.2.1, the IP address is expected to exhibit
the same behavior whether the tests are done with example.com.
or example.net.
This assumption allows us to test each IP address just N times
instead of N*(number of domains hosted on that IP address).
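The caching this assumption enables can be sketched as follows. This is a minimal illustration, not the repository's actual code; the names `scan` and `test_ip` are hypothetical:

```python
def scan(domain_to_ips, test_ip):
    """Test every unique IP once and reuse the result for all domains.

    domain_to_ips: mapping domain -> set of NS IP addresses
    test_ip: callable performing the (expensive) per-IP EDNS test
    """
    cache = {}    # ip -> test result, computed at most once
    results = {}  # domain -> {ip: result}
    for domain, ips in domain_to_ips.items():
        for ip in ips:
            if ip not in cache:
                cache[ip] = test_ip(ip)
            results.setdefault(domain, {})[ip] = cache[ip]
    return results
```

With M domains sharing one NS address, `test_ip` runs once instead of M times, which is what makes scanning a large TLD zone feasible.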
Algorithm
---------
1. Each delegation (NS record) in the zone file is converted to a mapping
domain => set of NS names => set of IP addresses
using glue data from the DNS zone plus a local resolver for names which do not
have glue in the zone.
2. Each individual name server IP address is tested to check whether the NS
responds authoritatively for the given domain. An IP is considered "dead"
if it does not respond at all to the plain DNS query "domain. NS"
or if it is not authoritative for the given domain.
3. Each NS IP address which is authoritative for at least one domain
is then tested for EDNS compliance using the genreport tool by ISC.
Each IP address is tested once per pass, i.e. an NS which is
authoritative for 300k domains in the zone will be tested only once.
4. The EDNS test can be repeated multiple times to eliminate the effect
of random network glitches on the overall result.
Multiple runs of an individual test for a single IP are combined using
simple majority. This should eliminate random network failures.
E.g. if the genreport test 'edns' times out once and passes ('ok') 9 times,
only the 'ok' result is used.
5. For each IP address its individual EDNS test results from genreport
are combined together to get overall state of that particular IP address:
- if all tests are 'ok' -> overall result is 'ok'
- further analysis ignores EDNS version 1 because it has no impact
on the 2019 DNS flag day (i.e. only plain DNS and EDNS 0 are considered)
- if no result is 'timeout' -> overall result is 'compatible'
(it does not support all EDNS features but at least it does not break
in face of EDNS 0 query)
- further categorization depends on "mode", see below.
6. Evaluation in the "permissive" mode = state before the DNS Flag Day 2019:
- IP addresses which pass only the basic test 'dns' but fail other tests
with 'timeout' will eventually work and are categorized as 'high_latency'.
- IP addresses which do not pass even 'dns' test are categorized as 'dead'.
7. Evaluation in the "strict" mode = state after the DNS Flag Day 2019:
- IP addresses which fail any DNS or EDNS 0 test with 'timeout'
are categorized as 'dead'. In strict mode there is no 'high_latency'
caused by EDNS non-compliance.
8. Results for individual IP addresses are combined to overall result for each
domain delegated in the zone file:
- domains without any working authoritative NS are 'dead'
- remaining domains with all NS IP addresses 'dead' are 'dead'
(IP evaluation depends on mode, see above)
- remaining domains with at least one un-resolvable NS IP address
+ remaining domains with at least one NS IP address 'dead'
are 'high_latency' (resolvers must retry queries)
- remaining domains have their results set to the worst result
from their respective NS set
(e.g. 2 IP addresses 'ok' + 2 'compatible' => 'compatible')
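Steps 4-8 above can be sketched roughly as follows. This is a simplified illustration of the categorization logic, not the repository's actual code; the function names and the per-test result encoding (`"ok"` / `"timeout"`) are assumptions:

```python
from collections import Counter

SEVERITY = ["ok", "compatible", "high_latency", "dead"]  # best to worst

def majority(runs):
    # step 4: combine repeated runs of one test on one IP by simple majority
    return Counter(runs).most_common(1)[0][0]

def classify_ip(tests, strict):
    # tests: test name -> majority result, e.g. {"dns": "ok", "edns": "timeout"}
    if all(r == "ok" for r in tests.values()):
        return "ok"                       # step 5: fully compliant
    if "timeout" not in tests.values():
        return "compatible"               # step 5: imperfect but not breaking
    if tests.get("dns") == "ok" and not strict:
        return "high_latency"             # step 6: permissive mode tolerates timeouts
    return "dead"                         # steps 6-7: strict mode, or 'dns' fails too

def classify_domain(ip_states):
    # step 8: combine per-IP results into one result for the domain
    if not ip_states or all(s == "dead" for s in ip_states):
        return "dead"
    if "dead" in ip_states:
        return "high_latency"             # resolvers must retry queries
    return max(ip_states, key=SEVERITY.index)  # worst result from the NS set
```

For instance, a domain whose name servers are 'ok' and 'compatible' ends up 'compatible', while one dead NS out of several pushes the domain to 'high_latency'.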
Limitations
-----------
1. This toolchain tests EDNS compliance only for DNS delegations in the given zone
and does not evaluate any other data.
For example, the DNS domain `example.com.` might contain this CNAME record:
`www.example.com. CNAME broken.cdn.test.`
If the tested zone file contains the delegation `example.com. NS`,
the result will show only the state of `example.com.`'s DNS servers
but will not reflect the state of the target CDN, which might be the source
of EDNS compliance problems. As a result, the domain `example.com.`
could be categorized as `ok` while an application running on `www.example.com.`
is unavailable because of a dependency on a broken CDN.
2. Anycast routing limits what can be tested from a single vantage point.
It is technically possible for authoritatives to use different implementations
in different anycast domains.
3. Of course, when evaluating impact it needs to be taken into account that
not all domains are equally important for users.
Prerequisites
=============
Environment requirements
------------------------
Before testing please make sure that:
- IPv4 and IPv6 connectivity actually works
- no firewall or middlebox on the network is filtering DNS packets
- IP fragments can be delivered back to the testing machine
The scanner tool does not check any of these, and failure to provide a
"clean" network path will significantly skew the results.
Software dependencies
---------------------
1. The EDNS compliance test for a single domain is actually done by
ISC's tool genreport, which is available from this URL:
https://gitlab.isc.org/isc-projects/DNS-Compliance-Testing
2. python-dns library for Python 3
Beware: the latest version of the python-dns library will not work on
zone files containing some non-ASCII values unless they are canonicalized first.
This breaks processing on certain TLDs, so always canonicalize the zone file.
3. ldns command line tool ldns-read-zone for zone canonicalization
and to strip out unnecessary data.
Usage
=====
There are two main ways to use the scanner:
a. Use the script "allinone.py" to automate the whole scan.
This is the recommended method and the easiest to use.
b. Run individual parts of the scan by hand, which allows you to inspect
the results of each individual run, control the number
of test cycles, etc.
This is normally not required and is intended for development and debugging.
Preparation
-----------
0. Beware! Processing a huge zone file requires several gigabytes
of operating memory and it might take tens of minutes
to convert the data from text to binary. Use a beefy machine.
1. All the tools work with data in the current working directory.
Make sure it is writable and has enough free space (comparable
to the size of the original zone file).
2. Make sure all prerequisites are met.
It is important to check network requirements listed
in file doc/prerequisites.rst!
Once the network is ready it might be easiest to use the Docker image from CZ.NIC:
$ sudo docker run --network=host -v /home/test:/data registry.labs.nic.cz/knot/edns-zone-scanner/prod
3. Canonicalize the zone file and strip out unnecessary data
to speed up further processing:
$ ldns-read-zone -E SOA -E NS -E A -E AAAA input_zone > zone.nodnssec
Full scan
---------
Usage:
$ allinone.py <canonicalized zone file> <zone origin>
E.g.
$ allinone.py zone.nodnssec example.net.
Reading results
---------------
Statistical results are stored in files summary.csv and summary.txt.
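For machine processing, summary.csv can be loaded with Python's csv module. This is a hedged sketch: the column layout is whatever evalzone.py produces, so the reader below takes the field names from the file's own header line rather than assuming them:

```python
import csv

def read_summary(path):
    """Load summary.csv rows as dicts keyed by the file's header line.

    No particular column layout is assumed; DictReader uses the first
    row of the CSV as the field names.
    """
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```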
Manual run
----------
An alternative to the "allinone.py" script is to run the individual tools in sequence
to get statistical results for the whole zone.
Steps which require communication across the network should be
run multiple times to smooth out network glitches like timeouts.
(This repetition is normally done by the allinone script; the individual
tools do not automate it.)
With all this in mind you can use the following script.
Please read comments below and report bugs to CZ.NIC Gitlab:
https://gitlab.labs.nic.cz/knot/edns-zone-scanner/issues
Commands
--------
(Do not read this in Gitlab's web interface, it is ugly!)
# get zone data into file called "zone", e.g.
# dig -t AXFR nu. @zonedata.iis.se. > zone
wget -O zone 'https://www.internic.net/domain/in-addr.arpa'
# (optional) strip DNSSEC records to speed up processing
ldns-read-zone -s zone > zone.nodnssec
# transform zonefile into Python objects
# NOTE: change "<example.origin.>" to zone origin, e.g. "cz."
./zone2pickle.py zone.nodnssec <example.origin.>
# resolve NS names to IP addresses using local resolver
# (timeouts etc. might cause some queries to fail)
# repeat the process until you have sufficient number of NS names resolved
./nsname2ipset.py
# (see stats at the very end of output)
# determine IP addresses of authoritative NSes for each domain
# this step sends "<domain> NS" query to each IP address to test
# if given IP address is authoritative for given domain
# (timeouts etc. might cause some queries to fail)
# repeat the process until you have sufficient number of IP addresses tested
./domain2ipset.py
# (see stats at the very end of output)
# generate input for EDNS compliance test suite
./genednscomp.py > ednscomp.input
# run EDNS compliance test suite
# the script runs genreport binary in a loop
# it is recommended to collect at least 10 full runs to eliminate network noise
# (feel free to terminate the script with SIGTERM)
# result of each run is stored in file ednscompresult-<timestamp>
# Hint: You can run ./testedns.py in parallel, possibly on multiple machines
PATH=$PATH:<path to genreport tool> ./testedns.py
# (monitor number of ednscompresult- files and terminate as necessary;
# the script will do 10 full scans to eliminate random network failures)
# merge all text results from EDNS test suite into Python objects
./ednscomp2pickle.py ednscompresult-*
# process EDNS stats for given zone
./evalzone.py
# output includes statistical results for whole zone file
# print list of domains which are going to break in 2019
# i.e. list of domains which are classified as "high latency"
# in the permissive mode but are "dead" in strict mode
./diffresults.py
# alternatively print dead domains + list of their NSes
# some of the NSes might be broken for other reasons than EDNS,
# e.g. some might not be authoritative for domain in question etc.
./nsprint.py