Commit 11bc6756 authored by Petr Špaček

Merge branch 'docs' into 'master'

Docs overhaul

See merge request !9
parents df46ad00 92f2804f
This repo contains a set of scripts to scan all domains delegated from a
single zone file for EDNS compliance problems and to evaluate the practical
impact of EDNS Flag Day 2019 (see https://dnsflagday.net/) on a particular zone.
The testing methodology is described in doc/methodology.rst.
Before you start, please follow the instructions in doc/prerequisites.rst;
it contains important information.
Usage instructions can be found in doc/usage.rst.
Please report bugs to CZ.NIC Gitlab:
https://gitlab.labs.nic.cz/knot/edns-zone-scanner/issues
Thank you for helping out with the DNS flag day!
Methodology
===========
This section roughly describes the algorithm used to categorize domains.
Please note that the categorization depends on the "mode", i.e. the results
differ for the situation before the DNS Flag Day 2019 and after it. See below.
Assumptions
-----------
Beware that the algorithm is optimized using the following assumption:
EDNS support on a given IP address does not depend on the domain name
used for the test, as long as the IP address is authoritative
for the domain.
E.g. if the two zones example.com. and example.net. are hosted at
the same IP address 192.0.2.1, the IP address is expected to exhibit
the same behavior whether the tests are done with example.com.
or with example.net.
This assumption allows us to test each IP address just N times
instead of N*(number of domains hosted on that IP address).
Algorithm
---------
1. Each delegation (NS record) in the zone file is converted to a mapping
domain => set of NS names => set of IP addresses
using glue data from the DNS zone plus a local resolver for names which do not
have glue in the zone.
2. Each individual name server IP address is tested to check whether the NS
responds authoritatively for the given domain. An IP is considered "dead"
if it does not respond at all to the plain DNS query "domain. NS"
or if it is not authoritative for the given domain.
3. Each NS IP address which is authoritative for at least one domain
is then tested for EDNS compliance using the genreport tool by ISC.
Each IP address is tested only once during one pass, i.e. an NS which is
authoritative for 300k domains in the zone will be tested only once.
4. The EDNS test can be repeated multiple times to eliminate the effect
of random network glitches on the overall result.
Multiple runs of an individual test for a single IP are combined using
a simple majority, which should eliminate random network failures.
E.g. if the genreport test 'edns' times out once and passes ('ok') 9 times,
only the 'ok' result is used.
5. For each IP address, its individual EDNS test results from genreport
are combined to get the overall state of that particular IP address:
- if all tests are 'ok' -> the overall result is 'ok'
- further analysis ignores EDNS version 1 because it has no impact
on the 2019 DNS flag day (i.e. only plain DNS and EDNS 0 are considered)
- if no result is 'timeout' -> the overall result is 'compatible'
(it does not support all EDNS features but at least it does not break
in the face of an EDNS 0 query)
- further categorization depends on the "mode", see below.
6. Evaluation in the "permissive" mode = state before the DNS Flag Day 2019:
- IP addresses which pass only the basic 'dns' test but fail other tests
with 'timeout' will eventually work and are categorized as 'high_latency'.
- IP addresses which do not pass even the 'dns' test are categorized as 'dead'.
7. Evaluation in the "strict" mode = state after the DNS Flag Day 2019:
- IP addresses which fail any DNS or EDNS 0 test with 'timeout'
are categorized as 'dead'. In strict mode there is no 'high_latency'
caused by EDNS non-compliance.
8. Results for individual IP addresses are combined into an overall result for
each domain delegated in the zone file (see the illustrative sketch after
this list):
- domains without any working authoritative NS are 'dead'
- remaining domains with all NS IP addresses 'dead' are 'dead'
(IP evaluation depends on the mode, see above)
- remaining domains with at least one unresolvable NS IP address
+ remaining domains with at least one NS IP address 'dead'
are 'high_latency' (resolvers must retry queries)
- remaining domains have their result set to the worst result
from their respective NS set
(e.g. 2 IP addresses 'ok' + 2 'compatible' => 'compatible')
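For illustration only, the combination rules above can be sketched in Python
roughly as follows. This is a simplified model, not the actual code from
evalzone.py; the test names, category labels and data shapes are assumptions::

    from collections import Counter

    def majority(results):
        """Combine repeated runs of one test by simple majority (step 4),
        e.g. 1x 'timeout' + 9x 'ok' -> 'ok'."""
        return Counter(results).most_common(1)[0][0]

    def ip_state(tests, mode):
        """Combine per-test majority results for one IP address into its
        overall state (steps 5-7). `tests` maps test name -> result."""
        if all(result == 'ok' for result in tests.values()):
            return 'ok'
        # EDNS version 1 results are ignored; only plain DNS and EDNS 0 matter.
        relevant = {name: r for name, r in tests.items() if 'edns1' not in name}
        if 'timeout' not in relevant.values():
            return 'compatible'
        if relevant.get('dns') != 'ok':
            return 'dead'
        # the basic 'dns' test works but some EDNS 0 test times out
        return 'high_latency' if mode == 'permissive' else 'dead'

    ORDER = ['ok', 'compatible', 'high_latency', 'dead']

    def domain_state(ip_states, has_unresolvable_ns):
        """Combine per-IP states into the per-domain result (step 8)."""
        if not ip_states or all(state == 'dead' for state in ip_states):
            return 'dead'
        if has_unresolvable_ns or 'dead' in ip_states:
            return 'high_latency'
        return max(ip_states, key=ORDER.index)  # the worst result wins

The real implementation may differ in details, but the ordering from 'ok'
(best) to 'dead' (worst) matches the evaluation rules above.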
Limitations
-----------
1. This toolchain tests EDNS compliance only on DNS delegations in the given
zone and does not evaluate any other data.
For example, the DNS domain `example.com.` might contain this CNAME record:
`www.example.com. CNAME broken.cdn.test.`
If the tested zone file contains the delegation `example.com. NS`,
the result will show only the state of `example.com.`'s DNS servers
but will not reflect the state of the target CDN, which might be the source
of EDNS compliance problems. As a result, the domain `example.com.`
could be categorized as `ok` while an application running on `www.example.com.`
might be unavailable because of a dependency on a broken CDN.
2. Anycast routing limits what can be tested from a single vantage point.
It is technically possible for authoritative servers to use different
implementations in different anycast domains.
3. Of course, when evaluating impact it needs to be taken into account that
not all domains are equally important for users.
Prerequisites
=============
Environment requirements
------------------------
Before testing please make sure that:
- IPv4 and IPv6 connectivity actually works
- no firewall and/or middlebox on the network is filtering any DNS packets
- IP fragments can be delivered back to the testing machine
The scanner tool does not check any of these, and failure to provide a
"clean" network path will significantly skew the results.
Software dependencies
---------------------
1. The EDNS compliance test for a single domain is actually done by
ISC's genreport tool, which is available from this URL:
https://gitlab.isc.org/isc-projects/DNS-Compliance-Testing
2. The python-dns library for Python 3.
Beware: the latest version of the python-dns library will not work if you do
not canonicalize zone files containing some non-ASCII values.
This breaks processing on certain TLDs, so always canonicalize the zone file
(a minimal loading sketch follows this list).
3. The ldns command line tool ldns-read-zone, for zone canonicalization
and to strip out unnecessary data.
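For illustration, here is a minimal sketch of how python-dns can load a
canonicalized zone file and collect the delegation mapping from step 1 of
doc/methodology.rst; the file name and origin are placeholders and the real
scripts may differ::

    import collections

    import dns.zone

    # Illustrative only: replace the file name and origin with your own.
    zone = dns.zone.from_file('zone.nodnssec', origin='example.net.')

    # domain -> set of NS names; the apex NS set of the zone itself is
    # included here, the real tools presumably treat it separately
    delegations = collections.defaultdict(set)
    for name, ttl, rdata in zone.iterate_rdatas('NS'):
        delegations[name].add(rdata.target)

    print(len(delegations), 'names with NS records found')

This is roughly the transformation that the zone2pickle.py step automates
for the whole zone (see doc/usage.rst).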
Usage
=====
There are two main ways to use the scanner:
a. Use the script "allinone.py" to automate the whole scan.
This is the recommended method and the easiest to use.
b. Run individual parts of the scan by hand, which allows you to inspect
the results from each individual run, to control the number
of test cycles, etc.
This is normally not required and is intended for development and debugging.
Preparation
-----------
0. Beware! Processing a huge zone file requires several gigabytes
of operating memory and it might take tens of minutes
to convert the data from text to binary. Use a beefy machine.
1. All the tools work with data in the current working directory.
Make sure it is writeable and has enough free space (comparable
to the size of the original zone file).
2. Make sure all prerequisites are met.
It is important to check the network requirements listed
in doc/prerequisites.rst (a rough manual spot-check is sketched after this list)!
Once the network is ready, it might be easiest to use the Docker image from CZ.NIC:
$ sudo docker run --network=host -v /home/test:/data registry.labs.nic.cz/knot/edns-zone-scanner/prod
3. Canonicalize the zone file and strip out unnecessary data
to speed up further processing. Do not skip this step; missing canonicalization
might cause problems down the road:
$ ldns-read-zone -E SOA -E NS -E A -E AAAA input_zone > zone.nodnssec
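The scanner itself does not verify the network requirements from
doc/prerequisites.rst. As a rough manual spot-check (by no means a complete
test), you can send an EDNS query with a large UDP buffer from the testing
machine, e.g. with the python-dns library; the server address below is a
placeholder::

    import dns.exception
    import dns.message
    import dns.query

    SERVER = '192.0.2.53'  # placeholder: use a server you trust

    # EDNS 0 query with the DO bit and a 4096-byte UDP buffer; a sizeable
    # answer arriving over UDP suggests that EDNS and larger responses
    # survive the local network path.  This is NOT a complete test of
    # IP fragment delivery.
    query = dns.message.make_query('.', 'DNSKEY', use_edns=0,
                                   payload=4096, want_dnssec=True)
    try:
        reply = dns.query.udp(query, SERVER, timeout=3)
        print('got UDP reply of', len(reply.to_wire()), 'bytes')
    except dns.exception.Timeout:
        print('timeout - check firewall, middleboxes and fragment handling')

If such queries time out while plain DNS works, fix the network first;
otherwise the scan results will blame the tested servers for local problems.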
Running scan
------------
Usage:
$ allinone.py <canonicalized zone file> <zone origin>
E.g.
$ allinone.py zone.nodnssec example.net.
Once the zone is loaded into memory, the script will print informational
messages about its progress. Make a coffee or let it run overnight ...
Reading results
---------------
First of all remember to read file doc/methodology.rst.
Statistical results are stored in files summary.csv and summary.txt.
Example summary.txt::
Mode | Permissive (<= 2018) | Strict (2019+)
-------------+-----------------------+----------------------
Ok | 191 82.68 % | 191 82.68 %
Compatible | 0 0.00 % | 0 0.00 %
High latency | 39 16.88 % | 38 16.45 %
Dead | 1 0.43 % | 2 0.87 %
This table indicates that 1 domain is already dead and that 1 other domain
will die after the DNS flag day.
To get the list of domains which will die after the 2019 DNS flag day, run::
$ printresults.py new
strict dead 48.in-addr.arpa. ; EDNS behavior consistent for all servers
To get the list of domains which are already dead (even before the flag day)
along with their NS names, run::
$ printresults.py all permissive dead --ns
permissive dead 55.in-addr.arpa. ns01.army.mil. ; no working NS is authoritative for this domain
permissive dead 55.in-addr.arpa. ns02.army.mil. ; no working NS is authoritative for this domain
permissive dead 55.in-addr.arpa. ns03.army.mil. ; no working NS is authoritative for this domain
That's it! Thank you for helping out with the DNS flag day!
Manual run
----------
An alternative to the "allinone.py" script is to run the individual tools in
sequence to get statistical results for the whole zone.
Steps which require communication across the network should be
run multiple times to smooth out network glitches like timeouts etc.
(This repetition is normally done by the allinone script; the individual tools
do not automate it.)
With all this in mind you can use the following script.
Please read comments below and report bugs to CZ.NIC Gitlab:
https://gitlab.labs.nic.cz/knot/edns-zone-scanner/issues
(Do not read this in Gitlab's web interface, it is ugly!)
# get zone data into a file called "zone", e.g.
# dig -t AXFR nu. @zonedata.iis.se. nu. > zone
wget -O zone 'https://www.internic.net/domain/in-addr.arpa'
# canonicalize the zone
# and strip DNSSEC records to speed up processing
ldns-read-zone -E SOA -E NS -E A -E AAAA zone > zone.nodnssec
# transform the zone file into Python objects
# NOTE: change "<example.origin.>" to the zone origin, e.g. "cz."
zone2pickle.py zone.nodnssec <example.origin.>
# resolve NS names to IP addresses using a local resolver
# (timeouts etc. might cause some queries to fail)
# repeat the process until you have a sufficient number of NS names resolved
nsname2ipset.py
# (see stats at the very end of output)
# determine IP addresses of authoritative NSes for each domain
# this step sends a "<domain> NS" query to each IP address to test
# whether the given IP address is authoritative for the given domain
# (timeouts etc. might cause some queries to fail)
# repeat the process until you have a sufficient number of IP addresses tested
domain2ipset.py
# (see stats at the very end of output)
# generate input for EDNS compliance test suite
genednscomp.py > ednscomp.input
# run the EDNS compliance test suite
# the script runs the genreport binary in a loop
# it is recommended to collect at least 10 full runs to eliminate network noise
# (feel free to terminate the script with SIGTERM)
# the result of each run is stored in a file named ednscompresult-<timestamp>
# Hint: You can run testedns.py in parallel, possibly on multiple machines
PATH=$PATH:<path to genreport tool> testedns.py
# (monitor number of ednscompresult- files and terminate as necessary;
# the script will do 10 full scans to eliminate random network failures)
# merge all text results from EDNS test suite into Python objects
ednscomp2pickle.py ednscompresult-*
# process EDNS stats for the given zone
evalzone.py
# output includes statistical results for the whole zone file
# print the list of domains which are going to break in 2019
# i.e. the list of domains which are classified as "high latency"
# in the permissive mode but are "dead" in the strict mode
printresults.py new
# alternatively print dead domains + the list of their NSes
# some of the NSes might be broken for reasons other than EDNS,
# e.g. some might not be authoritative for the domain in question etc.
printresults.py new --ns