[Yeti DNS Discuss] ipv6 root name service discussed (gih blog)
paul at redbarn.org
Mon Dec 12 09:23:56 UTC 2016
Geoff Huston wrote:
> I am subscribed to this list, and I am happy to answer any question
> anyone may have about this.
> While there is no perfect approach here for IPv6 I believe that the
> potential pitfalls of the combination of large DNS responses and IPv6
> lie in an approach of using a large MTU for UDP and reacting to an
> ICMPv6 PTB - It's not clear if the 10 minute MTU cache lifetime used
> by some of the root servers is worth it, and frankly it could well be
> dropped to a smaller interval without much in the way of negative
> side effects, but this approach of starting large and dropping in
> response to MTU is, as far as I can tell, a path that minimises the
> other pitfalls.
while the problems you're bringing up have relevance to the root name
servers, the impact as you know will be to all dns name servers, and to
all wide-area UDP applications. ICMPv4 type 3 ("destination
unreachable") code 4 ("fragmentation needed and DF set") poses a
very similar problem for IPv4-only networks, and has been exacerbated by
the wide deployment of NAT in IPv4. only the facts that DNS packets in
IPv4 are rarely sent with the DF bit set, and that intermediate routers
are allowed to fragment these fragmentation-eligible datagrams, have kept
this from being a major problem in IPv4.
so while my own thinking matches yours as to the most-likely-to-succeed
mitigation -- see below -- i think we ought to pitch our solutions more
generally than just to root dns server operators.
ten minutes could be longer than strictly necessary. the bare minimum
is whatever cache lifetime lets the one-time effort (sending a too-large
packet, hearing an ICMPv6 PTB, installing a temporary routing element,
waiting for the DNS client to time out and resend the query, and finally,
resending the response but with fragmentation) "amortize well." because
the original packet (which caused the ICMPv6 PTB message) is normally only
known to or re-creatable by the DNS server and not the operating system,
the greatest expense in the one-time effort is the DNS timeout, not the
extra packets or the extra round trips or the routing table maintenance.
if a DNS server and its operating system can cooperate such that the
server will know when it has to re-send a now-fragmentation-eligible
response, then the total cost of the interruption is very low, and the
timeout-cleanup period for the temporary fragmentation route could be as
low as 30 seconds and the one-time costs would still "amortize well". if such
cooperation cannot be obtained, then ten minutes is a reasonable
minimum, since the one-time costs will be extremely high.
the state-retention cost's attack vector is the same as for TCP SYN
flooding, where a packet with a spoofed IP source address can cause the
victim to store too much information (too many TCP PCBs, in the SYN
flood case; or too many fragmentation-necessary routing elements, in the
ICMPv6 PTB case). the management of such overload could lead to crashes,
or to an inability to add new state for valid (non-attack)
fragmentation-eligible flows, or to high rates of LRU purges such
that valid (non-attack) state is removed too soon to have the desired
effect on the valid (non-attack) flows.
an adaptive approach that preferred a long timeout like ten minutes, but
which used shorter timeouts when the routing table became larger, or
when the rate of additions was higher, could put this attack out of
reach in the same way that SYN floods are out of reach -- by making sure
that resource exhaustion can't happen even at maximum input rate.
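
as a rough illustration of the adaptive idea above, here is a minimal
python sketch. everything in it is an assumption made for illustration:
the class name AdaptivePmtuCache, the linear scaling between a 600 second
and a 30 second lifetime, and the hard cap on entries are not taken from
any root server implementation. it scales only on table size, not on the
rate of additions, and it refuses new state at the cap rather than purging
valid entries, which is one of several possible trade-offs.

```python
import time


class AdaptivePmtuCache:
    """Hypothetical per-destination PMTU cache with a size-dependent lifetime."""

    def __init__(self, max_entries=1024, max_ttl=600.0, min_ttl=30.0,
                 clock=time.monotonic):
        self.max_entries = max_entries
        self.max_ttl = max_ttl      # lifetime used when the table is empty
        self.min_ttl = min_ttl      # lifetime used when the table is full
        self.clock = clock
        self._entries = {}          # dest -> (mtu, expiry time)

    def _ttl(self):
        # Linear scaling (an assumption): fuller table, shorter lifetime.
        fill = len(self._entries) / self.max_entries
        return self.max_ttl - fill * (self.max_ttl - self.min_ttl)

    def _expire(self, now):
        stale = [d for d, (_, exp) in self._entries.items() if exp <= now]
        for dest in stale:
            del self._entries[dest]

    def note_ptb(self, dest, mtu):
        """Record a PTB for dest. Returns False if the table is at its cap."""
        now = self.clock()
        self._expire(now)
        if dest not in self._entries and len(self._entries) >= self.max_entries:
            return False            # bounded state: never grow past the cap
        self._entries[dest] = (mtu, now + self._ttl())
        return True

    def lookup(self, dest):
        """Return the cached MTU for dest, or None if absent or expired."""
        entry = self._entries.get(dest)
        if entry is None:
            return None
        if entry[1] <= self.clock():
            del self._entries[dest]
            return None
        return entry[0]
```

with max_entries=4, the first entry would be kept for 600 seconds and the
second for 457.5 seconds, so a flood of spoofed PTBs shortens the lifetime
of the state it creates.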
more vital in my view is the need for a DNS server to know when an
ICMPv6 PTB has been sent, and to notice that a new fragmentation-route
has been added in response to that PTB, and to regenerate/retransmit the
affected response in the hope that the second transmission will be
fragmented and will arrive successfully. care must be taken to avoid
lock-step retransmission such that repeated PTBs engender additional
retransmissions, or else this will become a new reflection vector. note
that TCP segments, by comparison, can be regenerated using socket buffer
data present in the operating system, which cannot be discarded until it
has been ACK'd.
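
one way to avoid the lock-step behaviour described above is a hold-off
guard: at most one PTB-triggered retransmission per (destination, query
id) within a short window. the python sketch below is only that guard.
the class name PtbRetransmitGuard and the five second window are
hypothetical, and it assumes the server has already matched the PTB to a
response it really sent and learned the query id from it; that matching
is not shown.

```python
import time


class PtbRetransmitGuard:
    """Hypothetical limiter for PTB-triggered response retransmissions."""

    def __init__(self, holdoff=5.0, clock=time.monotonic):
        self.holdoff = holdoff      # seconds during which repeats are ignored
        self.clock = clock
        self._last = {}             # (dest, query_id) -> time of last retransmit

    def should_retransmit(self, dest, query_id):
        """Return True at most once per (dest, query_id) per hold-off window."""
        now = self.clock()
        # Drop records older than the window so this table stays small.
        old = [k for k, t in self._last.items() if now - t >= self.holdoff]
        for key in old:
            del self._last[key]
        key = (dest, query_id)
        if key in self._last:
            return False            # a repeated PTB must not trigger a resend
        self._last[key] = now
        return True
```

a caller would regenerate and resend the response only when
should_retransmit() returns True. a production version would also need a
cap on the table itself, since it is otherwise one more place for spoofed
traffic to create state.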
most vital in my view is to test solutions to this problem, to propose a
well-tested well-debated consensus solution for this to the IETF, and to
see this applied to wide-area UDP speakers in IPv6, not just DNS server
operators, and not just root DNS server operators. i'd like to hope that
the Yeti-DNS community will take an interest in this work; i believe
that we're ideally positioned to test solutions to this problem.
thank you for sharing your research, and engaging with us here.