ipv6 conntrack issues on 2.6.18 kernels, and how not to properly debug

Recently Linode announced their IPv6 rollout. My Linode, which (at the time of writing) runs this website, is in Dallas, and once Dallas was IPv6-ready I requested an IPv6 address to be assigned to it.

No connection for you

Once up and running on IPv6 I identified a strange issue: I was running apt-get update, which attempts to connect to security.debian.org and backports.debian.org, two hosts that have IPv6 addresses. My server seemed unable to connect to port 80 over IPv6 on these machines, which stalled apt badly (as a side note, it seems apt, or whatever HTTP mechanism sits underneath it, takes a really long time to fall back to IPv4).
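
The symptom can be checked without apt by trying a plain TCP connection per address family. What follows is only a sketch of that idea, not what I ran at the time: the function name probe and the 5 second timeout are my own choices for illustration.

```python
import socket


def probe(host, port, family, timeout=5.0):
    """Try one TCP connection to each address of `host` in the given family.

    Returns a list of (address, outcome) pairs, where outcome is "ok" or a
    string describing the failure. `probe` is a hypothetical helper name.
    """
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # No address of this family could be resolved for the host.
        return [(host, f"resolve failed: {exc}")]

    results = []
    for fam, socktype, proto, _canonname, sockaddr in infos:
        sock = socket.socket(fam, socktype, proto)
        sock.settimeout(timeout)  # avoid hanging the way apt did
        try:
            sock.connect(sockaddr)
            results.append((sockaddr[0], "ok"))
        except OSError as exc:  # covers timeouts and refused connections
            results.append((sockaddr[0], f"failed: {exc}"))
        finally:
            sock.close()
    return results
```

Calling probe("security.debian.org", 80, socket.AF_INET6) and then the same with socket.AF_INET shows at a glance whether only the IPv6 side is failing. It says nothing about where the packets are being lost, which is the part I got wrong.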

Not just me

After several other Linode users I know confirmed they had the same issue, and knowing that I have other IPv6-capable virtual servers that were able to connect to these Debian servers just fine, I began to suspect there was a firewall issue either at Linode or at Debian preventing the connection. An MTR confirmed it was unlikely to be a routing issue.

The righteous noob

I gradually became more and more convinced it was a problem at Debian, and I contacted the debian-admin mailing list to let them know what I'd found. This turned out to be a bit premature, and after copping a bit of nerdrage from the grumpy Debian admins whose time I was wasting, I looked into my own server more closely and discovered everything worked fine if I flushed my own firewall rules.

Still, it was odd, since the firewall was almost identical to the ones on some of my other IPv6-capable servers. Finally I did a proper tcpdump, enabled logging of dropped IPv6 traffic in my firewall, and discovered I was indeed getting ACK back from the Debian end, but that was where the buck stopped. I could no longer blame the Debian servers: I could see that my own firewall was receiving the ACK packets but dropping them.

The awful truth

Frustrated was I to see that my ip6tables rules were written to accept packets from a related or established session on the correct interface, just like IPv4. A hunch started to grow, and was confirmed for me by users in #linode on OFTC: IPv6 conntrack seems to be disabled in 2.6.18 kernels, which was what this old Debian Lenny (gasp!) server was using. Those ip6tables rules were useless because there was no ability to track the connections to and from the server over IPv6.
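
To make the failure mode concrete, here is a toy model of it. It is a deliberately simplified sketch and not how netfilter is implemented: the class ToyFirewall, its methods, and the addresses (taken from the 2001:db8::/32 documentation range) are all invented for illustration. The only point is that an "accept established" rule can never match if nothing is recording outgoing connections.

```python
class ToyFirewall:
    """Toy default-drop inbound filter with a single 'established' rule.

    Hypothetical model for illustration only; real conntrack also tracks
    protocol state, timeouts, related flows and more.
    """

    def __init__(self, conntrack_available):
        self.conntrack_available = conntrack_available
        # Known flows as (local_addr, local_port, remote_addr, remote_port).
        self.flows = set()

    def outbound(self, local, lport, remote, rport):
        # With connection tracking, an outgoing packet records its flow.
        # Without it, nothing is remembered.
        if self.conntrack_available:
            self.flows.add((local, lport, remote, rport))

    def inbound(self, remote, rport, local, lport):
        # The only ACCEPT rule: the packet must belong to a recorded flow.
        if (local, lport, remote, rport) in self.flows:
            return "ACCEPT"
        return "DROP"  # default policy


def reply_verdict(conntrack_available):
    """Send one outgoing packet to port 80, then judge the reply to it."""
    fw = ToyFirewall(conntrack_available)
    fw.outbound("2001:db8::1", 40000, "2001:db8::80", 80)
    return fw.inbound("2001:db8::80", 80, "2001:db8::1", 40000)
```

In this model reply_verdict(True) accepts the reply and reply_verdict(False) drops it, even though the rule itself is identical in both cases. That matches what I saw: the same firewall worked on my other servers and silently ate the replies on this one.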

Having put off an upgrade of this server for a long time, I finally bit the bullet, dist-upgraded to Squeeze and set it to boot on a newer kernel (apparently it needs to be newer than 2.6.25).

What did I learn?

I learnt, or was harshly reminded of, a number of important things from this experience:

  • Always do a proper debug before you jump to conclusions. Think like the packet itself. Where is it going, and did it get as far as the next stop? When it stops moving, look at what's happening at that place. No matter how experienced you are, or how similar the problem looks in your memory, don't cut corners! tcpdump is your friend, look closely.
  • Just because it's happening for others doesn't mean it isn't your problem; it might just mean you are both doing it wrong :)
  • It also helps to debug more casually on IRC, and might even save you some injured pride, before you send big important emails implying that someone else is to blame for your problem!
  • Don't run ancient kernels if you don't have to! :)
  • You never stop realising you don't know half as much as you'd like. Keep learning!