Archives – RIPE 87

PLENARY SESSION
28 NOVEMBER 2023
11 a.m.:

BRIAN NISBET: Good morning, folks. Hope you have been reinvigorated by your coffee. And we have installed additional breakers and a small nuclear power plant outside the room to hopefully have a nice smooth session. I'm Brian, and along with Antonio we will be chairing this session. Three very interesting talks.

So, first up we have Christian Petrash from DE‑CIX.

CHRISTIAN PETRASH: Good morning, ladies and gentlemen. Thanks for having me. My name is Christian. And I want to take you on a journey to a big data platform.

At first, let's talk about why we want to do that. We want to provide our customers, especially our smaller customers, with a tool that gives them deeper insights into their ports. A lot of them are blind because they are too small to have their own services or telemetry metrics. As a company, we also want to be technically state of the art and have the observability possibilities available at that point of time. And in the end, it's a good idea to have better possibilities to do network planning, capacity planning and so on.

And, for sure, as the metric is showing you, in the last five years traffic is increasing ‑‑ it doubled. So we have to do something faster than the old one.

The first challenge we figured out was that we have more or less 5 gigabyte a second load of data. It consists of 300,000 packets a second of flow data, IPFix data, which is coming from our routers, our edge routers on the DE‑CIX platform, and 10,000 values a minute of telemetry data, which is the statistics of port counters, error counters, discards and so on. Doing analysis and enrichment and big data stuff with that amount of data is a challenge.

So we invented a three‑step solution, data ingest and storage, front end and enrichment. For that, we went on a shopping trip, and it should be fun, we heard.

And the first step was getting a data hose and a data lake. For the data hose, a lot of companies are relying on Kafka. Kafka is scalable, it's fast, it looks like a good idea. And for data storage, we thought about the last RIPE meetings, where Cloudflare gave some talks about ClickHouse, and a lot of people are using ClickHouse. It's a really fast database and it fits perfectly together with Kafka, so we chose ClickHouse. So we have our IXP platform and the data is coming in, we need some collector miracle, I will come to that later, and then we put the stuff into Kafka and ClickHouse.

So, the next step was which tool we use for the collector. At that point there were more or less two big players in the market, and we had a look at the transport performance. At first we used JSON, but JSON packets are very big, and this means slow parsing. That was the first lesson we learned: it's too slow for that big amount of IPFix data. So we thought, oh, binary performance could be nice.

So the battle of binary performance started. We had a look at Protobuf and Avro, and more or less Protobuf won, because we knew that in a later step we wanted to use Rust, and the implementation of Avro in Rust was at that point not really good. So we said, cool, let's use Protobuf.
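A minimal sketch of the size difference behind the "JSON is too big" lesson. It uses Python's `struct` module as a stand-in for a binary encoding; it is not Protobuf and not the goflow2 schema, and the flow record and its field names are hypothetical.

```python
import ipaddress
import json
import struct

# Hypothetical flow record; the field names are illustrative only.
flow = {
    "src_addr": "2001:db8::1",
    "dst_addr": "2001:db8::2",
    "src_port": 443,
    "dst_port": 51234,
    "proto": 6,
    "bytes": 1460,
    "packets": 1,
}


def encode_json(record: dict) -> bytes:
    """Text encoding: every field name and value is spelled out as characters."""
    return json.dumps(record).encode("utf-8")


def encode_binary(record: dict) -> bytes:
    """Fixed binary layout in network byte order: two 16-byte addresses,
    two 16-bit ports, an 8-bit protocol, a 64-bit byte counter and a
    32-bit packet counter."""
    return struct.pack(
        "!16s16sHHBQI",
        ipaddress.IPv6Address(record["src_addr"]).packed,
        ipaddress.IPv6Address(record["dst_addr"]).packed,
        record["src_port"],
        record["dst_port"],
        record["proto"],
        record["bytes"],
        record["packets"],
    )


json_size = len(encode_json(flow))
binary_size = len(encode_binary(flow))
print(f"JSON: {json_size} bytes, binary: {binary_size} bytes")
```

The binary record has a fixed size and needs no text parsing, which is the property that matters at hundreds of thousands of records a second.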

And at that point, only goflow2 was able to use Protobuf. Thanks for developing and supporting that, Louis. He is at the RIPE meeting, if somebody wants to talk with him. For example, we had a finding that our platform consists of Nokia routers, and Nokia has a different IPFix template, so we had to change something in cooperation with Louis, and that was really fast. I really appreciate it.

So, our platform is running in the Cloud, but my point is not that you have to run it in the Cloud if you want to do that; it's possible wherever you want to run it. But we run it in the Cloud. And we thought, ah, cool, let's run goflow in the Cloud, and we tried that, and we sent our IPFix packets and said, okay, we are losing around 75%. Why?

And after some investigations, we saw, okay, the network of the Cloud provider has a maximum MTU of 1,400 bytes. That is complicated because a typical IPFix packet has 1,460 bytes. So it is UDP fragmented and, because of the fragments Mac attack, the Cloud pro [...]

JEN LINKOVA: [...] times connect to the dual‑stack SSID even if the device can actually work perfectly fine on v6 only. For example, some phones which actually work very well on v6 only can still connect to dual stack, and they would consume my precious IPv4 addresses. It leads to high IPv4 consumption. And also, I do not have any visibility into problems. Why do people join the dual‑stack network? Is it just a random choice, or is it because something does not work, something I might be able to fix? Nobody tells me anything, right? Also, multiple SSIDs: the wi‑fi team hates it, and I realised that for the wired network I would not be able to do this. We have so many VLANs; if I start doubling the number of VLANs for .1X, it would be an operational nightmare. So we wanted something better.

What if, instead of doing dual stack and v6 only, we do what I like to call IPv6 mostly? What if we let devices co‑exist on the same network segment, where some of them can be v6 only, like new devices which I know can operate in v6‑only mode, and some devices, like old devices, would still be dual stack? How can we do this? Let's say a client can signal v6‑only capability and say, hey, I can do IPv6‑only, does this network support this? If it's just a legacy network which only has v4 or dual stack, fine, I can be dual stack. But if this network supports v6‑only or v6‑mostly mode, I'm happy to be without a v4 address.

As a result, devices can move between different networks and operate in the mode which the network expects them to. Because, yeah, I was considering, can I just turn off IPv4 for corporate devices? But then as soon as users go outside to a public network, like Starbucks wi‑fi, it would get broken, right.

How did we do that? We wrote RFC 8925, or, as you have probably heard, especially if you attended the tutorial on Monday, option 108. We decided that the best way to turn off IPv4 is to use DHCPv4, because if you have v4 in your network ‑‑ here is how it works. In the DHCP discover the client includes option 108, which basically indicates the client's capability to do v6 only. If the DHCP server is not aware of that option, let's say like a normal Starbucks network, nothing would happen, right, the normal DHCP exchange would complete, everything works fine. If that device connects to my enterprise network, and it connects to a network segment which I have already converted to v6‑mostly mode, then my DHCP server says, oh, for this network segment, I will return option 108, which means shut up. Don't do DHCPv4 for the specified period of time. V6 only. Again, it will only happen for clients which support this. So for example, if you take some, I don't know, very old network device, nothing is going to happen, that device will behave as before.
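The exchange described above can be sketched as follows. This is a simplified illustration, not a DHCP server: it only models the option 108 decision and encoding from RFC 8925 (option code 108, length 4, a 32‑bit wait time in seconds). The function names and the `segment_is_v6_mostly` flag are hypothetical.

```python
import struct

OPTION_V6ONLY_PREFERRED = 108  # IPv6-Only Preferred option, RFC 8925
MIN_V6ONLY_WAIT = 300          # seconds; lower bound a client applies (RFC 8925)


def client_signals_v6only(parameter_request_list: bytes) -> bool:
    """A capable client lists option code 108 among the options it requests."""
    return OPTION_V6ONLY_PREFERRED in parameter_request_list


def server_option_108(parameter_request_list: bytes,
                      segment_is_v6_mostly: bool,
                      wait_seconds: int = 1800) -> bytes:
    """Return the encoded option 108 only if the client asked for it AND the
    segment has been converted; otherwise return nothing, so legacy clients
    and legacy networks behave exactly as before."""
    if not (segment_is_v6_mostly and client_signals_v6only(parameter_request_list)):
        return b""
    return struct.pack("!BBI", OPTION_V6ONLY_PREFERRED, 4, wait_seconds)


def client_wait(option_108: bytes) -> int:
    """How long the client stays off DHCPv4 after receiving option 108."""
    _code, _length, wait = struct.unpack("!BBI", option_108)
    return max(wait, MIN_V6ONLY_WAIT)


# A phone that requests option 108 on a converted segment:
reply = server_option_108(bytes([1, 3, 6, 108]), segment_is_v6_mostly=True)
print(reply.hex(), client_wait(reply))
```

The point of the design is that both sides have to opt in: a client that never requests 108, or a server that never returns it, leaves the normal DHCPv4 exchange untouched.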

Now I can have two types of devices in my network, and only some of them are going to get IPv4.

So, what do I mean when I say a device can operate in IPv6‑only mode? It's a kind of very wide definition. For some devices I know because I tested. My laptop has been on a v6‑only network for five years, but I only use browsers and SSH. For me everything works. Some people are playing some games and that stuff for some reason breaks and they are unhappy. Well...

So, practically, the only way you can say that a device is capable of operating in v6‑only mode is if you know that the device runs 464XLAT. Because mobile operators are way ahead in the game. They have been deploying v6 networks for a long time. Why? Because most mobile devices use 464XLAT: they use a special translation daemon on the device, and that daemon provides an IPv4 address and an IPv4 default route to applications. So if applications cannot operate without v4, what happens on a v6‑only network? The application says: I don't have a v4 address, I can't connect to v4, go away, I'm not working. 464XLAT fixes it: here is the v4 address and here is the default route. It translates the traffic on the device from v4 to v6 and sends it over the v6‑only network as a v6 packet, and then it's translated by the NAT64 device and the rest is normal. That's how it works on mobile phones. And it also works on your MacBooks, if you have macOS 13 or newer.
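A minimal sketch of the address mapping that the translation described above relies on: an IPv4 destination is embedded in the low 32 bits of a /96 NAT64 prefix (RFC 6052). The well‑known prefix `64:ff9b::/96` is an assumption here; a given network may use its own prefix, and other prefix lengths are not handled.

```python
import ipaddress

# Assumption: the RFC 6052 well-known prefix; a real network may use another one.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize(ipv4: str,
               prefix: ipaddress.IPv6Network = NAT64_PREFIX) -> ipaddress.IPv6Address:
    """Map an IPv4 destination to the IPv6 address the on-device translator uses."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 NAT64 prefixes")
    return ipaddress.IPv6Address(
        int(prefix.network_address) | int(ipaddress.IPv4Address(ipv4))
    )


def extract(ipv6: str) -> ipaddress.IPv4Address:
    """Recover the IPv4 destination, as the NAT64 device does on the way out."""
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(ipv6)) & 0xFFFFFFFF)


print(synthesize("192.0.2.1"))       # 64:ff9b::c000:201
print(extract("64:ff9b::c000:201"))  # 192.0.2.1
```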

Obviously, it probably means that your device can operate even without DNS64, which is also good for DNSSEC. I know there are DNS people in the room. You should love this. So, for me, practically, it means that I want devices which run 464XLAT to send option 108, and obviously I can manually opt in some devices if I want to.

So, project scope. What we did:

We were migrating our network to v6 mostly, and the scope was all user‑facing wi‑fi and wired network infrastructure all across the globe. So all our offices. Well, most of them. There are a few which I did not touch because they are very old. But basically I wanted to cover the vast majority of user‑facing VLANs. So, devices in scope: all devices which send option 108. Currently Android, iOS and macOS. They send option 108 unconditionally. You just turn them on. They get option 108, then they detect NAT64, either by router advertisement or by using a DNS look‑up of an IPv4‑only name. And they enable 464XLAT and everything works. I also have selected Linux and ChromeOS devices as opt‑in, where users can enable option 108 selectively, try it, and disable it if they don't like it. But macOS and phones are opted in automatically, with an opt‑out mechanism.
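The DNS‑based detection mentioned above can be sketched like this, assuming it refers to the RFC 7050 method: the client looks up AAAA records for `ipv4only.arpa`, a name that only has the IPv4 addresses 192.0.0.170 and 192.0.0.171, so any AAAA answer must have been synthesized and reveals the NAT64 prefix. The sketch takes the answers as input instead of doing a real DNS query, and only handles /96 prefixes.

```python
import ipaddress

# The two well-known IPv4 addresses of ipv4only.arpa (RFC 7050).
WELL_KNOWN_IPV4 = {
    int(ipaddress.IPv4Address("192.0.0.170")),
    int(ipaddress.IPv4Address("192.0.0.171")),
}


def nat64_prefixes(aaaa_answers: list) -> set:
    """Infer /96 NAT64 prefixes from AAAA answers for ipv4only.arpa.

    An empty result means no synthesized answer was seen, so no NAT64
    was detected by this method."""
    prefixes = set()
    for text in aaaa_answers:
        value = int(ipaddress.IPv6Address(text))
        if (value & 0xFFFFFFFF) in WELL_KNOWN_IPV4:
            # Clear the embedded IPv4 address; what is left is the prefix.
            prefixes.add(ipaddress.IPv6Network(((value >> 32) << 32, 96)))
    return prefixes


print(nat64_prefixes(["64:ff9b::c000:aa", "64:ff9b::c000:ab"]))
```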

So, how long did it take?

We are currently at 90% of the rollout: 90% of all network segments. I am counting VLANs where this is enabled. Fingers crossed, a hundred percent will be reached next week when I'm back from RIPE. We started with an extended pilot in three chosen locations. I started in Sydney, where I was. Then we extended it for one month to five big offices, and then we flipped the switch and said, okay, from now on all new offices are turned up with option 108 by default, in v6‑only mode, and we started doing brownfield. And it basically took us four months to do this. So I started in March and really started in May doing the large brownfield rollout.

So, results:

I can't believe it, fingers crossed. It worked surprisingly well. No real showstoppers found. We found a few cosmetic issues. We found a few bugs in macOS, and Apple have been amazing, they have been fixing those bugs I have been reporting. We did find some issues for which we had to implement work‑arounds, and you can talk about this for hours, right, but I only have half an hour. So my apologies, I am not going to cover all the stories and funny cases we found. I have some backup slides so you can take a look at them. Maybe I have time for them. We'll see.

So, yeah, we were able to do this. Well, 90% done. I hate reporting stuff which is not a hundred percent done, but ‑‑ sorry, we should have had the RIPE meeting one week later.

So ‑‑ utilisation dropped three or four times on average across networks. There are a few networks which did not show much of a utilisation drop, probably because there are a lot of Windows devices or ChromeOS devices, but on average I basically expect to downsize most of the subnets, between like two and eight times subnet size reduction. We have basically started reclaiming address space, and the estimate is about 300,000 addresses to be reclaimed by the end of the year. This is not really a typical network, because here is a significant utilisation drop ‑‑ almost 40% utilisation.

If I make it a /20, it will be 80% utilisation, much higher than the acceptable threshold. So, after we did that, you see it dropped below 5 percent, so I just downsized it to a /22 and got 7K addresses back instantly.
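The arithmetic behind that example can be checked with a short sketch. The original subnet size is not stated; a /19 is assumed here because it is the size for which "almost 40%" becomes roughly 80% when halved to a /20 and for which shrinking to a /22 returns about 7K addresses. The lease counts and the 10.0.0.0 addressing are hypothetical.

```python
import ipaddress


def utilisation(leases: int, network: ipaddress.IPv4Network) -> float:
    """Fraction of the subnet's addresses that are leased."""
    return leases / network.num_addresses


original = ipaddress.ip_network("10.0.0.0/19")          # assumed original size
halved = next(original.subnets(new_prefix=20))          # the /20 that was too small
downsized = next(original.subnets(new_prefix=22))       # the /22 after the migration

leases_before = 3200  # hypothetical: "almost 40%" of a /19
leases_after = 400    # hypothetical: "below 5 percent" of a /19

print(f"before, /19: {utilisation(leases_before, original):.0%}")
print(f"before, /20: {utilisation(leases_before, halved):.0%}")
print(f"after,  /19: {utilisation(leases_after, original):.1%}")
print(f"after,  /22: {utilisation(leases_after, downsized):.0%}")

reclaimed = original.num_addresses - downsized.num_addresses
print(f"reclaimed: {reclaimed} addresses")
```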
Lessons learned. The most important lesson learned is that the only way to do this, like, you can get much further with a good word and a gun than with just a good word. So, until you are actually in danger of running out of address space, nothing happens. I have been working on this for, I don't know how long. We launched Google Public DNS64 in, what was it, 2015, 2014? Because we needed that for this project, right.

But until we ran out, in like 2019, it never was a priority. Now it's a very important thing.

Lesson number one is that I got surprised, right. Ten years ago I was very unsatisfied with the quality of v6 in the network, but then I spent a significant amount of time making it better, and I had a false sense of security. I was thinking, we have been operating v6 for so many years, my v6 network is perfect, it works as well as IPv4. Oh my God, how many things I found.

Because of Happy Eyeballs, you do not have proper visibility into v6 issues while you have the safety net of IPv4.

For example, your v6 would mostly work, like 90% of the time. Who would notice if your workstation or laptop, when it wakes up after sleep, loses its DNS configuration ‑‑ well, until the next RA comes? Maybe seconds, maybe minutes. Nobody would notice as long as you have v4, but when you turn off IPv4, that's when the fun starts. Users do not report the issue, and all those issues on end points are really hard to detect and test, right, because sometimes it's a race condition, sometimes it depends on what users do, right. So it's not something I can easily test.

And as a result, when you have v4, nobody reports anything, nothing gets fixed.

So if you think your network is like v6 ready, well, you might reconsider.

So, some discoveries we made.

If something looks like a host and behaves like a host, is it actually a host? People put some devices in the network, like some box with some screen on it maybe, and it gets an IPv4 address over DHCP, gets an IPv6 address over SLAAC, and then we discovered, when we did v6‑only guest, so it was option 108 here, that it's actually a router. It might be a router. It might be something inside. I actually had to take a screwdriver, open the box, and I found an OpenWRT box inside, the tablet and a few other things connected. And for me it was a host, right.

No. And as soon as I took away IPv4 from that thing, right, everything broke, because this device cannot extend the network any more. It was using NAT44 for IPv4 but it was not able to extend v6 connectivity downstream. Obviously we can use 464XLAT here, but is it a good thing? Do I really want these systems to stay forever? We actually want IPv6 there. What we are doing is, we are using DHCP prefix delegation to give prefixes to end points. There was some discussion at the IETF. It's not, strictly speaking, a host any more. It's a router. The line between host and router is now blurred. Like, is my phone a host or a router? Strictly speaking it's a router, because I have my laptop connected to it. ChromeOS, and I am going to talk about it later, has like 57 VMs inside. Is it a host or a router?

Let's say a node. Something which looks like a host, okay. I have so much time, we can talk about this one.

This is a slide which actually summarises a one‑hour talk I gave at the IETF. So, we could talk ‑‑ I have some backup slides here.

First of all, surprisingly, the biggest issue we discovered, biggest in terms of the number of support tickets opened, was people saying, oh my God, I came to the office and I cannot connect to Google wi‑fi any more. Why? Because they had IPv6 disabled on their laptops. Why did they have IPv6 disabled? Because for years, support people were saying, ah, you have a problem, did you try to disable IPv6? Oh, it helped. Ticket closed. Problem solved.

I asked many times how exactly we were going to re‑enable that thing. Nobody cared. Well, until we suddenly stopped providing IPv4. So, yeah, we had to use a script to go to every single corporate device and re‑enable IPv6, just to make sure it stays enabled.

So, fragment extension headers. I love extension headers. They are actually used. I know some people think they aren't, but they are. There are two very useful extension headers in the world. One of them is fragmentation. Okay, let me find supporting slides for this.

So, fragmentation, obviously. For end users, it is mostly DNS and some other UDP applications, like mosh, for example. If you are using mosh, you might again be unpleasantly surprised if you are blocking the fragment header. The other header is the ESP header, which means VPNs and wi‑fi calling. Right. Wi‑fi calling.

Your phone actually establishes an IPsec channel to a voice gateway, and if you are using IPv6, right, you need to remember that you need to permit not just TCP, UDP and ICMP but also ESP headers.
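As a rough illustration of that point, here is a toy allow‑list keyed on IPv6 next‑header values. The protocol numbers are the IANA‑assigned ones; the policy names and the `permitted` helper are hypothetical and do not model any real firewall.

```python
from enum import IntEnum


class NextHeader(IntEnum):
    """IANA protocol numbers as they appear in the IPv6 Next Header field."""
    TCP = 6
    UDP = 17
    FRAGMENT = 44
    ESP = 50
    ICMPV6 = 58


# A policy that only thinks about "TCP, UDP, ICMP".
NAIVE_POLICY = {NextHeader.TCP, NextHeader.UDP, NextHeader.ICMPV6}

# The same policy with the two extension headers from the talk added.
FIXED_POLICY = NAIVE_POLICY | {NextHeader.FRAGMENT, NextHeader.ESP}


def permitted(header_chain: list, policy: set) -> bool:
    """A packet passes only if every header in its chain is allowed."""
    return all(header in policy for header in header_chain)


fragmented_udp = [NextHeader.FRAGMENT, NextHeader.UDP]  # e.g. a large DNS answer
wifi_calling = [NextHeader.ESP]                         # e.g. an IPsec tunnel

print(permitted(fragmented_udp, NAIVE_POLICY), permitted(fragmented_udp, FIXED_POLICY))
print(permitted(wifi_calling, NAIVE_POLICY), permitted(wifi_calling, FIXED_POLICY))
```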

We found some funny issues on some platforms. Some NAT64 devices were really happy doing NAT64 unless an IPv4 packet was coming in with a zero UDP checksum, which is perfectly fine in IPv4. Well, some NAT64 devices started putting some garbage inside, which made end points very unhappy.

So, some other devices decided: I am doing stateful firewalling, but when I am permitting outgoing ESP traffic I'm not going to create any state for the return flow, because for state you need port numbers. No port numbers for ESP, so, no, just drop it on the floor. Well, that was easy to fix. Fortunately, it was just an explicit rule. But still...

Fragmentation. Surprise, surprise, the IPv6 header is 20 bytes longer than the IPv4 one. So inevitably, if your NAT64 device receives a 1,500‑byte packet from the Internet, and you have the DF bit set to zero, the NAT64 device will fragment it, unless you have more than 1,500 MTU in your v6 infrastructure, but I know that some ports don't like packets bigger than 1,500. What's going to happen? Your NAT64 will create two packets with fragment headers. And by the way, some devices are not even using a 1,500 packet size, they are using 1280 as the default fragmentation size in this case. Fortunately, it's configurable. But again, it's something I was not aware of until people started to complain about some packets disappearing in the wild. And again, for end points, as I say, it's UDP applications. If you start using v6 for network infrastructure, RADIUS is becoming a problem, because RADIUS, like DNS, wants to get an answer in one packet, and if you start using certificates for .1X, well, the packet will definitely be bigger than you can fit into the MTU.
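The size arithmetic in that paragraph can be worked through with a small sketch. It assumes an IPv4 header without options (20 bytes), the fixed 40‑byte IPv6 header and the 8‑byte IPv6 fragment header; the function name is hypothetical and real translators differ in detail.

```python
IPV4_HEADER = 20      # bytes, assuming no IPv4 options
IPV6_HEADER = 40      # bytes, fixed
FRAGMENT_HEADER = 8   # bytes, IPv6 fragment extension header


def translated_packets(ipv4_total_length: int, ipv6_mtu: int) -> list:
    """Sizes of the IPv6 packets produced for one translated IPv4 packet."""
    payload = ipv4_total_length - IPV4_HEADER
    if IPV6_HEADER + payload <= ipv6_mtu:
        return [IPV6_HEADER + payload]  # fits, no fragment header needed
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (ipv6_mtu - IPV6_HEADER - FRAGMENT_HEADER) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        sizes.append(IPV6_HEADER + FRAGMENT_HEADER + chunk)
        payload -= chunk
    return sizes


print(translated_packets(1500, 1500))  # a full-size IPv4 packet becomes two packets
print(translated_packets(1500, 1280))  # the same with a 1280-byte fragmentation size
print(translated_packets(1480, 1500))  # 20 bytes smaller: fits in one packet
```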

I talked about this, let's go back to the summary slide.

So, yeah, talk about NAT64. This one probably deserves a much longer discussion. I started getting complaints about ChromeOS devices having connectivity issues, and a kind of strange connectivity issue. It's not like nothing works, which is normally easy to troubleshoot. Some applications were losing connectivity from time to time. What happened is, as I say, ChromeOS is actually a very complicated thing. Inside it runs a lot of virtual machines, namespaces, a lot of them. Each of them is obviously getting an IPv6 address, more than one sometimes. So you can easily see your ChromeOS laptop, a Chromebook, using like 10 or even 20 IPv6 addresses at the same time. Other devices are trying to be very, very helpful. They are trying to do neighbour discovery proxy. They are trying to respond on behalf of our devices, and for that they keep a table of which client is using which IPv6 addresses. The size of the table might unpleasantly surprise you. Some vendors use the magic number 7, and they were like, oh, have you seen more than seven addresses on a device? I'm like, yeah, I have seen 20.

And the problem is, troubleshooting this is a nightmare, because when the N‑plus‑one address appears, the device just fails to install it into the table. So everything works except this one address. Troubleshooting that stuff was fun, really. And again, when some address disappears from the table, that problematic address starts to work and some other address starts falling off the network. So, job security, I was very busy for a while.
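The failure mode described above can be reproduced with a toy model of a per‑client table with a hard limit. The class and method names are hypothetical, and the limit of 7 is the "magic number" quoted in the talk, not a documented value for any particular vendor.

```python
class NeighbourProxyTable:
    """Toy per-client IPv6 address table with a fixed size limit."""

    def __init__(self, max_addresses_per_client: int = 7):
        self.max_addresses_per_client = max_addresses_per_client
        self._entries = {}

    def learn(self, client_mac: str, address: str) -> bool:
        """Try to install an address; a full table refuses it without any error."""
        known = self._entries.setdefault(client_mac, [])
        if address in known:
            return True
        if len(known) >= self.max_addresses_per_client:
            return False  # the N-plus-one address never gets installed
        known.append(address)
        return True

    def forget(self, client_mac: str, address: str) -> None:
        """Drop an address, for example after it times out."""
        known = self._entries.get(client_mac, [])
        if address in known:
            known.remove(address)

    def answers_for(self, client_mac: str, address: str) -> bool:
        """Whether the proxy would respond on behalf of this address."""
        return address in self._entries.get(client_mac, [])


table = NeighbourProxyTable()
mac = "aa:bb:cc:dd:ee:ff"
addresses = [f"2001:db8::{i:x}" for i in range(1, 9)]  # eight addresses, one client
installed = [table.learn(mac, a) for a in addresses]
print(installed)  # the eighth address is refused while the first seven keep working
```

Once an older entry ages out, the refused address can be installed and a different one may be refused next, which matches the moving‑target symptom described.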

Obviously ‑‑ and also, let's say you have a VXLAN infrastructure, which probably means every address is a route. Now you suddenly have 20 times more routes in your routing table than you planned. And this now translates to money, because it's completely different hardware now. So it kind of does not scale very well, right, and people start complaining about the IPv6 thing because it's very expensive.

So ‑‑ and this is actually a very similar problem to the one I had on the slide before, about a device extending connectivity behind itself. So what we are trying to do, this is a work in progress, is to start delegating prefixes via DHCP‑PD to such devices, so instead of maintaining a huge neighbour discovery cache and wireless table, with 10 or 20 or 30 addresses per device, we only need to maintain a single route and a single entry for the device's link‑local address, which scales with the number of devices I have.

So, another interesting thing. Where is Jan? I know he loves this. So, the renumbering case. You have probably heard about it for the ISP environment: your CPE reboots, gets another prefix via PD, your device did not realise that the prefix changed, and now you have an old prefix and a new prefix and nothing works.

Surprisingly, we have a similar problem in the enterprise. So, .1X: a desktop boots up, gets into, let's say, a machine VLAN, right, gets an address from blue machine VLAN A. Then a user comes to the office, logs in, .1X authentication happens. The device moves to another VLAN. Gets an address from the other VLAN. The old one stays. Theoretically, indeed, the machine must be smart enough to realise that when .1X authentication happens you need to do something with your network stack. I had this discussion with Microsoft support back in 2001 or 2004. For IPv6 it's still the same, network manager is a total nightmare in this case, so basically in half of the cases what happens, right, is you have two subnets on your interface, two addresses, and only one of them really works.

However, there is a solution for this, because again this is a case which we never noticed until we turned v4 off.

There is an RFC for detecting network attachment, which says that if your device disconnects and reconnects, for example to a wi‑fi network, at Layer 2, the device is trying to do the right thing: am I still on the same network? Do I need to completely refresh my network stack, or am I fine using the old network address? The first thing they all do is check if my default router changed, if the link‑local and MAC address of my default router are still the same. Theoretically, after that they still need to get a router advertisement and compare prefixes, but not all of them do that.

So, basically, a lot of devices kind of assume that when your Layer 3 network changes, your router changes as well. However, if you use VRRP, your MAC address always depends on your group ID, and is there any reason to have more than one group ID? I'm just using the same number everywhere, right, why would I have different numbers? And actually, there are not so many of them, strictly speaking, just 256.

Some devices also violate the RFC and generate the virtual link‑local address from the MAC address, which means every network segment that uses the same VRRP ID will have the same link‑local address and the same MAC address everywhere.

Well, never been a problem, right.

Until another renumbering case appears. I have two buildings, two different subnets, but they are too close to each other. When you walk from one building to another, you actually move between two subnets. But again, the same VRRP ID everywhere, same link‑local, same MAC address, and the device is like, I am definitely on the same SSID, same network, I am keeping the old network config. As a result, nothing works, right, no connectivity.
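A minimal sketch of why the two buildings look identical to a client. It assumes the VRRPv3 virtual MAC format for IPv6 (`00-00-5E-00-02-{VRID}`, RFC 5798) and a link‑local address derived from that MAC with the modified EUI‑64 method, which is the behaviour described above; `looks_like_same_network` is a hypothetical stand‑in for a client that only compares router identity and skips the prefix comparison.

```python
import ipaddress


def vrrp_ipv6_virtual_mac(vrid: int) -> bytes:
    """VRRPv3 virtual router MAC for IPv6: 00-00-5E-00-02-{VRID}."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in 1..255")
    return bytes([0x00, 0x00, 0x5E, 0x00, 0x02, vrid])


def eui64_link_local(mac: bytes) -> ipaddress.IPv6Address:
    """Link-local address built from a MAC (flip the U/L bit, insert ff:fe)."""
    interface_id = bytes([mac[0] ^ 0x02]) + mac[1:3] + b"\xff\xfe" + mac[3:6]
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + interface_id)


def router_identity(vrid: int) -> tuple:
    """What a client sees of its default router: (link-local, MAC)."""
    mac = vrrp_ipv6_virtual_mac(vrid)
    return (eui64_link_local(mac), mac)


def looks_like_same_network(old_router: tuple, new_router: tuple) -> bool:
    """Simplified check: compares router identity only, not the prefixes."""
    return old_router == new_router


building_a = router_identity(vrid=1)  # subnet A
building_b = router_identity(vrid=1)  # subnet B, same VRID reused
print(building_a[0], looks_like_same_network(building_a, building_b))
```

With the same VRID on both segments the check reports "same network", so the client keeps its old addresses on a different subnet.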

People start complaining. What can they do? Oh, there is a solution, actually. The default address selection RFC says that when you have multiple addresses, and it's a fundamental thing in IPv6, right, if you have multiple addresses you need to select which one to use, one of the eight rules says that if you have multiple routers and each router advertises a prefix and you have selected a router, please use a source address from the prefix advertised by that router. It makes sense for multi‑homing, right. I have two ISPs, I always want to send traffic to ISP A from a source address from ISP A's address space.
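The rule described above can be sketched in isolation. This is a simplified illustration of that single rule, not a full source address selection implementation; the `Router` type, the function name and the documentation‑prefix addresses are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    link_local: str
    advertised_prefixes: tuple


def prefer_prefix_of_next_hop(candidates: list, next_hop: Router) -> str:
    """Prefer a source address inside a prefix advertised by the chosen next hop.

    If no candidate matches, this rule does not decide and the first candidate
    is returned as a stand-in for the remaining rules."""
    networks = [ipaddress.IPv6Network(p) for p in next_hop.advertised_prefixes]
    for candidate in candidates:
        address = ipaddress.IPv6Address(candidate)
        if any(address in network for network in networks):
            return candidate
    return candidates[0]


# The host still holds its old address after moving to another segment.
old_address = "2001:db8:a::10"
new_address = "2001:db8:b::10"
new_router = Router(link_local="fe80::b", advertised_prefixes=("2001:db8:b::/64",))

print(prefer_prefix_of_next_hop([old_address, new_address], new_router))
```

The rule only helps if the new segment's router is distinguishable from the old one, which is why the link‑local address of the router matters in the renumbering cases.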

So, if the host implements this, then all your renumbering cases are very easily solved. You are basically making sure that if you get a new prefix from a router, it is a different router. Which works as long as every VLAN in my network has its own link‑local address. It sounds crazy: globally unique link‑local addresses. Well, I don't know, I could probably make them site unique, but why bother? I make them globally unique. What I'm doing in my network, because it's easier, is just using the global /64 prefix as the interface ID for VRRP. Very easy to code. One line of code change. It works like a charm, I can tell you. So many people are now happy.
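One possible reading of that trick, sketched under a stated assumption: the 64 bits of the segment's global prefix are reused as the interface identifier of the router's link‑local address, so every segment gets a distinct one. The exact scheme used in the network described is not given in the talk; the function name and the example prefixes are hypothetical.

```python
import ipaddress


def link_local_from_prefix(global_prefix: str) -> ipaddress.IPv6Address:
    """Build fe80::<the 64 prefix bits> for a segment's global /64."""
    network = ipaddress.IPv6Network(global_prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch expects a /64 prefix")
    interface_id = int(network.network_address) >> 64  # upper 64 bits
    return ipaddress.IPv6Address((0xFE80 << 112) | interface_id)


print(link_local_from_prefix("2001:db8:1:2::/64"))  # fe80::2001:db8:1:2
print(link_local_from_prefix("2001:db8:1:3::/64"))  # fe80::2001:db8:1:3
```

Because the result is a function of the prefix, two segments with different prefixes can never present the same router link‑local address, even when they share a VRRP group ID.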

So, yeah, and I think it basically applies to CPE cases as well, as long as your link‑local address is a function of your prefixes. By the way, rule 5.5, supported by ‑‑ Jan, you are late ‑‑ I was talking about renumbering cases. I'm not going to repeat it.

Rule 5.5 has been supported by Microsoft for a long time, Mac supports this, and I know there is work in progress for Linux, so we might actually get most operating systems covering it.

What other issues have we discovered?

So, by the way, until recently macOS was doing a very funny thing: it was using the 464XLAT address, a special one, and considered it to be a normal address because it's assigned on the interface, and it was sending DNS packets from that source. Well, it freaked out some wireless devices because, oh my God, it's a spoofed address, definitely, it's a bad client, I'll just block it and you are not getting on my network. It also makes VXLAN networks very, very unhappy because they see the given IPv4 address start moving between all ports in the network at the same time. CPU goes up to the sky, operational people are also unhappy.

So, basically, strictly speaking, the only issue you would probably see now is a cosmetic issue with traceroute, if you do it from your MacBooks. By the way, this network is using the same approach, so most of your MacBooks are probably v6 only right now. You probably would not see anything, but this is a work in progress. Besides that, to make this happen we had to publish a number of RFCs to document the stuff. And thanks to the open source community, that was implemented in various operating systems even before I was able to deploy it in my network.

There are some ongoing v6ops drafts which document the stuff I was talking about, so feel free to read the drafts at this address. For example, we are talking about how to enable 464XLAT, when to enable it and when to disable it, so all developers at least have some common guidelines.

Next step for me, because as I said I'm mostly done, famous last words, with Apple devices: ChromeOS starting from 114 supports option 108. It's disabled by default. You can go into settings, whatever it's called, and enable it. For Linux, you can enable that stuff manually, but there is no 464XLAT CLAT implementation in the standard packages of Linux, so there's some work to be done there as well. So that's basically my next step.

And we have time for questions.

(Applause)

BRIAN NISBET: Thank you very much. Questions?

SPEAKER: Rinse Kloek speaking for myself. Very nice presentation. Thank you. One simple question: Do you expect Microsoft to support 108 and 464XLAT shortly?

JEN LINKOVA: Can you define "shortly"? There is a work in progress on this, let me put it that way. I can't speak for them. But I have asked. So, yeah, very good point actually, I understand that enterprise people would be mostly interested in Microsoft. So if you have a support contract, please ask, because it would make everyone's life easier: the more people ask for it, the easier it would be for them internally to justify that work. So if you would like to see that, ask your Microsoft representatives, yeah.

SPEAKER: Some post /TPOF ID professionals. As a person who is looking for the means and the why to implement IPv6, I would like to thank you for your experiences and sharing them with us.

JEN LINKOVA: My pleasure.

SPEAKER: We have a question from Elvis. "Can I sell your reclaimed IPv4? Just joking."

JEN LINKOVA: Private ones?

SPEAKER: "Seriously now, since you have done most of the work, how hard would it be to replicate this transition in a company's offices and infrastructure?"

JEN LINKOVA: I would say it's getting better and better. As I say, I believe that right now, if you take macOS Sonoma ‑‑ it depends on your client base ‑‑ if you have up‑to‑date macOS and reasonably up‑to‑date phones, you can do it easily, and the beauty is you do not really care about legacy devices, right, because legacy devices will just be dual stack, right. So your most problematic part might be if you have, say, macOS 13, where you might see some issues. Again, there are no showstoppers, you can have work‑arounds for everything we found in Sonoma. Network infrastructure, I guess it depends. We found some bugs. We got them fixed, right. So I assume your network infrastructure does not have those problems. I would say it should be reasonably easy for people to do this. Cisco and Juniper support PREF64 in recent releases, I just don't remember the numbers. You can do this now. And again, the great thing about that is your devices which do not work very well with v6 will just be dual stack. So yeah, it could be done.

BRIAN NISBET: He asked a follow‑up question: "If this is very simple, how long until I need to find a new job?"

JEN LINKOVA: No, I am not concerned about that. You see, I just have a slide. You see, I want to get to like ‑‑ let me tell you a story. When I started at Google in 2009, it was 0.2% of v6 traffic or something, and people were like, it's just you and Lorenzo, nobody else is using this. So I want to get to the point when I look at v4 traffic and I'm like, who cares, nobody is using that. We are not there yet. So, I will be around for a while. I am not retiring yet.

BRIAN NISBET: And Elvis will have a job for a while, which I think is his main concern.

JEN LINKOVA: Nobody is getting out of a job here. Stay in the room, please.

BRIAN NISBET: Okay. I think if there are no more questions. Thank you once again for a fascinating talk.

(Applause)

So, just thanks to all our speakers in this session. Just before you all rush off to lunch, I would just remind you please to rate the talks. I will say also that there is still an opportunity to nominate yourself or, with their consent, somebody else, for the Programme Committee, and we'll be talking about that later. Also, the NCC have asked me to tell you that the meeting T‑shirts are available downstairs. There is a maze of twisty passageways, do not get eaten by a grue, but they are available downstairs for your fashion plans for the rest of the week.

Thank you all very much.

(Lunch break)

LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC
DUBLIN, IRELAND.
