RIPE 87


Plenary session
Monday, 27 November 2023
4 p.m.

MIRJAM KUHNE: Welcome back from the coffee break. I think there are still people coming in; there is a bit of a walk between here and the coffee area. But I'd like to start ‑‑ and this is going to be a really interesting short session ‑‑ with a short introduction reminding you, those who don't remember, of Rob Blokzijl, who was the Chair of the RIPE community from RIPE 1 to RIPE 65. He was Chair for pretty much 25 years, and he really formed this community. I still often think of him, and he is a guidance in the back of my head now that I am the Chair. Unfortunately, he passed away ‑‑ actually this week, on Friday, it was eight years ago that he died ‑‑ and before he died he founded, in his name and in his honour, the Rob Blokzijl Foundation, basically to give an award roughly every two years. I'm just going to read a little bit from the objectives: to recognise individuals who have made substantial and sustained technical and operational contributions to the development of the Internet in this region, or who have supported or enabled others in doing so.

So, just to give you a little bit of background: this year a committee was formed, they received nominations, and they will introduce this year's winner of the award.

So I'll hand it over to the committee members to introduce the winner.

JAN ZORZ: Hello, everybody. My name is Jan Zorz. This is Franziska Lichtblau from the Rob Blokzijl Award Committee, and let's see if this works.

So, this year's Award Committee was Maria Hall, Julf, Franziska, Desiree, Carsten, Mike and myself. Thank you all for the work. I must say that when we had the first call and the Chair of the committee said, okay, let's do a round of thinking, in the first round the decision was unanimous: the same name came up. So, without further ado, please ‑‑

FRANZISKA LICHTBLAU: Dear community, the recipient of this year's edition of the Rob Blokzijl Award is very well known, but we will try to be as cryptic about it for as long as we can. Although, once we start talking about his impact on the Internet and our community as a whole, many of you will immediately recognise the character.

We have all met him at various RIPE and other RIR meetings, the IETF, NOGs; the list would be just too long for this format. And we have all experienced his direct way of interacting with people: in person, as a speaker, and over the microphone.

To quote the nominator who put him up for consideration for this award: "He is a genuine personality, always professional and never personal. The duels on the microphone with other veterans and dazzling birds of paradise of the RIPE community are epic and have helped a whole generation of RIPE members to question things more critically."
JAN ZORZ: So, our awardee is a measurement person, so we went back into the RIPE mailing list archives and found that he contributed 1,800 e‑mails ‑‑ that's a measurement ‑‑ to discussions spanning many working groups and task forces; almost all of them are very short and to the point. Despite, or because of, this, his input carries weight in every discussion and he takes that responsibility very seriously. While not all his comments are straightforward to everyone, they nurture a diversity of ideas. Monika, for Heise Online, did an extensive interview with him in June this year and the introduction she wrote is fantastic. We couldn't say it better, so we will use it now. Thank you, Monika, for your kind permission.

"Wired called him an unrepentant hippy and world networker. The Internet Society inducted him into the Internet Hall of Fame the year it opened. South Africa's Rhodes University conferred an honorary doctorate on him for the contributions he made to connecting countries in Africa, which he accepted together with South Africa's legendary leader, Nelson Mandela. His story goes back far beyond the beginning of the Internet, to the days of punch cards and mainframe computers. Coming with a background in computing, the hippy contributed to the start of networking in the Pacific Northwest of the US and globally, sometimes from the periphery, as he notes. He is wary of organisations and governments, perhaps a legacy of his political activism in the 1960s in the US peace movement. 'Bringing people to the Internet was done by the people, not so much by organisations,' he said in a talk on connecting the developing world last year."

FRANZISKA LICHTBLAU: The person we are honouring today is known for his direct yet constructive criticism, battling one‑way thinking with the goal of getting things done with a pragmatic mindset. But most of all, he is admired for his willingness to speak uncomfortable truths and to serve as a constant reminder to ask the hard questions, to further our common goal of a reliable, open and free Internet.

He made great contributions to the Internet as we know it today and actively shaped it, not just in the RIPE region, where most of us know him from, but also globally. While being a clear‑thinking technical mind, he has never lost sight of the big picture and the impact technology can have. He played a key role in bringing the Internet to South Africa in the middle of the apartheid regime, while also staying mindful of the potential implications that technical colonialism can have. You can really read up on that piece of history; it was insightful for me. He himself loosely quantifies the number of Internet drafts and RFCs he authored or co‑authored as "too many". We will leave you with that.

There is still hope for the research world, though, because he refers to the number of TPCs and conferences he served on only as "many". And even after so many years, he is still continuously active within our community, sharing his views and raising relevant questions in presentations; serving on the RIPE Code of Conduct team and on last year's edition of the Rob Blokzijl Award Committee are only a few examples of his recent community activities.

Over the last decades, he has been a dedicated mentor, helping new minds find their way into our community, always speaking the honest, sometimes even brutal, truth while putting up a challenge to let people grow, irrespective of their background, be it academia, industry or somewhere in between. In doing so he made one of the largest possible contributions to our community, namely ensuring sustainability by passing on its knowledge and values.

JAN ZORZ: Even for a measurement person, numbers aren't everything. What counts is what they represent. So we don't even need to list all the RFCs, discussions or supporting documents he authored and facilitated in the RPKI world. We can summarise his contributions there very briefly: the word 'RPKI' is so strongly entangled with his name and work that it surely would not be the same without him and, with that, our routing ecosystem would be way less secure.

Being the long‑time, tireless fighter for RPKI deployment and routing hygiene, he stood strong, sailing through fierce debates and discussions. Therefore, and now for something completely different, we think a limerick might be in order, right?
A technical whizz kid named Randy
Was in cyberspace really quite handy
He worked night and day
And we are happy to say
That the Internet's now fine and dandy.

Ladies and gentlemen, Randy Bush!
(Applause)

MIRJAM KUHNE: The Rob Blokzijl Foundation and the committee have asked me to do the honours and to hand this beautiful piece of artwork over to you, and it's a great pleasure for me. Randy has been a great mentor and friend for many, many years, and I am sure Rob would very much approve of this decision.

RANDY BUSH: So, those of you who know me will realise that I find this a little embarrassing. Those of you who know me well will know that I find it very embarrassing.

Yes, like many of you, I have impostor syndrome, the good side of that is I know that at least I'm not a sociopath.

But this occasion shouldn't be about me. This occasion should be about Rob, and Rob is who I wanted to grow up to be, if I had to grow up. He was a leader from behind instead of out front. He was not looking for glory or for personal recognition. He was another old hippy, as was another of our progenitors, Jon Postel. Let's hear it for the old hippies!
Rob spoke simply and honestly, and had humility and humour and integrity, and those are qualities that I really admire. I think many of us here really miss Rob. He was for community over divisiveness, and he wanted cooperation over power. And I think those are qualities we should keep in mind and do our best to pass on to the next generations.

You will notice that Franziska and Jan kept using the word "he". This is the third old white male to get this award. I think this is a bug, not a feature. I think there are a couple of women in this room who have been around longer, though they are not older. Mirjam; nobody here probably remembers Carol Orange, also at the very early RIPE NCC; and then the new women who have stepped up. By the way, I was raised by five women, so I know how to follow orders ‑‑ well, you know, you Germans give slightly too‑specific orders.

But Franziska and Constanze ‑‑ we have strong women in this culture and we should be promoting them and other diverse groups. But I have said too much already; you have already had to listen to me for too long. This is quite the honour, because, as I said, I always wanted to grow up to be like Rob, if I had to grow up. So, this is very much an honour. Thank you very much.

(Applause)

WOLFGANG TREMMEL: Thank you. Welcome to the plenary; I'll be chairing this session together with Doris. We both sit on the Programme Committee, and if you would also like to chair a session, please put your name forward: there is a PC election and you still have time to volunteer.

So, the first talk today will be by Silvan Gebhardt, who is going to talk about how to run an IP transit provider using IXP Manager.

SILVAN GEBHARDT: Hello, everyone. Today's talk is going to be about feature abuse; I think a lot of us love feature abuse.

I was using a tool and I had 24 hours to replace the tool so this is what came out of it.

First, shortly about me. People who know me know I moved to Finland in 2020; I wanted to escape the heat over here and I wanted to go swimming in the sea during Covid. Since 2014, we have run a couple of networks; some people might recognise some of those. 40051 is one of them, also known as Freetransit. So, what's this talk going to be about?
Well, a topic which we spoke about a lot at the last RIPE meeting: private research and education networks and their big growth. I think we all noticed this, hundreds upon hundreds of them suddenly popping up. And obviously we are one of the tunnel networks, or at least it's one of the projects I have done. It grew out of my needing a lab environment for myself. I didn't have a direct connection, so I thought, "oh, let's do a GRE tunnel", and a friend came along and said "can we have another tunnel?" And then there were ten GRE tunnels, and I thought, maybe this isn't that great, maybe I should move this to a somewhat more isolated network.

Well, what is it today? We hand out IP transit for non‑profits. So, if you are doing commercial server hosting, you are not going to get anything from us, that's for sure. Mostly we also give out IPv6 blocks. I now have decades of experience with other network engineers, probably 15 to 20 years, and I noticed that the only way to get IPv6 ahead is to get the next generation, who will not get IPv4 at all, their hands on IPv6. When you are working in a university or school environment, you want your students to get some connectivity. They are not going to get IPv4 space, but they can get their own IPv6. Some of them want an ASN. They want to participate, they want to build their infrastructure, they maybe want to build their Anycast network and learn how it works. And since there was a bit of that topic too: no, Freetransit does not mean everything is free ‑‑ that again goes back to the discussion from the last RIPE meeting. We do charge our members. We don't give out resources for free.

So, how does that happen when someone requests a tunnel? It normally comes in as a ticket. You have this ticket, you start looking at it: someone is requesting a transit tunnel for a German [co‑axe] network. Back in the day, I think it was the successor name of the telecom. Legit, absolutely. We normally ask them to change their import/export so we see that the person requesting the tunnel actually does have access to the resources, actually does have authority. If we don't trust it, we might ask them to put a little remark in the description to make sure they actually are who they say. We let them choose a tunnel node on latency, because all those tunnels really wreak havoc on latency today. We know of people who run tunnels over tunnels, because there is MPLS underneath, and that's a tunnel as well.

The next thing, we normally ask them to send us a list of prefixes. It's very funny what you see: people who have a /40 and then deaggregate it fully into /48s. We ask them, why are you doing that? "I have so many POPs." How many? "I have three." Why do you deaggregate?
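
The arithmetic behind that complaint is easy to check with Python's standard `ipaddress` module; the prefix below is a made‑up documentation block, not anyone's real allocation:

```python
import ipaddress

# Fully deaggregating a /40 into /48s yields 2^(48-40) = 256 announcements
# where one would do. 2001:db8::/40 is a documentation prefix.
block = ipaddress.ip_network("2001:db8::/40")
more_specifics = list(block.subnets(new_prefix=48))
print(len(more_specifics))  # 256 routes in the table instead of 1
```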
Who here has noticed that we're about to grow to almost 200,000 prefixes on v6? It's insane, what has happened recently.
Obviously you then set up the BGP session, you document it and you end up with the port. This is a lot of manual labour if you do it manually.

So, what is our issue? Well, we are growing, and we are growing very fast. Back in the day, we used VyOS. That's a Debian‑based system, and it often needed a reboot. Something went out of whack, the config loading went out of whack. You did a commit, and it took 15 or 20 minutes, I kid you not. And at one point we upgraded to version 1.3.2 because our config again didn't work, our config again lost part of itself ‑‑ and I think 10,000 lines of configuration for prefix filtering is not that much. And it broke. And again, route leaks everywhere, people shut us down on IXPs, our stuff broke. It was not nice.

So, I had to decide overnight: let me start a new platform from scratch. What do you do? Debian as a base is very nice but, with so many interfaces, systemd starts to really behave weirdly, so I decided: no systemd here. It was a great decision. It boots in five seconds and, guess what, if you actually shut it down, it also shuts down; it doesn't hang forever.

And we ended up with a script to pull the config from IXP Manager. Who here has configured IXP Manager? A handful of people. Okay. Good.

So basically, you have a script which pulls the config ‑‑ every 15 minutes in my case ‑‑ and checks if BIRD can load it. Yes, we run BIRD. If it loads, we restart and move the config file over. No interruption for anyone.
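
A minimal sketch of that pull‑validate‑swap loop, assuming a BIRD router and IXP Manager's config‑generation API; the URL, router handle, file paths and helper names below are placeholders, not the speaker's actual setup:

```python
import subprocess
import urllib.request

# Placeholders: point these at your own IXP Manager instance and router.
API_URL = "https://ixpmanager.example.net/api/v4/router/gen-config/core1"
CANDIDATE = "/tmp/bird-candidate.conf"
LIVE = "/etc/bird/bird.conf"

def auth_headers(api_key: str) -> dict:
    # IXP Manager accepts the API key in this request header
    return {"X-IXP-Manager-API-Key": api_key}

def fetch_config(api_key: str) -> None:
    req = urllib.request.Request(API_URL, headers=auth_headers(api_key))
    with urllib.request.urlopen(req) as resp, open(CANDIDATE, "wb") as out:
        out.write(resp.read())

def config_parses(path: str) -> bool:
    # "bird -p -c <file>" only parses the candidate, it does not apply it
    return subprocess.run(["bird", "-p", "-c", path]).returncode == 0

def deploy(api_key: str) -> None:
    fetch_config(api_key)
    if not config_parses(CANDIDATE):
        return  # keep the last known-good config in place
    subprocess.run(["mv", CANDIDATE, LIVE], check=True)
    subprocess.run(["birdc", "configure"], check=True)  # hitless reload
```

Run from cron every 15 minutes; a failed parse simply leaves the previous known‑good config running.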

Tunnels, we still do by hand. For all the IXP feeds, all we do is treat other IXPs as if they were our IXP. So if there is a member who wants transit on, for example, KleyReX or LocIX, we create LocIX in our infrastructure as well as in IXP Manager, so it's a duplicated infrastructure. We add the peer and define it as a route server peer, except that our "route server" has different IP addresses. Guess what? The filter updates come in, and they come in handy. So, since we switched over to IXP Manager as a transit network config, no more route leaks, not like the old days. I still have that ticket from a long time ago.

For the people who don't know what IXP Manager is or never had to deal with it: it is orchestration for an IXP. It's, I would say, quite the standard tool. It's quite large, quite massive. It is open source, so we can look at what's behind it, and you can template it nicely. At that time, I was just spending time re‑templating some other configuration for an assisted peering service at another workplace I was at. So I thought, hey, let's use the tools we know. Feature abuse: use the tool you know.

So, actually, all you have to do is change the template. It's not a route server peer any more, it's just a regular eBGP peer, that's all ‑‑ even for a route server, it's an eBGP peer. You add local routing, you have include files; that's all you need to do. And then, obviously, you should be filtering: you mark learned routes and you mark them with communities. Don't ever think that a static export filter list is going to serve you well. Everyone who is doing static export filter lists: you are evil, stop doing that, never do it again.
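
As a sketch of what the modified template might emit (BIRD 2 syntax; the protocol name, ASNs, addresses and filter-function names here are illustrative, not the actual template output):

```
# A route-server-client template would carry "rs client;".
# For transit, the same template just emits a plain eBGP neighbour:
protocol bgp downstream_as64500 {
    local 169.254.10.1 as 40051;       # our side of the tunnel
    neighbor 169.254.10.2 as 64500;    # the member's tunnel endpoint
    ipv4 {
        import where downstream_import_ok();  # IRR-generated prefix filter
        export where export_to_downstream();  # full table plus local routes
    };
}
```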

And the nice thing is, if a downstream of ours is on an IXP or on a shared Layer 2 network, there is no manual work. We import you, we create you as a peer, we add your IP address, never to be touched again. It's always updated.

So, what does that look like? Well, you add a member. You populate it here. You set up your v4 and v6 macro. You choose a peering policy. And then the IRRDB source ‑‑ well, that's a very nice one if you mess up that source. And accepting ALTDB ‑‑ well, we have to accept part of it, with exceptions, because of one little use case: people with 44 subnets. People who have no RADB access and have this 44 subnet might not be in it otherwise.

So, instead of having route servers in there, you actually have an actual router list. It's awesome. You just add all your tunnel nodes into it, one for v4 and one for v6, and it's working.

You create switches. The switches are funny: they are actually just Linux machines, standard Debian machines. It works. Port statistics don't really work, traffic statistics don't really work, but hey, it's a switch, depending on how you want to see it.

And then it gets a bit wonky, because you need to start creating tunnel interfaces and, since the GRE tunnel interfaces are all configured by hand in /etc/network/interfaces, you eventually start generating a couple of thousand ports.

So, that's the tunnel interface. The rest is by hand, and you start adding IP addresses. Now, IPv4 abuse: we wanted to avoid that anyone who gets a tunnel from us uses the v4 address in the tunnel, if they come with v4 space, to commit abuse. So, there's this lovely thing called the 169.254.0.0/16 network ‑‑ link‑local addresses ‑‑ and I love this dual use for them. Well, nobody can do any abuse from those.
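
A sketch of such a per‑tunnel file; the interface name is invented and the tunnel endpoints are documentation addresses:

```
# Hypothetical /etc/network/interfaces.d/ file, one per tunnel:
# GRE to the member, numbered from 169.254.0.0/16 so the v4 side of the
# tunnel is link-local and useless as a globally routable source.
auto gre-as64500
iface gre-as64500 inet static
    address 169.254.10.1/30
    pre-up ip tunnel add gre-as64500 mode gre local 192.0.2.1 remote 198.51.100.7 ttl 255
    post-down ip tunnel del gre-as64500
```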

It's not routed globally anywhere; I have not seen traffic coming in from this network as a source anywhere. Or, as you can see, IXP IP addresses: you basically have to manually add all of the downstream IP addresses once, so you need to readdress every time an IXP readdresses. That is the only painful time.

And then you create a virtual interface. Again, manual; it's the same thing as with reseller ports. For people who use IXP Manager and have reseller ports, this is exactly the same set‑up, with virtual interfaces and no port statistics. One of the things I noticed in IXP Manager is that the max BGP prefixes somehow don't get updated from PeeringDB. That would still be nice to have fixed at some point.

And, well, IRRDB filtering: in this case I do not allow more specifics. Again, I'm not trying to pollute the v6 routing table.

Well, let's talk briefly about AS‑SETs. As everyone knows, an AS‑SET is a definition of your cone, and out of this your upstreams and route servers should generate the filters. The lovely thing about them is that they can be stacked and included, and include other includes, many, many times, and you actually have no control. Once you accept an AS‑SET, you are doomed ‑‑ completely doomed ‑‑ because of the recursion.
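
The recursion problem fits in a few lines. This toy resolver uses an invented registry, but real AS‑SETs nest exactly like this, and the loop guard is the only thing keeping expansion finite:

```python
# Why an accepted AS-SET puts you at the mercy of recursion: members can be
# other AS-SETs, nested arbitrarily deep, none of which you control.
# The registry contents here are made up for illustration.
REGISTRY = {
    "AS-DOWNSTREAM": ["AS64500", "AS-FRIENDS"],
    "AS-FRIENDS":    ["AS64501", "AS-DOWNSTREAM", "AS-HUGE"],  # note the loop
    "AS-HUGE":       ["AS13335"],  # someone "helpfully" included their upstream
}

def expand(as_set: str, seen=None) -> set:
    """Recursively resolve an AS-SET to origin ASNs, guarding against loops."""
    seen = set() if seen is None else seen
    if as_set in seen:
        return set()
    seen.add(as_set)
    asns = set()
    for member in REGISTRY.get(as_set, []):
        if member.startswith("AS-"):
            asns |= expand(member, seen)
        else:
            asns.add(member)
    return asns

print(sorted(expand("AS-DOWNSTREAM")))  # ['AS13335', 'AS64500', 'AS64501']
```

One stray include of a large network and your filters suddenly accept its whole cone, which is exactly the Cloudflare incident described next.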

So, we have had downstreams which found it very funny to include their upstreams in it, including large ASNs. That was interesting when someone included Cloudflare. And my VyOS config broke, because when it gets too long it stops, and your final deny is suddenly not there any more when you copy and paste to restore the config.

Also very interesting: if you have a too‑large AS‑SET and you generate a large ACL, some people with hardware routers have limited ACL space, and suddenly you get a call early in the morning: hey, could you fix your AS‑SET, you are breaking our automation and the routers ran out of space. A very nice thing for a DDoS. So, AS‑SETs are fairly broken in our case.

So, we ran VyOS, we no longer had the final discard, and we exported the full table to a route server at DE‑CIX Munich. That was a funny ticket.

And I only noticed the ticket because suddenly I got 5 gigabits of inbound traffic on a 1 gig rate‑limited port. I didn't notice that directly, but I noticed that the sFlow daemon started using 50% of the CPU because it got so much traffic. That led to traffic issues, and then Salesforce sent me a funny ticket: hey guys, you are breaking us.

This was a nice self‑inflicted DDoS and, unfortunately, the AS path I leaked contained Hurricane Electric, and Hurricane never set a do‑not‑announce‑via‑route‑server flag, so Hurricane gets passed through. You can actually upstream Hurricane via a route server if you are unlucky.

So, what do we do about this? Well, you could filter more intelligently, you could try to detect mistakes. There is the potential of machine learning, if you do it properly, and there was a master's thesis from one of my colleagues on this: can you use machine learning to get the garbage out of AS‑SETs? So you can use AI and machine learning; whether that works or not is questionable. You can use static filters. You can blacklist only the next big ASN that you accidentally upstreamed; that's not great. You can use other systems ‑‑ yes, I know there are at least two other systems, but they are not really in use, at least not globally. I know RPKI will not help here. Or we just stop using AS‑SETs.

So, in order to make our tunnel network a bit less shitty, we stopped accepting AS‑SETs altogether, because we got a lot more tunnel requests and we cannot manually check all of them. This is also interesting, because we noticed that quite a few of our downstreams had the great idea: let's become a transit provider as well. So we have people who get transit from tunnels in order to reshare it over a tunnel, potentially routing that tunnel over the tunnel. You know, instead of turtles all the way down, it's tunnels all the way down. And it starts with VPLS and MPLS and everything anyway.

Because the MTU is bad, you get scenic routing, and a tunnel over a tunnel is really hard to debug: so, no more AS‑SETs on our side.

So, when we switched over to IXP Manager, we got really nice shareholder value: shorter time to deployment, and the automated and better filter updates work. You don't have to document, because, hey, it's all there. It's living documentation: you see who your downstream is, everything is in there. And it's centralised configuration.
So, a few numbers. They are not a hundred percent up to date, but still; it has grown in the meantime. We have about 320 ASNs in our AS‑SET; it's quite large. We are like three and a half percent of the global v6 table; from time to time it goes up and down, and with the recent growth it might be more. Why is it only 3.5% of the global table? Because a lot of end users are sloppy about maintaining their tunnels; otherwise we would be, I don't know, 5%. And it took us quite long to rework one of the last nodes.

For whoever is interested in what we had to do to make this feature abuse work: well, you delete two line items. "rs client" needs to go, and the source address, because you obviously don't source the session from the route server IP address but from the individual tunnel interface, because it's an eBGP session on an individual interface. That's it. You add the direct protocols, and that's it; it's seriously five lines and IXP Manager works as a transit network.
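
In diff form, the template edit amounts to something like this; the line content and template variables are a sketch, not the real IXP Manager template:

```diff
 protocol bgp <peer_name> {
-    rs client;
-    source address <route_server_ip>;
+    # no "rs client", and the source defaults to the tunnel
+    # interface address of this individual eBGP session
     local as <our_asn>;
     neighbor <peer_ip> as <peer_asn>;
 }
```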

The config block ‑‑ I don't know why this is cut off. Basically, you have the interfaces. It's a standard /etc/network/interfaces file; we have one file per tunnel.

If someone is interested in looking at how to abuse a tool not made for their purpose, or needs tunnels, this is pretty much it. You can contact us on the website, you can send us an e‑mail, you can reach us on social media. Are there any questions, remarks or comments?
(Applause)

WOLFGANG TREMMEL: Are there any questions? I see a massive amount of people rushing to the microphones.

SPEAKER: Daniel Karrenberg, co‑founder of RIPE and current RIPE NCC staff member. Everybody deserves at least one question. Do you get any pushback from people because you don't accept AS‑SETs? Do they say, we have to use AS‑SETs?

SILVAN GEBHARDT: Some, but very rarely. Most people just accept it, because it's a fairly solid no‑go policy. We're like: guys, there is no point in resharing your transit and reducing the MTU further, just tell your friends to come and request a tunnel as well. Most of the time that is accepted. It is actually also nice because it prevents someone reselling the tunnel transit commercially again.

DANIEL KARRENBERG: But it's not like you get, sort of, we lose that customer if you don't?

SILVAN GEBHARDT: No, no, it's a sponsored thing. It's a lot of take it or leave it from our position. We get more pushback on: hey, all those private ASNs are useless, people should stop doing those hobby ASNs. I think that pushback we're getting is significantly larger, but we haven't been handing out any 16‑bit ASNs for years. That was one of the things we got complaints about in the beginning.

WOLFGANG TREMMEL: One question from me: you are using IXP Manager. Did you also look into Peering Manager, which is also open‑source software for configuring routers?

SILVAN GEBHARDT: I had been thinking about it but, literally the same month, I was involved in a project, fairly close to you, where I was so heavily invested in templating IXP Manager for a certain product that I was totally in the mindset for it. It is actually a by‑product of ‑‑

WOLFGANG TREMMEL: One more question.

SPEAKER: It's not a question but, since nobody else is here: thank you so much for providing this service, because it has enabled me to experiment with an AS and BGP, which is usually out of reach. Thank you.

WOLFGANG TREMMEL: Thank you, Silvan.

(Applause)

Next talk will be Alexander Azimov, and he is talking about egress monitoring at scale.

ALEXANDER AZIMOV: Hi everyone. I am working for Yango.

And today I am going to share with you our experience in building a multilayer monitoring system for external connectivity. Let's start from common ground.

There are outages. And sometimes these outages affect services that we are responsible for. And if our services are affected, it's all about time: time to prove that the network is innocent, or to prove the opposite. And if we're speaking about external connectivity, things become way more complicated. Why? Because, in the case of outages in external connectivity, we need timely health checks of numerous systems that we are not operating. However, your top managers will be expecting you to fix it.

We have a nice job.

So, let's speak about systems that can be useful if we are facing such outages.

During this report, I will cover three systems that we are using on a daily basis, and the first one is network error logging, or NEL. It's an awesome tool if you are a content provider. How it works:
It turns a browser into a monitoring agent. By adding several fields to the HTTP response, you get the opportunity to instruct the browser on what and where to report. And in a situation where the browser fails to load a selected page, it will not just send you information about the unavailability, it will also report what happened.

So, in this particular example, it will report which link was unavailable, what IP address was used during address resolution, and also provide the error code.

Here are a few examples. Wait, not before the examples. The support: It's very important.

NEL ‑‑ there is no IETF document, there are no IANA codes, but NEL is supported by the majority of browsers, and to enable NEL you just need to configure your web server and a reporting server.
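
As a sketch, enabling NEL is essentially two response headers (nginx syntax shown; the collector URL is a placeholder): `Report-To` registers the collector endpoint group, and `NEL` tells the browser to send network‑error reports there.

```
# Report-To registers the collector endpoint group; NEL instructs the
# browser to report network errors there for the next 24 hours.
add_header Report-To '{"group":"network-errors","max_age":86400,"endpoints":[{"url":"https://collector.example.net/reports"}]}' always;
add_header NEL '{"report_to":"network-errors","max_age":86400,"failure_fraction":1.0}' always;
```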

So, I hope now there will be an example.

On this slide, you can see a spike of connection timeouts that was detected with NEL and, because NEL also provides the IP address that was used by the user, you can map it to the other systems. In this particular scenario, we were able to detect the failing peering partner that was responsible for the outage, contact their technical support and finally resolve the issue.

Here is another example.

As you can see, here is an increasing number of TCP timeouts, but this time the network was innocent, because the reason for this outage was a change in the TCP timeouts: they were too aggressive. So, let's make a summary about NEL.

NEL is an awesome tool for content providers. It's a great first line of diagnostics, because it can help you distinguish between delays and other connection errors. But it has limitations. If you are not a content provider ‑‑ and unfortunately not all of you are content providers ‑‑ it has limited use. It's hard to distinguish between egress and ingress anomalies, and we still need some network diagnostics to distinguish between TCP errors and network issues.

So, what is the most classical way to monitor a network? Of course, it's remote probing. And there are many services available to achieve this goal. Many of them are paid services; some of them are even semi‑free, like RIPE Atlas.
RIPE Atlas has thousands of probes, but is it enough? In reality, to have full coverage of your traffic, you need hundreds of thousands of probes, maybe even millions. Unfortunately, there is no such service on the market.

So, remote probing is a nice tool for one‑time measurements, or if you are lucky enough to have probes in the network that you want to check. But remote probing doesn't have enough coverage, and it's hard to detect a failing link with such a tool.

What can we do?
I hope you remember this shot. It's in Latin, so you should read it. But for those who don't know what is written there, it can be translated into English as "know thyself". So, if you try to understand the world outside, you should start with understanding yourself, understanding how you are operating in this world.

And so, let's think about remote probing. In the ICMP scenario, remote probing works as follows: remote probes send ICMP echo requests to your network and wait for the response. What if we invert it? What if, instead of waiting for the requests, we generate them? Then we use other networks as probes and place the generator inside our network.

So let's see how it works, because we deployed it several years ago.

So, first, we decided to find out what our coverage is. We used sFlow data to find which autonomous systems we are really interacting with. We don't need all of them: we found that we need one and a half thousand autonomous systems to cover 95% of our traffic. After that, we used ZMap to find stable IPs that answer ICMP and TCP. From this we got more than a million IPs that were eligible for our monitoring. Then we installed several monitoring agents at key points of our network, and each of these agents pings these IPs at a one‑minute interval.
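
The coverage step is essentially a cumulative‑sum cut‑off: take the heaviest ASNs until they account for 95% of total traffic. A sketch, with made‑up per‑ASN byte counters standing in for real aggregated sFlow data:

```python
# Greedy coverage: pick ASNs by descending traffic until the target
# fraction of total traffic is covered. Counters below are invented.
def covering_asns(traffic_by_asn: dict, target: float = 0.95) -> list:
    total = sum(traffic_by_asn.values())
    covered, chosen = 0, []
    for asn, octets in sorted(traffic_by_asn.items(), key=lambda kv: -kv[1]):
        if covered >= target * total:
            break
        chosen.append(asn)
        covered += octets
    return chosen

samples = {64500: 700, 64501: 200, 64502: 60, 64503: 30, 64504: 10}
print(covering_asns(samples))  # [64500, 64501, 64502]
```

With real data the same cut‑off lands at roughly 1,500 ASNs, per the talk.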

Let's see how it works.

Here is an example of an outage in a network of our peering partner. And the system was able to instantly detect it. So, such a system requires naming.

In our company, we have a custom: we try to give our services humble and precise names. That's why the system responsible for controlling servers is called Skynet, and another system responsible for performance testing is called Fatality. We wondered how to name a system that pings a million IPs at a one‑minute interval.

We had no other choice. So, let's speak about megaping.

It was a significant step forward. We were able to distinguish between TCP anomalies and network anomalies, but it had its own limitations. It was hardly applicable to IPv6, and it was still really hard to distinguish which links are failing: for example, if you have several routes to a selected autonomous system, are they all affected, or only a subset? And what about alternative paths? So we decided to create yet another monitoring system.

But first, let's take a look at how the traffic flows from the server to the users. Our servers are IPv6‑only. The problem is that not all our users are IPv6‑capable and, to solve that problem, we utilised tunnelling. The server encapsulates the egress traffic with IP encapsulation, and it goes to our DPDK platform, called YaNet, which was recently open sourced, where the IPv6 header is removed. After that, the packet travels to the border, where the IP look‑up is performed.

So, we want to map TCP information and prefix information, the routes. The server knows all about TCP but knows nothing about the routing table. The border knows everything about routes but has zero information about TCP. And in the middle is the almighty DPDK platform that knows nothing about either TCP or routes. So, you may guess what follows.

First, we decided to bring the TCP data to our platform. There is an options field in the TCP header. Not many know that there is a secret lever called experimental TCP options that can be used inside your administrative domain without writing an IETF document and without notifying IANA. Using these experimental options, we deliver to our DPDK platform information about the number of packets, the number of retransmitted packets and the RTT measured by the host.
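For illustration, an experimental TCP option (RFC 6994 defines kinds 253/254 with a 2‑byte experiment ID) could carry those three counters like this. The experiment ID 0xBEEF and the exact payload layout (three 32‑bit fields) are assumptions of this sketch, not the talk's actual wire format:

```python
import struct

TCP_OPT_EXPERIMENTAL = 254  # shared experimental option kind (RFC 6994)
EXID = 0xBEEF               # hypothetical experiment ID for this sketch

def pack_tcp_stats_option(packets, retransmits, rtt_us):
    """Encode host-measured TCP stats into one experimental option:
    kind(1), length(1), ExID(2), then three 32-bit counters."""
    data = struct.pack("!III", packets, retransmits, rtt_us)
    length = 2 + 2 + len(data)  # kind + len + ExID + payload
    return struct.pack("!BBH", TCP_OPT_EXPERIMENTAL, length, EXID) + data

def unpack_tcp_stats_option(blob):
    """Inverse of pack: validate kind/ExID, return the three counters."""
    kind, length, exid = struct.unpack("!BBH", blob[:4])
    assert kind == TCP_OPT_EXPERIMENTAL and exid == EXID
    return struct.unpack("!III", blob[4:length])
```

The point of the experimental kind is exactly what the speaker says: inside one administrative domain, sender and middlebox only have to agree on the ExID and payload between themselves.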

The second part was even harder. We needed to bring the look‑up from the border to some device in the middle, some middlebox.


To achieve this, first, we placed each link in its own VRF routing table, and we also utilised MPLS over UDP to instruct the border to use a selected table. So, by using MPLS over UDP, we make the border use a specific link. As a result, there is nearly no look‑up at the border, except for some specific situations at IXPs.

And now, our DPDK platform is able to collect TCP statistics, make look‑ups in the routing table, and encapsulate the packets sent to the users in MPLS over UDP, and the border is left with only the decapsulation function.
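The label is what carries the link choice: the middlebox picks a label that maps to one egress link/VRF, and the border forwards on the label alone. A minimal sketch of building the MPLS label stack entry (the outer UDP/IP headers of MPLS‑over‑UDP, RFC 7510, are omitted, and the label‑to‑link mapping is an assumption of the sketch):

```python
import struct

MPLS_OVER_UDP_PORT = 6635  # UDP destination port registered by RFC 7510

def mpls_label_entry(label, tc=0, bottom=True, ttl=64):
    """Build one 32-bit MPLS label stack entry:
    label(20 bits) | traffic class(3) | bottom-of-stack(1) | TTL(8)."""
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)

def encapsulate(inner_packet, link_label):
    """Prepend the label that selects the egress link, so the border
    can forward on the label without any IP look-up.
    (Outer UDP/IP headers are left out of this sketch.)"""
    return mpls_label_entry(link_label) + inner_packet
```

With one label (and one VRF) per egress link, the policy decision moves entirely into the DPDK platform, which is what lets it observe every alternative path rather than just the best one.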

So, let's see how it works.

For IPv6, nothing really changes. The approach applies both to IPv4 and to IPv6.

And here is an example. There is a well‑known operator, an ISP called China Telecom, and from time to time networks experience peering troubles working with them. We do not have a direct link with China Telecom; we see them through multiple routes, through our IP transit providers, and unfortunately not all of them are the same. As you can see, one of our IP transit providers has regular spikes of packet loss, up to 40%, and of course we changed our routing policy to improve user experience. But our system is capable of monitoring not only the best path; it can monitor all alternative paths, because we see all the routes coming from all directions, and all policies are applied not at the border router but at the DPDK platform in the middle. So we see the alternative path and we see how the issue is progressing; for now it has not been resolved, and we are still in touch with our IP transit provider.

There is another example. I hope you still remember the example from the beginning of the presentation about megaping. There was an outage, it took ten minutes, and after that it was resolved. Here is the whole story. Megaping was showing 10% packet loss. In reality, it was one of two links to our peering partner that was affected; on the failing link, the packet loss was 20%. Our duty engineers, using this monitoring system, changed the routing policy in ten minutes or so, but we still kept an eye on the incident. As you can see from the slide, the peering partner fixed the problem, and it took about eight hours. So, there is a big difference between the ten minutes it took us to react to the incident and the time it took for resolution in the network of our peering partner, which we do not operate.

So, we again needed to name our system, and we called it Dr. Egress: the system that is responsible for both monitoring and engineering egress traffic.

What can Dr. Egress do? It can detect failing links, and not only at the link level: it can detect them at the autonomous system level and, if needed, even at the prefix level.

It works both for IPv4 and IPv6. It's not easy to deploy, but it gave us an opportunity to detect dozens of incidents each week, which makes it worthy of the cost of deployment.

There are some limitations. The biggest one is: what happens if we have a problem with ingress traffic? Dr. Egress only monitors what we are delivering to the user, so in case of an outage in the ingress direction, we will see nothing.

And that's why we have adopted a multi‑layer approach: we use three systems simultaneously, NEL, megaping and Dr. Egress, and correlation between these systems helps us in the most complicated scenarios. For example, if megaping and NEL detect an outage and Dr. Egress sees nothing, then it's an ingress problem and we should look at what's happening with traffic coming into our network. If, for example, there is an alarm from NEL and no alarms in either megaping or Dr. Egress, there is something strange with DPI in the network of our peering partner.
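The correlation rules sketched in this paragraph fit a simple decision function. The return labels below are hypothetical shorthand for this sketch, not the speaker's terms, and the real system presumably works per prefix/link rather than on single booleans:

```python
def triage(nel, megaping, dr_egress):
    """Correlate alarms from the three monitoring layers,
    following the examples given in the talk."""
    if dr_egress:
        return "egress"        # Dr. Egress sees it: an egress-path problem
    if nel and megaping:
        return "ingress"       # both outer layers alarm, egress is clean
    if nel:
        return "peer-dpi"      # NEL only: something odd with DPI at a peer
    if megaping:
        return "unclassified"  # megaping only: needs manual investigation
    return "ok"
```

The value of the layering is exactly this kind of differential diagnosis: no single system sees both directions and all layers of the path, but their disagreement is itself a signal.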
I hope these experiences can be applied in your networks. And please keep in mind that all these systems use the 'know thyself' approach: there is no third party involved; we get all the data from our own systems.

Thank you for listening. I will be glad to answer your questions.

(Applause)

WOLFGANG TREMMEL: Are there any questions?

RANDY BUSH: Randy Bush, IIJ and Arrcus. Open source?

ALEXANDER AZIMOV: Which part?

RANDY BUSH: Everything.

ALEXANDER AZIMOV: So it's partially open sourced. As I said, the foundation of the DPDK platform was open sourced a month ago. Megaping is based on the open source ZMap. We haven't open sourced the code of megaping, but it's not that hard to make it again. Maybe we will also, at some point, open source our egress engineering calculations.

RANDY BUSH: There is a tool chain from ZMap to megaping.

ALEXANDER AZIMOV: Yes, I agree. We will consider the option of open sourcing it, too. Thank you for your question.

WOLFGANG TREMMEL: Any more questions? Are there any questions online? Thank you Alexander.

(Applause)
The third talk today will be a lightning talk, this is from Sheikh Md Seum, and it's about high‑quality affordable Internet for the bottom of the pyramid.

SHEIKH MD SEUM: Good afternoon, everyone. I am Sheikh Md Seum, a network and systems engineer in Bangladesh. I'd like to express my gratitude to RIPE for this wonderful opportunity to present today. Now, let's dive into what we have been up to and how our initiatives are shaping and improving the lives of people and the Internet community of Bangladesh.

It's been quite a journey since 2019, and I am excited to share our progress since my last talk at RIPE 83. Thanks to the awesome support from RIPE and APNIC, we rolled out our public wi‑fi spots. Guess what? They are making a real difference in rural areas, boosting the local economies. And for the small ISPs, we have opened up new routes to connect with people. It's all about making sure that everyone gets a piece of the Internet pie.

So far, we have set up more than 20 wi‑fi zones and served more than 17,000 people. That's a lot of happy surfers. You know how the Internet works, right? It's all about ads. Facebook, Google, Twitter: ads keep the servers humming. We thought, why not use the same idea to give folks free Internet? Here is the deal: people watch free ads and, in return, they get to access the Internet. Let's face it, telecom Internet bills are sky high, and the installation charge of broadband Internet, don't get me started, where people already live below the poverty line.

Luckily, our journey got a Cloud credit boost from Microsoft. They have been amazing in helping us to deploy the networks in their Cloud.

Now, let's talk about the problems we faced during the deployment phases. First up, the ever familiar new‑kid‑on‑the‑block challenges. You know how it goes: technical glitches in the wireless networks and headaches with the placement and configuration of the routers. It's like solving a puzzle blindfolded. We even attempted to make things smoother by shipping bigger devices but, you guessed it, not everything goes according to plan. And then there is the classic power outage drama: you are all set to bring Internet to people and then suddenly the lights go out. Murphy's law, right?
To tackle some of these issues, we implemented a couple of solutions. Firstly, we introduced power backups to address the power outage issues. No more interruptions when electricity decides to play hide and seek. We also deployed a sophisticated software solution which monitors our wi‑fi quality and networks. This way, we can keep a close eye on performance and ensure a smooth surfing experience for everyone.

While addressing these challenges, we stumbled upon a ground‑breaking solution: the Neural Wi‑Fi Cloud concept. We developed an innovative AP that provides us with data, allowing us to pinpoint connectivity issues where they are happening. This led us to the realm of comprehensive end‑to‑end problem analysis. Our breakthrough caught the eye of our partner ISPs, who were eager to pilot our solution within their networks. Collaborating with our partner ISPs not only allowed us to gather valuable information for diagnostics, but also unveiled an opportunity: we found that, while many known vendors are selling similar last mile diagnostic solutions at sky‑high prices, we have a different approach in mind.

During this period, the AI boom took place, opening doors to previously impossible things. It presented us with exciting opportunities where everyone is essentially starting with a blank canvas. Our data collection strategy is comprehensive: we gather information from multiple sources. We ensure that data can be collected from any end point, be it OLT, BRAS, routers, CPE or even user devices. Our approach is to cast the widest net possible, tapping various APIs on those devices for a wealth of data. We firmly believe that the more APIs we integrate from these devices, the richer our data resources become.
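Collecting from heterogeneous endpoints usually means flattening each vendor's differently shaped API response into uniform records before they go into the warehouse. A minimal sketch of that normalisation step; the record shape, field names and sample payload are all hypothetical, not from the talk:

```python
from dataclasses import dataclass

@dataclass
class Metric:
    device: str  # device identifier, e.g. "cpe-1"
    kind: str    # device class: "olt", "bras", "router", "cpe", ...
    name: str    # dotted metric path, e.g. "wifi.rssi"
    value: float

def normalise(device, kind, raw):
    """Flatten one device's API response (a nested dict whose shape
    varies per vendor) into uniform Metric records for the warehouse."""
    records = []
    def walk(prefix, node):
        for key, val in node.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(val, dict):
                walk(path, val)   # recurse into nested sections
            else:
                records.append(Metric(device, kind, path, float(val)))
    walk("", raw)
    return records
```

Once everything is a flat (device, kind, name, value) record, dashboards and an AI layer can query all device classes uniformly, which is the premise of the centralised warehouse described later.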

Now let's dive into what type of data we collect.

We keep a close eye on the health of our devices, ensuring they're operating in prime condition. Statistics on interfaces help us to monitor performance and spot any performance bottleneck. By capturing metadata from traffic streams, we gain valuable insights about data patterns and user trends.

Monitoring uplink congestion allows us to proactively address network congestion. We track the status of user connections, ensuring a seamless experience for everyone in our network.

Finding skilled network engineers for level one support remains a global challenge, especially in a region like Bangladesh. This leads to heavy dependence on upper level support, increasing time and cost. To address this, our app stores all collected data in a centralised warehouse. This data is then presented to level one support in more user friendly and explanatory formats.

Additionally, data that was previously inaccessible to them due to security policies is now available. Our AI bot steps in to provide explanations regarding problems and possible solutions, a role typically fulfilled by level 2 and level 3 support. These explanations, revealing both problem and solution, ultimately increase customer satisfaction.

Now, let's have a look at the dashboard of our software. As you can see, we present device health, how much bandwidth is being pulled, and so on.

But wait, our innovation journey doesn't stop there. We are pushing boundaries by exploring unconventional data mapping, aiming to visualise data more resourcefully. Our next step involves integrating more AI to automate analysis within the network, enabling us to pinpoint root causes with greater efficiency. And that's not all: we are also developing solutions to address these issues proactively using AI.

Thanks for your attention. If anyone has any questions, feel free to ask.

(Applause)

WOLFGANG TREMMEL: Thank you. Are there any questions? Thank you very much.

Sorry, was there one question? All right, sorry.

SPEAKER: Silvan Gebhardt, from Openfactory. I have one question. How do we, as an Internet community, with what I'm hearing here, ensure that the bottom of the pyramid still gets privacy and is not exploited by big tech with all their private data? I see what you are trying to do, but this is a very double‑edged sword, in my opinion.

SHEIKH MD SEUM: That's a great question. I have a similar concern. As of today, the data we collect is collected with user consent, firstly, and secondly, only the ISPs can see it and we don't pinpoint the exact user. It's collected in aggregate, so you cannot pinpoint anyone individually. Still, the privacy part remains a question; it comes down to the ethics of the company. So, we are trying our best, as far as I can say, rest assured. Thank you. Any other questions?

WOLFGANG TREMMEL: Next question.

SPEAKER: I would have a question. From ESD Hungary. I would like to ask you how many people have access to Internet this way? So what is the size of the network that you operate or plan to operate? I don't know exactly. So... do you have an estimate of this?

SHEIKH MD SEUM: Well, till now, we have served more than 17,000 to 20,000 people, but the concurrent user count is a difficult question, because some nodes have 10 or 20 daily users, while in others it varies. So we have around 20 nodes. Basically, we cannot deploy more because the areas are quite rural and there are some issues with finance and other things, but we are working to expand the networks as soon as possible, and we really need help from the community, especially the community of Bangladesh, and from any other nation that is trying to set these things up and provide Internet to rural communities.

SPEAKER: Thank you very much. I think it is a very good initiative and I wish you good luck.

SHEIKH MD SEUM: And in fact our goal is not to monetise the portal; the ultimate goal is to provide educational content to third world countries, where the ads might be like: hey, do you know how to use Google? Maybe some videos about how you can say these things in English. The ultimate plan is to make sure everyone gets educated with the Internet because, after coming to Italy from Bangladesh a year ago, I have discovered that everyone here is familiar with Googling things, which is lacking in our part of the world. If you can teach people how to Google, it changes their life, and as a community it really helps in their life. That's all I want to say. Thank you.

(Applause)

WOLFGANG TREMMEL: A couple of announcements. There are two BoFs tonight, one is the BCOP ‑ the Best Current Operational Practices ‑ BoF, this is at six o'clock here in this room. And there is a BoF about sustainable networks, that's in the side room, also at six o'clock. And at seven o'clock, there is the reception and that takes place in the clubhouse, and, according to the plan, that should be outside to the right and straight ahead. But it's signposted so don't follow what I say, follow the signs.

Okay. Thank you, enjoy the coffee break.

LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC
DUBLIN, IRELAND.