Expanding BGP Data Horizons

BGP routes collected from operational routers are extremely valuable for monitoring and studying Internet routing. However, BGP data collection platforms as currently architected face fundamental challenges that threaten their long-term sustainability: their data is enormously redundant and yet has dangerous visibility gaps.

GILL is a new BGP route collection platform that can collect routes from at least an order of magnitude more routers than existing platforms while limiting the increase in human effort and data volume.

GILL's key principle is an overshoot-and-discard collection scheme: any AS can easily peer with GILL and export its routes. However, GILL stores and makes available to users only the nonredundant routes.

GILL has been accepted at ACM SIGCOMM 2024!

Coverage matters but is challenging

RIPE RIS and RouteViews, the two main BGP route collection platforms, peer with routers from an increasing number of ASes (1500 in 2023).
Yet their coverage (the percentage of ASes from which they collect routes) is not increasing and remains low because the Internet itself keeps growing.
Today, only 1% of the 75k ASes export their BGP routes to RIS or RouteViews.

Low coverage negatively impacts the quality of many studies and tools that rely on BGP data.
For instance, our simulations show that 80% of the peer-to-peer links are uncovered when 1% of the ASes peer with a collection platform (red area).

We suggest 25-100x higher coverage (green area). This would, for instance, enable detecting the vast majority of the peer-to-peer links.

Increasing coverage is challenging because new peering sessions increase both the data volume and the human effort needed to archive and process the collected data.
In 2023, RIS and RouteViews archived 100TB of data. This number increases quadratically due to the combined growth of these platforms and the Internet itself.
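
One way to see why combined growth can produce a quadratic trend: if the number of peering sessions and the number of updates per session each grow roughly linearly, their product grows quadratically. The sketch below illustrates this with a toy model. The linear-growth assumption and every number in it (`sessions0`, `session_growth`, `updates0`, `update_growth`) are hypothetical and are not measurements from RIS or RouteViews.

```python
def yearly_volume(year, sessions0=100, session_growth=20,
                  updates0=10, update_growth=1):
    """Toy model: archived volume = sessions(year) * updates_per_session(year).

    All parameters are made-up values in arbitrary units.
    """
    sessions = sessions0 + session_growth * year              # platform growth
    updates_per_session = updates0 + update_growth * year     # Internet growth
    return sessions * updates_per_session


volumes = [yearly_volume(y) for y in range(6)]

# First differences grow linearly; second differences are constant,
# which is the signature of a quadratic function.
first_diff = [b - a for a, b in zip(volumes, volumes[1:])]
second_diff = [b - a for a, b in zip(first_diff, first_diff[1:])]

print(volumes)
print(first_diff)
print(second_diff)
```

In this toy model the constant second difference equals `2 * session_growth * update_growth`, so the yearly increase itself keeps growing every year.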

Recently, RIS has expressed concerns about this unsustainable growth rate and its implications for long-term data management. A survey that we conducted also reveals that researchers often resort to sampling the data to process it within their time constraints.

GILL: High coverage, low data volume

GILL enables high coverage with low data volume using an overshoot-and-discard data collection scheme.

Overshoot. GILL can peer with tens of thousands of BGP routers thanks to our optimized code. Network operators can automatically peer with GILL by filling out a form and configuring a BGP session on their router.

Discard. As storing every bit of data results in data management problems, GILL discards redundant BGP updates and only stores the others. Despite the lack of consensus on how to identify the redundant BGP updates, GILL maximizes fairness across various objectives thanks to new algorithms that predict redundant BGP routes without overfitting.
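
To make the discard step more concrete, here is a minimal sketch of a redundancy filter. It is not GILL's algorithm, which predicts redundant routes; it is a deliberately naive stand-in that drops an update when every AS link on its path has already been stored for the same prefix. The `Update` type, the `discard_redundant` function, and the example ASNs (taken from the documentation range) are all hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Update(NamedTuple):
    peer_as: int               # AS that exported the route to the collector
    prefix: str                # announced prefix
    as_path: Tuple[int, ...]   # AS path, peer AS first, origin AS last


def as_links(path: Tuple[int, ...]) -> set:
    """Return the adjacent AS pairs on a path, ignoring AS-path prepending."""
    return {(a, b) for a, b in zip(path, path[1:]) if a != b}


def discard_redundant(updates: Iterable[Update]) -> List[Update]:
    """Keep an update only if it reveals at least one new AS link for its prefix.

    Naive heuristic for illustration; updates whose path has no links
    (a single AS) are always kept.
    """
    seen = {}   # prefix -> set of AS links already stored
    kept = []
    for update in updates:
        links = as_links(update.as_path)
        known = seen.setdefault(update.prefix, set())
        if links and links <= known:
            continue            # nothing new: discard
        known |= links
        kept.append(update)
    return kept


updates = [
    Update(64500, "203.0.113.0/24", (64500, 64505, 64510)),
    Update(64501, "203.0.113.0/24", (64501, 64505, 64510)),  # new link 64501-64505
    Update(64505, "203.0.113.0/24", (64505, 64510)),         # link already stored
]
kept = discard_redundant(updates)
print([u.peer_as for u in kept])
```

A filter like this optimizes a single objective (AS links per prefix). The text above notes that there is no consensus on what counts as redundant, which is why a single-objective rule of this kind is only an illustration.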

Low coverage prevents mapping the AS topology accurately

Long-term benefits

GILL offers a long-term path toward sustainable scaling of BGP data collection. For instance, in a world where 50% of the ASes peer with GILL, our simulations reveal that GILL stores 4.7% of the collected BGP updates. These updates make it possible to observe 61% of the p2p links.

In contrast, storing 100% of the BGP updates enables observing 85% of the p2p links but leads to data management problems.
Collecting all updates from a few random ASes, such that the number of stored updates is the same as with GILL, leads to only 16% of the p2p links being observed.
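
The three strategies can be compared by how many p2p links they reveal per unit of stored data. The short calculation below uses only the simulation figures quoted above; the "percent of links per percent of updates stored" metric and all names in the code are our own illustrative choices, not a metric defined by GILL.

```python
# (percent of collected updates stored, percent of p2p links observed)
SCENARIOS = {
    "gill": (4.7, 61.0),
    "store_everything": (100.0, 85.0),
    "random_ases_same_volume": (4.7, 16.0),
}


def links_per_stored_percent(stored_pct: float, links_pct: float) -> float:
    """Percent of p2p links observed per percent of updates stored."""
    return links_pct / stored_pct


efficiency = {
    name: links_per_stored_percent(stored, links)
    for name, (stored, links) in SCENARIOS.items()
}

for name, value in efficiency.items():
    print(f"{name}: {value:.2f}% of p2p links per 1% of updates stored")
```

With these figures, GILL observes nearly four times as many p2p links as the random baseline for the same stored volume (61% versus 16%), while storing roughly 21 times fewer updates than the store-everything approach.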

Immediate benefits

GILL's sampling algorithms can help users cope with the massive stream of data that RIS and RouteViews generate. We replicated a few studies and tools that rely on a sample of the BGP data collected by RIS and RouteViews, and used GILL's algorithms to sample the data instead.

With the same number of BGP routes processed, we always improved the quality of the results.

Inference of AS relationships. We managed to infer 15% more AS relationships than CAIDA's dataset without losing accuracy.
Computation of AS Ranks. We managed to prevent many flawed inferences. For instance, we correctly infer that AS132337 has a customer cone size of 18k whereas it has a customer cone size of one in ASRank.
Inference of forged-origin hijacks. We managed to make 4x more precise inferences and detect more suspicious cases.