Sensibility analysis of BGP convergence and scalability using network simulation

Systementwicklungsprojekt (SEP)

Institut für Informatik
Technische Universität München

85748 Garching bei München

Supervisor: Prof. Anja Feldmann, PhD
Advisor: Olaf Maennel

by Wolfgang Mühlbauer

([email protected])

Submission date: August 30, 2004

Abstract

The Border Gateway Protocol (BGP) is the de facto standard for routing between autonomous systems in the Internet. Instabilities in the topology, such as a failing link, can lead to considerable delays in convergence. It is therefore necessary to gain a better understanding of the global dynamics and underlying mechanisms of BGP.

In this work we perform a sensitivity analysis of convergence times and the number of exchanged updates with respect to the settings of BGP parameters. In particular, the influence of the Minimum Route Advertisement Interval (MRAI) timer is investigated.

Further experiments shed light on the propagation of updates following the failure of a link. Scalability questions, such as how many autonomous systems are affected by the instability and how far update messages spread from the broken link, are examined in this work.

All experiments are conducted using the SSFNet network simulator.

Contents

1 Introduction
  1.1 Motivation
  1.2 Goals of this Study
    1.2.1 Influence of MRAI Timer Settings
    1.2.2 Propagation of updates
  1.3 Guide to the Reader

2 Using the SSFNet Simulator
  2.1 General Overview
  2.2 Extensions to SSFNet
  2.3 Generation of DML Files
    2.3.1 Subgraph Extraction
    2.3.2 Automatic Generation of the DML File
  2.4 Simulator Output

3 Setting up the Experiments
  3.1 Simulation Topologies
    3.1.1 Middle Topology
    3.1.2 Topology 1140
    3.1.3 Topology 7774
  3.2 Generation of Link Failures
    3.2.1 Link Categories
    3.2.2 Failure Scenarios in the Experiments
  3.3 Analysis of the Simulation Results
  3.4 Taken Experiments
    3.4.1 Investigation of MRAI Timer Effects
    3.4.2 Investigation of Update Propagation

4 Simulation results
  4.1 Influence of MRAI Timer Settings
    4.1.1 Varying the MRAI Timer Values
    4.1.2 Per-peer and per-prefix MRAI Timers
  4.2 Propagation of Updates
    4.2.1 Experiment Description
    4.2.2 Number of affected ASes after a link failure
    4.2.3 Propagation Radius

5 Conclusions and directions for future work

List of Figures

List of Tables

Bibliography

Chapter 1

Introduction

1.1 Motivation

“QWERTYUIOP!” This strange-looking collection of characters is said to be the content of the first electronic message sent by the engineer Ray Tomlinson in 1971 from one computer to another computer sitting right beside it. Of course, Tomlinson and the other network pioneers of that time could not anticipate the tremendous development of networking that has resulted in a worldwide mesh of connections, the Internet.

Complex issues arise with the increasing size of networks. Take routing in the Internet as an example: how do packets find their way to a specific destination in this distributed environment, where no router has knowledge of the global topology and all available network links? It is the task of so-called intra-domain routing protocols (e.g. OSPF, RIP) and inter-domain routing protocols (BGP) to provide a solution to this problem.

However, existing routing protocol implementations are far from perfect. The Border Gateway Protocol (BGP), which is responsible for maintaining connectivity between autonomous systems (ASes) in the Internet, sometimes cannot prevent considerable delays in the convergence process after instabilities have occurred in the network.

Unfortunately, the underlying mechanisms of BGP are not yet understood well enough to improve specific aspects of the existing protocol implementations. Therefore, a careful analysis of the status quo is indispensable.

1.2 Goals of this Study

The main objective of this work is to explore the scalability of BGP and the influence of configuration parameters on convergence times and on the number of exchanged protocol messages, using the SSFNet simulator. For this purpose, we concentrate on two aspects, which are introduced briefly below: the settings of the MRAI timer and the propagation of updates.

1.2.1 Influence of MRAI Timer Settings

In order to generate their routing tables, BGP-speaking routers exchange messages in a similar way to distance vector protocols. These advertisement messages are rate-limited using timers governed by the Minimum Route Advertisement Interval (MRAI). Whenever a router advertises a route for a certain destination to a neighboring autonomous system (AS), a new instance of this timer is started. The router may then not send another advertisement concerning this destination to that neighbor until the associated timer has expired after MRAI seconds.

This rate limiting is supposed to dampen some of the oscillations inherent in a distance vector protocol. While waiting for an MRAI timer to expire, a BGP router does not expose its connected neighbor ASes to every intermediate step in finding the best path to a certain destination. Thus rate limiting can be expected to reduce the number of updates needed for convergence at the cost of adding some delay to the sent messages.
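The rate-limiting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SSFNet implementation: the class `MraiLimiter`, its method names and the value of 30 seconds are hypothetical, and the timer is kept per (neighbor, destination) pair as in the description above.

```python
from dataclasses import dataclass, field


@dataclass
class MraiLimiter:
    """Hypothetical per-(neighbor, prefix) MRAI rate limiter (illustration only)."""
    mrai: float = 30.0  # seconds; illustrative value, not taken from the simulator
    _expiry: dict = field(default_factory=dict)   # (neighbor, prefix) -> expiry time
    _pending: dict = field(default_factory=dict)  # (neighbor, prefix) -> held-back route

    def advertise(self, now, neighbor, prefix, route):
        """Return the route if it may be sent now; otherwise hold it back and return None."""
        key = (neighbor, prefix)
        if now < self._expiry.get(key, float("-inf")):
            # Timer still running: remember only the newest route, so the
            # neighbor never sees the intermediate steps.
            self._pending[key] = route
            return None
        self._expiry[key] = now + self.mrai
        return route

    def expire(self, now):
        """Send held-back routes whose timers have expired and restart those timers."""
        sent = []
        for key in [k for k, t in self._expiry.items() if t <= now]:
            route = self._pending.pop(key, None)
            if route is None:
                del self._expiry[key]  # nothing waiting: the timer simply ends
            else:
                self._expiry[key] = now + self.mrai
                sent.append((key[0], key[1], route))
        return sent
```

In this sketch, two path changes that arrive while the timer is running result in a single delayed advertisement carrying only the latest route, which is exactly the trade-off between fewer updates and added delay.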

One main objective of this work is to perform a sensitivity analysis of the parameters of the MRAI timer.

1.2.2 Propagation of updates

After a link has broken in a given network, update messages must be exchanged between the autonomous systems (ASes) until new best paths for all affected routes have been installed. It can be assumed that, in general, some ASes will not notice the instability, meaning that they do not receive any BGP update messages. This might happen if prefixes which were routed over the broken link are redirected to new paths that do not differ completely from the original path but share some nodes (ASes) with it.

Investigating the propagation of updates involves making a statement about the number or ratio of ASes that receive update messages as a consequence of a broken link, and about the propagation radius. By propagation radius we mean the distance over which updates spread within the topology, starting from the source of the instability. Put more simply: how far away from the broken link can the instability still be observed?
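One possible reading of these two quantities can be written down as a small Python sketch. It assumes that distance is measured in AS hops from the nearer endpoint of the failed link and that the set of affected ASes is already known; the function names and the toy graph are hypothetical and are not taken from the scripts used in this study.

```python
from collections import deque


def hop_distances(adjacency, sources):
    """Breadth-first search: AS-hop distance from the nearest source AS."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def propagation_summary(adjacency, failed_link, affected):
    """Return (ratio of affected ASes, largest hop distance of an affected AS)."""
    dist = hop_distances(adjacency, list(failed_link))
    ratio = len(affected) / len(adjacency)
    radius = max((dist[a] for a in affected if a in dist), default=0)
    return ratio, radius


# Toy example (assumed data): link 1-2 fails, ASes 1, 2 and 4 see updates.
toy_graph = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
result = propagation_summary(toy_graph, (1, 2), {1, 2, 4})
```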

Altogether, this second main aspect of our work could be of great importance for drawing conclusions about the scalability of BGP.

1.3 Guide to the Reader

This document is structured as follows: Chapter 2 describes the usage of SSFNet and the extensions made to this network simulator. After giving an overview of how the experiments were conducted in Chapter 3, we present the results of our simulations in Chapter 4. We close with a short summary and some suggestions for future work in this area.

Note: Section 1.2.1 has been adopted from [1] with marginal modifications.

Chapter 2

Using the SSFNet Simulator

2.1 General Overview

Examining the dynamics of updates in a distributed protocol like BGP is a challenging task. For a complete understanding and analysis, it is desirable to have a global view of, and control over, all routers involved in the protocol communication. In the case of BGP, knowledge of all messages sent or received by a BGP speaker makes it possible to determine the prefixes which are advertised to a neighbor, and thus to deduce the routing in this inter-domain topology. Additionally, it is possible to impose specific events like link failures or the advertisement of new prefixes. This global view and control is most easily achieved using simulation techniques.

We decided to use the Scalable Simulation Framework (SSF) [2] for three main reasons. First, this framework already provides an implementation of the BGP4 protocol, made available by B.J. Premore [1]. Second, the basic BGP implementation has been extended with many new features by members of the research group of Prof. Anja Feldmann (see section 2.2). Last but not least, the modular and concise structure of SSF, in comparison to other network simulators like ns-2, eases the addition of new features needed for our investigations.

A general overview of the SSF network simulator is given in Figure 2.1. It consists of a discrete-event simulation kernel which deals with all the fundamental aspects of a simulation. Based on this kernel, SSFNet provides a collection of Java-based components which model the Internet protocols and network topologies and which can easily be extended with new features and new components. The SSFNet packages can be divided into further parts, of which the most important are SSF.OS and SSF.Net. Whereas SSF.Net models the network topologies (links, nodes, network connectivity), SSF.OS is responsible for modeling the numerous protocols (e.g. TCP, BGP, OSPF). The configuration of the simulation parameters, the network topologies and the simulation dynamics are defined in text files written in the Domain Modeling Language (DML) syntax [2].

[Figure 2.1: Structural overview of the SSFNet simulator. Components: SSF (simulation kernel); SSFNet (simulation models), based on SSF; SSF.OS (protocol simulation) and SSF.Net (topology simulation), both part of SSFNet; DML files (configuration files), which configure the simulation models.]

Under the assumption that all needed protocols and features are already implemented in SSFNet, the major task is to generate the DML files. The following excerpt illustrates the concept of the Domain Modeling Language in a very simple manner.

Net [
  host [
    id 1
    interface [id 0 bitrate 100000000 latency 0.0]
    graph [
      ProtocolSession [name tcp use SSF.OS.TCP.tcpSessionMaster]
      ProtocolSession [name ip use SSF.OS.IP]
    ]
  ]
  host [
    id 2
    interface [id 0 bitrate 100000000 latency 0.0]
    graph [
      ProtocolSession [name tcp use SSF.OS.TCP.tcpSessionMaster]
      ProtocolSession [name ip use SSF.OS.IP]
    ]
  ]
  link [attach 1(0) attach 2(0) delay 0.002]
]

This DML snippet describes a network with two hosts, each running a TCP session over the IP protocol. The first host (id 1) is connected to the second host (id 2) by a link with a delay of 0.002.

As the DML files become very large with increasing size and complexity of the simulation topologies, they are generated automatically in most cases (see section 2.3). The user does not have to take care of assigning unique identifiers to hosts and interfaces (NHI addresses) or their corresponding IP addresses, because this is done by the simulator itself.

Issuing the following command runs the simulation for 2000 seconds, provided that all path variables have been set correctly (refer to [2]) and that the configuration has been written to the DML file myModel.dml.

java SSF.Net.Net 2000 myModel.dml


2.2 Extensions to SSFNet

As mentioned in the last section, an implementation of BGP4 is already included in the SSFNet package (primary author B.J. Premore). For our work we used SSFNet version 1.4, which closely follows the recommendations of RFC 1771 but still lacks some BGP functionality such as route flap damping. More detailed information and a summary of the implemented features can be found in [2] and [3].

However, numerous features have been added to the BGP implementation of SSFNet in the last few years by members of the research group of Prof. Anja Feldmann. In the scope of his diploma thesis “Analysis of OSPFv2-BGP4 Interactions Using the SSFNet Simulator” [4], Hagen Bohm extended SSFNet with the OSPFv2 protocol and with a scanning process for BGP, including the possibility of simulating link failures. Another diploma thesis, by Andreas Hartl [3], investigated the dynamics of BGP updates in realistic topologies, which made it necessary to add new functionality to the existing BGP implementation in SSFNet. The most important changes were the following:

• MRAI timer: With Cisco routers making up the overwhelming majority in today's networks, the MRAI timer implementation was modified such that it models the behavior of Cisco BGP speakers. The necessary changes were the normal distribution of the MRAI timer value (which takes values between 25s and 31s) and the use of per-peer timers instead of per-prefix timers (Cisco routers keep a separate timer only for each neighbor, not for each prefix advertised to a neighbor).

• Best Path Selection Process: Here again, the strategy used by Cisco routers for finding the “best” route to a destination differs from that of the RFC specification. In particular, more emphasis is placed on the length of the AS_PATH attribute in the tie-breaking mechanism.

• Community Values: The SSFNet BGP implementation was modified such that it understands BGP community values as specified in RFC 1997. With the help of community values, it is possible to reproduce peering or customer-provider relations between autonomous systems (refer to [5] and [6] for more information on AS relationships).

• Workload Generation: Factors like the number of BGP sessions on a host, the size of the routing tables or the number of updates in the input queue were taken into account in order to create a more realistic workload.

A comprehensive description of these adaptations can be found in [3].

For the investigation of BGP convergence and scalability, a mechanism was needed to create instabilities in a given topology. Modified SSFNet classes, implemented by Hagen Bohm, were merged with the normal BGP implementation, making it possible to configure link failures conveniently in the DML files. Though not used for the simulations in this work, we also added the route flap damping mechanism (taken from a later SSFNet version) and the option of inserting “dummy prefixes” into the network.


2.3 Generation of DML Files

The configuration of the topology, the simulation parameters and the simulation dynamics are all defined in DML files. The main objective of this work is to examine BGP update dynamics in realistic topologies. However, with the high complexity of realistic networks it becomes increasingly difficult to build the input files manually. That is why we need a way of creating DML files automatically. The general procedure is depicted in Figure 2.2.

[Figure 2.2: Generation of DML files. AS relationship pairs (extracted from authentic BGP routing tables) -> AS relationship pairs (extracted subgraph) -> DML configuration file (simulator input).]

The sources for the automated generation of DML files are so-called AS relationship pairs. Extracted from authentic BGP tables (for example from RIPE) together with some seed information (e.g. a tier-1 provider) by a tool from Arne Wichmann [7], these pairs reflect the commercial relationships between interconnected ASes. For our study we distinguish between provider-customer relationships (the customer pays its provider for connectivity to the rest of the Internet) and peering links (neighboring ASes agree to exchange traffic free of charge). The following excerpt shows a possible AS relationship pairs input:

1234 > 2401
2401 = 3110
3110 < 1234

This short listing denotes that AS 1234 is a provider of AS 2401 (>), AS 2401 shares a peering link with AS 3110 (=) and AS 3110 is a customer of AS 1234 (<).
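To make the meaning of the three relationship symbols concrete, the following Python sketch reads such a listing into two sets. It is a minimal illustration, assuming one whitespace-separated pair per line as in the excerpt; the function name `parse_as_pairs` is hypothetical and not part of the tools from [3] or [7].

```python
def parse_as_pairs(lines):
    """Parse lines such as '1234 > 2401' into provider-customer and peering sets.

    Returns (provider_customer, peers): provider_customer holds
    (provider, customer) tuples, peers holds frozensets of two AS numbers.
    """
    provider_customer = set()
    peers = set()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip empty lines
        left, relation, right = line.split()
        a, b = int(left), int(right)
        if relation == ">":    # a is provider of b
            provider_customer.add((a, b))
        elif relation == "<":  # a is customer of b
            provider_customer.add((b, a))
        elif relation == "=":  # a and b share a peering link
            peers.add(frozenset((a, b)))
        else:
            raise ValueError(f"unknown relationship symbol: {relation!r}")
    return provider_customer, peers


pairs, peerings = parse_as_pairs(["1234 > 2401", "2401 = 3110", "3110 < 1234"])
```

Storing every provider-customer relationship in the same direction means that the third line of the excerpt ends up as (1234, 3110), i.e. AS 1234 is the provider of AS 3110.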

Unfortunately, it is not possible to use the complete topology graph resulting from all AS pairs obtained from RIPE or other BGP sources. Due to the high memory demands of such a network in the simulations, we extract a complete subtree of a specified AS from the original graph. The Topology Extraction tool from Andreas Hartl [3] is explained briefly in 2.3.1.

After extracting a subgraph, we have a reduced topology with fewer ASes. However, our input is still in the syntax of the original AS relationship pairs. It is the task of the Topology Conversion tool to convert this “AS-Pairs syntax” into the DML format, which can then be passed to the SSFNet simulator as input. A short description of this tool together with our extensions can be found in 2.3.2.

2.3.1 Subgraph Extraction

In order to reduce the complexity and size of the sample networks used for our simulations, a subtree of a specified AS can be extracted from the original graph given in the form of AS pairs. The basic idea of the Topology Extraction tool [3] is to first perform a kind of depth-first search up to a certain depth, meaning that it finds all core ASes which are no more than a specified number of AS hops away from the starting AS (the tool refers to this parameter as “number of AS hops”).

Afterwards it searches all paths between the core ASes up to a specified length (referred to as “maximum path length” by the tool), adding all intermediate ASes on these paths which have not been visited yet. It should be mentioned that the extraction of the core ASes takes into account certain redistribution policies arising from commercial relationships (peering, customer-provider). Taking into account all paths between the core ASes up to a certain length ensures that most propagation effects that can be observed in the complete network should also appear in our extracted topology.

For a comprehensive explanation of the subtree extraction, we refer to section 4.2.1 of [3]. The extraction tool asks for all parameters and is started by typing:

make extract
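The two extraction steps can be sketched roughly as follows. This is a simplified illustration and not the Topology Extraction tool itself: it ignores the redistribution policies mentioned above, uses a breadth-first search to find the ASes within the hop limit, and assumes that the path length is counted in ASes. All function names and the toy graph are hypothetical.

```python
from collections import deque


def core_ases(adjacency, start, max_hops):
    """Step 1: all ASes at most max_hops AS hops away from the starting AS."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return set(dist)


def extract_subgraph(adjacency, start, max_hops, max_path_len):
    """Step 2: add the ASes on all simple paths between core ASes
    that contain at most max_path_len ASes (policies are ignored here)."""
    core = core_ases(adjacency, start, max_hops)
    keep = set(core)

    def walk(path):
        node = path[-1]
        if len(path) > 1 and node in core:
            keep.update(path)  # path connects two core ASes
        if len(path) >= max_path_len:
            return
        for neighbor in adjacency[node]:
            if neighbor not in path:
                walk(path + [neighbor])

    for core_as in core:
        walk([core_as])
    return keep


# Toy graph (assumed data): AS 7 hangs off a core AS and leads nowhere.
toy = {1: {2, 3, 6}, 2: {1, 4}, 3: {1, 5}, 4: {2, 5}, 5: {3, 4}, 6: {1, 7}, 7: {6}}
```

In the toy graph, ASes 4 and 5 lie on a path of four ASes between the core ASes 2 and 3 and are therefore kept when the maximum path length is four, whereas AS 7 is never included.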

2.3.2 Automatic Generation of the DML File

Figure 2.2 showed that we generate the DML files needed for our simulations on the basis of AS relationship pairs. Even if a subtree is extracted from the complete topology, the input is still in the form of AS pairs and must be converted to the DML syntax before SSFNet can use it. This task is fulfilled by another tool called Topology Converter.

A detailed description of the functionality of this tool can be found in section 4.2.2 of [3]. In terms of generating the DML file from AS pairs, two important parts can be distinguished: the external structure of the topology (links between ASes) and the internal structure (I-BGP mesh within an AS). Whereas the external structure is built upon the information given in the form of AS pairs, the interior of an AS is generated according to the wishes of the user. For example, the user can determine the number of route reflectors within an AS or the number of border routers which connect to other customer, provider and peering ASes. However, the internal structures will look similar for all ASes in the topology; it is not possible to generate different I-BGP meshes for each AS.

The original Topology Converter was modified slightly. The changed version is called Topology Generator and contains the following additional features:

• Per-prefix MRAI, WRATE, SSLD: The user is asked whether to use per-prefix MRAI timers instead of per-peer MRAI timers and whether to activate WRATE (withdrawal rate limiting) and SSLD (sender side loop detection).

• Route Flap Damping: The route flap damping mechanism of a later SSFNet version (1.5) was merged into the SSFNet implementation used (version 1.4). If the modified SSFNet version is used, the Topology Generator tool can enable route flap damping with different parameter settings (default Cisco or Juniper settings, or manual specification of the parameters). However, route flap damping was not used for the simulations in this study.

• Dummy Prefixes: It is possible to insert a specified number of dummy prefixes, originating from a dummy AS, into the network. In this way, the BGP routing tables become larger and have more entries.

• Link Failures: The specification of link failures has been extended. It is now possible to define the number of links to fail, a time interval within which the link failures occur at random times, and a time at which all broken links recover.

The Topology Conversion or Topology Generator tool is started by typing make convert in the appropriate directory.

2.4 Simulator Output

An essential part of this study consists in analyzing the simulation output and drawing conclusions from the results. During a simulation run, all sent and received updates are logged, including the sending or receiving time, the sender and receiver, the type of the update message, the affected prefix and the AS_PATH attribute of the BGP message. The following two lines give an idea of the logged data:

45.709774161 send 4:10 4:2 rte 0.0.1.0/26 (3 1)
45.710960373 receive 4:10 4:2 rte 0.0.1.0/26 (3 1)

Both lines belong to the same BGP message from interface 4:10 to 4:2. The first line shows the time when the message was sent by the source; the second line indicates the arrival time at the destination. Here the prefix 0.0.1.0/26 is advertised within AS 4, because sender and receiver are both part of AS 4 (4:10 and 4:2). Originally, the prefix was announced by AS 3 and has propagated via AS 1 to AS 4, as indicated by the AS_PATH attribute (3 1).

With all this information it is possible to obtain the desired complete view of all BGP speakers in the network, enabling us to perform a comprehensive analysis of update dynamics.
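As an illustration of how such log lines can be processed, the following Python sketch parses the two example lines into records. It is not one of the scripts used in this study; the regular expression only covers the line format shown above (other message types may be logged differently), and the names `parse_update` and `as_number` are hypothetical.

```python
import re
from typing import NamedTuple

# Matches lines of the form shown above:
# <time> <send|receive> <sender> <receiver> <type> <prefix> (<AS path>)
LOG_LINE = re.compile(
    r"^(?P<time>\d+\.\d+)\s+(?P<event>send|receive)\s+(?P<src>\S+)\s+(?P<dst>\S+)"
    r"\s+(?P<kind>\S+)\s+(?P<prefix>\S+)\s+\((?P<path>[\d\s]*)\)$"
)


class Update(NamedTuple):
    time: float
    event: str
    src: str
    dst: str
    kind: str
    prefix: str
    as_path: tuple


def parse_update(line):
    """Turn one log line into an Update record."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return Update(
        time=float(match["time"]),
        event=match["event"],
        src=match["src"],
        dst=match["dst"],
        kind=match["kind"],
        prefix=match["prefix"],
        as_path=tuple(int(asn) for asn in match["path"].split()),
    )


def as_number(address):
    """AS part of an address such as '4:10' (here: AS 4)."""
    return int(address.split(":")[0])


sent = parse_update("45.709774161 send 4:10 4:2 rte 0.0.1.0/26 (3 1)")
received = parse_update("45.710960373 receive 4:10 4:2 rte 0.0.1.0/26 (3 1)")
delay = received.time - sent.time  # transmission delay of this message
```

Comparing `as_number(sent.src)` with `as_number(sent.dst)` distinguishes messages within an AS from messages between ASes.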

Chapter 3

Setting up the Experiments

Whereas the last chapter dealt with all relevant aspects of the network simulator SSFNet, this chapter is dedicated to the setup of the experiments. It is essential to know how an experiment was conducted, what assumptions were made, what testing environment was used, etc. The sections below explain in detail our investigations of the influence of the MRAI timer on convergence times and of the propagation of updates.

3.1 Simulation Topologies

Running simulations with the SSFNet simulator requires a file in the DML format as input. The DML files describe not only the simulation parameters and dynamics (e.g. link failures) but also the network topology, i.e. the graph of ASes. One of the main objectives of this study is to examine BGP behavior in realistic networks which approximate the structure of the Internet as closely as possible. For that reason, choosing the simulation topologies is a critical task.

In order to verify the correctness of our extensions to SSFNet and of certain auxiliary Perl scripts, we developed some simple test networks, which are of no great importance for the results of this study. The more complex and realistic networks were all generated automatically as described in section 2.3. We now introduce the topologies relevant for the experiments.

3.1.1 Middle Topology

The so-called Middle Topology (taken from [3]) is pictured in Figure 3.1. This topology was created manually by specifying the commercial relationships between the ASes in the AS pair format and then running the Topology Conversion tool (see 2.3.2). In contrast to the next two topologies, it was not generated from AS relationship pairs from RIPE or other Internet sources and is thus a more synthetic network.

Nonetheless, it already shows some characteristics which can also be found in realistic networks. For example, the graph already contains a certain hierarchy of top-level tier 1 ASes (here AS 1, 2 and 3), ASes in the middle of the graph (AS 4 to 9) and ASes at the bottom of the graph (AS 10 to 15), to which we frequently refer as stub ASes. It turns out that this distinction between different levels (tiers) makes sense in the Internet, too. Furthermore, there are some ASes in the graph which are multi-homed, meaning that they are connected to more than one provider.

[Figure 3.1: Middle Topology (green lines are peering links). The figure shows AS 1 to AS 15.]

For the Middle Topology as well as for the other topologies used in this work, it must be pointed out that the number of external links between a pair of ASes is varied according to the needs of the specific experiment. This will be stated clearly in each case.

3.1.2 Topology 1140

The main objective of investigating BGP update dynamics in a realistic environment requires reasonably realistic test networks. For this purpose, a subtree of a small German ISP [8] was extracted with the Topology Extraction tool (see 2.3.1), based on the commercial relationships between ASes measured in 2003 by [7]. Due to memory limitations, it was necessary to restrict the “number of AS hops” to one (for finding the core ASes) and the “maximum path length” to five ASes. Table 3.1 summarizes some facts about the extracted network:

# ASes   # external links   graph degree (avg)   # core ASes
95       1145               24.1                 5

Table 3.1: Properties of Topology 1140

Altogether we obtain 95 ASes, where each AS is composed of several routers organized in an I-BGP mesh. Under the assumption that every pair of ASes is connected by only one link, we obtain 1145 external links, leading to an average graph degree (average number of neighbors per AS) of 24.1. Though the topology graph seems to be highly meshed, the extraction tool finds only 5 core ASes in the first step. The remaining conversion from the extracted subgraph to the DML syntax is done with the Topology Conversion tool.


3.1.3 Topology 7774

The procedure for generating Topology 7774 is basically the same as for Topology 1140. It mainly differs in the AS relationship pairs used [7], which here date from April 2004 and are thus more up to date. The extraction was started from AS 7774 with the “number of AS hops” set to one (to find the core ASes) and the “maximum path length” set to five. A summary of some characteristics is shown in Table 3.2.

# ASes   # external links   graph degree (avg)   # core ASes
105      614                11.7                 3

Table 3.2: Properties of Topology 7774
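The average graph degrees in Tables 3.1 and 3.2 follow directly from the number of ASes and external links, because every external link contributes one neighbor to each of its two endpoints. The short Python check below reproduces both values; the function name `average_degree` is chosen only for this illustration.

```python
def average_degree(num_ases, num_links):
    """Average number of neighbors per AS: each link counts for both endpoints."""
    return 2 * num_links / num_ases


degree_1140 = round(average_degree(95, 1145), 1)   # Topology 1140
degree_7774 = round(average_degree(105, 614), 1)   # Topology 7774
print(degree_1140, degree_7774)  # prints: 24.1 11.7
```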

In contrast to Topology 1140, it has more ASes, although the extraction was started with fewer core ASes. The number of external links is lower than for Topology 1140, resulting in a lower average graph degree. However, the density functions in Figure 3.2 suggest that the number of neighbor ASes is subject to a broad distribution.

[Figure 3.2: Density functions of the node degrees (number of neighbor ASes) for Topology 1140 and Topology 7774. Two panels (Topology 1140, Topology 7774); x-axis: node degree (number of neighbor ASes), 0 to 80; y-axis: density, 0.00 to 0.03.]

3.2 Generation of Link Failures

Up to now, we have only covered the static aspects of the simulator input, namely the generation of the topology and the interconnections between ASes. However, dynamic events such as the occurrence of link instabilities or the advertisement of new routes and prefixes are an important part of the experiments. For the test scenarios in this work, it is sufficient to have a means of simulating link failures at a specific time. Thanks to Hagen Bohm [4], it is possible to let a link fail with the following DML extension:

link [ attach 1:1(1) attach 2:2(2) delay 0.0010 fail [ from 300 until 900 ] ]

This DML statement makes the link between router 1 in AS 1 and router 2 in AS 2 fail at simulation time 300s, basically dropping all (IP) packets at one router interface. At time 900s the link recovers and transports data as usual. It should be mentioned that in general link failures are not configured manually in the DML files but with the help of the Topology Generator tool (see 2.3.2) or a special Perl script (cf. 3.2.2) which was developed for this purpose.
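When such statements are produced by a script rather than by hand, the generation step amounts to simple string formatting. The following Python sketch shows the idea; it is not the Perl script used in this work, the helper name `dml_link` is hypothetical, and the attachment points are passed through as opaque strings in the notation of the example above.

```python
def dml_link(end_a, end_b, delay, fail_from=None, fail_until=None):
    """Format a DML link statement, optionally with a fail [ from .. until .. ] clause."""
    parts = [f"attach {end_a}", f"attach {end_b}", f"delay {delay:.4f}"]
    if fail_from is not None and fail_until is not None:
        parts.append(f"fail [ from {fail_from} until {fail_until} ]")
    return "link [ " + " ".join(parts) + " ]"


# Reproduces the statement shown above.
statement = dml_link("1:1(1)", "2:2(2)", 0.001, fail_from=300, fail_until=900)
```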

Last but not least, we are interested in categorizing links in terms of the harm they cause if they fail. The next subsection illustrates what is understood by such a classification of external links, whereas subsection 3.2.2 presents a script for configuring link failures depending on the desired “failure category”.

3.2.1 Link Categories

When discussing the characteristics of the Middle Topology in 3.1.1, we already alluded to the fact that realistic topologies, e.g. the Internet, obey a certain hierarchy. Indeed, there are research papers ([5] and [6]) which seem to confirm that the autonomous systems in the Internet can be classified into different categories in terms of their commercial relationships. By convention, ASes which are at the top of the hierarchy, having no providers and only peering with other “top ASes”, are called tier1. In our work we wanted to examine to what extent the position of a link in this hierarchy is correlated with the harm it causes for the propagation of updates in case the link fails.

Before classifying the external links of a network, the ASes were associated with one of the following categories:

• tier1-AS: All ASes which are not connected to any provider, and are thus at the top of the hierarchy, are said to be in tier1.

• stub-AS: ASes which do not have any customers are at the bottom of our ranking and are assigned to the category of stub entities.

• middle-AS: All ASes which do not belong to one of the first two groups fall into this category.

Starting from these categories of ASes, the external links were assigned to one of the groups below:

• tier1-tier1: Link between two tier1 ASes.

• tier1-middle: Link between a tier1 AS and a middle AS.

• middle-middle: Link between two middle ASes.

• middle-stub: Link between a middle AS and a stub AS.

• stub-stub: Link between two stub ASes.
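The two classification steps above can be expressed compactly in Python. The sketch below is an illustration only and does not reproduce CreateLinkFails.pl: the function names are hypothetical, an AS with neither providers nor customers is treated as tier1 (an assumption, since the text does not cover this case), and a link between a tier1 AS and a stub AS would be labeled "tier1-stub", a group that does not appear in the list above.

```python
def classify_ases(ases, provider_customer):
    """Assign each AS to tier1, stub or middle.

    provider_customer is a set of (provider, customer) tuples.
    """
    has_provider = {customer for _, customer in provider_customer}
    has_customer = {provider for provider, _ in provider_customer}
    category = {}
    for asn in ases:
        if asn not in has_provider:
            category[asn] = "tier1"   # no provider: top of the hierarchy
        elif asn not in has_customer:
            category[asn] = "stub"    # no customers: bottom of the hierarchy
        else:
            category[asn] = "middle"
    return category


RANK = {"tier1": 0, "middle": 1, "stub": 2}


def classify_link(as_a, as_b, category):
    """Name the link group after the categories of its endpoints, higher tier first."""
    first, second = sorted((category[as_a], category[as_b]), key=RANK.get)
    return f"{first}-{second}"


# Toy example (assumed data): AS 1 and 2 are providers of AS 4, AS 4 of AS 10.
relations = {(1, 4), (2, 4), (4, 10)}
categories = classify_ases({1, 2, 4, 10}, relations)
```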

Table 3.3 shows the results of this classification for the topologies used in our simulations (it is assumed that each AS pair is connected by only one link).

Topology   # links   tier1-tier1   tier1-middle   middle-middle   middle-stub   stub-stub
Middle     27        3             11             3               8             2
1140       1145      3             125            627             352           38
7774       614       15            116            242             219           22

Table 3.3: Categorization of external links for the used topologies

All topologies have in common that they contain only very few tier1-tier1 links. In the two extracted topologies, the largest group of external links is the middle-middle group. This suggests that the tier1 ASes as well as the stub ASes are probably situated more on the “edge” of the topology graph. Note that the Middle Topology is much smaller than the other two networks, having only 27 external links.

The configuration of link failures according to the categorization just described is automated by the Perl script CreateLinkFails.pl, which is explained in the next subsection.

3.2.2 Failure Scenarios in the Experiments

It was one objective of this study to examine the propagation of updates depending on the category of the failing link. Choosing a link of the desired category can be done easily with the Perl script CreateLinkFails.pl. As input, this script requires the desired number of links to fail, the category to which the failing links should belong, the DML input file for the simulation, a failure time period and the time when the link should recover from the failure state. According to these input parameters, an appropriate failing link is configured in the DML file as described in 3.2.

Internally, this script proceeds basically as described in the last subsection. After classifying the ASes, the external links are assigned to categories. The links to fail are then chosen randomly, as is the exact failure time within the specified failure period. In most cases, this script is called by other control scripts in our experiments.
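The random selection step can be illustrated with the following Python sketch. It is not the code of CreateLinkFails.pl: the function name, the dictionary layout and the uniform draw of the failure time are assumptions made for illustration, and writing the result into the DML file is left out.

```python
import random

def choose_link_failures(links_by_category, category, count, window, recover_at, seed):
    """Pick `count` distinct links of one category and a failure time for each.

    window is (start, end) of the failure period in seconds; the failure time
    is drawn uniformly from it (an assumption, the text only says "randomly").
    """
    rng = random.Random(seed)  # a text string is a valid seed
    start, end = window
    chosen = rng.sample(links_by_category[category], count)
    return [
        {"link": link, "fail_at": rng.uniform(start, end), "recover_at": recover_at}
        for link in chosen
    ]

# Hypothetical input: external links already grouped by category.
links_by_category = {
    "middle-stub": [(3, 4), (3, 5), (6, 7)],
    "stub-stub": [(4, 5)],
}
failures = choose_link_failures(
    links_by_category, "middle-stub", count=2,
    window=(100.0, 120.0), recover_at=500.0, seed="abc",
)
```

Using a dedicated `random.Random` instance keeps the choice reproducible for a given seed string, which is convenient when the same failure scenario has to be replayed.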

3.3 Analysis of the Simulation Results

Investigating the propagation of updates and the influence of different MRAI timer settings is an integral part of this project. From the logged BGP messages (see 2.4) the following values are derived:

• Convergence times: After a link in the topology fails, BGP messages are exchanged between BGP speakers until all routes which were leading over the broken link are redirected to other paths. The time from when the first BGP message is sent after the occurrence of the instability until the time when the last BGP update is received by a router will be referred to as the convergence time. With the help of the logged timestamps, it is possible to determine these convergence times.

• Number of affected ASes: Another interesting aspect is the spread of instabilities across the topology. If a connection between two routers drops out, not necessarily all BGP speakers will see this change, possibly because they didn’t route any prefix over the broken link. By looking for all ASes in the log files which received a BGP message as a consequence of one broken link, it is possible to determine the number or percentage of ASes which are reached by the instability.

• Propagation radius: Concerning the propagation of updates, it is interesting to know the distance of the affected ASes from the broken link. Basically, BGP messages spread in all directions from where the instability occurs. By analyzing all logged messages, it is possible to trace back the intermediate hops along which updates have propagated until reaching a given AS. The propagation distance or radius is the number of hops, not including the source node of the instability.

Analyzing the logged data must be done for each simulation run and is automated by the Perl script LogfileAnalyzer.pl. As input it takes a file with the logged simulator output and additionally asks for the DML file which was used by SSFNet. The output of this Perl script is a text file containing, among other things, the result values just described: convergence times, percentage of affected ASes and propagation radius.

It deserves mentioning that the script only takes into account BGP messages whose timestamp lies in a specified time window. In this way, it can be ensured that all considered updates are exclusively associated with a specific instability event. The two ASes incident to the failing edge are assigned the value 0 for the update radius. The update distance for the other ASes can then be determined recursively by defining it to be min{currentDistance, n+1} whenever an update message is received from an AS which has the radius n.
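The time-window filter and the min{currentDistance, n+1} rule can be sketched as follows. This is a minimal Python illustration, not the code of LogfileAnalyzer.pl. The message format (timestamp, sending AS, receiving AS), the function name and the toy log are assumptions. The rule is applied repeatedly until no radius changes any more, which is one possible reading of "recursively", and the convergence time is simplified to the span between the first and last timestamp in the window.

```python
import math

def analyze(messages, failed_link, window):
    """Return (convergence_time, radius) for one instability event.

    messages: list of (timestamp, sender_as, receiver_as) tuples (assumed format).
    failed_link: the two ASes incident to the failing edge.
    window: (start, end); messages outside of it are ignored.
    """
    start, end = window
    in_window = sorted(m for m in messages if start <= m[0] <= end)
    radius = {failed_link[0]: 0, failed_link[1]: 0}
    if not in_window:
        return 0.0, radius
    convergence_time = in_window[-1][0] - in_window[0][0]
    changed = True
    while changed:  # relax until a fixed point is reached
        changed = False
        for _, sender, receiver in in_window:
            if sender not in radius:
                continue  # sender's distance is not known (yet)
            candidate = radius[sender] + 1
            if candidate < radius.get(receiver, math.inf):
                radius[receiver] = candidate  # min{currentDistance, n+1}
                changed = True
    return convergence_time, radius

# Hypothetical log: link between AS 1 and AS 2 fails; the last entry lies
# outside the time window and is therefore ignored.
log = [(10.0, 1, 3), (10.5, 2, 4), (11.0, 3, 5), (12.5, 4, 5), (50.0, 9, 8)]
convergence_time, radius = analyze(log, failed_link=(1, 2), window=(5.0, 30.0))
```

In this toy log, ASes 3 and 4 end up at radius 1 and AS 5 at radius 2, while ASes 8 and 9 do not appear because their message falls outside the window.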

Usually, the script LogfileAnalyzer.pl is called by other control scripts and not started manually. The two main control scripts will be introduced in the next section.

3.4 Taken Experiments

Having discussed the generation of the static and dynamic properties of our experiments and the analysis of the simulation results, this section deals with the “high-level” view of how we conducted our investigations. It will be clarified what exact steps were taken to obtain the results of Chapter 4. For both main goals of this study, the investigation of the update propagation and the MRAI timer influence on convergence, Perl control scripts were written which will be presented in the next two subsections.

3.4.1 Investigation of MRAI Timer Effects

Examining the properties of different MRAI timer values and the use of a per-peer versus per-prefix timer is done by the Perl script mraiInvestigation.pl. Basically, we are running the simulations along four dimensions: different failing links, diverse MRAI timer values, a per-peer or per-prefix timer basis and different random seeds for the initialization of the random number generators. The general steps taken by this script are depicted in Figure 3.3.

At the beginning of a cycle the script configures a link failure with the help of the CreateLinkFails.pl tool (compare 3.2.2). After adjusting the MRAI timer to a value out of the set {4s, 5s, 10s, 15s, . . . , 55s, 60s}, the timer basis is determined to be either per-peer or per-prefix. Last but not least, a seed for the random number generator of SSFNet is chosen out of a given set of possible seeds, all being arbitrary text strings. When the dimension parameters have been set, the control script initiates the SSFNet simulation and has the computed results analyzed by LogfileAnalyzer.pl. For each single simulation run, mraiInvestigation.pl keeps some information about convergence time and number of exchanged updates, which is summarized in a text file after all simulations (and the control script) have finished.

[Flow chart: START -> configure link failures in DML file -> set the MRAI timer value -> change timer basis: per-peer and per-prefix -> choose seed for random number generators -> run simulation with SSFNet -> analyze the results (summary is stored to file) -> END]

Figure 3.3: Flow chart of the script mraiInvestigation.pl

Altogether, this control script contains four nested loops, each iterating over the parameters of one so-called testing dimension. More detailed information, like the number of iterations with different random seeds, is given in Chapter 4, as some settings might vary between testing series.
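The four nested loops can be expressed compactly as a Cartesian product over the testing dimensions. The following Python sketch only enumerates the parameter combinations; it does not start SSFNet, and the function and variable names are hypothetical. The number of failure scenarios (four) and seeds (three) is taken from the experiments in Chapter 4.

```python
from itertools import product

# MRAI timer values in seconds: 4, 5, 10, 15, ..., 60
MRAI_VALUES = [4] + list(range(5, 61, 5))
TIMER_BASES = ["per-peer", "per-prefix"]

def plan_runs(failure_scenarios, seeds,
              mrai_values=MRAI_VALUES, timer_bases=TIMER_BASES):
    """List one parameter set per simulation run (four testing dimensions)."""
    return [
        {"scenario": scenario, "mrai": mrai, "basis": basis, "seed": seed}
        for scenario, mrai, basis, seed
        in product(failure_scenarios, mrai_values, timer_bases, seeds)
    ]

# Hypothetical identifiers for four failure scenarios and three seed strings.
runs = plan_runs(failure_scenarios=range(4), seeds=["seed-a", "seed-b", "seed-c"])
runs_at_30s_per_peer = [
    r for r in runs if r["mrai"] == 30 and r["basis"] == "per-peer"
]
```

With these settings, every combination of MRAI value and timer basis is covered by twelve runs (four scenarios times three seeds).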

3.4.2 Investigation of Update Propagation

Examining update propagation properties (number of affected ASes or update radius) is done analogously to the procedure described in the preceding subsection. Again a Perl script, called updateRadius.pl, is responsible for testing along three dimensions: different categories of link failures (cf. 3.2.1), diverse failing links within each link category and different seeds for the random number generator of SSFNet. Figure 3.4 illustrates the basic steps during a run of updateRadius.pl.

Here the script comprises three nested loops, iterating over the parameters of each so-called dimension. First we determine to which category the failing link should belong (stub-stub, tier1-middle, etc.). The CreateLinkFails.pl script (see 3.2.2) then configures a link to fail, whereby different links for each failure category are tested (second loop). After choosing a seed string for the random number generator (third loop), SSFNet is started and the results are analyzed with LogfileAnalyzer.pl. Again, we record some results, like the percentage of affected ASes, for each simulation run in order to create a summary of the results before updateRadius.pl terminates.

[Flow chart: START -> choose a failure category -> configure link failures in DML file -> choose seed for random number generators -> run simulation with SSFNet -> analyze the results (summary is stored to file) -> END]

Figure 3.4: Flow chart of the script updateRadius.pl

Chapter 4

Simulation results

In order to obtain a better understanding of the underlying mechanisms of BGP, a careful and comprehensive sensibility analysis of the protocol parameters is needed. Within the scope of this study, we concentrated on two important aspects which we believe to be essential for an evaluation of BGP in terms of scalability and convergence times: the propagation of updates following a link failure, and the influence of the MRAI timer on the convergence times and the number of sent updates. This chapter describes the conducted experiments and documents the obtained results.

4.1 Influence of MRAI Timer Settings

The protocol specification of BGP includes several configurable timers, one of which is the Minimum Route Advertisement Interval (MRAI) timer. Being responsible for limiting the number of updates sent by a BGP speaker or for a certain prefix, this timer might have direct influence on the number of updates and the convergence times after a link failure.

This section covers different configuration settings for the MRAI timer and their effects on the general convergence process. Main attention will be devoted to two important configuration options: choosing a per-peer or per-prefix timer, and what timer value to take. Questions that arise are, for example: how do convergence times and number of updates change with increasing value of the MRAI timer, and what advantages do per-prefix timers offer in comparison to per-peer timers?

4.1.1 Varying the MRAI Timer Values

Every time a router sends a route advertisement to a neighbor, it starts a new instance of the MRAI timer, not allowing this router to send another advertisement concerning the same destination until the timer has expired. For this experiment we used a per-peer timer basis and let 20 links fail at arbitrary locations within Topology 7774. The exact points in time when the link failures occur are chosen randomly out of a time window with a length of 20s in order to avoid possible synchronous runs of different timer instances. All experiments concerning the MRAI timer were conducted with our sample topologies having two external links (multi-homing) between a pair of ASes, using SSLD but no WRATE.

Always measuring the number of exchanged external updates (updates between ASes) and the time from the first update sent after the instability event until the time the last update was received, we conducted the experiment for MRAI timer values of 4s, 5s, 10s, 15s, . . . , 60s. With the help of the Perl script mraiInvestigation.pl (see 3.4.1), the simulations were initiated automatically, running each simulation based on three diverse seeds for the random number generators of SSFNet. Furthermore, we considered four different failure scenarios with 20 broken links for every MRAI timer value configuration.

Figure 4.1 shows the results of these experiments. Note that all tests were done with a per-peer timer basis and that only the means of the measured number of external updates and of the convergence times for a specific MRAI timer value are displayed.

[Figure: two plots over the MRAI timer value in sec (10 to 60). Left: “Convergence times”, y-axis convergence time in sec (mean). Right: “# External Updates”, y-axis # external updates (mean).]

Figure 4.1: Convergence times and number of updates depending on different MRAI timer values (per-peer) in Topology 7774

Regarding the convergence process, it can be inferred from Figure 4.1 that increasing values for the MRAI timer impose a penalty on the time needed until a steady state for the routing has been reached again. Whereas an MRAI timer of 60s requires 110s to converge, setting the timer to 4s leads to a period of only about 40s until all updates after the link failures have been exchanged. The growth of the observed convergence times seems to be approximately linear with respect to increasing MRAI values in this experiment.

These results suggest that the rate-limiting mechanism of the MRAI timer adds more delay to the messages for higher timer values. However, it is pointed out that there exists related work [1] which found that very low MRAI timer values can cause a high workload in BGP routers, thus inducing an increase in convergence times again.

The second part of Figure 4.1 depicts the number of external updates as a function of the setting of the timer value. Here we observe the converse trend: with increasing MRAI value, the number of exchanged BGP messages decreases from about 1600 to about 1400. We explain this observation by the dampening of some of the route oscillations which are inherent in the path-vector protocol BGP. A BGP-speaking router can collect and evaluate alternative paths for a certain prefix before it advertises the best path to its neighbors. Therefore its neighbors are not exposed to every intermediate path but only to the best path within a period of time.

It might be asked whether it is justified to use the means of the measured data values for drawing general conclusions on the influence of the MRAI timer parameters. Strong fluctuations could possibly weaken the explanatory power of our results. The standard deviations for the measured number of external updates and the convergence times are depicted in Figure 4.2.

[Figure: two plots over the MRAI timer value (10 to 60). Left: “Convergence times”, y-axis convergence time in sec. Right: “# External Updates”, y-axis # external updates.]

Figure 4.2: Standard deviations for the measured data values in Figure 4.1

The standard deviations for the number of updates are less than 35% of the computed means for all MRAI timer configurations, whereas the deviations are never larger than 10% of the means in the case of the convergence times. Although the number of updates shows a higher variability, the calculation and use of the means seems to be justified considering that twelve simulation runs were made for every MRAI timer value.
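The comparison of standard deviation and mean used above can be reproduced for any series of runs with a few lines of Python. The sample values below are invented for illustration and are not measurements from this study. Whether the original evaluation used the sample or the population standard deviation is not stated, so the sample variant is an assumption.

```python
from statistics import mean, stdev

def relative_spread(samples):
    """Return (mean, sample standard deviation, deviation as a fraction of the mean)."""
    m = mean(samples)
    s = stdev(samples)
    return m, s, s / m

# Hypothetical convergence times (in seconds) of three runs with the same MRAI value.
hypothetical_times = [90.0, 100.0, 110.0]
avg, dev, ratio = relative_spread(hypothetical_times)
```

A ratio of 0.10 corresponds to the 10% bound quoted for the convergence times.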

In closing, we summarize that with increasing values for a per-peer MRAI, the number of external updates is decreasing at the cost of higher convergence times.

4.1.2 Per-peer and per-prefix MRAI Timers

The question was already raised whether per-prefix MRAI timers have any advantages in comparison to per-peer timers. One might expect that keeping a separate timer for each single prefix being advertised to a neighboring peer does not impose such high penalties on convergence times as using timers on a per-peer basis. The following experiment tries to shed light on questions concerning the use of per-prefix MRAI timers.

In analogy to the simulation described in 4.1.1, we generate 20 link failures at arbitrary locations of Topology 1140 and Topology 7774 which occur randomly within a time window of 20s. Again, we measure the number of exchanged external updates and the time from the first update sent after the instability event until the time the last update was received by a host (referred to as convergence time). The MRAI value was varied in the same way as in 4.1.1, but this time simulations are run for both a per-peer and a per-prefix MRAI timer. In order to enhance the explanatory power of our conclusions, we perform simulations on the two “big” topologies: Topology 1140 and Topology 7774. With the help of the Perl script mraiInvestigation.pl, twelve simulations are run for every fixed MRAI timer value and fixed timer basis (per-peer and per-prefix) for each of the two topologies, as three different seeds for the random number generator and four different failure scenarios are used in each case.

The diagrams in Figure 4.3 illustrate the convergence times and the number of external updates depending on the used MRAI value for a per-peer and a per-prefix timer basis. Note that the means for a specific timer value are always plotted.

[Figure: four plots over the MRAI timer value (10 to 60), each with one curve for per-peer and one for per-prefix timers. Top: “Topology 1140: Convergence times” and “Topology 7774: Convergence times”, y-axis convergence time in sec (mean). Bottom: “Topology 1140: # External Updates” and “Topology 7774: # External Updates”, y-axis # external updates (mean).]

Figure 4.3: Comparison of per-peer and per-prefix MRAI timers in terms of convergence times and number of external updates

Concerning the dependence between convergence times or number of updates and different timer values for a per-peer MRAI in Topology 1140, the reader is referred to 4.1.1, as the observations and conclusions for this case are basically the same: increasing values for the MRAI seem to lead to fewer external updates but longer convergence times.

If corresponding convergence times for per-peer and per-prefix MRAI timers are compared with each other, it seems that per-prefix implementations offer slight advantages over the default per-peer timers. Whereas in Topology 1140 the convergence process is always some seconds faster for a per-prefix timer basis, this is only true up to a timer value of 25s in Topology 7774.

However, Figure 4.4 suggests that the standard deviations for the convergence times in Topology 7774 are more than 10s for per-prefix timers set to values higher than 25s. Possibly, this could explain why per-prefix MRAI timers show worse convergence behavior than per-peer timers in that case.

In most failure scenarios per-prefix timers will have slight advantages in terms of convergence times over keeping one timer for every neighboring AS. Holding back all update messages to a peer independent of the concerned prefixes, a per-peer MRAI timer imposes penalties on convergence times in comparison to per-prefix timers. This is due to the fact that timers on a per-prefix basis can “react” to each advertised prefix individually in the case of several overlapping link failures.

[Figure: two plots over the MRAI timer value (10 to 60). Left: “Convergence times”, y-axis convergence time in sec. Right: “# External Updates”, y-axis # external updates.]

Figure 4.4: Standard deviations for the data values measured for the per-prefix MRAI timer in Topology 7774 (see Figure 4.3)

Continuing our discussion of the results, we note that in both topologies the mean of the number of updates remains more or less constant while varying the values of the per-prefix MRAI timers. However, strong fluctuations in the number of exchanged messages for the single simulation runs of a specific MRAI timer setting can be observed in Figure 4.4. The standard deviation is about 35% of the computed means, possibly due to the statistical nature of router interactions.

Taking into account the number of BGP updates following instability events, per-peer timers seem to be the better choice. Compared to per-peer timers, the use of MRAI timers on a per-prefix basis produces significantly more external updates. For Topology 1140 we observe an average increase of 29% in the number of updates when using a per-prefix instead of a per-peer timer; in Topology 7774 the increase is even 44%.

This might be explained by the fact that a per-prefix timer does not hold back update messages for the same neighboring AS if different prefixes are concerned. In such a scenario a timer on the per-peer basis could reduce the number of BGP messages which are passed on to the neighbors.
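The difference between the two timer bases comes down to the key under which a running timer is stored: the peer alone, or the pair of peer and prefix. The following toy Python class illustrates this. It is a deliberately simplified sketch and not the SSFNet implementation: the class and method names are hypothetical, and a blocked advertisement is simply rejected here, whereas a real router would hold it back and send it once the timer expires.

```python
class MraiGate:
    """Toy rate limiter keyed per peer or per (peer, prefix)."""

    def __init__(self, interval, per_prefix=False):
        self.interval = interval      # MRAI timer value in seconds
        self.per_prefix = per_prefix  # False: per-peer basis
        self._expires = {}            # timer key -> expiry time

    def _key(self, peer, prefix):
        return (peer, prefix) if self.per_prefix else peer

    def try_send(self, peer, prefix, now):
        """Return True if an advertisement may be sent at time `now`."""
        key = self._key(peer, prefix)
        if now < self._expires.get(key, float("-inf")):
            return False              # timer still running: hold back
        self._expires[key] = now + self.interval
        return True

# Hypothetical sequence: two different prefixes are advertised to the same
# neighbor within two seconds, then the first prefix changes again.
events = [
    ("AS2", "10.0.0.0/8", 0.0),
    ("AS2", "192.0.2.0/24", 1.0),
    ("AS2", "10.0.0.0/8", 2.0),
]
per_peer = MraiGate(interval=30.0)
per_prefix = MraiGate(interval=30.0, per_prefix=True)
sent_per_peer = [per_peer.try_send(*event) for event in events]
sent_per_prefix = [per_prefix.try_send(*event) for event in events]
```

In this sequence the per-peer gate lets only the first advertisement through, while the per-prefix gate also passes the advertisement for the second prefix, which is the kind of additional message discussed above.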

If the results also proved true in additional experiments and for other topologies, this would justify the configuration of the MRAI timers as per-peer, being the default setting in the widespread Cisco and Juniper routers. Nonetheless, it should be questioned whether the default configuration of the timer value to be near 30s is the best one possible for balancing between a low router workload and fast convergence times.

4.2 Propagation of Updates

Scalability is an important issue inherent in many fields of network research. Especially for distributed protocols like BGP, which is responsible for maintaining connectivity between autonomous systems in the Internet, it is of great importance to gain an understanding of how protocol behavior changes with increasing size of the network. This section deals with the investigation of update propagation after a link has failed somewhere in the topology. Questions that arise are, for example: how many ASes receive an update message following a link failure, and how far away from the broken link can the instability still be perceived? Last but not least, we want to investigate whether the classification of links according to their commercial relationships reflects the harm, in terms of the number of affected ASes, caused if such a link should fail.

4.2.1 Experiment Description

For this experiment we always produced one single link failure such that all updates sent afterwards must be related to this instability. As described in section 3.3, the number of affected ASes can be easily determined by considering all ASes receiving a BGP message following the failure event. The second point of interest is the distance updates propagate through the topology when a connection between two ASes breaks. We refer to that distance as the propagation radius, measuring it in the number of AS hops not including the nodes incident on the failed edge.

In order to approximate the circumstances in the Internet, where Cisco routers are the overwhelming majority, we use per-peer MRAI timers (timer value normally distributed between 25s and 31s), SSLD but no WRATE. All simulations take place in our sample topologies Middle Topology, Topology 1140 and Topology 7774. By only permitting one link between a pair of ASes (no multi-homing), it is ensured that each failure of an external link leads to a change in the inter-AS routing of prefixes observable by other ASes.

Configuring a link failure in the DML file is done by the script CreateLinkFails.pl (refer to 3.2.2), making it possible to choose the failing edge systematically such that it belongs to one of the categories described in 3.2.1. CreateLinkFails.pl is invoked from the general control script updateRadius.pl (see 3.4.2), which runs simulations in every failure category for ten different failure scenarios. In order to average out random effects, all simulations are started with three different seeds for the random number generator with otherwise identical parameters.

The results of these experiments will be discussed now in the following two subsections.


4.2.2 Number of affected ASes after a link failure

Figure 4.5 depicts the ratio of ASes (in percent) receiving an update following a link failure. In the used topologies, we distinguished between the types of the failing link and plotted only the mean value of all simulation runs for a specific failure category.

[Figure: bar plot of the ratio of reached ASes (mean, 0.0 to 1.0) for the topologies Middle, 1140 and 7774, with one bar per link category: tier1-tier1, tier1-middle, middle-middle, middle-stub, stub-stub.]

Figure 4.5: Average percentage of ASes receiving updates after a failure broken down by different link categories

First of all, it can be stated that in general not all ASes of the investigated networks are affected by the link failure. Prefixes which have been routed over the broken link must be redirected to new AS paths which possibly do not differ completely from the original path but have some ASes in common. Taking into account the maximum of the means of all five link categories, the percentage of affected ASes is not larger than 84% in Middle Topology, 38% in Topology 1140 and 59% in Topology 7774.

However, one has to be very careful in drawing definite conclusions. Figure 4.6 suggests that there are strong fluctuations in the number of reached ASes for all failure categories. Using Topology 7774 as an example, histograms illustrate how the computed mean values for the percentage of affected ASes in Figure 4.5 arise from the values measured in the different simulation runs for each link failure category. The distribution of the ratios of ASes receiving updates after the instability event shows similar deviations in Topology 1140 and is therefore not presented here.

Furthermore, we can read off Figure 4.5 that if an external link between two stub ASes fails, the ratio of reached ASes is very low for all topologies. Actually, examining the results of the single simulations shows that in such a case only two ASes are affected by the broken link: the ASes incident on the failed edge. This is due to the fact that the connection between two stub ASes is a peering link, which is not supposed to be used by other ASes and on which therefore no prefixes are advertised. As only two ASes are ever affected by the instability event, we didn’t generate a histogram for the stub-stub link failure in Figure 4.6. Note that Middle Topology is very small compared to the other networks, leading to a higher percentage of ASes receiving updates after a link failure in Figure 4.5.

[Figure: four histograms titled “tier1-tier1 links”, “tier1-middle links”, “middle-middle links” and “middle-stub links”; x-axis: ratio of ASes receiving updates after a link failure (0.0 to 1.0), y-axis: Frequency.]

Figure 4.6: Histograms of the ratio of ASes receiving updates after a link failure in Topology 7774 broken down by different link categories

We already mentioned that it is problematic to predict the ratio of affected ASes based on the category to which the failing link belongs. Studying Figure 4.6, only very vague statements can be made on the relationship between the harm caused by links if they should fail and their membership in one of our link categories. For example, it seems that in Topology 7774 broken connections between middle ASes are noticed by slightly fewer ASes than would be the case if a link to a tier-1 AS failed.

Furthermore, it is surprising that in many simulation runs, link failures between a stub AS and a middle AS are very harmful, inducing the propagation of updates to nearly all ASes, whereas in many other simulations for the same failure category basically no ASes are reached at all by updates. Similar behavior is seen for the other link failure categories.

The question arises whether it makes sense to investigate the propagation of updates broken down by different link categories, as for all categories strong fluctuations in terms of the ASes affected by the failure event were observed. We believe it to be essential that further experiments with different and probably more realistic topologies are conducted before a final statement on this issue can be made.

Speculating on the reasons for these strong fluctuations, it might be interesting to examine the relationship between the degrees of the nodes incident on the failing edge and the percentage of affected ASes. In a highly meshed network there exist many alternative paths to the same destination, with the result that possibly only very few ASes have selected a best route which runs over the broken link. Are the node degrees a better metric for predicting the harm caused by a link failure than the classification presented in this work?

In trying to answer this question, it might be helpful to consult Table 4.1. It contains the average number of neighbors per AS, broken down by AS category. Behind each table entry, the standard deviation is indicated in brackets.

Topology  total        tier1        middle       stub
Middle     2.9 (1.1)    4.0 (1.0)    3.3 (0.9)    2.0 (0.5)
1140      24.1 (17.0)  20.0 (11.3)  29.9 (17.1)  14.7 (13.4)
7774      11.7 (13.2)  28.6 (13.1)  20.9 (14.2)   5.0 (6.6)

Table 4.1: Average node degrees (standard deviations in brackets) broken down by categories

First of all, the average total node degrees show that the used topologies differ strongly concerning their meshing degree. While Topology 1140 is more highly meshed with a mean of 24.1 neighbors per AS compared to 11.7 neighboring ASes in Topology 7774, the very small Middle Topology only has an average node degree of 2.9. Further investigation is needed to determine if these meshing degrees influence the ratio of affected ASes after a link failure.

Comparing the average node degrees broken down by categories for Topology 7774 and Topology 1140, it is interesting that they show converse characteristics for the tier-1 ASes. Whereas the node degrees of the ASes which we assigned to the tier-1 group are on average higher than those of the so-called middle ASes (28.6 compared to 20.9) in Topology 7774, it is just the other way round in Topology 1140 (here 20.0 compared to 29.9). In general, it is believed that ASes in the tier-1 level are situated in the center of the network graph, neighboring more ASes than non-tier-1 ASes. Future work should pay more attention to modeling the hierarchical structure of the Internet in such a way that topological characteristics like the meshing degree are not distorted.

The standard deviations printed in brackets behind each data value of Table 4.1 already indicate that there are again some fluctuations in the number of neighbors which ASes within a certain category have. These deviations are illustrated in a more elaborate manner by Figure 4.7.

Estimated density functions of the node degree distribution are drawn for the different AS categories (tier-1, stub and middle) for Topology 1140 and Topology 7774. The main conclusion to be drawn from these density plots is possibly the insight that it is dangerous to consider only the mean values, as strongly varying node degrees can be observed.

Nonetheless, we point out that tier-1 ASes in Topology 7774 are connected to 28.6 neighbors on average whereas the node degree for the same category is only 20.0 in Topology 1140 (see Table 4.1). If the theory holds that the high number of alternative paths to the same destination in a highly meshed part of a network has the consequence that only very few ASes have selected a best route which runs over the broken link, we could conclude that this is the reason for the fact that a tier-1 link failure is much more harmful in Topology 7774 than in Topology 1140 (compare Figure 4.5). But again, it is not possible to make a definitive statement, as for example the results for the middle-stub links contradict this theory.


[Figure: two density plots, “Topology 1140” and “Topology 7774”; x-axis: node degree (0 to 80), y-axis: Density; one curve each for tier1 ASes, middle ASes and stub ASes.]

Figure 4.7: Density functions of the node degrees (number of neighbor ASes) broken down by categories

In closing, we summarize that further investigations are needed to explore the relationship between the commercial classification of external links and the harm they cause, in terms of the number of affected ASes, if they should fail. Possibly, new criteria need to be developed for a more meaningful categorization of external connections between ASes.

4.2.3 Propagation Radius

Besides the ratio of affected ASes, the second point of interest is the distance updates propagate through the topology when a connection between two ASes breaks. How this so-called update radius can be measured was already described in 3.3. Figure 4.8 depicts the computed distances in the number of AS hops not including the nodes incident on the failed edge. Again, 10 different failure scenarios were tested for all link failure categories, every time using three different seeds for the random number generator. Whereas the mean values of all simulation runs are plotted in the left bar plot of Figure 4.8, the right diagram shows the maximum update radius observed in a series of experiments for a specific failure category.

Perhaps the most conspicuous result is that the maximum update radius is always less than 4, meaning that in no case do updates spread more than 4 AS hops away from the source of the instability event. We attribute this to the high average meshing degrees of our topologies (compare Table 4.1: 11.7 for Topology 7774 and 24.1 for Topology 1140), where possibly not many “best” routes were using the broken link. Due to its small size and synthetic nature, Middle Topology can only be used in a restricted manner to draw significant conclusions.

[Figure 4.8: two bar plots over Middle Topology, Topology 1140 and Topology 7774. Left panel y-axis: # AS hops (mean); right panel y-axis: # AS hops (max); both axes range from 0 to 5.]

Figure 4.8: Propagation distance of updates in the case of a link failure (measured by the number of hops)

The mean values for the update radius, shown in the left bar plot, always lie between 0.53 and 1.56 AS hops for all failure categories. In this context we point out that the mean values are computed only over the ASes that receive an update following a link failure. If only a small ratio of ASes is affected, the ASes incident on the failing edge, which count with a distance of zero hops, carry a very high weight in the computation of the mean, leading to an average propagation distance of less than one AS hop. Again we emphasize the short mean distances that updates propagate through the topologies after a link failure, but point out at the same time the strong fluctuations between the individual experiments.
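The averaging effect described above can be reproduced with a small Python sketch. It is not the updateRadius.pl script; it assumes that the hop distance of an AS is its breadth-first-search distance from the nearer endpoint of the failed link in the topology without that link, with both endpoints counted as zero hops. The function names and the toy topology are hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean


def hop_distances(links, failed_link):
    """AS-hop distance of every reachable AS from the failed link.

    Both endpoints of the failed link get distance 0 (assumption).
    """
    adj = defaultdict(set)
    for a, b in links:
        if {a, b} != set(failed_link):  # the failed link is removed
            adj[a].add(b)
            adj[b].add(a)

    dist = {node: 0 for node in failed_link}
    queue = deque(failed_link)
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def update_radius(links, failed_link, affected):
    """Mean and maximum hop distance over the ASes that received an update."""
    dist = hop_distances(links, failed_link)
    radii = [dist[asn] for asn in affected]
    return mean(radii), max(radii)


# Hypothetical toy topology: two chains hanging off the failed link 1-2.
links = [(1, 2), (1, 3), (2, 4), (3, 5), (4, 6)]
few = update_radius(links, (1, 2), [1, 2, 3])            # few ASes affected
many = update_radius(links, (1, 2), [1, 2, 3, 4, 5, 6])  # all ASes affected
print(few, many)
```

With only three affected ASes, two of which are the endpoints of the failed link, the mean drops to one third of an AS hop, although one update travelled a full hop.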

Concerning the distinction between different failure categories, it seems difficult to derive any trends regarding their harmfulness from the diagrams. More research is needed to explore this issue in more detail.

Finally, the left diagram in Figure 4.8 can be compared with Figure 4.5: the relationship between the mean ratios of reached ASes for different failure categories is predominantly reflected in the relationship between the average propagation radii. For example, if the mean percentage of reached ASes after the failure of a tier-1-middle link is higher than that for a broken tier-1-tier-1 link in Topology 1140, this is reflected in a higher mean propagation radius for the tier-1-middle link category in Figure 4.8.

In closing, we point out that all these observations need to be confirmed by further investigations.

Chapter 5

Conclusions and directions for future work

In closing, we summarize the results of the performed sensibility analysis of BGP convergence and scalability, obtained by using the SSFNet simulator.

One of the main objectives of this work was to examine the influence of the MRAI timer configuration on convergence times and the number of external updates sent. The results from Chapter 4 basically confirmed that a higher timer value leads to fewer exchanged update messages at the cost of higher convergence times. In most cases, per-prefix timers offer only slight advantages in convergence time compared to per-peer timers. However, the number of external updates increases considerably when timers are kept on a per-prefix basis. Further investigations have to show whether these findings justify the default configuration of MRAI timers on a per-peer basis, as used by the market leaders Cisco and Juniper.
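The difference between the two timer bases can be illustrated with a toy Python model. It is a deliberately simplified sketch, not the SSFNet BGP implementation: it considers a single peer, assumes that changes arriving while a timer is running are held back and coalesced per prefix, and assumes that the timer restarts whenever held-back changes are sent. The function name, the event list and the interval of 30 time units are hypothetical.

```python
def simulate_mrai(changes, mrai, per_prefix):
    """Toy model of MRAI rate limiting towards a single peer.

    changes    -- list of (time, prefix) route changes, sorted by time (>= 0)
    mrai       -- timer interval in the same time unit
    per_prefix -- True: one timer per prefix, False: one timer for the peer
    Returns the list of (send_time, prefix) updates that are sent.
    """
    ready_at = {}  # timer key -> earliest time the next update may be sent
    pending = {}   # timer key -> prefixes waiting for that timer to expire
    sent = []

    def flush(until):
        # Send held-back changes of every timer that has expired by `until`.
        for key in sorted(pending, key=lambda k: ready_at[k]):
            if pending[key] and ready_at[key] <= until:
                for prefix in sorted(pending[key]):
                    sent.append((ready_at[key], prefix))
                pending[key] = set()
                ready_at[key] += mrai  # timer restarts after sending

    for time, prefix in changes:
        flush(time)
        key = prefix if per_prefix else "peer"
        if time >= ready_at.get(key, 0) and not pending.get(key):
            sent.append((time, prefix))       # timer idle: send at once
            ready_at[key] = time + mrai
        else:
            pending.setdefault(key, set()).add(prefix)  # hold and coalesce
    flush(float("inf"))
    return sorted(sent)


# Hypothetical event sequence: prefix A changes once, prefix B twice.
changes = [(0, "A"), (1, "B"), (2, "B")]
per_peer = simulate_mrai(changes, 30, per_prefix=False)
per_pfx = simulate_mrai(changes, 30, per_prefix=True)
print(len(per_peer), per_peer)
print(len(per_pfx), per_pfx)
```

In this toy sequence the per-peer timer merges both changes of prefix B into one update, whereas the per-prefix timers let the first change of B through immediately and send a second update later. The sketch only shows the mechanism; it makes no claim about the magnitudes measured in Chapter 4.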

Besides MRAI timer influences, we tried to explore how far updates propagate through the topology after a link failure. We found that for our test topologies updates are never seen more than 4 AS hops away from the broken edge. Altogether, instabilities do not seem to spread very far and remain relatively local. Concerning the number of ASes affected by the failure of a link, strong fluctuations depending on the broken link were observed. Our categorization of links according to the commercial relationship between the connected ASes therefore does not seem very promising. However, future experiments have to confirm these results.

Altogether, a lot of work remains to be done. Using MRAI timers on a per-prefix instead of a per-peer basis imposes a higher workload on the routers, as separate timer instances have to be kept for all prefixes. It would be interesting to examine the joint influence of workload and timer basis on the overall convergence process. Possibly, the use of per-peer timers is then even more justified.

The SSFNet BGP implementation contains some simplifications which might be relevant for a comprehensive analysis. Particular emphasis could be placed on the investigation of route flap damping. This mechanism was already added to the BGP implementation used here, and it could be worthwhile to study whether route flap damping is only invoked by network instabilities or also by oscillations which are inherent in the BGP protocol.

In our view, the generation of realistic topologies is essential for an analysis of BGP behavior. In that respect, efforts can be made to improve the internal and external structure of our sample networks. Up to now, the BGP meshes within autonomous systems are rather static, always consisting of a ring of route reflectors and some border routers connecting to other ASes. Other structures within the ASes could be considered in future tests. Regarding the external topology, it is desirable to reproduce the hierarchy inherent in the Internet as closely as possible in our test topologies. For example, the distribution of node degrees (the number of neighbor ASes of an AS) should follow the same patterns as in the Internet.

Considering all the simplifications made in our models and the remaining open questions, a lot of research remains to be done in this field.

List of Figures

2.1 Structural overview of the SSFNet simulator
2.2 Generation of DML files
3.1 Middle Topology (green lines are peering links)
3.2 Density functions of the node degrees (number of neighbor ASes) for Topology 1140 and Topology 7774
3.3 Flow chart of the script mraiInvestigation.pl
3.4 Flow chart of the script updateRadius.pl
4.1 Convergence times and number of updates depending on different MRAI timer values (per-peer) in Topology 7774
4.2 Standard deviations for the measured data values in Figure 4.1
4.3 Comparison of per-peer and per-prefix MRAI timers in terms of convergence times and number of external updates
4.4 Standard deviations for the data values measured for the per-prefix MRAI timer in Topology 7774 (see Figure 4.3)
4.5 Average percentage of ASes receiving updates after a failure broken down by different link categories
4.6 Histograms of the ratio of ASes receiving updates after a link failure in Topology 7774 broken down by different link categories
4.7 Density functions of the node degrees (number of neighbor ASes) broken down by categories
4.8 Propagation distance of updates in the case of a link failure (measured by the number of hops)


List of Tables

3.1 Properties of Topology 1140
3.2 Properties of Topology 7774
3.3 Categorization of external links for the used topologies
4.1 Average node degrees (standard deviations in brackets) broken down by categories


