3.2.7. The Interior Gateway Routing Protocol: OSPF

Today, the Internet is made up of large number of autonomous systems (AS). Each AS is operated by a different organization and can use its own routing algorithm inside.

A routing algorithm within an AS is called an interior gateway protocol. An algorithm for routing between ASes is called exterior gateway protocol.

The original Internet interior gateway protocol was a distance vector protocol (Routing Information Protocol - RIP) based on Bellman-Ford algorithm. Gateways broadcast messages containing information from their routing databases. Each message consists of pairs, where each pair contains an IP network address and an integer distance to that network in hop count metrics. Other gateways listen to the broadcasted messages and update their tables according to the vector-distance algorithm. This protocol worked well in small systems, but less well as ASes got larger.

In 1979, the RIP was replaced by a link state protocol based on computing the shortest path to single routers. In 1990, a successor of this protocol, OSPF (Open Shortest Path First), became a standard (RFC 1247).

Requirements taken into account when OSPF was designed:

the algorithm has to be published in open literature,
it has to support the variety of distance metrics,
it has to be based on dynamic algorithm,
it has to support routing based on types of services,
it has to do load balancing, splitting the load over multiple lines,
support for hierarchical systems is needed, no router could be expected to know the entire topology,
some extent of security is required,
provision is needed for dealing with routers connected to the Internet via a tunnel.

OSPF supports three kinds of connections and networks:

Point-to-point lines between exactly two routers.
Multiaccess networks with broadcasting (most LANs).
Multiaccess networks without broadcasting (most packet switched networks).

A multi-access network is one that can have multiple routers on it, each of which can directly communicate with all others. All LANs and WANs have this property (Fig. 5-52(a)). Hosts do not generally play a role in OSPF.

Fig. 5-52. (a) An autonomous system. (b) A graph representation of (a).

OSPF works by abstracting the collection of actual networks, routers, and lines into a directed graph in which each arc is assigned a cost. It then computes the shortest path based on the weight on the arcs. A serial connection between two routers is represented by a pair of arcs one in each direction. Their weights may be different.

A multi-access network is represented by a node for the network itself plus a node for each router. The arc from the network node to the routers have weight 0 and are omitted from the graph. Fig. 5-52(b) shows the graph representation of the network of Fig. 5-52(a).

OSPF represents the actual network as a graph and then compute the shortest path from every router to every other router.

OSPF allows to divide ASes into numbered areas. Area is a network or a set of contiguous networks. Areas do not overlap but need not be exhaustive, i.e., some routers may belong to no area. An area is a generalization of a subnet. Outside an area, its topology and details are not visible.

Every AS has a backbone area called area 0. All areas are connected to the backbone. Each router that is connected to two or more areas is part of the backbone.

Within an area, each router has the same link state database and runs the same shortest path algorithm.

OSPF handles different types of service routing in a different way. For each service it maintains the corresponding graph and makes special computations.

During the normal operation, three kinds of routes may be needed:

intra-area - here the source router already knows the shortest path to the destination router,
inter-area,
inter AS.

Inter-area routing proceeds in three steps:

go from the source to the backbone,
go across the backbone to the destination area,
go to the destination.

Packets are routed from source to destination "as is", they are not encapsulated or tunneled unless going to an area whose only connection to the backbone is a tunnel.

Fig. 5-53 shows part of Internet with ASes and areas.

Fig. 5-53. The relation between ASes, backbones, and areas in OSPF.

OSPF distinguishes four classes of routers:

internal routers are wholly within one area,
area border routers connect two or more areas,
backbone routers are on the backbone,
AS boundary routers talk to routers in other ASes.

These classes are allowed to overlap.

When a router boots, it sends HELLO messages on all of its point-to-point lines and multicasts them on LANs to the group consisting of all other routers. From the responses, each router learns who its neighbors are.

OSPF works by exchanging information between adjacent routers which is not the same as between neighboring routers. On each LAN, one router is selected as designated router. It is said to be adjacent to all other router and exchanges information with them. A backup designated router is always kept up-to-date to easy transition should the primary designated router crash.

Fig. 5-54. The five types of OSPF messages.

During normal operation, each router periodically floods LINK STATE UPDATE messages to each of its adjacent routers. This message gives its state and provides the costs used in topological database. These messages are acknowledged to make them reliable. Each message has a sequence number, so a router can see whether an incoming LINK STATE UPDATE is older or newer that what it currently has. Routers also send these messages when a line goes up or down or its cost changes.

DATABASE DESCRIPTION messages give the sequence numbers of all the link state entries currently held by the sender. By comparing its own values with those of the sender, the receiver can determine who has the most recent values.

Either partner can request link state information from the other one using LINK STATE REQUEST messages. So each pair of adjacent routers checks to see who has the most recent data, and new information is spread throughout the area this way.

So, the complete picture is the following: Using flooding, each router informs all the other routers in its area of its neighbors and costs. This information allows each router to construct the graph for its area(s) and compute the shortest path. The backbone area does this too.

3.2.8. The Exterior Gateway Routing Protocol: BGP

Exterior gateway protocol is used between ASes and unlike interior gateway protocol it has to worry about politics. For example, a corporate AS might be unwilling to carry transit packets originating in a foreign AS and ending in a different foreign AS. Exterior gateway protocols in general, and BGP in particular, have been designed to allow many kinds of routing policies to be enforced in the inter AS traffic.

Typical policies involve political, security, or economic considerations. Some examples:

No transit traffic through certain ASes.
Never put Iraq on a route starting at the Pentagon.
Do not use the United States to get from British Columbia to Ontario.
Only transit Albania if there is no alternative to the destination.
Traffic starting or ending at IBM should not transit Microsoft.

Policies are manually configured into each BGP router. They are not part of the protocol itself.

From the point of view of a BGP router, the world consists of other BGP routers and the lines connecting them. Two BGP routers are considered connected if they share a common network.

Given BGP's special interest in transit traffic, networks are grouped into one of three categories:

Stub networks - they have only one connection to the BGP graph. Therefore, these cannot be used for transit traffic.
Multiconnected networks - they could be used for transit traffic, except that they refuse.
Transit networks - such as backbones, they are willing to handle third-party packets, possibly with some restrictions.

Pairs of BGP routers communicate with each other by establishing TCP connections. Operating this way provides reliable communication and hides all the details of the network being passed through.

BGP is fundamentally a distance vector protocol, but quite different from most other such as RIP. Instead of maintaining just the cost to each destination, each BGP router keeps track of the exact path used. Similarly, instead of periodically giving each neighbour its estimated cost to each possible destination, each BGP router tells its neighbours the exact path it is using (Fig. 5-55). Every BGP router contains a module that examines routes to a given destination and scores them, returning a number for the "distance" to that destination for each route. Any route violating a policy constraint automatically gets a score of infinity. The router then adopts the route with the shortest distance. The scoring function is not part of the BGP protocol and can be any function the system manager wants.

Fig. 5-55. (a) A set of BGP routers. (b) Information sent to F.

The current definition of BGP is in RFC 1654. Additional useful information can be found in RFC 1268.

3.2.9. CIDR - Classless InterDomain Routing

IP has worked extremely well, as demonstrated by the exponential growth of the Internet. Unfortunately, IP is rapidly becoming a victim of its own popularity: it is running out of addresses.

In principle, over 2 billion addresses exist, but the practice of organizing the address space by classes wasted millions of them. The biggest problems are with class B network. For most organizations, a class A network, with 16 million addresses is too bog, and a class C network, with 256 addresses is too small. So they asked for class B network having 65536 addresses. In reality, a class B address is far too large for most organizations. In retrospect, it might have been better to have had class C networks use 10 bits instead of eight for the host number.

There is also another problem: the size of routing tables and the increasing complexity of routing algorithms. There are many solutions proposed but usually they solve one problem and create a new one.

One solution that is now being implemented and which will give Internet a bit of extra breathing room is CIDR - Classless InterDomain Routing. The basic idea behind CIDR (described in RFC 1519), is to allocate the remaining class C networks, of which there are almost 2 million, in variable-size blocks. If a site needs, say, 2000 addresses, it is given a block of 2048 addresses (eight continuous class C networks), and not a full class B address.

The allocation rules for the class C addresses were also changed (RFC 1519). The world was partitioned into four zones, and each one given a portion of the class C address space. The allocation was as follows:

Addresses 194.0.0.0 to 195.255.255.255 are for Europe.

Addresses 198.0.0.0 to 199.255.255.255 are for North America.

Addresses 200.0.0.0 to 201.255.255.255 are for Central and South America.

Addresses 202.0.0.0 to 203.255.255.255 are for Asia and the Pacific.

In this way, each region was given about 32 million addresses to allocate, with another 320 million class C addresses from 204.0.0.0 through 223.255.255.255 held in reserve for the future. The advantages of this allocation is that now any router outside Europe that gets a packet addressed to 194.xx.yy.zz or 195.xx.yy.zz can just send it to the standard European gateway having 32 million addresses now compressed into one routing table entry.

Of course, once a 194.xx.yy.zz packet gets to Europe, more detailed routing tables are needed. One possibility is to have 131072 entries for networks 194.0.0.xx through 195.255.255.xx, but this is precisely this routing table explosion we are trying to avoid. Instead, each routing table entry is extended by giving it a 32-bit mask. When a packet comes in, the routing table is scanned entry by entry, masking the destination address and comparing it to the table entry looking for a match.

Example: Let Cambridge University needs 2048 addresses and it is assigned the addresses 194.24.0.0 through 194.24.7.255, along with a mask 255.255.248.0. Next, Oxford University asks for 4096 addresses. Since a block of 4096 addresses must lie on a 4096-byte boundary, they cannot be given addresses starting at 194.24.8.0. Instead they get 194.24.16.0 through 194.24.31.355 along with mask 255.255.240.0.Now the University of Edinburg asks for 1024 addresses and is assigned addresses 194.24.8.0 through 194.24.11.255 and mask 255.255.252.0.

The routing tables all over Europe are now updated with the following entries:

Address Mask
11000010 00011000 00000000 00000000 11111111 11111111 11111000 00000000
11000010 00011000 00010000 00000000 11111111 11111111 11110000 00000000
11000010 00011000 00001000 00000000 11111111 11111111 11111100 00000000

When a packet addressed 194.24.17.4, which in binary is:

11000010 00011000 00010001 00000100

comes in, first, it is Boolean ANDed with the Cambridge mask to get

11000010 00011000 00010000 00000000

which does not match the Cambridge base address, so the original address is next ANDed with the Oxford mask to get

11000010 00011000 00010000 00000000

This value does match the Oxford base address, so packet is sent to the Oxford router. It is possible for two entries to match, in which case the one whose mask has the most 1 bits wins. The same idea can be applied to all addresses, not just the new class addresses, so with CIDR, the old class A, B, and C networks are no longer used for routing. This is why CIDR is called classless routing.

Address	Mask
11000010 00011000 00000000 00000000	11111111 11111111 11111000 00000000
11000010 00011000 00010000 00000000	11111111 11111111 11110000 00000000
11000010 00011000 00001000 00000000	11111111 11111111 11111100 00000000