Digital Communications

Link characteristics

  • bandwidth

  • delay (light travels at 0.7c in fibre)

  • attenuation

 

Telephone network

  • basic service

    • two way voice

    • low end-end delay

    • guarantee of (bandwidth and buffer) resources to an accepted call

    • connected by a virtual circuit

  • overall design

    • fully connected core

      • simple routing

      • hierarchically allocated telephone number space

    • end system

      • must be powered from the exchange

      • generally analogue

      • instead of two wires for transmit and receive circuits, use the same circuit

        • has sidetone (local)

          • we want this to stop people shouting

        • and echo (far end)

          • some received signal is transmitted back

          • need expensive echo cancellers for long distance calls

      • dialling

        • pulse dialling: send pulse per unary coded digit

        • tone dialling:

          • dual tone multifrequency

          • Pressing a single key (such as '1' ) will send a sinusoidal tone of the two frequencies (697 and 1209 hertz (Hz))

    • transmission

      • multiplexing

        • analogue

          • band limit call to 3.4kHz and frequency shift

          • was used to run one cable to two homes, was problem when wanting to upgrade to broadband

        • digital

          • convert voice to samples (8 bits/sample, 8000 samples/sec, call 64 kb/s)

          • interleave samples from different calls

          • use 256 logarithmically spaced quantization levels (as dB is logarithmic)

        • TDM

          • trunk works faster rate than inputs

          • interleave samples (serve all inputs in time it takes one sample to arrive)

          • multiplexed trunks can be multiplexed on faster trunls

        • syandards

          • Digital Signalling (in US)

          • SONET defined to standardise but was implemented a little differently in US

      • Link (physical media of transmission)

        • cost is in installation, not the link

        • builders install cat 5, fibre and coax to every room

        • over-provision long distance links, unless undersea

        • Fibre:

          • < 10 Pbps capacity

          • little attenuation

          • generally only thermal noise

          • types:

            • multimode

              • cheap (uses LEDs)

              • suffers inter symbol interference

              • short distance (a couple km)

            • single mode (only one solution to wave equation)

              • expensive (uses lasers)

              • long distance (hundreds km)

        • satellites

          • geosynchronous

            • 36000km away

            • propagation delay of 250ms

          • low earth orbit

            • appear to move in the sky

            • need more

            • handover is tricky

    • Switching

      • two parts to the system

        • switch (data plane)

        • switch controller (control plane)

    • Signalling

      • controller does not touch voice samples

      • manages network (routing, billing etc.)

      • switch controllers are separate from the switching hardware

      • have to keep track of the state at each endpoint (each phone, whether idle, dial tone, talking, etc.)

  • Cellular communication

    • Establishing cellular communication:

      • there aren't enough frequencies to give each mobile a permanent frequency

      • temporal reuse: if mobile is off, don't assign frequency to it

      • spatial reuse: mobiles in non-adjacent cells can use the same frequency

    • how complete a call to a mobile?

      • need to know where it is

    • how deal with moving mobile phone?

      • need to send packet sooner if cell is further from base station

      • hand off between base stations?

  • Issues with telephone system

    • critical systems (power grid control) built on assumptions

      • what about changing to allow multimedia stuff?

 

Internet

  • administered by a collection of entities

  • key technical innovations

  1.  
    1. packets

  •  
    • self descriptive data (data + metadata)

    • better than samples

      • we have to know where it is going

      • can't store it

  1.  
    1. store and forward

  •  
    • metadata allows us to forward packets when we want (e.g. bundle of letters to one PO)

    • allows efficient use of critical resources

    • problems

      • hard to control delay

      • switches need buffer memory

      • convergence of flow leads to congestion

  • since have IP layer, can have different underlying layers and link lots of different networks

  • addressing

    • IP address made up of network number and host number

    • network number

      • early system: Class A, B, C networks

        • class A: 8 bit network number

        • class B: 16 bit network number

        • class C: 24 bit network number

      • CIDR

        • use a netmask

        • all routers in the internet know the mask

          • for a subnet mask only know what it means inside the LAN

  • routing

    • need to know next hop for each network on the internet

    • normally just keep local ones, otherwise send to default router

    • reduces routing table at expense of possible non-optimal routing

  • endpoint control

    • use a dumb network

    • do as much as possible at the edge

    • higher layer (TCP) deals with network defects

      • run over any lower layers, but no QoS

      • need to to end to end checks

  • challenges

    • IP address space shortage

    • decentralisation

      • makes reliability next to impossible

      • cannot guarantee that route even exists

      • anyone can join

      • no uniform solution for accounting/billing

      • non-optimal routing

    • multimedia

      • needs QoS

      • Internet has no way to identify streams

      • routers not required to cooperate

 

 

ATM networks

  • design goals

    • end to end QoS

    • high bandwidth

    • scalable

    • manageable

    • cost effective

  • concepts

    • virtual circuits

      • actual circuits (POTS) have drawbacks

        • idle users consume bandwidth

        • quantization of link capacity

      • carry only identifier in the header (saves space)

        • connection needs to be pre-established

        • also we need to translate IDs at each switch point

        • all packets must follow same route

        • easier to look up in hardware

      • use signalling to separate control from data

    • fixed sized packets

      • pros

        • simpler buffering hardware

        • simpler scheduling

        • easier to build parallel switches

      • cons

        • overhead for sending small amounts of data

        • segmentation and reassembly cost

        • last cell normally wastes bandwidth

    • small packet size

      • balance packetization delay (time to fill a packet) vs. overhead of packet data/header

      • set at 48 bit for data, 5 bit for header

    • statistical multiplexing

      • each connection gives information about transmission (average rate, max peak rate, max peak length)

      • can trade off worst case delay against speed of output trunk

      • need more buffering than originally thought

    • integrated service

      • voice, video and data on the same network

      • hardware support for switching different data types

      • do by:

        • signalling

        • admission control

        • resource reservation

        • lots of bandwidth

 

  • challenges

    • QoS (defined but not used)

    • scaling

    • competition from other LAN techs (esp. Ethernet as it is cheap because of volume of people using it)

    • interoperation with IP (vast, fast growing non-ATM structure)

      • connection vs connectionless, etc.

 

Protocol Layering

  • protocol is a set of rules and formats that govern communication between communicating peers (Customer A and Customer B)

    • set of valid messages

    • meaning of each message

  • tells us

    • syntax of a message (fields/format etc.)

    • semantics of a message (e.g. a nack means received a corrupted file)

    • actions to take on receipt of a message (e.g. on nack retransmit)

  • can view protocol as providing a service

    • some protocols rely on other ones (layered underneath it)

      • e.g. reliable file transfer relies on packet transfer

  • terminology

    • Service Access Point (SAP): interface between upper and lower layer

    • Service Data Unit (SDU): packets hander to a lower layer by an upper

    • Protocol Data Unit (PDU): packets exchanged between peer entities

      • PDU = SDU + optional header/trailer

  • protocol stack

    • allows implementation/information hiding

    • allows upper layers to share lower layer functionality

  • actually there is a tension between abstraction and good performance

    • e.g. with TCP assuming packet loss always implies congestion

    • have to leak some information to get good run time

 

ISO Open System Interconnect protocol

  • defined by:

    • reference model: formally defines a layer/service etc.

    • service architecture: describes services provided by the layer and the service access point

    • protocol architecture: set of protocols which implement the service architecture

  • Physical layer

    • moving bits on physically connected end systems

    • prescribes

      • coding scheme to represent a bit

      • shapes and sizes of connectors

      • bit-level synchronisation

 

  • Datalink layer

    • have a frame (bits that belong together)

      • beginning and end markers

    • have idle markers for when a frame is not being transmitted

    • done in software but on the custom hardware (network card)

    • e.g. Ethernet

      • since this is broadcast, need addresses and contention resolution

      • done by MAC

  • Network layer

    • creates abstraction of one end to end link from several Datalink layer links

    • allows any two endpoints to have an end to end link

    • in datagram networks

      • provides routing and data forwarding

    • in connection oriented networks

      • have a data and control plane for each of the layers (i.e. separate stack for data and control layers)

    • e.g. IP

  • Transport layer

    • creates abstraction of error control, flow control and multiplexing

    • multiplexing applications onto same end to end link (using port numbers)

    • e.g. TCP or UDP

  • Session layer

    • creates abstraction of full duplex, expedited data delivery and session synchronisation

      • expedited data delivery:

        • put a later packet in front of outgoing data

        • used in phone network, if a call is started and the person hangs up, to cancel database lookups etc.

      • synchronisation

        • allow users to place markers in the data stream and roll back to a pre-specified mark

    • e.g. chief clerk of shipping, can do expedited delivery by courier, provide duplex etc.

  • Presentation layer

    • abstracts data representation differences, e.g. endian-ness

    • deals with application data

    • can do encryption too

  • Application layer

    • set of application useful protocols

    • e.g. FTP, SMTP, HTTP

 

System design

  • every design decision is a trade-off

  • goal

    • maximise a set of performance metrics given a set of resource restrictions

  • constrained vs. unconstrained resources

    • e.g. high end PC with 28.8 modem

      • constrained resource is link bandwidth

      • CPU and memory are unconstrained

  • common resources

    • time (response/throughput etc.)

    • space (memory/bandwidth etc.)

    • computation

    • money (cost of components/limits customers willing to pay, etc.)

    • labor

    • social constrains (standards/backwards compatibility etc.)

  • balanced system

    • one where all resources are simultaneously bottlenecked

    • normally you have one bottleneck, remove it and the bottleneck is then something else...

  • standard trade off techniques

    • multiplexing: time and space for money

    • pipelining

    • batching: reduced overhead for longer worst case response time and increased throughput

    • optimising the common case:

      • 80% of time spent in 20% of code

      • spend labour optimising that bit of code

    • hierarchy: scalability for communication expense

    • abstraction

    • separating data and control: overhead of per packet metadata for per-packet QoS

 

Naming and addressing

  • both need to be globally unique

  • names

    • long, variable length

    • human readable

    • difficult to parse

  • addresses

    • shorter, fixed length

    • easy for computer to process/use efficiently in routing

  • name resolution

    • done by servers (single PoF)

  • naming

    • we want Lookup, Create Rename Update and Delete of names to be scalable and efficient

    • hierarchical

      • naïve version: ask other authorities if name is unique

      • recursive decomposition of the namespace

        • decompose into mutually exclusive partitions

        • scales arbitrarily

    • Domain Name Service (DNS)

      • name server which is distributed

      • a name server is responsible for (an authoritative server for) a set of domains

      • can delegate some responsibility to a child

      • root servers are replicated

      • if a local server cannot answer a query it asks its parent

        • replies are cached and timed out

  • addressing

    • use hierarchical schemes

      • globally unique names

      • aggregation to reduce size of routing tables at expense of non-optimum routing

    • in the telephone network

      • variable length addresses

      • assumes local scope unless use pointer to other scope (e.g. 01223)

    • in the internet

      • IPv4

      • network number and host number, defined by mask in different ways (oldest-youngest)

  1.  
    1.  
      1.  
        1. A, B and C class networks

        2. classless routing by clustering addresses within a network

        3. contiguous C blocks referred to as a larger address block

  •  
    •  
      •  
        •  
          • due using a CIDR mask

            • must be known by all internet routers

            • is similar to subnet mask

  • DHCP

    • computer broadcasts discover to subnet

    • DHCP servers offer IP addresses

    • host picks one and sends request to a particular server

    • selected server sends ack

    • when the host is finished with the address it sends a release

    • the IP address has a lease which limits the time it is valid

  • NAT

    • multiplex network layer onto transport layer (i.e. use TCP and UDP ports)

    • generally use non-routable addresses, e.g. 192.168.0.0/16

  • IPv6

    • extends address from 32 bit to 128 bit

    • multiple levels of aggregation

    • several types of multicast

    • anycast (there is a set of endpoints to which a packet can go, but unlike multicast it only goes to one of them)

  • ATM

    • Network Service Access Point address

    • variable length (7-20 bytes)

  • Datalink layer addresses

    • MAC addresses

      • hierarchically allocated (when piece of hardware is made, indicates manufacturer etc.)

      • not hierarchical deployed

    • ARP to resolve

      • in point to point LAN need ARP server

      • susceptible to spoofing by posing as the network gateway

 

 

Routing

  • process of finding path from source to every destination in the network

  • a routing protocol sets up a routing table in routers and switch controllers

    • node makes local choice on global topology

  • routing protocol needs to intelligently summarize relevant information about (complex) global state

    • requirements

      • minimise the routing table space

      • minimise number and frequency of control messages

      • want to avoid

        • black holes: traffic vanishes

        • loops: traffic stuck within a network

        • oscillations: paths cycle between alternatives

      • use optimal path

    • choices

      • centralised vs distributed routing

      • source based vs. hop by hop

      • stochastic vs. deterministic

        • stochastic spreads load, avoiding oscillations, but misorders

      • single vs. multiple path

      • state dependent vs. state independent

        • do routes depend on current network state (e.g. delay)

  • telephone network

    • three level hierarchy, with fully connected core

      • Central office

      • Local Exchange

      • Wide area core

    • only major routing decision is at toll switch

      • if have to use two hop path then which?

    • features

      • stable load

        • can predict pairwise delay throughout the day

        • hence can choose optimal routes in advance

      • extremely reliable switches

        • can assume a chosen route is available (can't do on Internet)

      • single organisation controls the core

      • highly connected

      • connections require resources (but all need the same)

    • costs

      • requires reliability in every component and a fully connected core

    • Dynamic Non-Hierarchical Routing (DNHR)

      • divide the day into around ten periods

      • in each period, each toll switch is assigned a primary one-hop path and a list of alternative two-hop paths

      • drop only if the primary and all the alternatives are unavailable

      • bursts of high activity can lead to metastability

        • overflow into another node's direct path (can rule out by disallowing)

      • measures traffic once a week

    • Trunk status map routing (TSMR)

      • similar to DNHR but updates measurements once an hour or so, if they change “significantly”

    • Real Time Network Routing (RTNR)

      • decentralised

      • each toll switch maintains a list of lightly loaded links

      • one switch asks another for its list

      • intersection of source and destination lists gives set of lightly loaded paths

        • links(A) = C, D, E

        • links(B) = D, F, G

        • hence ADB lightly loaded and good alternative path

      • very effective in practice (used in the core)

  • Distance vector routing

    • node sends Distance Vector to all its neighbours

      • that says what the node's best idea of distance to every node in the network, from it

    • advantages

      • distributed

      • adapts to traffic changes and link failures

      • suitable for networks with multiple administrative entities

    • problem: count to infinity, can deal with using

      • DV has path vector

      • triggered updates (faster count to infinity)

 

 

  • Link state routing

    • router knows entire network topology, computes shortest path itself

    • link state packet (LSP)

      • cost of link between itself and its neighbour, e.g. (from A) {A; B; 4}

    • control flood the network with LSPs

      • store LSPs

      • if one comes in that is new send it out on all routes besides incoming one

      • network with E edges will copy at most 2E times

    • use sequence numbers to tell if a LSP is new

      • sequence numbers run out/don't know which to use on boot up

      • solutions:

        • ageing

          • put a timeout value in the header

            • and control flood notification of the event so that routers that don't know the time can do it too

          • on booting a router has to wait until all the old LSPs are purged and refreshed

        • lollipop

          • need a unique start sequence number

          • a is older than b if

          • if a router gets an older LSP tell sender about newer LSP

    • recovering from a partition

      • the two nodes on either side of the re-established link co-ordinate to update databases

  • Detecting router failure

    • send periodic very small “hello” packets

    • on timeout flood information about the dead router

  • Links costs

    • static metrics

      • set all link costs to 1(min hop routing)

      • give weights inversely proportional to capacity

        • low weight value links will be preferred

    • dynamic metrics

      • NB if the dynamic cost depends on load, can have oscillations

      • ARPAnet original (first cut)

        • cost proportional to length of router queue

        • problems + resolutions

          • queue length averaged over small time (spikes cause re-routing)

            • average queue length over longer time

          • wide dynamic range of path costs (high cost paths ignored)

            • restrict dynamic range

          • queue length assumed to predict future loads (remember earlier NB)

            • depend also on intrinsic link capacity

          • no restriction on successively reported costs

            • make restrictions

          • all tables computer simultaneously (low cost link flooded)

            • stagger table computations

 

  • Hierarchical routing

    • concept

      • divide network into set of domains

      • gateways connect domains

      • computers within domain unaware of outside computers

      • gateways only know about other gateways

    • three level hierarchy in address

      • network number

      • subnet number

      • host number

    • core advertises routes to networks only

    • gateways talk to backbone to find best next-hop to every other network in the Internet

    • if a domain has multiple gateways

      • external records tell hosts in a domain which to pick to reach a host in another domain

        • contains distance from gateway to external node

      • summary records tell backbone which gateway to use to reach an internal node

        • contains distance from gateway to internal node

  • Internet has three levels of routing

  1.  
    1. backbone, connecting autonomous systems (AS): exterior gateway protocol

    2. within AS: interior gateway protocol

    3. within LAN

  • exterior gateway protocols

    • between mutually untrusting routers

    • must tell trusted border gateway what paths are allowed

    • different ASs will use different cost metrics: convert or just ignore

    • Exterior Gateway Protocol

      • nominally DV, costs are

        • 128 reachable

        • 255 unreachable

      • allow admins to pick neighbours to peer with

    • Border Gateway Protocol

      • path vector

        • DV annotated with entire path

        • guaranteed to be loop free

      • can use non-tree backbone topologies

  • interior gateway protocols

    • Routing Information Protocol

      • distance vector

      • cost metric is hop count

      • infinity = 16

      • exchange DVs every 30s

    • Open Shortest Path First

      • link state

      • uses areas to route packets hierarchically within ASs

      • uses designated routers to reduce the number of endpoints

    • Exterior Gateway Protocol (EGP)

 

 

  • ATM routing

    • Private Network-Network Interface

      • link state

      • many levels of hierarchy

      • switch controllers at each level form a peer group

      • group has a group leader

      • leaders are members of the next higher level group

      • leaders summarise information about group to tell higher level peers

      • all records received by leader are flooded to lower level

      • LSPs can be annotated with pre-link QoS metrics

      • switch controller uses this to compute source routes for call-setup packets

  • routing in a broadcast LAN: how decide if a packet is meant for inside or outside the LAN

    • Internet solution:

      • all hosts on LAN have same subnet address

      • routers periodically send out advertisements (including TTL, etc.)

      • redirection

        • if a router's next hop is another router on the same LAN, host gets a redirect message

  • multicast routing

    • packet sent by any member of a group is received by all

    • uses:

      • resource location

      • internet TV

    • associates a group of senders with a group of receivers

      • don't need to know each others' identities

      • can create when sender starts sending or receiver starts receiving

        • even if no-one else there

    • multicast groups have their own class D addresses

    • problems:

      • which groups are currently active

      • how to express interest in joining a group

      • discovering the set of receivers in a group

      • delivering data to members of the group

    • expanding ring search

      • used to find the closest multicast server which is not heavily loaded

      • send out multicast packets increasing the TTL each time

      • multicast packet will only be forwarded if it reaches server within the TTL, and that server is not heavily loaded

        • hence discovering local resources first

    • flavours

      • point to multipoint

      • multipoint to multipoint

    • efficiency issues

      • A wants to talk to B, G, H and I

      • B wants to talk to A, G, H and I

      • could approximate point to multipoint with set of point to points

        • but AC, BC would carry packet three times, for G, H, I

      • could do with two point to multipoint multicast groups

        • but overhead of establishing two groups

      • do with multipoint to multipoint

 

 

  •  
    • multicast tree

      • send at most one multicast packet per link

      • optimal tree provides shortest path from sender to every receiver

      • difficulties

        • sources dynamically join and leave

          • need to be able to do so without explicitly notifying receiver

        • leaves of tree are often members of broadcast LAN: want to exploit broadcast capability

    • Class D to MAC address translation

      • insert the bottom 23 bits of the multicast IP address into the 48 bit MAC address

        • the top 25 MAC bits are a fixed pattern which indicate that this is a multicast packet

        • network cards know to listen for layer 2 packets with these 25 top bits

        • network card filter out those packets from the multicast groups of which the computer is a member

    • Internet Group Management Protocol

      • if a LAN has no members can not send to it (prune it)

      • router periodically broadcasts query message

      • hosts broadcast reply with list of groups they are interested in

        • reply after random timeout

        • if someone else has expressed interest you don't need to reply

    • wide area multicast

      • assume each endpoint is a router which can use IGMP to determine if it wants subscription

      • goal: deliver packets to all routers on the path to a subscribing receiver

      • solution 1

        • controlled flooding

        • need state to see if a packet is new

      • solution 2: reverse path forwarding (RPF)

        • flood

        • at node A forward packet labelled from S iff that packet arrived on the link which is the next hop on the shortest path to S

      • solution 3

        • don't send packet downstream if you are not on the shortest path from the downstream router to the source

        • potential confusion of downstream router has choice of shortest paths to source

    • pruning

      • in RPF, B and C still get packets even though they don't need them

      • router tells parent in tree to stop forwarding

    • rejoining

      • router C sends join(group, A) to B

      • or periodically broadcast a packet and C refrains from pruning

    • but some routers do not support multicast

      • virtual links (IP in IP) between multicast enabled routers

      • need shortest paths (for RPF) only using these virtual links

    • Distance Vector Multicast Routing Protocol (DVMRP)

      • similar to Routing Information Protocol

      • flood and prune

      • explicit join messages

    • Multicast extension to OSPF (MOSPF)

      • routers flood group membership information with LSPs

      • each router computer shortest path tree that only includes multicast-capable routers

      • complex

    • Core Based Tree (CBT)

      • have a statically configured core tree

      • co-ordinate multicast with core router

      • host sends join request to core router

      • routers along path mark incoming interface for forwarding

      • eliminates the pre-source and per-group prune records at each router (how?)

    • Protocol Independent Multicast (PIM)

      • flood and prune good for dense groups (only need a few prunes)

      • dense mode PIM == DVMRP

      • sparse mode PIM similar to CBT

        • can switch from CBT to shortest path tree

        • core is actually a “rendez-vous” point, as more than one

        • leaf router switches to different “rendez-vous” on timeout

  • routing vs. policy routing

    • routing: forward packet on 'best' path

    • policy routing: routes chosen on policy directives regarding things like

      • source and destination address

      • QoS

      • time of day

      • charging and accounting

    • multiple metrics

      • advertise multiple costs per link (e.g. delay and monetary cost)

      • routers construct multiple shortest path trees

      • problems

        • all routers must use the same rules

        • remote routers may misinterpret policy

    • provider selection

      • assume single service provider provides almost all the path

      • choose policy by choosing provider

    • crankback

      • router returns packet if no next hop with sufficient QoS can be found

      • in ATM networks do just for the call setup packet

      • in internet have to do for every packet

  • routing vs. forwarding

    • routing:

      • knowing how to get packets from source to destination

      • have to maintain routes

    • forwarding

      • moving a packet from an input port to an output port

      • make a forwarding decision based on route information

      • getting the packet to the output port fast

      • in IP

        • packet arrives

        • check destination address

        • look up in routing table

        • queue packet to output port

 

  • mobile routing

    • location: where is the host?

    • routing: how get packets to it?

    • telephone network

      • each cell phone has a global ID that it tells remote MTSO when turned on (using slotted ALOHA up-channel)

      • remote MTSO tells home MTSO

      • to phone: call forwarded to remote MTSO to closest base

      • from phone: call forwarded to home MTSO from closest base

      • new MTSOs can be added as load increases

    • in the internet

      • Very similar to mobile telephony

        • but outgoing traffic does not go through home

        • and need to use tunnels to forward data

      • Use registration packets instead of slotted ALOHA

        • passed on to home address agent

      • Old care-of-agent forwards packets to new care-of-agent until home address agent learns of change

    • problems

      • security

        • mobile and home address agent share a common secret

        • checked before forwarding packets to COA

      • loops

 

Error control

  • CRCs

    • detect

      • all single bit errors

      • almost all two bit errors

      • any odd number of errors

      • all bursts up to M where generator length is M

      • longer bursts with probability 2-m

    • implementation

      • hardware: with shift register

      • software: one's complement

        • pre-compute remainders for 16 bit words

        • add remainders to running sum

        • want to touch every byte only once

    • packet errors

      • loss

        • uncorrectable bit errors/overflow in buffer

      • duplication

      • insertion

        • solutions

          • per connection incarnation number

          • assign port numbers serially

          • assign sequence numbers serially

          • wait one Mean Packet Loss after boot up (30s-2min)

            • flushes old packets from the network

          • 3 way handshake

C -> S : SYN(c)

S -> C : SYN-ACK ACK(c+1) SYN(s)

C -> S :ACK ACK(s+1)

  •  
    •  
      • reordering

    • sequence numbers

      • incremented for non-retransmitted packets

      • ACKs carry cumulative sequence number

      • size of sequence number

        • > max time can have an ACK * bit rate

        • > number of packets that can be reordered

        • > number of packets that can be lost

    • maximum packet lifetime

      • time to live (TTL)

    • NACKs aren't very useful

      • take up bandwidth, have to timeout anyway

    • timeout

      • if no ACK then resend

      • SYN-attack

        • forge unused IP address

        • send a long sequence of SYN packets to server

        • up to four minutes' resource allocation per SYN while server retires its SYN-ACKs

      • static scheme

        • know RTT before hand

      • dynamic scheme

        • measure RTT

        • timeout is a function of measured RTTs

      • old TCP scheme

  •  
    •  
      •  
        • a = 1 ->

        • a = 0 -> no history

      • new TCP scheme

  •  
    • error control windows

    • retransmission

      • go back N

      • selective

        • wastes header space (bitmap of received packet sequence numbers)

    • fast retransmit

      • assume cumulative ACKs

      • if see 1, 2, 3, 3, 3, retransmit (n+1) = 4

    • SMART

      • cumulative sequence number and

      • for packet sequence number

 

Flow control

  • considerations

    • send at rate which receiver and network can process

    • simplicity

    • overhead

    • scaling

    • fairness

    • stability

  • usually transport layer

  • classification

    • open loop

      • call setup

        • source describes desired flow rate

        • network admits call

      • data transmission

        • user sends within parameter range

        • network polices users

    • closed loop

      • source monitors available service rate

      • sends at that rate

      • due to speed of light delay, some errors occur

      • first generation

        • ignore network state, only match receiver

      • second generation

        • responsive to different types of state

  1.  
    1.  
      1.  
        1. state measurement

  •  
    •  
      •  
        •  
          • explicit

            • network tells its current rate

            • more overhead

          • implicit

            • endpoint tries to figure out by looking at network

  1.  
    1.  
      1.  
        1. control: two methods

  •  
    •  
      •  
        •  
          • transmission window used for error control and flow control

          • rate:

            • directly control the rate

            • no coupling of flow control and error control

  1.  
    1.  
      1.  
        1. point of control: two methods

  •  
    •  
      •  
        •  
          • hop by hop

            • first generation flow control at each link

          • end to end

            • sender matches all the servers on its path

  •  
    • hybrid

      • source asks for minimum rate

      • can send more if bandwidth available

  • traffic descriptors

    • traffic envelope

    • use for

      • traffic contract

      • input to regulator

      • input to policer

    • requirements

      • representativity

        • adequately describes flow so that network does not reserve too little or too much resource

      • verifiability

        • verify that descriptor holds

      • preservability

        • doesn't change inside the network

      • usability

        • easy to describe and use for admission control

    • peak rate

      • fixed size packet network: minimum inter packet spacing

      • variable size packet network: highest rate over all intervals of a particular duration

      • sensitive to extremes

    • average rate

      • jumping window

        • over consecutive intervals of length t, only a bits sent

        • regulator reinitialises every interval

      • moving window

        • over all intervals of length t only a bits sent

        • regulator forgets packets sent more than t seconds ago

    • linear bounded arrival process

      • number of bits transmitted < rt + s

      • r = long term rate

      • s is burst limit

      • leaky bucket

        • regulator for LBAP

        • three way trade off between

          • burst limit s

          • loss rate

          • long term rate r

      • large bursts mess it up

  • on-off

    • send on, off signals

  • stop and wait

    • send and wait for ACK before sending next

  • static window

    • can send multiple packets per RTT (whilst waiting for ACK)

    • bandwidth delay product/optimal window size

      • w > b*R

        • w is packet size of window

        • b is bottleneck bit rate

        • R is RTT

    • b changes, static doesn't work

  • DECbit flow control

    • every packet has a specific bit in the header

      • this bit is set by any of the intermediate servers if a queue has built up

    • sink copies the bit to ACK

    • if bit set then source reduces window size

    • oscillates around optimal state

    • additive increase, multiplicative decrease

  • TCP flow control

    • implicit

    • dynamic window

    • end to end

    • similar to DECbit without the router support

    • window starts at 1

    • exponential increase, “slow start”

      • doubles every RTT

      • each ACK results in window increase by one

    • linear increase, “congestion avoidance”

      • increase by one every RTT

      • window increases by one when number of ACKS = window size

    • loss detected with timeout or fast retransmit

    • Tahoe: both cases, drop window to one

    • Reno:

      • timeout: drop window to one

      • fast retransmit: drop window to half previous size

    • weaknesses

      • loss doesn't necessarily imply congestion

      • congestion doesn't necessarily imply self blame

  • NETwork Block Transfer (NETBLT)

    • rate based flow control scheme

    • separates error control from flow control, losses and retransmissions do not affect the flow rate

    • application data sent as a series of buffers

    • if received rate < sending rate multiplicatively decrease

  • packet pair

    • improves on NETBLT

    • assume all bottlenecks serve packets in the same order

  • ATM forum End to End Rate baled flow Control (EERC)

    • similar to DECbit, but send whole cell instead of one bit

    • Resource Management cell

      • source sends periodically (once every 32 cells)

      • has rate request

      • each server fills in RM cell with current share, if less

    • comparison to DECbit

      • sources know exact rate

      • non-zero initial cell rate

      • RM cells sitting in the data path is a mess

  • general points

    • flow control is easier with RR (round robin?) scheduling

    • explicit schemes are more robust

    • hop by hop schemes are more responsive but more complex

    • separation of flow control and error control is good

    • rate based schemes are inherently unstable

 

Common protocols

  • plesiochronous hierarchy

    • “nearly synchronous”

    • tight control of deviation from synchrony

    • output runs a bit faster always

    • overhead identifies bits from a particular stream

      • if a stream runs faster use overhead to identify it

    • incompatible hierarchies around the world

    • cannot switch bundles of calls

  • synchronous digital hierarchy

    • requires atomic clocks

    • SDH

      • 9 rows, 90 columns

      • each payload container serviced in 125 microseconds

      • one byte = one call (works out, given 1000/125 payloads/second at rate 64kbps)

      • all overhead is in the headers

  • IP

    • properties

      • unreliable

      • best effort

      • end to end

    • fragmentation

      • if a packet reaches a link with a smaller Maximum Transmission Unit than the size of the packet can break the packet down into fragments and transmit those

      • reassembly happens at the receiving host

      • can set a bits to indicate

        • this packet is a fragment

        • do not fragment this packet

      • error multiplication: if error in any fragment have to retransmit whole packet

    • traceroute

      • router sends ICMP error packet if decrement TTL to 0

      • source sends packets with increasing TTL

  • ICMP

    • used to send error messages

      • destination unreachable

      • redirect

      • TTL exceeded

      • fragmentation needed, but “don't fragment” bit set

  • TCP

    • multiplexed

    • duplex

    • connection-oriented

    • reliable

    • flow-controlled

    • byte-stream

  • HTTP

    • request/response (stateless connection)

    • protocol is simple

    • browser is complex (and usually buggy)

    • cookies can violate statelessness

      • e.g. one cookie for a website and two windows with two shopping trolleys contending

  • ATM (datalink layer)

    • connection oriented

    • in-sequence

    • unreliable

    • QoS guaranteed

    • routing information

      • made up of

        • virtual path identifier (VPI)

        • virtual circuit identifier (VCI)

    • virtual paths

      • identified by the VPI

      • all VCIs for the same VP share path and resource reservation

      • set of VPs define a virtual private network (not the “tunnelled encrypted network over LAN”)

  • ATM Adaptation Layer (AAL – network layer/rest of the stack)

    • AAL 1

      • for synchronous apps

      • timestamps, clocking, sequencing

      • constant bit rate

      • forward error correcting

    • AAL 3/4

      • create a packet of whatever size, put on a frame header and trailer

      • then break down and put in ATM packets with a per cell AAL header

      • error detection, segmentation and reassembly

    • AAL 5

      • violates layering to be efficient

  • SSCOP

    • similar to TCP

    • no ACKs

    • sender polls, receiver sends status, including ACK and window size

      • poll interval is key variable

  • IP over ATM

    • treat ATM as a link-level technology

    • problems

      • ATM is connection oriented

      • different addressing schemes

      • ATM is point to point while IP assumes broadcast

    • IP encapsulation in ATM

      • put entire IP packet in AAL5 frame

      • is multi-protocol encapsulation

    • resolving IP addresses to ATM addresses

      • designate an ATM host as an ARP server

      • need to use inverse ARP to get the layer 3 address of the client (which is needed to make the connection)

    • creating an ATM based IP subnet

      • if all hosts on an ATM are on the same IP subnet then broadcast reaches all (causes congestion)

      • partition into logical IP subnets

        • this can make longer paths between hosts which are actually attached at ATM level, as if it's outside your logical subnet you have to send your packets via the router

    • next hop routing

      • next hope server stores IP to ATM translations independent of logical subnet boundaries

      • like DNS

    • resolving multicast addresses

      • maintain a set of endpoints that correspond to a particular class D address

      • Multicast Address Resolution Server keeps this

    • LAN emulation

      • used to make IP think it's working over Ethernet or token ring

      • translate from MAC address to ATM address

      • need to emulate broadcast

    • Cells in Frame (CIF)

      • previous solutions required ATM host-adapter card

      • can use Ethernet card:

        • encapsulate AAL5 frame in Ethernet header on point to point Ethernet link

        • CIF attachment device at the other end decapsulates and injects frame into ATM network

    • holding time problem

      • to send an IP packet have to open an ATM connection – when close it?

      • locality principle, more packets are likely soon after some others have been sent

      • optimal solution depends on pricing policy and packet arrival characteristics

      • use measurement based heuristics

        • close connection if expected cost exceeds expected benefit

 

Protocol Implementation

  • depends on

    • structure

    • environment

  • structure

    • partitioning of functionality between user and kernel

      • trade-offs

        • customisability

        • security

        • performance

      • partitions:

 

 

 

 

 

 

 

 

  •  
    •  
      • separation of layer processing (interface) strategies

        • single context

          • ...

        • tasks

        • upcalls

  • environment

    • data copy cost

    • interrupt overhead

    • context switch time

    • memory access latency

    • cache effects

  • rules of thumb

    • fine tune inner loops

    • beware of data touching

    • send largest packets possible

    • use hardware

Multiple access problem

  • how should participants coordinate actions so that

    • the number of messages exchanged per second is maximised

    • the time spent waiting for a chance to speak is minimised

  • contexts for the problem

    • broadcast transmission medium

    • colliding messages are garbled

    • main contexts

      • wired broadcast LAN

      • wireless

      • packet radio

      • cellular telephony

      • satellite communication

    • solution

      • choose base technology

      • choose how to allocate limited transmission resource to contending users

  • choices

    • centralised vs. distributed design

      • moderator?

      • Centralised: one is master, other are slaves

      • distributed: all are peers

    • circuit mode vs. packet mode

      • do stations send streams or bursts?

      • circuit mode – allocate resources to stream

      • packet mode – contend for resources with each packet

  • constraints

    • in real life propagation is not inverse square law

    • spectrum scarcity

    • radio link properties

      • error proneness

      • multipath interference

      • hidden terminals

  • parameter 'a'

    • number of packets sent by a source before the farthest station receives the first bit

  • performance metrics

    • normalised throughput: fraction of link capacity used to carry non-retransmitted packets

    • mean delay: time station has to wait before it successfully transmits a packet

    • stability

      • with a heavy load is all the time spent on resolving contentions?

      • if infinite uncontrolled stations share a link then instability is guaranteed

    • fairness (arbitrary definition)

  • base technologies

    • frequency division multiple access (FDMA)

      • each station has its own frequency band separated by guard bands

      • number of frequencies is limited

        • reduce transmitter powers and reuse frequencies in non-adjacent cells

    • transmission division multiple access (TDMA)

      • send on same frequency at different times

      • need time synchronisation

      • pros

        • can give different amounts of bandwidth to different users

        • can switch off power when it's someone else's slot

      • cons

        • synchronisation overhead

        • multipath interference on wireless links

    • code division multiple access (CDMA)

      • frequency hopping

        • send at a different frequency in each time slot

      • direct sequence

        • convert a single bit into a code

      • pros

        • hard to spy

        • immune from narrow band noise

        • all cells can use all frequencies

      • cons

        • implementation complexity

        • need large contiguous frequency band for direct sequence

    • method to convert wireless medium to duplex channel

      • frequency division duplex (FDD)

        • uplink and downlink use different frequencies

      • time division duplex (TDD)

        • uplink and downlink use the same frequency at different times

  • centralised schemes

    • slave can only transmit when master permits

    • natural fit in some solutions

      • wireless LAN: base station is the only station that can see everyone

      • cellular telephony: base station the only one capable of high transmission power

    • pros

      • simple

    • cons

      • single POF (and need to have re-election procedure)

      • master involved in every transfer (causes overhead for shall transmissions)

    • circuit mode

      • when station wants to transmit sends message to master using packet mode

      • master allocates resources to slave which uses until is done

      • no contention during transfer

      • also known as reservation based schemes

        • when 'a' is large can't used distributed schemes (too many collisions)

        • e.g. satellite links

        • stations contend for packet slots for reservation messages

        • master then allocates main channel based on the reservation messages

 

 

  •  
    • polling

      • master asks each station if it wants to send

      • can be reasonably efficient – ask those who have been sending more frequently

    • probing

      • stations are numbered with consecutive logical addresses

      • master does binary search to locate next active station

  • distributed schemes

    • decentralised polling

      • each station assigned a slot it can use

      • if nothing to send the slot is wasted

      • need a protocol to get time synchronisation

    • decentralised probing

      • probe in a pre-established order

      • works well if 'a' is small

      • works poorly if many active stations or if all active stations are on the substation

  •  
    • CSMA(carrier sense multiple access)

      • an improvement on the more simple distributed multiple access schemes

      • check whether the medium is active before sending a packet

        • don't have to wait for a time slot or permission from a master

      • if the medium is free then can transmitter

      • if a collision happens then detect and resolve

      • it works well when 'a' (the number of packets sent by a source before the farthest station receives the first bit) is small

      • possible solutions to the collision problem

  1.  
    1.  
      1.  
        1. p-persistent

  •  
    •  
      • on idle, transmit with probability p (hard to choose)

      • if too p is small then wasted time

      • if p is large then more collisions

  1.  
    1.  
      1.  
        1. exponential backoff

  •  
    •  
      • on a collision choose a timeout randomly from a range which doubles each time

      • don't need to choose a p

  •  
    •  
      • in basic CSMA there is no collision detection in the transmitter, it is detected as a frame error (indistinguishable from any other) in the receiver and some error control protocol is used

    • CSMA/CD

      • the transmitter has some scheme to detect if there has been a collision on detection

      • method

        • wait until medium is free

        • start transmitting

        • check for collision after each bit transmitted

        • if detected

          • transmit a jam signal

          • waits for time interval set by exponential backoff

  •  
    • CSMA/CA

      • used in wireless LANS

        • can't detect collision because the transmitter overwhelms co-located receiver (think radio: shouting very loud then listening for a whisper – cannot be listening for the whisper of someone else whilst shouting)

        • need explicit acks

        • have to reduce collisions as collisions are more expensive

      • method

        • check to see if medium is busy

        • if not then wait interframe spacing

        • set a contention timer to an interval randomly chosen in the range [1,CW]

        • on timeout send packet and wait

        • if no ack

          • assume packet lost

          • try again with a doubled CW

        • if another station transmits while counting down, freeze CW and unfreeze after the other station finishes transmission

      • this only works when every station can receive transmissions from all other stations

        • hidden terminal: some stations in the area cannot hear transmissions from others, though the base can

        • exposed terminal: some, but not all, stations can hear transmissions from stations outside the local area

    • methods of dealing with hidden terminals

      • busy tone multiple access (BTMA)

        • use separate frequency to send out a “busy tone” when receiving

      • multiple access collision avoidance

        • BTMA requires split frequency band (more complicated receivers)

        • instead use explicit messages to indicate when receiver is busy

  1.  
    1.  
      1.  
        1.  
          1. transmitter sends Request To Sent message

          2. if receiver is free it will send back Clear To Send

          3. if transmitter receives a CTS it sends the data

  •  
    • Ethernet

  •  
    •  
      • uses CSMA/CD with exponential backoff

      • on collision place jam on wire

      • small 'a' so time wasted in collision is minimised

      • maximum packet size

        • 1500 bytes

        • given two clocks at the same frequency they will drift

        • this is the max no of bits which can be safely sent without missampling

        • do so don't have to implement phase locked loop

      • two part bridge is technically a switch

      • getting faster: not “how to do” but “how to do cheap enough for PCs”

      • to avoid needing QoS guarantees can just increase bandwidth

        • doesn't work past 100 Gbps

      • pros

        • easy to set up

        • requires no configuration (has built in unique hardware address)

        • can segment LANs to reduce load

      • cons

        • at heavy loads users see large delays due to backoff

        • non-deterministic service

        • big overhead on small packers

    • token passing

      • improve on polling

      • token gives right to transmit

      • if don't want to transmit pass on the token

      • prevents starvation

      • when receive a data packet always pass it on (unless you were the sender)

        • if you are the receiver set the ACK bit

      • pros

        • medium access protocol is simple and explicit

        • don't need to carrier sense/synchronise etc.

        • no collisions

      • cons

        • token is single POF

        • all stations must communicate

        • stations must actively monitor the network

        • normally need one station as a monitor

      • Fiber Distributed Data Interface

        • dual counter-rotating rings

        • supports both realtime and non-realtime traffic

    • ALOHA

      • just send the packet

      • if there's no ack try again after waiting a random time

        • no backoff

      • pros

        • useful when 'a' is large

        • simple

      • cons

        • at high loads collisions are very frequent

        • sudden burst of traffic can lead to instability

    • Slotted ALOHA

      • use discrete timeslots (can only start transmitting at the start of a slot)

      • decreases possibility of contention

    • reservation ALOHA

      • contend for reservation minislots using slotted aloha

      • stations independently examine reservation requests and come to consistent conclusions

      • pros

        • supports circuit and packet mode transfer

        • works with large 'a'

        • simple

      • cons

        • cannot pre-empt hogs

 

 

Switching

  • packet vs. circuit switches

    • packets have headers

    • samples don't (just an identifier)

  • connection oriented vs. connectionless

    • connection oriented need call setup

      • setup handled in the control place

  • goals

    • maximise capacity (subject to cost/reliability constraints)

      • capacity of switch is maximum rate at which it can move information

    • minimise call blocking

      • only for circuit switch

      • circuit switch must reject a call if can't find a path for samples

    • minimise packet loss

      • only for packet switching

      • packet switch must reject a packet if can't find a buffer to store it

    • don't reorder packets

  • multiplexor

    • for N identical inputs, trunk runs at N times the speed of one input

    • demultiplexer similar

    • neither mux nor demux need addressing information

      • always in same place on trunk so know which is which channel's by time division mux

    • inverse multiplexing – take high bit rate stream and scatter it across multiple trunks

  • circuit switching

    • move 8-bit sample from input port to output port

    • generally inputs trunks are multiplexed (rather than 200 000 individual inputs)

      • multiplexed trunks carry frames (sets of samples)

    • extract sample from frame, depending on position in frame, switch to output

    • call blocking

      • internal blocking: there is an output slot but no path

      • output blocking: no output slot available

    • transit switch

      • can put sample in one of several slots going to desired next hope

      • reduces output blocking

    • Time Division Switching

  •  
    •  
      • when demultiplexing position in frame determines position in output trunk

        • write n bytes from incoming trunk in order into n memory slots

        • read slots out in a permuted order

      • limited by read/write speed of memory

    • Space Division Switching

      • each sample takes a different path through the switch depending on its destination

      • crossbar

 

 

 

 

 

 

 

  •  
    •  
      •  
        • for multiplexed input need switching schedule

      • multistage crossbar

        • can save crosspoints if a crosspoint can attach to more than one input

        • have more than one crossbar in the system

        • you can have more than one path to the output


    • time space switching

      • use TSI for first stage, with crossbar for the middle and final stages

      • delay samples so that they arrive at the right time for the space division switch's schedule

    • time-space-time switching

      • use TSI for first and final stages, with crossbar in the middle

      • replace n input * k output space switch by TSI switch that takes n-slot input frame and switches it to k-slot output frame

  • packet switching

    • datagram: lookup based on the entire destination address

    • cell: lookup based on VCI

    • repeaters: physical level

    • bridges:

      • datalink layer

      • based on MAC addresses

      • discover who is on which side by listening

    • routers

      • network level

      • participate in routing protocols

    • application level gateways: treat entire network as single hop

    • deciding on output port for a datagram

      • (in circuit switching can do using table lookup as have stored during call setup)

      • need to find longest prefix match

      • standard solution: use a trie

        • a sorted tree data structure

        • can improve performance by

          • caching more recently used addresses

          • move common entries to a higher level (i.e. match on longer strings)

    • blocking (internal or output)

      • must buffer or drop

      • deal with by

        • over provisioning: internal links much faster than inputs

        • buffers at input and output

        • back pressure: if switch fabric doesn't have buffering, prevent from entering until path available

    • switches

      • first generation

        • basically like having a load of network cards in a PC

 

 

  •  
    •  
      • second generation

        • port mapping intelligence in line cards

        • bottleneck: bus or ring

 

 

 

 

  •  
    •  
      • third generation

        • the self routing fabric provides parallel paths

        • contention is for the output buffers

        • potential for unlimited scaling

    • switch fabrics

      • is for transferring from input to output ignoring scheduling and buffering

      • ways of doing

        • crossbar

          • need to compute schedule on the fly (unless fixed size in known arrival pattern)

        • buffered crossbar

          • buffer the crosspoints

        • broadcast

          • packets are tagged with an output port number and sent to all ports

          • need to match N stages in parallel at each output port

          • only useful as part of a switch or in a small one

        • switched fabric element fabrics

          • switched fabric element

            • if 0, send packet to upper output

              • else to lower output

            • if both packets to same output buffer or drop

          • properties

            • NxN switch with bxb elements has logbN elements

            • self routing

            • recursive

            • can be synchronous or asynchronous (i.e. can use for variable sized packets)

          • Banyan

            • if two packets want to go to the same output there is output blocking

            • avoiding blocking

              • check path is available before sending

              • or

                • sort by output

                  • merge sort in hardware

                • remove duplicates

                  • recirculate to the beginning

                  • or run output of trap to multiple Banyans

                • remove gaps

                • perform a perfect shuffle

                  • split set in half and interleave

                  • i.e. 1 2 3 4 5 6 7 8 -> 1 5 2 6 3 7 4 8

          • packet size

            • small fixed size

              • easy to build large parallel fabrics

              • more overhead – overhead dominates at high speeds

            • variable size

              • can fragment or build asynchronous fabric

              • hence variable size not hurt too much

    • buffer placement

      • need buffers to match input rate to service rate

      • input buffering (queueing)

        • need arbiter to say which queue's packet for output K can go

        • head of line blocking

          • if a 1 had been chosen from a different input then the 3 on this output could go through the fabric but is trapped

          • solution:

            • multiple queues per input

            • arbiter must choose which queue an input port sends

            • but multicast is usually pathological for this

 

 

 

 

 

 

 

  •  
    •  
      • output queueing

        • output queues have to run faster than the trunk speed

        • knockout principle

          • use to reduce speed required

          • drop extra packets, fairly distributing loss between inputs

      • buffered fabric

        • buffers in each switch element

        • pros

          • only need to be as many times faster than trunk speed as the degree of the fan in

          • if you use hardware back pressure you can reduce the buffering requirements

            • if the arbiter knows the buffers are full it can prevent a certain output number from entering the fabric

        • cons

          • costly

          • hard scheduling

    • multicast switches

      • do in hardware

      • port mapper has a list of outputs

      • incoming packet must be copied to these output ports

      • sub-problems

        • generating and distributing copies

          • implicit

            • multiple outputs enabled after placing packet on shared bus

            • bus/ring based, crossbar or broadcast switches

          • explicit

            • copy a packet at the switch elements

            • use copy network

        • header (VCI) translation for copies

          • normally can do VCI translation at input or output

          • for multicast

            • input maps VCI to output ports

            • output swaps the VCI

 

Integrated services

  • problems with IP

    • data transfer

      • no recognition of flows

      • connectionless – no signals

    • forwarding

      • per datagram

      • no examination of type of traffic, no priorities

        • e.g. file transfer vs. VoIP

    • routing

      • no fixed path, so no fixed QoS

    • scheduling in the routers

      • FCFS – no examination of type of traffic

    • requirements for an Integrated Services Network

      • QoS service-level

        • packet handling

        • traffic description

        • policing

        • application flow description

      • Service interface

        • common data structures and parameters

        • signalling protocol

      • admission control

        • check request can be honoured

      • scheduling

        • packet classification

        • prioritisation of traffic

        • queue management

  • Traffic and QoS parameters

    • generally a network is three level structure

  1.  
    1. access network

  •  
    • local

    • low multiplexing and volume of traffic

  1.  
    1. distribution network

  •  
    • medium volume

    • low multiplexing

  1.  
    1. core network

  •  
    • high multiplexing and volume of traffic

  • mixed traffic in the network

    • routers use FCFS queues

      • no knowledge of traffic patterns or applications

    • traffic is often aggregated in complicated ways in the core

    • packet size

      • large

        • good for general data

        • “router friendly”

        • slows real time traffic

      • small

        • good for real time data

        • less end-to-end delay (defined using:

          • propagation

          • transmission (data rate)

          • buffering

          • application specific processing)

        • better tolerance to loss

        • less jitter (defined using:

          • jitter is variation in end to end delay

          • causes

            • FIFO queuing

            • dynamic path changes, etc.

          • only critical in TCP if it is a first order effect

          • critical to voice, Video over IP etc.

        • less efficient due to overhead/data ration

        • not “router friendly”

  • end to end loss

    • non real-time

      • retransmit (as in TCP)

    • real-time

      • forward error correction

      • media specific “fill in” at the receiver

    • reordering can be seen as loss

    • causes

      • packet dropping at routers

      • traffic violations

      • excessive load

      • link/router quality

 

 

  • end to end data rate

    • short term changes: during the course of a flow

    • long term changes: during the course of a day

    • goodput – residual throughput after allowing for overheads (inc. retransmission)

    • affected by

      • protocol behaviour (TCP congestion control)

      • dynamic routing

      • congestion

      • aggregation

      • link multiplexing

  • elastic (non-real time) applications

    • asynchronous: e.g. email

    • interactive: http, ftp, etc.

  • inelastic (real time) applications

    • streaming voice

      • non interactive

      • end to end delay and jitter not important

      • data rate and loss very important

    • real time voice

  • QoS parameters for the internet

    • delay

      • cannot set a maximum, but can measure it (Dmax)

    • jitter

      • cannot set end-to-end jitter value

    • loss

      • not really QoS parameter for IP networks

      • linked to data rate

    • packet size

      • restrictions

      • may be used by routers for buffer allocation etc.

    • data rate – specify using

      • leaky bucket

        • parameters

          • B: bucket size in Bytes

          • L: leak rate in Bytes/sec

        • data poured into bucket

        • maximum latency is B/L (i.e. how long it would take from the top of the bucket)

      • token bucket

        • parameters

          • b: bucket size in Bytes

          • r: bucket rate in Bytes/sec

          • p: peak rate in Bytes/sec

        • bucket fills with tokens at rate r

        • presence of a token allows a data transmission

        • transmission is allowed at a rate up to p

  • real time media flows

    • audio

      • low loss preferable

      • data rate basically fixed for certain applications

    • video

      • loss must be low

      • data rate dependent on

        • frame size

        • colour depth

        • frame rate

        • encoding

 

QoS services and application level service interfaces

  • INTSERV (an integrate services network architecture)

    • this is an architecture used to guarantee QoS on networks

    • is a fine-grained QoS system

    • concept:

      • every router in the system implements IntServ

      • every application that requires some kind of guarantees has to make a reservation

      • "Flow Specs" describe what the reservation is for (describe service semantics)

      • "RSVP" is the underlying mechanism to signal it across the network

    • service templates/“flow specs”, consists of

      • TSpec: allowed traffic pattern, normally bucket parameters

      • RSpec: service request specification

    • policing:

      • check that the TSpec is adhered to

      • if violated can change the packet handling (degrade service etc.)

    • RSVP

      • used for signalling resource requests between user, server and network

      • receiver oriented: receiver of a flow initiates and maintains the resource reservation for that flow

        • every 30s or so each server which can send QoS data sends out a message to see if any clients are interested

        • if a client is interested it sends a message back indicating what it wants and what guarantees it needs

        • each router along the way decides if it can deal with the resource request

          • if not send back a cancellation message

          • if so keeps soft state of reservation options

        • routers then police according to that policy, dropping the flow after a timeout

      • is flow based (vs. class based) – treat each flow differently

    • problems

      • lots of state being held

      • lots of refresh messages clogging up the system

      • applications and routers need to be RSVP aware - what about legacy applications?

  • DIFFSERV

    • is a coarse-grained QoS system

    • is class based (vs. flow based)

      • data packet placed in one of a limited number of traffic classes

      • doesn't differentiate network traffic based on the requirements of an individual flow, but on the class of a packet

      • each traffic class can be managed differently

    • the packets are assigned to classes by the network (i.e. the ISP)

      • this ensures user cannot always ask for highest QoS without paying

      • also means that the end system is unaware that or how this happens

        • good for legacy applications

    • per hop behaviours (traffic classes)

      • expedited forwarding: low loss, low latency traffic

      • assured forwarding: has varying levels of drop precedence

    • problems

      • going from one DIFFSERV cloud to another (one ISP to another), one cloud's gold packet may be another's bronze

      • no standards for per hop behaviour types or how to treat them

  • INTSERV and DIFFSERV?

    • INTSERV is good for per-flow reservations

    • DIFFSERV is good for aggregate, per customer/application/group etc.

    • run INTSERV reservations within DIFFSERV flows?

  • RTP

    • is carried in UDP packets

    • supports multicast natively

    • uses RTCP to permit application level control/signalling

  • Real Time Control Protocol (RTCP)

    • uses signalling channel to send data about the flow

      • this will affect how the sender and receiver act as entities

      • is not directly applicable to specific packets

    • it provides QoS info for deciding how to shape the flow

      • loss, delay, jitter

      • user information

      • etc.

  • application level signalling

    • user to network

      • Telco network

        • separate data and signalling path

      • IP

        • signalling information carried on the data path

          • RSVP carried in IP packets along data path

        • need aggregated signalling towards the core, e.g. using INTSERV with DIFFSERV?

          • i.e. can't keep individual

    • user to user signalling

      • call/session set-up

      • use to exchange each's capabilities

      • application level signalling supported by network

  • summary

    • need QoS mechanisms for UP

    • per flow

      • INTSERV

      • RSVP

      • does not scale well

    • customer/provider services

      • DIFFSERV

 

Scheduling and queue management

  • traditional routing behaviour

    • per packet, no idea of flows

    • generally FCFS, no priority traffic

  • this basic behaviour won't support QoS, need to modify it

  • simple router schematic:

    • the input lines have no input buffering

    • uses policy to classify packets

    • different classifications go in different output buffer for each output

    • scheduler decides which buffer to schedule next

  • FCFS in this scheme

    • use a default classifier

    • queue packets to outputs in order they arrive

    • service a packet as soon as possible after it arrives

  • work conservation

    • a system is work conserving if it is not idle when there is work to do (i.e. packets waiting)

    • we cannot give all flows a lower delay than they would get under FCFS and we have the equation:

where

ρn = λn . μn

ρn is the mean link utilisation

qn is the mean delay due to scheduler

λn is the mean packet rate (packets per second)

μn is the mean per-packet service rate (seconds)

C is a constant

  • non work conserving schedulers

    • these are permitted to be idle even if packets are available

      • wait until a packet is eligible for transmission

    • advantages

      • permits traffic smoothing for flows

        • makes downstream traffic more predictable

      • less jitter

      • possibly can get away with less buffer space

    • disadvantages

      • higher end to end delay

      • complex to implement

 

  • requirements of a scheduler

    • simple to implement

      • it needs to be implemented on high speed networks

      • want little complexity so can have a fast (hardware) implementation

    • fairness

      • protect any flow from the misbehaviour of any other

      • local fairness generally implies global fairness

    • performance bounds

      • on jitter, delay, loss ...

      • can we do it so it's deterministic, provable, per flow?

    • Admission control

      • if it is required it needs to be fast, simple and efficient

  • fairness: max-min fair share criteria

    • this is an algorithm to provide fairness to protect flows from each other

    • algorithm

  1.  
    1.  
      1. find the flow with the smallest demand for the resource, from a pool of n flows

      2. give it as much as it wants (no more), up to R/n where R is the remaining resource available

  •  
    •  
      • i.e. flows which can't have as much as they want get an equal share of the resource available

  1.  
    1.  
      1. remove the flow from the pool of flows, decrement the amount of resource available

      2. go to 1

mn = min(xn , Mn ) 1 <= n <= N

where

C is the total capacity of the resource

mn is the actual resource allocated to flow n

xn is the resource demanded by flow n

Mn is the potential resource available to n

  •  
    • you can do weighted max-min share

  • parameters in a scheduler

    • number of priority levels

    • how to serve the different levels

      • especially to prevent starvation of lower levels

    • is jitter/delay control for some levels

      • this could make it non work conserving

    • aggregation: will priority be decided

      • per flow?

      • per application?

      • per user?

    • how to service within a given level's queue

 

  • possible schemes

    • simple priority queueing

      • k queues

      • queue (k+1) has higher priority than queue k

    • generalised processor sharing

      • weighted max-min share

      • serves an infinitesimally small amount of data from flow i

      • visits flows round-robin

      • is unimplementable but used as a standard

        • relative fairness: fairness of a scheduler with respect to other flows it is servicing

        • absolute fairness: fairness of a scheduler compared to GPS for the same flow

    • weighted round robin

      • different flows will have different mean packet sizes

        • so to help make more fair divide the weighting coefficient by the mean packet size

      • this can introduce unfairness if the mean packet size is estimated incorrectly

      • service is fair over long time-scales

    • deficit round robin

      • each queue has a (deficit) counter dc which is initially zero

      • scheduler goes through the flows in order

        • attempts to serve the queue if

            size <= quantum + dc

        • if it does serve it then it sets

            dc = quantum + dc – size

        • otherwise it sets

            dc = quantum + dc

      • the quantum is a 'credit' which builds up each round

        • it is normally the maximum expected packet size

    • weighted fair queueing

      • is a GPS emulation

      • estimates the time the packet would have completed under bit by bit (i.e. discrete) GPS

      • packets are tagged with this finish number

      • the smallest finish number across the queues is served first

      • buffer drop policy

        • drop packets already queued in order of decreasing finishing number

      • can provide

        • “best effort” queueing

        • guaranteed data rate and deterministic end to end delay

  • Class based queueing

    • is only a queueing mechanism (still needs a scheduler)

    • link capacity is shared

    • there is class based allocation

    • a node can “borrow” spare capacity from a parent

 

  • queue management

    • packet dropping

      • policies

        • drop from tail

        • drop from head

          • better for TCP

        • random drop

          • fair if all sources are behaving

        • flush queue (drop all packets)

        • intelligent drop

          • needs lots of higher level state information

      • reactions of end systems to packet drops

        • non-real time – TCP

          • backs off

        • real time – UDP

          • application level control flow required

          • real time data flows may not be able to adapt

            • hence this hurts flows which do adapt (as they have to adapt for the non-adaption of other flows as well as for the general congestion)

    • congestion control

      • random early detection

        • method

          • monitor the average queue size and drop packets based on statistical probabilities

          • if the buffer is almost empty, all incoming packets are accepted

          • as the queue grows, the probability for dropping an incoming packet grows too

          • when the buffer is full, the probability has reached 1 and all incoming packets are dropped

        • any time a packet is dropped the sender is notified and hopefully slows down

        • this process should reduce the cycle of

          • the router becoming full (the network is congested)

          • all incoming packets are dropped

          • all incoming flows reduce flow rate drastically

          • the network becomes underutilised

          • all flows build up again at roughly the same rate

          • repeat

      • explicit congestion notification

        • works using IP, only if both ends signal that they will use it

        • on congestion a router sets a bit in the IP packet

        • in the ACK the received indicates this congestion

        • the received must act (in TCP terms) as if the packet was dropped (in terms of reducing flow, not in terms of retransmission)

      • we really need adaptive sources

      • issues

        • fairness and protection

        • scalability

          • granularity

          • speed of operation

        • flow adaptation

        • aggregation

        • signalling

          • router state

          • ECN?

Routing for integrated services

  • routing requirements for integrated services

    • multiparty communications

      • multicast

      • multi-user games

      • etc.

    • support for QoS in routing

  • QoS based routing

    • traditional (telephone) routing

      • the destination determines the path

      • there is only one optimal path to the destination

    • properties of QoS routing

      • multiple paths possible

      • different paths have different QoS properties

      • routing updates include QoS parameter information

    • can do in IP header

      • not widely used – no global agreements

    • there are lots of metrics available to determine QoS, depending on what you want

      • possible metrics

        • minimum delay path

        • maximum throughput path

        • maximum reliability path

        • minimum cost path

      • if the router is having to deal with choosing a path for a packet based on its indicated preferences there will be a lot of overheads

        • larger router updates (as QoS parameter information is passed in updates, as above)

        • more state is stored in routers

        • more processing per packet

    • there can be advantages to staying on the same route, even if a “better” one becomes available, and there are a couple methodologies to do this

      • route pinning

        • ensures that, whilst there are still packets coming from A to B, in any flow, that they all follow the same route

        • this can cause congestion (as it defies normal routing ideas of taking the best route for each packet)

      • path pinning

        • ensures that all packets from A to B in flow F go along the same route

        • if a new flow starts it may take another route

        • this can be problematic if there is a long lived flow and the route has since become a much worse choice

    • intra domain QoS routing

      • can use agreed single/multiple metrics

      • need to indicate any disruptions to QoS along a path

      • need to support best effort traffic

      • is really a research issue

    • inter domain QoS routing

      • needs to be scalable

      • QoS routing shouldn't be very dynamic

        • don't want lots of updates

      • shouldn't constrain intra-domain QoS routing

 

  •  
    • QoS routing for multicast

      • has to support

        • widely dispersed groups

        • dynamic membership changes

        • must scale across domains

      • is really hard...

 

Traffic management

  • this is a set of policies and mechanisms that permit a network to efficiently satisfy a diverse range of service requests

  • needed for next generation data networks

  • economic principals

    • utility function

      • it is assumed that there is a function from a given quality of service to a utility

      • utility in this case is “level of satisfaction”

      • this function is determinable by observing how rational users act

      • e.g. a file transfer

        •  

            u is utility from file transfer

            S is satisfaction from a infinitely fast transfer

            a is rate at which satisfaction decreases with time

            t is transfer time

        • u = S – a . t

          where

        • NB, if t > S/A then the user is worse off (negative utility)

    • social welfare

      • social welfare is maximised when some combination (normally summation) of all the utility functions is maximised

      • efficiency: a network is efficient when increasing the utility of one user must necessarily decrease the utility of another

      • envy-free: a network is envy-free if no user would trade places with another (NB better performance costs more)

      • attempt to maximise social welfare, envy freeness and still make a profit

    • a single network which provides separate QoS is better than separate networks for each QoS

      • can share unused capacity

    • lowering delay of delay-sensitive traffic increases welfare

      • will the person downloading program really notice an extra couple seconds, verses VoIP

    • normally welfare increases more than linearly with an increase in capacity

    • applying the principals

      • it's better to give 5% of the traffic low delay than all the traffic slightly lower delay

    • the big question

      • if we took the money we spent on attempting to control the traffic and just blindly increased capacity, would we be better off?

    • traffic models

      • to do the above we need a traffic model of how the users behave

      • can either measure or guess

      • telephone traffic models

        • time between calls is an exponential distribution (Poisson process)

        • length of calls is an exponential with a heavy tail (significant numbers of calls are very long)

      • internet traffic models

        • LAN connections differ from WAN connections

        • many parameters are heavy-tailed (a few calls are responsible for most of the traffic)

  • traffic classes

    • traffic classes encompass both

      • user requirements

      • what the network service offers

    • e.g. telnet requires low bandwidth and low delay

    • basic classes

      • guaranteed service

        • utility is zero unless an application gets a minimum level of QoS

          • that QoS could be in one or more of bandwidth, delay, loss, jitter ...

        • typically synchronous and real-time

      • best effort

        • typically asynchronous and indifferent to real time

        • e.g. email

    • ATM guaranteed service subclasses

      • constant bit rate

        • mean and peak rate are the same

      • variable bit rate

        • long term average with occasional bursts

        • try to minimise delay

        • e.g. constant quality variable bandwidth audio

      • available bit rate

        • user gets whatever is available

      • unspecified bit rate

        • like ABR but without feedback

  • time scales

    • actions are taken at different rates during a call

      • some once per call (e.g. resource requests)

      • some every few RTTs (e.g. feedback flow control)

      • some very rapidly (e.g. policing)

    • traffic management mechanisms need to deal with a range of traffic classes at a range of time scales

  • mechanisms

    • renegotiation

      • static per-call descriptors doesn't make sense for many traffic sources

      • e.g. if the network serves a real time video stream at slightly less than then agreed average rate it will need a large amount of buffering space to keep up

      • renegotiating service rate every 10 sec gets the bandwidth requirement to nearly average rate

      • note that renegotiation does cost in terms of overhead

      • renegotiated constant bit rate (RCBR) service

        • all traffic sent as constant bit rate

        • CBR is renegotiated if necessary

        • buffers are at the edge of the network

        • is easy to price

    • signalling

      • this is a source signalling its utility function to the network, in two parts

        • how to carry the message (transport)

        • how to interpret it (semantics)

      • how the signalling works

        • setups are sent and acknowledged per hop

        • resources are reserved tentatively

      • is hard due to complex services, feature interaction, performance tradeoffs...

      • resource translation

        • the application wants an end-to-end quality

        • this needs to be translated into per-hope requirements

        • two pass approach

          • forward: attempt to get as high a QoS at each hop as possible

          • reverse: relax the actual resources reserved so that it fits the required end to end quality

      • signalling on the Internet: RSVP

        • the main motivation behind this is to efficiently support multipoint multicast with resource reservations

      • multicast reservation styles

        • naïve (source initiated)

          • source contacts each receiver in turn

        • intelligent multicast (merge replies)

          • the source needs to know all receivers and the rate at which they can absorb

          • there are two messages per link of the spanning tree

          • doesn't scale

        • naïve multipoint multicast

          • two messages per source per link

          • can't share resources among multicast groups

        • RSVP

          • receiver initiated

          • state on reservation is kept per group, not pre connection

    • admission control

      • constant bit rate admission control

        • simple

        • on failure try again

      • best effort admission control

        • trivial

      • variable bit rate admission control

        • the peak rate differs from the average rate

        • need to decide how much over the average rate to book

        • trade off between wasting bandwidth or risking dropping packets in a burst

        • options

          • peak rate admission control

            • reserve at the connection's peak rate

            • pros

              • simple

              • zero fluid delay and zero loss

            • cons

              • wastes bandwidth

          • worst case admission control

            • source is described using

              • average rate

              • burst size

            • use weighted fair queueing or rate controlled discipline to reserve bandwidth at an average rate

            • pros

              • may use less bandwidth than with peak rate

              • can get an end-to-end delay guarantee

            • cons

              • complex

          • admission with statistical guarantees

            • as the number of calls increases, the probability that multiple sources send a burst decreases

            • the sum of connection rates is increasingly smooth

            • with sufficient sources all can be assumed to be arriving at their average rate

            • traffic from a source is sent to a buffer which is drained at rate e

              • if a burst is sent, its delay goes up

              • for too large a burst packets are dropped

            • equivalent bandwidth: the rate at which we need to drain the buffer so that the probability of loss is less than i and the delay in leaving the buffer is less than d

            • many sources will generally share a buffer

            • admission control method in this case

              • a request from a source arrives

              • use its performance requirements and network state to assign it an equivalent bandwidth

              • the sum of equivalent bandwidths should be less than the link capacity

            • pros

              • can trade off a small loss probability for a large decrease in bandwidth reservation

              • can obtain delay bounds

            • cons

              • assumes uncorrelated sources

              • hairy mathematics

          • measurement based admission

            • for traffic which cannot easily describe itself

            • a metric is obtained by measuring actual average load

            • the user gives a peak that it expects

            • if average + peak < capacity then admit

            • problems:

              • assumes past performance is indicative of future performance

              • when to 'forget' about the past?

    • peak load pricing

      • service providers want to

        • avoid overloading

        • use all available capacity

      • traffic has a periodic daily distribution showing peaks

      • “price” can be used to move some traffic to off peak times

      • “price” is a signal to the user from the provider about network preferences

        • in deciding pricing have to create utility function to show

          • the weight the network places on using capacity and on revenue

          • the weight the user places on cost and on lack of overload

      • rational users help the system by helping themselves

 

  •  
    • capacity planning

      • this is the design of modifications to

        • network topology

        • link capacity

        • routing choices

        to most efficiently use existing resources or to alleviate long term congestion

      • approach

  1.  
    1.  
      1. measure the network during its busy hour

  •  
    •  
      •  
        •  
          • measure traffic from end points to end points

            • we're just modifying the internal network, not end points

          • find the busiest hour by measuring in general a week or so

            • add a “fudge factor” to this rate for future growth

  1.  
    1.  
      1. create traffic matrix

  •  
    •  
      •  
        •  
          • this is the number of bits sent from each source to each destination

          • assume this pattern will predict future behaviour

  1.  
    1.  
      1. decide topology, which depends upon:

  •  
    •  
      •  
        •  
          • connectivity between arbitrary points

          • geographical considerations – some links may be easier to build than others

          • existing capacity

  1.  
    1.  
      1. assign capacity

  •  
    •  
      •  
        •  
          • assign sufficient capacity to carry the busy hour's traffic

          • however routes are decided dynamically using link and congestion status

            • it is easier to assign capacities if routing is static and links are always up (telephone network)

  • some open problems

    • resource translation

      • we can do this for bandwidth

      • doing it for delay is hard

      • can either

        • translate delay into an equivalent bandwidth

        • obtain per hop delay bounds

    • renegotiation

      • static descriptors aren't that good for dynamic traffic

      • renegotiation is not free as there s signalling overhead

      • problems:

        • when to renegotiate?

        • admission control?

        • what to do on renegotiation failure?

    • measurement based admission

      • over what time interval to measure the average?

    • peak load pricing

      • how to choose prices for on and off peak

      • when peak times should end

    • capacity planning

      • simultaneously choosing topology, link capacity and routing metrics as routing and link capacity interact

    • smart vs. dumb

      • getting low delay for VoIP means giving low delay for all traffic

      • but if we took the money spent on traffic management and increased capacity we'd get the lower delay?

 

Miscellanea

  • scaling/complexity

    • all mechanisms added to IP have some cost

    • we would like this cost to be O(c)

    • normally it's O(n) in

      • number of end systems

      • number of routers

    • per flow queues

      • there will be a data structure for every active pair of senders/receivers

    • per class queues

      • data structure per class

      • edge systems have to do policing and shaping

      • may get O(log(n))

  • stability

    • traffic, both data and control, should be stable

    • we need to do a theoretic analysis to figure out if it is

      • oscillatory should still converge or be bounded

    • design to cause stability

      • end to end congestion system: damped feedback

      • routing systems: randomised timers

      • we don't want to do alternate path routing and admission control on the same timescales

  • mobile issues

    • wireless links

      • throughput, delay and loss can vary on wireless links

      • offering hard QoS is hard

        • it may be coordinatable if different layer two media can cooperate and do handoff properly

    • patterns

      • mobile access patterns may be different from fixed ones (we don't know)

      • mobile multicast with a moving source or sink could be complex (need to rebuild the tree)

    • resources

      • some QoS approaches are based on the network being essentially underloaded

        • expedited forwarding and assured forwarding only work for VoIP if it's a small percentage of the traffic

      • this isn't the case on many wireless links

  • pricing

    • you have to charge for QoS

    • how much are users prepared to pay?

    • how to stop the rich from buying all the bandwidth and resell it at a different price?

    • normally access fee can cover the cost of infrastructure

      • the bill is an incentive scheme to stop users hogging capacity

    • varying the price with network conditions – congestion pricing

      • is theoretically optimal

      • generally too complex for users

      • predictable bills are often more important than cheapest fare

  • provisioning

    • users don't like being refused access

    • we need to dimension the network to balance user satisfaction and revenue levels

 

  • implementation novelties

    • active networks

      • allows the packet to provide code to execute on routers and nodes in the network

      • was investigated as it was becoming hard to deploy new services in telephony and IP networks due to the scale of the infrastructure

      • can reprogram the switch on the fly (possibly per packet)

    • multi protocol label switching (MPLS)

      • getting from the source to the destination(s) as fast as possible

        • not quite the same as QoS provisioning

      • is intended to replace ATM (similar technology)

      • flows are identified and labelled

      • label switching

        • packet enters label switching router (LSR)

        • destination and new label looked up in table

        • new label applied, sent to output port

      • labels

        • packets that share the same next hop share the same local label

        • that label is determined by a downstream label switching router

      • routing information is used to distribute labels

        • piggy back the label information onto existing protocols

      • label stacking

        • it's like source routing

        • the LSRs along the way pop off the next label on the stack

          • makes it even faster

  • performance issues

    • router architectures

    • fast route table lookup

    • fast packet classification (for QoS)

    • better address aggregation

    • traffic engineering (differentiated services)

    • faster boxes or smarter software? (soft vs. dumb)

  • if you push something out of one part of the architecture, it will show up somewhere else