simple solutions for complex problems
Digital Communications
Link characteristics
bandwidth
delay (light travels at 0.7c in fibre)
attenuation
Telephone network
basic service
two way voice
low end-end delay
guarantee of (bandwidth and buffer) resources to an accepted call
connected by a virtual circuit
overall design
fully connected core
simple routing
hierarchically allocated telephone number space
end system
must be powered from the exchange
generally analogue
instead of two wires for transmit and receive circuits, use the same circuit
has sidetone (local)
we want this to stop people shouting
and echo (far end)
some received signal is transmitted back
need expensive echo cancellers for long distance calls
dialling
pulse dialling: send pulse per unary coded digit
tone dialling:
dual tone multifrequency
Pressing a single key (such as '1' ) will send a sinusoidal tone of the two frequencies (697 and 1209 hertz (Hz))
transmission
multiplexing
analogue
band limit call to 3.4kHz and frequency shift
was used to run one cable to two homes, was problem when wanting to upgrade to broadband
digital
convert voice to samples (8 bits/sample, 8000 samples/sec, call 64 kb/s)
interleave samples from different calls
use 256 logarithmically spaced quantization levels (as dB is logarithmic)
TDM
trunk works faster rate than inputs
interleave samples (serve all inputs in time it takes one sample to arrive)
multiplexed trunks can be multiplexed on faster trunls
syandards
Digital Signalling (in US)
SONET defined to standardise but was implemented a little differently in US
Link (physical media of transmission)
cost is in installation, not the link
builders install cat 5, fibre and coax to every room
over-provision long distance links, unless undersea
Fibre:
< 10 Pbps capacity
little attenuation
generally only thermal noise
types:
multimode
cheap (uses LEDs)
suffers inter symbol interference
short distance (a couple km)
single mode (only one solution to wave equation)
expensive (uses lasers)
long distance (hundreds km)
satellites
geosynchronous
36000km away
propagation delay of 250ms
low earth orbit
appear to move in the sky
need more
handover is tricky
Switching
two parts to the system
switch (data plane)
switch controller (control plane)
Signalling
controller does not touch voice samples
manages network (routing, billing etc.)
switch controllers are separate from the switching hardware
have to keep track of the state at each endpoint (each phone, whether idle, dial tone, talking, etc.)
Cellular communication
Establishing cellular communication:
there aren't enough frequencies to give each mobile a permanent frequency
temporal reuse: if mobile is off, don't assign frequency to it
spatial reuse: mobiles in non-adjacent cells can use the same frequency
how complete a call to a mobile?
need to know where it is
how deal with moving mobile phone?
need to send packet sooner if cell is further from base station
hand off between base stations?
Issues with telephone system
critical systems (power grid control) built on assumptions
what about changing to allow multimedia stuff?
Internet
administered by a collection of entities
key technical innovations
packets
self descriptive data (data + metadata)
better than samples
we have to know where it is going
can't store it
store and forward
metadata allows us to forward packets when we want (e.g. bundle of letters to one PO)
allows efficient use of critical resources
problems
hard to control delay
switches need buffer memory
convergence of flow leads to congestion
since have IP layer, can have different underlying layers and link lots of different networks
addressing
IP address made up of network number and host number
network number
early system: Class A, B, C networks
class A: 8 bit network number
class B: 16 bit network number
class C: 24 bit network number
CIDR
use a netmask
all routers in the internet know the mask
for a subnet mask only know what it means inside the LAN
routing
need to know next hop for each network on the internet
normally just keep local ones, otherwise send to default router
reduces routing table at expense of possible non-optimal routing
endpoint control
use a dumb network
do as much as possible at the edge
higher layer (TCP) deals with network defects
run over any lower layers, but no QoS
need to to end to end checks
challenges
IP address space shortage
decentralisation
makes reliability next to impossible
cannot guarantee that route even exists
anyone can join
no uniform solution for accounting/billing
non-optimal routing
multimedia
needs QoS
Internet has no way to identify streams
routers not required to cooperate
ATM networks
design goals
end to end QoS
high bandwidth
scalable
manageable
cost effective
concepts
virtual circuits
actual circuits (POTS) have drawbacks
idle users consume bandwidth
quantization of link capacity
carry only identifier in the header (saves space)
connection needs to be pre-established
also we need to translate IDs at each switch point
all packets must follow same route
easier to look up in hardware
use signalling to separate control from data
fixed sized packets
pros
simpler buffering hardware
simpler scheduling
easier to build parallel switches
cons
overhead for sending small amounts of data
segmentation and reassembly cost
last cell normally wastes bandwidth
small packet size
balance packetization delay (time to fill a packet) vs. overhead of packet data/header
set at 48 bit for data, 5 bit for header
statistical multiplexing
each connection gives information about transmission (average rate, max peak rate, max peak length)
can trade off worst case delay against speed of output trunk
need more buffering than originally thought
integrated service
voice, video and data on the same network
hardware support for switching different data types
do by:
signalling
admission control
resource reservation
lots of bandwidth
challenges
QoS (defined but not used)
scaling
competition from other LAN techs (esp. Ethernet as it is cheap because of volume of people using it)
interoperation with IP (vast, fast growing non-ATM structure)
connection vs connectionless, etc.
Protocol Layering
protocol is a set of rules and formats that govern communication between communicating peers (Customer A and Customer B)
set of valid messages
meaning of each message
tells us
syntax of a message (fields/format etc.)
semantics of a message (e.g. a nack means received a corrupted file)
actions to take on receipt of a message (e.g. on nack retransmit)
can view protocol as providing a service
some protocols rely on other ones (layered underneath it)
e.g. reliable file transfer relies on packet transfer
terminology
Service Access Point (SAP): interface between upper and lower layer
Service Data Unit (SDU): packets hander to a lower layer by an upper
Protocol Data Unit (PDU): packets exchanged between peer entities
PDU = SDU + optional header/trailer
protocol stack
allows implementation/information hiding
allows upper layers to share lower layer functionality
actually there is a tension between abstraction and good performance
e.g. with TCP assuming packet loss always implies congestion
have to leak some information to get good run time
ISO Open System Interconnect protocol
defined by:
reference model: formally defines a layer/service etc.
service architecture: describes services provided by the layer and the service access point
protocol architecture: set of protocols which implement the service architecture
Physical layer
moving bits on physically connected end systems
prescribes
coding scheme to represent a bit
shapes and sizes of connectors
bit-level synchronisation
Datalink layer
have a frame (bits that belong together)
beginning and end markers
have idle markers for when a frame is not being transmitted
done in software but on the custom hardware (network card)
e.g. Ethernet
since this is broadcast, need addresses and contention resolution
done by MAC
Network layer
creates abstraction of one end to end link from several Datalink layer links
allows any two endpoints to have an end to end link
in datagram networks
provides routing and data forwarding
in connection oriented networks
have a data and control plane for each of the layers (i.e. separate stack for data and control layers)
e.g. IP
Transport layer
creates abstraction of error control, flow control and multiplexing
multiplexing applications onto same end to end link (using port numbers)
e.g. TCP or UDP
Session layer
creates abstraction of full duplex, expedited data delivery and session synchronisation
expedited data delivery:
put a later packet in front of outgoing data
used in phone network, if a call is started and the person hangs up, to cancel database lookups etc.
synchronisation
allow users to place markers in the data stream and roll back to a pre-specified mark
e.g. chief clerk of shipping, can do expedited delivery by courier, provide duplex etc.
Presentation layer
abstracts data representation differences, e.g. endian-ness
deals with application data
can do encryption too
Application layer
set of application useful protocols
e.g. FTP, SMTP, HTTP
System design
every design decision is a trade-off
goal
maximise a set of performance metrics given a set of resource restrictions
constrained vs. unconstrained resources
e.g. high end PC with 28.8 modem
constrained resource is link bandwidth
CPU and memory are unconstrained
common resources
time (response/throughput etc.)
space (memory/bandwidth etc.)
computation
money (cost of components/limits customers willing to pay, etc.)
labor
social constrains (standards/backwards compatibility etc.)
balanced system
one where all resources are simultaneously bottlenecked
normally you have one bottleneck, remove it and the bottleneck is then something else...
standard trade off techniques
multiplexing: time and space for money
pipelining
batching: reduced overhead for longer worst case response time and increased throughput
optimising the common case:
80% of time spent in 20% of code
spend labour optimising that bit of code
hierarchy: scalability for communication expense
abstraction
separating data and control: overhead of per packet metadata for per-packet QoS
Naming and addressing
both need to be globally unique
names
long, variable length
human readable
difficult to parse
addresses
shorter, fixed length
easy for computer to process/use efficiently in routing
name resolution
done by servers (single PoF)
naming
we want Lookup, Create Rename Update and Delete of names to be scalable and efficient
hierarchical
naïve version: ask other authorities if name is unique
recursive decomposition of the namespace
decompose into mutually exclusive partitions
scales arbitrarily
Domain Name Service (DNS)
name server which is distributed
a name server is responsible for (an authoritative server for) a set of domains
can delegate some responsibility to a child
root servers are replicated
if a local server cannot answer a query it asks its parent
replies are cached and timed out
addressing
use hierarchical schemes
globally unique names
aggregation to reduce size of routing tables at expense of non-optimum routing
in the telephone network
variable length addresses
assumes local scope unless use pointer to other scope (e.g. 01223)
in the internet
IPv4
network number and host number, defined by mask in different ways (oldest-youngest)
A, B and C class networks
classless routing by clustering addresses within a network
contiguous C blocks referred to as a larger address block
due using a CIDR mask
must be known by all internet routers
is similar to subnet mask
DHCP
computer broadcasts discover to subnet
DHCP servers offer IP addresses
host picks one and sends request to a particular server
selected server sends ack
when the host is finished with the address it sends a release
the IP address has a lease which limits the time it is valid
NAT
multiplex network layer onto transport layer (i.e. use TCP and UDP ports)
generally use non-routable addresses, e.g. 192.168.0.0/16
IPv6
extends address from 32 bit to 128 bit
multiple levels of aggregation
several types of multicast
anycast (there is a set of endpoints to which a packet can go, but unlike multicast it only goes to one of them)
ATM
Network Service Access Point address
variable length (7-20 bytes)
Datalink layer addresses
MAC addresses
hierarchically allocated (when piece of hardware is made, indicates manufacturer etc.)
not hierarchical deployed
ARP to resolve
in point to point LAN need ARP server
susceptible to spoofing by posing as the network gateway
Routing
process of finding path from source to every destination in the network
a routing protocol sets up a routing table in routers and switch controllers
node makes local choice on global topology
routing protocol needs to intelligently summarize relevant information about (complex) global state
requirements
minimise the routing table space
minimise number and frequency of control messages
want to avoid
black holes: traffic vanishes
loops: traffic stuck within a network
oscillations: paths cycle between alternatives
use optimal path
choices
centralised vs distributed routing
source based vs. hop by hop
stochastic vs. deterministic
stochastic spreads load, avoiding oscillations, but misorders
single vs. multiple path
state dependent vs. state independent
do routes depend on current network state (e.g. delay)
telephone network
three level hierarchy, with fully connected core
Central office
Local Exchange
Wide area core
only major routing decision is at toll switch
if have to use two hop path then which?
features
stable load
can predict pairwise delay throughout the day
hence can choose optimal routes in advance
extremely reliable switches
can assume a chosen route is available (can't do on Internet)
single organisation controls the core
highly connected
connections require resources (but all need the same)
costs
requires reliability in every component and a fully connected core
Dynamic Non-Hierarchical Routing (DNHR)
divide the day into around ten periods
in each period, each toll switch is assigned a primary one-hop path and a list of alternative two-hop paths
drop only if the primary and all the alternatives are unavailable
bursts of high activity can lead to metastability
overflow into another node's direct path (can rule out by disallowing)
measures traffic once a week
Trunk status map routing (TSMR)
similar to DNHR but updates measurements once an hour or so, if they change “significantly”
Real Time Network Routing (RTNR)
decentralised
each toll switch maintains a list of lightly loaded links
one switch asks another for its list
intersection of source and destination lists gives set of lightly loaded paths
links(A) = C, D, E
links(B) = D, F, G
hence ADB lightly loaded and good alternative path
very effective in practice (used in the core)
Distance vector routing
node sends Distance Vector to all its neighbours
that says what the node's best idea of distance to every node in the network, from it
advantages
distributed
adapts to traffic changes and link failures
suitable for networks with multiple administrative entities
problem: count to infinity, can deal with using
DV has path vector
triggered updates (faster count to infinity)
Link state routing
router knows entire network topology, computes shortest path itself
link state packet (LSP)
cost of link between itself and its neighbour, e.g. (from A) {A; B; 4}
control flood the network with LSPs
store LSPs
if one comes in that is new send it out on all routes besides incoming one
network with E edges will copy at most 2E times
use sequence numbers to tell if a LSP is new
sequence numbers run out/don't know which to use on boot up
solutions:
ageing
put a timeout value in the header
and control flood notification of the event so that routers that don't know the time can do it too
on booting a router has to wait until all the old LSPs are purged and refreshed
lollipop
need a unique start sequence number
a is older than b if
![]()
![]()
![]()
if a router gets an older LSP tell sender about newer LSP
recovering from a partition
the two nodes on either side of the re-established link co-ordinate to update databases
Detecting router failure
send periodic very small “hello” packets
on timeout flood information about the dead router
Links costs
static metrics
set all link costs to 1(min hop routing)
give weights inversely proportional to capacity
low weight value links will be preferred
dynamic metrics
NB if the dynamic cost depends on load, can have oscillations
ARPAnet original (first cut)
cost proportional to length of router queue
problems + resolutions
queue length averaged over small time (spikes cause re-routing)
average queue length over longer time
wide dynamic range of path costs (high cost paths ignored)
restrict dynamic range
queue length assumed to predict future loads (remember earlier NB)
depend also on intrinsic link capacity
no restriction on successively reported costs
make restrictions
all tables computer simultaneously (low cost link flooded)
stagger table computations
Hierarchical routing
concept
divide network into set of domains
gateways connect domains
computers within domain unaware of outside computers
gateways only know about other gateways
three level hierarchy in address
network number
subnet number
host number
core advertises routes to networks only
gateways talk to backbone to find best next-hop to every other network in the Internet
if a domain has multiple gateways
external records tell hosts in a domain which to pick to reach a host in another domain
contains distance from gateway to external node
summary records tell backbone which gateway to use to reach an internal node
contains distance from gateway to internal node
Internet has three levels of routing
backbone, connecting autonomous systems (AS): exterior gateway protocol
within AS: interior gateway protocol
within LAN
exterior gateway protocols
between mutually untrusting routers
must tell trusted border gateway what paths are allowed
different ASs will use different cost metrics: convert or just ignore
Exterior Gateway Protocol
nominally DV, costs are
128 reachable
255 unreachable
allow admins to pick neighbours to peer with
Border Gateway Protocol
path vector
DV annotated with entire path
guaranteed to be loop free
can use non-tree backbone topologies
interior gateway protocols
Routing Information Protocol
distance vector
cost metric is hop count
infinity = 16
exchange DVs every 30s
Open Shortest Path First
link state
uses areas to route packets hierarchically within ASs
uses designated routers to reduce the number of endpoints
Exterior Gateway Protocol (EGP)
ATM routing
Private Network-Network Interface
link state
many levels of hierarchy
switch controllers at each level form a peer group
group has a group leader
leaders are members of the next higher level group
leaders summarise information about group to tell higher level peers
all records received by leader are flooded to lower level
LSPs can be annotated with pre-link QoS metrics
switch controller uses this to compute source routes for call-setup packets
routing in a broadcast LAN: how decide if a packet is meant for inside or outside the LAN
Internet solution:
all hosts on LAN have same subnet address
routers periodically send out advertisements (including TTL, etc.)
redirection
if a router's next hop is another router on the same LAN, host gets a redirect message
multicast routing
packet sent by any member of a group is received by all
uses:
resource location
internet TV
associates a group of senders with a group of receivers
don't need to know each others' identities
can create when sender starts sending or receiver starts receiving
even if no-one else there
multicast groups have their own class D addresses
problems:
which groups are currently active
how to express interest in joining a group
discovering the set of receivers in a group
delivering data to members of the group
expanding ring search
used to find the closest multicast server which is not heavily loaded
send out multicast packets increasing the TTL each time
multicast packet will only be forwarded if it reaches server within the TTL, and that server is not heavily loaded
hence discovering local resources first
flavours
point to multipoint
multipoint to multipoint
efficiency issues
A wants to talk to B, G, H and I
B wants to talk to A, G, H and I
could approximate point to multipoint with set of point to points
but AC, BC would carry packet three times, for G, H, I
could do with two point to multipoint multicast groups
but overhead of establishing two groups
do with multipoint to multipoint
multicast tree
send at most one multicast packet per link
optimal tree provides shortest path from sender to every receiver
difficulties
sources dynamically join and leave
need to be able to do so without explicitly notifying receiver
leaves of tree are often members of broadcast LAN: want to exploit broadcast capability
Class D to MAC address translation
insert the bottom 23 bits of the multicast IP address into the 48 bit MAC address
the top 25 MAC bits are a fixed pattern which indicate that this is a multicast packet
network cards know to listen for layer 2 packets with these 25 top bits
network card filter out those packets from the multicast groups of which the computer is a member
Internet Group Management Protocol
if a LAN has no members can not send to it (prune it)
router periodically broadcasts query message
hosts broadcast reply with list of groups they are interested in
reply after random timeout
if someone else has expressed interest you don't need to reply
wide area multicast
assume each endpoint is a router which can use IGMP to determine if it wants subscription
goal: deliver packets to all routers on the path to a subscribing receiver
solution 1
controlled flooding
need state to see if a packet is new
solution 2: reverse path forwarding (RPF)
flood
at node A forward packet labelled from S iff that packet arrived on the link which is the next hop on the shortest path to S
solution 3
don't send packet downstream if you are not on the shortest path from the downstream router to the source
potential confusion of downstream router has choice of shortest paths to source
pruning
in RPF, B and C still get packets even though they don't need them
router tells parent in tree to stop forwarding
rejoining
router C sends join(group, A) to B
or periodically broadcast a packet and C refrains from pruning
but some routers do not support multicast
virtual links (IP in IP) between multicast enabled routers
need shortest paths (for RPF) only using these virtual links
Distance Vector Multicast Routing Protocol (DVMRP)
similar to Routing Information Protocol
flood and prune
explicit join messages
Multicast extension to OSPF (MOSPF)
routers flood group membership information with LSPs
each router computer shortest path tree that only includes multicast-capable routers
complex
Core Based Tree (CBT)
have a statically configured core tree
co-ordinate multicast with core router
host sends join request to core router
routers along path mark incoming interface for forwarding
eliminates the pre-source and per-group prune records at each router (how?)
Protocol Independent Multicast (PIM)
flood and prune good for dense groups (only need a few prunes)
dense mode PIM == DVMRP
sparse mode PIM similar to CBT
can switch from CBT to shortest path tree
core is actually a “rendez-vous” point, as more than one
leaf router switches to different “rendez-vous” on timeout
routing vs. policy routing
routing: forward packet on 'best' path
policy routing: routes chosen on policy directives regarding things like
source and destination address
QoS
time of day
charging and accounting
multiple metrics
advertise multiple costs per link (e.g. delay and monetary cost)
routers construct multiple shortest path trees
problems
all routers must use the same rules
remote routers may misinterpret policy
provider selection
assume single service provider provides almost all the path
choose policy by choosing provider
crankback
router returns packet if no next hop with sufficient QoS can be found
in ATM networks do just for the call setup packet
in internet have to do for every packet
routing vs. forwarding
routing:
knowing how to get packets from source to destination
have to maintain routes
forwarding
moving a packet from an input port to an output port
make a forwarding decision based on route information
getting the packet to the output port fast
in IP
packet arrives
check destination address
look up in routing table
queue packet to output port
mobile routing
location: where is the host?
routing: how get packets to it?
telephone network
each cell phone has a global ID that it tells remote MTSO when turned on (using slotted ALOHA up-channel)
remote MTSO tells home MTSO
to phone: call forwarded to remote MTSO to closest base
from phone: call forwarded to home MTSO from closest base
new MTSOs can be added as load increases
in the internet
Very similar to mobile telephony
but outgoing traffic does not go through home
and need to use tunnels to forward data
Use registration packets instead of slotted ALOHA
passed on to home address agent
Old care-of-agent forwards packets to new care-of-agent until home address agent learns of change
problems
security
mobile and home address agent share a common secret
checked before forwarding packets to COA
loops
Error control
CRCs
detect
all single bit errors
almost all two bit errors
any odd number of errors
all bursts up to M where generator length is M
longer bursts with probability 2-m
implementation
hardware: with shift register
software: one's complement
pre-compute remainders for 16 bit words
add remainders to running sum
want to touch every byte only once
packet errors
loss
uncorrectable bit errors/overflow in buffer
duplication
insertion
solutions
per connection incarnation number
assign port numbers serially
assign sequence numbers serially
wait one Mean Packet Loss after boot up (30s-2min)
flushes old packets from the network
3 way handshake
C -> S : SYN(c)
S -> C : SYN-ACK ACK(c+1) SYN(s)
C -> S :ACK ACK(s+1)
reordering
sequence numbers
incremented for non-retransmitted packets
ACKs carry cumulative sequence number
size of sequence number
> max time can have an ACK * bit rate
> number of packets that can be reordered
> number of packets that can be lost
maximum packet lifetime
time to live (TTL)
NACKs aren't very useful
take up bandwidth, have to timeout anyway
timeout
if no ACK then resend
SYN-attack
forge unused IP address
send a long sequence of SYN packets to server
up to four minutes' resource allocation per SYN while server retires its SYN-ACKs
static scheme
know RTT before hand
dynamic scheme
measure RTT
timeout is a function of measured RTTs
old TCP scheme
![]()
![]()
a = 1 -> ![]()
a = 0 -> no history
new TCP scheme
![]()
![]()
![]()
error control windows
retransmission
go back N
selective
wastes header space (bitmap of received packet sequence numbers)
fast retransmit
assume cumulative ACKs
if see 1, 2, 3, 3, 3, retransmit (n+1) = 4
SMART
cumulative sequence number and
for packet sequence number
Flow control
considerations
send at rate which receiver and network can process
simplicity
overhead
scaling
fairness
stability
usually transport layer
classification
open loop
call setup
source describes desired flow rate
network admits call
data transmission
user sends within parameter range
network polices users
closed loop
source monitors available service rate
sends at that rate
due to speed of light delay, some errors occur
first generation
ignore network state, only match receiver
second generation
responsive to different types of state
state measurement
explicit
network tells its current rate
more overhead
implicit
endpoint tries to figure out by looking at network
control: two methods
transmission window used for error control and flow control
rate:
directly control the rate
no coupling of flow control and error control
point of control: two methods
hop by hop
first generation flow control at each link
end to end
sender matches all the servers on its path
hybrid
source asks for minimum rate
can send more if bandwidth available
traffic descriptors
traffic envelope
use for
traffic contract
input to regulator
input to policer
requirements
representativity
adequately describes flow so that network does not reserve too little or too much resource
verifiability
verify that descriptor holds
preservability
doesn't change inside the network
usability
easy to describe and use for admission control
peak rate
fixed size packet network: minimum inter packet spacing
variable size packet network: highest rate over all intervals of a particular duration
sensitive to extremes
average rate
jumping window
over consecutive intervals of length t, only a bits sent
regulator reinitialises every interval
moving window
over all intervals of length t only a bits sent
regulator forgets packets sent more than t seconds ago
linear bounded arrival process
number of bits transmitted < rt + s
r = long term rate
s is burst limit
leaky bucket
regulator for LBAP
three way trade off between
burst limit s
loss rate
long term rate r
large bursts mess it up
on-off
send on, off signals
stop and wait
send and wait for ACK before sending next
static window
can send multiple packets per RTT (whilst waiting for ACK)
bandwidth delay product/optimal window size
w > b*R
w is packet size of window
b is bottleneck bit rate
R is RTT
b changes, static doesn't work
DECbit flow control
every packet has a specific bit in the header
this bit is set by any of the intermediate servers if a queue has built up
sink copies the bit to ACK
if bit set then source reduces window size
oscillates around optimal state
additive increase, multiplicative decrease
TCP flow control
implicit
dynamic window
end to end
similar to DECbit without the router support
window starts at 1
exponential increase, “slow start”
doubles every RTT
each ACK results in window increase by one
linear increase, “congestion avoidance”
increase by one every RTT
window increases by one when number of ACKS = window size
loss detected with timeout or fast retransmit
Tahoe: both cases, drop window to one
Reno:
timeout: drop window to one
fast retransmit: drop window to half previous size
weaknesses
loss doesn't necessarily imply congestion
congestion doesn't necessarily imply self blame
NETwork Block Transfer (NETBLT)
rate based flow control scheme
separates error control from flow control, losses and retransmissions do not affect the flow rate
application data sent as a series of buffers
if received rate < sending rate multiplicatively decrease
packet pair
improves on NETBLT
assume all bottlenecks serve packets in the same order
ATM forum End to End Rate baled flow Control (EERC)
similar to DECbit, but send whole cell instead of one bit
Resource Management cell
source sends periodically (once every 32 cells)
has rate request
each server fills in RM cell with current share, if less
comparison to DECbit
sources know exact rate
non-zero initial cell rate
RM cells sitting in the data path is a mess
general points
flow control is easier with RR (round robin?) scheduling
explicit schemes are more robust
hop by hop schemes are more responsive but more complex
separation of flow control and error control is good
rate based schemes are inherently unstable
Common protocols
plesiochronous hierarchy
“nearly synchronous”
tight control of deviation from synchrony
output runs a bit faster always
overhead identifies bits from a particular stream
if a stream runs faster use overhead to identify it
incompatible hierarchies around the world
cannot switch bundles of calls
synchronous digital hierarchy
requires atomic clocks
SDH
9 rows, 90 columns
each payload container serviced in 125 microseconds
one byte = one call (works out, given 1000/125 payloads/second at rate 64kbps)
all overhead is in the headers
IP
properties
unreliable
best effort
end to end
fragmentation
if a packet reaches a link with a smaller Maximum Transmission Unit than the size of the packet can break the packet down into fragments and transmit those
reassembly happens at the receiving host
can set a bits to indicate
this packet is a fragment
do not fragment this packet
error multiplication: if error in any fragment have to retransmit whole packet
traceroute
router sends ICMP error packet if decrement TTL to 0
source sends packets with increasing TTL
ICMP
used to send error messages
destination unreachable
redirect
TTL exceeded
fragmentation needed, but “don't fragment” bit set
TCP
multiplexed
duplex
connection-oriented
reliable
flow-controlled
byte-stream
HTTP
request/response (stateless connection)
protocol is simple
browser is complex (and usually buggy)
cookies can violate statelessness
e.g. one cookie for a website and two windows with two shopping trolleys contending
ATM (datalink layer)
connection oriented
in-sequence
unreliable
QoS guaranteed
routing information
made up of
virtual path identifier (VPI)
virtual circuit identifier (VCI)
virtual paths
identified by the VPI
all VCIs for the same VP share path and resource reservation
set of VPs define a virtual private network (not the “tunnelled encrypted network over LAN”)
ATM Adaptation Layer (AAL – network layer/rest of the stack)
AAL 1
for synchronous apps
timestamps, clocking, sequencing
constant bit rate
forward error correcting
AAL 3/4
create a packet of whatever size, put on a frame header and trailer
then break down and put in ATM packets with a per cell AAL header
error detection, segmentation and reassembly
AAL 5
violates layering to be efficient
SSCOP
similar to TCP
no ACKs
sender polls, receiver sends status, including ACK and window size
poll interval is key variable
IP over ATM
treat ATM as a link-level technology
problems
ATM is connection oriented
different addressing schemes
ATM is point to point while IP assumes broadcast
IP encapsulation in ATM
put entire IP packet in AAL5 frame
is multi-protocol encapsulation
resolving IP addresses to ATM addresses
designate an ATM host as an ARP server
need to use inverse ARP to get the layer 3 address of the client (which is needed to make the connection)
creating an ATM based IP subnet
if all hosts on an ATM are on the same IP subnet then broadcast reaches all (causes congestion)
partition into logical IP subnets
this can make longer paths between hosts which are actually attached at ATM level, as if it's outside your logical subnet you have to send your packets via the router
next hop routing
next hope server stores IP to ATM translations independent of logical subnet boundaries
like DNS
resolving multicast addresses
maintain a set of endpoints that correspond to a particular class D address
Multicast Address Resolution Server keeps this
LAN emulation
used to make IP think it's working over Ethernet or token ring
translate from MAC address to ATM address
need to emulate broadcast
Cells in Frame (CIF)
previous solutions required ATM host-adapter card
can use Ethernet card:
encapsulate AAL5 frame in Ethernet header on point to point Ethernet link
CIF attachment device at the other end decapsulates and injects frame into ATM network
holding time problem
to send an IP packet have to open an ATM connection – when close it?
locality principle, more packets are likely soon after some others have been sent
optimal solution depends on pricing policy and packet arrival characteristics
use measurement based heuristics
close connection if expected cost exceeds expected benefit
Protocol Implementation
depends on
structure
environment
structure
partitioning of functionality between user and kernel
trade-offs
customisability
security
performance
partitions:





separation of layer processing (interface) strategies
single context
...
tasks
upcalls
environment
data copy cost
interrupt overhead
context switch time
memory access latency
cache effects
rules of thumb
fine tune inner loops
beware of data touching
send largest packets possible
use hardware
Multiple access problem
how should participants coordinate actions so that
the number of messages exchanged per second is maximised
the time spent waiting for a chance to speak is minimised
contexts for the problem
broadcast transmission medium
colliding messages are garbled
main contexts
wired broadcast LAN
wireless
packet radio
cellular telephony
satellite communication
solution
choose base technology
choose how to allocate limited transmission resource to contending users
choices
centralised vs. distributed design
moderator?
Centralised: one is master, other are slaves
distributed: all are peers
circuit mode vs. packet mode
do stations send streams or bursts?
circuit mode – allocate resources to stream
packet mode – contend for resources with each packet
constraints
in real life propagation is not inverse square law
spectrum scarcity
radio link properties
error proneness
multipath interference
hidden terminals
parameter 'a'
number of packets sent by a source before the farthest station receives the first bit
performance metrics
normalised throughput: fraction of link capacity used to carry non-retransmitted packets
mean delay: time station has to wait before it successfully transmits a packet
stability
with a heavy load is all the time spent on resolving contentions?
if infinite uncontrolled stations share a link then instability is guaranteed
fairness (arbitrary definition)
base technologies
frequency division multiple access (FDMA)
each station has its own frequency band separated by guard bands
number of frequencies is limited
reduce transmitter powers and reuse frequencies in non-adjacent cells
transmission division multiple access (TDMA)
send on same frequency at different times
need time synchronisation
pros
can give different amounts of bandwidth to different users
can switch off power when it's someone else's slot
cons
synchronisation overhead
multipath interference on wireless links
code division multiple access (CDMA)
frequency hopping
send at a different frequency in each time slot
direct sequence
convert a single bit into a code
pros
hard to spy
immune from narrow band noise
all cells can use all frequencies
cons
implementation complexity
need large contiguous frequency band for direct sequence
method to convert wireless medium to duplex channel
frequency division duplex (FDD)
uplink and downlink use different frequencies
time division duplex (TDD)
uplink and downlink use the same frequency at different times
centralised schemes
slave can only transmit when master permits
natural fit in some solutions
wireless LAN: base station is the only station that can see everyone
cellular telephony: base station the only one capable of high transmission power
pros
simple
cons
single POF (and need to have re-election procedure)
master involved in every transfer (causes overhead for shall transmissions)
circuit mode
when station wants to transmit sends message to master using packet mode
master allocates resources to slave which uses until is done
no contention during transfer
also known as reservation based schemes
when 'a' is large can't used distributed schemes (too many collisions)
e.g. satellite links
stations contend for packet slots for reservation messages
master then allocates main channel based on the reservation messages
polling
master asks each station if it wants to send
can be reasonably efficient – ask those who have been sending more frequently
probing
stations are numbered with consecutive logical addresses
master does binary search to locate next active station
distributed schemes
decentralised polling
each station assigned a slot it can use
if nothing to send the slot is wasted
need a protocol to get time synchronisation
decentralised probing
probe in a pre-established order
works well if 'a' is small
works poorly if many active stations or if all active stations are on the substation
CSMA(carrier sense multiple access)
an improvement on the more simple distributed multiple access schemes
check whether the medium is active before sending a packet
don't have to wait for a time slot or permission from a master
if the medium is free then can transmitter
if a collision happens then detect and resolve
it works well when 'a' (the number of packets sent by a source before the farthest station receives the first bit) is small
possible solutions to the collision problem
p-persistent
on idle, transmit with probability p (hard to choose)
if too p is small then wasted time
if p is large then more collisions
exponential backoff
on a collision choose a timeout randomly from a range which doubles each time
don't need to choose a p
in basic CSMA there is no collision detection in the transmitter, it is detected as a frame error (indistinguishable from any other) in the receiver and some error control protocol is used
CSMA/CD
the transmitter has some scheme to detect if there has been a collision on detection
method
wait until medium is free
start transmitting
check for collision after each bit transmitted
if detected
transmit a jam signal
waits for time interval set by exponential backoff
CSMA/CA
used in wireless LANS
can't detect collision because the transmitter overwhelms co-located receiver (think radio: shouting very loud then listening for a whisper – cannot be listening for the whisper of someone else whilst shouting)
need explicit acks
have to reduce collisions as collisions are more expensive
method
check to see if medium is busy
if not then wait interframe spacing
set a contention timer to an interval randomly chosen in the range [1,CW]
on timeout send packet and wait
if no ack
assume packet lost
try again with a doubled CW
if another station transmits while counting down, freeze CW and unfreeze after the other station finishes transmission
this only works when every station can receive transmissions from all other stations
hidden terminal: some stations in the area cannot hear transmissions from others, though the base can
exposed terminal: some, but not all, stations can hear transmissions from stations outside the local area
methods of dealing with hidden terminals
busy tone multiple access (BTMA)
use separate frequency to send out a “busy tone” when receiving
multiple access collision avoidance
BTMA requires split frequency band (more complicated receivers)
instead use explicit messages to indicate when receiver is busy
transmitter sends Request To Sent message
if receiver is free it will send back Clear To Send
if transmitter receives a CTS it sends the data
Ethernet
uses CSMA/CD with exponential backoff
on collision place jam on wire
small 'a' so time wasted in collision is minimised
maximum packet size
1500 bytes
given two clocks at the same frequency they will drift
this is the max no of bits which can be safely sent without missampling
do so don't have to implement phase locked loop
two part bridge is technically a switch
getting faster: not “how to do” but “how to do cheap enough for PCs”
to avoid needing QoS guarantees can just increase bandwidth
doesn't work past 100 Gbps
pros
easy to set up
requires no configuration (has built in unique hardware address)
can segment LANs to reduce load
cons
at heavy loads users see large delays due to backoff
non-deterministic service
big overhead on small packers
token passing
improve on polling
token gives right to transmit
if don't want to transmit pass on the token
prevents starvation
when receive a data packet always pass it on (unless you were the sender)
if you are the receiver set the ACK bit
pros
medium access protocol is simple and explicit
don't need to carrier sense/synchronise etc.
no collisions
cons
token is single POF
all stations must communicate
stations must actively monitor the network
normally need one station as a monitor
Fiber Distributed Data Interface
dual counter-rotating rings
supports both realtime and non-realtime traffic
ALOHA
just send the packet
if there's no ack try again after waiting a random time
no backoff
pros
useful when 'a' is large
simple
cons
at high loads collisions are very frequent
sudden burst of traffic can lead to instability
Slotted ALOHA
use discrete timeslots (can only start transmitting at the start of a slot)
decreases possibility of contention
reservation ALOHA
contend for reservation minislots using slotted aloha
stations independently examine reservation requests and come to consistent conclusions
pros
supports circuit and packet mode transfer
works with large 'a'
simple
cons
cannot pre-empt hogs
Switching
packet vs. circuit switches
packets have headers
samples don't (just an identifier)
connection oriented vs. connectionless
connection oriented need call setup
setup handled in the control place
goals
maximise capacity (subject to cost/reliability constraints)
capacity of switch is maximum rate at which it can move information
minimise call blocking
only for circuit switch
circuit switch must reject a call if can't find a path for samples
minimise packet loss
only for packet switching
packet switch must reject a packet if can't find a buffer to store it
don't reorder packets
multiplexor
for N identical inputs, trunk runs at N times the speed of one input
demultiplexer similar
neither mux nor demux need addressing information
always in same place on trunk so know which is which channel's by time division mux
inverse multiplexing – take high bit rate stream and scatter it across multiple trunks
circuit switching
move 8-bit sample from input port to output port
generally inputs trunks are multiplexed (rather than 200 000 individual inputs)
multiplexed trunks carry frames (sets of samples)
extract sample from frame, depending on position in frame, switch to output
call blocking
internal blocking: there is an output slot but no path
output blocking: no output slot available
transit switch
can put sample in one of several slots going to desired next hope
reduces output blocking
Time Division Switching

when demultiplexing position in frame determines position in output trunk
write n bytes from incoming trunk in order into n memory slots
read slots out in a permuted order
limited by read/write speed of memory
Space Division Switching
each sample takes a different path through the switch depending on its destination
crossbar
for multiplexed input need switching schedule
multistage crossbar
can save crosspoints if a crosspoint can attach to more than one input
have more than one crossbar in the system
you can have more than one path to the output

time space switching
use TSI for first stage, with crossbar for the middle and final stages
delay samples so that they arrive at the right time for the space division switch's schedule
time-space-time switching
use TSI for first and final stages, with crossbar in the middle
replace n input * k output space switch by TSI switch that takes n-slot input frame and switches it to k-slot output frame
packet switching
datagram: lookup based on the entire destination address
cell: lookup based on VCI
repeaters: physical level
bridges:
datalink layer
based on MAC addresses
discover who is on which side by listening
routers
network level
participate in routing protocols
application level gateways: treat entire network as single hop
deciding on output port for a datagram
(in circuit switching can do using table lookup as have stored during call setup)
need to find longest prefix match
standard solution: use a trie
a sorted tree data structure
can improve performance by
caching more recently used addresses
move common entries to a higher level (i.e. match on longer strings)
blocking (internal or output)
must buffer or drop
deal with by
over provisioning: internal links much faster than inputs
buffers at input and output
back pressure: if switch fabric doesn't have buffering, prevent from entering until path available
switches
first generation
basically like having a load of network cards in a PC

second generation
port mapping intelligence in line cards
bottleneck: bus or ring
third generation
the self routing fabric provides parallel paths
contention is for the output buffers
potential for unlimited scaling
switch fabrics
is for transferring from input to output ignoring scheduling and buffering
ways of doing
crossbar
need to compute schedule on the fly (unless fixed size in known arrival pattern)
buffered crossbar
buffer the crosspoints
broadcast
packets are tagged with an output port number and sent to all ports
need to match N stages in parallel at each output port
only useful as part of a switch or in a small one
switched fabric element fabrics
switched fabric element
if 0, send packet to upper output
else to lower output
if both packets to same output buffer or drop
properties
NxN switch with bxb elements has logbN elements
self routing
recursive
can be synchronous or asynchronous (i.e. can use for variable sized packets)
Banyan
if two packets want to go to the same output there is output blocking
avoiding blocking
check path is available before sending
or
sort by output
merge sort in hardware
remove duplicates
recirculate to the beginning
or run output of trap to multiple Banyans
remove gaps
perform a perfect shuffle
split set in half and interleave
i.e. 1 2 3 4 5 6 7 8 -> 1 5 2 6 3 7 4 8
packet size
small fixed size
easy to build large parallel fabrics
more overhead – overhead dominates at high speeds
variable size
can fragment or build asynchronous fabric
hence variable size not hurt too much
buffer placement
need buffers to match input rate to service rate
input buffering (queueing)
need arbiter to say which queue's packet for output K can go
head of line blocking
if a 1 had been chosen from a different input then the 3 on this output could go through the fabric but is trapped
solution:
multiple queues per input
arbiter must choose which queue an input port sends
but multicast is usually pathological for this
output queueing
output queues have to run faster than the trunk speed
knockout principle
use to reduce speed required
drop extra packets, fairly distributing loss between inputs
buffered fabric
buffers in each switch element
pros
only need to be as many times faster than trunk speed as the degree of the fan in
if you use hardware back pressure you can reduce the buffering requirements
if the arbiter knows the buffers are full it can prevent a certain output number from entering the fabric
cons
costly
hard scheduling
multicast switches
do in hardware
port mapper has a list of outputs
incoming packet must be copied to these output ports
sub-problems
generating and distributing copies
implicit
multiple outputs enabled after placing packet on shared bus
bus/ring based, crossbar or broadcast switches
explicit
copy a packet at the switch elements
use copy network
header (VCI) translation for copies
normally can do VCI translation at input or output
for multicast
input maps VCI to output ports
output swaps the VCI
Integrated services
problems with IP
data transfer
no recognition of flows
connectionless – no signals
forwarding
per datagram
no examination of type of traffic, no priorities
e.g. file transfer vs. VoIP
routing
no fixed path, so no fixed QoS
scheduling in the routers
FCFS – no examination of type of traffic
requirements for an Integrated Services Network
QoS service-level
packet handling
traffic description
policing
application flow description
Service interface
common data structures and parameters
signalling protocol
admission control
check request can be honoured
scheduling
packet classification
prioritisation of traffic
queue management
Traffic and QoS parameters
generally a network is three level structure
access network
local
low multiplexing and volume of traffic
distribution network
medium volume
low multiplexing
core network
high multiplexing and volume of traffic
mixed traffic in the network
routers use FCFS queues
no knowledge of traffic patterns or applications
traffic is often aggregated in complicated ways in the core
packet size
large
good for general data
“router friendly”
slows real time traffic
small
good for real time data
less end-to-end delay (defined using:
propagation
transmission (data rate)
buffering
application specific processing)
better tolerance to loss
less jitter (defined using:
jitter is variation in end to end delay
causes
FIFO queuing
dynamic path changes, etc.
only critical in TCP if it is a first order effect
critical to voice, Video over IP etc.
less efficient due to overhead/data ration
not “router friendly”
end to end loss
non real-time
retransmit (as in TCP)
real-time
forward error correction
media specific “fill in” at the receiver
reordering can be seen as loss
causes
packet dropping at routers
traffic violations
excessive load
link/router quality
end to end data rate
short term changes: during the course of a flow
long term changes: during the course of a day
goodput – residual throughput after allowing for overheads (inc. retransmission)
affected by
protocol behaviour (TCP congestion control)
dynamic routing
congestion
aggregation
link multiplexing
elastic (non-real time) applications
asynchronous: e.g. email
interactive: http, ftp, etc.
inelastic (real time) applications
streaming voice
non interactive
end to end delay and jitter not important
data rate and loss very important
real time voice
QoS parameters for the internet
delay
cannot set a maximum, but can measure it (Dmax)
jitter
cannot set end-to-end jitter value
![]()
loss
not really QoS parameter for IP networks
linked to data rate
packet size
restrictions
may be used by routers for buffer allocation etc.
data rate – specify using
leaky bucket
parameters
B: bucket size in Bytes
L: leak rate in Bytes/sec
data poured into bucket
maximum latency is B/L (i.e. how long it would take from the top of the bucket)
token bucket
parameters
b: bucket size in Bytes
r: bucket rate in Bytes/sec
p: peak rate in Bytes/sec
bucket fills with tokens at rate r
presence of a token allows a data transmission
transmission is allowed at a rate up to p
real time media flows
audio
low loss preferable
data rate basically fixed for certain applications
video
loss must be low
data rate dependent on
frame size
colour depth
frame rate
encoding
QoS services and application level service interfaces
INTSERV (an integrate services network architecture)
this is an architecture used to guarantee QoS on networks
is a fine-grained QoS system
concept:
every router in the system implements IntServ
every application that requires some kind of guarantees has to make a reservation
"Flow Specs" describe what the reservation is for (describe service semantics)
"RSVP" is the underlying mechanism to signal it across the network
service templates/“flow specs”, consists of
TSpec: allowed traffic pattern, normally bucket parameters
RSpec: service request specification
policing:
check that the TSpec is adhered to
if violated can change the packet handling (degrade service etc.)
RSVP
used for signalling resource requests between user, server and network
receiver oriented: receiver of a flow initiates and maintains the resource reservation for that flow
every 30s or so each server which can send QoS data sends out a message to see if any clients are interested
if a client is interested it sends a message back indicating what it wants and what guarantees it needs
each router along the way decides if it can deal with the resource request
if not send back a cancellation message
if so keeps soft state of reservation options
routers then police according to that policy, dropping the flow after a timeout
is flow based (vs. class based) – treat each flow differently
problems
lots of state being held
lots of refresh messages clogging up the system
applications and routers need to be RSVP aware - what about legacy applications?
DIFFSERV
is a coarse-grained QoS system
is class based (vs. flow based)
data packet placed in one of a limited number of traffic classes
doesn't differentiate network traffic based on the requirements of an individual flow, but on the class of a packet
each traffic class can be managed differently
the packets are assigned to classes by the network (i.e. the ISP)
this ensures user cannot always ask for highest QoS without paying
also means that the end system is unaware that or how this happens
good for legacy applications
per hop behaviours (traffic classes)
expedited forwarding: low loss, low latency traffic
assured forwarding: has varying levels of drop precedence
problems
going from one DIFFSERV cloud to another (one ISP to another), one cloud's gold packet may be another's bronze
no standards for per hop behaviour types or how to treat them
INTSERV and DIFFSERV?
INTSERV is good for per-flow reservations
DIFFSERV is good for aggregate, per customer/application/group etc.
run INTSERV reservations within DIFFSERV flows?
RTP
is carried in UDP packets
supports multicast natively
uses RTCP to permit application level control/signalling
Real Time Control Protocol (RTCP)
uses signalling channel to send data about the flow
this will affect how the sender and receiver act as entities
is not directly applicable to specific packets
it provides QoS info for deciding how to shape the flow
loss, delay, jitter
user information
etc.
application level signalling
user to network
Telco network
separate data and signalling path
IP
signalling information carried on the data path
RSVP carried in IP packets along data path
need aggregated signalling towards the core, e.g. using INTSERV with DIFFSERV?
i.e. can't keep individual
user to user signalling
call/session set-up
use to exchange each's capabilities
application level signalling supported by network
summary
need QoS mechanisms for UP
per flow
INTSERV
RSVP
does not scale well
customer/provider services
DIFFSERV
Scheduling and queue management
traditional routing behaviour
per packet, no idea of flows
generally FCFS, no priority traffic
this basic behaviour won't support QoS, need to modify it
simple router schematic:
the input lines have no input buffering
uses policy to classify packets
different classifications go in different output buffer for each output
scheduler decides which buffer to schedule next
FCFS in this scheme
use a default classifier
queue packets to outputs in order they arrive
service a packet as soon as possible after it arrives
work conservation
a system is work conserving if it is not idle when there is work to do (i.e. packets waiting)
we cannot give all flows a lower delay than they would get under FCFS and we have the equation:
![]()
where
ρn = λn . μn
ρn is the mean link utilisation
qn is the mean delay due to scheduler
λn is the mean packet rate (packets per second)
μn is the mean per-packet service rate (seconds)
C is a constant
non work conserving schedulers
these are permitted to be idle even if packets are available
wait until a packet is eligible for transmission
advantages
permits traffic smoothing for flows
makes downstream traffic more predictable
less jitter
possibly can get away with less buffer space
disadvantages
higher end to end delay
complex to implement
requirements of a scheduler
simple to implement
it needs to be implemented on high speed networks
want little complexity so can have a fast (hardware) implementation
fairness
protect any flow from the misbehaviour of any other
local fairness generally implies global fairness
performance bounds
on jitter, delay, loss ...
can we do it so it's deterministic, provable, per flow?
Admission control
if it is required it needs to be fast, simple and efficient
fairness: max-min fair share criteria
this is an algorithm to provide fairness to protect flows from each other
algorithm
find the flow with the smallest demand for the resource, from a pool of n flows
give it as much as it wants (no more), up to R/n where R is the remaining resource available
i.e. flows which can't have as much as they want get an equal share of the resource available
remove the flow from the pool of flows, decrement the amount of resource available
go to 1
mn = min(xn , Mn ) 1 <= n <= N

where
C is the total capacity of the resource
mn is the actual resource allocated to flow n
xn is the resource demanded by flow n
Mn is the potential resource available to n
you can do weighted max-min share
parameters in a scheduler
number of priority levels
how to serve the different levels
especially to prevent starvation of lower levels
is jitter/delay control for some levels
this could make it non work conserving
aggregation: will priority be decided
per flow?
per application?
per user?
how to service within a given level's queue
possible schemes
simple priority queueing
k queues
queue (k+1) has higher priority than queue k
generalised processor sharing
weighted max-min share
serves an infinitesimally small amount of data from flow i
visits flows round-robin
is unimplementable but used as a standard
relative fairness: fairness of a scheduler with respect to other flows it is servicing
absolute fairness: fairness of a scheduler compared to GPS for the same flow
weighted round robin
different flows will have different mean packet sizes
so to help make more fair divide the weighting coefficient by the mean packet size
this can introduce unfairness if the mean packet size is estimated incorrectly
service is fair over long time-scales
deficit round robin
each queue has a (deficit) counter dc which is initially zero
scheduler goes through the flows in order
attempts to serve the queue if
size <= quantum + dc
if it does serve it then it sets
dc = quantum + dc – size
otherwise it sets
dc = quantum + dc
the quantum is a 'credit' which builds up each round
it is normally the maximum expected packet size
weighted fair queueing
is a GPS emulation
estimates the time the packet would have completed under bit by bit (i.e. discrete) GPS
packets are tagged with this finish number
the smallest finish number across the queues is served first
buffer drop policy
drop packets already queued in order of decreasing finishing number
can provide
“best effort” queueing
guaranteed data rate and deterministic end to end delay
Class based queueing
is only a queueing mechanism (still needs a scheduler)
link capacity is shared
there is class based allocation
a node can “borrow” spare capacity from a parent
queue management
packet dropping
policies
drop from tail
drop from head
better for TCP
random drop
fair if all sources are behaving
flush queue (drop all packets)
intelligent drop
needs lots of higher level state information
reactions of end systems to packet drops
non-real time – TCP
backs off
real time – UDP
application level control flow required
real time data flows may not be able to adapt
hence this hurts flows which do adapt (as they have to adapt for the non-adaption of other flows as well as for the general congestion)
congestion control
random early detection
method
monitor the average queue size and drop packets based on statistical probabilities
if the buffer is almost empty, all incoming packets are accepted
as the queue grows, the probability for dropping an incoming packet grows too
when the buffer is full, the probability has reached 1 and all incoming packets are dropped
any time a packet is dropped the sender is notified and hopefully slows down
this process should reduce the cycle of
the router becoming full (the network is congested)
all incoming packets are dropped
all incoming flows reduce flow rate drastically
the network becomes underutilised
all flows build up again at roughly the same rate
repeat
explicit congestion notification
works using IP, only if both ends signal that they will use it
on congestion a router sets a bit in the IP packet
in the ACK the received indicates this congestion
the received must act (in TCP terms) as if the packet was dropped (in terms of reducing flow, not in terms of retransmission)
we really need adaptive sources
issues
fairness and protection
scalability
granularity
speed of operation
flow adaptation
aggregation
signalling
router state
ECN?
Routing for integrated services
routing requirements for integrated services
multiparty communications
multicast
multi-user games
etc.
support for QoS in routing
QoS based routing
traditional (telephone) routing
the destination determines the path
there is only one optimal path to the destination
properties of QoS routing
multiple paths possible
different paths have different QoS properties
routing updates include QoS parameter information
can do in IP header
not widely used – no global agreements
there are lots of metrics available to determine QoS, depending on what you want
possible metrics
minimum delay path
maximum throughput path
maximum reliability path
minimum cost path
if the router is having to deal with choosing a path for a packet based on its indicated preferences there will be a lot of overheads
larger router updates (as QoS parameter information is passed in updates, as above)
more state is stored in routers
more processing per packet
there can be advantages to staying on the same route, even if a “better” one becomes available, and there are a couple methodologies to do this
route pinning
ensures that, whilst there are still packets coming from A to B, in any flow, that they all follow the same route
this can cause congestion (as it defies normal routing ideas of taking the best route for each packet)
path pinning
ensures that all packets from A to B in flow F go along the same route
if a new flow starts it may take another route
this can be problematic if there is a long lived flow and the route has since become a much worse choice
intra domain QoS routing
can use agreed single/multiple metrics
need to indicate any disruptions to QoS along a path
need to support best effort traffic
is really a research issue
inter domain QoS routing
needs to be scalable
QoS routing shouldn't be very dynamic
don't want lots of updates
shouldn't constrain intra-domain QoS routing
QoS routing for multicast
has to support
widely dispersed groups
dynamic membership changes
must scale across domains
is really hard...
Traffic management
this is a set of policies and mechanisms that permit a network to efficiently satisfy a diverse range of service requests
needed for next generation data networks
economic principals
utility function
it is assumed that there is a function from a given quality of service to a utility
utility in this case is “level of satisfaction”
this function is determinable by observing how rational users act
e.g. a file transfer
u is utility from file transfer
S is satisfaction from a infinitely fast transfer
a is rate at which satisfaction decreases with time
t is transfer time
u = S – a . t
where
NB, if t > S/A then the user is worse off (negative utility)
social welfare
social welfare is maximised when some combination (normally summation) of all the utility functions is maximised
efficiency: a network is efficient when increasing the utility of one user must necessarily decrease the utility of another
envy-free: a network is envy-free if no user would trade places with another (NB better performance costs more)
attempt to maximise social welfare, envy freeness and still make a profit
a single network which provides separate QoS is better than separate networks for each QoS
can share unused capacity
lowering delay of delay-sensitive traffic increases welfare
will the person downloading program really notice an extra couple seconds, verses VoIP
normally welfare increases more than linearly with an increase in capacity
applying the principals
it's better to give 5% of the traffic low delay than all the traffic slightly lower delay
the big question
if we took the money we spent on attempting to control the traffic and just blindly increased capacity, would we be better off?
traffic models
to do the above we need a traffic model of how the users behave
can either measure or guess
telephone traffic models
time between calls is an exponential distribution (Poisson process)
length of calls is an exponential with a heavy tail (significant numbers of calls are very long)
internet traffic models
LAN connections differ from WAN connections
many parameters are heavy-tailed (a few calls are responsible for most of the traffic)
traffic classes
traffic classes encompass both
user requirements
what the network service offers
e.g. telnet requires low bandwidth and low delay
basic classes
guaranteed service
utility is zero unless an application gets a minimum level of QoS
that QoS could be in one or more of bandwidth, delay, loss, jitter ...
typically synchronous and real-time
best effort
typically asynchronous and indifferent to real time
e.g. email
ATM guaranteed service subclasses
constant bit rate
mean and peak rate are the same
variable bit rate
long term average with occasional bursts
try to minimise delay
e.g. constant quality variable bandwidth audio
available bit rate
user gets whatever is available
unspecified bit rate
like ABR but without feedback
time scales
actions are taken at different rates during a call
some once per call (e.g. resource requests)
some every few RTTs (e.g. feedback flow control)
some very rapidly (e.g. policing)
traffic management mechanisms need to deal with a range of traffic classes at a range of time scales
mechanisms
renegotiation
static per-call descriptors doesn't make sense for many traffic sources
e.g. if the network serves a real time video stream at slightly less than then agreed average rate it will need a large amount of buffering space to keep up
renegotiating service rate every 10 sec gets the bandwidth requirement to nearly average rate
note that renegotiation does cost in terms of overhead
renegotiated constant bit rate (RCBR) service
all traffic sent as constant bit rate
CBR is renegotiated if necessary
buffers are at the edge of the network
is easy to price
signalling
this is a source signalling its utility function to the network, in two parts
how to carry the message (transport)
how to interpret it (semantics)
how the signalling works
setups are sent and acknowledged per hop
resources are reserved tentatively
is hard due to complex services, feature interaction, performance tradeoffs...
resource translation
the application wants an end-to-end quality
this needs to be translated into per-hope requirements
two pass approach
forward: attempt to get as high a QoS at each hop as possible
reverse: relax the actual resources reserved so that it fits the required end to end quality
signalling on the Internet: RSVP
the main motivation behind this is to efficiently support multipoint multicast with resource reservations
multicast reservation styles
naïve (source initiated)
source contacts each receiver in turn
intelligent multicast (merge replies)
the source needs to know all receivers and the rate at which they can absorb
there are two messages per link of the spanning tree
doesn't scale
naïve multipoint multicast
two messages per source per link
can't share resources among multicast groups
RSVP
receiver initiated
state on reservation is kept per group, not pre connection
admission control
constant bit rate admission control
simple
on failure try again
best effort admission control
trivial
variable bit rate admission control
the peak rate differs from the average rate
need to decide how much over the average rate to book
trade off between wasting bandwidth or risking dropping packets in a burst
options
peak rate admission control
reserve at the connection's peak rate
pros
simple
zero fluid delay and zero loss
cons
wastes bandwidth
worst case admission control
source is described using
average rate
burst size
use weighted fair queueing or rate controlled discipline to reserve bandwidth at an average rate
pros
may use less bandwidth than with peak rate
can get an end-to-end delay guarantee
cons
complex
admission with statistical guarantees
as the number of calls increases, the probability that multiple sources send a burst decreases
the sum of connection rates is increasingly smooth
with sufficient sources all can be assumed to be arriving at their average rate
traffic from a source is sent to a buffer which is drained at rate e
if a burst is sent, its delay goes up
for too large a burst packets are dropped
equivalent bandwidth: the rate at which we need to drain the buffer so that the probability of loss is less than i and the delay in leaving the buffer is less than d
many sources will generally share a buffer
admission control method in this case
a request from a source arrives
use its performance requirements and network state to assign it an equivalent bandwidth
the sum of equivalent bandwidths should be less than the link capacity
pros
can trade off a small loss probability for a large decrease in bandwidth reservation
can obtain delay bounds
cons
assumes uncorrelated sources
hairy mathematics
measurement based admission
for traffic which cannot easily describe itself
a metric is obtained by measuring actual average load
the user gives a peak that it expects
if average + peak < capacity then admit
problems:
assumes past performance is indicative of future performance
when to 'forget' about the past?
peak load pricing
service providers want to
avoid overloading
use all available capacity
traffic has a periodic daily distribution showing peaks
“price” can be used to move some traffic to off peak times
“price” is a signal to the user from the provider about network preferences
in deciding pricing have to create utility function to show
the weight the network places on using capacity and on revenue
the weight the user places on cost and on lack of overload
rational users help the system by helping themselves
capacity planning
this is the design of modifications to
network topology
link capacity
routing choices
to most efficiently use existing resources or to alleviate long term congestion
approach
measure the network during its busy hour
measure traffic from end points to end points
we're just modifying the internal network, not end points
find the busiest hour by measuring in general a week or so
add a “fudge factor” to this rate for future growth
create traffic matrix
this is the number of bits sent from each source to each destination
assume this pattern will predict future behaviour
decide topology, which depends upon:
connectivity between arbitrary points
geographical considerations – some links may be easier to build than others
existing capacity
assign capacity
assign sufficient capacity to carry the busy hour's traffic
however routes are decided dynamically using link and congestion status
it is easier to assign capacities if routing is static and links are always up (telephone network)
some open problems
resource translation
we can do this for bandwidth
doing it for delay is hard
can either
translate delay into an equivalent bandwidth
obtain per hop delay bounds
renegotiation
static descriptors aren't that good for dynamic traffic
renegotiation is not free as there s signalling overhead
problems:
when to renegotiate?
admission control?
what to do on renegotiation failure?
measurement based admission
over what time interval to measure the average?
peak load pricing
how to choose prices for on and off peak
when peak times should end
capacity planning
simultaneously choosing topology, link capacity and routing metrics as routing and link capacity interact
smart vs. dumb
getting low delay for VoIP means giving low delay for all traffic
but if we took the money spent on traffic management and increased capacity we'd get the lower delay?
Miscellanea
scaling/complexity
all mechanisms added to IP have some cost
we would like this cost to be O(c)
normally it's O(n) in
number of end systems
number of routers
per flow queues
there will be a data structure for every active pair of senders/receivers
per class queues
data structure per class
edge systems have to do policing and shaping
may get O(log(n))
stability
traffic, both data and control, should be stable
we need to do a theoretic analysis to figure out if it is
oscillatory should still converge or be bounded
design to cause stability
end to end congestion system: damped feedback
routing systems: randomised timers
we don't want to do alternate path routing and admission control on the same timescales
mobile issues
wireless links
throughput, delay and loss can vary on wireless links
offering hard QoS is hard
it may be coordinatable if different layer two media can cooperate and do handoff properly
patterns
mobile access patterns may be different from fixed ones (we don't know)
mobile multicast with a moving source or sink could be complex (need to rebuild the tree)
resources
some QoS approaches are based on the network being essentially underloaded
expedited forwarding and assured forwarding only work for VoIP if it's a small percentage of the traffic
this isn't the case on many wireless links
pricing
you have to charge for QoS
how much are users prepared to pay?
how to stop the rich from buying all the bandwidth and resell it at a different price?
normally access fee can cover the cost of infrastructure
the bill is an incentive scheme to stop users hogging capacity
varying the price with network conditions – congestion pricing
is theoretically optimal
generally too complex for users
predictable bills are often more important than cheapest fare
provisioning
users don't like being refused access
we need to dimension the network to balance user satisfaction and revenue levels
implementation novelties
active networks
allows the packet to provide code to execute on routers and nodes in the network
was investigated as it was becoming hard to deploy new services in telephony and IP networks due to the scale of the infrastructure
can reprogram the switch on the fly (possibly per packet)
multi protocol label switching (MPLS)
getting from the source to the destination(s) as fast as possible
not quite the same as QoS provisioning
is intended to replace ATM (similar technology)
flows are identified and labelled
label switching
packet enters label switching router (LSR)
destination and new label looked up in table
new label applied, sent to output port
labels
packets that share the same next hop share the same local label
that label is determined by a downstream label switching router
routing information is used to distribute labels
piggy back the label information onto existing protocols
label stacking
it's like source routing
the LSRs along the way pop off the next label on the stack
makes it even faster
performance issues
router architectures
fast route table lookup
fast packet classification (for QoS)
better address aggregation
traffic engineering (differentiated services)
faster boxes or smarter software? (soft vs. dumb)
if you push something out of one part of the architecture, it will show up somewhere else