Computer Systems Modelling

 

 

Modelling

  • why model?

    • investigating the use of alternatives with different characteristics

    • modifying variables to fix a badly performing system

  • utilisation: proportion of the time that the resource is busy

    • may have differing desires

      • user wants low utilisation to get QoS

      • provider wants high utilisation to make it cheaper per customer

  • techniques

    • measurement

    • simulation

    • queueing theory

    • operational analysis (not covered)

  • chose technique by

    • time available (measurements/simulation takes a long time)

    • resources available for the analysis

    • desired accuracy

 

Little's result

α(t) = number of arrivals in (0, t)

δ(t) = number of departures in (0, t)

N(t) = α(t) – δ(t) is the number in the system at t

γ(t) = the area between α(t) and δ(t) representing the total time all customers have spent in the system

let

λ(t) = α(t)/t the average arrival rate during (0, t)

T(t) = γ(t)/α(t) the average time each customer spends in the system

N(t) = γ(t)/t the average number of customers

these combine to

N(t) = λ(t).T(t)

taking the limits of λ and T as they tend to infinity we get

N = λ .T

where

N is the average number in the system

λ is the average arrival rate

T is the average time in the system

  • NB Little's result can be applied to subsections of a system, i.e. server and queue

 

 

P(A | B) =

 

 

moments:

  • the nth moment of a probability distribution is

 

  • the nth central moment of a probability distribution is

  •  
    • variance is the second central moment

 

Coefficient of variation

 

Exponential distribution

fX(x) = { λ . e -λx if x > 0

{ 0 o/w

FX(x) = { 1 - e -λx if x > 0

{ 0 o/w

μ = 1 / λ

σ = 1 / λ2

 

Memoryless property

P( X > t + s | X > t) = P(X > s)

 

Gamma distribution

fX(x) = { if x > 0

{ 0 o/w

where

n = 1, 2, 3 ...

λ > 0

μ = n / λ

σ = n / λ2

 

Normal distribution

with

FZ (x) = Θ(x)

FX (x) =

 

Central limit theorem

for

Xi independent, identically distributed random variables with mean μ and variance σ

 

 

Bernoulli distribution

P(X = x) = { p if x = 1

{ 1-p if x = 0

 

Binomial distribution

  • the number of successes in a fixed length sequence of independent Bernoulli trials

P(X = x) = nCx . px . (1 – p)n – x

 

Poisson

μ = λ

σ = λ

 

 

Geometric distribution

P(X = n) = p.(1 – p)n – 1

μ = 1/p

σ = (1-o)/p2

 

Poisson process

  • this a process of events occurring at random points in time

  • the number of events in disjoint time intervals are independent

  •  
    • homogeneous Poisson process (if λ is a constant):

      • the number of events in an interval depends only on the length the interval and not its location

      • if X1, X2 ... are the inter-arrival times between events

P(Xi+1 > t | Xi = s) = P(0 events in (s, s + t] | X1 = S)

= P(0 events in (s, s + t]) by the memoryless property

= e -λτ

  •  
    •  
      •  
        • hence they are independent identically distributed exponential random variables

  •  
    • non-homogeneous Poisson process

      • if λ is a function of time

 

 

Random number generation

  • we can create most any distribution we want from a source of U(0,1) random variables

  • generating U(0,1) continuous random variables

    • requirements for algorithms

      • the algorithm should be fast

      • storage requirements should be low

      • the (pseudo) random sequence should only repeat after a very long period

      • the sequence should be uniform and independent (statistical properties)

    • generally we work over 1, 2, ... , m and then divide 1 by m to get a number (0,1)

    • multiplicative congruential method

      • start with a seed X0

      • Xn = (a . X n – 1) (mod m)

    • mixed congruential method

      • start with a seed X0

      • Xn = (a . X n – 1 + c) (mod m)

  • normally assume a generator of U(0,1) random number generator

  • discrete random variables

    • for an arbitrary discrete distribution

        P( X = xi ) = pi

      • can generate a random variable X

        • by generating a U ~ U(0,1)

{x0 if U < p0

{x1 if p0 <= U < p0 + p1

X = {...

{xi if

{...

 

  •  
    •  
      •  
        • inverse transform method

          • define

          • generate a U ~ U(0,1)

          • assume x0 < x1 < ...

          • do

            • if U < F(x0) set X = x0 and stop

            • if U < F(x1) set X = x1 and stop

            • ...

    • specific distributions

      • specialisations can be made for specific distributions

        • geometric

        • Poisson

          • note that you can generate F in an efficient way by setting

            • i = 0, p0 = e -λ , F0 = p0

            • , Fi+1 = Fi + pi+1

 

  • continuous random variables

    • for an arbitrary continuous distribution

      • is a random variable with cdf FX(x), as

P(X <= x) = P(F-1X (U) <= x)

= P(FX (F-1X (U)) <= FX (x)) if FX is non-decreasing

= P( U <= FX(x))

= FX (x)

  •  
    •  
      • NB you do not actually need to work out F-1X (x)

    • exponential distribution

FX(x) = {1 - e -λx if x > 0

{0 o/w

X = F-1X (U)

U = FX(X)

= 1 - e -λX

  •  
    • simulating a Poisson process

      • algorithm to generate the first T time units of a Poisson process

  1.  
    1.  
      1.  
        1. let t = 0, i = 0

        2. generate U

        3. let ti+1 = ti -

  •  
    •  
      •  
        •  
          • if ti > T then stop

  1.  
    1.  
      1.  
        1. go to 2

 

Queueing systems

  • characteristics of a queueing system

    • arrival/service time distributions

    • queue length

    • number of servers

    • classes of arriving jobs (large/small etc.)

    • queueing discipline (FIFO, LCFS etc)

  • Kendall notation: A / B / m / k / l where

    • A is the inter-arrival time distribution of customers

    • B is the service time distribution

    • m is the number of parallel servers

    • k is the limit on the customers in this system

      • includes customers in servers and in the queue

    • l is the population size

  • time distributions

    • M – exponential (memoryless) distribution

    • Er – r stage Erlangian distribution

    • D – deterministic (fixed time)

    • G – general (arbitrary distribution)

  • queueing networks

    • a system of multiple interconnected servers

    • describe

      • each server using Kendall notation

      • the way which jobs move between servers (arcs)

    • closed networks – fixed set of jobs circulate between servers (no jobs may enter or leave the system)

    • open network – jobs may enter or leave the system

Simulation

  • advantages of simulation

    • can evaluate arbitrarily complex systems

    • captures the dynamics of complex systems

    • tracks the evolution of the system

      • it can show periods of transition

  • disadvantages of simulation

    • work to design, coding and debugging a simulation can approach implementing and measuring the system

    • can be computationally expensive

    • statistical analysis of the output can be difficult

      • have we really run this long enough to get a good average?

  • how simulate?

    • there are a series of equations which are 'solved' by following their dynamic evolution in time

  • discrete state simulation

    • state of the system is described by discrete variables

    • e.g. number of jobs at different stages on a production line

  • continuous state simulation

    • state of the system is described by continuous variables

    • e.g. quantities of various chemical reagents in a reaction

  • much of the simulation time is spent maintaining chronological order of the event list

 

Discrete state simulation

  • variables (the things we monitor the change in over time)

    • time variable t

    • counter variables: keep count of the number of times certain events have happened by time t

    • system state variables: define the state of the system at time t

  • e.g. simulating a single server queue

    • time variable

      • t

    • counter variables

      • NA the number of arrivals

      • ND the number of departures

    • system state

      • n the number of customers in the system

  • modelling stochastic systems

    • use stochastic probability distributions to model arrivals/departures/etc.

    • since the simulation is stochastic, so are its outputs

  • statistical error

    • simulation yields only estimates for performance measures

    • reduce error by

      • running the simulation for a long time until sufficient samples have been taken

      • running the same simulation with different pseudo-random number sequences

      • combining results from multiple runs

 

  • features which may be of interest from the simulation

    • utilisation

      • this is the proportion of the time that a server is busy

      • for a k server system running for a simulation length T:

        •  

            where busytime is a length of time when a server i was busy

      • aggregate the busy times (summation above) by

        • sampling the system to observe when servers are busy or idle, or

        • save the time when the server becomes busy, then save when it becomes idle again

    • throughput: where

      • n is the total number of customers serviced

      • T is the simulation length

    • mean queue length

      • take the queue length as a function against time

      • find the mean queue length by

        • determining the function of the distribution and use standard mean for that distribution, or

        • find the area under the curve and divide by T

    • mean queueing time, methods:

      • observe queueing times and take their average, or

      • use Little's law

Nq = λ . Wq

where

Nq is the total number of arrivals

λ is the arrival rate

Wq is the mean queueing time

the system is in a steady state

  • statistically analysing results

    • suppose

      • X1 , X2 , ... , Xn are independent, identically distributed random variables

        • for our uses each of these is a measured distribution from a simulation, e.g. for customers waiting verses time, we are trying to extract useful things from these multiple runs, like the mean and variance which will occur in general

      • μ is their common mean

      • σ2 is their common variance

    • how can we estimate μ and σ2?

    • how does n affect the accuracy of estimates?

 

  •  
    • define the random variable

      • this is the “sample mean”

      • this is the estimator for the mean μ of Xi

      • properties

        • its sample variance, which can be thought of the mean squared error in the estimation of μ from the estimator is

          • hence a large sample will help get a more accurate μ

    • define the random variable

      • S2 is the “sample variance”

      • S2 is the estimator for the variance σ2 of Xi

          E(S2) = σ2

    • confidence intervals

      • framework

        • define a random variable

        • by the Central Limit Theorem this is approximately distributed N(0, 1)

        • since the true value of σ2 may not be known, it may be required to use S2 instead

      • let Z ~ N(0,1)

      • let zα be a value such that P(Z > zα) = α

      • then it follows that

      • by substituting for Z we have

      • rearranging we can get

      • hence is the 100(1-α) percent confidence interval for μ

 

  •  
    • student's distribution

      • if Xi all have a normal distribution then

      • has exactly a distribution called the Student's t-distribution, with (n-1) degrees of freedom

      • hence instead of

          P(Z > zα) = α

        we can use

          P(T > tα) = α

        for the confidence interval

    • stopping rules

      • when has a system be run “long enough” for accurate performance measures to be obtained? options include:

  1.  
    1.  
      1.  
        1. repeat the simulation several times with different random seed values

  •  
    •  
      •  
        •  
          • these multiple runs are called replications

          • with n runs a confidence interval on the measure L which we are interested in we can construct a confidence interval

          • disadvantage: each replication requires re-stabilising the simulation

  1.  
    1.  
      1.  
        1. a single long run, broken up into n blocks

  •  
    •  
      •  
        •  
          • each of these blocks can have a sample measure Li

          • this could lead to correlation between successive blocks

            • setting block size large reduces this

          • there are techniques to estimate the correlation

          • the simulation can be stopped when the confidence interval around L becomes sufficiently narrow (i.e. L stabilises)

      • running a simulation until the 100(1-α) confidence interval for the true mean value is at most width l

        • generate an initial k data points (k is an arbitrary small number)

        • generate additional data values until the number of values generated n is such that

        • this is as (I think)

 

  •  
    • “goodness of fit” tests

      • this is used to check whether a distribution which we have assumed (e.g. Poisson arrival process) is actually consistent with the data seen

      • Chi squared test for discrete data

        • we have n independent random variables Y1, ..., Yn

        • null hypothesis – they have the distribution

            P(Yj = i) = pi

            where i = 1 ... k

            j = 1 ... n

        • let Ni be the number of Yj that are equal to i then we expect under the null hypothesis that

          • E(Ni) = n . pi

        • we should consider rejecting the null hypothesis when

          is too large

        • for large n that the distribution of T is approximately a chi-squared random variable with k-1 degrees of freedom

        • define

          • the value

          • the random variable Y ~ Chi2(k-1)

        • typically reject the null hypothesis if

            P(Y > x) < 0.05

      • Kolmogorov-Smirnov test for continuous data

        • we have n independent random variables Y1, ..., Yn

        • null hypothesis – they have the continuous distribution F(x)

        • construct the distribution

        • this

          • will measure the proportion of observed values less than or equal to x

          • should be close to the function F(x) under the null hypothesis

        • define

        • we expect this to be small and should reject the hypothesis if D is too large

    • variance reduction techniques

      • have been using such that

      • attempting to find estimators for μ with smaller variance can produce significant speed ups

 

  •  
    • antithetic variables

      • note that, for two identically distributed random variables with common mean μ

      • hence there would be a smaller variance if the covariance were negative (X1 and X2 were negatively correlated)

      • for the inverse transform method the values U1, U2 ... are generated from U(0,1)

        • note that (1-U1), (1-U2) ... are also distributed U(0,1)

        • these two series of numbers are negatively correlated

      • we can divide the simulation run into two groups

        • use (1-Ui) for the second group

      • this yields two random variables X1 X2 which are negatively correlated

      • X1 and X2 are called antithetic variables

    • control variates

      • let

        • X be a random variable gathered from the output of a random variable X s.t.

            E(X) = μ

        • Y be another random variable gathered from the same output with a known mean value

            E(Y) = μY

        • Z = X + c . (Y - μY)

      • note that

          E(Z) = μ

          Var(Z) = Var(X + x . (Y – μy))

          = Var(X + c.Y)

          = Var(X) + c2.Var(Y) + 2.c.Cov(X, Y)

      • Var(Z) is minimised when

        such that

          Var(Y) <= Var(X)

      • Y is called the control variate for X

 

Queueing theory

  • stochastic processes

    • a stochastic process is

      • a collection of random variables X(t)

      • the possible values of X are those in state space S

      • X is indexed by values of time t in set T

    • discrete state stochastic process

      • T is countable (e.g. integers or natural numbers)

    • continuous state stochastic process

      • T is uncountable

    • Markov process

        t1 , ... , tn , t , A1 , ... , An . t1 < ... < tn < t =>

        P( X(t)A | X(t1)A1 ... X(tn)An) = P( X(t)A | X(tn)An)

      • where A1 ... An A are events

      • this is the Markov property that the next state depends only upon the current state

    • Markov chains

      • a discrete state discrete time stochastic Markov process

  • birth death process

    • this is a discrete state continuous time Markov process

    • transitions are only permitted between neighbouring states

    • taking N as the state space, given Xn = i

        Xn+1 = { i + 1 for a birth event

        { i – 1 for a death event

    • birth and death rate

      • we can define the birth and death rates for each state i

        • λi is the birth rate in state i

        • μi is the death rate in state i

    • state space diagrams

      • these can be used to show the birth and death rates in each state

    • Pi(t) denotes the probability of being in state i at time t

      • the probability of a birth at state i in interval Δt is assumed to be

          λi . Δt + o(Δt)

        where

          o(Δt) is the probability of more than one event in Δt

          o(Δt) denotes a quantity which becomes negligible (in comparison with Δt) as Δt→0

      • the probability of a death at state i in interval Δt is assumed to be

          μi . Δt + o(Δt)

    • Chapman Kolmogorov equations

      • these are a set of difference equations used to solve Pi(t)

        • pi is the probability of being in state i

      • for i >= 1

          Pi(t + Δt) = Pi(t) . (1 - λi . Δt) . (1 - μi . Δt) we're in state i and don't leave

          + Pi+1(t) . (1 - λi+1 . Δt) . μi+1 . Δt were in state i+1 and death event

          + Pi-1(t) . λi-1 . Δt . (1 - μi-1 . Δt) were in state i-1 and birth event

          + o(Δt) more than one event got us here

      • for i = 0

          P0(t + Δt) = P0(t) . (1 - λi . Δt) we're in state 0 and no births

          + P1(t) . (1 - λ1 . Δt) . μ1 . Δt were in state 1 and death event

          + o(Δt) more than one event got us here

          NB. you can't have deaths in state 0

      • divide through by Δt and let Δt → 0, and you get

        • for i >= 1

        • for i = 0

      • solving this is hard

 

  •  
    • we can solve the Chapman Kolmogorov equations for the stationary solution

      • we want the long term probabilities after the system has reached equilibrium

      • this equilibrium exists if

      • since the system in equilibrium we have

        •  

      • under steady state conditions the total flow into a state equals the total flow out of a state

        • i.e. the probability of going from (i-1)→i = the probability of going from i→(i-1)

        • this leads to the detailed balance equations

          • pi i = pi+1 . μi+1 0 <= i

          • pi i = pi -1 i-1 1 <= i

        • global balance equation: pi i + pi i = pi -1 i-1 + pi+1 . μi+1 1 <= i

      • stochastic balance

        • k >= 1

        • since the sum of state probabilities is 1 then

    • the M/M/1 queue

      • i > 0. μi = μ and λi = λ, μ0 = μ, λ0 = λ

        note that since it is going from i = 0 to i = k-1 it is being multiplied k times

      • hence

      • average number of customers in the system

        • insert hand written notes for derivation

 

  •  
    •  
      • average residence time

        • an arriving customer will

          • find an average E(N) customers in front of them

          • spend an average time E(T) in the system

        • during E(T) there will be an average λ.E(T) arrivals

          • hence E(N) = λ . E(T) (Little's result)

          • we can rearrange this in terms of λ and μ

      • performance at high load

        • predicting residence time is difficult – lots of variation

    • the M/M/m queue: m identical servers

      • k . λk = λ

      • μ k = {k.μ if 0 <= k <= m since there are more than k servers all of the

        { items are being serviced

        {m.μ if m < k since m < k, we can only be serving m items at

        { at once, hence the death rate is m times the

        { death rate of a single server

        where μ is the death rate for a single server

        k is the number of items in the system

      • for an equilibrium distribution we require

    • the M/M/1/K queue

      • the system can only hold up to K customers

      • k >= 1. μk = μ

      • λ k = {λ k < K

        {0 K <= k

      • p k = 0 if K <= k

      • use the equations from the M/M/1 queue but placings lower bounds on the summations

      • an equilibrium distribution exists no matter what the values of μ and λ

    • the M/M/1/ /N

      • the system has a finite population

      • need to modify λ k to model the arrival rate

      • k >= 1. μk = μ

      • λ k = {(N – k).λ k <= N

        {0 N < k

 

  •  
    • the M/M/m/m system

      • the m server loss system

      • e.g. the telephone network

        • a link contains m circuits, each of which can carry a single call

        • calls arrive at the link according to a Poisson process at rate λ

        • a connected call holds the circuit for an exponentially distributed length of time, with mean

        • if no circuits are available then the call is lost

      • μk = k.μ 1 <= k <= m

      • λ k = {λ k < m

        {0 m <= k

      • flow balance equations

      • loss probability: the probability that an arriving call finds all circuits occupied, pm

        • also known as Erlang's formula

  • extensions

    • we want to model systems in which the coefficient of variation of the inter-arrival time is less than 1

    • a system where birth occurs in r serial stages

      • each stage has an exponentially distributed residence time

      • if the desired birth rate is λ then each stage should have a rate λ.r

      • the average time τ to get through the combined birth process will be

      • the variance of each stage (given an exponentially distributed residence time)

      • the variance of τ to get through the combined process is

      • the coefficient of variation is

      • this is the r stage Erlangian distribution

      • e.g. M/E2/1 queue

 

 

 

  •  
    •  
      •  
        • the state of the process represents the number of stages remaining to be served

          • a birth/arrival increases the number of stages to serve by 2, these occur at rate λ

          • a departure reduces the number of stages to be served by 1

            • hence these must occur at a rate 2μ in order to keep the system stationary, by flow conservation

            • in actuality it would be better to describe

              • departure rate as μ

              • arrival rate between i and i+1 as λ, but since they can only happen in groups of 2 then the actual arrival rate is λ/2

    • systems with a coefficient of variation larger than 1

      • combining stages in parallel (vs. in series which made the coefficient less than 1)

 

 

 

 

  •  
    • queueing networks

      • in general we want to solve for a system of queues representing a real world problem

        • e.g. a distributed computing system

        • represent the system as a network of connected queues

        • customers move instantaneously between service centres where they receive service

          • if you want to model delay then just stick in an intermediary server which has a processing time which has the delay property you want (fixed, memoryless, etc.)

      • the model

        • customers:

          • programs or data packets etc.

        • service centres:

          • system resources like disks, CPU, transmission links etc.

        • service time distributions:

          • may vary by customer type and visit

        • load dependence:

          • multi-processor systems have load dependent service rates

        • waiting lines and scheduling:

          • lines may have limited capacity

          • different lines may have different scheduling algorithms

        • customer types:

          • multiple customer classes may exist