Alec's c.d.f-sidestepping sampling method

From Maths
Jump to: navigation, search
See generating samples from a distribution for and overview of methods used

Algorithm

Let f(x) be a p.d.f for which computing F1(x) is either difficult, impractical, impossible or otherwise unavailable, but where f(x) itself is easy to compute.


Suppose that either:

  • a,bR[x[a,b]R[f(x)>0]], there exists a closed interval, [a,b] for which f(x)>0 on over, or:
  • xR[f(x)>0] - f(x)>0 everywhere

then we will sample f(x) over [a,b] or R

  • herein we will use A:=[a,b] or A:=R for whichever case is chosen


Define the following function carefully:

  • Define h:AR such that xA[h(x)>f(x)]

Then the algorithm proceeds as follows:

Steps

To generate X, a sample from the distribution f:

  1. Define g:AR by g(x):=h(x)Ah(t)dt
    - note that this makes g itself a p.d.f
  2. Generate Y on A from g
    • if G:A[0,1]R, or G(x), is the c.d.f of g then G1:[0,1]A is how we usually sample from the distribution, g, we generate a random number, say y uniformly from [0,1], then Y=G1(y) is a random sample from g
  3. Generate Z as an (independent of Y) uniformly on [0,1]
  4. If Zh(Y)f(Y) then:
    • Define X:=Y
    Else
    • Goto step 2


Note that other methods may be used to generate either Y or Z, it is only important that we generate them somehow.


It is possible (although extremely unlikely for high numbers) for this to loop a lot of times before generating a sample, that is the trade-off it makes compared with generating samples from the inverse of a cumulative distribution function. The better the h approximation is to f the higher the throughput of generated samples.

Also unlike generating samples from the inverse of a cumulative distribution function it requires at least 2 generated values unlike the inverse c.d.f generating method which requires only on generated uniform value on [0,1] always.

Proof

Grade: B
This page requires one or more proofs to be filled in, it is on a to-do list for being expanded with them.
Please note that this does not mean the content is unreliable. Unless there are any caveats mentioned below the statement comes from a reliable source. As always, Warnings and limitations will be clearly shown and possibly highlighted if very important (see template:Caution et al).
The message provided is:
The gist is outlined
TODO: Take a picture of sheet 242
and upload

Notes

References

Template:Algorithm Of