Alec's c.d.f-sidestepping sampling method
- See generating samples from a distribution for an overview of the methods used
Algorithm
Let f(x) be a p.d.f for which computing the inverse c.d.f, F⁻¹(x), is difficult, impractical or otherwise unavailable, but where f(x) itself is easy to compute.
Suppose that either:
- ∃a,b∈R[∀x∈[a,b][f(x)>0]] - there exists a closed interval [a,b]⊆R on which f(x)>0, or:
- ∀x∈R[f(x)>0] - f(x)>0 everywhere
then we will sample f(x) over [a,b] or R
- herein we will use A:=[a,b] or A:=R for whichever case is chosen
Define the following function:
- Define h:A→R such that ∀x∈A[h(x)>f(x)] and such that ∫_A h(t)dt is finite (so that h can be normalised into a p.d.f in step 1 below)
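For example (an assumed illustration, not part of the method itself): take f(x):=6x(1−x) on A:=[0,1] - this is a p.d.f with maximum value f(1/2)=3/2 - then the constant function h(x):=8/5 satisfies h(x)>f(x) for all x∈A, with ∫_A h(t)dt=8/5 finite.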
Then the algorithm proceeds as follows:
Steps
To generate X, a sample from the distribution f:
1. Define g:A→R by g(x):=h(x)/∫_A h(t)dt - note that this makes g itself a p.d.f
2. Generate Y, a sample on A from the distribution g
3. Generate Z uniformly on [0,1], independently of Y
4. If Zh(Y)≤f(Y) then:
   - Define X:=Y
5. Else:
   - Go to step 2
Note that any method may be used to generate Y and Z; all that matters is that they have the stated distributions and are independent of each other.
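To make the steps concrete, here is a minimal Python sketch of the algorithm above. The target f (taken to be 6x(1−x) on A:=[0,1], as in the example earlier) and the constant envelope h(x):=1.6 are assumed example choices, picked so that g works out to be the uniform distribution on [0,1] and step 2 is therefore easy; any f, h and A satisfying the conditions above would do.

```python
import random

def f(x):
    """Assumed example target p.d.f on A = [0, 1]: f(x) = 6x(1-x), peaking at 1.5."""
    return 6.0 * x * (1.0 - x)

def h(x):
    """Envelope: the constant 1.6 > 1.5, so h(x) > f(x) for all x in A."""
    return 1.6

def sample():
    """Generate X, a sample from the distribution f, by the steps above."""
    while True:
        # Here g(x) = h(x) / integral_A h(t) dt = 1.6/1.6 = 1, i.e. g is the
        # uniform p.d.f on [0, 1], so Y can be drawn directly (step 2).
        y = random.random()       # Y ~ g
        z = random.random()       # Z ~ Uniform[0,1], independent of Y (step 3)
        if z * h(y) <= f(y):      # accept (step 4)
            return y              # X := Y
        # otherwise reject and go back to step 2 (step 5)
```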
It is possible for this to loop many times before generating a sample, although long runs of rejections are exponentially unlikely; that is the trade-off it makes compared with generating samples from the inverse of a cumulative distribution function. The better an approximation h is to f, the more often the test in step 4 passes and the higher the throughput of generated samples (see the empirical check below).
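Continuing the assumed example above, a quick empirical check of the acceptance rate:

```python
import random

# Each pass accepts with probability (∫_A f(t)dt)/(∫_A h(t)dt) = 1/1.6 = 0.625
# for the example f and h, i.e. about 1.6 passes are expected per sample.
accepted, trials = 0, 100_000
for _ in range(trials):
    y = random.random()                  # Y ~ g (uniform here)
    z = random.random()                  # Z ~ Uniform[0,1]
    if z * 1.6 <= 6.0 * y * (1.0 - y):   # Z*h(Y) <= f(Y)?
        accepted += 1
print(accepted / trials)                 # ≈ 0.625
```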
Also, unlike generating samples from the inverse of a cumulative distribution function - which always requires exactly one generated uniform value on [0,1] per sample - this method requires at least 2 generated values per sample.
Proof
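A sketch of the standard argument, in the notation above, that the accepted X does indeed have p.d.f f (assuming ∫_A f(y)dy=1, i.e. that A contains the support of f):
- For any B⊆A, in a single pass of the loop: P[Y∈B and Zh(Y)≤f(Y)] = ∫_B P[Z≤f(y)/h(y)]g(y)dy = ∫_B (f(y)/h(y))·(h(y)/∫_A h(t)dt)dy = (∫_B f(y)dy)/∫_A h(t)dt
- here P[Z≤f(y)/h(y)]=f(y)/h(y) because Z is uniform on [0,1] and 0≤f(y)/h(y)<1 (as h(y)>f(y))
- Taking B:=A gives P[accept]=1/∫_A h(t)dt
- Since the passes are independent, X is distributed as Y conditioned on acceptance, so P[X∈B] = P[Y∈B and Zh(Y)≤f(Y)]/P[accept] = ∫_B f(y)dy
that is, X has p.d.f f, as required.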