Difference between revisions of "Notes:Distribution of the sample median"
m (Adding conclusion getting ready for next section) |
(Fixing some typos, summing up steps) |
||
Line 1: | Line 1: | ||
{{ProbMacros}}{{M|\newcommand{\O}[0]{\mathcal{O} } \newcommand{\M}[0]{\mathcal{M} } \newcommand{\Q}[0]{\mathcal{Q} } \newcommand{\Min}[1]{\text{Min}\left({#1}\right)} \newcommand{\d}[0]{\mathrm{d} } }} | {{ProbMacros}}{{M|\newcommand{\O}[0]{\mathcal{O} } \newcommand{\M}[0]{\mathcal{M} } \newcommand{\Q}[0]{\mathcal{Q} } \newcommand{\Min}[1]{\text{Min}\left({#1}\right)} \newcommand{\d}[0]{\mathrm{d} } }} | ||
__TOC__ | __TOC__ | ||
+ | ==Important results== | ||
+ | # {{M|\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{X_1\le\cdots\le X_{m+1}\le r}{X_1\le\cdots\le X_{2m+1} } }} | ||
+ | #: {{MM|\eq \frac{\P{X_1\le\cdots\le X_{m+1}\le\Min{r,X_{m+2} }\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1} } }{\frac{1}{(2m+1)!} } }} | ||
+ | #: {{MM|\eq \big((2m+1)!\big)\P{X_1\le\cdots\le X_{m+1}\le\Min{r,X_{m+2} }\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1} } }} | ||
+ | #: {{MM|\eq \lim_{t\rightarrow+\infty}\Bigg(\big((2m+1)!\big)\P{X_1\le\cdots\le X_{m+1}\le\Min{r,X_{m+2} }\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}\le t }\Bigg) }} | ||
+ | #: {{MM|\eq\frac{(2m+1)!}{m!}\lim_{t\rightarrow+\infty}\Bigg[\int^t_{-\infty}f(x_{2m+1})\left(\int^{x_{2m+1} }_{-\infty}f(x_{2m})\left(\cdots\int^{x_{m+3} }_{-\infty}f(x_{m+2})\left(\int^{\Min{r,x_{m+2} } }_{-\infty} f(x_{m+1})F(x_{m+1})^m\d x_{m+1}\right)\d x_{m+2}\cdots\right)\d x_{2m}\right)\d x_{2m+1}\Bigg] }} | ||
+ | |||
==Problem overview== | ==Problem overview== | ||
Let {{M|X_1,\ldots,X_{2m+1} }} be a sample from a population {{M|X}}, meaning that the {{M|X_i}} are {{iid}} [[random variables]], for some {{M|m\in\mathbb{N}_{0} }}. We wish to find: | Let {{M|X_1,\ldots,X_{2m+1} }} be a sample from a population {{M|X}}, meaning that the {{M|X_i}} are {{iid}} [[random variables]], for some {{M|m\in\mathbb{N}_{0} }}. We wish to find: | ||
Line 12: | Line 19: | ||
* {{M|\mathcal{O}:\eq X_1\le\cdots\le X_{2m+1} }} - representing the order part | * {{M|\mathcal{O}:\eq X_1\le\cdots\le X_{2m+1} }} - representing the order part | ||
* {{M|\mathcal{M}:\eq X_1\le\cdots\le X_{m+1}\le r}} - representing the median part | * {{M|\mathcal{M}:\eq X_1\le\cdots\le X_{m+1}\le r}} - representing the median part | ||
− | * {{M|\mathcal{Q}:\eq\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{\mathcal{ | + | * {{M|\mathcal{Q}:\eq\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{\mathcal{M} }{\mathcal{O} } }} - representing the question |
Line 62: | Line 69: | ||
We substitute this back in to yield: | We substitute this back in to yield: | ||
* {{MM|\frac{1}{m!}\int^t_{-\infty}f(x_{2m+1})\left(\int^{x_{2m+1} }_{-\infty}f(x_{2m})\left(\cdots\int^{x_{m+3} }_{-\infty}f(x_{m+2})\left(\int^{\Min{r,x_{m+2} } }_{-\infty} f(x_{m+1})F(x_{m+1})^m\d x_{m+1}\right)\d x_{m+2}\cdots\right)\d x_{2m}\right)\d x_{2m+1} }} | * {{MM|\frac{1}{m!}\int^t_{-\infty}f(x_{2m+1})\left(\int^{x_{2m+1} }_{-\infty}f(x_{2m})\left(\cdots\int^{x_{m+3} }_{-\infty}f(x_{m+2})\left(\int^{\Min{r,x_{m+2} } }_{-\infty} f(x_{m+1})F(x_{m+1})^m\d x_{m+1}\right)\d x_{m+2}\cdots\right)\d x_{2m}\right)\d x_{2m+1} }} | ||
+ | ===Conclusion of progression 1=== | ||
+ | We see here that | ||
==Progression: 2== | ==Progression: 2== | ||
This'll involve induction and dealing with the {{M|\text{Min}()}} will be "tricky", both for practice and induction we will consider the special cases {{M|m\eq 1}} and {{M|m\eq 2}} by evaluating: | This'll involve induction and dealing with the {{M|\text{Min}()}} will be "tricky", both for practice and induction we will consider the special cases {{M|m\eq 1}} and {{M|m\eq 2}} by evaluating: | ||
Line 95: | Line 104: | ||
===Conclusion of progression 2=== | ===Conclusion of progression 2=== | ||
* {{MM|\Pcond{X_1\le X_2\le r}{X_1\le X_2\le X_3}\eq F(r)^2\big(3-2F(r)\big)}} | * {{MM|\Pcond{X_1\le X_2\le r}{X_1\le X_2\le X_3}\eq F(r)^2\big(3-2F(r)\big)}} | ||
− | ==Progression 3== | + | ==Progression: 3== |
Now we look at {{M|m\eq 2}}, or 5 samples. | Now we look at {{M|m\eq 2}}, or 5 samples. |
Revision as of 17:50, 16 December 2017
Important results
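- $\mathbb{P}[\text{Median}(X_1,\ldots,X_{2m+1})\le r]=\mathbb{P}[X_1\le\cdots\le X_{m+1}\le r\mid X_1\le\cdots\le X_{2m+1}]$
  - $=\dfrac{\mathbb{P}[X_1\le\cdots\le X_{m+1}\le\text{Min}(r,X_{m+2})\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}]}{\frac{1}{(2m+1)!}}$
  - $=\big((2m+1)!\big)\,\mathbb{P}[X_1\le\cdots\le X_{m+1}\le\text{Min}(r,X_{m+2})\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}]$
  - $=\lim_{t\to+\infty}\Big(\big((2m+1)!\big)\,\mathbb{P}[X_1\le\cdots\le X_{m+1}\le\text{Min}(r,X_{m+2})\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}\le t]\Big)$
  - $=\dfrac{(2m+1)!}{m!}\lim_{t\to+\infty}\Bigg[\int_{-\infty}^{t}f(x_{2m+1})\left(\int_{-\infty}^{x_{2m+1}}f(x_{2m})\left(\cdots\int_{-\infty}^{x_{m+3}}f(x_{m+2})\left(\int_{-\infty}^{\text{Min}(r,x_{m+2})}f(x_{m+1})F(x_{m+1})^m\,\mathrm{d}x_{m+1}\right)\mathrm{d}x_{m+2}\cdots\right)\mathrm{d}x_{2m}\right)\mathrm{d}x_{2m+1}\Bigg]$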
Problem overview
Let $X_1,\ldots,X_{2m+1}$ be a sample from a population $X$, meaning that the $X_i$ are i.i.d. random variables, for some $m\in\mathbb{N}_0$. We wish to find:
- $\mathbb{P}[\text{Median}(X_1,\ldots,X_{2m+1})\le r]$ - the cdf of the median.
Initial work
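Since the variables are i.i.d. (with a density $f$, so ties have probability zero), any ordering is as likely as any other, each having probability $\frac{1}{(2m+1)!}$ (which I proved the long way, rather than just jumping to this value).
I believe that $\mathbb{P}[\text{Median}(X_1,\ldots,X_{2m+1})\le r]=\mathbb{P}[X_1\le\cdots\le X_{m+1}\le r\mid X_1\le\cdots\le X_{2m+1}]$. Let us make some definitions to make this shorter.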
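- $\mathcal{O}:=X_1\le\cdots\le X_{2m+1}$ - representing the order part
- $\mathcal{M}:=X_1\le\cdots\le X_{m+1}\le r$ - representing the median part
- $\mathcal{Q}:=\mathbb{P}[\text{Median}(X_1,\ldots,X_{2m+1})\le r]=\mathbb{P}[\mathcal{M}\mid\mathcal{O}]$ - representing the question
We should also have some sort of converse, related to $r\le X_{m+2}\le\cdots\le X_{2m+1}$ or something.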
We also have:
- An expression for $\mathbb{P}[X_1\le\cdots\le X_n\le r]$ from Probability of i.i.d random variables being in an order and not greater than something
  - It's $=\frac{1}{n!}F_X(r)^n$
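The quoted expression can be sanity-checked by simulation. The following is only a rough sketch, assuming the $X_i$ are Uniform(0, 1) so that $F(r)=r$ on $[0,1]$; the function names, trial count and seed are illustrative choices, not part of the notes.

```python
import math
import random

def ordered_and_below(n, r, trials=200_000, seed=0):
    """Estimate P[X_1 <= ... <= X_n <= r] for X_i i.i.d. Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        # The event needs the draws to already be in increasing order
        # and the last (hence largest) one to be at most r.
        if all(a <= b for a, b in zip(xs, xs[1:])) and xs[-1] <= r:
            hits += 1
    return hits / trials

def closed_form(n, r):
    """F(r)^n / n!, with F(r) = r for Uniform(0, 1) and 0 <= r <= 1."""
    return r ** n / math.factorial(n)

estimate = ordered_and_below(3, 0.8)
exact = closed_form(3, 0.8)
print(estimate, exact)
```

The two printed numbers should agree up to Monte Carlo noise; any other continuous distribution could be swapped in, provided `closed_form` uses its cdf.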
Analysis
Let us look at X≤r and X≤Y to see what we can say if both are true (the "and")
- Claim: (X≤r∧X≤Y)⟺(X≤Min(r,Y))
- Proof:
- ⟹
- Suppose r≤Y, so Min(r,Y)=r, obviously X≤r ⟹ X≤r=Min(r,Y), so the implication holds in this case
- Suppose Y≤r, so Min(r,Y)=Y, obviously X≤Y ⟹ X≤Y=Min(r,Y), so the implication holds in this case too.
- ⟸
- We notice either Min(r,Y)=r if r≤Y, or Min(r,Y)=Y if Y≤r (when r=Y both cases apply and agree, so the overlap doesn't matter)
- Thus if r≤Y then X≤r and as r≤Y by assumption, we use the transitivity of ≤ to see X≤r≤Y thus X≤Y too - as required
- Thus if Y≤r then X≤Y and as Y≤r by assumption, we use the transitivity of ≤ to see X≤Y≤r and thus X≤r too - as required.
- So in either case, we have X≤Y and X≤r - as required
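As a quick cross-check of the claim (not a replacement for the proof), the equivalence can be tested exhaustively on a small grid of values. This is a sketch only; the grid and the helper names are arbitrary choices made for illustration.

```python
from itertools import product

def both_hold(x, r, y):
    """Left-hand side of the claim: X <= r and X <= Y."""
    return x <= r and x <= y

def below_min(x, r, y):
    """Right-hand side of the claim: X <= Min(r, Y)."""
    return x <= min(r, y)

# Grid -2.0, -1.75, ..., 2.0; it includes ties such as r == y and x == r.
grid = [i / 4 for i in range(-8, 9)]
mismatches = [(x, r, y) for x, r, y in product(grid, repeat=3)
              if both_hold(x, r, y) != below_min(x, r, y)]
print(len(mismatches))
```

An empty `mismatches` list means the two conditions agreed at every grid point, including the boundary cases the proof treats by overlapping cases.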
Problem statement
Thus we really want to find:
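- $\mathbb{P}[\text{Median}(X_1,\ldots,X_{2m+1})\le r]=\mathbb{P}[X_1\le\cdots\le X_{m+1}\le r\mid X_1\le\cdots\le X_{2m+1}]$
  - $=\dfrac{\mathbb{P}[\mathcal{M}\text{ and }\mathcal{O}]}{\mathbb{P}[\mathcal{O}]}$
  - $=\big((2m+1)!\big)\,\mathbb{P}[X_1\le\cdots\le X_{m+1}\le\text{Min}(r,X_{m+2})\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}]$
    - Caveat: We now need $(X\le r\wedge X\le Y\le Z)\implies(X\le\text{Min}(r,Y)\le Y\le Z)$ to justify this format. Although that's arguably not that helpful for the integral.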
Initial integral
- This isn't about the median specifically, this is just looking at the specific integral.
Suppose we have a sample of length 3, X,Y,Z then we are looking at:
- P[X≤Min(r,Y)≤Y≤Z≤t] (where t will be used for a limit towards ∞ to get P[X≤Min(r,Y)≤Y≤Z] in the end), or as an integral:
  - $\int_{-\infty}^{t}f(z)\left(\int_{-\infty}^{z}f(y)\left(\int_{-\infty}^{\text{Min}(r,y)}f(x)\,\mathrm{d}x\right)\mathrm{d}y\right)\mathrm{d}z$
    - If $t>r$ then the minimum gets involved (for values of $y$ above $r$, which occur once $z>r$) and caps the innermost upper limit at $r$; otherwise the limit always stays under $r$. Of course in practice (as we'll take $t\to\infty$) the cap will certainly apply.
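To get a feel for how the Min() behaves inside the integral, here is a small numerical sketch. It assumes the population is Uniform(0, 1), so $f=1$ on $[0,1]$ and the innermost integral is just $\text{Min}(r,y)$; the function name and step count are illustrative assumptions.

```python
def initial_integral(r, t, steps=400):
    """Midpoint-rule value of the triple integral for Uniform(0, 1).

    With f = 1 on [0, 1] the innermost x-integral equals min(r, y),
    so only the y and z integrals are approximated numerically.
    Assumes 0 <= r <= 1 and 0 <= t <= 1.
    """
    total = 0.0
    dz = t / steps
    for i in range(steps):
        z = (i + 0.5) * dz          # midpoint of the i-th z cell
        dy = z / steps
        inner = sum(min(r, (j + 0.5) * dy) for j in range(steps)) * dy
        total += inner * dz
    return total

# With r = t = 1 the Min() never bites and the value is P[X <= Y <= Z] = 1/6.
print(initial_integral(1.0, 1.0), 1 / 6)
# With r < t the cap at r applies for y > r and the value drops.
print(initial_integral(0.5, 1.0))
```

For the uniform case $t=1$ already plays the role of the limit $t\to\infty$, since $F(1)=1$.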
Progression: 1
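We are evaluating: $\mathbb{P}[X_1\le\cdots\le X_{m+1}\le\text{Min}(r,X_{m+2})\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}\le t]$
- $\int_{-\infty}^{t}f(x_{2m+1})\left(\int_{-\infty}^{x_{2m+1}}f(x_{2m})\left(\cdots\int_{-\infty}^{x_{m+3}}f(x_{m+2})\left(\int_{-\infty}^{\text{Min}(r,x_{m+2})}f(x_{m+1})\left(\int_{-\infty}^{x_{m+1}}f(x_m)\left(\cdots\int_{-\infty}^{x_2}f(x_1)\,\mathrm{d}x_1\cdots\right)\mathrm{d}x_m\right)\mathrm{d}x_{m+1}\right)\mathrm{d}x_{m+2}\cdots\right)\mathrm{d}x_{2m}\right)\mathrm{d}x_{2m+1}$
We operate on the inner bit:
- $\int_{-\infty}^{x_{m+1}}f(x_m)\left(\cdots\int_{-\infty}^{x_2}f(x_1)\,\mathrm{d}x_1\cdots\right)\mathrm{d}x_m=\frac{1}{m!}F(x_{m+1})^m$
We substitute this back in to yield:
- $\frac{1}{m!}\int_{-\infty}^{t}f(x_{2m+1})\left(\int_{-\infty}^{x_{2m+1}}f(x_{2m})\left(\cdots\int_{-\infty}^{x_{m+3}}f(x_{m+2})\left(\int_{-\infty}^{\text{Min}(r,x_{m+2})}f(x_{m+1})F(x_{m+1})^m\,\mathrm{d}x_{m+1}\right)\mathrm{d}x_{m+2}\cdots\right)\mathrm{d}x_{2m}\right)\mathrm{d}x_{2m+1}$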
Conclusion of progression 1
We see here that
Progression: 2
This will involve induction, and dealing with the Min() will be "tricky". Both for practice and to set up the induction, we consider the special cases $m=1$ and $m=2$, starting by evaluating:
- $m=1$ yields $I_1:=\frac{1}{1!}\int_{-\infty}^{t}f(x_3)\left(\int_{-\infty}^{\text{Min}(r,x_3)}f(x_2)F(x_2)\,\mathrm{d}x_2\right)\mathrm{d}x_3$, by case analysis:
  - if $t\le r$ then $x_3\le t\le r$, so $x_3\le r$ over the entire domain of interest and $\text{Min}(r,x_3)=x_3$ over the entire domain, giving:
    - $I_1=\frac{1}{1!}\int_{-\infty}^{t}f(x_3)\left(\int_{-\infty}^{x_3}f(x_2)F(x_2)\,\mathrm{d}x_2\right)\mathrm{d}x_3$
    - We now use the corollary below to see:
      - $I_1=\frac{1}{2!}\int_{-\infty}^{t}f(x_3)F(x_3)^2\,\mathrm{d}x_3$
        - $=\frac{1}{3!}F(t)^3$
  - if $t\ge r$ then we split $(-\infty,t]$ into $(-\infty,r)$ and $[r,t]$, giving:
    - $I_1=\frac{1}{1!}\left[\int_{-\infty}^{r}f(x_3)\left(\int_{-\infty}^{\text{Min}(r,x_3)}f(x_2)F(x_2)\,\mathrm{d}x_2\right)\mathrm{d}x_3+\int_{r}^{t}f(x_3)\left(\int_{-\infty}^{\text{Min}(r,x_3)}f(x_2)F(x_2)\,\mathrm{d}x_2\right)\mathrm{d}x_3\right]$
      - $=\frac{1}{1!}\left[\int_{-\infty}^{r}f(x_3)\left(\int_{-\infty}^{x_3}f(x_2)F(x_2)\,\mathrm{d}x_2\right)\mathrm{d}x_3+\int_{r}^{t}f(x_3)\left(\int_{-\infty}^{r}f(x_2)F(x_2)\,\mathrm{d}x_2\right)\mathrm{d}x_3\right]$
    - We now use the required corollary immediately below to yield:
      - $I_1=\frac{1}{1!}\left[\int_{-\infty}^{r}f(x_3)\cdot\frac{1}{2}F(x_3)^2\,\mathrm{d}x_3+\int_{r}^{t}f(x_3)\cdot\frac{1}{2}F(r)^2\,\mathrm{d}x_3\right]$
        - $=\frac{1}{2!}\left[\frac{1}{3}F(r)^3+F(r)^2\int_{r}^{t}f(x_3)\,\mathrm{d}x_3\right]$, note that: $\int_{r}^{t}f(x)\,\mathrm{d}x=\int_{-\infty}^{t}f(x)\,\mathrm{d}x-\int_{-\infty}^{r}f(x)\,\mathrm{d}x=F(t)-F(r)$
        - $=\frac{1}{2!}F(r)^2\left[\frac{1}{3}F(r)+\big(F(t)-F(r)\big)\right]$, note that: $F(t)-F(r)=\frac{3F(t)-3F(r)}{3}$, which we'll use next
        - $=\frac{1}{2!}F(r)^2\left[\frac{3F(t)-2F(r)}{3}\right]$
        - $=\frac{1}{3!}F(r)^2\big(3F(t)-2F(r)\big)$
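As $t\to\infty$ we have $F(t)\to 1$, so we end up with $I_1=\frac{1}{3!}F(r)^2\big(3-2F(r)\big)$
Thus: $\mathbb{P}[X_1\le X_2\le\text{Min}(r,X_3)\le X_3]=\frac{1}{3!}F(r)^2\big(3-2F(r)\big)$
Finally:
- $\mathbb{P}[X_1\le X_2\le r\mid X_1\le X_2\le X_3]=F(r)^2\big(3-2F(r)\big)$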
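The $m=1$ result can be cross-checked in two independent ways. The sketch below is not the route taken in these notes: it uses the standard counting argument that the median of $2m+1$ draws is $\le r$ exactly when at least $m+1$ of the draws are $\le r$, plus a simulation that assumes Uniform(0, 1) draws. All function names, the seed and the trial count are illustrative.

```python
import math
import random

def median_cdf_binomial(m, p):
    """P[median of 2m+1 i.i.d. draws <= r], where p = F(r).

    The median is <= r exactly when at least m+1 of the draws are <= r.
    """
    n = 2 * m + 1
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(m + 1, n + 1))

def median_cdf_m1(p):
    """The m = 1 result from the notes: F(r)^2 (3 - 2 F(r))."""
    return p**2 * (3 - 2 * p)

def median_cdf_simulated(m, r, trials=100_000, seed=0):
    """Monte Carlo estimate using Uniform(0, 1) draws, so F(r) = r."""
    rng = random.Random(seed)
    n = 2 * m + 1
    hits = sum(sorted(rng.random() for _ in range(n))[m] <= r
               for _ in range(trials))
    return hits / trials

for p in (0.1, 0.5, 0.9):
    print(p, median_cdf_m1(p), median_cdf_binomial(1, p))
print(median_cdf_simulated(1, 0.3), median_cdf_m1(0.3))
```

`median_cdf_binomial` works for any $m$, so it may also serve as a reference value when checking the $m=2$ case.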
Required corollary
Recall from Probability of i.i.d random variables being in an order and not greater than something that:
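- $\frac{1}{k!}\int_{-\infty}^{r}f(x)F(x)^k\,\mathrm{d}x=\frac{1}{(k+1)!}F(r)^{k+1}$
So:
- $\int_{-\infty}^{r}f(x)F(x)^k\,\mathrm{d}x=\frac{1}{k+1}F(r)^{k+1}$
By applying this to the above (with the $x_2$ integrals):
- $\int_{-\infty}^{r}f(x)F(x)^1\,\mathrm{d}x=\frac{1}{2}F(r)^2$, we then substitute this for the cases $r:=r$ and $r:=x_3$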
We'll then apply it to the x3 integrals.
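The corollary is easy to test numerically for a concrete distribution. The following sketch assumes an Exponential(1) population purely for illustration (so the integrand vanishes below 0) and uses a plain midpoint rule; the names `f`, `F`, `lhs` and `rhs` mirror the notation above but are otherwise hypothetical helpers.

```python
import math

def f(x):
    """Density of Exponential(1), an illustrative choice of population."""
    return math.exp(-x) if x >= 0 else 0.0

def F(x):
    """Matching cdf of Exponential(1)."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def lhs(k, r, steps=20_000):
    """Midpoint-rule value of the integral of f(x) F(x)^k over (-inf, r].

    The integrand is zero for x < 0, so integration starts at 0.
    """
    if r <= 0:
        return 0.0
    h = r / steps
    return sum(f((i + 0.5) * h) * F((i + 0.5) * h) ** k
               for i in range(steps)) * h

def rhs(k, r):
    """Closed form from the corollary: F(r)^(k+1) / (k+1)."""
    return F(r) ** (k + 1) / (k + 1)

for k in (1, 2, 3):
    print(k, lhs(k, 2.0), rhs(k, 2.0))
```

The case $k=1$ is the one used for the $x_2$ integrals and $k=2$ the one used for the $x_3$ integrals.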
Conclusion of progression 2
- $\mathbb{P}[X_1\le X_2\le r\mid X_1\le X_2\le X_3]=F(r)^2\big(3-2F(r)\big)$
Progression: 3
Now we look at m=2, or 5 samples.