Difference between revisions of "Notes:Distribution of the sample median"
m (Adding expression for m=3) |
(Fixed typo added work for n=7) |
||
Line 3: | Line 3: | ||
==Findings== | ==Findings== | ||
I've found results for two sample sizes, {{M|n\eq 3}} and {{M|n\eq 5}}, they are respectively: | I've found results for two sample sizes, {{M|n\eq 3}} and {{M|n\eq 5}}, they are respectively: | ||
− | * {{M|F(r)^ | + | * {{M|F(r)^2\big[3-2F(r)\big]}} for {{M|n\eq 3}}, and |
* {{M|F(r)^3\big[10-15F(r)+6F(r)^2\big]}} for {{M|n\eq 5}} | * {{M|F(r)^3\big[10-15F(r)+6F(r)^2\big]}} for {{M|n\eq 5}} | ||
− | ** I've experimentally verified this one | + | ** I've experimentally verified this one |
+ | * {{M|F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)}} for {{M|n\eq 7}} | ||
==Important results== | ==Important results== | ||
# {{M|\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{X_1\le\cdots\le X_{m+1}\le r}{X_1\le\cdots\le X_{2m+1} } }} | # {{M|\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{X_1\le\cdots\le X_{m+1}\le r}{X_1\le\cdots\le X_{2m+1} } }} | ||
Line 112: | Line 113: | ||
I am now looking at {{M|m\eq 3}}, which is 7 samples. To find this we evaluate: | I am now looking at {{M|m\eq 3}}, which is 7 samples. To find this we evaluate: | ||
* {{MM|\P{\text{Median}\le r}\eq\frac{7!}{3!}\lim_{t\rightarrow+\infty}\left(\int^t_{-\infty}f(x_7)\left(\int^{x_7}_{-\infty}f(x_6)\left(\int^{x_6}_{-\infty}f(x_5)\left(\int^{\Min{r,x_5} }_{-\infty}f(x_4)F(x_4)^3 \d x_4\right)\d x_5\right)\d x_6\right)\d x_7\right)}} | * {{MM|\P{\text{Median}\le r}\eq\frac{7!}{3!}\lim_{t\rightarrow+\infty}\left(\int^t_{-\infty}f(x_7)\left(\int^{x_7}_{-\infty}f(x_6)\left(\int^{x_6}_{-\infty}f(x_5)\left(\int^{\Min{r,x_5} }_{-\infty}f(x_4)F(x_4)^3 \d x_4\right)\d x_5\right)\d x_6\right)\d x_7\right)}} | ||
+ | Initial work: | ||
+ | # {{MM|I_1(x_6):\eq \int^{x_6}_{-\infty}f(x_5)\left(\int^{\Min{r,x_5} }_{-\infty}f(x_4)F(x_4)^3 \d x_4\right)\d x_5\eq\left\{\begin{array}{lr}\frac{1}{5}\frac{1}{4}F(x_6)^5 & \text{if }x_6\le r\\ \frac{1}{5}\frac{1}{4}F(r)^4\big(5F(x_6)-4F(r)\big) & \text{if }x_6\ge r\end{array}\right.}} - these agree if {{M|x_6\eq r}} | ||
+ | # {{MM|I_2(x_7):\eq \int^{x_7}_{-\infty}f(x_6)\left(\int^{x_6}_{-\infty}f(x_5)\left(\int^{\Min{r,x_5} }_{-\infty}f(x_4)F(x_4)^3 \d x_4\right)\d x_5\right)\d x_6\eq \int^{x_7}_{-\infty}f(x_6)I_1(x_6)\d x_6}} {{MM|\eq\frac{1}{6}\frac{1}{5}\frac{1}{4}\left\{\begin{array}{lr}F(x_7)^6 & \text{if }x_7\le r\\ F(r)^4\big(10F(r)^2-24F(r)F(x_7)+15F(x_7)^2\big) & \text{if }x_7\ge r\end{array}\right.}} - note both parts agree if {{M|r\eq x_7}} as {{M|10+15-24\eq 1}} | ||
+ | # {{M|I_3(t)\eq}} (everything in the limit) {{MM|\eq \int^t_{-\infty} f(x_7)I_2(x_7)\d x_7}} {{MM|\eq\frac{1}{7}\frac{1}{6}\frac{1}{5}\frac{1}{4}\left\{\begin{array}{lr}F(t)^7 & \text{if }t\le r\\ F(r)^4\big(-20F(r)^3+70F(r)^2F(t)-84F(r)F(t)^2+35F(t)^3\big) & \text{if }t\ge r\end{array}\right.}} - note these agree if {{M|t\eq r}} | ||
+ | #* Clearly as {{M|t\rightarrow+\infty}} we get {{MM|I_3(t)\rightarrow\frac{1}{7}\frac{1}{6}\frac{1}{5}\frac{1}{4} F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)}} as {{M|F(t)\rightarrow 1}} | ||
+ | |||
+ | From the top of this section: | ||
+ | * {{MM|\P{\text{Median}\le r}\eq \frac{7!}{3!} I_3(+\infty)\eq F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)}} | ||
+ | |||
+ | |||
+ | '''Conclusion:''' | ||
+ | * {{MM|\P{\text{Median}\le r}\eq F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)}} |
Revision as of 11:59, 17 December 2017
Findings
I've found results for three sample sizes, $n=3$, $n=5$ and $n=7$; they are respectively:
- $F(r)^2\big[3-2F(r)\big]$ for $n=3$,
- $F(r)^3\big[10-15F(r)+6F(r)^2\big]$ for $n=5$
  - I've experimentally verified this one
- $F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)$ for $n=7$
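The closed forms can be cross-checked without any integration. The sketch below relies on a standard order-statistics fact that these notes do not use: the median of $2m+1$ samples is at most $r$ exactly when at least $m+1$ of the samples are at most $r$, which is a binomial tail in $p=F(r)$. The function names are illustrative, and the $n=3$ entry is the form $F(r)^2(3-2F(r))$ derived in Progression 2.

```python
from fractions import Fraction
from math import comb

def median_cdf_binomial(p, m):
    """P[at least m+1 of the 2m+1 samples are <= r], where p = F(r)."""
    n = 2 * m + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m + 1, n + 1))

# Closed forms from the notes, written as polynomials in p = F(r).
closed_forms = {
    1: lambda p: p**2 * (3 - 2 * p),                              # n = 3
    2: lambda p: p**3 * (10 - 15 * p + 6 * p**2),                 # n = 5
    3: lambda p: p**4 * (-20 * p**3 + 70 * p**2 - 84 * p + 35),   # n = 7
}

def all_forms_agree(denominator=12):
    """Compare both expressions exactly on the grid 0, 1/12, ..., 1."""
    grid = [Fraction(i, denominator) for i in range(denominator + 1)]
    return all(closed_forms[m](p) == median_cdf_binomial(p, m)
               for m in closed_forms for p in grid)

print(all_forms_agree())
```

Both sides are polynomials of degree at most 7 in $p$, so exact agreement on 13 grid points means they agree everywhere.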
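Important results
- $\mathbb{P}[\text{Median}(X_1,\ldots,X_{2m+1})\le r]=\mathbb{P}[X_1\le\cdots\le X_{m+1}\le r\mid X_1\le\cdots\le X_{2m+1}]$
  - $=\dfrac{\mathbb{P}[X_1\le\cdots\le X_{m+1}\le\text{Min}(r,X_{m+2})\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}]}{\frac{1}{(2m+1)!}}$
  - $=(2m+1)!\ \mathbb{P}[X_1\le\cdots\le X_{m+1}\le\text{Min}(r,X_{m+2})\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}]$
  - $=\lim_{t\to+\infty}\Big((2m+1)!\ \mathbb{P}[X_1\le\cdots\le X_{m+1}\le\text{Min}(r,X_{m+2})\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}\le t]\Big)$
  - $=\dfrac{(2m+1)!}{m!}\lim_{t\to+\infty}\left[\int_{-\infty}^{t}f(x_{2m+1})\left(\int_{-\infty}^{x_{2m+1}}f(x_{2m})\left(\cdots\int_{-\infty}^{x_{m+3}}f(x_{m+2})\left(\int_{-\infty}^{\text{Min}(r,x_{m+2})}f(x_{m+1})F(x_{m+1})^m\,dx_{m+1}\right)dx_{m+2}\cdots\right)dx_{2m}\right)dx_{2m+1}\right]$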
Problem overview
For some $m\in\mathbb{N}_0$, let $X_1,\ldots,X_{2m+1}$ be a sample from a population $X$, meaning that the $X_i$ are i.i.d. random variables. We wish to find:
- $\mathbb{P}[\text{Median}(X_1,\ldots,X_{2m+1})\le r]$ - the cdf of the median.
Initial work
Since the variables are i.i.d., any ordering is as likely as any other (which I proved the long way, rather than just jumping to $\frac{1}{(2m+1)!}$).
I believe that $\mathbb{P}[\text{Median}(X_1,\ldots,X_{2m+1})\le r]=\mathbb{P}[X_1\le\cdots\le X_{m+1}\le r\mid X_1\le\cdots\le X_{2m+1}]$. Let us make some definitions to make this shorter.
- $O:=X_1\le\cdots\le X_{2m+1}$ - representing the order part
- $M:=X_1\le\cdots\le X_{m+1}\le r$ - representing the median part
- $Q:=\mathbb{P}[\text{Median}(X_1,\ldots,X_{2m+1})\le r]=\mathbb{P}[M\mid O]$ - representing the question
We should also have some sort of converse, related to $r\le X_{m+2}\le\cdots\le X_{2m+1}$ or something.
We also have:
- An expression for $\mathbb{P}[X_1\le\cdots\le X_n\le r]$ from Probability of i.i.d random variables being in an order and not greater than something
  - It's $=\frac{1}{n!}F_X(r)^n$
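The expression $\frac{1}{n!}F_X(r)^n$ can be sanity-checked by simulation. The following is a minimal sketch that assumes Uniform(0,1) samples, so that $F_X(r)=r$; the function name, trial count and seed are illustrative choices, not part of the notes.

```python
import random
from math import factorial

def ordered_and_bounded_frequency(n, r, trials=200_000, seed=0):
    """Estimate P[X_1 <= ... <= X_n <= r] for i.i.d. Uniform(0,1) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        # The event needs the draws in increasing order and the last one <= r.
        if xs[-1] <= r and all(a <= b for a, b in zip(xs, xs[1:])):
            hits += 1
    return hits / trials

n, r = 3, 0.8
estimate = ordered_and_bounded_frequency(n, r)
exact = r**n / factorial(n)   # F(r)^n / n! with F(r) = r
print(estimate, exact)
```

The two printed numbers should agree to within ordinary Monte Carlo noise.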
Analysis
Let us look at $X\le r$ and $X\le Y$ to see what we can say if both are true (the "and")
- Claim: $(X\le r\wedge X\le Y)\iff(X\le\text{Min}(r,Y))$
- Proof:
  - $\implies$
    - Suppose $r\le Y$, so $\text{Min}(r,Y)=r$; then $X\le r$ gives $X\le r=\text{Min}(r,Y)$, so the implication holds in this case
    - Suppose $Y\le r$, so $\text{Min}(r,Y)=Y$; then $X\le Y$ gives $X\le Y=\text{Min}(r,Y)$, so the implication holds in this case too.
  - $\impliedby$
    - We notice either $\text{Min}(r,Y)=r$ if $r\le Y$, or $\text{Min}(r,Y)=Y$ if $Y\le r$ (the two cases overlap when $r=Y$, but that doesn't matter)
      - If $r\le Y$ then $X\le r$, and as $r\le Y$ by assumption we use the transitivity of $\le$ to see $X\le r\le Y$, thus $X\le Y$ too - as required
      - If $Y\le r$ then $X\le Y$, and as $Y\le r$ by assumption we use the transitivity of $\le$ to see $X\le Y\le r$, thus $X\le r$ too - as required.
    - So in either case, we have $X\le Y$ and $X\le r$ - as required
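The claim is simple enough to confirm by brute force. This sketch checks the equivalence on a small integer grid, which covers every relative ordering of the three values, ties included; `claim_holds` is an illustrative name.

```python
from itertools import product

def claim_holds(x, r, y):
    """Check (x <= r and x <= y) <=> (x <= min(r, y)) for one triple."""
    lhs = (x <= r) and (x <= y)
    rhs = x <= min(r, y)
    return lhs == rhs

grid = range(-3, 4)
print(all(claim_holds(x, r, y) for x, r, y in product(grid, repeat=3)))
```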
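Problem statement
Thus we really want to find:
- $\mathbb{P}[\text{Median}(X_1,\ldots,X_{2m+1})\le r]=\mathbb{P}[X_1\le\cdots\le X_{m+1}\le r\mid X_1\le\cdots\le X_{2m+1}]$
  - $=\dfrac{\mathbb{P}[M\text{ and }O]}{\mathbb{P}[O]}$
  - $=(2m+1)!\ \mathbb{P}[X_1\le\cdots\le X_{m+1}\le\text{Min}(r,X_{m+2})\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}]$
    - Caveat: We now need $(X\le r\wedge X\le Y\le Z)\implies(X\le\text{Min}(r,Y)\le Y\le Z)$ to justify this format. Although that's arguably not that helpful for the integral.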
Initial integral
- This isn't about the median specifically, this is just looking at the specific integral.
Suppose we have a sample of length 3, $X,Y,Z$; then we are looking at:
- $\mathbb{P}[X\le\text{Min}(r,Y)\le Y\le Z\le t]$ (where $t$ will be used for a limit towards $\infty$ to get $\mathbb{P}[X\le\text{Min}(r,Y)\le Y\le Z]$ in the end), or as an integral:
  - $\int_{-\infty}^{t}f(z)\left(\int_{-\infty}^{z}f(y)\left(\int_{-\infty}^{\text{Min}(r,y)}f(x)\,dx\right)dy\right)dz$
    - if $t>r$ then the minimum will get involved (for some values of $z$ anyway) and cap the inner limit at $r$; otherwise the inner limit always stays under $r$. In practice (as we'll take $t\to\infty$) this will certainly happen.
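Progression: 1
We are evaluating: $\mathbb{P}[X_1\le\cdots\le X_{m+1}\le\text{Min}(r,X_{m+2})\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}\le t]$
- $\int_{-\infty}^{t}f(x_{2m+1})\left(\int_{-\infty}^{x_{2m+1}}f(x_{2m})\left(\cdots\int_{-\infty}^{x_{m+3}}f(x_{m+2})\left(\int_{-\infty}^{\text{Min}(r,x_{m+2})}f(x_{m+1})\left(\int_{-\infty}^{x_{m+1}}f(x_m)\left(\cdots\int_{-\infty}^{x_2}f(x_1)\,dx_1\cdots\right)dx_m\right)dx_{m+1}\right)dx_{m+2}\cdots\right)dx_{2m}\right)dx_{2m+1}$
We operate on the inner bit:
- $\int_{-\infty}^{x_{m+1}}f(x_m)\left(\cdots\int_{-\infty}^{x_2}f(x_1)\,dx_1\cdots\right)dx_m=\frac{1}{m!}F(x_{m+1})^m$
We substitute this back in to yield:
- $\frac{1}{m!}\int_{-\infty}^{t}f(x_{2m+1})\left(\int_{-\infty}^{x_{2m+1}}f(x_{2m})\left(\cdots\int_{-\infty}^{x_{m+3}}f(x_{m+2})\left(\int_{-\infty}^{\text{Min}(r,x_{m+2})}f(x_{m+1})F(x_{m+1})^m\,dx_{m+1}\right)dx_{m+2}\cdots\right)dx_{2m}\right)dx_{2m+1}$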
Conclusion of progression 1
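We see here that the $m$ innermost integrals collapse to the single factor $\frac{1}{m!}F(x_{m+1})^m$, leaving $m+1$ nested integrals, the innermost of which has the $\text{Min}$ in its upper limit.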
Progression: 2
This will involve induction, and dealing with the Min() will be "tricky". Both for practice and for the induction we will consider the special cases $m=1$ and $m=2$ by evaluating:
- $m=1$ yields $I_1:=\frac{1}{1!}\int_{-\infty}^{t}f(x_3)\left(\int_{-\infty}^{\text{Min}(r,x_3)}f(x_2)F(x_2)\,dx_2\right)dx_3$, by case analysis:
  - if $t\le r$ then $x_3\le t\le r$, so $x_3\le r$ over the entire domain of interest and $\text{Min}(r,x_3)=x_3$ there, giving:
    - $I_1=\frac{1}{1!}\int_{-\infty}^{t}f(x_3)\left(\int_{-\infty}^{x_3}f(x_2)F(x_2)\,dx_2\right)dx_3$
    - We now use the corollary below to see:
      - $I_1=\frac{1}{2!}\int_{-\infty}^{t}f(x_3)F(x_3)^2\,dx_3$
      - $=\frac{1}{3!}F(t)^3$
  - if $t\ge r$ then we split $(-\infty,t]$ into $(-\infty,r)$ and $[r,t]$, giving:
    - $I_1=\frac{1}{1!}\left[\int_{-\infty}^{r}f(x_3)\left(\int_{-\infty}^{\text{Min}(r,x_3)}f(x_2)F(x_2)\,dx_2\right)dx_3+\int_{r}^{t}f(x_3)\left(\int_{-\infty}^{\text{Min}(r,x_3)}f(x_2)F(x_2)\,dx_2\right)dx_3\right]$
    - $=\frac{1}{1!}\left[\int_{-\infty}^{r}f(x_3)\left(\int_{-\infty}^{x_3}f(x_2)F(x_2)\,dx_2\right)dx_3+\int_{r}^{t}f(x_3)\left(\int_{-\infty}^{r}f(x_2)F(x_2)\,dx_2\right)dx_3\right]$
    - We now use the required corollary immediately below to yield:
      - $I_1=\frac{1}{1!}\left[\int_{-\infty}^{r}f(x_3)\cdot\frac{1}{2}F(x_3)^2\,dx_3+\int_{r}^{t}f(x_3)\cdot\frac{1}{2}F(r)^2\,dx_3\right]$
      - $=\frac{1}{2!}\left[\frac{1}{3}F(r)^3+F(r)^2\int_{r}^{t}f(x_3)\,dx_3\right]$, note that: $\int_{r}^{t}f(x)\,dx=\int_{-\infty}^{t}f(x)\,dx-\int_{-\infty}^{r}f(x)\,dx=F(t)-F(r)$
      - $=\frac{1}{2!}F(r)^2\left[\frac{1}{3}F(r)+\big(F(t)-F(r)\big)\right]$, note that: $F(t)-F(r)=\frac{3F(t)-3F(r)}{3}$ which we'll use next
      - $=\frac{1}{2!}F(r)^2\left[\frac{3F(t)-2F(r)}{3}\right]$
      - $=\frac{1}{3!}F(r)^2\big(3F(t)-2F(r)\big)$
It is clear that as $t\to\infty$ (so $F(t)\to 1$) we end up with $I_1=\frac{1}{3!}F(r)^2\big(3-2F(r)\big)$
Thus: $\mathbb{P}[X_1\le X_2\le\text{Min}(r,X_3)\le X_3]=\frac{1}{3!}F(r)^2\big(3-2F(r)\big)$
Finally:
- $\mathbb{P}[X_1\le X_2\le r\mid X_1\le X_2\le X_3]=F(r)^2\big(3-2F(r)\big)$
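The result for three samples can be checked experimentally. The sketch below is a hedged illustration that assumes an Exp(1) population, so $F(r)=1-e^{-r}$; any continuous distribution should behave the same way. The function names, trial count and seed are illustrative.

```python
import math
import random
from statistics import median

def empirical_median_cdf(r, n=3, trials=100_000, seed=1):
    """Fraction of trials in which the median of n Exp(1) draws is <= r."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if median(rng.expovariate(1.0) for _ in range(n)) <= r:
            hits += 1
    return hits / trials

def closed_form_n3(r):
    p = 1.0 - math.exp(-r)        # F(r) for the Exp(1) distribution
    return p**2 * (3 - 2 * p)     # F(r)^2 (3 - 2 F(r))

r = 1.0
print(empirical_median_cdf(r), closed_form_n3(r))
```

The empirical frequency and the closed form should differ only by sampling noise, which shrinks like one over the square root of the trial count.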
Required corollary
Recall from Probability of i.i.d random variables being in an order and not greater than something that:
- $\frac{1}{k!}\int_{-\infty}^{r}f(x)F(x)^k\,dx=\frac{1}{(k+1)!}F(r)^{k+1}$
So:
- $\int_{-\infty}^{r}f(x)F(x)^k\,dx=\frac{1}{k+1}F(r)^{k+1}$
By applying this to the above (with the $x_2$ integrals):
- $\int_{-\infty}^{r}f(x)F(x)^1\,dx=\frac{1}{2}F(r)^2$, we then substitute this for the cases $r:=r$ and $r:=x_3$
We'll then apply it to the $x_3$ integrals.
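The corollary is the substitution $u=F(x)$, $du=f(x)\,dx$, and it can be checked numerically. This sketch assumes the standard logistic distribution (chosen only because its cdf and pdf are easy to write down) and a plain midpoint rule; the lower cut-off of $-40$ stands in for $-\infty$, and all names are illustrative.

```python
import math

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_pdf(x):
    c = logistic_cdf(x)
    return c * (1.0 - c)

def corollary_lhs(r, k, lower=-40.0, steps=20_000):
    """Midpoint-rule estimate of the integral of f(x) F(x)^k over (lower, r]."""
    h = (r - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        total += logistic_pdf(x) * logistic_cdf(x) ** k
    return h * total

def corollary_rhs(r, k):
    return logistic_cdf(r) ** (k + 1) / (k + 1)

for k in (1, 2, 3):
    print(k, corollary_lhs(0.5, k), corollary_rhs(0.5, k))
```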
Conclusion of progression 2
- $\mathbb{P}[X_1\le X_2\le r\mid X_1\le X_2\le X_3]=F(r)^2\big(3-2F(r)\big)$
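Progression: 3
I am now looking at $m=3$, which is 7 samples. To find this we evaluate:
- $\mathbb{P}[\text{Median}\le r]=\frac{7!}{3!}\lim_{t\to+\infty}\left(\int_{-\infty}^{t}f(x_7)\left(\int_{-\infty}^{x_7}f(x_6)\left(\int_{-\infty}^{x_6}f(x_5)\left(\int_{-\infty}^{\text{Min}(r,x_5)}f(x_4)F(x_4)^3\,dx_4\right)dx_5\right)dx_6\right)dx_7\right)$
Initial work:
- $I_1(x_6):=\int_{-\infty}^{x_6}f(x_5)\left(\int_{-\infty}^{\text{Min}(r,x_5)}f(x_4)F(x_4)^3\,dx_4\right)dx_5=\begin{cases}\frac{1}{5}\frac{1}{4}F(x_6)^5&\text{if }x_6\le r\\ \frac{1}{5}\frac{1}{4}F(r)^4\big(5F(x_6)-4F(r)\big)&\text{if }x_6\ge r\end{cases}$ - these agree if $x_6=r$
- $I_2(x_7):=\int_{-\infty}^{x_7}f(x_6)\left(\int_{-\infty}^{x_6}f(x_5)\left(\int_{-\infty}^{\text{Min}(r,x_5)}f(x_4)F(x_4)^3\,dx_4\right)dx_5\right)dx_6=\int_{-\infty}^{x_7}f(x_6)I_1(x_6)\,dx_6=\frac{1}{6}\frac{1}{5}\frac{1}{4}\begin{cases}F(x_7)^6&\text{if }x_7\le r\\ F(r)^4\big(10F(r)^2-24F(r)F(x_7)+15F(x_7)^2\big)&\text{if }x_7\ge r\end{cases}$ - note both parts agree if $r=x_7$ as $10+15-24=1$
- $I_3(t)=$ (everything in the limit) $=\int_{-\infty}^{t}f(x_7)I_2(x_7)\,dx_7=\frac{1}{7}\frac{1}{6}\frac{1}{5}\frac{1}{4}\begin{cases}F(t)^7&\text{if }t\le r\\ F(r)^4\big(-20F(r)^3+70F(r)^2F(t)-84F(r)F(t)^2+35F(t)^3\big)&\text{if }t\ge r\end{cases}$ - note these agree if $t=r$
  - Clearly as $t\to+\infty$ we get $I_3(t)\to\frac{1}{7}\frac{1}{6}\frac{1}{5}\frac{1}{4}F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)$ as $F(t)\to 1$
From the top of this section:
- $\mathbb{P}[\text{Median}\le r]=\frac{7!}{3!}I_3(+\infty)=F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)$
Conclusion:
- $\mathbb{P}[\text{Median}\le r]=F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)$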
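The three piecewise integrations above can be replayed exactly in rational arithmetic. The sketch below is one possible check, not the method used in these notes: it substitutes $u=F(x)$ so that every integral runs over $[0,1]$, fixes a rational value $p=F(r)$, and tracks one polynomial for $u\le p$ and one for $u\ge p$. All function names are illustrative.

```python
from fractions import Fraction

def antiderivative(poly):
    """Antiderivative with zero constant term; poly[i] multiplies u**i."""
    return [Fraction(0)] + [c / (i + 1) for i, c in enumerate(poly)]

def evaluate(poly, u):
    return sum(c * u**i for i, c in enumerate(poly))

def integrate_piecewise(low, high, p):
    """Integrate from 0 to u a function equal to `low` on [0, p] and `high` on [p, 1]."""
    new_low = antiderivative(low)
    anti_high = antiderivative(high)
    # Pick the constant so that both pieces agree at u = p.
    const = evaluate(new_low, p) - evaluate(anti_high, p)
    new_high = [anti_high[0] + const] + anti_high[1:]
    return new_low, new_high

def median_cdf_seven(p):
    p = Fraction(p)
    # Innermost integral: F(Min(r, x5))^4 / 4, i.e. u^4/4 below p and p^4/4 above.
    low = [Fraction(0)] * 4 + [Fraction(1, 4)]
    high = [p**4 / 4]
    for _ in range(3):                        # the x5, x6 and x7 integrals
        low, high = integrate_piecewise(low, high, p)
    return 840 * evaluate(high, Fraction(1))  # 7!/3! = 840, with F(t) -> 1

def closed_form(p):
    p = Fraction(p)
    return p**4 * (-20 * p**3 + 70 * p**2 - 84 * p + 35)

for p in (Fraction(1, 3), Fraction(1, 2), Fraction(3, 7)):
    print(p, median_cdf_seven(p), closed_form(p))
```

At $p=\frac{1}{2}$ both values come out as exactly $\frac{1}{2}$, which is what symmetry demands of a median.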