Notes:Distribution of the sample median
Findings
I've found results for three sample sizes, n\eq 3, n\eq 5 and n\eq 7, plus a predicted result for n\eq 9. The cdf of the median is:
- F(r)^2\big[3-2F(r)\big] for n\eq 3,
- F(r)^3\big[10-15F(r)+6F(r)^2\big] for n\eq 5
- I've experimentally verified this one (the n\eq 5 result)
- F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big) for n\eq 7
Unfortunately it seems prior results are of no help
- F(r)^5\big(70F(r)^4-315F(r)^3+540F(r)^2-420F(r)+126\big) PREDICTED for n\eq 9
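As an independent cross-check of the list above (not the derivation used in these notes), the coefficients can be regenerated from the standard order-statistic identity: the median of 2m+1 values is at most r exactly when at least m+1 of them are at most r. The sketch below expands that binomial sum in powers of F(r); the function name `median_cdf_coeffs` is hypothetical.

```python
from math import comb

def median_cdf_coeffs(m):
    """Coefficients c[k] of F^k in P(Median <= r) for a sample of size n = 2m+1.

    Assumption: uses the binomial identity
    P(at least m+1 of n values <= r) = sum_{j=m+1}^{n} C(n,j) F^j (1-F)^(n-j),
    which is a swapped-in technique, not the iterated integral of these notes.
    """
    n = 2 * m + 1
    coeffs = [0] * (n + 1)
    for j in range(m + 1, n + 1):
        # Expand C(n,j) F^j (1-F)^(n-j) with the binomial theorem.
        for i in range(n - j + 1):
            coeffs[j + i] += comb(n, j) * comb(n - j, i) * (-1) ** i
    return coeffs

for m in (1, 2, 3, 4):
    # Print the sample size and the coefficients of F^(m+1), ..., F^(2m+1).
    print(2 * m + 1, median_cdf_coeffs(m)[m + 1:])
```

Under this identity the n\eq 9 coefficients come out as 126, -420, 540, -315, 70, which agrees with the predicted polynomial.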
Important results
- \P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{X_1\le\cdots\le X_{m+1}\le r}{X_1\le\cdots\le X_{2m+1} }
- \eq \frac{\P{X_1\le\cdots\le X_{m+1}\le\Min{r,X_{m+2} }\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1} } }{\frac{1}{(2m+1)!} }
- \eq \big((2m+1)!\big)\P{X_1\le\cdots\le X_{m+1}\le\Min{r,X_{m+2} }\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1} }
- \eq \lim_{t\rightarrow+\infty}\Bigg(\big((2m+1)!\big)\P{X_1\le\cdots\le X_{m+1}\le\Min{r,X_{m+2} }\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}\le t }\Bigg)
- \eq\frac{(2m+1)!}{m!}\lim_{t\rightarrow+\infty}\Bigg[\int^t_{-\infty}f(x_{2m+1})\left(\int^{x_{2m+1} }_{-\infty}f(x_{2m})\left(\cdots\int^{x_{m+3} }_{-\infty}f(x_{m+2})\left(\int^{\Min{r,x_{m+2} } }_{-\infty} f(x_{m+1})F(x_{m+1})^m\d x_{m+1}\right)\d x_{m+2}\cdots\right)\d x_{2m}\right)\d x_{2m+1}\Bigg]
Problem overview
Let X_1,\ldots,X_{2m+1} be a sample from a population X, for some m\in\mathbb{N}_{0} , meaning that the X_i are i.i.d. random variables. We wish to find:
- \P{\text{Median}(X_1,\ldots,X_{2m+1})\le r} - the cdf of the median.
Initial work
Since the variables are i.i.d., any ordering is as likely as any other (which I proved the long way, rather than just jumping to \frac{1}{(2m+1)!} , silly me). The result found in Probability of i.i.d random variables being in an order and not greater than something will also be useful.
I believe that \P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{X_1\le\cdots\le X_{m+1}\le r}{X_1\le\cdots\le X_{2m+1} } . Let us make some definitions to make this shorter.
- \mathcal{O}:\eq X_1\le\cdots\le X_{2m+1} - representing the order part
- \mathcal{M}:\eq X_1\le\cdots\le X_{m+1}\le r - representing the median part
- \mathcal{Q}:\eq\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{\mathcal{M} }{\mathcal{O} } - representing the question
We should also have some sort of converse, related to r\le X_{m+2}\le\cdots\le X_{2m+1} or something similar.
We also have:
- An expression for \P{X_1\le \cdots\le X_n\le r} from Probability of i.i.d random variables being in an order and not greater than something
- It is \frac{1}{n!}F_X(r)^n
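A quick simulation can make the \frac{1}{n!}F_X(r)^n expression concrete. This is a minimal sketch assuming a Uniform(0,1) population (so F(r)\eq r on [0,1]); the function name, seed and trial count are illustrative choices, not part of the original notes.

```python
import random
from math import factorial

def ordered_below_estimate(n, r, trials=200_000, seed=0):
    """Monte Carlo estimate of P(X_1 <= ... <= X_n <= r) for i.i.d. Uniform(0,1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        # The event holds when the draws are already sorted and the last is <= r.
        if xs[-1] <= r and all(a <= b for a, b in zip(xs, xs[1:])):
            hits += 1
    return hits / trials

n, r = 3, 0.8
estimate = ordered_below_estimate(n, r)
exact = r ** n / factorial(n)  # F(r)^n / n! with F(r) = r for Uniform(0,1)
print(estimate, exact)
```

The two printed numbers should agree up to sampling noise; a different seed or population would change the estimate slightly but not the comparison.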
Analysis
Let us look at X\le r and X\le Y to see what we can say if both are true (the "and")
- Claim: (X\le r\wedge X\le Y)\iff(X\le\Min{r,Y})
- Proof:
- \implies
- Suppose r\le Y, so \Min{r,Y}\eq r, obviously X\le r\ \implies\ X\le r\eq\Min{r,Y} , so the implication holds in this case
- Suppose Y\le r, so \Min{r,Y}\eq Y, obviously X\le Y\ \implies\ X\le Y\eq\Min{r,Y} , so the implication holds in this case too.
- \impliedby
- We notice either \Min{r,Y}\eq r if r\le Y, or \Min{r,Y}\eq Y if Y\le r (if r\eq Y both cases apply and agree, so the overlap does not matter)
- If r\le Y then X\le\Min{r,Y}\eq r, and as r\le Y by assumption, we use the transitivity of \le to see X\le r\le Y, thus X\le Y too - as required
- If Y\le r then X\le\Min{r,Y}\eq Y, and as Y\le r by assumption, we use the transitivity of \le to see X\le Y\le r, and thus X\le r too - as required.
- So in either case, we have X\le Y and X\le r - as required
Problem statement
Thus we really want to find:
- \P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{X_1\le\cdots\le X_{m+1}\le r}{X_1\le\cdots\le X_{2m+1} }
- \eq\frac{\P{\mathcal{M}\ \text{and}\ \mathcal{O} } }{\P{\mathcal{O} } }
- \eq \big((2m+1)!\big)\P{X_1\le\cdots\le X_{m+1}\le\Min{r,X_{m+2} }\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1} }
- Caveat: We now need \big(X\le r\wedge X\le Y\le Z\big)\implies\big(X\le\Min{r,Y}\le Y\le Z\big) to justify this format, although that's arguably not that helpful for the integral.
Initial integral
- This isn't about the median specifically, this is just looking at the specific integral.
Suppose we have a sample of length 3, X,Y,Z then we are looking at:
- \P{X\le\Min{r,Y}\le Y\le Z\le t} (where t will be used for a limit towards \infty to get \P{X\le \Min{r,Y}\le Y\le Z} in the end), or as an integral:
- \int^t_{-\infty}f(z)\left(\int^z_{-\infty}f(y)\left(\int^{\Min{r,y} }_{-\infty} f(x)\d x\right)\d y\right)\d z
- If t>r then the minimum comes into play (for some values of z, anyway) and caps the innermost upper limit at r; otherwise y always stays below r. In practice, as we take t\rightarrow\infty, the first case certainly happens.
Progression: 1
We are evaluating: \P{X_1\le\cdots\le X_{m+1}\le\Min{r,X_{m+2} }\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}\le t } (our answer is \big((2m+1)!\big) times the limit of this as t\rightarrow\infty ). The full integral follows:
- \int^t_{-\infty}f(x_{2m+1})\left(\int^{x_{2m+1} }_{-\infty}f(x_{2m})\left(\cdots\int^{x_{m+3} }_{-\infty}f(x_{m+2})\left(\int^{\Min{r,x_{m+2} } }_{-\infty} f(x_{m+1}){\left(\int^{x_{m+1} }_{-\infty}f(x_{m} )\left(\cdots\int^{x_2}_{-\infty}f(x_1)\d x_1\cdots\right)\d x_m\right)}\d x_{m+1}\right)\d x_{m+2}\cdots\right)\d x_{2m}\right)\d x_{2m+1}
We operate on the inner bit:
- {\int^{x_{m+1} }_{-\infty}f(x_{m} )\left(\cdots\int^{x_2}_{-\infty}f(x_1)\d x_1\cdots\right)\d x_m}\eq \frac{1}{m!}F(x_{m+1})^m
We substitute this back in to yield:
- \frac{1}{m!}\int^t_{-\infty}f(x_{2m+1})\left(\int^{x_{2m+1} }_{-\infty}f(x_{2m})\left(\cdots\int^{x_{m+3} }_{-\infty}f(x_{m+2})\left(\int^{\Min{r,x_{m+2} } }_{-\infty} f(x_{m+1})F(x_{m+1})^m\d x_{m+1}\right)\d x_{m+2}\cdots\right)\d x_{2m}\right)\d x_{2m+1}
Conclusion of progression 1
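We see here that \P{\text{Median}(X_1,\ldots,X_{2m+1})\le r} equals \big((2m+1)!\big) times the limit, as t\rightarrow+\infty, of the expression immediately above; pulling the \frac{1}{m!} out front gives the \frac{(2m+1)!}{m!} prefactor stated in the important results.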
Progression: 2
This will involve induction, and dealing with the \text{Min}() will be "tricky". Both for practice and for the induction, we will consider the special cases m\eq 1 and m\eq 2 by evaluating:
- m\eq 1 yields I_1:\eq\frac{1}{1!}\int^t_{-\infty} f(x_3)\left(\int^{\Min{r,x_3} }_{-\infty}f(x_2)F(x_2) \d x_2\right)\d x_3, by case analysis:
- if t\le r then x_3\le t\le r, so x_3\le r over the entire domain of interest and \Min{r,x_3}\eq x_3 there, giving:
- I_1\eq\frac{1}{1!}\int^t_{-\infty}f(x_3)\left(\int^{x_3}_{-\infty}f(x_2)F(x_2)\d x_2\right)\d x_3
- We now use the corollary below to see:
- I_1\eq\frac{1}{2!}\int^t_{-\infty}f(x_3)F(x_3)^2\d x_3
- \eq\frac{1}{3!}F(t)^3
- if t\ge r then we split (-\infty,t] into (-\infty,r) and [r,t], giving:
- I_1\eq\frac{1}{1!}\left[\int^r_{-\infty} f(x_3)\left(\int^{\Min{r,x_3} }_{-\infty}f(x_2)F(x_2) \d x_2\right)\d x_3+\int_r^tf(x_3)\left(\int^{\Min{r,x_3} }_{-\infty}f(x_2)F(x_2) \d x_2\right)\d x_3\right]
- \eq\frac{1}{1!}\left[\int^r_{-\infty}f(x_3)\left(\int^{x_3}_{-\infty}f(x_2)F(x_2) \d x_2\right)\d x_3+\int_r^tf(x_3)\left(\int^r_{-\infty}f(x_2)F(x_2) \d x_2\right)\d x_3\right]
- We now use the required corollary (stated below) on both x_2 integrals to yield:
- I_1\eq\frac{1}{1!}\left[\int^r_{-\infty}f(x_3)\cdot\frac{1}{2}F(x_3)^2\d x_3+\int_r^tf(x_3)\cdot\frac{1}{2}F(r)^2\d x_3\right]
- \eq\frac{1}{2!}\left[\frac{1}{3}F(r)^3+F(r)^2\int^t_rf(x_3)\d x_3\right], note that: \int^t_rf(x)\d x\eq\int_{-\infty}^tf(x)\d x-\int_{-\infty}^rf(x)\d x \eq F(t)-F(r)
- \eq\frac{1}{2!}F(r)^2\left[\frac{1}{3}F(r)+\big(F(t)-F(r)\big)\right], note that: F(t)-F(r)\eq\frac{3F(t)-3F(r)}{3} which we'll use next
- \eq\frac{1}{2!}F(r)^2\left[\frac{3F(t)-2F(r)}{3}\right]
- \eq\frac{1}{3!}F(r)^2\big(3F(t)-2F(r)\big)
It is clear that as t\rightarrow\infty we have F(t)\rightarrow 1, so we end up with I_1\rightarrow\frac{1}{3!}F(r)^2\big(3-2F(r)\big)
Thus: \P{X_1\le X_2\le\Min{r,X_3}\le X_3}\eq\frac{1}{3!}F(r)^2\big(3-2F(r)\big)
Finally:
- \Pcond{X_1\le X_2\le r}{X_1\le X_2\le X_3}\eq F(r)^2\big(3-2F(r)\big)
Required corollary
Recall from Probability of i.i.d random variables being in an order and not greater than something that:
- \frac{1}{k!}\int^r_{-\infty}f(x)F(x)^k\d x\eq \frac{1}{(k+1)!}F(r)^{k+1}
So:
- \int^r_{-\infty}f(x)F(x)^k\d x\eq \frac{1}{k+1}F(r)^{k+1}
By applying this to above (with the x_2 integrals):
- \int^r_{-\infty}f(x)F(x)^1\d x\eq \frac{1}{2}F(r)^2 , we then substitute this for the cases r:\eq r and r:\eq x_3
We'll then apply it to the x_3 integrals.
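The corollary can be sanity-checked numerically. The sketch below assumes an Exp(1) population (so f(x)\eq e^{-x} and F(x)\eq 1-e^{-x} for x\ge 0, with the lower limit -\infty effectively replaced by 0) and uses a plain midpoint rule; the function names `lhs` and `rhs` are hypothetical.

```python
from math import exp

def f(x):
    """Density of the assumed Exp(1) population."""
    return exp(-x) if x >= 0 else 0.0

def F(x):
    """Cdf of the assumed Exp(1) population."""
    return 1.0 - exp(-x) if x >= 0 else 0.0

def lhs(r, k, steps=20_000):
    """Midpoint-rule approximation of the integral of f(x) F(x)^k over [0, r]."""
    h = r / steps
    return sum(f((i + 0.5) * h) * F((i + 0.5) * h) ** k for i in range(steps)) * h

def rhs(r, k):
    """Closed form from the corollary: F(r)^(k+1) / (k+1)."""
    return F(r) ** (k + 1) / (k + 1)

for k in (1, 2, 3):
    print(k, lhs(1.5, k), rhs(1.5, k))
```

The case k\eq 1 is the one used for the x_2 integrals, and k\eq 2 is the one used for the x_3 integral.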
Conclusion of progression 2
- \Pcond{X_1\le X_2\le r}{X_1\le X_2\le X_3}\eq F(r)^2\big(3-2F(r)\big)
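The n\eq 3 result is easy to test by simulation. This is a rough sketch, again assuming an Exp(1) population (any continuous distribution would do); the function name, seed and trial count are illustrative assumptions.

```python
import random
from math import exp

def median_cdf_estimate(r, n=3, trials=200_000, seed=1):
    """Monte Carlo estimate of P(Median <= r) for n i.i.d. Exp(1) draws, n odd."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = sorted(rng.expovariate(1.0) for _ in range(n))
        # For odd n the median is the middle element of the sorted sample.
        if xs[n // 2] <= r:
            hits += 1
    return hits / trials

r = 1.0
Fr = 1.0 - exp(-r)                   # F(r) for Exp(1)
predicted = Fr ** 2 * (3 - 2 * Fr)   # the n = 3 formula from this progression
estimate = median_cdf_estimate(r)
print(estimate, predicted)
```

The estimate should sit within sampling noise of the predicted value; passing n\eq 5 and swapping in the n\eq 5 polynomial gives the analogous check for that case.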
Progression: 3
I am now looking at m\eq 3, which is a sample of size 7. To find this we evaluate:
- \P{\text{Median}\le r}\eq\frac{7!}{3!}\lim_{t\rightarrow+\infty}\left(\int^t_{-\infty}f(x_7)\left(\int^{x_7}_{-\infty}f(x_6)\left(\int^{x_6}_{-\infty}f(x_5)\left(\int^{\Min{r,x_5} }_{-\infty}f(x_4)F(x_4)^3 \d x_4\right)\d x_5\right)\d x_6\right)\d x_7\right)
Initial work:
- I_1(x_6):\eq \int^{x_6}_{-\infty}f(x_5)\left(\int^{\Min{r,x_5} }_{-\infty}f(x_4)F(x_4)^3 \d x_4\right)\d x_5\eq\left\{\begin{array}{ll}\frac{1}{5}\frac{1}{4}F(x_6)^5 & \text{if }x_6\le r\\\frac{1}{5}\frac{1}{4}F(r)^4\big(5F(x_6)-4F(r)\big) & \text{if }x_6\ge r\end{array}\right. - these agree if x_6\eq r
- I_2(x_7):\eq \int^{x_7}_{-\infty}f(x_6)\left(\int^{x_6}_{-\infty}f(x_5)\left(\int^{\Min{r,x_5} }_{-\infty}f(x_4)F(x_4)^3 \d x_4\right)\d x_5\right)\d x_6\eq \int^{x_7}_{-\infty}f(x_6)I_1(x_6)\d x_6 \eq\frac{1}{6}\frac{1}{5}\frac{1}{4}\left\{\begin{array}{ll} F(x_7)^6 & \text{if }x_7\le r \\ F(r)^4\big(10F(r)^2-24F(r)F(x_7)+15F(x_7)^2\big) & \text{if }x_7\ge r\end{array}\right. - note both parts agree if r\eq x_7 as 10+15-24\eq 1
- I_3(t):\eq (everything in the limit) \eq \int^t_{-\infty} f(x_7)I_2(x_7)\d x_7 \eq\frac{1}{7}\frac{1}{6}\frac{1}{5}\frac{1}{4}\left\{\begin{array}{ll}F(t)^7 & \text{if }t\le r \\ F(r)^4\big(-20 F(r)^3 + 70F(r)^2 F(t)-84F(r)F(t)^2+35F(t)^3\big) & \text{if }t\ge r\end{array}\right. - note these agree if t\eq r
- Clearly as t\rightarrow+\infty we get I_3(t)\rightarrow\frac{1}{7}\frac{1}{6}\frac{1}{5}\frac{1}{4} F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big) as F(t)\rightarrow 1
From the top of this section:
- \P{\text{Median}\le r}\eq \frac{7!}{3!} I_3(+\infty)\eq F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)
Conclusion:
- \P{\text{Median}\le r}\eq F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)
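The piecewise integration used in this progression can be mechanised. The sketch below assumes the substitution u\eq F(x), which turns each \int f(x)(\cdots)\d x into a plain polynomial integral in u, with c standing for F(r). It tracks the two branches (argument below c, argument above c) in exact rational arithmetic and compares the result with the binomial order-statistic sum. All function names are hypothetical, and the binomial comparison is an outside cross-check rather than part of the derivation in these notes.

```python
from fractions import Fraction
from math import comb, factorial

def integrate(poly):
    """Antiderivative with zero constant term; poly is [a0, a1, ...] in powers of s."""
    return [Fraction(0)] + [a / (k + 1) for k, a in enumerate(poly)]

def evaluate(poly, s):
    return sum(a * s ** k for k, a in enumerate(poly))

def median_cdf_by_recursion(m, c):
    """(2m+1)!/m! times the iterated integral at t = +infinity, where c = F(r)."""
    c = Fraction(c)
    # Branch s <= c: the Min is inactive, so the innermost integral is s^(m+1)/(m+1).
    lower = [Fraction(0)] * (m + 1) + [Fraction(1, m + 1)]
    # Branch s >= c: the innermost upper limit is capped at c, giving a constant.
    upper = [c ** (m + 1) / (m + 1)]
    for _ in range(m):  # the m outer integrals over x_{m+2}, ..., x_{2m+1}
        lower_next = integrate(lower)
        anti = integrate(upper)
        # Integral over [0, c] of the lower branch plus integral over [c, s] of the upper one.
        anti[0] = evaluate(lower_next, c) - evaluate(anti, c)
        lower, upper = lower_next, anti
    # F(t) -> 1 as t -> +infinity, so evaluate the upper branch at s = 1.
    return Fraction(factorial(2 * m + 1), factorial(m)) * evaluate(upper, Fraction(1))

def median_cdf_binomial(m, c):
    """Cross-check: P(at least m+1 of the 2m+1 values are <= r)."""
    c = Fraction(c)
    n = 2 * m + 1
    return sum(comb(n, j) * c ** j * (1 - c) ** (n - j) for j in range(m + 1, n + 1))

c = Fraction(1, 3)
print(median_cdf_by_recursion(3, c), c ** 4 * (-20 * c ** 3 + 70 * c ** 2 - 84 * c + 35))
```

Because everything is done with `Fraction`, the two printed values can be compared for exact equality rather than within a tolerance, and the same routine can be run with m\eq 4 to probe the predicted n\eq 9 result.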