Notes:Distribution of the sample median
Problem overview
Let [ilmath]X_1,\ldots,X_{2m+1} [/ilmath], for some [ilmath]m\in\mathbb{N}_{0} [/ilmath], be a sample from a population [ilmath]X[/ilmath], meaning that the [ilmath]X_i[/ilmath] are i.i.d. random variables. We wish to find:
- [math]\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r} [/math] - the cdf of the median.
Initial work
Since the variables are i.i.d., any ordering is as likely as any other, so each ordering has probability [math]\frac{1}{(2m+1)!} [/math] (I proved this the long way rather than jumping straight to it - silly me). The result found in Probability of i.i.d random variables being in an order and not greater than something will be useful.
I believe that [ilmath]\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{X_1\le\cdots\le X_{m+1}\le r}{X_1\le\cdots\le X_{2m+1} } [/ilmath]. Let us make some definitions to make this shorter.
- [ilmath]\mathcal{O}:\eq X_1\le\cdots\le X_{2m+1} [/ilmath] - representing the order part
- [ilmath]\mathcal{M}:\eq X_1\le\cdots\le X_{m+1}\le r[/ilmath] - representing the median part
- [ilmath]\mathcal{Q}:\eq\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{\mathcal{M} }{\mathcal{O} } [/ilmath] - representing the question
We should also have some sort of converse, related to [ilmath]r\le X_{m+2}\le\cdots\le X_{2m+1} [/ilmath] or something similar.
We also have:
- An expression for [ilmath]\P{X_1\le \cdots\le X_n\le r} [/ilmath] from Probability of i.i.d random variables being in an order and not greater than something
- It's [math]\eq\frac{1}{n!}F_X(r)^n[/math]
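The quoted formula can be sanity-checked numerically. The following is only a sketch under assumptions that are not in these notes: the population is taken to be Uniform(0,1), so that [ilmath]F_X(r)\eq r[/ilmath] for [ilmath]0\le r\le 1[/ilmath], and the function names, the sample size and the seed are all illustrative choices.

```python
import math
import random


def ordered_and_bounded_estimate(n, r, trials=200_000, seed=0):
    """Monte Carlo estimate of P[X_1 <= ... <= X_n <= r] for Uniform(0,1) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        # The event needs the draws to be in increasing order AND the last one <= r.
        in_order = all(a <= b for a, b in zip(xs, xs[1:]))
        if in_order and xs[-1] <= r:
            hits += 1
    return hits / trials


def ordered_and_bounded_formula(n, r):
    """F(r)^n / n!, with F(r) = r (assumes Uniform(0,1) and 0 <= r <= 1)."""
    return r ** n / math.factorial(n)


estimate = ordered_and_bounded_estimate(3, 0.7)
exact = ordered_and_bounded_formula(3, 0.7)
print(f"estimate={estimate:.4f} formula={exact:.4f}")
```

The two printed numbers should agree up to sampling noise; a close match is evidence for the formula in this one case, not a proof.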
Analysis
Let us look at [ilmath]X\le r[/ilmath] and [ilmath]X\le Y[/ilmath] to see what we can say if both are true (the "and")
- Claim: [ilmath](X\le r\wedge X\le Y)\iff(X\le\Min{r,Y})[/ilmath]
- Proof:
- [ilmath]\implies[/ilmath]
- Suppose [ilmath]r\le Y[/ilmath], so [ilmath]\Min{r,Y}\eq r[/ilmath], obviously [ilmath]X\le r\ \implies\ X\le r\eq\Min{r,Y} [/ilmath], so the implication holds in this case
- Suppose [ilmath]Y\le r[/ilmath], so [ilmath]\Min{r,Y}\eq Y[/ilmath], obviously [ilmath]X\le Y\ \implies\ X\le Y\eq\Min{r,Y} [/ilmath], so the implication holds in this case too.
- [ilmath]\impliedby[/ilmath]
- Suppose [ilmath]X\le\Min{r,Y} [/ilmath]. Either [ilmath]\Min{r,Y}\eq r[/ilmath] (when [ilmath]r\le Y[/ilmath]) or [ilmath]\Min{r,Y}\eq Y[/ilmath] (when [ilmath]Y\le r[/ilmath]); if [ilmath]r\eq Y[/ilmath] both cases apply and agree.
- If [ilmath]r\le Y[/ilmath] then [ilmath]X\le\Min{r,Y}\eq r[/ilmath], and by the transitivity of [ilmath]\le[/ilmath] we see [ilmath]X\le r\le Y[/ilmath], thus [ilmath]X\le Y[/ilmath] too - as required
- If [ilmath]Y\le r[/ilmath] then [ilmath]X\le\Min{r,Y}\eq Y[/ilmath], and by the transitivity of [ilmath]\le[/ilmath] we see [ilmath]X\le Y\le r[/ilmath], thus [ilmath]X\le r[/ilmath] too - as required.
- So in either case, we have [ilmath]X\le Y[/ilmath] and [ilmath]X\le r[/ilmath] - as required
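As a quick check of the claim (not a replacement for the proof), the equivalence can be tested exhaustively on a small grid of values. The grid and the helper name `claim_holds` are illustrative choices; the grid deliberately includes ties such as [ilmath]r\eq Y[/ilmath].

```python
from itertools import product


def claim_holds(x, r, y):
    """True when (x <= r and x <= y) and (x <= min(r, y)) have the same truth value."""
    left = (x <= r) and (x <= y)
    right = x <= min(r, y)
    return left == right


# A small grid with negative, zero, positive and repeated values.
grid = [-2, -1, -0.5, 0, 0.5, 1, 2]
all_ok = all(claim_holds(x, r, y) for x, r, y in product(grid, repeat=3))
print(all_ok)
```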
Problem statement
Thus we really want to find:
- [ilmath]\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{X_1\le\cdots\le X_{m+1}\le r}{X_1\le\cdots\le X_{2m+1} } [/ilmath]
- [math]\eq\frac{\P{\mathcal{M}\ \text{and}\ \mathcal{O} } }{\P{\mathcal{O} } } [/math]
- [math]\eq \big((2m+1)!\big)\P{X_1\le\cdots\le X_{m+1}\le\Min{r,X_{m+2} }\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1} } [/math], using [math]\P{\mathcal{O} }\eq\frac{1}{(2m+1)!} [/math]
- Caveat: We now need [math]\big(X\le r\wedge X\le Y\le Z\big)\implies\big(X\le\Min{r,Y}\le Y\le Z\big)[/math] to justify this format, although that is arguably not that helpful for the integral.
Initial integral
- This isn't about the median specifically, this is just looking at the specific integral.
Suppose we have a sample of length 3, [ilmath]X,Y,Z[/ilmath] then we are looking at:
- [ilmath]\P{X\le\Min{r,Y}\le Y\le Z\le t} [/ilmath] (where [ilmath]t[/ilmath] will be used for a limit towards [ilmath]\infty[/ilmath] to get [ilmath]\P{X\le \Min{r,Y}\le Y\le Z} [/ilmath] in the end), or as an integral:
- [math]\int^t_{-\infty}f(z)\left(\int^z_{-\infty}f(y)\left(\int^{\Min{r,y} }_{-\infty} f(x)\d x\right)\d y\right)\d z[/math]
- If [ilmath]t>r[/ilmath] then the minimum takes effect for every [ilmath]y>r[/ilmath] (which occurs for some [ilmath]z[/ilmath]s) and caps the inner upper limit at [ilmath]r[/ilmath]; otherwise the upper limit is [ilmath]y[/ilmath] throughout and always stays under [ilmath]r[/ilmath]. In practice, as we take [ilmath]t\rightarrow\infty[/ilmath], the first case will certainly happen.
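To see the three-variable integral in action, here is a rough numerical sketch. It assumes a Uniform(0,1) population (so [ilmath]f\eq 1[/ilmath] on [ilmath][0,1][/ilmath], the lower limits become [ilmath]0[/ilmath], and the innermost integral is just [ilmath]\Min{r,y} [/ilmath]) and takes [ilmath]t\eq 1[/ilmath]; none of this is in the notes. The function names are hypothetical. It compares a midpoint-rule evaluation of the iterated integral with a direct simulation of the event.

```python
import random


def iterated_integral_uniform(r, t=1.0, steps=300):
    """Midpoint-rule value of the iterated integral for Uniform(0,1); needs 0 <= r and 0 < t <= 1."""
    h = t / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        # y-integral over (0, z); the innermost x-integral equals min(r, y) here.
        hy = z / steps
        inner = sum(min(r, (j + 0.5) * hy) for j in range(steps)) * hy
        total += inner * h
    return total


def event_estimate(r, t=1.0, trials=200_000, seed=0):
    """Monte Carlo estimate of P[X <= min(r, Y) <= Y <= Z <= t] for Uniform(0,1) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y, z = rng.random(), rng.random(), rng.random()
        if x <= min(r, y) <= y <= z <= t:
            hits += 1
    return hits / trials


numeric = iterated_integral_uniform(0.5)
estimate = event_estimate(0.5)
print(f"integral={numeric:.4f} simulation={estimate:.4f}")
```

The nested sums mirror the nesting of the integrals, at the cost of speed; for a check at this scale that trade-off seems acceptable.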
Progression: 1
We are evaluating: [math]\P{X_1\le\cdots\le X_{m+1}\le\Min{r,X_{m+2} }\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}\le t } [/math] (our answer is [math]\big((2m+1)!\big)[/math] times the limit of this as [ilmath]t\rightarrow\infty[/ilmath] ), the full integral follows:
- [math]\int^t_{-\infty}f(x_{2m+1})\left(\int^{x_{2m+1} }_{-\infty}f(x_{2m})\left(\cdots\int^{x_{m+3} }_{-\infty}f(x_{m+2})\left(\int^{\Min{r,x_{m+2} } }_{-\infty} f(x_{m+1}){\left(\int^{x_{m+1} }_{-\infty}f(x_{m} )\left(\cdots\int^{x_2}_{-\infty}f(x_1)\d x_1\cdots\right)\d x_m\right)}\d x_{m+1}\right)\d x_{m+2}\cdots\right)\d x_{2m}\right)\d x_{2m+1} [/math]
We operate on the inner bit:
- [math]{\int^{x_{m+1} }_{-\infty}f(x_{m} )\left(\cdots\int^{x_2}_{-\infty}f(x_1)\d x_1\cdots\right)\d x_m}\eq \frac{1}{m!}F(x_{m+1})^m[/math]
We substitute this back in to yield:
- [math]\frac{1}{m!}\int^t_{-\infty}f(x_{2m+1})\left(\int^{x_{2m+1} }_{-\infty}f(x_{2m})\left(\cdots\int^{x_{m+3} }_{-\infty}f(x_{m+2})\left(\int^{\Min{r,x_{m+2} } }_{-\infty} f(x_{m+1})F(x_{m+1})^m\d x_{m+1}\right)\d x_{m+2}\cdots\right)\d x_{2m}\right)\d x_{2m+1} [/math]
Progression: 2
This will involve induction, and dealing with the [ilmath]\text{Min}()[/ilmath] will be "tricky". Both for practice and as a basis for the induction, we will consider the special cases [ilmath]m\eq 1[/ilmath] and [ilmath]m\eq 2[/ilmath] by evaluating:
- [ilmath]m\eq 1[/ilmath] yields [math]I_1:\eq\frac{1}{1!}\int^t_{-\infty} f(x_3)\left(\int^{\Min{r,x_3} }_{-\infty}f(x_2)F(x_2) \d x_2\right)\d x_3[/math], by case analysis:
- if [ilmath]t\le r[/ilmath] then [ilmath]x_3\le t\le r[/ilmath], so [ilmath]x_3\le r[/ilmath] over the entire domain of interest and [ilmath]\Min{r,x_3}\eq x_3[/ilmath] over the entire domain, giving:
- [math]I_1\eq\frac{1}{1!}\int^t_{-\infty}f(x_3)\left(\int^{x_3}_{-\infty}f(x_2)F(x_2)\d x_2\right)\d x_3[/math]
- if [ilmath]t\ge r[/ilmath] then we split [ilmath](-\infty,t][/ilmath] into [ilmath](-\infty,r)[/ilmath] and [ilmath][r,t][/ilmath], giving:
- [math]I_1\eq\frac{1}{1!}\left[\int^r_{-\infty} f(x_3)\left(\int^{\Min{r,x_3} }_{-\infty}f(x_2)F(x_2) \d x_2\right)\d x_3+\int_r^tf(x_3)\left(\int^{\Min{r,x_3} }_{-\infty}f(x_2)F(x_2) \d x_2\right)\d x_3\right][/math]
- [math]\eq\frac{1}{1!}\left[\int^r_{-\infty}f(x_3)\left(\int^{x_3}_{-\infty}f(x_2)F(x_2) \d x_2\right)\d x_3+\int_r^tf(x_3)\left(\int^r_{-\infty}f(x_2)F(x_2) \d x_2\right)\d x_3\right][/math]
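The [ilmath]t\ge r[/ilmath] case can be pushed one step further and checked by simulation. The sketch below is not part of the notes: it assumes that [math]\int f(x)F(x)^k\d x\eq\frac{F(x)^{k+1} }{k+1} [/math] is used to evaluate the two pieces, takes the limit [ilmath]t\rightarrow\infty[/ilmath], writes [ilmath]u\eq F(r)[/ilmath], and uses a Uniform(0,1) population for the simulation. The function names are hypothetical.

```python
import random


def median_cdf_three(u):
    """3! * I_1 in the limit t -> infinity, in terms of u = F(r) (assumed evaluation)."""
    first = u ** 3 / 6              # x_3 < r piece: inner upper limit is x_3
    second = (1 - u) * u ** 2 / 2   # x_3 >= r piece: inner upper limit is r
    return 6 * (first + second)


def median_cdf_estimate(r, trials=200_000, seed=0):
    """Monte Carlo estimate of P[Median(X_1, X_2, X_3) <= r] for Uniform(0,1) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(3))
        if sample[1] <= r:          # the middle value of three is the median
            hits += 1
    return hits / trials


formula = median_cdf_three(0.4)     # F(r) = r for Uniform(0,1)
estimate = median_cdf_estimate(0.4)
print(f"formula={formula:.4f} simulation={estimate:.4f}")
```

If the two numbers agree up to sampling noise, that supports both the case split and the conditional-probability formulation for [ilmath]m\eq 1[/ilmath]; it says nothing yet about general [ilmath]m[/ilmath].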