Difference between revisions of "Notes:Distribution of the sample median"

Revision as of 11:59, 17 December 2017

$\newcommand{\P}[2][]{\mathbb{P}#1{\left[{#2}\right]} } \newcommand{\Pcond}[3][]{\mathbb{P}#1{\left[{#2}\!\ \middle\vert\!\ {#3}\right]} } \newcommand{\Plcond}[3][]{\Pcond[#1]{#2}{#3} } \newcommand{\Prcond}[3][]{\Pcond[#1]{#2}{#3} }$

$\newcommand{\E}[1]{ {\mathbb{E}{\left[{#1}\right]} } }$

$\newcommand{\Mdm}[1]{\text{Mdm}{\left({#1}\right) } }$

$\newcommand{\Var}[1]{\text{Var}{\left({#1}\right) } }$

$\newcommand{\ncr}[2]{ \vphantom{C}^{#1}\!C_{#2} }$

$\newcommand{\O}[0]{\mathcal{O} } \newcommand{\M}[0]{\mathcal{M} } \newcommand{\Q}[0]{\mathcal{Q} } \newcommand{\Min}[1]{\text{Min}\left({#1}\right)} \newcommand{\d}[0]{\mathrm{d} }$

Findings

I've found results for three sample sizes, $n\eq 3$ , $n\eq 5$ and $n\eq 7$ ; they are respectively:

$F(r)^2\big[3-2F(r)\big]$ for $n\eq 3$ , and
$F(r)^3\big[10-15F(r)+6F(r)^2\big]$ for $n\eq 5$
- I've experimentally verified this one
$F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)$ for $n\eq 7$
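An experimental check of the $n\eq 5$ formula could look something like the sketch below. It is only an illustration, not the check that was actually run: it assumes the $X_i$ are Uniform(0,1), so that $F(r)\eq r$ on $[0,1]$, and the function names are made up for this example.

```python
import random
import statistics

def median_cdf_n5(F):
    """Closed form from the Findings list for n = 5, as a function of F = F(r)."""
    return F**3 * (10 - 15 * F + 6 * F**2)

def empirical_median_cdf(n, r, trials, rng):
    """Estimate P[Median(X_1..X_n) <= r] for X_i ~ Uniform(0,1) by simulation."""
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if statistics.median(sample) <= r:
            hits += 1
    return hits / trials

rng = random.Random(0)  # fixed seed so the run is repeatable
for r in (0.25, 0.5, 0.75):
    estimate = empirical_median_cdf(5, r, 100_000, rng)
    # For Uniform(0,1) the cdf value F(r) is just r.
    print(r, round(estimate, 4), round(median_cdf_n5(r), 4))
```

The simulated and closed-form columns should agree to within sampling noise (roughly two decimal places at this number of trials).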

Important results

$\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{X_1\le\cdots\le X_{m+1}\le r}{X_1\le\cdots\le X_{2m+1} }$
$\eq \frac{\P{X_1\le\cdots\le X_{m+1}\le\Min{r,X_{m+2} }\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1} } }{\frac{1}{(2m+1)!} }$

$\eq \big((2m+1)!\big)\P{X_1\le\cdots\le X_{m+1}\le\Min{r,X_{m+2} }\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1} }$

$\eq \lim_{t\rightarrow+\infty}\Bigg(\big((2m+1)!\big)\P{X_1\le\cdots\le X_{m+1}\le\Min{r,X_{m+2} }\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}\le t }\Bigg)$

$\eq\frac{(2m+1)!}{m!}\lim_{t\rightarrow+\infty}\Bigg[\int^t_{-\infty}f(x_{2m+1})\left(\int^{x_{2m+1} }_{-\infty}f(x_{2m})\left(\cdots\int^{x_{m+3} }_{-\infty}f(x_{m+2})\left(\int^{\Min{r,x_{m+2} } }_{-\infty} f(x_{m+1})F(x_{m+1})^m\d x_{m+1}\right)\d x_{m+2}\cdots\right)\d x_{2m}\right)\d x_{2m+1}\Bigg]$

Problem overview

Let $X_1,\ldots,X_{2m+1}$ be a sample from a population $X$ , meaning that the $X_i$ are i.i.d random variables, for some $m\in\mathbb{N}_{0}$ . We wish to find:

$\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}$ - the cdf of the median.

Initial work

Since the variables are i.i.d., any ordering is as likely as any other, so each ordering has probability $\frac{1}{(2m+1)!}$ (I proved this the long way rather than just jumping to it - silly me). However, the result found in Probability of i.i.d random variables being in an order and not greater than something will be useful.

I believe that $\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{X_1\le\cdots\le X_{m+1}\le r}{X_1\le\cdots\le X_{2m+1} }$ . Let us make some definitions to make this shorter.

$\mathcal{O}:\eq X_1\le\cdots\le X_{2m+1}$ - representing the order part
$\mathcal{M}:\eq X_1\le\cdots\le X_{m+1}\le r$ - representing the median part
$\mathcal{Q}:\eq\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{\mathcal{M} }{\mathcal{O} }$ - representing the question

We should also have some sort of converse, related to $r\le X_{m+2}\le\cdots\le X_{2m+1}$ or something.

We also have:

An expression for $\P{X_1\le\cdots\le X_n\le r}$ from Probability of i.i.d random variables being in an order and not greater than something
- It's $\eq\frac{1}{n!}F_X(r)^n$
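As a sanity check of the $\frac{1}{n!}F_X(r)^n$ expression, here is a minimal simulation sketch. It assumes Uniform(0,1) variables (so $F_X(r)\eq r$), and the helper names are hypothetical, chosen only for this example.

```python
import math
import random

def ordered_bounded_closed_form(n, F_r):
    """The quoted expression F(r)^n / n! for P[X_1 <= ... <= X_n <= r]."""
    return F_r**n / math.factorial(n)

def simulate_ordered_bounded(n, r, trials, rng):
    """Fraction of Uniform(0,1) samples that are already sorted and end at or below r."""
    hits = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        in_order = all(xs[i] <= xs[i + 1] for i in range(n - 1))
        if in_order and xs[-1] <= r:
            hits += 1
    return hits / trials

rng = random.Random(0)  # fixed seed for repeatability
estimate = simulate_ordered_bounded(3, 0.8, 200_000, rng)
print(round(estimate, 4), round(ordered_bounded_closed_form(3, 0.8), 4))
```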

Analysis

Let us look at $X\le r$ and $X\le Y$ to see what we can say if both are true (the "and")

Claim: $(X\le r\wedge X\le Y)\iff(X\le\Min{r,Y})$
Proof:
- ⟹
  1. Suppose $r\le Y$ , so $\Min{r,Y}\eq r$ , obviously $X\le r\ \implies\ X\le r\eq\Min{r,Y}$ , so the implication holds in this case
  2. Suppose $Y\le r$ , so $\Min{r,Y}\eq Y$ , obviously $X\le Y\ \implies\ X\le Y\eq\Min{r,Y}$ , so the implication holds in this case too.
- ⟸
  - We notice either $\Min{r,Y}\eq r$ if $r\le Y$ , or $\Min{r,Y}\eq Y$ if $Y\le r$ (slightly modify the language for the equality, it doesn't matter though really)
    - Thus if $r\le Y$ then $X\le r$ and as $r\le Y$ by assumption, we use the transitivity of $\le$ to see $X\le r\le Y$ thus $X\le Y$ too - as required
    - Thus if $Y\le r$ then $X\le Y$ and as $Y\le r$ by assumption, we use the transitivity of $\le$ to see $X\le Y\le r$ and thus $X\le r$ too - as required.
  - So in either case, we have $X\le Y$ and $X\le r$ - as required
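The claim can also be spot-checked by brute force. The sketch below only tests a small grid of integers, so it illustrates the equivalence rather than proving it; the function name is made up for this example.

```python
from itertools import product

def claim_holds(x, r, y):
    """True when (x <= r and x <= y) and (x <= min(r, y)) have the same truth value."""
    return (x <= r and x <= y) == (x <= min(r, y))

# Every triple drawn from a small grid, including ties such as r == y.
grid = range(-3, 4)
print(all(claim_holds(x, r, y) for x, r, y in product(grid, repeat=3)))
```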

Problem statement

Thus we really want to find:

$\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{X_1\le\cdots\le X_{m+1}\le r}{X_1\le\cdots\le X_{2m+1} }$
$\eq\frac{\P{\M\ \text{and}\ \O} }{\P{\O} }$

$\eq \big((2m+1)!\big)\P{X_1\le\cdots\le X_{m+1}\le\Min{r,X_{m+2} }\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1} }$
- $\big(X\le r\wedge X\le Y\le Z\big)\implies\big(X\le\Min{r,Y}\le Y\le Z\big)$ to justify this format. Although that's arguably not that helpful for the integral.

Initial integral

This isn't about the median specifically, this is just looking at the specific integral.

Suppose we have a sample of length 3, $X,Y,Z$ then we are looking at:

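$\P{X\le\Min{r,Y}\le Y\le Z\le t}$ (where $t$ will be used for a limit towards $\infty$ to get $\P{X\le\Min{r,Y}\le Y\le Z}$ in the end), or as an integral:
- $\int^t_{-\infty}f(z)\left(\int^z_{-\infty}f(y)\left(\int^{\Min{r,y} }_{-\infty}f(x)\d x\right)\d y\right)\d z$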
  - if $t>r$ then the minimum will get involved (for some $z$ s anyway) and limit it to $r$ , otherwise it'll always stay under $r$ - of course in practice (as we'll take $t\rightarrow\infty$ ) this will certainly happen.

Progression: 1

We are evaluating:

$\P{X_1\le\cdots\le X_{m+1}\le\Min{r,X_{m+2} }\le X_{m+2}\le X_{m+3}\le\cdots\le X_{2m+1}\le t }$ (our answer is $\big((2m+1)!\big)\times$ this, in the limit as $t\rightarrow\infty$ ). The full integral follows:

$\int^t_{-\infty}f(x_{2m+1})\left(\int^{x_{2m+1} }_{-\infty}f(x_{2m})\left(\cdots\int^{x_{m+3} }_{-\infty}f(x_{m+2})\left(\int^{\Min{r,x_{m+2} } }_{-\infty} f(x_{m+1}){\left(\int^{x_{m+1} }_{-\infty}f(x_{m} )\left(\cdots\int^{x_2}_{-\infty}f(x_1)\d x_1\cdots\right)\d x_m\right)}\d x_{m+1}\right)\d x_{m+2}\cdots\right)\d x_{2m}\right)\d x_{2m+1}$

We operate on the inner bit:

${\int^{x_{m+1} }_{-\infty}f(x_{m} )\left(\cdots\int^{x_2}_{-\infty}f(x_1)\d x_1\cdots\right)\d x_m}\eq \frac{1}{m!}F(x_{m+1})^m$

We substitute this back in to yield:

$\frac{1}{m!}\int^t_{-\infty}f(x_{2m+1})\left(\int^{x_{2m+1} }_{-\infty}f(x_{2m})\left(\cdots\int^{x_{m+3} }_{-\infty}f(x_{m+2})\left(\int^{\Min{r,x_{m+2} } }_{-\infty} f(x_{m+1})F(x_{m+1})^m\d x_{m+1}\right)\d x_{m+2}\cdots\right)\d x_{2m}\right)\d x_{2m+1}$
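The inner-integral identity used above can be checked exactly in a simple special case. The sketch below assumes the Uniform(0,1) distribution, where $f\eq 1$ and $F(x)\eq x$ on $[0,1]$ and the lower limit $-\infty$ can be replaced by $0$; polynomials are stored as coefficient lists, and both function names are hypothetical.

```python
from fractions import Fraction

def integrate_from_zero(poly):
    """Antiderivative vanishing at 0; poly = [c0, c1, ...] means c0 + c1*x + ..."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(poly)]

def nested_ordered_integral(m):
    """m-fold nested integral of the Uniform(0,1) density over x_1 <= ... <= x_m <= x."""
    poly = [Fraction(1)]  # the density f = 1 seen by the innermost integral
    for _ in range(m):
        # Integrate out one more variable; each step multiplies by f = 1.
        poly = integrate_from_zero(poly)
    return poly

# The only nonzero coefficient sits at degree m and equals 1/m!.
for m in (1, 2, 3):
    print(m, nested_ordered_integral(m))
```

For $m\eq 3$ the only nonzero coefficient is $\frac{1}{6}$ on $x^3$, matching $\frac{1}{m!}F(x_{m+1})^m$ in this special case.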

Conclusion of progression 1

We see here that

Progression: 2

This will involve induction, and dealing with the $\text{Min}()$ will be "tricky". Both for practice and as groundwork for the induction, we will consider the special cases $m\eq 1$ and $m\eq 2$ by evaluating:

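$m\eq 1$ yields $I_1:\eq\frac{1}{1!}\int^t_{-\infty}f(x_3)\left(\int^{\Min{r,x_3} }_{-\infty}f(x_2)F(x_2)\d x_2\right)\d x_3$ , by case analysis:
1. if $t\le r$ then $x_3\le t\le r$ or $x_3\le r$ over the entire domain of interest, so $\Min{r,x_3}\eq x_3$ over the entire domain, giving:
  - $I_1\eq\frac{1}{1!}\int^t_{-\infty}f(x_3)\left(\int^{x_3}_{-\infty}f(x_2)F(x_2)\d x_2\right)\d x_3$
    - We now use the corollary below to see:
      - $I_1\eq\frac{1}{2!}\int^t_{-\infty}f(x_3)F(x_3)^2\d x_3$
        $\eq\frac{1}{3!}F(t)^3$
2. if $t\ge r$ then we split $(-\infty,t]$ into $(-\infty,r)$ and $[r,t]$ , giving:
  - $I_1\eq\frac{1}{1!}\left[\int^r_{-\infty}f(x_3)\left(\int^{\Min{r,x_3} }_{-\infty}f(x_2)F(x_2)\d x_2\right)\d x_3+\int^t_rf(x_3)\left(\int^{\Min{r,x_3} }_{-\infty}f(x_2)F(x_2)\d x_2\right)\d x_3\right]$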
    
    $\eq\frac{1}{1!}\left[\int^r_{-\infty}f(x_3)\left(\int^{x_3}_{-\infty}f(x_2)F(x_2) \d x_2\right)\d x_3+\int_r^tf(x_3)\left(\int^r_{-\infty}f(x_2)F(x_2) \d x_2\right)\d x_3\right]$
    - We now use the required corollary immediately below to yield:
      $I_1\eq\frac{1}{1!}\left[\int^r_{-\infty}f(x_3)\cdot\frac{1}{2}F(x_3)^2\d x_3+\int_r^tf(x_3)\cdot\frac{1}{2}F(r)^2\d x_3\right]$
      
      $\eq\frac{1}{2!}\left[\frac{1}{3}F(r)^3+F(r)^2\int^t_rf(x_3)\d x_3\right]$ , note that: $\int^t_rf(x)\d x\eq\int_{-\infty}^tf(x)\d x-\int_{-\infty}^rf(x)\d x$ $\eq F(t)-F(r)$
      
      $\eq\frac{1}{2!}F(r)^2\left[\frac{1}{3}F(r)+\big(F(t)-F(r)\big)\right]$ , note that: $F(t)-F(r)\eq\frac{3F(t)-3F(r)}{3}$ which we'll use next
      
      $\eq\frac{1}{2!}F(r)^2\left[\frac{3F(t)-2F(r)}{3}\right]$
      
      $\eq\frac{1}{3!}F(r)^2\big(3F(t)-2F(r)\big)$

It is clear that as $t\rightarrow\infty$ we end up with

$I_1\eq\frac{1}{3!}F(r)^2\big(3-2F(r)\big)$

Thus:

$\P{X_1\le X_2\le\Min{r,X_3}\le X_3}\eq\frac{1}{3!}F(r)^2\big(3-2F(r)\big)$

Finally:

$\Pcond{X_1\le X_2\le r}{X_1\le X_2\le X_3}\eq F(r)^2\big(3-2F(r)\big)$
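The two-case closed form for $I_1$ can be compared against direct numerical integration. The sketch below is a rough check under assumptions that are not part of the notes: it uses the Exponential(1) distribution as an example (so the lower limit $-\infty$ becomes $0$) and a plain midpoint rule, and all names are illustrative.

```python
import math

def f(x):
    """Density of Exponential(1), an assumed example distribution."""
    return math.exp(-x) if x >= 0 else 0.0

def F(x):
    """Cdf of Exponential(1)."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def midpoint(g, a, b, steps=300):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def I1_numeric(r, t):
    """Double integral defining I_1, with the inner upper limit Min(r, x_3)."""
    def inner(x3):
        return midpoint(lambda x2: f(x2) * F(x2), 0.0, min(r, x3))
    return midpoint(lambda x3: f(x3) * inner(x3), 0.0, t)

def I1_closed(r, t):
    """The two cases derived above: t <= r and t >= r."""
    if t <= r:
        return F(t) ** 3 / 6
    return F(r) ** 2 * (3 * F(t) - 2 * F(r)) / 6

for r, t in ((1.0, 3.0), (2.0, 1.0)):
    print(r, t, round(I1_numeric(r, t), 5), round(I1_closed(r, t), 5))
```

Agreement here is only up to the discretisation error of the midpoint rule, so the two columns match to a few decimal places rather than exactly.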

Required corollary

Recall from Probability of i.i.d random variables being in an order and not greater than something that:

$\frac{1}{k!}\int^r_{-\infty}f(x)F(x)^k\d x\eq \frac{1}{(k+1)!}F(r)^{k+1}$

So:

$\int^r_{-\infty}f(x)F(x)^k\d x\eq \frac{1}{k+1}F(r)^{k+1}$

By applying this to above (with the $x_2$ integrals):

$\int^r_{-\infty}f(x)F(x)^1\d x\eq \frac{1}{2}F(r)^2$ , we then substitute this for the cases $r:\eq r$ and $r:\eq x_3$

We'll then apply it to the $x_3$ integrals.
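The corollary is just the substitution $u\eq F(x)$, and it can be checked numerically for a distribution with a non-trivial cdf. The sketch below assumes a standard normal (via `statistics.NormalDist`), truncates the lower limit at $-10$ where the cdf is negligible, and uses a midpoint rule; the function names are made up for this example.

```python
from statistics import NormalDist

dist = NormalDist()  # standard normal, an assumed example distribution

def corollary_lhs(r, k, lower=-10.0, steps=20_000):
    """Midpoint-rule estimate of the integral of f(x) F(x)^k from -infinity to r."""
    h = (r - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        total += dist.pdf(x) * dist.cdf(x) ** k
    return total * h

def corollary_rhs(r, k):
    """Right-hand side F(r)^(k+1) / (k+1)."""
    return dist.cdf(r) ** (k + 1) / (k + 1)

for k in (1, 2, 3):
    print(k, round(corollary_lhs(0.5, k), 6), round(corollary_rhs(0.5, k), 6))
```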

Conclusion of progression 2

$\Pcond{X_1\le X_2\le r}{X_1\le X_2\le X_3}\eq F(r)^2\big(3-2F(r)\big)$

Progression: 3

I am now looking at $m\eq 3$ , which is a sample of size 7. To find this we evaluate:

$\P{\text{Median}\le r}\eq\frac{7!}{3!}\lim_{t\rightarrow+\infty}\left(\int^t_{-\infty}f(x_7)\left(\int^{x_7}_{-\infty}f(x_6)\left(\int^{x_6}_{-\infty}f(x_5)\left(\int^{\Min{r,x_5} }_{-\infty}f(x_4)F(x_4)^3 \d x_4\right)\d x_5\right)\d x_6\right)\d x_7\right)$

Initial work:

$I_1(x_6):\eq \int^{x_6}_{-\infty}f(x_5)\left(\int^{\Min{r,x_5} }_{-\infty}f(x_4)F(x_4)^3 \d x_4\right)\d x_5\eq\left\{\begin{array}{lr}\frac{1}{5}\frac{1}{4}F(x_6)^5 && \text{if }x_6\le r\\\frac{1}{5}\frac{1}{4}F(r)^4\big(5F(x_6)-4F(r)\big) &&\text{if }x_6\ge r\end{array}\right.$ - these agree if $x_6\eq r$
$I_2(x_7):\eq \int^{x_7}_{-\infty}f(x_6)\left(\int^{x_6}_{-\infty}f(x_5)\left(\int^{\Min{r,x_5} }_{-\infty}f(x_4)F(x_4)^3 \d x_4\right)\d x_5\right)\d x_6\eq \int^{x_7}_{-\infty}f(x_6)I_1(x_6)\d x_6$ $\eq\frac{1}{6}\frac{1}{5}\frac{1}{4}\left\{\begin{array}{lr} F(x_7)^6 && \text{if }x_7\le r \\ F(r)^4\big(10F(r)^2-24F(r)F(x_7)+15F(x_7)^2\big) && \text{if }x_7\ge r\end{array}\right.$ - note both parts agree if $r\eq x_7$ as $10+15-24\eq 1$
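$I_3(t)\eq$ (everything in the limit) $\eq \int^t_{-\infty} f(x_7)I_2(x_7)\d x_7$ $\eq\frac{1}{7}\frac{1}{6}\frac{1}{5}\frac{1}{4}\left\{\begin{array}{lr}F(t)^7 && \text{if }t\le r \\ F(r)^4\big(-20 F(r)^3 + 70F(r)^2 F(t)-84F(r)F(t)^2+35F(t)^3\big) && \text{if }t\ge r\end{array}\right.$ - note these agree if $t\eq r$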
- Clearly as $t\rightarrow+\infty$ we get $I_3(t)\rightarrow\frac{1}{7}\frac{1}{6}\frac{1}{5}\frac{1}{4} F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)$ as $F(t)\rightarrow 1$

From the top of this section:

$\P{\text{Median}\le r}\eq \frac{7!}{3!} I_3(+\infty)\eq F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)$

Conclusion:

$\P{\text{Median}\le r}\eq F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)$
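An independent cross-check, which is not the route taken in these notes: the median of $2m+1$ values is $\le r$ exactly when at least $m+1$ of them are $\le r$, so the cdf is the binomial sum $\sum_{k=m+1}^{2m+1}\ncr{2m+1}{k}F(r)^k\big(1-F(r)\big)^{2m+1-k}$ . The sketch below expands that sum exactly as a polynomial in $F$ (coefficients listed from degree $0$ upwards), so it can be compared with the closed forms for $m\eq 1,2,3$ . All function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_pow(a, n):
    """n-th power of a polynomial, by repeated multiplication."""
    out = [Fraction(1)]
    for _ in range(n):
        out = poly_mul(out, a)
    return out

def median_cdf_poly(m):
    """Coefficients in F of sum_{k=m+1}^{n} C(n,k) F^k (1-F)^(n-k), with n = 2m+1."""
    n = 2 * m + 1
    F_poly = [Fraction(0), Fraction(1)]           # the polynomial F
    one_minus_F = [Fraction(1), Fraction(-1)]     # the polynomial 1 - F
    total = [Fraction(0)] * (n + 1)
    for k in range(m + 1, n + 1):
        term = poly_mul(poly_pow(F_poly, k), poly_pow(one_minus_F, n - k))
        for i, c in enumerate(term):
            total[i] += comb(n, k) * c
    return total

for m in (1, 2, 3):
    print(2 * m + 1, [int(c) for c in median_cdf_poly(m)])
```

For $m\eq 3$ the coefficients of $F^4,\ldots,F^7$ come out as $35,-84,70,-20$ , in agreement with the conclusion above; $m\eq 1$ gives $3F^2-2F^3$ , matching the result of progression 2.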

@@ Line 3: / Line 3: @@
 ==Findings==
 I've found results for two sample sizes, {{M|n\eq 3}} and {{M|n\eq 5}}, they are respectively:
-* {{M|F(r)^3\big[4-3F(r)\big]}} for {{M|n\eq 3}}, and
+* {{M|F(r)^2\big[4-3F(r)\big]}} for {{M|n\eq 3}}, and
 * {{M|F(r)^3\big[10-15F(r)+6F(r)^2\big]}} for {{M|n\eq 5}}
-** I've experimentally verified this one, it's built on the {{M|n\eq 3}} result, which was not checked so thoroughly
+** I've experimentally verified this one
+* {{M|F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)}} for {{M|n\eq 7}}
 ==Important results==
 # {{M|\P{\text{Median}(X_1,\ldots,X_{2m+1})\le r}\eq\Pcond{X_1\le\cdots\le X_{m+1}\le r}{X_1\le\cdots\le X_{2m+1} } }}
@@ Line 112: / Line 113: @@
 I am now looking at {{M|m\eq 3}}, which is 7 samples. To find this we evaluate:
 * {{MM|\P{\text{Median}\le r}\eq\frac{7!}{3!}\lim_{t\rightarrow+\infty}\left(\int^t_{-\infty}f(x_7)\left(\int^{x_7}_{-\infty}f(x_6)\left(\int^{x_6}_{-\infty}f(x_5)\left(\int^{\Min{r,x_5} }_{-\infty}f(x_4)F(x_4)^3 \d x_4\right)\d x_5\right)\d x_6\right)\d x_7\right)}}
+Initial work:
+# {{MM|I_1(x_6):\eq \int^{x_6}_{-\infty}f(x_5)\left(\int^{\Min{r,x_5} }_{-\infty}f(x_4)F(x_4)^3 \d x_4\right)\d x_5\eq\left\{ $\begin{array}{lr}\frac{1}{5}\frac{1}{4}F(x_6)^5 && \text{if }x_6\le r\\\frac{1}{5}\frac{1}{4}F(r)^4\big(5F(x_6)-4F(r)\big) &&\text{if }x_6\ge r\end{array}$ \right.}} - these agree if {{M|x_6\eq r}}
+# {{MM|I_2(x_7):\eq \int^{x_7}_{-\infty}f(x_6)\left(\int^{x_6}_{-\infty}f(x_5)\left(\int^{\Min{r,x_5} }_{-\infty}f(x_4)F(x_4)^3 \d x_4\right)\d x_5\right)\d x_6\eq \int^{x_7}_{-\infty}f(x_6)I_1(x_6)\d x_6}} {{MM|\eq\frac{1}{6}\frac{1}{5}\frac{1}{4}\left\{ $\begin{array}{lr} F(x_7)^6 && \text{if }x_7\le r \\ F(r)^4\big(10F(r)^2-24F(r)F(x_7)+15F(x_7)^2\big) && \text{if }x_7\ge r\end{array}$ \right.}} - note both parts agree if {{M|r\eq x_7}} as {{M|10+15-24\eq 1}}
+# {{M|I_3(t)\eq}} (everything in the limit) {{MM|\eq \int^t_{-\infty} f(x_7)I_2(x_7)\d x_7}} {{MM|\eq\frac{1}{7}\frac{1}{6}\frac{1}{5}\frac{1}{4}\left\{ $\begin{array}{lr}F(t)^7 && \text{if }t\le r \\ F(r)^4\big(-20 F(r)^3 + 70F(r)^2 F(t)-84F(r)F(t)^2+35F(t)^3\big) && \text{if }t\ge r\end{array}$ \right.}} - note these agree if {{M|t\eq r}}
+#* Clearly as {{M|t\rightarrow+\infty}} we get {{MM|I_3(t)\rightarrow\frac{1}{7}\frac{1}{6}\frac{1}{5}\frac{1}{4} F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)}} as {{M|F(t)\rightarrow 1}}
+From the top of this section:
+* {{MM|\P{\text{Median}\le r}\eq \frac{7!}{3!} I_3(+\infty)\eq F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)}}
+'''Conclusion:'''
+* {{MM|\P{\text{Median}\le r}\eq  F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)}}
