Distribution of the sample median

From Maths
Revision as of 20:17, 1 January 2018 by Alec

Warning: This page is currently in the "notes" stage, and is a staging area for conclusions drawn from Notes:Distribution of the sample median
  • This page is not "formal" yet; however, the contents are accurate (to whatever they apply to). I hope to distil the essence of an "ordered" unit from Alec's taxonomy of units and describe the median's distribution purely on that foundation; anything which is itself ordinal (in theory both additive and real units) will then be a corollary.
[ilmath]\newcommand{\P}[2][]{\mathbb{P}#1{\left[{#2}\right]} } \newcommand{\Pcond}[3][]{\mathbb{P}#1{\left[{#2}\!\ \middle\vert\!\ {#3}\right]} } \newcommand{\Plcond}[3][]{\Pcond[#1]{#2}{#3} } \newcommand{\Prcond}[3][]{\Pcond[#1]{#2}{#3} }[/ilmath]
[ilmath]\newcommand{\E}[1]{ {\mathbb{E}{\left[{#1}\right]} } } [/ilmath][ilmath]\newcommand{\Mdm}[1]{\text{Mdm}{\left({#1}\right) } } [/ilmath][ilmath]\newcommand{\Var}[1]{\text{Var}{\left({#1}\right) } } [/ilmath][ilmath]\newcommand{\ncr}[2]{ \vphantom{C}^{#1}\!C_{#2} } [/ilmath]

Set up

Let [ilmath]m\in\mathbb{N}_0[/ilmath] describe the size of the sample, [ilmath]n[/ilmath], by the relation [ilmath]n\eq 2m+1[/ilmath], which forces [ilmath]n[/ilmath] to be odd.

Let [ilmath]X_1,\ldots,X_{2m+1} [/ilmath] be i.i.d. samples from a population distribution [ilmath]X[/ilmath]; let [ilmath]M[/ilmath] denote the median of the sample [ilmath]X_1,\ldots,X_n[/ilmath]; and let [ilmath]F(r):\eq \P{X_i\le r}\eq\P{X\le r} [/ilmath] for any [ilmath]i[/ilmath][Note 1]. Then:

  • For [ilmath]m\eq 1[/ilmath] / [ilmath]n\eq 3[/ilmath] we have: [ilmath]\P{M\le r}\eq F(r)^2\big(-2F(r)+3\big)[/ilmath]
  • For [ilmath]m\eq 2[/ilmath] / [ilmath]n\eq 5[/ilmath] we have: [ilmath]\P{M\le r}\eq F(r)^3\big(6F(r)^2-15F(r)+10\big)[/ilmath]
  • For [ilmath]m\eq 3[/ilmath] / [ilmath]n\eq 7[/ilmath] we have: [ilmath]\P{M\le r}\eq F(r)^4\big(-20F(r)^3+70F(r)^2-84F(r)+35\big)[/ilmath]
  • For [ilmath]m\eq 4[/ilmath] / [ilmath]n\eq 9[/ilmath] we have: [ilmath]\P{M\le r}\eq F(r)^5\big(70F(r)^4-315F(r)^3+540F(r)^2-420F(r)+126\big)[/ilmath] - Caveat: [Note 2]
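These four formulas can be cross-checked against a standard order-statistics argument (brought in here, not necessarily how they were originally found): for odd [ilmath]n\eq 2m+1[/ilmath], [ilmath]M\le r[/ilmath] exactly when at least [ilmath]m+1[/ilmath] of the [ilmath]X_i[/ilmath] are [ilmath]\le r[/ilmath], so [ilmath]\P{M\le r}\eq\sum_{k\eq m+1}^{n}{}^nC_k\,F(r)^k(1-F(r))^{n-k}[/ilmath]. The minimal Python sketch below expands that sum into exact integer coefficients in [ilmath]x\eq F(r)[/ilmath] and compares them with the bracketed polynomials above; the function names are hypothetical.

```python
from math import comb

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = power of x)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_pow(p, k):
    """Raise a coefficient-list polynomial to a non-negative integer power."""
    out = [1]
    for _ in range(k):
        out = poly_mul(out, p)
    return out

def median_cdf_coeffs(m):
    """Coefficients (in x = F(r)) of P[M <= r] for a sample of size n = 2m + 1.

    Uses P[M <= r] = sum_{k=m+1}^{n} C(n, k) x^k (1 - x)^(n - k).
    """
    n = 2 * m + 1
    total = [0] * (n + 1)
    for k in range(m + 1, n + 1):
        # x^k * (1 - x)^(n - k) as a coefficient list
        term = poly_mul([0] * k + [1], poly_pow([1, -1], n - k))
        for power, c in enumerate(term):
            total[power] += comb(n, k) * c
    return total

# Bracketed polynomials from the list above, lowest power first;
# the factor F(r)^(m+1) becomes m + 1 leading zeros.
listed = {
    1: [3, -2],
    2: [10, -15, 6],
    3: [35, -84, 70, -20],
    4: [126, -420, 540, -315, 70],
}
for m, bracket in listed.items():
    assert median_cdf_coeffs(m) == [0] * (m + 1) + bracket, m
print("all four formulas agree")
```

Exact integers are used so that the comparison is an equality check rather than a floating-point tolerance check.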

In general

Warning:This is written from memory, not from my notes! - Alec check the notes! Alec (talk) 21:57, 19 December 2017 (UTC)

In general, I believe that for a given [ilmath]m\in\mathbb{N}_0[/ilmath] and [ilmath]n:\eq 2m+1[/ilmath]:

  • [math]g(x):\eq \sum_{i\eq 0}^m{}^{(2m+1)}C_i\,(x-1)^i[/math]. We then expand [ilmath]g[/ilmath] and reverse its coefficients to obtain the bracketed polynomials in the [ilmath]\P{M\le r} [/ilmath] equations above (that is, those equations with the [ilmath]F(r)^{m+1} [/ilmath] factor removed, writing [ilmath]x[/ilmath] for [ilmath]F(r)[/ilmath]). I believe we must then multiply this reversed polynomial by [ilmath]x^{m+1} [/ilmath] and the job is done!
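The recipe can be sketched in Python, assuming polynomials are stored as exact integer coefficient lists (index = power of [ilmath]x[/ilmath], with [ilmath]x[/ilmath] standing for [ilmath]F(r)[/ilmath]); the helper names are hypothetical. The sketch builds [ilmath]g[/ilmath], reverses its coefficients, multiplies by [ilmath]x^{m+1} [/ilmath], and compares the result with a direct expansion of [ilmath]\sum_{k\eq m+1}^{2m+1}{}^{(2m+1)}C_k\,x^k(1-x)^{2m+1-k}[/ilmath].

```python
from math import comb

def expand_g(m):
    """Coefficients of g(x) = sum_{i=0}^{m} C(2m+1, i) (x - 1)^i, lowest power first."""
    n = 2 * m + 1
    coeffs = [0] * (m + 1)
    for i in range(m + 1):
        # Binomial expansion: (x - 1)^i = sum_j C(i, j) x^j (-1)^(i - j)
        for j in range(i + 1):
            coeffs[j] += comb(n, i) * comb(i, j) * (-1) ** (i - j)
    return coeffs

def median_cdf_via_g(m):
    """Reverse g's coefficients (g has degree exactly m), then multiply by x^(m+1)."""
    reversed_g = expand_g(m)[::-1]
    return [0] * (m + 1) + reversed_g

def median_cdf_direct(m):
    """Expand sum_{k=m+1}^{n} C(n, k) x^k (1 - x)^(n - k) directly."""
    n = 2 * m + 1
    coeffs = [0] * (n + 1)
    for k in range(m + 1, n + 1):
        for j in range(n - k + 1):
            # x^k * C(n-k, j) (-x)^j contributes to the x^(k+j) coefficient
            coeffs[k + j] += comb(n, k) * comb(n - k, j) * (-1) ** j
    return coeffs

for m in range(10):
    assert median_cdf_via_g(m) == median_cdf_direct(m), m
print(expand_g(2), median_cdf_via_g(2))  # prints [6, -15, 10] [0, 0, 0, 10, -15, 6]
```

The agreement is not a coincidence: reversing the degree-[ilmath]m[/ilmath] polynomial [ilmath]g[/ilmath] gives [ilmath]x^mg\left(\tfrac{1}{x}\right)[/ilmath], and [math]x^{m+1}\cdot x^mg\left(\tfrac{1}{x}\right)\eq\sum_{i\eq 0}^m{}^{(2m+1)}C_i\, x^{2m+1-i}(1-x)^i\eq\sum_{k\eq m+1}^{2m+1}{}^{(2m+1)}C_k\, x^k(1-x)^{2m+1-k}[/math] (substituting [ilmath]k\eq 2m+1-i[/ilmath] and using [ilmath]{}^{(2m+1)}C_{2m+1-k}\eq{}^{(2m+1)}C_k[/ilmath]), which is [ilmath]\P{M\le r} [/ilmath] with [ilmath]x\eq F(r)[/ilmath].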

Findings

The sample median seems to be a rather poor (biased) estimator of the median of a distribution. For example, take the exponential distribution, [ilmath]X\sim\text{Exp}(\lambda)[/ilmath] for any real [ilmath]\lambda>0[/ilmath]: the expected value of the sample median lies above the true median of [ilmath]X[/ilmath], which is [math]\frac{\ln(2)}{\lambda} [/math], although it appears to converge to it as the sample size grows.

Continuing the exponential example, the following hold for all real [ilmath]\lambda>0[/ilmath]. Here [ilmath]M[/ilmath] is the true median of the distribution (not the sample median, as in the set up), [ilmath]M_i[/ilmath] is the median random variable for [ilmath]n\eq i[/ilmath] samples, and the constants are accurate to 2 s.f.:

  • [ilmath]0.83\E{M_3}\approx M[/ilmath]
  • [ilmath]0.88\E{M_5}\approx M[/ilmath]
  • [ilmath]0.91\E{M_7}\approx M[/ilmath]
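These constants can be computed exactly rather than estimated, assuming the standard fact (brought in here, not taken from the original notes) that for [ilmath]n[/ilmath] i.i.d. [ilmath]\text{Exp}(\lambda)[/ilmath] samples the [ilmath]k^\text{th} [/ilmath] smallest has expectation [ilmath]\frac{1}{\lambda}\sum_{j\eq 0}^{k-1}\frac{1}{n-j} [/ilmath]. For the sample median [ilmath]k\eq m+1[/ilmath], so the ratio of the true median to [ilmath]\E{M_n} [/ilmath] does not depend on [ilmath]\lambda[/ilmath]. A minimal Python sketch, with hypothetical function names:

```python
from math import log

def expected_sample_median_exp(n, lam=1.0):
    """E[M_n] for n = 2m + 1 i.i.d. Exp(lam) samples.

    Uses E[k-th order statistic] = (1/lam) * sum_{j=0}^{k-1} 1/(n - j) with k = m + 1.
    """
    m = (n - 1) // 2
    return sum(1.0 / (n - j) for j in range(m + 1)) / lam

def correction_constant(n, lam=1.0):
    """Constant c with c * E[M_n] equal to the true median ln(2)/lam."""
    return (log(2) / lam) / expected_sample_median_exp(n, lam)

for n in (3, 5, 7, 101):
    print(n, round(correction_constant(n), 2))
```

Since [ilmath]\sum_{j\eq 0}^{m}\frac{1}{2m+1-j}\eq\sum_{k\eq m+1}^{2m+1}\frac{1}{k} [/ilmath] tends to [ilmath]\ln(2)[/ilmath] as [ilmath]m\to\infty[/ilmath], the constants tend to 1, which is consistent with the apparent convergence noted above.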


However, if [ilmath]X[/ilmath] were normally distributed, then the [ilmath]M_{2m+1} [/ilmath]s would be unbiased, and multiplying by these constants would introduce a bias.
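One way to see the normal case, sketched under the assumption that [ilmath]X[/ilmath] is continuous and symmetric about its median [ilmath]\mu[/ilmath] (as any normal distribution is) and that [ilmath]\E{M_{2m+1}} [/ilmath] exists: write [ilmath]T_m(x)[/ilmath] for the polynomial with [ilmath]\P{M\le r}\eq T_m(F(r))[/ilmath]. The identity [ilmath]T_m(1-x)\eq 1-T_m(x)[/ilmath], together with [ilmath]F(\mu-t)\eq 1-F(\mu+t)[/ilmath], gives [ilmath]\P{M\le\mu-t}\eq\P{M>\mu+t}[/ilmath], so [ilmath]M[/ilmath] is symmetric about [ilmath]\mu[/ilmath] and [ilmath]\E{M}\eq\mu[/ilmath]. The Python below checks the identity exactly on a grid of rational points; the helper name is hypothetical.

```python
from fractions import Fraction
from math import comb

def median_cdf(m, x):
    """T_m(x) = P[M <= r] as a function of x = F(r), for a sample of size n = 2m + 1."""
    n = 2 * m + 1
    return sum(comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(m + 1, n + 1))

# Check T_m(1 - x) = 1 - T_m(x) exactly, using fractions to avoid rounding error.
for m in range(6):
    for num in range(11):
        x = Fraction(num, 10)
        assert median_cdf(m, 1 - x) == 1 - median_cdf(m, x)
print("symmetry identity holds on the grid")
```

The exponential distribution is not symmetric about its median, so this argument does not apply there, which fits the bias found above.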

Ideas

  1. Use Alec's alternate expectation formula to try to find the expected value. Again, this is written from memory, not from notes, but it states something like:
    • [math]\E{X}\eq\int_0^\infty \P{X\ge x}\mathrm{d} x[/math] - this requires [ilmath]X\ge 0[/ilmath]. For a general [ilmath]X[/ilmath] with finite expectation there is an extra term: [math]\E{X}\eq\int_0^\infty \P{X\ge x}\mathrm{d} x-\int_{-\infty}^0\P{X\le x}\mathrm{d} x[/math].
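As a sanity check of this idea, the Python below numerically integrates the tail of the sample median of [ilmath]n\eq 3[/ilmath] [ilmath]\text{Exp}(\lambda)[/ilmath] samples, using the [ilmath]m\eq 1[/ilmath] formula from the set up. It assumes the sample median here is non-negative and continuous, so the formula as stated applies with [ilmath]\P{M\ge r}\eq 1-\P{M\le r}[/ilmath]. The result is compared with the exact value [ilmath]\frac{5}{6\lambda} [/ilmath] (from [ilmath]1-\P{M\le r}\eq 3e^{-2\lambda r}-2e^{-3\lambda r} [/ilmath]). Truncating at [ilmath]40/\lambda[/ilmath] and using Simpson's rule are arbitrary illustrative choices, and the function names are hypothetical.

```python
from math import exp

def median_cdf_n3(F):
    """P[M <= r] for n = 3 as a function of F = F(r) (the m = 1 formula)."""
    return F**2 * (-2 * F + 3)

def expected_median_exp_n3(lam, upper=40.0, steps=4000):
    """Approximate E[M_3] = integral_0^inf P[M > r] dr for Exp(lam) samples.

    The integral is truncated at upper/lam and evaluated with composite
    Simpson's rule (steps must be even); both choices are illustrative.
    """
    b = upper / lam
    h = b / steps

    def tail(r):
        F = 1.0 - exp(-lam * r)         # CDF of Exp(lam)
        return 1.0 - median_cdf_n3(F)  # P[M > r]

    total = tail(0.0) + tail(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * tail(i * h)
    return total * h / 3

for lam in (0.5, 1.0, 3.0):
    print(lam, expected_median_exp_n3(lam), 5 / (6 * lam))
```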

Notes

  1. As the [ilmath]X_i[/ilmath] are i.i.d., this will be the same for all [ilmath]i[/ilmath]
  2. This one is predicted from a formula I found; this and further predictions seem to work, but they have not been experimentally confirmed to the same high standard as the previous results.

OLD WORK - unsaved addendum

Reversing polynomial order

Let [math]f(x):\eq\sum^n_{i\eq 0}a_ix^i\eq a_0+a_1x+a_2x^2+\cdots+a_{n-1}x^{n-1}+a_nx^n[/math][Note 1] be a polynomial in [ilmath]x[/ilmath]

Our goal is to "reverse the coefficients" of [ilmath]f[/ilmath], to obtain a polynomial [ilmath]f'[/ilmath] such that:

  • [math]f'(x)\eq \sum^n_{i\eq 0}a_ix^{n-i} \eq\sum^n_{i\eq 0}a_{n-i}x^i\eq a_0x^n+a_1x^{n-1}+a_2x^{n-2}+\cdots+a_{n-1}x+a_n[/math], perhaps better written as: [math]f'(x)\eq a_n+a_{n-1}x+a_{n-2}x^2+\cdots+a_1x^{n-1}+a_0x^n[/math]

Process

Let [ilmath]f(x)[/ilmath] be an [ilmath]n^\text{th} [/ilmath]-order polynomial over some field (TODO: just assume [ilmath]\mathbb{R} [/ilmath] for now), then:
  • Define [ilmath]f_1(x):\eq \frac{1}{x^n}f(x)\eq x^{-n}f(x)\eq\sum^n_{i\eq 0}a_ix^{i-n}\eq a_0x^{-n}+a_1x^{1-n}+a_2x^{2-n}+\cdots+ a_{n-1}x^{-1}+a_n[/ilmath] - Caveat:Provided [ilmath]x\neq 0[/ilmath]
    • Define [ilmath]f_2(x):\eq f_1\left(\frac{1}{x}\right)[/ilmath] so [ilmath]f_2(x)\eq \sum^n_{i\eq 0}a_ix^{-(i-n)}\eq a_0x^n+a_1x^{-(1-n)}+a_2x^{-(2-n)}+\cdots+ a_{n-1}x^{-(-1)}+a_n[/ilmath]
      [ilmath]\eq a_0x^n+a_1x^{n-1}+a_2x^{n-2}+\cdots+a_{n-1}x^1+a_n[/ilmath] - which is what we want!
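The process can be checked numerically. This is a minimal Python sketch (helper names hypothetical) that builds [ilmath]f_1[/ilmath] and [ilmath]f_2[/ilmath] exactly as above and compares [ilmath]f_2[/ilmath] with the polynomial obtained by simply reversing the coefficient list, at a few non-zero rational points (the caveat [ilmath]x\neq 0[/ilmath] is needed because [ilmath]f_1[/ilmath] divides by [ilmath]x^n[/ilmath]).

```python
from fractions import Fraction

def evaluate(coeffs, x):
    """Evaluate sum_i coeffs[i] * x**i (lowest power first) by Horner's rule."""
    result = Fraction(0)
    for a in reversed(coeffs):
        result = result * x + a
    return result

def f2_via_process(coeffs, x):
    """f_2(x) = f_1(1/x), where f_1(y) = y**(-n) * f(y); requires x != 0."""
    n = len(coeffs) - 1

    def f1(y):
        return evaluate(coeffs, y) / y**n

    return f1(1 / x)

a = [1, 2, 3, 4]          # f(x)  = 1 + 2x + 3x^2 + 4x^3
reversed_a = a[::-1]      # f'(x) = 4 + 3x + 2x^2 + x^3
for x in (Fraction(1, 2), Fraction(-3), Fraction(7, 5)):
    assert f2_via_process(a, x) == evaluate(reversed_a, x)
print(evaluate(reversed_a, Fraction(2)))  # prints 26 (= 4 + 6 + 8 + 8)
```

In coefficient-list form the whole process reduces to reversing the list, which is how the "In general" recipe above can be carried out in practice.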

Notes

  1. Usually we use indices starting at one, so for a polynomial of order [ilmath]n[/ilmath] we would write [ilmath]\sum^{n+1}_{i\eq 1}a_ix^{i-1} [/ilmath]; here, however, the convenience of starting at zero outweighs the "minor gains" (if any) we'd make.

Found this in an old tab; I then made the above content, so I am adding it here.