Median (sample)
- Not to be confused with the median of a random variable
Caveat:
Median of an even number of samples
For a sample of an even number of values, say [ilmath]x_1,\ldots,x_{2m} [/ilmath] for some [ilmath]m\in\mathbb{N}_{\ge 1} [/ilmath], write [ilmath]x'_1,\ldots,x'_{2m} [/ilmath] for the sorted sample values, so [ilmath]x'_1\le x'_2\le\cdots\le x'_{2m-1}\le x'_{2m} [/ilmath]. It is then conventional to define the median as:
- [math]\text{Median}(x_1,\ldots,x_{2m}):\eq\frac{x'_m+x'_{m+1} }{2} [/math] - the average of the two middle points.
However, as per Alec's taxonomy of units, we can "do Median" on merely "ordered" unit types. For these there is no natural or canonical concept of an "average" of two values unless we map them onto some subset of the natural numbers. There are some options in this case; see Notes:Median of an ordered unit type sample.
The current options being considered are:
- Consider both middle items to be the median; the median is then exactly 1 or 2 items (or zero, should the median of an empty sample be considered)
- Introduce some new element, say [ilmath]a[/ilmath], which is by definition: [ilmath]x'_m < a < x'_{m+1} [/ilmath]
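The first option above can be sketched in Python as follows. This is a minimal sketch, not a definition from this page: the helper name `median_items` and the `RATINGS` scale are hypothetical, chosen only for illustration. It uses comparisons only (never addition or division), so it applies to merely "ordered" values.

```python
from typing import Any, Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def median_items(sample: Sequence[T],
                 key: Optional[Callable[[T], Any]] = None) -> Tuple[T, ...]:
    """Hypothetical helper: return the middle item(s) of the sorted sample.

    Uses only comparisons (no addition or division), so it suits merely
    "ordered" values. Returns 1 item for odd n, 2 items for even n and
    an empty tuple for an empty sample.
    """
    ordered = sorted(sample, key=key)  # x'_1 <= x'_2 <= ... <= x'_n
    n = len(ordered)
    if n == 0:
        return ()
    m = n // 2
    if n % 2 == 1:
        return (ordered[m],)  # n = 2m+1: x'_{m+1} (0-based index m)
    return (ordered[m - 1], ordered[m])  # n = 2m: x'_m and x'_{m+1}


# Hypothetical "ordered" unit type: ratings with an order but no average.
RATINGS = ["poor", "fair", "good", "excellent"]

print(median_items(["good", "poor", "excellent", "fair"], key=RATINGS.index))
# ('fair', 'good')
print(median_items([2, 3, 2, 4, 5]))
# (3,)
```

Python's standard library also provides `statistics.median_low` and `statistics.median_high`, which return a single one of the two middle values instead of both.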
Definition
There is complete consensus on the median of an odd number of sample values. However, for an even number of samples things are less clear, as per the caveat above.
In what follows we shall define:
- [ilmath]x_1,\ldots,x_n[/ilmath] as the sample, for which we may have [ilmath]n:\eq 2m+1[/ilmath] or [ilmath]n:\eq 2m[/ilmath] depending on the case, and
- [ilmath]x'_1,\ldots,x'_n[/ilmath] meaning "the ordered sample", a permutation of the [ilmath]x_i[/ilmath] in which the values have been sorted, so:
- [ilmath]x'_1\le x'_2\le\cdots\le x'_{n-1}\le x'_n[/ilmath]
Odd sample
Let [ilmath]x_1,\ldots,x_{2m+1} [/ilmath] be given, for some [ilmath]m\in\mathbb{N}_{\ge 0} [/ilmath], then we define:
- [ilmath]\text{Median}(x_1,\ldots,x_{2m+1}):\eq x'_{m+1} [/ilmath]
Example:
- Given the sample [ilmath]x_1,\ x_2,\ x_3,\ x_4,\ x_5[/ilmath], the median is [ilmath]x'_3[/ilmath]
- [ilmath]\text{Median}(2,3,2,4,5)\eq\text{Median}(2,2,3,4,5)\eq 3[/ilmath]
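The odd-sample definition and the example above can be checked with a short Python sketch. The name `median_odd` is hypothetical; the function simply sorts the sample and picks [ilmath]x'_{m+1}[/ilmath], which needs only comparisons, so it also works for ordered-only values such as strings.

```python
def median_odd(sample):
    """Hypothetical helper: median of an odd-sized sample, x'_{m+1}."""
    n = len(sample)
    if n % 2 != 1:
        raise ValueError("expected an odd number of samples, n = 2m+1")
    ordered = sorted(sample)  # the ordered sample x'_1, ..., x'_n
    m = (n - 1) // 2
    return ordered[m]  # 0-based index m is x'_{m+1}


print(median_odd([2, 3, 2, 4, 5]))  # 3
```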
Even sample
- Warning: This is a "conventional definition" that requires [ilmath]\frac{1}{2}(a+b)[/ilmath] to be defined for sample values [ilmath]a,b[/ilmath]. There is work to be done, as per the caveat above.
Let [ilmath]x_1,\ldots,x_{2m} [/ilmath] be given, for some [ilmath]m\in\mathbb{N}_{\ge 1} [/ilmath], then we conventionally define:
- [math]\text{Median}(x_1,\ldots,x_{2m}):\eq \frac{x'_m+x'_{m+1} }{2} [/math]
This requires the sample values to support addition and division by 2, which is not always the case! See the caveat for more details.
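A minimal Python sketch of the conventional even-sample definition follows; `median_even` is a hypothetical name. For numeric data it agrees with the standard library's `statistics.median`, which averages the two middle values when the sample size is even. The last lines show the caveat in practice: for strings (standing in for an ordered-only unit type) the averaging step fails with a `TypeError`.

```python
import statistics


def median_even(sample):
    """Hypothetical helper: conventional median (x'_m + x'_{m+1}) / 2."""
    n = len(sample)
    if n == 0 or n % 2 != 0:
        raise ValueError("expected an even, non-zero number of samples, n = 2m")
    ordered = sorted(sample)  # the ordered sample x'_1, ..., x'_n
    m = n // 2
    # 0-based indices m-1 and m are x'_m and x'_{m+1}
    return (ordered[m - 1] + ordered[m]) / 2


print(median_even([1, 3, 5, 7]))        # 4.0
print(statistics.median([1, 3, 5, 7]))  # 4.0, same convention

# Ordered-only values: there is no meaningful "divide by 2".
try:
    median_even(["fair", "good"])
except TypeError as err:
    print("no average for ordered-only values:", err)
```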
See also
- Order statistic - of which the median is a special case
- Average
- Mode
- Alec's taxonomy of units - the median can be used on "ordered" units and above.