2 * Introduction to descriptive::
3 * Functions and Variables for data manipulation::
4 * Functions and Variables for descriptive statistics::
5 * Functions and Variables for statistical graphs::
8 @node Introduction to descriptive, Functions and Variables for data manipulation, Package descriptive, Package descriptive
9 @section Introduction to descriptive
11 Package @code{descriptive} contains a set of functions for
12 making descriptive statistical computations and graphing.
13 Together with the source code there are three data sets in
14 your Maxima tree: @code{pidigits.data}, @code{wind.data} and @code{biomed.data}.
16 Any statistics manual can be used as a reference to the functions in package @code{descriptive}.
18 For comments, bugs or suggestions, please contact me at @var{'riotorto AT yahoo DOT com'}.
Here is a simple example of how the functions in @code{descriptive} work, depending on the nature of their arguments, lists or matrices,
23 @c load ("descriptive")$
24 @c /* univariate sample */ mean ([a, b, c]);
25 @c matrix ([a, b], [c, d], [e, f]);
26 @c /* multivariate sample */ mean (%);
29 (%i1) load ("descriptive")$
31 (%i2) /* univariate sample */ mean ([a, b, c]);
37 (%i3) matrix ([a, b], [c, d], [e, f]);
45 (%i4) /* multivariate sample */ mean (%);
       e + c + a  f + d + b
(%o4) [---------, ---------]
           3          3
52 Note that in multivariate samples the mean is calculated for each column.
In the case of several samples, possibly of different sizes, the Maxima function @code{map} can be used to get the desired results for each sample,
57 @c load ("descriptive")$
58 @c map (mean, [[a, b, c], [d, e]]);
61 (%i1) load ("descriptive")$
63 (%i2) map (mean, [[a, b, c], [d, e]]);
       c + b + a  e + d
(%o2) [---------, -----]
           3        2
70 In this case, two samples of sizes 3 and 2 were stored into a list.
72 Univariate samples must be stored in lists like
75 @c s1 : [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
79 (%i1) s1 : [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
80 (%o1) [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
84 and multivariate samples in matrices as in
87 @c s2 : matrix ([13.17, 9.29], [14.71, 16.88], [18.50, 16.88],
88 @c [10.58, 6.63], [13.33, 13.25], [13.21, 8.12]);
92 (%i1) s2 : matrix ([13.17, 9.29], [14.71, 16.88], [18.50, 16.88],
93 [10.58, 6.63], [13.33, 13.25], [13.21, 8.12]);
108 In this case, the number of columns equals the random variable dimension and the number of rows is the sample size.
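As a minimal sketch that is not part of the original examples, both quantities can be read off a sample matrix with core Maxima functions only. It relies on @code{length} returning the number of rows of a matrix and on @code{first} returning its first row as a list.

@example
(%i1) s2 : matrix ([13.17, 9.29], [14.71, 16.88], [18.50, 16.88],
             [10.58, 6.63], [13.33, 13.25], [13.21, 8.12])$
(%i2) length (s2);          /* number of rows: the sample size */
(%i3) length (first (s2));  /* number of columns: the dimension */
@end example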
Data can be entered by hand, but large samples are usually stored in plain text files. For example, the file @code{pidigits.data} contains the first 100 digits of the number @code{%pi}:
126 In order to load these digits in Maxima,
129 @c s1 : read_list (file_search ("pidigits.data"))$
133 (%i1) s1 : read_list (file_search ("pidigits.data"))$
140 On the other hand, file @code{wind.data} contains daily average wind speeds at 5 meteorological stations in the Republic of Ireland (This is part of a data set taken at 12 meteorological stations. The original file is freely downloadable from the StatLib Data Repository and its analysis is discussed in Haslett, J., Raftery, A. E. (1989) @var{Space-time Modelling with Long-memory Dependence: Assessing Ireland's Wind Power Resource, with Discussion}. Applied Statistics 38, 1-50). This loads the data:
143 @c s2 : read_matrix (file_search ("wind.data"))$
145 @c s2 [%]; /* last record */
148 (%i1) s2 : read_matrix (file_search ("wind.data"))$
154 (%i3) s2 [%]; /* last record */
155 (%o3) [3.58, 6.0, 4.58, 7.62, 11.25]
Some samples contain non-numeric data. As an example, the file @code{biomed.data} (which is part of a larger data set downloaded from the StatLib Data Repository) contains four blood measures taken from two groups of patients, @code{A} and @code{B}, of different ages,
162 @c s3 : read_matrix (file_search ("biomed.data"))$
164 @c s3 [1]; /* first record */
167 (%i1) s3 : read_matrix (file_search ("biomed.data"))$
173 (%i3) s3 [1]; /* first record */
174 (%o3) [A, 30, 167.0, 89.0, 25.6, 364]
178 The first individual belongs to group @code{A}, is 30 years old and his/her blood measures were 167.0, 89.0, 25.6 and 364.
One must take care when working with categorical data. In the next example, the symbol @code{a} has been assigned a value at some earlier point, and then a sample with categorical value @code{a} is taken,
184 @c matrix ([a, 3], [b, 5]);
189 (%i2) matrix ([a, 3], [b, 5]);
196 @opencatbox{Categories:}
197 @category{Descriptive statistics}
198 @category{Share packages}
199 @category{Package descriptive}
202 @node Functions and Variables for data manipulation, Functions and Variables for descriptive statistics, Introduction to descriptive, Package descriptive
203 @section Functions and Variables for data manipulation
207 @anchor{build_sample}
208 @deffn {Function} build_sample @
209 @fname{build_sample} (@var{list}) @
210 @fname{build_sample} (@var{matrix})
212 Builds a sample from a table of absolute frequencies.
213 The input table can be a matrix or a list of lists, all of
214 them of equal size. The number of columns or the length of
215 the lists must be greater than 1. The last element of each
216 row or list is interpreted as the absolute frequency.
217 The output is always a sample in matrix form.
221 Univariate frequency table.
224 @c load ("descriptive")$
225 @c sam1: build_sample([[6,1], [j,2], [2,1]]);
230 (%i1) load ("descriptive")$
232 (%i2) sam1: build_sample([[6,1], [j,2], [2,1]]);
247 (%i4) barsplot(sam1) $
250 Multivariate frequency table.
253 @c load ("descriptive")$
254 @c sam2: build_sample([[6,3,1], [5,6,2], [u,2,1],[6,8,2]]) ;
256 @c barsplot(sam2, grouping=stacked) $
259 (%i1) load ("descriptive")$
261 (%i2) sam2: build_sample([[6,3,1], [5,6,2], [u,2,1],[6,8,2]]) ;
277 [ u + 158 (u + 28) 2 u + 174 11 (u + 28) ]
278 [ -------- - --------- --------- - ----------- ]
281 [ 2 u + 174 11 (u + 28) 21 ]
282 [ --------- - ----------- -- ]
285 (%i4) barsplot(sam2, grouping=stacked) $
288 @opencatbox{Categories:}
289 @category{Package descriptive}
295 @anchor{continuous_freq}
296 @deffn {Function} continuous_freq @
297 @fname{continuous_freq} (@var{data}) @
298 @fname{continuous_freq} (@var{data}, @var{m})
300 Divides the range of @var{data} into intervals,
301 and counts how many values fall into each one.
303 A value @var{x} falls into an interval with left and right endpoints @var{a} and @var{b}
304 if and only if @code{@var{x} > @var{a}} and @code{@var{x} <= @var{b}},
305 except for the first (least or leftmost) interval,
306 for which @code{@var{x} >= @var{a}} and @code{@var{x} <= @var{b}}.
307 That is, an interval excludes its left endpoint and includes its right endpoint,
308 except for the first interval, which includes both the left and right endpoints.
310 @var{data} must be a list of numbers,
311 or 1-dimensional array (as created by @code{make_array}).
313 @var{m} is optional, and equals either the number of classes (10 by default),
314 or a list of two elements (the least and greatest values to be counted),
315 or a list of three elements (the least and greatest values to be counted, and the number of classes),
316 or a set containing the endpoints of the class intervals.
318 It is assumed that class intervals are contiguous.
319 That is, the right endpoint of one interval is equal to the left endpoint of the next.
321 @code{continuous_freq} returns a list of two lists.
322 The first list comprises all the endpoints of the class intervals,
323 concatenated into a single list.
324 The second list contains the class counts for the intervals corresponding to elements of the first list.
326 If sample values are all equal, this function returns exactly
327 one class of width 2.
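The endpoint rule can be checked on a small sample. The five-value list below is hypothetical and chosen only so that one value sits exactly on an interior endpoint. With two classes over the range from 0 to 4, the rule stated above gives the classes @code{[0, 2]} and @code{(2, 4]}, so the value 2 belongs to the first class and the counts should be 3 and 2.

@example
(%i1) load ("descriptive")$
(%i2) x : [0, 1, 2, 3, 4]$
(%i3) continuous_freq (x, 2);  /* 2 lies on the shared endpoint */
@end example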
331 Optional argument indicates the number of classes we want.
332 The first list in the output contains the interval limits, and
333 the second the corresponding counts: there are 16 digits inside
334 the interval @code{[0, 1.8]}, 24 digits in @code{(1.8, 3.6]}, and so on.
337 @c load ("descriptive")$
338 @c s1 : read_list (file_search ("pidigits.data"))$
339 @c continuous_freq (s1, 5);
342 (%i1) load ("descriptive")$
343 (%i2) s1 : read_list (file_search ("pidigits.data"))$
345 (%i3) continuous_freq (s1, 5);
           9  18  27  36
(%o3) [[0, -, --, --, --, 9], [16, 24, 18, 17, 25]]
           5  5   5   5
Optional argument indicates we want 7 classes with limits -2 and 12:
356 @c load ("descriptive")$
357 @c s1 : read_list (file_search ("pidigits.data"))$
358 @c continuous_freq (s1, [-2,12,7]);
361 (%i1) load ("descriptive")$
362 (%i2) s1 : read_list (file_search ("pidigits.data"))$
364 (%i3) continuous_freq (s1, [-2,12,7]);
365 (%o3) [[- 2, 0, 2, 4, 6, 8, 10, 12], [8, 20, 22, 17, 20, 13, 0]]
Optional argument indicates we want the default number of classes with limits -2 and 12:
373 @c load ("descriptive")$
374 @c s1 : read_list (file_search ("pidigits.data"))$
375 @c continuous_freq (s1, [-2,12]);
378 (%i1) load ("descriptive")$
379 (%i2) s1 : read_list (file_search ("pidigits.data"))$
381 (%i3) continuous_freq (s1, [-2,12]);
               3  4  11  18     32  39  46  53
(%o3) [[- 2, - -, -, --, --, 5, --, --, --, --, 12],
               5  5  5   5      5   5   5   5
       [0, 8, 20, 12, 18, 9, 8, 25, 0, 0]]
389 The first argument may be an array.
392 @c load ("descriptive")$
393 @c s1 : read_list (file_search ("pidigits.data"))$
394 @c a1 : make_array (fixnum, length (s1)) $
395 @c fillarray (a1, s1);
396 @c continuous_freq (a1);
399 (%i1) load ("descriptive")$
400 (%i2) s1 : read_list (file_search ("pidigits.data"))$
401 (%i3) a1 : make_array (fixnum, length (s1)) $
403 (%i4) fillarray (a1, s1);
404 (%o4) @{Lisp Array: #(3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3 2 3 8 4 6 2\
406 0 2 8 8 4 1 9 7 1 6 9 3 9 9 3 7 5 1 0 5 8 2 0 9 7\
408 3 0 7 8 1 6 4 0 6 2 8 6 2 0 8 9 9 8 6 2 8 0 3 4 8\
413 (%i5) continuous_freq (a1);
           9   9  27  18  9  27  63  36  81
(%o5) [[0, --, -, --, --, -, --, --, --, --, 9],
           10  5  10  5   2  5   10  5   10
       [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]
421 @opencatbox{Categories:}
422 @category{Package descriptive}
428 @anchor{discrete_freq}
429 @deffn {Function} discrete_freq (@var{data})
431 Counts absolute frequencies in discrete samples, both numeric and categorical. Its sole argument is a list,
432 or 1-dimensional array (as created by @code{make_array}).
437 @c load ("descriptive")$
438 @c s1 : read_list (file_search ("pidigits.data"))$
439 @c discrete_freq (s1);
442 (%i1) load ("descriptive")$
443 (%i2) s1 : read_list (file_search ("pidigits.data"))$
445 (%i3) discrete_freq (s1);
446 (%o3) [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
447 [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]
The first list gives the sample values, and the second, their absolute frequencies.
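Since categorical samples are also accepted, a short sketch with symbolic values may help. The list below is made up for illustration; in it @code{a} occurs three times, @code{b} twice and @code{c} once.

@example
(%i1) load ("descriptive")$
(%i2) discrete_freq ([a, b, a, c, b, a]);
@end example

This assumes that none of the symbols has been assigned a value beforehand.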
454 The argument may be an array.
457 @c load ("descriptive")$
458 @c s1 : read_list (file_search ("pidigits.data"))$
459 @c a1 : make_array (fixnum, length (s1)) $
460 @c fillarray (a1, s1);
461 @c discrete_freq (a1);
464 (%i1) load ("descriptive")$
465 (%i2) s1 : read_list (file_search ("pidigits.data"))$
466 (%i3) a1 : make_array (fixnum, length (s1)) $
468 (%i4) fillarray (a1, s1);
469 (%o4) @{Lisp Array: #(3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3 2 3 8 4 6 2\
471 0 2 8 8 4 1 9 7 1 6 9 3 9 9 3 7 5 1 0 5 8 2 0 9 7\
473 3 0 7 8 1 6 4 0 6 2 8 6 2 0 8 9 9 8 6 2 8 0 3 4 8\
478 (%i5) discrete_freq (a1);
479 (%o5) [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
480 [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]
484 @opencatbox{Categories:}
485 @category{Package descriptive}
493 @deffn {Function} standardize @
494 @fname{standardize} (@var{list}) @
495 @fname{standardize} (@var{matrix})
Subtracts the sample mean from each element of the list and divides
the result by the standard deviation. When the input is a matrix,
@code{standardize} subtracts the multivariate mean from each row, and then
divides each component by the corresponding standard deviation.
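A minimal usage sketch follows, with a hypothetical list of integers. The description above does not say whether the standard deviation uses denominator @math{n} or @math{n-1}, so no particular output is assumed here; whichever is used, the mean of the standardized sample should be zero by construction.

@example
(%i1) load ("descriptive")$
(%i2) x : [2, 4, 4, 4, 5, 5, 7, 9]$
(%i3) z : standardize (x);
(%i4) mean (z);   /* expected to be zero by construction */
@end example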
502 @opencatbox{Categories:}
503 @category{Package descriptive}
511 @deffn {Function} subsample @
512 @fname{subsample} (@var{data_matrix}, @var{predicate_function}) @
513 @fname{subsample} (@var{data_matrix}, @var{predicate_function}, @var{col_num1}, @var{col_num2}, ...)
This is a variant of the Maxima @code{submatrix} function.
516 The first argument is the data matrix, the second is a predicate function
517 and optional additional arguments are the numbers of the columns to be taken.
These are the multivariate records in which the wind speed
at the first meteorological station was greater than 18.
Note that in the lambda expression the @var{i}-th component is
referred to as @code{v[i]}.
527 @c load ("descriptive")$
528 @c s2 : read_matrix (file_search ("wind.data"))$
529 @c subsample (s2, lambda([v], v[1] > 18));
532 (%i1) load ("descriptive")$
533 (%i2) s2 : read_matrix (file_search ("wind.data"))$
535 (%i3) subsample (s2, lambda([v], v[1] > 18));
536 [ 19.38 15.37 15.12 23.09 25.25 ]
538 [ 18.29 18.66 19.08 26.08 27.63 ]
540 [ 20.25 21.46 19.95 27.71 23.38 ]
542 [ 18.79 18.96 14.46 26.38 21.84 ]
In the following example, we request only the first, second and fifth
components of those records with wind speeds greater than or equal to 16
at station number 1 and less than 25 knots at station number 4. The sample
contains only data from stations 1, 2 and 5. In this case,
the predicate function is defined as an ordinary Maxima function.
553 @c load ("descriptive")$
554 @c s2 : read_matrix (file_search ("wind.data"))$
555 @c g(x):= x[1] >= 16 and x[4] < 25$
556 @c subsample (s2, g, 1, 2, 5);
559 (%i1) load ("descriptive")$
560 (%i2) s2 : read_matrix (file_search ("wind.data"))$
561 (%i3) g(x):= x[1] >= 16 and x[4] < 25$
563 (%i4) subsample (s2, g, 1, 2, 5);
564 [ 19.38 15.37 25.25 ]
566 [ 17.33 14.67 19.58 ]
568 [ 16.92 13.21 21.21 ]
570 [ 17.25 18.46 23.87 ]
574 Here is an example with the categorical variables of @code{biomed.data}.
575 We want the records corresponding to those patients in group @code{B}
576 who are older than 38 years.
579 @c load ("descriptive")$
580 @c s3 : read_matrix (file_search ("biomed.data"))$
581 @c h(u):= u[1] = B and u[2] > 38 $
582 @c subsample (s3, h);
585 (%i1) load ("descriptive")$
586 (%i2) s3 : read_matrix (file_search ("biomed.data"))$
587 (%i3) h(u):= u[1] = B and u[2] > 38 $
589 (%i4) subsample (s3, h);
590 [ B 39 28.0 102.3 17.1 146 ]
592 [ B 39 21.0 92.4 10.3 197 ]
594 [ B 39 23.0 111.5 10.0 133 ]
596 [ B 39 26.0 92.6 12.3 196 ]
598 [ B 39 25.0 98.7 10.0 174 ]
600 [ B 39 21.0 93.2 5.9 181 ]
602 [ B 39 18.0 95.0 11.3 66 ]
604 [ B 39 39.0 88.5 7.6 168 ]
608 Probably, the statistical analysis will involve only the blood measures,
611 @c load ("descriptive")$
612 @c s3 : read_matrix (file_search ("biomed.data"))$
613 @c subsample (s3, lambda([v], v[1] = B and v[2] > 38),
617 (%i1) load ("descriptive")$
618 (%i2) s3 : read_matrix (file_search ("biomed.data"))$
620 (%i3) subsample (s3, lambda([v], v[1] = B and v[2] > 38),
622 [ 28.0 102.3 17.1 146 ]
624 [ 21.0 92.4 10.3 197 ]
626 [ 23.0 111.5 10.0 133 ]
628 [ 26.0 92.6 12.3 196 ]
630 [ 25.0 98.7 10.0 174 ]
632 [ 21.0 93.2 5.9 181 ]
634 [ 18.0 95.0 11.3 66 ]
636 [ 39.0 88.5 7.6 168 ]
640 This is the multivariate mean of @code{s3},
643 @c load ("descriptive")$
644 @c s3 : read_matrix (file_search ("biomed.data"))$
648 (%i1) load ("descriptive")$
649 (%i2) s3 : read_matrix (file_search ("biomed.data"))$
653 (%o3) [----------, ---, 87.178, 0.06 NA + 81.44999999999999,
656 18.122999999999998, ------------]
Here, the first component is meaningless, since @code{A} and @code{B} are categorical; the second component is the mean age of the individuals in rational form; and the fourth and last values exhibit some strange behaviour. This is because the symbol @code{NA} is used here to indicate @var{not available} data, so the two means are meaningless. A possible solution is to remove from the matrix those rows containing @code{NA} symbols, although this entails some loss of information.
664 @c load ("descriptive")$
665 @c s3 : read_matrix (file_search ("biomed.data"))$
666 @c g(v):= v[4] # NA and v[6] # NA $
667 @c mean (subsample (s3, g, 3, 4, 5, 6));
670 (%i1) load ("descriptive")$
671 (%i2) s3 : read_matrix (file_search ("biomed.data"))$
672 (%i3) g(v):= v[4] # NA and v[6] # NA $
674 (%i4) mean (subsample (s3, g, 3, 4, 5, 6));
675 (%o4) [79.4923076923077, 86.2032967032967, 16.93186813186813,
682 @opencatbox{Categories:}
683 @category{Package descriptive}
691 @anchor{transform_sample}
692 @deffn {Function} transform_sample (@var{matrix}, @var{varlist}, @var{exprlist})
694 Transforms the sample @var{matrix}, where each column is called according to
695 @var{varlist}, following expressions in @var{exprlist}.
The second argument assigns names to the three columns. With these names,
a list of expressions defines the transformation of the sample.
703 (%i1) load ("descriptive")$
704 (%i2) data: matrix([3,2,7],[3,7,2],[8,2,4],[5,2,4]) $
706 (%i3) transform_sample(data, [a,b,c], [c, a*b, log(a)]);
717 Add a constant column and remove the third variable.
720 (%i1) load ("descriptive")$
721 (%i2) data: matrix([3,2,7],[3,7,2],[8,2,4],[5,2,4]) $
722 (%i3) transform_sample(data, [a,b,c], [makelist(1,k,length(data)),a,b]);
734 @opencatbox{Categories:}
735 @category{Package descriptive}
745 @node Functions and Variables for descriptive statistics, Functions and Variables for statistical graphs, Functions and Variables for data manipulation, Package descriptive
746 @section Functions and Variables for descriptive statistics
751 @deffn {Function} mean @
752 @fname{mean} (@var{x}) @
753 @fname{mean} (@var{x}, @var{w})
755 Returns the sample mean.
756 @var{x} must be a list or matrix.
758 When @var{x} is a list,
759 @code{mean} returns the sample mean of @var{x}.
761 When @var{x} is a matrix,
762 @code{mean} returns a list comprising the sample mean of each column.
764 @var{w} is an optional per-datum weight.
765 @var{w} must either be 1, in which case every datum @var{x[i]} is given equal weight,
766 or a list of the same length as @var{x},
767 in which case the weight for @var{x[i]} is given by @var{w[i]}.
768 The elements of @var{w} must be nonnegative and not all zero;
769 it is not required that they sum to 1.
771 The unweighted sample mean is defined as
785 $${\bar{x}={1\over{n}}{\sum_{i=1}^{n}{x_{i}}}}$$
788 The weighted sample mean is defined as
802 $${\bar{x}={1\over{Z}}{\sum_{i=1}^{n}{w_{i} x_{i}}}}$$
805 where @var{Z} is the sum of the weights,
819 $${Z={\sum_{i=1}^{n}{w_{i}}}}$$
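As a worked check of the weighted definition, the sketch below evaluates the formula by hand and compares it with the library call. The data and weights are hypothetical; with them, @var{Z} is 10 and the weighted sum is 30, so both expressions should give 3.

@example
(%i1) load ("descriptive")$
(%i2) x : [1, 2, 3, 4]$
(%i3) w : [1, 2, 3, 4]$
(%i4) Z : sum (w[i], i, 1, length (w));
(%i5) sum (w[i] * x[i], i, 1, length (x)) / Z;  /* by the formula */
(%i6) mean (x, w);                              /* library call */
@end example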
824 Sample mean of a list.
827 @c load ("descriptive")$
828 @c s1 : read_list (file_search ("pidigits.data"))$
832 (%i1) load ("descriptive")$
833 (%i2) s1 : read_list (file_search ("pidigits.data"))$
842 Sample mean of each column of a matrix.
845 @c load ("descriptive")$
846 @c s2 : read_matrix (file_search ("wind.data"))$
850 (%i1) load ("descriptive")$
851 (%i2) s2 : read_matrix (file_search ("wind.data"))$
854 (%o3) [9.9485, 10.160700000000004, 10.868499999999997,
855 15.716600000000001, 14.844100000000001]
859 Weighted sample mean of a list.
862 @c load ("descriptive")$
863 @c mean ([a, b, c, d], [1, 2, 3, 4]);
866 (%i1) load ("descriptive")$
868 (%i2) mean ([a, b, c, d], [1, 2, 3, 4]);
      4 d + 3 c + 2 b + a
(%o2) -------------------
              10
875 Weighted sample mean of each column of a matrix.
878 @c load ("descriptive")$
879 @c mm: matrix ([p, q, r], [s, t, u]);
880 @c mean (mm, [vv, ww]);
883 (%i1) load ("descriptive")$
885 (%i2) mm: matrix ([p, q, r], [s, t, u]);
891 (%i3) mean (mm, [vv, ww]);
892 s ww + p vv t ww + q vv u ww + r vv
893 (%o3) [-----------, -----------, -----------]
894 ww + vv ww + vv ww + vv
898 @opencatbox{Categories:}
899 @category{Package descriptive}
906 @deffn {Function} var @
907 @fname{var} (@var{x}) @
908 @fname{var} (@var{x}, @var{w})
910 Returns the sample variance.
911 @var{x} must be a list or matrix.
913 When @var{x} is a list,
914 @code{var} returns the sample variance of @var{x}.
916 When @var{x} is a matrix,
917 @code{var} returns a list comprising the sample variance of each column.
919 @var{w} is an optional per-datum weight.
920 @var{w} must either be 1, in which case every datum @var{x[i]} is given equal weight,
921 or a list of the same length as @var{x},
922 in which case the weight for @var{x[i]} is given by @var{w[i]}.
923 The elements of @var{w} must be nonnegative and not all zero;
924 it is not required that they sum to 1.
926 The unweighted sample variance is defined as
942 $${{1}\over{n}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^2}}$$
945 The weighted sample variance is defined as
$${{1}\over{Z}}{\sum_{i=1}^{n}{w_{i} (x_{i}-\bar{x})^2}}$$
964 where @var{Z} is the sum of the weights,
978 $${Z={\sum_{i=1}^{n}{w_{i}}}}$$
983 Sample variance of a list.
986 @c load ("descriptive")$
987 @c s1 : read_list (file_search ("pidigits.data"))$
991 (%i1) load ("descriptive")$
992 (%i2) s1 : read_list (file_search ("pidigits.data"))$
994 (%i3) var (s1), numer;
995 (%o3) 8.425899999999999
999 Sample variance of each column of a matrix.
1002 @c load ("descriptive")$
1003 @c s2 : read_matrix (file_search ("wind.data"))$
1007 (%i1) load ("descriptive")$
1008 (%i2) s2 : read_matrix (file_search ("wind.data"))$
1011 (%o3) [17.22190675000001, 14.987736510000005,
1012 15.475728749999998, 32.17651044000001, 24.423076190000007]
1016 Weighted sample variance of a list.
1019 @c load ("descriptive")$
1020 @c var ([a - b, a, a + b], [3, 5, 7]);
1023 (%i1) load ("descriptive")$
1025 (%i2) var ([a - b, a, a + b], [3, 5, 7]);
1033 Weighted sample variance of each column of a matrix.
1036 @c load ("descriptive")$
1037 @c mm: matrix ([a - b, c - d], [a, c], [a + b, c + d]);
1038 @c var (mm, [3, 5, 7]);
1041 (%i1) load ("descriptive")$
1043 (%i2) mm: matrix ([a - b, c - d], [a, c], [a + b, c + d]);
1051 (%i3) var (mm, [3, 5, 7]);
1054 (%o3) [------, ------]
1059 See also function @mrefdot{var1}
1061 @opencatbox{Categories:}
1062 @category{Package descriptive}
1069 @deffn {Function} var1 @
1070 @fname{var1} (@var{list}) @
1071 @fname{var1} (@var{matrix})
1073 This is the sample variance, defined as
1088 $${{1\over{n-1}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^2}}}$$
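Since @code{var} divides by @math{n} and @code{var1} by @math{n-1}, the two are related by the factor @math{n/(n-1)}. The sketch below checks this on a hypothetical list of integers, for which the sum of squared deviations is 32 and @math{n} is 8.

@example
(%i1) load ("descriptive")$
(%i2) x : [2, 4, 4, 4, 5, 5, 7, 9]$
(%i3) n : length (x)$
(%i4) var (x);    /* denominator n */
(%i5) var1 (x);   /* denominator n - 1 */
(%i6) is (var1 (x) = n / (n - 1) * var (x));
@end example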
1094 @c load ("descriptive")$
1095 @c s1 : read_list (file_search ("pidigits.data"))$
1096 @c var1 (s1), numer;
1097 @c s2 : read_matrix (file_search ("wind.data"))$
1101 (%i1) load ("descriptive")$
1102 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1104 (%i3) var1 (s1), numer;
1105 (%o3) 8.5110101010101
1107 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1110 (%o5) [17.395865404040414, 15.139127787878794,
1111 15.632049242424243, 32.50152569696971, 24.669773929292937]
1115 See also function @mrefdot{var}
1117 @opencatbox{Categories:}
1118 @category{Package descriptive}
1125 @deffn {Function} std @
1126 @fname{std} (@var{x}) @
1127 @fname{std} (@var{x}, @var{w})
1129 Returns the sample standard deviation.
1130 @var{x} must be a list or matrix.
1132 When @var{x} is a list,
1133 @code{std} returns the sample standard deviation of @var{x},
1134 which is defined as the square root of the sample variance,
1135 as computed by @code{var}.
1137 When @var{x} is a matrix,
1138 @code{std} returns a list comprising the sample standard deviation of each column.
1140 @var{w} is an optional per-datum weight.
1141 @var{w} must either be 1, in which case every datum @var{x[i]} is given equal weight,
1142 or a list of the same length as @var{x},
1143 in which case the weight for @var{x[i]} is given by @var{w[i]}.
1144 The elements of @var{w} must be nonnegative and not all zero;
1145 it is not required that they sum to 1.
1149 Sample standard deviation of a list.
1152 @c load ("descriptive")$
1153 @c s1 : read_list (file_search ("pidigits.data"))$
1157 (%i1) load ("descriptive")$
1158 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1160 (%i3) std (s1), numer;
1161 (%o3) 2.9027400848164135
1165 Sample standard deviation of each column of a matrix.
1168 @c load ("descriptive")$
1169 @c s2 : read_matrix (file_search ("wind.data"))$
1173 (%i1) load ("descriptive")$
1174 (%i2) s2 : read_matrix (file_search ("wind.data"))$
1177 (%o3) [4.149928523480858, 3.8713998127292415,
1178 3.9339202775348663, 5.672434260526957, 4.941970881136392]
1182 See also functions @mref{var} and @mrefdot{std1}
1184 @opencatbox{Categories:}
1185 @category{Package descriptive}
1192 @deffn {Function} std1 @
1193 @fname{std1} (@var{list}) @
1194 @fname{std1} (@var{matrix})
1196 This is the square root of the function @mrefcomma{var1} the variance with denominator @math{n-1}.
1201 @c load ("descriptive")$
1202 @c s1 : read_list (file_search ("pidigits.data"))$
1203 @c std1 (s1), numer;
1204 @c s2 : read_matrix (file_search ("wind.data"))$
1208 (%i1) load ("descriptive")$
1209 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1211 (%i3) std1 (s1), numer;
1212 (%o3) 2.917363553109228
1214 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1217 (%o5) [4.170835096721089, 3.8909032097803196,
1218 3.9537386411375555, 5.701010936401517, 4.966867617451963]
1222 See also functions @mref{var1} and @mrefdot{std}
1224 @opencatbox{Categories:}
1225 @category{Package descriptive}
1231 @anchor{noncentral_moment}
1232 @deffn {Function} noncentral_moment @
1233 @fname{noncentral_moment} (@var{x}, @var{k}) @
1234 @fname{noncentral_moment} (@var{x}, @var{k}, @var{w})
1236 Returns the noncentral moment of order @var{k}.
1237 @var{x} must be a list or matrix.
1239 When @var{x} is a list,
1240 @code{noncentral_moment} returns the noncentral moment of order @var{k} of @var{x}.
1242 When @var{x} is a matrix,
1243 @code{noncentral_moment} returns a list comprising the noncentral moment of order @var{k} of each column.
1245 @var{w} is an optional per-datum weight.
1246 @var{w} must either be 1, in which case every datum @var{x[i]} is given equal weight,
1247 or a list of the same length as @var{x},
1248 in which case the weight for @var{x[i]} is given by @var{w[i]}.
1249 The elements of @var{w} must be nonnegative and not all zero;
1250 it is not required that they sum to 1.
1252 The unweighted noncentral moment of order @var{k} is defined as
1268 $${{1\over{n}}{\sum_{i=1}^{n}{x_{i}^k}}}$$
1271 The weighted noncentral moment of order @var{k} is defined as
1287 $${{1\over{Z}}{\sum_{i=1}^{n}{w_{i} x_{i}^k}}}$$
1290 where @var{Z} is the sum of the weights,
1304 $${Z={\sum_{i=1}^{n}{w_{i}}}}$$
1309 First noncentral moment of a list.
1310 The first noncentral moment is equal to the sample mean.
1313 @c load ("descriptive")$
1314 @c s1 : read_list (file_search ("pidigits.data"))$
1315 @c noncentral_moment (s1, 1), numer;
1316 @c mean (s1), numer;
1319 (%i1) load ("descriptive")$
1320 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1322 (%i3) noncentral_moment (s1, 1), numer;
1326 (%i4) mean (s1), numer;
1331 Fifth noncentral moment of each column of a matrix.
1334 @c load ("descriptive")$
1335 @c s2 : read_matrix (file_search ("wind.data"))$
1336 @c noncentral_moment (s2, 5);
1339 (%i1) load ("descriptive")$
1340 (%i2) s2 : read_matrix (file_search ("wind.data"))$
1342 (%i3) noncentral_moment (s2, 5);
1343 (%o3) [319793.87247615046, 320532.19238924625,
1344 391249.56213815557, 2502278.205988911, 1691881.7977422548]
1348 See also function @mrefdot{central_moment}
1350 @opencatbox{Categories:}
1351 @category{Package descriptive}
1357 @anchor{central_moment}
1358 @deffn {Function} central_moment @
1359 @fname{central_moment} (@var{x}, @var{k}) @
1360 @fname{central_moment} (@var{x}, @var{k}, @var{w})
1362 Returns the central moment of order @var{k}.
1363 @var{x} must be a list or matrix.
1365 When @var{x} is a list,
1366 @code{central_moment} returns the central moment of order @var{k} of @var{x}.
1368 When @var{x} is a matrix,
1369 @code{central_moment} returns a list comprising the central moment of order @var{k} of each column.
1371 @var{w} is an optional per-datum weight.
1372 @var{w} must either be 1, in which case every datum @var{x[i]} is given equal weight,
1373 or a list of the same length as @var{x},
1374 in which case the weight for @var{x[i]} is given by @var{w[i]}.
1375 The elements of @var{w} must be nonnegative and not all zero;
1376 it is not required that they sum to 1.
1378 The unweighted central moment of order @var{k} is defined as
1394 $${{1\over{n}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^k}}}$$
1397 The weighted central moment of order @var{k} is defined as
1413 $${{1\over{Z}}{\sum_{i=1}^{n}{w_{i} (x_{i}-\bar{x})^k}}}$$
1416 where @var{Z} is the sum of the weights,
1430 $${Z={\sum_{i=1}^{n}{w_{i}}}}$$
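The central and noncentral moments of order 2 are linked by the usual identity: the second central moment equals the second noncentral moment minus the square of the mean. The following sketch checks it on a hypothetical list, for which the mean is 5 and the second noncentral moment is 29.

@example
(%i1) load ("descriptive")$
(%i2) x : [2, 4, 4, 4, 5, 5, 7, 9]$
(%i3) central_moment (x, 2);
(%i4) noncentral_moment (x, 2) - mean (x)^2;
@end example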
1435 Second central moment of a list.
1436 The second central moment is equal to the sample variance.
1439 @c load ("descriptive")$
1440 @c s1 : read_list (file_search ("pidigits.data"))$
1441 @c central_moment (s1, 2), numer;
1445 (%i1) load ("descriptive")$
1446 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1448 (%i3) central_moment (s1, 2), numer;
1449 (%o3) 8.425899999999999
1452 (%i4) var (s1), numer;
1453 (%o4) 8.425899999999999
1457 Third central moment of each column of a matrix.
1460 @c load ("descriptive")$
1461 @c s2 : read_matrix (file_search ("wind.data"))$
1462 @c central_moment (s2, 3);
1465 (%i1) load ("descriptive")$
1466 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1468 (%i3) central_moment (s1, 2), numer; /* the variance */
1469 (%o3) 8.425899999999999
1471 (%i5) s2 : read_matrix (file_search ("wind.data"))$
1473 (%i6) central_moment (s2, 3);
1474 (%o6) [11.29584771375004, 16.97988248298583, 5.626661952750102,
1475 37.5986572057918, 25.85981904394192]
See also functions @mref{noncentral_moment} and @mrefdot{mean}
1481 @opencatbox{Categories:}
1482 @category{Package descriptive}
1489 @deffn {Function} cv @
1490 @fname{cv} (@var{list}) @
1491 @fname{cv} (@var{matrix})
Returns the coefficient of variation,
defined as the sample standard deviation @mref{std} divided by the @mrefdot{mean}
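The definition can be spelled out directly. In this sketch the list is hypothetical, and the quotient computed by hand is expected to agree with the value returned by @code{cv}.

@example
(%i1) load ("descriptive")$
(%i2) x : [2, 4, 4, 4, 5, 5, 7, 9]$
(%i3) std (x) / mean (x);   /* by the definition */
(%i4) cv (x);               /* library call */
@end example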
1499 @c load ("descriptive")$
1500 @c s1 : read_list (file_search ("pidigits.data"))$
1502 @c s2 : read_matrix (file_search ("wind.data"))$
1506 (%i1) load ("descriptive")$
1507 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1509 (%i3) cv (s1), numer;
1510 (%o3) 0.6162930116383044
1512 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1515 (%o5) [0.4171411291632767, 0.38101703748061055,
1516 0.3619561372346568, 0.3609199356430116, 0.3329249251309538]
1520 See also functions @mref{std} and @mrefdot{mean}
1522 @opencatbox{Categories:}
1523 @category{Package descriptive}
1530 @deffn {Function} smin @
1531 @fname{smin} (@var{list}) @
1532 @fname{smin} (@var{matrix})
1534 This is the minimum value of the sample @var{list}.
1535 When the argument is a matrix, @mref{smin} returns
1536 a list containing the minimum values of the columns,
which are associated with statistical variables.
1542 @c load ("descriptive")$
1543 @c s1 : read_list (file_search ("pidigits.data"))$
1545 @c s2 : read_matrix (file_search ("wind.data"))$
1549 (%i1) load ("descriptive")$
1550 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1555 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1558 (%o5) [0.58, 0.5, 2.67, 5.25, 5.17]
1562 See also function @mrefdot{smax}
1564 @opencatbox{Categories:}
1565 @category{Package descriptive}
1572 @deffn {Function} smax @
1573 @fname{smax} (@var{list}) @
1574 @fname{smax} (@var{matrix})
1576 This is the maximum value of the sample @var{list}.
1577 When the argument is a matrix, @mref{smax} returns
1578 a list containing the maximum values of the columns,
which are associated with statistical variables.
1584 @c load ("descriptive")$
1585 @c s1 : read_list (file_search ("pidigits.data"))$
1587 @c s2 : read_matrix (file_search ("wind.data"))$
1591 (%i1) load ("descriptive")$
1592 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1597 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1600 (%o5) [20.25, 21.46, 20.04, 29.63, 27.63]
1604 See also function @mrefdot{smin}
1606 @opencatbox{Categories:}
1607 @category{Package descriptive}
1614 @deffn {Function} range @
1615 @fname{range} (@var{list}) @
1616 @fname{range} (@var{matrix})
1618 The range is the difference between the extreme values.
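In terms of the functions documented above, this is the difference between @code{smax} and @code{smin}. A short sketch with a hypothetical list, whose extreme values are 9 and 1:

@example
(%i1) load ("descriptive")$
(%i2) x : [3, 1, 4, 1, 5, 9, 2, 6]$
(%i3) smax (x) - smin (x);   /* by the definition */
(%i4) range (x);             /* library call */
@end example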
1623 @c load ("descriptive")$
1624 @c s1 : read_list (file_search ("pidigits.data"))$
1626 @c s2 : read_matrix (file_search ("wind.data"))$
1630 (%i1) load ("descriptive")$
1631 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1636 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1639 (%o5) [19.67, 20.96, 17.369999999999997, 24.38, 22.46]
1643 @opencatbox{Categories:}
1644 @category{Package descriptive}
1651 @deffn {Function} quantile @
1652 @fname{quantile} (@var{list}, @var{p}) @
1653 @fname{quantile} (@var{matrix}, @var{p})
Returns the @var{p}-quantile of the sample @var{list}, with @var{p} a number in @math{[0, 1]}.
Although there are several definitions for the sample quantile (Hyndman, R. J., Fan, Y. (1996) @var{Sample quantiles in statistical packages}. American Statistician, 50, 361-365), the one based on linear interpolation is implemented in package @ref{Package descriptive}.
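The following sketch reproduces a linear interpolation by hand. It assumes
that the quantile is read at position @code{1 + (n - 1)*p} of the sorted
sample, which is one common interpolation rule. That position formula and
the variable names are assumptions made for illustration. If the assumption
holds, the last two results should agree.

@example
load ("descriptive")$
x : sort ([7, 1, 4, 2])$          /* sorted sample: [1, 2, 4, 7] */
p : 1/4$
pos : 1 + (length (x) - 1) * p$   /* assumed position, here 7/4 */
k : floor (pos)$                  /* index of the lower neighbour */
d : pos - k$                      /* fractional part, used as weight */
/* weighted average of the two neighbouring order statistics */
manual : (1 - d) * x[k] + d * x[k + 1];
quantile (x, p);
@end example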
1660 Input is a list. First and third quartiles are computed.
1663 @c load ("descriptive")$
1664 @c s1 : read_list (file_search ("pidigits.data"))$
1665 @c [quantile (s1, 1/4), quantile (s1, 3/4)], numer;
1668 (%i1) load ("descriptive")$
1669 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1671 (%i3) [quantile (s1, 1/4), quantile (s1, 3/4)], numer;
1676 Input is a matrix. First quartile is computed for each column.
1679 @c load ("descriptive")$
1680 @c s2 : read_matrix (file_search ("wind.data"))$
1681 @c quantile (s2, 1/4);
1684 (%i1) load ("descriptive")$
1685 (%i2) s2 : read_matrix (file_search ("wind.data"))$
1687 (%i3) quantile (s2, 1/4);
1688 (%o3) [7.2575, 7.477500000000001, 7.82, 11.28, 11.48]
1692 @opencatbox{Categories:}
1693 @category{Package descriptive}
1700 @deffn {Function} median @
1701 @fname{median} (@var{list}) @
1702 @fname{median} (@var{matrix})
1704 Once the sample is ordered, if the sample size is odd the median is the central value, otherwise it is the mean of the two central values.
1709 @c load ("descriptive")$
1710 @c s1 : read_list (file_search ("pidigits.data"))$
1712 @c s2 : read_matrix (file_search ("wind.data"))$
1716 (%i1) load ("descriptive")$
1717 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1724 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1727 (%o5) [10.059999999999999, 9.855, 10.73, 15.48, 14.105]
1731 The median is the 1/2-quantile.
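This relation can be checked on a small illustrative sample of even size,
where the median is the mean of the two central values of the ordered
sample (here 3 and 4):

@example
load ("descriptive")$
x : [3, 1, 4, 1, 5, 9]$
median (x);
/* should be true, since the median is the 1/2-quantile */
is (median (x) = quantile (x, 1/2));
@end example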
1733 See also function @mrefdot{quantile}
1735 @opencatbox{Categories:}
1736 @category{Package descriptive}
1743 @deffn {Function} qrange @
1744 @fname{qrange} (@var{x})
1746 Returns the interquartile range,
1747 defined as the difference between the third and first quartiles:
1748 @code{quantile(@var{x}, 3/4) - quantile(@var{x}, 1/4)}
1750 @var{x} must be a list or matrix.
1751 When @var{x} is a matrix,
1752 @code{qrange} returns the interquartile range for each column.
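A short sketch comparing @code{qrange} with the difference of quartiles it
is defined by. The sample @code{x} is illustrative; both entries of the
resulting list should be equal.

@example
load ("descriptive")$
x : [2, 4, 4, 5, 7, 8, 11, 12]$
/* interquartile range, and third quartile minus first quartile */
[qrange (x), quantile (x, 3/4) - quantile (x, 1/4)];
@end example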
1757 @c load ("descriptive")$
1758 @c s1 : read_list (file_search ("pidigits.data"))$
1760 @c s2 : read_matrix (file_search ("wind.data"))$
1764 (%i1) load ("descriptive")$
1765 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1772 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1775 (%o5) [5.385, 5.572499999999998, 6.022500000000001,
1776 8.729999999999999, 6.649999999999999]
1780 See also function @mrefdot{quantile}
1782 @opencatbox{Categories:}
1783 @category{Package descriptive}
1789 @anchor{mean_deviation}
1790 @deffn {Function} mean_deviation @
1791 @fname{mean_deviation} (@var{list}) @
1792 @fname{mean_deviation} (@var{matrix})
1794 The mean deviation, defined as
1809 $${{1\over{n}}{\sum_{i=1}^{n}{|x_{i}-\bar{x}|}}}$$
1815 @c load ("descriptive")$
1816 @c s1 : read_list (file_search ("pidigits.data"))$
1817 @c mean_deviation (s1);
1818 @c s2 : read_matrix (file_search ("wind.data"))$
1819 @c mean_deviation (s2);
1822 (%i1) load ("descriptive")$
1823 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1825 (%i3) mean_deviation (s1);
1830 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1832 (%i5) mean_deviation (s2);
1833 (%o5) [3.2879599999999987, 3.075342, 3.2390700000000003,
1834 4.715664000000001, 4.028546000000002]
1838 See also function @mrefdot{mean}
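As a sketch, the defining formula can be evaluated directly with the core
function @code{lsum} and compared with @code{mean_deviation}. The sample
@code{x} and the names @code{m} and @code{manual} are illustrative.

@example
load ("descriptive")$
x : [1, 2, 3, 4, 10]$
m : mean (x)$                 /* sample mean, here 4 */
/* average absolute distance to the mean */
manual : lsum (abs (xi - m), xi, x) / length (x);
mean_deviation (x);
@end example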
1840 @opencatbox{Categories:}
1841 @category{Package descriptive}
1847 @anchor{median_deviation}
1848 @deffn {Function} median_deviation @
1849 @fname{median_deviation} (@var{list}) @
1850 @fname{median_deviation} (@var{matrix})
1852 The median deviation, defined as
1867 $${{1\over{n}}{\sum_{i=1}^{n}{|x_{i}-med|}}}$$
1869 where @code{med} is the median of @var{list}.
1874 @c load ("descriptive")$
1875 @c s1 : read_list (file_search ("pidigits.data"))$
1876 @c median_deviation (s1);
1877 @c s2 : read_matrix (file_search ("wind.data"))$
1878 @c median_deviation (s2);
1881 (%i1) load ("descriptive")$
1882 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1884 (%i3) median_deviation (s1);
1889 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1891 (%i5) median_deviation (s2);
1892 (%o5) [2.75, 2.7550000000000003, 3.08, 4.315, 3.3099999999999996]
1896 See also function @mrefdot{mean}
1898 @opencatbox{Categories:}
1899 @category{Package descriptive}
1905 @anchor{harmonic_mean}
1906 @deffn {Function} harmonic_mean @
1907 @fname{harmonic_mean} (@var{list}) @
1908 @fname{harmonic_mean} (@var{matrix})
1910 The harmonic mean, defined as
1927 $${{n}\over{\sum_{i=1}^{n}{{{1}\over{x_{i}}}}}}$$
1933 @c load ("descriptive")$
1934 @c y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
1935 @c harmonic_mean (y), numer;
1936 @c s2 : read_matrix (file_search ("wind.data"))$
1937 @c harmonic_mean (s2);
1940 (%i1) load ("descriptive")$
1941 (%i2) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
1943 (%i3) harmonic_mean (y), numer;
1944 (%o3) 3.9018580276322052
1946 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1948 (%i5) harmonic_mean (s2);
1949 (%o5) [6.948015590052786, 7.391967752360356, 9.055658197151745,
1950 13.441990281936924, 13.01439145898509]
1954 See also functions @mref{mean} and @mrefdot{geometric_mean}
1956 @opencatbox{Categories:}
1957 @category{Package descriptive}
1964 @anchor{geometric_mean}
1965 @deffn {Function} geometric_mean @
1966 @fname{geometric_mean} (@var{list}) @
1967 @fname{geometric_mean} (@var{matrix})
1969 The geometric mean, defined as
1984 $$\left(\prod_{i=1}^{n}{x_{i}}\right)^{{{1}\over{n}}}$$
1990 @c load ("descriptive")$
1991 @c y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
1992 @c geometric_mean (y), numer;
1993 @c s2 : read_matrix (file_search ("wind.data"))$
1994 @c geometric_mean (s2);
1997 (%i1) load ("descriptive")$
1998 (%i2) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
2000 (%i3) geometric_mean (y), numer;
2001 (%o3) 4.454845412337012
2003 (%i4) s2 : read_matrix (file_search ("wind.data"))$
2005 (%i5) geometric_mean (s2);
2006 (%o5) [8.82476274347979, 9.22652604739361, 10.044267571488904,
2007 14.612741263490207, 13.96184163444275]
2011 See also functions @mref{mean} and @mrefdot{harmonic_mean}
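For samples of positive numbers the harmonic mean never exceeds the
geometric mean, which in turn never exceeds the arithmetic mean. A small
sketch on an illustrative sample @code{y}:

@example
load ("descriptive")$
y : [1, 2, 4]$
/* harmonic, geometric and arithmetic means;
   mathematically these are 12/7, 2 and 7/3 */
[harmonic_mean (y), geometric_mean (y), mean (y)];
/* the same values as floating point numbers */
float (%);
@end example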
2013 @opencatbox{Categories:}
2014 @category{Package descriptive}
2021 @deffn {Function} kurtosis @
2022 @fname{kurtosis} (@var{list}) @
2023 @fname{kurtosis} (@var{matrix})
2025 The kurtosis coefficient, defined as
2040 $${{1\over{n s^4}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^4}}-3}$$
2046 @c load ("descriptive")$
2047 @c s1 : read_list (file_search ("pidigits.data"))$
2048 @c kurtosis (s1), numer;
2049 @c s2 : read_matrix (file_search ("wind.data"))$
2053 (%i1) load ("descriptive")$
2054 (%i2) s1 : read_list (file_search ("pidigits.data"))$
2056 (%i3) kurtosis (s1), numer;
2057 (%o3) - 1.273247946514421
2059 (%i4) s2 : read_matrix (file_search ("wind.data"))$
2061 (%i5) kurtosis (s2);
2062 (%o5) [- 0.2715445622195385, 0.119998784429451,
2063 - 0.42752334904828615, - 0.6405361979019522,
2064 - 0.4952382132352935]
2068 See also functions @mrefcomma{mean} @mref{var} and @mrefdot{skewness}
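The formula can be evaluated by hand as in the following sketch. It assumes
that @math{s} in the formula is the value returned by @code{std}; this
reading, the sample @code{x} and the helper names are assumptions made for
illustration. Under that assumption both results should coincide.

@example
load ("descriptive")$
x : [1, 2, 3, 4, 5]$
n : length (x)$
m : mean (x)$
s : std (x)$     /* assumed to be the s of the formula */
/* fourth central moment divided by s^4, minus 3 */
manual : lsum ((xi - m)^4, xi, x) / (n * s^4) - 3;
kurtosis (x);
@end example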
2070 @opencatbox{Categories:}
2071 @category{Package descriptive}
2078 @deffn {Function} skewness @
2079 @fname{skewness} (@var{list}) @
2080 @fname{skewness} (@var{matrix})
2082 The skewness coefficient, defined as
2097 $${{1\over{n s^3}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^3}}}$$
2103 @c load ("descriptive")$
2104 @c s1 : read_list (file_search ("pidigits.data"))$
2105 @c skewness (s1), numer;
2106 @c s2 : read_matrix (file_search ("wind.data"))$
2110 (%i1) load ("descriptive")$
2111 (%i2) s1 : read_list (file_search ("pidigits.data"))$
2113 (%i3) skewness (s1), numer;
2114 (%o3) 0.009196180476450424
2116 (%i4) s2 : read_matrix (file_search ("wind.data"))$
2118 (%i5) skewness (s2);
2119 (%o5) [0.1580509020000978, 0.2926379232061854,
2120 0.09242174416107717, 0.20599843481486865, 0.21425202488908313]
See also functions @mrefcomma{mean} @mref{var} and @mrefdot{kurtosis}
2126 @opencatbox{Categories:}
2127 @category{Package descriptive}
2133 @anchor{pearson_skewness}
2134 @deffn {Function} pearson_skewness @
2135 @fname{pearson_skewness} (@var{list}) @
2136 @fname{pearson_skewness} (@var{matrix})
2138 Pearson's skewness coefficient, defined as
2150 $${{3\,\left(\bar{x}-med\right)}\over{s}}$$
2152 where @var{med} is the median of @var{list}.
2157 @c load ("descriptive")$
2158 @c s1 : read_list (file_search ("pidigits.data"))$
2159 @c pearson_skewness (s1), numer;
2160 @c s2 : read_matrix (file_search ("wind.data"))$
2161 @c pearson_skewness (s2);
2164 (%i1) load ("descriptive")$
2165 (%i2) s1 : read_list (file_search ("pidigits.data"))$
2167 (%i3) pearson_skewness (s1), numer;
2168 (%o3) 0.21594840290938955
2170 (%i4) s2 : read_matrix (file_search ("wind.data"))$
2172 (%i5) pearson_skewness (s2);
2173 (%o5) [- 0.08019976629211892, 0.2357036272952649,
2174 0.10509040624912039, 0.12450423405923679, 0.44641817958045193]
2178 See also functions @mrefcomma{mean} @mref{var} and @mrefdot{median}
2180 @opencatbox{Categories:}
2181 @category{Package descriptive}
2187 @anchor{quartile_skewness}
2188 @deffn {Function} quartile_skewness @
2189 @fname{quartile_skewness} (@var{list}) @
2190 @fname{quartile_skewness} (@var{matrix})
2192 The quartile skewness coefficient, defined as
@ifnottex
@example
               c    - 2 c    + c
                3/4      1/2    1/4
               --------------------
                   c    - c
                    3/4    1/4
@end example
@end ifnottex
2205 $${{c_{{{3}\over{4}}}-2\,c_{{{1}\over{2}}}+c_{{{1}\over{4}}}}\over{c
2206 _{{{3}\over{4}}}-c_{{{1}\over{4}}}}}$$
2208 where @math{c_p} is the @var{p}-quantile of sample @var{list}.
2213 @c load ("descriptive")$
2214 @c s1 : read_list (file_search ("pidigits.data"))$
2215 @c quartile_skewness (s1), numer;
2216 @c s2 : read_matrix (file_search ("wind.data"))$
2217 @c quartile_skewness (s2);
2220 (%i1) load ("descriptive")$
2221 (%i2) s1 : read_list (file_search ("pidigits.data"))$
2223 (%i3) quartile_skewness (s1), numer;
2224 (%o3) 0.047619047619047616
2226 (%i4) s2 : read_matrix (file_search ("wind.data"))$
2228 (%i5) quartile_skewness (s2);
2229 (%o5) [- 0.040854224698235304, 0.14670255720053824,
2230 0.033623910336239196, 0.03780068728522298, 0.2105263157894735]
2234 See also function @mrefdot{quantile}
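The coefficient can be rebuilt from @code{quantile} as a sketch. The sample
@code{x} and the helper function @code{c} are hypothetical names introduced
for this illustration only.

@example
load ("descriptive")$
x : [1, 2, 2, 3, 5, 8, 13]$
c(p) := quantile (x, p)$      /* p-quantile of the sample x */
/* (c(3/4) - 2 c(1/2) + c(1/4)) / (c(3/4) - c(1/4)) */
manual : (c (3/4) - 2 * c (1/2) + c (1/4)) / (c (3/4) - c (1/4));
quartile_skewness (x);
@end example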
2236 @opencatbox{Categories:}
2237 @category{Package descriptive}
2244 @deffn {Function} km @
2245 @fname{km} (@var{list}, @var{option} ...) @
2246 @fname{km} (@var{matrix}, @var{option} ...)
Kaplan-Meier estimator of the survival, or reliability, function @math{S(x)=1-F(x)}.
Data can be supplied as a list of pairs or as a two-column matrix. The first
component is the observed time, and the second component is a censoring index
(1 = not censored, 0 = right-censored).
2254 The optional argument is the name of the variable in the returned expression,
2255 which is @var{x} by default.
2259 Sample as a list of pairs.
2262 @c load ("descriptive")$
2263 @c S: km([[2,1], [3,1], [5,0], [8,1]]);
2266 @c line_width = 3, grid = true,
2267 @c explicit(S, x, -0.1, 10))$
2270 (%i1) load ("descriptive")$
2272 (%i2) S: km([[2,1], [3,1], [5,0], [8,1]]);
2273 charfun((3 <= x) and (x < 8))
2274 (%o2) charfun(x < 0) + -----------------------------
2276 3 charfun((2 <= x) and (x < 3))
2277 + -------------------------------
2279 + charfun((0 <= x) and (x < 2))
2281 (%i3) load ("draw")$
2284 line_width = 3, grid = true,
2285 explicit(S, x, -0.1, 10))$
2289 Estimate survival probabilities.
2292 @c load ("descriptive")$
2293 @c S(t):= ''(km([[2,1], [3,1], [5,0], [8,1]], t)) $
2297 (%i1) load ("descriptive")$
2298 (%i2) S(t):= ''(km([[2,1], [3,1], [5,0], [8,1]], t)) $
2307 @opencatbox{Categories:}
2308 @category{Package descriptive}
2314 @anchor{cdf_empirical}
2315 @deffn {Function} cdf_empirical @
2316 @fname{cdf_empirical} (@var{list}, @var{option} ...) @
2317 @fname{cdf_empirical} (@var{matrix}, @var{option} ...)
2319 Empirical distribution function @math{F(x)}.
Data can be supplied as a list of numbers or as a one-column matrix.
2323 The optional argument is the name of the variable in the returned expression,
2324 which is @var{x} by default.
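Once the returned expression is wrapped in a function, it can be evaluated
at given points. The following is a sketch; the evaluation points 4, 7 and
10 are arbitrary choices made for illustration.

@example
load ("descriptive")$
F(x) := ''(cdf_empirical ([1, 3, 3, 5, 7, 7, 7, 8, 9]))$
/* proportion of observations less than or equal to 4, 7 and 10;
   3, 7 and 9 of the 9 observations satisfy these conditions */
[F (4), F (7), F (10)];
@end example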
2328 Empirical distribution function.
2331 @c load ("descriptive")$
2332 @c F(x):= ''(cdf_empirical([1,3,3,5,7,7,7,8,9]));
2338 @c explicit(F(z), z, -2, 12)) $
2341 (%i1) load ("descriptive")$
2343 (%i2) F(x):= ''(cdf_empirical([1,3,3,5,7,7,7,8,9]));
2344 (%o2) F(x) := (charfun(x >= 9) + charfun(x >= 8)
2345 + 3 charfun(x >= 7) + charfun(x >= 5) + 2 charfun(x >= 3)
2346 + charfun(x >= 1))/9
2359 explicit(F(z), z, -2, 12)) $
2363 @opencatbox{Categories:}
2364 @category{Package descriptive}
2371 @deffn {Function} cov @
2372 @fname{cov} (@var{X}) @
2373 @fname{cov} (@var{X}, @var{w})
2375 Returns the sample covariance matrix.
2376 @var{X} must be a matrix.
2378 The sample covariance matrix has the same number of rows and columns,
2379 both equal to the number of columns of @var{X};
each diagonal element @var{S[i, i]} of the covariance matrix is equal to the sample variance of the @var{i}'th column of @var{X},
and each off-diagonal element @var{S[i, j]} is equal to the sample covariance of the @var{i}'th and @var{j}'th columns.
2383 @var{w} is an optional per-datum weight.
2384 @var{w} must either be 1, in which case every datum @var{X[i]} is given equal weight,
2385 or a list of the same length as @var{X},
2386 in which case the weight for @var{X[i]} is given by @var{w[i]}.
2387 The elements of @var{w} must be nonnegative and not all zero;
2388 it is not required that they sum to 1.
2390 The unweighted sample covariance is defined as
@ifnottex
@example
               n
              ====
           1  \           _        _
       S = -   >    (X  - X) (X  - X)'
           n  /       j        j
              ====
              j = 1
@end example
@end ifnottex
2406 $${S={1\over{n}}{\sum_{j=1}^{n}{\left(X_{j}-\bar{X}\right)\,\left(X_{j}-\bar{X}\right)'}}}$$
2409 where @var{X[j]} is the @var{j}'th row of the sample matrix.
2411 The weighted sample covariance is defined as
@ifnottex
@example
               n
              ====
           1  \              _        _
       S = -   >    w  (X  - X) (X  - X)'
           Z  /      j   j        j
              ====
              j = 1
@end example
@end ifnottex
2427 $${S={1\over{Z}}{\sum_{j=1}^{n}{w_j \left(X_{j}-\bar{X}\right)\,\left(X_{j}-\bar{X}\right)'}}}$$
2430 where @var{Z} is the sum of the weights,
2444 $${Z={\sum_{i=1}^{n}{w_{i}}}}$$
2450 @c load ("descriptive")$
2451 @c s2 : read_matrix (file_search ("wind.data"))$
2456 (%i1) load ("descriptive")$
2457 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2458 (%i3) fpprintprec : 7$
2461 [ 17.22191 13.61811 14.37217 19.39624 15.42162 ]
2463 [ 13.61811 14.98774 13.30448 15.15834 14.9711 ]
2465 (%o4) [ 14.37217 13.30448 15.47573 17.32544 16.18171 ]
2467 [ 19.39624 15.15834 17.32544 32.17651 20.44685 ]
2469 [ 15.42162 14.9711 16.18171 20.44685 24.42308 ]
2473 See also function @mrefdot{cov1}
2475 @opencatbox{Categories:}
2476 @category{Package descriptive}
2483 @deffn {Function} cov1 (@var{matrix})
2485 The covariance matrix of the multivariate sample, defined as
@ifnottex
@example
              n
            ====
        1   \           _        _
 S  =  ---   >    (X  - X) (X  - X)'
  1    n-1  /       j        j
            ====
            j = 1
@end example
@end ifnottex
2500 $${{1\over{n-1}}{\sum_{j=1}^{n}{\left(X_{j}-\bar{X}\right)\,\left(X_{j}-\bar{X}\right)'}}}$$
2502 where @math{X_j} is the @math{j}-th row of the sample matrix.
2507 @c load ("descriptive")$
2508 @c s2 : read_matrix (file_search ("wind.data"))$
2513 (%i1) load ("descriptive")$
2514 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2515 (%i3) fpprintprec : 7$
2518 [ 17.39587 13.75567 14.51734 19.59216 15.5774 ]
2520 [ 13.75567 15.13913 13.43887 15.31145 15.12232 ]
2522 (%o4) [ 14.51734 13.43887 15.63205 17.50044 16.34516 ]
2524 [ 19.59216 15.31145 17.50044 32.50153 20.65338 ]
2526 [ 15.5774 15.12232 16.34516 20.65338 24.66977 ]
2530 See also function @mrefdot{cov}
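Since @code{cov} divides by @math{n} and @code{cov1} by @math{n-1}, the two
matrices differ only by the factor @math{n/(n-1)}. A sketch on a small
illustrative data matrix @code{m}, for which the difference below should be
a zero matrix:

@example
load ("descriptive")$
m : matrix ([1, 2], [3, 5], [5, 6], [7, 11])$
n : length (m)$               /* number of rows, i.e. the sample size */
cov1 (m) - n / (n - 1) * cov (m);
@end example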
2532 @opencatbox{Categories:}
2533 @category{Package descriptive}
2539 @anchor{global_variances}
2540 @deffn {Function} global_variances @
2541 @fname{global_variances} (@var{matrix}) @
2542 @fname{global_variances} (@var{matrix}, @var{options} ...)
2544 Function @code{global_variances} returns a list of global variance measures:
2548 @var{total variance}: @code{trace(S_1)},
2550 @var{mean variance}: @code{trace(S_1)/p},
2552 @var{generalized variance}: @code{determinant(S_1)},
2554 @var{generalized standard deviation}: @code{sqrt(determinant(S_1))},
@var{effective variance}: @code{determinant(S_1)^(1/p)} (defined in: Pe@~na, D. (2002) @var{An@'alisis de datos multivariantes}; McGraw-Hill, Madrid),
2558 @var{effective standard deviation}: @code{determinant(S_1)^(1/(2*p))}.
2560 where @var{p} is the dimension of the multivariate random variable and @math{S_1} the covariance matrix returned by @code{cov1}.
@code{'data}, default @code{'true}, indicates whether the input matrix contains the sample data,
in which case the covariance matrix is calculated with @code{cov1}. When set to @code{false},
the input must be the (symmetric) covariance matrix instead of the data.
2573 Calculate the @code{global_variances} from sample data.
2576 @c load ("descriptive")$
2577 @c s2 : read_matrix (file_search ("wind.data"))$
2578 @c global_variances (s2);
2581 (%i1) load ("descriptive")$
2582 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2584 (%i3) global_variances (s2);
2585 (%o3) [105.33834206060595, 21.06766841212119, 12874.34690469686,
2586 113.46517926085015, 6.636590811800794, 2.5761581496097623]
2590 Calculate the @code{global_variances} from the covariance matrix.
2593 @c load ("descriptive")$
2594 @c s2 : read_matrix (file_search ("wind.data"))$
2596 @c global_variances (s, data=false);
2599 (%i1) load ("descriptive")$
2600 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2601 (%i3) s : cov1 (s2)$
2603 (%i4) global_variances (s, data=false);
2604 (%o4) [105.33834206060595, 21.06766841212119, 12874.34690469686,
2605 113.46517926085015, 6.636590811800794, 2.5761581496097623]
2609 See also @mref{cov} and @mrefdot{cov1}
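The first three measures can be rebuilt by hand from @code{cov1}, as in the
following sketch. The data matrix @code{m} and the names @code{S1},
@code{p} and @code{tv} are illustrative.

@example
load ("descriptive")$
m : matrix ([1, 2], [3, 5], [5, 6], [7, 11])$
S1 : cov1 (m)$
p : length (S1)$              /* dimension of the variable */
tv : sum (S1[i, i], i, 1, p)$ /* trace of S1 */
/* total variance, mean variance and generalized variance */
[tv, tv / p, determinant (S1)];
/* compare with the first three entries of this list */
global_variances (m);
@end example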
2611 @opencatbox{Categories:}
2612 @category{Package descriptive}
2619 @deffn {Function} cor @
2620 @fname{cor} (@var{matrix}) @
2621 @fname{cor} (@var{matrix}, @var{logical_value})
2623 The correlation matrix of the multivariate sample.
@code{'data}, default @code{'true}, indicates whether the input matrix contains the sample data,
in which case the covariance matrix is calculated with @code{cov1}. When set to @code{false},
the input must be the (symmetric) covariance matrix instead of the data.
2637 @c load ("descriptive")$
2638 @c fpprintprec : 7 $
2639 @c s2 : read_matrix (file_search ("wind.data"))$
2643 (%i1) load ("descriptive")$
2644 (%i2) fpprintprec : 7 $
2645 (%i3) s2 : read_matrix (file_search ("wind.data"))$
2648 [ 1.0 0.8476339 0.8803515 0.8239624 0.7519506 ]
2650 [ 0.8476339 1.0 0.8735834 0.6902622 0.782502 ]
2652 (%o4) [ 0.8803515 0.8735834 1.0 0.7764065 0.8323358 ]
2654 [ 0.8239624 0.6902622 0.7764065 1.0 0.7293848 ]
2656 [ 0.7519506 0.782502 0.8323358 0.7293848 1.0 ]
2660 Calculate the correlation matrix from the covariance matrix.
2663 @c load ("descriptive")$
2664 @c fpprintprec : 7 $
2665 @c s2 : read_matrix (file_search ("wind.data"))$
2667 @c cor (s, data=false); /* this is faster */
2670 (%i1) load ("descriptive")$
2671 (%i2) fpprintprec : 7 $
2672 (%i3) s2 : read_matrix (file_search ("wind.data"))$
2673 (%i4) s : cov1 (s2)$
2675 (%i5) cor (s, data=false); /* this is faster */
2676 [ 1.0 0.8476339 0.8803515 0.8239624 0.7519506 ]
2678 [ 0.8476339 1.0 0.8735834 0.6902622 0.782502 ]
2680 (%o5) [ 0.8803515 0.8735834 1.0 0.7764065 0.8323358 ]
2682 [ 0.8239624 0.6902622 0.7764065 1.0 0.7293848 ]
2684 [ 0.7519506 0.782502 0.8323358 0.7293848 1.0 ]
2688 See also @mref{cov} and @mrefdot{cov1}
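As a sketch, each correlation is the covariance of two columns divided by
the product of their standard deviations, so the matrix can be rebuilt from
@code{cov1}. The data matrix @code{m} and the name @code{manual} are
illustrative; both printed matrices should agree up to rounding.

@example
load ("descriptive")$
m : matrix ([1, 2], [3, 5], [5, 6], [7, 11])$
S : cov1 (m)$
p : length (S)$
/* r[i,j] = S[i,j] / sqrt (S[i,i] * S[j,j]) */
manual : genmatrix (lambda ([i, j],
                    S[i, j] / sqrt (S[i, i] * S[j, j])), p, p)$
float (manual);
float (cor (m));
@end example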
2690 @opencatbox{Categories:}
2691 @category{Package descriptive}
2697 @anchor{list_correlations}
2698 @deffn {Function} list_correlations @
2699 @fname{list_correlations} (@var{matrix}) @
2700 @fname{list_correlations} (@var{matrix}, @var{options} ...)
2702 Function @code{list_correlations} returns a list of correlation measures:
2707 @var{precision matrix}: the inverse of the covariance matrix @math{S_1},
2718 $${S_{1}^{-1}}={\left(s^{ij}\right)_{i,j=1,2,\ldots, p}}$$
2722 @var{multiple correlation vector}: @math{(R_1^2, R_2^2, ..., R_p^2)}, with
2735 $${R_{i}^{2}}={1-{{1}\over{s^{ii}s_{ii}}}}$$
being an indicator of the goodness of fit of the linear multivariate regression model on @math{X_i} when the rest of the variables are used as regressors.
2740 @var{partial correlation matrix}: with element @math{(i, j)} being
@ifnottex
@example
                    ij
                   s
 r        = - ------------
  ij.rest     / ii jj\ 1/2
              |s  s  |
              \      /
@end example
@end ifnottex
2754 $${r_{ij.rest}}={-{{s^{ij}}\over \sqrt{s^{ii}s^{jj}}}}$$
@code{'data}, default @code{'true}, indicates whether the input matrix contains the sample data,
in which case the covariance matrix is calculated with @code{cov1}. When set to @code{false},
the input must be the (symmetric) covariance matrix instead of the data.
2771 @c load ("descriptive")$
2772 @c s2 : read_matrix (file_search ("wind.data"))$
2773 @c z : list_correlations (s2)$
2775 @c precision_matrix: z[1];
2776 @c multiple_correlation_vector: z[2];
2777 @c partial_correlation_matrix: z[3];
2780 (%i1) load ("descriptive")$
2781 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2782 (%i3) z : list_correlations (s2)$
2783 (%i4) fpprintprec : 5$
2785 (%i5) precision_matrix: z[1];
2787 [ 0.38486 - 0.13856 - 0.15626 - 0.10239 0.031179 ]
2789 [ - 0.13856 0.34107 - 0.15233 0.038447 - 0.052842 ]
2791 [ - 0.15626 - 0.15233 0.47296 - 0.024816 - 0.10054 ]
2793 [ - 0.10239 0.038447 - 0.024816 0.10937 - 0.034033 ]
2795 [ 0.031179 - 0.052842 - 0.10054 - 0.034033 0.14834 ]
2798 (%i6) multiple_correlation_vector: z[2];
2799 (%o6) [0.85063, 0.80634, 0.86474, 0.71867, 0.72675]
2802 (%i7) partial_correlation_matrix: z[3];
2803 [ - 1.0 0.38244 0.36627 0.49908 - 0.13049 ]
2805 [ 0.38244 - 1.0 0.37927 - 0.19907 0.23492 ]
2807 (%o7) [ 0.36627 0.37927 - 1.0 0.10911 0.37956 ]
2809 [ 0.49908 - 0.19907 0.10911 - 1.0 0.26719 ]
2811 [ - 0.13049 0.23492 0.37956 0.26719 - 1.0 ]
2815 See also @mref{cov} and @mrefdot{cov1}
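The partial correlation matrix can be rebuilt from the precision matrix
with the formula given above, as in this sketch. The names @code{P} and
@code{p} are illustrative. The diagonal elements come out as -1, and the
result should agree with the third element of the list returned by
@code{list_correlations}.

@example
load ("descriptive")$
s2 : read_matrix (file_search ("wind.data"))$
z : list_correlations (s2)$
P : z[1]$                     /* precision matrix */
p : length (P)$
/* r[i,j] = - P[i,j] / sqrt (P[i,i] * P[j,j]) */
genmatrix (lambda ([i, j],
           - P[i, j] / sqrt (P[i, i] * P[j, j])), p, p);
@end example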
2817 @opencatbox{Categories:}
2818 @category{Package descriptive}
2825 @anchor{principal_components}
2826 @deffn {Function} principal_components @
2827 @fname{principal_components} (@var{matrix}) @
2828 @fname{principal_components} (@var{matrix}, @var{options} ...)
2830 Calculates the principal components of a multivariate sample. Principal components are
2831 used in multivariate statistical analysis to reduce the dimensionality of the sample.
@code{'data}, default @code{'true}, indicates whether the input matrix contains the sample data,
in which case the covariance matrix is calculated with @mref{cov1}. When set to @code{false},
the input must be the (symmetric) covariance matrix instead of the data.
2842 The output of function @code{principal_components} is a list with the following results:
2846 variances of the principal components,
2848 percentage of total variance explained by each principal component,
In this sample, the first component explains 83.13 percent of the total variance.
2859 (%i1) load ("descriptive")$
2860 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2861 (%i3) fpprintprec:4 $
2862 (%i4) res: principal_components(s2);
2863 0 errors, 0 warnings
2864 (%o4) [[87.57, 8.753, 5.515, 1.889, 1.613],
2865 [83.13, 8.31, 5.235, 1.793, 1.531],
2867 [ .4149 .03379 - .4757 - 0.581 - .5126 ]
2869 [ 0.369 - .3657 - .4298 .7237 - .1469 ]
2871 [ .3959 - .2178 - .2181 - .2749 .8201 ]]
2873 [ .5548 .7744 .1857 .2319 .06498 ]
2875 [ .4765 - .4669 0.712 - .09605 - .1969 ]
2877 (%i5) /* accumulated percentages */
2878 block([ap: copy(res[2])],
2879 for k:2 thru length(ap) do ap[k]: ap[k]+ap[k-1],
2881 (%o5) [83.13, 91.44, 96.68, 98.47, 100.0]
2882 (%i6) /* sample dimension */
2883 p: length(first(res));
2885 (%i7) /* plot percentages to select number of
2886 principal components for further work */
2889 apply(bars, makelist([k, res[2][k], 1/2], k, p)),
2890 points_joined = true,
2891 point_type = filled_circle,
2893 points(makelist([k, res[2][k]], k, p)),
2894 xlabel = "Variances",
2895 ylabel = "Percentages",
2896 xtics = setify(makelist([concat("PC",k),k], k, p))) $
If the covariance matrix is already known, it can be passed to the function
instead of the data, in which case option @code{data=false} must be used.
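In principal component analysis the variances of the components are the
eigenvalues of the covariance matrix, and the percentages are those
eigenvalues relative to their sum. The following sketch checks this with
the core function @code{eigenvalues}; the names @code{eig} and @code{vars}
are illustrative. The values should match the first two lists returned by
@code{principal_components(S, data=false)}.

@example
load ("descriptive")$
S : matrix ([1, -2, 0], [-2, 5, 0], [0, 0, 2])$
/* list of eigenvalues followed by the list of their multiplicities */
eig : eigenvalues (S);
/* numerical eigenvalues in decreasing order */
vars : reverse (sort (float (first (eig))));
/* share of the total variance, in percent */
100 * vars / apply ("+", vars);
@end example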
2903 (%i1) load ("descriptive")$
2904 (%i2) S: matrix([1,-2,0],[-2,5,0],[0,0,2]);
2910 (%i3) fpprintprec:4 $
2911 (%i4) /* the argument is a covariance matrix */
2912 res: principal_components(S, data=false);
2913 0 errors, 0 warnings
2914 [ - .3827 0.0 .9239 ]
2916 (%o4) [[5.828, 2.0, .1716], [72.86, 25.0, 2.145], [ .9239 0.0 .3827 ]]
2919 (%i5) /* transformation to get the principal components
2920 from original records */
2921 matrix([a1,b2,c3],[a2,b2,c2]).last(res);
2922 [ .9239 b2 - .3827 a1 1.0 c3 .3827 b2 + .9239 a1 ]
2924 [ .9239 b2 - .3827 a2 1.0 c2 .3827 b2 + .9239 a2 ]
2927 @opencatbox{Categories:}
2928 @category{Package descriptive}
2934 @node Functions and Variables for statistical graphs, , Functions and Variables for descriptive statistics, Package descriptive
2935 @section Functions and Variables for statistical graphs
2940 @deffn {Function} barsplot (@var{data1}, @var{data2}, @dots{}, @var{option_1}, @var{option_2}, @dots{})
Plots bar charts for discrete statistical variables,
for one or multiple samples.
@var{data} can be a list of outcomes representing one sample, or a
matrix of @var{m} rows and @var{n} columns, representing @var{n} samples of size
@var{m} each.
2949 Available options are:
2954 @var{box_width} (default, @code{3/4}): relative width of rectangles. This
2955 value must be in the range @code{[0,1]}.
2958 @var{grouping} (default, @code{clustered}): indicates how multiple samples are
2959 shown. Valid values are: @code{clustered} and @code{stacked}.
2962 @var{groups_gap} (default, @code{1}): a positive integer number representing
2963 the gap between two consecutive groups of bars.
2966 @var{bars_colors} (default, @code{[]}): a list of colors for multiple samples.
2967 When there are more samples than specified colors, the extra necessary colors
2968 are chosen at random. See @code{color} to learn more about them.
2971 @var{frequency} (default, @code{absolute}): indicates the scale of the
2972 ordinates. Possible values are: @code{absolute}, @code{relative},
2976 @var{ordering} (default, @code{orderlessp}): possible values are @code{orderlessp} or @code{ordergreatp},
2977 indicating how statistical outcomes should be ordered on the @var{x}-axis.
2980 @var{sample_keys} (default, @code{[]}): a list with the strings to be used in the legend.
If the list length is neither 0 nor the number of samples, an error message is returned.
2984 @var{start_at} (default, @code{0}): indicates where the plot begins to be plotted on the
2988 All global @code{draw} options, except @code{xtics}, which is
2989 internally assigned by @code{barsplot}.
2990 If you want to set your own values for this option or want to build
2991 complex scenes, make use of @code{barsplot_description}. See example below.
2994 The following local @ref{Package draw} options: @mrefcomma{key} @mrefcomma{color_draw}
2995 @mrefcomma{fill_color} @mref{fill_density} and @mrefdot{line_width}
There is also a function @code{wxbarsplot} for creating embedded
bar charts in interfaces wxMaxima and iMaxima. @code{barsplot} in a
3007 Univariate sample in matrix form. Absolute frequencies.
3010 @c load ("descriptive")$
3011 @c m : read_matrix (file_search ("biomed.data"))$
3015 @c xlabel = "years",
3017 @c fill_density = 3/4)$
3020 (%i1) load ("descriptive")$
3021 (%i2) m : read_matrix (file_search ("biomed.data"))$
3028 fill_density = 3/4)$
3032 Two samples of different sizes, with
3033 relative frequencies and user declared colors.
3036 @c load ("descriptive")$
3037 @c l1:makelist(random(10),k,1,50)$
3038 @c l2:makelist(random(10),k,1,100)$
3042 @c fill_density = 1,
3043 @c bars_colors = [black, grey],
3044 @c frequency = relative,
3045 @c sample_keys = ["A", "B"])$
3048 (%i1) load ("descriptive")$
3049 (%i2) l1:makelist(random(10),k,1,50)$
3050 (%i3) l2:makelist(random(10),k,1,100)$
3056 bars_colors = [black, grey],
3057 frequency = relative,
3058 sample_keys = ["A", "B"])$
Four non-numeric samples of equal size.
3065 @c load ("descriptive")$
3067 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3068 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3069 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3070 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3071 @c title = "Asking for something to four groups",
3072 @c ylabel = "# of individuals",
3074 @c fill_density = 0.5,
3075 @c ordering = ordergreatp)$
3078 (%i1) load ("descriptive")$
3081 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3082 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3083 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3084 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3085 title = "Asking for something to four groups",
3086 ylabel = "# of individuals",
3089 ordering = ordergreatp)$
3096 @c load ("descriptive")$
3098 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3099 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3100 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3101 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3102 @c title = "Asking for something to four groups",
3103 @c ylabel = "# of individuals",
3104 @c grouping = stacked,
3105 @c fill_density = 0.5,
3106 @c ordering = ordergreatp)$
3109 (%i1) load ("descriptive")$
3112 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3113 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3114 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3115 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3116 title = "Asking for something to four groups",
3117 ylabel = "# of individuals",
3120 ordering = ordergreatp)$
For options related to bar charts, see @mref{bars} of package @ref{Package draw}.
3125 See also functions @mref{histogram} and @mrefdot{piechart}
3127 @opencatbox{Categories:}
3128 @category{Package descriptive}
3133 @anchor{barsplot_description}
3134 @deffn {Function} barsplot_description (@dots{})
3136 Function @code{barsplot_description} creates a graphic object
3137 suitable for creating complex scenes, together with other
3140 Example: @code{barsplot} in a multiplot context.
3143 (%i1) load ("descriptive")$
3144 (%i2) l1:makelist(random(10),k,1,50)$
3145 (%i3) l2:makelist(random(10),k,1,100)$
3147 barsplot_description(
3151 bars_colors = [blue],
3152 frequency = relative)$
3154 barsplot_description(
3158 bars_colors = [red],
3159 frequency = relative)$
3160 (%i6) draw(gr2d(bp1), gr2d(bp2))$
3163 @opencatbox{Categories:}
3164 @category{Package descriptive}
3170 @deffn {Function} boxplot (@var{data}) @
3171 @fname{boxplot} (@var{data}, @var{option_1}, @var{option_2}, @dots{})
This function plots box-and-whisker diagrams. Argument @var{data} can be a list,
which is of limited interest, since these diagrams are mainly used for
comparing different samples, or a matrix, which makes it possible to compare
two or more components of a multivariate statistical variable.
@var{data} may also be a list of samples with
possibly different sample sizes; in fact, this is the only function
in package @code{descriptive} that admits this type of data structure.
The box is plotted from the first quartile to the third, with a horizontal
3182 segment situated at the second quartile or median. By default, lower and
3183 upper whiskers are plotted at the minimum and maximum values,
3184 respectively. Option @var{range} can be used to indicate that values greater
3185 than @code{quantile(x,3/4)+range*(quantile(x,3/4)-quantile(x,1/4))} or
3186 less than @code{quantile(x,1/4)-range*(quantile(x,3/4)-quantile(x,1/4))}
3187 must be considered as outliers, in which case they are plotted as
3188 isolated points, and the whiskers are located at the extremes of the rest of
3191 Available options are:
3196 @var{box_width} (default, @code{3/4}): relative width of boxes.
3197 This value must be in the range @code{[0,1]}.
3200 @var{box_orientation} (default, @code{vertical}): possible values: @code{vertical}
3201 and @code{horizontal}.
@var{range} (default, @code{inf}): positive coefficient of the interquartile range,
used to set the outlier boundaries.
3208 @var{outliers_size} (default, @code{1}): circle size for isolated outliers.
3211 All @code{draw} options, except @code{points_joined}, @code{point_size}, @code{point_type},
3212 @code{xtics}, @code{ytics}, @code{xrange}, and @code{yrange}, which are
3213 internally assigned by @code{boxplot}.
If you want to set your own values for these options or want to build
complex scenes, make use of @code{boxplot_description}.
3218 The following local @code{draw} options: @code{key}, @code{color},
3219 and @code{line_width}.
There is also a function @code{wxboxplot} for creating embedded
box-and-whisker diagrams in the wxMaxima and iMaxima interfaces.
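
The outlier rule controlled by @var{range} can be reproduced by hand, which
may help when choosing a coefficient. This is only a sketch of the formulas
stated above, not the code used internally by @code{boxplot}; the sample
@code{x} and the names @code{r}, @code{iqr}, @code{lo_fence} and
@code{hi_fence} are illustrative assumptions, and the output is omitted:

@example
(%i1) load ("descriptive")$
(%i2) x : [10, 8, 12, 8, 11, 9, 20]$
(%i3) r : 3/2$
(%i4) iqr : quantile (x, 3/4) - quantile (x, 1/4)$
(%i5) lo_fence : quantile (x, 1/4) - r * iqr$
(%i6) hi_fence : quantile (x, 3/4) + r * iqr$
(%i7) /* values that fall outside the fences for this r */
      sublist (x, lambda ([v], is (v < lo_fence or v > hi_fence)));
@end example
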
3228 Box-and-whisker diagram from a multivariate sample.
3231 @c load ("descriptive")$
3232 @c s2 : read_matrix(file_search("wind.data"))$
3235 @c title = "Windspeed in knots",
3236 @c xlabel = "Stations",
3241 (%i1) load ("descriptive")$
3242 (%i2) s2 : read_matrix(file_search("wind.data"))$
3246 title = "Windspeed in knots",
3247 xlabel = "Stations",
3253 Box-and-whisker diagram from three samples of different sizes.
3256 @c load ("descriptive")$
3258 @c [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2],
3259 @c [8, 10, 7, 9, 12, 8, 10],
3260 @c [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$
3261 @c boxplot (A, box_orientation = horizontal)$
3264 (%i1) load ("descriptive")$
3267 [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2],
3268 [8, 10, 7, 9, 12, 8, 10],
3269 [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$
3271 (%i3) boxplot (A, box_orientation = horizontal)$
3274 Option @var{range} can be used to handle outliers.
3277 @c load ("descriptive")$
3278 @c B: [[7, 15, 5, 8, 6, 5, 7, 3, 1],
3279 @c [10, 8, 12, 8, 11, 9, 20],
3280 @c [23, 17, 19, 7, 22, 19]] $
3281 @c boxplot (B, range=1)$
3282 @c boxplot (B, range=1.5, box_orientation = horizontal)$
3284 @c boxplot_description(
3288 @c outliers_size = 2,
3290 @c background_color = light_gray),
3291 @c xtics = {["Low",1],["Medium",2],["High",3]}) $
3295 (%i1) load ("descriptive")$
3296 B: [[7, 15, 5, 8, 6, 5, 7, 3, 1],
3297 [10, 8, 12, 8, 11, 9, 20],
3298 [23, 17, 19, 7, 22, 19]] $
3299 boxplot (B, range=1)$
3300 boxplot (B, range=1.5, box_orientation = horizontal)$
3302 boxplot_description(
3308 background_color = light_gray),
3309 xtics = @{["Low",1],["Medium",2],["High",3]@}) $
3313 @opencatbox{Categories:}
3314 @category{Package descriptive}
3319 @anchor{boxplot_description}
3320 @deffn {Function} boxplot_description (@dots{})
Function @code{boxplot_description} creates a graphic object
suitable for creating complex scenes, together with other
graphic objects.
3326 @opencatbox{Categories:}
3327 @category{Package descriptive}
3333 @deffn {Function} histogram @
3334 @fname{histogram} (@var{list}) @
3335 @fname{histogram} (@var{list}, @var{option_1}, @var{option_2}, @dots{}) @
3336 @fname{histogram} (@var{one_column_matrix}) @
3337 @fname{histogram} (@var{one_column_matrix}, @var{option_1}, @var{option_2}, @dots{}) @
3338 @fname{histogram} (@var{one_row_matrix}) @
3339 @fname{histogram} (@var{one_row_matrix}, @var{option_1}, @var{option_2}, @dots{})
3341 Constructs and displays a histogram from a data sample.
3342 Data must be stored as a list of numbers, or a matrix of one row or one column.
3349 @code{nclasses} (default, 10):
3350 the number of classes (also called bins) in the histogram,
3351 or a list of two numbers (the least and greatest values included in the histogram),
3352 or a list of three numbers (the least and greatest values included in the histogram, and the number of classes),
3353 or a set containing the endpoints of the class intervals,
3354 or a symbol specifying the name of one of three algorithms to automatically determine the number of classes:
3355 @code{fd} (Ref. [1]), @code{scott} (Ref. [2]), or @code{sturges} (Ref. [3]).
3357 A class interval excludes its left endpoint and includes its right endpoint,
3358 except for the first interval, which includes both the left and right endpoints.
3359 It is assumed that class intervals are contiguous.
3360 That is, the right endpoint of one interval is equal to the left endpoint of the next.
3363 @code{frequency} (default, @code{absolute}): indicates the scale of the vertical axis.
3364 Possible values are: @code{absolute} (heights of bars add up to number of data),
3365 @code{relative} (heights of bars add up to 1),
3366 @code{percent} (heights of bars add up to 100),
3367 and @code{density} (total area of histogram is 1).
3370 @code{htics} (default, @code{auto}): format of tic marks on the horizontal axis.
3371 Possible values are: @code{auto} (tics are placed automatically),
3372 @code{endpoints} (tics are placed at the divisions between classes),
3373 @code{intervals} (classes are labeled with the corresponding intervals),
3374 or a list of labels, one for each class.
3377 All global @code{draw} options, except @code{xrange}, @code{yrange},
3378 and @code{xtics}, which are internally assigned by @code{histogram}.
3379 If you want to set your own values for these options, make use of
3380 @code{histogram_description}.
3383 The following local @ref{Package draw} options: @mrefcomma{key}
3384 @mrefcomma{fill_color} @mrefcomma{fill_density} and @mrefdot{line_width}
3385 Note that the outlines of bars,
3386 as well as the interior of bars when @code{fill_density} is nonzero,
3387 are drawn with @code{fill_color}, not @code{color}.
3391 @code{histogram} honors the global option @code{histogram_skyline}.
3392 When @code{histogram_skyline} is @code{true},
@code{histogram} and @code{histogram_description} construct "skyline" plots,
which show the outline of the histogram bars,
3395 instead of drawing all the vertical segments.
3396 Otherwise (the default), histograms are displayed with bars showing vertical segments.
3398 There is also a function @code{wxhistogram} for creating embedded
3399 histograms in interfaces wxMaxima and iMaxima.
3401 See also @mrefcomma{continuous_freq}
3402 which, like @code{histogram},
3403 counts data in intervals,
3404 but returns the counts instead of displaying a graphic representation.
3406 See also @mrefdot{barsplot}
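
The relationship between the @code{frequency} scales can be checked
numerically with @code{continuous_freq}. The sketch below assumes that
@code{continuous_freq} accepts the same class specification as
@code{nclasses} and returns a list holding the class endpoints followed by
the counts; the names @code{ends}, @code{cnts} and @code{rel} are
illustrative, and the output is omitted:

@example
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) [ends, cnts] : continuous_freq (s1, [-2, 12, 3])$
(%i4) /* relative: heights add up to 1 */
      rel : cnts / length (s1)$
(%i5) /* density: relative frequency divided by class width */
      makelist (rel[k] / (ends[k+1] - ends[k]), k, 1, length (cnts));
@end example

With the @code{density} scale, each height multiplied by its class width
gives back the relative frequency, so the total area is 1.
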
3410 A simple histogram with eight classes:
3413 @c load ("descriptive")$
3414 @c s1 : read_list (file_search ("pidigits.data"))$
3418 @c title = "pi digits",
3419 @c xlabel = "digits",
3420 @c ylabel = "Absolute frequency",
3421 @c fill_color = grey,
3422 @c fill_density = 0.6)$
3425 (%i1) load ("descriptive")$
3426 (%i2) s1 : read_list (file_search ("pidigits.data"))$
3431 title = "pi digits",
3433 ylabel = "Absolute frequency",
3435 fill_density = 0.6)$
Set the limits of the histogram to -2 and 12, with 3 classes,
and label the classes with predefined tics:
3443 @c load ("descriptive")$
3444 @c s1 : read_list (file_search ("pidigits.data"))$
3447 @c nclasses = [-2,12,3],
3448 @c htics = ["A", "B", "C"],
3450 @c fill_color = "#23afa0",
3451 @c fill_density = 0.6)$
3454 (%i1) load ("descriptive")$
3455 (%i2) s1 : read_list (file_search ("pidigits.data"))$
3459 nclasses = [-2,12,3],
3460 htics = ["A", "B", "C"],
3462 fill_color = "#23afa0",
3463 fill_density = 0.6)$
3467 Bounds for varying class widths.
3470 @c load ("descriptive")$
3471 @c s1 : read_list (file_search ("pidigits.data"))$
3472 @c histogram (s1, nclasses = {0,3,6,7,11})$
3475 (%i1) load ("descriptive")$
3476 (%i2) s1 : read_list (file_search ("pidigits.data"))$
3477 (%i3) histogram (s1, nclasses = @{0,3,6,7,11@})$
3480 Freedman-Diaconis formula for the number of classes.
3483 @c load ("descriptive")$
3484 @c s1 : read_list (file_search ("pidigits.data"))$
3485 @c histogram(s1, nclasses=fd) $
3488 (%i1) load ("descriptive")$
3489 (%i2) s1 : read_list (file_search ("pidigits.data"))$
3490 (%i3) histogram(s1, nclasses=fd) $
3495 [1] Freedman, D., and Diaconis, P. (1981) On the histogram as a density estimator: L_2 theory.
3496 Zeitschrift f@"ur Wahrscheinlichkeitstheorie und verwandte Gebiete 57, 453-476.
3498 [2] Scott, D. W. (1979) On optimal and data-based histograms. Biometrika 66, 605-610.
3500 [3] Sturges, H. A. (1926) The choice of a class interval. Journal of the American Statistical Association 21, 65-66.
3502 @opencatbox{Categories:}
3503 @category{Package descriptive}
3508 @anchor{histogram_description}
3509 @deffn {Function} histogram_description (@dots{})
3511 Creates a graphic object which represents a histogram.
3512 Such an object is suitable for creating complex scenes together with other graphic objects,
3513 to be displayed by @code{draw2d}.
3515 @code{histogram_description} takes the same arguments
3516 as the stand-alone function @code{histogram}.
3517 See @mref{histogram} for more information.
3521 We make use of @code{histogram_description} for setting
3522 @code{xrange} and adding an explicit curve into the scene:
3525 (%i1) load ("descriptive")$
3526 (%i2) ( load("distrib"),
3528 s2: random_normal(m, s, 1000) ) $
3532 histogram_description(
3535 frequency = density,
3536 fill_density = 0.5),
3537 explicit(pdf_normal(x,m,s), x, m - 3*s, m + 3* s))$
3540 @opencatbox{Categories:}
3541 @category{Package descriptive}
3546 @anchor{histogram_skyline}
3547 @defvr {Option variable} histogram_skyline
3548 Default value: @code{false}
3550 When @code{histogram_skyline} is @code{true},
@code{histogram} and @code{histogram_description} construct "skyline" plots,
which show the outline of the histogram bars,
3553 instead of drawing all the vertical segments.
3555 The outline is drawn with the current @code{fill_color} (not the current @code{color}).
3556 The interior of the histogram is filled with @code{fill_color},
3557 but only if @code{fill_density} is nonzero.
3559 Otherwise, histograms are displayed with bars showing vertical segments.
Construct a skyline histogram
and an ordinary histogram for comparison:
3568 (%i1) load ("descriptive") $
3569 (%i2) L: read_list (file_search ("pidigits.data")) $
3570 (%i3) histogram_skyline: true $
3571 (%i4) skyline_hist: histogram_description (L) $
3572 (%i5) histogram_skyline: false $
3573 (%i6) ordinary_hist: histogram_description (L) $
3574 (%i7) draw (gr2d (skyline_hist), gr2d (ordinary_hist)) $
3577 Continuing the preceding example.
3578 Set display options for @code{fill_color} and @code{fill_density}.
3581 (%i8) histogram_skyline: true $
3582 (%i9) skyline_hist: histogram_description (L, fill_color = blue, fill_density = 0.2) $
3583 (%i10) histogram_skyline: false $
3584 (%i11) ordinary_hist: histogram_description (L, fill_color = blue, fill_density = 0.2) $
3585 (%i12) draw (gr2d (skyline_hist), gr2d (ordinary_hist)) $
3588 @opencatbox{Categories:}
3589 @category{Package descriptive}
3595 @deffn {Function} piechart @
3596 @fname{piechart} (@var{list}) @
3597 @fname{piechart} (@var{list}, @var{option_1}, @var{option_2}, @dots{}) @
3598 @fname{piechart} (@var{one_column_matrix}) @
3599 @fname{piechart} (@var{one_column_matrix}, @var{option_1}, @var{option_2}, @dots{}) @
3600 @fname{piechart} (@var{one_row_matrix}) @
3601 @fname{piechart} (@var{one_row_matrix}, @var{option_1}, @var{option_2}, @dots{})
3603 Similar to @code{barsplot}, but plots sectors instead of rectangles.
3605 Available options are:
3610 @var{sector_colors} (default, @code{[]}): a list of colors for sectors.
3611 When there are more sectors than specified colors, the extra necessary colors
3612 are chosen at random. See @code{color} to learn more about them.
3615 @var{pie_center} (default, @code{[0,0]}): diagram's center.
3618 @var{pie_radius} (default, @code{1}): diagram's radius.
3621 All global @code{draw} options, except @code{key}, which is
3622 internally assigned by @code{piechart}.
3623 If you want to set your own values for this option or want to build
3624 complex scenes, make use of @code{piechart_description}.
3627 The following local @code{draw} options: @code{key}, @code{color},
3628 @code{fill_density} and @code{line_width}. See also
There is also a function @code{wxpiechart} for
creating embedded pie charts in the wxMaxima and iMaxima interfaces.
3639 @c load ("descriptive")$
3640 @c s1 : read_list (file_search ("pidigits.data"))$
3643 @c xrange = [-1.1, 1.3],
3644 @c yrange = [-1.1, 1.1],
3645 @c title = "Digit frequencies in pi")$
3648 (%i1) load ("descriptive")$
3649 (%i2) s1 : read_list (file_search ("pidigits.data"))$
3653 xrange = [-1.1, 1.3],
3654 yrange = [-1.1, 1.1],
3655 title = "Digit frequencies in pi")$
3659 See also function @mrefdot{barsplot}
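
As a rough guide to what the chart encodes, the angle of each sector is
proportional to the frequency of the corresponding outcome. A minimal sketch,
assuming @code{discrete_freq} returns the distinct outcomes followed by their
counts; @code{vals}, @code{cnts} and the conversion to degrees are
illustrative and not part of @code{piechart}, and the output is omitted:

@example
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) [vals, cnts] : discrete_freq (s1)$
(%i4) /* sector angles in degrees, one per outcome in vals */
      360 * cnts / length (s1);
@end example
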
3661 @opencatbox{Categories:}
3662 @category{Package descriptive}
3667 @anchor{piechart_description}
3668 @deffn {Function} piechart_description (@dots{})
Function @code{piechart_description} creates a graphic object
suitable for creating complex scenes, together with other
graphic objects.
3674 @opencatbox{Categories:}
3675 @category{Package descriptive}
3680 @anchor{scatterplot}
3681 @deffn {Function} scatterplot @
3682 @fname{scatterplot} (@var{list}) @
3683 @fname{scatterplot} (@var{list}, @var{option_1}, @var{option_2}, @dots{}) @
3684 @fname{scatterplot} (@var{matrix}) @
3685 @fname{scatterplot} (@var{matrix}, @var{option_1}, @var{option_2}, @dots{})
3687 Plots scatter diagrams both for univariate (@var{list}) and multivariate
3688 (@var{matrix}) samples.
Available options are the same as those accepted by @code{histogram}.
There is also a function @code{wxscatterplot} for
creating embedded scatter plots in the wxMaxima and iMaxima interfaces.
3697 Univariate scatter diagram from a simulated Gaussian sample.
3700 @c load ("descriptive")$
3701 @c load ("distrib")$
3703 @c random_normal(0,1,200),
3706 @c dimensions = [600,150])$
3709 (%i1) load ("descriptive")$
3710 (%i2) load ("distrib")$
3713 random_normal(0,1,200),
3716 dimensions = [600,150])$
Two-dimensional scatter plot.
3723 @c load ("descriptive")$
3724 @c s2 : read_matrix (file_search ("wind.data"))$
3726 @c submatrix(s2, 1,2,3),
3727 @c title = "Data from stations #4 and #5",
3728 @c point_type = diamant,
3733 (%i1) load ("descriptive")$
3734 (%i2) s2 : read_matrix (file_search ("wind.data"))$
3737 submatrix(s2, 1,2,3),
3738 title = "Data from stations #4 and #5",
3739 point_type = diamant,
Three-dimensional scatter plot.
3748 @c load ("descriptive")$
3749 @c s2 : read_matrix (file_search ("wind.data"))$
3750 @c scatterplot(submatrix (s2, 1,2), nclasses=4)$
3753 (%i1) load ("descriptive")$
3754 (%i2) s2 : read_matrix (file_search ("wind.data"))$
3755 (%i3) scatterplot(submatrix (s2, 1,2), nclasses=4)$
Five-dimensional scatter plot, with five-class histograms.
3761 @c load ("descriptive")$
3762 @c s2 : read_matrix (file_search ("wind.data"))$
3766 @c frequency = relative,
3767 @c fill_color = blue,
3768 @c fill_density = 0.3,
3772 (%i1) load ("descriptive")$
3773 (%i2) s2 : read_matrix (file_search ("wind.data"))$
3778 frequency = relative,
3785 For plotting isolated or line-joined points in two and three dimensions,
3786 see @code{points}. See also @mrefdot{histogram}
3788 @opencatbox{Categories:}
3789 @category{Package descriptive}
3794 @anchor{scatterplot_description}
3795 @deffn {Function} scatterplot_description (@dots{})
Function @code{scatterplot_description} creates a graphic object
suitable for creating complex scenes, together with other
graphic objects.
3801 @opencatbox{Categories:}
3802 @category{Package descriptive}
3808 @deffn {Function} starplot (@var{data1}, @var{data2}, @dots{}, @var{option_1}, @var{option_2}, @dots{})
Plots star diagrams for discrete statistical variables,
for either one or multiple samples.
@var{data} can be a list of outcomes representing one sample, or a
matrix of @var{m} rows and @var{n} columns, representing @var{n} samples of size
@var{m}.
3817 Available options are:
3822 @var{stars_colors} (default, @code{[]}): a list of colors for multiple samples.
3823 When there are more samples than specified colors, the extra necessary colors
3824 are chosen at random. See @code{color} to learn more about them.
3827 @var{frequency} (default, @code{absolute}): indicates the scale of the
3828 radii. Possible values are: @code{absolute} and @code{relative}.
3831 @var{ordering} (default, @code{orderlessp}): possible values are @code{orderlessp} or @code{ordergreatp},
3832 indicating how statistical outcomes should be ordered.
3835 @var{sample_keys} (default, @code{[]}): a list with the strings to be used in the legend.
If the list length is neither 0 nor the number of samples, an error message is returned.
3840 @var{star_center} (default, @code{[0,0]}): diagram's center.
3843 @var{star_radius} (default, @code{1}): diagram's radius.
3846 All global @code{draw} options, except @code{points_joined}, @code{point_type},
3847 and @code{key}, which are internally assigned by @code{starplot}.
If you want to set your own values for these options or want to build
complex scenes, make use of @code{starplot_description}.
3852 The following local @code{draw} option: @code{line_width}.
There is also a function @code{wxstarplot} for
creating embedded star diagrams in the wxMaxima and iMaxima interfaces.
3861 Plot based on absolute frequencies.
3862 Location and radius defined by the user.
3865 (%i1) load ("descriptive")$
3866 (%i2) l1: makelist(random(10),k,1,50)$
3867 (%i3) l2: makelist(random(10),k,1,200)$
3871 stars_colors = [blue,red],
3872 sample_keys = ["1st sample", "2nd sample"],
3873 star_center = [1,2],
3875 proportional_axes = xy,
3880 @opencatbox{Categories:}
3881 @category{Package descriptive}
3886 @anchor{starplot_description}
3887 @deffn {Function} starplot_description (@dots{})
Function @code{starplot_description} creates a graphic object
suitable for creating complex scenes, together with other
graphic objects.
3893 @opencatbox{Categories:}
3894 @category{Package descriptive}
3900 @deffn {Function} stemplot @
3901 @fname{stemplot} (@var{data}) @
3902 @fname{stemplot} (@var{data}, @var{option})
Plots stem-and-leaf diagrams.
3906 The only available option is:
3911 @var{leaf_unit} (default, @code{1}): indicates the unit of the leaves; must be a
3919 (%i1) load ("descriptive")$
3920 (%i2) load("distrib")$
3923 random_normal(15, 6, 100),
3957 @opencatbox{Categories:}
3958 @category{Package descriptive}