2 * Introduction to descriptive::
3 * Functions and Variables for data manipulation::
4 * Functions and Variables for descriptive statistics::
5 * Functions and Variables for statistical graphs::
8 @node Introduction to descriptive, Functions and Variables for data manipulation, descriptive-pkg, descriptive-pkg
9 @section Introduction to descriptive
11 Package @code{descriptive} contains a set of functions for
12 making descriptive statistical computations and graphing.
13 Together with the source code, three data sets are included in
14 your Maxima tree: @code{pidigits.data}, @code{wind.data} and @code{biomed.data}.
16 Any statistics manual can be used as a reference for the functions in package @code{descriptive}.
18 For comments, bugs or suggestions, please contact me at @var{'riotorto AT yahoo DOT com'}.
20 Here is a simple example of how the functions in @code{descriptive} work depending on the nature of their arguments, lists or matrices,
23 @c load ("descriptive")$
24 @c /* univariate sample */ mean ([a, b, c]);
25 @c matrix ([a, b], [c, d], [e, f]);
26 @c /* multivariate sample */ mean (%);
29 (%i1) load ("descriptive")$
31 (%i2) /* univariate sample */ mean ([a, b, c]);
37 (%i3) matrix ([a, b], [c, d], [e, f]);
45 (%i4) /* multivariate sample */ mean (%);
47 (%o4) [(e + c + a)/3, (f + d + b)/3]
52 Note that in multivariate samples the mean is calculated for each column.
54 In the case of several samples of possibly different sizes, the Maxima function @code{map} can be used to get the desired results for each sample,
57 @c load ("descriptive")$
58 @c map (mean, [[a, b, c], [d, e]]);
61 (%i1) load ("descriptive")$
63 (%i2) map (mean, [[a, b, c], [d, e]]);
65 (%o2) [(c + b + a)/3, (e + d)/2]
70 In this case, two samples of sizes 3 and 2 were stored in a list.
72 Univariate samples must be stored in lists like
75 @c s1 : [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
79 (%i1) s1 : [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
80 (%o1) [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
84 and multivariate samples in matrices as in
87 @c s2 : matrix ([13.17, 9.29], [14.71, 16.88], [18.50, 16.88],
88 @c [10.58, 6.63], [13.33, 13.25], [13.21, 8.12]);
92 (%i1) s2 : matrix ([13.17, 9.29], [14.71, 16.88], [18.50, 16.88],
93 [10.58, 6.63], [13.33, 13.25], [13.21, 8.12]);
108 In this case, the number of columns equals the random variable dimension and the number of rows is the sample size.
110 Data can be introduced by hand, but big samples are usually stored in plain text files. For example, file @code{pidigits.data} contains the first 100 digits of number @code{%pi}:
126 In order to load these digits in Maxima,
129 @c s1 : read_list (file_search ("pidigits.data"))$
133 (%i1) s1 : read_list (file_search ("pidigits.data"))$
140 On the other hand, file @code{wind.data} contains daily average wind speeds at 5 meteorological stations in the Republic of Ireland (This is part of a data set taken at 12 meteorological stations. The original file is freely downloadable from the StatLib Data Repository and its analysis is discussed in Haslett, J., Raftery, A. E. (1989) @var{Space-time Modelling with Long-memory Dependence: Assessing Ireland's Wind Power Resource, with Discussion}. Applied Statistics 38, 1-50). This loads the data:
143 @c s2 : read_matrix (file_search ("wind.data"))$
145 @c s2 [%]; /* last record */
148 (%i1) s2 : read_matrix (file_search ("wind.data"))$
154 (%i3) s2 [%]; /* last record */
155 (%o3) [3.58, 6.0, 4.58, 7.62, 11.25]
159 Some samples contain non-numeric data. As an example, file @code{biomed.data} (which is part of a larger file downloaded from the StatLib Data Repository) contains four blood measures taken from two groups of patients, @code{A} and @code{B}, of different ages,
162 @c s3 : read_matrix (file_search ("biomed.data"))$
164 @c s3 [1]; /* first record */
167 (%i1) s3 : read_matrix (file_search ("biomed.data"))$
173 (%i3) s3 [1]; /* first record */
174 (%o3) [A, 30, 167.0, 89.0, 25.6, 364]
178 The first individual belongs to group @code{A}, is 30 years old and his/her blood measures were 167.0, 89.0, 25.6 and 364.
180 One must take care when working with categorical data. In the next example, symbol @code{a} has been assigned a value at some earlier point, and then a sample with categorical value @code{a} is taken,
184 @c matrix ([a, 3], [b, 5]);
189 (%i2) matrix ([a, 3], [b, 5]);
196 @opencatbox{Categories:}
197 @category{Descriptive statistics}
198 @category{Share packages}
199 @category{Package descriptive}
202 @node Functions and Variables for data manipulation, Functions and Variables for descriptive statistics, Introduction to descriptive, descriptive-pkg
203 @section Functions and Variables for data manipulation
207 @anchor{build_sample}
208 @deffn {Function} build_sample @
209 @fname{build_sample} (@var{list}) @
210 @fname{build_sample} (@var{matrix})
212 Builds a sample from a table of absolute frequencies.
213 The input table can be a matrix or a list of lists, all of
214 them of equal size. The number of columns or the length of
215 the lists must be greater than 1. The last element of each
216 row or list is interpreted as the absolute frequency.
217 The output is always a sample in matrix form.
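The expansion that @code{build_sample} performs can be sketched in plain Python, outside Maxima. The helper @code{build_sample_sketch} is hypothetical. Keeping the rows in table order is an assumption of the sketch, not a documented property of the package.

```python
def build_sample_sketch(table):
    """Expand a table of absolute frequencies into a list of sample rows.

    Hypothetical helper: all rows must have the same length, greater
    than 1, and the last element of each row is its absolute frequency.
    """
    if len(set(len(row) for row in table)) != 1 or len(table[0]) < 2:
        raise ValueError("rows must have equal length greater than 1")
    sample = []
    for row in table:
        *values, freq = row
        # Repeat the leading values freq times; rows keep the table order.
        sample.extend(list(values) for _ in range(freq))
    return sample


# The string "j" stands in for the Maxima symbol j of the example below.
print(build_sample_sketch([[6, 1], ["j", 2], [2, 1]]))
# [[6], ['j'], ['j'], [2]]
```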
221 Univariate frequency table.
224 @c load ("descriptive")$
225 @c sam1: build_sample([[6,1], [j,2], [2,1]]);
230 (%i1) load ("descriptive")$
232 (%i2) sam1: build_sample([[6,1], [j,2], [2,1]]);
247 (%i4) barsplot(sam1) $
250 Multivariate frequency table.
253 @c load ("descriptive")$
254 @c sam2: build_sample([[6,3,1], [5,6,2], [u,2,1],[6,8,2]]) ;
256 @c barsplot(sam2, grouping=stacked) $
259 (%i1) load ("descriptive")$
261 (%i2) sam2: build_sample([[6,3,1], [5,6,2], [u,2,1],[6,8,2]]) ;
277 [ u + 158 (u + 28) 2 u + 174 11 (u + 28) ]
278 [ -------- - --------- --------- - ----------- ]
281 [ 2 u + 174 11 (u + 28) 21 ]
282 [ --------- - ----------- -- ]
285 (%i4) barsplot(sam2, grouping=stacked) $
288 @opencatbox{Categories:}
289 @category{Package descriptive}
295 @anchor{continuous_freq}
296 @deffn {Function} continuous_freq @
297 @fname{continuous_freq} (@var{data}) @
298 @fname{continuous_freq} (@var{data}, @var{m})
300 Divides the range of @var{data} into intervals,
301 and counts how many values fall into each one.
303 A value @var{x} falls into an interval with left and right endpoints @var{a} and @var{b}
304 if and only if @code{@var{x} > @var{a}} and @code{@var{x} <= @var{b}},
305 except for the first (least or leftmost) interval,
306 for which @code{@var{x} >= @var{a}} and @code{@var{x} <= @var{b}}.
307 That is, an interval excludes its left endpoint and includes its right endpoint,
308 except for the first interval, which includes both the left and right endpoints.
310 @var{data} must be a list of numbers,
311 or a 1-dimensional array (as created by @code{make_array}).
313 @var{m} is optional, and equals either the number of classes (10 by default),
314 or a list of two elements (the least and greatest values to be counted),
315 or a list of three elements (the least and greatest values to be counted, and the number of classes),
316 or a set containing the endpoints of the class intervals.
318 It is assumed that class intervals are contiguous.
319 That is, the right endpoint of one interval is equal to the left endpoint of the next.
321 @code{continuous_freq} returns a list of two lists.
322 The first list comprises all the endpoints of the class intervals,
323 concatenated into a single list.
324 The second list contains the class counts for the intervals corresponding to elements of the first list.
326 If sample values are all equal, this function returns exactly
327 one class of width 2.
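A small Python sketch can check the interval rule above. The helper @code{continuous_freq_sketch} is hypothetical. It only handles a number of classes with optional limits, not a set of endpoints, and it does not reproduce the all-equal special case. The digits are rebuilt from the frequencies that @code{discrete_freq} reports for @code{pidigits.data}. This is enough because class counts do not depend on the order of the data.

```python
from bisect import bisect_left
from fractions import Fraction


def continuous_freq_sketch(data, classes=10, lo=None, hi=None):
    """Return (endpoints, counts) for contiguous classes on [lo, hi].

    Hypothetical helper. A value x is counted in the interval (a, b]
    when a < x <= b, except that the first interval is closed on both
    sides. Values outside [lo, hi] are not counted.
    """
    lo = Fraction(min(data) if lo is None else lo)
    hi = Fraction(max(data) if hi is None else hi)
    width = (hi - lo) / classes
    edges = [lo + k * width for k in range(classes + 1)]  # exact endpoints
    counts = [0] * classes
    for x in data:
        if x == lo:
            counts[0] += 1          # the first interval includes its left end
            continue
        i = bisect_left(edges, x)   # smallest i with x <= edges[i]
        if 1 <= i <= classes:
            counts[i - 1] += 1      # hence edges[i-1] < x <= edges[i]
    return edges, counts


# The 100 digits of pidigits.data, rebuilt from their frequencies.
COUNTS = [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]
DIGITS = [d for d in range(10) for _ in range(COUNTS[d])]

edges, counts = continuous_freq_sketch(DIGITS, 5)
print([str(e) for e in edges], counts)
# ['0', '9/5', '18/5', '27/5', '36/5', '9'] [16, 24, 18, 17, 25]
print(continuous_freq_sketch(DIGITS, 7, -2, 12)[1])
# [8, 20, 22, 17, 20, 13, 0]
```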
331 The optional argument indicates the number of classes we want.
332 The first list in the output contains the interval limits, and
333 the second the corresponding counts: there are 16 digits inside
334 the interval @code{[0, 1.8]}, 24 digits in @code{(1.8, 3.6]}, and so on.
337 @c load ("descriptive")$
338 @c s1 : read_list (file_search ("pidigits.data"))$
339 @c continuous_freq (s1, 5);
342 (%i1) load ("descriptive")$
343 (%i2) s1 : read_list (file_search ("pidigits.data"))$
345 (%i3) continuous_freq (s1, 5);
347 (%o3) [[0, 9/5, 18/5, 27/5, 36/5, 9], [16, 24, 18, 17, 25]]
352 The optional argument indicates we want 7 classes with limits
356 @c load ("descriptive")$
357 @c s1 : read_list (file_search ("pidigits.data"))$
358 @c continuous_freq (s1, [-2,12,7]);
361 (%i1) load ("descriptive")$
362 (%i2) s1 : read_list (file_search ("pidigits.data"))$
364 (%i3) continuous_freq (s1, [-2,12,7]);
365 (%o3) [[- 2, 0, 2, 4, 6, 8, 10, 12], [8, 20, 22, 17, 20, 13, 0]]
369 The optional argument indicates we want the default number of classes with limits
373 @c load ("descriptive")$
374 @c s1 : read_list (file_search ("pidigits.data"))$
375 @c continuous_freq (s1, [-2,12]);
378 (%i1) load ("descriptive")$
379 (%i2) s1 : read_list (file_search ("pidigits.data"))$
381 (%i3) continuous_freq (s1, [-2,12]);
383 (%o3) [[- 2, - 3/5, 4/5, 11/5, 18/5, 5, 32/5, 39/5, 46/5, 53/5, 12],
385 [0, 8, 20, 12, 18, 9, 8, 25, 0, 0]]
389 The first argument may be an array.
392 @c load ("descriptive")$
393 @c s1 : read_list (file_search ("pidigits.data"))$
394 @c a1 : make_array (fixnum, length (s1)) $
395 @c fillarray (a1, s1);
396 @c continuous_freq (a1);
399 (%i1) load ("descriptive")$
400 (%i2) s1 : read_list (file_search ("pidigits.data"))$
401 (%i3) a1 : make_array (fixnum, length (s1)) $
403 (%i4) fillarray (a1, s1);
404 (%o4) @{Lisp Array: #(3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3 2 3 8 4 6 2\
406 0 2 8 8 4 1 9 7 1 6 9 3 9 9 3 7 5 1 0 5 8 2 0 9 7\
408 3 0 7 8 1 6 4 0 6 2 8 6 2 0 8 9 9 8 6 2 8 0 3 4 8\
413 (%i5) continuous_freq (a1);
415 (%o5) [[0, 9/10, 9/5, 27/10, 18/5, 9/2, 27/5, 63/10, 36/5, 81/10, 9],
417 [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]
421 @opencatbox{Categories:}
422 @category{Package descriptive}
428 @anchor{discrete_freq}
429 @deffn {Function} discrete_freq (@var{data})
430 Counts absolute frequencies in discrete samples, both numeric and categorical. Its only argument is a list
431 or a 1-dimensional array (as created by @code{make_array}).
434 @c load ("descriptive")$
435 @c s1 : read_list (file_search ("pidigits.data"))$
436 @c discrete_freq (s1);
439 (%i1) load ("descriptive")$
440 (%i2) s1 : read_list (file_search ("pidigits.data"))$
442 (%i3) discrete_freq (s1);
443 (%o3) [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
444 [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]
448 The first list gives the distinct sample values and the second their absolute frequencies.
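Here is a plain-Python sketch of the same counting. As in the numeric example, it assumes that distinct values are listed in increasing order. The name @code{discrete_freq_sketch} is hypothetical, and sorting requires values that can be compared with each other.

```python
from collections import Counter


def discrete_freq_sketch(data):
    """Return [distinct values, absolute frequencies], values sorted.

    Hypothetical helper; the values must be mutually comparable
    (for instance all numbers, or all strings).
    """
    counts = Counter(data)
    values = sorted(counts)
    return [values, [counts[v] for v in values]]


print(discrete_freq_sketch([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]))
# [[1, 2, 3, 4, 5, 6, 9], [2, 1, 2, 1, 3, 1, 1]]
```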
450 The argument may be an array.
453 @c load ("descriptive")$
454 @c s1 : read_list (file_search ("pidigits.data"))$
455 @c a1 : make_array (fixnum, length (s1)) $
456 @c fillarray (a1, s1);
457 @c discrete_freq (a1);
460 (%i1) load ("descriptive")$
461 (%i2) s1 : read_list (file_search ("pidigits.data"))$
462 (%i3) a1 : make_array (fixnum, length (s1)) $
464 (%i4) fillarray (a1, s1);
465 (%o4) @{Lisp Array: #(3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3 2 3 8 4 6 2\
467 0 2 8 8 4 1 9 7 1 6 9 3 9 9 3 7 5 1 0 5 8 2 0 9 7\
469 3 0 7 8 1 6 4 0 6 2 8 6 2 0 8 9 9 8 6 2 8 0 3 4 8\
474 (%i5) discrete_freq (a1);
475 (%o5) [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
476 [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]
480 @opencatbox{Categories:}
481 @category{Package descriptive}
489 @deffn {Function} standardize @
490 @fname{standardize} (@var{list}) @
491 @fname{standardize} (@var{matrix})
493 Subtracts the sample mean from each element of the list and divides
494 the result by the standard deviation. When the input is a matrix,
495 @code{standardize} subtracts the multivariate mean from each row, and then
496 divides each component by the corresponding standard deviation.
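For a list, the transformation can be sketched in Python as follows. The text does not say which standard deviation is used. The sketch assumes the one with denominator @math{n} computed by @code{std}, and both helper names are hypothetical.

```python
import math


def standardize_list(xs):
    """Return (x - mean) / s for each element.

    s uses denominator n here; that choice is an assumption of the sketch.
    """
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / s for x in xs]


def standardize_matrix(rows):
    """Standardize each column separately, keeping the row layout."""
    columns = [standardize_list(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


print(standardize_list([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```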
498 @opencatbox{Categories:}
499 @category{Package descriptive}
507 @deffn {Function} subsample @
508 @fname{subsample} (@var{data_matrix}, @var{predicate_function}) @
509 @fname{subsample} (@var{data_matrix}, @var{predicate_function}, @var{col_num1}, @var{col_num2}, ...)
511 This is a variant of the Maxima @code{submatrix} function.
512 The first argument is the data matrix, the second is a predicate function,
513 and the optional additional arguments are the numbers of the columns to be taken.
514 Its behaviour is best understood through examples.
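As the examples in this entry show, the predicate receives the whole record, so it may test columns that are not returned. The following Python sketch shows that behaviour with the hypothetical helper @code{subsample_sketch}. Column numbers stay 1-based as in the Maxima calls, while the Python predicate indexes rows from 0.

```python
def subsample_sketch(rows, predicate, *cols):
    """Keep the rows for which predicate(row) is true, then select columns.

    cols are 1-based column numbers, as in the Maxima calls; with no
    cols, whole rows are returned.
    """
    kept = [row for row in rows if predicate(row)]
    if not cols:
        return [list(row) for row in kept]
    return [[row[c - 1] for c in cols] for row in kept]


# Two records copied from the wind speed output in this entry. The
# predicate tests columns 1 and 4, but only columns 1, 2 and 5 are kept.
wind = [[19.38, 15.37, 15.12, 23.09, 25.25],
        [18.29, 18.66, 19.08, 26.08, 27.63]]
print(subsample_sketch(wind, lambda v: v[0] >= 16 and v[3] < 25, 1, 2, 5))
# [[19.38, 15.37, 25.25]]
```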
516 These are the multivariate records in which the wind speed
517 at the first meteorological station was greater than 18.
518 Note that in the lambda expression the @var{i}-th component is
519 referred to as @code{v[i]}.
521 @c load ("descriptive")$
522 @c s2 : read_matrix (file_search ("wind.data"))$
523 @c subsample (s2, lambda([v], v[1] > 18));
526 (%i1) load ("descriptive")$
527 (%i2) s2 : read_matrix (file_search ("wind.data"))$
529 (%i3) subsample (s2, lambda([v], v[1] > 18));
530 [ 19.38 15.37 15.12 23.09 25.25 ]
532 [ 18.29 18.66 19.08 26.08 27.63 ]
534 [ 20.25 21.46 19.95 27.71 23.38 ]
536 [ 18.79 18.96 14.46 26.38 21.84 ]
540 In the following example, we request only the first, second and fifth
541 components of those records with wind speeds greater than or equal to 16
542 in station number 1 and less than 25 knots in station number 4. The sample
543 contains only data from stations 1, 2 and 5. In this case,
544 the predicate function is defined as an ordinary Maxima function.
546 @c load ("descriptive")$
547 @c s2 : read_matrix (file_search ("wind.data"))$
548 @c g(x):= x[1] >= 16 and x[4] < 25$
549 @c subsample (s2, g, 1, 2, 5);
552 (%i1) load ("descriptive")$
553 (%i2) s2 : read_matrix (file_search ("wind.data"))$
554 (%i3) g(x):= x[1] >= 16 and x[4] < 25$
556 (%i4) subsample (s2, g, 1, 2, 5);
557 [ 19.38 15.37 25.25 ]
559 [ 17.33 14.67 19.58 ]
561 [ 16.92 13.21 21.21 ]
563 [ 17.25 18.46 23.87 ]
567 Here is an example with the categorical variables of @code{biomed.data}.
568 We want the records corresponding to those patients in group @code{B}
569 who are older than 38 years.
571 @c load ("descriptive")$
572 @c s3 : read_matrix (file_search ("biomed.data"))$
573 @c h(u):= u[1] = B and u[2] > 38 $
574 @c subsample (s3, h);
577 (%i1) load ("descriptive")$
578 (%i2) s3 : read_matrix (file_search ("biomed.data"))$
579 (%i3) h(u):= u[1] = B and u[2] > 38 $
581 (%i4) subsample (s3, h);
582 [ B 39 28.0 102.3 17.1 146 ]
584 [ B 39 21.0 92.4 10.3 197 ]
586 [ B 39 23.0 111.5 10.0 133 ]
588 [ B 39 26.0 92.6 12.3 196 ]
590 [ B 39 25.0 98.7 10.0 174 ]
592 [ B 39 21.0 93.2 5.9 181 ]
594 [ B 39 18.0 95.0 11.3 66 ]
596 [ B 39 39.0 88.5 7.6 168 ]
600 The statistical analysis will probably involve only the blood measures,
602 @c load ("descriptive")$
603 @c s3 : read_matrix (file_search ("biomed.data"))$
604 @c subsample (s3, lambda([v], v[1] = B and v[2] > 38), 3, 4, 5, 6);
608 (%i1) load ("descriptive")$
609 (%i2) s3 : read_matrix (file_search ("biomed.data"))$
611 (%i3) subsample (s3, lambda([v], v[1] = B and v[2] > 38), 3, 4, 5, 6);
613 [ 28.0 102.3 17.1 146 ]
615 [ 21.0 92.4 10.3 197 ]
617 [ 23.0 111.5 10.0 133 ]
619 [ 26.0 92.6 12.3 196 ]
621 [ 25.0 98.7 10.0 174 ]
623 [ 21.0 93.2 5.9 181 ]
625 [ 18.0 95.0 11.3 66 ]
627 [ 39.0 88.5 7.6 168 ]
631 This is the multivariate mean of @code{s3},
633 @c load ("descriptive")$
634 @c s3 : read_matrix (file_search ("biomed.data"))$
638 (%i1) load ("descriptive")$
639 (%i2) s3 : read_matrix (file_search ("biomed.data"))$
643 (%o3) [----------, ---, 87.178, 0.06 NA + 81.44999999999999,
646 18.122999999999998, ------------]
651 Here, the first component is meaningless, since @code{A} and @code{B} are categorical; the second component is the mean age of the individuals in rational form; and the fourth and last values exhibit some strange behaviour. This is because the symbol @code{NA} is used here to indicate @var{not available} data, so these two means are meaningless. A possible solution is to remove from the matrix those rows containing @code{NA} symbols, although this entails some loss of information.
653 @c load ("descriptive")$
654 @c s3 : read_matrix (file_search ("biomed.data"))$
655 @c g(v):= v[4] # NA and v[6] # NA $
656 @c mean (subsample (s3, g, 3, 4, 5, 6));
659 (%i1) load ("descriptive")$
660 (%i2) s3 : read_matrix (file_search ("biomed.data"))$
661 (%i3) g(v):= v[4] # NA and v[6] # NA $
663 (%i4) mean (subsample (s3, g, 3, 4, 5, 6));
664 (%o4) [79.4923076923077, 86.2032967032967, 16.93186813186813,
671 @opencatbox{Categories:}
672 @category{Package descriptive}
680 @anchor{transform_sample}
681 @deffn {Function} transform_sample (@var{matrix}, @var{varlist}, @var{exprlist})
683 Transforms the sample @var{matrix}, whose columns are named according to
684 @var{varlist}, by evaluating the expressions in @var{exprlist}.
688 The second argument assigns names to the three columns. With these names,
689 a list of expressions defines the transformation of the sample.
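The row-by-row evaluation can be sketched in Python using the data of the first example below. The helper @code{transform_sample_sketch} is hypothetical. It takes the expressions as Python functions of a name-to-value mapping. Maxima keeps @code{log(3)} exact, whereas the sketch returns floating point logarithms.

```python
import math


def transform_sample_sketch(rows, names, exprs):
    """Evaluate every expression on every row, with columns bound to names."""
    out = []
    for row in rows:
        env = dict(zip(names, row))          # e.g. a=3, b=2, c=7
        out.append([f(env) for f in exprs])
    return out


data = [[3, 2, 7], [3, 7, 2], [8, 2, 4], [5, 2, 4]]
result = transform_sample_sketch(
    data, ["a", "b", "c"],
    [lambda e: e["c"], lambda e: e["a"] * e["b"], lambda e: math.log(e["a"])])
print(result[0])   # first row: c, a*b and log(a) for a=3, b=2, c=7
```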
692 (%i1) load ("descriptive")$
693 (%i2) data: matrix([3,2,7],[3,7,2],[8,2,4],[5,2,4]) $
695 (%i3) transform_sample(data, [a,b,c], [c, a*b, log(a)]);
706 Add a constant column and remove the third variable.
709 (%i1) load ("descriptive")$
710 (%i2) data: matrix([3,2,7],[3,7,2],[8,2,4],[5,2,4]) $
711 (%i3) transform_sample(data, [a,b,c], [makelist(1,k,length(data)),a,b]);
723 @opencatbox{Categories:}
724 @category{Package descriptive}
734 @node Functions and Variables for descriptive statistics, Functions and Variables for statistical graphs, Functions and Variables for data manipulation, descriptive-pkg
735 @section Functions and Variables for descriptive statistics
740 @deffn {Function} mean @
741 @fname{mean} (@var{list}) @
742 @fname{mean} (@var{matrix})
744 This is the sample mean, defined as
757 $${\bar{x}={1\over{n}}{\sum_{i=1}^{n}{x_{i}}}}$$
763 @c load ("descriptive")$
764 @c s1 : read_list (file_search ("pidigits.data"))$
767 @c s2 : read_matrix (file_search ("wind.data"))$
771 (%i1) load ("descriptive")$
772 (%i2) s1 : read_list (file_search ("pidigits.data"))$
783 (%i5) s2 : read_matrix (file_search ("wind.data"))$
786 (%o6) [9.9485, 10.160700000000004, 10.868499999999997,
787 15.716600000000001, 14.844100000000001]
791 @opencatbox{Categories:}
792 @category{Package descriptive}
799 @deffn {Function} var @
800 @fname{var} (@var{list}) @
801 @fname{var} (@var{matrix})
803 This is the sample variance with denominator @math{n}, defined as
818 $${{1}\over{n}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^2}}$$
824 @c load ("descriptive")$
825 @c s1 : read_list (file_search ("pidigits.data"))$
829 (%i1) load ("descriptive")$
830 (%i2) s1 : read_list (file_search ("pidigits.data"))$
832 (%i3) var (s1), numer;
833 (%o3) 8.425899999999999
837 See also function @mrefdot{var1}
839 @opencatbox{Categories:}
840 @category{Package descriptive}
847 @deffn {Function} var1 @
848 @fname{var1} (@var{list}) @
849 @fname{var1} (@var{matrix})
851 This is the sample variance with denominator @math{n-1}, defined as
866 $${{1\over{n-1}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^2}}}$$
872 @c load ("descriptive")$
873 @c s1 : read_list (file_search ("pidigits.data"))$
875 @c s2 : read_matrix (file_search ("wind.data"))$
879 (%i1) load ("descriptive")$
880 (%i2) s1 : read_list (file_search ("pidigits.data"))$
882 (%i3) var1 (s1), numer;
883 (%o3) 8.5110101010101
885 (%i4) s2 : read_matrix (file_search ("wind.data"))$
888 (%o5) [17.395865404040414, 15.139127787878794,
889 15.632049242424243, 32.50152569696971, 24.669773929292937]
893 See also function @mrefdot{var}
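@code{var} and @code{var1} differ only in the denominator, so @code{var1} equals @code{var} multiplied by @math{n/(n-1)}. The Python sketch below checks this with exact fractions on the digits of @code{pidigits.data}. The digits are rebuilt from the frequencies reported by @code{discrete_freq}, because the order of the data does not matter here. The helper names are hypothetical.

```python
from fractions import Fraction

# Digits of pidigits.data rebuilt from their discrete_freq counts.
COUNTS = [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]
DIGITS = [d for d in range(10) for _ in range(COUNTS[d])]


def sample_mean(xs):
    return Fraction(sum(xs), len(xs))


def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean, computed exactly."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs)


def var_sketch(xs):      # denominator n, as in var
    return sum_sq_dev(xs) / len(xs)


def var1_sketch(xs):     # denominator n - 1, as in var1
    return sum_sq_dev(xs) / (len(xs) - 1)


print(sample_mean(DIGITS), var_sketch(DIGITS), var1_sketch(DIGITS))
# 471/100 84259/10000 84259/9900
```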
895 @opencatbox{Categories:}
896 @category{Package descriptive}
903 @deffn {Function} std @
904 @fname{std} (@var{list}) @
905 @fname{std} (@var{matrix})
907 This is the square root of the function @code{var}, the variance with denominator @math{n}.
912 @c load ("descriptive")$
913 @c s1 : read_list (file_search ("pidigits.data"))$
915 @c s2 : read_matrix (file_search ("wind.data"))$
919 (%i1) load ("descriptive")$
920 (%i2) s1 : read_list (file_search ("pidigits.data"))$
922 (%i3) std (s1), numer;
923 (%o3) 2.9027400848164135
925 (%i4) s2 : read_matrix (file_search ("wind.data"))$
928 (%o5) [4.149928523480858, 3.8713998127292415,
929 3.9339202775348663, 5.672434260526957, 4.941970881136392]
933 See also functions @mref{var} and @mrefdot{std1}
935 @opencatbox{Categories:}
936 @category{Package descriptive}
943 @deffn {Function} std1 @
944 @fname{std1} (@var{list}) @
945 @fname{std1} (@var{matrix})
947 This is the square root of the function @mrefcomma{var1} the variance with denominator @math{n-1}.
952 @c load ("descriptive")$
953 @c s1 : read_list (file_search ("pidigits.data"))$
955 @c s2 : read_matrix (file_search ("wind.data"))$
959 (%i1) load ("descriptive")$
960 (%i2) s1 : read_list (file_search ("pidigits.data"))$
962 (%i3) std1 (s1), numer;
963 (%o3) 2.917363553109228
965 (%i4) s2 : read_matrix (file_search ("wind.data"))$
968 (%o5) [4.170835096721089, 3.8909032097803196,
969 3.9537386411375555, 5.701010936401517, 4.966867617451963]
973 See also functions @mref{var1} and @mrefdot{std}
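Here is a Python sketch of both standard deviations, using the digit frequencies of @code{pidigits.data} given by @code{discrete_freq}. The helper and its @code{ddof} parameter are hypothetical; the parameter name is borrowed from NumPy. Up to rounding, its results agree with the @code{std} and @code{std1} values shown for @code{s1}.

```python
import math

# Digits of pidigits.data rebuilt from their discrete_freq counts.
COUNTS = [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]
DIGITS = [d for d in range(10) for _ in range(COUNTS[d])]


def std_sketch(xs, ddof=0):
    """Square root of the variance with denominator n - ddof.

    ddof=0 corresponds to std and ddof=1 to std1.
    """
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - ddof))


print(std_sketch(DIGITS), std_sketch(DIGITS, ddof=1))
```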
975 @opencatbox{Categories:}
976 @category{Package descriptive}
982 @anchor{noncentral_moment}
983 @deffn {Function} noncentral_moment @
984 @fname{noncentral_moment} (@var{list}, @var{k}) @
985 @fname{noncentral_moment} (@var{matrix}, @var{k})
987 The noncentral moment of order @math{k}, defined as
1002 $${{1\over{n}}{\sum_{i=1}^{n}{x_{i}^k}}}$$
1008 The first noncentral moment is equal to the sample mean.
1010 @c load ("descriptive")$
1011 @c s1 : read_list (file_search ("pidigits.data"))$
1012 @c noncentral_moment (s1, 1), numer;
1013 @c mean (s1), numer;
1016 (%i1) load ("descriptive")$
1017 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1019 (%i3) noncentral_moment (s1, 1), numer;
1023 (%i4) mean (s1), numer;
1029 Calculation of the fifth noncentral moment for each column.
1031 @c load ("descriptive")$
1032 @c s2 : read_matrix (file_search ("wind.data"))$
1033 @c noncentral_moment (s2, 5);
1036 (%i1) load ("descriptive")$
1037 (%i2) s2 : read_matrix (file_search ("wind.data"))$
1039 (%i3) noncentral_moment (s2, 5);
1040 (%o3) [319793.87247615046, 320532.19238924625,
1041 391249.56213815557, 2502278.205988911, 1691881.7977422548]
1045 See also function @mrefdot{central_moment}
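The definition translates directly into Python. This sketch uses a hypothetical helper name, exact fractions, and the digits of @code{pidigits.data} rebuilt from their frequencies. It confirms that the first noncentral moment is the mean. It also confirms that the second noncentral moment minus the squared mean equals the variance computed by @code{var}.

```python
from fractions import Fraction

# Digits of pidigits.data rebuilt from their discrete_freq counts.
COUNTS = [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]
DIGITS = [d for d in range(10) for _ in range(COUNTS[d])]


def noncentral_moment_sketch(xs, k):
    """(1/n) times the sum of x_i**k, computed exactly."""
    return Fraction(sum(x ** k for x in xs), len(xs))


m1 = noncentral_moment_sketch(DIGITS, 1)
m2 = noncentral_moment_sketch(DIGITS, 2)
print(m1, m2 - m1 ** 2)   # 471/100 84259/10000
```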
1047 @opencatbox{Categories:}
1048 @category{Package descriptive}
1054 @anchor{central_moment}
1055 @deffn {Function} central_moment @
1056 @fname{central_moment} (@var{list}, @var{k}) @
1057 @fname{central_moment} (@var{matrix}, @var{k})
1059 The central moment of order @math{k}, defined as
1074 $${{1\over{n}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^k}}}$$
1080 The second central moment is equal to the sample variance @code{var}.
1082 @c load ("descriptive")$
1083 @c s1 : read_list (file_search ("pidigits.data"))$
1084 @c central_moment (s1, 2), numer;
1088 (%i1) load ("descriptive")$
1089 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1091 (%i3) central_moment (s1, 2), numer;
1092 (%o3) 8.425899999999999
1095 (%i4) var (s1), numer;
1096 (%o4) 8.425899999999999
1101 Calculation of the third central moment.
1103 @c load ("descriptive")$
1104 @c s2 : read_matrix (file_search ("wind.data"))$
1105 @c central_moment (s2, 3);
1108 (%i1) load ("descriptive")$
1114 (%i2) s2 : read_matrix (file_search ("wind.data"))$
1116 (%i3) central_moment (s2, 3);
1117 (%o3) [11.29584771375004, 16.97988248298583, 5.626661952750102,
1118 37.5986572057918, 25.85981904394192]
1122 See also functions @mref{noncentral_moment} and @mrefdot{mean}
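Here is a matching Python sketch for central moments, with a hypothetical helper name and exact fractions. For the digits of @code{pidigits.data}, rebuilt from their frequencies, the second central moment reproduces the value of @code{var}. Odd central moments vanish for a sample that is symmetric about its mean.

```python
from fractions import Fraction

# Digits of pidigits.data rebuilt from their discrete_freq counts.
COUNTS = [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]
DIGITS = [d for d in range(10) for _ in range(COUNTS[d])]


def central_moment_sketch(xs, k):
    """(1/n) times the sum of (x_i - mean)**k, computed exactly."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** k for x in xs) / len(xs)


print(central_moment_sketch(DIGITS, 2))        # 84259/10000
print(central_moment_sketch([1, 2, 3, 4], 3))  # 0
```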
1124 @opencatbox{Categories:}
1125 @category{Package descriptive}
1132 @deffn {Function} cv @
1133 @fname{cv} (@var{list}) @
1134 @fname{cv} (@var{matrix})
1136 The coefficient of variation is the quotient of the sample standard deviation (@mref{std}) and the @mrefcomma{mean}
1139 @c load ("descriptive")$
1140 @c s1 : read_list (file_search ("pidigits.data"))$
1142 @c s2 : read_matrix (file_search ("wind.data"))$
1146 (%i1) load ("descriptive")$
1147 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1149 (%i3) cv (s1), numer;
1150 (%o3) 0.6162930116383044
1152 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1155 (%o5) [0.4171411291632767, 0.38101703748061055,
1156 0.3619561372346568, 0.3609199356430116, 0.3329249251309538]
1160 See also functions @mref{std} and @mrefdot{mean}
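In Python, the coefficient can be sketched as the ratio of the standard deviation with denominator @math{n} to the mean. The helper name is hypothetical, and the digits of @code{pidigits.data} are rebuilt from their frequencies. Up to rounding, the sketch reproduces the value shown for @code{s1}.

```python
import math

# Digits of pidigits.data rebuilt from their discrete_freq counts.
COUNTS = [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]
DIGITS = [d for d in range(10) for _ in range(COUNTS[d])]


def cv_sketch(xs):
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)   # denominator n, as in std
    return s / m


print(cv_sketch(DIGITS))
```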
1162 @opencatbox{Categories:}
1163 @category{Package descriptive}
1170 @deffn {Function} smin @
1171 @fname{smin} (@var{list}) @
1172 @fname{smin} (@var{matrix})
1174 This is the minimum value of the sample @var{list}.
1175 When the argument is a matrix, @mref{smin} returns
1176 a list containing the minimum values of the columns,
1177 which are associated with statistical variables.
1180 @c load ("descriptive")$
1181 @c s1 : read_list (file_search ("pidigits.data"))$
1183 @c s2 : read_matrix (file_search ("wind.data"))$
1187 (%i1) load ("descriptive")$
1188 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1193 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1196 (%o5) [0.58, 0.5, 2.67, 5.25, 5.17]
1200 See also function @mrefdot{smax}
1202 @opencatbox{Categories:}
1203 @category{Package descriptive}
1210 @deffn {Function} smax @
1211 @fname{smax} (@var{list}) @
1212 @fname{smax} (@var{matrix})
1214 This is the maximum value of the sample @var{list}.
1215 When the argument is a matrix, @mref{smax} returns
1216 a list containing the maximum values of the columns,
1217 which are associated with statistical variables.
1220 @c load ("descriptive")$
1221 @c s1 : read_list (file_search ("pidigits.data"))$
1223 @c s2 : read_matrix (file_search ("wind.data"))$
1227 (%i1) load ("descriptive")$
1228 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1233 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1236 (%o5) [20.25, 21.46, 20.04, 29.63, 27.63]
1240 See also function @mrefdot{smin}
1242 @opencatbox{Categories:}
1243 @category{Package descriptive}
1250 @deffn {Function} range @
1251 @fname{range} (@var{list}) @
1252 @fname{range} (@var{matrix})
1254 The range is the difference between the extreme values.
1259 @c load ("descriptive")$
1260 @c s1 : read_list (file_search ("pidigits.data"))$
1262 @c s2 : read_matrix (file_search ("wind.data"))$
1266 (%i1) load ("descriptive")$
1267 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1272 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1275 (%o5) [19.67, 20.96, 17.369999999999997, 24.38, 22.46]
1279 @opencatbox{Categories:}
1280 @category{Package descriptive}
1287 @deffn {Function} quantile @
1288 @fname{quantile} (@var{list}, @var{p}) @
1289 @fname{quantile} (@var{matrix}, @var{p})
1291 This is the @var{p}-quantile, with @var{p} a number in @math{[0, 1]}, of the sample @var{list}.
1292 Although there are several definitions for the sample quantile (Hyndman, R. J., Fan, Y. (1996) @var{Sample quantiles in statistical packages}. American Statistician, 50, 361-365), the one based on linear interpolation is implemented in package @ref{descriptive-pkg}.
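Hyndman and Fan describe several interpolating definitions. The Python sketch below assumes the one that interpolates between neighbouring order statistics at 0-based position @math{(n-1)p}, their definition 7. This choice is an assumption of the sketch, although it agrees with the definition of @code{median} given later. The helper name is hypothetical, and the digits of @code{pidigits.data} are rebuilt from their frequencies.

```python
import math
from fractions import Fraction

# Digits of pidigits.data rebuilt from their discrete_freq counts.
COUNTS = [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]
DIGITS = [d for d in range(10) for _ in range(COUNTS[d])]


def quantile_sketch(xs, p):
    """Linear interpolation at 0-based position (n - 1) * p of the sorted sample."""
    ys = sorted(xs)
    h = (len(ys) - 1) * Fraction(p)
    lo = math.floor(h)
    if lo == len(ys) - 1:          # p == 1 gives the maximum
        return ys[lo]
    return ys[lo] + (h - lo) * (ys[lo + 1] - ys[lo])


q1, q2, q3 = (quantile_sketch(DIGITS, Fraction(k, 4)) for k in (1, 2, 3))
print(q1, q2, q3, q3 - q1)   # 2 9/2 29/4 21/4 (quartiles, median, qrange)
```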
1296 Input is a list. First and third quartiles are computed.
1299 @c load ("descriptive")$
1300 @c s1 : read_list (file_search ("pidigits.data"))$
1301 @c [quantile (s1, 1/4), quantile (s1, 3/4)], numer;
1304 (%i1) load ("descriptive")$
1305 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1307 (%i3) [quantile (s1, 1/4), quantile (s1, 3/4)], numer;
1312 Input is a matrix. First quartile is computed for each column.
1315 @c load ("descriptive")$
1316 @c s2 : read_matrix (file_search ("wind.data"))$
1317 @c quantile (s2, 1/4);
1320 (%i1) load ("descriptive")$
1321 (%i2) s2 : read_matrix (file_search ("wind.data"))$
1323 (%i3) quantile (s2, 1/4);
1324 (%o3) [7.2575, 7.477500000000001, 7.82, 11.28, 11.48]
1328 @opencatbox{Categories:}
1329 @category{Package descriptive}
1336 @deffn {Function} median @
1337 @fname{median} (@var{list}) @
1338 @fname{median} (@var{matrix})
1340 Once the sample is ordered, if the sample size is odd the median is the central value, otherwise it is the mean of the two central values.
1345 @c load ("descriptive")$
1346 @c s1 : read_list (file_search ("pidigits.data"))$
1348 @c s2 : read_matrix (file_search ("wind.data"))$
1352 (%i1) load ("descriptive")$
1353 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1360 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1363 (%o5) [10.059999999999999, 9.855, 10.73, 15.48, 14.105]
1367 The median is the 1/2-quantile.
1369 See also function @mrefdot{quantile}
1371 @opencatbox{Categories:}
1372 @category{Package descriptive}
1379 @deffn {Function} qrange @
1380 @fname{qrange} (@var{list}) @
1381 @fname{qrange} (@var{matrix})
1383 The interquartile range is the difference between the third and first quartiles, @code{quantile(@var{list},3/4) - quantile(@var{list},1/4)},
1386 @c load ("descriptive")$
1387 @c s1 : read_list (file_search ("pidigits.data"))$
1389 @c s2 : read_matrix (file_search ("wind.data"))$
1393 (%i1) load ("descriptive")$
1394 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1401 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1404 (%o5) [5.385, 5.572499999999998, 6.022500000000001,
1405 8.729999999999999, 6.649999999999999]
1409 See also function @mrefdot{quantile}
1411 @opencatbox{Categories:}
1412 @category{Package descriptive}
1418 @anchor{mean_deviation}
1419 @deffn {Function} mean_deviation @
1420 @fname{mean_deviation} (@var{list}) @
1421 @fname{mean_deviation} (@var{matrix})
1423 The mean deviation, defined as
1438 $${{1\over{n}}{\sum_{i=1}^{n}{|x_{i}-\bar{x}|}}}$$
1444 @c load ("descriptive")$
1445 @c s1 : read_list (file_search ("pidigits.data"))$
1446 @c mean_deviation (s1);
1447 @c s2 : read_matrix (file_search ("wind.data"))$
1448 @c mean_deviation (s2);
1451 (%i1) load ("descriptive")$
1452 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1454 (%i3) mean_deviation (s1);
1459 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1461 (%i5) mean_deviation (s2);
1462 (%o5) [3.2879599999999987, 3.075342, 3.2390700000000003,
1463 4.715664000000001, 4.028546000000002]
1467 See also function @mrefdot{mean}
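Here is a direct Python transcription of the formula, with exact fractions and a hypothetical helper name. The digits of @code{pidigits.data} are rebuilt from their frequencies.

```python
from fractions import Fraction

# Digits of pidigits.data rebuilt from their discrete_freq counts.
COUNTS = [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]
DIGITS = [d for d in range(10) for _ in range(COUNTS[d])]


def mean_deviation_sketch(xs):
    """(1/n) times the sum of |x_i - mean|, computed exactly."""
    m = Fraction(sum(xs), len(xs))
    return sum(abs(x - m) for x in xs) / len(xs)


print(mean_deviation_sketch(DIGITS))   # 51/20
```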
1469 @opencatbox{Categories:}
1470 @category{Package descriptive}
1476 @anchor{median_deviation}
1477 @deffn {Function} median_deviation @
1478 @fname{median_deviation} (@var{list}) @
1479 @fname{median_deviation} (@var{matrix})
1481 The median deviation, defined as
1496 $${{1\over{n}}{\sum_{i=1}^{n}{|x_{i}-med|}}}$$
1498 where @code{med} is the median of @var{list}.
1503 @c load ("descriptive")$
1504 @c s1 : read_list (file_search ("pidigits.data"))$
1505 @c median_deviation (s1);
1506 @c s2 : read_matrix (file_search ("wind.data"))$
1507 @c median_deviation (s2);
1510 (%i1) load ("descriptive")$
1511 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1513 (%i3) median_deviation (s1);
1518 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1520 (%i5) median_deviation (s2);
1521 (%o5) [2.75, 2.7550000000000003, 3.08, 4.315, 3.3099999999999996]
1525 See also function @mrefdot{mean}
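The same computation around the median can be sketched in Python, with hypothetical helper names and exact fractions. The median follows the definition given for @code{median}: the central value, or the mean of the two central values.

```python
from fractions import Fraction

# Digits of pidigits.data rebuilt from their discrete_freq counts.
COUNTS = [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]
DIGITS = [d for d in range(10) for _ in range(COUNTS[d])]


def median_sketch(xs):
    """Central value, or mean of the two central values, of the sorted sample."""
    ys = sorted(xs)
    mid = len(ys) // 2
    if len(ys) % 2 == 1:
        return Fraction(ys[mid])
    return Fraction(ys[mid - 1] + ys[mid], 2)


def median_deviation_sketch(xs):
    """(1/n) times the sum of |x_i - med|."""
    med = median_sketch(xs)
    return sum(abs(x - med) for x in xs) / len(xs)


print(median_sketch(DIGITS), median_deviation_sketch(DIGITS))   # 9/2 51/20
print(median_deviation_sketch([1, 2, 10]))                      # 3
```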
1527 @opencatbox{Categories:}
1528 @category{Package descriptive}
1534 @anchor{harmonic_mean}
1535 @deffn {Function} harmonic_mean @
1536 @fname{harmonic_mean} (@var{list}) @
1537 @fname{harmonic_mean} (@var{matrix})
1539 The harmonic mean, defined as
1556 $${{n}\over{\sum_{i=1}^{n}{{{1}\over{x_{i}}}}}}$$
1562 @c load ("descriptive")$
1563 @c y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
1564 @c harmonic_mean (y), numer;
1565 @c s2 : read_matrix (file_search ("wind.data"))$
1566 @c harmonic_mean (s2);
1569 (%i1) load ("descriptive")$
1570 (%i2) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
1572 (%i3) harmonic_mean (y), numer;
1573 (%o3) 3.9018580276322052
1575 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1577 (%i5) harmonic_mean (s2);
1578 (%o5) [6.948015590052786, 7.391967752360356, 9.055658197151745,
1579 13.441990281936924, 13.01439145898509]
1583 See also functions @mref{mean} and @mrefdot{geometric_mean}
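Exact rational arithmetic makes the harmonic mean of the sample @code{y} above easy to check in Python. The helper name is hypothetical, and the sketch assumes all values are nonzero.

```python
from fractions import Fraction


def harmonic_mean_sketch(xs):
    """n divided by the sum of reciprocals, computed exactly."""
    return len(xs) / sum(Fraction(1, x) for x in xs)


y = [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]
print(harmonic_mean_sketch(y), float(harmonic_mean_sketch(y)))   # 8190/2099, about 3.90186
```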
1585 @opencatbox{Categories:}
1586 @category{Package descriptive}
1593 @anchor{geometric_mean}
1594 @deffn {Function} geometric_mean @
1595 @fname{geometric_mean} (@var{list}) @
1596 @fname{geometric_mean} (@var{matrix})
1598 The geometric mean, defined as
1613 $$\left(\prod_{i=1}^{n}{x_{i}}\right)^{{{1}\over{n}}}$$
1619 @c load ("descriptive")$
1620 @c y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
1621 @c geometric_mean (y), numer;
1622 @c s2 : read_matrix (file_search ("wind.data"))$
1623 @c geometric_mean (s2);
1626 (%i1) load ("descriptive")$
1627 (%i2) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
1629 (%i3) geometric_mean (y), numer;
1630 (%o3) 4.454845412337012
1632 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1634 (%i5) geometric_mean (s2);
1635 (%o5) [8.82476274347979, 9.22652604739361, 10.044267571488904,
1636 14.612741263490207, 13.96184163444275]
1640 See also functions @mref{mean} and @mrefdot{harmonic_mean}
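For long samples the product in the definition can become very large. For positive data, an equivalent form is the exponential of the mean of the logarithms. The sketch below compares both forms with @code{geometric_mean}. The helper names @code{gm_prod} and @code{gm_log} are illustrative and are not part of the package.

@example
load ("descriptive")$
y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
/* gm_prod: hypothetical helper, product first, then the n-th root */
gm_prod (x) := apply ("*", x)^(1/length (x))$
/* gm_log: hypothetical helper, same value via logarithms; assumes x[i] > 0 */
gm_log (x) := exp (apply ("+", map (log, x)) / length (x))$
/* the three values should agree */
float ([gm_prod (y), gm_log (y), geometric_mean (y)]);
@end example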
1642 @opencatbox{Categories:}
1643 @category{Package descriptive}
1650 @deffn {Function} kurtosis @
1651 @fname{kurtosis} (@var{list}) @
1652 @fname{kurtosis} (@var{matrix})
1654 The kurtosis coefficient, defined as
1669 $${{1\over{n s^4}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^4}}-3}$$
1675 @c load ("descriptive")$
1676 @c s1 : read_list (file_search ("pidigits.data"))$
1677 @c kurtosis (s1), numer;
1678 @c s2 : read_matrix (file_search ("wind.data"))$
1682 (%i1) load ("descriptive")$
1683 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1685 (%i3) kurtosis (s1), numer;
1686 (%o3) - 1.273247946514421
1688 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1690 (%i5) kurtosis (s2);
1691 (%o5) [- 0.2715445622195385, 0.119998784429451,
1692 - 0.42752334904828615, - 0.6405361979019522,
1693 - 0.4952382132352935]
1697 See also functions @mrefcomma{mean} @mref{var} and @mrefdot{skewness}
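In the formula above, @math{s} is the standard deviation with divisor @math{n}, as returned by @code{std}. With that convention the definition can be applied term by term. The sketch below uses an illustrative helper @code{kurt}, which is not a package function. For this list the mean is 5, the squared deviations add up to 66 and the fourth powers add up to 774, so the result is -167/242.

@example
load ("descriptive")$
y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
/* kurt: hypothetical helper; std (divisor n) is the s of the formula */
kurt (x) := block ([n : length (x), xbar : mean (x)],
  apply ("+", (x - xbar)^4) / (n * std (x)^4) - 3)$
/* both values should agree */
[kurt (y), kurtosis (y)];
@end example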
1699 @opencatbox{Categories:}
1700 @category{Package descriptive}
1707 @deffn {Function} skewness @
1708 @fname{skewness} (@var{list}) @
1709 @fname{skewness} (@var{matrix})
1711 The skewness coefficient, defined as
1726 $${{1\over{n s^3}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^3}}}$$
1732 @c load ("descriptive")$
1733 @c s1 : read_list (file_search ("pidigits.data"))$
1734 @c skewness (s1), numer;
1735 @c s2 : read_matrix (file_search ("wind.data"))$
1739 (%i1) load ("descriptive")$
1740 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1742 (%i3) skewness (s1), numer;
1743 (%o3) 0.009196180476450424
1745 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1747 (%i5) skewness (s2);
1748 (%o5) [0.1580509020000978, 0.2926379232061854,
1749 0.09242174416107717, 0.20599843481486865, 0.21425202488908313]
1753 See also functions @mrefcomma{mean} @mref{var} and @mrefdot{kurtosis}
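Here @math{s} is again the standard deviation with divisor @math{n} (function @code{std}). The sketch below applies the definition directly, using an illustrative helper @code{skew} that is not a package function. For this list the cubed deviations from the mean 5 add up to 54, and the result is about 0.3631.

@example
load ("descriptive")$
y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
/* skew: hypothetical helper following the formula above */
skew (x) := block ([n : length (x), xbar : mean (x)],
  apply ("+", (x - xbar)^3) / (n * std (x)^3))$
/* both values should agree */
float ([skew (y), skewness (y)]);
@end example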
1755 @opencatbox{Categories:}
1756 @category{Package descriptive}
1762 @anchor{pearson_skewness}
1763 @deffn {Function} pearson_skewness @
1764 @fname{pearson_skewness} (@var{list}) @
1765 @fname{pearson_skewness} (@var{matrix})
1767 Pearson's skewness coefficient, defined as
1779 $${{3\,\left(\bar{x}-med\right)}\over{s}}$$
1781 where @var{med} is the median of @var{list}.
1786 @c load ("descriptive")$
1787 @c s1 : read_list (file_search ("pidigits.data"))$
1788 @c pearson_skewness (s1), numer;
1789 @c s2 : read_matrix (file_search ("wind.data"))$
1790 @c pearson_skewness (s2);
1793 (%i1) load ("descriptive")$
1794 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1796 (%i3) pearson_skewness (s1), numer;
1797 (%o3) 0.21594840290938955
1799 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1801 (%i5) pearson_skewness (s2);
1802 (%o5) [- 0.08019976629211892, 0.2357036272952649,
1803 0.10509040624912039, 0.12450423405923679, 0.44641817958045193]
1807 See also functions @mrefcomma{mean} @mref{var} and @mrefdot{median}
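In this coefficient @math{s} is the standard deviation with divisor @math{n-1}, as returned by @code{std1}. On the @code{pidigits.data} sample, @code{3*(mean(s1)-median(s1))/std1(s1)} reproduces the value shown above. The sketch below uses a small illustrative list and a hypothetical helper @code{pskew}. For this list the mean is 5, the median is 4 and @code{std1} is 3, so the coefficient is exactly 1.

@example
load ("descriptive")$
x : [2, 4, 4, 5, 10]$
/* pskew: hypothetical helper; note std1 (divisor n-1), not std */
pskew (v) := 3 * (mean (v) - median (v)) / std1 (v)$
/* both values should agree */
[pskew (x), pearson_skewness (x)];
@end example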
1809 @opencatbox{Categories:}
1810 @category{Package descriptive}
1816 @anchor{quartile_skewness}
1817 @deffn {Function} quartile_skewness @
1818 @fname{quartile_skewness} (@var{list}) @
1819 @fname{quartile_skewness} (@var{matrix})
1821 The quartile skewness coefficient, defined as
1827 --------------------
1834 $${{c_{{{3}\over{4}}}-2\,c_{{{1}\over{2}}}+c_{{{1}\over{4}}}}\over{c
1835 _{{{3}\over{4}}}-c_{{{1}\over{4}}}}}$$
1837 where @math{c_p} is the @var{p}-quantile of sample @var{list}.
1842 @c load ("descriptive")$
1843 @c s1 : read_list (file_search ("pidigits.data"))$
1844 @c quartile_skewness (s1), numer;
1845 @c s2 : read_matrix (file_search ("wind.data"))$
1846 @c quartile_skewness (s2);
1849 (%i1) load ("descriptive")$
1850 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1852 (%i3) quartile_skewness (s1), numer;
1853 (%o3) 0.047619047619047616
1855 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1857 (%i5) quartile_skewness (s2);
1858 (%o5) [- 0.040854224698235304, 0.14670255720053824,
1859 0.033623910336239196, 0.03780068728522298, 0.2105263157894735]
1863 See also function @mrefdot{quantile}
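The coefficient can be rebuilt from @code{quantile} by applying the definition literally. The helper @code{qskew} is hypothetical. For the illustrative list below, @code{quantile} gives 4, 4 and 5 for @math{p = 1/4, 1/2, 3/4}, so the coefficient is (5 - 8 + 4)/(5 - 4) = 1.

@example
load ("descriptive")$
x : [2, 4, 4, 5, 10]$
/* qskew: hypothetical helper; q1, q2, q3 are the p-quantiles */
qskew (v) := block ([q1 : quantile (v, 1/4), q2 : quantile (v, 1/2),
                     q3 : quantile (v, 3/4)],
  (q3 - 2*q2 + q1) / (q3 - q1))$
/* both values should agree */
[qskew (x), quartile_skewness (x)];
@end example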
1865 @opencatbox{Categories:}
1866 @category{Package descriptive}
1873 @deffn {Function} km @
1874 @fname{km} (@var{list}, @var{option} ...) @
1875 @fname{km} (@var{matrix}, @var{option} ...)
1877 Kaplan-Meier estimator of the survival, or reliability, function @math{S(x)=1-F(x)}.
1879 Data can be introduced as a list of pairs, or as a two-column matrix. The first
1880 component is the observed time, and the second component a censoring index
1881 (1 = non-censored, 0 = right-censored).
1883 The optional argument is the name of the variable in the returned expression,
1884 which is @var{x} by default.
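The estimate is a product-limit computation. At each distinct observed time @math{t_i} up to @math{t}, the running value is multiplied by @math{1 - d_i/r_i}. Here @math{d_i} is the number of non-censored observations at @math{t_i}, and @math{r_i} is the number of observations with time greater than or equal to @math{t_i}. Times with only censored observations give @math{d_i = 0} and leave the value unchanged. For the sample of the first example below, this gives 3/4 on [2,3), 1/2 on [3,8) and 0 from 8 on, in agreement with the expression returned by @code{km}. A sketch follows. The helper @code{km_at} is hypothetical and is written only to illustrate the formula.

@example
load ("descriptive")$
data : [[2,1], [3,1], [5,0], [8,1]]$
/* km_at: hypothetical helper, product-limit estimate of S(t) */
km_at (d, t) := block ([s : 1],
  for ti in unique (map (first, d)) do
    if ti <= t then block ([di, ri],
      /* non-censored observations exactly at ti */
      di : length (sublist (d, lambda ([p], is (p[1] = ti and p[2] = 1)))),
      /* observations still at risk at ti */
      ri : length (sublist (d, lambda ([p], is (p[1] >= ti)))),
      s : s * (1 - di/ri)),
  s)$
makelist (km_at (data, u), u, [1, 2, 4, 6, 8]);
/* compare with the package estimator at t = 4 */
subst (t = 4, km (data, t));
@end example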
1888 Sample as a list of pairs.
1891 @c load ("descriptive")$
1892 @c S: km([[2,1], [3,1], [5,0], [8,1]]);
1895 @c line_width = 3, grid = true,
1896 @c explicit(S, x, -0.1, 10))$
1899 (%i1) load ("descriptive")$
1901 (%i2) S: km([[2,1], [3,1], [5,0], [8,1]]);
1902 charfun((3 <= x) and (x < 8))
1903 (%o2) charfun(x < 0) + -----------------------------
1905 3 charfun((2 <= x) and (x < 3))
1906 + -------------------------------
1908 + charfun((0 <= x) and (x < 2))
1910 (%i3) load ("draw")$
1913 line_width = 3, grid = true,
1914 explicit(S, x, -0.1, 10))$
1918 Estimate survival probabilities.
1921 @c load ("descriptive")$
1922 @c S(t):= ''(km([[2,1], [3,1], [5,0], [8,1]], t)) $
1926 (%i1) load ("descriptive")$
1927 (%i2) S(t):= ''(km([[2,1], [3,1], [5,0], [8,1]], t)) $
1936 @opencatbox{Categories:}
1937 @category{Package descriptive}
1943 @anchor{cdf_empirical}
1944 @deffn {Function} cdf_empirical @
1945 @fname{cdf_empirical} (@var{list}, @var{option} ...) @
1946 @fname{cdf_empirical} (@var{matrix}, @var{option} ...)
1948 Empirical distribution function @math{F(x)}.
1950 Data can be introduced as a list of numbers, or as a one-column matrix.
1952 The optional argument is the name of the variable in the returned expression,
1953 which is @var{x} by default.
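At any point @math{x}, @math{F(x)} is the proportion of observations less than or equal to @math{x}. For the sample in the example below, for instance, @math{F(7) = 7/9}. A sketch of this count follows, using a hypothetical helper @code{ecdf_at}.

@example
load ("descriptive")$
obs : [1, 3, 3, 5, 7, 7, 7, 8, 9]$
/* ecdf_at: hypothetical helper, fraction of observations <= x */
ecdf_at (d, x) := length (sublist (d, lambda ([v], is (v <= x)))) / length (d)$
makelist (ecdf_at (obs, u), u, [0, 3, 7, 9]);
/* compare with the package function at x = 7 */
subst (x = 7, cdf_empirical (obs));
@end example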
1957 Empirical distribution function.
1960 @c load ("descriptive")$
1961 @c F(x):= ''(cdf_empirical([1,3,3,5,7,7,7,8,9]));
1967 @c explicit(F(z), z, -2, 12)) $
1970 (%i1) load ("descriptive")$
1972 (%i2) F(x):= ''(cdf_empirical([1,3,3,5,7,7,7,8,9]));
1973 (%o2) F(x) := (charfun(x >= 9) + charfun(x >= 8)
1974 + 3 charfun(x >= 7) + charfun(x >= 5) + 2 charfun(x >= 3)
1975 + charfun(x >= 1))/9
1988 explicit(F(z), z, -2, 12)) $
1992 @opencatbox{Categories:}
1993 @category{Package descriptive}
2000 @deffn {Function} cov (@var{matrix})
2001 The covariance matrix of the multivariate sample, defined as
2008 S = - > (X - X) (X - X)'
2016 $${S={1\over{n}}{\sum_{j=1}^{n}{\left(X_{j}-\bar{X}\right)\,\left(X_{j}-\bar{X}\right)'}}}$$
2018 where @math{X_j} is the @math{j}-th row of the sample matrix.
2023 @c load ("descriptive")$
2024 @c s2 : read_matrix (file_search ("wind.data"))$
2029 (%i1) load ("descriptive")$
2030 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2031 (%i3) fpprintprec : 7$
2034 [ 17.22191 13.61811 14.37217 19.39624 15.42162 ]
2036 [ 13.61811 14.98774 13.30448 15.15834 14.9711 ]
2038 (%o4) [ 14.37217 13.30448 15.47573 17.32544 16.18171 ]
2040 [ 19.39624 15.15834 17.32544 32.17651 20.44685 ]
2042 [ 15.42162 14.9711 16.18171 20.44685 24.42308 ]
2046 See also function @mrefdot{cov1}
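The defining sum can be written out explicitly. Here @code{mean} of a matrix returns the list of column means. Each centered row is turned into a column vector and multiplied by its transpose. The sketch below uses a small hypothetical matrix @code{X} and an illustrative helper @code{cov_by_rows}.

@example
load ("descriptive")$
X : matrix ([1, 2], [3, 5], [4, 4])$
/* cov_by_rows: hypothetical helper following the defining sum */
cov_by_rows (m) := block ([n : length (m), xbar : mean (m), p, S, d],
  p : length (xbar),
  S : zeromatrix (p, p),
  for j thru n do (
    d : m[j] - xbar,          /* centered j-th row, as a list */
    S : S + transpose (matrix (d)) . matrix (d)),
  S / n)$
/* the difference should be the zero matrix */
ratsimp (cov_by_rows (X) - cov (X));
@end example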
2048 @opencatbox{Categories:}
2049 @category{Package descriptive}
2056 @deffn {Function} cov1 (@var{matrix})
2057 The covariance matrix of the multivariate sample, defined as
2064 S = --- > (X - X) (X - X)'
2072 $${{1\over{n-1}}{\sum_{j=1}^{n}{\left(X_{j}-\bar{X}\right)\,\left(X_{j}-\bar{X}\right)'}}}$$
2074 where @math{X_j} is the @math{j}-th row of the sample matrix.
2079 @c load ("descriptive")$
2080 @c s2 : read_matrix (file_search ("wind.data"))$
2085 (%i1) load ("descriptive")$
2086 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2087 (%i3) fpprintprec : 7$
2090 [ 17.39587 13.75567 14.51734 19.59216 15.5774 ]
2092 [ 13.75567 15.13913 13.43887 15.31145 15.12232 ]
2094 (%o4) [ 14.51734 13.43887 15.63205 17.50044 16.34516 ]
2096 [ 19.59216 15.31145 17.50044 32.50153 20.65338 ]
2098 [ 15.5774 15.12232 16.34516 20.65338 24.66977 ]
2102 See also function @mrefdot{cov}
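The two definitions differ only in the divisor, so @code{cov1(X)} equals @code{n/(n-1)} times @code{cov(X)}, where @math{n} is the number of rows. The following quick check uses a small hypothetical matrix.

@example
load ("descriptive")$
X : matrix ([1, 2], [3, 5], [4, 4])$
n : length (X)$         /* number of rows (observations) */
/* should be the zero matrix */
D : cov1 (X) - n/(n-1) * cov (X);
@end example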
2104 @opencatbox{Categories:}
2105 @category{Package descriptive}
2111 @anchor{global_variances}
2112 @deffn {Function} global_variances @
2113 @fname{global_variances} (@var{matrix}) @
2114 @fname{global_variances} (@var{matrix}, @var{options} ...)
2116 Function @code{global_variances} returns a list of global variance measures:
2120 @var{total variance}: @code{trace(S_1)},
2122 @var{mean variance}: @code{trace(S_1)/p},
2124 @var{generalized variance}: @code{determinant(S_1)},
2126 @var{generalized standard deviation}: @code{sqrt(determinant(S_1))},
2128 @var{effective variance}: @code{determinant(S_1)^(1/p)} (defined in Pe@~na, D. (2002) @var{An@'alisis de datos multivariantes}, McGraw-Hill, Madrid),
2130 @var{effective standard deviation}: @code{determinant(S_1)^(1/(2*p))}.
2132 where @var{p} is the dimension of the multivariate random variable and @math{S_1} the covariance matrix returned by @code{cov1}.
2138 @code{'data}, default @code{'true}, indicates whether the input matrix contains the sample data,
2139 in which case the covariance matrix is computed internally with @code{cov1}; when it is @code{'false},
2140 the input must be the (symmetric) covariance matrix itself, instead of the data.
2146 @c load ("descriptive")$
2147 @c s2 : read_matrix (file_search ("wind.data"))$
2148 @c global_variances (s2);
2151 (%i1) load ("descriptive")$
2152 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2154 (%i3) global_variances (s2);
2155 (%o3) [105.33834206060595, 21.06766841212119, 12874.34690469686,
2156 113.46517926085015, 6.636590811800794, 2.5761581496097623]
2160 Calculate the @code{global_variances} from the covariance matrix.
2163 @c load ("descriptive")$
2164 @c s2 : read_matrix (file_search ("wind.data"))$
2166 @c global_variances (s, data=false);
2169 (%i1) load ("descriptive")$
2170 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2171 (%i3) s : cov1 (s2)$
2173 (%i4) global_variances (s, data=false);
2174 (%o4) [105.33834206060595, 21.06766841212119, 12874.34690469686,
2175 113.46517926085015, 6.636590811800794, 2.5761581496097623]
2179 See also @mref{cov} and @mrefdot{cov1}
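The six measures can be computed directly from @code{cov1}, in the order of the list above. The sketch below uses a small hypothetical matrix and an illustrative helper @code{gv}. It writes out the diagonal sum rather than calling a trace function.

@example
load ("descriptive")$
X : matrix ([1, 2], [3, 5], [4, 4])$
/* gv: hypothetical helper returning the six measures in order */
gv (m) := block ([S1 : cov1 (m), p, tr, dt],
  p : length (S1),          /* dimension of the variable */
  tr : apply ("+", makelist (S1[i, i], i, 1, p)),
  dt : determinant (S1),
  [tr, tr/p, dt, sqrt (dt), dt^(1/p), dt^(1/(2*p))])$
/* both lists should agree */
float ([gv (X), global_variances (X)]);
@end example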
2181 @opencatbox{Categories:}
2182 @category{Package descriptive}
2189 @deffn {Function} cor @
2190 @fname{cor} (@var{matrix}) @
2191 @fname{cor} (@var{matrix}, @var{logical_value})
2193 The correlation matrix of the multivariate sample.
2199 @code{'data}, default @code{'true}, indicates whether the input matrix contains the sample data,
2200 in which case the covariance matrix is computed internally with @code{cov1}; when it is @code{'false},
2201 the input must be the (symmetric) covariance matrix itself, instead of the data.
2207 @c load ("descriptive")$
2208 @c fpprintprec : 7 $
2209 @c s2 : read_matrix (file_search ("wind.data"))$
2213 (%i1) load ("descriptive")$
2214 (%i2) fpprintprec : 7 $
2215 (%i3) s2 : read_matrix (file_search ("wind.data"))$
2218 [ 1.0 0.8476339 0.8803515 0.8239624 0.7519506 ]
2220 [ 0.8476339 1.0 0.8735834 0.6902622 0.782502 ]
2222 (%o4) [ 0.8803515 0.8735834 1.0 0.7764065 0.8323358 ]
2224 [ 0.8239624 0.6902622 0.7764065 1.0 0.7293848 ]
2226 [ 0.7519506 0.782502 0.8323358 0.7293848 1.0 ]
2230 Calculate the correlation matrix from the covariance matrix.
2233 @c load ("descriptive")$
2234 @c fpprintprec : 7 $
2235 @c s2 : read_matrix (file_search ("wind.data"))$
2237 @c cor (s, data=false); /* this is faster */
2240 (%i1) load ("descriptive")$
2241 (%i2) fpprintprec : 7 $
2242 (%i3) s2 : read_matrix (file_search ("wind.data"))$
2243 (%i4) s : cov1 (s2)$
2245 (%i5) cor (s, data=false); /* this is faster */
2246 [ 1.0 0.8476339 0.8803515 0.8239624 0.7519506 ]
2248 [ 0.8476339 1.0 0.8735834 0.6902622 0.782502 ]
2250 (%o5) [ 0.8803515 0.8735834 1.0 0.7764065 0.8323358 ]
2252 [ 0.8239624 0.6902622 0.7764065 1.0 0.7293848 ]
2254 [ 0.7519506 0.782502 0.8323358 0.7293848 1.0 ]
2258 See also @mref{cov} and @mrefdot{cov1}
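Each entry of the correlation matrix is the covariance @math{s_ij} divided by @math{sqrt(s_ii s_jj)}. The divisor of the covariance cancels, so starting from @code{cov} or from @code{cov1} gives the same matrix. The sketch below uses a small hypothetical matrix and an illustrative helper @code{cor_from_cov}.

@example
load ("descriptive")$
X : matrix ([1, 2], [3, 5], [4, 4])$
/* cor_from_cov: hypothetical helper, s_ij / sqrt(s_ii * s_jj) */
cor_from_cov (S) := block ([p : length (S)],
  genmatrix (lambda ([i, j], S[i, j] / sqrt (S[i, i] * S[j, j])), p, p))$
/* both should match cor (X) */
float (cor_from_cov (cov1 (X)));
float (cor_from_cov (cov (X)));
@end example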
2260 @opencatbox{Categories:}
2261 @category{Package descriptive}
2267 @anchor{list_correlations}
2268 @deffn {Function} list_correlations @
2269 @fname{list_correlations} (@var{matrix}) @
2270 @fname{list_correlations} (@var{matrix}, @var{options} ...)
2272 Function @code{list_correlations} returns a list of correlation measures:
2277 @var{precision matrix}: the inverse of the covariance matrix @math{S_1},
2288 $${S_{1}^{-1}}={\left(s^{ij}\right)_{i,j=1,2,\ldots, p}}$$
2292 @var{multiple correlation vector}: @math{(R_1^2, R_2^2, ..., R_p^2)}, with
2305 $${R_{i}^{2}}={1-{{1}\over{s^{ii}s_{ii}}}}$$
2307 being an indicator of the goodness of fit of the linear regression model for @math{X_i} when the remaining variables are used as regressors.
2310 @var{partial correlation matrix}: with element @math{(i, j)} being
2317 ij.rest / ii jj\ 1/2
2324 $${r_{ij.rest}}={-{{s^{ij}}\over \sqrt{s^{ii}s^{jj}}}}$$
2333 @code{'data}, default @code{'true}, indicates whether the input matrix contains the sample data,
2334 in which case the covariance matrix is computed internally with @code{cov1}; when it is @code{'false},
2335 the input must be the (symmetric) covariance matrix itself, instead of the data.
2341 @c load ("descriptive")$
2342 @c s2 : read_matrix (file_search ("wind.data"))$
2343 @c z : list_correlations (s2)$
2345 @c precision_matrix: z[1];
2346 @c multiple_correlation_vector: z[2];
2347 @c partial_correlation_matrix: z[3];
2350 (%i1) load ("descriptive")$
2351 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2352 (%i3) z : list_correlations (s2)$
2353 (%i4) fpprintprec : 5$
2355 (%i5) precision_matrix: z[1];
2357 [ 0.38486 - 0.13856 - 0.15626 - 0.10239 0.031179 ]
2359 [ - 0.13856 0.34107 - 0.15233 0.038447 - 0.052842 ]
2361 [ - 0.15626 - 0.15233 0.47296 - 0.024816 - 0.10054 ]
2363 [ - 0.10239 0.038447 - 0.024816 0.10937 - 0.034033 ]
2365 [ 0.031179 - 0.052842 - 0.10054 - 0.034033 0.14834 ]
2368 (%i6) multiple_correlation_vector: z[2];
2369 (%o6) [0.85063, 0.80634, 0.86474, 0.71867, 0.72675]
2372 (%i7) partial_correlation_matrix: z[3];
2373 [ - 1.0 0.38244 0.36627 0.49908 - 0.13049 ]
2375 [ 0.38244 - 1.0 0.37927 - 0.19907 0.23492 ]
2377 (%o7) [ 0.36627 0.37927 - 1.0 0.10911 0.37956 ]
2379 [ 0.49908 - 0.19907 0.10911 - 1.0 0.26719 ]
2381 [ - 0.13049 0.23492 0.37956 0.26719 - 1.0 ]
2385 See also @mref{cov} and @mrefdot{cov1}
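Given the precision matrix, the multiple and partial correlations follow from the formulas above. The sketch below uses a small hypothetical three-column sample and an illustrative helper @code{corr_measures}. As in the example output, the diagonal of the partial correlation matrix comes out as -1.

@example
load ("descriptive")$
X : matrix ([1, 2, 0], [3, 5, 1], [4, 4, 3], [2, 6, 2])$
corr_measures (m) := block ([S1 : cov1 (m), P, p],
  P : invert (S1),          /* precision matrix */
  p : length (S1),
  [P,
   /* R_i^2 = 1 - 1/(s^ii s_ii) */
   makelist (1 - 1/(P[i, i] * S1[i, i]), i, 1, p),
   /* r_ij.rest = -s^ij / sqrt(s^ii s^jj) */
   genmatrix (lambda ([i, j], -P[i, j] / sqrt (P[i, i] * P[j, j])), p, p)])$
z : corr_measures (X)$
float (z[2]);
@end example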
2387 @opencatbox{Categories:}
2388 @category{Package descriptive}
2395 @anchor{principal_components}
2396 @deffn {Function} principal_components @
2397 @fname{principal_components} (@var{matrix}) @
2398 @fname{principal_components} (@var{matrix}, @var{options} ...)
2400 Calculates the principal components of a multivariate sample. Principal components are
2401 used in multivariate statistical analysis to reduce the dimensionality of the sample.
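The variances of the principal components are the eigenvalues of the covariance matrix, sorted in decreasing order. When @code{data=true}, that covariance matrix is the one given by @code{cov1}. Each percentage is 100 times an eigenvalue divided by the sum of all eigenvalues, that is, by the total variance. For the covariance matrix of the second example below, the eigenvalues are 3 + 2 sqrt(2), 2 and 3 - 2 sqrt(2), which add up to 8. The sketch below reproduces these two results with the built-in @code{eigenvalues}. It does not compute the rotation matrix.

@example
S : matrix ([1, -2, 0], [-2, 5, 0], [0, 0, 2])$
/* eigenvalues returns [values, multiplicities]; every multiplicity
   is 1 here, so the list of values can be used as it is */
vals : reverse (sort (float (first (eigenvalues (S)))))$
pcts : 100 * vals / apply ("+", vals)$    /* share of total variance */
[vals, pcts];
@end example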
2407 @code{'data}, default @code{'true}, indicates whether the input matrix contains the sample data,
2408 in which case the covariance matrix is computed internally with @mref{cov1}; when it is @code{'false},
2409 the input must be the (symmetric) covariance matrix itself, instead of the data.
2412 The output of function @code{principal_components} is a list with the following results:
2416 variances of the principal components,
2418 percentage of total variance explained by each principal component,
2425 In this sample, the first component explains 83.13 per cent of total
2429 (%i1) load ("descriptive")$
2430 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2431 (%i3) fpprintprec:4 $
2432 (%i4) res: principal_components(s2);
2433 0 errors, 0 warnings
2434 (%o4) [[87.57, 8.753, 5.515, 1.889, 1.613],
2435 [83.13, 8.31, 5.235, 1.793, 1.531],
2437 [ .4149 .03379 - .4757 - 0.581 - .5126 ]
2439 [ 0.369 - .3657 - .4298 .7237 - .1469 ]
2441 [ .3959 - .2178 - .2181 - .2749 .8201 ]]
2443 [ .5548 .7744 .1857 .2319 .06498 ]
2445 [ .4765 - .4669 0.712 - .09605 - .1969 ]
2447 (%i5) /* accumulated percentages */
2448 block([ap: copy(res[2])],
2449 for k:2 thru length(ap) do ap[k]: ap[k]+ap[k-1],
2451 (%o5) [83.13, 91.44, 96.68, 98.47, 100.0]
2452 (%i6) /* sample dimension */
2453 p: length(first(res));
2455 (%i7) /* plot percentages to select number of
2456 principal components for further work */
2459 apply(bars, makelist([k, res[2][k], 1/2], k, p)),
2460 points_joined = true,
2461 point_type = filled_circle,
2463 points(makelist([k, res[2][k]], k, p)),
2464 xlabel = "Variances",
2465 ylabel = "Percentages",
2466 xtics = setify(makelist([concat("PC",k),k], k, p))) $
2469 If the covariance matrix is known, it can be passed to the function instead of the data,
2470 but then option @code{data=false} must be used.
2473 (%i1) load ("descriptive")$
2474 (%i2) S: matrix([1,-2,0],[-2,5,0],[0,0,2]);
2480 (%i3) fpprintprec:4 $
2481 (%i4) /* the argument is a covariance matrix */
2482 res: principal_components(S, data=false);
2483 0 errors, 0 warnings
2484 [ - .3827 0.0 .9239 ]
2486 (%o4) [[5.828, 2.0, .1716], [72.86, 25.0, 2.145], [ .9239 0.0 .3827 ]]
2489 (%i5) /* transformation to get the principal components
2490 from original records */
2491 matrix([a1,b1,c1],[a2,b2,c2]).last(res);
2492 [ .9239 b1 - .3827 a1 1.0 c1 .3827 b1 + .9239 a1 ]
2494 [ .9239 b2 - .3827 a2 1.0 c2 .3827 b2 + .9239 a2 ]
2497 @opencatbox{Categories:}
2498 @category{Package descriptive}
2504 @node Functions and Variables for statistical graphs, , Functions and Variables for descriptive statistics, descriptive-pkg
2505 @section Functions and Variables for statistical graphs
2510 @deffn {Function} barsplot (@var{data1}, @var{data2}, @dots{}, @var{option_1}, @var{option_2}, @dots{})
2512 Plots bar diagrams for discrete statistical variables,
2513 for either one or multiple samples.
2515 @var{data} can be a list of outcomes representing one sample, or a
2516 matrix of @var{m} rows and @var{n} columns, representing @var{n} samples of size
2519 Available options are:
2524 @var{box_width} (default, @code{3/4}): relative width of rectangles. This
2525 value must be in the range @code{[0,1]}.
2528 @var{grouping} (default, @code{clustered}): indicates how multiple samples are
2529 shown. Valid values are: @code{clustered} and @code{stacked}.
2532 @var{groups_gap} (default, @code{1}): a positive integer number representing
2533 the gap between two consecutive groups of bars.
2536 @var{bars_colors} (default, @code{[]}): a list of colors for multiple samples.
2537 When there are more samples than specified colors, the extra necessary colors
2538 are chosen at random. See @code{color} to learn more about them.
2541 @var{frequency} (default, @code{absolute}): indicates the scale of the
2542 ordinates. Possible values are: @code{absolute}, @code{relative},
2546 @var{ordering} (default, @code{orderlessp}): possible values are @code{orderlessp} or @code{ordergreatp},
2547 indicating how statistical outcomes should be ordered on the @var{x}-axis.
2550 @var{sample_keys} (default, @code{[]}): a list with the strings to be used in the legend.
2551 When the list length is other than 0 or the number of samples, an error message is returned.
2554 @var{start_at} (default, @code{0}): indicates where the plot begins to be plotted on the
2558 All global @code{draw} options, except @code{xtics}, which is
2559 internally assigned by @code{barsplot}.
2560 If you want to set your own values for this option or want to build
2561 complex scenes, make use of @code{barsplot_description}. See example below.
2564 The following local @ref{draw-pkg} options: @mrefcomma{key} @mrefcomma{color_draw}
2565 @mrefcomma{fill_color} @mref{fill_density} and @mrefdot{line_width}
2571 There is also a function @code{wxbarsplot} for creating embedded
2572 bar diagrams in the wxMaxima and iMaxima interfaces. @code{barsplot} in a
2577 Univariate sample in matrix form. Absolute frequencies.
2580 @c load ("descriptive")$
2581 @c m : read_matrix (file_search ("biomed.data"))$
2585 @c xlabel = "years",
2587 @c fill_density = 3/4)$
2590 (%i1) load ("descriptive")$
2591 (%i2) m : read_matrix (file_search ("biomed.data"))$
2598 fill_density = 3/4)$
2602 Two samples of different sizes, with
2603 relative frequencies and user declared colors.
2606 @c load ("descriptive")$
2607 @c l1:makelist(random(10),k,1,50)$
2608 @c l2:makelist(random(10),k,1,100)$
2612 @c fill_density = 1,
2613 @c bars_colors = [black, grey],
2614 @c frequency = relative,
2615 @c sample_keys = ["A", "B"])$
2618 (%i1) load ("descriptive")$
2619 (%i2) l1:makelist(random(10),k,1,50)$
2620 (%i3) l2:makelist(random(10),k,1,100)$
2626 bars_colors = [black, grey],
2627 frequency = relative,
2628 sample_keys = ["A", "B"])$
2632 Four non-numeric samples of equal size.
2635 @c load ("descriptive")$
2637 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2638 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2639 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2640 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2641 @c title = "Asking for something to four groups",
2642 @c ylabel = "# of individuals",
2644 @c fill_density = 0.5,
2645 @c ordering = ordergreatp)$
2648 (%i1) load ("descriptive")$
2651 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2652 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2653 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2654 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2655 title = "Asking for something to four groups",
2656 ylabel = "# of individuals",
2659 ordering = ordergreatp)$
2666 @c load ("descriptive")$
2668 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2669 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2670 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2671 @c makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2672 @c title = "Asking for something to four groups",
2673 @c ylabel = "# of individuals",
2674 @c grouping = stacked,
2675 @c fill_density = 0.5,
2676 @c ordering = ordergreatp)$
2679 (%i1) load ("descriptive")$
2682 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2683 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2684 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2685 makelist([Yes, No, Maybe][random(3)+1],k,1,50),
2686 title = "Asking for something to four groups",
2687 ylabel = "# of individuals",
2690 ordering = ordergreatp)$
2694 For options related to bar diagrams, see @mref{bars} of package @ref{draw-pkg}.
2695 See also functions @mref{histogram} and @mrefdot{piechart}
2697 @opencatbox{Categories:}
2698 @category{Package descriptive}
2703 @anchor{barsplot_description}
2704 @deffn {Function} barsplot_description (@dots{})
2706 Function @code{barsplot_description} creates a graphic object
2707 suitable for creating complex scenes, together with other
2710 Example: @code{barsplot} in a multiplot context.
2713 (%i1) load ("descriptive")$
2714 (%i2) l1:makelist(random(10),k,1,50)$
2715 (%i3) l2:makelist(random(10),k,1,100)$
2717 barsplot_description(
2721 bars_colors = [blue],
2722 frequency = relative)$
2724 barsplot_description(
2728 bars_colors = [red],
2729 frequency = relative)$
2730 (%i6) draw(gr2d(bp1), gr2d(bp2))$
2733 @opencatbox{Categories:}
2734 @category{Package descriptive}
2740 @deffn {Function} boxplot (@var{data}) @
2741 @fname{boxplot} (@var{data}, @var{option_1}, @var{option_2}, @dots{})
2743 This function plots box-and-whisker diagrams. Argument @var{data} can be a list,
2744 which is of limited interest, since these diagrams are mainly used for
2745 comparing different samples; a matrix, so that two or more components of a
2746 multivariate statistical variable can be compared; or a list of samples with
2747 possibly different sample sizes. In fact, @code{boxplot} is the only function
2748 in package @code{descriptive} that accepts this last type of data structure.
2751 The box is plotted from the first quartile to the third, with a horizontal
2752 segment situated at the second quartile or median. By default, lower and
2753 upper whiskers are plotted at the minimum and maximum values,
2754 respectively. Option @var{range} can be used to indicate that values greater
2755 than @code{quantile(x,3/4)+range*(quantile(x,3/4)-quantile(x,1/4))} or
2756 less than @code{quantile(x,1/4)-range*(quantile(x,3/4)-quantile(x,1/4))}
2757 must be considered as outliers, in which case they are plotted as
2758 isolated points, and the whiskers are located at the extremes of the rest of
2761 Available options are:
2766 @var{box_width} (default, @code{3/4}): relative width of boxes.
2767 This value must be in the range @code{[0,1]}.
2770 @var{box_orientation} (default, @code{vertical}): possible values: @code{vertical}
2771 and @code{horizontal}.
2774 @var{range} (default, @code{inf}): positive coefficient of the interquartile range
2775 to set outliers boundaries.
2778 @var{outliers_size} (default, @code{1}): circle size for isolated outliers.
2781 All @code{draw} options, except @code{points_joined}, @code{point_size}, @code{point_type},
2782 @code{xtics}, @code{ytics}, @code{xrange}, and @code{yrange}, which are
2783 internally assigned by @code{boxplot}.
2784 If you want to set your own values for these options or want to build
2785 complex scenes, make use of @code{boxplot_description}.
2788 The following local @code{draw} options: @code{key}, @code{color},
2789 and @code{line_width}.
2793 There is also a function @code{wxboxplot} for creating embedded
2794 box-and-whisker diagrams in the wxMaxima and iMaxima interfaces.
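The outlier rule described for option @var{range} can be applied by hand. Take the first sample of list @code{B} from the examples below with @code{range=1}. There @code{quantile} gives 5 and 7 for the first and third quartiles, so the bounds are 3 and 9, and the values 15 and 1 lie outside them. A sketch follows, using a hypothetical helper @code{box_outliers}.

@example
load ("descriptive")$
b1 : [7, 15, 5, 8, 6, 5, 7, 3, 1]$
/* box_outliers: hypothetical helper, values beyond the range bounds */
box_outliers (x, r) := block ([q1 : quantile (x, 1/4), q3 : quantile (x, 3/4), lo, hi],
  lo : q1 - r * (q3 - q1),
  hi : q3 + r * (q3 - q1),
  sublist (x, lambda ([v], is (v < lo or v > hi))))$
box_outliers (b1, 1);
@end example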
2798 Box-and-whisker diagram from a multivariate sample.
2801 @c load ("descriptive")$
2802 @c s2 : read_matrix(file_search("wind.data"))$
2805 @c title = "Windspeed in knots",
2806 @c xlabel = "Stations",
2811 (%i1) load ("descriptive")$
2812 (%i2) s2 : read_matrix(file_search("wind.data"))$
2816 title = "Windspeed in knots",
2817 xlabel = "Stations",
2823 Box-and-whisker diagram from three samples of different sizes.
2826 @c load ("descriptive")$
2828 @c [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2],
2829 @c [8, 10, 7, 9, 12, 8, 10],
2830 @c [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$
2831 @c boxplot (A, box_orientation = horizontal)$
2834 (%i1) load ("descriptive")$
2837 [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2],
2838 [8, 10, 7, 9, 12, 8, 10],
2839 [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$
2841 (%i3) boxplot (A, box_orientation = horizontal)$
2844 Option @var{range} can be used to handle outliers.
2847 @c load ("descriptive")$
2848 @c B: [[7, 15, 5, 8, 6, 5, 7, 3, 1],
2849 @c [10, 8, 12, 8, 11, 9, 20],
2850 @c [23, 17, 19, 7, 22, 19]] $
2851 @c boxplot (B, range=1)$
2852 @c boxplot (B, range=1.5, box_orientation = horizontal)$
2854 @c boxplot_description(
2858 @c outliers_size = 2,
2860 @c background_color = light_gray),
2861 @c xtics = {["Low",1],["Medium",2],["High",3]}) $
2865 (%i1) load ("descriptive")$
2866 B: [[7, 15, 5, 8, 6, 5, 7, 3, 1],
2867 [10, 8, 12, 8, 11, 9, 20],
2868 [23, 17, 19, 7, 22, 19]] $
2869 boxplot (B, range=1)$
2870 boxplot (B, range=1.5, box_orientation = horizontal)$
2872 boxplot_description(
2878 background_color = light_gray),
2879 xtics = @{["Low",1],["Medium",2],["High",3]@}) $
2883 @opencatbox{Categories:}
2884 @category{Package descriptive}
2889 @anchor{boxplot_description}
2890 @deffn {Function} boxplot_description (@dots{})
2892 Function @code{boxplot_description} creates a graphic object
2893 suitable for creating complex scenes, together with other
2896 @opencatbox{Categories:}
2897 @category{Package descriptive}
2903 @deffn {Function} histogram @
2904 @fname{histogram} (@var{list}) @
2905 @fname{histogram} (@var{list}, @var{option_1}, @var{option_2}, @dots{}) @
2906 @fname{histogram} (@var{one_column_matrix}) @
2907 @fname{histogram} (@var{one_column_matrix}, @var{option_1}, @var{option_2}, @dots{}) @
2908 @fname{histogram} (@var{one_row_matrix}) @
2909 @fname{histogram} (@var{one_row_matrix}, @var{option_1}, @var{option_2}, @dots{})
2911 Constructs and displays a histogram from a data sample.
2912 Data must be stored as a list of numbers, or a matrix of one row or one column.
2919 @code{nclasses} (default, 10):
2920 the number of classes (also called bins) in the histogram,
2921 or a list of two numbers (the least and greatest values included in the histogram),
2922 or a list of three numbers (the least and greatest values included in the histogram, and the number of classes),
2923 or a set containing the endpoints of the class intervals,
2924 or a symbol specifying the name of one of three algorithms to automatically determine the number of classes:
2925 @code{fd} (Ref. [1]), @code{scott} (Ref. [2]), or @code{sturges} (Ref. [3]).
2927 A class interval excludes its left endpoint and includes its right endpoint,
2928 except for the first interval, which includes both the left and right endpoints.
2929 It is assumed that class intervals are contiguous.
2930 That is, the right endpoint of one interval is equal to the left endpoint of the next.
2933 @code{frequency} (default, @code{absolute}): indicates the scale of the vertical axis.
2934 Possible values are: @code{absolute} (heights of bars add up to number of data),
2935 @code{relative} (heights of bars add up to 1),
2936 @code{percent} (heights of bars add up to 100),
2937 and @code{density} (total area of histogram is 1).
2940 @code{htics} (default, @code{auto}): format of tic marks on the horizontal axis.
2941 Possible values are: @code{auto} (tics are placed automatically),
2942 @code{endpoints} (tics are placed at the divisions between classes),
2943 @code{intervals} (classes are labeled with the corresponding intervals),
2944 or a list of labels, one for each class.
2947 All global @code{draw} options, except @code{xrange}, @code{yrange},
2948 and @code{xtics}, which are internally assigned by @code{histogram}.
2949 If you want to set your own values for these options, make use of
2950 @code{histogram_description}.
2953 The following local @ref{draw-pkg} options: @mrefcomma{key}
2954 @mrefcomma{fill_color} @mrefcomma{fill_density} and @mrefdot{line_width}
2955 Note that the outlines of bars,
2956 as well as the interior of bars when @code{fill_density} is nonzero,
2957 are drawn with @code{fill_color}, not @code{color}.
2961 @code{histogram} honors the global option @code{histogram_skyline}.
2962 When @code{histogram_skyline} is @code{true},
2963 @code{histogram} and @code{histogram_description} construct "skyline" plots,
2964 which show the outline of the histogram bars,
2965 instead of drawing all the vertical segments.
2966 Otherwise (the default), histograms are displayed with bars showing vertical segments.
2968 There is also a function @code{wxhistogram} for creating embedded
2969 histograms in the wxMaxima and iMaxima interfaces.
2971 See also @mrefcomma{continuous_freq}
2972 which, like @code{histogram},
2973 counts data in intervals,
2974 but returns the counts instead of displaying a graphic representation.
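The class convention described under @code{nclasses} can be mimicked with a short loop. Each interval is open on the left and closed on the right, except the first, which is closed on both sides. The sketch below only illustrates that convention. The helper @code{class_counts} is hypothetical and is not the package's own code. Values outside the outer endpoints are ignored.

@example
/* class_counts: hypothetical helper; e is the sorted list of endpoints */
class_counts (x, e) := block ([m : length (e) - 1, c],
  c : makelist (0, i, 1, m),
  for v in x do
    for k thru m do
      if (k = 1 and e[1] <= v and v <= e[2])
         or (k > 1 and e[k] < v and v <= e[k+1])
        then c[k] : c[k] + 1,
  c)$
/* 0, 2, 3 and 3 fall in [0,3]; 6 in (3,6]; 7 in (6,7]; 11 in (7,11] */
class_counts ([0, 3, 3, 6, 7, 11, 2], [0, 3, 6, 7, 11]);
@end example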
2976 See also @mrefdot{barsplot}
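The four values of option @code{frequency} rescale the same class counts. Let @math{c_k} be the counts, @math{w_k} the class widths and @math{n} the sample size. The bar heights are then @math{c_k}, @math{c_k/n}, @math{100 c_k/n} and @math{c_k/(n w_k)}, so that under @code{density} the total area is 1. The sketch below assumes hypothetical counts [4, 1, 1, 1] for classes with endpoints 0, 3, 6, 7 and 11. Its variable names avoid @code{relative} and @code{density}, which are also option values.

@example
counts : [4, 1, 1, 1]$
ends : [0, 3, 6, 7, 11]$
n : apply ("+", counts)$                      /* sample size */
widths : makelist (ends[k+1] - ends[k], k, 1, length (counts))$
h_rel : counts / n$                           /* add up to 1 */
h_pct : 100 * counts / n$                     /* add up to 100 */
h_den : counts / (n * widths)$                /* element-wise division */
/* total area of the density histogram */
apply ("+", h_den * widths);
@end example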
2980 A simple histogram with eight classes:
2983 @c load ("descriptive")$
2984 @c s1 : read_list (file_search ("pidigits.data"))$
2988 @c title = "pi digits",
2989 @c xlabel = "digits",
2990 @c ylabel = "Absolute frequency",
2991 @c fill_color = grey,
2992 @c fill_density = 0.6)$
2995 (%i1) load ("descriptive")$
2996 (%i2) s1 : read_list (file_search ("pidigits.data"))$
3001 title = "pi digits",
3003 ylabel = "Absolute frequency",
3005 fill_density = 0.6)$
3009 Setting the limits of the histogram to -2 and 12, with 3 classes.
3010 Also, we introduce predefined tics:
3013 @c load ("descriptive")$
3014 @c s1 : read_list (file_search ("pidigits.data"))$
3017 @c nclasses = [-2,12,3],
3018 @c htics = ["A", "B", "C"],
3020 @c fill_color = "#23afa0",
3021 @c fill_density = 0.6)$
3024 (%i1) load ("descriptive")$
3025 (%i2) s1 : read_list (file_search ("pidigits.data"))$
3029 nclasses = [-2,12,3],
3030 htics = ["A", "B", "C"],
3032 fill_color = "#23afa0",
3033 fill_density = 0.6)$
3037 Bounds for varying class widths.
3040 @c load ("descriptive")$
3041 @c s1 : read_list (file_search ("pidigits.data"))$
3042 @c histogram (s1, nclasses = {0,3,6,7,11})$
3045 (%i1) load ("descriptive")$
3046 (%i2) s1 : read_list (file_search ("pidigits.data"))$
3047 (%i3) histogram (s1, nclasses = @{0,3,6,7,11@})$
3050 Freedman-Diaconis formula for the number of classes.
3053 @c load ("descriptive")$
3054 @c s1 : read_list (file_search ("pidigits.data"))$
3055 @c histogram(s1, nclasses=fd) $
3058 (%i1) load ("descriptive")$
3059 (%i2) s1 : read_list (file_search ("pidigits.data"))$
3060 (%i3) histogram(s1, nclasses=fd) $
3065 [1] Freedman, D., and Diaconis, P. (1981) On the histogram as a density estimator: L_2 theory.
3066 Zeitschrift f@"ur Wahrscheinlichkeitstheorie und verwandte Gebiete 57, 453-476.
3068 [2] Scott, D. W. (1979) On optimal and data-based histograms. Biometrika 66, 605-610.
3070 [3] Sturges, H. A. (1926) The choice of a class interval. Journal of the American Statistical Association 21, 65-66.
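
The three rules cited above can be summarized in a few lines. The following Python sketch (illustration only; the function names are hypothetical, and the package may round, estimate quartiles or compute the standard deviation differently) turns each rule into a number of classes. Sturges [3] gives the number of classes directly, while Scott [2] and Freedman-Diaconis [1] give a class width @math{h}, which is converted into a number of classes by dividing the sample range by @math{h} and rounding up.

```python
# Illustrative implementations of the classic class-count rules [1]-[3].
# Not the package's code: rounding and quartile conventions are assumptions.
import math
import statistics


def classes_from_width(sample, h):
    """Number of classes of width h needed to cover the sample range."""
    if h <= 0:
        return 1  # degenerate sample (e.g. zero spread): a single class
    return max(1, math.ceil((max(sample) - min(sample)) / h))


def sturges(sample):
    """Sturges [3]: k = ceil(log2(n)) + 1."""
    return math.ceil(math.log2(len(sample))) + 1


def scott(sample):
    """Scott [2]: h = 3.49 * s * n^(-1/3), s being the sample standard deviation."""
    n = len(sample)
    return classes_from_width(sample, 3.49 * statistics.stdev(sample) * n ** (-1 / 3))


def freedman_diaconis(sample):
    """Freedman-Diaconis [1]: h = 2 * IQR * n^(-1/3)."""
    n = len(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    return classes_from_width(sample, 2 * (q3 - q1) * n ** (-1 / 3))


data = list(range(1, 101))
print(sturges(data), scott(data), freedman_diaconis(data))  # -> 8 5 5
```

The Freedman-Diaconis rule uses the interquartile range instead of the standard deviation, so it is less sensitive to outliers than Scott's rule.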
3072 @opencatbox{Categories:}
3073 @category{Package descriptive}
3078 @anchor{histogram_description}
3079 @deffn {Function} histogram_description (@dots{})
3081 Creates a graphic object which represents a histogram.
3082 Such an object is suitable for creating complex scenes together with other graphic objects,
3083 to be displayed by @code{draw2d}.
3085 @code{histogram_description} takes the same arguments
3086 as the stand-alone function @code{histogram}.
3087 See @mref{histogram} for more information.
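
The example below sets @code{frequency = density} so that the bars are on the same vertical scale as the normal density drawn with @code{explicit}. Under the usual definition, which this Python sketch assumes (it is an illustration, not the package's code, and @code{density_heights} is a hypothetical name), each bar's height is its count divided by the sample size times the class width, so the bar areas add up to one.

```python
# Hypothetical helper: bar heights for frequency = density, assuming the usual
# definition height = count / (n * class width).
def density_heights(counts, bounds):
    n = sum(counts)
    widths = [b - a for a, b in zip(bounds, bounds[1:])]
    return [c / (n * w) for c, w in zip(counts, widths)]


counts = [3, 6, 1, 1]        # counts for classes with limits 0, 3, 6, 7, 11
bounds = [0, 3, 6, 7, 11]
heights = density_heights(counts, bounds)
# Total area of the bars: sum of height * width over all classes (close to 1)
area = sum(h * (b - a) for h, a, b in zip(heights, bounds, bounds[1:]))
print(area)
```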
3091 We make use of @code{histogram_description} for setting
3092 @code{xrange} and adding an explicit curve into the scene:
3095 (%i1) load ("descriptive")$
3096 (%i2) ( load("distrib"),
3098 s2: random_normal(m, s, 1000) ) $
3102 histogram_description(
3105 frequency = density,
3106 fill_density = 0.5),
3107 explicit(pdf_normal(x,m,s), x, m - 3*s, m + 3* s))$
3110 @opencatbox{Categories:}
3111 @category{Package descriptive}
3116 @anchor{histogram_skyline}
3117 @defvr {Option variable} histogram_skyline
3118 Default value: @code{false}
3120 When @code{histogram_skyline} is @code{true},
3121 @code{histogram} and @code{histogram_description} construct "skyline" plots,
3122 which show the outline of the histogram bars,
3123 instead of drawing all the vertical segments.
3125 The outline is drawn with the current @code{fill_color} (not the current @code{color}).
3126 The interior of the histogram is filled with @code{fill_color},
3127 but only if @code{fill_density} is nonzero.
3129 Otherwise, histograms are displayed with bars showing vertical segments.
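
The geometric difference between the two styles can be sketched in Python (an illustration only; the function names are hypothetical and this is not the package's drawing code). An ordinary histogram is a set of closed rectangles, so every class limit gets a full vertical segment down to the axis; a skyline plot is a single polyline that follows only the outer outline, stepping up or down at each class limit.

```python
# Hypothetical helpers contrasting the two drawing styles.
def bar_rectangles(bounds, heights):
    """Ordinary style: one closed rectangle (four corners) per class."""
    return [[(a, 0), (a, h), (b, h), (b, 0)]
            for a, b, h in zip(bounds, bounds[1:], heights)]


def skyline_outline(bounds, heights):
    """Skyline style: one polyline from the base, along the bar tops, back to the base."""
    points = [(bounds[0], 0)]
    for a, b, h in zip(bounds, bounds[1:], heights):
        points.append((a, h))  # vertical step at the left limit of the class
        points.append((b, h))  # horizontal run along the top of the class
    points.append((bounds[-1], 0))
    return points


print(skyline_outline([0, 1, 2], [2, 3]))
# -> [(0, 0), (0, 2), (1, 2), (1, 3), (2, 3), (2, 0)]
print(len(bar_rectangles([0, 1, 2], [2, 3])))  # -> 2
```

At the internal limit @math{x = 1}, the skyline only draws the step from height 2 to height 3, whereas the two rectangles both draw full segments down to the axis there.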
3133 Construct a skyline histogram,
3134 and an ordinary histogram for comparison,
3138 (%i1) load ("descriptive") $
3139 (%i2) L: read_list (file_search ("pidigits.data")) $
3140 (%i3) histogram_skyline: true $
3141 (%i4) skyline_hist: histogram_description (L) $
3142 (%i5) histogram_skyline: false $
3143 (%i6) ordinary_hist: histogram_description (L) $
3144 (%i7) draw (gr2d (skyline_hist), gr2d (ordinary_hist)) $
3147 Continuing the preceding example.
3148 Set display options for @code{fill_color} and @code{fill_density}.
3151 (%i8) histogram_skyline: true $
3152 (%i9) skyline_hist: histogram_description (L, fill_color = blue, fill_density = 0.2) $
3153 (%i10) histogram_skyline: false $
3154 (%i11) ordinary_hist: histogram_description (L, fill_color = blue, fill_density = 0.2) $
3155 (%i12) draw (gr2d (skyline_hist), gr2d (ordinary_hist)) $
3158 @opencatbox{Categories:}
3159 @category{Package descriptive}
3165 @deffn {Function} piechart @
3166 @fname{piechart} (@var{list}) @
3167 @fname{piechart} (@var{list}, @var{option_1}, @var{option_2}, @dots{}) @
3168 @fname{piechart} (@var{one_column_matrix}) @
3169 @fname{piechart} (@var{one_column_matrix}, @var{option_1}, @var{option_2}, @dots{}) @
3170 @fname{piechart} (@var{one_row_matrix}) @
3171 @fname{piechart} (@var{one_row_matrix}, @var{option_1}, @var{option_2}, @dots{})
3173 Similar to @code{barsplot}, but plots sectors instead of rectangles.
3175 Available options are:
3180 @var{sector_colors} (default, @code{[]}): a list of colors for sectors.
3181 When there are more sectors than specified colors, the additional colors needed
3182 are chosen at random. See @code{color} to learn more about them.
3185 @var{pie_center} (default, @code{[0,0]}): diagram's center.
3188 @var{pie_radius} (default, @code{1}): diagram's radius.
3191 All global @code{draw} options, except @code{key}, which is
3192 internally assigned by @code{piechart}.
3193 If you want to set your own values for this option or want to build
3194 complex scenes, make use of @code{piechart_description}.
3197 The following local @code{draw} options: @code{key}, @code{color},
3198 @code{fill_density} and @code{line_width}. See also
3203 There is also a function @code{wxpiechart} for
3204 creating embedded pie charts in the wxMaxima and iMaxima interfaces.
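
The geometry behind a pie chart can be sketched in Python (illustration only; the names are hypothetical). Each distinct outcome gets a sector whose angle is proportional to its absolute frequency, and the arguments @var{center} and @var{radius} play the roles of the options @var{pie_center} and @var{pie_radius}. Sorting the outcomes in increasing order and starting at angle 0 are assumptions, not necessarily what @code{piechart} does.

```python
# Hypothetical sketch: sector angles (in degrees) and points on the circle.
import math
from collections import Counter


def pie_sectors(sample):
    """Return (outcome, start_angle, end_angle) for each distinct outcome."""
    counts = Counter(sample)
    n = len(sample)
    sectors, start = [], 0.0
    for outcome in sorted(counts):
        end = start + 360.0 * counts[outcome] / n  # angle proportional to frequency
        sectors.append((outcome, start, end))
        start = end
    return sectors


def point_on_circle(center, radius, degrees):
    """Point where a sector boundary at the given angle meets the circle."""
    cx, cy = center
    t = math.radians(degrees)
    return (cx + radius * math.cos(t), cy + radius * math.sin(t))


print(pie_sectors([1, 1, 2, 3]))
# -> [(1, 0.0, 180.0), (2, 180.0, 270.0), (3, 270.0, 360.0)]
print(point_on_circle((1, 2), 3, 0))  # -> (4.0, 2.0)
```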
3209 @c load ("descriptive")$
3210 @c s1 : read_list (file_search ("pidigits.data"))$
3213 @c xrange = [-1.1, 1.3],
3214 @c yrange = [-1.1, 1.1],
3215 @c title = "Digit frequencies in pi")$
3218 (%i1) load ("descriptive")$
3219 (%i2) s1 : read_list (file_search ("pidigits.data"))$
3223 xrange = [-1.1, 1.3],
3224 yrange = [-1.1, 1.1],
3225 title = "Digit frequencies in pi")$
3229 See also function @mrefdot{barsplot}
3231 @opencatbox{Categories:}
3232 @category{Package descriptive}
3237 @anchor{piechart_description}
3238 @deffn {Function} piechart_description (@dots{})
3240 Function @code{piechart_description} creates a graphic object
3241 suitable for creating complex scenes, together with other
3244 @opencatbox{Categories:}
3245 @category{Package descriptive}
3250 @anchor{scatterplot}
3251 @deffn {Function} scatterplot @
3252 @fname{scatterplot} (@var{list}) @
3253 @fname{scatterplot} (@var{list}, @var{option_1}, @var{option_2}, @dots{}) @
3254 @fname{scatterplot} (@var{matrix}) @
3255 @fname{scatterplot} (@var{matrix}, @var{option_1}, @var{option_2}, @dots{})
3257 Plots scatter diagrams for both univariate (@var{list}) and multivariate
3258 (@var{matrix}) samples.
3260 Available options are the same as those admitted by @code{histogram}.
3262 There is also a function @code{wxscatterplot} for
3263 creating embedded scatter plots in the wxMaxima and iMaxima interfaces.
3267 Univariate scatter diagram from a simulated Gaussian sample.
3270 @c load ("descriptive")$
3271 @c load ("distrib")$
3273 @c random_normal(0,1,200),
3276 @c dimensions = [600,150])$
3279 (%i1) load ("descriptive")$
3280 (%i2) load ("distrib")$
3283 random_normal(0,1,200),
3286 dimensions = [600,150])$
3290 Two-dimensional scatter plot.
3293 @c load ("descriptive")$
3294 @c s2 : read_matrix (file_search ("wind.data"))$
3296 @c submatrix(s2, 1,2,3),
3297 @c title = "Data from stations #4 and #5",
3298 @c point_type = diamant,
3303 (%i1) load ("descriptive")$
3304 (%i2) s2 : read_matrix (file_search ("wind.data"))$
3307 submatrix(s2, 1,2,3),
3308 title = "Data from stations #4 and #5",
3309 point_type = diamant,
3315 Three-dimensional scatter plot.
3318 @c load ("descriptive")$
3319 @c s2 : read_matrix (file_search ("wind.data"))$
3320 @c scatterplot(submatrix (s2, 1,2), nclasses=4)$
3323 (%i1) load ("descriptive")$
3324 (%i2) s2 : read_matrix (file_search ("wind.data"))$
3325 (%i3) scatterplot(submatrix (s2, 1,2), nclasses=4)$
3328 Five-dimensional scatter plot, with five-class histograms.
3331 @c load ("descriptive")$
3332 @c s2 : read_matrix (file_search ("wind.data"))$
3336 @c frequency = relative,
3337 @c fill_color = blue,
3338 @c fill_density = 0.3,
3342 (%i1) load ("descriptive")$
3343 (%i2) s2 : read_matrix (file_search ("wind.data"))$
3348 frequency = relative,
3355 For plotting isolated or line-joined points in two and three dimensions,
3356 see @code{points}. See also @mrefdot{histogram}
3358 @opencatbox{Categories:}
3359 @category{Package descriptive}
3364 @anchor{scatterplot_description}
3365 @deffn {Function} scatterplot_description (@dots{})
3367 Function @code{scatterplot_description} creates a graphic object
3368 suitable for creating complex scenes, together with other
3371 @opencatbox{Categories:}
3372 @category{Package descriptive}
3378 @deffn {Function} starplot (@var{data1}, @var{data2}, @dots{}, @var{option_1}, @var{option_2}, @dots{})
3380 Plots star diagrams for discrete statistical variables,
3381 for one or more samples.
3383 @var{data} can be a list of outcomes representing one sample, or a
3384 matrix of @var{m} rows and @var{n} columns, representing @var{n} samples of size
3387 Available options are:
3392 @var{stars_colors} (default, @code{[]}): a list of colors for multiple samples.
3393 When there are more samples than specified colors, the additional colors needed
3394 are chosen at random. See @code{color} to learn more about them.
3397 @var{frequency} (default, @code{absolute}): indicates the scale of the
3398 radii. Possible values are: @code{absolute} and @code{relative}.
3401 @var{ordering} (default, @code{orderlessp}): possible values are @code{orderlessp} or @code{ordergreatp},
3402 indicating how statistical outcomes should be ordered.
3405 @var{sample_keys} (default, @code{[]}): a list with the strings to be used in the legend.
3406 When the list length is other than 0 or the number of samples, an error message is returned.
3410 @var{star_center} (default, @code{[0,0]}): diagram's center.
3413 @var{star_radius} (default, @code{1}): diagram's radius.
3416 All global @code{draw} options, except @code{points_joined}, @code{point_type},
3417 and @code{key}, which are internally assigned by @code{starplot}.
3418 If you want to set your own values for these options or want to build
3419 complex scenes, make use of @code{starplot_description}.
3422 The following local @code{draw} option: @code{line_width}.
3426 There is also a function @code{wxstarplot} for
3427 creating embedded star diagrams in the wxMaxima and iMaxima interfaces.
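
The data behind a single-sample star diagram can be sketched in Python, assuming the common construction: one ray per distinct outcome, at equally spaced angles, whose length is the outcome's absolute or relative frequency (compare the @var{frequency} option). Outcomes are sorted in increasing order, which for numbers matches @code{orderlessp}. The function names are hypothetical, and how @code{starplot} scales rays to @var{star_radius} is not modeled.

```python
# Hypothetical sketch of the rays of a one-sample star diagram.
import math
from collections import Counter


def star_rays(sample, frequency="absolute"):
    """Return (outcome, angle_in_radians, ray_length) for each distinct outcome."""
    counts = Counter(sample)
    outcomes = sorted(counts)            # increasing order, like orderlessp
    n = len(sample)
    step = 2 * math.pi / len(outcomes)   # equally spaced rays
    rays = []
    for k, outcome in enumerate(outcomes):
        c = counts[outcome]
        length = c if frequency == "absolute" else c / n
        rays.append((outcome, k * step, length))
    return rays


def star_vertices(rays, center=(0, 0)):
    """Polygon vertices: the end point of each ray, measured from center."""
    cx, cy = center
    return [(cx + r * math.cos(t), cy + r * math.sin(t)) for _, t, r in rays]


sample = [0, 0, 1, 2, 2, 2]
print([length for _, _, length in star_rays(sample)])  # -> [2, 1, 3]
print(star_vertices(star_rays(sample), center=(1, 2))[0])  # -> (3.0, 2.0)
```

Joining the vertices in order gives the star outline; with @code{frequency = relative} the ray lengths add up to one.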
3431 Plot based on absolute frequencies.
3432 Location and radius defined by the user.
3435 (%i1) load ("descriptive")$
3436 (%i2) l1: makelist(random(10),k,1,50)$
3437 (%i3) l2: makelist(random(10),k,1,200)$
3441 stars_colors = [blue,red],
3442 sample_keys = ["1st sample", "2nd sample"],
3443 star_center = [1,2],
3445 proportional_axes = xy,
3450 @opencatbox{Categories:}
3451 @category{Package descriptive}
3456 @anchor{starplot_description}
3457 @deffn {Function} starplot_description (@dots{})
3459 Function @code{starplot_description} creates a graphic object
3460 suitable for creating complex scenes, together with other
3463 @opencatbox{Categories:}
3464 @category{Package descriptive}
3470 @deffn {Function} stemplot @
3471 @fname{stemplot} (@var{data}) @
3472 @fname{stemplot} (@var{data}, @var{option})
3474 Plots stem-and-leaf diagrams.
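
The idea behind a stem-and-leaf diagram can be sketched in Python, assuming non-negative data (illustration only; the names are hypothetical, and the rounding and layout used by @code{stemplot} may differ). Each value is expressed in multiples of the leaf unit (compare the @var{leaf_unit} option); the last digit of that number is the leaf and the remaining digits form the stem. Stems with no leaves are kept so that gaps in the data stay visible.

```python
# Hypothetical sketch of a stem-and-leaf table for non-negative data.
def stem_and_leaf(data, leaf_unit=1):
    """Return a list of (stem, sorted leaves), one entry per stem in range."""
    rows = []
    for x in data:
        if x < 0:
            raise ValueError("this sketch only handles non-negative data")
        units = int(round(x / leaf_unit))  # value measured in leaf units
        rows.append(divmod(units, 10))     # (stem, leaf)
    stems = [s for s, _ in rows]
    return [(s, sorted(leaf for t, leaf in rows if t == s))
            for s in range(min(stems), max(stems) + 1)]


def render(table):
    """Text layout: stem, a vertical bar, then the leaves."""
    return "\n".join("%3d|%s" % (s, "".join(str(d) for d in leaves))
                     for s, leaves in table)


print(render(stem_and_leaf([12, 15, 31, 7])))
```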
3476 The only available option is:
3481 @var{leaf_unit} (default, @code{1}): indicates the unit of the leaves; must be a
3489 (%i1) load ("descriptive")$
3490 (%i2) load("distrib")$
3493 random_normal(15, 6, 100),
3527 @opencatbox{Categories:}
3528 @category{Package descriptive}