doc/info/descriptive.texi

   1 @menu
   2 * Introduction to descriptive::
   3 * Functions and Variables for data manipulation::
   4 * Functions and Variables for descriptive statistics::
   5 * Functions and Variables for statistical graphs::
   6 @end menu
   7
   8 @node Introduction to descriptive, Functions and Variables for data manipulation, Package descriptive, Package descriptive
   9 @section Introduction to descriptive
  10
  11 Package @code{descriptive} contains a set of functions for
  12 making descriptive statistical computations and graphing.
  13 Together with the source code there are three data sets in
  14 your Maxima tree: @code{pidigits.data}, @code{wind.data} and @code{biomed.data}.
  15
  16 Any statistics manual can be used as a reference to the functions in package @code{descriptive}.
  17
  18 For comments, bugs or suggestions, please contact me at @var{'riotorto AT yahoo DOT com'}.
  19
Here is a simple example of how the functions in @code{descriptive} work, depending on whether their arguments are lists or matrices:
  21
  22 @c ===beg===
  23 @c load ("descriptive")$
  24 @c /* univariate sample */   mean ([a, b, c]);
  25 @c matrix ([a, b], [c, d], [e, f]);
  26 @c /* multivariate sample */ mean (%);
  27 @c ===end===
  28 @example
  29 (%i1) load ("descriptive")$
  30 @group
  31 (%i2) /* univariate sample */   mean ([a, b, c]);
  32                             c + b + a
  33 (%o2)                       ---------
  34                                 3
  35 @end group
  36 @group
  37 (%i3) matrix ([a, b], [c, d], [e, f]);
  38                             [ a  b ]
  39                             [      ]
  40 (%o3)                       [ c  d ]
  41                             [      ]
  42                             [ e  f ]
  43 @end group
  44 @group
  45 (%i4) /* multivariate sample */ mean (%);
  46                       e + c + a  f + d + b
  47 (%o4)                [---------, ---------]
  48                           3          3
  49 @end group
  50 @end example
  51
  52 Note that in multivariate samples the mean is calculated for each column.
  53
In the case of several samples, possibly of different sizes, the Maxima function @code{map} can be used to get the desired result for each sample:
  55
  56 @c ===beg===
  57 @c load ("descriptive")$
  58 @c map (mean, [[a, b, c], [d, e]]);
  59 @c ===end===
  60 @example
  61 (%i1) load ("descriptive")$
  62 @group
  63 (%i2) map (mean, [[a, b, c], [d, e]]);
  64                         c + b + a  e + d
  65 (%o2)                  [---------, -----]
  66                             3        2
  67 @end group
  68 @end example
  69
In this case, two samples of sizes 3 and 2 were stored in a list.
  71
  72 Univariate samples must be stored in lists like
  73
  74 @c ===beg===
  75 @c s1 : [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
  76 @c ===end===
  77 @example
  78 @group
  79 (%i1) s1 : [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
  80 (%o1)           [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
  81 @end group
  82 @end example
  83
  84 and multivariate samples in matrices as in
  85
  86 @c ===beg===
  87 @c s2 : matrix ([13.17, 9.29], [14.71, 16.88], [18.50, 16.88],
  88 @c              [10.58, 6.63], [13.33, 13.25], [13.21,  8.12]);
  89 @c ===end===
  90 @example
  91 @group
  92 (%i1) s2 : matrix ([13.17, 9.29], [14.71, 16.88], [18.50, 16.88],
  93              [10.58, 6.63], [13.33, 13.25], [13.21,  8.12]);
  94                         [ 13.17  9.29  ]
  95                         [              ]
  96                         [ 14.71  16.88 ]
  97                         [              ]
  98                         [ 18.5   16.88 ]
  99 (%o1)                   [              ]
 100                         [ 10.58  6.63  ]
 101                         [              ]
 102                         [ 13.33  13.25 ]
 103                         [              ]
 104                         [ 13.21  8.12  ]
 105 @end group
 106 @end example
 107
In this case, the number of columns equals the dimension of the random variable and the number of rows is the sample size.
 109
Data can be entered by hand, but large samples are usually stored in plain text files. For example, file @code{pidigits.data} contains the first 100 digits of the number @code{%pi}:
 111 @example
 112 @group
 113       3
 114       1
 115       4
 116       1
 117       5
 118       9
 119       2
 120       6
 121       5
 122       3 ...
 123 @end group
 124 @end example
 125
 126 In order to load these digits in Maxima,
 127
 128 @c ===beg===
 129 @c s1 : read_list (file_search ("pidigits.data"))$
 130 @c length (s1);
 131 @c ===end===
 132 @example
 133 (%i1) s1 : read_list (file_search ("pidigits.data"))$
 134 @group
 135 (%i2) length (s1);
 136 (%o2)                          100
 137 @end group
 138 @end example
 139
On the other hand, file @code{wind.data} contains daily average wind speeds at 5 meteorological stations in the Republic of Ireland. This is part of a data set taken at 12 meteorological stations; the original file is freely downloadable from the StatLib Data Repository, and its analysis is discussed in Haslett, J., Raftery, A. E. (1989) @var{Space-time Modelling with Long-memory Dependence: Assessing Ireland's Wind Power Resource, with Discussion}. Applied Statistics 38, 1-50. This loads the data:
 141
 142 @c ===beg===
 143 @c s2 : read_matrix (file_search ("wind.data"))$
 144 @c length (s2);
 145 @c s2 [%]; /* last record */
 146 @c ===end===
 147 @example
 148 (%i1) s2 : read_matrix (file_search ("wind.data"))$
 149 @group
 150 (%i2) length (s2);
 151 (%o2)                          100
 152 @end group
 153 @group
 154 (%i3) s2 [%]; /* last record */
 155 (%o3)            [3.58, 6.0, 4.58, 7.62, 11.25]
 156 @end group
 157 @end example
 158
Some samples contain non-numeric data. As an example, file @code{biomed.data} (which is part of a larger data set downloaded from the StatLib Data Repository) contains four blood measures taken from two groups of patients, @code{A} and @code{B}, of different ages:
 160
 161 @c ===beg===
 162 @c s3 : read_matrix (file_search ("biomed.data"))$
 163 @c length (s3);
 164 @c s3 [1]; /* first record */
 165 @c ===end===
 166 @example
 167 (%i1) s3 : read_matrix (file_search ("biomed.data"))$
 168 @group
 169 (%i2) length (s3);
 170 (%o2)                          100
 171 @end group
 172 @group
 173 (%i3) s3 [1]; /* first record */
 174 (%o3)            [A, 30, 167.0, 89.0, 25.6, 364]
 175 @end group
 176 @end example
 177
 178 The first individual belongs to group @code{A}, is 30 years old and his/her blood measures were 167.0, 89.0, 25.6 and 364.
 179
One must take care when working with categorical data. In the next example, symbol @code{a} has been assigned a value at some earlier point, and then a sample with categorical value @code{a} is entered:
 181
 182 @c ===beg===
 183 @c a : 1$
 184 @c matrix ([a, 3], [b, 5]);
 185 @c ===end===
 186 @example
 187 (%i1) a : 1$
 188 @group
 189 (%i2) matrix ([a, 3], [b, 5]);
 190                             [ 1  3 ]
 191 (%o2)                       [      ]
 192                             [ b  5 ]
 193 @end group
 194 @end example
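
Because @code{a} already has the value 1, the matrix stores 1 instead of the category label.
The following is a minimal sketch of two ways to avoid this, assuming that nothing else in the session depends on the value of @code{a}:
quote the symbol with the single-quote operator, or remove the binding with @code{kill} before building the sample.

@example
a : 1$
/* quoting prevents evaluation, so the label a is stored */
matrix (['a, 3], [b, 5]);
/* alternatively, remove the binding of a first */
kill (a)$
matrix ([a, 3], [b, 5]);
@end example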
 195
 196 @opencatbox{Categories:}
 197 @category{Descriptive statistics}
 198 @category{Share packages}
 199 @category{Package descriptive}
 200 @closecatbox
 201
 202 @node Functions and Variables for data manipulation, Functions and Variables for descriptive statistics, Introduction to descriptive, Package descriptive
 203 @section Functions and Variables for data manipulation
 204
 205
 206
 207 @anchor{build_sample}
 208 @deffn {Function} build_sample @
 209 @fname{build_sample} (@var{list}) @
 210 @fname{build_sample} (@var{matrix})
 211
 212 Builds a sample from a table of absolute frequencies.
 213 The input table can be a matrix or a list of lists, all of
 214 them of equal size. The number of columns or the length of
 215 the lists must be greater than 1. The last element of each
 216 row or list is interpreted as the absolute frequency.
 217 The output is always a sample in matrix form.
 218
 219 Examples:
 220
 221 Univariate frequency table.
 222
 223 @c ===beg===
 224 @c load ("descriptive")$
 225 @c sam1: build_sample([[6,1], [j,2], [2,1]]);
 226 @c mean(sam1);
 227 @c barsplot(sam1) $
 228 @c ===end===
 229 @example
 230 (%i1) load ("descriptive")$
 231 @group
 232 (%i2) sam1: build_sample([[6,1], [j,2], [2,1]]);
 233                               [ 6 ]
 234                               [   ]
 235                               [ j ]
 236 (%o2)                         [   ]
 237                               [ j ]
 238                               [   ]
 239                               [ 2 ]
 240 @end group
 241 @group
 242 (%i3) mean(sam1);
 243                               j + 4
 244 (%o3)                        [-----]
 245                                 2
 246 @end group
 247 (%i4) barsplot(sam1) $
 248 @end example
 249
 250 Multivariate frequency table.
 251
 252 @c ===beg===
 253 @c load ("descriptive")$
 254 @c sam2: build_sample([[6,3,1], [5,6,2], [u,2,1],[6,8,2]]) ;
 255 @c cov(sam2);
 256 @c barsplot(sam2, grouping=stacked) $
 257 @c ===end===
 258 @example
 259 (%i1) load ("descriptive")$
 260 @group
 261 (%i2) sam2: build_sample([[6,3,1], [5,6,2], [u,2,1],[6,8,2]]) ;
 262                             [ 6  3 ]
 263                             [      ]
 264                             [ 5  6 ]
 265                             [      ]
 266                             [ 5  6 ]
 267 (%o2)                       [      ]
 268                             [ u  2 ]
 269                             [      ]
 270                             [ 6  8 ]
 271                             [      ]
 272                             [ 6  8 ]
 273 @end group
 274 @group
 275 (%i3) cov(sam2);
 276       [   2                 2                            ]
 277       [  u  + 158   (u + 28)     2 u + 174   11 (u + 28) ]
 278       [  -------- - ---------    --------- - ----------- ]
 279 (%o3) [     6          36            6           12      ]
 280       [                                                  ]
 281       [ 2 u + 174   11 (u + 28)            21            ]
 282       [ --------- - -----------            --            ]
 283       [     6           12                 4             ]
 284 @end group
 285 (%i4) barsplot(sam2, grouping=stacked) $
 286 @end example
 287
 288 @opencatbox{Categories:}
 289 @category{Package descriptive}
 290 @closecatbox
 291 @end deffn
 292
 293
 294
 295 @anchor{continuous_freq}
 296 @deffn {Function} continuous_freq @
 297 @fname{continuous_freq} (@var{data}) @
 298 @fname{continuous_freq} (@var{data}, @var{m})
 299
 300 Divides the range of @var{data} into intervals,
 301 and counts how many values fall into each one.
 302
 303 A value @var{x} falls into an interval with left and right endpoints @var{a} and @var{b}
 304 if and only if @code{@var{x} > @var{a}} and @code{@var{x} <= @var{b}},
 305 except for the first (least or leftmost) interval,
 306 for which @code{@var{x} >= @var{a}} and @code{@var{x} <= @var{b}}.
 307 That is, an interval excludes its left endpoint and includes its right endpoint,
 308 except for the first interval, which includes both the left and right endpoints.
 309
 310 @var{data} must be a list of numbers,
 311 or 1-dimensional array (as created by @code{make_array}).
 312
 313 @var{m} is optional, and equals either the number of classes (10 by default),
 314 or a list of two elements (the least and greatest values to be counted),
 315 or a list of three elements (the least and greatest values to be counted, and the number of classes),
 316 or a set containing the endpoints of the class intervals.
 317
 318 It is assumed that class intervals are contiguous.
 319 That is, the right endpoint of one interval is equal to the left endpoint of the next.
 320
 321 @code{continuous_freq} returns a list of two lists.
 322 The first list comprises all the endpoints of the class intervals,
 323 concatenated into a single list.
 324 The second list contains the class counts for the intervals corresponding to elements of the first list.
 325
 326 If sample values are all equal, this function returns exactly
 327 one class of width 2.
 328
 329 Examples:
 330
 331 Optional argument indicates the number of classes we want.
 332 The first list in the output contains the interval limits, and
 333 the second the corresponding counts: there are 16 digits inside
 334 the interval @code{[0, 1.8]}, 24 digits in @code{(1.8, 3.6]}, and so on.
 335
 336 @c ===beg===
 337 @c load ("descriptive")$
 338 @c s1 : read_list (file_search ("pidigits.data"))$
 339 @c continuous_freq (s1, 5);
 340 @c ===end===
 341 @example
 342 (%i1) load ("descriptive")$
 343 (%i2) s1 : read_list (file_search ("pidigits.data"))$
 344 @group
 345 (%i3) continuous_freq (s1, 5);
 346                9  18  27  36
 347 (%o3)     [[0, -, --, --, --, 9], [16, 24, 18, 17, 25]]
 348                5  5   5   5
 349 @end group
 350 @end example
 351
 352 Optional argument indicates we want 7 classes with limits
 353 -2 and 12:
 354
 355 @c ===beg===
 356 @c load ("descriptive")$
 357 @c s1 : read_list (file_search ("pidigits.data"))$
 358 @c continuous_freq (s1, [-2,12,7]);
 359 @c ===end===
 360 @example
 361 (%i1) load ("descriptive")$
 362 (%i2) s1 : read_list (file_search ("pidigits.data"))$
 363 @group
 364 (%i3) continuous_freq (s1, [-2,12,7]);
 365 (%o3) [[- 2, 0, 2, 4, 6, 8, 10, 12], [8, 20, 22, 17, 20, 13, 0]]
 366 @end group
 367 @end example
 368
 369 Optional argument indicates we want the default number of classes with limits
 370 -2 and 12:
 371
 372 @c ===beg===
 373 @c load ("descriptive")$
 374 @c s1 : read_list (file_search ("pidigits.data"))$
 375 @c continuous_freq (s1, [-2,12]);
 376 @c ===end===
 377 @example
 378 (%i1) load ("descriptive")$
 379 (%i2) s1 : read_list (file_search ("pidigits.data"))$
 380 @group
 381 (%i3) continuous_freq (s1, [-2,12]);
 382                3  4  11  18     32  39  46  53
 383 (%o3) [[- 2, - -, -, --, --, 5, --, --, --, --, 12],
 384                5  5  5   5      5   5   5   5
 385                               [0, 8, 20, 12, 18, 9, 8, 25, 0, 0]]
 386 @end group
 387 @end example
 388
 389 The first argument may be an array.
 390
 391 @c ===beg===
 392 @c load ("descriptive")$
 393 @c s1 : read_list (file_search ("pidigits.data"))$
 394 @c a1 : make_array (fixnum, length (s1)) $
 395 @c fillarray (a1, s1);
 396 @c continuous_freq (a1);
 397 @c ===end===
 398 @example
 399 (%i1) load ("descriptive")$
 400 (%i2) s1 : read_list (file_search ("pidigits.data"))$
 401 (%i3) a1 : make_array (fixnum, length (s1)) $
 402 @group
 403 (%i4) fillarray (a1, s1);
 404 (%o4) @{Lisp Array: #(3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3 2 3 8 4 6 2\
 405  6 4 3 3 8 3 2 7 9 5
 406                0 2 8 8 4 1 9 7 1 6 9 3 9 9 3 7 5 1 0 5 8 2 0 9 7\
 407  4 9 4 4 5 9 2
 408                3 0 7 8 1 6 4 0 6 2 8 6 2 0 8 9 9 8 6 2 8 0 3 4 8\
 409  2 5 3 4 2 1 1
 410                7 0 6 7)@}
 411 @end group
 412 @group
 413 (%i5) continuous_freq (a1);
 414            9   9  27  18  9  27  63  36  81
 415 (%o5) [[0, --, -, --, --, -, --, --, --, --, 9],
 416            10  5  10  5   2  5   10  5   10
 417                              [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]
 418 @end group
 419 @end example
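
The class endpoints may also be given as a set.
The following is a minimal sketch; the endpoints 0, 3, 6 and 9 are chosen here only for illustration.

@example
load ("descriptive")$
s1 : read_list (file_search ("pidigits.data"))$
/* classes [0, 3], (3, 6] and (6, 9] */
continuous_freq (s1, @{0, 3, 6, 9@});
@end example

By the interval rule stated above, and given that the digits 0 to 9 occur
8, 8, 12, 12, 10, 8, 9, 8, 12 and 13 times in this sample,
the class counts should be 40, 27 and 33.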
 420
 421 @opencatbox{Categories:}
 422 @category{Package descriptive}
 423 @closecatbox
 424 @end deffn
 425
 426
 427
 428 @anchor{discrete_freq}
 429 @deffn {Function} discrete_freq (@var{data})
 430
 431 Counts absolute frequencies in discrete samples, both numeric and categorical. Its sole argument is a list,
 432 or 1-dimensional array (as created by @code{make_array}).
 433
 434 Examples:
 435
 436 @c ===beg===
 437 @c load ("descriptive")$
 438 @c s1 : read_list (file_search ("pidigits.data"))$
 439 @c discrete_freq (s1);
 440 @c ===end===
 441 @example
 442 (%i1) load ("descriptive")$
 443 (%i2) s1 : read_list (file_search ("pidigits.data"))$
 444 @group
 445 (%i3) discrete_freq (s1);
 446 (%o3) [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 447                              [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]
 448 @end group
 449 @end example
 450
 451 In the return value,
 452 the first list gives the sample values, and the second, their absolute frequencies.
 453
 454 The argument may be an array.
 455
 456 @c ===beg===
 457 @c load ("descriptive")$
 458 @c s1 : read_list (file_search ("pidigits.data"))$
 459 @c a1 : make_array (fixnum, length (s1)) $
 460 @c fillarray (a1, s1);
 461 @c discrete_freq (a1);
 462 @c ===end===
 463 @example
 464 (%i1) load ("descriptive")$
 465 (%i2) s1 : read_list (file_search ("pidigits.data"))$
 466 (%i3) a1 : make_array (fixnum, length (s1)) $
 467 @group
 468 (%i4) fillarray (a1, s1);
 469 (%o4) @{Lisp Array: #(3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3 2 3 8 4 6 2\
 470  6 4 3 3 8 3 2 7 9 5
 471                0 2 8 8 4 1 9 7 1 6 9 3 9 9 3 7 5 1 0 5 8 2 0 9 7\
 472  4 9 4 4 5 9 2
 473                3 0 7 8 1 6 4 0 6 2 8 6 2 0 8 9 9 8 6 2 8 0 3 4 8\
 474  2 5 3 4 2 1 1
 475                7 0 6 7)@}
 476 @end group
 477 @group
 478 (%i5) discrete_freq (a1);
 479 (%o5) [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 480                              [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]
 481 @end group
 482 @end example
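
Categorical samples are counted in the same way.
The following is a minimal sketch with a made-up list of symbols, which are assumed to have no values assigned in the current session.

@example
load ("descriptive")$
discrete_freq ([red, blue, red, green, blue, red]);
@end example

The first list of the result holds the distinct symbols and the second their counts:
3 for @code{red}, 2 for @code{blue} and 1 for @code{green}.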
 483
 484 @opencatbox{Categories:}
 485 @category{Package descriptive}
 486 @closecatbox
 487 @end deffn
 488
 489
 490
 491
 492 @anchor{standardize}
 493 @deffn {Function} standardize @
 494 @fname{standardize} (@var{list}) @
 495 @fname{standardize} (@var{matrix})
 496
Subtracts the sample mean from each element of the list and divides
the result by the standard deviation. When the input is a matrix,
@code{standardize} subtracts the multivariate mean from each row, and then
divides each component by the corresponding standard deviation.
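
The following is a minimal sketch with a small made-up sample.
Whatever the location and scale of the data, the standardized sample should have mean zero.

@example
load ("descriptive")$
s : [1, 2, 3, 4, 5]$
z : standardize (s);
/* should simplify to 0 */
mean (z);
@end example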
 501
 502 @opencatbox{Categories:}
 503 @category{Package descriptive}
 504 @closecatbox
 505 @end deffn
 506
 507
 508
 509
 510 @anchor{subsample}
 511 @deffn {Function} subsample @
 512 @fname{subsample} (@var{data_matrix}, @var{predicate_function}) @
 513 @fname{subsample} (@var{data_matrix}, @var{predicate_function}, @var{col_num1}, @var{col_num2}, ...)
 514
This is a variant of the Maxima @code{submatrix} function.
The first argument is the data matrix, the second is a predicate function
that is applied to each row, and the optional additional arguments are the
numbers of the columns to be taken. Only the rows for which the predicate
is true are returned.
 518
 519 Examples:
 520
These are the multivariate records in which the wind speed
at the first meteorological station was greater than 18.
Note that in the lambda expression the @var{i}-th component is
referred to as @code{v[i]}.
 525
 526 @c ===beg===
 527 @c load ("descriptive")$
 528 @c s2 : read_matrix (file_search ("wind.data"))$
 529 @c subsample (s2, lambda([v], v[1] > 18));
 530 @c ===end===
 531 @example
 532 (%i1) load ("descriptive")$
 533 (%i2) s2 : read_matrix (file_search ("wind.data"))$
 534 @group
 535 (%i3) subsample (s2, lambda([v], v[1] > 18));
 536               [ 19.38  15.37  15.12  23.09  25.25 ]
 537               [                                   ]
 538               [ 18.29  18.66  19.08  26.08  27.63 ]
 539 (%o3)         [                                   ]
 540               [ 20.25  21.46  19.95  27.71  23.38 ]
 541               [                                   ]
 542               [ 18.79  18.96  14.46  26.38  21.84 ]
 543 @end group
 544 @end example
 545
In the following example, we request only the first, second and fifth
components of those records with wind speeds greater than or equal to 16 knots
at station number 1 and less than 25 knots at station number 4. The sample
contains only data from stations 1, 2 and 5. In this case,
the predicate function is defined as an ordinary Maxima function.
 551
 552 @c ===beg===
 553 @c load ("descriptive")$
 554 @c s2 : read_matrix (file_search ("wind.data"))$
 555 @c g(x):= x[1] >= 16 and x[4] < 25$
 556 @c subsample (s2, g, 1, 2, 5);
 557 @c ===end===
 558 @example
 559 (%i1) load ("descriptive")$
 560 (%i2) s2 : read_matrix (file_search ("wind.data"))$
 561 (%i3) g(x):= x[1] >= 16 and x[4] < 25$
 562 @group
 563 (%i4) subsample (s2, g, 1, 2, 5);
 564                      [ 19.38  15.37  25.25 ]
 565                      [                     ]
 566                      [ 17.33  14.67  19.58 ]
 567 (%o4)                [                     ]
 568                      [ 16.92  13.21  21.21 ]
 569                      [                     ]
 570                      [ 17.25  18.46  23.87 ]
 571 @end group
 572 @end example
 573
 574 Here is an example with the categorical variables of @code{biomed.data}.
 575 We want the records corresponding to those patients in group @code{B}
 576 who are older than 38 years.
 577
 578 @c ===beg===
 579 @c load ("descriptive")$
 580 @c s3 : read_matrix (file_search ("biomed.data"))$
 581 @c h(u):= u[1] = B and u[2] > 38 $
 582 @c subsample (s3, h);
 583 @c ===end===
 584 @example
 585 (%i1) load ("descriptive")$
 586 (%i2) s3 : read_matrix (file_search ("biomed.data"))$
 587 (%i3) h(u):= u[1] = B and u[2] > 38 $
 588 @group
 589 (%i4) subsample (s3, h);
 590                 [ B  39  28.0  102.3  17.1  146 ]
 591                 [                               ]
 592                 [ B  39  21.0  92.4   10.3  197 ]
 593                 [                               ]
 594                 [ B  39  23.0  111.5  10.0  133 ]
 595                 [                               ]
 596                 [ B  39  26.0  92.6   12.3  196 ]
 597 (%o4)           [                               ]
 598                 [ B  39  25.0  98.7   10.0  174 ]
 599                 [                               ]
 600                 [ B  39  21.0  93.2   5.9   181 ]
 601                 [                               ]
 602                 [ B  39  18.0  95.0   11.3  66  ]
 603                 [                               ]
 604                 [ B  39  39.0  88.5   7.6   168 ]
 605 @end group
 606 @end example
 607
The statistical analysis will probably involve only the blood measures:
 609
 610 @c ===beg===
 611 @c load ("descriptive")$
 612 @c s3 : read_matrix (file_search ("biomed.data"))$
 613 @c subsample (s3, lambda([v], v[1] = B and v[2] > 38),
 614 @c            3, 4, 5, 6);
 615 @c ===end===
 616 @example
 617 (%i1) load ("descriptive")$
 618 (%i2) s3 : read_matrix (file_search ("biomed.data"))$
 619 @group
 620 (%i3) subsample (s3, lambda([v], v[1] = B and v[2] > 38),
 621            3, 4, 5, 6);
 622                    [ 28.0  102.3  17.1  146 ]
 623                    [                        ]
 624                    [ 21.0  92.4   10.3  197 ]
 625                    [                        ]
 626                    [ 23.0  111.5  10.0  133 ]
 627                    [                        ]
 628                    [ 26.0  92.6   12.3  196 ]
 629 (%o3)              [                        ]
 630                    [ 25.0  98.7   10.0  174 ]
 631                    [                        ]
 632                    [ 21.0  93.2   5.9   181 ]
 633                    [                        ]
 634                    [ 18.0  95.0   11.3  66  ]
 635                    [                        ]
 636                    [ 39.0  88.5   7.6   168 ]
 637 @end group
 638 @end example
 639
 640 This is the multivariate mean of @code{s3},
 641
 642 @c ===beg===
 643 @c load ("descriptive")$
 644 @c s3 : read_matrix (file_search ("biomed.data"))$
 645 @c mean (s3);
 646 @c ===end===
 647 @example
 648 (%i1) load ("descriptive")$
 649 (%i2) s3 : read_matrix (file_search ("biomed.data"))$
 650 @group
 651 (%i3) mean (s3);
 652        13 B + 7 A  317
 653 (%o3) [----------, ---, 87.178, 0.06 NA + 81.44999999999999,
 654            20      10
 655                                                     3 NA + 19587
 656                                 18.122999999999998, ------------]
 657                                                         100
 658 @end group
 659 @end example
 660
Here, the first component is meaningless, since @code{A} and @code{B} are categorical; the second component is the mean age of the individuals in rational form; and the fourth and last values exhibit some strange behaviour. This is because the symbol @code{NA} is used here to indicate @var{not available} data, so these two means are nonsense. A possible solution is to remove from the matrix those rows containing @code{NA} symbols, although this entails some loss of information.
 662
 663 @c ===beg===
 664 @c load ("descriptive")$
 665 @c s3 : read_matrix (file_search ("biomed.data"))$
 666 @c g(v):= v[4] # NA and v[6] # NA $
 667 @c mean (subsample (s3, g, 3, 4, 5, 6));
 668 @c ===end===
 669 @example
 670 (%i1) load ("descriptive")$
 671 (%i2) s3 : read_matrix (file_search ("biomed.data"))$
 672 (%i3) g(v):= v[4] # NA and v[6] # NA $
 673 @group
 674 (%i4) mean (subsample (s3, g, 3, 4, 5, 6));
 675 (%o4) [79.4923076923077, 86.2032967032967, 16.93186813186813,
 676                                                             2514
 677                                                             ----]
 678                                                              13
 679 @end group
 680 @end example
 681
 682 @opencatbox{Categories:}
 683 @category{Package descriptive}
 684 @closecatbox
 685 @end deffn
 686
 687
 688
 689
 690
 691 @anchor{transform_sample}
 692 @deffn {Function} transform_sample (@var{matrix}, @var{varlist}, @var{exprlist})
 693
Transforms the sample @var{matrix}, where each column is named according to
@var{varlist}, following the expressions in @var{exprlist}.
 696
 697 Examples:
 698
The second argument assigns names to the three columns. With these names,
a list of expressions defines the transformation of the sample.
 701
 702 @example
 703 (%i1) load ("descriptive")$
 704 (%i2) data: matrix([3,2,7],[3,7,2],[8,2,4],[5,2,4]) $
 705 @group
 706 (%i3) transform_sample(data, [a,b,c], [c, a*b, log(a)]);
 707                                [ 7  6   log(3) ]
 708                                [               ]
 709                                [ 2  21  log(3) ]
 710 (%o3)                          [               ]
 711                                [ 4  16  log(8) ]
 712                                [               ]
 713                                [ 4  10  log(5) ]
 714 @end group
 715 @end example
 716
 717 Add a constant column and remove the third variable.
 718
 719 @example
 720 (%i1) load ("descriptive")$
 721 (%i2) data: matrix([3,2,7],[3,7,2],[8,2,4],[5,2,4]) $
 722 (%i3) transform_sample(data, [a,b,c], [makelist(1,k,length(data)),a,b]);
 723 @group
 724                                   [ 1  3  2 ]
 725                                   [         ]
 726                                   [ 1  3  7 ]
 727 (%o3)                             [         ]
 728                                   [ 1  8  2 ]
 729                                   [         ]
 730                                   [ 1  5  2 ]
 731 @end group
 732 @end example
 733
 734 @opencatbox{Categories:}
 735 @category{Package descriptive}
 736 @closecatbox
 737 @end deffn
 738
 739
 740
 741
 742
 743
 744
 745 @node Functions and Variables for descriptive statistics, Functions and Variables for statistical graphs, Functions and Variables for data manipulation, Package descriptive
 746 @section Functions and Variables for descriptive statistics
 747
 748
 749
 750 @anchor{mean}
 751 @deffn {Function} mean @
 752 @fname{mean} (@var{x}) @
 753 @fname{mean} (@var{x}, @var{w})
 754
 755 Returns the sample mean.
 756 @var{x} must be a list or matrix.
 757
 758 When @var{x} is a list,
 759 @code{mean} returns the sample mean of @var{x}.
 760
 761 When @var{x} is a matrix,
 762 @code{mean} returns a list comprising the sample mean of each column.
 763
 764 @var{w} is an optional per-datum weight.
 765 @var{w} must either be 1, in which case every datum @var{x[i]} is given equal weight,
 766 or a list of the same length as @var{x},
 767 in which case the weight for @var{x[i]} is given by @var{w[i]}.
 768 The elements of @var{w} must be nonnegative and not all zero;
 769 it is not required that they sum to 1.
 770
 771 The unweighted sample mean is defined as
 772
 773 @ifnottex
 774 @example
 775                      n
 776                     ====
 777              _   1  \
 778              x = -   >    x
 779                  n  /      i
 780                     ====
 781                     i = 1
 782 @end example
 783 @end ifnottex
 784 @tex
 785 $${\bar{x}={1\over{n}}{\sum_{i=1}^{n}{x_{i}}}}$$
 786 @end tex
 787
 788 The weighted sample mean is defined as
 789
 790 @ifnottex
 791 @example
 792                      n
 793                     ====
 794              _   1  \
 795              x = -   >    w  x
 796                  Z  /      i  i
 797                     ====
 798                     i = 1
 799 @end example
 800 @end ifnottex
 801 @tex
 802 $${\bar{x}={1\over{Z}}{\sum_{i=1}^{n}{w_{i} x_{i}}}}$$
 803 @end tex
 804
 805 where @var{Z} is the sum of the weights,
 806
 807 @ifnottex
 808 @example
 809                    n
 810                   ====
 811                   \
 812              Z =   >    w
 813                   /      i
 814                   ====
 815                   i = 1
 816 @end example
 817 @end ifnottex
 818 @tex
 819 $${Z={\sum_{i=1}^{n}{w_{i}}}}$$
 820 @end tex
 821
 822 Examples:
 823
 824 Sample mean of a list.
 825
 826 @c ===beg===
 827 @c load ("descriptive")$
 828 @c s1 : read_list (file_search ("pidigits.data"))$
 829 @c mean (s1);
 830 @c ===end===
 831 @example
 832 (%i1) load ("descriptive")$
 833 (%i2) s1 : read_list (file_search ("pidigits.data"))$
 834 @group
 835 (%i3) mean (s1);
 836                                471
 837 (%o3)                          ---
 838                                100
 839 @end group
 840 @end example
 841
 842 Sample mean of each column of a matrix.
 843
 844 @c ===beg===
 845 @c load ("descriptive")$
 846 @c s2 : read_matrix (file_search ("wind.data"))$
 847 @c mean (s2);
 848 @c ===end===
 849 @example
 850 (%i1) load ("descriptive")$
 851 (%i2) s2 : read_matrix (file_search ("wind.data"))$
 852 @group
 853 (%i3) mean (s2);
 854 (%o3) [9.9485, 10.160700000000004, 10.868499999999997,
 855                           15.716600000000001, 14.844100000000001]
 856 @end group
 857 @end example
 858
 859 Weighted sample mean of a list.
 860
 861 @c ===beg===
 862 @c load ("descriptive")$
 863 @c mean ([a, b, c, d], [1, 2, 3, 4]);
 864 @c ===end===
 865 @example
 866 (%i1) load ("descriptive")$
 867 @group
 868 (%i2) mean ([a, b, c, d], [1, 2, 3, 4]);
 869                        4 d + 3 c + 2 b + a
 870 (%o2)                  -------------------
 871                                10
 872 @end group
 873 @end example
 874
 875 Weighted sample mean of each column of a matrix.
 876
 877 @c ===beg===
 878 @c load ("descriptive")$
 879 @c mm: matrix ([p, q, r], [s, t, u]);
 880 @c mean (mm, [vv, ww]);
 881 @c ===end===
 882 @example
 883 (%i1) load ("descriptive")$
 884 @group
 885 (%i2) mm: matrix ([p, q, r], [s, t, u]);
 886                            [ p  q  r ]
 887 (%o2)                      [         ]
 888                            [ s  t  u ]
 889 @end group
 890 @group
 891 (%i3) mean (mm, [vv, ww]);
 892               s ww + p vv  t ww + q vv  u ww + r vv
 893 (%o3)        [-----------, -----------, -----------]
 894                 ww + vv      ww + vv      ww + vv
 895 @end group
 896 @end example
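
The weights need not be normalized.
In the following minimal sketch (the weights are made up for illustration),
both calls compute the same weighted mean, @code{(a + 3*b)/4},
although Maxima may display the two results in different forms.

@example
load ("descriptive")$
m1 : mean ([a, b], [1, 3]);
m2 : mean ([a, b], [1/4, 3/4]);
/* the difference should simplify to 0 */
ratsimp (m1 - m2);
@end example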
 897
 898 @opencatbox{Categories:}
 899 @category{Package descriptive}
 900 @closecatbox
 901 @end deffn
 902
 903
 904
 905 @anchor{var}
 906 @deffn {Function} var @
 907 @fname{var} (@var{x}) @
 908 @fname{var} (@var{x}, @var{w})
 909
 910 Returns the sample variance.
 911 @var{x} must be a list or matrix.
 912
 913 When @var{x} is a list,
 914 @code{var} returns the sample variance of @var{x}.
 915
 916 When @var{x} is a matrix,
 917 @code{var} returns a list comprising the sample variance of each column.
 918
 919 @var{w} is an optional per-datum weight.
 920 @var{w} must either be 1, in which case every datum @var{x[i]} is given equal weight,
 921 or a list of the same length as @var{x},
 922 in which case the weight for @var{x[i]} is given by @var{w[i]}.
 923 The elements of @var{w} must be nonnegative and not all zero;
 924 it is not required that they sum to 1.
 925
 926 The unweighted sample variance is defined as
 927
 928 @ifnottex
 929 @example
 930 @group
 931                     n
 932                   ====
 933            2   1  \          _ 2
 934           s  = -   >    (x - x)
 935                n  /       i
 936                   ====
 937                   i = 1
 938 @end group
 939 @end example
 940 @end ifnottex
 941 @tex
 942 $${{1}\over{n}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^2}}$$
 943 @end tex
 944
 945 The weighted sample variance is defined as
 946
 947 @ifnottex
 948 @example
 949 @group
 950                     n
 951                   ====
 952            2   1  \             _ 2
 953           s  = -   >    w  (x - x)
 954                Z  /      i   i
 955                   ====
 956                   i = 1
 957 @end group
 958 @end example
 959 @end ifnottex
 960 @tex
 961 $${{1}\over{Z}}{\sum_{i=1}^{n}{w_{i} (x_{i}-\bar{x})^2}}$$
 962 @end tex
 963
 964 where @var{Z} is the sum of the weights,
 965
 966 @ifnottex
 967 @example
 968                    n
 969                   ====
 970                   \
 971              Z =   >    w
 972                   /      i
 973                   ====
 974                   i = 1
 975 @end example
 976 @end ifnottex
 977 @tex
 978 $${Z={\sum_{i=1}^{n}{w_{i}}}}$$
 979 @end tex
 980
 981 Examples:
 982
 983 Sample variance of a list.
 984
 985 @c ===beg===
 986 @c load ("descriptive")$
 987 @c s1 : read_list (file_search ("pidigits.data"))$
 988 @c var (s1), numer;
 989 @c ===end===
 990 @example
 991 (%i1) load ("descriptive")$
 992 (%i2) s1 : read_list (file_search ("pidigits.data"))$
 993 @group
 994 (%i3) var (s1), numer;
 995 (%o3)                   8.425899999999999
 996 @end group
 997 @end example
 998
 999 Sample variance of each column of a matrix.
1000
1001 @c ===beg===
1002 @c load ("descriptive")$
1003 @c s2 : read_matrix (file_search ("wind.data"))$
1004 @c var (s2);
1005 @c ===end===
1006 @example
1007 (%i1) load ("descriptive")$
1008 (%i2) s2 : read_matrix (file_search ("wind.data"))$
1009 @group
1010 (%i3) var (s2);
1011 (%o3) [17.22190675000001, 14.987736510000005,
1012        15.475728749999998, 32.17651044000001, 24.423076190000007]
1013 @end group
1014 @end example
1015
1016 Weighted sample variance of a list.
1017
1018 @c ===beg===
1019 @c load ("descriptive")$
1020 @c var ([a - b, a, a + b], [3, 5, 7]);
1021 @c ===end===
1022 @example
1023 (%i1) load ("descriptive")$
1024 @group
1025 (%i2) var ([a - b, a, a + b], [3, 5, 7]);
1026                                   2
1027                              134 b
1028 (%o2)                        ------
1029                               225
1030 @end group
1031 @end example
1032
1033 Weighted sample variance of each column of a matrix.
1034
1035 @c ===beg===
1036 @c load ("descriptive")$
1037 @c mm: matrix ([a - b, c - d], [a, c], [a + b, c + d]);
1038 @c var (mm, [3, 5, 7]);
1039 @c ===end===
1040 @example
1041 (%i1) load ("descriptive")$
1042 @group
1043 (%i2) mm: matrix ([a - b, c - d], [a, c], [a + b, c + d]);
1044                         [ a - b  c - d ]
1045                         [              ]
1046 (%o2)                   [   a      c   ]
1047                         [              ]
1048                         [ b + a  d + c ]
1049 @end group
1050 @group
1051 (%i3) var (mm, [3, 5, 7]);
1052                               2       2
1053                          134 b   134 d
1054 (%o3)                   [------, ------]
1055                           225     225
1056 @end group
1057 @end example
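
The weighted definition can also be evaluated directly.
The following is a minimal sketch, not part of the package:
the names @code{Z} and @code{xbar} are illustrative,
and it assumes that deviations are taken about the weighted mean returned by @code{mean (x, w)}.

@example
load ("descriptive")$
x : [a - b, a, a + b]$
w : [3, 5, 7]$
Z : apply ("+", w)$                 /* sum of the weights, 15 */
xbar : mean (x, w)$                 /* weighted mean, a + 4*b/15 */
/* (1/Z) * sum of w[i]*(x[i] - xbar)^2, using elementwise list arithmetic */
s2 : ratsimp (apply ("+", w * (x - xbar)^2) / Z);
is (equal (var (x, w), s2));
@end example

Under that assumption @code{s2} should simplify to the same expression
as the weighted variance shown in the example above.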
1058
1059 See also function @mrefdot{var1}
1060
1061 @opencatbox{Categories:}
1062 @category{Package descriptive}
1063 @closecatbox
1064 @end deffn
1065
1066
1067
1068 @anchor{var1}
1069 @deffn {Function} var1 @
1070 @fname{var1} (@var{list}) @
1071 @fname{var1} (@var{matrix})
1072
1073 Returns the sample variance with denominator @math{n-1}, defined as
1074 @ifnottex
1075 @example
1076 @group
1077                      n
1078                    ====
1079                1   \          _ 2
1080               ---   >    (x - x)
1081               n-1  /       i
1082                    ====
1083                    i = 1
1084 @end group
1085 @end example
1086 @end ifnottex
1087 @tex
1088 $${{1\over{n-1}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^2}}}$$
1089 @end tex
1090
1091 Example:
1092
1093 @c ===beg===
1094 @c load ("descriptive")$
1095 @c s1 : read_list (file_search ("pidigits.data"))$
1096 @c var1 (s1), numer;
1097 @c s2 : read_matrix (file_search ("wind.data"))$
1098 @c var1 (s2);
1099 @c ===end===
1100 @example
1101 (%i1) load ("descriptive")$
1102 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1103 @group
1104 (%i3) var1 (s1), numer;
1105 (%o3)                    8.5110101010101
1106 @end group
1107 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1108 @group
1109 (%i5) var1 (s2);
1110 (%o5) [17.395865404040414, 15.139127787878794,
1111        15.632049242424243, 32.50152569696971, 24.669773929292937]
1112 @end group
1113 @end example
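
Since @code{var} divides by @math{n} and @code{var1} by @math{n-1},
the two results differ by the factor @math{n/(n-1)}.
A small sketch with made-up data (the names @code{x} and @code{n} are illustrative):

@example
load ("descriptive")$
x : [1, 2, 3, 4]$
n : length (x)$
var (x);                            /* sum of squared deviations 5, over 4 */
var1 (x);                           /* same sum, over 3 */
is (equal (var1 (x), n / (n - 1) * var (x)));
@end example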
1114
1115 See also function @mrefdot{var}
1116
1117 @opencatbox{Categories:}
1118 @category{Package descriptive}
1119 @closecatbox
1120 @end deffn
1121
1122
1123
1124 @anchor{std}
1125 @deffn {Function} std @
1126 @fname{std} (@var{x}) @
1127 @fname{std} (@var{x}, @var{w})
1128
1129 Returns the sample standard deviation.
1130 @var{x} must be a list or matrix.
1131
1132 When @var{x} is a list,
1133 @code{std} returns the sample standard deviation of @var{x},
1134 which is defined as the square root of the sample variance,
1135 as computed by @code{var}.
1136
1137 When @var{x} is a matrix,
1138 @code{std} returns a list comprising the sample standard deviation of each column.
1139
1140 @var{w} is an optional per-datum weight.
1141 @var{w} must either be 1, in which case every datum @var{x[i]} is given equal weight,
1142 or a list of the same length as @var{x},
1143 in which case the weight for @var{x[i]} is given by @var{w[i]}.
1144 The elements of @var{w} must be nonnegative and not all zero;
1145 it is not required that they sum to 1.
1146
1147 Examples:
1148
1149 Sample standard deviation of a list.
1150
1151 @c ===beg===
1152 @c load ("descriptive")$
1153 @c s1 : read_list (file_search ("pidigits.data"))$
1154 @c std (s1), numer;
1155 @c ===end===
1156 @example
1157 (%i1) load ("descriptive")$
1158 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1159 @group
1160 (%i3) std (s1), numer;
1161 (%o3)                  2.9027400848164135
1162 @end group
1163 @end example
1164
1165 Sample standard deviation of each column of a matrix.
1166
1167 @c ===beg===
1168 @c load ("descriptive")$
1169 @c s2 : read_matrix (file_search ("wind.data"))$
1170 @c std (s2);
1171 @c ===end===
1172 @example
1173 (%i1) load ("descriptive")$
1174 (%i2) s2 : read_matrix (file_search ("wind.data"))$
1175 @group
1176 (%i3) std (s2);
1177 (%o3) [4.149928523480858, 3.8713998127292415,
1178         3.9339202775348663, 5.672434260526957, 4.941970881136392]
1179 @end group
1180 @end example
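
The relation between @code{std} and @code{var} can be checked on a small sample.
The data and weights below are made up for illustration.

@example
load ("descriptive")$
x : [1, 2, 3, 4]$
w : [1, 1, 2, 2]$
/* unweighted: std is the square root of var */
is (equal (std (x), sqrt (var (x))));
/* weighted: the same relation is expected to hold */
is (equal (std (x, w), sqrt (var (x, w))));
@end example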
1181
1182 See also functions @mref{var} and @mrefdot{std1}
1183
1184 @opencatbox{Categories:}
1185 @category{Package descriptive}
1186 @closecatbox
1187 @end deffn
1188
1189
1190
1191 @anchor{std1}
1192 @deffn {Function} std1 @
1193 @fname{std1} (@var{list}) @
1194 @fname{std1} (@var{matrix})
1195
1196 Returns the square root of @mrefcomma{var1} the sample variance with denominator @math{n-1}.
1197
1198 Example:
1199
1200 @c ===beg===
1201 @c load ("descriptive")$
1202 @c s1 : read_list (file_search ("pidigits.data"))$
1203 @c std1 (s1), numer;
1204 @c s2 : read_matrix (file_search ("wind.data"))$
1205 @c std1 (s2);
1206 @c ===end===
1207 @example
1208 (%i1) load ("descriptive")$
1209 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1210 @group
1211 (%i3) std1 (s1), numer;
1212 (%o3)                   2.917363553109228
1213 @end group
1214 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1215 @group
1216 (%i5) std1 (s2);
1217 (%o5) [4.170835096721089, 3.8909032097803196,
1218         3.9537386411375555, 5.701010936401517, 4.966867617451963]
1219 @end group
1220 @end example
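
As a sketch with made-up data, @code{std1} can be compared with @code{var1} and with @code{std}.
The squared values are compared in the second check to avoid depending on how square roots are simplified.

@example
load ("descriptive")$
x : [1, 2, 3, 4]$
n : length (x)$
is (equal (std1 (x), sqrt (var1 (x))));
/* std1^2 and std^2 differ by the factor n/(n - 1) */
is (equal (std1 (x)^2, n / (n - 1) * std (x)^2));
@end example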
1221
1222 See also functions @mref{var1} and @mrefdot{std}
1223
1224 @opencatbox{Categories:}
1225 @category{Package descriptive}
1226 @closecatbox
1227 @end deffn
1228
1229
1230
1231 @anchor{noncentral_moment}
1232 @deffn {Function} noncentral_moment @
1233 @fname{noncentral_moment} (@var{x}, @var{k}) @
1234 @fname{noncentral_moment} (@var{x}, @var{k}, @var{w})
1235
1236 Returns the noncentral moment of order @var{k}.
1237 @var{x} must be a list or matrix.
1238
1239 When @var{x} is a list,
1240 @code{noncentral_moment} returns the noncentral moment of order @var{k} of @var{x}.
1241
1242 When @var{x} is a matrix,
1243 @code{noncentral_moment} returns a list comprising the noncentral moment of order @var{k} of each column.
1244
1245 @var{w} is an optional per-datum weight.
1246 @var{w} must either be 1, in which case every datum @var{x[i]} is given equal weight,
1247 or a list of the same length as @var{x},
1248 in which case the weight for @var{x[i]} is given by @var{w[i]}.
1249 The elements of @var{w} must be nonnegative and not all zero;
1250 it is not required that they sum to 1.
1251
1252 The unweighted noncentral moment of order @var{k} is defined as
1253
1254 @ifnottex
1255 @example
1256 @group
1257                      n
1258                     ====
1259                  1  \      k
1260                  -   >    x
1261                  n  /      i
1262                     ====
1263                     i = 1
1264 @end group
1265 @end example
1266 @end ifnottex
1267 @tex
1268 $${{1\over{n}}{\sum_{i=1}^{n}{x_{i}^k}}}$$
1269 @end tex
1270
1271 The weighted noncentral moment of order @var{k} is defined as
1272
1273 @ifnottex
1274 @example
1275 @group
1276                      n
1277                     ====
1278                  1  \         k
1279                  -   >    w  x
1280                  Z  /      i  i
1281                     ====
1282                     i = 1
1283 @end group
1284 @end example
1285 @end ifnottex
1286 @tex
1287 $${{1\over{Z}}{\sum_{i=1}^{n}{w_{i} x_{i}^k}}}$$
1288 @end tex
1289
1290 where @var{Z} is the sum of the weights,
1291
1292 @ifnottex
1293 @example
1294                    n
1295                   ====
1296                   \
1297              Z =   >    w
1298                   /      i
1299                   ====
1300                   i = 1
1301 @end example
1302 @end ifnottex
1303 @tex
1304 $${Z={\sum_{i=1}^{n}{w_{i}}}}$$
1305 @end tex
1306
1307 Examples:
1308
1309 First noncentral moment of a list.
1310 The first noncentral moment is equal to the sample mean.
1311
1312 @c ===beg===
1313 @c load ("descriptive")$
1314 @c s1 : read_list (file_search ("pidigits.data"))$
1315 @c noncentral_moment (s1, 1), numer;
1316 @c mean (s1), numer;
1317 @c ===end===
1318 @example
1319 (%i1) load ("descriptive")$
1320 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1321 @group
1322 (%i3) noncentral_moment (s1, 1), numer;
1323 (%o3)                         4.71
1324 @end group
1325 @group
1326 (%i4) mean (s1), numer;
1327 (%o4)                         4.71
1328 @end group
1329 @end example
1330
1331 Fifth noncentral moment of each column of a matrix.
1332
1333 @c ===beg===
1334 @c load ("descriptive")$
1335 @c s2 : read_matrix (file_search ("wind.data"))$
1336 @c noncentral_moment (s2, 5);
1337 @c ===end===
1338 @example
1339 (%i1) load ("descriptive")$
1340 (%i2) s2 : read_matrix (file_search ("wind.data"))$
1341 @group
1342 (%i3) noncentral_moment (s2, 5);
1343 (%o3) [319793.87247615046, 320532.19238924625,
1344        391249.56213815557, 2502278.205988911, 1691881.7977422548]
1345 @end group
1346 @end example
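
Noncentral moments can be combined to recover other statistics.
For instance, the sample variance equals the second noncentral moment minus the square of the first.
A sketch with made-up data (@code{m1} and @code{m2} are illustrative names):

@example
load ("descriptive")$
x : [1, 2, 3, 4]$
m1 : noncentral_moment (x, 1)$      /* (1 + 2 + 3 + 4)/4 = 5/2 */
m2 : noncentral_moment (x, 2)$      /* (1 + 4 + 9 + 16)/4 = 15/2 */
is (equal (var (x), m2 - m1^2));    /* 15/2 - 25/4 = 5/4 */
@end example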
1347
1348 See also function @mrefdot{central_moment}
1349
1350 @opencatbox{Categories:}
1351 @category{Package descriptive}
1352 @closecatbox
1353 @end deffn
1354
1355
1356
1357 @anchor{central_moment}
1358 @deffn {Function} central_moment @
1359 @fname{central_moment} (@var{x}, @var{k}) @
1360 @fname{central_moment} (@var{x}, @var{k}, @var{w})
1361
1362 Returns the central moment of order @var{k}.
1363 @var{x} must be a list or matrix.
1364
1365 When @var{x} is a list,
1366 @code{central_moment} returns the central moment of order @var{k} of @var{x}.
1367
1368 When @var{x} is a matrix,
1369 @code{central_moment} returns a list comprising the central moment of order @var{k} of each column.
1370
1371 @var{w} is an optional per-datum weight.
1372 @var{w} must either be 1, in which case every datum @var{x[i]} is given equal weight,
1373 or a list of the same length as @var{x},
1374 in which case the weight for @var{x[i]} is given by @var{w[i]}.
1375 The elements of @var{w} must be nonnegative and not all zero;
1376 it is not required that they sum to 1.
1377
1378 The unweighted central moment of order @var{k} is defined as
1379
1380 @ifnottex
1381 @example
1382 @group
1383                   n
1384                  ====
1385               1  \          _ k
1386               -   >    (x - x)
1387               n  /       i
1388                  ====
1389                  i = 1
1390 @end group
1391 @end example
1392 @end ifnottex
1393 @tex
1394 $${{1\over{n}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^k}}}$$
1395 @end tex
1396
1397 The weighted central moment of order @var{k} is defined as
1398
1399 @ifnottex
1400 @example
1401 @group
1402                   n
1403                  ====
1404               1  \             _ k
1405               -   >    w  (x - x)
1406               Z  /      i   i
1407                  ====
1408                  i = 1
1409 @end group
1410 @end example
1411 @end ifnottex
1412 @tex
1413 $${{1\over{Z}}{\sum_{i=1}^{n}{w_{i} (x_{i}-\bar{x})^k}}}$$
1414 @end tex
1415
1416 where @var{Z} is the sum of the weights,
1417
1418 @ifnottex
1419 @example
1420                    n
1421                   ====
1422                   \
1423              Z =   >    w
1424                   /      i
1425                   ====
1426                   i = 1
1427 @end example
1428 @end ifnottex
1429 @tex
1430 $${Z={\sum_{i=1}^{n}{w_{i}}}}$$
1431 @end tex
1432
1433 Examples:
1434
1435 Second central moment of a list.
1436 The second central moment is equal to the sample variance.
1437
1438 @c ===beg===
1439 @c load ("descriptive")$
1440 @c s1 : read_list (file_search ("pidigits.data"))$
1441 @c central_moment (s1, 2), numer;
1442 @c var (s1), numer;
1443 @c ===end===
1444 @example
1445 (%i1) load ("descriptive")$
1446 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1447 @group
1448 (%i3) central_moment (s1, 2), numer;
1449 (%o3)                   8.425899999999999
1450 @end group
1451 @group
1452 (%i4) var (s1), numer;
1453 (%o4)                   8.425899999999999
1454 @end group
1455 @end example
1456
1457 Third central moment of each column of a matrix.
1458
1459 @c ===beg===
1460 @c load ("descriptive")$
1461 @c s2 : read_matrix (file_search ("wind.data"))$
1462 @c central_moment (s2, 3);
1463 @c ===end===
1464 @example
1465 (%i1) load ("descriptive")$
1471 (%i2) s2 : read_matrix (file_search ("wind.data"))$
1472 @group
1473 (%i3) central_moment (s2, 3);
1474 (%o3) [11.29584771375004, 16.97988248298583, 5.626661952750102,
1475                              37.5986572057918, 25.85981904394192]
1476 @end group
1477 @end example
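
The unweighted definition can be evaluated directly with @code{lsum}.
This is a sketch with made-up data; the names @code{xbar} and @code{m} are illustrative.

@example
load ("descriptive")$
x : [1, 2, 6]$
k : 3$
xbar : mean (x)$                                /* 3 */
/* (1/n) * sum of (x[i] - xbar)^k: (-8 - 1 + 27)/3 = 6 */
m : lsum ((xi - xbar)^k, xi, x) / length (x);
is (equal (central_moment (x, k), m));
@end example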
1478
1479 See also functions @mref{noncentral_moment} and @mrefdot{mean}
1480
1481 @opencatbox{Categories:}
1482 @category{Package descriptive}
1483 @closecatbox
1484 @end deffn
1485
1486
1487
1488 @anchor{cv}
1489 @deffn {Function} cv @
1490 @fname{cv} (@var{list}) @
1491 @fname{cv} (@var{matrix})
1492
1493 Returns the coefficient of variation,
1494 defined as the sample standard deviation (@mref{std}) divided by the sample @mrefdot{mean}
1495
1496 Examples:
1497
1498 @c ===beg===
1499 @c load ("descriptive")$
1500 @c s1 : read_list (file_search ("pidigits.data"))$
1501 @c cv (s1), numer;
1502 @c s2 : read_matrix (file_search ("wind.data"))$
1503 @c cv (s2);
1504 @c ===end===
1505 @example
1506 (%i1) load ("descriptive")$
1507 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1508 @group
1509 (%i3) cv (s1), numer;
1510 (%o3)                  0.6162930116383044
1511 @end group
1512 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1513 @group
1514 (%i5) cv (s2);
1515 (%o5) [0.4171411291632767, 0.38101703748061055,
1516       0.3619561372346568, 0.3609199356430116, 0.3329249251309538]
1517 @end group
1518 @end example
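
Because both the standard deviation and the mean scale with the data,
the coefficient of variation is unchanged when every datum is multiplied by the same positive constant.
A sketch with made-up data:

@example
load ("descriptive")$
x : [1, 2, 3, 4]$
is (equal (cv (x), std (x) / mean (x)));
/* rescaling the sample by 10 leaves the ratio unchanged */
is (equal (cv (10 * x), cv (x)));
@end example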
1519
1520 See also functions @mref{std} and @mrefdot{mean}
1521
1522 @opencatbox{Categories:}
1523 @category{Package descriptive}
1524 @closecatbox
1525 @end deffn
1526
1527
1528
1529 @anchor{smin}
1530 @deffn {Function} smin @
1531 @fname{smin} (@var{list}) @
1532 @fname{smin} (@var{matrix})
1533
1534 Returns the minimum value of the sample @var{list}.
1535 When the argument is a matrix, @code{smin} returns
1536 a list containing the minimum value of each column;
1537 each column is treated as a separate statistical variable.
1538
1539 Examples:
1540
1541 @c ===beg===
1542 @c load ("descriptive")$
1543 @c s1 : read_list (file_search ("pidigits.data"))$
1544 @c smin (s1);
1545 @c s2 : read_matrix (file_search ("wind.data"))$
1546 @c smin (s2);
1547 @c ===end===
1548 @example
1549 (%i1) load ("descriptive")$
1550 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1551 @group
1552 (%i3) smin (s1);
1553 (%o3)                           0
1554 @end group
1555 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1556 @group
1557 (%i5) smin (s2);
1558 (%o5)             [0.58, 0.5, 2.67, 5.25, 5.17]
1559 @end group
1560 @end example
1561
1562 See also function @mrefdot{smax}
1563
1564 @opencatbox{Categories:}
1565 @category{Package descriptive}
1566 @closecatbox
1567 @end deffn
1568
1569
1570
1571 @anchor{smax}
1572 @deffn {Function} smax @
1573 @fname{smax} (@var{list}) @
1574 @fname{smax} (@var{matrix})
1575
1576 Returns the maximum value of the sample @var{list}.
1577 When the argument is a matrix, @code{smax} returns
1578 a list containing the maximum value of each column;
1579 each column is treated as a separate statistical variable.
1580
1581 Examples:
1582
1583 @c ===beg===
1584 @c load ("descriptive")$
1585 @c s1 : read_list (file_search ("pidigits.data"))$
1586 @c smax (s1);
1587 @c s2 : read_matrix (file_search ("wind.data"))$
1588 @c smax (s2);
1589 @c ===end===
1590 @example
1591 (%i1) load ("descriptive")$
1592 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1593 @group
1594 (%i3) smax (s1);
1595 (%o3)                           9
1596 @end group
1597 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1598 @group
1599 (%i5) smax (s2);
1600 (%o5)          [20.25, 21.46, 20.04, 29.63, 27.63]
1601 @end group
1602 @end example
1603
1604 See also function @mrefdot{smin}
1605
1606 @opencatbox{Categories:}
1607 @category{Package descriptive}
1608 @closecatbox
1609 @end deffn
1610
1611
1612
1613 @anchor{range}
1614 @deffn {Function} range @
1615 @fname{range} (@var{list}) @
1616 @fname{range} (@var{matrix})
1617
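1618 Returns the range, defined as the difference between the maximum and minimum values of the sample. When the argument is a matrix, @code{range} returns a list containing the range of each column.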
1619
1620 Example:
1621
1622 @c ===beg===
1623 @c load ("descriptive")$
1624 @c s1 : read_list (file_search ("pidigits.data"))$
1625 @c range (s1);
1626 @c s2 : read_matrix (file_search ("wind.data"))$
1627 @c range (s2);
1628 @c ===end===
1629 @example
1630 (%i1) load ("descriptive")$
1631 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1632 @group
1633 (%i3) range (s1);
1634 (%o3)                           9
1635 @end group
1636 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1637 @group
1638 (%i5) range (s2);
1639 (%o5)   [19.67, 20.96, 17.369999999999997, 24.38, 22.46]
1640 @end group
1641 @end example
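
The range can be cross-checked against @code{smax} and @code{smin}.
The list and matrix below are made up for illustration.

@example
load ("descriptive")$
x : [3, 1, 4, 1, 5]$
is (equal (range (x), smax (x) - smin (x)));    /* 5 - 1 = 4 */
m : matrix ([1, 10], [4, 20], [2, 50])$
/* column by column: [4 - 1, 50 - 10] = [3, 40] */
is (range (m) = smax (m) - smin (m));
@end example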
1642
1643 @opencatbox{Categories:}
1644 @category{Package descriptive}
1645 @closecatbox
1646 @end deffn
1647
1648
1649
1650 @anchor{quantile}
1651 @deffn {Function} quantile @
1652 @fname{quantile} (@var{list}, @var{p}) @
1653 @fname{quantile} (@var{matrix}, @var{p})
1654
1655 Returns the @var{p}-quantile of the sample @var{list}, where @var{p} is a number in @math{[0, 1]}. When the argument is a matrix, @code{quantile} returns a list containing the @var{p}-quantile of each column.
1656 Although there are several definitions of the sample quantile (Hyndman, R. J., Fan, Y. (1996) @emph{Sample quantiles in statistical packages}. American Statistician, 50, 361-365), package @code{descriptive} implements the one based on linear interpolation.
1657
1658 Examples:
1659
1660 Input is a list. First and third quartiles are computed.
1661
1662 @c ===beg===
1663 @c load ("descriptive")$
1664 @c s1 : read_list (file_search ("pidigits.data"))$
1665 @c [quantile (s1, 1/4), quantile (s1, 3/4)], numer;
1666 @c ===end===
1667 @example
1668 (%i1) load ("descriptive")$
1669 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1670 @group
1671 (%i3) [quantile (s1, 1/4), quantile (s1, 3/4)], numer;
1672 (%o3)                      [2.0, 7.25]
1673 @end group
1674 @end example
1675
1676 Input is a matrix. First quartile is computed for each column.
1677
1678 @c ===beg===
1679 @c load ("descriptive")$
1680 @c s2 : read_matrix (file_search ("wind.data"))$
1681 @c quantile (s2, 1/4);
1682 @c ===end===
1683 @example
1684 (%i1) load ("descriptive")$
1685 (%i2) s2 : read_matrix (file_search ("wind.data"))$
1686 @group
1687 (%i3) quantile (s2, 1/4);
1688 (%o3)    [7.2575, 7.477500000000001, 7.82, 11.28, 11.48]
1689 @end group
1690 @end example
1691
1692 @opencatbox{Categories:}
1693 @category{Package descriptive}
1694 @closecatbox
1695 @end deffn
1696
1697
1698
1699 @anchor{median}
1700 @deffn {Function} median @
1701 @fname{median} (@var{list}) @
1702 @fname{median} (@var{matrix})
1703
1704 Returns the sample median: once the sample is ordered, the median is the central value if the sample size is odd, and the mean of the two central values otherwise.
1705
1706 Example:
1707
1708 @c ===beg===
1709 @c load ("descriptive")$
1710 @c s1 : read_list (file_search ("pidigits.data"))$
1711 @c median (s1);
1712 @c s2 : read_matrix (file_search ("wind.data"))$
1713 @c median (s2);
1714 @c ===end===
1715 @example
1716 (%i1) load ("descriptive")$
1717 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1718 @group
1719 (%i3) median (s1);
1720                                 9
1721 (%o3)                           -
1722                                 2
1723 @end group
1724 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1725 @group
1726 (%i5) median (s2);
1727 (%o5)   [10.059999999999999, 9.855, 10.73, 15.48, 14.105]
1728 @end group
1729 @end example
1730
1731 The median is the 1/2-quantile.
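
A short sketch with made-up data shows both cases and the relation to @code{quantile}:

@example
load ("descriptive")$
x : [7, 1, 4, 2]$
median (x);                         /* sorted 1, 2, 4, 7: (2 + 4)/2 = 3 */
median ([7, 1, 4]);                 /* odd size: central value 4 */
is (equal (median (x), quantile (x, 1/2)));
@end example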
1732
1733 See also function @mrefdot{quantile}
1734
1735 @opencatbox{Categories:}
1736 @category{Package descriptive}
1737 @closecatbox
1738 @end deffn
1739
1740
1741
1742 @anchor{qrange}
1743 @deffn {Function} qrange @
1744 @fname{qrange} (@var{x})
1745
1746 Returns the interquartile range,
1747 defined as the difference between the third and first quartiles:
1748 @code{quantile(@var{x}, 3/4) - quantile(@var{x}, 1/4)}.
1749
1750 @var{x} must be a list or matrix.
1751 When @var{x} is a matrix,
1752 @code{qrange} returns the interquartile range for each column.
1753
1754 Examples:
1755
1756 @c ===beg===
1757 @c load ("descriptive")$
1758 @c s1 : read_list (file_search ("pidigits.data"))$
1759 @c qrange (s1);
1760 @c s2 : read_matrix (file_search ("wind.data"))$
1761 @c qrange (s2);
1762 @c ===end===
1763 @example
1764 (%i1) load ("descriptive")$
1765 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1766 @group
1767 (%i3) qrange (s1);
1768                                21
1769 (%o3)                          --
1770                                4
1771 @end group
1772 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1773 @group
1774 (%i5) qrange (s2);
1775 (%o5) [5.385, 5.572499999999998, 6.022500000000001,
1776                             8.729999999999999, 6.649999999999999]
1777 @end group
1778 @end example
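
The definition can be checked by computing both quartiles explicitly.
The data below are made up, and @code{q1} and @code{q3} are illustrative names.

@example
load ("descriptive")$
x : [2, 4, 4, 5, 7, 9]$
q1 : quantile (x, 1/4)$             /* first quartile */
q3 : quantile (x, 3/4)$             /* third quartile */
is (equal (qrange (x), q3 - q1));
@end example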
1779
1780 See also function @mrefdot{quantile}
1781
1782 @opencatbox{Categories:}
1783 @category{Package descriptive}
1784 @closecatbox
1785 @end deffn
1786
1787
1788
1789 @anchor{mean_deviation}
1790 @deffn {Function} mean_deviation @
1791 @fname{mean_deviation} (@var{list}) @
1792 @fname{mean_deviation} (@var{matrix})
1793
1794 Returns the mean deviation, defined as
1795 @ifnottex
1796 @example
1797 @group
1798                      n
1799                    ====
1800                1   \          _
1801                -    >    |x - x|
1802                n   /       i
1803                    ====
1804                    i = 1
1805 @end group
1806 @end example
1807 @end ifnottex
1808 @tex
1809 $${{1\over{n}}{\sum_{i=1}^{n}{|x_{i}-\bar{x}|}}}$$
1810 @end tex
1811
1812 Example:
1813
1814 @c ===beg===
1815 @c load ("descriptive")$
1816 @c s1 : read_list (file_search ("pidigits.data"))$
1817 @c mean_deviation (s1);
1818 @c s2 : read_matrix (file_search ("wind.data"))$
1819 @c mean_deviation (s2);
1820 @c ===end===
1821 @example
1822 (%i1) load ("descriptive")$
1823 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1824 @group
1825 (%i3) mean_deviation (s1);
1826                                51
1827 (%o3)                          --
1828                                20
1829 @end group
1830 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1831 @group
1832 (%i5) mean_deviation (s2);
1833 (%o5) [3.2879599999999987, 3.075342, 3.2390700000000003,
1834                             4.715664000000001, 4.028546000000002]
1835 @end group
1836 @end example
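
The formula can be evaluated directly with @code{lsum}.
This is a sketch with made-up data; @code{xbar} and @code{d} are illustrative names.

@example
load ("descriptive")$
x : [1, 2, 3, 4]$
xbar : mean (x)$                    /* 5/2 */
/* (1/n) * sum of |x[i] - xbar|: (3/2 + 1/2 + 1/2 + 3/2)/4 = 1 */
d : lsum (abs (xi - xbar), xi, x) / length (x);
is (equal (mean_deviation (x), d));
@end example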
1837
1838 See also function @mrefdot{mean}
1839
1840 @opencatbox{Categories:}
1841 @category{Package descriptive}
1842 @closecatbox
1843 @end deffn
1844
1845
1846
1847 @anchor{median_deviation}
1848 @deffn {Function} median_deviation @
1849 @fname{median_deviation} (@var{list}) @
1850 @fname{median_deviation} (@var{matrix})
1851
1852 Returns the median deviation, defined as
1853 @ifnottex
1854 @example
1855 @group
1856                  n
1857                ====
1858            1   \
1859            -    >    |x - med|
1860            n   /       i
1861                ====
1862                i = 1
1863 @end group
1864 @end example
1865 @end ifnottex
1866 @tex
1867 $${{1\over{n}}{\sum_{i=1}^{n}{|x_{i}-med|}}}$$
1868 @end tex
1869 where @code{med} is the median of @var{list}.
1870
1871 Example:
1872
1873 @c ===beg===
1874 @c load ("descriptive")$
1875 @c s1 : read_list (file_search ("pidigits.data"))$
1876 @c median_deviation (s1);
1877 @c s2 : read_matrix (file_search ("wind.data"))$
1878 @c median_deviation (s2);
1879 @c ===end===
1880 @example
1881 (%i1) load ("descriptive")$
1882 (%i2) s1 : read_list (file_search ("pidigits.data"))$
1883 @group
1884 (%i3) median_deviation (s1);
1885                                 5
1886 (%o3)                           -
1887                                 2
1888 @end group
1889 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1890 @group
1891 (%i5) median_deviation (s2);
1892 (%o5) [2.75, 2.7550000000000003, 3.08, 4.315, 3.3099999999999996]
1893 @end group
1894 @end example
1895
1896 See also function @mrefdot{median}
1897
1898 @opencatbox{Categories:}
1899 @category{Package descriptive}
1900 @closecatbox
1901 @end deffn
1902
1903
1904
1905 @anchor{harmonic_mean}
1906 @deffn {Function} harmonic_mean @
1907 @fname{harmonic_mean} (@var{list}) @
1908 @fname{harmonic_mean} (@var{matrix})
1909
1910 Returns the harmonic mean, defined as
1911 @ifnottex
1912 @example
1913 @group
1914                   n
1915                --------
1916                 n
1917                ====
1918                \     1
1919                 >    --
1920                /     x
1921                ====   i
1922                i = 1
1923 @end group
1924 @end example
1925 @end ifnottex
1926 @tex
1927 $${{n}\over{\sum_{i=1}^{n}{{{1}\over{x_{i}}}}}}$$
1928 @end tex
1929
1930 Example:
1931
1932 @c ===beg===
1933 @c load ("descriptive")$
1934 @c y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
1935 @c harmonic_mean (y), numer;
1936 @c s2 : read_matrix (file_search ("wind.data"))$
1937 @c harmonic_mean (s2);
1938 @c ===end===
1939 @example
1940 (%i1) load ("descriptive")$
1941 (%i2) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
1942 @group
1943 (%i3) harmonic_mean (y), numer;
1944 (%o3)                  3.9018580276322052
1945 @end group
1946 (%i4) s2 : read_matrix (file_search ("wind.data"))$
1947 @group
1948 (%i5) harmonic_mean (s2);
1949 (%o5) [6.948015590052786, 7.391967752360356, 9.055658197151745,
1950                            13.441990281936924, 13.01439145898509]
1951 @end group
1952 @end example
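
As a sketch with made-up data, the formula can be evaluated by hand;
the name @code{h} is illustrative.

@example
load ("descriptive")$
x : [1, 2, 4]$
/* n divided by the sum of reciprocals: 3/(1 + 1/2 + 1/4) = 12/7 */
h : length (x) / lsum (1 / xi, xi, x);
is (equal (harmonic_mean (x), h));
@end example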
1953
1954 See also functions @mref{mean} and @mrefdot{geometric_mean}
1955
1956 @opencatbox{Categories:}
1957 @category{Package descriptive}
1958 @closecatbox
1959
1960 @end deffn
1961
1962
1963
1964 @anchor{geometric_mean}
1965 @deffn {Function} geometric_mean @
1966 @fname{geometric_mean} (@var{list}) @
1967 @fname{geometric_mean} (@var{matrix})
1968
1969 Returns the geometric mean, defined as
1970 @ifnottex
1971 @example
1972 @group
1973                  /  n      \ 1/n
1974                  | /===\   |
1975                  |  ! !    |
1976                  |  ! !  x |
1977                  |  ! !   i|
1978                  | i = 1   |
1979                  \         /
1980 @end group
1981 @end example
1982 @end ifnottex
1983 @tex
1984 $$\left(\prod_{i=1}^{n}{x_{i}}\right)^{{{1}\over{n}}}$$
1985 @end tex
1986
1987 Example:
1988
1989 @c ===beg===
1990 @c load ("descriptive")$
1991 @c y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
1992 @c geometric_mean (y), numer;
1993 @c s2 : read_matrix (file_search ("wind.data"))$
1994 @c geometric_mean (s2);
1995 @c ===end===
1996 @example
1997 (%i1) load ("descriptive")$
1998 (%i2) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
1999 @group
2000 (%i3) geometric_mean (y), numer;
2001 (%o3)                   4.454845412337012
2002 @end group
2003 (%i4) s2 : read_matrix (file_search ("wind.data"))$
2004 @group
2005 (%i5) geometric_mean (s2);
2006 (%o5) [8.82476274347979, 9.22652604739361, 10.044267571488904,
2007                            14.612741263490207, 13.96184163444275]
2008 @end group
2009 @end example
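
The formula can be written out by hand and compared with the harmonic and arithmetic means.
This is a sketch with made-up positive data; the name @code{g} is illustrative,
and the first comparison assumes that @code{geometric_mean} returns an exact result for exact input.

@example
load ("descriptive")$
x : [1, 2, 4]$
/* n-th root of the product: 8^(1/3) = 2 */
g : apply ("*", x)^(1 / length (x));
is (equal (geometric_mean (x), g));
/* harmonic, geometric and arithmetic means: 12/7, 2 and 7/3 */
float ([harmonic_mean (x), g, mean (x)]);
@end example

For positive data the three values in the last list are expected to appear in nondecreasing order.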
2010
2011 See also functions @mref{mean} and @mrefdot{harmonic_mean}
2012
2013 @opencatbox{Categories:}
2014 @category{Package descriptive}
2015 @closecatbox
2016 @end deffn
2017
2018
2019
2020 @anchor{kurtosis}
2021 @deffn {Function} kurtosis @
2022 @fname{kurtosis} (@var{list}) @
2023 @fname{kurtosis} (@var{matrix})
2024
2025 Returns the kurtosis coefficient, defined as
2026 @ifnottex
2027 @example
2028 @group
2029                     n
2030                   ====
2031             1     \          _ 4
2032            ----    >    (x - x)  - 3
2033               4   /       i
2034            n s    ====
2035                   i = 1
2036 @end group
2037 @end example
2038 @end ifnottex
2039 @tex
2040 $${{1\over{n s^4}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^4}}-3}$$
2041 @end tex
2042
2043 Example:
2044
2045 @c ===beg===
2046 @c load ("descriptive")$
2047 @c s1 : read_list (file_search ("pidigits.data"))$
2048 @c kurtosis (s1), numer;
2049 @c s2 : read_matrix (file_search ("wind.data"))$
2050 @c kurtosis (s2);
2051 @c ===end===
2052 @example
2053 (%i1) load ("descriptive")$
2054 (%i2) s1 : read_list (file_search ("pidigits.data"))$
2055 @group
2056 (%i3) kurtosis (s1), numer;
2057 (%o3)                  - 1.273247946514421
2058 @end group
2059 (%i4) s2 : read_matrix (file_search ("wind.data"))$
2060 @group
2061 (%i5) kurtosis (s2);
2062 (%o5) [- 0.2715445622195385, 0.119998784429451,
2063 - 0.42752334904828615, - 0.6405361979019522,
2064 - 0.4952382132352935]
2065 @end group
2066 @end example
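
The definition can be checked by hand on a small sample. The following is
only a sketch: the sample @code{x} is made up for illustration, and it is
assumed that @math{s} in the formula is the value returned by @code{std}
(the @math{1/n} version of the standard deviation). Under that assumption
the last expression should simplify to zero.

@example
load ("descriptive")$
x : [1, 2, 3, 4, 10]$     /* hypothetical sample */
n : length (x)$
m : mean (x)$
s : std (x)$              /* assumed to be the s of the formula */
k : lsum ((xi - m)^4, xi, x) / (n * s^4) - 3$
ratsimp (k - kurtosis (x));
@end example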
2067
2068 See also functions @mrefcomma{mean} @mref{var} and @mrefdot{skewness}
2069
2070 @opencatbox{Categories:}
2071 @category{Package descriptive}
2072 @closecatbox
2073 @end deffn
2074
2075
2076
2077 @anchor{skewness}
2078 @deffn {Function} skewness @
2079 @fname{skewness} (@var{list}) @
2080 @fname{skewness} (@var{matrix})
2081
2082 The skewness coefficient, defined as
2083 @ifnottex
2084 @example
2085 @group
2086                     n
2087                   ====
2088             1     \          _ 3
2089            ----    >    (x - x)
2090               3   /       i
2091            n s    ====
2092                   i = 1
2093 @end group
2094 @end example
2095 @end ifnottex
2096 @tex
2097 $${{1\over{n s^3}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^3}}}$$
2098 @end tex
2099
2100 Example:
2101
2102 @c ===beg===
2103 @c load ("descriptive")$
2104 @c s1 : read_list (file_search ("pidigits.data"))$
2105 @c skewness (s1), numer;
2106 @c s2 : read_matrix (file_search ("wind.data"))$
2107 @c skewness (s2);
2108 @c ===end===
2109 @example
2110 (%i1) load ("descriptive")$
2111 (%i2) s1 : read_list (file_search ("pidigits.data"))$
2112 @group
2113 (%i3) skewness (s1), numer;
2114 (%o3)                 0.009196180476450424
2115 @end group
2116 (%i4) s2 : read_matrix (file_search ("wind.data"))$
2117 @group
2118 (%i5) skewness (s2);
2119 (%o5) [0.1580509020000978, 0.2926379232061854,
2120    0.09242174416107717, 0.20599843481486865, 0.21425202488908313]
2121 @end group
2122 @end example
2123
2124 See also functions @mrefcomma{mean} @mref{var} and @mrefdot{kurtosis}
2125
2126 @opencatbox{Categories:}
2127 @category{Package descriptive}
2128 @closecatbox
2129 @end deffn
2130
2131
2132
2133 @anchor{pearson_skewness}
2134 @deffn {Function} pearson_skewness @
2135 @fname{pearson_skewness} (@var{list}) @
2136 @fname{pearson_skewness} (@var{matrix})
2137
2138 Pearson's skewness coefficient, defined as
2139 @ifnottex
2140 @example
2141 @group
2142                 _
2143              3 (x - med)
2144              -----------
2145                   s
2146 @end group
2147 @end example
2148 @end ifnottex
2149 @tex
2150 $${{3\,\left(\bar{x}-med\right)}\over{s}}$$
2151 @end tex
2152 where @var{med} is the median of @var{list}.
2153
2154 Example:
2155
2156 @c ===beg===
2157 @c load ("descriptive")$
2158 @c s1 : read_list (file_search ("pidigits.data"))$
2159 @c pearson_skewness (s1), numer;
2160 @c s2 : read_matrix (file_search ("wind.data"))$
2161 @c pearson_skewness (s2);
2162 @c ===end===
2163 @example
2164 (%i1) load ("descriptive")$
2165 (%i2) s1 : read_list (file_search ("pidigits.data"))$
2166 @group
2167 (%i3) pearson_skewness (s1), numer;
2168 (%o3)                  0.21594840290938955
2169 @end group
2170 (%i4) s2 : read_matrix (file_search ("wind.data"))$
2171 @group
2172 (%i5) pearson_skewness (s2);
2173 (%o5) [- 0.08019976629211892, 0.2357036272952649,
2174    0.10509040624912039, 0.12450423405923679, 0.44641817958045193]
2175 @end group
2176 @end example
2177
2178 See also functions @mrefcomma{mean} @mref{var} and @mrefdot{median}
2179
2180 @opencatbox{Categories:}
2181 @category{Package descriptive}
2182 @closecatbox
2183 @end deffn
2184
2185
2186
2187 @anchor{quartile_skewness}
2188 @deffn {Function} quartile_skewness @
2189 @fname{quartile_skewness} (@var{list}) @
2190 @fname{quartile_skewness} (@var{matrix})
2191
2192 The quartile skewness coefficient, defined as
2193 @ifnottex
2194 @example
2195 @group
2196                c    - 2 c    + c
2197                 3/4      1/2    1/4
2198                --------------------
2199                    c    - c
2200                     3/4    1/4
2201 @end group
2202 @end example
2203 @end ifnottex
2204 @tex
2205 $${{c_{{{3}\over{4}}}-2\,c_{{{1}\over{2}}}+c_{{{1}\over{4}}}}\over{c
2206  _{{{3}\over{4}}}-c_{{{1}\over{4}}}}}$$
2207 @end tex
2208 where @math{c_p} is the @var{p}-quantile of sample @var{list}.
2209
2210 Example:
2211
2212 @c ===beg===
2213 @c load ("descriptive")$
2214 @c s1 : read_list (file_search ("pidigits.data"))$
2215 @c quartile_skewness (s1), numer;
2216 @c s2 : read_matrix (file_search ("wind.data"))$
2217 @c quartile_skewness (s2);
2218 @c ===end===
2219 @example
2220 (%i1) load ("descriptive")$
2221 (%i2) s1 : read_list (file_search ("pidigits.data"))$
2222 @group
2223 (%i3) quartile_skewness (s1), numer;
2224 (%o3)                 0.047619047619047616
2225 @end group
2226 (%i4) s2 : read_matrix (file_search ("wind.data"))$
2227 @group
2228 (%i5) quartile_skewness (s2);
2229 (%o5) [- 0.040854224698235304, 0.14670255720053824,
2230    0.033623910336239196, 0.03780068728522298, 0.2105263157894735]
2231 @end group
2232 @end example
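
Since the coefficient is defined in terms of quantiles, it can be rebuilt
from three calls to @code{quantile}. This is a minimal sketch with a made-up
sample @code{x}; the names @code{q1}, @code{q2} and @code{q3} are
introduced only for illustration. The final difference is expected to be
zero.

@example
load ("descriptive")$
x : [1, 2, 3, 5, 10]$          /* hypothetical sample */
q1 : quantile (x, 1/4)$
q2 : quantile (x, 1/2)$
q3 : quantile (x, 3/4)$
(q3 - 2*q2 + q1) / (q3 - q1) - quartile_skewness (x);
@end example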
2233
2234 See also function @mrefdot{quantile}
2235
2236 @opencatbox{Categories:}
2237 @category{Package descriptive}
2238 @closecatbox
2239 @end deffn
2240
2241
2242
2243 @anchor{km}
2244 @deffn {Function} km @
2245 @fname{km} (@var{list}, @var{option} ...) @
2246 @fname{km} (@var{matrix}, @var{option} ...)
2247
2248 Kaplan-Meier estimator of the survival, or reliability, function @math{S(x)=1-F(x)}.
2249
2250 Data can be supplied as a list of pairs or as a two-column matrix. The first
2251 component is the observed time, and the second component is a censoring index
2252 (1 = non-censored, 0 = right-censored).
2253
2254 The optional argument is the name of the variable in the returned expression,
2255 which is @var{x} by default.
2256
2257 Examples:
2258
2259 Sample as a list of pairs.
2260
2261 @c ===beg===
2262 @c load ("descriptive")$
2263 @c S: km([[2,1], [3,1], [5,0], [8,1]]);
2264 @c load ("draw")$
2265 @c draw2d(
2266 @c   line_width = 3, grid = true,
2267 @c   explicit(S, x, -0.1, 10))$
2268 @c ===end===
2269 @example
2270 (%i1) load ("descriptive")$
2271 @group
2272 (%i2) S: km([[2,1], [3,1], [5,0], [8,1]]);
2273                        charfun((3 <= x) and (x < 8))
2274 (%o2) charfun(x < 0) + -----------------------------
2275                                      2
2276    3 charfun((2 <= x) and (x < 3))
2277  + -------------------------------
2278                   4
2279  + charfun((0 <= x) and (x < 2))
2280 @end group
2281 (%i3) load ("draw")$
2282 @group
2283 (%i4) draw2d(
2284   line_width = 3, grid = true,
2285   explicit(S, x, -0.1, 10))$
2286 @end group
2287 @end example
2288
2289 Estimate survival probabilities.
2290
2291 @c ===beg===
2292 @c load ("descriptive")$
2293 @c S(t):= ''(km([[2,1], [3,1], [5,0], [8,1]], t)) $
2294 @c S(6);
2295 @c ===end===
2296 @example
2297 (%i1) load ("descriptive")$
2298 (%i2) S(t):= ''(km([[2,1], [3,1], [5,0], [8,1]], t)) $
2299 @group
2300 (%i3) S(6);
2301                                 1
2302 (%o3)                           -
2303                                 2
2304 @end group
2305 @end example
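
The value @code{S(6) = 1/2} can be reproduced with the usual product-limit
rule, in which the estimate is multiplied by @math{1 - d/r} at each
non-censored time, where @math{d} is the number of events at that time and
@math{r} is the number of individuals still at risk. The symbols @math{d}
and @math{r} are introduced here only for illustration. In the sample above,
the events at times 2 and 3 occur with 4 and 3 individuals at risk, and the
censored observation at time 5 contributes no factor:

@example
/* factors for the events at t = 2 and t = 3 */
(1 - 1/4) * (1 - 1/3);    /* evaluates to 1/2 */
@end example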
2306
2307 @opencatbox{Categories:}
2308 @category{Package descriptive}
2309 @closecatbox
2310 @end deffn
2311
2312
2313
2314 @anchor{cdf_empirical}
2315 @deffn {Function} cdf_empirical @
2316 @fname{cdf_empirical} (@var{list}, @var{option} ...) @
2317 @fname{cdf_empirical} (@var{matrix}, @var{option} ...)
2318
2319 Empirical distribution function @math{F(x)}.
2320
2321 Data can be supplied as a list of numbers or as a one-column matrix.
2322
2323 The optional argument is the name of the variable in the returned expression,
2324 which is @var{x} by default.
2325
2326 Example:
2327
2328 Empirical distribution function.
2329
2330 @c ===beg===
2331 @c load ("descriptive")$
2332 @c F(x):= ''(cdf_empirical([1,3,3,5,7,7,7,8,9]));
2333 @c F(6);
2334 @c load("draw")$
2335 @c draw2d(
2336 @c    line_width = 3,
2337 @c    grid       = true,
2338 @c    explicit(F(z), z, -2, 12)) $
2339 @c ===end===
2340 @example
2341 (%i1) load ("descriptive")$
2342 @group
2343 (%i2) F(x):= ''(cdf_empirical([1,3,3,5,7,7,7,8,9]));
2344 (%o2) F(x) := (charfun(x >= 9) + charfun(x >= 8)
2345  + 3 charfun(x >= 7) + charfun(x >= 5) + 2 charfun(x >= 3)
2346  + charfun(x >= 1))/9
2347 @end group
2348 @group
2349 (%i3) F(6);
2350                                 4
2351 (%o3)                           -
2352                                 9
2353 @end group
2354 (%i4) load("draw")$
2355 @group
2356 (%i5) draw2d(
2357    line_width = 3,
2358    grid       = true,
2359    explicit(F(z), z, -2, 12)) $
2360 @end group
2361 @end example
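
The value @code{F(6) = 4/9} is the fraction of sample values not greater
than 6. As a rough cross-check that uses only built-in list functions (the
name @code{xs} is chosen here for illustration):

@example
xs : [1,3,3,5,7,7,7,8,9]$
/* count the values <= 6 and divide by the sample size */
length (sublist (xs, lambda ([v], is (v <= 6)))) / length (xs);   /* 4/9 */
@end example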
2362
2363 @opencatbox{Categories:}
2364 @category{Package descriptive}
2365 @closecatbox
2366 @end deffn
2367
2368
2369
2370 @anchor{cov}
2371 @deffn {Function} cov @
2372 @fname{cov} (@var{X}) @
2373 @fname{cov} (@var{X}, @var{w})
2374
2375 Returns the sample covariance matrix.
2376 @var{X} must be a matrix.
2377
2378 The sample covariance matrix @var{S} has the same number of rows and columns,
2379 both equal to the number of columns of @var{X};
2380 each diagonal element @var{S[i, i]} is equal to the sample variance of the @var{i}'th column,
2381 and each off-diagonal element @var{S[i, j]} is equal to the sample covariance of the @var{i}'th and @var{j}'th columns.
2382
2383 @var{w} is an optional per-datum weight.
2384 @var{w} must either be 1, in which case every datum @var{X[i]} is given equal weight,
2385 or a list with as many elements as @var{X} has rows,
2386 in which case the weight for @var{X[i]} is given by @var{w[i]}.
2387 The elements of @var{w} must be nonnegative and not all zero;
2388 it is not required that they sum to 1.
2389
2390 The unweighted sample covariance is defined as
2391
2392 @ifnottex
2393 @example
2394 @group
2395               n
2396              ====
2397           1  \           _        _
2398       S = -   >    (X  - X) (X  - X)'
2399           n  /       j        j
2400              ====
2401              j = 1
2402 @end group
2403 @end example
2404 @end ifnottex
2405 @tex
2406 $${S={1\over{n}}{\sum_{j=1}^{n}{\left(X_{j}-\bar{X}\right)\,\left(X_{j}-\bar{X}\right)'}}}$$
2407 @end tex
2408
2409 where @var{X[j]} is the @var{j}'th row of the sample matrix.
2410
2411 The weighted sample covariance is defined as
2412
2413 @ifnottex
2414 @example
2415 @group
2416               n
2417              ====
2418           1  \           _        _
2419       S = -   >    w  (X  - X) (X  - X)'
2420           Z  /      j   j        j
2421              ====
2422              j = 1
2423 @end group
2424 @end example
2425 @end ifnottex
2426 @tex
2427 $${S={1\over{Z}}{\sum_{j=1}^{n}{w_j \left(X_{j}-\bar{X}\right)\,\left(X_{j}-\bar{X}\right)'}}}$$
2428 @end tex
2429
2430 where @var{Z} is the sum of the weights,
2431
2432 @ifnottex
2433 @example
2434                    n
2435                   ====
2436                   \
2437              Z =   >    w
2438                   /      i
2439                   ====
2440                   i = 1
2441 @end example
2442 @end ifnottex
2443 @tex
2444 $${Z={\sum_{i=1}^{n}{w_{i}}}}$$
2445 @end tex
2446
2447 Example:
2448
2449 @c ===beg===
2450 @c load ("descriptive")$
2451 @c s2 : read_matrix (file_search ("wind.data"))$
2452 @c fpprintprec : 7$
2453 @c cov (s2);
2454 @c ===end===
2455 @example
2456 (%i1) load ("descriptive")$
2457 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2458 (%i3) fpprintprec : 7$
2459 @group
2460 (%i4) cov (s2);
2461       [ 17.22191  13.61811  14.37217  19.39624  15.42162 ]
2462       [                                                  ]
2463       [ 13.61811  14.98774  13.30448  15.15834  14.9711  ]
2464       [                                                  ]
2465 (%o4) [ 14.37217  13.30448  15.47573  17.32544  16.18171 ]
2466       [                                                  ]
2467       [ 19.39624  15.15834  17.32544  32.17651  20.44685 ]
2468       [                                                  ]
2469       [ 15.42162  14.9711   16.18171  20.44685  24.42308 ]
2470 @end group
2471 @end example
2472
2473 See also function @mrefdot{cov1}
2474
2475 @opencatbox{Categories:}
2476 @category{Package descriptive}
2477 @closecatbox
2478 @end deffn
2479
2480
2481
2482 @anchor{cov1}
2483 @deffn {Function} cov1 (@var{matrix})
2484
2485 The covariance matrix of the multivariate sample, defined as
2486 @ifnottex
2487 @example
2488 @group
2489               n
2490              ====
2491          1   \           _        _
2492    S  = ---   >    (X  - X) (X  - X)'
2493     1   n-1  /       j        j
2494              ====
2495              j = 1
2496 @end group
2497 @end example
2498 @end ifnottex
2499 @tex
2500 $${{1\over{n-1}}{\sum_{j=1}^{n}{\left(X_{j}-\bar{X}\right)\,\left(X_{j}-\bar{X}\right)'}}}$$
2501 @end tex
2502 where @math{X_j} is the @math{j}-th row of the sample matrix.
2503
2504 Example:
2505
2506 @c ===beg===
2507 @c load ("descriptive")$
2508 @c s2 : read_matrix (file_search ("wind.data"))$
2509 @c fpprintprec : 7$
2510 @c cov1 (s2);
2511 @c ===end===
2512 @example
2513 (%i1) load ("descriptive")$
2514 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2515 (%i3) fpprintprec : 7$
2516 @group
2517 (%i4) cov1 (s2);
2518       [ 17.39587  13.75567  14.51734  19.59216  15.5774  ]
2519       [                                                  ]
2520       [ 13.75567  15.13913  13.43887  15.31145  15.12232 ]
2521       [                                                  ]
2522 (%o4) [ 14.51734  13.43887  15.63205  17.50044  16.34516 ]
2523       [                                                  ]
2524       [ 19.59216  15.31145  17.50044  32.50153  20.65338 ]
2525       [                                                  ]
2526       [ 15.5774   15.12232  16.34516  20.65338  24.66977 ]
2527 @end group
2528 @end example
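
According to the two definitions, @code{cov1} and the unweighted @code{cov}
differ only in the divisor (@math{n-1} versus @math{n}), so @code{cov1}
should equal @math{n/(n-1)} times @code{cov}. A minimal sketch with a small
made-up matrix @code{m}; the result is expected to be a zero matrix:

@example
load ("descriptive")$
m : matrix ([1, 2], [3, 5], [4, 9])$   /* hypothetical sample, 3 rows */
n : length (m)$                        /* number of rows */
ratsimp (cov1 (m) - n/(n - 1) * cov (m));
@end example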
2529
2530 See also function @mrefdot{cov}
2531
2532 @opencatbox{Categories:}
2533 @category{Package descriptive}
2534 @closecatbox
2535 @end deffn
2536
2537
2538
2539 @anchor{global_variances}
2540 @deffn {Function} global_variances @
2541 @fname{global_variances} (@var{matrix}) @
2542 @fname{global_variances} (@var{matrix}, @var{options} ...)
2543
2544 Function @code{global_variances} returns a list of global variance measures:
2545
2546 @itemize @bullet
2547 @item
2548 @var{total variance}: @code{trace(S_1)},
2549 @item
2550 @var{mean variance}: @code{trace(S_1)/p},
2551 @item
2552 @var{generalized variance}: @code{determinant(S_1)},
2553 @item
2554 @var{generalized standard deviation}: @code{sqrt(determinant(S_1))},
2555 @item
2556 @var{effective variance}: @code{determinant(S_1)^(1/p)}, (defined in: Pe@~na, D. (2002) @var{An@'alisis de datos multivariantes}; McGraw-Hill, Madrid.)
2557 @item
2558 @var{effective standard deviation}: @code{determinant(S_1)^(1/(2*p))}.
2559 @end itemize
2560 where @var{p} is the dimension of the multivariate random variable and @math{S_1} the covariance matrix returned by @code{cov1}.
2561
2562 Option:
2563
2564 @itemize @bullet
2565 @item
2566 @code{'data}, default @code{'true}, indicates whether the input matrix contains the sample data,
2567 in which case the covariance matrix @code{cov1} is calculated from it. If @code{data=false},
2568 the input must instead be the (symmetric) covariance matrix itself.
2569 @end itemize
2570
2571 Examples:
2572
2573 Calculate the @code{global_variances} from sample data.
2574
2575 @c ===beg===
2576 @c load ("descriptive")$
2577 @c s2 : read_matrix (file_search ("wind.data"))$
2578 @c global_variances (s2);
2579 @c ===end===
2580 @example
2581 (%i1) load ("descriptive")$
2582 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2583 @group
2584 (%i3) global_variances (s2);
2585 (%o3) [105.33834206060595, 21.06766841212119, 12874.34690469686,
2586        113.46517926085015, 6.636590811800794, 2.5761581496097623]
2587 @end group
2588 @end example
2589
2590 Calculate the @code{global_variances} from the covariance matrix.
2591
2592 @c ===beg===
2593 @c load ("descriptive")$
2594 @c s2 : read_matrix (file_search ("wind.data"))$
2595 @c s : cov1 (s2)$
2596 @c global_variances (s, data=false);
2597 @c ===end===
2598 @example
2599 (%i1) load ("descriptive")$
2600 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2601 (%i3) s : cov1 (s2)$
2602 @group
2603 (%i4) global_variances (s, data=false);
2604 (%o4) [105.33834206060595, 21.06766841212119, 12874.34690469686,
2605        113.46517926085015, 6.636590811800794, 2.5761581496097623]
2606 @end group
2607 @end example
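
The six measures can also be assembled directly from the listed formulas.
This is only a sketch of the definitions, not the package's implementation;
the names @code{S1}, @code{tr} and @code{d} are chosen here for
illustration. The resulting list is expected to agree with
@code{global_variances (s2)} up to rounding.

@example
load ("descriptive")$
s2 : read_matrix (file_search ("wind.data"))$
S1 : cov1 (s2)$
p : length (S1)$                   /* dimension */
tr : sum (S1[i, i], i, 1, p)$      /* trace of S1 */
d : determinant (S1)$
[tr, tr/p, d, sqrt (d), d^(1/p), d^(1/(2*p))];
@end example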
2608
2609 See also @mref{cov} and @mrefdot{cov1}
2610
2611 @opencatbox{Categories:}
2612 @category{Package descriptive}
2613 @closecatbox
2614 @end deffn
2615
2616
2617
2618 @anchor{cor}
2619 @deffn {Function} cor @
2620 @fname{cor} (@var{matrix}) @
2621 @fname{cor} (@var{matrix}, @var{options} ...)
2622
2623 The correlation matrix of the multivariate sample.
2624
2625 Option:
2626
2627 @itemize @bullet
2628 @item
2629 @code{'data}, default @code{'true}, indicates whether the input matrix contains the sample data,
2630 in which case the covariance matrix @code{cov1} is calculated from it. If @code{data=false},
2631 the input must instead be the (symmetric) covariance matrix itself.
2632 @end itemize
2633
2634 Examples:
2635
2636 @c ===beg===
2637 @c load ("descriptive")$
2638 @c fpprintprec : 7 $
2639 @c s2 : read_matrix (file_search ("wind.data"))$
2640 @c cor (s2);
2641 @c ===end===
2642 @example
2643 (%i1) load ("descriptive")$
2644 (%i2) fpprintprec : 7 $
2645 (%i3) s2 : read_matrix (file_search ("wind.data"))$
2646 @group
2647 (%i4) cor (s2);
2648       [    1.0     0.8476339  0.8803515  0.8239624  0.7519506 ]
2649       [                                                       ]
2650       [ 0.8476339     1.0     0.8735834  0.6902622  0.782502  ]
2651       [                                                       ]
2652 (%o4) [ 0.8803515  0.8735834     1.0     0.7764065  0.8323358 ]
2653       [                                                       ]
2654       [ 0.8239624  0.6902622  0.7764065     1.0     0.7293848 ]
2655       [                                                       ]
2656       [ 0.7519506  0.782502   0.8323358  0.7293848     1.0    ]
2657 @end group
2658 @end example
2659
2660 Calculate the correlation matrix from the covariance matrix.
2661
2662 @c ===beg===
2663 @c load ("descriptive")$
2664 @c fpprintprec : 7 $
2665 @c s2 : read_matrix (file_search ("wind.data"))$
2666 @c s : cov1 (s2)$
2667 @c cor (s, data=false); /* this is faster */
2668 @c ===end===
2669 @example
2670 (%i1) load ("descriptive")$
2671 (%i2) fpprintprec : 7 $
2672 (%i3) s2 : read_matrix (file_search ("wind.data"))$
2673 (%i4) s : cov1 (s2)$
2674 @group
2675 (%i5) cor (s, data=false); /* this is faster */
2676       [    1.0     0.8476339  0.8803515  0.8239624  0.7519506 ]
2677       [                                                       ]
2678       [ 0.8476339     1.0     0.8735834  0.6902622  0.782502  ]
2679       [                                                       ]
2680 (%o5) [ 0.8803515  0.8735834     1.0     0.7764065  0.8323358 ]
2681       [                                                       ]
2682       [ 0.8239624  0.6902622  0.7764065     1.0     0.7293848 ]
2683       [                                                       ]
2684       [ 0.7519506  0.782502   0.8323358  0.7293848     1.0    ]
2685 @end group
2686 @end example
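
Each correlation is a covariance divided by the product of the two
corresponding standard deviations. A minimal sketch of that rescaling,
applied to the matrix returned by @code{cov1} (the name @code{S1} is chosen
here for illustration); the result is expected to agree with
@code{cor (s2)} up to rounding.

@example
load ("descriptive")$
s2 : read_matrix (file_search ("wind.data"))$
S1 : cov1 (s2)$
p : length (S1)$
/* r[i,j] = s[i,j] / sqrt (s[i,i] * s[j,j]) */
genmatrix (lambda ([i, j], S1[i, j] / sqrt (S1[i, i] * S1[j, j])), p, p);
@end example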
2687
2688 See also @mref{cov} and @mrefdot{cov1}
2689
2690 @opencatbox{Categories:}
2691 @category{Package descriptive}
2692 @closecatbox
2693 @end deffn
2694
2695
2696
2697 @anchor{list_correlations}
2698 @deffn {Function} list_correlations @
2699 @fname{list_correlations} (@var{matrix}) @
2700 @fname{list_correlations} (@var{matrix}, @var{options} ...)
2701
2702 Function @code{list_correlations} returns a list of correlation measures:
2703
2704 @itemize @bullet
2705
2706 @item
2707 @var{precision matrix}: the inverse of the covariance matrix @math{S_1},
2708 @ifnottex
2709 @example
2710 @group
2711        -1     ij
2712       S   = (s  )
2713        1         i,j = 1,2,...,p
2714 @end group
2715 @end example
2716 @end ifnottex
2717 @tex
2718 $${S_{1}^{-1}}={\left(s^{ij}\right)_{i,j=1,2,\ldots, p}}$$
2719 @end tex
2720
2721 @item
2722 @var{multiple correlation vector}:  @math{(R_1^2, R_2^2, ..., R_p^2)}, with
2723 @ifnottex
2724 @example
2725 @group
2726        2          1
2727       R  = 1 - -------
2728        i        ii
2729                s   s
2730                     ii
2731 @end group
2732 @end example
2733 @end ifnottex
2734 @tex
2735 $${R_{i}^{2}}={1-{{1}\over{s^{ii}s_{ii}}}}$$
2736 @end tex
2737 being an indicator of the goodness of fit of the linear multivariate regression model on @math{X_i} when the rest of the variables are used as regressors.
2738
2739 @item
2740 @var{partial correlation matrix}: with element @math{(i, j)} being
2741 @ifnottex
2742 @example
2743 @group
2744                          ij
2745                         s
2746       r        = - ------------
2747        ij.rest     / ii  jj\ 1/2
2748                    |s   s  |
2749                    \       /
2750 @end group
2751 @end example
2752 @end ifnottex
2753 @tex
2754 $${r_{ij.rest}}={-{{s^{ij}}\over \sqrt{s^{ii}s^{jj}}}}$$
2755 @end tex
2756
2757 @end itemize
2758
2759 Option:
2760
2761 @itemize @bullet
2762 @item
2763 @code{'data}, default @code{'true}, indicates whether the input matrix contains the sample data,
2764 in which case the covariance matrix @code{cov1} is calculated from it. If @code{data=false},
2765 the input must instead be the (symmetric) covariance matrix itself.
2766 @end itemize
2767
2768 Example:
2769
2770 @c ===beg===
2771 @c load ("descriptive")$
2772 @c s2 : read_matrix (file_search ("wind.data"))$
2773 @c z : list_correlations (s2)$
2774 @c fpprintprec : 5$
2775 @c precision_matrix: z[1];
2776 @c multiple_correlation_vector: z[2];
2777 @c partial_correlation_matrix: z[3];
2778 @c ===end===
2779 @example
2780 (%i1) load ("descriptive")$
2781 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2782 (%i3) z : list_correlations (s2)$
2783 (%i4) fpprintprec : 5$
2784 @group
2785 (%i5) precision_matrix: z[1];
2786 (%o5)
2787     [  0.38486   - 0.13856   - 0.15626   - 0.10239    0.031179  ]
2788     [                                                           ]
2789     [ - 0.13856   0.34107    - 0.15233    0.038447   - 0.052842 ]
2790     [                                                           ]
2791     [ - 0.15626  - 0.15233    0.47296    - 0.024816  - 0.10054  ]
2792     [                                                           ]
2793     [ - 0.10239   0.038447   - 0.024816   0.10937    - 0.034033 ]
2794     [                                                           ]
2795     [ 0.031179   - 0.052842  - 0.10054   - 0.034033   0.14834   ]
2796 @end group
2797 @group
2798 (%i6) multiple_correlation_vector: z[2];
2799 (%o6)     [0.85063, 0.80634, 0.86474, 0.71867, 0.72675]
2800 @end group
2801 @group
2802 (%i7) partial_correlation_matrix: z[3];
2803       [   - 1.0     0.38244   0.36627   0.49908   - 0.13049 ]
2804       [                                                     ]
2805       [  0.38244     - 1.0    0.37927  - 0.19907   0.23492  ]
2806       [                                                     ]
2807 (%o7) [  0.36627    0.37927    - 1.0    0.10911    0.37956  ]
2808       [                                                     ]
2809       [  0.49908   - 0.19907  0.10911    - 1.0     0.26719  ]
2810       [                                                     ]
2811       [ - 0.13049   0.23492   0.37956   0.26719     - 1.0   ]
2812 @end group
2813 @end example
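
The three results can be rebuilt from the formulas above, starting from the
matrix returned by @code{cov1}. This is a sketch of the definitions rather
than the package's own code; the names @code{S1}, @code{prec} and @code{R2}
are chosen here for illustration. The outputs are expected to agree with
@code{z[1]}, @code{z[2]} and @code{z[3]} up to rounding.

@example
load ("descriptive")$
s2 : read_matrix (file_search ("wind.data"))$
S1 : cov1 (s2)$
p : length (S1)$
/* precision matrix: inverse of the covariance matrix */
prec : invert (S1);
/* multiple correlation vector */
R2 : makelist (1 - 1/(prec[i, i] * S1[i, i]), i, 1, p);
/* partial correlation matrix */
genmatrix (lambda ([i, j], -prec[i, j] / sqrt (prec[i, i] * prec[j, j])), p, p);
@end example

Note that, by this formula, the diagonal entries of the partial correlation
matrix are @math{-1}, as in the output shown above.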
2814
2815 See also @mref{cov} and @mrefdot{cov1}
2816
2817 @opencatbox{Categories:}
2818 @category{Package descriptive}
2819 @closecatbox
2820 @end deffn
2821
2822
2823
2824
2825 @anchor{principal_components}
2826 @deffn {Function} principal_components @
2827 @fname{principal_components} (@var{matrix}) @
2828 @fname{principal_components} (@var{matrix}, @var{options} ...)
2829
2830 Calculates the principal components of a multivariate sample. Principal components are
2831 used in multivariate statistical analysis to reduce the dimensionality of the sample.
2832
2833 Option:
2834
2835 @itemize @bullet
2836 @item
2837 @code{'data}, default @code{'true}, indicates whether the input matrix contains the sample data,
2838 in which case the covariance matrix @mref{cov1} is calculated from it. If @code{data=false},
2839 the input must instead be the (symmetric) covariance matrix itself.
2840 @end itemize
2841
2842 The output of function @code{principal_components} is a list with the following results:
2843
2844 @itemize @bullet
2845 @item
2846 variances of the principal components,
2847 @item
2848 percentage of total variance explained by each principal component,
2849 @item
2850 rotation matrix.
2851 @end itemize
2852
2853 Examples:
2854
2855 In this sample, the first component explains 83.13 per cent of total
2856 variance.
2857
2858 @example
2859 (%i1) load ("descriptive")$
2860 (%i2) s2 : read_matrix (file_search ("wind.data"))$
2861 (%i3) fpprintprec:4 $
2862 (%i4) res: principal_components(s2);
2863 0 errors, 0 warnings
2864 (%o4) [[87.57, 8.753, 5.515, 1.889, 1.613],
2865 [83.13, 8.31, 5.235, 1.793, 1.531],
2866 @group
2867 [ .4149  .03379   - .4757  - 0.581   - .5126 ]
2868 [                                            ]
2869 [ 0.369  - .3657  - .4298   .7237    - .1469 ]
2870 [                                            ]
2871 [ .3959  - .2178  - .2181  - .2749    .8201  ]]
2872 [                                            ]
2873 [ .5548   .7744    .1857    .2319    .06498  ]
2874 [                                            ]
2875 [ .4765  - .4669   0.712   - .09605  - .1969 ]
2876 @end group
2877 (%i5) /* accumulated percentages  */
2878     block([ap: copy(res[2])],
2879       for k:2 thru length(ap) do ap[k]: ap[k]+ap[k-1],
2880       ap);
2881 (%o5)                 [83.13, 91.44, 96.68, 98.47, 100.0]
2882 (%i6) /* sample dimension */
2883       p: length(first(res));
2884 (%o6)                                  5
2885 (%i7) /* plot percentages to select number of
2886          principal components for further work */
2887      draw2d(
2888         fill_density = 0.2,
2889         apply(bars, makelist([k, res[2][k], 1/2], k, p)),
2890         points_joined = true,
2891         point_type    = filled_circle,
2892         point_size    = 3,
2893         points(makelist([k, res[2][k]], k, p)),
2894         xlabel = "Variances",
2895         ylabel = "Percentages",
2896         xtics  = setify(makelist([concat("PC",k),k], k, p))) $
2897 @end example
2898
2899 In case the covariance matrix is known, it can be passed to the function,
2900 but option @code{data=false} must be used.
2901
2902 @example
2903 (%i1) load ("descriptive")$
2904 (%i2) S: matrix([1,-2,0],[-2,5,0],[0,0,2]);
2905                                 [  1   - 2  0 ]
2906                                 [             ]
2907 (%o2)                           [ - 2   5   0 ]
2908                                 [             ]
2909                                 [  0    0   2 ]
2910 (%i3) fpprintprec:4 $
2911 (%i4) /* the argument is a covariance matrix */
2912       res: principal_components(S, data=false);
2913 0 errors, 0 warnings
2914                                                   [ - .3827  0.0  .9239 ]
2915                                                   [                     ]
2916 (%o4) [[5.828, 2.0, .1716], [72.86, 25.0, 2.145], [  .9239   0.0  .3827 ]]
2917                                                   [                     ]
2918                                                   [   0.0    1.0   0.0  ]
2919 (%i5) /* transformation to get the principal components
2920          from original records */
2921       matrix([a1,b2,c3],[a2,b2,c2]).last(res);
2922              [ .9239 b2 - .3827 a1  1.0 c3  .3827 b2 + .9239 a1 ]
2923 (%o5)        [                                                  ]
2924              [ .9239 b2 - .3827 a2  1.0 c2  .3827 b2 + .9239 a2 ]
2925 @end example
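
The percentages in the second element of the result are the variances of
the first element rescaled to sum to 100. A minimal check on the covariance
matrix used above; the last expression is expected to reproduce
@code{res[2]} up to rounding.

@example
load ("descriptive")$
S : matrix ([1, -2, 0], [-2, 5, 0], [0, 0, 2])$
res : principal_components (S, data=false)$
/* each variance as a percentage of the total variance */
100 * res[1] / apply ("+", res[1]);
@end example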
2926
2927 @opencatbox{Categories:}
2928 @category{Package descriptive}
2929 @closecatbox
2930 @end deffn
2931
2932
2933
2934 @node Functions and Variables for statistical graphs,  , Functions and Variables for descriptive statistics, Package descriptive
2935 @section Functions and Variables for statistical graphs
2936
2937
2938
2939 @anchor{barsplot}
2940 @deffn {Function} barsplot (@var{data1}, @var{data2}, @dots{}, @var{option_1}, @var{option_2}, @dots{})
2941
2942 Plots bar charts for discrete statistical variables,
2943 for one or multiple samples.
2944
2945 @var{data} can be a list of outcomes representing one sample, or a
2946 matrix of @var{m} rows and @var{n} columns, representing @var{n} samples of size
2947 @var{m} each.
2948
2949 Available options are:
2950
2951 @itemize @bullet
2952
2953 @item
2954 @var{box_width} (default, @code{3/4}): relative width of rectangles. This
2955 value must be in the range @code{[0,1]}.
2956
2957 @item
2958 @var{grouping} (default, @code{clustered}): indicates how multiple samples are
2959 shown. Valid values are: @code{clustered} and @code{stacked}.
2960
2961 @item
2962 @var{groups_gap} (default, @code{1}): a positive integer number representing
2963 the gap between two consecutive groups of bars.
2964
2965 @item
2966 @var{bars_colors} (default, @code{[]}): a list of colors for multiple samples.
2967 When there are more samples than specified colors, the extra necessary colors
2968 are chosen at random. See @code{color} to learn more about them.
2969
2970 @item
2971 @var{frequency} (default, @code{absolute}): indicates the scale of the
2972 ordinates. Possible values are:  @code{absolute}, @code{relative},
2973 and @code{percent}.
2974
2975 @item
2976 @var{ordering} (default, @code{orderlessp}): possible values are @code{orderlessp} or @code{ordergreatp},
2977 indicating how statistical outcomes should be ordered on the @var{x}-axis.
2978
2979 @item
2980 @var{sample_keys} (default, @code{[]}): a list with the strings to be used in the legend.
2981 When the list length is neither 0 nor the number of samples, an error message is returned.
2982
2983 @item
2984 @var{start_at} (default, @code{0}): indicates where the plot begins on the
2985 @var{x}-axis.
2986
2987 @item
2988 All global @code{draw} options, except @code{xtics}, which is
2989 internally assigned by @code{barsplot}.
2990 If you want to set your own values for this option or want to build
2991 complex scenes, make use of @code{barsplot_description}. See example below.
2992
2993 @item
2994 The following local @ref{Package draw} options: @mrefcomma{key} @mrefcomma{color_draw}
2995 @mrefcomma{fill_color} @mref{fill_density} and @mrefdot{line_width}
2996 See also
2997 @mrefdot{barsplot}
2998
2999 @end itemize
3000
3001 There is also a function @code{wxbarsplot} for creating embedded
3002 bar charts in the wxMaxima and iMaxima interfaces.
3004
3005 Examples:
3006
3007 Univariate sample in matrix form. Absolute frequencies.
3008
3009 @c ===beg===
3010 @c load ("descriptive")$
3011 @c m : read_matrix (file_search ("biomed.data"))$
3012 @c barsplot(
3013 @c   col(m,2),
3014 @c   title        = "Ages",
3015 @c   xlabel       = "years",
3016 @c   box_width    = 1/2,
3017 @c   fill_density = 3/4)$
3018 @c ===end===
3019 @example
3020 (%i1) load ("descriptive")$
3021 (%i2) m : read_matrix (file_search ("biomed.data"))$
3022 @group
3023 (%i3) barsplot(
3024   col(m,2),
3025   title        = "Ages",
3026   xlabel       = "years",
3027   box_width    = 1/2,
3028   fill_density = 3/4)$
3029 @end group
3030 @end example
3031
3032 Two samples of different sizes, with
3033 relative frequencies and user declared colors.
3034
3035 @c ===beg===
3036 @c load ("descriptive")$
3037 @c l1:makelist(random(10),k,1,50)$
3038 @c l2:makelist(random(10),k,1,100)$
3039 @c barsplot(
3040 @c    l1,l2,
3041 @c    box_width = 1,
3042 @c    fill_density = 1,
3043 @c    bars_colors = [black, grey],
3044 @c    frequency = relative,
3045 @c    sample_keys = ["A", "B"])$
3046 @c ===end===
3047 @example
3048 (%i1) load ("descriptive")$
3049 (%i2) l1:makelist(random(10),k,1,50)$
3050 (%i3) l2:makelist(random(10),k,1,100)$
3051 @group
3052 (%i4) barsplot(
3053    l1,l2,
3054    box_width = 1,
3055    fill_density = 1,
3056    bars_colors = [black, grey],
3057    frequency = relative,
3058    sample_keys = ["A", "B"])$
3059 @end group
3060 @end example
3061
Four non-numeric samples of equal size.
3063
3064 @c ===beg===
3065 @c load ("descriptive")$
3066 @c barsplot(
3067 @c   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3068 @c   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3069 @c   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3070 @c   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3071 @c   title      = "Asking for something to four groups",
3072 @c   ylabel     = "# of individuals",
3073 @c   groups_gap = 3,
3074 @c   fill_density = 0.5,
3075 @c   ordering = ordergreatp)$
3076 @c ===end===
3077 @example
3078 (%i1) load ("descriptive")$
3079 @group
3080 (%i2) barsplot(
3081   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3082   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3083   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3084   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3085   title      = "Asking for something to four groups",
3086   ylabel     = "# of individuals",
3087   groups_gap = 3,
3088   fill_density = 0.5,
3089   ordering = ordergreatp)$
3090 @end group
3091 @end example
3092
3093 Stacked bars.
3094
3095 @c ===beg===
3096 @c load ("descriptive")$
3097 @c barsplot(
3098 @c   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3099 @c   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3100 @c   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3101 @c   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3102 @c   title      = "Asking for something to four groups",
3103 @c   ylabel     = "# of individuals",
3104 @c   grouping   = stacked,
3105 @c   fill_density = 0.5,
3106 @c   ordering = ordergreatp)$
3107 @c ===end===
3108 @example
3109 (%i1) load ("descriptive")$
3110 @group
3111 (%i2) barsplot(
3112   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3113   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3114   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3115   makelist([Yes, No, Maybe][random(3)+1],k,1,50),
3116   title      = "Asking for something to four groups",
3117   ylabel     = "# of individuals",
3118   grouping   = stacked,
3119   fill_density = 0.5,
3120   ordering = ordergreatp)$
3121 @end group
3122 @end example
3123
For options related to bar diagrams, see graphic object @code{bars} of package @ref{Package draw}.
See also functions @mref{histogram} and @mrefdot{piechart}
3126
3127 @opencatbox{Categories:}
3128 @category{Package descriptive}
3129 @category{Plotting}
3130 @closecatbox
3131 @end deffn
3132
3133 @anchor{barsplot_description}
3134 @deffn {Function} barsplot_description (@dots{})
3135
3136 Function @code{barsplot_description} creates a graphic object
3137 suitable for creating complex scenes, together with other
3138 graphic objects.
3139
3140 Example: @code{barsplot} in a multiplot context.
3141
3142 @example
3143 (%i1) load ("descriptive")$
3144 (%i2) l1:makelist(random(10),k,1,50)$
3145 (%i3) l2:makelist(random(10),k,1,100)$
3146 (%i4) bp1 :
3147         barsplot_description(
3148          l1,
3149          box_width = 1,
3150          fill_density = 0.5,
3151          bars_colors = [blue],
3152          frequency = relative)$
3153 (%i5) bp2 :
3154         barsplot_description(
3155          l2,
3156          box_width = 1,
3157          fill_density = 0.5,
3158          bars_colors = [red],
3159          frequency = relative)$
3160 (%i6) draw(gr2d(bp1), gr2d(bp2))$
3161 @end example
3162
3163 @opencatbox{Categories:}
3164 @category{Package descriptive}
3165 @category{Plotting}
3166 @closecatbox
3167 @end deffn
3168
3169 @anchor{boxplot}
3170 @deffn {Function} boxplot (@var{data}) @
3171 @fname{boxplot} (@var{data}, @var{option_1}, @var{option_2}, @dots{})
3172
This function plots box-and-whisker diagrams. Argument @var{data} can be a list,
although a single sample is of limited interest, since these diagrams are mainly
used for comparing different samples. It can also be a matrix, which makes it
possible to compare two or more components of a multivariate statistical variable.
Finally, @var{data} can be a list of samples with possibly different sample sizes;
this is the only function in package @code{descriptive} that admits this type
of data structure.
3180
The box is plotted from the first quartile to the third, with a horizontal
3182 segment situated at the second quartile or median. By default, lower and
3183 upper whiskers are plotted at the minimum and maximum values,
3184 respectively. Option @var{range} can be used to indicate that values greater
3185 than @code{quantile(x,3/4)+range*(quantile(x,3/4)-quantile(x,1/4))} or
3186 less than @code{quantile(x,1/4)-range*(quantile(x,3/4)-quantile(x,1/4))}
3187 must be considered as outliers, in which case they are plotted as
3188 isolated points, and the whiskers are located at the extremes of the rest of
3189 the sample.
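
The outlier rule above can be checked by hand. The following is a minimal
sketch that computes the two boundaries for one sample and a coefficient of 1;
the names @code{r}, @code{iqr}, @code{lower}, @code{upper} and @code{outliers}
are chosen here for illustration only and are not part of the package.

@example
(%i1) load ("descriptive")$
(%i2) x : [7, 15, 5, 8, 6, 5, 7, 3, 1]$
(%i3) r : 1$   /* plays the role of option range */
(%i4) iqr : quantile (x, 3/4) - quantile (x, 1/4)$
(%i5) lower : quantile (x, 1/4) - r*iqr$
(%i6) upper : quantile (x, 3/4) + r*iqr$
@group
(%i7) outliers : sublist (x,
        lambda ([v], is (v < lower) or is (v > upper)))$
@end group
@end example

Evaluating @code{outliers} then lists the values that
@code{boxplot (x, range = 1)} would be expected to draw as isolated points.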
3190
3191 Available options are:
3192
3193 @itemize @bullet
3194
3195 @item
3196 @var{box_width} (default, @code{3/4}): relative width of boxes.
This value must be in the range @code{[0,1]}.
3198
3199 @item
3200 @var{box_orientation} (default, @code{vertical}): possible values: @code{vertical}
3201 and @code{horizontal}.
3202
3203 @item
@var{range} (default, @code{inf}): positive coefficient of the interquartile range
used to set outlier boundaries.
3206
3207 @item
3208 @var{outliers_size} (default, @code{1}): circle size for isolated outliers.
3209
3210 @item
3211 All @code{draw} options, except @code{points_joined}, @code{point_size}, @code{point_type},
3212 @code{xtics}, @code{ytics}, @code{xrange}, and @code{yrange}, which are
3213 internally assigned by @code{boxplot}.
If you want to set your own values for these options or want to build
3215 complex scenes, make use of @code{boxplot_description}.
3216
3217 @item
3218 The following local @code{draw} options: @code{key}, @code{color},
3219 and @code{line_width}.
3220
3221 @end itemize
3222
There is also a function @code{wxboxplot} for creating embedded
box-and-whisker diagrams in interfaces wxMaxima and iMaxima.
3225
3226 Examples:
3227
3228 Box-and-whisker diagram from a multivariate sample.
3229
3230 @c ===beg===
3231 @c load ("descriptive")$
3232 @c s2 : read_matrix(file_search("wind.data"))$
3233 @c boxplot(s2,
3234 @c   box_width  = 0.2,
3235 @c   title      = "Windspeed in knots",
3236 @c   xlabel     = "Stations",
3237 @c   color      = red,
3238 @c   line_width = 2)$
3239 @c ===end===
3240 @example
3241 (%i1) load ("descriptive")$
3242 (%i2) s2 : read_matrix(file_search("wind.data"))$
3243 @group
3244 (%i3) boxplot(s2,
3245   box_width  = 0.2,
3246   title      = "Windspeed in knots",
3247   xlabel     = "Stations",
3248   color      = red,
3249   line_width = 2)$
3250 @end group
3251 @end example
3252
3253 Box-and-whisker diagram from three samples of different sizes.
3254
3255 @c ===beg===
3256 @c load ("descriptive")$
3257 @c A :
3258 @c  [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2],
3259 @c   [8, 10, 7, 9, 12, 8, 10],
3260 @c   [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$
3261 @c boxplot (A, box_orientation = horizontal)$
3262 @c ===end===
3263 @example
3264 (%i1) load ("descriptive")$
3265 @group
3266 (%i2) A :
3267  [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2],
3268   [8, 10, 7, 9, 12, 8, 10],
3269   [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$
3270 @end group
3271 (%i3) boxplot (A, box_orientation = horizontal)$
3272 @end example
3273
3274 Option @var{range} can be used to handle outliers.
3275
3276 @c ===beg===
3277 @c  load ("descriptive")$
3278 @c  B: [[7, 15, 5, 8, 6, 5, 7, 3, 1],
3279 @c      [10, 8, 12, 8, 11, 9, 20],
3280 @c      [23, 17, 19, 7, 22, 19]] $
3281 @c  boxplot (B, range=1)$
3282 @c  boxplot (B, range=1.5, box_orientation = horizontal)$
3283 @c  draw2d(
3284 @c     boxplot_description(
3285 @c        B,
3286 @c        range            = 1.5,
3287 @c        line_width       = 3,
3288 @c        outliers_size    = 2,
3289 @c        color            = red,
3290 @c        background_color = light_gray),
3291 @c     xtics = {["Low",1],["Medium",2],["High",3]}) $
3292 @c ===end===
3293 @example
(%i1) load ("descriptive")$
@group
(%i2) B: [[7, 15, 5, 8, 6, 5, 7, 3, 1],
          [10, 8, 12, 8, 11, 9, 20],
          [23, 17, 19, 7, 22, 19]] $
@end group
(%i3) boxplot (B, range=1)$
(%i4) boxplot (B, range=1.5, box_orientation = horizontal)$
@group
(%i5) draw2d(
    boxplot_description(
       B,
       range            = 1.5,
       line_width       = 3,
       outliers_size    = 2,
       color            = red,
       background_color = light_gray),
    xtics = @{["Low",1],["Medium",2],["High",3]@}) $
@end group
3311 @end example
3312
3313 @opencatbox{Categories:}
3314 @category{Package descriptive}
3315 @category{Plotting}
3316 @closecatbox
3317 @end deffn
3318
3319 @anchor{boxplot_description}
3320 @deffn {Function} boxplot_description (@dots{})
3321
3322 Function @code{boxplot_description} creates a graphic object
3323 suitable for creating complex scenes, together with other
3324 graphic objects.
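
Example: a minimal sketch, modeled on the last @code{boxplot} example, that
passes the graphic object to @code{draw2d} so that @code{xtics} can be set by
the user. The sample @code{A} and the tic labels are made up for illustration.

@example
(%i1) load ("descriptive")$
@group
(%i2) A : [[6, 4, 6, 2, 4, 8],
           [8, 10, 7, 9, 12],
           [16, 13, 17, 12, 11]]$
@end group
@group
(%i3) draw2d (
        boxplot_description (A, box_width = 1/2, color = blue),
        title = "Three samples",
        xtics = @{["a", 1], ["b", 2], ["c", 3]@})$
@end group
@end example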
3325
3326 @opencatbox{Categories:}
3327 @category{Package descriptive}
3328 @category{Plotting}
3329 @closecatbox
3330 @end deffn
3331
3332 @anchor{histogram}
3333 @deffn {Function} histogram @
3334 @fname{histogram} (@var{list}) @
3335 @fname{histogram} (@var{list}, @var{option_1}, @var{option_2}, @dots{}) @
3336 @fname{histogram} (@var{one_column_matrix}) @
3337 @fname{histogram} (@var{one_column_matrix}, @var{option_1}, @var{option_2}, @dots{}) @
3338 @fname{histogram} (@var{one_row_matrix}) @
3339 @fname{histogram} (@var{one_row_matrix}, @var{option_1}, @var{option_2}, @dots{})
3340
3341 Constructs and displays a histogram from a data sample.
3342 Data must be stored as a list of numbers, or a matrix of one row or one column.
3343
3344 Optional arguments:
3345
3346 @itemize @bullet
3347
3348 @item
3349 @code{nclasses} (default, 10):
3350 the number of classes (also called bins) in the histogram,
3351 or a list of two numbers (the least and greatest values included in the histogram),
3352 or a list of three numbers (the least and greatest values included in the histogram, and the number of classes),
3353 or a set containing the endpoints of the class intervals,
3354 or a symbol specifying the name of one of three algorithms to automatically determine the number of classes:
3355 @code{fd} (Ref. [1]), @code{scott} (Ref. [2]), or @code{sturges} (Ref. [3]).
3356
3357 A class interval excludes its left endpoint and includes its right endpoint,
3358 except for the first interval, which includes both the left and right endpoints.
3359 It is assumed that class intervals are contiguous.
3360 That is, the right endpoint of one interval is equal to the left endpoint of the next.
3361
3362 @item
3363 @code{frequency} (default, @code{absolute}): indicates the scale of the vertical axis.
3364 Possible values are:  @code{absolute} (heights of bars add up to number of data),
3365 @code{relative} (heights of bars add up to 1),
3366 @code{percent} (heights of bars add up to 100),
3367 and @code{density} (total area of histogram is 1).
3368
3369 @item
3370 @code{htics} (default, @code{auto}): format of tic marks on the horizontal axis.
3371 Possible values are: @code{auto} (tics are placed automatically),
3372 @code{endpoints} (tics are placed at the divisions between classes),
3373 @code{intervals} (classes are labeled with the corresponding intervals),
3374 or a list of labels, one for each class.
3375
3376 @item
3377 All global @code{draw} options, except @code{xrange}, @code{yrange},
3378 and @code{xtics}, which are internally assigned by @code{histogram}.
3379 If you want to set your own values for these options, make use of
3380 @code{histogram_description}.
3381
3382 @item
3383 The following local @ref{Package draw} options: @mrefcomma{key}
3384 @mrefcomma{fill_color} @mrefcomma{fill_density} and @mrefdot{line_width}
3385 Note that the outlines of bars,
3386 as well as the interior of bars when @code{fill_density} is nonzero,
3387 are drawn with @code{fill_color}, not @code{color}.
3388
3389 @end itemize
3390
3391 @code{histogram} honors the global option @code{histogram_skyline}.
3392 When @code{histogram_skyline} is @code{true},
3393 @code{histogram} and @code{histogram_description} construct "skyline" plots,
which show the outline of the histogram bars,
3395 instead of drawing all the vertical segments.
3396 Otherwise (the default), histograms are displayed with bars showing vertical segments.
3397
3398 There is also a function @code{wxhistogram} for creating embedded
3399 histograms in interfaces wxMaxima and iMaxima.
3400
3401 See also @mrefcomma{continuous_freq}
3402 which, like @code{histogram},
3403 counts data in intervals,
3404 but returns the counts instead of displaying a graphic representation.
3405
3406 See also @mrefdot{barsplot}
3407
3408 Examples:
3409
3410 A simple histogram with eight classes:
3411
3412 @c ===beg===
3413 @c load ("descriptive")$
3414 @c s1 : read_list (file_search ("pidigits.data"))$
3415 @c histogram (
3416 @c      s1,
3417 @c      nclasses     = 8,
3418 @c      title        = "pi digits",
3419 @c      xlabel       = "digits",
3420 @c      ylabel       = "Absolute frequency",
3421 @c      fill_color   = grey,
3422 @c      fill_density = 0.6)$
3423 @c ===end===
3424 @example
3425 (%i1) load ("descriptive")$
3426 (%i2) s1 : read_list (file_search ("pidigits.data"))$
3427 @group
3428 (%i3) histogram (
3429      s1,
3430      nclasses     = 8,
3431      title        = "pi digits",
3432      xlabel       = "digits",
3433      ylabel       = "Absolute frequency",
3434      fill_color   = grey,
3435      fill_density = 0.6)$
3436 @end group
3437 @end example
3438
3439 Setting the limits of the histogram to -2 and 12, with 3 classes.
3440 Also, we introduce predefined tics:
3441
3442 @c ===beg===
3443 @c load ("descriptive")$
3444 @c s1 : read_list (file_search ("pidigits.data"))$
3445 @c histogram (
3446 @c      s1,
3447 @c      nclasses     = [-2,12,3],
3448 @c      htics        = ["A", "B", "C"],
3449 @c      terminal     = png,
3450 @c      fill_color   = "#23afa0",
3451 @c      fill_density = 0.6)$
3452 @c ===end===
3453 @example
3454 (%i1) load ("descriptive")$
3455 (%i2) s1 : read_list (file_search ("pidigits.data"))$
3456 @group
3457 (%i3) histogram (
3458      s1,
3459      nclasses     = [-2,12,3],
3460      htics        = ["A", "B", "C"],
3461      terminal     = png,
3462      fill_color   = "#23afa0",
3463      fill_density = 0.6)$
3464 @end group
3465 @end example
3466
3467 Bounds for varying class widths.
3468
3469 @c ===beg===
3470 @c load ("descriptive")$
3471 @c s1 : read_list (file_search ("pidigits.data"))$
3472 @c histogram (s1, nclasses = {0,3,6,7,11})$
3473 @c ===end===
3474 @example
3475 (%i1) load ("descriptive")$
3476 (%i2) s1 : read_list (file_search ("pidigits.data"))$
3477 (%i3) histogram (s1, nclasses = @{0,3,6,7,11@})$
3478 @end example
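
The counts behind such a histogram can be inspected with
@code{continuous_freq}. This sketch assumes that the second argument of
@code{continuous_freq} accepts the same class specification as option
@code{nclasses}; check the entry for that function before relying on it.

@example
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) freqs : continuous_freq (s1, @{0,3,6,7,11@})$
@end example

Evaluating @code{freqs} then shows the class limits together with the number
of observations counted in each class.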
3479
3480 Freedman-Diaconis formula for the number of classes.
3481
3482 @c ===beg===
3483 @c load ("descriptive")$
3484 @c s1 : read_list (file_search ("pidigits.data"))$
3485 @c histogram(s1, nclasses=fd) $
3486 @c ===end===
3487 @example
3488 (%i1) load ("descriptive")$
3489 (%i2) s1 : read_list (file_search ("pidigits.data"))$
3490 (%i3) histogram(s1, nclasses=fd) $
3491 @end example
3492
3493 References:
3494
3495 [1] Freedman, D., and Diaconis, P. (1981) On the histogram as a density estimator: L_2 theory.
3496 Zeitschrift f@"ur Wahrscheinlichkeitstheorie und verwandte Gebiete 57, 453-476.
3497
3498 [2] Scott, D. W. (1979) On optimal and data-based histograms. Biometrika 66, 605-610.
3499
3500 [3] Sturges, H. A. (1926) The choice of a class interval. Journal of the American Statistical Association 21, 65-66.
3501
3502 @opencatbox{Categories:}
3503 @category{Package descriptive}
3504 @category{Plotting}
3505 @closecatbox
3506 @end deffn
3507
3508 @anchor{histogram_description}
3509 @deffn {Function} histogram_description (@dots{})
3510
3511 Creates a graphic object which represents a histogram.
3512 Such an object is suitable for creating complex scenes together with other graphic objects,
3513 to be displayed by @code{draw2d}.
3514
3515 @code{histogram_description} takes the same arguments
3516 as the stand-alone function @code{histogram}.
3517 See @mref{histogram} for more information.
3518
3519 Example:
3520
3521 We make use of @code{histogram_description} for setting
3522 @code{xrange} and adding an explicit curve into the scene:
3523
3524 @example
3525 (%i1) load ("descriptive")$
3526 (%i2) ( load("distrib"),
3527         m: 14, s: 2,
3528         s2: random_normal(m, s, 1000) ) $
3529 (%i3) draw2d(
3530         grid   = true,
3531         xrange = [5, 25],
3532         histogram_description(
3533           s2,
3534           nclasses     = 9,
3535           frequency    = density,
3536           fill_density = 0.5),
3537         explicit(pdf_normal(x,m,s), x, m - 3*s, m + 3* s))$
3538 @end example
3539
3540 @opencatbox{Categories:}
3541 @category{Package descriptive}
3542 @category{Plotting}
3543 @closecatbox
3544 @end deffn
3545
3546 @anchor{histogram_skyline}
3547 @defvr {Option variable} histogram_skyline
3548 Default value: @code{false}
3549
3550 When @code{histogram_skyline} is @code{true},
3551 @code{histogram} and @code{histogram_description} construct "skyline" plots,
which show the outline of the histogram bars,
3553 instead of drawing all the vertical segments.
3554
3555 The outline is drawn with the current @code{fill_color} (not the current @code{color}).
3556 The interior of the histogram is filled with @code{fill_color},
3557 but only if @code{fill_density} is nonzero.
3558
3559 Otherwise, histograms are displayed with bars showing vertical segments.
3560
3561 Examples:
3562
3563 Construct a skyline histogram,
3564 and an ordinary histogram for comparison,
3565 on the same plot.
3566
3567 @example
3568 (%i1) load ("descriptive") $
3569 (%i2) L: read_list (file_search ("pidigits.data")) $
3570 (%i3) histogram_skyline: true $
3571 (%i4) skyline_hist: histogram_description (L) $
3572 (%i5) histogram_skyline: false $
3573 (%i6) ordinary_hist: histogram_description (L) $
3574 (%i7) draw (gr2d (skyline_hist), gr2d (ordinary_hist)) $
3575 @end example
3576
3577 Continuing the preceding example.
3578 Set display options for @code{fill_color} and @code{fill_density}.
3579
3580 @example
3581 (%i8) histogram_skyline: true $
3582 (%i9) skyline_hist: histogram_description (L, fill_color = blue, fill_density = 0.2) $
3583 (%i10) histogram_skyline: false $
3584 (%i11) ordinary_hist: histogram_description (L, fill_color = blue, fill_density = 0.2) $
3585 (%i12) draw (gr2d (skyline_hist), gr2d (ordinary_hist)) $
3586 @end example
3587
3588 @opencatbox{Categories:}
3589 @category{Package descriptive}
3590 @category{Plotting}
3591 @closecatbox
3592 @end defvr
3593
3594 @anchor{piechart}
3595 @deffn {Function} piechart @
3596 @fname{piechart} (@var{list}) @
3597 @fname{piechart} (@var{list}, @var{option_1}, @var{option_2}, @dots{}) @
3598 @fname{piechart} (@var{one_column_matrix}) @
3599 @fname{piechart} (@var{one_column_matrix}, @var{option_1}, @var{option_2}, @dots{}) @
3600 @fname{piechart} (@var{one_row_matrix}) @
3601 @fname{piechart} (@var{one_row_matrix}, @var{option_1}, @var{option_2}, @dots{})
3602
3603 Similar to @code{barsplot}, but plots sectors instead of rectangles.
3604
3605 Available options are:
3606
3607 @itemize @bullet
3608
3609 @item
3610 @var{sector_colors} (default, @code{[]}): a list of colors for sectors.
3611 When there are more sectors than specified colors, the extra necessary colors
3612 are chosen at random. See @code{color} to learn more about them.
3613
3614 @item
3615 @var{pie_center} (default, @code{[0,0]}): diagram's center.
3616
3617 @item
3618 @var{pie_radius} (default, @code{1}): diagram's radius.
3619
3620 @item
3621 All global @code{draw} options, except @code{key}, which is
3622 internally assigned by @code{piechart}.
3623 If you want to set your own values for this option or want to build
3624 complex scenes, make use of @code{piechart_description}.
3625
3626 @item
3627 The following local @code{draw} options: @code{key}, @code{color},
3628 @code{fill_density} and @code{line_width}. See also
@code{ellipse}.
3630
3631 @end itemize
3632
There is also a function @code{wxpiechart} for
creating embedded pie charts in interfaces wxMaxima and iMaxima.
3635
3636 Example:
3637
3638 @c ===beg===
3639 @c load ("descriptive")$
3640 @c s1 : read_list (file_search ("pidigits.data"))$
3641 @c piechart(
3642 @c   s1,
3643 @c   xrange = [-1.1, 1.3],
3644 @c   yrange = [-1.1, 1.1],
3645 @c   title  = "Digit frequencies in pi")$
3646 @c ===end===
3647 @example
3648 (%i1) load ("descriptive")$
3649 (%i2) s1 : read_list (file_search ("pidigits.data"))$
3650 @group
3651 (%i3) piechart(
3652   s1,
3653   xrange = [-1.1, 1.3],
3654   yrange = [-1.1, 1.1],
3655   title  = "Digit frequencies in pi")$
3656 @end group
3657 @end example
3658
3659 See also function @mrefdot{barsplot}
3660
3661 @opencatbox{Categories:}
3662 @category{Package descriptive}
3663 @category{Plotting}
3664 @closecatbox
3665 @end deffn
3666
3667 @anchor{piechart_description}
3668 @deffn {Function} piechart_description (@dots{})
3669
3670 Function @code{piechart_description} creates a graphic object
3671 suitable for creating complex scenes, together with other
3672 graphic objects.
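
Example: a minimal sketch, assuming that the object returned by
@code{piechart_description} can be passed to @code{draw2d} in the same way as
the other @code{*_description} objects in this package. The center, radius and
axis ranges are arbitrary illustrative values.

@example
(%i1) load ("descriptive")$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
@group
(%i3) draw2d (
        piechart_description (s1,
                              pie_center = [1, 1],
                              pie_radius = 2),
        xrange = [-1.5, 4],
        yrange = [-1.5, 3.5],
        proportional_axes = xy,
        title = "Digit frequencies in pi")$
@end group
@end example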
3673
3674 @opencatbox{Categories:}
3675 @category{Package descriptive}
3676 @category{Plotting}
3677 @closecatbox
3678 @end deffn
3679
3680 @anchor{scatterplot}
3681 @deffn {Function} scatterplot @
3682 @fname{scatterplot} (@var{list}) @
3683 @fname{scatterplot} (@var{list}, @var{option_1}, @var{option_2}, @dots{}) @
3684 @fname{scatterplot} (@var{matrix}) @
3685 @fname{scatterplot} (@var{matrix}, @var{option_1}, @var{option_2}, @dots{})
3686
3687 Plots scatter diagrams both for univariate (@var{list}) and multivariate
3688 (@var{matrix}) samples.
3689
Available options are the same as those accepted by @code{histogram}.
3691
There is also a function @code{wxscatterplot} for
creating embedded scatter diagrams in interfaces wxMaxima and iMaxima.
3694
3695 Examples:
3696
3697 Univariate scatter diagram from a simulated Gaussian sample.
3698
3699 @c ===beg===
3700 @c load ("descriptive")$
3701 @c load ("distrib")$
3702 @c scatterplot(
3703 @c   random_normal(0,1,200),
3704 @c   xaxis      = true,
3705 @c   point_size = 2,
3706 @c   dimensions = [600,150])$
3707 @c ===end===
3708 @example
3709 (%i1) load ("descriptive")$
3710 (%i2) load ("distrib")$
3711 @group
3712 (%i3) scatterplot(
3713   random_normal(0,1,200),
3714   xaxis      = true,
3715   point_size = 2,
3716   dimensions = [600,150])$
3717 @end group
3718 @end example
3719
3720 Two dimensional scatter plot.
3721
3722 @c ===beg===
3723 @c load ("descriptive")$
3724 @c s2 : read_matrix (file_search ("wind.data"))$
3725 @c scatterplot(
3726 @c  submatrix(s2, 1,2,3),
3727 @c  title      = "Data from stations #4 and #5",
3728 @c  point_type = diamant,
3729 @c  point_size = 2,
3730 @c  color      = blue)$
3731 @c ===end===
3732 @example
3733 (%i1) load ("descriptive")$
3734 (%i2) s2 : read_matrix (file_search ("wind.data"))$
3735 @group
3736 (%i3) scatterplot(
3737  submatrix(s2, 1,2,3),
3738  title      = "Data from stations #4 and #5",
3739  point_type = diamant,
3740  point_size = 2,
3741  color      = blue)$
3742 @end group
3743 @end example
3744
3745 Three dimensional scatter plot.
3746
3747 @c ===beg===
3748 @c load ("descriptive")$
3749 @c s2 : read_matrix (file_search ("wind.data"))$
3750 @c scatterplot(submatrix (s2, 1,2), nclasses=4)$
3751 @c ===end===
3752 @example
3753 (%i1) load ("descriptive")$
3754 (%i2) s2 : read_matrix (file_search ("wind.data"))$
3755 (%i3) scatterplot(submatrix (s2, 1,2), nclasses=4)$
3756 @end example
3757
Five dimensional scatter plot, with five-class histograms.
3759
3760 @c ===beg===
3761 @c load ("descriptive")$
3762 @c s2 : read_matrix (file_search ("wind.data"))$
3763 @c scatterplot(
3764 @c   s2,
3765 @c   nclasses     = 5,
3766 @c   frequency    = relative,
3767 @c   fill_color   = blue,
3768 @c   fill_density = 0.3,
3769 @c   xtics        = 5)$
3770 @c ===end===
3771 @example
3772 (%i1) load ("descriptive")$
3773 (%i2) s2 : read_matrix (file_search ("wind.data"))$
3774 @group
3775 (%i3) scatterplot(
3776   s2,
3777   nclasses     = 5,
3778   frequency    = relative,
3779   fill_color   = blue,
3780   fill_density = 0.3,
3781   xtics        = 5)$
3782 @end group
3783 @end example
3784
3785 For plotting isolated or line-joined points in two and three dimensions,
3786 see @code{points}. See also @mrefdot{histogram}
3787
3788 @opencatbox{Categories:}
3789 @category{Package descriptive}
3790 @category{Plotting}
3791 @closecatbox
3792 @end deffn
3793
3794 @anchor{scatterplot_description}
3795 @deffn {Function} scatterplot_description (@dots{})
3796
3797 Function @code{scatterplot_description} creates a graphic object
3798 suitable for creating complex scenes, together with other
3799 graphic objects.
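
Example: a minimal sketch for a two-column sample, assuming that in this case
@code{scatterplot_description} returns an object that @code{draw2d} can combine
with other graphic objects. The reference line and its range are illustrative
choices, not part of the package.

@example
(%i1) load ("descriptive")$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
@group
(%i3) draw2d (
        scatterplot_description (
          submatrix (s2, 1, 2, 3),
          point_type = circle,
          color      = blue),
        color = red,
        explicit (x, x, 0, 30))$   /* reference line y = x */
@end group
@end example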
3800
3801 @opencatbox{Categories:}
3802 @category{Package descriptive}
3803 @category{Plotting}
3804 @closecatbox
3805 @end deffn
3806
3807 @anchor{starplot}
3808 @deffn {Function} starplot (@var{data1}, @var{data2}, @dots{}, @var{option_1}, @var{option_2}, @dots{})
3809
3810 Plots star diagrams for discrete statistical variables,
for one or multiple samples.
3812
3813 @var{data} can be a list of outcomes representing one sample, or a
3814 matrix of @var{m} rows and @var{n} columns, representing @var{n} samples of size
3815 @var{m} each.
3816
3817 Available options are:
3818
3819 @itemize @bullet
3820
3821 @item
3822 @var{stars_colors} (default, @code{[]}): a list of colors for multiple samples.
3823 When there are more samples than specified colors, the extra necessary colors
3824 are chosen at random. See @code{color} to learn more about them.
3825
3826 @item
3827 @var{frequency} (default, @code{absolute}): indicates the scale of the
3828 radii. Possible values are:  @code{absolute} and @code{relative}.
3829
3830 @item
3831 @var{ordering} (default, @code{orderlessp}): possible values are @code{orderlessp} or @code{ordergreatp},
3832 indicating how statistical outcomes should be ordered.
3833
3834 @item
3835 @var{sample_keys} (default, @code{[]}): a list with the strings to be used in the legend.
3836 When the list length is other than 0 or the number of samples, an error message is returned.
3837
3838
3839 @item
3840 @var{star_center} (default, @code{[0,0]}): diagram's center.
3841
3842 @item
3843 @var{star_radius} (default, @code{1}): diagram's radius.
3844
3845 @item
3846 All global @code{draw} options, except @code{points_joined}, @code{point_type},
3847 and @code{key}, which are internally assigned by @code{starplot}.
If you want to set your own values for these options or want to build
3849 complex scenes, make use of @code{starplot_description}.
3850
3851 @item
3852 The following local @code{draw} option: @code{line_width}.
3853
3854 @end itemize
3855
There is also a function @code{wxstarplot} for
creating embedded star diagrams in interfaces wxMaxima and iMaxima.
3858
3859 Example:
3860
3861 Plot based on absolute frequencies.
3862 Location and radius defined by the user.
3863
3864 @example
3865 (%i1) load ("descriptive")$
3866 (%i2) l1: makelist(random(10),k,1,50)$
3867 (%i3) l2: makelist(random(10),k,1,200)$
3868 @group
3869 (%i4) starplot(
3870         l1, l2,
3871         stars_colors = [blue,red],
3872         sample_keys = ["1st sample", "2nd sample"],
3873         star_center = [1,2],
3874         star_radius = 4,
3875         proportional_axes = xy,
3876         line_width = 2 ) $
3877 @end group
3878 @end example
3879
3880 @opencatbox{Categories:}
3881 @category{Package descriptive}
3882 @category{Plotting}
3883 @closecatbox
3884 @end deffn
3885
3886 @anchor{starplot_description}
3887 @deffn {Function} starplot_description (@dots{})
3888
3889 Function @code{starplot_description} creates a graphic object
3890 suitable for creating complex scenes, together with other
3891 graphic objects.
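
Example: a minimal sketch, assuming that the object returned by
@code{starplot_description} can be passed to @code{draw2d} like the other
@code{*_description} objects in this package. The sample and the option values
are illustrative.

@example
(%i1) load ("descriptive")$
(%i2) l1 : makelist (random (10), k, 1, 50)$
@group
(%i3) draw2d (
        starplot_description (l1,
                              stars_colors = [blue],
                              star_radius  = 2),
        proportional_axes = xy,
        title = "Star diagram as a graphic object")$
@end group
@end example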
3892
3893 @opencatbox{Categories:}
3894 @category{Package descriptive}
3895 @category{Plotting}
3896 @closecatbox
3897 @end deffn
3898
3899 @anchor{stemplot}
3900 @deffn {Function} stemplot @
3901 @fname{stemplot} (@var{data}) @
3902 @fname{stemplot} (@var{data}, @var{option})
3903
3904 Plots stem and leaf diagrams.
3905
3906 The only available option is:
3907
3908 @itemize @bullet
3909
3910 @item
3911 @var{leaf_unit} (default, @code{1}): indicates the unit of the leaves; must be a
3912 power of 10.
3913
3914 @end itemize
3915
3916 Example:
3917
3918 @example
3919 (%i1) load ("descriptive")$
3920 (%i2) load("distrib")$
3921 @group
3922 (%i3) stemplot(
3923         random_normal(15, 6, 100),
3924         leaf_unit = 0.1);
3925 -5|4
3926  0|37
3927  1|7
3928  3|6
3929  4|4
3930  5|4
3931  6|57
3932  7|0149
3933  8|3
3934  9|1334588
3935 10|07888
3936 11|01144467789
3937 12|12566889
3938 13|24778
3939 14|047
3940 15|223458
3941 16|4
3942 17|11557
3943 18|000247
3944 19|4467799
3945 20|00
3946 21|1
3947 22|2335
3948 23|01457
3949 24|12356
3950 25|455
3951 27|79
3952 key: 6|3 =  6.3
3953 (%o3)                  done
3954 @end group
3955 @end example
3956
3957 @opencatbox{Categories:}
3958 @category{Package descriptive}
3959 @category{Plotting}
3960 @closecatbox
3961 @end deffn
3962