@menu
* Introduction to stats::
* Functions and Variables for inference_result::
* Functions and Variables for stats::
* Functions and Variables for special distributions::
@end menu
@node Introduction to stats, Functions and Variables for inference_result, , Package stats
@section Introduction to stats

Package @code{stats} contains a set of classical statistical inference and
hypothesis testing procedures.

All these functions return an @code{inference_result} Maxima object, which
contains the results needed for population inference and decision making.

The global variable @code{stats_numer} controls whether results are given in
floating point or in symbolic and rational format; its default value is
@code{true}, so results are returned in floating point format.

Package @code{descriptive} contains some utilities to manipulate data structures
(lists and matrices), for example to extract subsamples. It also contains some
examples showing how to use package @code{numericalio} to read data from plain
text files. See @code{descriptive} and @code{numericalio} for more details.

Package @code{stats} loads packages @code{descriptive}, @code{distrib} and
@code{inference_result}.

For comments, bugs or suggestions, please contact the author at
@var{'mario AT edu DOT xunta DOT es'}.

@opencatbox{Categories:}
@category{Statistical inference}
@category{Share packages}
@category{Package stats}
@closecatbox
@node Functions and Variables for inference_result, Functions and Variables for stats, Introduction to stats, Package stats
@section Functions and Variables for inference_result

@anchor{inference_result}
@deffn {Function} inference_result (@var{title}, @var{values}, @var{numbers})

Constructs an @code{inference_result} object of the type returned by the
stats functions. Argument @var{title} is a string with the name of the
procedure; @var{values} is a list with elements of the form
@code{symbol = value}, and @var{numbers} is a list of positive integers
ranging from one to @code{length(@var{values})}, indicating which values
will be shown by default.
Example:

This is a simple example showing results concerning a rectangle. The title of
this object is the string @code{"Rectangle"}; it stores five results, named
@code{'base}, @code{'height}, @code{'diagonal}, @code{'area},
and @code{'perimeter}, but only the first, second, fifth, and fourth
will be displayed. The @code{'diagonal} is stored in this object, but it is
not displayed; to access its value, make use of function @code{take_inference}.

@c ===beg===
@c load ("inference_result")$
@c [b, h]: [3, 2]$
@c inference_result("Rectangle",
@c         ['base=b,
@c          'height=h,
@c          'diagonal=sqrt(b^2+h^2),
@c          'area=b*h,
@c          'perimeter=2*(b+h)],
@c         [1,2,5,4]);
@c take_inference('diagonal,%);
@c ===end===
@example
(%i1) load("inference_result")$
(%i2) [b, h]: [3, 2]$
(%i3) inference_result("Rectangle",
              ['base=b,
               'height=h,
               'diagonal=sqrt(b^2+h^2),
               'area=b*h,
               'perimeter=2*(b+h)],
              [1,2,5,4]);
          |        Rectangle
          | base = 3
(%o3)     | height = 2
          | perimeter = 10
          | area = 6
(%i4) take_inference('diagonal,%);
(%o4)                           sqrt(13)
@end example

See also @mrefdot{take_inference}

@opencatbox{Categories:}
@category{Package stats}
@closecatbox

@end deffn
@anchor{inferencep}
@deffn {Function} inferencep (@var{obj})

Returns @code{true} or @code{false}, depending on whether @var{obj} is an
@code{inference_result} object or not.
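
Example (a short illustration; the object is built with the
@code{inference_result} constructor documented above):

@example
(%i1) load("inference_result")$
(%i2) inferencep(inference_result("Hi", ['pi=%pi,'e=%e],[2]));
(%o2)                             true
(%i3) inferencep(sqrt(2));
(%o3)                             false
@end example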
@opencatbox{Categories:}
@category{Package stats}
@closecatbox

@end deffn
@anchor{items_inference}
@deffn {Function} items_inference (@var{obj})

Returns a list with the names of the items stored in @var{obj}, which must
be an @code{inference_result} object.

Example:

The @code{inference_result} object stores two values, named @code{'pi} and
@code{'e}, but only the second is displayed. The @code{items_inference}
function returns the names of all items, whether or not they are displayed.

@c ===beg===
@c load ("inference_result")$
@c inference_result("Hi", ['pi=%pi,'e=%e],[2]);
@c items_inference(%);
@c ===end===
@example
(%i1) load("inference_result")$
(%i2) inference_result("Hi", ['pi=%pi,'e=%e],[2]);
          |    Hi
(%o2)     | e = %e
(%i3) items_inference(%);
(%o3)                           [pi, e]
@end example

@opencatbox{Categories:}
@category{Package stats}
@closecatbox

@end deffn
@anchor{take_inference}
@deffn {Function} take_inference @
@fname{take_inference} (@var{n}, @var{obj}) @
@fname{take_inference} (@var{name}, @var{obj}) @
@fname{take_inference} (@var{list}, @var{obj})

Returns the @var{n}-th value stored in @var{obj} if @var{n} is a positive
integer, or the item named @var{name} if this is the name of an item. If the
first argument is a list of numbers and/or symbols, function
@code{take_inference} returns a list with the corresponding results.

Example:

Given an @code{inference_result} object, function @code{take_inference} is
called in order to extract the information stored in it.

@c ===beg===
@c load ("inference_result")$
@c [b, h]: [3, 2]$
@c sol: inference_result("Rectangle",
@c         ['base=b,
@c          'height=h,
@c          'diagonal=sqrt(b^2+h^2),
@c          'area=b*h,
@c          'perimeter=2*(b+h)],
@c         [1,2,5,4])$
@c take_inference('base,sol);
@c take_inference(5,sol);
@c take_inference([1,'diagonal],sol);
@c take_inference(items_inference(sol),sol);
@c ===end===
@example
(%i1) load("inference_result")$
(%i2) [b, h]: [3, 2]$
(%i3) sol: inference_result("Rectangle",
              ['base=b,
               'height=h,
               'diagonal=sqrt(b^2+h^2),
               'area=b*h,
               'perimeter=2*(b+h)],
              [1,2,5,4])$
(%i4) take_inference('base,sol);
(%o4)                              3
(%i5) take_inference(5,sol);
(%o5)                             10
(%i6) take_inference([1,'diagonal],sol);
(%o6)                       [3, sqrt(13)]
(%i7) take_inference(items_inference(sol),sol);
(%o7)                  [3, 2, sqrt(13), 6, 10]
@end example

See also @mrefcomma{inference_result} and @mrefdot{items_inference}

@opencatbox{Categories:}
@category{Package stats}
@closecatbox

@end deffn
@node Functions and Variables for stats, Functions and Variables for special distributions, Functions and Variables for inference_result, Package stats
@section Functions and Variables for stats

@anchor{stats_numer}
@defvr {Option variable} stats_numer
Default value: @code{true}

If @code{stats_numer} is @code{true}, statistical inference functions
return their results as floating point numbers. If it is @code{false},
results are given in symbolic and rational format.
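
For example, the format of a reported sample mean changes with this flag
(a sketch; @code{test_mean} and @code{take_inference} are documented in
this chapter):

@example
(%i1) load("stats")$
(%i2) take_inference('mean_estimate, test_mean([1,2,3]));
(%o2)                             2.0
(%i3) stats_numer: false$
(%i4) take_inference('mean_estimate, test_mean([1,2,3]));
(%o4)                              2
@end example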
@opencatbox{Categories:}
@category{Package stats}
@category{Numerical evaluation}
@closecatbox

@end defvr
@anchor{test_mean}
@deffn {Function} test_mean @
@fname{test_mean} (@var{x}) @
@fname{test_mean} (@var{x}, @var{options} @dots{})

Performs the mean @var{t}-test. Argument @var{x} is a list or a column matrix
containing a one-dimensional sample. The function performs an asymptotic test
based on the @i{Central Limit Theorem} if option @code{'asymptotic} is
@code{true}.

Options:

@itemize @bullet

@item
@code{'mean}, default @code{0}, is the mean value to be checked.

@item
@code{'alternative}, default @code{'twosided}, is the alternative hypothesis;
valid values are: @code{'twosided}, @code{'greater} and @code{'less}.

@item
@code{'dev}, default @code{'unknown}, is the value of the standard deviation
when it is known; valid values are: @code{'unknown} or a positive expression.

@item
@code{'conflevel}, default @code{95/100}, confidence level for the confidence
interval; it must be an expression which takes a value in (0,1).

@item
@code{'asymptotic}, default @code{false}, indicates whether to perform an
exact @var{t}-test or an asymptotic one based on the
@i{Central Limit Theorem}; valid values are @code{true} and @code{false}.

@end itemize

The output of function @code{test_mean} is an @code{inference_result} Maxima
object showing the following results:

@itemize @bullet

@item
@code{'mean_estimate}: the sample mean.

@item
@code{'conf_level}: confidence level selected by the user.

@item
@code{'conf_interval}: confidence interval for the population mean.

@item
@code{'method}: inference procedure.

@item
@code{'hypotheses}: null and alternative hypotheses to be tested.

@item
@code{'statistic}: value of the sample statistic used for testing the null
hypothesis.

@item
@code{'distribution}: distribution of the sample statistic, together with
its parameter(s).

@item
@code{'p_value}: @math{p}-value of the test.

@end itemize

Examples:

Performs an exact @var{t}-test with unknown variance. The null hypothesis
is @math{H_0: mean=50} against the one-sided alternative @math{H_1: mean<50};
according to the results, the @math{p}-value is large, so there is no
evidence for rejecting @math{H_0}.

@c ===beg===
@c load ("stats")$
@c data: [78,64,35,45,45,75,43,74,42,42]$
@c test_mean(data,'conflevel=0.9,'alternative='less,'mean=50);
@c ===end===
@example
(%i1) load("stats")$
(%i2) data: [78,64,35,45,45,75,43,74,42,42]$
(%i3) test_mean(data,'conflevel=0.9,'alternative='less,'mean=50);
          |                MEAN TEST
          | mean_estimate = 54.3
          | conf_level = 0.9
          | conf_interval = [minf, 61.51314273502712]
(%o3)     | method = Exact t-test. Unknown variance.
          | hypotheses = H0: mean = 50 , H1: mean < 50
          | statistic = .8244705235071678
          | distribution = [student_t, 9]
          | p_value = .7845100411786889
@end example

This time Maxima performs an asymptotic test, based on the
@i{Central Limit Theorem}. The null hypothesis is
@math{H_0: equal(mean, 50)} against the two-sided alternative
@math{H_1: not equal(mean, 50)}; according to the results, the
@math{p}-value is very small, so @math{H_0} should be rejected in favor of
the alternative @math{H_1}. Note that, as indicated by the @code{'method}
component, this procedure should be applied to large samples.

@c ===beg===
@c load ("stats")$
@c test_mean([36,118,52,87,35,256,56,178,57,57,89,34,25,98,35,
@c            98,41,45,198,54,79,63,35,45,44,75,42,75,45,45,
@c            45,51,123,54,151],
@c           'asymptotic=true,'mean=50);
@c ===end===
@example
(%i1) load("stats")$
(%i2) test_mean([36,118,52,87,35,256,56,178,57,57,89,34,25,98,35,
                 98,41,45,198,54,79,63,35,45,44,75,42,75,45,45,
                 45,51,123,54,151],
                'asymptotic=true,'mean=50);
          |                MEAN TEST
          | mean_estimate = 74.88571428571429
          | conf_level = 0.95
          | conf_interval = [57.72848600856194, 92.04294256286663]
(%o2)     | method = Large sample z-test. Unknown variance.
          | hypotheses = H0: mean = 50 , H1: mean # 50
          | statistic = 2.842831192874313
          | distribution = [normal, 0, 1]
          | p_value = .004471474652002261
@end example

@opencatbox{Categories:}
@category{Package stats}
@closecatbox

@end deffn
@anchor{test_means_difference}
@deffn {Function} test_means_difference @
@fname{test_means_difference} (@var{x1}, @var{x2}) @
@fname{test_means_difference} (@var{x1}, @var{x2}, @var{options} @dots{})

Performs the difference of means @var{t}-test for two samples.
Arguments @var{x1} and @var{x2} are lists or column matrices
containing two independent samples. In the case of different unknown variances
(see options @code{'dev1}, @code{'dev2} and @code{'varequal} below),
the degrees of freedom are computed by means of the Welch approximation.
The function performs an asymptotic test based on the
@i{Central Limit Theorem} if option @code{'asymptotic} is @code{true}.

Options:

@itemize @bullet

@item
@code{'alternative}, default @code{'twosided}, is the alternative hypothesis;
valid values are: @code{'twosided}, @code{'greater} and @code{'less}.

@item
@code{'dev1}, default @code{'unknown}, is the value of the standard deviation
of the @var{x1} sample when it is known; valid values are: @code{'unknown}
or a positive expression.

@item
@code{'dev2}, default @code{'unknown}, is the value of the standard deviation
of the @var{x2} sample when it is known; valid values are: @code{'unknown}
or a positive expression.

@item
@code{'varequal}, default @code{false}, whether variances should be considered
equal or not; this option takes effect only when @code{'dev1} and/or
@code{'dev2} are @code{'unknown}.

@item
@code{'conflevel}, default @code{95/100}, confidence level for the confidence
interval; it must be an expression which takes a value in (0,1).

@item
@code{'asymptotic}, default @code{false}, indicates whether to perform an
exact @var{t}-test or an asymptotic one based on the
@i{Central Limit Theorem}; valid values are @code{true} and @code{false}.

@end itemize

The output of function @code{test_means_difference} is an
@code{inference_result} Maxima object showing the following results:

@itemize @bullet

@item
@code{'diff_estimate}: the difference of means estimate.

@item
@code{'conf_level}: confidence level selected by the user.

@item
@code{'conf_interval}: confidence interval for the difference of means.

@item
@code{'method}: inference procedure.

@item
@code{'hypotheses}: null and alternative hypotheses to be tested.

@item
@code{'statistic}: value of the sample statistic used for testing the null
hypothesis.

@item
@code{'distribution}: distribution of the sample statistic, together with
its parameter(s).

@item
@code{'p_value}: @math{p}-value of the test.

@end itemize

Examples:

The equality of means is tested with two small samples @var{x} and @var{y},
against the alternative @math{H_1: m_1>m_2}, where @math{m_1} and @math{m_2}
are the population means; variances are unknown and supposed to be different.

@c equivalent code for R:
@c x <- c(20.4,62.5,61.3,44.2,11.1,23.7)
@c y <- c(1.2,6.9,38.7,20.4,17.2)
@c t.test(x,y,alternative="greater")

@c ===beg===
@c load ("stats")$
@c x: [20.4,62.5,61.3,44.2,11.1,23.7]$
@c y: [1.2,6.9,38.7,20.4,17.2]$
@c test_means_difference(x,y,'alternative='greater);
@c ===end===
@example
(%i1) load("stats")$
(%i2) x: [20.4,62.5,61.3,44.2,11.1,23.7]$
(%i3) y: [1.2,6.9,38.7,20.4,17.2]$
(%i4) test_means_difference(x,y,'alternative='greater);
          |       DIFFERENCE OF MEANS TEST
          | diff_estimate = 20.31999999999999
          | conf_level = 0.95
          | conf_interval = [- .04597417812882298, inf]
(%o4)     | method = Exact t-test. Welch approx.
          | hypotheses = H0: mean1 = mean2 , H1: mean1 > mean2
          | statistic = 1.838004300728477
          | distribution = [student_t, 8.62758740184604]
          | p_value = .05032746527991905
@end example

The same test as before, but now variances are supposed to be equal.

@c equivalent code for R:
@c x <- c(20.4,62.5,61.3,44.2,11.1,23.7)
@c y <- c(1.2,6.9,38.7,20.4,17.2)
@c t.test(x,y,var.equal=T,alternative="greater")

@c ===beg===
@c load ("stats")$
@c x: [20.4,62.5,61.3,44.2,11.1,23.7]$
@c y: matrix([1.2],[6.9],[38.7],[20.4],[17.2])$
@c test_means_difference(x,y,'alternative='greater,
@c                       'varequal=true);
@c ===end===
@example
(%i1) load("stats")$
(%i2) x: [20.4,62.5,61.3,44.2,11.1,23.7]$
(%i3) y: matrix([1.2],[6.9],[38.7],[20.4],[17.2])$
(%i4) test_means_difference(x,y,'alternative='greater,
                            'varequal=true);
          |       DIFFERENCE OF MEANS TEST
          | diff_estimate = 20.31999999999999
          | conf_level = 0.95
          | conf_interval = [- .7722627696897568, inf]
(%o4)     | method = Exact t-test. Unknown equal variances
          | hypotheses = H0: mean1 = mean2 , H1: mean1 > mean2
          | statistic = 1.765996124515009
          | distribution = [student_t, 9]
          | p_value = .05560320992529344
@end example

@opencatbox{Categories:}
@category{Package stats}
@closecatbox

@end deffn
@anchor{test_variance}
@deffn {Function} test_variance @
@fname{test_variance} (@var{x}) @
@fname{test_variance} (@var{x}, @var{options} @dots{})

Performs the variance @math{chi^2}-test. Argument @var{x} is a list or a
column matrix containing a one-dimensional sample taken from a normal
population.

Options:

@itemize @bullet

@item
@code{'mean}, default @code{'unknown}, is the population mean, when it is
known.

@item
@code{'alternative}, default @code{'twosided}, is the alternative hypothesis;
valid values are: @code{'twosided}, @code{'greater} and @code{'less}.

@item
@code{'variance}, default @code{1}, is the variance value (positive) to be
checked.

@item
@code{'conflevel}, default @code{95/100}, confidence level for the confidence
interval; it must be an expression which takes a value in (0,1).

@end itemize

The output of function @code{test_variance} is an @code{inference_result}
Maxima object showing the following results:

@itemize @bullet

@item
@code{'var_estimate}: the sample variance.

@item
@code{'conf_level}: confidence level selected by the user.

@item
@code{'conf_interval}: confidence interval for the population variance.

@item
@code{'method}: inference procedure.

@item
@code{'hypotheses}: null and alternative hypotheses to be tested.

@item
@code{'statistic}: value of the sample statistic used for testing the null
hypothesis.

@item
@code{'distribution}: distribution of the sample statistic, together with
its parameter.

@item
@code{'p_value}: @math{p}-value of the test.

@end itemize

Example:

Checks whether the variance of a population with unknown mean equals 200,
against the alternative that it is greater than 200.

@c ===beg===
@c load ("stats")$
@c x: [203,229,215,220,223,233,208,228,209]$
@c test_variance(x,'alternative='greater,'variance=200);
@c ===end===
@example
(%i1) load("stats")$
(%i2) x: [203,229,215,220,223,233,208,228,209]$
(%i3) test_variance(x,'alternative='greater,'variance=200);
          |             VARIANCE TEST
          | var_estimate = 110.75
          | conf_level = 0.95
          | conf_interval = [57.13433376937479, inf]
(%o3)     | method = Variance Chi-square test. Unknown mean.
          | hypotheses = H0: var = 200 , H1: var > 200
          | statistic = 4.43
          | distribution = [chi2, 8]
          | p_value = .8163948512777689
@end example

@opencatbox{Categories:}
@category{Package stats}
@closecatbox

@end deffn
@anchor{test_variance_ratio}
@deffn {Function} test_variance_ratio @
@fname{test_variance_ratio} (@var{x1}, @var{x2}) @
@fname{test_variance_ratio} (@var{x1}, @var{x2}, @var{options} @dots{})

Performs the variance ratio @var{F}-test for two normal populations.
Arguments @var{x1} and @var{x2} are lists or column matrices
containing two independent samples.

Options:

@itemize @bullet

@item
@code{'alternative}, default @code{'twosided}, is the alternative hypothesis;
valid values are: @code{'twosided}, @code{'greater} and @code{'less}.

@item
@code{'mean1}, default @code{'unknown}; when it is known, this is the mean of
the population from which @var{x1} was taken.

@item
@code{'mean2}, default @code{'unknown}; when it is known, this is the mean of
the population from which @var{x2} was taken.

@item
@code{'conflevel}, default @code{95/100}, confidence level for the confidence
interval of the ratio; it must be an expression which takes a value in (0,1).

@end itemize

The output of function @code{test_variance_ratio} is an
@code{inference_result} Maxima object showing the following results:

@itemize @bullet

@item
@code{'ratio_estimate}: the sample variance ratio.

@item
@code{'conf_level}: confidence level selected by the user.

@item
@code{'conf_interval}: confidence interval for the variance ratio.

@item
@code{'method}: inference procedure.

@item
@code{'hypotheses}: null and alternative hypotheses to be tested.

@item
@code{'statistic}: value of the sample statistic used for testing the null
hypothesis.

@item
@code{'distribution}: distribution of the sample statistic, together with
its parameters.

@item
@code{'p_value}: @math{p}-value of the test.

@end itemize

Example:

The equality of the variances of two normal populations is checked
against the alternative that the first is greater than the second.

@c equivalent code for R:
@c x <- c(20.4,62.5,61.3,44.2,11.1,23.7)
@c y <- c(1.2,6.9,38.7,20.4,17.2)
@c var.test(x,y,alternative="greater")

@c ===beg===
@c load ("stats")$
@c x: [20.4,62.5,61.3,44.2,11.1,23.7]$
@c y: [1.2,6.9,38.7,20.4,17.2]$
@c test_variance_ratio(x,y,'alternative='greater);
@c ===end===
@example
(%i1) load("stats")$
(%i2) x: [20.4,62.5,61.3,44.2,11.1,23.7]$
(%i3) y: [1.2,6.9,38.7,20.4,17.2]$
(%i4) test_variance_ratio(x,y,'alternative='greater);
          |          VARIANCE RATIO TEST
          | ratio_estimate = 2.316933391522034
          | conf_level = 0.95
          | conf_interval = [.3703504689507268, inf]
(%o4)     | method = Variance ratio F-test. Unknown means.
          | hypotheses = H0: var1 = var2 , H1: var1 > var2
          | statistic = 2.316933391522034
          | distribution = [f, 5, 4]
          | p_value = .2179269692254457
@end example

@opencatbox{Categories:}
@category{Package stats}
@closecatbox

@end deffn
@anchor{test_proportion}
@deffn {Function} test_proportion @
@fname{test_proportion} (@var{x}, @var{n}) @
@fname{test_proportion} (@var{x}, @var{n}, @var{options} @dots{})

Inferences on a proportion. Argument @var{x} is the number of successes
in @var{n} trials of a Bernoulli experiment with unknown probability.

Options:

@itemize @bullet

@item
@code{'proportion}, default @code{1/2}, is the value of the proportion to be
checked.

@item
@code{'alternative}, default @code{'twosided}, is the alternative hypothesis;
valid values are: @code{'twosided}, @code{'greater} and @code{'less}.

@item
@code{'conflevel}, default @code{95/100}, confidence level for the confidence
interval; it must be an expression which takes a value in (0,1).

@item
@code{'asymptotic}, default @code{false}, indicates whether to perform an
exact test based on the binomial distribution or an asymptotic one based on
the @i{Central Limit Theorem}; valid values are @code{true} and @code{false}.

@item
@code{'correct}, default @code{true}, indicates whether the Yates correction
is applied or not.

@end itemize

The output of function @code{test_proportion} is an @code{inference_result}
Maxima object showing the following results:

@itemize @bullet

@item
@code{'sample_proportion}: the sample proportion.

@item
@code{'conf_level}: confidence level selected by the user.

@item
@code{'conf_interval}: Wilson confidence interval for the proportion.

@item
@code{'method}: inference procedure.

@item
@code{'hypotheses}: null and alternative hypotheses to be tested.

@item
@code{'statistic}: value of the sample statistic used for testing the null
hypothesis.

@item
@code{'distribution}: distribution of the sample statistic, together with
its parameters.

@item
@code{'p_value}: @math{p}-value of the test.

@end itemize

Examples:

Performs an exact test. The null hypothesis is @math{H_0: p=1/2} against the
one-sided alternative @math{H_1: p<1/2}.

@c ===beg===
@c load ("stats")$
@c test_proportion(45, 103, alternative = less);
@c ===end===
@example
(%i1) load("stats")$
(%i2) test_proportion(45, 103, alternative = less);
          |            PROPORTION TEST
          | sample_proportion = .4368932038834951
          | conf_level = 0.95
          | conf_interval = [0, 0.522714149150231]
(%o2)     | method = Exact binomial test.
          | hypotheses = H0: p = 0.5 , H1: p < 0.5
          | statistic = 45
          | distribution = [binomial, 103, 0.5]
          | p_value = .1184509388901454
@end example

A two-sided asymptotic test with confidence level 99/100.

@c test_proportion(45, 103,
@c                 conflevel = 99/100, asymptotic=true);
@example
(%i3) test_proportion(45, 103,
                      conflevel = 99/100, asymptotic=true);
          |            PROPORTION TEST
          | sample_proportion = .43689
          | conf_level = 0.99
          | conf_interval = [.31422, .56749]
(%o3)     | method = Asympthotic test with Yates correction.
          | hypotheses = H0: p = 0.5 , H1: p # 0.5
          | distribution = [normal, 0.5, .048872]
@end example

@opencatbox{Categories:}
@category{Package stats}
@closecatbox

@end deffn
@anchor{test_proportions_difference}
@deffn {Function} test_proportions_difference @
@fname{test_proportions_difference} (@var{x1}, @var{n1}, @var{x2}, @var{n2}) @
@fname{test_proportions_difference} (@var{x1}, @var{n1}, @var{x2}, @var{n2}, @var{options} @dots{})

Inferences on the difference of two proportions. Argument @var{x1} is the
number of successes in @var{n1} trials of a Bernoulli experiment in the first
population, and @var{x2} and @var{n2} are the corresponding values for the
second population. Samples are independent and the test is asymptotic.

Options:

@itemize @bullet

@item
@code{'alternative}, default @code{'twosided}, is the alternative hypothesis;
valid values are: @code{'twosided} (@code{p1 # p2}), @code{'greater}
(@code{p1 > p2}) and @code{'less} (@code{p1 < p2}).

@item
@code{'conflevel}, default @code{95/100}, confidence level for the confidence
interval; it must be an expression which takes a value in (0,1).

@item
@code{'correct}, default @code{true}, indicates whether the Yates correction
is applied or not.

@end itemize

The output of function @code{test_proportions_difference} is an
@code{inference_result} Maxima object showing the following results:

@itemize @bullet

@item
@code{'proportions}: list with the two sample proportions.

@item
@code{'conf_level}: confidence level selected by the user.

@item
@code{'conf_interval}: confidence interval for the difference of proportions
@code{p1 - p2}.

@item
@code{'method}: inference procedure, together with a warning message when
either of the sample sizes is too small.

@item
@code{'hypotheses}: null and alternative hypotheses to be tested.

@item
@code{'statistic}: value of the sample statistic used for testing the null
hypothesis.

@item
@code{'distribution}: distribution of the sample statistic, together with
its parameters.

@item
@code{'p_value}: @math{p}-value of the test.

@end itemize

Examples:

A machine produced 10 defective articles in a batch of 250.
After some maintenance work, it produced 4 defective articles in a batch of
150. In order to know whether the machine has improved, we test the null
hypothesis @code{H0: p1 = p2} against the alternative @code{H1: p1 > p2},
where @code{p1} and @code{p2} are the probabilities for a produced
article to be defective before and after maintenance. According to
the @math{p}-value, there is not enough evidence to accept the alternative.

@c test_proportions_difference(10, 250, 4, 150,
@c                             alternative = greater);
@example
(%i3) test_proportions_difference(10, 250, 4, 150,
                                  alternative = greater);
          |    DIFFERENCE OF PROPORTIONS TEST
          | proportions = [0.04, .02666667]
          | conf_level = 0.95
          | conf_interval = [- .02172761, 1]
(%o3)     | method = Asymptotic test. Yates correction.
          | hypotheses = H0: p1 = p2 , H1: p1 > p2
          | statistic = .01333333
          | distribution = [normal, 0, .01898069]
@end example

When the data are unknown, working symbolically
(@code{stats_numer: false}) yields the exact standard deviation of the
asymptotic normal distribution.

@c ===beg===
@c load ("stats")$
@c stats_numer: false$
@c sol: test_proportions_difference(x1,n1,x2,n2)$
@c last(take_inference('distribution,sol));
@c ===end===
@example
(%i1) load("stats")$
(%i2) stats_numer: false$
(%i3) sol: test_proportions_difference(x1,n1,x2,n2)$
(%i4) last(take_inference('distribution,sol));
                   1    1                  x2 + x1
                  (-- + --) (x2 + x1) (1 - -------)
                   n2   n1                 n2 + n1
(%o4)        sqrt(---------------------------------)
                               n2 + n1
@end example

@opencatbox{Categories:}
@category{Package stats}
@closecatbox

@end deffn
@anchor{test_sign}
@deffn {Function} test_sign @
@fname{test_sign} (@var{x}) @
@fname{test_sign} (@var{x}, @var{options} @dots{})

Performs the nonparametric sign test for the median of a continuous
population. Argument @var{x} is a list or a column matrix containing a
one-dimensional sample.

Options:

@itemize @bullet

@item
@code{'alternative}, default @code{'twosided}, is the alternative hypothesis;
valid values are: @code{'twosided}, @code{'greater} and @code{'less}.

@item
@code{'median}, default @code{0}, is the median value to be checked.

@end itemize

The output of function @code{test_sign} is an @code{inference_result} Maxima
object showing the following results:

@itemize @bullet

@item
@code{'med_estimate}: the sample median.

@item
@code{'method}: inference procedure.

@item
@code{'hypotheses}: null and alternative hypotheses to be tested.

@item
@code{'statistic}: value of the sample statistic used for testing the null
hypothesis.

@item
@code{'distribution}: distribution of the sample statistic, together with
its parameter(s).

@item
@code{'p_value}: @math{p}-value of the test.

@end itemize

Example:

Checks whether the population from which the sample was taken has median 6,
against the alternative @math{H_1: median > 6}.

@c ===beg===
@c load ("stats")$
@c x: [2,0.1,7,1.8,4,2.3,5.6,7.4,5.1,6.1,6]$
@c test_sign(x,'median=6,'alternative='greater);
@c ===end===
@example
(%i1) load("stats")$
(%i2) x: [2,0.1,7,1.8,4,2.3,5.6,7.4,5.1,6.1,6]$
(%i3) test_sign(x,'median=6,'alternative='greater);
          |               SIGN TEST
          | med_estimate = 5.1
          | method = Non parametric sign test.
(%o3)     | hypotheses = H0: median = 6 , H1: median > 6
          | distribution = [binomial, 10, 0.5]
          | p_value = .05468749999999989
@end example

@opencatbox{Categories:}
@category{Package stats}
@closecatbox

@end deffn
@anchor{test_signed_rank}
@deffn {Function} test_signed_rank @
@fname{test_signed_rank} (@var{x}) @
@fname{test_signed_rank} (@var{x}, @var{options} @dots{})

Performs the Wilcoxon signed rank test to make inferences about the median of
a continuous population. Argument @var{x} is a list or a column matrix
containing a one-dimensional sample. A normal approximation is used if the
sample size is greater than 20, or if there are zeroes or ties.

@c TODO: These two variables/functions aren't documented
See also @code{pdf_rank_test} and @code{cdf_rank_test}

Options:

@itemize @bullet

@item
@code{'median}, default @code{0}, is the median value to be checked.

@item
@code{'alternative}, default @code{'twosided}, is the alternative hypothesis;
valid values are: @code{'twosided}, @code{'greater} and @code{'less}.

@end itemize

The output of function @code{test_signed_rank} is an @code{inference_result}
Maxima object with the following results:

@itemize @bullet

@item
@code{'med_estimate}: the sample median.

@item
@code{'method}: inference procedure.

@item
@code{'hypotheses}: null and alternative hypotheses to be tested.

@item
@code{'statistic}: value of the sample statistic used for testing the null
hypothesis.

@item
@code{'distribution}: distribution of the sample statistic, together with
its parameter(s).

@item
@code{'p_value}: @math{p}-value of the test.

@end itemize

Examples:

Checks the null hypothesis @math{H_0: median = 15} against the
alternative @math{H_1: median > 15}. This is an exact test, since
the sample is small and there are neither zeroes nor ties.

@c equivalent code for R:
@c x <- c(17.1,15.9,13.7,13.4,15.5,17.6)
@c wilcox.test(x,mu=15,alternative="greater")

@c ===beg===
@c load ("stats")$
@c x: [17.1,15.9,13.7,13.4,15.5,17.6]$
@c test_signed_rank(x,median=15,alternative=greater);
@c ===end===
@example
(%i1) load("stats")$
(%i2) x: [17.1,15.9,13.7,13.4,15.5,17.6]$
(%i3) test_signed_rank(x,median=15,alternative=greater);
          |           SIGNED RANK TEST
          | med_estimate = 15.7
          | method = Exact test
(%o3)     | hypotheses = H0: med = 15 , H1: med > 15
          | statistic = 14
          | distribution = [signed_rank, 6]
          | p_value = .28125
@end example

Checks the null hypothesis @math{H_0: equal(median, 2.5)} against the
alternative @math{H_1: not equal(median, 2.5)}. This is an approximate test,
since there are ties.

@c equivalent code for R:
@c y<-c(1.9,2.3,2.6,1.9,1.6,3.3,4.2,4,2.4,2.9,1.5,3,2.9,4.2,3.1)
@c wilcox.test(y,mu=2.5)

@c ===beg===
@c load ("stats")$
@c y:[1.9,2.3,2.6,1.9,1.6,3.3,4.2,4,2.4,2.9,1.5,3,2.9,4.2,3.1]$
@c test_signed_rank(y,median=2.5);
@c ===end===
@example
(%i1) load("stats")$
(%i2) y:[1.9,2.3,2.6,1.9,1.6,3.3,4.2,4,2.4,2.9,1.5,3,2.9,4.2,3.1]$
(%i3) test_signed_rank(y,median=2.5);
          |           SIGNED RANK TEST
          | med_estimate = 2.9
          | method = Asymptotic test. Ties
(%o3)     | hypotheses = H0: med = 2.5 , H1: med # 2.5
          | statistic = 76.5
          | distribution = [normal, 60.5, 17.58195097251724]
          | p_value = .3628097734643669
@end example

@opencatbox{Categories:}
@category{Package stats}
@closecatbox

@end deffn
@anchor{test_rank_sum}
@deffn {Function} test_rank_sum @
@fname{test_rank_sum} (@var{x1}, @var{x2}) @
@fname{test_rank_sum} (@var{x1}, @var{x2}, @var{option})

Performs the Wilcoxon-Mann-Whitney test for comparing the medians of two
continuous populations. The first two arguments @var{x1} and @var{x2} are
lists or column matrices with the data of two independent samples. A normal
approximation is used if either of the sample sizes is greater than 10, or
if there are ties.

Option:

@itemize @bullet

@item
@code{'alternative}, default @code{'twosided}, is the alternative hypothesis;
valid values are: @code{'twosided}, @code{'greater} and @code{'less}.

@end itemize

The output of function @code{test_rank_sum} is an @code{inference_result}
Maxima object with the following results:

@itemize @bullet

@item
@code{'method}: inference procedure.

@item
@code{'hypotheses}: null and alternative hypotheses to be tested.

@item
@code{'statistic}: value of the sample statistic used for testing the null
hypothesis.

@item
@code{'distribution}: distribution of the sample statistic, together with
its parameters.

@item
@code{'p_value}: @math{p}-value of the test.

@end itemize

Examples:

Checks whether the populations have similar medians. The sample sizes
are small and an exact test is performed.

@c equivalent code for R:
@c x <- c(12,15,17,38,42,10,23,35,28)
@c y <- c(21,18,25,14,52,65,40,43)
@c wilcox.test(x,y)

@c ===beg===
@c load ("stats")$
@c x:[12,15,17,38,42,10,23,35,28]$
@c y:[21,18,25,14,52,65,40,43]$
@c test_rank_sum(x,y);
@c ===end===
@example
(%i1) load("stats")$
(%i2) x:[12,15,17,38,42,10,23,35,28]$
(%i3) y:[21,18,25,14,52,65,40,43]$
(%i4) test_rank_sum(x,y);
          |             RANK SUM TEST
          | method = Exact test
          | hypotheses = H0: med1 = med2 , H1: med1 # med2
(%o4)     | statistic = 22
          | distribution = [rank_sum, 9, 8]
          | p_value = .1995886466474702
@end example

Now, with larger samples and ties, the procedure uses the normal
approximation. The alternative hypothesis is
@math{H_1: median1 < median2}.

@c equivalent code for R:
@c x <- c(39,42,35,13,10,23,15,20,17,27)
@c y <- c(20,52,66,19,41,32,44,25,14,39,43,35,19,56,27,15)
@c wilcox.test(x,y,alternative="less")

@c ===beg===
@c load ("stats")$
@c x: [39,42,35,13,10,23,15,20,17,27]$
@c y: [20,52,66,19,41,32,44,25,14,39,43,35,19,56,27,15]$
@c test_rank_sum(x,y,'alternative='less);
@c ===end===
@example
(%i1) load("stats")$
(%i2) x: [39,42,35,13,10,23,15,20,17,27]$
(%i3) y: [20,52,66,19,41,32,44,25,14,39,43,35,19,56,27,15]$
(%i4) test_rank_sum(x,y,'alternative='less);
          |             RANK SUM TEST
          | method = Asymptotic test. Ties
          | hypotheses = H0: med1 = med2 , H1: med1 < med2
(%o4)     | statistic = 48.5
          | distribution = [normal, 79.5, 18.95419580097078]
          | p_value = .05096985666598441
@end example

@opencatbox{Categories:}
@category{Package stats}
@closecatbox

@end deffn
@anchor{test_normality}
@deffn {Function} test_normality (@var{x})

Shapiro-Wilk test for normality. Argument @var{x} is a list of numbers.
The sample size must be greater than 2 and less than or equal to 5000;
otherwise, function @code{test_normality} signals an error.
Reference:

[1] Algorithm AS R94, Applied Statistics (1995), vol. 44, no. 4, 547-551.
The output of function @code{test_normality} is an @code{inference_result} Maxima object
with the following results:

@code{'statistic}: value of the @var{W} statistic.

@code{'p_value}: @math{p}-value under the normality assumption.
Checks for the normality of a population, based on a sample of size 9.

@c equivalent code for R:
@c x <- c(12,15,17,38,42,10,23,35,28)
@c shapiro.test(x)

@c x:[12,15,17,38,42,10,23,35,28]$
@c test_normality(x);
@example
(%i1) load("stats")$
(%i2) x:[12,15,17,38,42,10,23,35,28]$
(%i3) test_normality(x);
          | SHAPIRO - WILK TEST
(%o3)     | statistic = .9251055695162436
          |  p_value = .4361763918860381
@end example
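For an independent check, SciPy's @code{scipy.stats.shapiro} implements the same AS R94 algorithm. Assuming SciPy is available, the figures above can be reproduced (an illustration, not part of the @code{stats} package):

```python
# Cross-check of the Shapiro-Wilk example above using SciPy's
# implementation of the same AS R94 algorithm (SciPy assumed installed).
from scipy import stats

x = [12, 15, 17, 38, 42, 10, 23, 35, 28]
W, p_value = stats.shapiro(x)
print(W, p_value)   # close to .9251055695162436 and .4361763918860381
```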
@opencatbox{Categories:}
@category{Package stats}
@closecatbox
@end deffn
@anchor{linear_regression}
@deffn {Function} linear_regression @
@fname{linear_regression} (@var{x}) @
@fname{linear_regression} (@var{x}, @var{option})
Multivariate linear regression,
@math{y_i = b0 + b1*x_1i + b2*x_2i + ... + bk*x_ki + u_i},
where @math{u_i} are @math{N(0,sigma)} independent random variables.
Argument @var{x} must be a matrix with more than one column. The
last column is taken as the responses (@math{y_i}).
Optional argument:

@code{'conflevel}, default @code{95/100}: confidence level for the
confidence intervals; it must be an expression which takes a value
in @math{(0,1)}.
The output of function @code{linear_regression} is an
@code{inference_result} Maxima object with the following results:

@code{'b_estimation}: regression coefficient estimates.

@code{'b_covariances}: covariance matrix of the regression
coefficient estimates.

@code{'b_conf_int}: confidence intervals of the regression coefficients.

@code{'b_statistics}: statistics for testing the coefficients.

@code{'b_p_values}: @math{p}-values for the coefficient tests.

@code{'b_distribution}: probability distribution for the coefficient tests.

@code{'v_estimation}: unbiased variance estimator.

@code{'v_conf_int}: variance confidence interval.

@code{'v_distribution}: probability distribution for the variance test.

@code{'residuals}: residuals.

@code{'adc}: adjusted coefficient of determination.

@code{'aic}: Akaike's information criterion.

@code{'bic}: Bayesian information criterion.

Only items 1, 4, 5, 6, 7, 8, 9 and 11 above, in this order,
are shown by default. The rest remain hidden until the user
makes use of functions @code{items_inference} and @code{take_inference}.
Fits a linear model to a trivariate sample. The
last column is taken as the responses (@math{y_i}).
@c X: matrix(
@c   [58,111,64],[84,131,78],[78,158,83],
@c   [81,147,88],[82,121,89],[102,165,99],
@c   [85,174,101],[102,169,102])$
@c res: linear_regression(X);
@c items_inference(res);
@c take_inference('b_covariances, res);
@c take_inference('bic, res);
@c draw2d(
@c   points_joined = true,
@c   points(take_inference('residuals, res)) )$
@example
(%i2) load("stats")$
(%i3) X: matrix(
        [58,111,64],[84,131,78],[78,158,83],
        [81,147,88],[82,121,89],[102,165,99],
        [85,174,101],[102,169,102])$
(%i4) fpprintprec: 4$
(%i5) res: linear_regression(X);
          |       LINEAR REGRESSION MODEL
          |  b_estimation = [9.054, .5203, .2397]
          |  b_statistics = [.6051, 2.246, 1.74]
          | b_p_values = [.5715, .07466, .1423]
(%o5)     | b_distribution = [student_t, 5]
          |    v_estimation = 35.27
          |  v_conf_int = [13.74, 212.2]
          | v_distribution = [chi2, 5]
(%i6) items_inference(res);
(%o6) [b_estimation, b_covariances, b_conf_int, b_statistics,
b_p_values, b_distribution, v_estimation, v_conf_int,
v_distribution, residuals, adc, aic, bic]
(%i7) take_inference('b_covariances, res);
      [  223.9    - 1.12    - .8532  ]
(%o7) [ - 1.12    .05367   - .02305  ]
      [ - .8532   - .02305   .01898  ]
(%i8) take_inference('bic, res);
(%i9) draw2d(
        points_joined = true,
        points(take_inference('residuals, res)) )$
@end example
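The estimates shown above follow from ordinary least squares. The pure-Python sketch below (illustrative only, not the package's internal code; the helper name @code{solve} is ad hoc) solves the normal equations @math{(X'X)b = X'y} for the same data and reproduces @code{b_estimation} and @code{v_estimation}:

```python
# Each row is [x1, x2, y]; the last column is the response,
# as in the Maxima example above.
rows = [[58,111,64],[84,131,78],[78,158,83],[81,147,88],
        [82,121,89],[102,165,99],[85,174,101],[102,169,102]]

X = [[1.0, r[0], r[1]] for r in rows]      # design matrix with intercept
y = [float(r[2]) for r in rows]
n, k = len(X), len(X[0])

# Normal equations: (X'X) b = X'y
A = [[sum(X[i][p]*X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
c = [sum(X[i][p]*y[i] for i in range(n)) for p in range(k)]

def solve(A, c):
    """Gaussian elimination with partial pivoting."""
    m = len(c)
    M = [A[i][:] + [c[i]] for i in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for q in range(col, m + 1):
                M[r][q] -= f * M[col][q]
    b = [0.0] * m
    for i in reversed(range(m)):
        b[i] = (M[i][m] - sum(M[i][q]*b[q] for q in range(i + 1, m))) / M[i][i]
    return b

b = solve(A, c)                             # b_estimation
resid = [y[i] - sum(X[i][p]*b[p] for p in range(k)) for i in range(n)]
s2 = sum(e*e for e in resid) / (n - k)      # v_estimation, df = n - k = 5
print(b, s2)
```

The printed coefficients round to @code{[9.054, .5203, .2397]} and the variance estimate to @code{35.27}, in agreement with the output above.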
@opencatbox{Categories:}
@category{Package stats}
@category{Statistical estimation}
@closecatbox
@end deffn
@node Functions and Variables for special distributions, , Functions and Variables for stats, Package stats
@section Functions and Variables for special distributions
@anchor{pdf_signed_rank}
@deffn {Function} pdf_signed_rank (@var{x}, @var{n})
Probability density function of the exact distribution of the
signed rank statistic. Argument @var{x} is a real
number and @var{n} a positive integer.

See also @mrefdot{test_signed_rank}

@opencatbox{Categories:}
@category{Package stats}
@closecatbox
@end deffn
@anchor{cdf_signed_rank}
@deffn {Function} cdf_signed_rank (@var{x}, @var{n})
Cumulative distribution function of the exact distribution of the
signed rank statistic. Argument @var{x} is a real
number and @var{n} a positive integer.

See also @mrefdot{test_signed_rank}

@opencatbox{Categories:}
@category{Package stats}
@closecatbox
@end deffn
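The signed rank statistic @math{W^+} is the sum of the ranks of the positive differences, so its exact null distribution follows from a subset-sum count: each of the @math{2^n} subsets of the ranks 1, ..., @math{n} is equally likely under @math{H_0}. A pure-Python sketch (the helper name @code{signed_rank_pdf} is hypothetical, not the package API):

```python
from fractions import Fraction

def signed_rank_pdf(n):
    """Exact H0 distribution of the signed rank statistic W+ for sample
    size n: counts[s] = number of subsets of {1,...,n} summing to s,
    each of the 2^n subsets being equally likely."""
    counts = [1] + [0] * (n * (n + 1) // 2)
    for k in range(1, n + 1):             # dynamic programming over ranks
        for s in range(len(counts) - 1, k - 1, -1):
            counts[s] += counts[s - k]
    total = 2 ** n
    return [Fraction(cnt, total) for cnt in counts]

pdf = signed_rank_pdf(4)                  # W+ ranges over 0..10
cdf2 = sum(pdf[:3])                       # P(W+ <= 2)
print(pdf[5], cdf2)                       # 1/8 and 3/16
```

Summing the probabilities up to @var{x} gives the corresponding cumulative distribution.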
@anchor{pdf_rank_sum}
@deffn {Function} pdf_rank_sum (@var{x}, @var{n}, @var{m})
Probability density function of the exact distribution of the
rank sum statistic. Argument @var{x} is a real
number and @var{n} and @var{m} are both positive integers.

See also @mrefdot{test_rank_sum}

@opencatbox{Categories:}
@category{Package stats}
@closecatbox
@end deffn
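The exact rank sum distribution can be tabulated the same way: among all binomial(@math{n+m}, @math{n}) equally likely assignments of ranks to the first sample, count those whose ranks total @var{x}. A pure-Python sketch (the helper name @code{rank_sum_pdf} is hypothetical, not the package API):

```python
from fractions import Fraction
from math import comb

def rank_sum_pdf(n, m):
    """Exact H0 distribution of the rank sum of the first sample:
    counts[j][s] = number of j-subsets of {1,...,n+m} summing to s."""
    N = n + m
    smax = N * (N + 1) // 2
    counts = [[0] * (smax + 1) for _ in range(n + 1)]
    counts[0][0] = 1
    for k in range(1, N + 1):             # DP over the available ranks
        for j in range(min(k, n), 0, -1):
            for s in range(smax, k - 1, -1):
                counts[j][s] += counts[j - 1][s - k]
    total = comb(N, n)
    return {s: Fraction(cnt, total)
            for s, cnt in enumerate(counts[n]) if cnt}

pdf = rank_sum_pdf(2, 2)                  # rank sums 3..7 of {1,2,3,4}
print(pdf[5])                             # 2/6 = 1/3
```

The same table for @math{n = 9}, @math{m = 8} underlies the @code{[rank_sum, 9, 8]} distribution reported by the exact @code{test_rank_sum} example earlier in this section.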
@anchor{cdf_rank_sum}
@deffn {Function} cdf_rank_sum (@var{x}, @var{n}, @var{m})
Cumulative distribution function of the exact distribution of the
rank sum statistic. Argument @var{x} is a real
number and @var{n} and @var{m} are both positive integers.

See also @mrefdot{test_rank_sum}

@opencatbox{Categories:}
@category{Package stats}
@closecatbox
@end deffn