1 \documentstyle{article}
3 \newcommand{\refitem}[1]{%
4 \begin{list}%
5 {}%
6 {\setlength{\leftmargin}{.25in}\setlength{\itemindent}{-.25in}}
7 \item #1%
8 \end{list}}
10 \setlength{\textwidth}{6in}
11 \setlength{\textheight}{8.75in}
12 \setlength{\topmargin}{-0.25in}
13 \setlength{\oddsidemargin}{0.25in}
% This command enables hyphenation in \tt mode by changing \hyphenchar
% in the 10 point typewriter font. To work in other point sizes it would
% have to be redefined. It may be better to just make the change globally
% and have it apply to anything that is set in \tt mode.
19 \newcommand{\dcode}[1]{{\tt #1}}
21 \newcommand{\param}[1]{$\langle${\em #1\/}$\rangle$}
22 \newcommand{\protoimage}[1]{\begin{picture}(100,20)\put(0,0){\makebox(100,20){\tt #1}}\put(50,10){\oval(100,20)}\end{picture}}
23 \newcommand{\wprotoimage}[1]{\begin{picture}(120,20)\put(0,0){\makebox(120,20){\tt #1}}\put(60,10){\oval(120,20)}\end{picture}}
25 \title{Generalized Linear Models in Lisp-Stat}
26 \author{Luke Tierney}
28 \begin{document}
29 \maketitle
31 \section{Introduction}
32 This note outlines a simple system for fitting generalized linear
33 models in Lisp-Stat. Three standard models are implemented:
34 \begin{itemize}
35 \item Poisson regression models
36 \item Binomial regression models
37 \item Gamma regression models
38 \end{itemize}
39 The model prototypes inherit from the linear regression model
40 prototype. By default, each model uses the canonical link for its
41 error structure, but alternate link structures can be specified.
43 The next section outlines the basic use of the generalized linear
44 model objects. The third section describes a few functions for
45 handling categorical independent variables. The fourth section gives
46 further details on the structure of the model prototypes, and
47 describes how to define new models and link structures. The final
48 section illustrates several ways of fitting more specialized models,
49 using the Bradley-Terry model as an example.
51 \section{Basic Use of the Model Objects}
52 Three functions are available for constructing generalized linear
53 model objects. These functions are called as
54 \begin{flushleft}\tt
55 (poissonreg-model \param{x} \param{y} [\param{keyword arguments ...}])\\
56 (binomialreg-model \param{x} \param{y} \param{n} [\param{keyword arguments ...}])\\
(gammareg-model \param{x} \param{y} [\param{keyword arguments ...}])
58 \end{flushleft}
59 The \param{x} and \param{y} arguments are as for the
60 \dcode{regression-model} function. The sample size parameter \param{n}
61 for binomial models can be either an integer or a sequence of integers
62 the same length as the response vector. All optional keyword
63 arguments accepted by the \dcode{regression-model} function are
64 accepted by these functions as well. Four additional keywords are
65 available:
66 \dcode{:link}, \dcode{:offset}, \dcode{:verbose}, and \dcode{:pweights}.
67 The keyword \dcode{:link} can be used to specify an alternate link
68 structure. Available link structures include
69 \begin{center}
70 \begin{tabular}{llll}
71 \tt identity-link & \tt log-link & \tt inverse-link & \tt sqrt-link\\
72 \tt logit-link & \tt probit-link & \tt cloglog-link
73 \end{tabular}
74 \end{center}
75 By default, each model uses its canonical link structure. The
76 \dcode{:offset} keyword can be used to provide an offset value, and
77 the keyword \dcode{:verbose} can be given the value \dcode{nil} to
78 suppress printing of iteration information. A prior weight vector
79 should be specified with the \dcode{:pweights} keyword rather than the
80 \dcode{:weights} keyword.
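To illustrate these keywords (the variables \dcode{x}, \dcode{y},
\dcode{exposure}, and \dcode{w} used here are hypothetical, not part of the
examples that follow), a Poisson model with a log exposure offset, prior
weights, and no iteration output could be requested by
\begin{verbatim}
(poissonreg-model x y :offset (log exposure) :pweights w :verbose nil)
\end{verbatim}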
82 As an example, we can examine a data set that records the number of
83 months prior to an interview when individuals remember a stressful
84 event (originally from Haberman, \cite[p. 2]{JKL}):
85 \begin{verbatim}
86 > (def months-before (iseq 1 18))
87 MONTHS-BEFORE
88 > (def event-counts '(15 11 14 17 5 11 10 4 8 10 7 9 11 3 6 1 1 4))
EVENT-COUNTS
90 \end{verbatim}
91 The data are multinomial, and we can fit a log-linear Poisson model to
92 see if there is any time trend:
93 \begin{verbatim}
94 > (def m (poissonreg-model months-before event-counts))
95 Iteration 1: deviance = 26.3164
96 Iteration 2: deviance = 24.5804
97 Iteration 3: deviance = 24.5704
98 Iteration 4: deviance = 24.5704
100 Weighted Least Squares Estimates:
102 Constant 2.80316 (0.148162)
103 Variable 0 -0.0837691 (0.0167996)
105 Scale taken as: 1
106 Deviance: 24.5704
107 Number of cases: 18
108 Degrees of freedom: 16
109 \end{verbatim}
111 Residuals for the fit can be obtained using the \dcode{:residuals}
112 message:
113 \begin{verbatim}
114 > (send m :residuals)
115 (-0.0439191 -0.790305 ...)
116 \end{verbatim}
117 A residual plot can be obtained using
118 \begin{verbatim}
119 (send m :plot-residuals)
120 \end{verbatim}
121 The \dcode{:fit-values} message returns $X\beta$, the linear predictor
122 without any offset. The \dcode{:fit-means} message returns fitted mean
123 response values. Thus the expression
124 \begin{verbatim}
125 (let ((p (plot-points months-before event-counts)))
126 (send p :add-lines months-before (send m :fit-means)))
127 \end{verbatim}
128 constructs a plot of raw counts and fitted means against time.
130 To illustrate fitting binomial models, we can use the leukemia survival
131 data of Feigl and Zelen \cite[Section 2.8.3]{LS} with the survival
132 time converted to a one-year survival indicator:
133 \begin{verbatim}
134 > (def surv-1 (if-else (> times-pos 52) 1 0))
135 SURV-1
136 > surv-1
137 (1 1 1 1 0 1 1 0 0 1 1 0 0 0 0 0 1)
138 \end{verbatim}
The independent variable is the natural logarithm of the white blood
cell counts divided by 10,000:
141 \begin{verbatim}
142 > transformed-wbc-pos
143 (-1.46968 -2.59027 -0.84397 -1.34707 -0.510826 0.0487902 0 0.530628 -0.616186
144 -0.356675 -0.0618754 1.16315 1.25276 2.30259 2.30259 1.64866 2.30259)
145 \end{verbatim}
146 A binomial model for these data can be constructed by
147 \begin{verbatim}
148 > (def lk (binomialreg-model transformed-wbc-pos surv-1 1))
149 Iteration 1: deviance = 18.2935
150 Iteration 2: deviance = 18.0789
151 Iteration 3: deviance = 18.0761
152 Iteration 4: deviance = 18.0761
154 Weighted Least Squares Estimates:
156 Constant 0.372897 (0.590934)
157 Variable 0 -0.985803 (0.508426)
159 Scale taken as: 1
160 Deviance: 18.0761
161 Number of cases: 17
162 Degrees of freedom: 15
163 \end{verbatim}
164 This model uses the logit link, the canonical link for the binomial
165 distribution. As an alternative, the expression
166 \begin{verbatim}
167 (binomialreg-model transformed-wbc-pos surv-1 1 :link probit-link)
168 \end{verbatim}
169 returns a model using a probit link.
171 The \dcode{:cooks-distances} message helps to highlight the last
172 observation for possible further examination:
173 \begin{verbatim}
174 > (send lk :cooks-distances)
175 (0.0142046 0.00403243 0.021907 0.0157153 0.149394 0.0359723 0.0346383
176 0.0450994 0.174799 0.0279114 0.0331333 0.0347883 0.033664 0.0170441
177 0.0170441 0.0280411 0.757332)
178 \end{verbatim}
179 This observation also stands out in the plot produced by
180 \begin{verbatim}
181 (send lk :plot-bayes-residuals)
182 \end{verbatim}
184 \section{Tools for Categorical Variables}
185 Four functions are provided to help construct indicator vectors for
186 categorical variables. As an illustration, a data set used by Bishop,
187 Fienberg, and Holland examines the relationship between occupational
188 classifications of fathers and sons. The classes are
189 \begin{center}
190 \begin{tabular}{|c|l|}
191 \hline
192 Label & Description\\
193 \hline
194 A & Professional, High Administrative\\
195 S & Managerial, Executive, High Supervisory\\
196 I & Low Inspectional, Supervisory\\
197 N & Routine Nonmanual, Skilled Manual\\
198 U & Semi- and Unskilled Manual\\
199 \hline
200 \end{tabular}
201 \end{center}
202 The counts are given by
203 \begin{center}
204 \begin{tabular}{|c|rrrrr|}
205 \hline
206 & \multicolumn{5}{c|}{Son}\\
207 \hline
208 Father & A & S & I & N & U \\
209 \hline
210 A & 50 & 45 & 8 & 18 & 8 \\
211 S & 28 & 174 & 84 & 154 & 55 \\
212 I & 11 & 78 & 110 & 223 & 96 \\
213 N & 14 & 150 & 185 & 714 & 447 \\
214 U & 3 & 42 & 72 & 320 & 411 \\
215 \hline
216 \end{tabular}
217 \end{center}
219 We can set up the occupation codes as
220 \begin{verbatim}
221 (def occupation '(a s i n u))
222 \end{verbatim}
223 and construct the son's and father's code vectors for entering the
224 data row by row as
225 \begin{verbatim}
226 (def son (repeat occupation 5))
227 (def father (repeat occupation (repeat 5 5)))
228 \end{verbatim}
229 The counts can then be entered as
230 \begin{verbatim}
231 (def counts '(50 45 8 18 8
232 28 174 84 154 55
233 11 78 110 223 96
234 14 150 185 714 447
235 3 42 72 320 411))
236 \end{verbatim}
238 To fit an additive log-linear model, we need to construct level
239 indicators. This can be done using the function \dcode{indicators}:
240 \begin{verbatim}
241 > (indicators son)
242 ((0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0)
243 (0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0)
244 (0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0)
245 (0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1))
246 \end{verbatim}
247 The result is a list of indicator variables for the second through the fifth
248 levels of the variable \dcode{son}. By default, the first level is dropped.
249 To obtain indicators for all five levels, we can supply the \dcode{:drop-first}
250 keyword with value \dcode{nil}:
251 \begin{verbatim}
252 > (indicators son :drop-first nil)
253 ((1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0)
254 (0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0)
255 (0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0)
256 (0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0)
257 (0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1))
258 \end{verbatim}
260 To produce a readable summary of the fit, we also need some labels:
261 \begin{verbatim}
262 > (level-names son :prefix 'son)
263 ("SON(S)" "SON(I)" "SON(N)" "SON(U)")
264 \end{verbatim}
265 By default, this function also drops the first level. This can again be
266 changed by supplying the \dcode{:drop-first} keyword argument as
267 \dcode{nil}:
268 \begin{verbatim}
269 > (level-names son :prefix 'son :drop-first nil)
270 ("SON(A)" "SON(S)" "SON(I)" "SON(N)" "SON(U)")
271 \end{verbatim}
272 The value of the \dcode{:prefix} keyword can be any Lisp expression.
273 For example, instead of the symbol \dcode{son} we can use the string
274 \dcode{"Son"}:
275 \begin{verbatim}
276 > (level-names son :prefix "Son")
277 ("Son(S)" "Son(I)" "Son(N)" "Son(U)")
278 \end{verbatim}
280 Using indicator variables and level labels, we can now fit an additive
281 model as
282 \begin{verbatim}
283 > (def mob-add
284 (poissonreg-model
285 (append (indicators son) (indicators father)) counts
286 :predictor-names (append (level-names son :prefix 'son)
287 (level-names father :prefix 'father))))
289 Iteration 1: deviance = 1007.97
290 Iteration 2: deviance = 807.484
291 Iteration 3: deviance = 792.389
292 Iteration 4: deviance = 792.19
293 Iteration 5: deviance = 792.19
295 Weighted Least Squares Estimates:
297 Constant 1.36273 (0.130001)
298 SON(S) 1.52892 (0.10714)
299 SON(I) 1.46561 (0.107762)
300 SON(N) 2.60129 (0.100667)
301 SON(U) 2.26117 (0.102065)
302 FATHER(S) 1.34475 (0.0988541)
303 FATHER(I) 1.39016 (0.0983994)
304 FATHER(N) 2.46005 (0.0917289)
305 FATHER(U) 1.88307 (0.0945049)
307 Scale taken as: 1
308 Deviance: 792.19
309 Number of cases: 25
310 Degrees of freedom: 16
311 \end{verbatim}
312 Examining the residuals using
313 \begin{verbatim}
314 (send mob-add :plot-residuals)
315 \end{verbatim}
316 shows that the first cell is an outlier -- the model does not fit this
317 cell well.
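For example, the Pearson residual for this cell can be extracted directly
using the \dcode{:residuals} message introduced earlier:
\begin{verbatim}
(select (send mob-add :residuals) 0)
\end{verbatim}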
319 To fit a saturated model to these data, we need the cross products of
320 the indicator variables and also a corresponding set of labels. The
321 indicators are produced with the \dcode{cross-terms} function
322 \begin{verbatim}
323 > (cross-terms (indicators son) (indicators father))
324 ((0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
325 (0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0)
326 ...)
327 \end{verbatim}
328 and the names with the \dcode{cross-names} function:
329 \begin{verbatim}
330 > (cross-names (level-names son :prefix 'son)
331 (level-names father :prefix 'father))
332 ("SON(S).FATHER(S)" "SON(S).FATHER(I)" ...)
333 \end{verbatim}
334 The saturated model can now be fit by
335 \begin{verbatim}
336 > (let ((s (indicators son))
337 (f (indicators father))
338 (sn (level-names son :prefix 'son))
339 (fn (level-names father :prefix 'father)))
340 (def mob-sat
341 (poissonreg-model (append s f (cross-terms s f)) counts
342 :predictor-names
343 (append sn fn (cross-names sn fn)))))
345 Iteration 1: deviance = 5.06262e-14
346 Iteration 2: deviance = 2.44249e-15
348 Weighted Least Squares Estimates:
350 Constant 3.91202 (0.141421)
351 SON(S) -0.105361 (0.20548)
352 SON(I) -1.83258 (0.380789)
353 SON(N) -1.02165 (0.274874)
354 SON(U) -1.83258 (0.380789)
355 FATHER(S) -0.579818 (0.236039)
356 FATHER(I) -1.51413 (0.33303)
357 FATHER(N) -1.27297 (0.302372)
358 FATHER(U) -2.81341 (0.594418)
359 SON(S).FATHER(S) 1.93221 (0.289281)
360 SON(S).FATHER(I) 2.06417 (0.382036)
361 SON(S).FATHER(N) 2.47694 (0.346868)
362 SON(S).FATHER(U) 2.74442 (0.631953)
363 SON(I).FATHER(S) 2.93119 (0.438884)
364 SON(I).FATHER(I) 4.13517 (0.494975)
365 SON(I).FATHER(N) 4.41388 (0.470993)
366 SON(I).FATHER(U) 5.01064 (0.701586)
367 SON(N).FATHER(S) 2.7264 (0.343167)
368 SON(N).FATHER(I) 4.03093 (0.41346)
369 SON(N).FATHER(N) 4.95348 (0.385207)
370 SON(N).FATHER(U) 5.69136 (0.641883)
371 SON(U).FATHER(S) 2.50771 (0.445978)
372 SON(U).FATHER(I) 3.99903 (0.496312)
373 SON(U).FATHER(N) 5.29608 (0.467617)
374 SON(U).FATHER(U) 6.75256 (0.693373)
376 Scale taken as: 1
377 Deviance: 3.37508e-14
378 Number of cases: 25
379 Degrees of freedom: 0
380 \end{verbatim}
382 \section{Structure of the Generalized Linear Model System}
383 \subsection{Model Prototypes}
384 The model objects are organized into several prototypes, with the
385 general prototype \dcode{glim-proto} inheriting from
386 \dcode{regression-model-proto}, the prototype for normal linear
387 regression models. The inheritance tree is shown in Figure
388 \ref{GLIMTree}.
389 \begin{figure}
390 \begin{center}
391 \begin{picture}(400,160)
392 \put(140,140){\wprotoimage{regression-model-proto}}
393 \put(150,70){\protoimage{glim-proto}}
394 \put(0,0){\protoimage{poissonreg-proto}}
395 \put(150,0){\protoimage{binomialreg-proto}}
396 \put(300,0){\protoimage{gammareg-proto}}
397 \put(200,140){\line(0,-1){50}}
398 \put(200,70){\line(-3,-1){150}}
399 \put(200,70){\line(3,-1){150}}
400 \put(200,70){\line(0,-1){50}}
401 \end{picture}
402 \end{center}
403 \caption{Hierarchy of generalized linear model prototypes.}
404 \label{GLIMTree}
405 \end{figure}
406 This inheritance captures the reasoning by analogy to the linear case
407 that is the basis for many ideas in the analysis of generalized linear
408 models. The fitting strategy uses iteratively reweighted least squares
409 by changing the weight vector in the model and repeatedly calling the
410 linear regression \dcode{:compute} method.
412 Convergence of the iterations is determined by comparing the relative
413 change in the coefficients and the change in the deviance to cutoff
414 values. The iteration terminates if either change falls below the
corresponding cutoff. The cutoffs are set and retrieved by the
416 \dcode{:epsilon} and \dcode{:epsilon-dev} methods. The default values
417 are given by
418 \begin{verbatim}
419 > (send glim-proto :epsilon)
420 1e-06
421 > (send glim-proto :epsilon-dev)
422 0.001
423 \end{verbatim}
424 A limit is also imposed on the number of iterations. The limit can be set
and retrieved by the \dcode{:count-limit} message. The current default
can be inspected by evaluating
\begin{verbatim}
(send glim-proto :count-limit)
430 \end{verbatim}
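Assuming these messages follow the usual Lisp-Stat convention of acting as
setters when given an argument, the convergence cutoffs and iteration limit
for a particular model could be changed by, for example,
\begin{verbatim}
(send m :epsilon 1e-8)
(send m :epsilon-dev 1e-6)
(send m :count-limit 50)
\end{verbatim}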
432 The analogy captured in the inheritance of the \dcode{glim-proto}
433 prototype from the normal linear regression prototype is based
434 primarily on the computational process, not the modeling process. As a
435 result, several accessor methods inherited from the linear regression
436 object refer to analogous components of the computational process,
437 rather than analogous components of the model. Two examples are the
438 messages \dcode{:weights} and \dcode{:y}. The weight vector in the
439 object returned by \dcode{:weights} is the final set of weights
440 obtained in the fit; prior weights can be set and retrieved with the
441 \dcode{:pweights} message. The value returned by the \dcode{:y}
442 message is the artificial dependent variable
443 \begin{displaymath}
444 z = \eta + (y - \mu) \frac{d\eta}{d\mu}
445 \end{displaymath}
446 constructed in the iteration; the actual dependent variable can be
447 obtained and changed with the \dcode{:yvar} message.
449 The message \dcode{:eta} returns the current linear predictor values,
450 including any offset. The \dcode{:offset} message sets and retrieves
451 the offset value. For binomial models, the \dcode{:trials} message
452 sets and retrieves the number of trials for each observation.
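As a check of the artificial dependent variable relation given above, $z$
can be reconstructed for the Poisson model \dcode{m} fit earlier, which has
no offset. This sketch assumes that the model's link object can be
retrieved by sending the \dcode{:link} message with no argument, in keeping
with the \dcode{link} slot described in the next subsection:
\begin{verbatim}
(let* ((eta (send m :eta))
       (mu (send m :fit-means))
       (y (send m :yvar))
       (d (send (send m :link) :derivs mu)))
  (- (send m :y) (+ eta (* (- y mu) d)))) ; approximately zero at convergence
\end{verbatim}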
454 The scale factor is set and retrieved with the \dcode{:scale} message.
455 Some models permit the estimation of a scale parameter. For these
456 models, the fitting system uses the \dcode{:fit-scale} message to
obtain a new scale value. The message \dcode{:estimate-scale} sets and
retrieves whether the scale parameter is to be estimated.
461 Deviances of individual observations, the total deviance, and the mean
462 deviance are returned by the messages \dcode{:deviances},
463 \dcode{:deviance} and \dcode{:mean-deviance}, respectively. The
\dcode{:deviance} and \dcode{:mean-deviance} methods adjust for
465 omitted observations, and the denominator for the mean deviance is
466 adjusted for the degrees of freedom available.
468 Most inherited methods for residuals, standard errors, etc., should
469 make sense at least as approximations. For example, residuals returned
470 by the inherited \dcode{:residuals} message correspond to the Pearson
471 residuals for generalized linear models. Other forms of residuals are
472 returned by the messages
473 \begin{center}
474 \begin{tabular}{lll}
475 \tt :chi-residuals & \tt :deviance-residuals & \tt :g2-residuals\\
476 \tt :raw-residuals & \tt :standardized-chi-residuals & \tt :standardized-deviance-residuals.
477 \end{tabular}
478 \end{center}
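For example, if the deviance residuals are defined in the conventional way
as signed square roots of the individual deviances (an assumption about
this system rather than a documented identity), their squares should sum to
the total deviance reported for the Poisson model \dcode{m} fit earlier:
\begin{verbatim}
(sum (^ (send m :deviance-residuals) 2)) ; should be close to 24.5704
\end{verbatim}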
480 \subsection{Error Structures}
481 The error structure of a generalized linear model affects four methods
and two slots. The methods are called as
483 \begin{flushleft}\tt
484 (send \param{m} :initial-means)\\
485 (send \param{m} :fit-variances \param{mu})\\
486 (send \param{m} :fit-deviances \param{mu})\\
487 (send \param{m} :fit-scale)
488 \end{flushleft}
489 The \dcode{:initial-means} method should return an initial estimate of
490 the means for the iterative search. The default method simply returns
491 the dependent variable, but for some models this may need to be
492 adjusted to move the initial estimate away from a boundary. For
493 example, the method for the Poisson regression model can be defined as
494 \begin{verbatim}
495 (defmeth poissonreg-proto :initial-means () (pmax (send self :yvar) 0.5))
496 \end{verbatim}
which ensures that initial mean estimates are at least 0.5.
The \dcode{:fit-variances} and \dcode{:fit-deviances} methods return the
values of the variance and deviance functions for a specified vector
501 of means. For the Poisson regression model, these methods can be
502 defined as
503 \begin{verbatim}
504 (defmeth poissonreg-proto :fit-variances (mu) mu)
505 \end{verbatim}
507 \begin{verbatim}
508 (defmeth poissonreg-proto :fit-deviances (mu)
509 (flet ((log+ (x) (log (if-else (< 0 x) x 1))))
510 (let* ((y (send self :yvar))
511 (raw-dev (* 2 (- (* y (log+ (/ y mu))) (- y mu))))
512 (pw (send self :pweights)))
513 (if pw (* pw raw-dev) raw-dev))))
514 \end{verbatim}
515 The local function \dcode{log+} is used to avoid taking the logarithm
516 of zero.
518 The final message, \dcode{:fit-scale}, is only used by the
519 \dcode{:display} method. The default method returns the mean deviance.
521 The two slots related to the error structure are
522 \dcode{estimate-scale} and \dcode{link}. If the value of the
523 \dcode{estimate-scale} slot is not \dcode{nil}, then a scale estimate
is computed and printed by the \dcode{:display} method. The
\dcode{link} slot holds the link object used by the model. The Poisson
model does not have a scale parameter, and the canonical link is the
log link. These defaults can be set by the expressions
528 \begin{verbatim}
529 (send poissonreg-proto :estimate-scale nil)
530 (send poissonreg-proto :link log-link)
531 \end{verbatim}
533 The \dcode{glim-proto} prototype itself uses normal errors and an
534 identity link. Other error structures can be implemented by
535 constructing a new prototype and defining appropriate methods and
536 default slot values.
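For instance, a prototype for inverse Gaussian errors might be sketched as
follows. This is only an outline by analogy with the Poisson definitions
above, not part of the distributed system: it uses the variance function
$\mu^{3}$, the unit deviance $(y - \mu)^{2}/(\mu^{2}y)$, and the canonical
link $1/\mu^{2}$, a power link with exponent $-2$ (power links are
described in the next subsection). A constructor function analogous to
\dcode{poissonreg-model} would still be needed.
\begin{verbatim}
(defproto invgaussreg-proto () () glim-proto)

(send invgaussreg-proto :estimate-scale t)
(send invgaussreg-proto :link (send power-link-proto :new -2))

(defmeth invgaussreg-proto :fit-variances (mu) (^ mu 3))

(defmeth invgaussreg-proto :fit-deviances (mu)
  (let* ((y (send self :yvar))
         (raw-dev (/ (^ (- y mu) 2) (* (^ mu 2) y)))
         (pw (send self :pweights)))
    (if pw (* pw raw-dev) raw-dev)))
\end{verbatim}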
538 \subsection{Link Structures}
539 The link function $g$ for a generalized linear model relates the
540 linear predictor $\eta$ to the mean response $\mu$ by
541 \begin{displaymath}
542 \eta = g(\mu).
543 \end{displaymath}
544 Links are implemented as objects.
545 Table \ref{Links} lists the pre-defined link functions, along with
546 the expressions used to return link objects.
547 \begin{table}
\caption{Link Functions and Expressions for Obtaining Link Objects}
549 \label{Links}
550 \begin{center}
551 \begin{tabular}{lccl}
552 \hline
553 Link & Formula & Domain & Expression\\
554 \hline
555 Identity & $\mu$ & $(-\infty, \infty)$ & \tt identity-link \\
556 Logarithm & $\log \mu$ & $(0, \infty)$ & \tt log-link\\
557 Inverse & $ 1/\mu$ & $(0, \infty)$ & \tt inverse-link\\
558 Square Root & $\sqrt{\mu}$ & $(0, \infty)$ & \tt sqrt-link\\
559 Logit & $\log\frac{\mu}{1-\mu}$ & $[0,1]$ & \tt logit-link\\
560 Probit & $\Phi^{-1}(\mu)$ & $[0,1]$ & \tt probit-link\\
561 Compl. log-log & $\log(-\log(1-\mu))$ & $[0,1]$ & \tt cloglog-link\\
562 Power & $\mu^{k}$ & $(0, \infty)$ & \tt (send power-link-proto :new \param{k})\\
563 \hline
564 \end{tabular}
565 \end{center}
566 \end{table}
567 With one exception, the pre-defined links require no parameters.
568 These link objects can therefore be shared among models. The exception
569 is the power link. Links for binomial models are defined for $n = 1$
570 trials and assume $0 < \mu < 1$.
572 Link objects inherit from the \dcode{glim-link-proto} prototype. The
573 \dcode{log-link} object, for example, is constructed by
574 \begin{verbatim}
575 (defproto log-link () () glim-link-proto)
576 \end{verbatim}
577 Since this prototype can be used directly in model objects, the
578 convention of having prototype names end in \dcode{-proto} is not
579 used. The \dcode{glim-link-proto} prototype provides a \dcode{:print}
580 method that should work for most link functions. The \dcode{log-link}
581 object prints as
582 \begin{verbatim}
583 > log-link
584 #<Glim Link Object: LOG-LINK>
585 \end{verbatim}
587 The \dcode{glim-proto} computing methods assume that a link object
588 responds to three messages:
589 \begin{flushleft}\tt
590 (send \param{link} :eta \param{mu})\\
591 (send \param{link} :means \param{eta})\\
592 (send \param{link} :derivs \param{mu})
593 \end{flushleft}
594 The \dcode{:eta} method returns a sequence of linear predictor values
595 for a particular mean sequence. The \dcode{:means} method is the
596 inverse of \dcode{:eta}: it returns mean values for specified values
597 of the linear predictor. The \dcode{:derivs} method returns the values of
598 \begin{displaymath}
599 \frac{d\eta}{d\mu}
600 \end{displaymath}
601 at the specified mean values. As an example, for the \dcode{log-link}
602 object these three methods are defined as
603 \begin{verbatim}
604 (defmeth log-link :eta (mu) (log mu))
605 (defmeth log-link :means (eta) (exp eta))
606 (defmeth log-link :derivs (mu) (/ mu))
607 \end{verbatim}
609 Alternative link structures can be constructed by setting up a new
610 prototype and defining appropriate \dcode{:eta}, \dcode{:means}, and
611 \dcode{:derivs} methods. Parametric link families can be implemented by
612 providing one or more slots for holding the parameters. The power link
613 is an example of a parametric link family. The power link prototype
614 is defined as
615 \begin{verbatim}
616 (defproto power-link-proto '(power) () glim-link-proto)
617 \end{verbatim}
618 The slot \dcode{power} holds the power exponent. An accessor method
619 is defined by
620 \begin{verbatim}
621 (defmeth power-link-proto :power () (slot-value 'power))
622 \end{verbatim}
623 and the \dcode{:isnew} initialization method is defined to require a
624 power argument:
625 \begin{verbatim}
626 (defmeth power-link-proto :isnew (power) (setf (slot-value 'power) power))
627 \end{verbatim}
628 Thus a power link for a particular exponent, say the exponent 2, can
629 be constructed using the expression
630 \begin{verbatim}
631 (send power-link-proto :new 2)
632 \end{verbatim}
634 To complete the power link prototype, we need to define the three
635 required methods. They are defined as
636 \begin{verbatim}
637 (defmeth power-link-proto :eta (mu) (^ mu (send self :power)))
638 \end{verbatim}
639 \begin{verbatim}
640 (defmeth power-link-proto :means (eta) (^ eta (/ (slot-value 'power))))
641 \end{verbatim}
643 \begin{verbatim}
644 (defmeth power-link-proto :derivs (mu)
645 (let ((p (slot-value 'power)))
646 (* p (^ mu (- p 1)))))
647 \end{verbatim}
648 The definition of the \dcode{:means} method could be improved to allow
649 negative arguments when the power is an odd integer. Finally, the
650 \dcode{:print} method is redefined to reflect the value of the
651 exponent:
652 \begin{verbatim}
653 (defmeth power-link-proto :print (&optional (stream t))
(format stream "#<Glim Link Object: Power Link (~s)>" (send self :power)))
655 \end{verbatim}
656 Thus a square link prints as
657 \begin{verbatim}
658 > (send power-link-proto :new 2)
659 #<Glim Link Object: Power Link (2)>
660 \end{verbatim}
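As noted above, the \dcode{:means} method could be refined to handle
negative arguments when the power is an odd integer. One possible sketch,
using the \dcode{if-else} function as in earlier examples, is
\begin{verbatim}
(defmeth power-link-proto :means (eta)
  (let ((p (slot-value 'power)))
    (if (and (integerp p) (oddp p))
        (* (if-else (< eta 0) -1 1) (^ (abs eta) (/ p)))
        (^ eta (/ p)))))
\end{verbatim}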
662 \section{Fitting a Bradley-Terry Model}
663 Many models used in categorical data analysis can be viewed as
664 special cases of generalized linear models. One example is the
665 Bradley-Terry model for paired comparisons. The Bradley-Terry model
666 deals with a situation in which $n$ individuals or items are compared
to one another in paired contests. The model assumes there are
positive quantities $\pi_{1}, \ldots, \pi_{n}$, which may be normalized
to sum to one, such that
670 \begin{displaymath}
671 P\{\mbox{$i$ beats $j$}\} = \frac{\pi_{i}}{\pi_{i} + \pi_{j}}.
672 \end{displaymath}
673 If the competitions are assumed to be mutually independent, then the
674 probability $p_{ij} = P\{\mbox{$i$ beats $j$}\}$ satisfies the logit
675 model
676 \begin{displaymath}
677 \log\frac{p_{ij}}{1-p_{ij}} = \phi_{i} - \phi_{j}
678 \end{displaymath}
679 with $\phi_{i} = \log \pi_{i}$. This model can be fit to a particular
680 set of data by setting up an appropriate design matrix and response
681 vector for a binomial regression model. For a single data set this can
682 be done from scratch. Alternatively, it is possible to construct
683 functions or prototypes that allow the data to be specified in a more
684 convenient form. Furthermore, there are certain specific questions
685 that can be asked for a Bradley-Terry model, such as what is the
686 estimated value of $P\{\mbox{$i$ beats $j$}\}$? In the object-oriented
687 framework, it is very natural to attach methods for answering such
688 questions to individual models or to a model prototype.
690 To illustrate these ideas, we can fit a Bradley-Terry model to the
results for the Eastern Division of the American League for the 1987
692 baseball season \cite{Agresti}. Table \ref{WinsLosses} gives the results
693 of the games within this division.
694 \begin{table}
695 \caption{Results of 1987 Season for American League Baseball Teams}
696 \label{WinsLosses}
697 \begin{center}
698 \begin{tabular}{lccccccc}
699 \hline
700 Winning & \multicolumn{7}{c}{Losing Team}\\
701 \cline{2-8}
702 Team & Milwaukee & Detroit & Toronto & New York & Boston &
703 Cleveland & Baltimore\\
704 \hline
705 Milwaukee & - & 7 & 9 & 7 & 7 & 9 & 11\\
706 Detroit & 6 & - & 7 & 5 & 11 & 9 & 9\\
707 Toronto & 4 & 6 & - & 7 & 7 & 8 & 12\\
708 New York & 6 & 8 & 6 & - & 6 & 7 & 10\\
709 Boston & 6 & 2 & 6 & 7 & - & 7 & 12\\
710 Cleveland & 4 & 4 & 5 & 6 & 6 & - & 6\\
711 Baltimore & 2 & 4 & 1 & 3 & 1 & 7 & -\\
712 \hline
713 \end{tabular}
714 \end{center}
715 \end{table}
717 The simplest way to enter this data is as a list, working through the
718 table one row at a time:
719 \begin{verbatim}
720 (def wins-losses '( - 7 9 7 7 9 11
721 6 - 7 5 11 9 9
722 4 6 - 7 7 8 12
723 6 8 6 - 6 7 10
724 6 2 6 7 - 7 12
725 4 4 5 6 6 - 6
726 2 4 1 3 1 7 -))
727 \end{verbatim}
728 The choice of the symbol \dcode{-} for the diagonal entries is
729 arbitrary; any other Lisp item could be used. The team names will also
730 be useful as labels:
731 \begin{verbatim}
732 (def teams '("Milwaukee" "Detroit" "Toronto" "New York"
733 "Boston" "Cleveland" "Baltimore"))
734 \end{verbatim}
736 To set up a model, we need to extract the wins and losses from the
737 \dcode{wins-losses} list. The expression
738 \begin{verbatim}
739 (let ((i (iseq 1 6)))
740 (def low-i (apply #'append (+ (* 7 i) (mapcar #'iseq i)))))
741 \end{verbatim}
742 constructs a list of the indices of the elements in the lower
743 triangle:
744 \begin{verbatim}
745 > low-i
746 (7 14 15 21 22 23 28 29 30 31 35 36 37 38 39 42 43 44 45 46 47)
747 \end{verbatim}
748 The wins can now be extracted from the \dcode{wins-losses} list using
749 \begin{verbatim}
750 > (select wins-losses low-i)
751 (6 4 6 6 8 6 6 2 6 7 4 4 5 6 6 2 4 1 3 1 7)
752 \end{verbatim}
753 Since we need to extract the lower triangle from a number of lists, we
754 can define a function to do this as
755 \begin{verbatim}
756 (defun lower (x) (select x low-i))
757 \end{verbatim}
758 Using this function, we can calculate the wins and save them in a
759 variable \dcode{wins}:
760 \begin{verbatim}
761 (def wins (lower wins-losses))
762 \end{verbatim}
764 To extract the losses, we need to form the list of the entries for the
765 transpose of our table. The function \dcode{split-list} can be used
766 to return a list of lists of the contents of the rows of the original
767 table. The \dcode{transpose} function transposes this list of lists,
768 and the \dcode{append} function can be applied to the result to
769 combine the lists of lists for the transpose into a single list:
770 \begin{verbatim}
771 (def losses-wins (apply #'append (transpose (split-list wins-losses 7))))
772 \end{verbatim}
773 The losses are then obtained by
774 \begin{verbatim}
775 (def losses (lower losses-wins))
776 \end{verbatim}
777 Either \dcode{wins} or \dcode{losses} can be used as the response for
778 a binomial model, with the trials given by
779 \begin{verbatim}
780 (+ wins losses)
781 \end{verbatim}
783 When fitting the Bradley-Terry model as a binomial regression model
784 with a logit link, the model has no intercept and the columns of the
785 design matrix are the differences of the row and column indicators for
786 the table of results. Since the rows of this matrix sum to zero if
787 all row and column levels are used, we can delete one of the levels,
788 say the first one. Lists of row and column indicators are set up by
789 the expressions
790 \begin{verbatim}
791 (def rows (mapcar #'lower (indicators (repeat (iseq 7) (repeat 7 7)))))
792 (def cols (mapcar #'lower (indicators (repeat (iseq 7) 7))))
793 \end{verbatim}
794 The function \dcode{indicators} drops the first level in constructing
795 its indicators. The function \dcode{mapcar} applies \dcode{lower} to
796 each element of the indicators list and returns a list of the results.
797 Using these two variables, the expression
798 \begin{verbatim}
799 (- rows cols)
800 \end{verbatim}
801 constructs a list of the columns of the design matrix.
803 We can now construct a model object for this data set:
804 \begin{verbatim}
805 > (def wl (binomialreg-model (- rows cols)
806 wins
807 (+ wins losses)
808 :intercept nil
809 :predictor-names (rest teams)))
810 Iteration 1: deviance = 16.1873
811 Iteration 2: deviance = 15.7371
813 Weighted Least Squares Estimates:
815 Detroit -0.144948 (0.311056)
816 Toronto -0.286871 (0.310207)
817 New York -0.333738 (0.310126)
818 Boston -0.473658 (0.310452)
819 Cleveland -0.897502 (0.316504)
820 Baltimore -1.58134 (0.342819)
822 Scale taken as: 1
823 Deviance: 15.7365
824 Number of cases: 21
825 Degrees of freedom: 15
826 \end{verbatim}
To fit a Bradley-Terry model to other data sets, we can repeat this
829 process. As an alternative, we can incorporate the steps used here
830 into a function:
831 \begin{verbatim}
832 (defun bradley-terry-model (counts &key labels)
833 (let* ((n (round (sqrt (length counts))))
834 (i (iseq 1 (- n 1)))
835 (low-i (apply #'append (+ (* n i) (mapcar #'iseq i))))
836 (p-names (if labels
837 (rest labels)
838 (level-names (iseq n) :prefix "Choice"))))
839 (labels ((tr (x)
840 (apply #'append (transpose (split-list (coerce x 'list) n))))
841 (lower (x) (select x low-i))
842 (low-indicators (x) (mapcar #'lower (indicators x))))
843 (let ((wins (lower counts))
844 (losses (lower (tr counts)))
845 (rows (low-indicators (repeat (iseq n) (repeat n n))))
846 (cols (low-indicators (repeat (iseq n) n))))
847 (binomialreg-model (- rows cols)
848 wins
849 (+ wins losses)
850 :intercept nil
851 :predictor-names p-names)))))
852 \end{verbatim}
853 This function defines the function \dcode{lower} as a local function.
854 The local function \dcode{tr} calculates the list of the elements in
855 the transposed table, and the function \dcode{low-indicators} produces
856 indicators for the lower triangular portion of a categorical variable.
857 The \dcode{bradley-terry-model} function allows the labels for the
858 contestants to be specified as a keyword argument. If this argument is
859 omitted, reasonable default labels are constructed. Using this
860 function, we can construct our model object as
861 \begin{verbatim}
862 (def wl (bradley-terry-model wins-losses :labels teams))
863 \end{verbatim}
865 The definition of this function could be improved to allow some of the
866 keyword arguments accepted by \dcode{binomialreg-model}.
868 Using the fit model object, we can estimate the probability of Boston
869 $(i = 4)$ defeating New York $(j = 3)$:
870 \begin{verbatim}
871 > (let* ((phi (cons 0 (send wl :coef-estimates)))
         (exp-logit (exp (- (select phi 4) (select phi 3)))))
    (/ exp-logit (+ 1 exp-logit)))
0.465077
875 \end{verbatim}
876 To be able to easily calculate such an estimate for any pairing, we can
877 give our model object a method for the \dcode{:success-prob} message
878 that takes two indices as arguments:
879 \begin{verbatim}
880 (defmeth wl :success-prob (i j)
881 (let* ((phi (cons 0 (send self :coef-estimates)))
882 (exp-logit (exp (- (select phi i) (select phi j)))))
883 (/ exp-logit (+ 1 exp-logit))))
884 \end{verbatim}
885 Then
886 \begin{verbatim}
887 > (send wl :success-prob 4 3)
888 0.465077
889 \end{verbatim}
891 If we want this method to be available for other data sets, we can
892 construct a Bradley-Terry model prototype by
893 \begin{verbatim}
894 (defproto bradley-terry-proto () () binomialreg-proto)
895 \end{verbatim}
896 and add the \dcode{:success-prob} method to this prototype:
897 \begin{verbatim}
898 (defmeth bradley-terry-proto :success-prob (i j)
899 (let* ((phi (cons 0 (send self :coef-estimates)))
900 (exp-logit (exp (- (select phi i) (select phi j)))))
901 (/ exp-logit (+ 1 exp-logit))))
902 \end{verbatim}
903 If we modify the \dcode{bradley-terry-model} function to use this prototype
904 by defining the function as
905 \begin{verbatim}
906 (defun bradley-terry-model (counts &key labels)
907 (let* ((n (round (sqrt (length counts))))
908 (i (iseq 1 (- n 1)))
909 (low-i (apply #'append (+ (* n i) (mapcar #'iseq i))))
910 (p-names (if labels
911 (rest labels)
912 (level-names (iseq n) :prefix "Choice"))))
913 (labels ((tr (x)
914 (apply #'append (transpose (split-list (coerce x 'list) n))))
915 (lower (x) (select x low-i))
916 (low-indicators (x) (mapcar #'lower (indicators x))))
917 (let ((wins (lower counts))
918 (losses (lower (tr counts)))
919 (rows (low-indicators (repeat (iseq n) (repeat n n))))
920 (cols (low-indicators (repeat (iseq n) n))))
921 (send bradley-terry-proto :new
922 :x (- rows cols)
923 :y wins
924 :trials (+ wins losses)
925 :intercept nil
926 :predictor-names p-names)))))
927 \end{verbatim}
then the \dcode{:success-prob} method is available immediately for a
929 model constructed using this function:
930 \begin{verbatim}
931 > (def wl (bradley-terry-model wins-losses :labels teams))
932 Iteration 1: deviance = 16.1873
933 Iteration 2: deviance = 15.7371
935 > (send wl :success-prob 4 3)
936 0.465077
937 \end{verbatim}
939 The \dcode{:success-prob} method can be improved in a number of ways.
940 As one example, we might want to be able to obtain standard errors in
941 addition to estimates. A convenient way to provide for this
942 possibility is to have our method take an optional argument. If this
943 argument is \dcode{nil}, the default, then the method just returns the
944 estimate. If the argument is not \dcode{nil}, then the method returns
945 a list of the estimate and its standard error.
947 To calculate the standard error, it is easier to start with the logit
948 of the probability, since the logit is a linear function of the model
949 coefficients. The method defined as
950 \begin{verbatim}
951 (defmeth bradley-terry-proto :success-logit (i j &optional stdev)
952 (let ((coefs (send self :coef-estimates)))
953 (flet ((lincomb (i j)
954 (let ((v (repeat 0 (length coefs))))
955 (if (/= 0 i) (setf (select v (- i 1)) 1))
956 (if (/= 0 j) (setf (select v (- j 1)) -1))
957 v)))
958 (let* ((v (lincomb i j))
959 (logit (inner-product v coefs))
960 (var (if stdev (matmult v (send self :xtxinv) v))))
961 (if stdev (list logit (sqrt var)) logit)))))
962 \end{verbatim}
963 returns the estimate or a list of the estimate and approximate
964 standard error of the logit:
965 \begin{verbatim}
966 > (send wl :success-logit 4 3)
967 -0.13992
968 > (send wl :success-logit 4 3 t)
969 (-0.13992 0.305583)
970 \end{verbatim}
971 The logit is calculated as a linear combination of the coefficients; a
972 list representing the linear combination vector is constructed by the
973 local function \dcode{lincomb}.
Standard errors for success probabilities can be computed from the
976 results of \dcode{:success-logit} using the delta method:
977 \begin{verbatim}
978 (defmeth bradley-terry-proto :success-prob (i j &optional stdev)
979 (let* ((success-logit (send self :success-logit i j stdev))
980 (exp-logit (exp (if stdev (first success-logit) success-logit)))
981 (p (/ exp-logit (+ 1 exp-logit)))
982 (s (if stdev (* p (- 1 p) (second success-logit)))))
983 (if stdev (list p s) p)))
984 \end{verbatim}
985 For our example, the results are
986 \begin{verbatim}
987 > (send wl :success-prob 4 3)
988 0.465077
989 > (send wl :success-prob 4 3 t)
990 (0.465077 0.0760231)
991 \end{verbatim}
993 These methods can be improved further by allowing them to accept
994 sequences of indices instead of only individual indices.
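One possibility, sketched here with a new message name
(\dcode{:success-probs}) that is not part of the system, is to leave the
scalar method unchanged and add a wrapper that maps over sequences of
index pairs:
\begin{verbatim}
(defmeth bradley-terry-proto :success-probs (i-list j-list &optional stdev)
  (mapcar #'(lambda (i j) (send self :success-prob i j stdev))
          i-list j-list))
\end{verbatim}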
996 \section*{Acknowledgements}
997 I would like to thank Sandy Weisberg for many helpful comments and
998 suggestions, and for providing the code for the glim residuals
999 methods.
1001 %\section*{Notes}
1002 %On the DECstations you can load the generalized linear model
1003 %prototypes with the expression
1004 %\begin{verbatim}
1005 %(load-example "glim")
1006 %\end{verbatim}
1007 %This code seems to work reasonably on the examples I have tried, but
1008 %it is not yet thoroughly debugged.
1010 \begin{thebibliography}{99}
1011 \bibitem{Agresti}
1012 {\sc Agresti, A.} (1990), {\em Categorical Data Analysis}, New York,
1013 NY: Wiley.
1015 \bibitem{JKL}
1016 {\sc Lindsey, J. K.} (1989), {\em The Analysis of Categorical Data
1017 Using GLIM}, Springer Lecture Notes in Statistics No. 56, New York,
1018 NY: Springer.
1020 \bibitem{GLIM}
1021 {\sc McCullagh, P. and Nelder, J. A.} (1989), {\em Generalized Linear
1022 Models}, second edition, London: Chapman and Hall.
1024 \bibitem{LS}
1025 {\sc Tierney, L.} (1990), {\em Lisp-Stat: An Object-Oriented
1026 Environment for Statistical Computing and Dynamic Graphics}, New York,
1027 NY: Wiley.
1028 \end{thebibliography}
1029 \end{document}
1034 \begin{table}
1035 \caption{Wins/Losses by Home and Away Team, 1987}
1036 \begin{tabular}{lccccccc}
1037 \hline
1038 Home & \multicolumn{7}{c}{Away Team}\\
1039 \cline{2-8}
1040 Team & Milwaukee & Detroit & Toronto & New York & Boston &
1041 Cleveland & Baltimore\\
1042 \hline
1043 Milwaukee & - & 4-3 & 4-2 & 4-3 & 6-1 & 4-2 & 6-0\\
1044 Detroit & 3-3 & - & 4-2 & 4-3 & 6-0 & 6-1 & 4-3\\
1045 Toronto & 2-5 & 4-3 & - & 2-4 & 4-3 & 4-2 & 6-0\\
1046 New York & 3-3 & 5-1 & 2-5 & - & 4-3 & 4-2 & 6-1\\
1047 Boston & 5-1 & 2-5 & 3-3 & 4-2 & - & 5-2 & 6-0\\
1048 Cleveland & 2-5 & 3-3 & 3-4 & 4-3 & 4-2 & - & 2-4\\
1049 Baltimore & 2-5 & 1-5 & 1-6 & 2-4 & 1-6 & 3-4 & - \\
1050 \hline
1051 \end{tabular}
1052 \end{table}