1 \documentstyle{article}
3 \newcommand{\refitem}[1]{%
4 \begin{list}%
5 {}%
6 {\setlength{\leftmargin}{.25in}\setlength{\itemindent}{-.25in}}
7 \item #1%
8 \end{list}}
10 \setlength{\textwidth}{6in}
11 \setlength{\textheight}{8.75in}
12 \setlength{\topmargin}{-0.25in}
13 \setlength{\oddsidemargin}{0.25in}
% This command enables hyphenation in \tt mode by changing \hyphenchar
% in the 10 point typewriter font. To work in other point sizes it would
% have to be redefined. It may be better to just make the change globally
% and have it apply to anything that is set in \tt mode.
19 \newcommand{\dcode}[1]{{\tt #1}}
21 \newcommand{\param}[1]{$\langle${\em #1\/}$\rangle$}
22 \newcommand{\protoimage}[1]{\begin{picture}(100,20)\put(0,0){\makebox(100,20){\tt #1}}\put(50,10){\oval(100,20)}\end{picture}}
23 \newcommand{\wprotoimage}[1]{\begin{picture}(120,20)\put(0,0){\makebox(120,20){\tt #1}}\put(60,10){\oval(120,20)}\end{picture}}
25 \title{Generalized Linear Models in Lisp-Stat}
26 \author{Luke Tierney}
28 \begin{document}
29 \maketitle
31 \section{Introduction}
32 This note outlines a simple system for fitting generalized linear
33 models in Lisp-Stat. Three standard models are implemented:
34 \begin{itemize}
35 \item Poisson regression models
36 \item Binomial regression models
37 \item Gamma regression models
38 \end{itemize}
39 The model prototypes inherit from the linear regression model
40 prototype. By default, each model uses the canonical link for its
41 error structure, but alternate link structures can be specified.
43 The next section outlines the basic use of the generalized linear
44 model objects. The third section describes a few functions for
45 handling categorical independent variables. The fourth section gives
46 further details on the structure of the model prototypes, and
47 describes how to define new models and link structures. The final
48 section illustrates several ways of fitting more specialized models,
49 using the Bradley-Terry model as an example.
51 \section{Basic Use of the Model Objects}
52 Three functions are available for constructing generalized linear
53 model objects. These functions are called as
54 \begin{flushleft}\tt
55 (poissonreg-model \param{x} \param{y} [\param{keyword arguments ...}])\\
56 (binomialreg-model \param{x} \param{y} \param{n} [\param{keyword arguments ...}])\\
(gammareg-model \param{x} \param{y} [\param{keyword arguments ...}])
58 \end{flushleft}
59 The \param{x} and \param{y} arguments are as for the
60 \dcode{regression-model} function. The sample size parameter \param{n}
61 for binomial models can be either an integer or a sequence of integers
62 the same length as the response vector. All optional keyword
63 arguments accepted by the \dcode{regression-model} function are
64 accepted by these functions as well. Four additional keywords are
65 available:
66 \dcode{:link}, \dcode{:offset}, \dcode{:verbose}, and \dcode{:pweights}.
67 The keyword \dcode{:link} can be used to specify an alternate link
68 structure. Available link structures include
69 \begin{center}
70 \begin{tabular}{llll}
71 \tt identity-link & \tt log-link & \tt inverse-link & \tt sqrt-link\\
72 \tt logit-link & \tt probit-link & \tt cloglog-link
73 \end{tabular}
74 \end{center}
75 By default, each model uses its canonical link structure. The
76 \dcode{:offset} keyword can be used to provide an offset value, and
77 the keyword \dcode{:verbose} can be given the value \dcode{nil} to
78 suppress printing of iteration information. A prior weight vector
79 should be specified with the \dcode{:pweights} keyword rather than the
80 \dcode{:weights} keyword.
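To illustrate these keywords (the variables \dcode{x}, \dcode{y},
\dcode{exposure}, and \dcode{w} used here are hypothetical, not part of the
examples that follow), a Poisson model with a log exposure offset, prior
weights, and no iteration output could be requested by
\begin{verbatim}
(poissonreg-model x y :offset (log exposure) :pweights w :verbose nil)
\end{verbatim}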
82 As an example, we can examine a data set that records the number of
83 months prior to an interview when individuals remember a stressful
84 event (originally from Haberman, \cite[p. 2]{JKL}):
85 \begin{verbatim}
86 > (def months-before (iseq 1 18))
87 MONTHS-BEFORE
88 > (def event-counts '(15 11 14 17 5 11 10 4 8 10 7 9 11 3 6 1 1 4))
EVENT-COUNTS
90 \end{verbatim}
91 The data are multinomial, and we can fit a log-linear Poisson model to
92 see if there is any time trend:
93 \begin{verbatim}
94 > (def m (poissonreg-model months-before event-counts))
95 Iteration 1: deviance = 26.3164
96 Iteration 2: deviance = 24.5804
97 Iteration 3: deviance = 24.5704
98 Iteration 4: deviance = 24.5704
100 Weighted Least Squares Estimates:
102 Constant 2.80316 (0.148162)
103 Variable 0 -0.0837691 (0.0167996)
105 Scale taken as: 1
106 Deviance: 24.5704
107 Number of cases: 18
108 Degrees of freedom: 16
109 \end{verbatim}
111 Residuals for the fit can be obtained using the \dcode{:residuals}
112 message:
113 \begin{verbatim}
114 > (send m :residuals)
115 (-0.0439191 -0.790305 ...)
116 \end{verbatim}
117 A residual plot can be obtained using
118 \begin{verbatim}
119 (send m :plot-residuals)
120 \end{verbatim}
121 The \dcode{:fit-values} message returns $X\beta$, the linear predictor
122 without any offset. The \dcode{:fit-means} message returns fitted mean
123 response values. Thus the expression
124 \begin{verbatim}
125 (let ((p (plot-points months-before event-counts)))
126 (send p :add-lines months-before (send m :fit-means)))
127 \end{verbatim}
128 constructs a plot of raw counts and fitted means against time.
130 To illustrate fitting binomial models, we can use the leukemia survival
131 data of Feigl and Zelen \cite[Section 2.8.3]{LS} with the survival
132 time converted to a one-year survival indicator:
133 \begin{verbatim}
134 > (def surv-1 (if-else (> times-pos 52) 1 0))
135 SURV-1
136 > surv-1
137 (1 1 1 1 0 1 1 0 0 1 1 0 0 0 0 0 1)
138 \end{verbatim}
The independent variable is the natural logarithm of the white blood
cell counts divided by 10,000:
141 \begin{verbatim}
142 > transformed-wbc-pos
143 (-1.46968 -2.59027 -0.84397 -1.34707 -0.510826 0.0487902 0 0.530628 -0.616186
144 -0.356675 -0.0618754 1.16315 1.25276 2.30259 2.30259 1.64866 2.30259)
145 \end{verbatim}
146 A binomial model for these data can be constructed by
147 \begin{verbatim}
148 > (def lk (binomialreg-model transformed-wbc-pos surv-1 1))
149 Iteration 1: deviance = 18.2935
150 Iteration 2: deviance = 18.0789
151 Iteration 3: deviance = 18.0761
152 Iteration 4: deviance = 18.0761
154 Weighted Least Squares Estimates:
156 Constant 0.372897 (0.590934)
157 Variable 0 -0.985803 (0.508426)
159 Scale taken as: 1
160 Deviance: 18.0761
161 Number of cases: 17
162 Degrees of freedom: 15
163 \end{verbatim}
164 This model uses the logit link, the canonical link for the binomial
165 distribution. As an alternative, the expression
166 \begin{verbatim}
167 (binomialreg-model transformed-wbc-pos surv-1 1 :link probit-link)
168 \end{verbatim}
169 returns a model using a probit link.
171 The \dcode{:cooks-distances} message helps to highlight the last
172 observation for possible further examination:
173 \begin{verbatim}
174 > (send lk :cooks-distances)
175 (0.0142046 0.00403243 0.021907 0.0157153 0.149394 0.0359723 0.0346383
176 0.0450994 0.174799 0.0279114 0.0331333 0.0347883 0.033664 0.0170441
177 0.0170441 0.0280411 0.757332)
178 \end{verbatim}
179 This observation also stands out in the plot produced by
180 \begin{verbatim}
181 (send lk :plot-bayes-residuals)
182 \end{verbatim}
184 \section{Tools for Categorical Variables}
185 Four functions are provided to help construct indicator vectors for
186 categorical variables. As an illustration, a data set used by Bishop,
187 Fienberg, and Holland examines the relationship between occupational
188 classifications of fathers and sons. The classes are
189 \begin{center}
190 \begin{tabular}{|c|l|}
191 \hline
192 Label & Description\\
193 \hline
194 A & Professional, High Administrative\\
195 S & Managerial, Executive, High Supervisory\\
196 I & Low Inspectional, Supervisory\\
197 N & Routine Nonmanual, Skilled Manual\\
198 U & Semi- and Unskilled Manual\\
199 \hline
200 \end{tabular}
201 \end{center}
202 The counts are given by
203 \begin{center}
204 \begin{tabular}{|c|rrrrr|}
205 \hline
206 & \multicolumn{5}{c|}{Son}\\
207 \hline
208 Father & A & S & I & N & U \\
209 \hline
210 A & 50 & 45 & 8 & 18 & 8 \\
211 S & 28 & 174 & 84 & 154 & 55 \\
212 I & 11 & 78 & 110 & 223 & 96 \\
213 N & 14 & 150 & 185 & 714 & 447 \\
214 U & 3 & 42 & 72 & 320 & 411 \\
215 \hline
216 \end{tabular}
217 \end{center}
219 We can set up the occupation codes as
220 \begin{verbatim}
221 (def occupation '(a s i n u))
222 \end{verbatim}
223 and construct the son's and father's code vectors for entering the
224 data row by row as
225 \begin{verbatim}
226 (def son (repeat occupation 5))
227 (def father (repeat occupation (repeat 5 5)))
228 \end{verbatim}
229 The counts can then be entered as
230 \begin{verbatim}
231 (def counts '(50 45 8 18 8
232 28 174 84 154 55
233 11 78 110 223 96
234 14 150 185 714 447
235 3 42 72 320 411))
236 \end{verbatim}
238 To fit an additive log-linear model, we need to construct level
239 indicators. This can be done using the function \dcode{indicators}:
240 \begin{verbatim}
241 > (indicators son)
242 ((0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0)
243 (0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0)
244 (0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0)
245 (0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1))
246 \end{verbatim}
247 The result is a list of indicator variables for the second through the fifth
248 levels of the variable \dcode{son}. By default, the first level is dropped.
249 To obtain indicators for all five levels, we can supply the \dcode{:drop-first}
250 keyword with value \dcode{nil}:
251 \begin{verbatim}
252 > (indicators son :drop-first nil)
253 ((1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0)
254 (0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0)
255 (0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0)
256 (0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0)
257 (0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1))
258 \end{verbatim}
260 To produce a readable summary of the fit, we also need some labels:
261 \begin{verbatim}
262 > (level-names son :prefix 'son)
263 ("SON(S)" "SON(I)" "SON(N)" "SON(U)")
264 \end{verbatim}
265 By default, this function also drops the first level. This can again be
266 changed by supplying the \dcode{:drop-first} keyword argument as
267 \dcode{nil}:
268 \begin{verbatim}
269 > (level-names son :prefix 'son :drop-first nil)
270 ("SON(A)" "SON(S)" "SON(I)" "SON(N)" "SON(U)")
271 \end{verbatim}
272 The value of the \dcode{:prefix} keyword can be any Lisp expression.
273 For example, instead of the symbol \dcode{son} we can use the string
274 \dcode{"Son"}:
275 \begin{verbatim}
276 > (level-names son :prefix "Son")
277 ("Son(S)" "Son(I)" "Son(N)" "Son(U)")
278 \end{verbatim}
280 Using indicator variables and level labels, we can now fit an additive
281 model as
282 \begin{verbatim}
283 > (def mob-add
284 (poissonreg-model
285 (append (indicators son) (indicators father)) counts
286 :predictor-names (append (level-names son :prefix 'son)
287 (level-names father :prefix 'father))))
289 Iteration 1: deviance = 1007.97
290 Iteration 2: deviance = 807.484
291 Iteration 3: deviance = 792.389
292 Iteration 4: deviance = 792.19
293 Iteration 5: deviance = 792.19
295 Weighted Least Squares Estimates:
297 Constant 1.36273 (0.130001)
298 SON(S) 1.52892 (0.10714)
299 SON(I) 1.46561 (0.107762)
300 SON(N) 2.60129 (0.100667)
301 SON(U) 2.26117 (0.102065)
302 FATHER(S) 1.34475 (0.0988541)
303 FATHER(I) 1.39016 (0.0983994)
304 FATHER(N) 2.46005 (0.0917289)
305 FATHER(U) 1.88307 (0.0945049)
307 Scale taken as: 1
308 Deviance: 792.19
309 Number of cases: 25
310 Degrees of freedom: 16
311 \end{verbatim}
312 Examining the residuals using
313 \begin{verbatim}
314 (send mob-add :plot-residuals)
315 \end{verbatim}
316 shows that the first cell is an outlier -- the model does not fit this
317 cell well.
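For example, the Pearson residual for this cell can be extracted directly
using the \dcode{:residuals} message introduced earlier:
\begin{verbatim}
(select (send mob-add :residuals) 0)
\end{verbatim}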
319 To fit a saturated model to these data, we need the cross products of
320 the indicator variables and also a corresponding set of labels. The
321 indicators are produced with the \dcode{cross-terms} function
322 \begin{verbatim}
323 > (cross-terms (indicators son) (indicators father))
324 ((0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
325 (0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0)
326 ...)
327 \end{verbatim}
328 and the names with the \dcode{cross-names} function:
329 \begin{verbatim}
330 > (cross-names (level-names son :prefix 'son)
331 (level-names father :prefix 'father))
332 ("SON(S).FATHER(S)" "SON(S).FATHER(I)" ...)
333 \end{verbatim}
334 The saturated model can now be fit by
335 \begin{verbatim}
336 > (let ((s (indicators son))
337 (f (indicators father))
338 (sn (level-names son :prefix 'son))
339 (fn (level-names father :prefix 'father)))
340 (def mob-sat
341 (poissonreg-model (append s f (cross-terms s f)) counts
342 :predictor-names
343 (append sn fn (cross-names sn fn)))))
345 Iteration 1: deviance = 5.06262e-14
346 Iteration 2: deviance = 2.44249e-15
348 Weighted Least Squares Estimates:
350 Constant 3.91202 (0.141421)
351 SON(S) -0.105361 (0.20548)
352 SON(I) -1.83258 (0.380789)
353 SON(N) -1.02165 (0.274874)
354 SON(U) -1.83258 (0.380789)
355 FATHER(S) -0.579818 (0.236039)
356 FATHER(I) -1.51413 (0.33303)
357 FATHER(N) -1.27297 (0.302372)
358 FATHER(U) -2.81341 (0.594418)
359 SON(S).FATHER(S) 1.93221 (0.289281)
360 SON(S).FATHER(I) 2.06417 (0.382036)
361 SON(S).FATHER(N) 2.47694 (0.346868)
362 SON(S).FATHER(U) 2.74442 (0.631953)
363 SON(I).FATHER(S) 2.93119 (0.438884)
364 SON(I).FATHER(I) 4.13517 (0.494975)
365 SON(I).FATHER(N) 4.41388 (0.470993)
366 SON(I).FATHER(U) 5.01064 (0.701586)
367 SON(N).FATHER(S) 2.7264 (0.343167)
368 SON(N).FATHER(I) 4.03093 (0.41346)
369 SON(N).FATHER(N) 4.95348 (0.385207)
370 SON(N).FATHER(U) 5.69136 (0.641883)
371 SON(U).FATHER(S) 2.50771 (0.445978)
372 SON(U).FATHER(I) 3.99903 (0.496312)
373 SON(U).FATHER(N) 5.29608 (0.467617)
374 SON(U).FATHER(U) 6.75256 (0.693373)
376 Scale taken as: 1
377 Deviance: 3.37508e-14
378 Number of cases: 25
379 Degrees of freedom: 0
380 \end{verbatim}
382 \section{Structure of the Generalized Linear Model System}
383 \subsection{Model Prototypes}
384 The model objects are organized into several prototypes, with the
385 general prototype \dcode{glim-proto} inheriting from
386 \dcode{regression-model-proto}, the prototype for normal linear
387 regression models. The inheritance tree is shown in Figure
388 \ref{GLIMTree}.
389 \begin{figure}
390 \begin{center}
391 \begin{picture}(400,160)
392 \put(140,140){\wprotoimage{regression-model-proto}}
393 \put(150,70){\protoimage{glim-proto}}
394 \put(0,0){\protoimage{poissonreg-proto}}
395 \put(150,0){\protoimage{binomialreg-proto}}
396 \put(300,0){\protoimage{gammareg-proto}}
397 \put(200,140){\line(0,-1){50}}
398 \put(200,70){\line(-3,-1){150}}
399 \put(200,70){\line(3,-1){150}}
400 \put(200,70){\line(0,-1){50}}
401 \end{picture}
402 \end{center}
403 \caption{Hierarchy of generalized linear model prototypes.}
404 \label{GLIMTree}
405 \end{figure}
406 This inheritance captures the reasoning by analogy to the linear case
407 that is the basis for many ideas in the analysis of generalized linear
408 models. The fitting strategy uses iteratively reweighted least squares
409 by changing the weight vector in the model and repeatedly calling the
410 linear regression \dcode{:compute} method.
412 Convergence of the iterations is determined by comparing the relative
413 change in the coefficients and the change in the deviance to cutoff
414 values. The iteration terminates if either change falls below the
corresponding cutoff. The cutoffs are set and retrieved by the
416 \dcode{:epsilon} and \dcode{:epsilon-dev} methods. The default values
417 are given by
418 \begin{verbatim}
419 > (send glim-proto :epsilon)
420 1e-06
421 > (send glim-proto :epsilon-dev)
422 0.001
423 \end{verbatim}
424 A limit is also imposed on the number of iterations. The limit can be set
and retrieved by the \dcode{:count-limit} message. The current default
can be inspected by evaluating
\begin{verbatim}
(send glim-proto :count-limit)
430 \end{verbatim}
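Assuming these messages follow the usual Lisp-Stat convention of acting as
setters when given an argument, the convergence cutoffs and iteration limit
for a particular model could be changed by, for example,
\begin{verbatim}
(send m :epsilon 1e-8)
(send m :epsilon-dev 1e-6)
(send m :count-limit 50)
\end{verbatim}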
432 The analogy captured in the inheritance of the \dcode{glim-proto}
433 prototype from the normal linear regression prototype is based
434 primarily on the computational process, not the modeling process. As a
435 result, several accessor methods inherited from the linear regression
436 object refer to analogous components of the computational process,
437 rather than analogous components of the model. Two examples are the
438 messages \dcode{:weights} and \dcode{:y}. The weight vector in the
439 object returned by \dcode{:weights} is the final set of weights
440 obtained in the fit; prior weights can be set and retrieved with the
441 \dcode{:pweights} message. The value returned by the \dcode{:y}
442 message is the artificial dependent variable
443 \begin{displaymath}
444 z = \eta + (y - \mu) \frac{d\eta}{d\mu}
445 \end{displaymath}
446 constructed in the iteration; the actual dependent variable can be
447 obtained and changed with the \dcode{:yvar} message.
449 The message \dcode{:eta} returns the current linear predictor values,
450 including any offset. The \dcode{:offset} message sets and retrieves
451 the offset value. For binomial models, the \dcode{:trials} message
452 sets and retrieves the number of trials for each observation.
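As a check of the artificial dependent variable relation given above, $z$
can be reconstructed for the Poisson model \dcode{m} fit earlier, which has
no offset. This sketch assumes that the model's link object can be
retrieved by sending the \dcode{:link} message with no argument, in keeping
with the \dcode{link} slot described in the next subsection:
\begin{verbatim}
(let* ((eta (send m :eta))
       (mu (send m :fit-means))
       (y (send m :yvar))
       (d (send (send m :link) :derivs mu)))
  (- (send m :y) (+ eta (* (- y mu) d)))) ; approximately zero at convergence
\end{verbatim}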
454 The scale factor is set and retrieved with the \dcode{:scale} message.
455 Some models permit the estimation of a scale parameter. For these
456 models, the fitting system uses the \dcode{:fit-scale} message to
obtain a new scale value. The message \dcode{:estimate-scale} sets and
retrieves whether the scale parameter is to be estimated.
461 Deviances of individual observations, the total deviance, and the mean
462 deviance are returned by the messages \dcode{:deviances},
463 \dcode{:deviance} and \dcode{:mean-deviance}, respectively. The
\dcode{:deviance} and \dcode{:mean-deviance} methods adjust for
465 omitted observations, and the denominator for the mean deviance is
466 adjusted for the degrees of freedom available.
468 Most inherited methods for residuals, standard errors, etc., should
469 make sense at least as approximations. For example, residuals returned
470 by the inherited \dcode{:residuals} message correspond to the Pearson
471 residuals for generalized linear models. Other forms of residuals are
472 returned by the messages
473 \begin{center}
474 \begin{tabular}{lll}
475 \tt :chi-residuals & \tt :deviance-residuals & \tt :g2-residuals\\
476 \tt :raw-residuals & \tt :standardized-chi-residuals & \tt :standardized-deviance-residuals.
477 \end{tabular}
478 \end{center}
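For example, if the deviance residuals are defined in the conventional way
as signed square roots of the individual deviances (an assumption about
this system rather than a documented identity), their squares should sum to
the total deviance reported for the Poisson model \dcode{m} fit earlier:
\begin{verbatim}
(sum (^ (send m :deviance-residuals) 2)) ; should be close to 24.5704
\end{verbatim}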
480 \subsection{Error Structures}
481 The error structure of a generalized linear model affects four methods
and two slots. The methods are called as
483 \begin{flushleft}\tt
484 (send \param{m} :initial-means)\\
485 (send \param{m} :fit-variances \param{mu})\\
486 (send \param{m} :fit-deviances \param{mu})\\
487 (send \param{m} :fit-scale)
488 \end{flushleft}
489 The \dcode{:initial-means} method should return an initial estimate of
490 the means for the iterative search. The default method simply returns
491 the dependent variable, but for some models this may need to be
492 adjusted to move the initial estimate away from a boundary. For
493 example, the method for the Poisson regression model can be defined as
494 \begin{verbatim}
495 (defmeth poissonreg-proto :initial-means () (pmax (send self :yvar) 0.5))
496 \end{verbatim}
which ensures that initial mean estimates are at least 0.5.
The \dcode{:fit-variances} and \dcode{:fit-deviances} methods return the
values of the variance and deviance functions for a specified vector
501 of means. For the Poisson regression model, these methods can be
502 defined as
503 \begin{verbatim}
504 (defmeth poissonreg-proto :fit-variances (mu) mu)
505 \end{verbatim}
507 \begin{verbatim}
508 (defmeth poissonreg-proto :fit-deviances (mu)
509 (flet ((log+ (x) (log (if-else (< 0 x) x 1))))
510 (let* ((y (send self :yvar))
511 (raw-dev (* 2 (- (* y (log+ (/ y mu))) (- y mu))))
512 (pw (send self :pweights)))
513 (if pw (* pw raw-dev) raw-dev))))
514 \end{verbatim}
515 The local function \dcode{log+} is used to avoid taking the logarithm
516 of zero.
518 The final message, \dcode{:fit-scale}, is only used by the
519 \dcode{:display} method. The default method returns the mean deviance.
521 The two slots related to the error structure are
522 \dcode{estimate-scale} and \dcode{link}. If the value of the
523 \dcode{estimate-scale} slot is not \dcode{nil}, then a scale estimate
is computed and printed by the \dcode{:display} method. The
\dcode{link} slot holds the link object used by the model. The Poisson
model does not have a scale parameter, and the canonical link is the
log link. These defaults can be set by the expressions
528 \begin{verbatim}
529 (send poissonreg-proto :estimate-scale nil)
530 (send poissonreg-proto :link log-link)
531 \end{verbatim}
533 The \dcode{glim-proto} prototype itself uses normal errors and an
534 identity link. Other error structures can be implemented by
535 constructing a new prototype and defining appropriate methods and
536 default slot values.
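For instance, a prototype for inverse Gaussian errors might be sketched as
follows. This is only an outline by analogy with the Poisson definitions
above, not part of the distributed system: it uses the variance function
$\mu^{3}$, the unit deviance $(y - \mu)^{2}/(\mu^{2}y)$, and the canonical
link $1/\mu^{2}$, a power link with exponent $-2$ (power links are
described in the next subsection). A constructor function analogous to
\dcode{poissonreg-model} would still be needed.
\begin{verbatim}
(defproto invgaussreg-proto () () glim-proto)

(send invgaussreg-proto :estimate-scale t)
(send invgaussreg-proto :link (send power-link-proto :new -2))

(defmeth invgaussreg-proto :fit-variances (mu) (^ mu 3))

(defmeth invgaussreg-proto :fit-deviances (mu)
  (let* ((y (send self :yvar))
         (raw-dev (/ (^ (- y mu) 2) (* (^ mu 2) y)))
         (pw (send self :pweights)))
    (if pw (* pw raw-dev) raw-dev)))
\end{verbatim}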
538 \subsection{Link Structures}
539 The link function $g$ for a generalized linear model relates the
540 linear predictor $\eta$ to the mean response $\mu$ by
541 \begin{displaymath}
542 \eta = g(\mu).
543 \end{displaymath}
544 Links are implemented as objects.
545 Table \ref{Links} lists the pre-defined link functions, along with
546 the expressions used to return link objects.
547 \begin{table}
\caption{Link Functions and Expressions for Obtaining Link Objects}
549 \label{Links}
550 \begin{center}
551 \begin{tabular}{lccl}
552 \hline
553 Link & Formula & Domain & Expression\\
554 \hline
555 Identity & $\mu$ & $(-\infty, \infty)$ & \tt identity-link \\
556 Logarithm & $\log \mu$ & $(0, \infty)$ & \tt log-link\\
557 Inverse & $ 1/\mu$ & $(0, \infty)$ & \tt inverse-link\\
558 Square Root & $\sqrt{\mu}$ & $(0, \infty)$ & \tt sqrt-link\\
559 Logit & $\log\frac{\mu}{1-\mu}$ & $[0,1]$ & \tt logit-link\\
560 Probit & $\Phi^{-1}(\mu)$ & $[0,1]$ & \tt probit-link\\
561 Compl. log-log & $\log(-\log(1-\mu))$ & $[0,1]$ & \tt cloglog-link\\
562 Power & $\mu^{k}$ & $(0, \infty)$ & \tt (send power-link-proto :new \param{k})\\
563 \hline
564 \end{tabular}
565 \end{center}
566 \end{table}
567 With one exception, the pre-defined links require no parameters.
568 These link objects can therefore be shared among models. The exception
569 is the power link. Links for binomial models are defined for $n = 1$
570 trials and assume $0 < \mu < 1$.
572 Link objects inherit from the \dcode{glim-link-proto} prototype. The
573 \dcode{log-link} object, for example, is constructed by
574 \begin{verbatim}
575 (defproto log-link () () glim-link-proto)
576 \end{verbatim}
577 Since this prototype can be used directly in model objects, the
578 convention of having prototype names end in \dcode{-proto} is not
579 used. The \dcode{glim-link-proto} prototype provides a \dcode{:print}
580 method that should work for most link functions. The \dcode{log-link}
581 object prints as
582 \begin{verbatim}
583 > log-link
584 #<Glim Link Object: LOG-LINK>
585 \end{verbatim}
587 The \dcode{glim-proto} computing methods assume that a link object
588 responds to three messages:
589 \begin{flushleft}\tt
590 (send \param{link} :eta \param{mu})\\
591 (send \param{link} :means \param{eta})\\
592 (send \param{link} :derivs \param{mu})
593 \end{flushleft}
594 The \dcode{:eta} method returns a sequence of linear predictor values
595 for a particular mean sequence. The \dcode{:means} method is the
596 inverse of \dcode{:eta}: it returns mean values for specified values
597 of the linear predictor. The \dcode{:derivs} method returns the values of
598 \begin{displaymath}
599 \frac{d\eta}{d\mu}
600 \end{displaymath}
601 at the specified mean values. As an example, for the \dcode{log-link}
602 object these three methods are defined as
603 \begin{verbatim}
604 (defmeth log-link :eta (mu) (log mu))
605 (defmeth log-link :means (eta) (exp eta))
606 (defmeth log-link :derivs (mu) (/ mu))
607 \end{verbatim}
609 Alternative link structures can be constructed by setting up a new
610 prototype and defining appropriate \dcode{:eta}, \dcode{:means}, and
611 \dcode{:derivs} methods. Parametric link families can be implemented by
612 providing one or more slots for holding the parameters. The power link
613 is an example of a parametric link family. The power link prototype
614 is defined as
615 \begin{verbatim}
616 (defproto power-link-proto '(power) () glim-link-proto)
617 \end{verbatim}
618 The slot \dcode{power} holds the power exponent. An accessor method
619 is defined by
620 \begin{verbatim}
621 (defmeth power-link-proto :power () (slot-value 'power))
622 \end{verbatim}
623 and the \dcode{:isnew} initialization method is defined to require a
624 power argument:
625 \begin{verbatim}
626 (defmeth power-link-proto :isnew (power) (setf (slot-value 'power) power))
627 \end{verbatim}
628 Thus a power link for a particular exponent, say the exponent 2, can
629 be constructed using the expression
630 \begin{verbatim}
631 (send power-link-proto :new 2)
632 \end{verbatim}
634 To complete the power link prototype, we need to define the three
635 required methods. They are defined as
636 \begin{verbatim}
637 (defmeth power-link-proto :eta (mu) (^ mu (send self :power)))
638 \end{verbatim}
639 \begin{verbatim}
640 (defmeth power-link-proto :means (eta) (^ eta (/ (slot-value 'power))))
641 \end{verbatim}
643 \begin{verbatim}
644 (defmeth power-link-proto :derivs (mu)
645 (let ((p (slot-value 'power)))
646 (* p (^ mu (- p 1)))))
647 \end{verbatim}
648 The definition of the \dcode{:means} method could be improved to allow
649 negative arguments when the power is an odd integer. Finally, the
650 \dcode{:print} method is redefined to reflect the value of the
651 exponent:
652 \begin{verbatim}
653 (defmeth power-link-proto :print (&optional (stream t))
(format stream "#<Glim Link Object: Power Link (~s)>" (send self :power)))
655 \end{verbatim}
656 Thus a square link prints as
657 \begin{verbatim}
658 > (send power-link-proto :new 2)
659 #<Glim Link Object: Power Link (2)>
660 \end{verbatim}
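As noted above, the \dcode{:means} method could be refined to handle
negative arguments when the power is an odd integer. One possible sketch,
using the \dcode{if-else} function as in earlier examples, is
\begin{verbatim}
(defmeth power-link-proto :means (eta)
  (let ((p (slot-value 'power)))
    (if (and (integerp p) (oddp p))
        (* (if-else (< eta 0) -1 1) (^ (abs eta) (/ p)))
        (^ eta (/ p)))))
\end{verbatim}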
662 \section{Fitting a Bradley-Terry Model}
663 Many models used in categorical data analysis can be viewed as
664 special cases of generalized linear models. One example is the
665 Bradley-Terry model for paired comparisons. The Bradley-Terry model
666 deals with a situation in which $n$ individuals or items are compared
to one another in paired contests. The model assumes there are
positive quantities $\pi_{1}, \ldots, \pi_{n}$, which may be normalized
to sum to one, such that
670 \begin{displaymath}
671 P\{\mbox{$i$ beats $j$}\} = \frac{\pi_{i}}{\pi_{i} + \pi_{j}}.
672 \end{displaymath}
673 If the competitions are assumed to be mutually independent, then the
674 probability $p_{ij} = P\{\mbox{$i$ beats $j$}\}$ satisfies the logit
675 model
676 \begin{displaymath}
677 \log\frac{p_{ij}}{1-p_{ij}} = \phi_{i} - \phi_{j}
678 \end{displaymath}
679 with $\phi_{i} = \log \pi_{i}$. This model can be fit to a particular
680 set of data by setting up an appropriate design matrix and response
681 vector for a binomial regression model. For a single data set this can
682 be done from scratch. Alternatively, it is possible to construct
683 functions or prototypes that allow the data to be specified in a more
684 convenient form. Furthermore, there are certain specific questions
685 that can be asked for a Bradley-Terry model, such as what is the
686 estimated value of $P\{\mbox{$i$ beats $j$}\}$? In the object-oriented
687 framework, it is very natural to attach methods for answering such
688 questions to individual models or to a model prototype.
690 To illustrate these ideas, we can fit a Bradley-Terry model to the
results for the Eastern Division of the American League for the 1987
692 baseball season \cite{Agresti}. Table \ref{WinsLosses} gives the results
693 of the games within this division.
694 \begin{table}
695 \caption{Results of 1987 Season for American League Baseball Teams}
696 \label{WinsLosses}
697 \begin{center}
698 \begin{tabular}{lccccccc}
699 \hline
700 Winning & \multicolumn{7}{c}{Losing Team}\\
701 \cline{2-8}
702 Team & Milwaukee & Detroit & Toronto & New York & Boston &
703 Cleveland & Baltimore\\
704 \hline
705 Milwaukee & - & 7 & 9 & 7 & 7 & 9 & 11\\
706 Detroit & 6 & - & 7 & 5 & 11 & 9 & 9\\
707 Toronto & 4 & 6 & - & 7 & 7 & 8 & 12\\
708 New York & 6 & 8 & 6 & - & 6 & 7 & 10\\
709 Boston & 6 & 2 & 6 & 7 & - & 7 & 12\\
710 Cleveland & 4 & 4 & 5 & 6 & 6 & - & 6\\
711 Baltimore & 2 & 4 & 1 & 3 & 1 & 7 & -\\
712 \hline
713 \end{tabular}
714 \end{center}
715 \end{table}
717 The simplest way to enter this data is as a list, working through the
718 table one row at a time:
719 \begin{verbatim}
720 (def wins-losses '( - 7 9 7 7 9 11
721 6 - 7 5 11 9 9
722 4 6 - 7 7 8 12
723 6 8 6 - 6 7 10
724 6 2 6 7 - 7 12
725 4 4 5 6 6 - 6
726 2 4 1 3 1 7 -))
727 \end{verbatim}
728 The choice of the symbol \dcode{-} for the diagonal entries is
729 arbitrary; any other Lisp item could be used. The team names will also
730 be useful as labels:
731 \begin{verbatim}
732 (def teams '("Milwaukee" "Detroit" "Toronto" "New York"
733 "Boston" "Cleveland" "Baltimore"))
734 \end{verbatim}
736 To set up a model, we need to extract the wins and losses from the
737 \dcode{wins-losses} list. The expression
738 \begin{verbatim}
739 (let ((i (iseq 1 6)))
740 (def low-i (apply #'append (+ (* 7 i) (mapcar #'iseq i)))))
741 \end{verbatim}
742 constructs a list of the indices of the elements in the lower
743 triangle:
744 \begin{verbatim}
745 > low-i
746 (7 14 15 21 22 23 28 29 30 31 35 36 37 38 39 42 43 44 45 46 47)
747 \end{verbatim}
748 The wins can now be extracted from the \dcode{wins-losses} list using
749 \begin{verbatim}
750 > (select wins-losses low-i)
751 (6 4 6 6 8 6 6 2 6 7 4 4 5 6 6 2 4 1 3 1 7)
752 \end{verbatim}
753 Since we need to extract the lower triangle from a number of lists, we
754 can define a function to do this as
755 \begin{verbatim}
756 (defun lower (x) (select x low-i))
757 \end{verbatim}
758 Using this function, we can calculate the wins and save them in a
759 variable \dcode{wins}:
760 \begin{verbatim}
761 (def wins (lower wins-losses))
762 \end{verbatim}
764 To extract the losses, we need to form the list of the entries for the
765 transpose of our table. The function \dcode{split-list} can be used
766 to return a list of lists of the contents of the rows of the original
767 table. The \dcode{transpose} function transposes this list of lists,
768 and the \dcode{append} function can be applied to the result to
769 combine the lists of lists for the transpose into a single list:
770 \begin{verbatim}
771 (def losses-wins (apply #'append (transpose (split-list wins-losses 7))))
772 \end{verbatim}
773 The losses are then obtained by
774 \begin{verbatim}
775 (def losses (lower losses-wins))
776 \end{verbatim}
777 Either \dcode{wins} or \dcode{losses} can be used as the response for
778 a binomial model, with the trials given by
779 \begin{verbatim}
780 (+ wins losses)
781 \end{verbatim}
783 When fitting the Bradley-Terry model as a binomial regression model
784 with a logit link, the model has no intercept and the columns of the
785 design matrix are the differences of the row and column indicators for
786 the table of results. Since the rows of this matrix sum to zero if
787 all row and column levels are used, we can delete one of the levels,
788 say the first one. Lists of row and column indicators are set up by
789 the expressions
790 \begin{verbatim}
791 (def rows (mapcar #'lower (indicators (repeat (iseq 7) (repeat 7 7)))))
792 (def cols (mapcar #'lower (indicators (repeat (iseq 7) 7))))
793 \end{verbatim}
794 The function \dcode{indicators} drops the first level in constructing
795 its indicators. The function \dcode{mapcar} applies \dcode{lower} to
796 each element of the indicators list and returns a list of the results.
797 Using these two variables, the expression
798 \begin{verbatim}
799 (- rows cols)
800 \end{verbatim}
801 constructs a list of the columns of the design matrix.
803 We can now construct a model object for this data set:
804 \begin{verbatim}
805 > (def wl (binomialreg-model (- rows cols)
806 wins
807 (+ wins losses)
808 :intercept nil
809 :predictor-names (rest teams)))
810 Iteration 1: deviance = 16.1873
811 Iteration 2: deviance = 15.7371
813 Weighted Least Squares Estimates:
815 Detroit -0.144948 (0.311056)
816 Toronto -0.286871 (0.310207)
817 New York -0.333738 (0.310126)
818 Boston -0.473658 (0.310452)
819 Cleveland -0.897502 (0.316504)
820 Baltimore -1.58134 (0.342819)
822 Scale taken as: 1
823 Deviance: 15.7365
824 Number of cases: 21
825 Degrees of freedom: 15
826 \end{verbatim}
To fit a Bradley-Terry model to other data sets, we can repeat this
829 process. As an alternative, we can incorporate the steps used here
830 into a function:
831 \begin{verbatim}
832 (defun bradley-terry-model (counts &key labels)
833 (let* ((n (round (sqrt (length counts))))
834 (i (iseq 1 (- n 1)))
835 (low-i (apply #'append (+ (* n i) (mapcar #'iseq i))))
836 (p-names (if labels
837 (rest labels)
838 (level-names (iseq n) :prefix "Choice"))))
839 (labels ((tr (x)
840 (apply #'append (transpose (split-list (coerce x 'list) n))))
841 (lower (x) (select x low-i))
842 (low-indicators (x) (mapcar #'lower (indicators x))))
843 (let ((wins (lower counts))
844 (losses (lower (tr counts)))
845 (rows (low-indicators (repeat (iseq n) (repeat n n))))
846 (cols (low-indicators (repeat (iseq n) n))))
847 (binomialreg-model (- rows cols)
848 wins
849 (+ wins losses)
850 :intercept nil
851 :predictor-names p-names)))))
852 \end{verbatim}
853 This function defines the function \dcode{lower} as a local function.
854 The local function \dcode{tr} calculates the list of the elements in
855 the transposed table, and the function \dcode{low-indicators} produces
856 indicators for the lower triangular portion of a categorical variable.
857 The \dcode{bradley-terry-model} function allows the labels for the
858 contestants to be specified as a keyword argument. If this argument is
859 omitted, reasonable default labels are constructed. Using this
860 function, we can construct our model object as
861 \begin{verbatim}
862 (def wl (bradley-terry-model wins-losses :labels teams))
863 \end{verbatim}
865 The definition of this function could be improved to allow some of the
866 keyword arguments accepted by \dcode{binomialreg-model}.
868 Using the fit model object, we can estimate the probability of Boston
869 $(i = 4)$ defeating New York $(j = 3)$:
870 \begin{verbatim}
871 > (let* ((phi (cons 0 (send wl :coef-estimates)))
         (exp-logit (exp (- (select phi 4) (select phi 3)))))
    (/ exp-logit (+ 1 exp-logit)))
0.465077
875 \end{verbatim}
876 To be able to easily calculate such an estimate for any pairing, we can
877 give our model object a method for the \dcode{:success-prob} message
878 that takes two indices as arguments:
879 \begin{verbatim}
880 (defmeth wl :success-prob (i j)
881 (let* ((phi (cons 0 (send self :coef-estimates)))
882 (exp-logit (exp (- (select phi i) (select phi j)))))
883 (/ exp-logit (+ 1 exp-logit))))
884 \end{verbatim}
885 Then
886 \begin{verbatim}
887 > (send wl :success-prob 4 3)
888 0.465077
889 \end{verbatim}
891 If we want this method to be available for other data sets, we can
892 construct a Bradley-Terry model prototype by
893 \begin{verbatim}
894 (defproto bradley-terry-proto () () binomialreg-proto)
895 \end{verbatim}
896 and add the \dcode{:success-prob} method to this prototype:
897 \begin{verbatim}
898 (defmeth bradley-terry-proto :success-prob (i j)
899 (let* ((phi (cons 0 (send self :coef-estimates)))
900 (exp-logit (exp (- (select phi i) (select phi j)))))
901 (/ exp-logit (+ 1 exp-logit))))
902 \end{verbatim}
903 If we modify the \dcode{bradley-terry-model} function to use this prototype
904 by defining the function as
905 \begin{verbatim}
906 (defun bradley-terry-model (counts &key labels)
907 (let* ((n (round (sqrt (length counts))))
908 (i (iseq 1 (- n 1)))
909 (low-i (apply #'append (+ (* n i) (mapcar #'iseq i))))
910 (p-names (if labels
911 (rest labels)
912 (level-names (iseq n) :prefix "Choice"))))
913 (labels ((tr (x)
914 (apply #'append (transpose (split-list (coerce x 'list) n))))
915 (lower (x) (select x low-i))
916 (low-indicators (x) (mapcar #'lower (indicators x))))
917 (let ((wins (lower counts))
918 (losses (lower (tr counts)))
919 (rows (low-indicators (repeat (iseq n) (repeat n n))))
920 (cols (low-indicators (repeat (iseq n) n))))
921 (send bradley-terry-proto :new
922 :x (- rows cols)
923 :y wins
924 :trials (+ wins losses)
925 :intercept nil
926 :predictor-names p-names)))))
927 \end{verbatim}
then the \dcode{:success-prob} method is available immediately for a
929 model constructed using this function:
930 \begin{verbatim}
931 > (def wl (bradley-terry-model wins-losses :labels teams))
932 Iteration 1: deviance = 16.1873
933 Iteration 2: deviance = 15.7371
935 > (send wl :success-prob 4 3)
936 0.465077
937 \end{verbatim}
939 The \dcode{:success-prob} method can be improved in a number of ways.
940 As one example, we might want to be able to obtain standard errors in
941 addition to estimates. A convenient way to provide for this
942 possibility is to have our method take an optional argument. If this
943 argument is \dcode{nil}, the default, then the method just returns the
944 estimate. If the argument is not \dcode{nil}, then the method returns
945 a list of the estimate and its standard error.
947 To calculate the standard error, it is easier to start with the logit
948 of the probability, since the logit is a linear function of the model
949 coefficients. The method defined as
950 \begin{verbatim}
951 (defmeth bradley-terry-proto :success-logit (i j &optional stdev)
952 (let ((coefs (send self :coef-estimates)))
953 (flet ((lincomb (i j)
954 (let ((v (repeat 0 (length coefs))))
955 (if (/= 0 i) (setf (select v (- i 1)) 1))
956 (if (/= 0 j) (setf (select v (- j 1)) -1))
957 v)))
958 (let* ((v (lincomb i j))
959 (logit (inner-product v coefs))
960 (var (if stdev (matmult v (send self :xtxinv) v))))
961 (if stdev (list logit (sqrt var)) logit)))))
962 \end{verbatim}
963 returns the estimate or a list of the estimate and approximate
964 standard error of the logit:
965 \begin{verbatim}
966 > (send wl :success-logit 4 3)
967 -0.13992
968 > (send wl :success-logit 4 3 t)
969 (-0.13992 0.305583)
970 \end{verbatim}
971 The logit is calculated as a linear combination of the coefficients; a
972 list representing the linear combination vector is constructed by the
973 local function \dcode{lincomb}.
Standard errors for success probabilities can be computed from the
976 results of \dcode{:success-logit} using the delta method:
977 \begin{verbatim}
978 (defmeth bradley-terry-proto :success-prob (i j &optional stdev)
979 (let* ((success-logit (send self :success-logit i j stdev))
980 (exp-logit (exp (if stdev (first success-logit) success-logit)))
981 (p (/ exp-logit (+ 1 exp-logit)))
982 (s (if stdev (* p (- 1 p) (second success-logit)))))
983 (if stdev (list p s) p)))
984 \end{verbatim}
985 For our example, the results are
986 \begin{verbatim}
987 > (send wl :success-prob 4 3)
988 0.465077
989 > (send wl :success-prob 4 3 t)
990 (0.465077 0.0760231)
991 \end{verbatim}
993 These methods can be improved further by allowing them to accept
994 sequences of indices instead of only individual indices.
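One possibility, sketched here with a new message name
(\dcode{:success-probs}) that is not part of the system, is to leave the
scalar method unchanged and add a wrapper that maps over sequences of
index pairs:
\begin{verbatim}
(defmeth bradley-terry-proto :success-probs (i-list j-list &optional stdev)
  (mapcar #'(lambda (i j) (send self :success-prob i j stdev))
          i-list j-list))
\end{verbatim}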
996 \section*{Acknowledgements}
997 I would like to thank Sandy Weisberg for many helpful comments and
998 suggestions, and for providing the code for the glim residuals
999 methods.
1001 %\section*{Notes}
1002 %On the DECstations you can load the generalized linear model
1003 %prototypes with the expression
1004 %\begin{verbatim}
1005 %(load-example "glim")
1006 %\end{verbatim}
1007 %This code seems to work reasonably on the examples I have tried, but
1008 %it is not yet thoroughly debugged.
1010 \begin{thebibliography}{99}
1011 \bibitem{Agresti}
1012 {\sc Agresti, A.} (1990), {\em Categorical Data Analysis}, New York,
1013 NY: Wiley.
1015 \bibitem{JKL}
1016 {\sc Lindsey, J. K.} (1989), {\em The Analysis of Categorical Data
1017 Using GLIM}, Springer Lecture Notes in Statistics No. 56, New York,
1018 NY: Springer.
1020 \bibitem{GLIM}
1021 {\sc McCullagh, P. and Nelder, J. A.} (1989), {\em Generalized Linear
1022 Models}, second edition, London: Chapman and Hall.
1024 \bibitem{LS}
1025 {\sc Tierney, L.} (1990), {\em Lisp-Stat: An Object-Oriented
1026 Environment for Statistical Computing and Dynamic Graphics}, New York,
1027 NY: Wiley.
1028 \end{thebibliography}
1029 \end{document}
1034 \begin{table}
1035 \caption{Wins/Losses by Home and Away Team, 1987}
1036 \begin{tabular}{lccccccc}
1037 \hline
1038 Home & \multicolumn{7}{c}{Away Team}\\
1039 \cline{2-8}
1040 Team & Milwaukee & Detroit & Toronto & New York & Boston &
1041 Cleveland & Baltimore\\
1042 \hline
1043 Milwaukee & - & 4-3 & 4-2 & 4-3 & 6-1 & 4-2 & 6-0\\
1044 Detroit & 3-3 & - & 4-2 & 4-3 & 6-0 & 6-1 & 4-3\\
1045 Toronto & 2-5 & 4-3 & - & 2-4 & 4-3 & 4-2 & 6-0\\
1046 New York & 3-3 & 5-1 & 2-5 & - & 4-3 & 4-2 & 6-1\\
1047 Boston & 5-1 & 2-5 & 3-3 & 4-2 & - & 5-2 & 6-0\\
1048 Cleveland & 2-5 & 3-3 & 3-4 & 4-3 & 4-2 & - & 2-4\\
1049 Baltimore & 2-5 & 1-5 & 1-6 & 2-4 & 1-6 & 3-4 & - \\
1050 \hline
1051 \end{tabular}
1052 \end{table}