# FlexibleRegressionandSmoothing_UsingGAMLSSinR

Flexible Regression and Smoothing Using GAMLSS in R Chapman ?The skewness of the response variable, which might change as a function of the explanatory variables; ?The nonlinear or smooth relationship between the response variable and the explanatory variables. xvii xviiiPreface GAMLSS, and its associatedR software, is a fl exible statistical framework for learn- ing from data and enhancing data analytics. In particular, the GAMLSS statistical framework enables fl exible regression and smoothing models to be fi tted to the data. The GAMLSS model assumes that the response variable has any parametric distri- bution which might be heavy- or light-tailed, and positively or negatively skewed. In addition, all the parameters of the distribution (location (e.g. mean), scale (e.g. standard deviation) and shape (skewness and kurtosis)) can be modelled as linear, nonlinear or smooth functions of explanatory variables. The book includes a large number of practical examples which refl ect the range of problems addressed by GAMLSS models. This also means that the examples provide a practical illustration of the process of using fl exible GAMLSS models for statistical learning. This book is written for: ?practitioners and researchers who wish to understand and use the GAMLSS models to learn from data, ?students who wish to learn GAMLSS through practical examples, and ?for us the authors who often forget what we have done in the past, and require documentation to remember it. We assume that readers are familiar with the basic concepts of regression and have a working knowledge ofR. GenerallyRcommands are given within the text, and data sets used are available in the software. The reader is encouraged to learn by repeating the examples given within the book. Structure of this book This book is not designed to be read necessarily from beginning to end. What we are hoping to achieve is an easy and comprehensive introduction to GAMLSS models, describing the diff erent functionalities of the GAMLSSRpackages. With this in mind we have divided the book into six parts dealing with diff erent aspects of the statistical ‘regression type’ modelling: Part I Introduction to models and packages: This part provides an explana- tion of why GAMLSS models are needed and ination about the GAMLSS Rpackages, using two practical examples. Part II Algorithms, functions and inference: This part is designed to help users to familiarize themselves with the GAMLSS algorithms, basic functions of thegamlsspackage and the inferential tools. Part III Distributions : This part describes the diff erent available distributions for the response variable. They are the distributions available in thegamlss.dist Prefacexix package, and also distributions which are generated by transing, truncating and fi nite mixing. They comprise continuous, discrete and mixed (i.e. continuous- discrete) distributions, which can be highly skewed (positively or negatively) and/or highly platykurtotic or leptokurtotic (i.e. light or heavy tails). Part IV Additive terms : This part shows the diff erent ways in which additive terms can be used to model a distribution parameter within a GAMLSS model. In particular it explains linear parametric terms, nonlinear smoothing terms and random eff ects. Part V Model selection and diagnostics: Model selection is crucial in statis- tical modelling. This part explains the diff erent s and tools within the GAMLSS packages for model selection and diagnostics. Part VI Applications: Centile estimation and some further interesting applica- tions of the GAMLSS models are covered in this part. Readers interested in practical applications are advised to read parts I and VI fi rst, followed by the technical parts III, IV, V and fi nally II. Software ination All software and datasets used in this book are contained within the GAMLSS packages. There is an exception of this rule: rcises 4, 5, and 6 of Chapter 13, which need data that can be downloaded from the “Global Lung Function Initiative” website. More about latest developments, further examples and rcises using GAMLSS can be found onwww.gamlss.org. The GAMLSS software is available from: ?The Comprehensive R Archive Network:https://cran.r-project.org. ?GitHub: code ofgamlssin R and Java) Notation used in this book In this book we distinguish between statistical models,Rpackages andRfunc- tions. We use capital letters for models, bold characters for packages and code type characters (with extra brackets) for functions. For example ?GAMLSS refers to the statistical model, ? gamlssrefers to theRpackage, and ? gamlss()to theRfunction. xxPreface Vectors in general will be represented in lower-case bold letters, e.g.x= (x1,x2,.,xn) and matrices in an upper-case bold letter, for exampleX. Scalar random variables are represented by upper case, for exampleY. The observed value of a random variable is represented by lower case, for exampley. The following table shows the notation that is used throughout this book. Systematic part Ya univariate response variable ythe vector of observed values of the response variable, i.e. (y1,y2,.,yn) ntotal number of observations Kthe total number of parameters in the distribution ofY kthe parameter number (k= 1,.,K) pkthe number of columns in the design matrixXkfor thekth parameter vector,θk Jkthe total number of smoothers for thekth distribution param- eterθk qkj the dimension of the random eff ect vectorγkj xkjthejth explanatory variable vector for thekth parameter,θk Xkann×J 0 k fi xed eff ects design matrix for thekth parameter,θk βk a vector of fi xed eff ect parameters for thekth parameter,θk of lengthJ 0 k, i.e. (βk1,βk2,.,βkJ0 k) β a vector of all fi xed eff ects parameters, i.e. (β 1,β 2,.,β K) γkjthej th random eff ect parameter vector for thekth parameter, θk, of lengthqkj γ a vector of all random eff ects parameters, i.e. (γ 11,.,γKJK) Zkjann×qkj random eff ect design matrix for thejth smoother of thekth parameter,θk Gkjanqkj× qkjpenalty matrix forγkj ηkthe predictor for thekth distribution parameter, i.e.ηk= gk(θk) Hkthe hat matrix for thekth parameter zkthe adjusted dependent variable for thekth distribution param- eterθk gk() link function applied to model thekth distribution parameter θk skj() thejth nonparametric smooth function (in the predictorηk) Wan × ndiagonal matrix of weights wandimensional vector of weights (the diagonal elements ofW) Skjthejth smoothing matrix for thekth distribution parameterθk Distributions f(y) theoretical probability (density) function of the random variable Y 1 (dfunction) 1Occasionally, for clarity, the subscript Y is used, i.e. fY(·). Prefacexxi F(y) cumulative distribution function of the random variableY (pfunction) D(·) a generic distribution E(·) exponential family of distributions Y ind ～ D(μ,σ,ν,τ)Y1,.,Ynare independently distributed with distributions D(μi,σi,νi,τi), fori= 1,.,n Q(.) inverse cumulative distribution function of the random variable Y(qfunction), i.e.F?1(·) E(Y) Expectation of random variableY Var(Y) Variance of random variableY SD(Y) Standard deviation of random variableY fY |X(·) conditional probability (density) function of the random vari- ableYgivenX N(μ,σ2) normal distribution with meanμand varianceσ2 φ(·) probability density function of a standard normal distribution N(0,1) Φ(·) cumulative distribution function of a standard normal distri- bution πkthek th prior probability in a fi nite mixture π vector of prior (or mixing) probabilities in a fi nite mixture π= (π1,π2.,πK) Distribution parameters θkthekth distribution parameter, whereθ1=μ,θ2=σ,θ3=ν andθ4=τ θka vector of lengthnof thekth distribution parameter, e.g.θ1= μ,θ2=σ μ the fi rst parameter of the distribution (usually location) σthe second parameter of the distribution (usually scale) νthe third parameter of the distribution (usually shape, e.g. skewness) τthe fourth parameter of the distribution (usually shape, e.g. kurtosis) λa hyper-parameter λthe vector of all hyper-parameters in the model σb standard deviation of a normal random eff ect term for a param- eter ustandard normal (Gaussian) quadrature mass point Likelihood and ination criteria Llikelihood function ‘log-likelihood function Λ generalized likelihood ratio test statistic iFisher’s expected ination matrix Iobserved ination matrix GDEV global deviance, i.e. minus twice the fi tted log-likelihood xxiiPreface GAIC generalized Akaike ination criterion, i.e. GDEV + (κ × df) df total (eff ective) degrees of freedom used in the model κpenalty for each degree of freedom in the model Residuals uvector of (randomized) quantile residuals rvector of normalized (randomized) quantile residuals εvector of (partial) residuals QQ statistic calculated from the residuals Z Z-statistic calculated from the residuals GAMLSS model components Ma GAMLSS model containing{D,G,T ,L} D the specifi cation of the distribution of the response variable G the diff erent link functions, e.g.gk() wheregk(θk) =ηk Tthe explanatory variable terms in the model L the specifi cation of the smoothing parameters Vector operators ?the Hadamard element by element product, i.e. lety= (y1,y2,y3) andx= (x1,x2,x3)then (y ? x) = (y1x1,y2x2,y3x3) Acknowledgements The authors would like to thank: Paul Eilers for his valuable contribution to smooth- ing functions in GAMLSS and his encouragement and advice for the book; Bob Gilchrist for his helpful comments and for providing the right research environment for a long time; Tim Cole, Elaine Borghie, Stefan van Buuren and Huiqi Pan for helping with the development of the centiles functions; Popi Akanziliotou, Marco Enea, Daniil Kiose, Luiz Nakamura, Majid Djennad, Raydonal Ospina, Konstanti- nos Pateras and Nicoleta Mortan, for their past or current contributions to the GAMLSS software; Christian Kiff ne, Albert Wong, Willem Vervoort, Steve Ellison, Michael Hohle and Larisa Kosidou for suggesting changes in theRfunctions or for small contributions to the packages; Gareth Amber, John Chambers, Trevor Hastie, Jochen Einbeck, Ross Darnell, John Hinde, Jim Lindsey, Philippe Lambert, Brian Ripley, Simon Wood and Brian Marx, for providingRfunctions which we have adapted for our own purpose; and Thomas Kneib, Nadja Klein, Stefan Lang and Andreas Mayr for promoting GAMLSS in the Bayesian and ‘boosting’ communities. Finally we would like to thank Harry Stasinopoulos for providing us with the cover fi gure for this book. Part I Introduction to models and packages 1 Why GAMLSS? CONTENTS 1.1Introduction 3 1.2The 1980s Munich rent data .4 1.3The linear regression model (LM) .6 1.4The generalized linear model (GLM) 10 1.5The generalized additive model (GAM) .16 1.6Modelling the scale parameter .20 1.7The generalized additive model for location, scale and shape (GAMLSS) 23 1.8Bibliographic notes .27 1.9rcises .28 1.1Introduction This chapter shows the evolution of statistical modelling from the linear model (LM) through the generalized linear model (GLM) and the generalized additive model (GAM) to the generalized additive model for location, scale and shape (GAMLSS). It provides 1. a discussion on the historical evolution of GAMLSS through a simple exam- ple, 2. an introduction to the GAMLSS models inR, and 3. the defi nition of a GAMLSS model. This chapter is the starting point for using GAMLSS inR. This chapter serves as an introduction to generalized additive models for location, scale and shape (GAMLSS). It builds up the GAMLSS model using ideas from its predecessors, in particular from linear regression models, generalized linear models and generalized additive models. It uses a relatively simple example, the Munich rentdata, to demonstrate why GAMLSS is needed. 3 4Flexible Regression and Smoothing: Using GAMLSS in R 1.2The 1980s Munich rent data Therentdata come from a survey conducted in April 1993 by Infratest Sozial- forschung, where a random sample of accommodation with new tenancy agreements or increases of rents within the last four years in Munich was selected, including single rooms, small apartments, fl ats and two-family houses. The accommodation is subsequently referred to as a fl at. The data were analysed by Stasinopoulos et al. [2000] and they are in the packagegamlss.data(which is automatically loaded when gamlssis loaded bylibrary(gamlss)). There are 1,969 observations on nine vari- ables in the data set but, for the purpose of demonstrating GAMLSS, we will use only the following fi ve variables: R data fi le: rentin packagegamlss.dataof dimensions 1969×9 var R: the response variable which is the monthly net rent in Deutsche Marks (DM), i.e. the monthly rent minus calculated or estimated utility cost Fl : the fl oor space in square metres A: the year of construction H: a two-level factor indicating whether there is central heating (0=yes, 1=no) loc: a factor indicating whether the location is below average (1), average (2), or above average (3) purpose:to demonstrate the need for using GAMLSS models. Fig. 1.1 PPP is the response vector,Xis then × pdesign matrix (p=r+ 1) containing thercovariate columns, plus a column of ones (if the constant is required),β= (β0,.,βr) is the coeffi cient vector,μ= (μ1,.,μn) is the mean vector andσ2= (σ2,.,σ2)is a vector of (constant) variance. Note that in order for the model to be fi tted, bothβandσ2have to be estimated from the data. The usual practice is to estimateβusing the least squares estimator, obtained by minimizing the sum of squared diff erences between the observationsYi and the meansμi=β0+β1xi1+.+βrxir, with respect to theβ’s . In vector this is written as ? β= argminβ(Y ? Xβ)(Y ? Xβ) which has solution ? β= (XX)?1XY .(1.3) It can be shown that ? βis also the maximum likelihood estimator (MLE) ofβ. Let ?μ=X?β be the fi tted values of the model and ??=Y ??μthe simple residuals (i.e. fi tted errors). Then the MLE forσ2is ?σ2= ???? n ,(1.4) Why GAMLSS?7 which is a biased estimator, i.e. E ?? σ2? 6 =σ2. An unbiased estimator ofσ2is given by s2= ???? n ? p .(1.5) Sometimess2is referred as the REML (restricted maximum likelihood) estimator ofσ2. A linear regression model can be fi tted inRusing the functionlm(). Here we compare the results fromlm()to the ones obtained bygamlss(). The notation R ~ Fl+A+H+loc is explained in Section 8.2. r1 0, μ 0, φ 0. (Ingamlssthe gamma distribution is parameterized with scale parameterσ, where σ= √φ.) Link functions were introduced by Nelder and Wedderburn [1972] for GLMs, but are appropriate for all regression models since they guarantee that parameter estimates remain within the appropriate range. For example if a parameterθhas range 0 θ G(λ)γwhereλis a vector or scalar of hyperparam- eter(s). Rigby and Stasinopoulos [2005] have shown that the algorithm used for fi tting the GAMLSS model for fi xed values of the hyperparametersλmaximizes a penalized likelihood function‘pgiven by ‘p=‘ ? 1 2 4 X k=1 Jk X j=1 γ kjGkj(λkj)γkj (1.11) where‘= Pn i=1logf(yi|μi,σi,νi,τi) is the log-likelihood function. Rigby and Stasinopoulos [2005] suggested two basic algorithms for fi tting GAMLSS model (1.10). The fi rst, the CG algorithm, is a generalization of the Cole and Green [1992] algorithm and uses the fi rst derivatives and the (exact or approximate) expected values of the second and cross derivatives of the likelihood function with respect toθ= (μ,σ,ν,τ). However for many probability (density) func- tionsf(yi|μi,σi,νi,τi), the parameters are ination orthogonal (where the expected values of the cross derivatives of the likelihood function are zero), for example location and scale models and dispersion family models. In this case the second, the RS algorithm, which is a generalization of the algorithm used by Rigby and Stasinopoulos [1996a,b] for fi tting the MADAM models, is more suited as it does not use the expected values of the cross derivatives. We now return to the Munich data to see if we can improve the model by fi tting a three-parameter distribution. We use the Box–Cox Cole and Green (BCCGo) distribution, which is based on Cole and Green [1992] who were the fi rst to fi t a single smoothing term to each of the three parameters of the distribution. They called their model the “LMS ” and it is widely used for centile estimation; see Chapter 13. The fi rst model (r6 ) fi ts a constantνwhile the second (r7 ) fi ts the same model forν as was fi tted forμandσ. r6 h1b-h2i1-h2 ##6.7845145255 -0.75981128661.2732354424 -0.1585005113 ## . . . 2.3.3 Extracting fi tted values Fitted values of the distribution parameters of a GAMLSS model (for all cases) can be obtained using thefitted()function. For example plot(lboopen, fitted(m1,“mu“)) will plot the fi tted values ofμdistribution parameter againstx(lboopen). The con- stant estimated scale parameter (the standard deviation of the normal distribution in this case) can be obtained: fitted(m1,“sigma“)[1] ##1 ## 1.392632 where[1] indicates the fi rst element of the vector. The same value can be obtained using the more general functionpredict(): predict(m1,what=“sigma“, type=“response“)[1] ##1 ## 1.392632 The functionpredict()can also be used to predict the response variable distri- bution parameters for both old and new data values of the explanatory variables. This is explained in Section 5.4. One of the fl exibilities off ered by GAMLSS is the modelling of all the distribution parameters (rather than justμ). This means that the scale and shape of the distri- bution can vary as a (linear or smooth) function of explanatory variables. Below, we show how to model bothμandσof a normal response distribution. Figure 2.1 suggests that this fl exibility of a GAMLSS model might be required. 2.3.4Modelling both μ and σ To model the predictors of both the meanμand the scale parameterσas nonpara- metric smoothing P-spline functions oflboopen(with a normal response distri- bution) use: Introduction to the gamlss packages47 m3 Dand where the hyperparameterλregulates the amount of smoothing needed for the fi t. We shall refer to functions in this aspenalized smooth functions (orpenalized smoothers). Penalized smoothers are the subject of Chapter 9, where it is shown that diff erent ulations for theZ’s and theD’s lead to diff erent types of smoothing functions with diff erent statistical properties. The model (3.1) can be generalized and written as the random eff ects GAMLSS model: Y|γ ind ～ D(μ,σ,ν,τ) η1=g1(μ) =X1β1+Z11γ11+.+Z1J1γ1J1 η2=g2(σ) =X2β2+Z21γ21+.+Z2J2γ2J2(3.2) η3=g3(ν) =X3β3+Z31γ31+.+Z3J3γ3J3 η4=g4(τ) =X4β4+Z41γ41+.+Z4J4γ4J4 where the ‘betas’ are fi xed eff ect parameters: β= (β 1,β 2,β 3,β 4) and the ‘gammas’ are random eff ect parameters: γ= (γ 11,.,γ 1J1,γ 21,.,γ 4J4) . The algorithms61 Assume in (3.2) that theγkj’s are independent of each other, each with (prior) distribution γkj～ N ? 0,[Gkj(λkj)]?1 ? (3.3) where [Gkj(λkj)]?1is the (generalized) inverse of aqkj× qkjsymmetric matrix Gkj(λkj) which may depend on a vector of hyperparametersλkj. IfGkj(λkj) is singular, thenγkjis understood to have an improper density function propor- tional to exp ??1 2γ Gkj(λkj)γ?. An important special case of (3.3) is given when Gkj(λkj) =λkjGkjandGkjis a known matrix for allk,j. If there are no random eff ects in the model (3.2) it simplifi es to: Y ind ～ D(μ,σ,ν,τ) η1=g1(μ) =X1β1 η2=g2(σ) =X2β2(3.4) η3=g3(ν) =X3β3 η4=g4(τ) =X4β4. We refer to (3.4) as theparametricGAMLSS model, and to (3.2) as therandom eff ectsGAMLSS model. Fitting the parametric model (3.4) requires only estimates for the ‘betas’β. Fitting the random eff ects GAMLSS model (3.2) requires estimates for the ‘betas’β, the ‘gammas’γ, and also the ‘lambdas’λ(see (3.6) below), where: λ= (λ 11,.,λ 1J1,λ 21,.,λ 4J4) . Withingamlss , the parametric GAMLSS model is fi tted by maximum likelihood estimation with respect toβ , while the more general random eff ects model is fi tted by maximum penalized likelihood estimation (or equivalently posterior mode or maximum