Práctica 3. Estimación modelos multinivel con lmer en R

Correspondiente a la sesión del jueves, 3 de abril de 2025

En esta práctica:

1. Cargar/instalar librerías

if (!require("pacman")) install.packages("pacman") # solo la primera vez
pacman::p_load(lme4,
               reghelper,
               haven,
               stargazer,
               ggplot2, # gráficos
               texreg, # tablas de regresion (screenreg)
               dplyr # manipulacion de datos
               ) # paquetes a cargar

Para esta práctica utilizaremos la librería lme4, en particular la función lmer, así como también la librería haven, para lectura de base de datos externa. Además stargazer para visualizar outputs.

2. Leer datos base High School and Beyond (HSB)**

Como en la sesión anterior

mlm = read_dta("http://www.stata-press.com/data/mlmus3/hsb.dta")

“mlm”” es el nombre que le daremos a la base de datos “High School and Beyond”

Variables relevantes para ejercicios:

  1. Nivel 1:
  • minority, an indicator for student ethnicity (1 = minority, 0 = other)
  • female, an indicator for student gender (1 = female, 0 = male)
  • ses, a standardized scale constructed from variables measuring parental education, occupation, and income
  • mathach, a measure of mathematics achievement
  1. Nivel 2
  • size (school enrollment)
  • sector (1 = Catholic, 0 = public)
  • pracad (proportion of students in the academic track)
  • disclim (a scale measuring disciplinary climate)
  • himnty (1 = more than 40% minority enrollment, 0 = less than 40%)
  • mnses (mean of the SES values for the students in this school who are included in the level-1 file)
  1. Cluster variable: schoolid

Seleccionar variables

mlm = mlm %>% 
  select(minority,female,ses,mathach,size,sector,pracad,disclim,himinty,mnses,schoolid) %>%
  as.data.frame()
names(mlm)
 [1] "minority" "female"   "ses"      "mathach"  "size"     "sector"  
 [7] "pracad"   "disclim"  "himinty"  "mnses"    "schoolid"
#Tabla estadisticos descriptivos
stargazer(mlm,title="Estadísticos descriptivos", type = "text")

Estadísticos descriptivos
=================================================
Statistic   N     Mean    St. Dev.   Min    Max  
-------------------------------------------------
minority  7,185   0.275     0.446     0      1   
female    7,185   0.528     0.499     0      1   
ses       7,185  0.0001     0.779   -3.758 2.692 
mathach   7,185  12.748     6.878   -2.832 24.993
size      7,185 1,056.862  604.172   100   2,713 
sector    7,185   0.493     0.500     0      1   
pracad    7,185   0.534     0.251   0.000  1.000 
disclim   7,185  -0.132     0.944   -2.416 2.756 
himinty   7,185   0.280     0.449     0      1   
mnses     7,185  0.0001     0.414   -1.194 0.825 
schoolid  7,185 5,277.898 2,499.578 1,224  9,586 
-------------------------------------------------

3. Modelos

Modelo 0: Null model

results_0 = lmer(mathach ~ 1 + (1 | schoolid), data = mlm)
summary(results_0)
Linear mixed model fit by REML ['lmerMod']
Formula: mathach ~ 1 + (1 | schoolid)
   Data: mlm

REML criterion at convergence: 47116.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.0631 -0.7539  0.0267  0.7606  2.7426 

Random effects:
 Groups   Name        Variance Std.Dev.
 schoolid (Intercept)  8.614   2.935   
 Residual             39.148   6.257   
Number of obs: 7185, groups:  schoolid, 160

Fixed effects:
            Estimate Std. Error t value
(Intercept)  12.6370     0.2444   51.71
screenreg(results_0) # de library texreg

========================================
                           Model 1      
----------------------------------------
(Intercept)                    12.64 ***
                               (0.24)   
----------------------------------------
AIC                         47122.79    
BIC                         47143.43    
Log Likelihood             -23558.40    
Num. obs.                    7185       
Num. groups: schoolid         160       
Var: schoolid (Intercept)       8.61    
Var: Residual                  39.15    
========================================
*** p < 0.001; ** p < 0.01; * p < 0.05

En este modelo es posible identificar \(\gamma_{00}\) (intercept - fixed effects), así como también los componentes de la varianza:

  • \(\tau_{00}\) (Var:id)
  • \(\sigma^2\) (Var:Residual).

Con estos últimos coeficientes aleatorios es posible calcular la correlación intraclase (\(\rho\)).

Correlación intra-clase = ICC = \(\rho=\frac{\tau_{00}}{\tau_{00}+\sigma^2}\)

Nota

La correlación intraclase indica la proporción de la varianza de la variable dependiente que se asocia a la pertenencia a unidades de nivel 2

# Cálculo directo en R:
8.614/(8.614+39.148) # calculo ICC 
[1] 0.1803526
# O de manera directa con funcion ICC de reghelper
reghelper::ICC(results_0)
[1] 0.1803518

En este ejemplo, el 18% de la varianza de la variable dependiente se asocia a la pertenencia a una escuela

Modelo 1: Predictores de nivel individual

results_1 = lmer(mathach ~ 1 + ses + female + (1 | schoolid), data = mlm)
screenreg(results_1, naive=TRUE)

========================================
                           Model 1      
----------------------------------------
(Intercept)                    13.28 ***
                               (0.20)   
ses                             2.36 ***
                               (0.11)   
female                         -1.19 ***
                               (0.17)   
----------------------------------------
AIC                         46605.73    
BIC                         46640.13    
Log Likelihood             -23297.86    
Num. obs.                    7185       
Num. groups: schoolid         160       
Var: schoolid (Intercept)       4.49    
Var: Residual                  36.81    
========================================
*** p < 0.001; ** p < 0.01; * p < 0.05

Modelo 2: Predictores nivel 2

results_2 = lmer(mathach ~ 1 + sector + mnses + (1 | schoolid), data = mlm)
screenreg(results_2)

========================================
                           Model 1      
----------------------------------------
(Intercept)                    12.13 ***
                               (0.20)   
sector                          1.23 ***
                               (0.31)   
mnses                           5.33 ***
                               (0.37)   
----------------------------------------
AIC                         46956.50    
BIC                         46990.90    
Log Likelihood             -23473.25    
Num. obs.                    7185       
Num. groups: schoolid         160       
Var: schoolid (Intercept)       2.31    
Var: Residual                  39.16    
========================================
*** p < 0.001; ** p < 0.01; * p < 0.05

Modelo 3: Predictores individuales y grupales

results_3 = lmer(mathach ~ 1 + ses + female + sector + mnses + (1 | schoolid), data = mlm)
screenreg(results_3)

========================================
                           Model 1      
----------------------------------------
(Intercept)                    12.74 ***
                               (0.21)   
ses                             2.15 ***
                               (0.11)   
female                         -1.20 ***
                               (0.16)   
sector                          1.25 ***
                               (0.30)   
mnses                           3.07 ***
                               (0.37)   
----------------------------------------
AIC                         46515.61    
BIC                         46563.77    
Log Likelihood             -23250.80    
Num. obs.                    7185       
Num. groups: schoolid         160       
Var: schoolid (Intercept)       2.15    
Var: Residual                  36.80    
========================================
*** p < 0.001; ** p < 0.01; * p < 0.05

Comparación individual, agregado y multinivel

Generar regresiones para comparar

reg_ind=lm(mathach ~ ses + female + mnses + sector, data=mlm)
agg_mlm=mlm %>% group_by(schoolid) %>% summarise_all(funs(mean))
reg_agg=lm(mathach ~ ses + female + mnses + sector, data=agg_mlm)

Observar: ¿Qué sucede con los coeficientes y errores estándar cuando se comparan los coeficientes y los errores estándar?

screenreg(list(reg_ind, reg_agg, results_3))

=================================================================
                           Model 1      Model 2     Model 3      
-----------------------------------------------------------------
(Intercept)                  12.80 ***   13.13 ***      12.74 ***
                             (0.13)      (0.35)         (0.21)   
ses                           2.15 ***    5.19 ***       2.15 ***
                             (0.11)      (0.37)         (0.11)   
female                       -1.34 ***   -1.97 ***      -1.20 ***
                             (0.15)      (0.56)         (0.16)   
mnses                         2.90 ***                   3.07 ***
                             (0.22)                     (0.37)   
sector                        1.32 ***    1.25 ***       1.25 ***
                             (0.16)      (0.31)         (0.30)   
-----------------------------------------------------------------
R^2                           0.18        0.67                   
Adj. R^2                      0.18        0.67                   
Num. obs.                  7185         160           7185       
AIC                                                  46515.61    
BIC                                                  46563.77    
Log Likelihood                                      -23250.80    
Num. groups: schoolid                                  160       
Var: schoolid (Intercept)                                2.15    
Var: Residual                                           36.80    
=================================================================
*** p < 0.001; ** p < 0.01; * p < 0.05

Generación de tabla para publicar en HTML

htmlreg(list(reg_ind, reg_agg, results_3), 
    custom.model.names = c("Individual","Agregado","Multinivel"),    
    custom.coef.names = c("Intercepto", "$SES_{ij}$","$Mujer_{ij}$", "$SES_{j}$", "$Sector_{j}$"), 
    custom.gof.names=c(NA,NA,NA,NA,NA,NA,NA, 
                   "Var:id ($\\tau_{00}$)","Var: Residual ($\\sigma^2$)"),
    custom.note = "%stars. Errores estándar en paréntesis",
    caption="Comparación de modelos Individual, Agregado y Multinivel",
    caption.above=TRUE,
    doctype = FALSE)
Comparación de modelos Individual, Agregado y Multinivel
  Individual Agregado Multinivel
Intercepto 12.80*** 13.13*** 12.74***
  (0.13) (0.35) (0.21)
\(SES_{ij}\) 2.15*** 5.19*** 2.15***
  (0.11) (0.37) (0.11)
\(Mujer_{ij}\) -1.34*** -1.97*** -1.20***
  (0.15) (0.56) (0.16)
\(SES_{j}\) 2.90***   3.07***
  (0.22)   (0.37)
\(Sector_{j}\) 1.32*** 1.25*** 1.25***
  (0.16) (0.31) (0.30)
R2 0.18 0.67  
Adj. R2 0.18 0.67  
Num. obs. 7185 160 7185
AIC     46515.61
BIC     46563.77
Log Likelihood     -23250.80
Num. groups: schoolid     160
Var:id (\(\tau_{00}\))     2.15
Var: Residual (\(\sigma^2\))     36.80
***p < 0.001; **p < 0.01; *p < 0.05. Errores estándar en paréntesis

Foro