“To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.”
RONALD FISHER, 1890 – 1962
Response = Model + Error \[Y_{i} = \beta_{0} + \beta_{1}X_{1i} + \epsilon_{i}\] * The model is linear because the parameters combine linearly.
* No parameter is multiplied or divided by another, or appears as an exponent.
* The independent variables themselves may be nonlinear (so a linear model can represent curvilinear relationships).
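To make the last point concrete, here is a minimal sketch (not from the original slides; the data are simulated with illustrative values) showing that a model with a squared predictor is still linear in its parameters and is fit by ordinary `lm()`:

```r
# Y = b0 + b1*X + b2*X^2 + error: nonlinear in X, linear in the betas.
set.seed(42)
x <- runif(50, -3, 3)
y <- 2 + 1.5 * x + 0.8 * x^2 + rnorm(50, sd = 0.5)
# I(x^2) adds the squared term; the fit is still ordinary least squares.
fit_curv <- lm(y ~ x + I(x^2))
coef(fit_curv)  # estimates should land near 2, 1.5, and 0.8
```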
\[Y_{i} = \beta_{0} + \beta_{1}X_{1i} + \epsilon_{i}\] \[X_{i}\ \text{continuous}\]
Call:
lm(formula = Y ~ X)
Residuals:
Min 1Q Median 3Q Max
-19.073 -6.835 -0.875 5.806 32.904
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 101.33319 5.00127 20.261 < 2e-16 ***
X -0.42624 0.05344 -7.976 2.85e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.707 on 98 degrees of freedom
Multiple R-squared: 0.3936, Adjusted R-squared: 0.3874
F-statistic: 63.62 on 1 and 98 DF, p-value: 2.853e-12
\[\color{green}{{SS_{modelo}} = \sum(\hat{y_{i}}-\bar{y})^2}\]
\[\color{red}{SS_{residual} = \sum({y_{i}}-\hat{y_{i}})^2}\]
\[\color{blue}{SS_{total} = \sum({y_{i}}-\bar{y})^2}\]
\[R^2 = \frac{SS_{modelo}}{SS_{total}}\]
\[R^2 = 1 - \frac{SS_{residual}}{SS_{total}}\]
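The sums-of-squares identity and the two equivalent forms of \(R^2\) can be checked numerically. A self-contained sketch, using simulated data (illustrative values, not the slides' dataset):

```r
# Numerical check: SS_total = SS_model + SS_residual, and both R^2
# formulas agree with what summary() reports.
set.seed(1)
x <- rnorm(30)
y <- 1 + 2 * x + rnorm(30)
fit_ss   <- lm(y ~ x)
ss_model <- sum((fitted(fit_ss) - mean(y))^2)
ss_resid <- sum(residuals(fit_ss)^2)
ss_total <- sum((y - mean(y))^2)
all.equal(ss_model + ss_resid, ss_total)                    # TRUE
all.equal(ss_model / ss_total, summary(fit_ss)$r.squared)   # TRUE
all.equal(1 - ss_resid / ss_total, ss_model / ss_total)     # TRUE
```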
\[Y_{ij} = \mu + \alpha_{i} + \epsilon_{ij}\] * Examine the relative contribution of different sources of variation.
* Test the null hypothesis that the population means for every level of the factor are equal.
\[\color{green}{{SS_{modelo}} = \sum(\hat{y_{i}}-\bar{y})^2}\]
\[\color{red}{SS_{residual} = \sum({y_{i}}-\hat{y_{i}})^2}\]
\[\color{blue}{SS_{total} = \sum({y_{i}}-\bar{y})^2}\]
\[R^2 = \frac{SS_{modelo}}{SS_{total}}\]
\[R^2 = 1 - \frac{SS_{residual}}{SS_{total}}\]
data(iris)
fit <- lm(Petal.Length ~ Species, data = iris)
anova(fit)
## Analysis of Variance Table
##
## Response: Petal.Length
## Df Sum Sq Mean Sq F value Pr(>F)
## Species 2 437.10 218.551 1180.2 < 2.2e-16 ***
## Residuals 147 27.22 0.185
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
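The F value in the table is just the ratio of the two mean squares, which can be recomputed by hand from the `anova()` output:

```r
# F = MS_model / MS_residual, where each MS = Sum Sq / Df.
fit <- lm(Petal.Length ~ Species, data = iris)
tab <- anova(fit)
ms_model <- tab$"Sum Sq"[1] / tab$Df[1]   # 437.10 / 2
ms_resid <- tab$"Sum Sq"[2] / tab$Df[2]   # 27.22 / 147
ms_model / ms_resid                       # ~1180, matching the F value column
```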
data(iris)
fit <- lm(Petal.Length ~ Species, data = iris)
summary(fit)
##
## Call:
## lm(formula = Petal.Length ~ Species, data = iris)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.260 -0.258 0.038 0.240 1.348
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.46200 0.06086 24.02 <2e-16 ***
## Speciesversicolor 2.79800 0.08607 32.51 <2e-16 ***
## Speciesvirginica 4.09000 0.08607 47.52 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4303 on 147 degrees of freedom
## Multiple R-squared: 0.9414, Adjusted R-squared: 0.9406
## F-statistic: 1180 on 2 and 147 DF, p-value: < 2.2e-16
\[Y_{ij} = \mu + \alpha_{i} + \epsilon_{ij}\] \[Y_{ij} = \beta_{0} + \beta_{1}X_{i1} + \dots + \beta_{j}X_{ij} + \epsilon_{ij}\]
\[Y_{ij} = \beta_{0} + \beta_{dummy1}X_{i,dummy1} + \beta_{dummy2}X_{i,dummy2} + \epsilon_{ij}\]
\[Y_{i, setosa} = \beta_{0} + \epsilon_{ij}\] \[Y_{i, versicolor} = \beta_{0} + \beta_{dummy1} + \epsilon_{ij}\] \[Y_{i, virginica} = \beta_{0} + \beta_{dummy2} + \epsilon_{ij}\]
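The dummy variables R builds behind the scenes are not hidden; the design matrix can be inspected directly with `model.matrix()`:

```r
# Rows 1, 51, 101 belong to setosa, versicolor, virginica respectively.
fit <- lm(Petal.Length ~ Species, data = iris)
model.matrix(fit)[c(1, 51, 101), ]
# setosa rows have both dummies at 0 (it is the reference level),
# versicolor rows switch on the first dummy, virginica the second.
```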
data(iris)
fit <- lm(Petal.Length ~ Species, data = iris)
summary(fit)
##
## Call:
## lm(formula = Petal.Length ~ Species, data = iris)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.260 -0.258 0.038 0.240 1.348
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.46200 0.06086 24.02 <2e-16 ***
## Speciesversicolor 2.79800 0.08607 32.51 <2e-16 ***
## Speciesvirginica 4.09000 0.08607 47.52 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4303 on 147 degrees of freedom
## Multiple R-squared: 0.9414, Adjusted R-squared: 0.9406
## F-statistic: 1180 on 2 and 147 DF, p-value: < 2.2e-16
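With this coding, the intercept is the setosa mean and each dummy coefficient is a difference from it, so the three group means can be recovered from the coefficients:

```r
# Group means of Petal.Length equal intercept, intercept + dummy1,
# and intercept + dummy2 respectively.
fit <- lm(Petal.Length ~ Species, data = iris)
group_means <- tapply(iris$Petal.Length, iris$Species, mean)
group_means                     # 1.462  4.260  5.552
coef(fit)[1]                    # 1.462 = setosa mean
coef(fit)[1] + coef(fit)[2]     # 4.260 = versicolor mean
coef(fit)[1] + coef(fit)[3]     # 5.552 = virginica mean
```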
data(iris)
fit <- lm(Petal.Length ~ Species, data = iris)
shapiro.test(fit$resid)
##
## Shapiro-Wilk normality test
##
## data: fit$resid
## W = 0.98108, p-value = 0.03676
bartlett.test(fit$resid ~ iris$Species)
##
## Bartlett test of homogeneity of variances
##
## data: fit$resid by iris$Species
## Bartlett's K-squared = 55.423, df = 2, p-value = 9.229e-13
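Both diagnostics reject here: the residuals deviate from normality (Shapiro-Wilk p = 0.037) and the group variances are clearly unequal (Bartlett p < 0.001). As a hedged sketch of common fallbacks, not part of the original slides: Welch's ANOVA drops the equal-variance assumption, and the Kruskal-Wallis rank test drops normality as well:

```r
# Welch's ANOVA: does not assume homogeneous variances across groups.
oneway.test(Petal.Length ~ Species, data = iris, var.equal = FALSE)
# Kruskal-Wallis: rank-based, no normality assumption either.
kruskal.test(Petal.Length ~ Species, data = iris)
# With group differences this large, both still reject decisively.
```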
set.seed(1234)
X <- rnorm(100, mean = 10, sd = 20)
Y <- 100 + 0.25*X + 0.019*X^2 + rnorm(100, 0, 10)
fit1 <- lm(Y ~ X)
fit2 <- lm(Y ~ poly(X, 2))
fit3 <- lm(Y ~ poly(X, 10))
summary(fit1)$r.squared #LINEAR
[1] 0.5408381
summary(fit2)$r.squared #QUADRATIC
[1] 0.7319006
summary(fit3)$r.squared #DEGREE-10 POLYNOMIAL
[1] 0.7412149
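Raw \(R^2\) never decreases as terms are added, so it cannot flag the overfit degree-10 model. Penalized criteria can; re-creating the simulation so the chunk stands alone:

```r
# Same simulation as above, reproduced so this chunk is self-contained.
set.seed(1234)
X <- rnorm(100, mean = 10, sd = 20)
Y <- 100 + 0.25 * X + 0.019 * X^2 + rnorm(100, 0, 10)
fit1 <- lm(Y ~ X)
fit2 <- lm(Y ~ poly(X, 2))
fit3 <- lm(Y ~ poly(X, 10))
# Adjusted R^2 penalizes the 8 extra parameters of the degree-10 fit:
summary(fit2)$adj.r.squared   # ~0.726 (quadratic)
summary(fit3)$adj.r.squared   # ~0.712 (degree 10: worse after the penalty)
AIC(fit1, fit2, fit3)         # lower is better; the quadratic wins here
```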