%!TEX root = /Users/chl/Documents/Notes/025_dae/tex/dae.tex
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% dae.tex
% Supplemental reference material to Montgomery's Design and
% Analysis of Experiments (2005, 6th Ed.)
% Latex + dvips + ps2pdf
% Time-stamp: <2012-12-02 12:24:40 chl>
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass[a4paper,11pt]{report}
\usepackage[latin1]{inputenc}
\usepackage[dvips,a4paper]{geometry}
\usepackage{graphicx,color}
\usepackage{amsmath}
\usepackage{amssymb,amstext,bm}
\usepackage{verbatim}
\usepackage[ps2pdf,breaklinks=true,colorlinks=true]{hyperref}
\usepackage[font=small,format=plain,labelfont=bf,up]{caption} %% give a better look to caption
\setlength{\captionmargin}{30pt}
\usepackage{subfigure}
\usepackage{multirow}
%% draft watermark %%
\newcommand{\reviewtimetoday}[2]{\special{!userdict begin
/bop-hook{gsave 20 710 translate 45 rotate 0.8 setgray
/Times-Roman findfont 12 scalefont setfont 0 0 moveto (#1) show
0 -12 moveto (#2) show grestore}def end}}
\reviewtimetoday{\today}{Draft Version}
\newcommand{\HRule}{\rule{\linewidth}{0.5mm}}
\newcommand{\R}{\textsf{R}}
%% Sweave specific settings %%
\usepackage{fancyvrb}
\DefineVerbatimEnvironment{Sinput}{Verbatim}{fontshape=sl}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{}
\DefineVerbatimEnvironment{Scode}{Verbatim}{fontshape=sl}
%\newenvironment{Schunk}{\footnotesize}{}
\newenvironment{Schunk}{\begin{list}{}{%
\setlength{\topsep}{1em}\setlength{\leftmargin}{0.5cm}}%
\item\footnotesize}{\end{list}}
%%
\makeatletter
\renewcommand{\@biblabel}[1]{\quad #1.}
\def\verbatim@font{\small\ttfamily}
\makeatother
\newcommand{\marginlabel}[1]
{\mbox{}\marginpar{\scriptsize \raggedright\hspace{0pt}#1}}
%\renewcommand{\figurename}{\textbf{Figure}}
%\renewcommand{\tablename}{\textbf{Table}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\include{./titlepage}
\pagenumbering{roman}
\setcounter{page}{1}
\section*{Introduction}
\label{sec:introduction}
This document has been conceived as supplemental reference material to
accompany the excellent book by Douglas C. Montgomery, \emph{Design and
Analysis of Experiments} (hereafter abbreviated as DAE). Now published in its
6th edition, this book covers numerous techniques used in the design and
analysis of experiments. These include: classical comparative experiments (two
groups of observations, independent or not), the natural extension to the case
of $k$ means to be compared (one-way ANOVA), various ways of blocking
(randomized blocks, Latin squares and derived designs), factorial designs-- in
particular the $2^k$ ones --and fractional designs, the fitting of regression
models and response surface methodology, a review of techniques for robust
parameter estimation, and various extensions of the standard fixed-effects
designs (random factors, nested and split-plot designs).
The motivation for writing such a computer-oriented document arose while I
was reading the document elaborated by Laura Thompson to accompany Agresti's
famous book, \emph{Categorical Data Analysis}\footnote{A
revised version of her textbook can be found here:
\href{https://home.comcast.net/~lthompson221/Splusdiscrete2.pdf}%
{https://home.comcast.net/~lthompson221/Splusdiscrete2.pdf}.}. I found this a
great idea, as it brings the reader to the other side of the statistician's
activity, that of computing. This document is somewhat different from
\texttt{splusdiscrete}, since I do not pretend to be as exhaustive as she is
in her own manuscript.
While this textbook covers the same material as the original book written by
Montgomery, it is obviously not meant to be a partial electronic copy, nor
a complete replacement of the original book. Rather, I put some emphasis on
modern computer methods used to analyse data. Briefly, each chapter of this
textbook is organized as follows: first, I give a short summary of the main
concepts presented by Montgomery; then I try to illustrate some of the most
important ones (in my opinion!) with \R. Examples used by Montgomery are
completely re-analysed using \R. However, I do not answer the exercises
proposed at the end of each chapter of DAE. I leave them to the interested
reader, occasionally giving some advice on the ``\R\ way'' to do the
intended analysis.
\section*{About R}
\label{sec:about-r}
Montgomery mainly uses non-free software to analyse the datasets presented in
each chapter. Though these dedicated packages have proved to be very good
tools for statistical analysis, their cost restricts their use to people
working in laboratories where a specific budget is devoted to such
investments. The availability of open-source software, like \R, now offers an
elegant alternative to such solutions (often inaccessible to students).
\R\ has been developed from the \textsf{S} programming language, which also
underlies the commercial \textsf{S-PLUS} software; it is, however, a complete
and free reimplementation rather than a mere clone of \textsf{S-PLUS}. In
fact, there are several differences between the two, and the interested
reader can refer to the \R\ project website (\url{www.r-project.org}) for a
deeper understanding of the way \R\ has been built.
\R\ can be freely downloaded from the CRAN website
(\href{http://www.cran.r-project.org}{www.cran.r-project.org}), and much
documentation and many tutorials can be found at the same address. What makes
\R\ a better choice than the closed-source software Montgomery uses in his
book is that the source code of all the built-in statistical routines is
available and can be studied separately. In addition, users can add their own
functions to suit their needs for a particular analysis, or to batch several
analysis processes at once.
\section*{Example Scripts}
\label{sec:exemple-scripts}
All the analyses done by Montgomery in his book are replicated here
using \R, version 2.7, on Mac OS X, though they were initiated with \R\
2.4 running on a Linux platform. The source code for all the examples
is available at the following address:
\begin{center} \href{http://www.aliquote.org/articles/stat/dae/}%
{www.aliquote.org/articles/stat/dae/}
\end{center}
Datasets used in this textbook can also be found on this webpage. The \R\
scripts should run without any problem with any version of \R\
$\geq2.0$. However, in case you encounter any problem, please send me
an email
(\href{mailto:christophe.lalanne@gmx.net}{christophe.lalanne@gmx.net})
with some detailed information on the bug found. I don't use
\texttt{Sweave} to process this document, because at the time I first
wrote this textbook I felt more comfortable without it; further, as
there are no simulated data, nor any strong package dependencies, a
simple verbatim environment should be sufficient for most of what I
need. So all the included code is static, except for some pieces of code
in Chapter~2, and compilation relies on \texttt{dvips} +
\texttt{ps2pdf} only. Furthermore, I haven't split the \texttt{tex}
source into distinct chapters, so there is a ``huge'' source file that
can be downloaded from there if anyone is interested in getting the
main \texttt{tex} file:
\href{http://www.aliquote.org/articles/stat/dae/}%
{www.aliquote.org/articles/stat/dae/dae.tex}.
\tableofcontents
\chapter{Introduction}
\label{cha:introduction}
\pagenumbering{arabic}\setcounter{page}{1}
The 6th edition of Montgomery's book, \emph{Design and Analysis of
Experiments}, is primarily concerned with the various kinds of
experimental setups commonly used in biomedical research or industrial
engineering, and with how to draw valid conclusions from the observed
results. This is an art, and it is called the Design of Experiments
(\textsc{doe}). The approach taken throughout the textbook differs from
most related books in that it provides both a deep understanding of the
underlying statistical theory and covers a broad range of experimental
setups, e.g.\ balanced incomplete block designs, split-plot designs, or
response surfaces. As all these \textsc{doe} are rarely presented
together in a unified statistical framework, this textbook provides
valuable information about their common anchoring in the basic ANOVA
model.
Quoting Wiley's website comments,
\begin{quote}
Douglas Montgomery arms readers with the most effective approach for
learning how to design, conduct, and analyze experiments that
optimize performance in products and processes. He shows how to use
statistically designed experiments to obtain information for
characterization and optimization of systems, improve manufacturing
processes, and design and develop new processes and products. You
will also learn how to evaluate material alternatives in product
design, improve the field performance, reliability, and
manufacturing aspects of products, and conduct experiments
effectively and efficiently.
\end{quote}
Modern statistical software now offers ever-increasing power and allows
one to run computationally intensive procedures (bootstrap, jackknife,
permutation tests,\ldots) without leaving the desktop for a night or
more. Furthermore, multivariate exploratory statistics have brought
novel and exciting graphical displays to highlight the relations
between several variables at once. As they are part of results
reporting, they nicely complement the statistical models tested against
the observed data.
We propose to analyze some of the data provided in this textbook with the
open-source \R\ statistical software. The official website,
\url{www.r-project.org}, contains additional information and several
handbooks written by international contributors. In my opinion, \R\ has
benefited from the earlier development of the \textsf{S} language as a
statistical programming language, and as such offers a very flexible
way to handle all kinds of datasets. Its graphical capabilities, as
well as its inferential engine, make it the most flexible statistical
framework available at this time.
The \R\ packages used throughout these chapters are listed below, in
alphabetical order. A brief description is provided, but refer to the
on-line help (\verb|help(package="xx")|) for further indications on
how to use certain functions.
\paragraph{Package listing.}
Since 2007, some packages are organized in what are called
\emph{Task Views} on the \textsc{cran} website. Good news: there is a Task
View called \verb|ExperimentalDesign|. By the time I started to write this
textbook, there were very few resources available to create complex
designs like fractional factorial or Latin hypercube designs, nor was
there any in-depth coverage of \textsc{doe} analysis with \R, except
\cite{Venables:2002}, who dedicated some attention to blocking and
factorial designs, J.~Faraway's handbook, \emph{Practical Regression
and Anova using \R} \cite{Faraway:2002} (but see the \textsc{cran}
contributed documentation\footnote{Faraway has now published two books
on the analysis of (non-)linear models, GLM, and mixed-effects models, see
\cite{Faraway:2005,Faraway:2006}.}), and G.~Vikneswaran, who wrote
\href{http://cran.r-project.org/doc/contrib/Vikneswaran-ED_companion.pdf}{An
\R\ companion to ``Experimental Design''}, which accompanies Berger
and Maurer's book \cite{Berger:2002}.
\begin{description}
\item[car] provides a set of useful functions for ANOVA designs and
  regression models;
\item[lattice] provides some graphical enhancements compared to
  traditional \R\ graphics, as well as multivariate display
  capabilities;\\
  For \emph{Trellis Displays}, see
  \url{http://stat.bell-labs.com/project/trellis/}
\item[lme4] the newer and enhanced version of the \verb|nlme| package,
  for which additional data structures are available (nested or
  hierarchical models,\ldots);
\item[nlme] for handling mixed-effects models, developed by Pinheiro
  \& Bates \cite{Pinheiro:2000};
\item[npmc] implements procedures for non-parametric multiple
  comparisons;
\end{description}
\paragraph{Further Readings.}
Additional references are given in each chapter, when
necessary. However, there are plenty of other general textbooks
on \textsc{doe},
e.g. \cite{Hinkelmann:2005,Christensen:2002,Faraway:2005} (English) and
\cite{Benoist:1994,Dagnelie:2003,Goupy:2004,Azais:2005} (French),
among the most recent ones.
\chapter{Simple Comparative Experiments}
\label{cha:simple-comp-exper}
\section{Summary of Chapter 2}
\label{sec:summary-chapter-2}
After having defined the way simple comparative experiments are planned
(treatments or conditions, levels of a factor), Montgomery briefly explains
basic statistical concepts related to the analysis of such
designs. This includes the ideas of sampling distributions and hypothesis
formulation. Two-sample problems are covered, both under specific
distributional assumptions and with a non-parametric
alternative. The $t$-test is probably the core concept that one has to
understand before starting with more complex models. Indeed, the
construction of a test statistic, the distributional assumption for this
statistic under the null hypothesis (always stated as an absence of
difference between treatment means), and the way one can conclude from
the results are of primary importance. This chapter should be read by
every scientist looking for a first primer in statistical inference.
\section{Sampling distributions}
\label{sec:sampl-distr}
Several probability distributions are available in \R. They are all prefixed
with one of the following letters: \verb|d|, \verb|p|, \verb|q|, and \verb|r|,
which respectively refer to: density function, probability (distribution)
function, quantile function, and random number generation from the
distribution. For example, a sample of ten normal deviates, randomly drawn
from the standard normal distribution (also referred to as ${\cal N}(0;1)$,
or the $Z$ distribution), can be obtained using
\begin{verbatim}
x <- rnorm(10)
\end{verbatim}
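As a quick illustration of the four prefixes applied to the normal
distribution (a minimal sketch; the numerical values are standard
properties of ${\cal N}(0;1)$):
\begin{verbatim}
dnorm(0)       # density at 0, i.e. 1/sqrt(2*pi), approx. 0.3989
pnorm(1.96)    # P(Z <= 1.96), approx. 0.975
qnorm(0.975)   # quantile function, the inverse of pnorm: approx. 1.96
rnorm(10)      # ten random draws from N(0,1)
\end{verbatim}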
Since each call to the random number generator (\textsc{rng}) of \R\
involves a different random seed\footnote{Random number generators
  were originally based on a congruential recurrence relation, e.g.\\
  $x_{k+1}=(a\cdot x_k+b)\bmod c$, where $x_0$ is the initial (fixed)
  seed for a given sequence. Now, several more sophisticated algorithms
  are available; refer to \texttt{?RNGkind}.}, it can be convenient to
fix its value so that the same results can be obtained later. To do so,
use something like:
\begin{verbatim}
set.seed(891)
\end{verbatim}
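As a quick check that fixing the seed makes results reproducible (a
minimal sketch; the seed value is arbitrary):
\begin{verbatim}
set.seed(891)
x1 <- rnorm(5)
set.seed(891)      # resetting the seed restores the RNG state,
x2 <- rnorm(5)     # so the second sample replicates the first
identical(x1, x2)  # TRUE
\end{verbatim}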
The function \verb|set.seed| is used to set the \textsc{rng} to a specified
state, and it takes any integer as argument. Random value generation is
part of statistical theory, and these techniques are widely used in
simulation studies. Moreover, random numbers are at the core of several
computationally intensive algorithms, like the bootstrap estimation
procedure or Monte Carlo simulation designs. A very elegant
introduction to this topic is provided in \cite{Gentle:2003} (see
Chapter~8 for some hints on the use of the \R\ \textsc{rng}).
%% TODO
%% RNG shall be better explained (in a footnote or in the body)
\R\ can be used to produce different kinds of graphical
representations, and it is probably one of the most capable statistical
tools in that respect. Among them, dot diagrams and histograms are
useful tools to visualize continuous variables. Figure~\ref{fig:2.1}
has been created with the following commands:
\begin{verbatim}
# Tension Bond Strength data (Tab. 2-1, p. 24)
y1 <- c(16.85,16.40,17.21,16.35,16.52,17.04,16.96,17.15,16.59,16.57)
y2 <- c(16.62,16.75,17.37,17.12,16.98,16.87,17.34,17.02,17.08,17.27)
y <- data.frame(Modified=y1,Unmodified=y2)
y.means <- as.numeric(apply(y,2,mean))
opar <- par(mfrow=c(2,1),mar=c(5,7,4,2),las=1)
stripchart(y,xlab=expression("Strength (kgf/cm^2)"),pch=19)
arrows(y.means,rep(1.5,2),y.means,c(1.1,1.9),length=.1)
text(y.means,c(1.2,1.8),round(y.means,2),pos=4,cex=.8)
# Random deviates (instead of data from metal recovery used in the book)
rd <- rnorm(200,mean=70,sd=5)
hist(rd,xlab="quantile",ylab="Relative frequency",
main="Random Normal Deviates\n N(70,5)")
par(opar)
\end{verbatim}
\begin{figure}[htb]
\centering
\begin{minipage}[t]{.45\textwidth}\centering
\includegraphics[width=.95\textwidth]{./img/fig.2.1.eps}
\caption{Dot diagram for the tension bond strength data (upper
panel) and Histogram for 200 normal random deviates (lower
panel).}
\label{fig:2.1}
\end{minipage}\hfill
\begin{minipage}[t]{.45\textwidth}\centering
\includegraphics[width=.95\textwidth]{./img/fig.2.2.eps}
\caption{Density estimate for the same 200 normal random deviates.}
\label{fig:2.2}
\end{minipage}
\end{figure}
As can be seen from this snippet of code, relatively few commands
suffice to produce powerful graphics\ldots Indeed, several books and
websites are dedicated to exploratory multivariate graphics. Among
others, the interested reader may have a look at:
\begin{itemize}
\item D. Sarkar (2008). \emph{Lattice. Multivariate Data
  Visualization with R}\footnote{\R\ code and figures can be found at
  \url{http://dsarkar.fhcrc.org/lattice/book/figures.html}.}. Springer.
  \url{http://www.springer.com/statistics/computational/book/978-0-387-75968-5}
\item R Graph Gallery, \url{http://addictedtor.free.fr/graphiques/}
\item Trellis Display, \url{http://stat.bell-labs.com/project/trellis/}
\item P. Murrell (2005). \emph{R Graphics}\footnote{\R\ code and
  figures can be found at
  \url{http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html}.}.
  Chapman \& Hall/CRC.
\end{itemize}
and, of course, the must-have-one book that Venables \& Ripley wrote
on the \textsf{S} language, now in its fourth edition,
\cite{Venables:2002}.
Histograms are naturally more appropriate when there are numerous
observations (e.g.~$n>20$). It is not uncommon to express the data as
a density plotted against the observed values, and to superimpose a
normal density function with the corresponding mean and SD. However, a
better way to highlight the distribution of the variable under study,
especially in its continuous aspect, is to draw a non-parametric
density curve, as shown in Figure~\ref{fig:2.2}. We often get a clearer
picture of the underlying distribution this way, whereas choosing the
appropriate number of bins for a histogram is not always easy. See
\cite{Venables:2002} (pp.~126--130) for additional discussion on this
topic.
Another solution is to use a box-and-whisker plot, also called a
\textbf{boxplot}\marginlabel{\textsl{John Tukey} (1915--2000)
introduced modern techniques for the estimation of spectra of time
series, notably the Fast Fourier Transform.}. As illustrated in
Figure~\ref{fig:2.3}, a lot of information can be found in a
boxplot. First, the rectangular box displays half of the total
observations, the median being shown inside as a horizontal
segment. The upper side of the box is the third quartile, while
the first quartile is located at the lower side. The extreme tickmarks
correspond to the min and max values. However, when an observation
lies more than 1.5 times the inter-quartile range beyond either
quartile, it is explicitly drawn on the plot, and the extreme tickmarks
then correspond to these reference values. This way of handling what
could be considered as ``extreme values'' in \R\ is known as Tukey's
method. To produce such a graphic, one uses the \verb|boxplot()|
function, which accepts either a formula or variable $+$ factor
inputs. Figure~\ref{fig:2.3} is thus simply produced using
\begin{verbatim}
boxplot(y,ylab="Strength (kgf/cm^2)",las=1)
\end{verbatim}
\begin{figure}[ht]
\centering
\includegraphics[width=.5\textwidth]{./img/fig.2.3.eps}
\caption{Boxplot for the portland cement tension bond strength
experiment.}\label{fig:2.3}
\end{figure}
An example of a Laplace-Gauss---or ``normal'', for
short---distribution, with mean 0 and SD 1, is shown in
Figure~\ref{fig:gauss}. As it is a density function, its total area
equals 1, and the area comprised between any two $x$-values can be
calculated very easily using modern computer software. For instance, the
shaded gray area, which is the probability $P(1.2\leq y<2.4)$, is
estimated to be 0.107. With \R, it is obtained using
\verb|pnorm(2.4)-pnorm(1.2)|.
\begin{verbatim}
x <- seq(-3.5,3.5,by=0.01)
y <- dnorm(x)
plot(x,y,xlab="",ylab="",type="l",axes=F,lwd=2)
axis(side=1,cex.axis=.8); axis(2,pos=0,las=1,cex.axis=.8)
mtext(expression(mu),side=1,line=2,at=0)
mtext(expression(paste(frac(1, sigma*sqrt(2*pi)), " ",
plain(e)^{frac(-(x-mu)^2,
2*sigma^2)})),side=3,line=0)
# highlight a specific area (drawing is from left to right,
# then from right to left)
polygon(c(x[471:591],rev(x[471:591])),c(rep(0,121),rev(y[471:591])),
col="lightgray",border=NA)
\end{verbatim}
\begin{figure}[htb]
\centering
\includegraphics[width=.5\textwidth]{./img/gauss.eps}
\caption{The ``normal'' density function.}\label{fig:gauss}
\end{figure}
\section{Testing hypotheses}
\label{sec:testing-hypotheses}
Statistical hypotheses are generally formulated, based on a given
model, as a set of two opposite assertions, the \emph{null hypothesis}
being that the statistics reflecting some knowledge about the
treatment effect are not different from one another. Consider a
possible analytical model that describes the outcomes of two samples:
\begin{equation}\label{eq:model1}
y_{ij}=\mu_i+\varepsilon_{ij}\qquad i=1,2;\, j=1,2,\dots,n_i,
\end{equation}
where $y_{ij}$ are the observations gathered from (statistical) unit
$j$ in group $i$, and $\mu_i$ is the group mean.
Then, the corresponding hypothesis that can be formulated is
\begin{equation}\label{eq:hyp1}
\begin{array}{ll}
H_0: & \mu_1=\mu_2 \\
H_1: & \mu_1\neq\mu_2.
\end{array}
\end{equation}
Here $H_0$ denotes the null hypothesis of the absence of effect while
$H_1$ (also denoted $H_A$ by some authors) is the logical negation of
$H_0$.
This testing framework leads one to consider two kinds of potential
errors: a Type I error ($\alpha$), when we reject the null while it is
true in the real world, and a Type II error ($\beta$), when the null is
not rejected while it should have been. Formally, this is equivalent to
\begin{equation}\label{eq:hyp2}
\begin{array}{lll}
\alpha &= \Pr(\textrm{Type I error}) &= \Pr(\textrm{reject}\; H_0\mid H_0\;\textrm{is
true})\\
\beta &= \Pr(\textrm{Type II error}) &= \Pr(\textrm{fail to
reject}\;H_0\mid H_0\,\textrm{is false})
\end{array}
\end{equation}
Using this notation, $\alpha$ is generally referred to as the
significance level, and it is what is reported by statistical software
when running a given test. Both kinds of errors are equally important,
although Type II errors tend to be neglected in many
studies. Figure~\ref{fig:hypothesis} highlights the relation between
these two quantities, based on two hypothetical distributions. The
script is taken from the \textsc{cran} website (but it is not very
difficult to reproduce with a few commands).
% TODO
% change the following picture
\begin{figure}[ht]
\centering
\includegraphics[width=.5\textwidth]{./img/power.ps}
\caption{Type I and II errors.}\label{fig:hypothesis}
\end{figure}
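Although the figure itself comes from an external script, the two error
probabilities are easy to compute directly. The following sketch
considers a one-sided $z$-test of $H_0:\,\mu=0$ with $\sigma=1$ and
$n=1$, against a hypothetical true mean of 2 (both values are chosen
for illustration only):
\begin{verbatim}
alpha <- 0.05
crit  <- qnorm(1-alpha)          # decision threshold under H0, N(0,1)
mu1   <- 2                       # assumed true mean under H1
beta  <- pnorm(crit, mean=mu1)   # P(fail to reject H0 | H1 true)
round(c(beta=beta, power=1-beta), 3)   # beta approx. 0.361
\end{verbatim}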
\section{The two-sample $t$-test}
Comparing two sets of observations on a response variable involves
three steps: (1) constructing a test statistic, (2) defining its
sampling distribution, and (3) computing the associated $p$-value. As
already said, the $p$-value represents the probability of observing a
value at least as extreme as the one observed with the present
data. This is obviously a purely frequentist approach, but it proves
to be sufficient in most cases.
The test statistic is given by
\begin{equation}\label{eq:t-test}
t_0=\frac{\bar{y}_1-\bar{y}_2}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}},
\end{equation}
where $\bar{y}_{1,2}$ are the group means, $n_{1,2}$ the sample sizes
and $S_p$ an estimate of what is called the pooled variance. When
$n_1=n_2$, the design is said to be \emph{balanced}. The pooled
variance is simply a weighted average of the within-group variances,
and is computed, in the general case, as
\begin{equation}\label{eq:t-test2}
S_p^2=\frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}.
\end{equation}
The quantity $n_1+n_2-2$ is called the \emph{degrees of freedom} of the
test statistic, that is, the number of observations free to vary
independently.
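Before turning to \verb|t.test()|, $t_0$ can be computed by hand from
Equations~\ref{eq:t-test} and~\ref{eq:t-test2}, using the Tension Bond
Strength data entered earlier (a sketch intended only to check the
formulas):
\begin{verbatim}
y1 <- c(16.85,16.40,17.21,16.35,16.52,17.04,16.96,17.15,16.59,16.57)
y2 <- c(16.62,16.75,17.37,17.12,16.98,16.87,17.34,17.02,17.08,17.27)
n1 <- length(y1); n2 <- length(y2)
# pooled variance: weighted average of the within-group variances
sp2 <- ((n1-1)*var(y1)+(n2-1)*var(y2))/(n1+n2-2)
t0 <- (mean(y1)-mean(y2))/sqrt(sp2*(1/n1+1/n2))
round(t0, 4)   # -2.1869, matching t.test(y1,y2,var.equal=TRUE)
\end{verbatim}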
Here, we must distinguish between two approaches to the inferential
paradigm and the interpretation of the $p$-value. According to Neyman \&
Pearson's view, the statistical test provides an answer to a purely
binary decision (accept or reject the null hypothesis), and the
$p$-value is not to be interpreted beyond its position with
respect to a criterion value, say 5\%, defined before the start of the
experiment\footnote{The Neyman-Pearson criterion says that we should
construct our decision rule so as to have maximum probability of
detection while not allowing the probability of false alarm to exceed a
certain value $\alpha$. It can be shown that the likelihood ratio test
that rejects $H_0$ in favor of the alternative hypothesis is the most
powerful test of size $\alpha$, though in most cases this test is not
used.}. On the contrary, \textbf{Fisher}\marginlabel{\textsl{Sir
Ronald Aylmer Fisher} (1890--1962) significantly contributed to the
development of methods and sampling distributions suitable for small
samples, and he is considered the father of analysis of variance.}
\cite{Fisher:1990} defended the idea that the value of $p$ itself
provides an indication of the strength of the evidence against the null
hypothesis.
There are very long-standing debates on these two approaches and on
the way statistical results can be interpreted. We will mostly use
the former approach (binary decision rule) but also provide
the resulting $p$-value, though it is generally computed based
on asymptotic theoretical results.
% TODO
% add a reference illustrating the debate
Confidence intervals (CI) can be computed easily based on the sampling
distribution of the test statistic, which is the well-known Student
${\cal T}(\nu)$ distribution whose quantiles are available in \R\ (see
\verb|?qt|). The general formulation of a $100(1-\alpha)$\% confidence
interval for a difference of two means, say $\bar{y}_1-\bar{y}_2$, is
easily obtained as
\begin{equation}\label{eq:t-test3}
(\bar{y}_1-\bar{y}_2) \pm t_{\alpha/2,n_1+n_2-2}S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}
\end{equation}
where $\alpha=0.05$ yields a 95\% CI. Interesting discussions on the
use and interpretation of a confidence interval can be found in
articles written by Lecoutre and coworkers,
e.g.\ \cite{Lecoutre:2001,Lecoutre:2003}.
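Formula~\ref{eq:t-test3} can likewise be evaluated by hand on the
Tension Bond Strength data (a sketch; \verb|qt()| provides the Student
quantile):
\begin{verbatim}
y1 <- c(16.85,16.40,17.21,16.35,16.52,17.04,16.96,17.15,16.59,16.57)
y2 <- c(16.62,16.75,17.37,17.12,16.98,16.87,17.34,17.02,17.08,17.27)
n1 <- length(y1); n2 <- length(y2)
sp <- sqrt(((n1-1)*var(y1)+(n2-1)*var(y2))/(n1+n2-2))
tc <- qt(0.975, n1+n2-2)   # t_{alpha/2, 18}, approx. 2.101
ci <- mean(y1)-mean(y2) + c(-1,1)*tc*sp*sqrt(1/n1+1/n2)
round(ci, 4)   # -0.5451 -0.0109, the 95% CI reported by t.test()
\end{verbatim}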
The function \verb|t.test()| can be applied to the Tension Bond
Strength data.
\begin{verbatim}
t.test(y1,y2,var.equal=TRUE)
\end{verbatim}
The output is shown below:
\begin{verbatim}
Two Sample t-test
data: y1 and y2
t = -2.1869, df = 18, p-value = 0.0422
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.54507339 -0.01092661
sample estimates:
mean of x mean of y
16.764 17.042
\end{verbatim}
\R\ gives the $t_0$ statistic, the degrees of freedom and the
$p$-value, as well as the 95\% confidence interval computed using
Formula~\ref{eq:t-test3}. The test is significant at the commonly
admitted 5\% level; alternatively, the $p$-value provides
strong evidence against the null. We reach a similar conclusion
when interpreting the 95\% CI, as it does not cover 0. Overall, there
is a 0.278 kgf/cm$^2$ difference between the two treatments.
\begin{verbatim}
as.numeric(diff(apply(y,2,mean)))
\end{verbatim}
\label{para:Welch}
If we omit the \verb|var.equal=TRUE| option, \R\ computes the Welch
modified $t$-test. In this case, instead of using a pooled variance
estimate, the degrees of freedom are approximated to get a less liberal
$p$-value; this is also referred to as the \emph{Satterthwaite
approximation} \cite{Satterthwaite:1946,Welch:1947}. The formula for
computing the degrees of freedom is then
\begin{equation}\label{eq:satterthwaite}
  \nu=\frac{(w_1+w_2)^2}{w_1^2/(n_1-1)+w_2^2/(n_2-1)},
\end{equation}
where $w_i=S_i^2/n_i$. Applied to the preceding example, this gives a
$t$-value of $-2.187$, with 17.025 df, and a $p$-value of 0.043.
\begin{verbatim}
t.test(y1,y2)
\end{verbatim}
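The Satterthwaite degrees of freedom can be checked by hand (a sketch,
with $w_i=S_i^2/n_i$ as in Equation~\ref{eq:satterthwaite}):
\begin{verbatim}
y1 <- c(16.85,16.40,17.21,16.35,16.52,17.04,16.96,17.15,16.59,16.57)
y2 <- c(16.62,16.75,17.37,17.12,16.98,16.87,17.34,17.02,17.08,17.27)
n1 <- length(y1); n2 <- length(y2)
w1 <- var(y1)/n1; w2 <- var(y2)/n2
nu <- (w1+w2)^2/(w1^2/(n1-1)+w2^2/(n2-1))
round(nu, 3)   # 17.025, the df reported by t.test(y1,y2)
\end{verbatim}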
As reporting non-integer degrees of freedom may be confusing, they are
often neglected. Here, as the variances are not too different between
the two groups, we get quite comparable $p$-values because the degrees
of freedom of the test statistic need not be adjusted very strongly.
\section{Comparing a single mean to a criterion value}
\section{Application to paired samples}
Another situation arises when the two samples are related in some
way. For example, we can imagine an experiment where each of a number
of specimens is tested with both tip 1 and tip 2. The data are in
\texttt{hardness.txt}.
\begin{verbatim}
tmp <- scan("hardness.txt",sep=",")
hardness <- data.frame(y=tmp,tip=gl(2,10))
t.test(y~tip,data=hardness,paired=TRUE)
\end{verbatim}
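Should \texttt{hardness.txt} not be at hand, an equivalent
self-contained version can be used (the values below are my
transcription of the hardness data; please check them against the
original table in DAE):
\begin{verbatim}
tip1 <- c(7,3,3,4,8,3,2,9,5,4)
tip2 <- c(6,3,5,3,8,2,4,9,4,5)
d <- tip1 - tip2
# a paired t-test reduces to a one-sample t-test on the differences
t0 <- mean(d)/(sd(d)/sqrt(length(d)))
round(t0, 2)   # -0.26 on 9 df
\end{verbatim}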
Here, we cannot conclude that there is a significant difference between
the two tips ($t(9)=-0.26,\, p=0.798$). If we look at a plot of the two
evaluations (Fig.~\ref{fig:hardness}, left), we can see that both are
distributed along a line with slope 1, suggesting a close agreement
between Tip 1 and Tip 2. In this particular context, a more useful way
of checking agreement is to plot the difference between Tip 1 and Tip
2 as a function of the sum of the two evaluations
(Fig.~\ref{fig:hardness}, right). This kind of plot was initially
proposed for assessing agreement between biomedical measurements by
\cite{Bland:1986}.
\begin{verbatim}
with(hardness, plot(y[tip==1],y[tip==2],xlab="Tip 1",ylab="Tip 2"))
abline(0,1)
with(hardness, plot(y[tip==1]+y[tip==2],y[tip==1]-y[tip==2],
xlab="Tip 1 + Tip 2",ylab="Tip 1 - Tip 2",ylim=c(-3,3)))
abline(h=0)
\end{verbatim}
\begin{figure}[htb]
\centering
\begin{tabular}{cc}
\includegraphics[width=.5\textwidth]{./img/hardness1.eps} &
\includegraphics[width=.5\textwidth]{./img/hardness2.eps}
\end{tabular}
\caption{The Hardness testing experiment.}\label{fig:hardness}
\end{figure}
Let's look at what we would get if we ignored the pairing:
\begin{verbatim}
t.test(y~tip,data=hardness,var.equal=TRUE)
\end{verbatim}
As expected, the degrees of freedom are twice the previous ones
($n_1+n_2-2=2(n-1)$ when $n_1=n_2=n$), and the $t$-value is much
smaller in absolute value, reflecting the between-specimen variability
that is no longer accounted for.
\section{Non-parametric alternative}
For two-sample comparisons, two non-parametric tests can be used,
depending on the way the data were collected. If both samples are
independent, we use the Mann-Whitney-Wilcoxon rank sum test, while for
paired samples the corresponding test is the Wilcoxon signed rank
test.
Both are available through the \R\ function \verb|wilcox.test|, with
the option \verb+paired={TRUE|FALSE}+. For the previous examples, we get
\begin{verbatim}
wilcox.test(y1,y2)
wilcox.test(y~tip,data=hardness,paired=TRUE)
\end{verbatim}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% History
%% 2008-08-21: complete the whole chapter
\chapter{Experiments with a Single Factor: The Analysis of Variance}
\label{cha:exper-with-single}
\section{Summary of Chapter 3}
\label{sec:summary-chapter-3}
Montgomery reviews the basic principles underlying the one-way ANOVA
model under both the ``classical'' approach (based on sums of squares
collated in the so-called ANOVA table) and the regression approach
(based on the estimation of model parameters by solving the normal
equations). Once the full model has been evaluated, it is often
necessary to determine which of the treatment means really differ from
one another. This calls for multiple comparison procedures, which take
care of the Type I error inflation caused by the multiplicity of
hypothesis tests. Another approach relies on the design of orthogonal
contrasts, which do not inflate the experiment-wise error rate.
Finally, a non-parametric alternative, the Kruskal-Wallis ANOVA,
is presented, as well as its multiple comparisons counterpart.
\section{Analysis of the fixed effects model}
The Etch Rate data are available in the file \texttt{etchrate.txt}. Before
starting the analysis, we may want to view graphically the evolution of the
observed response (Fig.~\ref{fig:etch}).
\begin{verbatim}
etch.rate <- read.table("etchrate.txt",header=T)
grp.means <- with(etch.rate, tapply(rate,RF,mean))
with(etch.rate, stripchart(rate~RF,vert=T,method="overplot",pch=1))
stripchart(as.numeric(grp.means)~as.numeric(names(grp.means)),pch="x",
cex=1.5,vert=T,add=T)
title(main="Etch Rate data",ylab=expression(paste("Observed Etch Rate (",
ring(A),"/min)")),xlab="RF Power (W)")
legend("bottomright","Group Means",pch="x",bty="n")
\end{verbatim}
\begin{figure}[htb]
\centering
\includegraphics[width=.5\textwidth]{./img/etch.eps}
\caption{The Etch Rate data.}\label{fig:etch}
\end{figure}
As can be seen from this scatterplot, there is a clear increase in
observed etch rate as RF Power increases. Indeed, the mean etch rate
rises from 551.2~\AA/min at 160~W to 707.0~\AA/min at 220~W.
Moreover, it seems that this increase occurs in a linear fashion,
but we will return to this point later on.
In its most basic formulation, the one-way model of ANOVA is expressed
as
\begin{equation}\label{eq:aov1}
y_{ij}=\mu_i+\varepsilon_{ij}\qquad i=1,\dots,a;\, j=1,\dots,n,
\end{equation}
where $y_{ij}$ is the $j$th observation associated to treatment (or
group) $i$, $\mu_i$ is the treatment mean, and $\varepsilon_{ij}$ is
the so-called residual value assumed to be
NIID. Equation~\ref{eq:aov1} is called the \emph{means model}. If we
consider the $\mu_i$ with respect to the overall mean, denoted as
$\mu$ with $\mu_i=\mu+\tau_i$, then we can rewrite
Equation~\ref{eq:aov1} as
\begin{equation}\label{eq:aov2}
y_{ij}=\mu + \tau_i+\varepsilon_{ij}\qquad i=1,\dots,a;\, j=1,\dots,n.
\end{equation}
Now, it can be seen that the $\tau_i$ represent the differences between
the treatment means and the overall mean; they are called the effects,
hence the name \emph{effects model}.
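Both parameterizations can be fitted directly in \R\ (a sketch, assuming
the \texttt{etch.rate} data frame loaded above, with \texttt{RF}
converted to a factor first, as is done in the next section):
\begin{verbatim}
# means model: dropping the intercept estimates one mu_i per group
coef(lm(rate ~ RF - 1, data=etch.rate))
# effects model: sum-to-zero contrasts give mu as the intercept
# and the tau_i as the remaining coefficients
coef(lm(rate ~ RF, data=etch.rate, contrasts=list(RF="contr.sum")))
\end{verbatim}
With \verb|contr.sum|, only the first $a-1$ effects are displayed; the
last one is recovered as minus their sum.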
\section{Estimating Model parameters}
The ANOVA table (Tab.~\ref{tab:etch}) is produced using the next
commands. The \verb|aov()| and \verb|lm()| functions are of particular
significance when running any ANOVA model, but it is important to
emphasize that the coding of the variables matters, especially
when using the \verb|lm()| command. In that particular case,
\emph{categorical variables should be} \verb|factor|\emph{s in the \R\ terminology},
otherwise a linear regression will be performed!
\begin{verbatim}
# first, we convert each variable to factor
etch.rate$RF <- as.factor(etch.rate$RF)
etch.rate$run <- as.factor(etch.rate$run)
# next, we run the model
etch.rate.aov <- aov(rate~RF,etch.rate)
summary(etch.rate.aov)
\end{verbatim}
% latex table generated in R 2.6.2 by xtable 1.5-2 package
% Fri Mar 21 18:46:07 2008
\begin{table}[ht]
\begin{center}
\begin{tabular}{lrrrrr}
\hline
& Df & Sum Sq & Mean Sq & F value & Pr($>$F) \\
\hline
RF & 3 & 66870.55 & 22290.18 & 66.80 & 0.0000 \\
Residuals & 16 & 5339.20 & 333.70 & & \\
\hline
\end{tabular}
\end{center}
\caption{Results of the ANOVA model for the Etch Rate
Experiment.}\label{tab:etch}
\end{table}
The RF Mean Square is much larger than the residual (also called
\emph{error}) MS, which is computed simply as the mean of the
within-treatment variances. In other words, there is much more
between-treatment variability than within-treatment variability, and
such a result is very unlikely to occur by chance alone. Note that \R\
doesn't produce a separate row for the Total SS as it doesn't provide
any additional information most of the time. Here, the Total SS simply
equals $66870.55+5339.20=72209.75$ (with $3+16=19$ df).
Turning back to the ANOVA result, we can estimate the overall mean and
the treatment effects
$\hat{\tau}_i=\bar{y}_{i\cdot}-\bar{y}_{\cdot\cdot}$ as follows:
\begin{verbatim}
# overall mean
(erate.mean <- mean(etch.rate$rate))
# treatment effects
with(etch.rate, tapply(rate,RF,function(x) mean(x)-erate.mean))
\end{verbatim}
This gives
\begin{center}
\begin{tabular}{l|rrrr}
RF & 160 & 180 & 200 & 220 \\
\hline
$\hat{\tau}_i$ & -66.55 & -30.35 & 7.65 & 89.25 \\
\end{tabular}
\end{center}
Another way to get the same result is to use the \verb|model.tables()|
function, which provides valuable effect size information with complex
designs.
\begin{verbatim}
model.tables(etch.rate.aov)
\end{verbatim}
Finally, to get $100(1-\alpha)$\% confidence intervals for treatment
effects, we may compute them by hand using the following formula
\begin{equation}\label{eq:CI_treatment}
\bar{y}_{i\cdot}\pm t_{\alpha/2,N-a}\sqrt{\frac{\textrm{MS}_e}{n}}
\end{equation}
whereas for the comparison of any two treatments the above formula becomes
\begin{equation}\label{eq:CI_treatment2}
(\bar{y}_{i\cdot}-\bar{y}_{j\cdot})\pm t_{\alpha/2,N-a}\sqrt{\frac{2\textrm{MS}_e}{n}}
\end{equation}
Using \R\ directly, we only have to compute the pooled SD and get the
value of the corresponding $t$ quantile.
\begin{verbatim}
MSe <- summary(etch.rate.aov)[[1]][2,3]
SD.pool <- sqrt(MSe/5)
t.crit <- c(-1,1)*qt(.975,16)
\end{verbatim}
Thus, we can compute any 95\% CI for a single mean as
$\bar{y}_{i\cdot}\pm 17.3$, and for any pair of means as
$(\bar{y}_{i\cdot}-\bar{y}_{j\cdot})\pm 24.5$. For the latter, we can
compare with what a $t$-test for independent samples would give,
e.g.\ for $\bar{y}_{1\cdot}-\bar{y}_{2\cdot}$:
\begin{verbatim}
with(etch.rate, t.test(rate[RF==160],rate[RF==180],var.equal=TRUE))
\end{verbatim}
The corresponding 95\% CI for the difference of means is
$[-63.11;-9.29]$. This happens to be slightly different from the above
calculation, which leads to $[-60.7;-11.7]$. In both cases, the CI
doesn't include 0, suggesting a significant difference between the two
means. However, in the ANOVA framework, we are using the pooled SD
calculated from all four subsamples (SD=8.17)
\begin{verbatim}
sqrt(mean(tapply(etch.rate$rate,etch.rate$RF,var))/5)
\end{verbatim}
while the $t$-test only takes into account the variances of the two
samples being compared. Here, the variance pooled over these two
samples (340.5, as computed below) is slightly larger than
$\textrm{MS}_e$ (333.70), and, above all, the $t$ quantile is based on
$5+5-2=8$ df only, instead of 16, which explains why the two CIs
differ in width.
\begin{verbatim}
mean(c(var(etch.rate$rate[etch.rate$RF==160]),
var(etch.rate$rate[etch.rate$RF==180])))
\end{verbatim}
Generally speaking, when the $F$-test from an ANOVA table is
significant, it means that at least one pair of means is significantly
different (or alternatively, $\exists\, i,j\; (i\neq j)\, |\,
\mu_{i\cdot}-\mu_{j\cdot}\neq 0$). We should observe a convergent
result using a $t$-test on the corresponding groups. But it is not
advised to simultaneously test all pairs of means using simple
$t$-tests, as the Type I error increases with the number of
comparisons made: the probability of detecting a spurious significant
difference would then be largely greater than $\alpha=0.05$. One way to
confine the experimentwise error rate to a given $\alpha$ is to
replace $\alpha/2$ in the previous expression with $\alpha/2r$, where
$r$ is the number of intervals computed at the same time. This results
from \textbf{Bonferroni}'s Theorem\marginlabel{\textsl{Carlo
Bonferroni} (1892--1960) contributed to probability theory more than
to simultaneous statistical inference. Bonferroni's adjustment relies
rather on \emph{Boole's inequality}, although he wrote two articles
about this subject.}.\label{para:bonf}
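As an illustration (a sketch reusing the \texttt{grp.means} and
\texttt{MSe} objects computed above), Bonferroni-corrected 95\%
simultaneous CIs for all $r=6$ pairwise differences are obtained by
simply replacing the $t$ quantile:
\begin{verbatim}
r <- 6                           # number of pairwise comparisons
t.bonf <- qt(1-.05/(2*r),16)     # replaces qt(.975,16)
# e.g. for the 160 vs. 180 W comparison
as.numeric(grp.means[1]-grp.means[2])+c(-1,1)*t.bonf*sqrt(2*MSe/5)
\end{verbatim}
The adjusted quantile is necessarily larger than $t_{.025,16}$, so the
resulting intervals are wider.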
As \R\ is an object-oriented language---and this is even more apparent
with the recent development of S4 classes---we may be tempted to apply
the \verb|confint()| function to the \verb|aov| object
directly. However, if we look at the resulting \R\ output:
\begin{verbatim}
> confint(etch.rate.aov)
2.5 % 97.5 %
(Intercept) 533.88153 568.51847
RF180 11.70798 60.69202
RF200 49.70798 98.69202
RF220 131.30798 180.29202
\end{verbatim}
we see that it is far from the expected 95\% CIs associated with the
treatment effects expressed as differences between group means and the
overall etch rate mean ($\tau_i$). What is computed instead is the
estimated 95\% CI for each treatment effect relative to a baseline or
reference level, here the first level of RF (i.e.\ 160~W). So, the
difference $\bar{y}_{4\cdot}-\bar{y}_{1\cdot}$ is estimated to lie
between 131.3 and 180.3 95\% of the time\footnote{Recall that the
  probability is attached to the confidence interval, which is random,
  not to the true (population) parameter.}. We can check the
correctness of the result using Equation~\ref{eq:CI_treatment2},
e.g.\ for the last row labeled \verb|RF220|:
\begin{verbatim}
as.numeric(grp.means[4]-grp.means[1])+c(-1,1)*qt(.975,16)*sqrt(2*MSe/5)
\end{verbatim}
\section{Model checking}\label{sec:ModelCheck}
Model checking includes the verification of the following assumptions
(in decreasing order of importance):
\begin{enumerate}
\item independence,
\item homoscedasticity (homogeneity of the within-group variances),
\item normality of the residuals.
\end{enumerate}
In short, the residual values, defined as $e_{ij}=y_{ij}-\hat{y}_{ij}$,
should be structureless and well balanced between treatments.
Model checking can be done graphically and this often is the recommended
way, although there exists a formal test for each of the above hypotheses.
Several diagnostic plots are proposed in Figure~\ref{fig:etch_2}.
\begin{verbatim}
opar <- par(mfrow=c(2,2),cex=.8)
plot(etch.rate.aov)
par(opar)
\end{verbatim}
When applied to the result of a call to \verb|aov()|, the \verb|plot|
method produces several diagnostic graphics which can be controlled with
the \verb|which=| option. By default, \R\ produces (see \verb|?plot.lm|): a
plot of residuals against fitted values, a Scale-Location plot of
$\sqrt{|d_{ij}|}$ (the square root of the absolute standardized
residuals, defined below) against fitted values, a Normal Q-Q plot, and a
plot of residuals against leverages. The first two subplots are very
useful to check for any departure from the homoscedasticity and
normality assumptions. The last plot (residuals vs.\ leverages)
provides insight into possibly influential observations. This is a
convenient wrapper for diagnostic plots that would otherwise have to be
drawn separately. For instance,
\verb|plot(fitted(etch.rate.aov),residuals(etch.rate.aov))| produces the
first plot (residuals against fitted values), while
\verb|qqnorm(residuals(etch.rate.aov)); qqline(residuals(etch.rate.aov))|
corresponds to a Normal Q-Q plot. These commands are useful when we are
interested in only one particular diagnostic or in a subset of predictors.
\begin{figure}[htb]
\centering
\includegraphics[width=.7\textwidth]{./img/etch_2.eps}
\caption{Model checking for the ANOVA model.}\label{fig:etch_2}
\end{figure}
Standardized residuals may be used as a rough check for outliers. They
are defined as
\begin{equation}\label{eq:stdres}
d_{ij}=\frac{e_{ij}}{\sqrt{MSe}}
\end{equation}
and it can be shown that they are distributed as $\mathcal{N}(0,1)$
provided $\varepsilon_{ij}\sim \mathcal{N}(0,\sigma^2)$. About 95\% of
the standardized residuals should fall within $\pm 2$.
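They are easily computed from the fitted model (a sketch based on the
\texttt{etch.rate.aov} and \texttt{MSe} objects created earlier):
\begin{verbatim}
d <- residuals(etch.rate.aov)/sqrt(MSe)  # standardized residuals
range(d)                                 # rough screen for outliers
mean(abs(d) <= 2)                        # proportion within +/- 2
\end{verbatim}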
Independence of observations is largely a matter of the experimental
design and the way data are collected. Perhaps the simplest graphical
way to check for independence is to plot the residuals against run
order or any time index (Fig.~\ref{fig:etch_3}). This also allows us to
check the homoscedasticity assumption, since any departure from
constant variance would be reflected in localized subsets of
observations differing in their mean response, or in any systematic
pattern of outliers.
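A plot like the one in Figure~\ref{fig:etch_3} can be drawn as follows
(a sketch; \texttt{run} is assumed to encode the time ordering of the
observations):
\begin{verbatim}
plot(as.numeric(etch.rate$run), residuals(etch.rate.aov),
     xlab="Run order", ylab="Residuals")
abline(h=0, lty=2)
\end{verbatim}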
\begin{figure}[htb]
\centering
\includegraphics[width=.5\textwidth]{./img/etch_3.eps}
\caption{Model checking for the ANOVA model (2).}\label{fig:etch_3}
\end{figure}
Looking at the plot in Figure~\ref{fig:etch_3}, no such pattern is
visible, thus we have no reason to reject the independence
hypothesis. A more formal, and historical, test is the Durbin-Watson
test. This procedure aims at testing the serial autocorrelation of the
errors and, by default, uses a constant lag of 1. It is readily
available in the \verb|car| and \verb|lmtest| packages.
\begin{verbatim}
require(car)
durbin.watson(etch.rate.aov)
\end{verbatim}
The assumption of constant variance, or
homoscedasticity, is probably the most important in practice, since we
compute a pooled variance estimate by averaging the within-treatment
variances. Any departure from this hypothesis means that some of the
groups have larger or smaller variance than others, and this causes our
estimate to be somewhat inaccurate. The question of what should be
considered significantly ``larger'' or ``smaller'' depends on what
is being measured, but it is worth noting that any formal test leading
to the rejection of the hypothesis of constant variance cannot help to
answer this question. Indeed, if we reject $H_0:\;
\sigma^2_1=\sigma^2_2=\dots=\sigma^2_a$, what can we say then?
Nevertheless, the most widely recommended test of homoscedasticity is
\textbf{Bartlett}'s test\marginlabel{\textsl{Maurice
Stevenson Bartlett} (1910--2002) worked on the analysis of data with
spatial and temporal patterns. He is also known for his contribution
to the theory of statistical inference and multivariate analysis.}.
\begin{verbatim}
bartlett.test(rate~RF,data=etch.rate)
\end{verbatim}
If one suspects strong departures from normality, Levene's test may be
used as an alternative test for homogeneity of variances. This test is
available in the \verb|car| package.
\begin{verbatim}
levene.test(etch.rate.aov)
\end{verbatim}
Finally, the normality of the residuals can be assessed directly using
a Q-Q plot as in Figure~\ref{fig:etch_2} (the so-called \emph{droite de
  Henry} in French), where we expect the values to lie approximately on
the first bisecting line, or using the Shapiro-Wilk test. Note that
in the latter case, the test should be carried out on each subsample
separately, which might be problematic with few replications per
subgroup.
\begin{verbatim}
shapiro.test(etch.rate$rate[etch.rate$RF==160])
\end{verbatim}
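The test can be applied to the four subsamples in a single pass with, e.g.,
\begin{verbatim}
with(etch.rate, tapply(rate, RF, function(x) shapiro.test(x)$p.value))
\end{verbatim}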
\section{Comparison among treatment means}\label{sec:MCP}
Given our $a=4$ treatments, we have a set of $4(4-1)/2=6$ comparisons,
the null hypothesis being $H_0:\; \mu_i=\mu_j$ for a given $(i,j)$
pair of treatment means. There are several ways to carry out
parametric multiple comparisons within \R. Perhaps the most common and
easiest to understand is the systematic pairwise comparison between all
treatment means. To prevent the Type I error from inflating, again,
several methods have been proposed. Among them, the most conservative
is the Bonferroni correction, which adjusts the nominal $\alpha$ value
by the number of comparisons (we already discussed this kind of
procedure on page~\pageref{para:bonf}).
First, a pairwise $t$-test with either Bonferroni or Hochberg
correction leads to the rejection of all null hypotheses regarding
equality of treatment means (Tab.~\ref{tab:bonf} and
\ref{tab:hoch}). There are some differences in the $p$-values computed
in each case because of the adaptive way of handling the correction
factor in the Hochberg case.
\begin{verbatim}
pairwise.t.test(etch.rate$rate,etch.rate$RF,p.adjust.method="bonferroni")
pairwise.t.test(etch.rate$rate,etch.rate$RF,p.adjust.method="hochberg")
\end{verbatim}
\begin{table}[htb]
\begin{minipage}[b]{0.48\linewidth}
\centering
\begin{tabular}{lrrr}
\cline{2-4}
& 160 & 180 & 200 \\
\hline
180 & 0.038 & -- & -- \\
200 & 5.1e-05 & 0.028 & -- \\
220 & 2.2e-09 & 1.0e-07 & 1.6e-05 \\
\hline
\end{tabular}
\caption{Bonferroni method.}\label{tab:bonf}
\end{minipage}\hfill
\begin{minipage}[b]{0.48\linewidth}
\centering
\begin{tabular}{lrrr}
\cline{2-4}
& 160 & 180 & 200 \\
\hline
180 & 0.0064 & -- & -- \\
200 & 2.5e-05 & 0.0064 & -- \\
220 & 2.2e-09 & 8.5e-08 & 1.1e-05 \\
\hline
\end{tabular}
\caption{Hochberg method.}\label{tab:hoch}
\end{minipage}
\end{table}
Another alternative is to use a modified test statistic to take into
account the Type I error inflation caused by multiple tests. This is the
approach taken by the Tukey HSD\footnote{HSD stands for \emph{Honestly
    Significant Difference}.} test \cite{Tukey:1949b}. The \R\ function
\verb|TukeyHSD()| gives both adjusted $p$-values and 95\%
CIs. Furthermore, there is a \verb|plot| method that provides a nice
graphical summary (Fig.~\ref{fig:Tukey}). Applying the Tukey HSD test,
we reach the same conclusions as with the ``protected''
$t$-tests. Results are given in Table~\ref{tab:Tukey} and
Figure~\ref{fig:Tukey}, where it can be seen that none of the 95\% CIs
includes 0.
\begin{verbatim}
TukeyHSD(etch.rate.aov)
plot(TukeyHSD(etch.rate.aov),las=1)
\end{verbatim}
\begin{table}[htb]
\centering
\begin{tabular}{crrrr}
\hline
$i-j$ & $\delta$ & LB-CI & UP-CI & adj. $p$ \\
\hline
180-160 & 36.2 & 3.145624 & 69.25438 & 0.0294279 \\
200-160 & 74.2 & 41.145624 & 107.25438 & 0.0000455 \\
220-160 & 155.8 & 122.745624 & 188.85438 & 0.0000000 \\
200-180 & 38.0 & 4.945624 & 71.05438 & 0.0215995 \\
220-180 & 119.6 & 86.545624 & 152.65438 & 0.0000001 \\
220-200 & 81.6 & 48.545624 & 114.65438 & 0.0000146 \\
\hline
\end{tabular}
\caption{Tukey HSD method.}\label{tab:Tukey}
\end{table}
The 180--160 and 200--180 pairs of treatment means lead, as before, to
$p$-values between 0.01 and 0.05, well above the other
$p$-values. This is also apparent from the lower bounds of the 95\% CIs
shown in Figure~\ref{fig:Tukey}.
\begin{figure}[htb]
\centering
\includegraphics[width=.5\textwidth]{./img/Tukey.eps}
  \caption{95\% simultaneous confidence intervals from the Tukey HSD
    method (Etch rate example).}\label{fig:Tukey}
\end{figure}
Other methods will not be discussed here, but the interested reader is
referred to \cite{Christensen:2002} (Chapter~5) or
\cite{Toothaker:1993} for further descriptions of the pros and cons of
the different procedures. \R\ offers a dedicated package called
\verb|multcomp| (see also the \verb|Design| package) which can handle
multiple comparisons for linear models. Another
useful reference is \cite{Dudoit:2008} with the accompanying package
\verb|multtest|\footnote{For the moment, I have only tried some of its
  functionalities, and I wrote a very brief note entitled \emph{Multiple
    comparisons and $p$-value adjustment} which can be consulted from
  here: \href{http://www.aliquote.org/memos/2008/07/26/multiple-comparisons-and-p-value-adjustment/}{www.aliquote.org/memos/}}.
As an alternative to the previous techniques, one can construct
specific contrasts for comparing only some of the treatment means with
one another. If these contrasts, or differences of means, are designed
such that they are mutually orthogonal, then each test can be done at
the nominal 0.05 level without inflating the overall error rate.
There are various ways to design such contrasts in \R. We review only
two of them here, with the hope that the reader will not be confused by
some of the matrix algebra involved in the former method.
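As a first taste (a sketch, not the full treatment), orthogonal
polynomial contrasts---which are mutually orthogonal by
construction---can be attached to the \texttt{RF} factor to test the
linear, quadratic and cubic components of the dose-response trend at
the nominal level:
\begin{verbatim}
# assign orthogonal polynomial contrasts to the four RF levels
contrasts(etch.rate$RF) <- contr.poly(4)
# t-tests on the linear, quadratic and cubic trend components
summary.lm(aov(rate ~ RF, data=etch.rate))
\end{verbatim}
Given the apparent linearity noted earlier, we would expect the linear
component to dominate.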
\section{Power and Sample size}
Power and sample size determination are two related concepts. In \R,
the function \verb|power.t.test()| performs the necessary
computations for the one- and two-sample $t$-tests. In the case of the
one-way ANOVA (with fixed effects), the function
\verb|power.anova.test()| does that job, as does
\verb|powerF()| in the \verb|QuantPsyc| package. This last function
relies on the idea that the $F$ distribution can be manipulated such
that arranging its degrees of freedom (especially that of the
denominator, for sample size calculation) or the effect size reflected
in the value of any $F$ test computed from an ANOVA or regression
analysis allows the user to get an estimate of either sample size or
power for a given design \cite{Murphy:2004}. Generally, power
calculation relies on Operating Characteristic curves, where the
probability of a Type II error is plotted against a parameter $\Phi$
(see Eq.~\ref{eq:phi1}). An example of such an OC curve, applied to
the etch rate experiment, is given in Figure~\ref{fig:power}.
There are basically two very common situations: one in which the
experimenter specifies the expected treatment means under the
alternative, and the other where the experimenter specifies the
minimum difference between any two treatment means.
For the first case, we consider an application using the plasma
etching experiment. Suppose that the experimenter expects to reject
the null with a power of 0.90 if (and only if) the four treatment
means are
\[
\mu_1=575\qquad \mu_2=600\qquad \mu_3=650\quad \textrm{and}\quad \mu_4=675,
\]
considering $\alpha=0.01$ and $\sigma=25$ \AA/min. This way, we have
\begin{equation}\label{eq:phi1}
\Phi^2=\frac{n\sum_{i=1}^4\tau_i^2}{a\sigma^2}=\frac{n(6250)}{4(25)^2}=2.5n
\end{equation}
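The numerator of Equation~\ref{eq:phi1} can be checked by hand:
\begin{verbatim}
mu <- c(575,600,650,675)
tau <- mu - mean(mu)    # treatment effects: -50, -25, 25, 50
sum(tau^2)              # 6250
sum(tau^2)/(4*25^2)     # Phi^2/n = 2.5
\end{verbatim}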
Using \R, the following code computes the required sample size to get a
power of 0.90 (i.e.\ $\beta\leq 0.10$).
\begin{verbatim}
grp.means <- c(575,600,650,675)
power.anova.test(groups=4,between.var=var(grp.means),within.var=25^2,
sig.level=.01,power=.90)
\end{verbatim}
This gives the following results:
\begin{verbatim}
Balanced one-way analysis of variance power calculation
groups = 4
n = 3.520243
between.var = 2083.333
within.var = 625
sig.level = 0.01
power = 0.9
NOTE: n is number in each group
\end{verbatim}
We conclude that $n=4$ replicates per treatment group would allow us to
detect the above effect sizes with a power of 0.90. If we are interested
in how the SD (specified a priori) influences the resulting
power, for a fixed sample size, we can run the following script:
\begin{verbatim}
sd <- seq(20,80,by=2)
nn <- seq(4,20,by=2)
beta <- matrix(NA,nr=length(sd),nc=length(nn))
for (i in 1:length(sd))
beta[i,] <- power.anova.test(groups=4,n=nn,between.var=var(grp.means),
within.var=sd[i]^2,sig.level=.01)$power
colnames(beta) <- nn; rownames(beta) <- sd
opar <- par(las=1,cex=.8)
matplot(sd,beta,type="l",xlab=expression(sigma),ylab=expression(1-beta),col=1,
lty=1)
grid()
text(rep(80,10),beta[length(sd),],as.character(nn),pos=3)
title("Operating Characteristic Curve\n for a=4 treatment means")
par(opar)
\end{verbatim}
As can be seen from Figure~\ref{fig:power}, increasing SD or
decreasing sample size (shown on the right of the plot) results in a
loss of power. For instance, with $\sigma=50$ (twice the value
postulated in the previous calculation) and $n=6$, we get only a power
of 0.60 (for a given $\alpha=0.01$). We would have to increase the
sample size up to $n=10$ to get at least a power $>0.80$.
As an illustration of the second power calculation objective, consider
the following situation. The experimenter now wishes to reject the
null with probability $\geq 0.90$ if any two treatment means differ
by as much as 75~\AA/min, with $\alpha=0.01$. Considering an SD of
$\sigma=25$, the minimum value of $\Phi^2$ is estimated to be
\begin{equation}\label{eq:4}
\Phi^2=\frac{nD^2}{2a\sigma^2}=\frac{n(75)^2}{2(4)(25^2)}=1.125n
\end{equation}
Estimation of $n$ can be done using an OC curve; however, I haven't found
any \R\ function that does this computation automatically.
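A possible workaround (a sketch, resting on the observation that if
only two means differ by $D$ while the others sit at the grand mean,
then $\sum\tau_i^2=D^2/2$, i.e.\ \texttt{between.var}$\,=D^2/(2(a-1))$
with the parameterization used earlier) is to feed this worst-case
effect size back into \verb|power.anova.test()|:
\begin{verbatim}
D <- 75; a <- 4
power.anova.test(groups=a, between.var=D^2/(2*(a-1)),
                 within.var=25^2, sig.level=.01, power=.90)
\end{verbatim}
As before, the reported \texttt{n} would be rounded up to the next
integer.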
\begin{figure}[htb]
\centering
\includegraphics[width=.7\textwidth]{./img/occ.eps}
\caption{Example of an Operating Characteristic curve for the
one-way ANOVA (Etch rate example).}\label{fig:power}
\end{figure}
\section{Non-parametric methods in ANOVA}
A non-parametric alternative to the one-way ANOVA is the
Kruskal-Wallis ANOVA, which is quite similar except it is based on the
ranks of the observations rather than their nominal values. The
Kruskal-Wallis test is used to test the null hypothesis that the $a$
treatments are identical with respect to their central position (here,
the median). It should be kept in mind that non-parametric tests
assume, as does the parametric approach, homogeneous within-group
dispersion. Indeed, if this were not the case, we could hardly
distinguish an individual drawn from one group from one drawn from
another, as their distance might be accounted for by the difference in
variance between the subpopulations.
Carrying out the K-W test,
\begin{verbatim}
kruskal.test(rate~RF,data=etch.rate)
\end{verbatim}
gives a test statistic of 16.91 with 3 df and a $p$-value
$<0.001$. This result doesn't refute our preliminary conclusion using
the classical ANOVA Model.
In the non-parametric approach, multiple comparison techniques
challenge the usual ``easiness'' of the parametric ones. Indeed, this
raises the question of how to test pairwise differences while keeping
control of the overall error rate.
The \verb|npmc| package offers non-parametric multiple hypothesis
testing for the unbalanced one-way layout, based on the Behrens-Fisher
and Steel procedures. These procedures come from \cite{Munzel:2001}.
\begin{verbatim}
library(npmc)
# we need to reformat the data.frame with var/class names
etch.rate2 <- etch.rate
names(etch.rate2) <- c("class","run","var")
summary(npmc(etch.rate2),type="BF")
\end{verbatim}
The results are summarized in Table~\ref{tab:etch_BF}.
\begin{table}[htb]
\begin{center}
\begin{tabular}{crrrrr}
\hline
$i$--$j$ & effect & LB-CI & UP-CI & $p$-value 1s &
$p$-value 2s \\
\hline
\multicolumn{6}{c}{\textbf{Behrens-Fisher}} \\
\hline
1-2 & 0.92 & 0.5764163 & 1.263584 & 0.011450580 & 0.020539156 \\
1-3 & 1.00 & 0.9998842 & 1.000116 & 0.000000000 & 0.000000000 \\
1-4 & 1.00 & 0.9998842 & 1.000116 & 0.000000000 & 0.000000000 \\
2-3 & 0.94 & 0.6758851 & 1.204115 & 0.002301579 & 0.004440345 \\
2-4 & 1.00 & 0.9998842 & 1.000116 & 0.000000000 & 0.000000000 \\
3-4 & 1.00 & 0.9998842 & 1.000116 & 0.000000000 & 0.000000000 \\
\hline
\multicolumn{6}{c}{\textbf{Steel}} \\
\hline
1-2 & 0.92 & 0.4254941 & 1.414506 & 0.07123615 & 0.13078270 \\
1-3 & 1.00 & 0.5054941 & 1.494506 & 0.02446374 & 0.04602880 \\
1-4 & 1.00 & 0.5054941 & 1.494506 & 0.02417453 & 0.04631413 \\
2-3 & 0.94 & 0.4469949 & 1.433005 & 0.05465670 & 0.10154286 \\
2-4 & 1.00 & 0.5054941 & 1.494506 & 0.02412958 & 0.04654181 \\
3-4 & 1.00 & 0.5054941 & 1.494506 & 0.02414774 & 0.04635531 \\
\hline
\end{tabular}
\caption{Results from the NP multiple comparisons procedures applied
to the etch rate data.\newline {\scriptsize LB/UP-CI: lower and
upper-bound of 95\% CI; $p$-values 1s/2s: one-sided and
two-sided $p$-value.}}\label{tab:etch_BF}
\end{center}
\end{table}
Compared to parametric multiple comparisons (\S~\ref{sec:MCP}), we
reach similar conclusions with the Behrens-Fisher procedure: every pair
of treatment means can be considered significantly different from one
another (with the more conservative Steel procedure, however, the 1--2
and 2--3 comparisons fail to reach the two-sided 5\% level).
\paragraph{Further Notes on distribution assumptions.}
It is worth reminding the reader that non-parametric tests share
common assumptions with their parametric counterparts, in particular the
hypothesis of comparable dispersion. This also applies to permutation
tests. If we were to compare two distributions whose variances (or
whatever dispersion measure is used) strongly differ from one another,
it would not make sense to use either a parametric or a
non-parametric hypothesis test. Indeed, a given observation might be
drawn from one or the other distribution, but due to the overlap of
the two distributions with differing variances, it would not be possible
to associate the individual observation with either of them. In other
words, we lose the \emph{exchangeability} hypothesis.
However, a minor modification of the test statistic, as proposed by
Welch \cite{Welch:1951}, may be used in the case of non-constant
variance. Applying this principle to the etch rate data,
\begin{verbatim}
oneway.test(rate~RF,etch.rate)
\end{verbatim}
gives an $F$ value of 68.72 and a $p$-value largely $<.001$. As was
said for the Welch modified $t$-test (p.~\pageref{para:Welch}), the
degrees of freedom for the denominator (the residual) are adjusted;
they are less commonly reported.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% History
%% 2008-03-21: complete BIBD section and Latin Squares
\chapter{Randomized Blocks, Latin Squares, and Related Designs}
\label{cha:rand-blocks-latin}
\section{Summary of Chapter 4}
\section{Randomized Complete Block Design}
The Randomized Complete Block Design (RCBD) is a widely used tool to
study some effect of interest while controlling for potential nuisance
factor(s). It should not be confused with covariance analysis, whereby
responses are adjusted a posteriori to take nuisance factors into
account.
The so-called Effects model can be expressed as
\begin{equation}
\label{eq:1}
y_{ij}=\mu+\tau_i+\beta_j+\varepsilon_{ij}\qquad (i=1,2,\dots,a; j=1,2,\dots,b)
\end{equation}
subject to
\begin{equation}
\label{eq:2}
\sum_{i=1}^a\tau_i = 0 \quad \textrm{and} \quad \sum_{j=1}^b\beta_j = 0
\end{equation}
The fundamental ANOVA equation for the RCBD reduces to
\begin{equation}
\label{eq:3}
SS_T=SS_{\textrm{treat}}+SS_{\textrm{block}}+SS_E
\end{equation}
where \emph{treat} denotes the treatment factor and \emph{block} the blocking
variable. Residual SS, with $(a-1)(b-1)$ degrees of freedom, captures the
variance unexplained by the two other factors. The layout of this design is
quite comparable to that of a two-way ANOVA with one observation per cell: no
interaction term is estimable and the design is orthogonal, so terms can be
entered in any order in the model. Note that such an additive formulation of
the response variations is not always possible, especially if some interaction
between blocks and the factor of interest is to be expected, or is discovered
when inspecting residuals vs. fitted values. In this case, a factorial design
(Chap.~5 and 6) should be more appropriate to uncover the interaction effect.
Let's consider the following example (Tab.~4-3). A product developer decides
to investigate the effect of four different levels of extrusion pressure on
flicks, using an RCBD with batches of resin as blocks. The data are
contained in the file \texttt{vascgraft.txt} and are shown in the following
Table.
% modified from
% xtable(matrix(x,nr=4,byrow=T))
\begin{table}[ht]
\begin{center}
\begin{tabular}{rrrrrrrr}
\hline
& \multicolumn{6}{c}{Batch of Resin (Block)} & \\
\cline{2-7}
PSI & 1 & 2 & 3 & 4 & 5 & 6 & Total\\
\hline
1 & 90.30 & 89.20 & 98.20 & 93.90 & 87.40 & 97.90 & 556.9 \\
2 & 92.50 & 89.50 & 90.60 & 94.70 & 87.00 & 95.80 & 550.1\\
3 & 85.50 & 90.80 & 89.60 & 86.20 & 88.00 & 93.40 & 533.5\\
4 & 82.50 & 89.50 & 85.60 & 87.40 & 78.90 & 90.70 & 514.6\\
Total & 350.8 & 359.0 & 364.0 & 362.2 & 341.3 & 377.8 &
$y_{\cdot\cdot}=2155.1$\\
\hline
\end{tabular}
\end{center}
\end{table}
\begin{verbatim}
x <- scan("vascgraft.txt")
PSI.labels <- c(8500,8700,8900,9100)
vasc.graft <- data.frame(PSI=gl(4,6,24),block=gl(6,1,24),x)
vasc.graft.aov <- aov(x~block+PSI,vasc.graft)
\end{verbatim}
Figure~\ref{fig:vgraft} gives two different pictures of the data. On
the left panel, responses have been averaged over blocks, while on the
right, they have been averaged over treatments.
% boxplot(x~PSI,data=vasc.graft,xlab="PSI",
% main="Responses averaged over blocks")
% boxplot(x~block,data=vasc.graft,xlab="Blocks",
% main="Responses averaged over treatments")
\begin{figure}[htb]
\centering
\subfigure[]{\includegraphics[width=.35\textwidth]{./img/vgraft.eps}}
\subfigure[]{\includegraphics[width=.35\textwidth]{./img/vgraft_2.eps}}
\caption{Results of the Vascular Graft Experiment.}\label{fig:vgraft}
\end{figure}
If we want to display both pieces of information, we can use a so-called
\emph{interaction plot}, even though we are not considering an
interaction between the two factors. Figure~\ref{fig:vgraft_2} plots the
response as a function of treatment (x-axis) for each block, labelled
with a different color and line type.
% with(vasc.graft, interaction.plot(PSI,block,x,col=1:6))
\begin{figure}[htb]
\centering
\includegraphics[width=.5\textwidth]{./img/vgraft_3.eps}
\caption{Results of the Vascular Graft Experiment
(\emph{cont.}).}\label{fig:vgraft_2}
\end{figure}
Classical ANOVA results, obtained by issuing \verb|summary(vasc.graft.aov)| in
the \R\ shell, are reported in Table~\ref{tab:vasc.graft}. As can be seen, the
effect of treatment (PSI) is highly significant, and the results clearly argue
against the null.
% latex table generated in R 2.6.2 by xtable 1.5-2 package
% Fri Mar 21 14:19:19 2008
\begin{table}[ht]
\begin{center}
\begin{tabular}{lrrrrr}
\hline
& Df & Sum Sq & Mean Sq & F value & Pr($>$F) \\
\hline
block & 5 & 192.25 & 38.45 & 5.25 & 0.0055 \\
PSI & 3 & 178.17 & 59.39 & 8.11 & 0.0019 \\
Residuals & 15 & 109.89 & 7.33 & & \\
\hline
\end{tabular}
\end{center}
\caption{Results for the Model
$y=\mu+\textrm{PSI}_i+\textrm{block}_j$.}\label{tab:vasc.graft}
\end{table}
Ignoring the blocking structure would yield an incorrect analysis, although
the treatment effect would remain significant.
It is always good practice to check model adequacy after fitting the ANOVA
model. To do so, we have to check the relation between fitted values and
residuals (\emph{homoscedasticity}), as well as the normality (of the
residuals) hypothesis. Various plots are reproduced in
Figure~\ref{fig:vgraft_3}, including (standardized and raw) residuals
vs.\ fitted values, a QQ-plot, and leverage effects.
\begin{verbatim}
opar <- par(mfrow=c(2,2),cex=.8)
plot(vasc.graft.aov)
par(opar)
\end{verbatim}
\begin{figure}[htb]
\centering
\includegraphics[width=.7\textwidth]{./img/vgraft_4.eps}
\caption{Model checking for the Vascular Graft
data.}\label{fig:vgraft_3}
\end{figure}
\paragraph{Handling Missing Values.} If a missing value occurs in an RCBD,
treatments are no longer orthogonal to blocks. Two general solutions are
available:
\begin{itemize}
\item \emph{approximate analysis}: the missing observation is estimated, and
  the analysis proceeds as usual, subtracting one degree of freedom from the
  residual SS;
\item \emph{exact analysis}: it relies on the general regression significance
  test, which will not be covered here (please refer to pp.~133--136 of
  Montgomery's handbook).
\end{itemize}
\begin{verbatim}
# we delete the 10th observation
x2 <- x
x2[10] <- NA
vasc.graft2 <- data.frame(PSI=gl(4,6,24),block=gl(6,1,24),x2)
\end{verbatim}
We wish to estimate the missing value, say $y_{10}$, so that its contribution
to the error sum of squares is minimal. This is equivalent to finding
$\tilde{y}_{10}$ satisfying
\[
\min_y\;
\sum_{i=1}^a\sum_{j=1}^by_{ij}^2-\frac{1}{b}\sum_{i=1}^a\left(\sum_{j=1}^by_{ij}\right)^2-\frac{1}{a}\sum_{j=1}^b\left(\sum_{i=1}^ay_{ij}\right)^2+\frac{1}{ab}\left(\sum_{i=1}^a\sum_{j=1}^by_{ij}\right)^2
\]
Setting $\frac{\partial SS_E}{\partial y}=0$, we get
\[
\tilde{y}_{10}=\frac{ay_{i\cdot}^{'}+by_{\cdot j}^{'}-y_{\cdot\cdot}^{'}}{(a-1)(b-1)}
\]
where $y_{i\cdot}^{'}$, $y_{\cdot j}^{'}$ and $y_{\cdot\cdot}^{'}$ denote the
treatment, block and grand totals computed without the missing observation,
and $y_{10}$ will be imputed a value of 91.08.
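This computation is easy to sketch in \R. The helper below is hypothetical
(not part of any package) and assumes the RCBD is laid out as an $a\times b$
matrix with treatments in rows and blocks in columns:

```r
# Hypothetical helper: impute a single missing cell (i,j) of an RCBD stored
# as an a x b matrix (treatments in rows, blocks in columns).
impute.rcbd <- function(y, i, j) {
  a <- nrow(y); b <- ncol(y)
  yi <- sum(y[i, ], na.rm=TRUE)  # treatment total, missing cell excluded
  yj <- sum(y[, j], na.rm=TRUE)  # block total, missing cell excluded
  yg <- sum(y, na.rm=TRUE)       # grand total, missing cell excluded
  (a*yi + b*yj - yg) / ((a-1)*(b-1))
}
# e.g. impute.rcbd(matrix(vasc.graft2$x2, nrow=4, byrow=TRUE), 2, 4)
# should reproduce the imputed value for the deleted observation
```

On perfectly additive data (no error term), the formula recovers the deleted
value exactly, which provides a quick sanity check.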
\section{Latin Square Design}
The Latin Square design is another way to include blocking factors in a given
design; it allows us to account for two nuisance factors.
A Latin square is arranged by applying circular permutations of a
sequence of treatments (e.g.\ $\{A, B, C, D, E\}$) along the rows and columns.
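One way to generate such a cyclic arrangement in \R\ (a quick sketch, not the
only possible construction):

```r
# build a cyclic 5 x 5 Latin square: cell (i,j) receives treatment
# number (i+j-2) mod 5, so each row is a circular shift of the previous one
trt <- LETTERS[1:5]
latin5 <- sapply(1:5, function(j) trt[((1:5 + j - 2) %% 5) + 1])
rownames(latin5) <- paste("Batch", 1:5)
colnames(latin5) <- paste("Op", 1:5)
latin5
```

Every treatment appears exactly once in each row and each column, as required.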
The example given by Montgomery on the Rocket Propellant Problem is available
in the file \texttt{rocket.txt}, which can be imported using
\begin{verbatim}
rocket <- read.table("rocket.txt",header=T)
\end{verbatim}
Treatment allocation is illustrated in the following table.
% xtable(matrix(rocket$treat,nr=5,byrow=T))
% latex table generated in R 2.6.2 by xtable 1.5-2 package
% Fri Mar 21 17:48:50 2008
\begin{table}[ht]
\begin{center}
\begin{tabular}{rlllll}
\hline
& \multicolumn{5}{c}{Operators} \\
\cline{2-6}
Batch & 1 & 2 & 3 & 4 & 5 \\
\hline
1 & A & B & C & D & E \\
2 & B & C & D & E & A \\
3 & C & D & E & A & B \\
4 & D & E & A & B & C \\
5 & E & A & B & C & D \\
\hline
\end{tabular}
\end{center}
\end{table}
The following command
\begin{verbatim}
plot(y~op+batch+treat,rocket)
\end{verbatim}
allows us to sequentially inspect boxplots of \verb|y| as a function of
each of the three factors.
\begin{verbatim}
rocket.lm <- lm(y~factor(op)+factor(batch)+treat,rocket)
anova(rocket.lm)
\end{verbatim}
Because the design is balanced, the order in which terms are entered does not
matter. Here are the results of the ANOVA (Tab.~\ref{tab:rocket}). We should
only interpret the $F$-value associated with the treatment effect
(\verb|treat|).
% latex table generated in R 2.6.2 by xtable 1.5-2 package
% Fri Mar 21 17:58:05 2008
\begin{table}[ht]
\begin{center}
\begin{tabular}{lrrrrr}
\hline
& Df & Sum Sq & Mean Sq & F value & Pr($>$F) \\
\hline
factor(op) & 4 & 150.00 & 37.50 & 3.52 & 0.0404 \\
factor(batch) & 4 & 68.00 & 17.00 & 1.59 & 0.2391 \\
treat & 4 & 330.00 & 82.50 & 7.73 & 0.0025 \\
Residuals & 12 & 128.00 & 10.67 & & \\
\hline
\end{tabular}
\end{center}
\caption{Analysis of the Rocket data.}\label{tab:rocket}
\end{table}
Figure~\ref{fig:rocket} attempts to summarize the structure of the factor
effects on the measured response. As can be seen, a large proportion of the
overall variability in responses is captured by the treatment condition, and
the operator appears to convey more variance than the batch blocking factor itself.
% plot.design(y~factor(op)+factor(batch)+treat,data=rocket)
\begin{figure}[htb]
\centering
\includegraphics[width=.7\textwidth]{./img/rocket.eps}
\caption{Factors effects plot.}\label{fig:rocket}
\end{figure}
\section{Graeco-Latin Square Design}
\section{Balanced Incomplete Block Designs}
Balanced Incomplete Block Designs (BIBD) are a class of randomized block
designs in which not every treatment is observed in every block. If we denote
by $a$ the number of treatments, and by $k$ the maximum number of treatments
per block ($k<a$), a BIBD is obtained by arranging some of the $\binom{a}{k}$
possible treatment combinations so that every pair of treatments occurs
together in the same number, $\lambda$, of blocks.
Montgomery's catalyst experiment provides an example, with $a=4$ treatments
tested in $b=4$ blocks of size $k=3$. Assuming the data have been set up in a
data frame \verb|tab.4.21.df| with variables \verb|block|, \verb|treat| and
\verb|rep| (the response), the intrablock analysis, in which the treatment
effect is adjusted for blocks, is obtained with
\verb|summary(aov(rep~treat+Error(block),tab.4.21.df))|. The relevant
(\verb|Within|) stratum of the output reads
\begin{verbatim}
Error: Within
          Df  Sum Sq Mean Sq F value  Pr(>F)
treat      3 22.7500  7.5833  11.667 0.01074 *
Residuals  5  3.2500  0.6500
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\end{verbatim}
This way, we have computed adjusted MS for the catalyst effect. We might be
interested in the adjusted MS for the block effect. This can easily be found
using the appropriate error term, \verb|Error(treat)|, and we get
\begin{verbatim}
Error: treat
Df Sum Sq Mean Sq
treat 3 11.6667 3.8889
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
block 3 66.083 22.028 33.889 0.0009528 ***
Residuals 5 3.250 0.650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\end{verbatim}
If we want to get both estimates in a single pass, as \textbf{Minitab} does,
we can wrap the two calls to the \verb|aov()| function in a single function
with little effort. Table~\ref{tab:tab.4.21} summarizes both estimates
(unadjusted and adjusted) and the associated $p$-values.
\begin{table}[htb]
\centering
\begin{tabular}{lrrrr}
\hline
Effect & df & MS & F & $p$-value \\
\hline
\verb|treat| & 3 & 3.889 & & \\
\verb|treat| (Adj.) & 3 & 7.583 & 11.667 & 0.01074 \\
\verb|block| & 3 & 18.333 & & \\
\verb|block| (Adj.) & 3 & 22.028 & 33.889 & 0.00095 \\
\hline
\end{tabular}
\caption{Summary of BIB analysis.}
\label{tab:tab.4.21}
\end{table}
Another solution is to use the \verb|BIB.test()| function from the
\verb|agricolae| package. There is no formula interface in this function
call, so we have to pass separately the blocking factor, the fixed
treatment and the response variable.
\begin{verbatim}
require(agricolae)
BIB.test(tab.4.21.df$block,tab.4.21.df$treat,tab.4.21.df$rep,
         method="tukey",group=FALSE)
\end{verbatim}
\paragraph{Note.} I did not explore all the functionalities of this
function and its behavior (e.g.\ the \verb|group=| parameter).
Further, I cannot get a correct result with the above code!
Tukey pairwise differences (\verb|treat| factor) can be computed as follows:
\begin{verbatim}
tab.4.21.lm <- lm(rep~block+treat,tab.4.21.df)
treat.coef <- tab.4.21.lm$coef[5:7]
# effect for catalyst 4 (baseline) is missing, so we add it
treat.coef <- c(0,treat.coef)
pairwise.diff <- outer(treat.coef,treat.coef,"-")
\end{verbatim}
Inspecting the output of \verb|summary(tab.4.21.lm)|, we see that the standard
error of a pairwise difference is estimated to be 0.6982. More generally, this
SE can be obtained as $\sqrt{\frac{2k}{\lambda a}}\hat{\sigma}$. The
corresponding Tukey critical value ($1-\alpha=0.95$) is given by
\begin{verbatim}
crit.val <- qtukey(0.95,4,5)
\end{verbatim}
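As a quick numerical check of the standard error formula given above, using
the design constants of this BIB ($k=3$, $\lambda=2$, $a=4$) and the
intrablock error MS $\hat{\sigma}^2=0.65$:

```r
# standard error of a pairwise treatment difference in a BIB:
# sqrt(2k/(lambda*a)) * sigma.hat
k <- 3; lambda <- 2; a <- 4
sigma2.hat <- 0.65                      # intrablock error MS
se.diff <- sqrt(2*k/(lambda*a) * sigma2.hat)
round(se.diff, 4)                       # 0.6982, as reported by summary()
```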
However, the \verb|BIB.test()| function directly gives \textsc{lsd},
Tukey or Waller-Duncan tests for comparing the treatment levels.
Instead of a tabular summary, we may plot the confidence intervals for
each paired comparison (Fig.~\ref{fig:tab.4.21_2}, right).
\begin{verbatim}
ic.width <- crit.val*0.6982/sqrt(2)
xx <- pairwise.diff[lower.tri(pairwise.diff)]
plot(xx,1:6,xlab="Pairwise Difference
(95% CI)",ylab="",xlim=c(-5,5),pch=19,cex=1.2,axes=F)
axis(1,seq(-5,5))
mtext(c("4-1","4-2","4-3","1-2","1-3","2-3"),side=2,at=1:6,line=2,las=2)
segments(xx-ic.width,1:6,xx+ic.width,1:6,lwd=2)
abline(v=0,lty=2,col="lightgray")
\end{verbatim}
Next, we may be interested in assessing whether this BIB performs better
than a completely randomized design (without blocking). Following
\cite{Faraway:2002}, we can compute the relative efficiency as
\[
\frac{\sigma_{CRD}^2}{\sigma_{RCBD}^2}
\]
In this case, it is computed as
\begin{verbatim}
tab.4.21.lm.crd <- lm(rep~treat,tab.4.21.df)
(summary(tab.4.21.lm.crd)$sig/summary(tab.4.21.lm)$sig)^2
\end{verbatim}
and we get a relative efficiency of about 1.13. Thus, a CRD would require
13\% more observations to obtain the same level of precision as our BIB.
Recovering interblock information calls for some tricky manipulation within
\R. The best combined estimator can be expressed as
$\tau_i^*=\alpha_1\hat{\tau}_i+\alpha_2\tilde{\tau}_i$: it is a linear
combination of the intrablock and interblock estimators, with weights
inversely proportional to the variances of $\hat{\tau}_i$ and
$\tilde{\tau}_i$.
Following Equation~4-45, it can be estimated as
\begin{equation}
\tau_i^*=\frac{kQ_i(\sigma^2+k\sigma_{\beta}^2)+\left(\sum_{j=1}^bn_{ij}y_{\cdot
j}-kr\bar{y}_{\cdot\cdot}\right)\sigma^2}{(r-\lambda)\sigma^2+\lambda
a(\sigma^2 + k\sigma_{\beta}^2)}\qquad i=1,2,\dots,a
\end{equation}
As $\sigma^2$ and $\sigma_{\beta}^2$ are unknown parameters, they are replaced
by their estimates; we will use the error MS from the intrablock analysis
for $\sigma^2$.
%% TODO
%% compute the combined estimator
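As a first step toward this computation, $\sigma_{\beta}^2$ can be estimated
by the method of moments from the adjusted block MS. The formula
$\hat{\sigma}_{\beta}^2=\left[MS_{\textrm{block(adj.)}}-MS_E\right](b-1)/(N-a)$,
with $N$ the total number of observations (here $N=12$), is my assumption of
the usual moment estimator:

```r
# moment estimate of the block variance component, using the adjusted
# block MS and intrablock error MS from the BIB analysis above
a <- 4; b <- 4; N <- 12
ms.block.adj <- 22.028                  # adjusted block MS
ms.e <- 0.650                           # intrablock error MS
sigma2.beta <- (ms.block.adj - ms.e) * (b - 1) / (N - a)
round(sigma2.beta, 2)                   # about 8.02
```

This agrees closely with the REML estimate (8.009) returned by
\verb|lmer()| later in this section.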
Figure~\ref{fig:tab.4.21} (left) shows the observed response for this $4\times
4$ design. OLS fits have been superimposed for each ``random'' block. This
trellis graphic \cite{Becker:1995} has been produced using the following
commands:
\begin{verbatim}
require(lattice)
xyplot(rep~treat|block,tab.4.21.df,
aspect="xy",xlab="Catalyst",ylab="Response",
panel=function(x,y) {
panel.xyplot(x,y)
panel.lmline(x,y)
})
\end{verbatim}
\begin{figure}[htb]
\centering
\begin{minipage}[t]{.48\textwidth}\centering
\includegraphics[width=\textwidth]{./img/tab.4.21.eps}
\caption{The catalyst experiment. Response measured in each block as a
function of the type of catalyst ($1,2,3,4$) used.}\label{fig:tab.4.21}
\end{minipage}\hfill
\begin{minipage}[t]{.48\textwidth}\centering
\includegraphics[width=\textwidth]{./img/tab.4.21_2.eps}
\caption{Tukey 95\% simultaneous confidence
intervals.}\label{fig:tab.4.21_2}
\end{minipage}
\end{figure}
We would obtain the same results if we were to use the \verb|lme4| package,
which relies in this case on REML estimation.
\begin{verbatim}
require(lme4)
print(tab.4.21.lm <- lmer(rep~treat+(1|block),tab.4.21.df),corr=F)
\end{verbatim}
Now, we get a rather more informative output, including variance component
estimates together with the corresponding standard deviations.
\begin{verbatim}
Linear mixed-effects model fit by REML
Formula: rep ~ treat + (1 | block)
Data: tab.4.21.df
AIC BIC logLik MLdeviance REMLdeviance
44.22 46.64 -17.11 38.57 34.22
Random effects:
Groups Name Variance Std.Dev.
block (Intercept) 8.00907 2.83003
Residual 0.65035 0.80644
number of obs: 12, groups: block, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 74.9704 1.4963 50.11
treat1 -3.5573 0.6973 -5.10
treat2 -3.3541 0.6973 -4.81
treat3 -2.9704 0.6973 -4.26
\end{verbatim}
Should it be of interest to use other linear contrasts for \verb|treat|, we
shall simply remove the intercept from the previous model.
\begin{verbatim}
print(tab.4.21.lm0 <- lmer(rep~-1+treat+(1|block),tab.4.21.df))
\end{verbatim}
However, we can notice in the above output that the intercept term equals
74.97, which doesn't correspond to the observed mean for \verb|treat=4|. In
fact, this is the mean of the predicted means (for the 4th catalyst) across
the four blocks. This follows from the fact that we introduced \verb|block| as
a random effect, for which separate intercepts are computed
(\verb+(1|block)+).
\begin{verbatim}
coef(tab.4.21.lm)[[1]]$`(Intercept)`
mean(coef(tab.4.21.lm)[[1]][,1])
\end{verbatim}
\paragraph{Remark.} Constructing a BIB is not an easy task. One way to build
such a design is to start with a multifactorial design (most often used as a
\emph{screening} device when we have multiple factors of interest but want to
include only the relevant ones in the main trial). Quoting
\cite{Benoist:1994}, we can build a $L_{12}2^{11}$ design by selecting the
first column (1st factor), then applying a circular permutation to its
elements. The following code is a quick, though by no means optimized,
implementation of such a construction.\footnote{More efficient
  algorithms for permutations are available in various \R\ packages
  (e.g.~\texttt{vegan}, \texttt{sna}, \texttt{dprep}, \texttt{e1071},
  \texttt{gdata}, \texttt{magic}).}
\begin{verbatim}
col <- c(1,1,0,1,1,1,0,0,0,1,0)
perm <- function(x) {
  s <- length(x)
  m <- matrix(ncol=s, nrow=s)
  y <- rep(x, 2)
  m[,1] <- x
  for (i in 2:s) m[,i] <- y[i:(s+i-1)]
  m
}
col.perm <- perm(col)
bib11 <- rbind(rep(0,11), col.perm)
# check that the design is well balanced (row and column sums)
apply(bib11[-1,],1,sum)
apply(bib11,2,sum)
\end{verbatim}
With this particular design, we can check that there are exactly 6 factors per
block and, reciprocally, that each factor appears in exactly 6 blocks.
For ease of reading (at least from my point of view), we can plot the design
matrix rather than displaying it in tabular format
(Fig.~\ref{fig:tab.4.21_3}). This way, it looks like a confusion matrix.
\begin{figure}[htb]
\centering
\includegraphics[width=.50\textwidth]{./img/tab.4.21_3.eps}
\caption{A BIBD with 10 blocks $\times$ 10 factors.}\label{fig:tab.4.21_3}
\end{figure}
Other examples of block designs analyzed with \R\ are covered in
\cite{Faraway:2005} (Chapter~16).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% History
%% 2008-08-25: start of the chapter
\chapter{Introduction to Factorial Design}
\label{cha:intr-fact-design}
\section{Summary of Chapter 5}
Chapter~5 deals with the analysis of balanced two-factor designs. When
appropriately used, factorial designs increase design efficiency, and
it can be shown that the same accuracy can be obtained with fewer
trials than with separate one-way experiments. The fundamental
\textsc{anova} equation is extended to account for the variability
explained by a second factor and a possible interaction between the
two factors. The concept of interaction is often of primary interest
and needs to be well understood, both from a scientific and a
statistical point of view.
\section{The two-factor factorial design}
In the general case, the effects model reads
\begin{equation}\label{eq:two-factor-fd}
y_{ijk}=\mu+\tau_i+\beta_j+(\tau\beta)_{ij}+\varepsilon_{ijk}
\end{equation}
where $i$, $j$ ($i=1\dots a$, $j=1\dots b$) span the levels of factor
$A$ and $B$, while $k$ stands for the observation number ($k=1\dots
n$). The order in which the $abn$ observations are taken is selected
at random, so this design is said to be a \emph{completely randomized
design}.
If one or more factors are quantitative, a regression model is
easily formalized. Note that if we write down the normal equations
related to the above model, it can be shown that there are $a+b+1$
linear dependencies in the system of equations. As a consequence, the
parameters are not uniquely determined and we say that the model is
not directly \emph{estimable} without imposing some constraints. These
are usually: $\sum_{i=1}^a\hat{\tau}_i=0$,
$\sum_{j=1}^b\hat{\beta}_j=0$, $\sum_{i=1}^a\widehat{\tau\beta}_{ij}=0$
($j=1,2,\dots,b$) and $\sum_{j=1}^b\widehat{\tau\beta}_{ij}=0$
($i=1,2,\dots,a$).
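Under these constraints, the least squares estimators take the familiar form
of deviations from the grand mean:
\[
\hat{\mu}=\bar{y}_{\cdot\cdot\cdot},\qquad
\hat{\tau}_i=\bar{y}_{i\cdot\cdot}-\bar{y}_{\cdot\cdot\cdot},\qquad
\hat{\beta}_j=\bar{y}_{\cdot j\cdot}-\bar{y}_{\cdot\cdot\cdot},\qquad
\widehat{(\tau\beta)}_{ij}=\bar{y}_{ij\cdot}-\bar{y}_{i\cdot\cdot}-\bar{y}_{\cdot j\cdot}+\bar{y}_{\cdot\cdot\cdot}.
\]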
With some algebra, the (corrected) total sum of squares from
Model~\ref{eq:two-factor-fd} can be partitioned as
\begin{equation}\label{eq:two-factor-ss}
\begin{split}
\sum_{i=1}^a\sum_{j=1}^b\sum_{k=1}^n(y_{ijk}-\bar{y}_{\cdot\cdot\cdot})^2
&= \sum_{i=1}^a\sum_{j=1}^b\sum_{k=1}^n[
(\bar{y}_{i\cdot\cdot}-\bar{y}_{\cdot\cdot\cdot})+(\bar{y}_{\cdot
j\cdot}-\bar{y}_{\cdot\cdot\cdot}) \\
& \quad + (\bar{y}_{ij\cdot}-\bar{y}_{i\cdot\cdot}-\bar{y}_{\cdot
j\cdot}+\bar{y}_{\cdot\cdot\cdot})+(y_{ijk}-\bar{y}_{ij\cdot})]^2\\
&=
bn\sum_{i=1}^a(\bar{y}_{i\cdot\cdot}-\bar{y}_{\cdot\cdot\cdot})^2+an\sum_{j=1}^b(\bar{y}_{\cdot
j\cdot}-\bar{y}_{\cdot\cdot\cdot})^2 \\
& \quad + n\sum_{i=1}^a\sum_{j=1}^b(\bar{y}_{ij\cdot}-\bar{y}_{i\cdot\cdot}-\bar{y}_{\cdot
j\cdot}+\bar{y}_{\cdot\cdot\cdot})^2\\
& \quad + \sum_{i=1}^a\sum_{j=1}^b\sum_{k=1}^n(y_{ijk}-\bar{y}_{ij\cdot})^2
\end{split}
\end{equation}
Symbolically, this decomposition can also be expressed as:
\begin{equation}\label{eq:anova2}
SS_T=SS_A+SS_B+SS_{AB}+SS_E
\end{equation}
and as can be seen from the last component of the right-hand side of
Equation~\ref{eq:two-factor-ss}, there must be at least two replicates
($n\geq 2$) to obtain an error sum of squares. As for the one-way
layout, this component will be called the residual or the error term.
Hypotheses testing proceeds in three steps:
\begin{itemize}
\item equality of row treatment effects\\ $H_0:\; \tau_1=\tau_2=\dots=\tau_a=0$
\item equality of column treatment effects\\ $H_0:\; \beta_1=\beta_2=\dots=\beta_b=0$
\item no interaction between row and column treatment\\ $H_0:\;
(\tau\beta)_{ij}=0\quad \textrm{for all}\; i,j$
\end{itemize}
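Each of these hypotheses is tested by comparing the corresponding mean square
to the error mean square:
\[
F_A=\frac{MS_A}{MS_E},\qquad F_B=\frac{MS_B}{MS_E},\qquad
F_{AB}=\frac{MS_{AB}}{MS_E},
\]
with $a-1$, $b-1$ and $(a-1)(b-1)$ numerator degrees of freedom,
respectively, and $ab(n-1)$ denominator degrees of freedom; each null
hypothesis is rejected when the observed statistic exceeds the corresponding
$F$ quantile.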
Applied to the data found in \texttt{battery.txt}, we can set up a $3^2$
factorial design (two factors at three levels) very easily. The data
come from a study of the effect of temperature (°F) and a design
parameter with three possible choices. The aim is to design a battery
for use in a device subjected to extreme variations of temperature.
\begin{verbatim}
battery <- read.table("battery.txt",header=TRUE)
battery$Material <- as.factor(battery$Material)
battery$Temperature <- as.factor(battery$Temperature)
summary(battery)
\end{verbatim}
Now, the two-way ANOVA model, including an interaction effect, is
computed as follows:
\begin{verbatim}
battery.aov <- aov(Life~Material*Temperature,data=battery)
\end{verbatim}
Note that \verb|Life~Material*Temperature| is equivalent to
\verb|Life~Material+Temperature+|\newline
\verb|Material:Temperature|, where each
effect is given explicitly, or to \verb|Life~.+.^2|, where all factors
included in the data frame are entered, together with the
second-order interaction(s).
Results are obtained using \verb|summary(battery.aov)|, and are
printed in Table~\ref{tab:battery}. All three effects are significant,
especially the Temperature effect, which accounts for about 50\% of the
total variability in battery life.
% latex table generated in R 2.7.1 by xtable 1.5-2 package
% Mon Aug 25 20:10:32 2008
\begin{table}[htb]
\begin{center}
\begin{tabular}{lrrrrr}
\hline
& Df & Sum Sq & Mean Sq & F value & Pr($>$F) \\
\hline
Material & 2 & 10683.72 & 5341.86 & 7.91 & 0.0020 \\
Temperature & 2 & 39118.72 & 19559.36 & 28.97 & 0.0000 \\
Material:Temperature & 4 & 9613.78 & 2403.44 & 3.56 & 0.0186 \\
Residuals & 27 & 18230.75 & 675.21 & & \\
\hline
\end{tabular}
\caption{\textsc{anova} table for the $3^2$ battery experiment.}
\label{tab:battery}
\end{center}
\end{table}
Most of the time, a plot of the averaged response variable will be
very useful to gain insight into the effects displayed in the
\textsc{anova} table. In Figure~\ref{fig:battery}, we have plotted the
average Life $\bar{y}_{ij\cdot}$ as a function of Temperature, for
each Material type. Each point in the graph is thus the mean of 4
observations. We call this an interaction plot.
\begin{verbatim}
with(battery, interaction.plot(Temperature,Material,Life,type="b",pch=19,
fixed=T,xlab="Temperature (°F)",ylab="Average life"))
\end{verbatim}
It can be seen that average life decreases as temperature increases,
with Material type 3 leading to extended battery life compared to the
others, especially at higher temperatures; hence the interaction effect.
Another useful plot is the effects plot, which can be obtained with
\verb|plot.design()|, which takes as argument the same formula as
that passed to the \verb|aov()| function. Thus,
\begin{verbatim}
plot.design(Life~Material*Temperature,data=battery)
\end{verbatim}
gives the picture given in Figure~\ref{fig:battery2}a. The large
Temperature effect is reflected in the range of battery life variation
induced by its manipulation.
\begin{figure}[htb]
\centering
\includegraphics[width=.7\textwidth]{./img/battery.eps}
\caption{Material type--temperature plot for the battery life
experiment.}\label{fig:battery}
\end{figure}
Now, we have to follow the same route as in Chapter~3 and run
multiple comparisons as well as check model adequacy. These are
basically the same principles as those described
on pp.~\pageref{sec:ModelCheck} and \pageref{sec:MCP}, so we don't go
into further detail in this chapter. Note, however, that model
checking should be done on each ``treatment'' (i.e.\ each crossing of
factor levels).
\begin{figure}[htb]
\centering
\subfigure[]{\includegraphics[width=.45\textwidth]{./img/battery2.eps}}
\subfigure[]{\includegraphics[width=.45\textwidth]{./img/battery3.eps}}
\caption{(a) Effect display. (b) Diagnostic plots.}\label{fig:battery2}
\end{figure}
With such a design, Tukey's HSD comparisons are widely appreciated by
researchers. Applying \verb|TukeyHSD(battery.aov,which="Material")|
gives the following results:
\begin{verbatim}
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Life ~ . + .^2, data = battery)
$Material
diff lwr upr p adj
2-1 25.16667 -1.135677 51.46901 0.0627571
3-1 41.91667 15.614323 68.21901 0.0014162
3-2 16.75000 -9.552344 43.05234 0.2717815
\end{verbatim}
But this is not actually what we should compute, because the interaction
is significant: the effect of Material depends on which level of
Temperature is considered. If we decide to study the material effect
at 70°F, we get a somewhat comparable picture (I do it by hand, as I
cannot find a proper \R\ way), but this is the right way to compute mean
contrasts in the presence of a significant interaction.
\begin{verbatim}
# we compute the three means at Temperature=70°F
mm <- with(subset(battery, Temperature==70),
           aggregate(Life,list(M=Material),mean))
# next the studentized range quantile times the standard error of a mean
# (based on the pooled MS from the ANOVA)
val.crit <- qtukey(.95,3,27)*sqrt(unlist(summary(battery.aov))[["Mean Sq4"]]/4)
# finally compare the observed differences of means with the critical value
diff.mm <- c(d.3.1=mm$x[3]-mm$x[1],d.3.2=mm$x[3]-mm$x[2],d.2.1=mm$x[2]-mm$x[1])
names(which(diff.mm > val.crit))
\end{verbatim}
In conclusion, only Material type 3 vs. type 1 and Material type 2
vs. type 1 appear to be significantly different when Temperature is
fixed at 70°F.
Model adequacy, or \emph{residual analysis}, is shown in
Figure~\ref{fig:battery2}b: this includes a plot of residuals or
standardized residuals against fitted values, a Q-Q plot, and a plot of
leverage and Cook's distance. For the two-factor factorial model,
residuals are defined as $e_{ijk}=y_{ijk}-\hat{y}_{ijk}$. Since
$\hat{y}_{ijk}=\bar{y}_{ij\cdot}$ (we average over the observations in the
$ij$th cell), the above equation is equivalent to
\begin{equation}\label{eq:twofactor-resid}
e_{ijk}=y_{ijk}-\bar{y}_{ij\cdot}
\end{equation}
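Equation~\ref{eq:twofactor-resid} can be verified numerically; the sketch
below uses a small synthetic dataset (not the battery data) so that it
stands alone:

```r
# residuals of a two-factor model with interaction equal the
# within-cell deviations y_ijk - ybar_ij.
set.seed(1)
d <- expand.grid(A=gl(2,1), B=gl(3,1), rep=1:2)
d$y <- rnorm(nrow(d))
fit <- aov(y ~ A*B, data=d)
cell.means <- ave(d$y, d$A, d$B)  # ybar_ij. repeated for each observation
all.equal(as.numeric(residuals(fit)), d$y - cell.means)  # TRUE
```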
Examining the plot of residuals vs.\ fitted values, we can see that a
larger variance is associated with larger fitted values, and two
observations (2 and 4) are highlighted in Figure~\ref{fig:battery2}b
(top left panel); in other words, the 15°F--material type 1 cell
contains extreme residuals that account for the inequality of
variance. This is easily seen using a command like
\verb|with(battery, tapply(Life,list(Material,Temperature),var))|, which gives
\begin{verbatim}
15 70 125
1 2056.9167 556.9167 721.0000
2 656.2500 160.2500 371.0000
3 674.6667 508.2500 371.6667
\end{verbatim}
We could also consider a model without interaction, where
appropriate. This amounts to removing the $(\tau\beta)_{ij}$ term in
Model~\ref{eq:two-factor-fd}. Applied to the battery life data,
\verb|summary(battery.aov2 <- aov(Life~Material+Temperature,data=battery))|
leads to the following results:
\begin{verbatim}
Df Sum Sq Mean Sq F value Pr(>F)
Material 2 10684 5342 5.9472 0.006515 **
Temperature 2 39119 19559 21.7759 1.239e-06 ***
Residuals 31 27845 898
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\end{verbatim}
Obviously, the two main effects are still highly significant. However,
residual analysis of this ``reduced'' model (Fig.~\ref{fig:battery4})
shows that a no-interaction model is not appropriate. In this figure,
we plot the difference $\bar{y}_{ij\cdot}-\hat{y}_{ijk}$ against the
fitted values of the no-interaction model,
$\hat{y}_{ijk}=\bar{y}_{i\cdot\cdot}+\bar{y}_{\cdot j\cdot}-\bar{y}_{\cdot\cdot\cdot}$. This
difference between the observed cell means and the estimated cell means
assuming no interaction should show no structure; any pattern in this
plot is thus suggestive of the presence of an interaction.
\begin{verbatim}
mm2 <- with(battery, tapply(Life,list(Material,Temperature),mean))
mm2 <- as.vector(mm2)
plot(fitted(battery.aov2)[seq(1,36,by=4)],mm2-fitted(battery.aov2)[seq(1,36,by=4)],
xlab="",ylab=expression(bar(y)[ij.]-hat(y)[ijk]),
pch=19,axes=FALSE,ylim=c(-30,30))
axis(1,at=seq(0,200,by=50),pos=0)
text(155,4,expression(hat(y)[ijk]),pos=4)
axis(2,at=seq(-30,30,by=10),las=1)
\end{verbatim}
\begin{figure}[htb]
\centering
\includegraphics[width=.5\textwidth]{./img/battery4.eps}
\caption{Plot of $\bar{y}_{ij\cdot}$ vs. $\hat{y}_{ijk}$.}\label{fig:battery4}
\end{figure}
There is a clear tendency for the differences to alternate in sign as the
fitted values $\hat{y}_{ijk}$ increase. To highlight that
pattern, we can superimpose a loess line on the above plot.
\begin{verbatim}
yy <- order(fitted(battery.aov2)[seq(1,36,by=4)])
xx.fit <- fitted(battery.aov2)[seq(1,36,by=4)]
yy.fit <- mm2-fitted(battery.aov2)[seq(1,36,by=4)]
lines(xx.fit[yy],predict(loess(yy.fit[yy]~xx.fit[yy])),col="lightgray",lwd=2)
\end{verbatim}
Now, suppose that we have only a single replicate per combination of
factors. If we turn back to Equation~\ref{eq:two-factor-ss}, we see
that the error term is not estimable because it is confounded with
the interaction term. The corresponding expected mean square turns out to
be
\[
\sigma^2+\frac{\sum\sum(\tau\beta)_{ij}^2}{(a-1)(b-1)}.
\]
As a consequence, no tests on main effects can be carried out unless
the interaction effect is zero, i.e.\ $(\tau\beta)_{ij}=0$. If, after
removing this term, the effects model is correct, then the residual MS
above is an unbiased estimator of $\sigma^2$.
There is a test for determining whether an interaction is present.
Tukey's non-additivity, or curvature, test \cite{Tukey:1949a}
consists in partitioning the residual SS into a single-degree-of-freedom
component due to non-additivity, or interaction, and a component for
error with $(a-1)(b-1)-1$ degrees of freedom. An $F$ test of
$\frac{SS_N}{SS_E/\left[(a-1)(b-1)-1\right]}$, with 1 and
$(a-1)(b-1)-1$ degrees of freedom, at the $\alpha$ level, allows us to
reject the null hypothesis of no interaction.
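This single-degree-of-freedom test can be carried out by hand by adding the
squared fitted values of the additive model as an extra covariate and
testing that term; a self-contained sketch on synthetic data (not the
impurity data analyzed below):

```r
# Tukey's one-df non-additivity test: regress on the squared fitted
# values of the additive model and test the extra term
set.seed(42)
d <- expand.grid(A=gl(3,1), B=gl(5,1))  # one observation per cell
d$y <- rnorm(nrow(d))
add.fit <- lm(y ~ A + B, data=d)
d$fit2 <- fitted(add.fit)^2
full.fit <- lm(y ~ A + B + fit2, data=d)
tukey.t <- coef(summary(full.fit))["fit2", "t value"]
# the corresponding F test on 1 and (a-1)(b-1)-1 df is simply tukey.t^2
```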
An example of this procedure is shown using \texttt{impurity.txt}
which contains data from a study of a chemical product and its
impurities resulting from two factors---pressure and temperature.
\begin{verbatim}
impurity <- read.table("impurity.txt",header=TRUE)
impurity$Temperature <- as.factor(impurity$Temperature)
impurity$Pressure <- as.factor(impurity$Pressure)
\end{verbatim}
Applying a full factorial model with interaction gives no tests for
the main effects, as expected. The command used to display the table is
rather simple:\\
\verb|summary(aov(N~Temperature*Pressure,impurity))|
\begin{verbatim}
Df Sum Sq Mean Sq
Temperature 2 23.333 11.667
Pressure 4 11.600 2.900
Temperature:Pressure 8 2.000 0.250
\end{verbatim}
The idea is thus to decompose the SS of the
Temperature$\times$Pressure term into two additional terms. Although
we could do it ourselves, there is an interesting function in the
\verb|alr3| package which provides both a graphical assessment of the
presence of a potential interaction and the formal Tukey
test. In fact, it amounts to a test on a quadratic term added to
the model $y \sim A+B$.
\begin{verbatim}
library(alr3)
residual.plots(lm(N~Temperature+Pressure,impurity))
\end{verbatim}
As a result, we get a test statistic of 0.602 with a $p$-value of
0.547. Montgomery provides an $F$ test, as described in the procedure
above, which equals 0.36. This is exactly $0.602^2$, because
\verb|residual.plots()| returns a $t$-test. As a side effect, this
function also plots studentized residuals against fitted values, with
the fitted quadratic term (dotted line), as shown in
Figure~\ref{fig:impurity} (lower left panel).
\begin{figure}[htb]
\centering
\includegraphics[width=.7\textwidth]{./img/impurity.eps}
\caption{Curvature test for the impurity data.}\label{fig:impurity}
\end{figure}
\section{General factorial design, response curves and surfaces}
The model described in the preceding section can be generalized to any
number of fixed effects, and there will be as much second order
interaction terms as there are factors, plus third order interaction
term(s).
As an example of a three-factor design, we can consider the data in
the \texttt{bottling.txt} file. In this study, a soft drink bottler is
interested in obtaining more uniform fill heights in the bottles
produced by his manufacturing process. The process engineer can
control three variables during the filling process: the percent
carbonation (A), the operating pressure in the filler (B), and the
line speed, i.e.\ the number of bottles produced per minute (C). The
factorial model can be written as
\[
y \sim A+B+C+AB+AC+BC+ABC
\]
where $y$ is the response variable, i.e.\ the fill height deviation.
We set up the data as follows:
\begin{verbatim}
bottling <- read.table("bottling.txt",header=TRUE,
colClasses=c("numeric",rep("factor",3)))
summary(bottling)
\end{verbatim}
and a brief summary plot is shown in Figure~\ref{fig:bottling}. The
commands used for this display are rather simple, though we could
probably do better.
\begin{verbatim}
opar <- par(mfrow=c(2,2),cex=.8)
boxplot(Deviation~.,data=bottling,las=2,cex.axis=.8,ylab="Deviation")
abline(h=0,lty=2)
par(las=1)
mm <- with(bottling, tapply(Deviation,Carbonation,mean))
ss <- with(bottling, tapply(Deviation,Carbonation,sd))
bp <- barplot(mm,xlab="Carbonation",ylab="Deviation",ylim=c(-2,9))
arrows(bp,mm-ss/sqrt(4),bp,mm+ss/sqrt(4),code=3,angle=90,length=.1)
with(bottling, interaction.plot(Carbonation,Pressure,Deviation,type="b"))
with(bottling, interaction.plot(Carbonation,Speed,Deviation,type="b"))
par(opar)
\end{verbatim}
It is used here to highlight the different possible views of this
highly structured dataset. The top left panel is a plot of each
``cell'', i.e.\ each combination of the three factors, with $n=4$
observations per cell. This is rather uninformative, as we are mainly
interested in the effects of the factors themselves, as well as their
potential interactions, on the variation of the response variable. The
top right panel presents data aggregated over $B$ and $C$: the observed
means, together with their standard errors, are plotted against the
three levels of Carbonation. Finally, the two lower panels are
interaction plots, but each displays only two factors at a time, so
we still lack a complete overview of the main and interaction
effects.
\begin{figure}[htb]
\centering
\includegraphics[width=.7\textwidth]{./img/bottling.eps}
\caption{The bottling dataset.}\label{fig:bottling}
\end{figure}
The three-way \textsc{anova} model is summarized in
Table~\ref{tab:bottling}.
\begin{verbatim}
summary(bottling.aov <- aov(Deviation~.^3,bottling))
\end{verbatim}
% latex table generated in R 2.7.1 by xtable 1.5-2 package
% Tue Aug 26 11:10:05 2008
\begin{table}[ht]
\begin{center}
\begin{tabular}{lrrrrr}
\hline
& Df & Sum Sq & Mean Sq & F value & Pr($>$F) \\
\hline
Carbonation & 2 & 252.75 & 126.37 & 178.41 & 0.0000 \\
Pressure & 1 & 45.37 & 45.37 & 64.06 & 0.0000 \\
Speed & 1 & 22.04 & 22.04 & 31.12 & 0.0001 \\
Carbonation:Pressure & 2 & 5.25 & 2.63 & 3.71 & 0.0558 \\
Carbonation:Speed & 2 & 0.58 & 0.29 & 0.41 & 0.6715 \\
Pressure:Speed & 1 & 1.04 & 1.04 & 1.47 & 0.2486 \\
Carbonation:Pressure:Speed & 2 & 1.08 & 0.54 & 0.76 & 0.4869 \\
Residuals & 12 & 8.50 & 0.71 & & \\
\hline
\end{tabular}
\caption{Results of the saturated model for the bottling
data.}\label{tab:bottling}
\end{center}
\end{table}
As can be seen from the \textsc{anova} table, all main effects are
significant, while none of the four interaction effects is. Note,
however, that the Carbonation$\times$Pressure interaction only
marginally exceeds the conventional 5\% significance level. These
results suggest that we may remove the interaction terms, which is
done in the next step. (Note that we could have used the
\verb|update()| command, which allows one to quickly update a given
model, but here it is rather tedious to remove all interaction
effects one by one.)
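For completeness, the \verb|update()| route can be sketched as follows on a small made-up balanced design (the data frame \texttt{dd} and its random response are invented purely for illustration):

```r
## Hypothetical data: a balanced two-factor design with 3 replicates
set.seed(101)
dd <- expand.grid(A=gl(2,1), B=gl(2,1), rep=1:3)
dd$y <- rnorm(nrow(dd))
full <- aov(y ~ A * B, data=dd)
## update() rewrites the formula of a fitted model: drop the interaction
reduced <- update(full, . ~ . - A:B)
formula(reduced)  # y ~ A + B
```

The `. ~ .` notation stands for ``same response, same terms'', from which individual terms can be added or removed.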
\begin{verbatim}
bottling.aov2 <- aov(Deviation~.,bottling)
anova(bottling.aov2,bottling.aov)
\end{verbatim}
Table~\ref{tab:bottling2} shows the new $p$-values associated with the
$F$ tests on the main effects only. The last \R\ command in the above
code formally tests the benefit of including the interaction terms in
the model. As the $F$-test is non-significant ($p=0.225$), we can
safely remove these terms. We could have anticipated the absence of
interactions simply by looking at the interaction plots shown in
Figure~\ref{fig:bottling}: in each case, the lines are nearly
parallel, which is what the absence of interaction in the factorial
model implies.
% latex table generated in R 2.7.1 by xtable 1.5-2 package
% Tue Aug 26 11:32:50 2008
\begin{table}[ht]
\begin{center}
\begin{tabular}{lrrrrr}
\hline
& Df & Sum Sq & Mean Sq & F value & Pr($>$F) \\
\hline
Carbonation & 2 & 252.75 & 126.37 & 145.89 & 0.0000 \\
Pressure & 1 & 45.37 & 45.37 & 52.38 & 0.0000 \\
Speed & 1 & 22.04 & 22.04 & 25.45 & 0.0001 \\
Residuals & 19 & 16.46 & 0.87 & & \\
\hline
\end{tabular}
\caption{Results of the reduced model for the bottling data, showing
only significant main effects.}\label{tab:bottling2}
\end{center}
\end{table}
Now, if we turn back to the battery life experiment, what if we include a
quadratic term for the quantitative variable, Temperature?
\begin{verbatim}
battery$Temperature.num <- as.numeric(as.character(battery$Temperature))
battery.aov3 <- aov(Life~Material+Temperature.num+I(Temperature.num^2)
+Material:Temperature.num+Material:I(Temperature.num^2),
data=battery)
summary(battery.aov3)
\end{verbatim}
Note the use of the \verb|as.character| conversion to get a proper
interpretation of the numerical values of the Temperature factor. If
instead we used \verb|as.numeric(battery$Temperature)| directly, we
would get a $1,2,3$ coding, which is not what we want
since we plan to predict actual values of $y$ (see below).
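To see why the conversion matters, consider a toy factor (made up here for illustration) holding the three temperature values:

```r
## A factor's numeric codes are level indices, not the labels themselves
Temperature <- factor(c(15, 70, 125, 15, 70, 125), levels=c(15, 70, 125))
as.numeric(Temperature)                # level indices: 1 2 3 1 2 3
as.numeric(as.character(Temperature))  # actual values: 15 70 125 15 70 125
```

The round-trip through \verb|as.character| recovers the labels, which are then parsed as genuine numbers.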
This time, the model can be written symbolically as
\[
y \sim A + B + B^2 + AB + AB^2
\]
with $B$ denoting the Temperature factor. Summary of the results is
shown in Table~\ref{tab:battery2}.
% latex table generated in R 2.7.1 by xtable 1.5-2 package
% Tue Aug 26 11:52:19 2008
\begin{table}[ht]
\begin{center}
\begin{tabular}{lrrrrr}
\hline
& Df & Sum Sq & Mean Sq & F value & Pr($>$F) \\
\hline
Material & 2 & 10683.72 & 5341.86 & 7.91 & 0.0020 \\
Temperature.num & 1 & 39042.67 & 39042.67 & 57.82 & 0.0000 \\
I(Temperature.num\verb|^|2) & 1 & 76.06 & 76.06 & 0.11 & 0.7398 \\
Material:Temperature.num & 2 & 2315.08 & 1157.54 & 1.71 & 0.1991 \\
Material:I(Temperature.num\verb|^|2) & 2 & 7298.69 & 3649.35 & 5.40 & 0.0106 \\
Residuals & 27 & 18230.75 & 675.21 & & \\
\hline
\end{tabular}
\caption{Fitting the battery life data with an additional quadratic
effect of Temperature.}\label{tab:battery2}
\end{center}
\end{table}
If we look at the predicted values for this model, the results shown
in Figure~\ref{fig:battery5} are more in agreement with the intuitive
idea that there is an optimal Temperature that depends on
Material type (cf. the significant $AB^2$ interaction effect in
Table~\ref{tab:battery2}), and for which battery life reaches its
maximum.
\begin{verbatim}
new <- data.frame(Temperature.num=rep(seq(15,125,by=5),3),Material=gl(3,23))
new$fit <- predict(battery.aov3,new)
opar <- par(las=1)
# we first plot the fitted values
with(new, interaction.plot(Temperature.num,Material,fit,legend=FALSE,
xlab="Temperature",ylab="Life",ylim=c(20,190)))
txt.leg <- paste("Material type",1:3)
text(5:7,new$fit[new$Temperature.num==c(45,55,65)]-c(3,3,-20),txt.leg,pos=1)
# next the observed values
points(rep(c(1,15,23),each=12),battery$Life,pch=19)
par(opar)
\end{verbatim}
Note that to obtain a smooth curve, we have to interpolate the
predictions over the whole temperature domain, with Temperature
treated as a numeric variable. This is done with the
\verb|seq(15,125,by=5)| statement used when constructing the
prediction data frame.
\begin{figure}[htb]
\centering
\includegraphics[width=.5\textwidth]{./img/battery5.eps}
\caption{Model fit for the battery life experiment, including
non-linear effect of temperature.}\label{fig:battery5}
\end{figure}
When more than one factor is quantitative, we may use a response
surface to model the relationship between the response variable and
the design factors. This is illustrated using the data contained in
\texttt{toollife.txt}, which describes results gathered in a
study of the effect of cutting speed ($A$) and tool angle ($B$) on the
effective life of a cutting tool. The model fitted to the data is of
the following form
\[
y \sim A+B+A^2+B^2+AB^2+A^2B+AB
\]
Although we are mainly interested in the quadratic terms, both
quadratic and linear effects are crossed in the interaction terms so as
to preserve the \emph{hierarchy principle} (the inclusion of a
high-order term should be accompanied by the inclusion of the
lower-order terms it contains), as in the preceding example.
\begin{verbatim}
tool <- read.table("toollife.txt",header=TRUE)
tool.lm <- lm(Life~Angle*Speed+I(Angle^2)*I(Speed^2)+Angle:I(Speed^2)
+I(Angle^2):Speed,tool)
tmp.angle <- seq(15,25,by=.1)
tmp.speed <- seq(125,175,by=.5)
tmp <- list(Angle=tmp.angle,Speed=tmp.speed)
new <- expand.grid(tmp)
new$fit <- c(predict(tool.lm,new))
\end{verbatim}
Partial output is reproduced below. Note that these are sequential
sums of squares, which are orthogonal here because of the structure of
the $3^2$ design itself.
\begin{verbatim}
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.068e+03 7.022e+02 -1.521 0.1626
Angle 1.363e+02 7.261e+01 1.877 0.0932 .
Speed 1.448e+01 9.503e+00 1.524 0.1619
I(Angle^2) -4.080e+00 1.810e+00 -2.254 0.0507 .
I(Speed^2) -4.960e-02 3.164e-02 -1.568 0.1514
Angle:Speed -1.864e+00 9.827e-01 -1.897 0.0903 .
I(Angle^2):I(Speed^2) -1.920e-04 8.158e-05 -2.353 0.0431 *
Angle:I(Speed^2) 6.400e-03 3.272e-03 1.956 0.0822 .
Speed:I(Angle^2) 5.600e-02 2.450e-02 2.285 0.0481 *
\end{verbatim}
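The orthogonality claim can be checked directly: in a balanced design, sequential (Type~I) sums of squares do not depend on the order in which terms enter the model. A minimal sketch on made-up balanced data:

```r
## Sequential SS are order-invariant when the design is balanced
set.seed(42)
dd <- expand.grid(A=gl(3,1), B=gl(3,1), rep=1:2)
dd$y <- rnorm(nrow(dd))
ssA.first <- anova(lm(y ~ A + B, dd))["A", "Sum Sq"]  # A entered first
ssA.last  <- anova(lm(y ~ B + A, dd))["A", "Sum Sq"]  # A entered last
all.equal(ssA.first, ssA.last)  # TRUE
```

With unbalanced data the two quantities would generally differ, and the distinction between types of sums of squares becomes important.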
Rather than interpreting these results, we are mainly interested in
the way we can convey this information in an informative graphical
display. To this end, we may plot the fitted values against both
continuous predictor values, after interpolation. This is called a
contour plot. Figure~\ref{fig:tool} has been produced using the function
\verb|contourplot()| of the \verb|lattice| package.
\begin{verbatim}
require(lattice)
contourplot(fit~Angle*Speed,data=new,cuts=8,region=T,col.regions=gray(7:16/16))
\end{verbatim}
\begin{figure}[htb]
\centering
\includegraphics[width=.5\textwidth]{./img/tool.eps}
\caption{Contour plot for the cutting tool study.}\label{fig:tool}
\end{figure}
\section{Blocking in a factorial design}
Until now, we have only discussed factorial designs for completely
randomized experiments. As already discussed in Chapter~4, such an
approach has to be avoided when possible nuisance factors that cannot
be controlled are present. The idea is then to run a single replicate
of the complete factorial experiment within each block. This leads to
the model
\begin{equation}\label{eq:factorial-blocking}
y_{ijk}=\mu+\tau_i+\beta_j+(\tau\beta)_{ij}+\delta_k+\varepsilon_{ijk}
\end{equation}
with the same notation as those of Model~\ref{eq:two-factor-fd}. Now,
we have included an effect for the $k$th block, but we assume that
interaction between blocks and treatment is negligible (as in the
\textsc{rcbd}). Interaction terms, $(\tau\delta)_{ik}$,
$(\beta\delta)_{jk}$, and $(\tau\beta\delta)_{ijk}$, are thus confounded
within the error term.
The \texttt{intensity.txt} file contains data from an experiment aiming
at improving the ability to detect targets on a radar scope. Two
factors were considered, namely the amount of background noise
(``ground clutter'') on the scope and the type of filter placed over
the screen. They are considered as fixed effects. Because of operator
availability and varying degrees of knowledge, it is convenient to
select an operator and keep him at the scope until all the necessary
runs have been made. Operators will thus be considered as blocks. We
have then $3\times 2$ treatment combinations arranged in a randomized
complete block design. Data are summarized in Figure~\ref{fig:intensity}.
\begin{verbatim}
intensity <- read.table("intensity.txt",header=TRUE,
colClasses=c("numeric",rep("factor",3)))
require(lattice)
xyplot(Intensity~Ground|Operator,data=intensity,groups=Filter,
panel=function(x,y,...){
subs <- list(...)$subscripts
panel.xyplot(x,y,pch=c(1,19),...)
panel.superpose(x,y,panel.groups="panel.lmline",lty=c(1,2),...)
},key=list(text=list(lab=as.character(1:2)),lines=list(lty=1:2,col=1),
corner=c(1,.95),title="Filter Type",cex.title=.8),col=1)
\end{verbatim}
We also fit an OLS line separately for each Filter type and each
operator, to better appreciate visually the pattern of responses. As
can be seen, there is notable variability between individual slopes
and intercepts. Blocking is thus a good way to uncover the operators'
intrinsic effect.
\begin{figure}[htb]
\centering
\includegraphics[width=.7\textwidth]{./img/intensity.eps}
\caption{The intensity data.}\label{fig:intensity}
\end{figure}
To apply the \textsc{anova} model, we basically follow the steps
explained in Chapter~4 and specify blocks as a separate stratum using
the \verb|Error()| option.
\begin{verbatim}
intensity.aov <- aov(Intensity~Ground*Filter+Error(Operator),intensity)
summary(intensity.aov)
\end{verbatim}
In Table~\ref{tab:intensity}, we see that \R\ does not display the
block effects when we use the \verb|summary()| command. To obtain the
SS specific to blocks, we rather call the \verb|aov| object directly,
e.g.\ \verb|intensity.aov|, at the \R\ command prompt.
\begin{verbatim}
Call:
aov(formula = Intensity ~ Ground * Filter + Error(Operator),
data = intensity)
Grand Mean: 94.91667
Stratum 1: Operator
Terms:
Residuals
Sum of Squares 402.1667
Deg. of Freedom 3
Residual standard error: 11.57824
(...)
\end{verbatim}
We discarded the rest of the output, which contains the stratum~2 SS
already included in Table~\ref{tab:intensity}. What should be noted is
that the blocking factor SS is rather large compared to the other main
effect SS (e.g., Ground, 335.58) or to the error term (166.33). This
confirms our intuition, based on the inspection of
Figure~\ref{fig:intensity}, that there is large inter-individual
variation with respect to the response variable. Computing the SS for
the blocking factor follows from the above formulation
(Equation~\ref{eq:factorial-blocking}), and it can be shown that
\begin{equation}\label{eq:ss-blocking}
SS_{\textrm{blocks}}=\frac{1}{ab}\sum_{k=1}^ny_{\cdot\cdot k}^2-\frac{y_{\cdot\cdot\cdot}^2}{abn}
\end{equation}
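This block SS formula can be checked numerically; below is a sketch on synthetic data, with the sizes $a=3$, $b=2$, $n=4$ chosen to mimic the intensity design (the response values are made up):

```r
## Block SS from the formula above, on made-up data
set.seed(7)
a <- 3; b <- 2; n <- 4
dd <- expand.grid(A=gl(a,1), B=gl(b,1), Block=gl(n,1))
dd$y <- rnorm(nrow(dd), mean=95, sd=10)
blk.tot <- tapply(dd$y, dd$Block, sum)                  # y_{..k}
ss.blocks <- sum(blk.tot^2)/(a*b) - sum(dd$y)^2/(a*b*n)
## equivalently, ab times the squared deviations of the block means
ss.check <- a*b * sum((tapply(dd$y, dd$Block, mean) - mean(dd$y))^2)
all.equal(ss.blocks, ss.check)  # TRUE
```

The second expression makes explicit that this SS measures the between-block variability of the block means.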
% latex table generated in R 2.7.1 by xtable 1.5-2 package
% Tue Aug 26 18:29:53 2008
\begin{table}[ht]
\begin{center}
\begin{tabular}{lrrrrr}
\hline
& Df & Sum Sq & Mean Sq & F value & Pr($>$F) \\
\hline
Residuals & 3 & 402.17 & 134.06 & & \\
Ground & 2 & 335.58 & 167.79 & 15.13 & 0.0003 \\
Filter & 1 & 1066.67 & 1066.67 & 96.19 & 0.0000 \\
Ground:Filter & 2 & 77.08 & 38.54 & 3.48 & 0.0575 \\
Residuals & 15 & 166.33 & 11.09 & & \\
\hline
\end{tabular}
\caption{Results of the \textsc{anova} model applied to the intensity
data.}\label{tab:intensity}
\end{center}
\end{table}
\chapter{The $2^k$ Factorial Design}
\label{cha:2k-factorial-design}
\section{Summary of Chapter 6}
\section{The $2^2$ design}
The $2^2$ design is the simplest design belonging to the general
family of $2^k$ designs. We consider two factors, $A$ and $B$, each
with two levels, which can be thought of as the ``low'' and ``high''
levels. The experiment may be replicated a number of times, say $n$,
in which case we have $2\times 2\times n$ trials or runs, yielding a
\emph{completely randomized experiment}.
Suppose that we are interested in investigating the effect of the
concentration of the reactant and the amount of catalyst on the
conversion (yield) of a chemical process, with three replicates. The
objective is to study how reactant concentration (15 or 25\%) and
catalyst amount (1 or 2 pounds) impact yield (\texttt{yield.txt}).
Results for the different treatment combinations of this experiment
are summarized in Figure~\ref{fig:design22}, where a ``$+$'' sign
denotes the high level and a ``$-$'' sign the corresponding low level.
\begin{figure}[htb]
\centering
\includegraphics[width=.5\textwidth]{./img/trtcomb22.eps}
\caption{Treatment combinations in the $2^2$ design.}\label{fig:design22}
\end{figure}
The average effect of a factor is defined as the change in response
produced by a change in the level of that factor averaged over the
levels of the other factors. In the preceding figure, the symbols
$(1)$, $a$, $b$, and $ab$ represent the total of all $n$ replicates
taken at the corresponding treatment combination. The effect of $A$ at the low level
of $B$ is then defined as $[a-(1)]/n$, and the effect of $A$ at the
high level of $B$ as $[ab-b]/n$. The average of these two quantities
yields the \emph{main effect} of $A$:
\begin{eqnarray}\label{eq:maineffect22}
A &=& \frac{1}{2n}\big\{[ab-b]+[a-(1)]\big\}\nonumber\\
&=& \frac{1}{2n}[ab+a-b-(1)].
\end{eqnarray}
Likewise, for $B$, we have:
\begin{eqnarray}\label{eq:maineffect22b}
B &=& \frac{1}{2n}\big\{[ab-a]+[b-(1)]\big\}\nonumber\\
&=& \frac{1}{2n}[ab+b-a-(1)];
\end{eqnarray}
and we define the \emph{interaction effect} $AB$ as the average
difference between the effect of $A$ at the high level of $B$ and the
effect of $A$ at the low level of $B$:
\begin{eqnarray}\label{eq:maineffect22c}
AB &=& \frac{1}{2n}\big\{[ab-b]-[a-(1)]\big\}\nonumber\\
&=& \frac{1}{2n}[ab+(1)-a-b].
\end{eqnarray}
As an alternative, one may consider that the effect of $A$ can be
computed as
\begin{eqnarray}\label{eq:maineffect22d}
A &=& \bar y_{A^+} - \bar y_{A^-}\nonumber\\
&=& \frac{ab+a}{2n}-\frac{b+(1)}{2n}\nonumber\\
&=& \frac{1}{2n}[ab+a-b-(1)],
\end{eqnarray}
which is exactly the result of Equation~\ref{eq:maineffect22}. The same
applies to the computation of the effects of $B$ and
$AB$. Numerically, we have:
\[
A=\frac{1}{2\times 3}(90+100-60-80)=8.33,
\]
or using \R:
\begin{verbatim}
yield <- read.table("yield.txt",header=TRUE)
attach(yield)
rm(yield)  # drop the data frame; the attached copy keeps 'yield' (the response) visible
yield.sums <- aggregate(yield,list(reactant=reactant,catalyst=catalyst),sum)
\end{verbatim}
which gives all necessary information:
\begin{verbatim}
reactant catalyst x
1 high high 90
2 low high 60
3 high low 100
4 low low 80
\end{verbatim}
It should be noted that the effect of $A$ is positive, which suggests
that increasing $A$ from the low level (15\%) to the high level (25\%)
will increase the yield. The reverse applies to $B$, whereas the
interaction effect appears rather limited.
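The three effect estimates can be computed directly from the treatment totals read off \texttt{yield.sums}, following Equations~\ref{eq:maineffect22}--\ref{eq:maineffect22c}:

```r
## Effect estimates from the treatment totals, n=3 replicates per cell
n <- 3
tot <- c("(1)"=80, a=100, b=60, ab=90)
A.eff  <- unname((tot["ab"] + tot["a"] - tot["b"] - tot["(1)"]) / (2*n))  # 8.33
B.eff  <- unname((tot["ab"] + tot["b"] - tot["a"] - tot["(1)"]) / (2*n))  # -5
AB.eff <- unname((tot["ab"] + tot["(1)"] - tot["a"] - tot["b"]) / (2*n))  # 1.67
```

These values agree with the hand computation of $A$ above, and quantify the negative effect of $B$ and the small interaction.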
\paragraph{The ANOVA Model and contrasts formulation.}
An analysis of variance will help to estimate the direction and
magnitude of the factor effects. We already showed that a contrast is
used when estimating $A$, namely $ab+a-b-(1)$
(Eq.~\ref{eq:maineffect22}). This contrast is called the \emph{total
effect} of $A$. All three contrasts derived above are
\emph{orthogonal}. As the sum of squares for any contrast equals the
square of the contrast divided by the product of the number of
observations in each total of the contrast and the sum of the squared
contrast coefficients (Chapter~3), we have:
\begin{equation}\label{eq:design22SS}
\left\{
\begin{array}{cc}
SS_A &= \frac{[ab+a-b-(1)]^2}{4n}\\
SS_B &= \frac{[ab+b-a-(1)]^2}{4n}\\
SS_{AB} &= \frac{[ab+(1)-a-b]^2}{4n}
\end{array}
\right.
\end{equation}
The total SS has $4n-1$ degrees of freedom and it is found in the
usual way, that is
\begin{equation}\label{eq:design22SSb}
SS_T=\sum_{i=1}^2\sum_{j=1}^2\sum_{k=1}^ny_{ijk}^2-\frac{y_{\cdot\cdot\cdot}^2}{4n},
\end{equation}
whereas the error sum of squares, with $4(n-1)$ degrees of freedom, is
$SS_E=SS_T-SS_A-SS_B-SS_{AB}$.
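As a check, plugging the yield treatment totals into Equation~\ref{eq:design22SS} reproduces the sums of squares reported in Table~\ref{tab:aov_yield}:

```r
## Sums of squares from the contrasts ((1)=80, a=100, b=60, ab=90; n=3)
n <- 3
tot <- c("(1)"=80, a=100, b=60, ab=90)
ssA  <- unname((tot["ab"] + tot["a"] - tot["b"] - tot["(1)"])^2 / (4*n))  # 208.33
ssB  <- unname((tot["ab"] + tot["b"] - tot["a"] - tot["(1)"])^2 / (4*n))  # 75
ssAB <- unname((tot["ab"] + tot["(1)"] - tot["a"] - tot["b"])^2 / (4*n))  # 8.33
```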
The treatment combinations may be written as
\begin{center}
\begin{tabular}{lcccc}
\hline
Effects & (1) & a & b & ab\\
\hline
A & $-1$ & $+1$ & $-1$ & $+1$\\
B & $-1$ & $-1$ & $+1$ & $+1$\\
AB & $+1$ & $-1$ & $-1$ & $+1$\\
\hline
\end{tabular}
\end{center}
and this order is referred to as Yates' order
\marginlabel{\textsl{Frank Yates} (1902--1994) worked on sample survey
design and analysis. He is also the author of a book on the design and
analysis of factorial experiments.}. Since all contrasts are
orthogonal, the $2^2$ design (and, more generally, any $2^k$ design)
is an \emph{orthogonal design}.
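The orthogonality is easy to verify numerically: the cross-product of any two distinct columns of the table of signs vanishes. A minimal sketch:

```r
## Table of signs for the 2^2 design in Yates' order: (1), a, b, ab
X <- cbind(I = c( 1, 1, 1, 1),
           A = c(-1, 1,-1, 1),
           B = c(-1,-1, 1, 1))
X <- cbind(X, AB = X[,"A"] * X[,"B"])  # interaction column = product
crossprod(X)  # t(X) %*% X is diagonal: all columns mutually orthogonal
```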
\begin{verbatim}
summary(aov(yield~reactant*catalyst))
\end{verbatim}
From Table~\ref{tab:aov_yield}, we
can verify that both main effects are significant but the interaction
term, $AB$, is not.
% latex table generated in R 2.9.2 by xtable 1.5-5 package
% Sat Oct 3 21:09:29 2009
\begin{table}[ht]
\caption{Analysis of variance table for the ``yield'' experiment.}\label{tab:aov_yield}
\begin{center}
\begin{tabular}{lrrrrr}
\hline
& Df & Sum Sq & Mean Sq & F value & Pr($>$F) \\
\hline
reactant & 1 & 208.33 & 208.33 & 53.19 & 0.0001 \\
catalyst & 1 & 75.00 & 75.00 & 19.15 & 0.0024 \\
reactant:catalyst & 1 & 8.33 & 8.33 & 2.13 & 0.1828 \\
Residuals & 8 & 31.33 & 3.92 & & \\
\hline
\end{tabular}
\end{center}
\end{table}
\paragraph{The Regression Model.}
The coefficients estimated from a regression model
(see below) yield the following prediction equation:
\[
\hat y = 18.333 + 0.833x_{\textrm{\scriptsize reactant}} - 5.000x_{\textrm{\scriptsize catalyst}},
\]
where $x_{\textrm{\scriptsize reactant}}$ and $x_{\textrm{\scriptsize
catalyst}}$ refer to the values taken by the two factors. Here, the
factor levels are treated with their corresponding numerical values
(1/2 for catalyst, 15/25 for reactant), but the ANOVA table would
remain the same whatever values we assign to the levels. The model
parameters, however, depend on the unit of measurement. In the next
\R{} script we convert the two-level factors to numeric variables with
the adequate values. Note that the somewhat tricky manipulation below
ensures that the levels are correctly mapped to their numeric values.
\begin{verbatim}
reactant.num <- reactant
levels(reactant.num) <- c(25,15)
reactant.num <- as.numeric(as.character(reactant.num))
catalyst.num <- catalyst
levels(catalyst.num) <- c(2,1)
catalyst.num <- as.numeric(as.character(catalyst.num))
yield.lm <- lm(yield~reactant.num+catalyst.num)
yield.lm ## gives the coefficients of the LM
\end{verbatim}
Figure~\ref{fig:yieldRS} shows the response surface and a contour
plot of the fitted response as a function of reactant and catalyst
values. Because we deal with a \emph{first-order model}, which includes only
the main effects, the fitted response surface is a plane. Such a
response surface is used to find a direction of potential improvement
for a process, although as will be seen in Chapter~11, the method of
\emph{steepest ascent} constitutes a more formal way to do so.
\begin{verbatim}
require(scatterplot3d)
s3d <- scatterplot3d(reactant.num,catalyst.num,yield,type="n",
                     angle=135,scale.y=1,xlab="Reactant",ylab="Catalyst")
s3d$plane3d(yield.lm,lty.box="solid",col="darkgray")
tmp <- list(reactant.num=seq(15,25,by=.5),catalyst.num=seq(1,2,by=.1))
new.data <- expand.grid(tmp)
new.data$fit <- predict(yield.lm,new.data)
contourplot(fit~reactant.num+catalyst.num,new.data,xlab="Reactant",
ylab="Catalyst")
\end{verbatim}
\begin{figure}[htb]
\centering
\subfigure[Response surface.]{\includegraphics[width=.5\textwidth]{./img/s3dyield.eps}}
\subfigure[Contour plot.]{\includegraphics[width=.4\textwidth]{./img/contouryield.eps}}
\caption{Response surface plot for the yield data.}\label{fig:yieldRS}
\end{figure}
\section{The $2^3$ design}
If we now consider three factors, $A$, $B$, and $C$, the design is
called a $2^3$ factorial design. There are then eight treatment
combinations that can be displayed as a cube
(Figure~\ref{fig:design23}) and are referred to as the \emph{design
matrix}. There are seven degrees of freedom between the eight
treatment combinations in the $2^3$ design: three DF are associated
with the main effects (one each), and four DF with the interaction
terms (three two-factor interactions and one three-factor interaction).
\begin{figure}[htb]
\centering
\begin{minipage}[c]{0.45\textwidth}
\includegraphics[width=.9\textwidth]{./img/trtcomb23.eps}\\
{\small (a) Geometric view}
\end{minipage}\hfill
\begin{minipage}[c]{0.45\textwidth}
\begin{footnotesize}
\begin{tabular}{cccc}
\hline
& \multicolumn{3}{c}{Factor}\\
Run & $A$ & $B$ & $C$\\
\hline
1 & $-$ & $-$ & $-$\\
2 & $+$ & $-$ & $-$\\
3 & $-$ & $+$ & $-$\\
4 & $+$ & $+$ & $-$\\
5 & $-$ & $-$ & $+$\\
6 & $+$ & $-$ & $+$\\
7 & $-$ & $+$ & $+$\\
8 & $+$ & $+$ & $+$\\
\hline
\end{tabular}
\end{footnotesize}\\
\vfill
{\small (b) Design matrix}
\end{minipage}
\caption{Treatment combinations in the $2^3$ design.}\label{fig:design23}
\end{figure}
The effect of $A$ when $B$ and $C$ are at the low level is
$[a-(1)]/n$. When $B$ is at the high level and $C$ at its low level,
it is $[ab-b]/n$. The effect of $A$ when $C$ is at the high level and
$B$ at the low level is $[ac-c]/n$. Finally, when both $B$ and $C$ are
at the high level, the effect of $A$ is $[abc-bc]/n$. Thus, the
average effect of $A$ is:
\begin{equation}\label{eq:design23A}
A=\frac{1}{4n}\big[a-(1)+ab-b+ac-c+abc-bc\big].
\end{equation}
This can be found as a contrast between the four treatment combinations
in the right face of the cube in Figure~\ref{fig:design23geom}: The
effect of $A$ is simply the average of the four runs where $A$ is at
the high level ($\bar y_{A^+}$) minus the average of the four runs
where $A$ is at the low level ($\bar y_{A^-}$), or
\begin{eqnarray}\label{eq:design23Ab}
A &=& \bar y_{A^+}-\bar y_{A^-}\nonumber\\
&=& \frac{a+ab+ac+abc}{4n}-\frac{(1)+b+c+bc}{4n}.
\end{eqnarray}
The effect for $B$ and $C$ are computed in a similar manner.
\begin{figure}[htb]
\centering
\subfigure[Main effects]{
\begin{tabular}{ccc}
\includegraphics[width=.2\textwidth]{./img/23design_A.eps}&
\includegraphics[width=.2\textwidth]{./img/23design_B.eps}&
\includegraphics[width=.2\textwidth]{./img/23design_C.eps}\\
\footnotesize $A$ & \footnotesize $B$ & \footnotesize $C$
\end{tabular}}
\subfigure[Two-factor interaction]{
\begin{tabular}{ccc}
\includegraphics[width=.2\textwidth]{./img/23design_AB.eps}&
\includegraphics[width=.2\textwidth]{./img/23design_AC.eps}&
\includegraphics[width=.2\textwidth]{./img/23design_BC.eps}\\
\footnotesize $AB$ & \footnotesize $AC$ & \footnotesize $BC$
\end{tabular}}
\caption{Geometric presentation of contrasts. In each case, high levels are highlighted in blue, low levels in red.}\label{fig:design23geom}
\end{figure}
The two-factor interaction effects are also computed easily since they
reflect the difference between the average of the effects of one
factor at the two levels of the other factor. By convention, one-half
of this difference defines the interaction; for $AB$:
\begin{equation}\label{eq:design23AB}
AB=\frac{\big[abc-bc+ab-b-ac+c-a+(1) \big]}{4n}
\end{equation}
or, equivalently,
\begin{equation}\label{eq:design23ABb}
AB=\frac{\big[abc+ab+c+(1)\big]}{4n}-\frac{\big[bc+b+ac+a\big]}{4n}.
\end{equation}
Finally, the $ABC$ interaction is defined as the average difference
between the $AB$ interactions for the two different levels of $C$,
that is
\begin{eqnarray}\label{eq:design23ABC}
ABC &=& \frac{1}{4n}\Big\{ [abc-bc]-[ac-c]-[ab-b]+[a-(1)] \Big\}\nonumber\\
&=& \frac{1}{4n}\big[ abc-bc-ac+c-ab+b+a-(1)\big].
\end{eqnarray}
Again, all preceding equations reflect the contrast associated with
the estimation of each effect. From these contrasts, we can expand a
table of ``$+$'' (high level) and ``$-$'' (low level) as shown in
Table~\ref{tab:design23}. Once the signs for the main effects have
been established, the remaining effects may be obtained by multiplying
the appropriate columns, row by row. Table~\ref{tab:design23} has a
number of interesting properties: (1) Except for column $I$, every
column has an equal number of plus and minus signs; (2) The sum of the
products of the signs in any two columns is zero (due to
orthogonality); (3) Column $I$ multiplied times any column leaves that
column unchanged because $I$ is an \emph{identity element}; (4) The
product of any two columns yields a column in the table. For example,
$A\times B=AB$ and $AB\times B=AB^2=A$.
Finally, sum of squares for the effects are now simply defined as
\[
SS=\frac{(\textrm{Contrast})^2}{8n}.
\]
\begin{table}[htbp]
\caption{Algebraic signs for calculating effects in the $2^3$ design.}\label{tab:design23}
\centering
\begin{tabular}{p{2.5cm}cccccccc}
\hline
\multirow{2}{2.5cm}{Treatment combination}& \multicolumn{8}{c}{\bf Factorial Effect}\\
\cline{2-9}
 & $I$ & $A$ & $B$ & $AB$ & $C$ & $AC$ & $BC$ & $ABC$\\
\hline
$(1)$ & $+$ & $-$ & $-$ & $+$ & $-$ & $+$ & $+$ & $-$\\
$a$   & $+$ & $+$ & $-$ & $-$ & $-$ & $-$ & $+$ & $+$\\
$b$   & $+$ & $-$ & $+$ & $-$ & $-$ & $+$ & $-$ & $+$\\
$ab$  & $+$ & $+$ & $+$ & $+$ & $-$ & $-$ & $-$ & $-$\\
$c$   & $+$ & $-$ & $-$ & $+$ & $+$ & $-$ & $-$ & $+$\\
$ac$  & $+$ & $+$ & $-$ & $-$ & $+$ & $+$ & $-$ & $-$\\
$bc$  & $+$ & $-$ & $+$ & $-$ & $+$ & $-$ & $+$ & $-$\\
$abc$ & $+$ & $+$ & $+$ & $+$ & $+$ & $+$ & $+$ & $+$\\
\hline
\end{tabular}
\end{table}
The ``plasma etch'' experiment (\texttt{plasma.txt}) is a $2^3$ design
used to develop a nitride etch process on a single-wafer plasma
etching tool. The design factors are the gap between the electrodes,
the gas flow ($C_2F_6$ is used as the reactant gas), and the RF power
applied to the cathode. Each factor is run at two levels, and the
design is replicated twice.
The data file may be loaded in the usual way:
\begin{verbatim}
plasma <- read.table("plasma.txt",header=TRUE)
plasma
\end{verbatim}
Data are shown below, before being transformed into an appropriate
\R{} \verb|data.frame|.
\begin{verbatim}
Run A B C R1 R2
1 1 -1 -1 -1 550 604
2 2 1 -1 -1 669 650
3 3 -1 1 -1 633 601
4 4 1 1 -1 642 635
5 5 -1 -1 1 1037 1052
6 6 1 -1 1 749 868
7 7 -1 1 1 1075 1063
8 8 1 1 1 729 860
\end{verbatim}
Before applying the ANOVA model, we need to switch to \R{} ``long''
format, as proposed below:
\begin{verbatim}
plasma.df <- data.frame(etch=c(plasma$R1,plasma$R2),
rbind(plasma[,2:4],plasma[,2:4]))
plasma.df[,2:4] <- lapply(plasma.df[,2:4],factor)
\end{verbatim}
Next, we simply use the \verb|aov()| function to estimate all main
effects and all interactions (Table~\ref{tab:plasmaAOV}):
\begin{verbatim}
plasma.df.aov <- aov(etch~A*B*C, data=plasma.df)
summary(plasma.df.aov)
\end{verbatim}
Table~\ref{tab:plasmaAOV} shows that the effects of $A$, $C$, and their
two-way interaction are significant. None of the other effects appears
to be significant at the usual 5\% level. The contribution of each
effect may easily be computed from their respective sums of squares.
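These sums of squares can be cross-checked with Yates' algorithm, which recovers all effect contrasts from the treatment totals (taken in standard order) by $k$ rounds of pairwise sums and differences. The \verb|yates()| function below is a small hand-rolled sketch, not part of base \R:

```r
## Yates' algorithm: k passes of pairwise sums (first half) and
## differences (second half) over the 2^k treatment totals
yates <- function(tot, k) {
  for (i in 1:k)
    tot <- c(tot[seq(2, 2^k, by=2)] + tot[seq(1, 2^k, by=2)],
             tot[seq(2, 2^k, by=2)] - tot[seq(1, 2^k, by=2)])
  tot   # grand total, then contrasts A, B, AB, C, AC, BC, ABC
}
## plasma etch totals over the n=2 replicates, in order (1),a,b,ab,c,ac,bc,abc
tot <- c(1154, 1319, 1234, 1277, 2089, 1617, 2138, 1589)
ss <- yates(tot, k=3)[-1]^2 / (2^3 * 2)   # SS = Contrast^2 / (2^k n)
round(ss, 2)  # 41310.56 217.56 2475.06 374850.06 94402.56 18.06 126.56
```

The resulting sums of squares match those of the ANOVA table exactly.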
% latex table generated in R 2.9.2 by xtable 1.5-5 package
% Sun Oct 4 18:41:44 2009
\begin{table}[ht]
\caption{ANOVA table for the ``plasma etch'' experiment.}\label{tab:plasmaAOV}
\begin{center}
\begin{tabular}{lrrrrr}
\hline
& Df & Sum Sq & Mean Sq & F value & Pr($>$F) \\
\hline
A & 1 & 41310.56 & 41310.56 & 18.34 & 0.0027 \\
B & 1 & 217.56 & 217.56 & 0.10 & 0.7639 \\
C & 1 & 374850.06 & 374850.06 & 166.41 & 0.0000 \\
A:B & 1 & 2475.06 & 2475.06 & 1.10 & 0.3252 \\
A:C & 1 & 94402.56 & 94402.56 & 41.91 & 0.0002 \\
B:C & 1 & 18.06 & 18.06 & 0.01 & 0.9308 \\
A:B:C & 1 & 126.56 & 126.56 & 0.06 & 0.8186 \\
Residuals & 8 & 18020.50 & 2252.56 & & \\
\hline
\end{tabular}
\end{center}
\end{table}
To compute the response surface, we need to recode the factors with
their corresponding numerical values, which are given below:
\begin{center}
\begin{tabular}{lcc}
\hline
& Low ($-1$) & High ($+1$)\\
\hline
A (Gap, cm) & 0.80 & 1.20\\
B ($C_2F_6$ flow, SCCM) & 125 & 200\\
C (Power, W) & 275 & 325\\
\hline
\end{tabular}
\end{center}
This is easily done using, e.g.:
\begin{verbatim}
plasma.num.df <- plasma.df
levels(plasma.num.df$A) <- c(0.8,1.2)
levels(plasma.num.df$C) <- c(275,325)
plasma.num.df$A <- as.numeric(as.character(plasma.num.df$A))
plasma.num.df$C <- as.numeric(as.character(plasma.num.df$C))
plasma.num.df.lm <- lm(etch~A*C, plasma.num.df)
\end{verbatim}
Next, we proceed as before and store, in a new \verb|data.frame|, the
values for which we want to compute predictions from the above
regression model.
\begin{verbatim}
tmp <- list(C=seq(275,325,by=1),A=seq(0.8,1.2,by=.1))
new.data <- expand.grid(tmp)
new.data$fit <- predict(plasma.num.df.lm, new.data)
require(scatterplot3d)
s3d <- scatterplot3d(plasma.num.df$A,plasma.num.df$C,
plasma.num.df$etch,type="n",
angle=135,scale.y=1,xlab="Gap",ylab="Power")
s3d$plane3d(plasma.num.df.lm,lty.box="solid",col="darkgray")
\end{verbatim}
\chapter{Blocking and Confounding in the $2^k$ Factorial Design}
\label{cha:block-conf-2k}
\chapter{Two-Level Fractional Factorial Designs}
\label{cha:two-level-fractional}
\section{Summary of Chapter 8}
Chapter~8 is another extension of the $2^k$ factorial design, where
one aims at limiting the number of runs needed to study a large
number of two-level factors. For example, a complete replicate of a
$2^6$ design requires 64 runs, while only 6 of the 63 degrees of
freedom correspond to main effects. If we are willing to assume that
certain high-order interaction terms are negligible, we can run only a
fraction of the complete factorial design. This kind of design is mainly
used in \emph{screening experiments}, where many factors are of interest
and we want to determine which ones to include in a future study because
of their larger effects.
\section{The one-half fraction of the $2^k$ design}
Let's say we want to study 3 factors but cannot afford to run all
8 treatment combinations. Using a one-half fraction, we now have
$2^{3-1}=4$ treatment combinations, e.g.\ $a$, $b$, $c$, and
$abc$. In practice, in Table~\ref{tab:ABC} we select only those treatment
combinations for which the generator $ABC$ carries a plus sign.
\begin{table}[htbp]
\caption{The $2^3$ factorial design. The $2^{3-1}$ is formed by considering
only the upper part of this design.}\label{tab:ABC}
\centering
\begin{tabular}{p{2.5cm}cccccccc}
\hline
\multirow{2}{2.5cm}{Treatment combination}& \multicolumn{8}{c}{\bf Factorial Effect}\\
\cline{2-9}
& $I$ & $A$ & $B$ & $C$ & $AB$ & $AC$ & $BC$ & $ABC$\\
\hline
$a$ & $+$ & $+$ & $-$ & $-$ & $-$ & $-$ & $+$ & $+$ \\
$b$ & $+$ & $-$ & $+$ & $-$ & $-$ & $+$ & $-$ & $+$ \\
$c$ & $+$ & $-$ & $-$ & $+$ & $+$ & $-$ & $-$ & $+$ \\
$abc$ & $+$ & $+$ & $+$ & $+$ & $+$ & $+$ & $+$ & $+$ \\
\cline{2-9}
$ab$ & $+$ & $+$ & $+$ & $-$ & $+$ & $-$ & $-$ & $-$ \\
$ac$ & $+$ & $+$ & $-$ & $+$ & $-$ & $+$ & $-$ & $-$ \\
$bc$ & $+$ & $-$ & $+$ & $+$ & $-$ & $-$ & $+$ & $-$ \\
$(1)$ & $+$ & $-$ & $-$ & $-$ & $+$ & $+$ & $+$ & $-$ \\
\hline
\end{tabular}
\end{table}
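The selection of the principal fraction, and the resulting aliasing of $A$ with $BC$, can be sketched in a few lines of \R:

```r
## Keep the runs of the full 2^3 design where the generator ABC is +1
d23  <- expand.grid(A=c(-1,1), B=c(-1,1), C=c(-1,1))
frac <- d23[with(d23, A*B*C) == 1, ]   # runs a, b, c, abc
nrow(frac)                             # 4 treatment combinations
## within the fraction, the A and BC columns are identical (aliased)
with(frac, all(A == B*C))              # TRUE
```

The aliasing follows algebraically from the defining relation $I=ABC$: multiplying both sides by $A$ gives $A=A^2BC=BC$.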
\begin{figure}[htb]
\centering
\subfigure[Principal fraction,
$I=+ABC$.]{\includegraphics[width=.35\textwidth]{./img/23frac.eps}}
\includegraphics[width=.1\textwidth]{./img/23frac_leg.eps}
\subfigure[Alternate fraction,
$I=-ABC$.]{\includegraphics[width=.35\textwidth]{./img/23frac_cpl.eps}}
\caption{The two one-half fractions of the $2^3$ design.}\label{fig:23frac}
\end{figure}
\chapter{Three-Level and Mixed-Level Factorial and Fractional
Factorial Designs}
\label{cha:three-level-mixed}
\chapter{Fitting Regression Models}
\label{cha:fitt-regr-models}
\section{Summary of Chapter 10}
Chapter~10 offers a gentle introduction to regression analysis in
simple designs. The multiple linear regression model is detailed in
its scalar and matrix form, and it is shown how unbiased estimator for
the regression coefficients and the residual variance might be
obtained when solving the least squares normal equations. Some
illustrations are provided to make the connection with the analysis of
designed experiments. Finally, model diagnostics and the assessment of
goodness of fit are presented.
\section{Linear Regression Models}
When we are facing a situation with one \emph{dependent variable} or
\emph{response} $y$ that depends on $k$ \emph{independent} or
\emph{regression variables}, then we can postulate a given
relationship relating these variables, though in most cases the true
functional relationship between $y$ and $x_1, x_2, \dots, x_k$ is
unknown. The idea is to fit an empirical model, that is, in this
particular setting, a linear regression model, which might take the form:
\begin{equation}\label{eq:lm}
y=\beta_0+\beta_1x_1+\beta_2x_2+\varepsilon
\end{equation}
if we consider two regressors, $x_1$ and $x_2$. The $\beta$'s are
called regression coefficients, and the model describes a hyperplane
in a $k$-dimensional space (here, $k=2$). We could also consider
adding an interaction term between $x_1$ and $x_2$ so that the
previous model becomes
\[
y=\beta_0+\beta_1x_1+\beta_2x_2+\beta_{12}x_1x_2+\varepsilon.
\]
A second-order response surface might be represented as follows:
\[
y=\beta_0+\beta_1x_1+\beta_2x_2+\beta_{11}x_1^2+\beta_{22}x_2^2+\beta_{12}x_1x_2+\varepsilon;
\]
this model is still linear in its parameters (after appropriate
substitution).
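These models map directly onto \R{} model formulas; a short sketch on simulated data (variable names are illustrative):

```r
## Simulated data for illustration only
set.seed(123)
dat <- data.frame(x1 = runif(30), x2 = runif(30))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + 0.5 * dat$x1 * dat$x2 + rnorm(30, sd = 0.1)
## First-order model with interaction: x1 * x2 expands to x1 + x2 + x1:x2
fit.int <- lm(y ~ x1 * x2, data = dat)
## Second-order response surface: I() protects the quadratic terms
fit.so <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = dat)
```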
\subsection{Parameter estimation}
The method of ordinary least squares (OLS) is generally used to
estimate the regression coefficients. Let us assume that the errors
are i.i.d. Gaussian, with $\mathbb{E}(\varepsilon)=0$ and
$\mathbb{V}(\varepsilon)=\sigma^2$. Model
Equation~\ref{eq:lm} can then be rewritten as:
\begin{eqnarray}\label{eq:lm2}
y_i &=&
\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\dots+\beta_kx_{ik}+\varepsilon_i
\nonumber\\
&=& \beta_0+\sum_{j=1}^k\beta_jx_{ij}+\varepsilon_i,\quad i=1,2,\dots,n
\end{eqnarray}
The OLS solution minimizes the sum of the squared errors, that is
\begin{equation}
\label{eq:lm3}
L=\sum_{i=1}^n \varepsilon_i^2=\sum_{i=1}^n\left(y_i-\beta_0-\sum_{j=1}^k\beta_jx_{ij}\right)^2.
\end{equation}
$L$ has to be minimized with respect to
$\beta_0,\beta_1,\dots,\beta_k$, which yields the so-called least
squares \emph{normal equations}.
Using matrix notation, Equation~\ref{eq:lm2} may be written as
\[
\bm{y}=\bm{X\beta}+\bm{\varepsilon}
\]
where
\[
\bm{y}=\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix},\quad
\bm{X}=\begin{bmatrix}1&x_{11}&x_{12}&\cdots&x_{1k}\\1&x_{21}&x_{22}&\cdots&x_{2k}\\
\vdots&\vdots&\vdots&&\vdots\\1&x_{n1}&x_{n2}&\cdots&x_{nk}\end{bmatrix},\quad
\bm{\beta}=\begin{bmatrix}\beta_0\\ \beta_1\\ \vdots\\
\beta_k\end{bmatrix}\; \textrm{and}\; \bm{\varepsilon}=\begin{bmatrix}\varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n\end{bmatrix}
\]
We want to find the vector $\bm{\hat\beta}$ that minimizes
\[
L=\sum_{i=1}^n \varepsilon_i^2=\bm{\varepsilon'\varepsilon}=(\bm{y}-\bm{X\beta})' (\bm{y}-\bm{X\beta}),
\]
which can also be expressed as
\begin{eqnarray}\label{eq:lm3mat}
L &=& \bm{y'y}-\bm{\beta'X'y}-\bm{y'X\beta}+\bm{\beta'X'X\beta}\nonumber\\
&=& \bm{y'y}-2\bm{\beta'X'y}+\bm{\beta'X'X\beta}.
\end{eqnarray}
The OLS estimators must satisfy
\[
\left.\frac{\partial L}{\partial\bm{\beta}}\right\vert_{\hat\beta}=-2\bm{X'y}+2\bm{X'X\hat\beta}=0
\]
which simplifies to $\bm{X'X\hat\beta}=\bm{X'y}$. This is the matrix
form of the aforementioned normal equations, and the least squares
estimator of $\bm{\beta}$ is
\begin{equation}
\label{eq:lm4}
\bm{\hat\beta} = (\bm{X'X})^{-1}\bm{X'y}
\end{equation}
and the normal equations read:
\[
\begin{bmatrix}
n & \displaystyle\sum_{i=1}^nx_{i1} & \displaystyle\sum_{i=1}^nx_{i2} & \cdots & \displaystyle\sum_{i=1}^nx_{ik} \\
\displaystyle\sum_{i=1}^nx_{i1} & \displaystyle\sum_{i=1}^nx_{i1}^2 & \displaystyle\sum_{i=1}^nx_{i1}x_{i2} & \cdots & \displaystyle\sum_{i=1}^nx_{i1}x_{ik} \\
\vdots & \vdots & \vdots & & \vdots \\
\displaystyle\sum_{i=1}^nx_{ik} & \displaystyle\sum_{i=1}^nx_{ik}x_{i1} & \displaystyle\sum_{i=1}^nx_{ik}x_{i2} & \cdots & \displaystyle\sum_{i=1}^nx_{ik}^2 \\
\end{bmatrix}
\begin{bmatrix}\hat\beta_0\\ \hat\beta_1\\ \vdots\\
\hat\beta_k\end{bmatrix}=
\begin{bmatrix}\displaystyle\sum_{i=1}^ny_i\\ \displaystyle\sum_{i=1}^nx_{i1}y_i\\ \vdots\\
\displaystyle\sum_{i=1}^nx_{ik}y_i\end{bmatrix}.
\]
As can be seen, the diagonal elements of $\bm{X'X}$ are the sums of
squares of the columns of $\bm{X}$.
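The least squares estimator $\bm{\hat\beta}=(\bm{X'X})^{-1}\bm{X'y}$ can be checked against \verb|lm()| on simulated data; a minimal sketch:

```r
## Solve the normal equations X'X beta = X'y directly (simulated data);
## lm() computes the same solution via a QR decomposition
set.seed(1)
n <- 20
X <- cbind(1, x1 = runif(n), x2 = runif(n))
y <- as.vector(X %*% c(2, 1, -0.5) + rnorm(n, sd = 0.1))
beta.hat <- solve(crossprod(X), crossprod(X, y))  # (X'X)^{-1} X'y
```

The estimates agree with \verb|coef(lm(y ~ X[, -1]))| up to numerical precision.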
The fitted regression model is
\begin{equation}
\label{eq:lm5}
\bm{\hat y}=\bm{X\hat\beta},
\end{equation}
and the $(n\times 1)$ vector of residuals, which reflects the
difference between the actual observation $y_i$ and the fitted value
$\hat y_i$, is denoted by
\begin{equation}
\label{eq:lm6}
\bm{e}=\bm{y}-\bm{\hat y}.
\end{equation}
To estimate $\sigma^2$, we use the sum of squares of the residuals,
$\textrm{SS}_E=\sum_{i=1}^n(y_i-\hat
y_i)^2=\sum_{i=1}^ne_i^2=\bm{e'e}$. After some algebra, we
obtain the following \emph{error} or \emph{residual sum of squares},
with $n-p$ degrees of freedom:
\begin{equation}
\label{eq:lm7}
\textrm{SS}_E=\bm{y'y}-\bm{\hat\beta'X'y}.
\end{equation}
It can be shown that $\mathbb{E}(\textrm{SS}_E)=\sigma^2(n-p)$, so
that an unbiased estimator of $\sigma^2$ is
\begin{equation}
\label{eq:lm8}
\hat\sigma^2=\frac{\textrm{SS}_E}{n-p}.
\end{equation}
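This estimator is what \R{} reports (squared) as the residual standard error; a quick check on the built-in \verb|cars| data:

```r
## sigma2.hat = SS_E/(n - p), illustrated on the built-in cars data
fit <- lm(dist ~ speed, data = cars)
SSE <- sum(resid(fit)^2)   # e'e
n <- nrow(cars)
p <- length(coef(fit))     # p = k + 1 parameters
sigma2.hat <- SSE / (n - p)
```

Here \verb|sigma2.hat| matches \verb|summary(fit)$sigma^2|.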
\subsection{A simple multiple regression example}
Let's consider the measurement of the viscosity of a polymer and two
process variables, Temperature (°C) and Catalyst feed rate (lb/h),
which will be modeled as $y=\beta_0+\beta_1x_1+\beta_2x_2+\varepsilon$.
\begin{verbatim}
visc <- read.table("viscosity.txt", header=TRUE)
lm.fit <- lm(Viscosity ~ Temperature+Catalyst, data=visc)
summary(lm.fit)
\end{verbatim}
From the commands above, we see that the OLS fit is $\hat
y=1566.08+7.62x_1+8.58x_2$, where $x_1$ stands for the Temperature and
$x_2$ is the Catalyst feed rate.
Diagnostic or residual plots are useful to check that the residuals do
not depend on the fitted values $\hat y$.
In \R,
\begin{verbatim}
op <- par(mfrow=c(2,2), las=1)
plot(lm.fit)
par(op)
\end{verbatim}
would plot up to six diagnostic graphics; the four default ones are:
(a) residuals against fitted values, (b) normal Q-Q plot, (c)
Scale-Location, i.e. $\sqrt{|\tilde e_i|}$ against fitted values, and
(d) residuals against leverage. Plots (b)--(c) use standardized
residuals, defined as $\tilde e_i=e_i/(\hat\sigma\sqrt{1-h_{ii}})$,
where $h_{ii}$ are the diagonal elements of the ``hat matrix''
$\bm{X}(\bm{X'X})^{-1}\bm{X'}$; these residuals are assumed to have
constant variance. In Figure~\ref{fig:lm1}, we plot two of them to
show how the variance of the observed viscosity tends to increase with
increasing viscosity magnitude.
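The standardized residuals used in these plots can be computed by hand and compared with \R's \verb|rstandard()|; a sketch on the built-in \verb|cars| data:

```r
## Standardized residuals e_i/(sigma.hat * sqrt(1 - h_ii)), where h_ii are
## the diagonal elements of the hat matrix X (X'X)^{-1} X'
fit <- lm(dist ~ speed, data = cars)
h <- hatvalues(fit)
r <- resid(fit) / (summary(fit)$sigma * sqrt(1 - h))
```

Here \verb|r| agrees with \verb|rstandard(fit)|.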
\begin{figure}[htb]
\centering
\includegraphics[width=.8\textwidth]{./img/lm1_resid.eps}
\caption{Normal probability plot of residuals (left) and plot of
residuals against predicted viscosity (right).}\label{fig:lm1}
\end{figure}
Plotting the residuals against each predictor
(\verb|plot(resid(lm.fit) ~ visc$Temperature)|) further suggests that
viscosity variance increases with temperature (Fig.~\ref{fig:lm1b}).
Of note, more elaborate partial residual plots are available in the
\verb|car| package, e.g.
\begin{verbatim}
library(car)
crPlots(lm.fit)
\end{verbatim}
\begin{figure}[htb]
\centering
\includegraphics[width=.8\textwidth]{./img/lm1_resid2.eps}
\caption{Plot of residuals \emph{vs.} each predictor.}\label{fig:lm1b}
\end{figure}
\subsection{Regression analysis of a $2^3$ factorial design}
In this designed experiment, we are interested in investigating the
yield of a process. The variables of interest are: temperature
($x_1$), pressure ($x_2$), and catalyst concentration ($x_3$). Each
variable can be run at a low and a high level, yielding a $2^3$ design
with four center points (Fig.~\ref{fig:lm2}); factor levels are coded
as $\pm 1$.
Let's consider the following model, with main effects only:
\[
y = \beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_3+\varepsilon.
\]
It is easy to check that
\[
\bm{X'X}=
\begin{bmatrix}
12 & 0 & 0 & 0\\
0 & 8 & 0 & 0 \\
0 & 0 & 8 & 0 \\
0 & 0 & 0 & 8
\end{bmatrix}\qquad
\bm{X'y}=
\begin{bmatrix}
612\\ 45\\ 85\\ 9
\end{bmatrix}
\]
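The matrix $\bm{X'X}$ is easy to verify by building the design matrix directly:

```r
## Design matrix of the 2^3 factorial plus four center points
d <- expand.grid(x1 = c(-1, 1), x2 = c(-1, 1), x3 = c(-1, 1))
d <- rbind(d, d[rep(1, 4), ] * 0)   # four center points (0, 0, 0)
X <- cbind(1, as.matrix(d))
crossprod(X)                        # X'X = diag(12, 8, 8, 8)
```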
Now, using the \verb|lm()| function,
\begin{verbatim}
yield2 <- read.table("yield2.txt", header=TRUE,
colClasses=c(rep("factor",3),"numeric"))
levels(yield2$Temp) <- c(-1,0,1)
levels(yield2$Pressure) <- c(-1,0,1)
levels(yield2$Conc) <- c(-1,0,1)
lm.fit <- lm(Yield ~ as.numeric(as.character(Temp))+
as.numeric(as.character(Pressure))+
as.numeric(as.character(Conc)), data=yield2)
summary(lm.fit)
\end{verbatim}
the fitted regression model is
\[
\hat y = 51.000+5.625x_1+10.625x_2+1.125x_3.
\]
Note that we need to convert the original scales into the recoded
factor levels; simply using \verb|as.numeric()| would not be
sufficient, as it would convert the factor levels into their internal
codes $\{1,2,3\}$ (this changes nothing but the intercept term).
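The pitfall with \verb|as.numeric()| is easy to demonstrate:

```r
## as.numeric() on a factor returns the internal level codes, not the labels
f <- factor(c(-1, 0, 1))
as.numeric(f)                # 1 2 3  (internal codes)
as.numeric(as.character(f))  # -1 0 1 (original values)
```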
\begin{figure}[htb]
\centering
\hfill
\begin{minipage}[c]{0.45\textwidth}
\begin{footnotesize}
\begin{tabular}{cccc}
\hline
& \multicolumn{3}{c}{Factor}\\
Run & $x_1$ & $x_2$ & $x_3$\\
\hline
1 & $-1$ & $-1$ & $-1$\\
2 & $+1$ & $-1$ & $-1$\\
3 & $-1$ & $+1$ & $-1$\\
4 & $+1$ & $+1$ & $-1$\\
5 & $-1$ & $-1$ & $+1$\\
6 & $+1$ & $-1$ & $+1$\\
7 & $-1$ & $+1$ & $+1$\\
8 & $+1$ & $+1$ & $+1$\\
9 & $0$ & $0$ & $0$\\
10 & $0$ & $0$ & $0$\\
11 & $0$ & $0$ & $0$\\
12 & $0$ & $0$ & $0$\\
\hline
\end{tabular}
\end{footnotesize}\\
\vfill
{\small (a) Design matrix}
\end{minipage}\hfill
\begin{minipage}[c]{0.45\textwidth}
\includegraphics[width=.9\textwidth]{./img/lm23.eps}\\
{\small (b) Geometric view}
\end{minipage}
\caption{Experiment design for the chemical experiment.}\label{fig:lm2}
\end{figure}
To see the connection with the analysis of the $2^3$ design discussed in
Chapter~??, let us compute the effect of temperature as the difference of
means between the high and low levels, that is $T=\bar y_{T^+}-\bar
y_{T^-}$:
\begin{verbatim}
diff(with(subset(yield2, Temp != 0),
tapply(Yield, droplevels(Temp), mean)))
\end{verbatim}
This is twice the regression coefficient for temperature. In any
$2^k$ design, the regression coefficient is one-half the usual effect
estimate, because the coefficient measures the change in $\hat y$ per
unit change in $x$, and the coded levels $-1$ and $+1$ are two units
apart. Moreover, this shows that effect estimates from a $2^k$ design
are least squares estimates.
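This one-half relationship holds for any balanced $2^k$ design in the $\pm 1$ coding; a sketch with simulated data:

```r
## In -1/+1 coding the levels are two units apart, so the regression
## slope is half the usual effect estimate (difference of means)
set.seed(42)
d <- expand.grid(A = c(-1, 1), B = c(-1, 1))
d <- d[rep(1:4, 2), ]          # two replicates (simulated data)
d$y <- 10 + 3 * d$A + rnorm(nrow(d))
fit <- lm(y ~ A + B, data = d)
effect.A <- mean(d$y[d$A == 1]) - mean(d$y[d$A == -1])
```

Here \verb|effect.A| equals \verb|2 * coef(fit)["A"]| exactly, since the design is balanced and orthogonal.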
In the previous example, $\bm{X'X}$ was diagonal, so that its inverse
was also diagonal. This means that the estimators of all regression
coefficients are uncorrelated, or
$\textrm{Cov}(\hat\beta_i,\hat\beta_j)=0,\, \forall i\neq j$. To
deliberately achieve such a result, one must choose a design matrix
whose columns are mutually orthogonal: this is
called an \emph{orthogonal design}, as is the case for $2^k$ factorial
designs. Montgomery gives two examples of how regression analysis might be
useful in the case of non-orthogonal designs, either after removing
one run from the preceding experiment or after introducing small
discrepancies in the precision of the factor levels.
\subsection{Aliasing and fractional design}
Another application of regression analysis in the spirit of the
analysis of designed experiments is the de-aliasing of interactions in
a fractional factorial plan.
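The aliasing is visible directly in the model matrix: in the principal one-half fraction of the $2^3$ design ($I=+ABC$), each main-effect column coincides with a two-factor interaction column, so their effects cannot be estimated separately:

```r
## In the principal fraction (I = +ABC) the A column equals the BC column,
## so an estimate of A is really an estimate of A + BC
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
frac <- d[with(d, A * B * C) == 1, ]
all(frac$A == frac$B * frac$C)  # TRUE: A is aliased with BC
```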
\section{Regression analysis: Hypothesis Testing and Confidence intervals}
\section{Prediction of new observations}
\section{Regression Model diagnostics}
\chapter{Response Surface Methods and Designs}
\label{cha:resp-surf-meth}
\chapter{Robust Parameter Design and Process Robustness Studies}
\label{cha:robust-param-design}
\chapter{Experiments with Random Factors}
\label{cha:exper-with-rand}
\begin{verbatim}
strength <- read.table("strength.txt", header=TRUE)
strength$Looms <- factor(strength$Looms)
strength$Obs <- factor(strength$Obs)
library(nlme)
## one-way random-effects model, with Looms as the random grouping factor
strength.lmm <- lme(y ~ 1, random = ~1 | Looms, data=strength)
\end{verbatim}
\chapter{Nested and Split-Plot Designs}
\label{cha:nested-split-plot}
\chapter{Other Design and Analysis Topics}
\label{cha:other-design-analys}
%\addcontentsline{toc}{chapter}{\numberline{}Bibliography}
\bibliographystyle{plain}
\bibliography{dae}
\clearpage
\appendix
\addcontentsline{toc}{chapter}{Appendix}
\section*{Standard Normal and Student $t$ distributions}
\label{chap:distributions}
\begin{table}[!h]
\begin{center}
\caption{Standard Normal Distribution. Cumulative distribution
function of the ${\cal N}(0,1)$.}
\vspace{0.5cm}
{\scriptsize \input{inputs/tabnorm1.tex}\input{inputs/tabnorm2.tex}}
\end{center}
\end{table}
\begin{table}[!h]
\begin{center}
\caption{Standard Normal Distribution. Quantiles $u(\beta)$
of the order $\beta$ from ${\cal N}(0,1)$.}
\vspace{0.5cm}
{\scriptsize \input{inputs/tabnorm1_inv.tex}\input{inputs/tabnorm2_inv.tex}}
\label{dist-ncr}
\end{center}
\end{table}
\begin{table}[!h]
\begin{center}
\caption{Student distribution. Quantiles $u(\beta)$ of the order $\beta$
from ${\cal T}(\nu)$.}
\vspace{0.5cm}
{\scriptsize \input{inputs/tabstudent1_inv.tex}\input{inputs/tabstudent2_inv.tex}}
\label{dist-t}
\end{center}
\end{table}
\clearpage
\listoffigures
\listoftables
\end{document}