# Covariance structures

When fitting simple models (as in many examples of univariate
analysis) one needs to specify only the model equation (the
bit that looks like `y ~ mu...`) but nothing about the covariances that
complete the model specification. This is because asreml assumes that, in the
absence of any additional information, the covariance structure is the product
of a scalar (a variance component) and a known matrix, such as an identity or
relationship matrix. For example, the residual covariance matrix in simple
examples is **R** = **I** σ_{e}^{2}, and the additive genetic covariance
matrix is **G** = **A** σ_{a}^{2} (where **A** is the numerator
relationship matrix).

However, there are several situations in which the analysis requires a more
complex covariance structure, usually a direct sum or direct product of two or
more matrices. For example, an analysis of data from several sites might
consider a different error variance for each site, that is **R** = Σd
**R**_{i}, where Σd represents a direct sum (see any matrix
algebra book for an explanation) and **R**_{i} is the residual matrix
for site *i*.
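As a minimal sketch of what the direct sum does, the following plain Python (not asreml code) stacks per-site residual matrices along the diagonal of a larger matrix. The function names, the site sizes and the variances are hypothetical and chosen only for illustration.

```python
def scaled_identity(n, variance):
    """Return the n x n matrix I * variance as a list of lists."""
    return [[variance if i == j else 0.0 for j in range(n)] for i in range(n)]


def direct_sum(blocks):
    """Place square matrices along the diagonal of a matrix of zeros."""
    n = sum(len(b) for b in blocks)
    out = [[0.0] * n for _ in range(n)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(block)
    return out


# Hypothetical data: site 1 has 2 observations and error variance 4.0,
# site 2 has 3 observations and error variance 9.0.
R1 = scaled_identity(2, 4.0)
R2 = scaled_identity(3, 9.0)
R = direct_sum([R1, R2])

for row in R:
    print(row)
```

The result is a 5 x 5 block-diagonal matrix: observations within a site share that site's error variance, and observations from different sites have zero covariance.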

Another example of a more complex covariance structure is a multivariate
analysis in one site, where both the residual and additive genetic covariance
matrices are constructed as the direct product of two matrices. For example,
**R** = **I** * **R**_{0}, where **I** is an identity matrix of size number of
observations, * is the direct product operation (not to be confused with plain
matrix multiplication) and **R**_{0} is the error covariance matrix
for the traits involved in the analysis. Similarly, **G** = **A** *
**G**_{0}, where all the matrices are as previously defined and
**G**_{0} is the additive covariance matrix for the traits.
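To make the direct (Kronecker) product concrete, here is a small sketch in plain Python, again not asreml code. The `kronecker` helper and the numbers in `R0` are assumptions made for illustration; the example uses two units and two traits, with records sorted by trait within unit.

```python
def kronecker(a, b):
    """Direct product of two matrices given as lists of lists.

    Every element a[i][j] is replaced by the block a[i][j] * b.
    """
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


# Identity matrix for two units (hypothetical size).
I2 = [[1.0, 0.0],
      [0.0, 1.0]]

# Hypothetical error covariance matrix for two traits.
R0 = [[4.0, 1.0],
      [1.0, 2.0]]

R = kronecker(I2, R0)

for row in R:
    print(row)
```

Each unit gets its own copy of **R**_{0} on the diagonal, so the two traits measured on the same unit covary while records from different units do not. Replacing `I2` with a relationship matrix gives the analogous construction for **G**.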

You will see that asreml’s (and asreml-r’s) notation for this type of analysis
closely resembles matrix notation. asreml-r supports a large number of
covariance structures (and I will present only a few of them), which are
particularly useful for longitudinal and spatial analysis. The structures are
easier to understand (at least for me) if we express a covariance matrix
(**M**) as the product of a correlation matrix (**C**) pre- and postmultiplied
by a diagonal matrix (**D**) containing standard deviations for each of the
traits. That is:

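$$
\mathbf{M} =
\begin{bmatrix}
v_{11} & c_{12} & c_{13} & c_{14} \\
c_{21} & v_{22} & c_{23} & c_{24} \\
c_{31} & c_{32} & v_{33} & c_{34} \\
c_{41} & c_{42} & c_{43} & v_{44}
\end{bmatrix}
\qquad
\mathbf{C} =
\begin{bmatrix}
1 & r_{12} & r_{13} & r_{14} \\
r_{21} & 1 & r_{23} & r_{24} \\
r_{31} & r_{32} & 1 & r_{34} \\
r_{41} & r_{42} & r_{43} & 1
\end{bmatrix}
$$

$$
\mathbf{D} =
\begin{bmatrix}
s_{11} & 0 & 0 & 0 \\
0 & s_{22} & 0 & 0 \\
0 & 0 & s_{33} & 0 \\
0 & 0 & 0 & s_{44}
\end{bmatrix}
\qquad
\mathbf{M} = \mathbf{D}\,\mathbf{C}\,\mathbf{D}
$$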

where the v are variances, the c covariances, the r correlations and the s
standard deviations.
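Because **D** is diagonal, the product **D** **C** **D** reduces to multiplying each correlation by the two corresponding standard deviations. The sketch below shows this in plain Python with hypothetical values for two traits; the function names are mine and are not part of asreml.

```python
import math


def cov_from_corr(sd, corr):
    """Build M = D C D, where D is the diagonal matrix of standard deviations."""
    n = len(sd)
    return [[sd[i] * corr[i][j] * sd[j] for j in range(n)] for i in range(n)]


def corr_from_cov(cov):
    """Recover the standard deviations and the correlation matrix from M."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]
    return sd, corr


# Hypothetical values: standard deviations 2 and 3, correlation 0.5.
sd = [2.0, 3.0]
C = [[1.0, 0.5],
     [0.5, 1.0]]

M = cov_from_corr(sd, C)
print(M)  # variances 4 and 9 on the diagonal, covariance 3 off the diagonal

sd_back, C_back = corr_from_cov(M)
print(sd_back, C_back)
```

Going back and forth between the two forms is handy when a structure is easier to describe through its correlations but the output is reported as covariances, or the other way around.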

If we do not impose any restriction on **M**, apart from being positive
definite (p.d.), we are talking about an unstructured matrix (US in asreml
parlance). Thus, **M** or **C** can take any value (as long as it is p.d.), as
is usual when analyzing multiple trait problems.

There are cases when the order of assessment or the spatial location of the
experimental units creates patterns of variation, which are reflected in the
covariance matrix. For example, the breeding value of an individual *i*
observed at time *j* (a_{ij}) is a function of the genes involved in
expression at time j - k (a_{i,j-k}), plus the effect of genes acting
in the new measurement (α_{j}), which are considered independent
of the past measurement: a_{ij} = ρ_{jk} a_{i,j-k} +
α_{j}, where ρ_{jk} is the additive genetic
correlation between the measurements at times j and j - k.

Rather than using a different correlation for each pair of ages, it is
possible to postulate mechanisms that model the correlations. One example is
the autoregressive model (AR in asreml lingo), where the correlation between
measurements j and k is r^{|j-k|}. In this model **M** = **D** **C**_{AR}
**D** (a plain matrix product), where **C**_{AR} (for equally spaced
assessments) is:

$$
\mathbf{C}_{AR} =
\begin{bmatrix}
1 & r^{1} & r^{2} & r^{3} \\
r^{1} & 1 & r^{1} & r^{2} \\
r^{2} & r^{1} & 1 & r^{1} \\
r^{3} & r^{2} & r^{1} & 1
\end{bmatrix}
$$
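The autoregressive pattern is simple enough to generate from a single parameter. The plain Python sketch below (hypothetical function names, not asreml code) builds the correlation matrix and compares parameter counts for *t* assessments: an unstructured matrix needs t(t+1)/2 parameters, while the AR structure with a separate standard deviation per assessment needs t + 1.

```python
def ar1_corr(n, r):
    """Correlation matrix with entry r^|j-k| for equally spaced assessments."""
    return [[r ** abs(j - k) for k in range(n)] for j in range(n)]


def n_params_unstructured(t):
    """Number of distinct variances and covariances in a t x t matrix."""
    return t * (t + 1) // 2


def n_params_ar1_heterogeneous(t):
    """t standard deviations plus one autocorrelation parameter."""
    return t + 1


C_AR = ar1_corr(4, 0.5)  # hypothetical autocorrelation of 0.5
for row in C_AR:
    print(row)

# For four assessments: 10 parameters unstructured versus 5 with AR.
print(n_params_unstructured(4), n_params_ar1_heterogeneous(4))
```

The gap widens quickly: with ten assessments the unstructured matrix needs 55 parameters and the AR structure only 11.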

A model including this structure will certainly be more parsimonious (economical in terms of the number of parameters) than one using an unstructured approach. Looking at the previous pattern, it is a lot easier to understand why they are called ‘structures’.

A similar situation is considered in spatial analysis, where the ‘independent errors’ assumption of typical analyses is relaxed. A common spatial model considers autocorrelated residuals in both directions (rows and columns). Here the level of autocorrelation depends on the distance between trees rather than on time.
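One way to picture autocorrelation in both directions is a separable model, in which the correlation between two trees is the product of a row term and a column term. That separable form is an assumption of this sketch (the text above does not spell out the exact model), and the function names, grid size and autocorrelations are hypothetical. It is plain Python, not asreml code.

```python
def ar1_corr(n, r):
    """Correlation matrix with entry r^|j-k| for equally spaced positions."""
    return [[r ** abs(j - k) for k in range(n)] for j in range(n)]


def separable_spatial_corr(n_rows, n_cols, rho_row, rho_col):
    """Correlation between every pair of plots on a regular grid.

    Assumes corr = rho_row^|row distance| * rho_col^|column distance|,
    which equals the direct product of the row and column AR matrices
    when plots are sorted by column within row.
    """
    row_corr = ar1_corr(n_rows, rho_row)
    col_corr = ar1_corr(n_cols, rho_col)
    plots = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    corr = [
        [row_corr[i1][i2] * col_corr[j1][j2] for (i2, j2) in plots]
        for (i1, j1) in plots
    ]
    return plots, corr


# Hypothetical 2 x 3 grid of trees.
plots, S = separable_spatial_corr(2, 3, rho_row=0.5, rho_col=0.25)

for plot, row in zip(plots, S):
    print(plot, row)
```

Neighbouring trees end up with the highest correlations, and the correlation fades as trees get further apart in either direction, using only two parameters for the whole grid.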

Another structure, based on random regressions, is explained in the longitudinal analysis section of the cookbook. asreml can fit many other structures; see the variance model specification in the manual for more details.