A State Space Modeling Approach to Mediation Analysis

Fei Gu
McGill University
Kristopher J. Preacher
Vanderbilt University
Emilio Ferrer
University of California, Davis
Mediation is a causal process that evolves over time. Thus, a study of
mediation requires data collected throughout the process. However, most
applications of mediation analysis use cross-sectional rather than longitudinal data. Another implicit assumption commonly made in longitudinal designs
for mediation analysis is that the same mediation process universally applies
to all members of the population under investigation. This assumption ignores
the important issue of ergodicity before aggregating the data across subjects.
We first argue that there exists a discrepancy between the concept of mediation and the research designs that are typically used to investigate it. Second,
based on the concept of ergodicity, we argue that a given mediation process
probably is not equally valid for all individuals in a population. Therefore, the
purpose of this article is to propose a two-faceted solution. The first facet of
the solution is that we advocate a single-subject time-series design that aligns
data collection with researchers’ conceptual understanding of mediation. The
second facet is to introduce a flexible statistical method—the state space
model—as an ideal technique to analyze single-subject time series data in
mediation studies. We provide an overview of the state space method and illustrative applications using both simulated and real time series data. Finally, we
discuss additional issues related to research design and modeling.
Keywords: mediation, state space model
Journal of Educational and Behavioral Statistics
2014, Vol. 39, No. 2, pp. 117–143
DOI: 10.3102/1076998614524823
© 2014 AERA. http://jebs.aera.net
1. Introduction
Mediation is a causal process that evolves over time. In the simplest case, the
causal variable (X) exerts an effect on the outcome variable (Y) partially or
completely through a mediator variable (M) over time. Clearly, time plays an
explicit role in the mediation process. In one of the earliest articles devoted specifically to mediation, Judd and Kenny (1981) emphasized the role of time, even
placing the words “process analysis” in the title of their article. More recently,
Schmitz (2006) provided an overview of the necessity of process analyses in the
context of learning and instruction. Ideally, the empirical study of mediation
requires (1) data collected throughout the process and (2) pertinent statistical
methods that can capture the dynamic mechanism underlying the causal process.
However, process-oriented methods that explicitly consider the role of time are
not common in educational and psychological research (Schmitz, 2006). In the
context of mediation, Cole and Maxwell (2003) discussed that most applications
of mediation analysis use cross-sectional rather than longitudinal data, and not all
methodological treatments acknowledge the necessity of considering time
(see Maxwell & Cole, 2007; Maxwell, Cole, & Mitchell, 2011).
Another implicit assumption commonly made in mediation analysis is that the
same mediation process universally applies to all members of the population
under investigation. This assumption is made in a handful of longitudinal models
recently developed within the structural equation modeling (SEM) framework
proposed to study mediation processes. Cheong, MacKinnon, and Khoo (2003)
used a parallel process latent growth curve model to investigate the effect of a
causal variable on the change in an outcome variable through change in a mediator variable. Gollob and Reichardt (1991) and Cole and Maxwell (2003)
implemented a cross-lagged panel model emphasizing longitudinal relations
between the absolute level of the causal variable and the outcome variable
through the mediator variable. Another variant of a longitudinal SEM approach
is the latent difference score model, in which differences between adjacent observations are treated as latent variables (e.g., Hamagami & McArdle, 2007;
MacKinnon, 2008; McArdle, 2001; McArdle & Nesselroade, 1994; Selig &
Preacher, 2009). In the traditional regression framework, Judd, Kenny, and
McClelland (2001) advocated the use of within-subject designs (in contrast to
between-subjects designs) to assess mediation and moderation, where individuals are put into both treatments. In all models mentioned previously, parameters
are estimated by pooling information across subjects. Although the longitudinal
designs in which these models are applied take the role of time into consideration, they are still limited in that none of them considers the important issue
of ergodicity before aggregating the data across subjects (ergodicity is described
later). As we will show shortly, the conclusions drawn from pooling across subjects may not be informative about how single subjects behave (e.g., Ferrer &
Widaman, 2008; Molenaar, 2004).
In this article, we contend that (a) there is a discrepancy between the concept
of mediation and the research designs that are typically undertaken to investigate
it in practice and (b) based on the concept of ergodicity, there is not necessarily a
universal mediation process that is equally valid for everyone in the population.
Therefore, the purpose of this article is to propose a two-faceted solution to the
persistent problem of using cross-sectional designs in mediation analysis. The
first facet of the solution is to encourage researchers to rethink how they
approach the research design for mediation studies; that is, we advocate moving
from the traditional cross-sectional design to a single-subject time-series design
that aligns data collection with researchers’ conceptual understanding of mediation. The second facet is to introduce a flexible statistical method, the state
space model (SSM), as an ideal technique to analyze single-subject¹ time-series data in mediation studies. In Section 2, we elaborate on the theoretical
foundation of the single-subject time-series design for mediation analysis. In
Section 3, we provide an overview of state space methods and two illustrative
applications using simulated and empirical data sets. In Section 4, we discuss
additional issues related to research design and modeling.
2. Foundation of the Single-Subject Time-Series Design
In a cross-sectional design, the focus is on the analysis of interindividual
variation—differences among different units at a single point in time. According
to a survey in 2005 of the five American Psychological Association journals publishing the most articles studying mediation, Maxwell and Cole (2007) reported
that more than half of the mediation studies were based on cross-sectional data.
However, according to the concept of mediation as well as the basic requirements
for causal inference, mediation must involve at least two relations that unfold
over time. Specifically, the effect of a causal variable (X) is first exerted on the
mediator variable (M, i.e., X → M), and then, this effect is carried over to the outcome variable (Y, i.e., M → Y). Thus, it is immediately clear that a certain amount
of time must elapse for the effect of X to reach Y. Therefore, a requirement for
any mediation analysis is the consideration of the role of time, that is, the necessity of the analysis of intraindividual variation—changes in the same unit over
time. By comparing the concept of mediation and how mediation analysis was
conducted in the literature, it is evident that there exists a large discrepancy
between how mediation is theoretically conceptualized and how it has actually
been modeled in the past. This discrepancy raises an important validity issue concerning the equivalence between the analysis of interindividual variation and the
analogous analysis of intraindividual variation.
Cole and Maxwell (2003) argued and demonstrated that very restrictive conditions are required to ensure accurate results from mediation analysis based on
cross-sectional data. In reality, such restrictive conditions almost never occur,
and bias in cross-sectional analyses of longitudinal mediation has been amply
demonstrated (Maxwell & Cole, 2007; Maxwell et al., 2011). The divide between
interindividual variation and intraindividual variation not only exists in mediation analysis but also appears in areas such as test theory, factor analysis, and
developmental psychology (Molenaar, 2004, 2008a, 2008b).
Given the existence of this divide in various areas, an important question
becomes whether this division between orientations is justified. Unfortunately,
as a direct consequence of the classical ergodic theorems, the answer to the question is “no.” In fact, equivalence between the analysis of inter- and intraindividual variation is established only for ergodic processes (Molenaar, 2004). It is
worth noting that, outside the context of mediation and in the broader sense of
studying behavior, the inter-/intraindividual debate can be traced back to an older
distinction between nomothetic lawfulness, emphasizing generality in the population, and idiographic characterization, emphasizing the uniqueness of the individual (e.g., Allport, 1937; Lamiell, 1981, 1988; Molenaar, 2004; Rosenzweig,
1958; van Kampen, 2000; Zevon & Tellegen, 1982). When individuals differ
qualitatively rather than quantitatively, “Qualitative differences mistaken for
quantitative differences can seriously distort relationships and are a prescription
for diluted nomothetic relationships” (Nesselroade, Gerstorf, Hardy, & Ram,
2007, p. 219). The following subsection gives a brief, heuristic description of the
concept of ergodicity as the foundation of the single-subject time-series design.
2.1. Ergodicity
From the perspective of dynamical systems, a process is said to be ergodic if
the average of a single trajectory over time (structure of intraindividual variation)
is equal to the average of the ensemble of trajectories at a single point in time
(structure of interindividual variation by pooling across subjects). In order to
understand the consequences of ergodicity in psychology, the development of
human behavior over time is conceived of as a unique high-dimensional space
that contains dynamic processes and all the relevant information about the subject (cf. Molenaar, 1994, 2004, 2008a, 2008b; Molenaar & Ram, 2009, 2010;
Sinclair & Molenaar, 2008). For a particular individual, a finite sample of the
dynamic process over consecutive time points (usually evenly spaced) constitutes a trajectory in his or her behavior space. This trajectory carries information
about intraindividual variation. Correspondingly, a finite sample of the same
behavior space from a group of individuals at a single point in time represents
an ensemble of trajectories, carrying information about interindividual variation
from this group of individuals. Hence, the question of whether the divide
between orientations is justified becomes a question of whether the developmental trajectory of human behavior is ergodic. This, in turn, reduces to the question
of whether stationarity and homogeneity hold for a Gaussian process. Therefore,
examining the stationarity and homogeneity criteria for a given process is an
essential empirical step.
For a Gaussian process, two criteria are required to be met simultaneously for
a process to be considered ergodic: stationarity and homogeneity. In terms of the
first criterion, a stationary process refers to a stochastic process whose joint probability distribution is time-invariant. For a Gaussian process, stationarity²
requires that the first two moments of the process are time-invariant. That is, the
mean function of the time series is a constant, and the covariance function of the
time series depends only on relative time differences (i.e., ‘‘lag’’). Nonstationary
processes, however, are the norm in psychology, for example, learning and
developmental trajectories.
Regarding the second criterion, homogeneity refers to the situation in which each
member of a population obeys the same dynamic law and follows the same statistical
model, constituting exchangeable replications of each other, much as molecules of a
homogeneous gas. The reality, however, is quite the opposite. In fact, heterogeneity
is a general characteristic of human populations. Furthermore, besides the widely
recognized genetic and environmental effects that cause heterogeneity, Molenaar,
Boomsma, and Dolan (1993) argued that there exists a third source of developmental
differences: self-organization of nonlinear epigenetic processes.
Since nonstationarity, heterogeneity, or both are thought to be the rule rather
than the exception in most psychological processes, such processes are then
nonergodic, which means that the structure of interindividual variation is not
equivalent to the structure of intraindividual variation. Therefore, we conclude
that there is no equivalence between measurement orientations in the majority
of cases. This implies that there are not necessarily lawful relationships between
the analysis of inter- and intraindividual variation. Thus, in cases concerning processes that unfold over time, statistical analyses should focus on intraindividual
variation, with greater emphasis on single-subject time-series designs. This conclusion, therefore, also applies to mediation analysis.
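The gap between the two kinds of variation can be made concrete with a small simulation. The sketch below is not from the article: the function name `inter_vs_intra` and every numeric value in it are illustrative assumptions. It builds a deliberately heterogeneous population in which person-level means of x and y are positively related, while within each person a rise in x is followed by a fall in y. Under these assumptions, the cross-sectional correlation and the within-person correlations have opposite signs.

```python
import numpy as np

def inter_vs_intra(n_subjects=200, n_times=200, seed=0):
    """Contrast interindividual and intraindividual correlations (illustrative values only)."""
    rng = np.random.default_rng(seed)
    # Person-specific means: people with a higher average x also have a higher average y.
    mean_x = rng.normal(0.0, 2.0, size=n_subjects)
    mean_y = 0.8 * mean_x + rng.normal(0.0, 0.3, size=n_subjects)
    # Within-person deviations: when a person's x rises, that person's y falls.
    dev_x = rng.normal(0.0, 1.0, size=(n_subjects, n_times))
    noise = rng.normal(0.0, 0.5, size=(n_subjects, n_times))
    x = mean_x[:, None] + dev_x
    y = mean_y[:, None] - 0.5 * dev_x + noise
    # Interindividual structure: correlation across subjects at a single occasion.
    cross_sectional_r = np.corrcoef(x[:, 0], y[:, 0])[0, 1]
    # Intraindividual structure: correlation over time, computed separately per subject.
    within_r = np.array([np.corrcoef(x[i], y[i])[0, 1] for i in range(n_subjects)])
    return cross_sectional_r, within_r

cross_r, within_r = inter_vs_intra()
print(f"cross-sectional r = {cross_r:.2f}, mean within-person r = {within_r.mean():.2f}")
```

In this constructed example, pooling across subjects would suggest a positive relation even though no single subject shows one, which is the kind of mismatch the nonergodicity argument warns about.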
Discussions of the implications and consequences of ergodicity started to
appear in many areas of psychological research about two decades ago (e.g.,
Molenaar, 1994; Molenaar, 2004, 2008a, 2008b; Molenaar & Campbell, 2009;
Molenaar & Ram, 2009, 2010; Molenaar, Sinclair, Rovine, Ram, & Corneal,
2009; Nesselroade & Molenaar, 1999; Sinclair & Molenaar, 2008, and most
recently Hamaker, 2012). However, there is virtually no mention of ergodicity
in the mediation literature (but see Roe, 2012).
3. Overview of State Space Methods
After establishing the theoretical foundation of the time-series design, process-oriented methods are required to analyze the time series data. In this section, we
discuss the second facet of our proposed solution, that is, an overview of SSM. This
is, we believe, the first effort to utilize SSM to investigate mediation. As the first
application of SSM in the context of mediation, we provide some essential basics
of the most straightforward and frequently discussed variant of SSM in the time-series and econometrics literature, that is, the linear Gaussian SSM. Due to space
limitations, our introduction is brief. For more comprehensive treatments of the
state space methodology in general, we refer readers to Commandeur and
Koopman (2007), Harvey (1989), and Durbin and Koopman (2001).
3.1 History of State Space Modeling in Psychology and Its Application to
Mediation Analysis
State space methods have their origin in control theory, beginning with the
groundbreaking article by Kalman (1960). Applications in astronautics were initially developed (and are still used) for accurately tracking the position and velocity of moving objects such as aircraft, missiles, and rockets. Shortly after its
application in engineering, SSM also found application in time series analysis and
econometrics (e.g., Aoki, 1987; Harvey, 1989). More recently, quantitative social
and behavioral scientists have begun applying SSM because of its statistical flexibility for evaluating both the measurement properties and the lead–lag relationships among latent variables in psychological processes. Analytic similarities
and differences between the currently dominant SEM and the relatively newly
emerging SSM in the psychology literature are discussed by several authors. Specifically, MacCallum and Ashby (1986) noted that SSM is a special case of SEM,
while Otter (1986) showed the reverse. Chow, Ho, Hamaker, and Dolan (2010)
reconciled the two approaches and provided a more detailed discussion of the relative strengths and weaknesses of both approaches vis-à-vis their use in representing
intraindividual dynamics and interindividual differences.
Another line of research involving SSM in psychology is built upon the state
space representation of the dynamic factor model (DFM; Molenaar, 1985).
Dynamic factor analysis was proposed to combine P-technique factor analysis
(Cattell, 1963; Cattell, Cattell, & Rhymer, 1947) and time series analysis (Browne
& Nesselroade, 2005; Molenaar, 1985; Molenaar, de Gooijer, & Schmitz, 1992;
Nesselroade, McArdle, Aggen, & Meyers, 2002). Some recent work devoted to
methodological discussions and substantive applications of DFM can be found
in the psychology literature (e.g., Chow, Nesselroade, Shifren, & McArdle,
2004; Ferrer & Nesselroade, 2003; Hershberger, Corneal, & Molenaar, 1994;
Nesselroade & Molenaar, 1999; Sbarra & Ferrer, 2006; Shifren, Hooker, Wood,
& Nesselroade, 1997; Wood & Brown, 1994; Zhang & Browne, 2006). Given the
strong similarity between SSM and DFM, the two terms frequently are used interchangeably, and DFMs often are expressed in state space form to exploit better
parameter estimation properties (Hamaker, Dolan, & Molenaar, 2005; Ho, Ombao,
& Shumway, 2005; Ho, Shumway, & Ombao, 2006; Molenaar, 1994; Song & Ferrer, 2009, 2012; Zhang, Hamaker, & Nesselroade, 2008).
Although SSMs have been used in many areas of research, they have not yet been applied to questions about mediation in the literature. Statistically,
probably all existing longitudinal models for studying mediation can be represented in their state space form, and identical results can be obtained. However,
we do not pursue this direction because of the consequences of the ergodic theorems stated before. Our proposed approach, instead, is based on the specification of an SSM for analyzing time series data from a single subject (especially
when the number of measurement occasions is large).
In practice, modeling the mediation process as an SSM has important benefits.
First, it allows the researcher to investigate time-related sequences among variables (i.e., predictor → mediator → outcome), as the process represented by
these variables unfolds over time. Although some longitudinal structural equation models also allow such investigation, the state space approach provides a
better and more thorough depiction because some mediation processes need a
longer time to unfold. Second, SSM can accommodate complex specifications
such as measurement structures and second-order factors.³ Third, state space
analyses at the individual level provide a theoretically sound, bottom-up
approach to create homogeneous subpopulations. If a hypothetical model fits
separately the time series data from several subjects, a homogeneous subpopulation can be created from the analyses at the individual level, and generalized conclusions can be drawn for this subpopulation. The bottom-up approach, however,
can be labor-intensive. More discussion of multiple-subject time series is given
in subsequent sections.
3.2. The Linear Gaussian SSM
Currently, there is no standard notation in the literature for SSM, and different
authors have different preferences. Based on the similarity between SEM and
SSM, we choose to use LISCOMP notation to present the formulation of SSM.
The benefit of using LISCOMP notation is that each matrix has gained a standard
interpretation in the literature, thus providing a convenient and familiar notation.
Let $y_t$ be a p-variate vector representing p manifest variables, $\eta_t$ a q-variate vector
representing q latent variables ($p \geq q \geq 1$), and let $t = 1, 2, \ldots, T$ denote the time
point for the corresponding vector. The general linear Gaussian SSM contains a
measurement equation,
\[ y_t = \tau_t + \Lambda_t \eta_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{MVN}(0, \Theta_t), \]
and a transition equation,
\[ \eta_t = \alpha_t + B_t \eta_{t-1} + \zeta_t, \qquad \zeta_t \sim \mathrm{MVN}(0, \Psi_t), \]
where $\tau_t$ is a $p \times 1$ vector for intercepts, $\Lambda_t$ is a $p \times q$ loading matrix, $\varepsilon_t$ is a $p \times 1$
vector for measurement errors (also known as innovations in the time series literature), $\Theta_t$ is a $p \times p$ diagonal covariance matrix, $\alpha_t$ is a $q \times 1$ vector for means,
$B_t$ is a $q \times q$ transition matrix, $\zeta_t$ is a $q \times 1$ vector for residuals, and $\Psi_t$ is a $q \times q$
covariance matrix. The measurement errors and residuals are assumed to be serially independent and independent of each other at all time points. We denote as $\theta_t$
the vertical vector that collects all parameters in $\tau_t$, $\Lambda_t$, $\Theta_t$, $\alpha_t$, $B_t$, and $\Psi_t$, and the
subscript t means that $\theta_t$ is time varying, thus resulting in the time-varying SSM.
Applications of the time-varying SSM can be found in recent articles by Molenaar, Sinclair, Rovine, Ram, and Corneal (2009), Sinclair and Molenaar
(2008), and Chow, Zu, Shifren, and Zhang (2011).
Here we consider only the time-invariant SSM, with the understanding that
the model could be extended to include time-varying parameters. The measurement and transition equations are thus simplified to
\[ y_t = \tau + \Lambda \eta_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{MVN}(0, \Theta), \]
\[ \eta_t = \alpha + B \eta_{t-1} + \zeta_t, \qquad \zeta_t \sim \mathrm{MVN}(0, \Psi), \]
in which the parameter vector, $\theta$, is time-invariant.
3.3. Parameter Estimation and the Kalman Filter
Unknown parameters of the linear Gaussian SSM are estimated via a recursive
algorithm, called the Kalman filter (KF). The KF⁴ algorithm is initialized with the
latent variable, $\eta_{0|0}$, and the associated covariance matrix, $P_{0|0}$, and proceeds with
the prediction and filtering steps iteratively at each time point. When $t = 1$, the prediction step gives the predicted latent variable and its covariance matrix, that is,
\[ \eta_{1|0} = \alpha + B \eta_{0|0}, \]
\[ P_{1|0} = B P_{0|0} B' + \Psi. \]
As a byproduct, the one-step-ahead prediction error and its associated covariance
matrix are obtained, that is,
\[ e_1 = y_1 - y_{1|0} = y_1 - (\tau + \Lambda \eta_{1|0}), \]
\[ D_1 = \Lambda P_{1|0} \Lambda' + \Theta. \]
Then, the filtering step uses the observed value at $t = 1$ to update the predicted
values, giving
\[ K_1 = P_{1|0} \Lambda' D_1^{-1} \quad \text{(Kalman gain matrix)}, \]
\[ \eta_{1|1} = \eta_{1|0} + K_1 e_1 = \eta_{1|0} + P_{1|0} \Lambda' D_1^{-1} e_1, \]
\[ P_{1|1} = P_{1|0} - K_1 D_1 K_1' = P_{1|0} - P_{1|0} \Lambda' D_1^{-1} \Lambda P_{1|0}. \]
When $t = 2$, $\eta_{1|1}$ and $P_{1|1}$ are used in the prediction step, followed by the filtering
step to calculate $\eta_{2|2}$ and $P_{2|2}$, and so on. In sum, for $t = 1, 2, \ldots, T$, the recursive
KF algorithm can be written as
\[ \eta_{t|t-1} = \alpha + B \eta_{t-1|t-1}, \]
\[ P_{t|t-1} = B P_{t-1|t-1} B' + \Psi, \]
\[ e_t = y_t - y_{t|t-1} = y_t - (\tau + \Lambda \eta_{t|t-1}), \]
\[ D_t = \Lambda P_{t|t-1} \Lambda' + \Theta, \]
\[ K_t = P_{t|t-1} \Lambda' D_t^{-1}, \]
\[ \eta_{t|t} = \eta_{t|t-1} + K_t e_t = \eta_{t|t-1} + P_{t|t-1} \Lambda' D_t^{-1} e_t, \]
\[ P_{t|t} = P_{t|t-1} - K_t D_t K_t' = P_{t|t-1} - P_{t|t-1} \Lambda' D_t^{-1} \Lambda P_{t|t-1}. \]
Inserting $e_t$ and $D_t$ at each time point into the log-density function of the multivariate normal distribution and summing all log-density functions, the prediction
error decomposition (PED; Schweppe, 1965) function is obtained:
\[ \mathrm{PED} = \frac{1}{2} \sum_{t=1}^{T} \left[ -p \log(2\pi) - \log |D_t| - e_t' D_t^{-1} e_t \right]. \]
Giving certain starting values in $\theta$ and maximizing the PED function with respect
to $\theta$ provides the parameter estimates.
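The recursion and the PED function can be sketched in a few lines of NumPy. This is a minimal illustration under the time-invariant model above, not the authors' implementation; the function name `kalman_loglik` and its argument names are assumptions chosen to mirror the symbols in the text.

```python
import numpy as np

def kalman_loglik(y, tau, Lam, alpha, B, Psi, Theta, eta0, P0):
    """Prediction error decomposition for a (T x p) array y under a time-invariant SSM.

    eta0 and P0 play the role of eta_{0|0} and P_{0|0}.
    """
    eta, P = eta0.copy(), P0.copy()
    p = y.shape[1]
    ped = 0.0
    for y_t in y:
        # Prediction step: eta_{t|t-1} and P_{t|t-1}.
        eta_pred = alpha + B @ eta
        P_pred = B @ P @ B.T + Psi
        # One-step-ahead prediction error e_t and its covariance D_t.
        e = y_t - (tau + Lam @ eta_pred)
        D = Lam @ P_pred @ Lam.T + Theta
        # Filtering step: Kalman gain, then eta_{t|t} and P_{t|t}.
        K = P_pred @ Lam.T @ np.linalg.inv(D)
        eta = eta_pred + K @ e
        P = P_pred - K @ D @ K.T
        # Add this occasion's multivariate normal log-density to the PED.
        _, logdet = np.linalg.slogdet(D)
        ped += -0.5 * (p * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(D, e))
    return ped

# Scalar check: with B = 0 and Psi = Theta = 1, every D_t equals 2 and e_t equals y_t.
y_demo = np.array([[1.0], [2.0]])
print(kalman_loglik(y_demo, np.zeros(1), np.eye(1), np.zeros(1),
                    np.zeros((1, 1)), np.eye(1), np.eye(1),
                    np.zeros(1), np.zeros((1, 1))))
```

To obtain estimates, one would wrap this function so that it maps a parameter vector to the matrices and hand the negative PED to a general-purpose numerical optimizer; that wrapper is model specific and is left out here.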
3.4. Testing Mediation: Bootstrapping the Time Series Data
As described in the previous sections, early definitions of direct and indirect
effects in mediation analysis were based on cross-sectional designs (Baron &
Kenny, 1986). These definitions were theoretically inaccurate because of the lack
of consideration of time. Modern definitions of the concepts advocate that the
causal variable, the mediator variable, and the outcome variable of a mediation
process should be obtained at different occasions (Collins, Graham, & Flaherty,
1998; Gollob & Reichardt, 1991). Figure 1 illustrates an example of the simplest
three-variate model in which time is taken into account. This figure illustrates the
concepts of direct and indirect effects in mediation analysis. In this model, a is
defined as the direct effect of $X_{t-1}$ on $M_t$, b is the direct effect of $M_t$ on $Y_{t+1}$, and
c is the direct effect of $X_{t-1}$ on $Y_{t+1}$. The indirect effect of $X_{t-1}$ on $Y_{t+1}$ is defined
as the product of a and b, that is, ab. For more complicated models, the indirect
effect can be a product of more parameters linking several different occasions.
In order to evaluate the direct effects, several statistical tests are available, for
example, the Wald test and the likelihood ratio test. Testing the significance of
the indirect effects is also of great importance but more difficult, and the common
direct methods just mentioned are not appropriate because of the nonnormality of
the sampling distribution of the indirect effect. As an alternative, the use of bootstrap confidence intervals (CIs) is recommended (Bollen & Stine, 1990; Hayes,
2009; MacKinnon, Lockwood, & Williams, 2004; Preacher & Hayes, 2004,
2008a, 2008b; Shrout & Bolger, 2002). The standard nonparametric bootstrap
involves two steps. In the first step, a resample of size N is drawn with replacement from the original sample. In the second step, model parameters are estimated from this resample. The two steps are replicated B times (where B is
large), so that the sampling distribution of the statistic of interest can be obtained.
FIGURE 1. Direct and indirect effects in the simplest mediation model.
At the .95 level, the 2.5th and 97.5th percentiles are chosen to construct the CI to
conduct the significance test of a single parameter, or of a product of parameters,
by examining whether the CI excludes 0 (indicating a significant effect). Bootstrapping the time series data, however, poses yet another difficulty because of
the temporal dependence of the time series data (Zhang & Browne, 2006). In general, the standard nonparametric bootstrap is not appropriate for time series data
because it destroys the inherent time dependency in the data. In this section, we
introduce two bootstrap methods appropriate for SSM, namely the parametric
bootstrap and the residual-based bootstrap. Besides these two methods, there are
other approaches appropriate for SSM (e.g., Zhang & Chow, 2010).
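The percentile step is the same whichever bootstrap generates the replicates. As a minimal sketch, assuming the B bootstrap estimates of the product are already stored in an array (the helper names `percentile_ci` and `excludes_zero` are illustrative, and the draws in the usage lines are made-up numbers, not results from the article):

```python
import numpy as np

def percentile_ci(boot_estimates, level=0.95):
    """Percentile confidence interval from an array of bootstrap estimates."""
    tail = 100 * (1 - level) / 2          # 2.5 when level is .95
    lower, upper = np.percentile(boot_estimates, [tail, 100 - tail])
    return lower, upper

def excludes_zero(lower, upper):
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0

# Usage with made-up bootstrap draws of two path coefficients.
rng = np.random.default_rng(0)
a_boot = rng.normal(0.4, 0.05, size=2000)
b_boot = rng.normal(0.4, 0.05, size=2000)
lo, hi = percentile_ci(a_boot * b_boot)
print(lo, hi, excludes_zero(lo, hi))
```

Because the interval is read directly off the empirical distribution of the product, it does not rely on that distribution being normal.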
The parametric bootstrap is essentially a Monte Carlo simulation, in which
repeated bootstrap samples are simulated from a specified model, where the estimates from the original sample are treated as parameters. The underlying
assumption is that the specified model is correct in the population. To generate
each random sample, a number of steps are followed:
1. Generate $\eta_0$ from $\mathrm{MVN}(0, 100 I_q)$.
2. Set the iteration number $t = 1$.
3. Generate $\zeta_t$ from $\mathrm{MVN}(0, \hat{\Psi})$.
4. Calculate $\eta_t$ using $\eta_t = \hat{\alpha} + \hat{B} \eta_{t-1} + \zeta_t$.
5. Generate $\varepsilon_t$ from $\mathrm{MVN}(0, \hat{\Theta})$.
6. Calculate $y_t$ using $y_t = \hat{\tau} + \hat{\Lambda} \eta_t + \varepsilon_t$.
7. Set $t = t + 1$ and return to Step 3.
8. Repeat Steps 3 to 6 until $t > T + 1{,}000$.
9. Save the data from 1,001 to $T + 1{,}000$.
Although not always necessary, the first 1,000 observations are typically
discarded as the burn-in period.
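The nine steps translate almost line for line into code. The following is a sketch rather than the authors' program: the function name `simulate_ssm` and its arguments are assumptions, and the estimated matrices are passed in as plain NumPy arrays.

```python
import numpy as np

def simulate_ssm(T, tau, Lam, alpha, B, Psi, Theta, burn_in=1000, rng=None):
    """Draw one parametric-bootstrap series of length T from a time-invariant SSM."""
    rng = np.random.default_rng() if rng is None else rng
    q, p = B.shape[0], Lam.shape[0]
    # Step 1: diffuse draw for the initial latent state.
    eta = rng.multivariate_normal(np.zeros(q), 100.0 * np.eye(q))
    y = np.empty((T + burn_in, p))
    for t in range(T + burn_in):                            # Steps 2, 7, and 8
        zeta = rng.multivariate_normal(np.zeros(q), Psi)    # Step 3
        eta = alpha + B @ eta + zeta                        # Step 4
        eps = rng.multivariate_normal(np.zeros(p), Theta)   # Step 5
        y[t] = tau + Lam @ eta + eps                        # Step 6
    return y[burn_in:]                                      # Step 9: drop the burn-in

# Usage with illustrative (not estimated) values for a two-variable model.
y_boot = simulate_ssm(100, np.zeros(2), np.eye(2), np.zeros(2),
                      0.5 * np.eye(2), np.eye(2), 0.1 * np.eye(2),
                      rng=np.random.default_rng(0))
print(y_boot.shape)
```

Each simulated series would then be refitted with the same model, and the product of interest saved, to build up the bootstrap distribution.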
The residual-based bootstrap was first applied to linear Gaussian SSM by
Stoffer and Wall (1991) in assessing the precision of maximum likelihood estimates, and it is considered a semiparametric approach. Similar to the parametric
bootstrap, population parameters are taken to be sample estimates in the
residual-based bootstrap, and the underlying assumption of a correctly specified
model is also made. On the other hand, random samples are drawn, with replacement, from the standardized residuals as in the standard nonparametric bootstrap.
Specifically, the residual-based bootstrap procedure is based on the innovations
form of the KF:
\[ e_t = y_t - \tau - \Lambda \eta_{t|t-1}, \]
\[ D_t = \Lambda P_{t|t-1} \Lambda' + \Theta, \]
\[ K_t = P_{t|t-1} \Lambda' D_t^{-1}, \]
\[ \eta_{t+1|t} = B \eta_{t|t-1} + B K_t e_t, \]
\[ y_t = \tau + \Lambda \eta_{t|t-1} + e_t. \]
Then, the algorithm proceeds as follows:
1. Calculate standardized innovations using $\hat{D}_t^{-1/2} \hat{e}_t$, denoted $\tilde{e}_t$.
2. Draw, with replacement, a random sample from $\tilde{e}_t$ to obtain $\tilde{e}_t^{*}$.
3. Construct a bootstrap sample by fixing the initial conditions of the KF and
iteratively using the following two equations:
\[ \eta_{t+1|t} = \hat{B} \eta_{t|t-1} + \hat{B} \hat{K}_t \hat{D}_t^{1/2} \tilde{e}_t^{*}, \]
\[ y_t = \hat{\tau} + \hat{\Lambda} \eta_{t|t-1} + \hat{D}_t^{1/2} \tilde{e}_t^{*}. \]
The idea behind the residual-based bootstrap is that the standardized residuals
are independent and identically distributed, and therefore exchangeable, after all
the dynamic and measurement relationships have been accounted for by the
model. This procedure, however, is not robust against model misspecification
(Stoffer & Wall, 1991, 2004; Zhang & Chow, 2010).
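A compact sketch of one residual-based bootstrap draw follows. It rests on several assumptions that are not in the article: the function name `residual_bootstrap_sample` and its arguments are illustrative; the Cholesky factor of $D_t$ is used as the matrix square root; the filter is started from a predicted state and covariance for the first occasion; and an intercept `alpha` is carried in the state update, which reduces to the innovations form in the text when `alpha` is zero. The optional `idx` argument exists only so that the resampling can be fixed for checking.

```python
import numpy as np

def residual_bootstrap_sample(y, tau, Lam, alpha, B, Psi, Theta,
                              eta_pred0, P_pred0, idx=None, rng=None):
    """One residual-based bootstrap series built from standardized innovations."""
    T = y.shape[0]
    # Pass 1: innovations form of the KF; store gains, square roots, standardized innovations.
    eta_pred, P_pred = eta_pred0.copy(), P_pred0.copy()
    gains, roots, std_innov = [], [], []
    for t in range(T):
        e = y[t] - tau - Lam @ eta_pred
        D = Lam @ P_pred @ Lam.T + Theta
        K = P_pred @ Lam.T @ np.linalg.inv(D)
        root = np.linalg.cholesky(D)                  # assumed choice of D_t^{1/2}
        gains.append(K)
        roots.append(root)
        std_innov.append(np.linalg.solve(root, e))    # D_t^{-1/2} e_t
        eta_pred = alpha + B @ eta_pred + B @ K @ e
        P_pred = B @ (P_pred - K @ D @ K.T) @ B.T + Psi
    # Resample the standardized innovations with replacement.
    if idx is None:
        rng = np.random.default_rng() if rng is None else rng
        idx = rng.integers(0, T, size=T)
    # Pass 2: rebuild a series from the resampled innovations, same initial conditions.
    eta_pred = eta_pred0.copy()
    y_star = np.empty_like(y, dtype=float)
    for t in range(T):
        e_star = roots[t] @ std_innov[idx[t]]         # D_t^{1/2} times resampled innovation
        y_star[t] = tau + Lam @ eta_pred + e_star
        eta_pred = alpha + B @ eta_pred + B @ gains[t] @ e_star
    return y_star
```

A useful sanity check is that leaving the innovations in their original order (`idx = np.arange(T)`) reproduces the observed series exactly, which confirms that the two passes are mutually consistent.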
For the examples considered in this article, we set B = 2,000 for both bootstrap procedures to approximate the sampling distribution of the product. As a
general rule, large numbers are required to allow enough simulated cases in both
tails of the sampling distribution of the indirect effect so that the percentiles can
be accurately estimated for constructing a CI (Yung & Chan, 1999).
3.5. Illustration 1: A Simulated Lag-2 Example
In order to illustrate the parameter estimation in SSM and the two bootstrap
procedures just described, a three-variate, single-subject time series data set is
generated from a lag-2 model (depicted in Figure 2). Note that the figure represents a temporal slice of three time points from the entire process. Three equations are involved in this model:
FIGURE 2. A temporal slice of the lag-2 model.
\[ Y_t = a Y_{t-1} + b M_{t-1} + c X_{t-1} + g X_{t-2} + \varepsilon_{Yt}, \]
\[ M_t = d M_{t-1} + e X_{t-1} + \varepsilon_{Mt}, \]
\[ X_t = f X_{t-1} + \varepsilon_{Xt}, \]
where a, d, and f are autoregressive parameters of the outcome variable, the mediator variable, and the causal variable, respectively; b, c, and e are the lag-1 cross-regressive parameters; g is the lag-2 cross-regressive parameter; and $\varepsilon_{Yt}$, $\varepsilon_{Mt}$, and
$\varepsilon_{Xt}$ are residuals in each equation. If g is equal to 0, it reduces to the lag-1 model,
which is a particular case of the lag-2 model. In addition, if c is also equal to 0, it
corresponds to one of the models in Cole and Maxwell (2003, model 5). As we
will show immediately, the lag-2 model can be expressed in state space form. In
principle, as long as we can write the model in state space form, the parameter
estimation and bootstrap procedures can be readily applied.
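By defining the following measurement equation and the transition equation,
the state space form of the lag-2 model is obtained:
\[
\begin{pmatrix} Y_t \\ M_t \\ X_t \end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix} Y_t \\ M_t \\ X_t \\ Y_{t-1} \\ M_{t-1} \\ X_{t-1} \end{pmatrix},
\]
and
\[
\begin{pmatrix} Y_t \\ M_t \\ X_t \\ Y_{t-1} \\ M_{t-1} \\ X_{t-1} \end{pmatrix}
=
\begin{pmatrix}
a & b & c & 0 & 0 & g \\
0 & d & e & 0 & 0 & 0 \\
0 & 0 & f & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix} Y_{t-1} \\ M_{t-1} \\ X_{t-1} \\ Y_{t-2} \\ M_{t-2} \\ X_{t-2} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{Yt} \\ \varepsilon_{Mt} \\ \varepsilon_{Xt} \\ 0 \\ 0 \\ 0 \end{pmatrix},
\]
and
\[
\Psi =
\begin{pmatrix}
\psi_Y & 0 & 0 & 0 & 0 & 0 \\
0 & \psi_M & 0 & 0 & 0 & 0 \\
0 & 0 & \psi_X & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]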
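As a sketch of how this state space form might be assembled in code, the helper below (the name `lag2_state_space` is an assumption) stacks the current and lagged variables into a six-element state and fills in the loading, transition, and residual covariance matrices. The usage lines plug in the true parameter values listed for the simulation in Table 1.

```python
import numpy as np

def lag2_state_space(a, b, c, d, e, f, g, psi_y, psi_m, psi_x):
    """Loading, transition, and residual covariance matrices for the lag-2 model.

    State order: (Y_t, M_t, X_t, Y_{t-1}, M_{t-1}, X_{t-1}).
    """
    B = np.zeros((6, 6))
    B[0, [0, 1, 2, 5]] = [a, b, c, g]     # Y_t on Y_{t-1}, M_{t-1}, X_{t-1}, X_{t-2}
    B[1, [1, 2]] = [d, e]                 # M_t on M_{t-1}, X_{t-1}
    B[2, 2] = f                           # X_t on X_{t-1}
    B[3:, :3] = np.eye(3)                 # copy current values into the lagged slots
    Lam = np.hstack([np.eye(3), np.zeros((3, 3))])   # observe only the current values
    Psi = np.diag([psi_y, psi_m, psi_x, 0.0, 0.0, 0.0])
    return Lam, B, Psi

# True values used for the simulated example.
Lam, B, Psi = lag2_state_space(0.5, 0.4, 0.4, 0.5, 0.4, 0.8, 0.3, 0.1, 0.4, 0.9)
print(B)
print("largest eigenvalue modulus:", np.abs(np.linalg.eigvals(B)).max())
```

Stacking lagged values in the state is what lets a lag-2 effect be carried by a first-order transition equation, so the estimation and bootstrap procedures need no modification.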
We simulated four time series data sets with lengths equal to 50, 100, 150, and
200. Compared with the typical lengths in the time series literature, the lengths
considered here are relatively short. Although most social scientists typically collect at best only a handful of repeated measures, intensive longitudinal data are
highly desirable for SSM analyses. Some precedent examples have already
emerged in emotion studies using daily diary data (Chow, Hamaker, Fujita, &
Boker, 2009; Song & Ferrer, 2009, 2012), a cognitive study using daily cognitive
assessment data (Chow, Hamaker, & Allaire, 2009), and a neuropsychology
study using functional magnetic resonance imaging data (Ho et al., 2005).
We estimated the parameters using the KF algorithm with results shown in
Table 1. The estimated parameters in four different length conditions roughly
show that length of the time series is inversely related to the magnitude of parameter bias. A challenge, however, is that when the sampling frequency (which is determined by the
time elapsed between consecutive measurement occasions) is fixed, longer
time series data are more expensive and difficult to collect. One strategy to
obtain longer time series data is to increase the sampling frequency. The issue
of length and sampling frequency will be discussed briefly in the last section.
The indirect effect from X(t − 2) to Y(t) is the product of e and b. The two
bootstrap methods are used to construct 95% CIs for this product. For the four
simulated examples, the 95% CIs from the parametric bootstrap and the
residual-based bootstrap are similar, and none of the CIs contains zero, indicating
a significant lag-2 indirect effect from X(t − 2) to Y(t). In addition, the CIs from
the parametric bootstrap are somewhat wider than those from the residual-based
bootstrap when T = 50 and 100; this difference almost disappears when
T = 150 and 200.
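The logic of a percentile bootstrap CI for the product eb can be sketched as below. To keep the example short, equation-by-equation least squares is swapped in for the KF-based maximum likelihood estimator, and the bootstrap resamples least squares residuals rather than the innovations of the filter, so this is a simplified stand-in for the residual-based bootstrap reported in Table 1. All function names, the seeds, and the number of replicates are assumptions.

```python
import numpy as np

def simulate(T, rng, burn_in=200):
    """Lag-2 data (columns Y, M, X) at the true values listed in Table 1."""
    a, b, c, d, e, f, g = .5, .4, .4, .5, .4, .8, .3
    sd = np.sqrt([.1, .4, .9])
    out = np.zeros((burn_in + T, 3))
    for t in range(2, burn_in + T):
        Y1, M1, X1 = out[t - 1]
        X2 = out[t - 2, 2]
        z = sd * rng.standard_normal(3)
        out[t] = [a*Y1 + b*M1 + c*X1 + g*X2 + z[0], d*M1 + e*X1 + z[1], f*X1 + z[2]]
    return out[burn_in:]

def fit_ols(data):
    """Least squares fit of each equation; returns coefficients and residuals."""
    Y, M, X = data[:, 0], data[:, 1], data[:, 2]
    Zy = np.column_stack([Y[1:-1], M[1:-1], X[1:-1], X[:-2]])  # a, b, c, g
    Zm = np.column_stack([M[1:-1], X[1:-1]])                   # d, e
    Zx = X[1:-1, None]                                         # f
    by = np.linalg.lstsq(Zy, Y[2:], rcond=None)[0]
    bm = np.linalg.lstsq(Zm, M[2:], rcond=None)[0]
    bx = np.linalg.lstsq(Zx, X[2:], rcond=None)[0]
    resid = np.column_stack([Y[2:] - Zy @ by, M[2:] - Zm @ bm, X[2:] - Zx @ bx])
    return by, bm, bx, resid

def regenerate(by, bm, bx, shocks, start):
    """Rebuild a series recursively from fitted coefficients and resampled shocks."""
    T = len(shocks) + 2
    out = np.empty((T, 3))
    out[:2] = start
    for t in range(2, T):
        Y1, M1, X1 = out[t - 1]
        X2 = out[t - 2, 2]
        out[t, 0] = by @ np.array([Y1, M1, X1, X2]) + shocks[t - 2, 0]
        out[t, 1] = bm @ np.array([M1, X1]) + shocks[t - 2, 1]
        out[t, 2] = bx[0] * X1 + shocks[t - 2, 2]
    return out

def residual_bootstrap_ci(data, n_boot=500, level=0.95, seed=1):
    """Point estimate and percentile CI for the indirect effect e * b."""
    rng = np.random.default_rng(seed)
    by, bm, bx, resid = fit_ols(data)
    resid = resid - resid.mean(axis=0)
    est = bm[1] * by[1]
    reps = np.empty(n_boot)
    for r in range(n_boot):
        idx = rng.integers(0, len(resid), size=len(resid))
        boot = regenerate(by, bm, bx, resid[idx], data[:2])
        by_r, bm_r, _, _ = fit_ols(boot)
        reps[r] = bm_r[1] * by_r[1]
    lo, hi = np.quantile(reps, [(1 - level) / 2, (1 + level) / 2])
    return est, lo, hi

data = simulate(200, np.random.default_rng(2014))
est, lo, hi = residual_bootstrap_ci(data)
print(f"eb = {est:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

A parametric bootstrap would differ only in how the shocks are obtained: they would be drawn from normal distributions with the estimated innovation variances instead of being resampled from the residuals.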
Table 1
Parameter Estimates From a Lag-2 Model Fitted to Simulated Data

Parameter              True Value    T = 50    T = 100    T = 150    T = 200
a: autoreg of Y            .5         .440      .460       .469       .432
b: Y(t) on M(t − 1)        .4         .295      .356       .351       .338
c: Y(t) on X(t − 1)        .4         .373      .342       .360       .386
d: autoreg of M            .5         .414      .444       .463       .476
e: M(t) on X(t − 1)        .4         .458      .462       .443       .400
f: autoreg of X            .8         .820      .741       .778       .792
g: Y(t) on X(t − 2)        .3         .298      .327       .281       .309
ψY                         .1         .110      .082       .091       .096
ψM                         .4         .282      .318       .332       .349
ψX                         .9         .753      .758       .815       .996
eb                         .16        .135      .164       .155       .135

                           T = 50          T = 100         T = 150         T = 200
                        resid   parm    resid   parm    resid   parm    resid   parm
Number converged        2,000   2,000   2,000   2,000   2,000   2,000   2,000   2,000
Lower bound of eb       .067    .055    .119    .110    .117    .115    .104    .105
Upper bound of eb       .220    .245    .228    .237    .209    .208    .176    .178
Range of the 95% CI     .153    .190    .109    .127    .092    .093    .072    .073

Note. CI = confidence interval; resid = residual-based bootstrap; parm = parametric
bootstrap; eb is the indirect effect from X(t − 2) to Y(t).
3.6. Illustration 2: An Empirical Study
In this section, we fit the lag-2 model displayed in Figure 2 to the time series of
two man–woman dyads. Each series contains self-reported daily stress and affect
for 91 days. These data are part of a larger study designed to examine dyadic interactions (for more details see, e.g., Ferrer, Steele, & Hsieh, 2012; Ferrer &
Widaman, 2008). The variables used in these analyses are female perceived stress
(X), male positive affect specific to his relationship (M), and female negative affect
specific to her relationship (Y). Relationship-specific affect (RSA) was measured
using the RSA scale (Ferrer et al., 2012), 18 items intended to tap into the participants’ positive and negative emotional experiences specific to their relationships.
Examples of the positive items include “emotionally intimate,” “trusted,” and
“loved.” Examples of the negative items include “sad,” “trapped,” and “discouraged.” The stress construct was measured using 5 items from the Positive and Negative Affect Schedule (Watson, Clark, & Tellegen, 1988) including “distress,”
“upset,” “scared,” “nervous,” and “afraid.” For all analyses, we created unit-weighted composites for each person, using all the items in each of the scales.
The results of fitting the lag-2 models to these data are reported in Table 2. For
Dyad 1, the estimated values for parameters b and e are statistically significant,
indicating reliable evidence to support positive relations between the male
partner’s positive affect on a given day and his female partner’s negative affect
the following day, as well as between the female partner’s perceived stress on a
given day and the male partner’s positive affect the following day. However, the
estimated c and g parameters are not statistically significant. The nonsignificant
estimate of g may suggest a more parsimonious model that does not include the
lag-2 structure between the female partner’s stress and her subsequent negative
affect.
For Dyad 2, the estimated b parameter is not statistically significant (.011/.031
= .355), whereas the estimated e and g are both significant. It is not unreasonable
to expect that the nonsignificant estimate of b may result in a
nonsignificant indirect effect linking the causal variable at t − 2 to the outcome
variable at t. For researchers interested in emotion, these preliminary results
provide motivation for further investigation of each dyad’s data separately. In particular, the counterintuitive sign of the estimated b for both dyads may flag
some problems or uncertainties in the model, the data, or both. In reality, the
affective processes underlying different dyads may be qualitatively different
(e.g., including a feedback process).
For illustrative purposes, we also present the 95% CIs from the two bootstrap
methods for both dyads, but we acknowledge that the lag-2 model may not be
plausible for either dyad. The caveat is that these CIs are not readily interpretable
because neither bootstrap method is robust to model misspecification. Further
efforts are necessary to determine the best final model for each dyad and, in order
to make proper statistical inferences, the two bootstrap methods need to be
applied to each final model separately. This is beyond the scope of the current
illustration, but is deserving of separate study. In summary, the message from this
illustration for mediation researchers is clear; that is, mediation analyses should
be conducted for each subject (i.e., individual, dyad, or other unit of analysis)
separately to accommodate heterogeneity in the units.
4. Additional Issues
4.1. Causal Inference
Establishing causality is an important component of longitudinal research.
There are several recent treatments of causal inference in mediation analysis,
most inspired by Rubin’s causal model (or the potential outcomes framework;
see Albert, 2008; Ten Have & Joffe, 2010). In this framework, causality is
defined with reference to potential outcomes that might have been obtained
under different counterfactual conditions. Because it is not possible to observe
outcomes under all possible conditions, certain assumptions are commonly
invoked to permit causal inference, for example, the assumption that key paths
composing an indirect effect are not confounded by omitted variables, and the
assumption that X does not moderate the effect of M on Y, among others (Imai,
Keele, & Tingley, 2010; Pearl, 2010, 2012; VanderWeele & Vansteelandt, 2009).

Table 2
Parameter Estimates From a Lag-2 Model Fitted to Two Dyads’ Time Series Over 91 Days

Parameter                                                         Dyad 1           Dyad 2
a: autoreg of Y (female negative affect relationship specific)    .420 (.137)      .267 (.121)
b: Y(t) on M(t − 1)                                               .220 (.043)      .011 (.031)
c: Y(t) on X(t − 1)                                               .053 (.132)      .473 (.214)
d: autoreg of M (male positive affect relationship specific)      .707 (.049)      .736 (.066)
e: M(t) on X(t − 1)                                               .669 (.110)      .792 (.205)
f: autoreg of X (stress)                                          .927 (.040)      .973 (.020)
g: Y(t) on X(t − 2)                                               .042 (.099)      .274 (.174)
ψY: Var(pagf)                                                     .184 (.027)      .113 (.017)
ψM: Var(nasf)                                                     .385 (.057)      .588 (.088)
ψX: Var(pasf)                                                     .309 (.046)      .049 (.007)
eb                                                                .147             .009

                                                                  resid   parm     resid   parm
Number converged                                                  2,000   2,000    2,000   2,000
Lower bound of eb                                                 .100    .095     −.029   −.028
Upper bound of eb                                                 .225    .246     .076    .049
Range of the 95% CI                                               .125    .151     .105    .077

Note. Standard errors are in parentheses. CI = confidence interval; resid = residual-based
bootstrap; parm = parametric bootstrap; eb is the indirect effect from X(t − 2) to Y(t).

The single-subject design is not an attempt to establish causal relationships
per se. Instead, it provides a temporally plausible way to model and test a
hypothetical mediation process. That is, the parameters associated with the
mediator/mediators can be tested against a null hypothesis using a sampling
distribution (perhaps obtained by bootstrapping), so that the researcher can gain
more insights into the data and determine to what degree the data are consistent
with the hypothesized underlying process. Because the approach is not explicitly
couched in a potential outcomes framework, researchers should be cautious in making
causal inferences using the state space modeling approach to mediation analysis.
However, SSM has in its favor that key effects are
within-subject rather than between-subject, and measurements on key variables
are necessarily separated in time. These features provide a stronger basis for
causal inference than methods designed for use with between-subject and/or
cross-sectional data. Extending the logic of the potential outcomes framework to
single-subject longitudinal designs is an interesting avenue for future research.
4.2. Extension to the Multiple-Subject Time Series
Gathering time series data simultaneously on multiple persons (as in our
empirical study) is a common practice. In terms of modeling strategy, it can
be considered a straightforward extension to the model presented here. We call
this extension the multiple-subject time-series design. Since the introduction of
DFM (Molenaar, 1985), the single subject has always been emphasized as the unit of
analysis. This emphasis may give researchers the impression that DFM is
restricted to analyzing the time series data of a single subject. This impression
is not incorrect for the standard application of DFM. As demonstrated in the
modern literature, however, DFM and SSM can and should be extended to the
multiple-subject time-series design (e.g., Chow, Hamaker, & Allaire, 2009; Chow,
Hamaker, Fujita, & Boker, 2009; Chow, Zu et al., 2011; Hamaker et al., 2005;
Molenaar, 2010b; Nesselroade, 2010; Song & Ferrer, 2012).
The term panel model is often used to refer to models for time-series data (of
any length from short to long) collected from a group of subjects
(of any sample size). In mediation analysis, panel models under the SEM framework
usually involve very short series (e.g., a handful of repeated measurements)
and large sample sizes (e.g., Cole & Maxwell, 2003).
Methodologically, multiple-subject modeling can be implemented easily by
fitting a qualitatively and quantitatively identical model to multiple subjects,
treating each individual as a group. By qualitatively identical, we mean that each
subject can be characterized by the same dynamic process implied by the
specified model; whereas by quantitatively identical, we mean that the parameters of the specified model are equated across different subjects. Then, it
is possible to test parameter invariance across subjects by means of likelihood
ratio tests. This procedure is akin to the standard multiple-group SEM analysis
of interindividual variation in searching for commonality across subjects
(nomothetic lawfulness). Following the new definition of parameter invariance
proposed by Nesselroade, Gerstorf, Hardy, and Ram (2007), we suggest that
parameter invariance tests can be better examined at an appropriate level of
abstraction. Specifically, parameters in the measurement equation can differ
to some arbitrary degree to recognize and isolate idiosyncrasy, while those in
the transition equation can be equated and tested for similarities that reflect
nomothetic lawfulness.
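The likelihood ratio test of parameter invariance can be illustrated with a deliberately small example. The sketch below replaces the SSM with a zero-mean AR(1) process for each of two subjects and uses the conditional Gaussian likelihood, so it shows only the logic of comparing a model with equated parameters against a model with subject-specific parameters. The function names and simulated values are assumptions, and the closed-form p value holds only because the test has 2 degrees of freedom.

```python
import numpy as np

def simulate_ar1(phi, T, rng):
    """Zero-mean AR(1) series with unit innovation variance."""
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def ar1_fit(series_list):
    """Conditional ML with one AR coefficient and one innovation variance
    shared by all series in the list; returns (phi, sigma2, loglik)."""
    sxy = sum(s[:-1] @ s[1:] for s in series_list)
    sxx = sum(s[:-1] @ s[:-1] for s in series_list)
    phi = sxy / sxx
    sse = sum(np.sum((s[1:] - phi * s[:-1]) ** 2) for s in series_list)
    n = sum(len(s) - 1 for s in series_list)
    sigma2 = sse / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return phi, sigma2, loglik

def invariance_lrt(s1, s2):
    """LRT of equal (phi, sigma2) across two subjects; df = 2."""
    ll_free = ar1_fit([s1])[2] + ar1_fit([s2])[2]   # subject-specific parameters
    ll_equal = ar1_fit([s1, s2])[2]                 # parameters equated
    stat = 2.0 * (ll_free - ll_equal)
    p_value = np.exp(-stat / 2.0)                   # chi-square survival, df = 2
    return stat, p_value

rng = np.random.default_rng(0)
subj_a = simulate_ar1(0.8, 500, rng)
subj_b = simulate_ar1(-0.5, 500, rng)
stat, p_value = invariance_lrt(subj_a, subj_b)
```

In a full application, the two log-likelihoods would come from the KF-based fit of the SSM with and without equality constraints on the transition parameters.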
However, as described previously, some empirical examples suggest that
heterogeneity is the rule rather than the exception. In addition to our empirical
example, for instance, economists studying workers’ levels of satisfaction
encountered the problem that each individual anchors his or her scale at a different level (Winkelmann & Winkelmann, 1998). This renders interindividual
comparisons of responses meaningless in a cross-sectional study. Another
example, as discussed by Nesselroade (2010), concerns participants’ idiosyncratic use of language. Specifically, one debriefed participant from Mitteness
and Nesselroade (1987) reported that she interpreted the term ‘‘anxious’’ to
mean ‘‘eager.’’
This empirical evidence further supports the general conclusion of heterogeneity across people. On the other hand, Kelderman and Molenaar (2007) provided counterintuitive evidence of the insensitivity of the standard factor
analysis of interindividual variation to the presence of extreme qualitative heterogeneity of the factor loadings in the population of subjects. This proven insensitivity can have serious practical and ethical consequences, yielding individual
assessments and decisions that are biased to unknown degrees (Molenaar,
2008a, 2008b). Given these considerations, great caution should be used if homogeneity is to be assumed.
Another popular modeling strategy for multiple-subject time series data
that allows some degree of heterogeneity is the multilevel modeling framework.
Song and Ferrer (2012) recently proposed a random coefficient DFM
(which can be conceptualized as a multilevel SSM) to investigate both intra- and
interindividual variation. Specifically, they assumed that the parameters
in the transition matrix are drawn from some distribution, so that differences
in dynamics at the appropriate level of abstraction can be accommodated. A
further extension of their model is to allow the parameters in the loading
matrix to be random to capture heterogeneity in the between-subject factor
structure. Such an extension, however, is complex in terms of both model
specification and estimation, which may render the model infeasible and/or
indefensible in practice.
4.3. Length and Sampling Frequency
Longer time series are desired for state space analysis and other time-series
techniques in general. Whereas increasing sampling frequency is a way to
obtain more data points for a given time span, the optimal balance of length and
sampling frequency is almost always context-dependent (Brose & Ram, 2012;
Sliwinski & Mogle, 2008). Researchers should consider carefully the relative
benefits of extending the length of a study versus making more observations
within a fixed length. A thorough discussion of the issue is beyond the scope
of this article; see Collins and Graham (2002), Nesselroade (1991), Nesselroade and Boker (1994), Nesselroade and Jones (1991), and Windle and Davies
4.4. Use of Latent Variables in Mediation Analysis
According to Cole and Maxwell (2003, question 6: What are the effects of random measurement error?), unmodeled measurement error variance can cause
both under- and overestimation of other model parameters. A natural extension
of the simple mediation model that we have illustrated involves the use of latent
variables, each of which can be indicated by multiple manifest variables. By
explicitly modeling error variance (Θ) in the measurement equation, the bias
in parameter estimation can be almost completely resolved, provided that the
model is otherwise correctly specified. Further, psychometric properties of the
measurement instruments (e.g., tau-equivalence) can be evaluated by equating
some loading parameters in the Λ matrix. As for studying mediation, dynamics
of the process can be examined at the latent level while taking into consideration
the factorial structure of the data.
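The effect of unmodeled measurement error can be seen in a small simulation. The sketch below generates a latent AR(1) process, adds white measurement noise, and compares the lag-1 regression slope computed on the latent scores with the slope computed on the error-contaminated scores. It illustrates only the underestimation case; the parameter values and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, phi, psi, theta = 5000, 0.8, 1.0, 1.0     # illustrative values

# Latent process and its error-contaminated indicator
eta = np.zeros(T)
for t in range(1, T):
    eta[t] = phi * eta[t - 1] + np.sqrt(psi) * rng.standard_normal()
y = eta + np.sqrt(theta) * rng.standard_normal(T)

def lag1_slope(x):
    """Least squares slope of x_t on x_{t-1} (no intercept)."""
    return (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])

phi_latent = lag1_slope(eta)      # close to the true coefficient
phi_observed = lag1_slope(y)      # attenuated by measurement error

# Large-sample value of the attenuated slope: phi times the reliability of y
var_eta = psi / (1 - phi ** 2)
expected_observed = phi * var_eta / (var_eta + theta)
```

With these values the reliability of y is about .74, so the slope based on the observed scores is expected to fall near .59 rather than .8.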
4.5. Use of Exogenous Variables
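Exogenous variables, or fixed external inputs, may enter into the observation equation, the transition equation, or both (e.g., Lütkepohl, 2005, p. 613;
Shumway & Stoffer, 2011, p. 320). In this case, the linear Gaussian SSM is
extended to
\[
\begin{aligned}
\mathbf{y}_t &= \boldsymbol{\tau}_t + \boldsymbol{\Lambda}_t \boldsymbol{\eta}_t + \mathbf{K}_t \mathbf{x}_t + \boldsymbol{\varepsilon}_t, & \boldsymbol{\varepsilon}_t &\sim \mathrm{MVN}(\mathbf{0}, \boldsymbol{\Theta}_t), \\
\boldsymbol{\eta}_t &= \boldsymbol{\alpha}_t + \mathbf{B}_t \boldsymbol{\eta}_{t-1} + \boldsymbol{\Gamma}_t \mathbf{x}_t + \boldsymbol{\zeta}_t, & \boldsymbol{\zeta}_t &\sim \mathrm{MVN}(\mathbf{0}, \boldsymbol{\Psi}_t),
\end{aligned}
\]
where \(\mathbf{K}_t\) and \(\boldsymbol{\Gamma}_t\) denote the coefficient matrices of the exogenous inputs \(\mathbf{x}_t\) in the observation and transition equations, respectively.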
Perhaps the most common purpose for including exogenous variables in statistical models is to account for the variance in some random component (e.g.,
Snijders & Bosker, 2011). Molenaar, de Gooijer, and Schmitz (1992) used a discrete time variable in the transition equation to accommodate a linear time trend.
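As an illustration of how exogenous inputs enter the two equations, the sketch below simulates a time-invariant version of the extended model with discrete time as the only input, in the spirit of the linear trend example. The names `K` and `Gam` for the input coefficient matrices in the observation and transition equations are labels chosen for this sketch, and all numerical values are assumptions.

```python
import numpy as np

def simulate_ssm_exog(tau, Lam, K, Theta, alpha, B, Gam, Psi, x, seed=0):
    """Simulate y_t = tau + Lam eta_t + K x_t + eps_t and
    eta_t = alpha + B eta_{t-1} + Gam x_t + zeta_t for inputs x (T x r)."""
    rng = np.random.default_rng(seed)
    T = x.shape[0]
    q, p = B.shape[0], Lam.shape[0]
    eta = np.zeros(q)
    y = np.empty((T, p))
    for t in range(T):
        zeta = rng.multivariate_normal(np.zeros(q), Psi)
        eps = rng.multivariate_normal(np.zeros(p), Theta)
        eta = alpha + B @ eta + Gam @ x[t] + zeta     # transition equation
        y[t] = tau + Lam @ eta + K @ x[t] + eps       # observation equation
    return y

T = 100
x = np.arange(T, dtype=float)[:, None]                # discrete time as the input
y = simulate_ssm_exog(
    tau=np.zeros(1), Lam=np.eye(1), K=np.zeros((1, 1)), Theta=0.1 * np.eye(1),
    alpha=np.zeros(1), B=0.5 * np.eye(1), Gam=np.array([[0.05]]),
    Psi=np.eye(1), x=x)
```

Because the input enters the transition equation here, the trend is filtered through the autoregressive dynamics before it reaches the observed series.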
Beyond parameter estimation in time-varying SSM, Molenaar (1994, 2010a,
2010b) presented the theory, methods, and application of optimal control in
psychopathological processes. Basically, after the SSM parameters are estimated,
a feedback function is derived to determine the optimal level of the external
inputs, so that the controller (e.g., the therapist) can manipulate the external
inputs (e.g., the insulin dose) to keep the outcome variable (e.g., the blood
glucose level of a patient) as close as possible to the desired level. This type
of application presents a grand opportunity to develop more effective
personalized treatment, as opposed to a general dosage of clinical medication,
which is not optimal, is less effective, and often brings some undesired side
effects.
Moreover, it is possible to collect data with exogenous variables in the
multiple-subject time-series design. If the exogenous variable is time varying
within a person, the multiple-group analysis discussed before might be applied
(e.g., Molenaar, 2010b). If the exogenous variable is time-invariant within a person but varying between persons (i.e., a Level-2 variable), the multilevel (or random coefficient) state space framework can be applied (Song & Ferrer, 2012). In
both scenarios, moderated mediation (or other complex interactions) can be
examined (Card, 2012; Preacher, Rucker, & Hayes, 2007). Future research on
this topic is warranted.
4.6. Missing Data
An attractive feature of SSM is that missing values in the time series data are
easily handled by the KF algorithm. The full-information maximum likelihood
(FIML) procedure varies the dimension of the data vector for observations that
contain one or more missing values. A computationally easier variant of the
FIML procedure is to zero out the missing values and retain the same dimension
of the data vector throughout the observations. No additional effort is needed to
preprocess the missing values as in multiple imputation procedures. Shumway
and Stoffer (2011, chapter 6, subsection 6.4) outline the details of the FIML procedure and its easier variant, and they also describe the necessary modifications
for the expectation–maximization optimization algorithm.
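The computationally easier variant can be sketched as follows: at each occasion, missing entries of the data vector are set to zero, the corresponding rows of the loading matrix are zeroed, and the corresponding block of the measurement error covariance is replaced by an identity block, so the filter keeps the same dimension throughout. This is a minimal illustration assuming missing values are coded as NaN; the function and argument names are hypothetical.

```python
import numpy as np

def kf_zero_fill(y, B, Psi, Lam, Theta, eta0, P0):
    """Kalman filter with zeroed-out missing values; returns the filtered
    state means and the log-likelihood of the observed entries."""
    eta, P = eta0.astype(float), P0.astype(float)
    ll, filtered = 0.0, []
    for yt in y:
        miss = np.isnan(yt)
        y_t = np.where(miss, 0.0, yt)        # zero out missing values
        Lam_t = Lam.astype(float)            # astype returns a copy
        Lam_t[miss] = 0.0                    # missing rows carry no information
        Theta_t = Theta.astype(float)
        Theta_t[miss, :] = 0.0
        Theta_t[:, miss] = 0.0
        Theta_t[miss, miss] = 1.0            # identity block keeps F invertible
        # Prediction step
        eta = B @ eta
        P = B @ P @ B.T + Psi
        # Prediction error; missing entries give v = 0 and a unit variance
        v = y_t - Lam_t @ eta
        F = Lam_t @ P @ Lam_t.T + Theta_t
        _, logdet = np.linalg.slogdet(F)
        n_obs = int((~miss).sum())
        ll += -0.5 * (n_obs * np.log(2 * np.pi) + logdet + v @ np.linalg.solve(F, v))
        # Update step
        K = np.linalg.solve(F, Lam_t @ P).T
        eta = eta + K @ v
        P = P - K @ Lam_t @ P
        filtered.append(eta.copy())
    return np.array(filtered), ll
```

When every variable is missing at an occasion, the gain is zero and the filtered state equals the one-step-ahead prediction, which is how the filter carries the process across gaps in the series.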
4.7. Software Implementation
Estimating time-varying SSMs is often a difficult task in that extensive programming skills are required to write one’s own software. EKFIS, a Fortran program developed by Peter Molenaar to implement the Extended KF with Iteration
and Smoothing algorithm, has been used to illustrate the examples in several articles and chapters (e.g., Molenaar et al., 2009; Molenaar & Ram, 2009, 2010; Sinclair & Molenaar, 2008). However, the flexibility of time-varying SSM can
become a liability as well, as EKFIS requires the ability to write and compile Fortran code (Molenaar & Ram, 2010). Other authors have implemented the
(extended) KF algorithm using MATLAB and Ox/Ssfpack to estimate time-varying SSMs (e.g., Chow, Hamaker, & Allaire, 2009; Chow, Hamaker, Fujita,
& Boker, 2009; Chow, Zu et al., 2011; Zu, 2008). The programming efforts
involved are still, unfortunately and inevitably, demanding, which is one of the
reasons for the scarcity of modeling work along these lines.
The programming task is relatively easier for time-invariant SSM than for its
time-varying counterpart. In 2011, several articles were published to illustrate different software packages (e.g., EViews, MATLAB, R, SAS, Stata, and several others) in a special volume (Vol. 41) of the Journal of Statistical Software. However,
each software package has certain limitations. Readers are urged to consult this
special volume to get a flavor of the package with which they are the most familiar.
It is worth noting that MKFM6, a Fortran program provided by Dolan (2005), and
a SAS/IML program provided by Gu and Yung (2013) are available to estimate
time-invariant linear Gaussian SSMs. MKFM6 is free and has been used by several
authors (e.g., Chow, Ho, Hamaker, & Dolan, 2010; Hamaker et al., 2005; Zhang
et al., 2008). The programming tasks in this article were implemented in SAS/IML,
using code developed by modifying and extending the SAS/IML program provided
by Gu and Yung (2013). All the programs can be obtained by request from the first author.
Finally, a Bayesian approach to parameter estimation for DFM and SSM is
emerging (e.g., Bhattacharya, Ho, & Purkayastha, 2006; Bhattacharya & Maitra,
2011; Chow, Tang, Yuan, Song, & Zhu, 2011; Song & Ferrer, 2012; Zhang &
Nesselroade, 2007). This newer approach requires estimation methods that are
computationally heavy (e.g., Markov chain Monte Carlo). The rapid development of specialized software programs (e.g., WinBUGS, Mplus, OpenBUGS),
however, makes use of Bayesian methods both manageable and appealing.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research,
authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication
of this article.
Notes
1. Application of the state space model (SSM) is not restricted to single-subject time
series data. For example, Chow, Ho, Hamaker, and Dolan (2010) provide an
example in which SSM is applied to cross-sectional data. Moreover, we
discuss the extension to multiple-subject time series data in Subsection 4.2.
2. Note that our definition of stationarity is consistent with that used in the time-series literature, but not strictly parallel to the same term described in the mediation literature (Cole & Maxwell, 2003; Kenny, 1979).
3. More discussion is provided in Subsection 4.4.
4. Proof of the Kalman filter (KF) algorithm can be found, for instance, in Lütkepohl
(2005, pp. 630–631) or Shumway and Stoffer (2011, pp. 326–327).
References
Albert, J. M. (2008). Mediation analysis via potential outcomes models. Statistics in Medicine, 27, 1282–1304.
Allport, G. W. (1937). Personality: A psychological interpretation. New York, NY: Holt,
Rinehart, and Winston.
Aoki, M. (1987). State space modeling of time series. Berlin, Germany: Springer.
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in
social psychological research: Conceptual, strategic, and statistical considerations.
Journal of Personality & Social Psychology, 51, 1173–1182.
Bhattacharya, S., Ho, M.-H. R., & Purkayastha, S. (2006). A Bayesian approach to
modeling dynamic effective connectivity with fMRI data. NeuroImage, 30,
Bhattacharya, S., & Maitra, R. (2011). A nonstationary nonparametric Bayesian approach
to dynamically modeling effective connectivity in functional magnetic resonance imaging experiments. The Annals of Applied Statistics, 5, 1183–1206.
Bollen, K. A., & Stine, R. (1990). Direct and indirect effects: Classical and bootstrap estimates of variability. Sociological Methodology, 20, 115–140.
Brose, A., & Ram, N. (2012). Within-person factor analysis: Modeling how the individual fluctuates and changes across time. In M. R. Mehl & T. S. Conner (Eds.),
Handbook of research methods for studying daily life (pp. 43–61). New York, NY:
The Guilford Press.
Browne, M. W., & Nesselroade, J. R. (2005). Representing psychological processes with
dynamic factor models: Some promising uses and extensions of autoregressive moving
average time series models. In A. Maydeu-Olivares & J. J. McArdle (Eds.), Contemporary psychometrics (pp. 415–452). New York, NY: Routledge.
Card, N. A. (2012). Multilevel mediational analysis in the study of daily lives. In M. R.
Mehl & T. S. Conner (Eds.), Handbook of research methods for studying daily life
(pp. 479–494). New York, NY: The Guilford Press.
Cattell, R. B. (1963). The structuring of change by P-technique and incremental R-technique. In C. W. Harris (Ed.), Problems in measuring change (pp. 167–198). Madison:
University of Wisconsin Press.
Cattell, R. B., Cattell, A. K. S., & Rhymer, R. M. (1947). P-technique demonstrated in
determining psychophysical source traits in a normal individual. Psychometrika, 12,
Cheong, J., MacKinnon, D. P., & Khoo, S. T. (2003). Investigation of mediational processes using parallel process latent growth curve modeling. Structural Equation
Modeling, 10, 238–262.
Chow, S.-M., Hamaker, E. L., & Allaire, J. C. (2009). Using innovative outliers to detect
discrete shifts in dynamics in group-based state-space models. Multivariate Behavioral
Research, 44, 465–496.
Chow, S.-M., Hamaker, E. L., Fujita, F., & Boker, S. M. (2009). Representing time-varying cyclic dynamics using multiple-subject state-space models. British Journal
of Mathematical and Statistical Psychology, 62, 683–716.
Chow, S.-M., Ho, M.-H. R., Hamaker, E. L., & Dolan, C. V. (2010). Equivalence and differences between structural equation modeling and state-space modeling techniques.
Structural Equation Modeling, 17, 303–332.
Chow, S.-M., Nesselroade, J. R., Shifren, K., & McArdle, J. J. (2004). Dynamic structure
of emotions among individuals with Parkinson’s disease. Structural Equation Modeling,
11, 560–582.
Chow, S.-M., Tang, N., Yuan, Y., Song, X., & Zhu, H. (2011). Bayesian estimation of
semiparametric nonlinear dynamic factor analysis models using the Dirichlet process
prior. British Journal of Mathematical and Statistical Psychology, 64, 69–106.
Chow, S.-M., Zu, J., Shifren, K., & Zhang, G. (2011). Dynamic factor analysis models
with time-varying parameters. Multivariate Behavioral Research, 46, 303–339.
Cole, D. A., & Maxwell, S. E. (2003). Testing mediational models with longitudinal data:
Questions and tips in the use of structural equation modeling. Journal of Abnormal
Psychology, 112, 558–577.
Collins, L. M., & Graham, J. W. (2002). The effect of timing and spacing of observations
in longitudinal studies of tobacco and other drug use: Temporal design considerations.
Drug and Alcohol Dependence, 68, 85–96.
Collins, L. M., Graham, J. W., & Flaherty, B. P. (1998). An alternative framework for
defining mediation. Multivariate Behavioral Research, 33, 295–312.
Commandeur, J. J. F., & Koopman, S. J. (2007). An introduction to state space time series
analysis. New York, NY: Oxford University Press.
Dolan, C. V. (2005). MKFM6: Multi-group, multi-subject stationary time series modeling based
on the Kalman filter. Retrieved December 1, 2004, from http://users.fmg.uva.nl/cdoman/
Durbin, J., & Koopman, S. J. (2001). Time series analysis by state space methods.
New York, NY: Oxford University Press.
Ferrer, E., & Nesselroade, J. R. (2003). Modeling affective processes in dyadic relations
via dynamic factor analysis. Emotion, 3, 344–360.
Ferrer, E., Steele, J., & Hsieh, F. (2012). Analyzing dynamics of affective dyadic interactions using patterns of intra- and inter-individual variability. Multivariate Behavioral
Research, 47, 136–171.
Ferrer, E., & Widaman, K. F. (2008). Dynamic factor analysis of dyadic affective processes with inter-group differences. In N. A. Card, J. P. Selig, & T. D. Little (Eds.),
Modeling dyadic and interdependent data in the developmental and behavioral
sciences (pp. 107–137). Hillsdale, NJ: Psychology Press.
Gollob, H. F., & Reichardt, C. S. (1991). Interpreting and estimating indirect effects
assuming time lags really matter. In L. M. Collins & J. L. Horn (Eds.), Best methods
for the analysis of change: Recent advances, unanswered questions, future directions
(pp. 243–259). Washington, DC: American Psychological Association.
Gu, F., & Yung, Y.-F. (2013). A SAS/IML program using the Kalman filter for estimating
state space model. Behavior Research Methods, 45, 38–53.
Hamagami, F., & McArdle, J. J. (2007). Dynamic extensions of latent difference score
models. In S. M. Boker & M. L. Wegner (Eds.), Data analytic techniques for dynamical systems (pp. 47–85). Mahwah, NJ: Lawrence Erlbaum.
Hamaker, E. L. (2012). Why researchers should think ‘‘within-person’’: A paradigmatic
rationale. In M. R. Mehl & T. S. Conner (Eds.), Handbook of research methods for
studying daily life (pp. 43–61). New York, NY: The Guilford Press.
Hamaker, E. L., Dolan, C. V., & Molenaar, P. C. M. (2005). Statistical modeling of the
individual: Rationale and application of multivariate stationary time series analysis.
Multivariate Behavioral Research, 40, 207–233.
Harvey, A. C. (1989). Forecasting, structural time series models and the Kalman filter.
Princeton, NJ: Princeton University Press.
Hayes, A. F. (2009). Beyond Baron and Kenny: Statistical mediation analysis in the new
millennium. Communication Monographs, 76, 408–420.
Hershberger, S. L., Corneal, S. E., & Molenaar, P. C. M. (1994). Dynamic factor analysis:
An application to emotional response patterns underlying daughter/father and stepdaughter/stepfather relationships. Structural Equation Modeling, 2, 31–52.
Ho, M.-H. R., Ombao, H., & Shumway, R. (2005). A state-space approach to modelling
brain dynamics. Statistica Sinica, 15, 407–425.
Ho, M.-H. R., Shumway, R., & Ombao, H. (2006). State-space approach to modeling
dynamic processes: Applications in biological and social sciences. In T. A. Walls &
J. L. Schafer (Eds.), Models for intensive longitudinal data (pp. 148–170). New York,
NY: Oxford University Press.
Imai, K., Keele, L., & Tingley, D. (2010). A general approach to causal mediation analysis.
Psychological Methods, 15, 309–334.
Judd, C. M., & Kenny, D. A. (1981). Process analysis: Estimating mediation in treatment
evaluations. Evaluation Review, 5, 602–619.
Judd, C. M., Kenny, D. A., & McClelland, G. H. (2001). Estimating and testing mediation
and moderation in within-participant designs. Psychological Methods, 6, 115–134.
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME-Journal of Basic Engineering (Series D), 82, 35–45.
Kelderman, H., & Molenaar, P. C. M. (2007). The effect of individual differences in factor
loadings on the standard factor model. Multivariate Behavioral Research, 42, 435–456.
Kenny, D. A. (1979). Correlation and causality. New York, NY: John Wiley.
Lamiell, J. T. (1981). Toward an idiothetic psychology of personality. American Psychologist, 36, 276–289.
Lamiell, J. T. (1988). Once more into the breach: Why individual differences research
cannot advance personality theory. Paper presented at the annual meeting of the American Psychological Association, August, Atlanta, GA.
Lütkepohl, H. (2005). New introduction to multiple time series analysis. Berlin, Germany:
MacCallum, R. C., & Ashby, G. F. (1986). Relationships between linear systems theory
and covariance structure modeling. Journal of Mathematical Psychology, 30, 1–27.
MacKinnon, D. P. (2008). Introduction to statistical mediation analysis. New York, NY:
Lawrence Erlbaum.
MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence limits for the
indirect effect: Distribution of the product and resampling methods. Multivariate
Behavioral Research, 39, 99–128.
Maxwell, S. E., & Cole, D. A. (2007). Bias in cross-sectional analyses of longitudinal
mediation. Psychological Methods, 12, 23–44.
Maxwell, S. E., Cole, D. A., & Mitchell, M. A. (2011). Bias in cross-sectional analyses of
longitudinal mediation: Partial and complete mediation under an autoregressive
model. Multivariate Behavioral Research, 46, 816–841.
McArdle, J. J. (2001). A latent difference score approach to longitudinal dynamic structural
analyses. In R. Cudeck, S. du Toit, & D. Sorbom (Eds.), Structural equation modeling:
Present and future (pp. 341–380). Lincolnwood, IL: Scientific Software International.
McArdle, J. J., & Nesselroade, J. R. (1994). Using multivariate data to structure developmental change. In S. H. Cohen & H. W. Reese (Eds.), Life-span developmental psychology: Methodological contributions (pp. 223–267). Hillsdale, NJ: Lawrence Erlbaum.
Mitteness, L. S., & Nesselroade, J. R. (1987). Attachment in adulthood: Longitudinal
investigation of mother-daughter affective interdependencies by p-technique factor
analysis. The Southern Psychologist, 3, 37–44.
Molenaar, P. C. M. (1985). A dynamic factor model for the analysis of multivariate time
series. Psychometrika, 50, 181–202.
Molenaar, P. C. M. (1994). Dynamic latent variable models in developmental psychology.
In A. von Eye & C. C. Clogg (Eds.), Latent variable analysis (pp. 155–180). Thousand
Oaks, CA: Sage.
Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bring the
person back into scientific psychology, this time forever. Measurement, 2, 201–218.
Molenaar, P. C. M. (2008a). On the implications of the classical ergodic theorems: Analysis of developmental processes has to focus on intra-individual variation. Developmental Psychobiology, 50, 60–69.
Molenaar, P. C. M. (2008b). Consequences of the ergodic theorems for classical test
theory, factor analysis and the analysis of developmental processes. In S. M. Hofer
& D. F. Alwin (Eds.), Handbook of cognitive aging: Interdisciplinary perspectives (pp. 90–104). Thousand Oaks, CA: Sage.
Molenaar, P. C. M. (2010a). Note on optimization of individual psychotherapeutic
processes. Journal of Mathematical Psychology, 54, 208–213.
Molenaar, P. C. M. (2010b). Testing all six person-oriented principles in dynamic factor
analysis. Development and Psychopathology, 22, 255–259.
Molenaar, P. C. M., Boomsma, D. I., & Dolan, C. V. (1993). A third source of developmental differences. Behavior Genetics, 23, 519–524.
Molenaar, P. C. M., & Campbell, C. G. (2009). The new person-specific paradigm in
psychology. Current Directions in Psychological Science, 18, 112–117.
Molenaar, P. C. M., de Gooijer, J. G., & Schmitz, B. (1992). Dynamic factor analysis of
nonstationary multivariate time series. Psychometrika, 57, 333–349.
Molenaar, P. C. M., & Ram, N. (2009). Advances in dynamic factor analysis of psychological processes. In J. Valsiner, P. C. M. Molenaar, M. C. D. P. Lyra, & N. Chaudhary
(Eds.), Dynamic process methodology in the social and developmental sciences (pp.
255–268). Dordrecht, the Netherlands: Springer Science.
Molenaar, P. C. M., & Ram, N. (2010). Dynamic modeling and optimal control of intraindividual variation: A computational paradigm for nonergodic psychological processes.
In S.-M. Chow, E. Ferrer, & F. Hsieh (Eds.), Statistical methods for modeling human
dynamics: An interdisciplinary dialogue (pp. 13–37). New York, NY: Routledge.
Molenaar, P. C. M., Sinclair, K. O., Rovine, M. J., Ram, N., & Corneal, S. E. (2009). Analyzing developmental processes on an individual level using nonstationary time series
modeling. Developmental Psychology, 45, 260–271.
Nesselroade, J. R. (1991). Interindividual differences in intraindividual change. In L. M.
Collins & J. L. Horn (Eds.), Best methods for the analysis of change (pp. 92–105).
Washington, DC: American Psychological Association.
Nesselroade, J. R. (2010). On an emerging third discipline of scientific psychology. In P.
C. M. Molenaar & K. M. Newell (Eds.), Individual pathways of change: Statistical
models for analyzing learning and development (pp. 209–218). Washington, DC:
American Psychological Association.
Nesselroade, J. R., & Boker, S. M. (1994). Assessing constancy and change. In T. F.
Heatherton & J. L. Weinberger (Eds.), Can personality change? (pp. 121–147).
Washington, DC: American Psychological Association.
Nesselroade, J. R., Gerstorf, D., Hardy, S. A., & Ram, N. (2007). Idiographic filters for
psychological constructs. Measurement, 5, 217–235.
Nesselroade, J. R., & Jones, C. J. (1991). Multi-modal selection effects in the study of
adult development: A perspective on multivariate, replicated, single-subject, repeated
measures designs. Experimental Aging Research, 17, 21–27.
Nesselroade, J. R., McArdle, J. J., Aggen, S. H., & Meyers, J. M. (2002). Dynamic factor
analysis models for representing process in multivariate time-series. In D. M. Moskowitz
& S. L. Hershberger (Eds.), Modeling intraindividual variability with repeated measures
data: Advances and techniques (pp. 235–265). Mahwah, NJ: Lawrence Erlbaum.
Nesselroade, J. R., & Molenaar, P. C. M. (1999). Pooling lagged covariance structures based
on short, multivariate time series for dynamic factor analysis. In R. H. Hoyle (Ed.), Statistical strategies for small sample research (pp. 223–250). Thousand Oaks, CA: Sage.
Otter, P. (1986). Dynamic structure systems under indirect observation: Identifiability and
estimation aspects from a system theoretic perspective. Psychometrika, 51, 415–428.
Pearl, J. (2010). The foundations of causal inference. Sociological Methodology, 40, 75–149.
Pearl, J. (2012). The causal mediation formula—A guide to the assessment of pathways
and mechanisms. Prevention Science, 13, 426–436.
Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect
effects in simple mediation models. Behavior Research Methods, 36, 717–731.
Preacher, K. J., & Hayes, A. F. (2008a). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behavior Research
Methods, 40, 879–891.
Preacher, K. J., & Hayes, A. F. (2008b). Contemporary approaches to assessing mediation
in communication research. In A. F. Hayes, M. D. Slater, & L. B. Snyder (Eds.), The
Sage sourcebook of advanced data analysis methods for communication research (pp.
13–54). Thousand Oaks, CA: Sage.
Preacher, K. J., Rucker, D. D., & Hayes, A. F. (2007). Addressing moderated mediation
hypotheses: Theory, methods, and prescriptions. Multivariate Behavioral Research, 42,
Roe, R. (2012). What is wrong with mediators and moderators? The European Health
Psychologist, 14, 4–10.
Rosenzweig, S. (1958). The place of the individual and of idiodynamics in psychology:
A dialogue. Journal of Individual Psychology, 14, 3–20.
Sbarra, D. A., & Ferrer, E. (2006). The structure and process of emotional experience following nonmarital relationship dissolution: Dynamic factor analyses of love, anger,
and sadness. Emotion, 2, 224–238.
Schmitz, B. (2006). Advantages of studying processes in educational research. Learning
and Instruction, 16, 433–449.
Schweppe, F. (1965). Evaluation of likelihood functions for Gaussian signals. IEEE
Transactions on Information Theory, 11, 61–70.
Selig, J. P., & Preacher, K. J. (2009). Mediation models for longitudinal data in developmental research. Research in Human Development, 6, 144–164.
Shifren, K., Hooker, K., Wood, P., & Nesselroade, J. R. (1997). Structure and variation of
mood in individuals with Parkinson’s disease: A dynamic factor analysis. Psychology
and Aging, 12, 328–339.
Shrout, P. E., & Bolger, N. (2002). Mediation in experimental and nonexperimental studies: New procedures and recommendations. Psychological Methods, 7, 422–445.
Shumway, R. H., & Stoffer, D. S. (2011). Time series analysis and its applications: With R
examples. New York, NY: Springer.
Sinclair, K. O., & Molenaar, P. C. M. (2008). Optimal control of psychological processes: A
new computational paradigm. Bulletin de la Societe des Sciences Medicales Luxembourg,
Sliwinski, M., & Mogle, J. (2008). Time-based and process-based approaches to analysis
of longitudinal data. In S. M. Hofer & D. F. Alwin (Eds.), Handbook of cognitive
aging: Interdisciplinary perspectives (pp. 477–491). Thousand Oaks, CA: Sage.
Snijders, T. A. B., & Bosker, R. J. (2011). Multilevel analysis: An introduction to basic
and advanced multilevel modeling. Thousand Oaks, CA: Sage.
Song, H., & Ferrer, E. (2009). State-space modeling of dynamic psychological processes
via the Kalman smoother algorithm: Rationale, finite sample properties, and applications. Structural Equation Modeling, 16, 338–363.
Song, H., & Ferrer, E. (2012). Bayesian estimation of random coefficient dynamic factor
models. Multivariate Behavioral Research, 47, 26–60.
Stoffer, D. S., & Wall, K. D. (1991). Bootstrapping state-space models: Gaussian maximum likelihood estimation and the Kalman filter. Journal of the American Statistical
Association, 86, 1024–1033.
Stoffer, D. S., & Wall, K. D. (2004). Resampling in state space models. In A. Harvey, S. J.
Koopman, & N. Shephard (Eds.), State space and unobserved component models:
Theory and applications (pp. 171–202). New York, NY: Cambridge University Press.
Ten Have, T. R., & Joffe, M. M. (2010). A review of causal estimation of effects in
mediation analyses. Statistical Methods in Medical Research, 21, 77–107.
VanderWeele, T. J., & Vansteelandt, S. (2009). Conceptual issues concerning mediation,
interventions and composition. Statistics and Its Interface, 2, 457–468.
van Kampen, V. (2000). Idiographic complexity and the common personality dimensions
of insensitivity, extraversion, neuroticism, and orderliness. European Journal of
Personality, 14, 217–243.
Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and validation of brief
measures of positive and negative affect: The PANAS scales. Journal of Personality
and Social Psychology, 54, 1063–1070.
Windle, M., & Davies, P. T. (1999). Developmental research and theory. In K. E. Leonard
& H. T. Blane (Eds.), Psychological theories of drinking and alcoholism (2nd ed., pp.
164–202). New York, NY: Guilford.
Winkelmann, L., & Winkelmann, R. (1998). Why are the unemployed so unhappy? Evidence from panel data. Economica, 65, 1–15.
Wood, P., & Brown, D. (1994). The study of intraindividual differences by means of
dynamic factor models: Rationale, implementation, and interpretation. Psychological
Bulletin, 116, 166–186.
Yung, Y.-F., & Chan, W. (1999). Statistical analyses using bootstrapping: Concepts and
implementation. In R. H. Hoyle (Ed.), Statistical strategies for small sample research
(pp. 81–105). Thousand Oaks, CA: Sage.
Zevon, M., & Tellegen, A. (1982). The structure of mood change: Idiographic/nomothetic
analysis. Journal of Personality and Social Psychology, 43, 111–122.
Zhang, G., & Browne, M. W. (2006). Bootstrap standard error estimates in dynamic factor
analysis. Multivariate Behavioral Research, 45, 453–482.
Zhang, G., & Chow, S.-M. (2010). Standard error estimation in stationary multivariate time
series models using residual-based bootstrap procedures. In P. C. M. Molenaar & K. M.
Newell (Eds.), Individual pathways of change: Statistical models for analyzing learning
and development (pp. 169–182). Washington, DC: American Psychological Association.
Zhang, Z., Hamaker, E. L., & Nesselroade, J. R. (2008). Comparisons of four methods for
estimating a dynamic factor model. Structural Equation Modeling, 15, 377–402.
Zhang, Z., & Nesselroade, J. R. (2007). Bayesian estimation of categorical dynamic factor
models. Multivariate Behavioral Research, 42, 729–756.
Zu, J. (2008). Examining nonlinear changes of coefficients in time-varying dynamic factor
models (Unpublished master’s thesis). University of Notre Dame, South Bend, IN.
FEI GU is an assistant professor at the Department of Psychology, McGill University,
1205 Doctor Penfield Avenue, Montreal, QC H3A 1B1, Canada; e-mail: [email protected]
gill.ca. He is interested in innovative applications of state space methods. Specifically,
his work extends the state space model to analyze cross-sectional, multilevel, and
(intensive) longitudinal data from multiple subjects and unifies many existing statistical
models into the state space approach.
KRISTOPHER J. PREACHER is an assistant professor at the Department of Psychology & Human
Development, Vanderbilt University, PMB 552, 230 Appleton Place, Nashville, TN
37203-5721; e-mail: [email protected]. His research focuses on the use
of latent variable analysis and multilevel modeling to analyze longitudinal and
correlational data. Other interests include developing techniques to address mediation,
moderation, and model evaluation and selection.
EMILIO FERRER is a professor at the Department of Psychology, University of
California, Davis, One Shields Avenue, Davis, CA 95616-8686; e-mail: [email protected]
vis.edu. He is interested in methods to analyze change and intraindividual variability, in
particular latent growth analysis and linear and nonlinear dynamical systems. His
current research in this area involves techniques to model dyadic interactions and developmental processes underlying fluid reasoning.
Manuscript received July 10, 2013
Revision received November 8, 2013
Accepted January 21, 2014