Steady-state analysis of genetic regulatory
networks modelled by probabilistic Boolean
networks
Probabilistic Boolean networks (PBNs) have recently been introduced as a promising
class of models of genetic regulatory networks. The dynamic behaviour of PBNs can
be analysed in the context of Markov chains. A key goal is the determination of the
steady-state (long-run) behaviour of a PBN by analysing the corresponding Markov
chain. This allows one to compute the long-term influence of a gene on another
gene or determine the long-term joint probabilistic behaviour of a few selected genes.
Because matrix-based methods quickly become prohibitive for large sizes of networks,
we propose the use of Monte Carlo methods. However, the rate of convergence to
the stationary distribution becomes a central issue. We discuss several approaches
for determining the number of iterations necessary to achieve convergence of the
Markov chain corresponding to a PBN. Using a recently introduced method based on
the theory of two-state Markov chains, we illustrate the approach on a sub-network
designed from human glioma gene expression data and determine the joint steadystate
probabilities for several groups of genes.
Introduction
Modelling of genetic regulatory networks is becoming
increasingly widespread for gaining insight into
the underlying processes of living systems. The
computational biology literature abounds in various
modelling approaches, all of which have particular
goals along with their strengths and weaknesses
(de Jong, 2002; Hasty et al., 2001; Smolen
et al., 2000). One important characteristic that is
shared by some models is the ability to capture
dynamic behaviour of the quantities that are being
modelled. Some examples include Boolean networks
(Kauffman, 1993; Huang, 1999; Somogyi
and Sniegoski, 1996), dynamic Bayesian networks
(Murphy and Mian, 1999; Smith et al., 2002) and the recently-introduced probabilistic Boolean networks
(PBNs) (Shmulevich et al., 2002a, 2002b,
2002c, 2002d; Dougherty and Shmulevich, 2003;
Zhou et al., 2003; Kim et al., 2002; Datta et al.,
2003).
A key aspect of the analysis of such dynamic
systems is the determination of their steady-state
(long-run) behaviour. Indeed, much of the power
of prediction stems from this very knowledge.
This paradigm closely parallels the mission of cancer
professionals who aim to understand the ebb
and flow of molecular events during cancer progression
and to predict how disease will develop
and how patients will respond to certain therapies.
A specific example will illustrate this point.
Comparing the gene expression profiles of humanbrain tumours of high grade (glioblastoma), midgrade
(anaplastic astrocytoma, anaplastic oligodendroglioma)
and low grade (oligodendroglioma),
a key gene expression event — elevated expression
of insulin-like growth factor binding protein
2 (IGFBP2) occurring only in high-grade brain
tumours — was identified (Fuller et al., 1999). It
can be envisioned that some gene expression events
have been initiated at different stages at the lowand
mid-grade brain tumours and may eventually
have led to the (steady) state when IGFBP2 is activated.
The faster these kinetics are, the faster is the
progression of cancer. The goal is that the use of
gene expression profiles, along with the construction
of the gene networks using models such as
PBNs, will make it possible to predict whether (and
when) the convergence to events such as IGFBP2
activation state will occur.
By the same token, we also envision the ability
to ‘see’ the next gene expression event when
IGFBP2 is activated. Such an endeavour can be
facilitated by laboratory experiments and prediction
must be validated by existing knowledge of the
biological system under investigation. For example,
suppose IGFBP2 is activated by experimental
manipulation. In order to determine the steady state
behaviour of some other gene after IGFBP2 activation,
biologists can select a relatively stable cell
system that has a baseline level of IGFBP2. Then,
IGFBP2 can be transfected into the cells and continuously
expressed in the new or ‘perturbed’ cell
system whose steady-state gene expression levels
are of interest. Both cell systems can be surveyed
in a genomic laboratory for the gene expression
profiles. A comparison will shed light on how and
where in the state space the network shifts, forming
a basis for prediction. In one of the described comparisons
carried out in our laboratory, cell invasion
genes were activated in IGFBP2 overexpressed
cells. This gene expression event is linked to a cellular
attractor state — cell invasion (Wang, 2003).
In this paper, we concentrate on obtaining the
steady-state (equilibrium) distribution of a PBN.
This is a crucial task in many contexts. For
instance, as was shown by Shmulevich et al.
(2002b), the steady-state distribution is necessary
in order to compute the so-called (long-term) influence,
which is a measure of the impact of a gene on
other genes. More ‘influential’ genes may possess
a greater potential to regulate the dynamics of the network, as their perturbation can lead to significant
‘downstream’ effects. Shmulevich et al.
(2002d) developed a methodology for altering the
steady-state probabilities of certain states or sets
of states with minimal modifications to the rulebased
structure of the network. Reliable computation
of steady-state distributions is crucial in this
context as well. As another example, suppose we
are interested in the long-term joint behaviour of
several selected genes, i.e. we would like to obtain
their limiting joint distribution. This information
can supply answers to questions of the type: ‘What
is the probability that gene A will be expressed
in the long run?’ or ‘What is the probability that
gene B and gene C will both be expressed in the
long run?’ Steady-state analysis is necessary for
answering such questions.
In this paper, we advocate the use of Monte Carlo
simulation methods for making reliable inferences
from steady-state analysis. A significant emphasis
will be placed on the convergence rate of the
Markov chain corresponding to the PBN, as it is
crucial to ensure that the chain reaches stationarity
before collecting information of interest. We also
illustrate our methods by analysing sub-networks
generated from human glioma gene expression data
(the data set is described in Fuller et al., 1999).
The necessary background material, including definitions
and some results related to PBNs,
Steady-state analysis
Most approaches to steady-state analysis use the
state transition matrix in some form or another. For
the case of PBNs, this would consist of constructing
the state-transition matrix A given in Equation (4)
in the Supplementary Material, and then applying
numerical methods. A variety of approaches using
iterative, projection, decompositional and other
methods could potentially be used (Stewart, 1994).
Unfortunately, however, in the case of PBNs, the
size of the state space grows exponentially in
the number of genes and becomes prohibitive forunsuitable due to the possibly irregular structure
of the transition matrix.
On a more positive note, it should be recognized
that even larger state spaces are commonly encountered
in Markov chain Monte Carlo (MCMC) methods
for many applications, including Markov random
field modelling in image processing (Winkler,
1995), where efficient simulation and estimation
are routinely performed. Thus, Monte Carlo
methods represent a viable alternative to numerical
matrix-based methods for obtaining steady-state
distributions. Informally speaking, this consists of
running the Markov chain for a sufficiently long
time, until convergence to the stationary distribution
is reached, and observing the proportion of
time the process spends in the parts of the state
space that represent the information of interest,
such as the joint stationary distribution of several
specific genes. A key factor is convergence, which
to a large extent depends on the perturbation probability,
p. In general, a larger p results in quicker
convergence, but making p too large is not biologically
meaningful.
In order for us to perform long-term analysis
of the Markov chain corresponding to a PBN
using Monte Carlo methods, we need to be able
to estimate the convergence rate of the process.
Only after we are sufficiently sure that the chain
has reached its stationary distribution can we
begin to collect information of interest. Typical
approaches for assessing convergence are based
on the second-largest eigenvalue of the transition
probability matrix A. Unfortunately, as mentioned
above, for even a moderate number of genes,
obtaining the eigenvalues of the transition matrix
may be impractical. Thus, it is advantageous to
be able to determine the number of iterations
necessary until satisfactory convergence is reached.
One approach for obtaining a priori bounds on
the number of iterations is based on the so-called
minorization condition for Markov chains (Rosenthal,
1995). The Supplementary Material, which is
available discusses the minorization condition in the context
of PBNs. Unfortunately, as we show in the
Supplementary Material, the usefulness of such a
result is rather limited. Even for a moderately small
number of genes, the number of iterations, k, predicted
by the bound is much too large to be useful in practice. We have also found (see Supplementary
Material) that making any assumptions about
the relative magnitudes of the probabilities of perturbation
and transition via the selected Boolean
functions is not likely to significantly improve the
bound on convergence, and that it is only by having
some information about the structure of the
PBN itself that the bound can potentially be lowered.
Thus, with limited ability to obtain a priori
bounds, we now turn to diagnosing convergence to
the steady-state distribution.
Diagnosing convergence
In a practical situation, it is important to be able to
empirically determine when to stop the chain and
produce our estimates. For this purpose, there are a
number of monitoring methods available (Cowles
and Carlin, 1996; Robert, 1995). Consider, for
example, the Kolmogorov–Smirnov test, a nonparametric
test of stationarity that can be used to
assess convergence.
Tuesday, February 1, 2011
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment