Steady-state analysis of genetic regulatory

networks modelled by probabilistic Boolean

networks

Probabilistic Boolean networks (PBNs) have recently been introduced as a promising

class of models of genetic regulatory networks. The dynamic behaviour of PBNs can

be analysed in the context of Markov chains. A key goal is the determination of the

steady-state (long-run) behaviour of a PBN by analysing the corresponding Markov

chain. This allows one to compute the long-term influence of a gene on another

gene or determine the long-term joint probabilistic behaviour of a few selected genes.

Because matrix-based methods quickly become prohibitive for large sizes of networks,

we propose the use of Monte Carlo methods. However, the rate of convergence to

the stationary distribution becomes a central issue. We discuss several approaches

for determining the number of iterations necessary to achieve convergence of the

Markov chain corresponding to a PBN. Using a recently introduced method based on

the theory of two-state Markov chains, we illustrate the approach on a sub-network

designed from human glioma gene expression data and determine the joint steadystate

probabilities for several groups of genes.

Introduction

Modelling of genetic regulatory networks is becoming

increasingly widespread for gaining insight into

the underlying processes of living systems. The

computational biology literature abounds in various

modelling approaches, all of which have particular

goals along with their strengths and weaknesses

(de Jong, 2002; Hasty et al., 2001; Smolen

et al., 2000). One important characteristic that is

shared by some models is the ability to capture

dynamic behaviour of the quantities that are being

modelled. Some examples include Boolean networks

(Kauffman, 1993; Huang, 1999; Somogyi

and Sniegoski, 1996), dynamic Bayesian networks

(Murphy and Mian, 1999; Smith et al., 2002) and the recently-introduced probabilistic Boolean networks

(PBNs) (Shmulevich et al., 2002a, 2002b,

2002c, 2002d; Dougherty and Shmulevich, 2003;

Zhou et al., 2003; Kim et al., 2002; Datta et al.,

2003).

A key aspect of the analysis of such dynamic

systems is the determination of their steady-state

(long-run) behaviour. Indeed, much of the power

of prediction stems from this very knowledge.

This paradigm closely parallels the mission of cancer

professionals who aim to understand the ebb

and flow of molecular events during cancer progression

and to predict how disease will develop

and how patients will respond to certain therapies.

A specific example will illustrate this point.

Comparing the gene expression profiles of humanbrain tumours of high grade (glioblastoma), midgrade

(anaplastic astrocytoma, anaplastic oligodendroglioma)

and low grade (oligodendroglioma),

a key gene expression event — elevated expression

of insulin-like growth factor binding protein

2 (IGFBP2) occurring only in high-grade brain

tumours — was identified (Fuller et al., 1999). It

can be envisioned that some gene expression events

have been initiated at different stages at the lowand

mid-grade brain tumours and may eventually

have led to the (steady) state when IGFBP2 is activated.

The faster these kinetics are, the faster is the

progression of cancer. The goal is that the use of

gene expression profiles, along with the construction

of the gene networks using models such as

PBNs, will make it possible to predict whether (and

when) the convergence to events such as IGFBP2

activation state will occur.

By the same token, we also envision the ability

to ‘see’ the next gene expression event when

IGFBP2 is activated. Such an endeavour can be

facilitated by laboratory experiments and prediction

must be validated by existing knowledge of the

biological system under investigation. For example,

suppose IGFBP2 is activated by experimental

manipulation. In order to determine the steady state

behaviour of some other gene after IGFBP2 activation,

biologists can select a relatively stable cell

system that has a baseline level of IGFBP2. Then,

IGFBP2 can be transfected into the cells and continuously

expressed in the new or ‘perturbed’ cell

system whose steady-state gene expression levels

are of interest. Both cell systems can be surveyed

in a genomic laboratory for the gene expression

profiles. A comparison will shed light on how and

where in the state space the network shifts, forming

a basis for prediction. In one of the described comparisons

carried out in our laboratory, cell invasion

genes were activated in IGFBP2 overexpressed

cells. This gene expression event is linked to a cellular

attractor state — cell invasion (Wang, 2003).

In this paper, we concentrate on obtaining the

steady-state (equilibrium) distribution of a PBN.

This is a crucial task in many contexts. For

instance, as was shown by Shmulevich et al.

(2002b), the steady-state distribution is necessary

in order to compute the so-called (long-term) influence,

which is a measure of the impact of a gene on

other genes. More ‘influential’ genes may possess

a greater potential to regulate the dynamics of the network, as their perturbation can lead to significant

‘downstream’ effects. Shmulevich et al.

(2002d) developed a methodology for altering the

steady-state probabilities of certain states or sets

of states with minimal modifications to the rulebased

structure of the network. Reliable computation

of steady-state distributions is crucial in this

context as well. As another example, suppose we

are interested in the long-term joint behaviour of

several selected genes, i.e. we would like to obtain

their limiting joint distribution. This information

can supply answers to questions of the type: ‘What

is the probability that gene A will be expressed

in the long run?’ or ‘What is the probability that

gene B and gene C will both be expressed in the

long run?’ Steady-state analysis is necessary for

answering such questions.

In this paper, we advocate the use of Monte Carlo

simulation methods for making reliable inferences

from steady-state analysis. A significant emphasis

will be placed on the convergence rate of the

Markov chain corresponding to the PBN, as it is

crucial to ensure that the chain reaches stationarity

before collecting information of interest. We also

illustrate our methods by analysing sub-networks

generated from human glioma gene expression data

(the data set is described in Fuller et al., 1999).

The necessary background material, including definitions

and some results related to PBNs,

Steady-state analysis

Most approaches to steady-state analysis use the

state transition matrix in some form or another. For

the case of PBNs, this would consist of constructing

the state-transition matrix A given in Equation (4)

in the Supplementary Material, and then applying

numerical methods. A variety of approaches using

iterative, projection, decompositional and other

methods could potentially be used (Stewart, 1994).

Unfortunately, however, in the case of PBNs, the

size of the state space grows exponentially in

the number of genes and becomes prohibitive forunsuitable due to the possibly irregular structure

of the transition matrix.

On a more positive note, it should be recognized

that even larger state spaces are commonly encountered

in Markov chain Monte Carlo (MCMC) methods

for many applications, including Markov random

field modelling in image processing (Winkler,

1995), where efficient simulation and estimation

are routinely performed. Thus, Monte Carlo

methods represent a viable alternative to numerical

matrix-based methods for obtaining steady-state

distributions. Informally speaking, this consists of

running the Markov chain for a sufficiently long

time, until convergence to the stationary distribution

is reached, and observing the proportion of

time the process spends in the parts of the state

space that represent the information of interest,

such as the joint stationary distribution of several

specific genes. A key factor is convergence, which

to a large extent depends on the perturbation probability,

p. In general, a larger p results in quicker

convergence, but making p too large is not biologically

meaningful.

In order for us to perform long-term analysis

of the Markov chain corresponding to a PBN

using Monte Carlo methods, we need to be able

to estimate the convergence rate of the process.

Only after we are sufficiently sure that the chain

has reached its stationary distribution can we

begin to collect information of interest. Typical

approaches for assessing convergence are based

on the second-largest eigenvalue of the transition

probability matrix A. Unfortunately, as mentioned

above, for even a moderate number of genes,

obtaining the eigenvalues of the transition matrix

may be impractical. Thus, it is advantageous to

be able to determine the number of iterations

necessary until satisfactory convergence is reached.

One approach for obtaining a priori bounds on

the number of iterations is based on the so-called

minorization condition for Markov chains (Rosenthal,

1995). The Supplementary Material, which is

available discusses the minorization condition in the context

of PBNs. Unfortunately, as we show in the

Supplementary Material, the usefulness of such a

result is rather limited. Even for a moderately small

number of genes, the number of iterations, k, predicted

by the bound is much too large to be useful in practice. We have also found (see Supplementary

Material) that making any assumptions about

the relative magnitudes of the probabilities of perturbation

and transition via the selected Boolean

functions is not likely to significantly improve the

bound on convergence, and that it is only by having

some information about the structure of the

PBN itself that the bound can potentially be lowered.

Thus, with limited ability to obtain a priori

bounds, we now turn to diagnosing convergence to

the steady-state distribution.

Diagnosing convergence

In a practical situation, it is important to be able to

empirically determine when to stop the chain and

produce our estimates. For this purpose, there are a

number of monitoring methods available (Cowles

and Carlin, 1996; Robert, 1995). Consider, for

example, the Kolmogorov–Smirnov test, a nonparametric

test of stationarity that can be used to

assess convergence.

## Tuesday, February 1, 2011

Subscribe to:
Post Comments (Atom)

## No comments:

## Post a Comment