---
title: "denim: deterministic discrete-time model with memory"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{denim: deterministic discrete-time model with memory}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.align="center"
)
library(DiagrammeR) # for flowchart diagram
library(denim)
```
## 1. Simple SIR model with gamma distributed lengths of stay
The SIR model uses 3 compartments: S (susceptible), I (infected), R (recovered) to describe clinical status of individuals. We use the most simple form of SIR model to demonstrate how to define the distribution of the lengths of stay distribution.
```{r, echo=FALSE}
DiagrammeR::grViz("digraph {
graph [layout = dot, rankdir = LR]
node [shape = rectangle]
S -> I [label = 'βSI/N']
I -> R [label = 'γI']
}",
width = 300, height = "100%")
```
The model equations are:
$$S_{t+1} - S_{t} = -\lambda S_{t} = -\frac{\beta I_{t}}{N}S_{t}$$ $$I_{t+1} - I_{t} = \frac{\beta I_{t}}{N}S_{t} - \gamma I_{t}$$ $$R_{t+1} - R_{t} = \gamma I_{t}$$
- $N$: total population size, $N = S + I + R$
- $\beta$: the product of contact rates and transmission probability; usually we define $\lambda =\frac{\beta I_{t}}{N}$ as the force of infection
- $\gamma$: recovery rate
Usually to solve the model easier we make an assumption that the recovery rate $\gamma$ is constant, this will leads to an exponentially distributed length of stay i.e most individuals recover after 1 day being infected.
```{r, echo=FALSE}
rates <- c(0.5, 1, 1.5)
x <- seq(0, 5, 0.001)
y <- dexp(x = x, rate = rates[1])
y2 <- dexp(x = x, rate = rates[2])
y3 <- dexp(x = x, rate = rates[3])
col_codes <- c("#374F77", "#EF9F32", "#6ECBD7")
plot(x, y, type = "l", col = col_codes[1], lty = 1, lwd = 3,
xlab = "Length of stay (days)", ylab = "",
ylim = c(0, 1.5), yaxt = 'n')
lines(x, y2, col = col_codes[2], lty = 1, lwd = 3)
lines(x, y3, col = col_codes[3], lty = 1, lwd = 3)
legend("right", legend = c(0.5, 1.0, 1.5),
col = col_codes, lty = 1, lwd = 3, bty = "\n")
```
A more realistic length of stay distribution can look like this, of which most patients recovered after 4 days. We defined this using a gamma distribution with shape = 3 and scale = 2.
```{r, echo=FALSE}
x <- seq(0, 20, 0.001)
y <- dgamma(x = x, shape = 3, scale = 2)
plot(x, y, type = "l", col = col_codes[1], lty = 1, lwd = 3,
xlab = "Length of stay (days)", ylab = "", yaxt = 'n')
```
The model now look like this:
```{r, echo=FALSE}
DiagrammeR::grViz("digraph {
graph [layout = dot, rankdir = LR]
node [shape = rectangle]
S -> I [label = 'βSI/N']
I -> R [label = 'd_gamma(3, 2)']
}",
width = 400, height = "100%")
```
**Model specification**
*Model transition*
```{r, echo=FALSE}
DiagrammeR::grViz("digraph {
graph [layout = dot, rankdir = LR]
node [shape = rectangle]
S -> I [label = 'βSI/N']
I -> R [label = 'd_gamma(3, 2)']
}",
width = 400, height = "100%")
```
We have two transitions `S -> I` and `I -> R` in this case. The transitions are specified in a list follow this format `"transition" = equation`, of which equation is defined with one of our functions for waiting time distribution.
```{r}
transitions <- list(
"S -> I" = "beta * S * I / N",
"I -> R" = d_gamma(3, 2)
)
```
*Initial state*
Use a vector to define the compartments with their assigned names and initial values in the format `compartment_name = initial_value`:
```{r}
initialValues <- c(
S = 999,
I = 1,
R = 0
)
```
*Model parameters*
If we use a math expression, any symbols except the compartment names are parameters, and would be defined by constant values. There are two constant parameters in our example: `beta` and `N`:
```{r}
parameters <- c(
beta = 0.012,
N = 1000
)
```
**Model application**
*Time step specification*
We run the model for 30 days and give output at 0.01 daily intervals. The default interval (time step) is 1 if not declared explicitly.
```{r}
simulationDuration <- 30
timeStep <- 0.01
```
```{r, fig.width = 6}
mod <- sim(transitions = transitions,
initialValues = initialValues,
parameters = parameters,
simulationDuration = simulationDuration,
timeStep = timeStep)
head(mod)
plot(mod)
```
## 2. How the algorithm work?
In the SIR model, all infected individuals are presented by a single compartment I and have the same recovery rate $\gamma$.
```{r, echo=FALSE}
DiagrammeR::grViz("digraph {
graph [layout = dot, rankdir = LR]
node [shape = rectangle]
S -> I [label = 'βSI/N']
I -> R [label = 'γI']
}",
width = 300, height = "100%")
```
We want the recovery rate of individuals who had been infected for 1 day differ from the recovery rate of 2-day infected patients. So rather than using one compartment for infected (I), we define multiple infected sub-compartments. The number of sub-compartments depends on the maximum day we expect all infected individuals would be recovered.
For example, if we expect a disease with a maximum 4 days of infection, we will end up with 4 sub-compartments. Each sub-compartment has its own recovery rate $\gamma_{1}$, $\gamma_{2}$, $\gamma_{3}$, $\gamma_{4}$. At day 4 it is certain that the patient will recover (because we assume that this disease has a maximum 4 days of infection), $\gamma_{4} = 1$.
```{r, echo=FALSE}
DiagrammeR::grViz("digraph {
graph [layout = dot, rankdir = LR]
node [shape = rectangle]
S -> I1 [label = 'βS(I@_{1} + I@_{2} + I@_{3} + I@_{4})/N']
I1 -> R1 [label = 'γ@_{1}I@_{1}']
I2 -> R2 [label = 'γ@_{2}I@_{2}']
I3 -> R3 [label = 'γ@_{3}I@_{3}']
I4 -> R4 [label = 'I@_{4}']
I1 -> I2 [label = '(1-γ@_{1})I@_{1}']
I2 -> I3 [label = '(1-γ@_{2})I@_{2}']
I3 -> I4 [label = '(1-γ@_{3})I@_{3}']
}", height = "100%", width = "100%")
```
Let $R_1 + R_2 + R_3 + R_4 = \Sigma R$. We have $\frac{R_1}{\Sigma R} = p_1$, $\frac{R_2}{\Sigma R} = p_2$, $\frac{R_3}{\Sigma R} = p_3$, $\frac{R_4}{\Sigma R} = p_4$. Our mission is to estimate $\gamma_{1}$, $\gamma_{2}$, $\gamma_{3}$ to obtain $p_1$, $p_2$, $p_3$, $p_4$ that fit a pre-defined distribution at the equilibrium state. This can be obtained by setting:
$$\gamma_{i} = \frac{p_i}{1 - \sum_{j=1}^{i-1}p_j}$$
For a given length of stay distribution, we identify the maximum length of stay using its cumulative distribution function. Because cumulative distribution function is asymptotic to 1 and never equal to 1, we need to set a value that is acceptable to be rounded to 1. If we want a cumulative probability of 0.999 to be rounded as 1, we set the error tolerance threshold as `1 - 0.999 = 0.001` (specified by the argument `errorTolerance = 0.001`). The time when cumulative probability = 0.999 will be set as the maximum length of stay of the compartment. Default `errorTolerance` of `denim` is set at `0.001`.
## 3. Waiting time distribution
Current available distribution in this package including:
- `d_exponential(rate)`: Discrete **exponential distribution** with parameter `rate`
- `d_gamma(scale, shape)`: Discrete **gamma distribution** with parameters `scale` and `shape`
- `d_weibull(scale, shape)`: Discrete **Weibull distribution** with parameters `scale` and `shape`
- `d_lognormal(mu, sigma)`: Discrete **log-normal distribution** with parameters `mu` and `sigma`
You can define other type of transitions such as:
- Mathematical expression: Transition defined with a string value such as `"beta * S * I / N"` will be converted to a mathematical expression. You will need to define parameters that are not compartment names in the `parameters` argument
- Constant: Transition defined with a numerical value such as `1`, `2` will be converted to a constant. This is to define the number of individuals moving in a time step.
- `transprob(x)`: Every time step a **fixed percentage** of the left compartment transit to the right compartment, this is also a convenient way to define $R_t - R_{t-1} = \gamma I$ which we can input `"I -> R" = transprob(gamma)`
- `nonparametric(waitingTimes...)`: A **vector of values**, could be numbers, percentages, density of the length of stay based on real data, `denim` will convert it into a distribution
- `multinomial(probabilities)`: A convenient way to define **several probabilities** of a compartment transit to many compartments, may or may not in a time step. For example, `"V -> VA, VS, VH" = multinomial(0.6, 0.3, 0.1)` means 60% of `V` will become `VA`, 30% become `VS` and 10% become `VH`. If we continue to define the length of stay distribution for these transitions e.g `"V -> VA" = d_gamma(3, 2)`, probabilities defined with `multinomial()` is not the percentage of the left compartment transit in each time step, but the percentage of individuals move to `VA` at the equilibrium state. If we do not define further length of stay distribution, `"V -> VA, VS, VH" = multinomial(0.6, 0.3, 0.1)` is the percentage of `V` transit to the right compartments per time step similar to a `transprob()` function. See more detailed explanations in the ***Multiple transitions from a compartment*** section.
## 4. Multiple transitions from a compartment
We have many ways to define the type of transition when there are two or more transitions from a compartment. Consider this example:
```{r, echo=FALSE}
DiagrammeR::grViz("digraph {
graph [layout = dot, rankdir = LR]
node [shape = rectangle]
S -> I [label = 'βSI/N']
S -> V [label = '5']
I -> R [label = '0.9 -> d_gamma(3, 2)']
I -> D [label = '0.1 -> d_lognormal(2, 0.5)']
}",
width = 500, height = "100%")
```
There are two scenarios in this example:
- Susceptible individuals can be infected or vaccinated. The assumption here is they will be infected first (`S -> I`), and then the rest of them who were not infected will get vaccinated (`S -> V`).
- Infected individuals can recover or die. If the mortality probability is known, we can implement it into the model, for example by defining `0.9 * I -> R` (90% individuals will recover) and then `0.1 * I -> D` (10% of them die). By doing so, we ensure that the mortality probability is 10%, while also define the length of stay of individuals at the infected state before recover or die follows gamma or log-normal distribution, respectively.
We can define the model for this example as follows:
```{r, fig.width = 5}
transitions <- list(
"S -> I" = "beta * S * I / N",
"S -> V" = 7,
"0.9 * I -> R" = d_gamma(3, 2),
"0.1 * I -> D" = d_lognormal(2, 0.5)
)
initialValues <- c(
S = 999,
I = 1,
R = 0,
V = 0,
D = 0
)
parameters <- c(
beta = 0.12,
N = 1000
)
simulationDuration <- 10
timeStep <- 0.01
mod <- sim(transitions = transitions,
initialValues = initialValues,
parameters = parameters,
simulationDuration = simulationDuration,
timeStep = timeStep)
plot(mod)
```
**Tips**: Instead of writing:
```
"0.9 * I -> R" = d_gamma(3, 2),
"0.1 * I -> D" = d_lognormal(2, 0.5)
```
You can also use the `multinomial()`, then define the length of stay distribution and obtain the same result:
```
"I -> R, D" = multinomial(0.9, 0.1),
"I -> R" = d_gamma(3, 2),
"I -> D" = d_lognormal(2, 0.5)
```
## 5. Another example
```{r, echo=FALSE}
DiagrammeR::grViz("digraph {
graph [layout = dot, rankdir = LR]
node [shape = rectangle]
S -> I [label = 'βS(I + IV)/N']
S -> V [label = '2']
I -> D [label = '0.1 -> d_lognormal(2, 0.5)']
I -> R [label = '0.9 -> d_gamma(3, 2)']
V -> IV [label = '0.1 * βV(I + IV)/N']
IV -> R [label = 'd_exponential(2)']
}",
width = 700, height = "100%")
```
```{r, fig.width = 5}
transitions <- list(
"S -> I" = "beta * S * (I + IV) / N",
"S -> V" = 2,
"0.1 * I -> D" = d_lognormal(2, 0.5),
"0.9 * I -> R" = d_gamma(3, 2),
"V -> IV" = "0.1 * beta * V * (I + IV) / N",
"IV -> R" = d_exponential(2)
)
initialValues <- c(
S = 999,
I = 1,
R = 0,
V = 0,
IV = 0,
D = 0
)
parameters <- c(
beta = 0.12,
N = 1000
)
simulationDuration <- 10
timeStep <- 0.01
mod <- sim(transitions = transitions,
initialValues = initialValues,
parameters = parameters,
simulationDuration = simulationDuration,
timeStep = timeStep)
plot(mod)
```