--- title: "denim: deterministic discrete-time model with memory" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{denim: deterministic discrete-time model with memory} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align="center" ) library(DiagrammeR) # for flowchart diagram library(denim) ``` ## 1. Simple SIR model with gamma distributed lengths of stay The SIR model uses 3 compartments: S (susceptible), I (infected), R (recovered) to describe clinical status of individuals. We use the most simple form of SIR model to demonstrate how to define the distribution of the lengths of stay distribution. ```{r, echo=FALSE} DiagrammeR::grViz("digraph { graph [layout = dot, rankdir = LR] node [shape = rectangle] S -> I [label = 'βSI/N'] I -> R [label = 'γI'] }", width = 300, height = "100%") ``` The model equations are: $$S_{t+1} - S_{t} = -\lambda S_{t} = -\frac{\beta I_{t}}{N}S_{t}$$ $$I_{t+1} - I_{t} = \frac{\beta I_{t}}{N}S_{t} - \gamma I_{t}$$ $$R_{t+1} - R_{t} = \gamma I_{t}$$ - $N$: total population size, $N = S + I + R$ - $\beta$: the product of contact rates and transmission probability; usually we define $\lambda =\frac{\beta I_{t}}{N}$ as the force of infection - $\gamma$: recovery rate Usually to solve the model easier we make an assumption that the recovery rate $\gamma$ is constant, this will leads to an exponentially distributed length of stay i.e most individuals recover after 1 day being infected. ```{r, echo=FALSE} rates <- c(0.5, 1, 1.5) x <- seq(0, 5, 0.001) y <- dexp(x = x, rate = rates[1]) y2 <- dexp(x = x, rate = rates[2]) y3 <- dexp(x = x, rate = rates[3]) col_codes <- c("#374F77", "#EF9F32", "#6ECBD7") plot(x, y, type = "l", col = col_codes[1], lty = 1, lwd = 3, xlab = "Length of stay (days)", ylab = "", ylim = c(0, 1.5), yaxt = 'n') lines(x, y2, col = col_codes[2], lty = 1, lwd = 3) lines(x, y3, col = col_codes[3], lty = 1, lwd = 3) legend("right", legend = c(0.5, 1.0, 1.5), col = col_codes, lty = 1, lwd = 3, bty = "\n") ``` A more realistic length of stay distribution can look like this, of which most patients recovered after 4 days. We defined this using a gamma distribution with shape = 3 and scale = 2. ```{r, echo=FALSE} x <- seq(0, 20, 0.001) y <- dgamma(x = x, shape = 3, scale = 2) plot(x, y, type = "l", col = col_codes[1], lty = 1, lwd = 3, xlab = "Length of stay (days)", ylab = "", yaxt = 'n') ``` The model now look like this: ```{r, echo=FALSE} DiagrammeR::grViz("digraph { graph [layout = dot, rankdir = LR] node [shape = rectangle] S -> I [label = 'βSI/N'] I -> R [label = 'd_gamma(3, 2)'] }", width = 400, height = "100%") ``` **Model specification** *Model transition* ```{r, echo=FALSE} DiagrammeR::grViz("digraph { graph [layout = dot, rankdir = LR] node [shape = rectangle] S -> I [label = 'βSI/N'] I -> R [label = 'd_gamma(3, 2)'] }", width = 400, height = "100%") ``` We have two transitions `S -> I` and `I -> R` in this case. The transitions are specified in a list follow this format `"transition" = equation`, of which equation is defined with one of our functions for waiting time distribution. ```{r} transitions <- list( "S -> I" = "beta * S * I / N", "I -> R" = d_gamma(3, 2) ) ``` *Initial state* Use a vector to define the compartments with their assigned names and initial values in the format `compartment_name = initial_value`: ```{r} initialValues <- c( S = 999, I = 1, R = 0 ) ``` *Model parameters* If we use a math expression, any symbols except the compartment names are parameters, and would be defined by constant values. There are two constant parameters in our example: `beta` and `N`: ```{r} parameters <- c( beta = 0.012, N = 1000 ) ``` **Model application** *Time step specification* We run the model for 30 days and give output at 0.01 daily intervals. The default interval (time step) is 1 if not declared explicitly. ```{r} simulationDuration <- 30 timeStep <- 0.01 ``` ```{r, fig.width = 6} mod <- sim(transitions = transitions, initialValues = initialValues, parameters = parameters, simulationDuration = simulationDuration, timeStep = timeStep) head(mod) plot(mod) ``` ## 2. How the algorithm work? In the SIR model, all infected individuals are presented by a single compartment I and have the same recovery rate $\gamma$. ```{r, echo=FALSE} DiagrammeR::grViz("digraph { graph [layout = dot, rankdir = LR] node [shape = rectangle] S -> I [label = 'βSI/N'] I -> R [label = 'γI'] }", width = 300, height = "100%") ``` We want the recovery rate of individuals who had been infected for 1 day differ from the recovery rate of 2-day infected patients. So rather than using one compartment for infected (I), we define multiple infected sub-compartments. The number of sub-compartments depends on the maximum day we expect all infected individuals would be recovered. For example, if we expect a disease with a maximum 4 days of infection, we will end up with 4 sub-compartments. Each sub-compartment has its own recovery rate $\gamma_{1}$, $\gamma_{2}$, $\gamma_{3}$, $\gamma_{4}$. At day 4 it is certain that the patient will recover (because we assume that this disease has a maximum 4 days of infection), $\gamma_{4} = 1$. ```{r, echo=FALSE} DiagrammeR::grViz("digraph { graph [layout = dot, rankdir = LR] node [shape = rectangle] S -> I1 [label = 'βS(I@_{1} + I@_{2} + I@_{3} + I@_{4})/N'] I1 -> R1 [label = 'γ@_{1}I@_{1}'] I2 -> R2 [label = 'γ@_{2}I@_{2}'] I3 -> R3 [label = 'γ@_{3}I@_{3}'] I4 -> R4 [label = 'I@_{4}'] I1 -> I2 [label = '(1-γ@_{1})I@_{1}'] I2 -> I3 [label = '(1-γ@_{2})I@_{2}'] I3 -> I4 [label = '(1-γ@_{3})I@_{3}'] }", height = "100%", width = "100%") ``` Let $R_1 + R_2 + R_3 + R_4 = \Sigma R$. We have $\frac{R_1}{\Sigma R} = p_1$, $\frac{R_2}{\Sigma R} = p_2$, $\frac{R_3}{\Sigma R} = p_3$, $\frac{R_4}{\Sigma R} = p_4$. Our mission is to estimate $\gamma_{1}$, $\gamma_{2}$, $\gamma_{3}$ to obtain $p_1$, $p_2$, $p_3$, $p_4$ that fit a pre-defined distribution at the equilibrium state. This can be obtained by setting: $$\gamma_{i} = \frac{p_i}{1 - \sum_{j=1}^{i-1}p_j}$$ For a given length of stay distribution, we identify the maximum length of stay using its cumulative distribution function. Because cumulative distribution function is asymptotic to 1 and never equal to 1, we need to set a value that is acceptable to be rounded to 1. If we want a cumulative probability of 0.999 to be rounded as 1, we set the error tolerance threshold as `1 - 0.999 = 0.001` (specified by the argument `errorTolerance = 0.001`). The time when cumulative probability = 0.999 will be set as the maximum length of stay of the compartment. Default `errorTolerance` of `denim` is set at `0.001`. ## 3. Waiting time distribution Current available distribution in this package including: - `d_exponential(rate)`: Discrete **exponential distribution** with parameter `rate` - `d_gamma(scale, shape)`: Discrete **gamma distribution** with parameters `scale` and `shape` - `d_weibull(scale, shape)`: Discrete **Weibull distribution** with parameters `scale` and `shape` - `d_lognormal(mu, sigma)`: Discrete **log-normal distribution** with parameters `mu` and `sigma` You can define other type of transitions such as: - Mathematical expression: Transition defined with a string value such as `"beta * S * I / N"` will be converted to a mathematical expression. You will need to define parameters that are not compartment names in the `parameters` argument - Constant: Transition defined with a numerical value such as `1`, `2` will be converted to a constant. This is to define the number of individuals moving in a time step. - `transprob(x)`: Every time step a **fixed percentage** of the left compartment transit to the right compartment, this is also a convenient way to define $R_t - R_{t-1} = \gamma I$ which we can input `"I -> R" = transprob(gamma)` - `nonparametric(waitingTimes...)`: A **vector of values**, could be numbers, percentages, density of the length of stay based on real data, `denim` will convert it into a distribution - `multinomial(probabilities)`: A convenient way to define **several probabilities** of a compartment transit to many compartments, may or may not in a time step. For example, `"V -> VA, VS, VH" = multinomial(0.6, 0.3, 0.1)` means 60% of `V` will become `VA`, 30% become `VS` and 10% become `VH`. If we continue to define the length of stay distribution for these transitions e.g `"V -> VA" = d_gamma(3, 2)`, probabilities defined with `multinomial()` is not the percentage of the left compartment transit in each time step, but the percentage of individuals move to `VA` at the equilibrium state. If we do not define further length of stay distribution, `"V -> VA, VS, VH" = multinomial(0.6, 0.3, 0.1)` is the percentage of `V` transit to the right compartments per time step similar to a `transprob()` function. See more detailed explanations in the ***Multiple transitions from a compartment*** section. ## 4. Multiple transitions from a compartment We have many ways to define the type of transition when there are two or more transitions from a compartment. Consider this example: ```{r, echo=FALSE} DiagrammeR::grViz("digraph { graph [layout = dot, rankdir = LR] node [shape = rectangle] S -> I [label = 'βSI/N'] S -> V [label = '5'] I -> R [label = '0.9 -> d_gamma(3, 2)'] I -> D [label = '0.1 -> d_lognormal(2, 0.5)'] }", width = 500, height = "100%") ``` There are two scenarios in this example: - Susceptible individuals can be infected or vaccinated. The assumption here is they will be infected first (`S -> I`), and then the rest of them who were not infected will get vaccinated (`S -> V`). - Infected individuals can recover or die. If the mortality probability is known, we can implement it into the model, for example by defining `0.9 * I -> R` (90% individuals will recover) and then `0.1 * I -> D` (10% of them die). By doing so, we ensure that the mortality probability is 10%, while also define the length of stay of individuals at the infected state before recover or die follows gamma or log-normal distribution, respectively. We can define the model for this example as follows: ```{r, fig.width = 5} transitions <- list( "S -> I" = "beta * S * I / N", "S -> V" = 7, "0.9 * I -> R" = d_gamma(3, 2), "0.1 * I -> D" = d_lognormal(2, 0.5) ) initialValues <- c( S = 999, I = 1, R = 0, V = 0, D = 0 ) parameters <- c( beta = 0.12, N = 1000 ) simulationDuration <- 10 timeStep <- 0.01 mod <- sim(transitions = transitions, initialValues = initialValues, parameters = parameters, simulationDuration = simulationDuration, timeStep = timeStep) plot(mod) ``` **Tips**: Instead of writing: ``` "0.9 * I -> R" = d_gamma(3, 2), "0.1 * I -> D" = d_lognormal(2, 0.5) ``` You can also use the `multinomial()`, then define the length of stay distribution and obtain the same result: ``` "I -> R, D" = multinomial(0.9, 0.1), "I -> R" = d_gamma(3, 2), "I -> D" = d_lognormal(2, 0.5) ``` ## 5. Another example ```{r, echo=FALSE} DiagrammeR::grViz("digraph { graph [layout = dot, rankdir = LR] node [shape = rectangle] S -> I [label = 'βS(I + IV)/N'] S -> V [label = '2'] I -> D [label = '0.1 -> d_lognormal(2, 0.5)'] I -> R [label = '0.9 -> d_gamma(3, 2)'] V -> IV [label = '0.1 * βV(I + IV)/N'] IV -> R [label = 'd_exponential(2)'] }", width = 700, height = "100%") ``` ```{r, fig.width = 5} transitions <- list( "S -> I" = "beta * S * (I + IV) / N", "S -> V" = 2, "0.1 * I -> D" = d_lognormal(2, 0.5), "0.9 * I -> R" = d_gamma(3, 2), "V -> IV" = "0.1 * beta * V * (I + IV) / N", "IV -> R" = d_exponential(2) ) initialValues <- c( S = 999, I = 1, R = 0, V = 0, IV = 0, D = 0 ) parameters <- c( beta = 0.12, N = 1000 ) simulationDuration <- 10 timeStep <- 0.01 mod <- sim(transitions = transitions, initialValues = initialValues, parameters = parameters, simulationDuration = simulationDuration, timeStep = timeStep) plot(mod) ```