Sensitivity Analysis with Python SALib

The post was updated on 2020-01-02 under elementary OS 5.1 Hera.

Sensitivity Analysis is the study of how uncertainty in the output of a model (numerical or otherwise) can be apportioned to different sources of uncertainty in the model input. - Global Sensitivity Analysis: The Primer

According to the above definition, sensitivity analysis is a mathematical method for understanding how the model output is influenced by uncertainty in the model parameters. In this post, I use a simple example to perform a sensitivity analysis with the Python module SALib, applying the Sobol method and visualizing the results in R. The analysis runs in an R session that calls Python through the reticulate package.

Installation & setting

The following commands install the required R packages (reticulate and ggplot2) and the Python package SALib.

install.packages("reticulate")
install.packages("ggplot2")
library(reticulate)
py_install("SALib")

Alternatively, running pip3 install SALib in the terminal also installs SALib.

# Load ggplot for plotting
library(ggplot2)

By default, reticulate uses the version of Python found on the PATH. The use_python() and Sys.which() functions let us select an alternate version and verify it, for example:

use_python("/usr/bin/python3")
Sys.which("python3")
##            python3 
## "/usr/bin/python3"

Sensitivity analysis by SALib

There are two ways to call a Python script from R. One is to read the script with source_python(). The other is to use the Python REPL embedded within the R session via repl_python(); after finishing the analysis in the Python REPL, type exit to return to the R prompt. The following Python script conducts a Sobol sensitivity analysis of the Ishigami function (adapted from the SALib examples). The Ishigami function is a classic test function for uncertainty and sensitivity analysis because its parameters have strongly nonlinear and nonmonotonic effects on the output.

\[f(x)=\sin(x_1)+a\sin^2(x_2)+bx_{3}^4\sin(x_1)\]

According to the source code in SALib, the default settings of \(a\) and \(b\) are 7 and 0.1, respectively. In addition, \(X_1\), \(X_2\), and \(X_3\) are uniformly distributed between \(-\pi\) and \(\pi\).
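As a quick sanity check, the Ishigami function with the default \(a=7\) and \(b=0.1\) can be written directly in plain Python (a minimal sketch, independent of SALib; `ishigami` is an illustrative helper, not a SALib function):

```python
import math

def ishigami(x1, x2, x3, a=7, b=0.1):
    """Ishigami function with SALib's default coefficients."""
    return math.sin(x1) + a * math.sin(x2) ** 2 + b * x3 ** 4 * math.sin(x1)

# At the origin all three terms vanish
print(ishigami(0.0, 0.0, 0.0))           # 0.0
# At x2 = pi/2 only the a*sin^2(x2) term contributes
print(ishigami(0.0, math.pi / 2, 0.0))   # 7.0
```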

# Importing SALib
import SALib
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol
from SALib.test_functions import Ishigami

# Set random seed to reproduce the same result
# (SALib draws from NumPy's random state, so seed NumPy
# rather than the built-in random module)
np.random.seed(10)

# Define the model inputs
problem = {
  'num_vars': 3,
  'names': ['x1', 'x2', 'x3'],
  'bounds': [[-3.14159265359, 3.14159265359],
             [-3.14159265359, 3.14159265359],
             [-3.14159265359, 3.14159265359]]
}
# Generate samples
X = saltelli.sample(problem, 1000)

# Run model (example)
Y = Ishigami.evaluate(X)

# Perform analysis
Si = sobol.analyze(problem, Y, print_to_console=True)
## Parameter      S1   S1_conf        ST   ST_conf
## x1       0.307975  0.056523  0.560137  0.091808
## x2       0.447767  0.047787  0.438722  0.039933
## x3      -0.004255  0.056495  0.242845  0.026180
## 
## Parameter_1 Parameter_2        S2   S2_conf
## x1          x2           0.012205  0.089981
## x1          x3           0.251526  0.106676
## x2          x3          -0.009954  0.061454

Sobol sensitivity analysis yields three sensitivity indices: the first-order index (S1), the contribution of a single model parameter alone to the output variance; the second-order index (S2), the contribution of the interaction between two model parameters; and the total-order index (ST), the overall contribution of a model parameter, including both its first-order effect and all its interactions. The confidence intervals are generated by resampling and can be used to check the convergence of the estimates.
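The Ishigami function is a popular benchmark precisely because its variance decomposition is available in closed form, so the Saltelli estimates above can be compared against exact values. A sketch using the commonly cited analytical formulas (the symbols \(a\) and \(b\) match the function definition; \(X_3\) has no first-order effect and contributes only through its interaction with \(X_1\)):

```python
import math

a, b = 7, 0.1
pi4, pi8 = math.pi ** 4, math.pi ** 8

# Total variance and partial variances of the Ishigami function
V   = a**2 / 8 + b * pi4 / 5 + b**2 * pi8 / 18 + 1 / 2
V1  = b * pi4 / 5 + b**2 * pi8 / 50 + 1 / 2   # first-order, x1
V2  = a**2 / 8                                 # first-order, x2
V13 = 8 * b**2 * pi8 / 225                     # x1-x3 interaction

S1, S2_idx = V1 / V, V2 / V
ST1, ST3 = (V1 + V13) / V, V13 / V
print(round(S1, 3), round(S2_idx, 3), round(ST1, 3), round(ST3, 3))
# roughly 0.314, 0.442, 0.558, 0.244 -- close to the Saltelli estimates above
```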

Visualization in R

Back in R, the next step is to do the necessary data manipulation and visualize the results. The output of the analysis is stored in the py object.

py$Si
## {'S1': array([ 0.30797549,  0.44776661, -0.00425452]), 'S1_conf': array([0.0565232 , 0.0477871 , 0.05649524]), 'ST': array([0.56013728, 0.4387225 , 0.24284474]), 'ST_conf': array([0.09180842, 0.03993287, 0.02618015]), 'S2': array([[        nan,  0.01220462,  0.25152574],
##        [        nan,         nan, -0.00995392],
##        [        nan,         nan,         nan]]), 'S2_conf': array([[       nan, 0.08998064, 0.10667619],
##        [       nan,        nan, 0.06145389],
##        [       nan,        nan,        nan]])}

The Saltelli sampler generated \(N\times(2D+2)\) samples (8,000 in this case) to evaluate the model sensitivity, where \(N\) is the chosen sample number (1,000 here) and \(D\) is the number of model inputs (3 here).
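The \(N\times(2D+2)\) relation can be checked with one line of arithmetic (a trivial sketch; `saltelli_size` is an illustrative helper, not part of SALib):

```python
def saltelli_size(n, d):
    """Number of rows produced by SALib's Saltelli sampler
    with second-order indices enabled: N * (2D + 2)."""
    return n * (2 * d + 2)

print(saltelli_size(1000, 3))  # 8000, matching the dimensions of py$X
```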

par(mfrow=c(1,3), mar=c(2,2,2,1))
plot(py$X[,1], main="X1")
plot(py$X[,2], main="X2")
plot(py$X[,3], main="X3")

The scatter plots below of the parameters (\(X\)) against the model output (\(Y\)) help to understand each parameter's effect on the output variance.

par(mfrow=c(1,3), mar=c(2,2,2,1))
plot(py$X[,1], py$Y, pch = ".", main="X1")
plot(py$X[,2], py$Y, pch = ".", main="X2")
plot(py$X[,3], py$Y, pch = ".", main="X3")

Finally, we can use the calculated sensitivity indices to investigate the influence of the model parameters.

# Create the data frame for ggplot
parms <- c("x1", "x2", "x3")
sobol <- c(py$Si$S1, py$Si$ST)
conf <- c(py$Si$S1_conf, py$Si$ST_conf)
df <- data.frame(parms, sobol, conf)
df$Order <- rep(c("Main", "Total"), each = 3)
df
##   parms        sobol       conf Order
## 1    x1  0.307975489 0.05652320  Main
## 2    x2  0.447766610 0.04778710  Main
## 3    x3 -0.004254525 0.05649524  Main
## 4    x1  0.560137277 0.09180842 Total
## 5    x2  0.438722499 0.03993287 Total
## 6    x3  0.242844745 0.02618015 Total
# Plotting
pd <- position_dodge(0.1)
theme_set(theme_bw())
ggplot(df, aes(x=parms, y=sobol, group=Order, color=Order)) + 
  geom_errorbar(aes(ymin=sobol-conf, ymax=sobol+conf), width=0, position=pd) +
  geom_point(position=pd, size=3) +
  labs(x="Parameter", y="Sobol index")

According to this result, we can easily see that parameter \(X_1\) has the greatest effect on the model output based on the total-order index. The result also shows that the estimated Sobol indices have not reached acceptable convergence (the confidence intervals are still wide), so the sample number should be increased further to reduce the variation in the estimates.
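Why a larger sample tightens the intervals can be illustrated with a generic Monte Carlo experiment (a hypothetical sketch in plain Python, not SALib's resampling code): the spread of repeated estimates shrinks roughly as \(1/\sqrt{N}\), so quadrupling the Saltelli sample count should about halve the confidence-interval width.

```python
import random
import statistics

def spread_of_estimates(n_samples, n_repeats=200, seed=1):
    """Standard deviation of repeated Monte Carlo mean estimates,
    a stand-in for the width of a resampling confidence interval."""
    rng = random.Random(seed)
    estimates = [
        statistics.fmean(rng.random() for _ in range(n_samples))
        for _ in range(n_repeats)
    ]
    return statistics.stdev(estimates)

# Quadrupling the sample size roughly halves the spread
print(spread_of_estimates(1000), spread_of_estimates(4000))
```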

Session info

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: elementary OS 5.1 Hera
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.2.1   reticulate_1.14
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.3        compiler_3.6.1    pillar_1.4.2     
##  [4] tools_3.6.1       digest_0.6.23     jsonlite_1.6     
##  [7] evaluate_0.14     lifecycle_0.1.0   tibble_2.1.3     
## [10] gtable_0.3.0      pkgconfig_2.0.3   rlang_0.4.2      
## [13] cli_1.1.0         rstudioapi_0.10   yaml_2.2.0       
## [16] blogdown_0.17     xfun_0.11         withr_2.1.2      
## [19] stringr_1.4.0     dplyr_0.8.3       knitr_1.25       
## [22] rappdirs_0.3.1    tidyselect_0.2.5  grid_3.6.1       
## [25] glue_1.3.1        R6_2.4.1          rmarkdown_2.0    
## [28] bookdown_0.16     sessioninfo_1.1.1 farver_2.0.1     
## [31] purrr_0.3.3       magrittr_1.5      scales_1.1.0     
## [34] htmltools_0.3.6   assertthat_0.2.1  colorspace_1.4-1 
## [37] labeling_0.3      stringi_1.4.3     lazyeval_0.2.2   
## [40] munsell_0.5.0     crayon_1.3.4
py_discover_config()
## python:         /home/nan-hung/.local/share/r-miniconda/envs/r-reticulate/bin/python
## libpython:      /home/nan-hung/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
## pythonhome:     /home/nan-hung/.local/share/r-miniconda/envs/r-reticulate:/home/nan-hung/.local/share/r-miniconda/envs/r-reticulate
## version:        3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31)  [GCC 7.3.0]
## numpy:          /home/nan-hung/.local/lib/python3.6/site-packages/numpy
## numpy_version:  1.18.0
