--- title: "User guide for executing `CytOpT` on `HIPC` data" author: "Paul Freulon, Jérémie Bigot, Kalidou BA, Boris Hejblum" date: "`r Sys.Date()`" logo: "man/figures/logo.png" output: rmarkdown::html_vignette: toc: yes vignette: > %\VignetteIndexEntry{User guide for executing `CytOpT` on `HIPC` data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r knitrsetup, include = FALSE} library(reticulate) # this vignette requires python 3.7 or newer to run eval <- tryCatch({ numeric_version(py_config()$version) >= "3.7" && py_numpy_available() && py_module_available("scipy") && py_module_available("sklearn") }, error = function(e) FALSE) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = eval ) ``` # Introduction `CytOpT` is a supervised method that directly estimates the cell proportions in a flow-cytometry data set by using a source gating as its input and relies on regularized optimal transport. # Analysis of HIPC data As an illustrative example, we analyze here the flow cytometry data from the T-cell panel of the Human Immunology Project Consortium (HIPC) publicly available on ImmuneSpace [Gottardo et al. [2014]](https://www.nature.com/articles/nbt.2777). An `HIPC` data set has the following structure (split into 2 files): - `xx_y_values`: flow-cytometry measurements - `xx_y_clust`: file with the corresponding manual clustering Above, `xx` denotes the center where the data analysis was performed, and `y` denotes the patient and the replicate of the biological sample in question. ## Data load ```{r, eval=TRUE} library(CytOpT) data("HIPC_Stanford") ``` Here are the first few lines of the flow-cytometry measurements from patient *1228* replicate *1A*: ```{r} knitr::kable(head(HIPC_Stanford_1228_1A)) ``` The manual clustering of these data into 10 cell populations (`r paste0(levels(HIPC_Stanford_1228_1A_labels), collapse=", ")`) can be accessed from the `HIPC_Stanford_1228_1A_labels` object. We will use the manual gating from patient *1228* replicate *1A* as our source proportions to infer proportions for patient *1369* replicate *1A*. ## Computation of the benchmark class proportions for target data Because in this example, we know the true proportions in the target data set `HIPC_Stanford_1369_1A`, we can assess the gap between the estimate form `CytOpt` and the cellular proportions from the reference manual gating. For this purpose, we compute those manual proportions with: ```{r} gold_standard_manual_prop <- c(table(HIPC_Stanford_1369_1A_labels)/length(HIPC_Stanford_1369_1A_labels)) ``` ## `CytOpT` ### Optimization ```{r} set.seed(123) res <- CytOpT(X_s = HIPC_Stanford_1228_1A, X_t = HIPC_Stanford_1369_1A, Lab_source = HIPC_Stanford_1228_1A_labels, theta_true = gold_standard_manual_prop, method='both', monitoring = TRUE) ``` ### Results The results from `CytOpt` for both optimization algorithms are: ```{r} summary(res) ``` Some visualizations are provided by the `plot()` method: ```{r, fig.width = 7, fig.asp = .8, fig.retina=2} plot(res) ``` ### Performance evaluation Concordance between the manual gating gold-standard and `CytOpt` estimation can be graphically diagnosed with Bland-Altman plots: ```{r BA, fig.width = 7, fig.asp = .6, fig.retina = 2} Bland_Altman(res$proportions) ``` ---- The methods implemented in the `CytOpt` package are detailed in the following article: > Paul Freulon, Jérémie Bigot, Boris P. Hejblum. CytOpT: Optimal > Transport with Domain Adaptation for Interpreting Flow Cytometry data >