Title: | Optimal Transport for Gating Transfer in Cytometry Data with Domain Adaptation |
---|---|
Description: | Supervised learning from a source distribution (with known segmentation into cell sub-populations) to fit a target distribution with unknown segmentation. It relies on regularized optimal transport to directly estimate the different cell population proportions from a biological sample characterized with flow cytometry measurements. It is based on the regularized Wasserstein metric to compare cytometry measurements from different samples, thus accounting for possible misalignment of a given cell population across samples (due to technical variability from the technology of measurements). Supervised learning technique based on the Wasserstein metric that is used to estimate an optimal re-weighting of class proportions in a mixture model. Details are presented in Freulon P, Bigot J and Hejblum BP (2021) <arXiv:2006.09003>. |
Authors: | Boris Hejblum [aut, cre], Paul Freulon [aut], Kalidou Ba [aut, trl] |
Maintainer: | Boris Hejblum <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.9.4 |
Built: | 2025-01-14 04:38:29 UTC |
Source: | https://github.com/sistm/cytopt-r |
Function to display a bar plot in order to visually compare the CytOpt estimation of the class proportions with the estimate of the class proportions provided through manual gating.
barplot_prop(proportions, title = "", xaxis_angle = 45)
proportions |
|
title |
plot title. Default is "" (no title). |
xaxis_angle |
scalar indicating the angle by which to tilt the x-axis labels. Default is 45. |
a ggplot object
if (interactive()) {
  res <- CytOpT(X_s = HIPC_Stanford_1228_1A, X_t = HIPC_Stanford_1369_1A,
                Lab_source = HIPC_Stanford_1228_1A_labels,
                eps = 0.0001, lbd = 0.0001, n_iter = 10000, n_stoc = 10,
                step_grad = 10, step = 5, power = 0.99, method = 'minmax')
  barplot_prop(res$proportions)
}
Function to display a Bland & Altman plot in order to visually assess the agreement between CytOpt estimation of the class proportions and the estimate of the class proportions provided through manual gating. Requires that either theta_true or Lab_target was provided when running CytOpT().
Bland_Altman(proportions, additional_info_shape = NULL)
proportions |
|
additional_info_shape |
vector of additional information to be used for shape in the plot. Not implemented yet. |
if (interactive()) {
  gold_standard_manual_prop <- c(table(HIPC_Stanford_1369_1A_labels) /
                                   length(HIPC_Stanford_1369_1A_labels))
  res <- CytOpT(X_s = HIPC_Stanford_1228_1A, X_t = HIPC_Stanford_1369_1A,
                Lab_source = HIPC_Stanford_1228_1A_labels,
                theta_true = gold_standard_manual_prop,
                eps = 0.0001, lbd = 0.0001, n_iter = 10000, n_stoc = 10,
                step_grad = 10, step = 5, power = 0.99, method = 'both')
  Bland_Altman(res$proportions)
}
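To make the quantities behind a Bland & Altman plot concrete, here is a minimal base R sketch. The proportion values are made up for illustration, and the 1.96 standard deviation limits of agreement are the usual textbook convention, assumed here rather than taken from the package's implementation.

```r
# Hypothetical proportions for three cell populations (illustrative values only)
gold <- c(A = 0.50, B = 0.30, C = 0.20)  # e.g. from manual gating
est  <- c(A = 0.46, B = 0.33, C = 0.21)  # e.g. from an estimation method

# Bland & Altman coordinates: mean of the two measurements vs. their difference
ba_mean <- (gold + est) / 2
ba_diff <- est - gold

# Bias (mean difference) and conventional limits of agreement (assumed: 1.96 SD)
bias <- mean(ba_diff)
loa  <- bias + c(-1.96, 1.96) * sd(ba_diff)

plot(ba_mean, ba_diff,
     xlab = "Mean of the two proportions",
     ylab = "Estimated - gold standard")
abline(h = c(bias, loa), lty = c(1, 2, 2))
```

Points scattered tightly around the zero line suggest good agreement between the two sets of proportions.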
Function to estimate the cell type proportions in an unclassified cytometry data set denoted X_t by using the classification Lab_source from another cytometry data set X_s. With this function the estimate of the class proportions is computed with a descent-ascent algorithm, a minmax algorithm, or both.
CytOpT( X_s, X_t, Lab_source, Lab_target = NULL, theta_true = NULL, method = c("minmax", "desasc", "both"), eps = 1e-04, n_iter = 10000, power = 0.99, step_grad = 10, step = 5, lbd = 1e-04, n_out = 5000, n_stoc = 10, minMaxScaler = TRUE, monitoring = FALSE, thresholding = TRUE )
X_s |
a cytometry dataframe with only |
X_t |
a cytometry dataframe with only |
Lab_source |
a vector of length |
Lab_target |
a vector of length |
theta_true |
If available, gold-standard proportions in the target data
set |
method |
a character string indicating which method to use to compute the cytopt, either "minmax", "desasc" or "both". |
eps |
a float giving the regularization parameter of the Wasserstein distance. Default is 1e-04. |
n_iter |
an integer giving the number of iterations of the selected method. Default is 10000. |
power |
a float constant: the step-size policy of the gradient ascent method is step/n^power. Default is 0.99. |
step_grad |
an integer giving the step size of the gradient descent algorithm of the outer loop. Default is 10. |
step |
an integer constant that multiplies the step-size policy. Default is 5. |
lbd |
a float giving the additional regularization parameter on the class proportions (used by the minmax method). Default is 1e-04. |
n_out |
an integer number of iterations in the outer loop. This loop corresponds to the gradient descent algorithm to minimize the regularized Wasserstein distance between the source and target data sets. Default is 5000. |
n_stoc |
an integer number of iterations in the inner loop. This loop corresponds to the stochastic algorithm that approximates a maximizer of the semi-dual problem. Default is 10. |
minMaxScaler |
a logical flag indicating whether to scale observations between 0 and 1. Default is TRUE. |
monitoring |
a logical flag indicating whether to monitor the gap between the estimated proportions and the manual gold-standard. Default is FALSE. |
thresholding |
a logical flag indicating whether to threshold negative values. Default is TRUE. |
an object of class CytOpt, which is a list of two elements:
proportions |
a data.frame with the (optionally true and) estimated proportions for each method |
monitoring |
a list of estimates over the optimization iterations for each method (listed within) |
if (interactive()) {
  res <- CytOpT(X_s = HIPC_Stanford_1228_1A, X_t = HIPC_Stanford_1369_1A,
                Lab_source = HIPC_Stanford_1228_1A_labels,
                method = 'minmax')
  summary(res)
  plot(res)
}
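The minMaxScaler argument refers to scaling observations between 0 and 1. The base R sketch below shows what column-wise min-max scaling looks like; the helper name min_max_scale and the toy matrix are hypothetical, and whether the package scales each marker exactly this way is an assumption.

```r
# Hypothetical helper: rescale each column (marker) of X to the [0, 1] range
min_max_scale <- function(X) {
  X <- as.matrix(X)
  rng <- apply(X, 2, range)               # 2 x p matrix: row 1 = min, row 2 = max
  centered <- sweep(X, 2, rng[1, ])       # subtract the column minima
  sweep(centered, 2, rng[2, ] - rng[1, ], "/")  # divide by the column ranges
}

# Toy data: 3 cells measured on 2 markers (illustrative values only)
X <- cbind(m1 = c(1, 3, 5), m2 = c(10, 20, 40))
X_scaled <- min_max_scale(X)
X_scaled
```

A constant column would produce a division by zero in this sketch, so real data may need a guard for that case.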
Function to estimate the cell type proportions in an unclassified cytometry data set denoted X_t by using the classification Lab_source from another cytometry data set X_s. With this function the estimate of the class proportions is computed with a descent-ascent algorithm.
cytopt_desasc_r( X_s, X_t, Lab_source, theta_true = NULL, eps = 1e-04, n_out = 5000, n_stoc = 10, step_grad = 10, monitoring = FALSE )
X_s |
a cytometry dataframe. The columns correspond to the different biological markers tracked. One line corresponds to the cytometry measurements performed on one cell. The classification of this Cytometry data set must be provided with the Lab_source parameters. |
X_t |
a cytometry dataframe. The columns correspond to the different biological markers tracked. One line corresponds to the cytometry measurements performed on one cell. The CytOpT algorithm targets the cell type proportion in this Cytometry data set |
Lab_source |
a vector of length |
theta_true |
If available, gold-standard proportions in the target data
set |
eps |
a float giving the regularization parameter of the Wasserstein distance. Default is 1e-04. |
n_out |
an integer number of iterations in the outer loop. This loop corresponds to the gradient descent algorithm to minimize the regularized Wasserstein distance between the source and target data sets. Default is 5000. |
n_stoc |
an integer number of iterations in the inner loop. This loop corresponds to the stochastic algorithm that approximates a maximizer of the semi-dual problem. Default is 10. |
step_grad |
an integer giving the step size of the gradient descent algorithm of the outer loop. Default is 10. |
monitoring |
boolean indicating whether the Kullback-Leibler divergence should be monitored and stored throughout the optimization iterations. Default is FALSE. |
A list with the following elements: h_hat
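The package description mentions an optimal re-weighting of class proportions in a mixture model. The base R sketch below illustrates one way a vector of class proportions can induce weights on the source cells: each cell in class k receives weight h_k / n_k, where n_k is the number of source cells in that class. The labels and proportions are made up, and this is an illustration of the idea rather than the package's internal code.

```r
# Toy source labels (illustrative only): 3 cells in class CD4, 1 in class CD8
lab <- factor(c("CD4", "CD4", "CD4", "CD8"))

# Candidate class proportions for the target sample (hypothetical values)
h <- c(CD4 = 0.5, CD8 = 0.5)

# Number of source cells per class
n_k <- table(lab)

# Weight of each source cell: class proportion shared equally within the class
w <- h[as.character(lab)] / as.numeric(n_k[as.character(lab)])
w
sum(w)  # the weights form a probability distribution over the source cells
```

Changing h moves mass between classes without moving any cell, which is what makes the proportions the quantity to optimize.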
Function to estimate the cell type proportions in an unclassified cytometry data set denoted X_t by using the classification Lab_source from another cytometry data set X_s. With this function an additional regularization parameter on the class proportions enables a faster computation of the estimator.
cytopt_minmax_r( X_s, X_t, Lab_source, theta_true = NULL, eps = 1e-04, lbd = 1e-04, n_iter = 10000, step = 5, power = 0.99, monitoring = FALSE )
X_s |
Cytometry data set. The columns correspond to the different biological markers tracked. One line corresponds to the cytometry measurements performed on one cell. The classification of this Cytometry data set must be provided with the Lab_source parameters. |
X_t |
Cytometry data set. The columns correspond to the different biological markers tracked. One line corresponds to the cytometry measurements performed on one cell. The CytOpT algorithm targets the cell type proportion in this Cytometry data set. |
Lab_source |
Classification of the X_s Cytometry data set |
theta_true |
If available, gold-standard proportions in the target data
set |
eps |
Regularization parameter of the Wasserstein distance |
lbd |
a float giving the additional regularization parameter on the class proportions. Default is 1e-04. |
n_iter |
an integer giving the number of iterations of the method. Default is 10000. |
step |
Constant that multiplies the step-size policy. Default is 5. |
power |
the step-size policy of the gradient ascent method is step/n^power. Default is 0.99. |
monitoring |
boolean indicating whether the Kullback-Leibler divergence should be monitored and stored throughout the optimization iterations. Default is FALSE. |
A list with the following elements: Results_Minmax
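The step and power arguments define a decreasing step-size policy of the form step/n^power. The short base R sketch below evaluates that formula at a few iteration numbers using the default values from the usage above; the variable name gamma is chosen here for illustration.

```r
# Default values taken from the usage above
step  <- 5
power <- 0.99

# A few iteration indices at which to evaluate the policy
n <- c(1, 10, 100, 1000, 10000)

# Step size at iteration n: step / n^power
gamma <- step / n^power
data.frame(iteration = n, step_size = gamma)
```

With power close to 1 the step size decays almost like 1/n, so late iterations make only small corrections.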
HIPC T cell data set from HIPC program for patients 1228 and 1369 (replicate 1A from Stanford).
data(HIPC_Stanford)
The data are composed of 4 objects:
HIPC_Stanford_1228_1A: a data.frame of 31342 cells and 7 markers.
HIPC_Stanford_1228_1A_labels: a factor vector with the cell type of each of the 31342 observed cells.
HIPC_Stanford_1369_1A: a data.frame of 33992 cells and 7 markers.
HIPC_Stanford_1369_1A_labels: a factor vector with the cell type of each of the 33992 observed cells.
This immunophenotyping T cell panel from the Lyoplate HIPC dataset was used as part of the FlowCAP III Lyoplate challenge.
Flow cytometry data set from the HIPC T-cell panel study. In the HIPC T-cell panel study, flow cytometry was measured on samples from 3 patients (IDs: 1228, 1349 and 1369) with 3 replicates each (1A, 2B and 3C) in 7 centers (NHLBI, Yale, UCLA, CIMR, Baylor, Stanford and Miami), i.e. 63 data sets in total. Manual gating was performed in the different centers to cluster the observed cells into one of 10 cellular populations:
CD8 Effector
CD8 Naive
CD8 Central Memory
CD8 Effector Memory
CD8 Activated
CD4 Effector
CD4 Naive
CD4 Central Memory
CD4 Effector Memory
CD4 Activated
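Manual gating labels like these can be turned into class proportions with table(), which is the same idiom used for the gold standard in the package examples. The sketch below uses simulated labels over three of the populations listed above; the sample size and sampling probabilities are arbitrary illustration values.

```r
set.seed(1)

# Three of the populations listed above, used as factor levels
pops <- c("CD8 Naive", "CD4 Naive", "CD4 Effector")

# Simulated manual gating labels for 200 cells (arbitrary probabilities)
labels <- factor(sample(pops, 200, replace = TRUE, prob = c(0.5, 0.3, 0.2)),
                 levels = pops)

# Class proportions as a named numeric vector
prop <- c(table(labels) / length(labels))
prop
```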
https://www.immuneprofiling.org/hipc/page/show
https://www.immunespace.org/
https://www.immunespace.org/project/HIPC/Lyoplate/begin.view?pageId=study.DATA_ANALYSIS
Maecker HT, McCoy JP & Nussenblatt R (2012). Standardizing immunophenotyping for the human immunology project. Nature Reviews Immunology, 12(3):191–200. DOI: 10.1038/nri3158
Finak G, Langweiler M, Jaimes M, Malek M, Taghiyar J, Korin Y, Raddassi K, Devine L, Obermoser G, Pekalski ML, Pontikos N, Diaz A, Heck S, Villanova F, Terrazzini N, Kern F, Qian Y, Stanton R, Wang K, Brandes A, Ramey J, Aghaeepour N, Mosmann T, Scheuermann RH, Reed E, Palucka K, Pascual V, Blomberg BB, Nestle F, Nussenblatt RB, Brinkman RR, Gottardo R, Maecker H & McCoy JP (2016). Standardizing Flow Cytometry Immunophenotyping Analysis from the Human ImmunoPhenotyping Consortium. Scientific Reports. 10(6):20686. DOI: 10.1038/srep20686.
A plotting function for displaying the Kullback-Leibler (KL) divergence across iterations of the optimization algorithm(s).
KL_plot( monitoring, n_0 = 10, n_stop = 1000, title = "Kullback-Liebler divergence trace" )
monitoring |
|
n_0 |
first iteration to plot. Default is 10. |
n_stop |
last iteration to plot. Default is 1000. |
title |
plot title. Default is "Kullback-Liebler divergence trace". |
a ggplot object
if (interactive()) {
  gold_standard_manual_prop <- c(table(HIPC_Stanford_1369_1A_labels) /
                                   length(HIPC_Stanford_1369_1A_labels))
  res <- CytOpT(X_s = HIPC_Stanford_1228_1A, X_t = HIPC_Stanford_1369_1A,
                Lab_source = HIPC_Stanford_1228_1A_labels,
                theta_true = gold_standard_manual_prop,
                eps = 0.0001, lbd = 0.0001, n_iter = 10000, n_stoc = 10,
                step_grad = 10, step = 5, power = 0.99, method = 'both',
                monitoring = TRUE)
  plot(res)
}
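For readers unfamiliar with the quantity being traced, the base R sketch below computes a Kullback-Leibler divergence between two discrete proportion vectors. The helper name kl_div and the values are hypothetical, and the direction of the divergence (which vector plays the role of p) is an assumption, not a statement about the package's convention.

```r
# Hypothetical helper: KL(p || q) for two discrete distributions on the same classes
kl_div <- function(p, q) {
  keep <- p > 0                 # terms with p = 0 contribute 0 by convention
  sum(p[keep] * log(p[keep] / q[keep]))
}

# Illustrative proportions (made up)
p <- c(0.5, 0.3, 0.2)  # e.g. gold-standard proportions
q <- c(0.4, 0.4, 0.2)  # e.g. proportions estimated at some iteration

kl_div(p, q)  # positive when the two vectors differ
kl_div(p, p)  # zero when they coincide
```

A trace of this value that decreases toward zero across iterations indicates that the estimate is approaching the reference proportions.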
Computes a classification of the target data from the approximated transport plan and the classification of the source data. The transport plan is approximated with a stochastic algorithm.
Label_Prop_sto_r( X_s, X_t, Lab_source, eps = 1e-04, const = 0.1, n_iter = 4000, minMaxScaler = TRUE, monitoring = TRUE, thresholding = TRUE )
X_s |
a cytometry dataframe. The columns correspond to the different biological markers tracked. One line corresponds to the cytometry measurements performed on one cell. The classification of this Cytometry data set must be provided with the Lab_source parameters. |
X_t |
a cytometry dataframe. The columns correspond to the different biological markers tracked. One line corresponds to the cytometry measurements performed on one cell. The CytOpT algorithm targets the cell type proportion in this Cytometry data set |
Lab_source |
a vector of length |
eps |
a float giving the regularization parameter of the Wasserstein distance. Default is 1e-04. |
const |
a float constant. Default is 0.1. |
n_iter |
an integer giving the number of iterations. Default is 4000. |
minMaxScaler |
a logical flag indicating whether to scale observations between 0 and 1. Default is TRUE. |
monitoring |
a logical flag indicating whether to monitor the gap between the estimated proportions and the manual gold-standard. Default is TRUE. |
thresholding |
a logical flag indicating whether to threshold negative values. Default is TRUE. |
a ggplot object
a vector of length nrow(X_t) with the propagated labels
if (interactive()) {
  res <- Label_Prop_sto_r(X_s = HIPC_Stanford_1228_1A,
                          X_t = HIPC_Stanford_1369_1A,
                          Lab_source = HIPC_Stanford_1228_1A_labels)
}
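The idea of propagating labels through a transport plan can be sketched in a few lines of base R. Here the plan P is a small hand-written matrix (rows are source cells, columns are target cells), and each target cell takes the class that sends it the most mass. The matrix values and variable names are hypothetical; the package approximates the plan with a stochastic algorithm, which this sketch does not attempt to reproduce.

```r
# Toy source labels (illustrative only)
lab_source <- factor(c("A", "A", "B"))

# Hypothetical transport plan: rows = source cells, columns = target cells
P <- matrix(c(0.20, 0.05, 0.00,
              0.10, 0.03, 0.05,
              0.05, 0.12, 0.40),
            nrow = 3, byrow = TRUE)

# Mass received by each target cell from each source class (classes x targets)
class_mass <- rowsum(P, group = as.character(lab_source))

# Each target cell gets the class that transports the most mass to it
lab_target <- factor(rownames(class_mass)[apply(class_mass, 2, which.max)],
                     levels = levels(lab_source))
lab_target
```

The result has one label per column of P, matching the documented return value of one label per row of X_t.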
plot S3 method for CytOpt object
## S3 method for class 'CytOpt' plot(x, ...)
x |
an object of class |
... |
further arguments passed to or from other methods. Not implemented. |
a ggplot object, potentially composed through patchwork
if (interactive()) {
  res <- CytOpT(X_s = HIPC_Stanford_1228_1A, X_t = HIPC_Stanford_1369_1A,
                Lab_source = HIPC_Stanford_1228_1A_labels,
                eps = 0.0001, lbd = 0.0001, n_iter = 10000, n_stoc = 10,
                step_grad = 10, step = 5, power = 0.99, method = 'minmax')
  plot(res)
}
print S3 method for CytOpt object
## S3 method for class 'CytOpt' print(x, ...)
x |
an object of class |
... |
further arguments passed to or from other methods. Not implemented. |
the proportions data.frame from x
if (interactive()) {
  res <- CytOpT(X_s = HIPC_Stanford_1228_1A, X_t = HIPC_Stanford_1369_1A,
                Lab_source = HIPC_Stanford_1228_1A_labels,
                eps = 0.0001, lbd = 0.0001, n_iter = 10000, n_stoc = 10,
                step_grad = 10, step = 5, power = 0.99, method = 'minmax')
  print(res)
}
print S3 method for summary.CytOpt object
## S3 method for class 'summary.CytOpt' print(x, ...)
x |
an object of class |
... |
further arguments passed to or from other methods. Not implemented. |
summary S3 method for CytOpt object
## S3 method for class 'CytOpt' summary(object, ...)
object |
an object of class |
... |
further arguments passed to or from other methods. Not implemented. |
a list object
if (interactive()) {
  res <- CytOpT(X_s = HIPC_Stanford_1228_1A, X_t = HIPC_Stanford_1369_1A,
                Lab_source = HIPC_Stanford_1228_1A_labels,
                eps = 0.0001, lbd = 0.0001, n_iter = 10000, n_stoc = 10,
                step_grad = 10, step = 5, power = 0.99, method = 'minmax',
                monitoring = TRUE)
  summary(res)
}