Title: | Automated Cytometry Gating and Annotation |
---|---|
Description: | Given the hypothesis of a bi-modal distribution of cells for each marker, the algorithm constructs a binary tree, the nodes of which are subpopulations of cells. At each node, observed cells and markers are modeled by both a family of normal distributions and a family of bi-modal normal mixture distributions. Splitting is done according to a normalized difference of AIC between the two families. Method is detailed in: Commenges, Alkhassim, Gottardo, Hejblum & Thiebaut (2018) <doi: 10.1002/cyto.a.23601>. |
Authors: | Chariff Alkhassim [aut], Boris P Hejblum [cre, aut], Anthony Devaux [aut], Van Hung Huynh Tran [aut], Melany Durand [aut] |
Maintainer: | Boris P Hejblum <[email protected]> |
License: | LGPL-3 | file LICENSE |
Version: | 2.0.5 |
Built: | 2024-11-21 06:24:22 UTC |
Source: | https://github.com/sistm/cytometree |
The main function in this package is CytomeTree.
Package: | cytometree |
Type: | Package |
Version: | 2.0.4 |
Date: | 2020-08-12 |
License: | LGPL-3 |
The algorithm is based on the construction of a binary tree, the nodes of which are subpopulations of cells. At each node, observed cells and markers are modeled by both a family of normal distributions and a family of bi-modal normal mixture distributions. Splitting is done according to a normalized difference of AIC between the two families. Given the unsupervised nature of the binary tree, some of the available markers may not be used to find the different cell populations present in a given sample. To recover a complete annotation, we defined, as a post processing procedure, an annotation method which allows the user to distinguish two or three expression levels per marker.
Maintainer: Boris P Hejblum [email protected]
Authors:
Chariff Alkhassim
Anthony Devaux
Van Hung Huynh Tran
Melany Durand
Commenges D, Alkhassim C, Gottardo R, Hejblum BP, Thiébaut R (2018). cytometree: a binary tree algorithm for automatic gating in cytometry analysis. Cytometry Part A, 93(11):1132-1140. <doi: 10.1002/cyto.a.23601>
Useful links:
Report bugs at https://github.com/sistm/Cytometree/issues
Annotates cell populations found using CytomeTree.
Annotation( CytomeTreeObj, K2markers = NULL, K3markers = NULL, plot = TRUE, t = 0.2, remove_outliers_inplot = TRUE, center_fun = c("median", "mean") )
CytomeTreeObj |
An object of class CytomeTree. |
K2markers |
A vector of class character where the names of the markers for which 2 levels of expression are sought can be specified. Default is NULL. |
K3markers |
A vector of class character where the names of the markers for which 3 levels of expression are sought can be specified. Default is NULL. |
plot |
A logical value indicating whether or not to plot the partitioning in 1, 2 or 3 groups for each marker. Default is TRUE. |
t |
A real positive-or-null number used for comparison with the normalized AIC computed to compare the fits of the marginal distributions obtained by one normal distribution and by a mixture of two or three normal distributions. For markers used in the tree, the algorithm compares the fits obtained by a mixture of two and of three normal distributions. Default value is 0.2. A higher value leads to a smaller number of expression levels per marker. |
remove_outliers_inplot |
A logical flag indicating whether the y-axis should be scaled by removing outliers or not. Default is TRUE. |
center_fun |
A character string, either 'median' or 'mean', indicating the summary on which the ordering of the populations is based. Default is 'median'. |
The algorithm is set to find the partitioning in 1, 2 or 3 groups of cell populations found using CytomeTree. In an unsupervised mode, it minimizes the within-leaves sum of squares of the observed values on each marker and computes the normalized AIC to compare the fits of the marginal distributions obtained by one normal distribution and by a mixture of two or three normal distributions. For markers used in the tree, the algorithm compares the fits obtained by a mixture of two and of three normal distributions.
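The idea of grouping populations by minimizing a within-group sum of squares can be illustrated on a single marker. The sketch below is only loosely inspired by the description above and is not the code used by Annotation: the helper `best_two_groups` and the vector `leaf_medians` are hypothetical, and only the two-group case is shown.

```r
# Hypothetical helper (not part of cytometree): split per-population centers
# on one marker into a "low" and a "high" group, choosing the cut that
# minimizes the total within-group sum of squares.
best_two_groups <- function(centers) {
  o <- order(centers)
  x <- centers[o]
  wss <- function(v) sum((v - mean(v))^2)
  cuts <- seq_len(length(x) - 1)
  total <- vapply(
    cuts,
    function(k) wss(x[1:k]) + wss(x[(k + 1):length(x)]),
    numeric(1)
  )
  k_best <- cuts[which.min(total)]
  groups <- integer(length(x))
  # Map the labels back to the original order: 0 = low, 1 = high.
  groups[o] <- as.integer(seq_along(x) > k_best)
  groups
}

# Made-up medians of one marker in five leaves of a tree.
leaf_medians <- c(0.2, 3.1, 0.4, 2.8, 0.1)
best_two_groups(leaf_medians)
```

In one dimension the optimal two-group partition is always a cut of the sorted values, which is why trying every cut point is enough here.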
A data.frame containing the annotation of each cell population.
Chariff Alkhassim, Boris Hejblum
Bootstrapped Confidence Interval.
bootstrapCI(stat, n, alpha)
stat |
A numeric vector of statistics for which to compute a confidence interval. |
n |
An integer giving the number of bootstrap samples. |
alpha |
A real number in the open interval ]0, 1[, equal to 1 minus the desired confidence level. |
Chariff Alkhassim
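How bootstrapCI builds its interval is not detailed on this page. As a rough illustration of what the three arguments can mean, here is a minimal percentile-bootstrap sketch in base R. The helper `percentile_boot_ci`, the choice of the mean as the target quantity, and the data are all assumptions made for this example.

```r
# Hypothetical helper (not the package's bootstrapCI): percentile bootstrap
# confidence interval for the mean of a vector of statistics.
percentile_boot_ci <- function(stat, n, alpha) {
  # Resample with replacement n times and record the mean of each resample.
  boot_means <- replicate(n, mean(sample(stat, replace = TRUE)))
  # Bounds are the alpha/2 and 1 - alpha/2 empirical quantiles.
  quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(123)
f_measures <- rnorm(50, mean = 0.8, sd = 0.05)  # made-up statistics
ci <- percentile_boot_ci(f_measures, n = 1000, alpha = 0.05)
ci
```

With alpha = 0.05 the two returned values bound an approximate 95% interval.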
Binary tree algorithm for mass cytometry data analysis.
CytofTree( M, minleaf = 1, t = 0.1, verbose = TRUE, force_first_markers = NULL, transformation = c("asinh", "biexp", "log10", "none"), num_col = 1:ncol(M) )
M |
A matrix of size n x p containing mass cytometry measures of n cells on p markers. |
minleaf |
An integer indicating the minimum number of cells per population. Default is 1. |
t |
A real positive-or-null number used for comparison with the normalized AIC computed at each node of the tree. A higher value limits the height of the tree. |
verbose |
A logical controlling whether a text progress bar is displayed during the execution of the algorithm. Default is TRUE. |
force_first_markers |
A vector of indices of the markers to split the data on first. This argument is used in the semi-supervised setting, forcing the algorithm to consider those markers first, in the order they appear in this vector. Default is NULL. |
transformation |
A string indicating the transformation used, among "asinh", "biexp", "log10" and "none". |
num_col |
An integer vector of indices indicating the columns to be transformed. Default is 1:ncol(M). |
The data can first be transformed using one of the available transformations. The algorithm is then based on the construction of a binary tree, the nodes of which are subpopulations of cells. At each node, observed cells and markers are modeled by both a family of normal distributions and a family of bi-modal normal mixture distributions. Splitting is done according to a normalized difference of AIC between the two families.
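To make the role of num_col concrete, the following sketch applies an inverse hyperbolic sine to a subset of columns of a toy matrix. It only uses base R. The matrix, its column names and the absence of any cofactor are assumptions for illustration, since this page does not say how CytofTree scales the data before transforming it.

```r
# Toy matrix standing in for mass cytometry data: two bookkeeping columns
# followed by three marker columns (all names are made up).
set.seed(42)
M <- cbind(
  Time = 1:5,
  Cell_length = 20:24,
  markerA = rexp(5, 0.01),
  markerB = rexp(5, 0.1),
  markerC = rexp(5, 1)
)

# Columns to transform, mirroring the role of the num_col argument.
num_col <- 3:ncol(M)

# Apply asinh to the selected columns only; the others are left untouched.
# Assumption: no cofactor or rescaling is applied before asinh.
M_transformed <- M
M_transformed[, num_col] <- asinh(M[, num_col])

M_transformed
```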
An object of class 'cytomeTree' providing a partitioning of the set of n cells.
annotation
A data.frame containing the annotation of each cell population underlying the tree pattern.
labels
The partitioning of the set of n cells.
M
The transformed matrix of mass cytometry measures.
mark_tree
A two level list containing markers used for node splitting.
transformation
The transformation used.
num_col
Indices of the transformed columns.
Anthony Devaux, Boris Hejblum
data(IMdata)

# dimension of data
dim(IMdata)

# given the size of the dataset, the code below can take several minutes to run
if(interactive()){
  # Don't transform the Time and Cell_length columns
  num_col <- 3:ncol(IMdata)

  # Build the CytofTree binary tree
  tree <- CytofTree(M = IMdata, minleaf = 1, t = 0.1,
                    transformation = "asinh", num_col = num_col)

  # Annotation
  annot <- Annotation(tree, plot = FALSE, K2markers = colnames(IMdata))

  # Provide subpopulations
  annot$combinations
}
Binary tree algorithm for cytometry data analysis.
CytomeTree(M, minleaf = 1, t = 0.1, verbose = TRUE, force_first_markers = NULL)
M |
A matrix of size n x p containing cytometry measures of n cells on p markers. |
minleaf |
An integer indicating the minimum number of cells per population. Default is 1. |
t |
A real positive-or-null number used for comparison with the normalized AIC computed at each node of the tree. A higher value limits the height of the tree. |
verbose |
A logical controlling whether a text progress bar is displayed during the execution of the algorithm. Default is TRUE. |
force_first_markers |
A vector of indices of the markers to split the data on first. This argument is used in the semi-supervised setting, forcing the algorithm to consider those markers first, in the order they appear in this vector. Default is NULL. |
The algorithm is based on the construction of a binary tree, the nodes of which are subpopulations of cells. At each node, observed cells and markers are modeled by both a family of normal distributions and a family of bi-modal normal mixture distributions. Splitting is done according to a normalized difference of AIC between the two families.
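The comparison made at a node can be illustrated on one marker. The sketch below is not the package's implementation: it fits a single normal and a two-component normal mixture (with a short hand-written EM loop) and compares them through a difference of AIC divided by 2n. The helper names `fit_mixture2` and `normalized_aic_diff`, the EM initialisation and the exact normalization are assumptions made for illustration; the method paper cited above gives the actual definition.

```r
# Hypothetical helpers, for illustration only (not part of cytometree).

# Fit a two-component normal mixture to a numeric vector with a basic EM loop.
fit_mixture2 <- function(x, n_iter = 200) {
  # Crude initialisation: split the data at the median.
  lower <- x[x <= median(x)]
  upper <- x[x > median(x)]
  p <- 0.5
  mu <- c(mean(lower), mean(upper))
  s <- c(sd(lower), sd(upper))
  for (i in seq_len(n_iter)) {
    # E-step: posterior probability of belonging to component 1.
    d1 <- p * dnorm(x, mu[1], s[1])
    d2 <- (1 - p) * dnorm(x, mu[2], s[2])
    w <- d1 / (d1 + d2)
    # M-step: update the weight, the means and the standard deviations.
    p <- mean(w)
    mu <- c(sum(w * x) / sum(w), sum((1 - w) * x) / sum(1 - w))
    s <- c(sqrt(sum(w * (x - mu[1])^2) / sum(w)),
           sqrt(sum((1 - w) * (x - mu[2])^2) / sum(1 - w)))
  }
  mix_density <- p * dnorm(x, mu[1], s[1]) + (1 - p) * dnorm(x, mu[2], s[2])
  list(p = p, mu = mu, sd = s, loglik = sum(log(mix_density)))
}

# Difference of AIC between the two fits, divided by 2n (assumed normalization).
normalized_aic_diff <- function(x) {
  n <- length(x)
  # Single normal: 2 parameters, maximum-likelihood estimates.
  s1 <- sqrt(mean((x - mean(x))^2))
  aic1 <- -2 * sum(dnorm(x, mean(x), s1, log = TRUE)) + 2 * 2
  # Two-component mixture: 5 parameters (weight, 2 means, 2 sds).
  aic2 <- -2 * fit_mixture2(x)$loglik + 2 * 5
  (aic1 - aic2) / (2 * n)
}

set.seed(1)
bimodal <- c(rnorm(500, 0, 1), rnorm(500, 6, 1))
unimodal <- rnorm(1000, 0, 1)
d_bi <- normalized_aic_diff(bimodal)
d_uni <- normalized_aic_diff(unimodal)
c(bimodal = d_bi, unimodal = d_uni)
```

Under this reading, a marker whose value exceeds the threshold t would be a candidate for splitting, while a clearly unimodal marker stays close to zero.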
An object of class 'CytomeTree' providing a partitioning of the set of n cells.
annotation
A data.frame containing the annotation of each cell population underlying the tree pattern.
labels
The partitioning of the set of n cells.
M
The input matrix.
mark_tree
A two level list containing markers used for node splitting.
pl_list
A list of density estimations for each node, used in plot_nodes for visualization purposes.
Chariff Alkhassim, Boris Hejblum
head(DLBCL)

# number of cell events
N <- nrow(DLBCL)

# Cell events
cellevents <- DLBCL[, c("FL1", "FL2", "FL4")]

# Manual partitioning of the set N (from FlowCAP-I)
manual_labels <- DLBCL[, "label"]

# Build the binary tree
Tree <- CytomeTree(cellevents, minleaf = 1, t = .1)

# Retrieve the resulting partition of the set N
Tree_Partition <- Tree$labels

# Plot node distributions
par(mfrow = c(1, 2))
plot_nodes(Tree)

# Choose a node to plot
plot_nodes(Tree, "FL4.1")

# Plot a graph of the tree
par(mfrow = c(1, 1))
plot_graph(Tree, edge.arrow.size = .3, Vcex = .5, vertex.size = 30)

# Run the annotation algorithm
Annot <- Annotation(Tree, plot = FALSE)
Annot$combinations

# Compare to the annotation obtained from the tree
Tree$annotation

# Example of sought phenotypes
# Variable in which sought phenotypes can be entered in the form of matrices.
phenotypes <- list()

# Sought phenotypes:
## FL2+ FL4-.
phenotypes[[1]] <- rbind(c("FL2", 1), c("FL4", 0))
## FL2- FL4+.
phenotypes[[2]] <- rbind(c("FL2", 0), c("FL4", 1))
## FL2+ FL4+.
phenotypes[[3]] <- rbind(c("FL2", 1), c("FL4", 1))

# Retrieve cell populations found using Annotation.
PhenoInfos <- RetrievePops(Annot, phenotypes)
PhenoInfos$phenotypesinfo

# F-measure ignoring cells labeled 0 as in FlowCAP-I.
# Use FmeasureC() in any other case.
FmeasureC_no0(ref = manual_labels, pred = Tree_Partition)

if(interactive()){
  # Scatterplots.
  library(ggplot2)

  # Ignoring cells labeled 0 as in FlowCAP-I.
  rm_zeros <- which(!manual_labels)

  # Building the data frame to scatter plot the data.
  FL1 <- cellevents[-c(rm_zeros), "FL1"]
  FL2 <- cellevents[-c(rm_zeros), "FL2"]
  FL4 <- cellevents[-c(rm_zeros), "FL4"]
  n <- length(FL1)
  Labels <- c(manual_labels[-c(rm_zeros)] %% 2 + 1, Tree_Partition[-c(rm_zeros)])
  Labels <- as.factor(Labels)
  method <- as.factor(c(rep("FlowCap-I", n), rep("CytomeTree", n)))
  scatter_df <- data.frame("FL2" = FL2, "FL4" = FL4, "labels" = Labels, "method" = method)

  p <- ggplot2::ggplot(scatter_df, ggplot2::aes_string(x = "FL2", y = "FL4", colour = "labels")) +
    ggplot2::geom_point(alpha = 1, cex = 1) +
    ggplot2::scale_colour_manual(values = c("green", "red", "blue")) +
    ggplot2::facet_wrap(~ method) +
    ggplot2::theme_bw() +
    ggplot2::theme(legend.position = "bottom")
  p
}
Diffuse large B-cell lymphoma data set from the FlowCAP-I challenge.
data(DLBCL)
A data frame with 5524 cell events and 3 markers.
C++ implementation of the F-measure computation
FmeasureC(pred, ref)
pred |
vector of a predicted partition |
ref |
vector of a reference partition |
Boris Hejblum
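For readers unfamiliar with the F-measure between two partitions, here is a plain R sketch of the usual FlowCAP-style definition: for each reference class, take the best F-score over all predicted clusters, then average these scores weighted by the size of each reference class. This is an assumption about what FmeasureC computes, not a transcription of its C++ code, and the helper name `f_measure` is hypothetical.

```r
# Hypothetical helper (plain R, not the package's C++ implementation).
f_measure <- function(pred, ref) {
  tab <- table(ref, pred)                        # contingency table
  recall <- tab / rowSums(tab)                   # share of each ref class recovered
  precision <- sweep(tab, 2, colSums(tab), "/")  # share of each cluster that is correct
  f <- 2 * precision * recall / (precision + recall)
  f[is.nan(f)] <- 0                              # pairs with no overlap
  # Best-matching predicted cluster for each reference class,
  # weighted by the size of that reference class.
  sum(rowSums(tab) / sum(tab) * apply(f, 1, max))
}

ref <- c(1, 1, 1, 2, 2, 2)
f_perfect <- f_measure(pred = c("a", "a", "a", "b", "b", "b"), ref = ref)
f_merged <- f_measure(pred = rep("a", 6), ref = ref)
c(perfect = f_perfect, merged = f_merged)
```

The score does not depend on how the clusters are named, only on how well they overlap with the reference classes.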
F-measure computation that ignores the reference class labeled "0", as done by Aghaeepour et al. in FlowCAP-I.
FmeasureC_no0(pred, ref)
pred |
vector of a predicted partition |
ref |
vector of a reference partition |
Boris Hejblum
HIPC T cell panel data set from HIPC program, patient 1369. The data was analyzed and gated by Stanford.
data(HIPC)
A data frame with 33992 cell events and 7 markers.
This immunophenotyping T cell panel from the Lyoplate HIPC dataset was used as part of the FlowCAP III Lyoplate challenge.
https://www.immuneprofiling.org/
https://www.immunespace.org/
https://www.immunespace.org/project/HIPC/Lyoplate/begin.view?pageId=study.DATA_ANALYSIS
Maecker HT, McCoy JP & Nussenblatt R (2012). Standardizing immunophenotyping for the human immunology project. Nature Reviews Immunology, 12(3):191–200. DOI: 10.1038/nri3158
Finak G, Langweiler M, Jaimes M, Malek M, Taghiyar J, Korin Y, Raddassi K, Devine L, Obermoser G, Pekalski ML, Pontikos N, Diaz A, Heck S, Villanova F, Terrazzini N, Kern F, Qian Y, Stanton R, Wang K, Brandes A, Ramey J, Aghaeepour N, Mosmann T, Scheuermann RH, Reed E, Palucka K, Pascual V, Blomberg BB, Nestle F, Nussenblatt RB, Brinkman RR, Gottardo R, Maecker H & McCoy JP (2016). Standardizing Flow Cytometry Immunophenotyping Analysis from the Human ImmunoPhenotyping Consortium. Scientific Reports. 10(6):20686. DOI: 10.1038/srep20686.
A mass cytometry dataset containing 10,000 cells and 39 markers, subsampled from the sample SUB116516.478 of the study SDY478 by Mark Davis, retrieved from ImmuneSpace.
data(IMdata)
A data frame with 10,000 rows and 39 variables:
https://www.immunespace.org/project/Studies/SDY478/begin.view?
Plot the cell count for each population using CytomeTree.
plot_cytopop( AnnotObj, nbpop = 10, mincount = 1, maxcount = NULL, y_axis = c("abs_count", "prop") )
AnnotObj |
An object of class Annotation. |
nbpop |
Number indicating the maximum number of populations plotted. Default is 10. |
mincount |
Number indicating the minimum cell count for the populations. Default is 1. |
maxcount |
Number indicating the maximum cell count for the populations. Default is NULL. |
y_axis |
A character string, either "abs_count" or "prop". |
Anthony Devaux, Boris Hejblum
# Run CytomeTree
data(DLBCL)
cellevents <- DLBCL[, c("FL1", "FL2", "FL4")]
Tree <- CytomeTree(cellevents, minleaf = 1, t = .1)
Annot <- Annotation(Tree, plot = FALSE)

# Plot the cell count
plot_cytopop(Annot)
Plot the binary tree built using CytomeTree.
plot_graph(CytomeTreeObj, Ecex = 1, Ecolor = 8, Vcex = 0.8, Vcolor = 0, ...)
CytomeTreeObj |
An object of class CytomeTree. |
Ecex |
Number indicating the amount by which text on the edges should be scaled. Default is 1. |
Ecolor |
An integer or a character string used to color the edges of the graph. Default is 8. |
Vcex |
Number indicating the amount by which text in the vertices should be scaled. Default is 0.8. |
Vcolor |
A vector of class numeric or character used to color the vertices of the graph. Default is 0. |
... |
additional arguments to be passed to |
Chariff Alkhassim
Plot the distribution of the observed cells at each node of the binary tree built using CytomeTree.
plot_nodes( CytomeTreeObj, nodes = NULL, nodesPerCol = NULL, nodesPerRow = NULL, ... )
CytomeTreeObj |
An object of class CytomeTree. |
nodes |
A list of character elements containing the names of the nodes for which the distribution is to be plotted. Default is NULL. |
nodesPerCol |
An integer specifying the number of plots to be displayed per column when plotting multiple nodes at once. Default is NULL. |
nodesPerRow |
An integer specifying the number of plots to be displayed per row when plotting multiple nodes at once. Default is NULL. |
... |
further arguments to be passed to |
If both nodesPerCol and nodesPerRow are NULL, then all the nodes are plotted on a single page.
"GM" stands for "Gaussian mixture" and "KDE" stands for "Kernel Density Estimation".
A list of ggplot2 plot objects, containing each node plot.
Chariff Alkhassim, Boris Hejblum
data(DLBCL)
myct <- CytomeTree(DLBCL[, c("FL1", "FL2", "FL4")], minleaf = 1, t = .1)
plot_nodes(myct)
Retrieve cell populations found using Annotation.
RetrievePops(AnnotationObj, phenotypes)
AnnotationObj |
An object of class Annotation. |
phenotypes |
A list containing at least one element of class matrix describing a sought phenotype. Each matrix should have two columns where the name of a used marker is associated to a value chosen between 0, 1 and 2. 0 translates to -, 1 to + and 2 to ++. |
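As a small illustration of the expected input format, the sketch below builds one phenotype matrix and renders it in the usual -, +, ++ notation. The helper `phenotype_label` is hypothetical and only serves to show how the second column is read; it is not part of the package.

```r
# A sought phenotype as a two-column matrix: marker name, expression level.
# Mixing strings and numbers in c() yields a character matrix.
pheno <- rbind(c("FL2", 1), c("FL4", 0))

# Hypothetical helper: translate levels 0, 1, 2 into -, +, ++.
phenotype_label <- function(m) {
  signs <- c("0" = "-", "1" = "+", "2" = "++")
  paste0(m[, 1], signs[m[, 2]], collapse = " ")
}

phenotype_label(pheno)
# "FL2+ FL4-"
```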
A list of two elements.
phenotypesinfo
A list containing information about the sought populations.
Mergedleaves
The partitioning of the set of n cells with potentially merged leaves.
Chariff Alkhassim, Boris Hejblum