Title: | Time-Course Gene Set Analysis |
---|---|
Description: | Implementation of Time-course Gene Set Analysis (TcGSA), a method for analyzing longitudinal gene-expression data at the gene set level. Method is detailed in: Hejblum, Skinner & Thiebaut (2015) <doi: 10.1371/journal.pcbi.1004310>. |
Authors: | Boris P Hejblum [aut, cre], Damien Chimits [aut], Anthony Devaux [aut] |
Maintainer: | Boris P Hejblum <[email protected]> |
License: | GPL-2 | file LICENSE |
Version: | 0.12.10 |
Built: | 2025-02-01 04:02:47 UTC |
Source: | https://github.com/sistm/tcgsa |
This package implements TcGSA, an algorithm to analyze longitudinal gene-expression data at the gene set level.
Package: | TcGSA |
Type: | Package |
Version: | 0.12.10 |
Date: | 2022-02-27 |
License: | GPL-2 |
The main function in this package is TcGSA.LR
which performs Time-course Gene Set Analysis, and provide nice representations of its results (see plot.TcGSA
and plot1GS
).
Boris P. Hejblum, Damien Chimits — Maintainer: Boris P. Hejblum
Hejblum BP, Skinner J, Thiebaut R, (2015) Time-Course Gene Set Analysis for Longitudinal Gene Expression Data. PLOS Comput. Biol. 11(6):e1004310. doi: 10.1371/journal.pcbi.1004310
This function clusters the genes dynamics of one gene sets into different
dominant trends. The optimal number of clusters is computed thanks to the gap
statistics. See clusGap
.
clustTrend( tcgs, expr, Subject_ID, TimePoint, threshold = 0.05, myproc = "BY", nbsimu_pval = 1e+06, baseline = NULL, only.signif = TRUE, group.var = NULL, Group_ID_paired = NULL, ref = NULL, group_of_interest = NULL, FUNcluster = NULL, clustering_metric = "euclidian", clustering_method = "ward", B = 100, max_trends = 4, aggreg.fun = "median", na.rm.aggreg = TRUE, trend.fun = "median", methodOptiClust = "firstSEmax", indiv = "genes", verbose = TRUE ) ## S3 method for class 'ClusteredTrends' print(x, ...) ## S3 method for class 'ClusteredTrends' plot(x, ...)
clustTrend( tcgs, expr, Subject_ID, TimePoint, threshold = 0.05, myproc = "BY", nbsimu_pval = 1e+06, baseline = NULL, only.signif = TRUE, group.var = NULL, Group_ID_paired = NULL, ref = NULL, group_of_interest = NULL, FUNcluster = NULL, clustering_metric = "euclidian", clustering_method = "ward", B = 100, max_trends = 4, aggreg.fun = "median", na.rm.aggreg = TRUE, trend.fun = "median", methodOptiClust = "firstSEmax", indiv = "genes", verbose = TRUE ) ## S3 method for class 'ClusteredTrends' print(x, ...) ## S3 method for class 'ClusteredTrends' plot(x, ...)
tcgs |
a tcgsa object for |
expr |
either a matrix or dataframe of gene expression upon which
dynamics are to be calculated, or a list of gene sets estimation of gene
expression. In the case of a matrix or dataframe, its dimension are |
Subject_ID |
a factor of length |
TimePoint |
a numeric vector or a factor of length |
threshold |
the threshold at which the FDR or the FWER should be controlled. |
myproc |
a vector of character strings containing the names of the
multiple testing procedures for which adjusted p-values are to be computed.
This vector should include any of the following: " |
nbsimu_pval |
the number of observations under the null distribution to
be generated in order to compute the p-values. Default is |
baseline |
a character string which is the value of |
only.signif |
logical flag for analyzing the trends in only the
significant gene sets. If |
group.var |
in the case of several treatment groups, this is a factor of
length |
Group_ID_paired |
a character vector of length |
ref |
the group which is used as reference in the case of several
treatment groups. Default is |
group_of_interest |
the group of interest, for which dynamics are to be
computed in the case of several treatment groups. Default is |
FUNcluster |
the clustering function used to agglomerate genes in
trends. Default is |
clustering_metric |
character string specifying the metric to be used
for calculating dissimilarities between observations in the hierarchical
clustering when |
clustering_method |
character string defining the agglomerative method
to be used in the hierarchical clustering when |
B |
integer specifying the number of Monte Carlo ("bootstrap") samples
used to compute the gap statistics. Default is |
max_trends |
integer specifying the maximum number of different clusters
to be tested. Default is |
aggreg.fun |
a character string such as |
na.rm.aggreg |
a logical flag indicating whether |
trend.fun |
a character string such as |
methodOptiClust |
character string indicating how the "optimal" number
of clusters is computed from the gap statistics and their standard
deviations. Possible values are |
indiv |
a character string indicating by which unit observations are
aggregated (through |
verbose |
logical flag enabling verbose messages to track the computing
status of the function. Default is |
x |
an object of class ' |
... |
further arguments passed to or from other methods. |
If expr
is a matrix or a dataframe, then the genes dynamics are
clustered on the "original" data. On the other hand, if expr
is a
list returned in the 'Estimations'
element of TcGSA.LR
,
then the dynamics are computed on the estimations made by the
TcGSA.LR
function.
This function uses the Gap statistics to determine the optimal number of
clusters in the plotted gene set. See
clusGap
.
An object of class ClusteredTrends which is a list with the 4 following components:
NbClust
a vector that contains the optimal number of clusters for
each analyzed gene sets.
ClustsMeds
a list of the same length as NsClust
(the
number of analyzed gene sets). Each element of the list is a data frame, in
which there is as many column as the optimal number of clusters for the
corresponding gene sets for each cluster. Each column of the data frame
contains the median trend values for the corresponding cluster.
GenesPartition
a list of the same length as NsClust
(the
number of analyzed gene sets). Each element of the list is a vector which
gives the partition of the genes inside the corresponding gene set.
MaxNbClust
an integer storing the maximum number of different
clusters tested, as given by the argument 'max_trends'
.
Boris P. Hejblum
Tibshirani, R., Walther, G. and Hastie, T., 2001, Estimating the number of data clusters via the Gap statistic, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 63, 2: 41–423.
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) CT <- clustTrend(tcgsa_sim_1grp, expr=expr_1grp, Subject_ID=design$Subject_ID, TimePoint=design$TimePoint) CT plot(CT) CT$NbClust CT$NbClust["Gene set 5"] CT$ClustMeds[["Gene set 4"]] CT$ClustMeds[["Gene set 5"]] }
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) CT <- clustTrend(tcgsa_sim_1grp, expr=expr_1grp, Subject_ID=design$Subject_ID, TimePoint=design$TimePoint) CT plot(CT) CT$NbClust CT$NbClust["Gene set 5"] CT$ClustMeds[["Gene set 4"]] CT$ClustMeds[["Gene set 5"]] }
Simulated data for 5 gene sets of 50 genes each. Gene expression is simulated at 5 time points for 10 patients.
data(data_simu_TcGSA)
data(data_simu_TcGSA)
In expr_1grp
all patients belong to the same unique treatment group. The first 2 gene sets are simulated under the null hypothesis. The gene sets 3 and 4 are simulated under the alternative hypothesis that there is a significant homogeneous time trend within the gene set. The gene set 5 is simulated under the alternative hypothesis that there are significant heterogeneous time trends within the gene set.
In expr_2grp
all patients belong to 2 treatment groups. The 5 first patients belong to the treatment group 'T
', The 5 other patients belong to the treatment group 'C
'. The first 2 gene sets are simulated under the null hypothesis that there is no difference in the time trend between the 2 treatment groups. The gene sets 3 and 4 are simulated under the alternative hypothesis that there are significantly different homogeneous time trends within the gene set between the 2 treatment groups. The gene set 5 is simulated under the alternative hypothesis that there are significantly different heterogeneous time trends between the 2 treatment groups within the gene set.
expr_1grp |
See Details. |
expr_2grp |
See Details. |
design |
a data frame with 5 variables:
|
gmt_sim |
a gmt object containing the gene sets definition. See |
Boris P. Hejblum
This is simulated data.
data(data_simu_TcGSA) summary(expr_1grp) summary(design) gmt_sim
data(data_simu_TcGSA) summary(expr_1grp) summary(design) gmt_sim
This function computes the p-value of the likelihood ratios and apply a multiple testing correction.
multtest.TcGSA( tcgsa, threshold = 0.05, myproc = "BY", exact = TRUE, nbsimu_pval = 1e+06 )
multtest.TcGSA( tcgsa, threshold = 0.05, myproc = "BY", exact = TRUE, nbsimu_pval = 1e+06 )
tcgsa |
a TcGSA object. |
threshold |
the threshold at which the FDR or the FWER should be controlled. |
myproc |
a vector of character strings containing the names of the
multiple testing procedures for which adjusted p-values are to be computed.
This vector should include any of the following: " |
exact |
logical flag indicating whether the raw p-values should be computed from the
exact asymptotic mixture of chi-square, or simulated (longer and not better).
Default is |
nbsimu_pval |
the number of observations under the null distribution to
be generated in order to compute the p-values. Default is |
multtest.TcGSA
returns an dataframe with 5 variables. The
rows correspond to the gene sets under scrutiny. The 1st column is the
likelihood ratios LR
, the 2nd column is the convergence status of the
model under the null hypothesis CVG_H0
, the 3rd column is the
convergence status of the model under the alternative hypothesis
CVG_H1
, the 4th column is the raw p-value of the mixed likelihood
ratio test raw_pval
, the 5th column is the adjusted p-value of the
mixed likelihood ratio test adj_pval
.
Boris P. Hejblum
TcGSA.LR
,
mt.rawp2adjp
,
signifLRT.TcGSA
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) mtt <- multtest.TcGSA(tcgsa_sim_1grp, threshold = 0.05, myproc = "BY", nbsimu_pval = 1000) mtt }
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) mtt <- multtest.TcGSA(tcgsa_sim_1grp, threshold = 0.05, myproc = "BY", nbsimu_pval = 1000) mtt }
This function plots a gene sets dynamic trends heatmap.
## S3 method for class 'TcGSA' plot( x, threshold = 0.05, myproc = "BY", nbsimu_pval = 1e+06, expr, Subject_ID, TimePoint, baseline = NULL, only.signif = TRUE, group.var = NULL, Group_ID_paired = NULL, ref = NULL, group_of_interest = NULL, ranking = FALSE, FUNcluster = NULL, clustering_metric = "euclidian", clustering_method = "ward", B = 500, max_trends = 4, aggreg.fun = "median", na.rm.aggreg = TRUE, methodOptiClust = "firstSEmax", indiv = "genes", verbose = TRUE, clust_trends = NULL, N_clusters = NULL, myclusters = NULL, label.clusters = NULL, prev_rowCL = NULL, descript = TRUE, plot = TRUE, color.vec = c("darkred", "#D73027", "#FC8D59", "snow", "#91BFDB", "#4575B4", "darkblue"), legend.breaks = NULL, label.column = NULL, time_unit = "", cex.label.row = 1, cex.label.column = 1, margins = c(5, 25), heatKey.size = 1, dendrogram.size = 1, heatmap.height = 1, heatmap.width = 1, cex.clusterKey = 1, cex.main = 1, horiz.clusterKey = TRUE, main = NULL, subtitle = NULL, ... )
## S3 method for class 'TcGSA' plot( x, threshold = 0.05, myproc = "BY", nbsimu_pval = 1e+06, expr, Subject_ID, TimePoint, baseline = NULL, only.signif = TRUE, group.var = NULL, Group_ID_paired = NULL, ref = NULL, group_of_interest = NULL, ranking = FALSE, FUNcluster = NULL, clustering_metric = "euclidian", clustering_method = "ward", B = 500, max_trends = 4, aggreg.fun = "median", na.rm.aggreg = TRUE, methodOptiClust = "firstSEmax", indiv = "genes", verbose = TRUE, clust_trends = NULL, N_clusters = NULL, myclusters = NULL, label.clusters = NULL, prev_rowCL = NULL, descript = TRUE, plot = TRUE, color.vec = c("darkred", "#D73027", "#FC8D59", "snow", "#91BFDB", "#4575B4", "darkblue"), legend.breaks = NULL, label.column = NULL, time_unit = "", cex.label.row = 1, cex.label.column = 1, margins = c(5, 25), heatKey.size = 1, dendrogram.size = 1, heatmap.height = 1, heatmap.width = 1, cex.clusterKey = 1, cex.main = 1, horiz.clusterKey = TRUE, main = NULL, subtitle = NULL, ... )
x |
an object of class' |
threshold |
the threshold at which the FDR or the FWER should be controlled. |
myproc |
a vector of character strings containing the names of the
multiple testing procedures for which adjusted p-values are to be computed.
This vector should include any of the following: " |
nbsimu_pval |
the number of observations under the null distribution to
be generated in order to compute the p-values. Default is |
expr |
either a matrix or dataframe of gene expression upon which
dynamics are to be calculated, or a list of gene sets estimation of gene
expression. In the case of a matrix or dataframe, its dimension are |
Subject_ID |
a factor of length |
TimePoint |
a numeric vector or a factor of length |
baseline |
the value of |
only.signif |
logical flag for plotting only the significant gene sets.
If |
group.var |
in the case of several treatment' groups, this is a factor of
length |
Group_ID_paired |
a character vector of length |
ref |
the group which is used as reference in the case of several
treatment groups. Default is |
group_of_interest |
the group of interest, for which dynamics are to be
computed in the case of several treatment groups. Default is |
ranking |
a logical flag. If |
FUNcluster |
the clustering function used to agglomerate genes in
trends. Default is |
clustering_metric |
character string specifying the metric to be used
for calculating dissimilarities between observations in the hierarchical
clustering when |
clustering_method |
character string defining the agglomerative method
to be used in the hierarchical clustering when |
B |
integer specifying the number of Monte Carlo ("bootstrap") samples
used to compute the gap statistics. Default is |
max_trends |
integer specifying the maximum number of different clusters
to be tested. Default is |
aggreg.fun |
a character string such as |
na.rm.aggreg |
a logical flag indicating whether |
methodOptiClust |
character string indicating how the "optimal"" number
of clusters is computed from the gap statistics and their standard
deviations. Possible values are |
indiv |
a character string indicating by which unit observations are
aggregated (through |
verbose |
logical flag enabling verbose messages to track the computing
status of the function. Default is |
clust_trends |
object of class ClusteredTrends containing
already computed trends for the plotted gene sets. Default is |
N_clusters |
an integer that is the number of clusters in which the
dynamics should be regrouped. The cutoff of the clustering tree is
automatically calculated accordingly. Default is |
myclusters |
a character vector of colors for predefined clusters of the
represented gene sets, with as many levels as the value of |
label.clusters |
if |
prev_rowCL |
a hclust object, such as the one return by the
present plotting function (see Value) for instance. If not |
descript |
logical flag indicating that the description of the gene sets
should appear after their name on the right side of the plot if |
plot |
logical flag indicating whether the heatmap should be plotted or
not. Default is |
color.vec |
a character strings vector used to define the color
palette used in the plot. Default is
|
legend.breaks |
a numeric vector indicating the splitting points for
coloring. Default is |
label.column |
a vector of character strings with the labels to be
displayed for the columns (i.e. the time points). Default is |
time_unit |
the time unit to be displayed (such as |
cex.label.row |
a numerical value giving the amount by which row labels
text should be magnified relative to the default |
cex.label.column |
a numerical value giving the amount by which column
labels text should be magnified relative to the default |
margins |
numeric vector of length 2 containing the margins (see
|
heatKey.size |
the size of the color key for the heatmap fill. Default
is |
dendrogram.size |
the horizontal size of the dendrogram. Default is |
heatmap.height |
the height of the heatmap. Default is |
heatmap.width |
the width of the heatmap. Default is |
cex.clusterKey |
a numerical value giving the amount by which the
clusters legend text should be magnified relative to the default |
cex.main |
a numerical value giving the amount by which title text
should be magnified relative to the default |
horiz.clusterKey |
a logical flag; if |
main |
a character string for an optional title. Default is |
subtitle |
a character string for an optional subtitle. Default is |
... |
other parameters to be passed through to plotting functions. |
On the heatmap, each line corresponds to a gene set, and each column to a time point.
If expr
is a matrix or a dataframe, then the "original" data are
plotted. On the other hand, if expr
is a list returned in the
'Estimations'
element of TcGSA.LR
, then it is those
"estimations" made by the TcGSA.LR
function that are plotted.
If descript
is FALSE
, the second element of margins
can
be reduced (for instance use margins = c(5, 10)
), as there is not so
much need for space in order to display only the gene set names, without
their description.
If there is a large number of significant gene sets, the hierarchical clustering
step repeated for each of them can take a few minutes. To speed things up
(especially) when playing with the plotting parameters for having a nice plot,
one can run the clustTrend
function beforehand, and plug its results
in the plot.TcGSA
function via the clust_trends
argument.
An object of class hclust which describes the tree produced by the clustering process. The object is a list with components:
merge
an by
matrix. Row
of
merge
describes the merging of clusters at step i of the clustering.
If an element in the row is negative, then observation -
was
merged at this stage. If
is positive then the merge was with the
cluster formed at the (earlier) stage
of the algorithm. Thus
negative entries in merge indicate agglomerations of singletons, and positive
entries indicate agglomerations of non-singletons.
height
a set of real values (non-decreasing for
ultrametric trees). The clustering height: that is, the value of the
criterion associated with the Ward clustering method.
order
a vector giving the permutation of the original
observations suitable for plotting, in the sense that a cluster plot using
this ordering and matrix merge will not have crossings of the branches.
labels
the gene set trends name.
call
the call which produced the result clustering:
hclust(d = dist(map2heat, method = "euclidean"), method = "ward.D2")
method
"ward.D2", as it is the clustering method that has been used
for clustering the gene set trends.
dist.method
"euclidean", as it is the distance that has been used
for clustering the gene set trends.
legend.breaks
a numeric vector giving the splitting points used
for coloring the heatmap. If plot
is FALSE
, then it is
NULL
.
myclusters
a character vector of colors for the dynamic clusters
of the represented gene set trends, with as many levels as the value of
N_clusters
. If no dynamic clusters were represented, than this is
NULL
.
ddr
a dendrogram object with the reordering used for the
heatmap. See heatmap.2
function from package
gplots
.
gene set.names character vector with the names of the gene sets used in the heatmap.
clust.trends
a ClusteredTrends object.
clustersExport
a data frame with 2 variables containing the two
following variables :
GeneSet
: the gene set trends
clustered.
Cluster
: the dynamic cluster they belong to.
The
data frame is order by the variable Cluster
.
data_plotted
: the data matrix represented by the heatmap
Boris P. Hejblum
Hejblum BP, Skinner J, Thiebaut R, (2015) Time-Course Gene Set Analysis for Longitudinal Gene Expression Data. PLOS Comput. Biol. 11(6): e1004310. doi: 10.1371/journal.pcbi.1004310
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) summary(tcgsa_sim_1grp) plot(x=tcgsa_sim_1grp, expr=tcgsa_sim_1grp$Estimations, Subject_ID=design$Patient_ID, TimePoint=design$TimePoint, baseline=1, B=100, time_unit="H", dendrogram.size=0.4, heatmap.width=0.8, heatmap.height=2, cex.main=0.7 ) tcgsa_sim_2grp <- TcGSA.LR(expr=expr_2grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE, group_name="group.var") summary(tcgsa_sim_2grp) plot(x=tcgsa_sim_2grp, expr=expr_2grp, Subject_ID=design$Patient_ID, TimePoint=design$TimePoint, B=100, time_unit="H", ) }
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) summary(tcgsa_sim_1grp) plot(x=tcgsa_sim_1grp, expr=tcgsa_sim_1grp$Estimations, Subject_ID=design$Patient_ID, TimePoint=design$TimePoint, baseline=1, B=100, time_unit="H", dendrogram.size=0.4, heatmap.width=0.8, heatmap.height=2, cex.main=0.7 ) tcgsa_sim_2grp <- TcGSA.LR(expr=expr_2grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE, group_name="group.var") summary(tcgsa_sim_2grp) plot(x=tcgsa_sim_2grp, expr=expr_2grp, Subject_ID=design$Patient_ID, TimePoint=design$TimePoint, B=100, time_unit="H", ) }
This function can plot different representations of the gene expression in a specific gene set.
plot1GS( expr, gmt, Subject_ID, TimePoint, geneset.name, baseline = NULL, group.var = NULL, Group_ID_paired = NULL, ref = NULL, group_of_interest = NULL, FUNcluster = NULL, clustering_metric = "euclidian", clustering_method = "ward", B = 500, max_trends = 4, aggreg.fun = "median", na.rm.aggreg = TRUE, trend.fun = "median", methodOptiClust = "firstSEmax", indiv = "genes", verbose = TRUE, clustering = TRUE, showTrend = TRUE, smooth = TRUE, precluster = NULL, time_unit = "", title = NULL, y.lab = NULL, desc = TRUE, lab.cex = 1, axis.cex = 1, main.cex = 1, y.lab.angle = 90, x.axis.angle = 45, margins = 1, line.size = 1, y.lim = NULL, x.lim = NULL, gg.add = list(theme()), plot = TRUE )
plot1GS( expr, gmt, Subject_ID, TimePoint, geneset.name, baseline = NULL, group.var = NULL, Group_ID_paired = NULL, ref = NULL, group_of_interest = NULL, FUNcluster = NULL, clustering_metric = "euclidian", clustering_method = "ward", B = 500, max_trends = 4, aggreg.fun = "median", na.rm.aggreg = TRUE, trend.fun = "median", methodOptiClust = "firstSEmax", indiv = "genes", verbose = TRUE, clustering = TRUE, showTrend = TRUE, smooth = TRUE, precluster = NULL, time_unit = "", title = NULL, y.lab = NULL, desc = TRUE, lab.cex = 1, axis.cex = 1, main.cex = 1, y.lab.angle = 90, x.axis.angle = 45, margins = 1, line.size = 1, y.lim = NULL, x.lim = NULL, gg.add = list(theme()), plot = TRUE )
expr |
either a matrix or dataframe of gene expression upon which
dynamics are to be calculated, or a list of gene sets estimation of gene
expression. In the case of a matrix or dataframe, its dimension are |
gmt |
a gmt object containing the gene sets definition. See
|
Subject_ID |
a factor of length |
TimePoint |
a numeric vector or a factor of length |
geneset.name |
a character string containing the name of the gene set to
be plotted, that must appear in the |
baseline |
a character string which is the value of |
group.var |
in the case of several treatment groups, this is a factor of
length |
Group_ID_paired |
a character vector of length |
ref |
the group which is used as reference in the case of several
treatment groups. Default is |
group_of_interest |
the group of interest, for which dynamics are to be
computed in the case of several treatment groups. Default is |
FUNcluster |
a function which accepts as first argument a matrix
|
clustering_metric |
character string specifying the metric to be used
for calculating dissimilarities between observations in the hierarchical
clustering when |
clustering_method |
character string defining the agglomerative method
to be used in the hierarchical clustering when |
B |
integer specifying the number of Monte Carlo ("bootstrap") samples
used to compute the gap statistics. Default is |
max_trends |
integer specifying the maximum number of different clusters
to be tested. Default is |
aggreg.fun |
a character string such as |
na.rm.aggreg |
a logical flag indicating whether |
trend.fun |
a character string such as |
methodOptiClust |
character string indicating how the "optimal" number
of clusters is computed from the gap statistics and their standard
deviations. Possible values are |
indiv |
a character string indicating by which unit observations are
aggregated (through |
verbose |
logical flag enabling verbose messages to track the computing
status of the function. Default is |
clustering |
logical flag. If |
showTrend |
logical flag. If |
smooth |
logical flag. If |
precluster |
a vector of length |
time_unit |
the time unit to be displayed (such as |
title |
character specifying the title of the plot. If |
y.lab |
character specifying the annotation of the y axis. If |
desc |
a logical flag. If |
lab.cex |
a numerical value giving the amount by which lab labels text
should be magnified relative to the default |
axis.cex |
a numerical value giving the amount by which axis annotation
text should be magnified relative to the default |
main.cex |
a numerical value giving the amount by which title text
should be magnified relative to the default |
y.lab.angle |
a numerical value (in [0, 360]) giving the orientation by
which y-label text should be turned (anti-clockwise). Default is |
x.axis.angle |
a numerical value (in [0, 360]) giving the orientation by
which x-axis annotation text should be turned (anti-clockwise). Default is
|
margins |
a numerical value giving the amount by which the margins
should be reduced or increased relative to the default |
line.size |
a numerical value giving the amount by which the line sizes
should be reduced or increased relative to the default |
y.lim |
a numeric vector of length 2 giving the range of the y-axis.
See |
x.lim |
if numeric, will create a continuous scale, if factor or
character, will create a discrete scale. Observations not in this range will
be dropped. See |
gg.add |
A list of instructions to add to the |
plot |
logical flag. If |
If expr
is a matrix or a dataframe, then the "original" data are
plotted. On the other hand, if expr
is a list returned in the
'Estimations'
element of TcGSA.LR
, then it is those
"estimations" made by the TcGSA.LR
function that are plotted.
If indiv
is 'genes', then each line of the plot is the median of a
gene expression over the patients. On the other hand, if indiv
is
'patients', then each line of the plot is the median of a patient genes
expression in this gene set.
This function uses the Gap statistics to determine the optimal number of
clusters in the plotted gene set. See
clusGap
.
A list with 2 elements:
classif
: a data.frame
with the 2 following variables: ProbeID
which
contains the IDs of the probes of the plotted gene set, and Cluster
containing $
which cluster the probe belongs to. If clustering
is FALSE
, then Cluster
is NA
for all the probes.
p
: a ggplot
object containing the plot
Boris P. Hejblum
Tibshirani, R., Walther, G. and Hastie, T., 2001, Estimating the number of data clusters via the Gap statistic, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 63, 2: 411–423.
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) plot1GS(expr=expr_1grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 4", indiv="genes", clustering=FALSE, time_unit="H", lab.cex=0.7) plot1GS(expr=expr_1grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 5", indiv="patients", clustering=FALSE, baseline=1, time_unit="H", lab.cex=0.7) } if(interactive()){ geneclusters <- plot1GS(expr=tcgsa_sim_1grp$Estimations, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 5", indiv="genes", time_unit="H", lab.cex=0.7 ) geneclusters } if(interactive()){ library(grDevices) library(graphics) colval <- c(hsv(0.56, 0.9, 1), hsv(0, 0.27, 1), hsv(0.52, 1, 0.5), hsv(0, 0.55, 0.97), hsv(0.66, 0.15, 1), hsv(0, 0.81, 0.55), hsv(0.7, 1, 0.7), hsv(0.42, 0.33, 1) ) n <- length(colval); y <- 1:n op <- par(mar=rep(1.5,4)) plot(y, axes = FALSE, frame.plot = TRUE, xlab = "", ylab = "", pch = 21, cex = 8, bg = colval, ylim=c(-1,n+1), xlim=c(-1,n+1), main = "Color scale" ) par(op) plot1GS(expr=expr_1grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 5", indiv="genes", time_unit="H", title="", gg.add=list(scale_color_manual(values=colval), guides(colour = guide_legend(reverse=TRUE))), lab.cex=0.7 ) plot1GS(expr=expr_2grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 3", indiv="genes", group.var = design$group.var, time_unit="H", gg.add=list(scale_color_manual(values=colval), guides(colour = guide_legend(reverse=TRUE))), lab.cex=0.7 ) }
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) plot1GS(expr=expr_1grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 4", indiv="genes", clustering=FALSE, time_unit="H", lab.cex=0.7) plot1GS(expr=expr_1grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 5", indiv="patients", clustering=FALSE, baseline=1, time_unit="H", lab.cex=0.7) } if(interactive()){ geneclusters <- plot1GS(expr=tcgsa_sim_1grp$Estimations, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 5", indiv="genes", time_unit="H", lab.cex=0.7 ) geneclusters } if(interactive()){ library(grDevices) library(graphics) colval <- c(hsv(0.56, 0.9, 1), hsv(0, 0.27, 1), hsv(0.52, 1, 0.5), hsv(0, 0.55, 0.97), hsv(0.66, 0.15, 1), hsv(0, 0.81, 0.55), hsv(0.7, 1, 0.7), hsv(0.42, 0.33, 1) ) n <- length(colval); y <- 1:n op <- par(mar=rep(1.5,4)) plot(y, axes = FALSE, frame.plot = TRUE, xlab = "", ylab = "", pch = 21, cex = 8, bg = colval, ylim=c(-1,n+1), xlim=c(-1,n+1), main = "Color scale" ) par(op) plot1GS(expr=expr_1grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 5", indiv="genes", time_unit="H", title="", gg.add=list(scale_color_manual(values=colval), guides(colour = guide_legend(reverse=TRUE))), lab.cex=0.7 ) plot1GS(expr=expr_2grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 3", indiv="genes", group.var = design$group.var, time_unit="H", gg.add=list(scale_color_manual(values=colval), guides(colour = guide_legend(reverse=TRUE))), lab.cex=0.7 ) }
This function plots graphs informing on the fit of the mixed modeling of the gene expression performed in TcGSA, for 1 or several gene sets.
plotFit.GS( x, expr, design, subject_name = "Patient_ID", time_name = "TimePoint", colnames_ID, plot_type = c("Fit", "Residuals Obs", "Residuals Est", "Histogram Obs"), GeneSetsList, color = c("genes", "time", "subjects"), marginal_hist = TRUE, gg.add = list(theme()) )
plotFit.GS( x, expr, design, subject_name = "Patient_ID", time_name = "TimePoint", colnames_ID, plot_type = c("Fit", "Residuals Obs", "Residuals Est", "Histogram Obs"), GeneSetsList, color = c("genes", "time", "subjects"), marginal_hist = TRUE, gg.add = list(theme()) )
x |
a tcgsa object for |
expr |
a matrix or dataframe of gene expression. Its dimension are
|
design |
a matrix or dataframe containing the experimental variables that used in the model,
namely |
subject_name |
the name of the factor variable from |
time_name |
the name of a numeric variable from |
colnames_ID |
the name of the variable from |
plot_type |
a character string indicating the type of plot to be drawn. The options are
|
GeneSetsList |
a character string containing the names of the gene set whose fit is being checked. If several gene sets are being checked, can be a character list or vector of the names of those gene sets. |
color |
a character string indicating which color scale should be used. One of the 3 :
|
marginal_hist |
a logical flag indicating whether marginal histograms should be drawn.
Only used for |
gg.add |
A list of instructions to add to the |
Boris P. Hejblum
Hejblum BP, Skinner J, Thiebaut R, (2015) Time-Course Gene Set Analysis for Longitudinal Gene Expression Data. PLOS Comput. Biol. 11(6):e1004310. doi: 10.1371/journal.pcbi.1004310
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) plotFit.GS(x=tcgsa_sim_1grp, expr=expr_1grp, design=design, subject_name="Patient_ID", time_name="TimePoint", colnames_ID="Sample_name", plot_type="Residuals Obs", GeneSetsList=c("Gene set 1", "Gene set 2", "Gene set 3", "Gene set 4", "Gene set 5"), color="genes", gg.add=list(guides(color=FALSE)) ) plotFit.GS(x=tcgsa_sim_1grp, expr=expr_1grp, design=design, subject_name="Patient_ID", time_name="TimePoint", colnames_ID="Sample_name", plot_type="Histogram Obs", GeneSetsList=c("Gene set 1", "Gene set 5"), color="genes", gg.add=list(guides(fill=FALSE)) ) plotFit.GS(x=tcgsa_sim_1grp, expr=expr_1grp, design=design, subject_name="Patient_ID", time_name="TimePoint", colnames_ID="Sample_name", plot_type="Histogram Obs", GeneSetsList=c("Gene set 1", "Gene set 2", "Gene set 3", "Gene set 4", "Gene set 5"), color="genes") }
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) plotFit.GS(x=tcgsa_sim_1grp, expr=expr_1grp, design=design, subject_name="Patient_ID", time_name="TimePoint", colnames_ID="Sample_name", plot_type="Residuals Obs", GeneSetsList=c("Gene set 1", "Gene set 2", "Gene set 3", "Gene set 4", "Gene set 5"), color="genes", gg.add=list(guides(color=FALSE)) ) plotFit.GS(x=tcgsa_sim_1grp, expr=expr_1grp, design=design, subject_name="Patient_ID", time_name="TimePoint", colnames_ID="Sample_name", plot_type="Histogram Obs", GeneSetsList=c("Gene set 1", "Gene set 5"), color="genes", gg.add=list(guides(fill=FALSE)) ) plotFit.GS(x=tcgsa_sim_1grp, expr=expr_1grp, design=design, subject_name="Patient_ID", time_name="TimePoint", colnames_ID="Sample_name", plot_type="Histogram Obs", GeneSetsList=c("Gene set 1", "Gene set 2", "Gene set 3", "Gene set 4", "Gene set 5"), color="genes") }
This function can plot different representations of the gene expression in a list of gene sets.
plotMultipleGS( genesets_list, ncolumns = 1, labels = NULL, expr, gmt, Subject_ID, TimePoint, baseline = NULL, group.var = NULL, Group_ID_paired = NULL, ref = NULL, group_of_interest = NULL, FUNcluster = NULL, clustering_metric = "euclidian", clustering_method = "ward", B = 500, max_trends = 4, aggreg.fun = "median", na.rm.aggreg = TRUE, trend.fun = "median", methodOptiClust = "firstSEmax", indiv = "genes", verbose = TRUE, clustering = TRUE, showTrend = TRUE, smooth = TRUE, time_unit = "", y.lab = NULL, desc = TRUE, lab.cex = 1, axis.cex = 1, main.cex = 1, y.lab.angle = 90, x.axis.angle = 45, margins = 1, line.size = 1, y.lim = NULL, x.lim = NULL, gg.add = list(), show_plot = TRUE )
plotMultipleGS( genesets_list, ncolumns = 1, labels = NULL, expr, gmt, Subject_ID, TimePoint, baseline = NULL, group.var = NULL, Group_ID_paired = NULL, ref = NULL, group_of_interest = NULL, FUNcluster = NULL, clustering_metric = "euclidian", clustering_method = "ward", B = 500, max_trends = 4, aggreg.fun = "median", na.rm.aggreg = TRUE, trend.fun = "median", methodOptiClust = "firstSEmax", indiv = "genes", verbose = TRUE, clustering = TRUE, showTrend = TRUE, smooth = TRUE, time_unit = "", y.lab = NULL, desc = TRUE, lab.cex = 1, axis.cex = 1, main.cex = 1, y.lab.angle = 90, x.axis.angle = 45, margins = 1, line.size = 1, y.lim = NULL, x.lim = NULL, gg.add = list(), show_plot = TRUE )
genesets_list |
a list of the character strings giving the names of the
gene sets to be plotted as they appear in |
ncolumns |
the number of columns used to display the multiple plots.
Default is |
labels |
List of labels to be added to the plots.
You can also set |
expr |
either a matrix or dataframe of gene expression upon which
dynamics are to be calculated, or a list of gene sets estimation of gene
expression. In the case of a matrix or dataframe, its dimension are |
gmt |
a gmt object containing the gene sets definition. See
|
Subject_ID |
a factor of length |
TimePoint |
a numeric vector or a factor of length |
baseline |
a character string which is the value of |
group.var |
in the case of several treatment groups, this is a factor of
length |
Group_ID_paired |
a character vector of length |
ref |
the group which is used as reference in the case of several
treatment groups. Default is |
group_of_interest |
the group of interest, for which dynamics are to be
computed in the case of several treatment groups. Default is |
FUNcluster |
a function which accepts as first argument a matrix
|
clustering_metric |
character string specifying the metric to be used
for calculating dissimilarities between observations in the hierarchical
clustering when |
clustering_method |
character string defining the agglomerative method
to be used in the hierarchical clustering when |
B |
integer specifying the number of Monte Carlo ("bootstrap") samples
used to compute the gap statistics. Default is |
max_trends |
integer specifying the maximum number of different clusters
to be tested. Default is |
aggreg.fun |
a character string such as |
na.rm.aggreg |
a logical flag indicating whether |
trend.fun |
a character string such as |
methodOptiClust |
character string indicating how the "optimal" number
of clusters is computed from the gap statistics and their standard
deviations. Possible values are |
indiv |
a character string indicating by which unit observations are
aggregated (through |
verbose |
logical flag enabling verbose messages to track the computing
status of the function. Default is |
clustering |
logical flag. If |
showTrend |
logical flag. If |
smooth |
logical flag. If |
time_unit |
the time unit to be displayed (such as |
y.lab |
character specifying the annotation of the y axis. If |
desc |
a logical flag. If |
lab.cex |
a numerical value giving the amount by which lab labels text
should be magnified relative to the default |
axis.cex |
a numerical value giving the amount by which axis annotation
text should be magnified relative to the default |
main.cex |
a numerical value giving the amount by which title text
should be magnified relative to the default |
y.lab.angle |
a numerical value (in [0, 360]) giving the orientation by
which y-label text should be turned (anti-clockwise). Default is |
x.axis.angle |
a numerical value (in [0, 360]) giving the orientation by
which x-axis annotation text should be turned (anti-clockwise). Default is
|
margins |
a numerical value giving the amount by which the margins
should be reduced or increased relative to the default |
line.size |
a numerical value giving the amount by which the line sizes
should be reduced or increased relative to the default |
y.lim |
a numeric vector of length 2 giving the range of the y-axis.
See |
x.lim |
if numeric, will create a continuous scale, if factor or
character, will create a discrete scale. Observations not in this range will
be dropped. See |
gg.add |
A list of instructions to add to the |
show_plot |
logical flag. If |
If expr
is a matrix or a dataframe, then the "original" data are
plotted. On the other hand, if expr
is a list returned in the
'Estimations'
element of TcGSA.LR
, then it is those
"estimations" made by the TcGSA.LR
function that are plotted.
If indiv
is 'genes', then each line of the plot is the median of a
gene expression over the patients. On the other hand, if indiv
is
'patients', then each line of the plot is the median of a patient genes
expression in this gene set.
This function uses the Gap statistics to determine the optimal number of
clusters in the plotted gene set. See
clusGap
.
A list with 2 elements:
classif
: a data.frame
with the 2 following variables: ProbeID
which
contains the IDs of the probes of the plotted gene set, and Cluster
containing $
which cluster the probe belongs to. If clustering
is FALSE
, then Cluster
is NA
for all the probes.
p
: a ggplot
object containing the plot
Boris P. Hejblum
This function can plot different representations of the gene expression in a specific gene set, stratified on all subjects.
plotPat.1GS( expr, gmt, Subject_ID, TimePoint, geneset.name, baseline = NULL, group.var = NULL, Group_ID_paired = NULL, ref = NULL, group_of_interest = NULL, FUNcluster = NULL, clustering_metric = "euclidian", clustering_method = "ward", B = 500, max_trends = 4, aggreg.fun = "median", na.rm.aggreg = TRUE, trend.fun = "median", methodOptiClust = "firstSEmax", verbose = TRUE, clustering = TRUE, time_unit = "", title = NULL, y.lab = NULL, desc = TRUE, lab.cex = 1, axis.cex = 1, main.cex = 1, y.lab.angle = 90, x.axis.angle = 45, y.lim = NULL, x.lim = NULL, gg.add = list(theme()) )
plotPat.1GS( expr, gmt, Subject_ID, TimePoint, geneset.name, baseline = NULL, group.var = NULL, Group_ID_paired = NULL, ref = NULL, group_of_interest = NULL, FUNcluster = NULL, clustering_metric = "euclidian", clustering_method = "ward", B = 500, max_trends = 4, aggreg.fun = "median", na.rm.aggreg = TRUE, trend.fun = "median", methodOptiClust = "firstSEmax", verbose = TRUE, clustering = TRUE, time_unit = "", title = NULL, y.lab = NULL, desc = TRUE, lab.cex = 1, axis.cex = 1, main.cex = 1, y.lab.angle = 90, x.axis.angle = 45, y.lim = NULL, x.lim = NULL, gg.add = list(theme()) )
expr |
either a matrix or dataframe of gene expression upon which
dynamics are to be calculated, or a list of gene sets estimation of gene
expression. In the case of a matrix or dataframe, its dimension are |
gmt |
a gmt object containing the gene sets definition. See
|
Subject_ID |
a factor of length |
TimePoint |
a numeric vector or a factor of length |
geneset.name |
a character string containing the name of the gene set to
be plotted, that must appear in the |
baseline |
a character string which is the value of |
group.var |
in the case of several treatment groups, this is a factor of
length |
Group_ID_paired |
a character vector of length |
ref |
the group which is used as reference in the case of several
treatment groups. Default is |
group_of_interest |
the group of interest, for which dynamics are to be
computed in the case of several treatment groups. Default is |
FUNcluster |
a function which accepts as first argument a matrix
|
clustering_metric |
character string specifying the metric to be used
for calculating dissimilarities between observations in the hierarchical
clustering when |
clustering_method |
character string defining the agglomerative method
to be used in the hierarchical clustering when |
B |
integer specifying the number of Monte Carlo ("bootstrap") samples
used to compute the gap statistics. Default is |
max_trends |
integer specifying the maximum number of different clusters
to be tested. Default is |
aggreg.fun |
a character string such as |
na.rm.aggreg |
a logical flag indicating whether |
trend.fun |
a character string such as |
methodOptiClust |
character string indicating how the "optimal" number
of clusters is computed from the gap statistics and their standard
deviations. Possible values are |
verbose |
logical flag enabling verbose messages to track the computing
status of the function. Default is |
clustering |
logical flag. If |
time_unit |
the time unit to be displayed (such as |
title |
character specifying the title of the plot. If |
y.lab |
character specifying the annotation of the y axis. If |
desc |
a logical flag. If |
lab.cex |
a numerical value giving the amount by which lab labels text
should be magnified relative to the default |
axis.cex |
a numerical value giving the amount by which axis annotation
text should be magnified relative to the default |
main.cex |
a numerical value giving the amount by which title text
should be magnified relative to the default |
y.lab.angle |
a numerical value (in [0, 360]) giving the orientation by
which y-label text should be turned (anti-clockwise). Default is |
x.axis.angle |
a numerical value (in [0, 360]) giving the orientation by
which x-axis annotation text should be turned (anti-clockwise). Default is
|
y.lim |
a numeric vector of length 2 giving the range of the y-axis.
See |
x.lim |
if numeric, will create a continuous scale, if factor or
character, will create a discrete scale. Observations not in this range will
be dropped. See |
gg.add |
A list of instructions to add to the |
If expr
is a matrix or a dataframe, then the "original" data are
plotted. On the other hand, if expr
is a list returned in the
'Estimations'
element of TcGSA.LR
, then it is those
"estimations" made by the TcGSA.LR
function that are plotted.
If indiv
is 'genes', then each line of the plot is the median of a
gene expression over the patients. On the other hand, if indiv
is
'patients', then each line of the plot is the median of a patient genes
expression in this gene set.
This function uses the Gap statistics to determine the optimal number of
clusters in the plotted gene set. See
clusGap
.
A dataframe the 2 following variables:
ProbeID
which contains the IDs of the probes of the plotted gene set.
Cluster
which to which cluster the probe belongs to.
If clustering
is FALSE
, then Cluster
is NA
for all the probes.
Boris P. Hejblum
Tibshirani, R., Walther, G. and Hastie, T., 2001, Estimating the number of data clusters via the Gap statistic, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 63, 2: 411–423.
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) plotPat.1GS(expr=expr_1grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 4", clustering=FALSE, time_unit="H", lab.cex=0.7) plotPat.1GS(expr=expr_1grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 4", clustering=FALSE, baseline=1, time_unit="H", lab.cex=0.7) colval <- c(hsv(0.56, 0.9, 1), hsv(0, 0.27, 1), hsv(0.52, 1, 0.5), hsv(0, 0.55, 0.97), hsv(0.66, 0.15, 1), hsv(0, 0.81, 0.55), hsv(0.7, 1, 0.7), hsv(0.42, 0.33, 1) ) n <- length(colval); y <- 1:n op <- par(mar=rep(1.5,4)) plot(y, axes = FALSE, frame.plot = TRUE, xlab = "", ylab = "", pch = 21, cex = 8, bg = colval, ylim=c(-1,n+1), xlim=c(-1,n+1), main = "Color scale" ) par(op) plotPat.1GS(expr=expr_1grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 5", time_unit="H", title="", gg.add=list(scale_color_manual(values=colval)), lab.cex=0.7 ) plotPat.1GS(expr=tcgsa_sim_1grp$Estimations, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 3", time_unit="H", lab.cex=0.7 ) }
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) plotPat.1GS(expr=expr_1grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 4", clustering=FALSE, time_unit="H", lab.cex=0.7) plotPat.1GS(expr=expr_1grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 4", clustering=FALSE, baseline=1, time_unit="H", lab.cex=0.7) colval <- c(hsv(0.56, 0.9, 1), hsv(0, 0.27, 1), hsv(0.52, 1, 0.5), hsv(0, 0.55, 0.97), hsv(0.66, 0.15, 1), hsv(0, 0.81, 0.55), hsv(0.7, 1, 0.7), hsv(0.42, 0.33, 1) ) n <- length(colval); y <- 1:n op <- par(mar=rep(1.5,4)) plot(y, axes = FALSE, frame.plot = TRUE, xlab = "", ylab = "", pch = 21, cex = 8, bg = colval, ylim=c(-1,n+1), xlim=c(-1,n+1), main = "Color scale" ) par(op) plotPat.1GS(expr=expr_1grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 5", time_unit="H", title="", gg.add=list(scale_color_manual(values=colval)), lab.cex=0.7 ) plotPat.1GS(expr=tcgsa_sim_1grp$Estimations, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.name="Gene set 3", time_unit="H", lab.cex=0.7 ) }
This function plots a series of gene sets dynamic trends heatmaps. One heatmap is drawn for each patient. NOT IMPLEMENTED YET (TODO)
plotPat.TcGSA( x, threshold = 0.05, myproc = "BY", nbsimu_pval = 1e+06, expr, Subject_ID, TimePoint, baseline = NULL, only.signif = TRUE, group.var = NULL, Group_ID_paired = NULL, ref = NULL, group_of_interest = NULL, FUNcluster = NULL, clustering_metric = "euclidian", clustering_method = "ward", B = 500, max_trends = 4, aggreg.fun = "median", na.rm.aggreg = TRUE, methodOptiClust = "firstSEmax", verbose = TRUE, clust_trends = NULL, N_clusters = NULL, myclusters = NULL, label.clusters = NULL, prev_rowCL = NULL, descript = TRUE, plotAll = TRUE, color.vec = c("darkred", "#D73027", "#FC8D59", "snow", "#91BFDB", "#4575B4", "darkblue"), legend.breaks = NULL, label.column = NULL, time_unit = "", cex.label.row = 1, cex.label.column = 1, margins = c(5, 25), heatKey.size = 1, dendrogram.size = 1, heatmap.height = 1, heatmap.width = 1, cex.clusterKey = 1, cex.main = 1, horiz.clusterKey = TRUE, main = NULL, subtitle = NULL, ... )
plotPat.TcGSA( x, threshold = 0.05, myproc = "BY", nbsimu_pval = 1e+06, expr, Subject_ID, TimePoint, baseline = NULL, only.signif = TRUE, group.var = NULL, Group_ID_paired = NULL, ref = NULL, group_of_interest = NULL, FUNcluster = NULL, clustering_metric = "euclidian", clustering_method = "ward", B = 500, max_trends = 4, aggreg.fun = "median", na.rm.aggreg = TRUE, methodOptiClust = "firstSEmax", verbose = TRUE, clust_trends = NULL, N_clusters = NULL, myclusters = NULL, label.clusters = NULL, prev_rowCL = NULL, descript = TRUE, plotAll = TRUE, color.vec = c("darkred", "#D73027", "#FC8D59", "snow", "#91BFDB", "#4575B4", "darkblue"), legend.breaks = NULL, label.column = NULL, time_unit = "", cex.label.row = 1, cex.label.column = 1, margins = c(5, 25), heatKey.size = 1, dendrogram.size = 1, heatmap.height = 1, heatmap.width = 1, cex.clusterKey = 1, cex.main = 1, horiz.clusterKey = TRUE, main = NULL, subtitle = NULL, ... )
x |
a tcgsa object. |
threshold |
the threshold at which the FDR or the FWER should be controlled. |
myproc |
a vector of character strings containing the names of the
multiple testing procedures for which adjusted p-values are to be computed.
This vector should include any of the following: " |
nbsimu_pval |
the number of observations under the null distribution to
be generated in order to compute the p-values. Default is |
expr |
either a matrix or dataframe of gene expression upon which
dynamics are to be calculated, or a list of gene sets estimation of gene
expression. In the case of a matrix or dataframe, its dimension are |
Subject_ID |
a factor of length |
TimePoint |
a numeric vector or a factor of length |
baseline |
a character string which is the value of |
only.signif |
logical flag for plotting only the significant gene sets.
If |
group.var |
in the case of several treatment groups, this is a factor of
length |
Group_ID_paired |
a character vector of length |
ref |
the group which is used as reference in the case of several
treatment groups. Default is |
group_of_interest |
the group of interest, for which dynamics are to be
computed in the case of several treatment groups. Default is |
FUNcluster |
the clustering function used to agglomerate genes in
trends. Default is |
clustering_metric |
character string specifying the metric to be used
for calculating dissimilarities between observations in the hierarchical
clustering when |
clustering_method |
character string defining the agglomerative method
to be used in the hierarchical clustering when |
B |
integer specifying the number of Monte Carlo ("bootstrap") samples
used to compute the gap statistics. Default is |
max_trends |
integer specifying the maximum number of different clusters
to be tested. Default is |
aggreg.fun |
a character string such as |
na.rm.aggreg |
a logical flag indicating whether |
methodOptiClust |
character string indicating how the "optimal"" number
of clusters is computed from the gap statistics and their standard
deviations. Possible values are |
verbose |
logical flag enabling verbose messages to track the computing
status of the function. Default is |
clust_trends |
object of class ClusteredTrends containing
already computed trends for the plotted gene sets. Default is |
N_clusters |
an integer that is the number of clusters in which the
dynamics should be regrouped. The cutoff of the clustering tree is
automatically calculated accordingly. Default is |
myclusters |
a character vector of colors for predefined clusters of the
represented gene sets, with as many levels as the value of |
label.clusters |
if |
prev_rowCL |
a hclust object, such as the one return by the
present plotting function (see Value) for instance. If not |
descript |
logical flag indicating that the description of the gene sets
should appear after their name on the right side of the plot if |
plotAll |
logical flag indicating whether a first heatmap with the median
over all the patients should be plotted, or not. Default is |
color.vec |
a character strings vector used to define the color
palette used in the plot. Default is
|
legend.breaks |
a numeric vector indicating the splitting points for
coloring. Default is |
label.column |
a vector of character strings with the labels to be
displayed for the columns (i.e. the time points). Default is |
time_unit |
the time unit to be displayed (such as |
cex.label.row |
a numerical value giving the amount by which row labels
text should be magnified relative to the default |
cex.label.column |
a numerical value giving the amount by which column
labels text should be magnified relative to the default |
margins |
numeric vector of length 2 containing the margins (see
|
heatKey.size |
the size of the color key for the heatmap fill. Default
is |
dendrogram.size |
the horizontal size of the dendrogram. Default is
|
heatmap.height |
the height of the heatmap. Default is |
heatmap.width |
the width of the heatmap. Default is |
cex.clusterKey |
a numerical value giving the amount by which the
clusters legend text should be magnified relative to the default |
cex.main |
a numerical value giving the amount by which title text
should be magnified relative to the default |
horiz.clusterKey |
a logical flag; if |
main |
a character string for an optional title. Default is
|
subtitle |
a character string for an optional subtitle. Default is
|
... |
other parameters to be passed through to plotting functions. |
On the heatmap, each line corresponds to a gene set, and each column to a time point.
First a heatmap is computed on all the patients (see plot.TcGSA
and clustTrend
) to define the clustering. Then, the clustering
and coloring thus defined on all the patients are consistently used in the
separate heatmaps that are plotted by patient.
If expr
is a matrix or a dataframe, then the "original" data are
plotted. On the other hand, if expr
is a list returned in the
'Estimations'
element of TcGSA.LR
, then it is those
"estimations" made by the TcGSA.LR
function that are plotted.
If descript
is FALSE
, the second element of margins
can
be reduced (for instance use margins = c(5, 10)
), as there is not so
much need for space in order to display only the gene set names, without
their description.
The median shown in the heatmap uses the respectively standardized (reduced and centered) expression of the genes over the patients.
An object of class hclust which describes the tree produced by the clustering process. The object is a list with components:
merge
an by
matrix. Row
of
merge
describes the merging of clusters at step i of the clustering.
If an element in the row is negative, then observation -
was
merged at this stage. If
is positive then the merge was with the
cluster formed at the (earlier) stage
of the algorithm. Thus
negative entries in merge indicate agglomerations of singletons, and positive
entries indicate agglomerations of non-singletons.
height
a set of real values (non-decreasing for
ultrametric trees). The clustering height: that is, the value of the
criterion associated with the Ward clustering method.
order
a vector giving the permutation of the original
observations suitable for plotting, in the sense that a cluster plot using
this ordering and matrix merge will not have crossings of the branches.
labels
the gene sets name.
call
the call which produced the result clustering:
hclust(d = dist(map2heat, method = "euclidean"), method = "ward.D2")
method "ward.D2", as it is the clustering method that has been used for clustering the gene set trends.
dist.method
"euclidean", as it is the distance that has been used
for clustering the gene set trends.
legend.breaks
a numeric vector giving the splitting points used
for coloring the heatmap. If plot
is FALSE
, then it is
NULL
.
myclusters
a character vector of colors for clusters of the
represented gene sets, with as many levels as the value of N_clusters
.
If no clusters were represented, than this is NULL
.
ddr
a dendrogram object with the reordering used for the
heatmap. See heatmap.2
function from package
gplots
.
clustersExport
a data frame with 2 variables containing the two
following variables :
GeneSet
: the gene sets
clustered.
Cluster
: the cluster they belong to.
The data
frame is order by the variable Cluster
.
Boris P. Hejblum
Hejblum BP, Skinner J, Thiebaut R, (2015) Time-Course Gene Set Analysis for Longitudinal Gene Expression Data. PLOS Comput. Biol. 11(6):e1004310. doi: 10.1371/journal.pcbi.1004310
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) plotPat.TcGSA(x=tcgsa_sim_1grp, expr=expr_1grp, Subject_ID=design$Patient_ID, TimePoint=design$TimePoint, B=100, time_unit="H" ) plotPat.TcGSA(x=tcgsa_sim_1grp, expr=tcgsa_sim_1grp$Estimations, Subject_ID=design$Patient_ID, TimePoint=design$TimePoint, baseline=1, B=100, time_unit="H" ) }
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) plotPat.TcGSA(x=tcgsa_sim_1grp, expr=expr_1grp, Subject_ID=design$Patient_ID, TimePoint=design$TimePoint, B=100, time_unit="H" ) plotPat.TcGSA(x=tcgsa_sim_1grp, expr=tcgsa_sim_1grp$Estimations, Subject_ID=design$Patient_ID, TimePoint=design$TimePoint, baseline=1, B=100, time_unit="H" ) }
This function can plot different representations of the gene expression in selected gene sets, among a subset of selected subjects.
plotSelect.GS( expr, gmt, Subject_ID, TimePoint, geneset.names.select, Subject_ID.select, display = "one subject per page", baseline = NULL, group.var = NULL, Group_ID_paired = NULL, ref = NULL, group_of_interest = NULL, FUNcluster = NULL, clustering_metric = "euclidian", clustering_method = "ward", B = 500, max_trends = 4, aggreg.fun = "median", na.rm.aggreg = TRUE, trend.fun = "median", methodOptiClust = "firstSEmax", verbose = TRUE, clustering = TRUE, time_unit = "", title = NULL, y.lab = NULL, desc = TRUE, lab.cex = 1, axis.cex = 1, main.cex = 1, y.lab.angle = 90, x.axis.angle = 45, y.lim = NULL, x.lim = NULL, gg.add = list(theme()) )
plotSelect.GS( expr, gmt, Subject_ID, TimePoint, geneset.names.select, Subject_ID.select, display = "one subject per page", baseline = NULL, group.var = NULL, Group_ID_paired = NULL, ref = NULL, group_of_interest = NULL, FUNcluster = NULL, clustering_metric = "euclidian", clustering_method = "ward", B = 500, max_trends = 4, aggreg.fun = "median", na.rm.aggreg = TRUE, trend.fun = "median", methodOptiClust = "firstSEmax", verbose = TRUE, clustering = TRUE, time_unit = "", title = NULL, y.lab = NULL, desc = TRUE, lab.cex = 1, axis.cex = 1, main.cex = 1, y.lab.angle = 90, x.axis.angle = 45, y.lim = NULL, x.lim = NULL, gg.add = list(theme()) )
expr |
either a matrix or dataframe of gene expression upon which
dynamics are to be calculated, or a list of gene sets estimation of gene
expression. In the case of a matrix or dataframe, its dimension are |
gmt |
a gmt object containing the gene sets definition. See
|
Subject_ID |
a factor of length |
TimePoint |
a numeric vector or a factor of length |
geneset.names.select |
a character vector containing the names of the gene sets to
be plotted, that must appear in the |
Subject_ID.select |
a character vector containing the names of the subjects to
be plotted, that must appear in the |
display |
How to display the resulting graphs. One of the following :
|
baseline |
a character string which is the value of |
group.var |
in the case of several treatment groups, this is a factor of
length |
Group_ID_paired |
a character vector of length |
ref |
the group which is used as reference in the case of several
treatment groups. Default is |
group_of_interest |
the group of interest, for which dynamics are to be
computed in the case of several treatment groups. Default is |
FUNcluster |
a function which accepts as first argument a matrix
|
clustering_metric |
character string specifying the metric to be used
for calculating dissimilarities between observations in the hierarchical
clustering when |
clustering_method |
character string defining the agglomerative method
to be used in the hierarchical clustering when |
B |
integer specifying the number of Monte Carlo ("bootstrap") samples
used to compute the gap statistics. Default is |
max_trends |
integer specifying the maximum number of different clusters
to be tested. Default is |
aggreg.fun |
a character string such as |
na.rm.aggreg |
a logical flag indicating whether |
trend.fun |
a character string such as |
methodOptiClust |
character string indicating how the "optimal" number
of clusters is computed from the gap statistics and their standard
deviations. Possible values are |
verbose |
logical flag enabling verbose messages to track the computing
status of the function. Default is |
clustering |
logical flag. If |
time_unit |
the time unit to be displayed (such as |
title |
character specifying the title of the plot. If |
y.lab |
character specifying the annotation of the y axis. If |
desc |
a logical flag. If |
lab.cex |
a numerical value giving the amount by which lab labels text
should be magnified relative to the default |
axis.cex |
a numerical value giving the amount by which axis annotation
text should be magnified relative to the default |
main.cex |
a numerical value giving the amount by which title text
should be magnified relative to the default |
y.lab.angle |
a numerical value (in [0, 360]) giving the orientation by
which y-label text should be turned (anti-clockwise). Default is |
x.axis.angle |
a numerical value (in [0, 360]) giving the orientation by
which x-axis annotation text should be turned (anti-clockwise). Default is
|
y.lim |
a numeric vector of length 2 giving the range of the y-axis.
See |
x.lim |
if numeric, will create a continuous scale, if factor or
character, will create a discrete scale. Observations not in this range will
be dropped. See |
gg.add |
A list of instructions to add to the |
If expr
is a matrix or a dataframe, then the "original" data are
plotted. On the other hand, if expr
is a list returned in the
'Estimations'
element of TcGSA.LR
, then it is those
"estimations" made by the TcGSA.LR
function that are plotted.
If indiv
is 'genes', then each line of the plot is the median of a
gene expression over the patients. On the other hand, if indiv
is
'patients', then each line of the plot is the median of a patient genes
expression in this gene set.
This function uses the Gap statistics to determine the optimal number of
clusters in the plotted gene set. See
clusGap
.
A dataframe the 2 following variables:
ProbeID
which contains the IDs of the probes of the plotted gene set.
Cluster
which to which cluster the probe belongs to.
If clustering
is FALSE
, then Cluster
is NA
for all the probes.
Boris P. Hejblum
Tibshirani, R., Walther, G. and Hastie, T., 2001, Estimating the number of data clusters via the Gap statistic, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 63, 2: 411–423.
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=TRUE) } if(interactive()){ plotSelect.GS(expr=expr_1grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.names.select=c("Gene set 4", "Gene set 5"), Subject_ID.select=c("P1", "P2"), display="one GS per page", time_unit="H", lab.cex=0.7 ) } if(interactive()){ plotSelect.GS(expr=tcgsa_sim_1grp$Estimations, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.names.select=c("Gene set 4", "Gene set 5"), Subject_ID.select=c("P1", "P2"), display="one subject per page", time_unit="H", lab.cex=0.7 ) } if(interactive()){ tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) plotSelect.GS(expr=tcgsa_sim_1grp$Estimations, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.names.select=c("Gene set 4", "Gene set 5"), Subject_ID.select=c("P1", "P2"), display="one subject per page", time_unit="H", lab.cex=0.7 ) }
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=TRUE) } if(interactive()){ plotSelect.GS(expr=expr_1grp, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.names.select=c("Gene set 4", "Gene set 5"), Subject_ID.select=c("P1", "P2"), display="one GS per page", time_unit="H", lab.cex=0.7 ) } if(interactive()){ plotSelect.GS(expr=tcgsa_sim_1grp$Estimations, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.names.select=c("Gene set 4", "Gene set 5"), Subject_ID.select=c("P1", "P2"), display="one subject per page", time_unit="H", lab.cex=0.7 ) } if(interactive()){ tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) plotSelect.GS(expr=tcgsa_sim_1grp$Estimations, TimePoint=design$TimePoint, Subject_ID=design$Patient_ID, gmt=gmt_sim, geneset.names.select=c("Gene set 4", "Gene set 5"), Subject_ID.select=c("P1", "P2"), display="one subject per page", time_unit="H", lab.cex=0.7 ) }
Density, distribution function, quantile function and random generation for mixtures of chi-squared distributions that corresponds to the null distribution of the Likelihood Ratio between 2 nested mixed models.
rchisqmix(n, s, q) dchisqmix(x, s, q) qchisqmix(p, s, q) pchisqmix(quant, s, q, lower.tail = TRUE)
rchisqmix(n, s, q) dchisqmix(x, s, q) qchisqmix(p, s, q) pchisqmix(quant, s, q, lower.tail = TRUE)
n |
number of observations. |
s |
number of fixed effects to be tested. |
q |
number of random effects to be tested. |
x , quant
|
a quantile. |
p |
a probability. |
lower.tail |
logical; if |
The approximate null distribution of a likelihood ratio for 2 nested mixed
models, where both fixed and random effects are tested simultaneously, is a
very specific mixture of distributions [Self & Liang
(1987), Stram & Lee (1994) and Stram & Lee (1995)]. It depends on both the
number of random effects and the number of fixed effects to be tested
simultaneously:
A vector of random independent observations of the mixture
identified by the values of
s
and q
.
Boris P. Hejblum
Self, S. G. and Liang, K., 1987, Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions, Journal of the American Statistical Association 82: 605–610.
Stram, D. O. and Lee, J. W., 1994, Variance components testing in the longitudinal mixed effects model, Biometrics 50: 1171–1177.
Stram, D. O. and Lee, J. W., 1995, Corrections to "Variance components testing in the longitudinal mixed effects model" by Stram, D. O. and Lee, J. W.; 50: 1171–1177 (1994), Biometrics 51: 1196.
library(graphics) library(stats) sample_mixt <- rchisqmix(n=1000, s=3, q=3) plot(density(sample_mixt))
library(graphics) library(stats) sample_mixt <- rchisqmix(n=1000, s=3, q=3) plot(density(sample_mixt))
A function that identifies the significant gene sets in an object of class
'TcGSA
'.
signifLRT.TcGSA( tcgsa, threshold = 0.05, myproc = "BY", nbsimu_pval = 1e+06, write = F, txtfilename = NULL, directory = NULL, exact = TRUE )
signifLRT.TcGSA( tcgsa, threshold = 0.05, myproc = "BY", nbsimu_pval = 1e+06, write = F, txtfilename = NULL, directory = NULL, exact = TRUE )
tcgsa |
a |
threshold |
the threshold at which the FDR or the FWER should be controlled. |
myproc |
a vector of character strings containing the names of the
multiple testing procedures for which adjusted p-values are to be computed.
This vector should include any of the following: " |
nbsimu_pval |
the number of observations under the null distribution to
be generated in order to compute the p-values. Default is |
write |
logical flag enabling the export of the results as a table in a
.txt file. Default is |
txtfilename |
a character string with the name of the .txt file in which
the results table is to be written, if |
directory |
if |
exact |
logical flag indicating whether the raw p-values should be computed from the
exact asymptotic mixture of chi-square, or simulated (longer and not better).
Default is |
signifLRT.TcGSA
returns a list.
The first element mixedLRTadjRes
is data frame with rows (one
row for each significant gene set) and the 3 following variables:
GeneSet
the significant gene set name from the gmt object.
AdjPval
the adjusted p-value corresponding to the significant gene
set.
desc
the significant gene set description from the gmt object.
The second element multCorProc
passes along the multiple testing
procedure used (from the argument myproc
).
The third element threshold
passes along the significance threshold
used (from the argument threshold
).
Boris P. Hejblum
Hejblum BP, Skinner J, Thiebaut R, (2015) Time-Course Gene Set Analysis for Longitudinal Gene Expression Data. PLOS Comput. Biol. 11(6):e1004310. doi: 10.1371/journal.pcbi.1004310
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) sgnifs <- signifLRT.TcGSA(tcgsa_sim_1grp, threshold = 0.05, myproc = "BY", nbsimu_pval = 1000, write=FALSE) sgnifs }
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) sgnifs <- signifLRT.TcGSA(tcgsa_sim_1grp, threshold = 0.05, myproc = "BY", nbsimu_pval = 1000, write=FALSE) sgnifs }
summary
method for class 'TcGSA
'
## S3 method for class 'TcGSA' summary(object, ...) ## S3 method for class 'summary.TcGSA' print(x, ...)
## S3 method for class 'TcGSA' summary(object, ...) ## S3 method for class 'summary.TcGSA' print(x, ...)
object |
an object of class ' |
... |
further arguments passed to or from other methods. |
x |
an object of class ' |
The function summary.TcGSA returns a list with the following components (list elements):
time_func
the chosen form for the time trend.
separateSubjects
a logical flag indicating whether gene sets
tested for discriminating among patients, or for time trends over time.
ntg
the number of treatment groups.
ngs
the number of tested gene sets.
nsignif
the number of significant gene sets at a 5% FDR (using
the default Benjamini & Yekutieli step-up procedure).
Boris P. Hejblum
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) summary(tcgsa_sim_1grp) tcgsa_sim_2grp <- TcGSA.LR(expr=expr_2grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE, group_name="group.var") summary(tcgsa_sim_2grp) }
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) summary(tcgsa_sim_1grp) tcgsa_sim_2grp <- TcGSA.LR(expr=expr_2grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE, group_name="group.var") summary(tcgsa_sim_2grp) }
This function computes the Likelihood Ratios for the gene sets under scrutiny, as well as estimations of genes dynamics inside those gene sets through mixed models.
TcGSA.LR( expr, gmt, design, subject_name = "Patient_ID", time_name = "TimePoint", crossedRandom = FALSE, covariates_fixed = "", time_covariates = "", time_func = "linear", group_name = "", separateSubjects = FALSE, minGSsize = 10, maxGSsize = 500 ) ## S3 method for class 'TcGSA' print(x, ...)
TcGSA.LR( expr, gmt, design, subject_name = "Patient_ID", time_name = "TimePoint", crossedRandom = FALSE, covariates_fixed = "", time_covariates = "", time_func = "linear", group_name = "", separateSubjects = FALSE, minGSsize = 10, maxGSsize = 500 ) ## S3 method for class 'TcGSA' print(x, ...)
expr |
a matrix or dataframe of gene expression. Its dimension are
|
gmt |
a gmt object containing the gene sets definition. See
|
design |
a matrix or dataframe containing the experimental variables that used in the model,
namely |
subject_name |
the name of the factor variable from |
time_name |
the name of a numeric variable from |
crossedRandom |
logical flag indicating whether the random effects of the subjects and of the time points
should be modeled as one crossed random effect or as two separated random effects.
Default is |
covariates_fixed |
a character vector with the names of numeric or factor variables from the |
time_covariates |
a character vector with the names of numeric or factor variables from the |
time_func |
the form of the time trend. Can be either one of |
group_name |
in the case of several treatment groups, the name of a factor variable
from the |
separateSubjects |
logical flag indicating that the analysis identifies
gene sets that discriminates patients rather than gene sets than have a
significant trend over time. Default is |
minGSsize |
the minimum number of genes in a gene set. If there are
less genes than this number in one of the gene sets under scrutiny, the
Likelihood Ratio of this gene set is not computed (the mixed model are not
fitted). Default is |
maxGSsize |
the maximum number of genes in a gene set. If there are
more genes than this number in one of the gene sets under scrutiny, the
Likelihood Ratio of this gene set is not computed (the mixed model are not
fitted). This is to avoid very long computation times. Default is
|
x |
an object of class ' |
... |
further arguments passed to or from other methods. |
This Time-course Gene Set Analysis aims at identifying gene sets that are not
stable over time, either homogeneously or heterogeneously (see Hejblum
et al, 2012)in terms of their probes. And when the argument
separateSubjects
is TRUE
, instead of identifying gene sets that
have a significant trend over time, TcGSA identifies gene sets that
have significantly different trends over time depending on Subjects
.
TcGSA.LR
returns a tcgsa
object, which is a list with
the 5 following elements:
fit a data frame that contains the 3 following variables:
LR
: the likelihood ratio between the model under the
null hypothesis and the model under the alternative hypothesis.
CVG_H0
: convergence status of the model under the null hypothesis.
CVG_H1
: convergence status of the model under the alternative
hypothesis.
time_func
: a character string passing along the value of the
time_func
argument used in the call.
GeneSets_gmt
: a gmt
object passing along the value of the
gmt
argument used in the call.
group.var
: a factor passing along the group_name
variable
from the design
matrix.
separateSubjects
: a logical flag passing along the value of the
separateSubjects
argument used in the call.
Estimations
: a list of 3 dimensions arrays. Each element of the
list (i.e. each array) corresponds to the estimations of gene expression
dynamics for each of the gene sets under scrutiny (obtained from mixed
models). The first dimension of those arrays is the genes included in the
concerned gene set, the second dimension is the Patient_ID
, and the
third dimension is the TimePoint
. The values inside those arrays are
estimated gene expressions.
time_DF
: the degree of freedom of the natural splines functions
Boris P. Hejblum
Hejblum BP, Skinner J, Thiebaut R, (2015) Time-Course Gene Set Analysis for Longitudinal Gene Expression Data. PLOS Comput. Biol. 11(6):e1004310. doi: 10.1371/journal.pcbi.1004310
summary.TcGSA
, plot.TcGSA
,
and TcGSA.LR.parallel
for an implementation using
parallel computing
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) tcgsa_sim_1grp summary(tcgsa_sim_1grp) plot(x=tcgsa_sim_1grp, expr=expr_1grp, Subject_ID=design$Patient_ID, TimePoint=design$TimePoint, baseline=1, B=100, time_unit="H" ) tcgsa_sim_2grp <- TcGSA.LR(expr=expr_2grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE, group_name="group.var") tcgsa_sim_2grp }
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) tcgsa_sim_1grp summary(tcgsa_sim_1grp) plot(x=tcgsa_sim_1grp, expr=expr_1grp, Subject_ID=design$Patient_ID, TimePoint=design$TimePoint, baseline=1, B=100, time_unit="H" ) tcgsa_sim_2grp <- TcGSA.LR(expr=expr_2grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE, group_name="group.var") tcgsa_sim_2grp }
A parallel version of the function TcGSA.LR
to be used on a
cluster of computing processors. This function computes the Likelihood
Ratios for the gene sets under scrutiny, as well as estimations of genes
dynamics inside those gene sets through mixed models.
TcGSA.LR.parallel( Ncpus, type_connec, expr, gmt, design, subject_name = "Patient_ID", time_name = "TimePoint", crossedRandom = FALSE, covariates_fixed = "", time_covariates = "", time_func = "linear", group_name = "", separateSubjects = FALSE, minGSsize = 10, maxGSsize = 500, monitorfile = "" )
TcGSA.LR.parallel( Ncpus, type_connec, expr, gmt, design, subject_name = "Patient_ID", time_name = "TimePoint", crossedRandom = FALSE, covariates_fixed = "", time_covariates = "", time_func = "linear", group_name = "", separateSubjects = FALSE, minGSsize = 10, maxGSsize = 500, monitorfile = "" )
Ncpus |
The number of processors available. |
type_connec |
The type of connection between the processors. Supported
cluster types are |
expr |
a matrix or dataframe of gene expression. Its dimension are
|
gmt |
a gmt object containing the gene sets definition. See
|
design |
a matrix or dataframe containing the experimental variables that used in the model,
namely |
subject_name |
the name of the factor variable from |
time_name |
the name of the numeric or factor variable from |
crossedRandom |
logical flag indicating whether the random effects of the subjects and of the time points
should be modeled as one crossed random effect or as two separated random effects.
Default is |
covariates_fixed |
a character vector with the names of numeric or factor variables from the |
time_covariates |
the name of a numeric variable from |
time_func |
the form of the time trend. Can be either one of |
group_name |
in the case of several treatment groups, the name of a factor variable
from the |
separateSubjects |
logical flag indicating that the analysis identifies
gene sets that discriminates patients rather than gene sets than have a
significant trend over time. Default is |
minGSsize |
the minimum number of genes in a gene set. If there are
less genes than this number in one of the gene sets under scrutiny, the
Likelihood Ratio of this gene set is not computed (the mixed model are not
fitted). Default is |
maxGSsize |
the maximum number of genes in a gene set. If there are
more genes than this number in one of the gene sets under scrutiny, the
Likelihood Ratio of this gene set is not computed (the mixed model are not
fitted). This is to avoid very long computation times. Default is
|
monitorfile |
a writable connections or a character string naming a file to write into,
to monitor the progress of the analysis.
Default is |
This Time-course Gene Set Analysis aims at identifying gene sets that are not
stable over time, either homogeneously or heterogeneously (see Hejblum
et al, 2012) in terms of their probes. And when the argument
separatePatients
is TRUE
, instead of identifying gene sets that
have a significant trend over time (possibly with probes heterogeneity of
this trend), TcGSA identifies gene sets that have significantly
different trends over time depending on the patient.
If the monitorfile
argument is a character string naming a file to
write into, in the case of a new file that does not exist yet, such a new
file will be created. A line is written each time one of the gene sets under
scrutiny has been analyzed (i.e. the two mixed models have been fitted, see
TcGSA.LR
) by one of the parallelized processors.
TcGSA.LR
returns a tcgsa
object, which is a list with
the 5 following elements:
fit a data frame that contains the 3 following variables:
LR
: the likelihood ratio between the model under the
null hypothesis and the model under the alternative hypothesis.
CVG_H0
: convergence status of the model under the null hypothesis.
CVG_H1
: convergence status of the model under the alternative
hypothesis.
time_func
: a character string passing along the value of the
time_func
argument used in the call.
GeneSets_gmt
: a gmt
object passing along the value of the
gmt
argument used in the call.
group.var
: a factor passing along the group_name
variable
from the design
matrix.
separateSubjects
: a logical flag passing along the value of the
separateSubjects
argument used in the call.
Estimations
: a list of 3 dimensions arrays. Each element of the
list (i.e. each array) corresponds to the estimations of gene expression
dynamics for each of the gene sets under scrutiny (obtained from mixed
models). The first dimension of those arrays is the genes included in the
concerned gene set, the second dimension is the Patient_ID
, and the
third dimension is the TimePoint
. The values inside those arrays are
estimated gene expressions.
time_DF
: the degree of freedom of the natural splines functions
Boris P. Hejblum
Hejblum BP, Skinner J, Thiebaut R, (2015) Time-Course Gene Set Analysis for Longitudinal Gene Expression Data. PLOS Comput. Biol. 11(6):e1004310. doi: 10.1371/journal.pcbi.1004310
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) library(doParallel) tcgsa_sim_1grp_par <- TcGSA.LR.parallel(Ncpus = 2, type_connec = 'SOCK', expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE, separateSubjects=TRUE) summary(tcgsa_sim_1grp) summary(tcgsa_sim_1grp_par) }
if(interactive()){ data(data_simu_TcGSA) tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE) library(doParallel) tcgsa_sim_1grp_par <- TcGSA.LR.parallel(Ncpus = 2, type_connec = 'SOCK', expr=expr_1grp, gmt=gmt_sim, design=design, subject_name="Patient_ID", time_name="TimePoint", time_func="linear", crossedRandom=FALSE, separateSubjects=TRUE) summary(tcgsa_sim_1grp) summary(tcgsa_sim_1grp_par) }