% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clustering.R
\name{degPatterns}
\alias{degPatterns}
\title{Make groups of genes using their expression profiles.}
\usage{
degPatterns(
  ma,
  metadata,
  minc = 15,
  summarize = "merge",
  time = "time",
  col = NULL,
  consensusCluster = FALSE,
  reduce = FALSE,
  cutoff = 0.7,
  scale = TRUE,
  pattern = NULL,
  groupDifference = NULL,
  eachStep = FALSE,
  plot = TRUE,
  fixy = NULL
)
}
\arguments{
\item{ma}{log2 normalized count matrix}

\item{metadata}{data frame with sample information. Row names
should match the column names of \code{ma}.}

\item{minc}{integer. Minimum number of genes a group must contain
to be returned.}

\item{summarize}{character. Column name in metadata used to group
replicates. If the column doesn't exist, the \code{time} and
\code{col} columns are merged; if \code{col} doesn't exist, only \code{time} is used.
For instance, a merge between the \code{col} and \code{time} parameters:
control_point0 ... etc}

\item{time}{character. Column name in metadata for the
variable that changes, normally a time variable.}

\item{col}{character. Column name in metadata used to separate
samples, normally control/mutant.}

\item{consensusCluster}{boolean. Whether to use \link{ConsensusClusterPlus}
instead of \code{\link[cluster:diana]{cluster::diana()}} for the clustering.}

\item{reduce}{boolean. Remove genes that are outliers of the cluster
distribution. The \code{boxplot} function is used to flag a gene as an
outlier in any group defined by \code{time} and \code{col}, and flagged
genes are removed from the cluster. Not used if \code{consensusCluster} is TRUE.}

\item{cutoff}{This is deprecated.}

\item{scale}{boolean scale the \code{ma} values by row}

\item{pattern}{numeric vector used as a reference to find similar
patterns in the count matrix. It can also be a character vector naming the
genes inside the count matrix to be used as reference.}

\item{groupDifference}{Minimum abundance difference between the
maximum value and minimum value for each feature. Please
provide the value on the same scale as the \code{ma} values
(if \code{ma} is in log2, \code{groupDifference} should be within that range).}

\item{eachStep}{Whether to apply \code{groupDifference} at each step over
the \code{time} variable. \strong{This only works properly for one group
with multiple time points}.}

\item{plot}{boolean plot the clusters found}

\item{fixy}{integer vector used as ylim in the plot}
}
\value{
list with the following items:
\itemize{
\item \code{df} is a data.frame
with two columns. The first one contains the genes, the second
the clusters they belong to.
\item \code{pass} is a vector of the clusters that pass the \code{minc} cutoff.
\item \code{plot} ggplot figure.
\item \code{hr} clustering of the genes in hclust format.
\item \code{profile} normalized count data used in the plot.
\item \code{raw} data.frame with gene values summarized by biological replicates and
with metadata information attached.
\item \code{summarise} data.frame with cluster values summarized by group and
with the metadata information attached.
\item \code{normalized} data.frame with the cluster values
as used in the plot.
\item \code{benchmarking} plot showing the different patterns at different
values for the clustering \code{cutree} function.
\item \code{benchmarking_curve} plot showing how the numbers of clusters and genes
change at different values for the clustering \code{cutree} function.
}
}
\description{
Note that this function doesn't test for significant
differences between groups, so the
matrix used as input should already be filtered to contain only
genes that are significantly different or the most interesting genes
to study.
}
\details{
It can work with one or more groups with two or
more time points.
Before calculating the gene similarity among samples,
all samples inside the same time point (\code{time} parameter) and
group (\code{col} parameter) are collapsed together, and the \code{mean}
value represents the group for the gene abundance.
Then, the pairwise correlation of gene expression is calculated with
the \code{cor.test} R function, using Kendall as the statistical
method. A distance matrix is created from those values.
After that, \code{\link[cluster:diana]{cluster::diana()}} is used to
cluster the gene-gene distance matrix, and the tree is cut using
the divisive coefficient of the clustering, which diana also returns.
Alternatively, if \code{consensusCluster} is TRUE,
\link{ConsensusClusterPlus} is used to cut the tree into stable clusters.
Finally, only the groups of genes with more genes
than the \code{minc} parameter are added to the figure.
The y-axis in the figure is the result of applying the \code{scale()}
R function, which is similar to creating a
\code{Z-score} where values are centered to the \code{mean} and
scaled to the \verb{standard deviation} of each gene.

Similar patterns can be merged
into a single pattern. The expression
correlation between the patterns is used to decide whether
any of them need to be merged.
}
\examples{
data(humanGender)
library(SummarizedExperiment)
library(ggplot2)
ma <- assays(humanGender)[[1]][1:100,]
des <- colData(humanGender)
des[["other"]] <- sample(c("a", "b"), 85, replace = TRUE)
res <- degPatterns(ma, des, time="group", col = "other")
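# The returned list can be inspected directly. As described in the Value
# section, `df` maps each gene to its cluster and `pass` lists the clusters
# that pass the `minc` cutoff.
head(res[["df"]])
res[["pass"]]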
# Use the data yourself for custom figures
ggplot(res[["normalized"]],
       aes(group, value, color = other, fill = other)) +
  geom_boxplot() +
  geom_point(position = position_jitterdodge(dodge.width = 0.9)) +
  # change the method to make it smoother
  geom_smooth(aes(group = other), method = "lm")
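# A minimal sketch of the clustering steps described in Details, on toy data.
# This is not the package's internal code. Assumptions: all object names
# (`toy`, `grp`, `collapsed`, `kc`, `d`, `dn`, `cl`) are hypothetical;
# `cor()` stands in for the pairwise `cor.test()` calls; the distance is
# taken as 1 - correlation; and the tree is cut at the divisive coefficient.
set.seed(42)
toy <- matrix(rnorm(20 * 8), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("s", 1:8)))
grp <- rep(c("t0", "t1", "t2", "t3"), each = 2)
# 1. Collapse replicates: mean abundance of each gene per time point
collapsed <- sapply(split(seq_along(grp), grp),
                    function(i) rowMeans(toy[, i, drop = FALSE]))
# 2. Gene-gene Kendall correlation, turned into a distance matrix
kc <- cor(t(collapsed), method = "kendall")
d <- as.dist(1 - kc)
# 3. Divisive clustering of the gene-gene distance matrix
dn <- cluster::diana(d, diss = TRUE)
# 4. Cut the tree at the divisive coefficient returned by diana
cl <- cutree(as.hclust(dn), h = dn[["dc"]])
# Number of genes per cluster; degPatterns would keep only clusters
# with more genes than `minc`
table(cl)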
}
