% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/functions_pool.R
\name{errorHet}
\alias{errorHet}
\title{Average absolute difference between expected heterozygosity estimates}
\usage{
errorHet(
  nDip,
  nloci,
  pools,
  pError,
  sError,
  mCov,
  vCov,
  min.minor,
  minimum = NA,
  maximum = NA,
  theta = 10
)
}
\arguments{
\item{nDip}{an integer representing the total number of diploid individuals
to simulate. Note that \code{\link[scrm:scrm]{scrm::scrm()}} actually simulates haplotypes, so the
number of simulated haplotypes is twice this value.}

\item{nloci}{an integer representing the number of independent loci to
simulate.}

\item{pools}{a list with a vector containing the size (in number of diploid
individuals) of each pool. Thus, if a population was sequenced using a
single pool, the vector should contain only one entry. If a population was
sequenced using two pools, each with 10 individuals, this vector should
contain two entries, both equal to 10.}

\item{pError}{an integer representing the value of the error associated with
DNA pooling. This value is related to the unequal contribution of both
individuals and pools towards the total number of reads observed for a
given population: the higher the value, the more unequal the individual
and pool contributions.}

\item{sError}{a numeric value with the error rate associated with the sequencing
and mapping process. This error rate is assumed to be symmetric:
error(reference -> alternative) = error(alternative -> reference). This
number should be between 0 and 1.}

\item{mCov}{an integer that defines the mean depth of coverage to simulate.
Please note that this represents the mean coverage across all sites.}

\item{vCov}{an integer that defines the variance of the depth of coverage
across all sites.}

\item{min.minor}{an integer representing the minimum allowed number of
minor-allele reads. Sites that, across all populations, have fewer
minor-allele reads than this threshold are removed from the data.}

\item{minimum}{an optional integer representing the minimum coverage allowed.
Sites where the population has a depth of coverage below this threshold are
removed from the data.}

\item{maximum}{an optional integer representing the maximum coverage allowed.
Sites where the population has a depth of coverage above this threshold are
removed from the data.}

\item{theta}{a value for the mutation rate assuming theta = 4Nu, where u is
the neutral mutation rate per locus.}
}
\value{
a data.frame with columns detailing the number of diploid
individuals, the pool error, the number of pools, the number of individuals
per pool, the mean coverage, the variance of the coverage and the average
absolute difference between the expected heterozygosity computed from
genotypes and from pooled data.
}
\description{
Calculates the average absolute difference between the expected
heterozygosity computed directly from genotypes and from pooled sequencing
data.
}
\details{
Different combinations of parameters can be tested to assess how each one
affects the error. The average absolute difference is computed with the
\link[Metrics]{mae} function, taking the expected heterozygosity computed
directly from the genotypes as the \code{actual} input argument and the
expected heterozygosity from pooled data as the \code{predicted} input
argument.
}
\examples{
# single population sequenced with a single pool of 100 individuals
errorHet(nDip = 100, nloci = 10, pools = list(100), pError = 100, sError = 0.01,
mCov = 100, vCov = 250, min.minor = 2)

# single population sequenced with two pools, each with 50 individuals
errorHet(nDip = 100, nloci = 10, pools = list(c(50, 50)), pError = 100, sError = 0.01,
mCov = 100, vCov = 250, min.minor = 2)

# single population sequenced with two pools, each with 50 individuals
# removing sites with coverage below 10x or above 180x
errorHet(nDip = 100, nloci = 10, pools = list(c(50, 50)), pError = 100, sError = 0.01,
mCov = 100, vCov = 250, min.minor = 2, minimum = 10, maximum = 180)
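
# a minimal sketch of the summary statistic described in Details, using
# made-up heterozygosity values (hypothetical, for illustration only):
# the mean absolute error is the mean of |actual - predicted|
het_genotypes <- c(0.30, 0.42, 0.18) # hypothetical values computed from genotypes
het_pooled <- c(0.28, 0.45, 0.20) # hypothetical values computed from pooled data
mean(abs(het_genotypes - het_pooled))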

}
