Fits the TOP model with M5 bins. By default, it runs Gibbs sampling for all 10 partitions in parallel on 10 CPU cores and returns a list of posterior samples for each of the 10 partitions. Alternatively, you may fit the model for each of the 10 partitions on separate machines by specifying which partition to run.

fit_TOP_M5_model(
  all_training_data,
  all_training_data_files,
  model_file,
  logistic_model = FALSE,
  transform = c("asinh", "log2", "sqrt", "none"),
  partitions = 1:10,
  n_iter = 2000,
  n_burnin = floor(n_iter/2),
  n_chains = 3,
  n_thin = max(1, floor((n_iter - n_burnin)/1000)),
  n_cores = length(partitions),
  save = TRUE,
  outdir = "TOP_fit",
  return_type = c("samples", "jagsfit", "samplefiles"),
  quiet = FALSE
)

Arguments

all_training_data

A list of the assembled training data of all partitions.

all_training_data_files

A vector of the assembled training data files of all partitions. If all_training_data is missing, it will load the training data from all_training_data_files.

model_file

TOP model file written in JAGS. By default, use the model file included in the TOP package.

logistic_model

Logical; whether to use the logistic version of the TOP model. If TRUE, fits the logistic version; if FALSE (default), fits the quantitative occupancy model.

transform

Type of transformation for ChIP-seq read counts. Options are: ‘asinh’ (asinh transformation), ‘log2’ (log2 transformation), ‘sqrt’ (square root transformation), and ‘none’ (no transformation). This only applies when logistic_model = FALSE.

partitions

A vector of selected partition(s) to run. Default: all 10 partitions. If you specify a subset, models are fit only to the data in those partitions.

n_iter

Number of total iterations per chain, including burn-in iterations.

n_burnin

Number of burn-in iterations, i.e. the number of samples to discard at the beginning of each chain. Default is floor(n_iter/2), which discards the first half of the samples.

n_chains

Number of Markov chains (default: 3).

n_thin

Thinning rate; must be a positive integer. Default is max(1, floor((n_iter - n_burnin) / 1000)), which thins only when there are at least 2000 post-burn-in iterations per chain. No thinning is performed if n_thin = 1.

n_cores

Number of cores to use in parallel (default: equal to the number of partitions, i.e. length(partitions)).

save

Logical, if TRUE, saves posterior samples as ‘.rds’ files in outdir.

outdir

Directory to save TOP model posterior samples.

return_type

Type of result to return. Options: ‘samples’ (posterior samples), ‘jagsfit’ (jagsfit object), or ‘samplefiles’ (file names of posterior samples).

quiet

Logical, if TRUE, suppress model fitting messages. Otherwise, only show progress bars.
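
To make the burn-in and thinning defaults concrete, the sketch below recomputes them in plain R from the formulas in the usage signature. The helper name default_sampler_settings is hypothetical and not part of the TOP package. The count of retained draws assumes the usual JAGS-style convention that one draw is kept every n_thin post-burn-in iterations.

```r
# Hypothetical helper (not part of TOP): reproduces the defaults shown in the
# usage signature for a given n_iter.
default_sampler_settings <- function(n_iter = 2000,
                                     n_burnin = floor(n_iter / 2),
                                     n_chains = 3) {
  n_thin <- max(1, floor((n_iter - n_burnin) / 1000))
  # Assumption: one draw is kept every n_thin post-burn-in iterations.
  kept_per_chain <- floor((n_iter - n_burnin) / n_thin)
  list(n_burnin = n_burnin,
       n_thin = n_thin,
       kept_per_chain = kept_per_chain,
       kept_total = kept_per_chain * n_chains)
}

str(default_sampler_settings())               # default n_iter = 2000
str(default_sampler_settings(n_iter = 5000))  # a longer run
```

With the default n_iter = 2000, this gives n_thin = 1, so no thinning takes place; thinning only starts once n_iter - n_burnin reaches 2000.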

Value

A list with one element per partition, containing the posterior samples, the jagsfit object, or the file names of the posterior samples, depending on return_type.

Examples

if (FALSE) {
# Example to train TOP quantitative occupancy model:

# The example below first applies the 'asinh' transform to the ChIP-seq counts
# in 'assembled_training_data', then runs Gibbs sampling
# for each of the 10 partitions in parallel.
# It runs 5000 iterations of Gibbs sampling per chain,
# including 1000 burn-in iterations, with 3 Markov chains, at a thinning rate of 2,
# and saves the posterior samples to the 'TOP_fit' directory.
all_TOP_samples <- fit_TOP_M5_model(assembled_training_data,
                                    logistic_model = FALSE,
                                    transform = 'asinh',
                                    n_iter = 5000,
                                    n_burnin = 1000,
                                    n_chains = 3,
                                    n_thin = 2,
                                    outdir = 'TOP_fit')

# We can also obtain the posterior samples separately for each partition.
# For example, to obtain the posterior samples for partition #3 only:
TOP_samples_part3 <- fit_TOP_M5_model(assembled_training_data,
                                      logistic_model = FALSE,
                                      transform = 'asinh',
                                      partitions = 3,
                                      n_iter = 5000,
                                      n_burnin = 1000,
                                      n_chains = 3,
                                      n_thin = 2,
                                      outdir = 'TOP_fit')


# Example to train TOP logistic (binary) model:
all_TOP_samples <- fit_TOP_M5_model(assembled_training_data,
                                    logistic_model = TRUE,
                                    n_iter = 5000,
                                    n_burnin = 1000,
                                    n_chains = 3,
                                    n_thin = 2,
                                    outdir = 'TOP_fit')

}
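
When fitting partitions on separate machines, one way to decide which partitions each machine handles is to split the partition indices into chunks. The sketch below is a base R illustration, not part of the TOP package: assign_partitions and n_machines are hypothetical names, and the commented-out call assumes that assembled_training_data is available on each machine.

```r
# Hypothetical helper (not part of TOP): round-robin assignment of partitions
# to machines, so machine k gets partitions k, k + n_machines, ...
assign_partitions <- function(partitions = 1:10, n_machines = 3) {
  machine_id <- rep_len(seq_len(n_machines), length(partitions))
  split(partitions, machine_id)
}

chunks <- assign_partitions(1:10, n_machines = 3)
print(chunks)

# On machine 2, one would then fit only its share, for example:
# my_partitions <- chunks[["2"]]
# fit_TOP_M5_model(assembled_training_data,
#                  partitions = my_partitions,
#                  n_cores = length(my_partitions),
#                  outdir = "TOP_fit")
```

Setting n_cores to the number of partitions assigned to a machine mirrors the function's own default of length(partitions).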