Predictor API Schema

Predictor response

Key

Value type - Required/Optional

Description: Value options

Example

predictor_name

string - Required

Unique identifier for the Predictor. Constructed automatically in config.py by appending the container’s build timestamp (read from Apptainer’s /.singularity.d/labels.json) to the model’s base name. The format is {ModelName}_{YYYYMMDD-HHMMSS}_{TZ}. In development mode (outside a container), _dev is appended instead. This ensures every container rebuild produces a unique, sortable identifier, allowing Evaluators to distinguish between builds even when the model name has not changed.

This is especially important for Predictors that have undergone updates that led to different predictions.

"predictor_name": "deBoerTestModel_20260127-171101_PST"

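As an illustration of the naming convention, the sketch below splits a predictor_name back into its parts. It is not the code in config.py; the helper name parse_predictor_name and the layout of the returned dictionary are assumptions made for this example.

```python
from datetime import datetime


def parse_predictor_name(predictor_name):
    """Split a predictor_name into model name, build timestamp, and time zone.

    Hypothetical helper: only the name format comes from the schema above.
    """
    if predictor_name.endswith("_dev"):
        # Development mode: "_dev" replaces the timestamp and time zone.
        model = predictor_name[: -len("_dev")]
        return {"model": model, "built": None, "tz": None, "dev": True}

    # The model name may itself contain underscores, so split from the right.
    parts = predictor_name.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected predictor_name: {predictor_name!r}")
    model, stamp, tz = parts
    # Naive datetime; the time zone abbreviation is kept as plain text.
    built = datetime.strptime(stamp, "%Y%m%d-%H%M%S")
    return {"model": model, "built": built, "tz": tz, "dev": False}


info = parse_predictor_name("deBoerTestModel_20260127-171101_PST")
print(info["model"], info["built"].isoformat(), info["tz"])
```

Splitting from the right keeps model names that contain underscores intact, at the cost of assuming the time zone abbreviation itself has no underscore.
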
matcher_version

string - Optional

If a Matcher was used by the Predictor, the matcher_version returned by the Matcher should be passed through to the Evaluator. This is the build-timestamped Matcher name, following the same versioning convention as predictor_name.

"matcher_version": "Matcher_20260127-171101_PST"

bin_size

integer - Required for track-based models

Resolution of the model’s predictions, in base pairs per bin.

"bin_size": 1

prediction_tasks

array of objects - Required

Each object must contain the following keys: name, type_requested, type_actual, cell_type_requested, cell_type_actual, species_requested, species_actual, and predictions. The keys scale_prediction_requested, scale_prediction_actual, and aggregation (an object) are optional.

"prediction_tasks": [
 {
  "name": "task1",
  "type_requested": "expression",
  "type_actual": ["expression"],
  "cell_type_requested": "K562",
  "cell_type_actual": "bone_marrow_cell_line",
  "species_requested": "homo_sapiens",
  "species_actual": "homo_sapiens",
  "scale_prediction_requested": "linear",
  "scale_prediction_actual": "linear",
  "aggregation": {"bins": "mean"},
  "predictions": {
    "seq1": [12.2, 5, 6, ..],
    "seq2": [1.1, 12, 0.00, ..],
    "random_seq": [100.1, 50, 0.5, ..],
    "enhancer": [4, 3.0, 0.001, ..],
    "control": [0, 0, 0, ..]
   }
 }
]
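
The required and optional keys listed above can be checked mechanically before a response is sent. The following is a minimal sketch rather than part of the API: the function name check_prediction_task is hypothetical, and the prediction values in the usage example are shortened placeholders.

```python
REQUIRED_TASK_KEYS = {
    "name", "type_requested", "type_actual",
    "cell_type_requested", "cell_type_actual",
    "species_requested", "species_actual", "predictions",
}


def check_prediction_task(task):
    """Return a list of problems found in one prediction task (empty if none).

    Hypothetical helper; it only checks what the schema text states.
    """
    problems = []
    missing = REQUIRED_TASK_KEYS - set(task)
    if missing:
        problems.append(f"missing required keys: {sorted(missing)}")
    if "type_actual" in task and not isinstance(task["type_actual"], list):
        problems.append("type_actual must be an array of strings")
    if "predictions" in task and not isinstance(task["predictions"], dict):
        problems.append("predictions must map sequence IDs to values")
    return problems


task = {
    "name": "task1",
    "type_requested": "expression",
    "type_actual": ["expression"],
    "cell_type_requested": "K562",
    "cell_type_actual": "bone_marrow_cell_line",
    "species_requested": "homo_sapiens",
    "species_actual": "homo_sapiens",
    # Shortened, illustrative values only.
    "predictions": {"seq1": [12.2, 5, 6], "seq2": [1.1, 12, 0.0]},
}
print(check_prediction_task(task))  # prints []
```

Optional keys are deliberately not checked, so a task that omits the scaling fields or aggregation passes unchanged.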

name

string - Required

Unique identifier for the prediction task, matching the task name sent by the Evaluator.

"name": "task_for_model"

type_requested

string - Required

Prediction type requested. One of "accessibility", "binding_{molecule}", "expression", or "conformation_{isoform}" (for example, "conformation_chromatin"). "binding_{molecule}" can be used for any type of binding assay (e.g. ChIP-seq, H3K27ac), and the text after the "_" should be all lowercase.

"type_requested": "expression"

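The allowed values can be expressed as a single pattern. This is a sketch under assumptions: the schema only states that the text after "binding_" is lowercase, so the exact character set accepted below (lowercase letters, digits, "-" and "_") and the helper name is_valid_type_requested are illustrative choices, not part of the specification.

```python
import re

# Assumption: the binding suffix is limited to lowercase letters, digits,
# "-" and "_"; the conformation suffix is any non-empty run of word characters.
_TYPE_PATTERN = re.compile(
    r"accessibility|expression|binding_[a-z0-9_-]+|conformation_\w+"
)


def is_valid_type_requested(value):
    """Return True if value looks like one of the documented prediction types."""
    return _TYPE_PATTERN.fullmatch(value) is not None


for candidate in ["expression", "binding_h3k27ac", "binding_H3K27ac", "conformation_chromatin"]:
    print(candidate, is_valid_type_requested(candidate))
```
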
type_actual

array of string(s) - Required

Prediction type(s) completed by the Predictor. In many cases this is the assay the model predicted. If multiple tracks were averaged in a multi-task model, all of them should be included here, e.g. ["dnase", "atac-seq"].

"type_actual": ["expression"]

cell_type_requested

string - Required

Cell type requested by the Evaluator.

"cell_type_requested": "HEPG2"

cell_type_actual

string - Required

Cell type returned by the Predictor. The Predictor can choose to use the Matcher module, which returns the closest matching cell type that the Predictor supports.

"cell_type_actual": "HEPG2"

species_requested

string - Required

Species requested by the Evaluator.

"species_requested": "homo_sapiens"

species_actual

string - Required

Species used by the Predictor.

"species_actual": "homo_sapiens"

scale_prediction_requested

string - Optional

Scaling requested by the Evaluator for the predictions: ["linear", "log"].

"scale_prediction_requested": "log"

scale_prediction_actual

string - Optional

Scaling applied to the predictions by the Predictor, if any: ["linear", "log"].

"scale_prediction_actual": "log"

aggregation

object - Optional

Contains information about how replicates, bins and/or tracks were aggregated. Values can be any descriptive string and Predictor builders only need to include those that they used.

"aggregation": {
  "replicates": "mean",
  "bins": "mean",
  "tracks": "special mathematical formula"
 }

predictions

object - Required

An object of key-value pairs where keys are sequence ID strings and values are arrays of floats/integers. Each array of predictions can hold a single value, a list of values for track predictions, or nested lists (numpy arrays for msgpack-numpy responses). We suggest encoding interaction matrices as numpy arrays. The Predictor automatically matches the sequence ID keys to the sequence ID keys sent by the Evaluator.

"predictions": {
  "seq1": [12.2, 5, 6, ..],
  "seq2": [1.1, 12, 0.00, ..],
  "random_seq": [100.1, 50, 0.5, ..],
  "enhancer": [4, 3.0, 0.001, ..],
  "control": [0, 0, 0, ..]
 }

trim_upstream

object - Conditional

Returned only for track readout requests. A collection of key-value pairs mapping sequence IDs to integers. The integer specifies the number of base pairs in the first predicted bin that fall upstream of the actual sequence or requested prediction_range.

Evaluators use this exact offset to perfectly align the model’s binned predictions back to the original genomic coordinates.

"trim_upstream": {
  "seq1": 5,
  "seq2": 0,
  "random_seq": 2,
  "enhancer": 1,
  "control": 0
 }

Note on Binned Predictions and Sequence-Length Alignment

Predictors that return binned predictions often include “N” bases in flanking bins. These can skew results when performing base-pair (bp)–level evaluation.

When an Evaluator makes a track readout request:

  • The expanded bp-level prediction (for binned outputs) must match the length of the input sequence.

  • The start of a prediction should be aligned with the first bp of the sequence.

  • By default, if no trim_upstream parameter is returned, the Evaluator should crop the predictions only at the downstream end.

  • If a trim_upstream parameter is returned, the Evaluator should:

    1. Crop upstream by the amount specified in trim_upstream.

    2. Crop the remaining required amount downstream to ensure the final prediction length equals the sequence length.

This ensures consistent evaluation and avoids artifacts introduced by binned predictions.
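
The cropping rules above can be sketched on the Evaluator side as follows. This is an illustration under assumptions, not the reference implementation: the function name align_track is hypothetical, and expanding a bin by repeating its value for every base pair is one plausible choice (it suits mean-aggregated bins; other aggregations may call for a different expansion).

```python
def align_track(binned, bin_size, seq_len, trim_upstream=0):
    """Expand binned predictions to bp level and crop them to seq_len values.

    Hypothetical helper. trim_upstream defaults to 0, which corresponds to
    the case where the Predictor returned no trim_upstream value.
    """
    if bin_size < 1 or trim_upstream < 0:
        raise ValueError("bin_size must be >= 1 and trim_upstream must be >= 0")
    # Assumption: each bin value is repeated bin_size times, one per base pair.
    expanded = [value for value in binned for _ in range(bin_size)]
    # Crop upstream first, then drop whatever remains at the downstream end.
    aligned = expanded[trim_upstream:trim_upstream + seq_len]
    if len(aligned) != seq_len:
        raise ValueError("predictions do not cover the full sequence")
    return aligned


# Three bins of 4 bp cover 12 bp; the sequence is 10 bp long.
print(align_track([1, 2, 3], bin_size=4, seq_len=10))
# prints [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
print(align_track([1, 2, 3], bin_size=4, seq_len=10, trim_upstream=1))
# prints [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
```

In the first call only the downstream end is cropped; in the second, one base pair is removed upstream and the remaining one downstream, so both results have exactly seq_len values.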