Predictor API Schema

Predictor response

Key	Value type - Required/Optional	Description: Value options	Example
`predictor_name`	`string`- Required	Unique identifier for the Predictor. Constructed automatically in `config.py` by appending the container’s build timestamp (read from Apptainer’s `/.singularity.d/labels.json`) to the model’s base name. The format is `{ModelName}_{YYYYMMDD-HHMMSS}_{TZ}`.	`"predictor_name": "deBoerTestModel_20260127-171101_PST"`
`matcher_version`	`string`- Optional	If a Matcher was used by the Predictor, the `matcher_version` returned by the Matcher should be passed through to the Evaluator. This is the build-timestamped Matcher name, following the same versioning convention as `predictor_name`.	`"matcher_version": "Matcher_20260127-171101_PST"`
`bin_size`	`integer` - Required for track and interaction-matrix-based models	Resolution of the model’s predictions, in base pairs per bin.	`"bin_size": 1`
`prediction_tasks`	`array of objects` - Required	Each object must contain the following keys: `name`, `type_requested`, `type_actual`, `cell_type_requested`, `cell_type_actual`, `species_requested`, `species_actual`, `predictions`, `scale_prediction_requested` (optional), `scale_prediction_actual` (optional), `aggregation` (optional object).	`"prediction_tasks": [ { "name": "task1", "type_requested": "expression", "type_actual": ["expression"], "cell_type_requested": "K562", "cell_type_actual": "bone_marrow_cell_line", "species_requested": "homo_sapiens", "species_actual": "homo_sapiens", "scale_prediction_requested": "linear", "scale_prediction_actual": "linear", "aggregation": {"bins": "mean"}, "predictions": { "seq1": [12.2, 5, 6, ..], "seq2": [1.1, 12, 0.00, ..], "random_seq": [100.1, 50, 0.5, ..], "enhancer": [4, 3.0, 0.001, ..], "control": [0, 0, 0, ..] } } ]`
`name`	`string` - Required	Unique identifier for each prediction task, matching the task name sent by the Evaluator.	`"name": "task_for_model"`
`type_requested`	`string` - Required	Prediction type requested: [`"accessibility"`, `"binding_<molecule>"`, `"expression"`, `"conformation_{isoform}"`, like `"conformation_chromatin"`]. `"binding_<molecule>"` can be used for any type of binding assay (e.g., ChIP-seq, H3K27ac), and the text after the `_` must be all lowercase.	`"type_requested": "expression"`
`type_actual`	`array of string(s)` - Required	Prediction type(s) completed by the Predictor. In many cases this is the assay the model predicted. If multiple tracks were averaged in a multi-task model, they should all be listed here, e.g., `["dnase", "atac-seq"]`.	`"type_actual": ["dnase"]`
`cell_type_requested`	`string`- Required	Cell type requested by the Evaluator.	`"cell_type_requested": "HEPG2"`
`cell_type_actual`	`string` - Required	Cell type returned by the Predictor. The Predictor can choose to use the Matcher module, which returns the closest matching cell type that the Predictor supports.	`"cell_type_actual": "HEPG2"`
`species_requested`	`string` - Required	Species requested by the Evaluator.	`"species_requested": "homo_sapiens"`
`species_actual`	`string` - Required	Species used by the Predictor.	`"species_actual": "homo_sapiens"`
`scale_prediction_requested`	`string` - Optional	Scaling requested by the Evaluator for predictions: [`"linear"`, `"log"`].	`"scale_prediction_requested": "log"`
`scale_prediction_actual`	`string` - Optional	Scaling applied to the predictions by the Predictor, if any: [`"linear"`, `"log"`].	`"scale_prediction_actual": "log"`
`aggregation`	`object` - Optional	Contains information about how replicates, bins and/or tracks were aggregated. Values can be any descriptive string and Predictor builders only need to include those that they used.	`"aggregation": { "replicates": "mean", "bins": "mean", "tracks": "special mathematical formula" }`
`predictions`	`object` - Required	A mapping of sequence IDs to prediction values. Keys are sequence ID strings; values are scalars/floats for point requests, 1-D arrays of floats for track requests, or nested lists/numpy arrays (msgpack-numpy responses) for interaction matrix requests. The sequence ID keys are matched to the Evaluator sequence ID keys automatically by the Predictor.	`"predictions": { "seq1": [12.2, 5, 6, ..], "seq2": [1.1, 12, 0.00, ..], "random_seq": [100.1, 50, 0.5, ..], "enhancer": [4, 3.0, 0.001, ..], "control": [0, 0, 0, ..] }`
`trim_upstream`	`object` - Conditional	Returned only for `track` readout requests. A collection of key-value pairs mapping sequence IDs to integers. The integer specifies the number of base pairs in the first predicted bin that fall upstream of the actual sequence or requested `prediction_range`. Evaluators use this exact offset to align the model’s binned predictions back to the original genomic coordinates.	`"trim_upstream": { "seq1": 5, "seq2": 0, "random_seq": 2, "enhancer": 1, "control": 0 }`
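
As an illustration of the required keys listed in the table, the following is a minimal sketch of a response check an Evaluator or Predictor developer might run. The function name `find_schema_problems` and the example values are hypothetical and not part of the API; the sketch only checks key presence and two container types, not every constraint in the table.

```python
# Hypothetical helper: checks a Predictor response for the required keys
# named in the schema table. Not part of the Predictor API itself.
REQUIRED_TOP_LEVEL = ("predictor_name", "prediction_tasks")
REQUIRED_TASK_KEYS = (
    "name", "type_requested", "type_actual",
    "cell_type_requested", "cell_type_actual",
    "species_requested", "species_actual", "predictions",
)


def find_schema_problems(response):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in response:
            problems.append(f"missing top-level key: {key}")

    tasks = response.get("prediction_tasks", [])
    if not isinstance(tasks, list):
        problems.append("prediction_tasks must be an array of objects")
        return problems

    for i, task in enumerate(tasks):
        if not isinstance(task, dict):
            problems.append(f"task {i} is not an object")
            continue
        for key in REQUIRED_TASK_KEYS:
            if key not in task:
                problems.append(f"task {i} missing key: {key}")
        # type_actual is documented as an array of strings.
        if "type_actual" in task and not isinstance(task["type_actual"], list):
            problems.append(f"task {i}: type_actual must be an array")
        # predictions maps sequence IDs to values.
        if "predictions" in task and not isinstance(task["predictions"], dict):
            problems.append(f"task {i}: predictions must be an object")
    return problems


# Illustrative response; the values are made up for this sketch.
example_response = {
    "predictor_name": "deBoerTestModel_20260127-171101_PST",
    "bin_size": 1,
    "prediction_tasks": [
        {
            "name": "task1",
            "type_requested": "expression",
            "type_actual": ["expression"],
            "cell_type_requested": "K562",
            "cell_type_actual": "K562",
            "species_requested": "homo_sapiens",
            "species_actual": "homo_sapiens",
            "predictions": {"seq1": [12.2, 5.0, 6.0], "control": [0.0, 0.0, 0.0]},
        }
    ],
}
```

Calling `find_schema_problems(example_response)` returns an empty list for the example above; removing a required key yields one message per missing key.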

Notes

In development mode (outside a container), `_dev` is appended to the `predictor_name` instead of the build timestamp. Including the build timestamp ensures that every container rebuild produces a unique, sortable identifier, so Evaluators can distinguish between builds even when the model name has not changed.

This is especially important for Predictors that have undergone updates that led to different predictions.
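
The naming convention can be sketched as follows. This is an assumption-laden illustration, not the code in `config.py`: the function name `build_predictor_name` is hypothetical, and reading the build timestamp from `/.singularity.d/labels.json` is left out because the label key and its format are not given here. The sketch assumes the caller already has the build time and a timezone abbreviation, and that development mode yields `{ModelName}_dev`.

```python
from datetime import datetime


def build_predictor_name(model_name, build_time=None, tz_abbrev=None):
    """Format {ModelName}_{YYYYMMDD-HHMMSS}_{TZ}, or {ModelName}_dev.

    Hypothetical helper: build_time and tz_abbrev are assumed to have been
    extracted from the container labels by the caller. When either is
    missing (development mode, outside a container), "_dev" is used.
    """
    if build_time is None or tz_abbrev is None:
        return f"{model_name}_dev"
    stamp = build_time.strftime("%Y%m%d-%H%M%S")
    return f"{model_name}_{stamp}_{tz_abbrev}"


containerized = build_predictor_name(
    "deBoerTestModel", datetime(2026, 1, 27, 17, 11, 1), "PST"
)
development = build_predictor_name("deBoerTestModel")
```

Because the timestamp is written most-significant field first, names built this way sort chronologically within a single model and timezone.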

Note on Binned Predictions and Sequence-Length Alignment

Predictors that return binned predictions often include “N” bases in flanking bins. These can skew results when performing base-pair (bp)–level evaluation.

When an Evaluator requests a track readout:

The expanded bp-level prediction (for binned outputs) must match the length of the input sequence.
The start of a prediction should be aligned with the first bp of the sequence.
By default, if no `trim_upstream` parameter is returned, the Evaluator should crop the predictions only at the downstream end.
If a `trim_upstream` parameter is returned, the Evaluator should:
1. Crop upstream by the amount specified in `trim_upstream`.
2. Crop the remaining required amount downstream to ensure the final prediction length equals the sequence length.

This ensures consistent evaluation and avoids artifacts introduced by binned predictions.
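
The cropping rules above can be sketched for a single sequence as follows. The function name `align_track` is hypothetical, and the sketch assumes that a binned prediction is expanded to bp level by repeating each bin value `bin_size` times; a Predictor or Evaluator may expand bins differently.

```python
def align_track(binned, bin_size, seq_len, trim_upstream=0):
    """Expand binned predictions to bp level and crop to the sequence length.

    Hypothetical helper. Assumes each bin value applies to every base pair
    in that bin, so expansion is simple repetition.
    """
    if trim_upstream < 0:
        raise ValueError("trim_upstream must be non-negative")

    # Expand: one value per base pair.
    expanded = [value for value in binned for _ in range(bin_size)]

    # Step 1: crop upstream by trim_upstream (0 when none was returned).
    cropped = expanded[trim_upstream:]

    if len(cropped) < seq_len:
        raise ValueError("prediction is shorter than the sequence after cropping")

    # Step 2: crop the remainder downstream so the length equals seq_len.
    return cropped[:seq_len]


# Three bins of 4 bp cover 12 bp; the sequence is 9 bp long and the first
# bin starts 2 bp upstream of it.
with_trim = align_track([1.0, 2.0, 3.0], bin_size=4, seq_len=9, trim_upstream=2)
# Without trim_upstream, only the downstream end is cropped.
default_crop = align_track([1.0, 2.0, 3.0], bin_size=4, seq_len=9)
```

In this example `with_trim` keeps two base pairs of the first bin, all four of the second, and three of the third, while `default_crop` keeps the first bin whole and drops three base pairs from the downstream end.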
