Predictor API Schema
Predictor response
| Key | Value type (Required/Optional) | Description and value options | Example |
|---|---|---|---|
| | | Unique identifier for the Predictor. Constructed automatically in | |
| | | If a Matcher was used by the Predictor, the | |
| | | Resolution of the model's predictions. | |
| | | Each object must contain the following keys: | `"prediction_tasks": [` |
| | | Unique identifier for each prediction task, matched from the Evaluator. | |
| | | Prediction type requested: [ | |
| | | Prediction type(s) completed by the Predictor. In many cases this is the assay the model predicted. If multiple tracks were averaged in a multi-task model, list all of them here, e.g. `["dnase", "atac-seq"]`. | |
| | | Cell type requested by the Evaluator. | |
| | | Cell type returned by the Predictor. The Predictor can choose to use the Matcher module, which returns the closest matching cell type available to the Predictor. | |
| | | Species requested by the Evaluator. | |
| | | Species used by the Predictor. | |
| | | Scaling requested by the Evaluator for the predictions: `["linear", "log"]`. | |
| | | How the Predictor scaled the predictions, if at all: `["linear", "log"]`. | |
| | | Information about how replicates, bins, and/or tracks were aggregated. Values can be any descriptive string, and Predictor builders only need to include the entries they used. | `"aggregation": {` |
| | | Object of key-value pairs where keys are strings and values are arrays of floats/integers. Each array of predictions can be a single value, a list of values for track predictions, or nested lists (numpy arrays for msgpack-numpy responses). We suggest encoding interaction matrices as numpy arrays. The sequence ID keys are matched automatically to the Evaluator's sequence ID keys by the Predictor. | `"predictions": {` |
| | | Returned only for | `"trim_upstream": {` |
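To make the shape of a prediction task more concrete, here is a minimal Python sketch of one entry in `prediction_tasks`: a multi-task output averaged across two tracks, an `aggregation` note describing that averaging, and a check that the `predictions` keys match the Evaluator's sequence IDs. Only the keys `prediction_tasks`, `aggregation`, and `predictions` come from the schema table; the other required task keys are omitted because their names are not given here. The helpers `average_tracks` and `check_sequence_ids`, the sequence ID `seq_1`, and the aggregation entry `"tracks"` are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple, Union

# A single prediction: one value, a list of per-bin values, or nested lists.
Prediction = Union[float, int, List]


def average_tracks(tracks: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length tracks bin by bin, e.g. DNase and ATAC-seq heads."""
    if not tracks:
        raise ValueError("need at least one track")
    n_bins = len(tracks[0])
    if any(len(track) != n_bins for track in tracks):
        raise ValueError("all tracks must have the same number of bins")
    return [sum(values) / len(tracks) for values in zip(*tracks)]


def check_sequence_ids(
    predictions: Dict[str, Prediction], evaluator_ids: Sequence[str]
) -> Tuple[Set[str], Set[str]]:
    """Hypothetical helper: return (missing, unexpected) sequence IDs."""
    returned = set(predictions)
    requested = set(evaluator_ids)
    return requested - returned, returned - requested


# Hypothetical outputs of two heads of a multi-task model for one sequence.
dnase_head = [1.0, 2.0, 3.0]
atac_head = [3.0, 4.0, 5.0]

task = {
    # The other required task keys from the table are omitted in this sketch.
    "aggregation": {"tracks": "mean of dnase and atac-seq heads"},  # hypothetical entry
    "predictions": {"seq_1": average_tracks([dnase_head, atac_head])},
}
response = {"prediction_tasks": [task]}

missing, unexpected = check_sequence_ids(task["predictions"], ["seq_1"])
print(missing, unexpected)  # prints: set() set()
```

Because the aggregation values are free-form strings, the entry above is only one way a Predictor builder might describe averaging two tracks.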
Note on Binned Predictions and Sequence-Length Alignment
Predictors that return binned predictions often include “N” bases in flanking bins. These can skew results when performing base-pair (bp)–level evaluation.
When an Evaluator requests a track readout:

- The expanded bp-level prediction (for binned outputs) must match the length of the input sequence.
- The start of the prediction should be aligned with the first bp of the sequence.

By default, if no `trim_upstream` parameter is returned, the Evaluator should crop the predictions only at the downstream end. If a `trim_upstream` parameter is returned, the Evaluator should:

1. Crop upstream by the amount specified in `trim_upstream`.
2. Crop the remaining required amount downstream so that the final prediction length equals the sequence length.
This ensures consistent evaluation and avoids artifacts introduced by binned predictions.
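The cropping rules above can be sketched in Python as follows. This is a minimal illustration, not the Evaluator's actual implementation. It assumes that each bin is expanded to bp level by simple repetition over `bin_size` bp, and that `trim_upstream` is a single non-negative bp count for one sequence (the schema example shows `trim_upstream` as an object, and its exact layout and units are not given here). `expand_bins` and `align_to_sequence` are hypothetical names.

```python
from typing import List, Optional, Sequence


def expand_bins(binned: Sequence[float], bin_size: int) -> List[float]:
    """Repeat each bin value bin_size times to get a bp-level track."""
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    return [value for value in binned for _ in range(bin_size)]


def align_to_sequence(
    binned: Sequence[float],
    bin_size: int,
    seq_len: int,
    trim_upstream: Optional[int] = None,
) -> List[float]:
    """Crop an expanded binned prediction so it matches the input sequence length.

    Assumes trim_upstream is a single bp count for this sequence.
    """
    bp_track = expand_bins(binned, bin_size)
    # No trim_upstream returned: crop only at the downstream end.
    start = 0 if trim_upstream is None else trim_upstream
    if start < 0:
        raise ValueError("trim_upstream must be non-negative")
    end = start + seq_len
    if end > len(bp_track):
        raise ValueError("prediction is too short to cover the sequence")
    # Drop `start` bp upstream, then everything past seq_len downstream.
    return bp_track[start:end]


print(align_to_sequence([1, 2, 3, 4], bin_size=4, seq_len=10, trim_upstream=3))
# prints: [1, 2, 2, 2, 2, 3, 3, 3, 3, 4]
```

With four 4-bp bins (16 bp) and a 10-bp sequence, `trim_upstream=3` keeps 0-based positions 3 through 12 of the expanded track; with no `trim_upstream`, the first 10 bp are kept and the last 6 are cropped downstream.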