Training datasets
These are labeled datasets that allow participants to train their algorithms. The datasets are labeled with background and possible anomalies. The example anomalies need not be identical to those in the test dataset, and the training background need not exactly match that of the final problem. The goal is to familiarize participants with the dataset format and with how to process the information.
Testing/evaluation datasets
We provide a private testing dataset that allows participants to test their models before submission. This dataset includes additional background events and pre-hidden anomalies. The evaluation metric for anomaly detection is the false positive rate (FPR) at a specified true positive rate (TPR), the same metric used for the overall evaluation on the test dataset.
Evaluation datasets
The evaluation metric is the false positive rate (FPR) at a specified true positive rate (TPR) when detecting anomalies. Please see the individual challenges for more details on the evaluation. Submission scoring code is provided in each challenge's GitHub repository. Model evaluation will be performed with resources at the NERSC computing center at Lawrence Berkeley National Laboratory. This means that submission results will not be immediately available, as submissions must be reviewed before being run (this may take a day or two). The leaderboard will be updated on a timescale of a few days to accommodate the compute required to assess submissions.
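For orientation, the sketch below shows one way to compute the FPR at a specified TPR from anomaly scores using scikit-learn. This is not the official scoring code (that is provided in each challenge's GitHub repository); the function name, the example target TPR of 0.9, and the toy data are illustrative assumptions only.

```python
# Illustrative sketch only (not the official scoring code): computing the
# false positive rate (FPR) at a specified true positive rate (TPR).
import numpy as np
from sklearn.metrics import roc_curve

def fpr_at_tpr(y_true, scores, target_tpr=0.9):
    """Return the FPR at the first ROC point whose TPR reaches target_tpr.

    y_true: array of 0 (background) / 1 (anomaly) labels.
    scores: array of anomaly scores (higher = more anomalous).
    target_tpr: working point; the challenge specifies the actual value.
    """
    fpr, tpr, _ = roc_curve(y_true, scores)
    # tpr from roc_curve is non-decreasing; locate the first point that
    # reaches the requested TPR and report the corresponding FPR.
    idx = np.searchsorted(tpr, target_tpr, side="left")
    idx = min(idx, len(fpr) - 1)
    return fpr[idx]

# Toy usage with randomly generated labels and scores.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=1000)
    scores = labels * 0.5 + rng.normal(0.0, 0.5, size=1000)
    print(f"FPR at TPR=0.9: {fpr_at_tpr(labels, scores, 0.9):.3f}")
```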