The data is divided into “Training” and “Challenge” datasets.

Training: The purpose of the Training dataset is to help method developers refine and improve their performance by providing the truth to them when the challenge is launched. Participants can receive immediate evaluation of their submission for the training datasets on Mosaic and use the built-in analytics to dive deeper into how their methods perform against other submissions.

Challenge: The Challenge dataset is released without the truth, and the final evaluation of the winner(s) of the challenge will be based solely on the Challenge dataset.

|                     | Training | Challenge | Format |
|---------------------|----------|-----------|--------|
| Metagenomic Samples | 4 metagenomes: 2 newly sequenced samples and 2 publicly available “decoy” samples | 4 metagenomes: 4 newly sequenced with an undisclosed number of “decoy” samples | FASTQ |
| Genome Sets         | 40 genomes | 40 genomes | multi-FASTA |
| Truth Tables        | Made available at launch | Available only after challenge finishes | Tab-delimited text files |


  1. Metagenomic Samples
    1. Raw metagenomic FASTQ samples (Illumina HiSeq4000 paired-end). There are 4 samples from different donors in each of the Training and Challenge datasets.
      • Training dataset:
        • 2 unpublished human fecal metagenomics samples
        • 2 published “decoy” human fecal metagenomics samples
      • Challenge dataset:
        • 4 unpublished human fecal metagenomics samples with an undisclosed number of decoy samples.
    2. “Decoy” samples are “negative control” samples: no strains provided in the genome datasets should map to them.
  2. Genome Datasets
    1. The Training and Challenge datasets each include 40 genome assemblies (FASTA). These genomes can be:
      • “True Positive” strains that were isolated from one of the samples, i.e. present in the sample
      • “Negative Controls” strains that were not present in any of the samples
  3. Truth Tables
    1. Matching table showing which genomes were present in which samples or if they were included as negative controls
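Because no provided genome should map to a decoy sample, a quick sanity check on Training results is to look for genomes called present in a known decoy. The following is a minimal sketch only: the function name, the genome IDs, the 0-based decoy indices, and the 0.5 presence threshold are all illustrative assumptions, not part of the challenge specification.

```python
def decoy_hits(calls, decoy_samples, threshold=0.5):
    """Return (genome_id, sample_index) pairs called present in a decoy sample.

    calls: dict mapping genome_id -> list of per-sample scores (binary or probabilistic).
    decoy_samples: 0-based indices of samples known to be decoys (hypothetical input).
    threshold: score at or above which a genome counts as "present" (an assumption).
    """
    hits = []
    for genome_id, scores in calls.items():
        for idx in decoy_samples:
            if scores[idx] >= threshold:
                hits.append((genome_id, idx))
    return hits


# Hypothetical genome IDs and scores, for illustration only.
example_calls = {
    "genome_A": [1, 0, 0, 0],
    "genome_B": [0, 0, 1, 0],
    "genome_C": [0, 0, 0, 0],
}
# Treat the third and fourth samples as decoys in this made-up example.
print(decoy_hits(example_calls, decoy_samples=[2, 3]))
```

Any pair reported here would be a false positive by definition, since decoys contain none of the provided strains.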

You can choose to work on the challenge in two ways:

  • Option A
    • Create a Mosaic app: see tutorial
    • Run app on datasets
  • Option B
    • Download the entire Training dataset as a tarball
    • Download the entire Challenge dataset as a tarball
    • Run your method in your own system
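For Option B, unpacking a downloaded tarball locally might look like the sketch below. It uses only Python's standard `tarfile` module; the function name and the file and directory names in the usage comment are hypothetical, since the actual archive names are not given here.

```python
import tarfile
from pathlib import Path


def extract_dataset(tarball_path, dest_dir):
    """Extract a dataset tarball into dest_dir and return the extracted file paths."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball_path, "r:*") as tar:
        # Refuse archive members that would be written outside dest_dir.
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    return sorted(p for p in dest.rglob("*") if p.is_file())


# Usage (hypothetical file and directory names):
# files = extract_dataset("training_dataset.tar.gz", "data/training")
```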

The Testing Ground provides a space for you to evaluate the performance of your analyses before submitting entries to the Challenge. Additionally, in this space you will be able to visualize your Training analysis submissions and compare them with those of other users.

You can submit the results of your Challenge dataset analyses on the Challenge submission page. Here are some considerations to keep in mind:

  • Submission for the Challenge works exactly as it does for the Testing Ground.
  • The expected submission file format is the same as in the Testing Ground.
  • Unlike in the Testing Ground, after you submit to the Challenge through this page you will not be able to see the performance of your submission until the Strains #2 challenge has ended.
  • You will, however, have access to some submission details, such as:
    • Submission name
    • Submission ID
    • Submitted As
    • Submitted Date
    • Status
    • Names and IDs of submitted files
  • If the evaluation of your submission fails, a Mosaic administrator will contact you to provide more details and assist you in resolving the issue.

After the challenge ends, everyone who submitted to the challenge will get access to the evaluation metrics of their own submissions. These metrics will not include rankings. The official results of the Challenge will be announced after this process is complete.

If you have any questions, please feel free to contact us.

The file to be submitted for the Training and Challenge parts is a tab-delimited table with 5 columns, as shown in the figure:

  • Column 1: genome_id as given in the “reference genomes” file provided for this challenge.
  • Columns 2-5: presence/absence of the genome in Samples 1 to 4, respectively, in one of two possible formats:
    • Binary: 1 indicating presence (“true positive”) or 0 indicating absence (“negative control”)
    • Probabilistic: a floating-point number indicating the probability that the strain is from a particular sample. This allows a precision-recall curve to be calculated for the submission.

Example where presence/absence is indicated with one (1) and zero (0), respectively:

Example where presence/absence is indicated with a fractional number between 0 and 1:
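As a rough illustration of the 5-column layout, the sketch below writes both a binary and a probabilistic table with Python's standard `csv` module. The genome IDs, values, and output file names are made up, and writing no header row is an assumption, since the text here does not say whether one is expected.

```python
import csv


def write_submission(path, calls, n_samples=4):
    """Write genome calls as a tab-delimited table: genome_id, then one value per sample."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for genome_id, scores in calls.items():
            if len(scores) != n_samples:
                raise ValueError(f"{genome_id}: expected {n_samples} values, got {len(scores)}")
            if any(not 0 <= s <= 1 for s in scores):
                raise ValueError(f"{genome_id}: values must lie between 0 and 1")
            # Column 1 is the genome_id; columns 2-5 are Samples 1 to 4.
            writer.writerow([genome_id, *scores])


# Hypothetical genome IDs and values, for illustration only.
binary_calls = {"genome_A": [1, 0, 0, 0], "genome_B": [0, 0, 0, 0]}
probabilistic_calls = {
    "genome_A": [0.92, 0.03, 0.01, 0.04],
    "genome_B": [0.05, 0.1, 0.02, 0.08],
}

write_submission("submission_binary.tsv", binary_calls)
write_submission("submission_probabilistic.tsv", probabilistic_calls)
```

Validating the row width and value range before writing catches the most likely formatting mistakes locally, before a submission reaches the Evaluator.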

The results of analyses of the Training Dataset and the Challenge Datasets will be evaluated by the Strains #2 Evaluator App. The Evaluator compares the submitted results with the appropriate truth datasets. In the case of the Training Dataset, these evaluations are real-time and become available at the successful completion of the evaluation. For the Challenge Dataset, the results will become available after the end of the challenge.

If you would prefer to use the Evaluator without submitting to the Testing Ground, you may run the Evaluator independently and then view the raw results. Since the Challenge Truth is hidden, there's no option to run the Evaluator for Challenge Results.

The overall performance of a submission is evaluated using the Adjusted Rand Index metric.
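For readers unfamiliar with the Adjusted Rand Index, here is a minimal pure-Python sketch of the standard formula applied to two label vectors. How the Evaluator App turns a submission into labels is not described here; labelling each genome with the single sample it is assigned to (or "neg" for a negative control) is only one plausible reading and is an assumption, as are the example labels.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree on

    # Pairs of items grouped together in both labelings, and in each one separately.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one group
    return (sum_cells - expected) / (maximum - expected)


# Hypothetical labels: one entry per genome, naming its sample or "neg".
truth = ["S1", "S1", "S2", "S2", "neg", "neg"]
predicted = ["S1", "S1", "S2", "neg", "neg", "neg"]
print(adjusted_rand_index(truth, predicted))
```

A value of 1 means the two labelings agree perfectly, while values near 0 indicate agreement no better than chance.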

Additionally, the following metrics are calculated and presented in each submission's detail page:

  • F1 SCORE
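As a reminder of what the F1 score measures, the sketch below computes it over every genome/sample cell of a submission. Pooling all cells into one count, treating scores of 0.5 or more as "present", and the example data are assumptions made for illustration; the Evaluator App may aggregate differently.

```python
def f1_score(truth, calls, threshold=0.5):
    """F1 over all genome/sample cells.

    truth: dict genome_id -> list of 0/1 values per sample.
    calls: dict genome_id -> list of binary or probabilistic values per sample.
    Genomes missing from calls are treated as absent everywhere (an assumption).
    """
    tp = fp = fn = 0
    for genome_id, true_row in truth.items():
        pred_row = calls.get(genome_id, [0] * len(true_row))
        for actual, score in zip(true_row, pred_row):
            predicted = score >= threshold
            if predicted and actual:
                tp += 1
            elif predicted and not actual:
                fp += 1
            elif actual and not predicted:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical truth and submission, for illustration only.
example_truth = {"genome_A": [1, 0, 0, 0], "genome_B": [0, 1, 0, 0], "genome_C": [0, 0, 0, 0]}
example_calls = {
    "genome_A": [0.9, 0.1, 0.0, 0.0],
    "genome_B": [0.2, 0.4, 0.0, 0.0],
    "genome_C": [0.0, 0.0, 0.7, 0.0],
}
print(f1_score(example_truth, example_calls))
```

Sweeping `threshold` over a range of values and recording precision and recall at each step is the usual way to trace a precision-recall curve from probabilistic submissions.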