Sequencing saturation measures how completely the captured fragments in a library have been sequenced. It is calculated from the redundancy of sequenced fragments that carry a valid barcode and unique molecular identifier (UMI) and map uniquely to the genome. A value close to 1 means most of these fragments are duplicates of molecules already observed, so additional sequencing would recover few new molecules. The formula for sequencing saturation is as follows:

 

Sequencing Saturation = 1 - (non-duplicated_unique_mapped_reads / total_unique_mapped_reads)
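
As a worked example of the formula, the sketch below computes the saturation from the two counts. The function name saturation and the counts 1000 and 250 are hypothetical, chosen only for illustration; they are not values produced by Mobivision.

```shell
# Hypothetical helper: compute sequencing saturation from the two counts.
#   $1 = total_unique_mapped_reads
#   $2 = non-duplicated_unique_mapped_reads
saturation() {
    awk -v total="$1" -v uniq="$2" 'BEGIN {
        # Guard against division by zero when no reads were counted.
        if (total <= 0) { print "NA"; exit }
        printf("%.4f\n", 1 - uniq / total)
    }'
}

# Illustrative counts only: 1000 fragments, 250 of them non-duplicated.
saturation 1000 250   # prints 0.7500
```

With these illustrative numbers, 75% of the uniquely mapped fragments are duplicates of a barcode/UMI combination that was already seen.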

 

In the BAM file generated by Mobivision Quantify, fragments with MAPQ=255 are those that map uniquely to a single region of the genome. To calculate the values needed for the sequencing saturation formula:

 

total_unique_mapped_reads: Count the number of sequenced fragments with valid barcodes and UMIs among the fragments with MAPQ=255.

 

non-duplicated_unique_mapped_reads: Count the number of distinct barcode/UMI combinations among the fragments with MAPQ=255 that have valid barcodes and UMIs, so that fragments sharing the same barcode and UMI are counted only once.

 

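Here is an example command that prints the two counts, total_unique_mapped_reads and non-duplicated_unique_mapped_reads, separated by a comma. As written, it treats records with exactly 16 fields as those carrying both a valid barcode and a UMI, and uses fields 15 and 16 as the barcode/UMI pair:


samtools view -q 255 Aligned.bam | gawk '{if (NF==16) {total_reads+=1; umi[$15,$16]++}} END {printf("%s,%s\n", total_reads, length(umi))}'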
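
The counting and the formula can also be combined in one step. The following is a minimal sketch, not part of Mobivision: saturation_from_sam is a hypothetical helper name, and it keeps the same assumption as the command above (records with exactly 16 fields carry the barcode and UMI in fields 15 and 16). Unlike the command above, it splits records on tabs only, on the assumption that the SAM text is tab-delimited, and it counts distinct pairs with a plain counter so it does not depend on gawk.

```shell
# Hypothetical helper: read SAM records on stdin and print
#   total_unique_mapped_reads,non-duplicated_unique_mapped_reads,saturation
# Assumption (taken from the command above): records with exactly 16 fields
# carry a valid barcode and UMI in fields 15 and 16.
saturation_from_sam() {
    awk -F '\t' '
        NF == 16 {
            total++
            # Count each barcode/UMI pair only the first time it is seen.
            if (!seen[$15, $16]++) uniq++
        }
        END {
            # No qualifying records: saturation is undefined.
            if (total == 0) { print "0,0,NA"; exit }
            printf("%d,%d,%.4f\n", total, uniq, 1 - uniq / total)
        }'
}
```

Assuming samtools is installed and Aligned.bam is the BAM file from Mobivision Quantify, it would be used as: samtools view -q 255 Aligned.bam | saturation_from_sam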