In recent years, the microbiome field has shifted from clustering sequences into operational taxonomic units (OTUs) by sequence similarity to denoising algorithms that resolve exact amplicon sequence variants (ASVs), and methods have been developed to identify contaminating bacterial DNA sequences in low-biomass samples. Although these methods improve accuracy when analyzing mock communities, their impact on real samples and on downstream analyses of biological associations is less clear.
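To make the OTU/ASV distinction concrete, the toy Python sketch below contrasts greedy similarity clustering with exact-sequence features. It is not the algorithm used by Qiime1 or Qiime2. The 97% identity threshold, the naive position-wise identity (no alignment), and the example reads are illustrative assumptions. Real denoisers (for example DADA2 or Deblur) also model sequencing errors before reporting variants, and this sketch omits that step.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Toy identity: fraction of matching positions between equal-length reads (no alignment)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Greedy centroid clustering (assumed 97% threshold): unique sequences, taken in
    decreasing abundance, join the first centroid they match at >= threshold identity;
    otherwise they seed a new OTU. Returns {centroid: read count}."""
    counts = Counter(reads)
    otu_sizes = {}
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in otu_sizes:
            if identity(seq, centroid) >= threshold:
                otu_sizes[centroid] += n
                break
        else:
            otu_sizes[seq] = n
    return otu_sizes


def exact_variants(reads):
    """Exact-sequence features: every distinct sequence is its own variant (no error model)."""
    return dict(Counter(reads))


# Hypothetical reads: a one-mismatch variant (39/40 = 97.5% identity) and an unrelated sequence.
base = "ACGT" * 10
variant = "ACGT" * 9 + "ACGA"
distant = "TTTT" * 10
reads = [base] * 5 + [variant] * 2 + [distant] * 3

print(len(greedy_otus(reads)), len(exact_variants(reads)))  # 2 3
```

Clustering merges the near-identical variant into its abundant centroid, while exact-sequence features keep it separate. This is why the two approaches can report different feature counts for the same reads.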
Here, we re-processed our recently published milk microbiota data using Qiime1 to identify OTUs and Qiime2 to identify ASVs, with or without contaminant removal using decontam. Qiime2 resolved the mock community more accurately, primarily because Qiime1 failed to detect Lactobacillus. Qiime2 also detected considerably fewer features per human milk sample on average (364±145 OTUs vs. 170±73 ASVs, p<0.001). In contrast to richness, the estimated diversity measures had a similar range with both methods, albeit statistically different (inverse Simpson index: 14.3±8.5 vs. 15.6±8.7, p=0.031), and the relative abundances of the most abundant bacterial taxa, including Staphylococcaceae and Streptococcaceae, showed strong consistency and agreement. One notable exception was Oxalobacteraceae, which was overrepresented with Qiime1 regardless of contaminant removal. The choice of algorithm did not affect the direction, strength, or significance of associations between host factors and bacterial diversity or overall community composition in downstream statistical analyses.
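Richness counts every observed feature. The inverse Simpson index, 1 / Σ p_i², where p_i is the relative abundance of feature i, is dominated by the most abundant features. The short Python sketch below computes both from a vector of feature counts. The counts are invented for illustration and are not data from this study. Adding ten singleton features raises richness from 3 to 13, while the inverse Simpson index changes far less (about 2.63 to 3.18). This is one plausible reason why diversity estimates can stay in a similar range when feature counts differ, offered here as an illustration rather than as the study's explanation.

```python
def richness(counts):
    """Number of features with a nonzero count."""
    return sum(1 for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index: 1 / sum(p_i^2), with p_i = count_i / total."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


# Hypothetical sample with three abundant features, then the same sample plus ten singletons.
dominant = [50, 30, 20]
with_rare = dominant + [1] * 10

print(richness(dominant), round(inverse_simpson(dominant), 2))    # 3 2.63
print(richness(with_rare), round(inverse_simpson(with_rare), 2))  # 13 3.18
```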
Overall, the biological observations and conclusions were robust to the choice of sequence processing method and to contaminant removal.