Modelling metabarcoding count data
This project was funded by NERC (project NE/T010045/1) as part of the Landscape Decisions theme.
It was a collaboration across four universities: the University of Kent (Dr Eleni Matechou, Professor Richard Griffiths, and Dr Alex Diana as the PDRA), UCL (Professor Jim Griffin), the University of East Anglia (Professor Douglas Yu), and Lancaster University (Dr Alex Bush).
Summary
This work developed a unified hierarchical model for metabarcoding read count data that links observed reads to underlying species biomass through the full data-generation process. The model explicitly accounts for variation and error at three stages: biomass availability at sites, biomass collection into samples, and laboratory analysis through PCR.
A key contribution is the ability to separate ecological signal (true variation in DNA biomass) from observation processes, including sampling inefficiencies, PCR bias, and both false-positive and false-negative errors. Because these components are modelled jointly, the approach supports inference on relative DNA concentration across species and sites, rather than relying on raw read counts.
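As a rough illustration of the three stages and the error types described above, the following Python sketch simulates read counts from a toy data-generation process. It is not the project's model: the function name simulate_reads, the Gaussian noise on the log scale, the multinomial-style read allocation, and every parameter value (dropout and contamination probabilities, sequencing depth) are assumptions chosen only to make the structure concrete.

```python
import math
import random

def simulate_reads(n_species=3, n_sites=2, samples_per_site=2, pcr_reps=2,
                   depth=1000, p_dropout=0.1, p_contam=0.05, seed=1):
    """Toy simulation of metabarcoding reads; all parameters are illustrative."""
    rng = random.Random(seed)
    species = list(range(n_species))
    # Species-specific log amplification efficiency (a stand-in for PCR bias).
    log_eff = [rng.gauss(0.0, 0.5) for _ in species]
    records = []
    for site in range(n_sites):
        # Stage 1: log DNA biomass of each species available at the site.
        log_site = [rng.gauss(0.0, 1.0) for _ in species]
        for sample in range(samples_per_site):
            # Stage 2: noisy collection of that biomass into a sample.
            log_sample = [b + rng.gauss(0.0, 0.3) for b in log_site]
            for rep in range(pcr_reps):
                # Stage 3: PCR. Amplified amounts set the read proportions;
                # a species may fail to amplify at all (false negative).
                weights = [
                    0.0 if rng.random() < p_dropout
                    else math.exp(log_sample[s] + log_eff[s])
                    for s in species
                ]
                reads = [0] * n_species
                if sum(weights) > 0:
                    # Allocate a fixed number of reads in proportion to weights.
                    for s in rng.choices(species, weights=weights, k=depth):
                        reads[s] += 1
                # Contamination can add a few reads to any species,
                # including one that failed to amplify (false positive).
                for s in species:
                    if rng.random() < p_contam:
                        reads[s] += rng.randint(1, 10)
                records.append({"site": site, "sample": sample,
                                "pcr_rep": rep, "reads": reads})
    return records

for row in simulate_reads()[:3]:
    print(row)
```

Because the reads in each PCR replicate are allocated in proportion to the amplified amounts, raw counts in this sketch carry only relative information and are distorted by the efficiency term, which is the kind of confounding a joint model is meant to untangle.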
The framework also incorporates species interactions and experimental features such as spike-ins, providing a principled way to quantify uncertainty and improve comparability across samples and studies.
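To give a sense of why spike-ins help comparability, the sketch below computes a simple log-ratio of each species' reads to the reads of a spike-in added in a known, constant amount. This is a deliberately naive normalisation, not the hierarchical treatment used in the project; the function name spike_in_log_ratios, the species labels, and the pseudocount are hypothetical.

```python
import math

def spike_in_log_ratios(reads, spike_reads, pseudocount=0.5):
    """Log-ratio of each species' reads to the spike-in reads.

    The pseudocount (an illustrative choice) keeps the logarithm defined
    when a count is zero; it must be positive if any count can be zero.
    """
    if spike_reads < 0 or any(r < 0 for r in reads.values()):
        raise ValueError("read counts must be non-negative")
    return {
        name: math.log((r + pseudocount) / (spike_reads + pseudocount))
        for name, r in reads.items()
    }

# Two hypothetical samples sequenced at different depths.
shallow = spike_in_log_ratios({"species_a": 200, "species_b": 50}, spike_reads=100)
deep = spike_in_log_ratios({"species_a": 2000, "species_b": 500}, spike_reads=1000)
print(shallow)
print(deep)
```

The two samples differ tenfold in raw counts but give nearly identical log-ratios, since the spike-in provides a common reference. A ratio like this ignores PCR bias and the error processes, which is why the framework treats spike-ins inside the full model rather than as a standalone correction.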
