Taxonomy Data Quality Assurance

We strive to maximize data quality and consistency by implementing several best practices, including:

Data processing and sample verification – We enter all sample, site, and invertebrate data directly into a cloud-based PostgreSQL database. Agencies can submit sample and site data directly to our lab via an online submission process. Before accepting samples into the processing queue, we check these data for common issues, such as mislabeled jars, unlikely sampled area values, and potentially erroneous coordinates. We contact agencies when we identify anomalies in the submitted sample collection data.
Taxonomic effort conformity – Our taxonomic staff uses the Southwest Association of Freshwater Invertebrate Taxonomists (SAFIT) level 2a to guide our target taxonomic resolution. This means we identify insects to the genus/species level, except midges, which we keep at the sub-family level. We target non-insect taxa at a variety of resolution levels, depending on the taxonomic group.
Taxonomic certification and workshops – Our taxonomists have many years of experience and are certified by the Society for Freshwater Science in the identification of Western EPT taxa and General Arthropods. Our taxonomists also attend regional taxonomic seminars offered by the Northwest Bioassessment WorkGroup, the Southwest Association of Freshwater Invertebrate Taxonomists, and universities within our region. Additionally, we send problem specimens to experts or bring them to taxonomy workshops for verification.
Regular and consistent communication among all taxonomists – Our taxonomists share questionable specimens with all NAMC taxonomists and compare them to voucher specimens from our reference collection. If we cannot reach consensus, we leave the taxonomic resolution at the coarser level (e.g., family rather than genus), and we set aside the specimen for later verification by an expert.
Review the composite list of taxa for each set of samples – We review composite taxa lists from entire sample sets for taxonomic consistency across samples. We also review consistency among taxonomists’ identifications for rare taxa, invasive taxa, or taxa not typically found in the habitat or region sampled. We focus particular attention on taxa found at low frequencies within a set or identified by only a single taxonomist. Our entire taxonomic staff re-examines these organisms to ensure the accuracy of identifications.
Re-identification and enumeration – We reprocess a minimum of 2% of all samples within a calendar year. Processing a sample twice allows us to detect both isolated (single occurrences) and systematic (multiple, regular occurrences) taxonomic errors, either of which triggers corrective action. We calculate three metrics to determine similarity between pairs of taxonomic lists for a sample: 1) Jaccard similarity, 2) Percent difference in enumeration (PDE), and 3) Percent taxonomic difference (PTD). We use the Jaccard coefficient (JC) to quantify the similarity between taxonomists’ identifications:
B = (∑ |X_ij - X_ik|) / (∑ |X_ij + X_ik|)

Where X_ij and X_ik are the numbers of individuals of species i in samples j and k, respectively, and the sums run over all n species in the two samples. Our minimum quality objective value for similarity using this measure is 95%. We also calculate the percent difference in enumeration using:

PDE = (|n₁ − n₂| / (n₁ + n₂)) × 100

Where n₁ and n₂ are the final counts obtained by the two taxonomists. Our minimum quality objective for PDE is a difference of no more than 5%. Additionally, we calculate the percent taxonomic difference using:

PTD = (1 − [a / N]) × 100

Where a is the number of agreements between taxonomic lists and N is the total number of individuals identified. Our minimum quality objective for PTD is a difference of no more than 5%. Although our taxonomists typically achieve our minimum quality objectives, we scrutinize any sample that does not meet them. We compare side-by-side taxa lists for every QC sample that falls short of our quality goals. We then re-examine all specimens with identification discrepancies until we reach a taxonomic consensus.
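The three metrics above can be sketched in a few lines of Python. This is a minimal illustration, not NAMC's actual code: the function names are hypothetical, and taxa lists are assumed to be dictionaries mapping taxon name to count. The first function implements the formula for B exactly as written above (a form often called the Bray-Curtis dissimilarity); reporting 1 − B as the similarity is an assumption. The helper that counts agreements taxon by taxon, and the choice of the larger of the two totals for N in the example, are also assumptions, since the text does not spell these out.

```python
def abundance_dissimilarity(counts_j, counts_k):
    """B = sum|X_ij - X_ik| / sum(X_ij + X_ik), summed over all taxa in either list."""
    taxa = set(counts_j) | set(counts_k)
    numerator = sum(abs(counts_j.get(t, 0) - counts_k.get(t, 0)) for t in taxa)
    denominator = sum(counts_j.get(t, 0) + counts_k.get(t, 0) for t in taxa)
    if denominator == 0:
        raise ValueError("both taxa lists are empty")
    return numerator / denominator


def percent_difference_in_enumeration(n1, n2):
    """PDE = |n1 - n2| / (n1 + n2) * 100, where n1 and n2 are the final counts."""
    if n1 + n2 == 0:
        raise ValueError("both counts are zero")
    return abs(n1 - n2) / (n1 + n2) * 100


def count_agreements(counts_j, counts_k):
    """Assumed definition of 'a': individuals matched taxon by taxon (minimum count)."""
    return sum(min(n, counts_k.get(t, 0)) for t, n in counts_j.items())


def percent_taxonomic_difference(agreements, total):
    """PTD = (1 - a / N) * 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (1 - agreements / total) * 100


# Made-up counts for one sample processed by two taxonomists.
original = {"Baetis": 40, "Drunella": 10, "Chironominae": 50}
recount = {"Baetis": 38, "Drunella": 10, "Chironominae": 48, "Ephemerella": 2}

n1, n2 = sum(original.values()), sum(recount.values())
similarity = (1 - abundance_dissimilarity(original, recount)) * 100  # assumed: 1 - B
pde = percent_difference_in_enumeration(n1, n2)
ptd = percent_taxonomic_difference(count_agreements(original, recount), max(n1, n2))

print(f"similarity = {similarity:.2f}%, PDE = {pde:.2f}%, PTD = {ptd:.2f}%")
```

Comparing each value against its objective (95% similarity, 5% PDE, 5% PTD) would then flag the samples that need a side-by-side review.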
External taxonomic verification – We submit an additional 2% of all samples (randomly drawn from the previous calendar year’s completed work) to an external lab for re-identification. We divide sample sets into two classes: 1) sets where the number of samples associated with a project is ≥ 25, and 2) sets where the number of samples associated with a project is < 25. For each ≥ 25 sample set, we randomly choose 2% (rounded up) of the samples for external re-identification. We pool all samples from the < 25 sets and randomly choose 2% (rounded up) of the pooled samples for external re-identification. We send these pre-sorted physical samples (without previous taxonomic identifications) to an external professional taxonomic laboratory, which identifies the samples to the same target taxonomic resolution (SAFIT 2a). Once the specimens are returned, we use the Jaccard coefficient to assess agreement on a sample-by-sample basis between the paired taxonomic lists. We review all samples with similarity values < 1 to identify specific differences between NAMC and the external laboratory, including:
1. Differences in taxonomic resolution
2. Differences in lab-specific naming conventions
3. Differences in taxonomic source naming conventions
4. Differences in taxonomic identification
5. Differences in enumeration
We use these results to identify any misidentification bias and make corrections as needed. We retain all verified specimens in our NAMC reference collection.
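The selection rule for external verification can be sketched as a stratified random draw. This is an illustrative reading of the procedure, not NAMC's actual code: the function name, parameter names, and sample identifiers are hypothetical, and it assumes that the small sets are pooled at the sample level before 2% of them is drawn. Integer arithmetic is used for the "rounded up" step to avoid floating-point surprises.

```python
import random


def select_for_external_review(samples_by_project, percent=2, large_set_min=25, seed=None):
    """Pick samples for external re-identification.

    samples_by_project maps a project name to a list of sample IDs.
    Projects with at least `large_set_min` samples are drawn from individually;
    samples from smaller projects are pooled and drawn from together.
    """
    rng = random.Random(seed)
    selected = []
    pooled = []

    for project in sorted(samples_by_project):
        samples = list(samples_by_project[project])
        if len(samples) >= large_set_min:
            # Ceiling of percent% of the set size, using integer arithmetic.
            k = (len(samples) * percent + 99) // 100
            selected.extend(rng.sample(samples, k))
        else:
            pooled.extend(samples)

    if pooled:
        k = (len(pooled) * percent + 99) // 100
        selected.extend(rng.sample(pooled, k))

    return selected


# Made-up example: two large sets and two small sets.
projects = {
    "ProjectA": [f"A-{i}" for i in range(60)],  # 2% of 60 = 1.2, rounded up to 2
    "ProjectB": [f"B-{i}" for i in range(30)],  # 2% of 30 = 0.6, rounded up to 1
    "ProjectC": [f"C-{i}" for i in range(10)],  # pooled with ProjectD (15 samples)
    "ProjectD": [f"D-{i}" for i in range(5)],   # 2% of 15 = 0.3, rounded up to 1
}
print(select_for_external_review(projects, seed=1))
```

Passing a fixed `seed` makes the draw reproducible, which can be useful when the selection itself needs to be documented.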
Publish all quality control results – We update our website with the latest results from our taxonomic quality control metrics.