Naming the untouchable – environmental sequences and niche partitioning as taxonomical evidence in fungi

Due to their submerged and cryptic lifestyle, the vast majority of fungal species are difficult to observe and describe morphologically, and many remain known to science only from sequences detected in environmental samples. The lack of practices to delimit and name most fungal species is a staggering limitation to communication and interpretation of ecology and evolution in kingdom Fungi. Here, we use environmental sequence data as taxonomical evidence and combine phylogenetic and ecological data to generate and test species hypotheses in the class Archaeorhizomycetes (Taphrinomycotina, Ascomycota). Based on environmental amplicon sequencing from a well-studied Swedish pine forest podzol soil, we generate 68 distinct species hypotheses of Archaeorhizomycetes, of which two correspond to the only described species in the class. Nine of the species hypotheses represent 78% of the sequenced Archaeorhizomycetes community, and are supported by long read data that form the backbone for delimiting species hypothesis based on phylogenetic branch lengths. Soil fungal communities are shaped by environmental filtering and competitive exclusion so that closely related species are less likely to co-occur in a niche if adaptive traits are evolutionarily conserved. In soil profiles, distinct vertical horizons represent a testable niche dimension, and we found significantly differential distribution across samples for a well-supported pair of sister species hypotheses. Based on the combination of phylogenetic and ecological evidence, we identify two novel species for which we provide molecular diagnostics and propose names. While environmental sequences cannot be automatically translated to species, they can be used to generate phylogenetically distinct species hypotheses that can be further tested using sequences as ecological evidence. We conclude that in the case of abundantly and frequently observed species, environmental sequences can support species recognition in the absences of physical specimens, while rare taxa remain uncaptured at our sampling and sequencing intensity.


Section Page
Supplementary tables Table S1. Sample names, soil horizon origin, primer number and barcode 2 Table S2. Number of long reads throughout the quality control and filtering 3 Table S3. Overview of 41 Archaeorhizomycetes ASVs in "phylogenetic" dataset 4 Table S4. Overview of Archaeorhizomycetes reads in the "ecological" dataset 5 Table S5. Summary statistics for 2-way ANOVA for "ecological" dataset 6 Table S6. Summary statistics for PERMANOVA on SHs 6 Table S7. Niche-specific distribution of SHs, Kruskal-Wallis test 7 Table S8. Frequency and abundance of nine Archaeorhizomycetes SHs 7 Table S9. Niche-specific distribution across two orthogonal contrasts, MANOVA test 7 Supplementary figures Figure S1. Overview of sampled plots at Ivantjärnsheden field station 8 Figure S2. All fungal ASV tree identifying ASVs in Archaeorhizomycetes 9 Figure S3. nMDS ordination of total fungal itASV community in ugit102 10 Figure S4. Ranked relative abundance and cumulative abundance of SHs in Fig. 1 11 Figure S5. Proportion (%) of Archaeorhizomycetes reads across soil horizons 12 Figure S6. Number of itASVs and SHs across soil horizon 13 Figure S7. Tree corresponding to Fig. 1 including node labels 14 Figure S8. nMDS of relative abundance of 68 taxa across soil horizons 17 Figure S9. Relative abundance of A. finlayi and SH_7 vs. SH_8 18 Figure S10. Tree corresponding to Fig. 2

Figure S1
Schematic view of plots sampled at Ivantjärnsheden field station in Jädraås. Redrawn from (Axelsson & Bråkenhielm, 1980). The stand was naturally regenerated after tree-felling in 1957 and thinned before the onset of an experimental field study conducted between 1974). In the study 30 x 30 m plots received one of three treatments (irrigation and fertilization (IF) in green, irrigation (I) in blue, un-manipulated control (UM) in brown as well as clear cut plots (CC) striped), labeled with plot number. The clear-cut (grey stripes) was part of a second study following the initial experiment and represent one plot from each previous treatment. Forest has since started to regenerated in clear-cut plots.

Figure S2
A maximum likelihood tree based on an alignment of the rDNA LSU region of 273 fungal ASVs from the current study as well as reference sequences A. borealis (KF993708), A. finlayi (JF836022) and the uncultured lineage GS31 (KY687760) which represents a sister class of Archaeorhizomycetes. Class level taxonomy based on the SINTAX classifier in USEARCH is amended to the ASV label when assigned. Unclassified sequences are amended with UNKNOWN. Bootstrap support values are calculated from 1000 iterations and shown on the Archaeorhizomycetes branch and all deeper branches. Collapsed clades represent Basidiomycota (including 106 ASVs), Ascomycota outside the Taphrinomycotina (111 ASVs) and Mucoromycota (11 ASVs). Forty-two ASV LSU regions representing Archaeorhizomycetes form a well-supported sister clade to GS31 and are highlighted by a box on the tree.

Figure S3
Non-parametric multidimensional scaling (nMDS) ordination displaying communities (fungal ASVs only, positive control sequence (itASV_1) removed; 4,461 ASVs / 1,680,044 sequence reads included) from all 96 samples sequenced on the ugit_102 IonTorrent sequencing run (plot stress = 0.24 based on a maximum of 200 random starts). Prior to ordination, the itASV by sample count matrix was standardized to per-sample relative proportions, and a dissimilarity matrix was calculated using the Bray-Curtis dissimilarity index. Samples in the current study (in yellow) was sequenced on an IonTorrent chip together with samples from several other studies including unpublished samples from Benin, Burkina Faso, India and Kyrgystan, as well as published data from Puerto Rico ) and IN, USA . Positive (pos) and negative (neg) controlls were also included. Samples generally cluster by study site. However jadraas24_B clusters with samples from Puerto Rico possibly indicating tag-swithing affecting this particular sample.

Figure S4
Relative sequence read abundance of the 34 most abundant Archaeorhiozmycetes species hypotheses (SH), delimited based on long-and short-environmental amplicon sequences from soil sampled in a south central Swedish pine forest. These represent half of the total number of SHs delimited at the site. Primary axis illustrates percentage of each SH out of the total fungal community based on reads across all 36 samples. SHs highlighted in dark red and indicated with an * before its label are supported by data from both long amplicon data from the "phylogenetic" dataset and short ampicon data from the "ecological" dataset. Secondary axis illustrates cumulative percentage of each SH as a propostion of the total Archaeorhizomycetes community at the site, based on sequence reads in the "ecological" dataset. Cumulative % of Archaeorhizomycetes community % of fungal community

Figure S5
Proportion (%) of reads assigned to class Archaeorhizomycetes out of all reads assigned to kingdom Fungi in the "ecological" dataset. Box-and-whiskers plots based on 12 samples from each of the soil horizons, organic soil (O), elluvial mineral soil (E) and illuvial mineral soil (B) with middle line at median, hinges at 25 th and 75 th percentile and whiskers extending 1.5 * inter-quartile range. Samples are visualized as grey dots and are darker where overlapping and outliers have dark centers.

Figure S6
A box-and-whiskers plot depicting the medians, first and third quantile and maximum and minimum of a) number of Archaeorhizomycetes itASVs and b) number of Archaeorhizomycetes SHs across soil horizons, organic soil (O), elluvial mineral soil (E) and illuvial mineral soil (B). These variables were significantly different based on 2-way ANOVA (Table S5). Outliers are represented as circles. b) c)

Figure S7
Diversity of Archaeorhizomycetes in mid-Sweden Pine forest podzol soil a) illustrated in maximum likelihood tree based on environmental ribosomal long and short amplicon ASVs representing 68 species hypotheses (SHs) including both described species Archaeorhizomyces borealis and Archaeorhizomyces finlayi. The sister taxa GS31 and four Taphrinamycotina species as outgroup. Nodes are cartooned to visualize SHs represented by more than one ASV. b) Maximum likelihood tree based on only the ITS2 region of the same dataset as above with only the sister taxon GS31 as outgroup. Species hypothesis from a) are illustrated with labled boxes next to the included nodes. c) Maximum likelihood tree based on long reads from the same dataset as above. Same outgroup as in a). Species hypothesis from a) are illustrated with labled boxes next to the included nodes. All Archaeorhizomycetes ASV from "phylogenetic" and "ecological" dataset are labled at the nodes. Bootstrap support are calculated from 1000 iterations. Tree files are available at https://doi.org/10.17605/OSF.IO/96DKZ

Figure S8
Results from nMDS analysis based on the relative abundance out of sequenced fungal community of 68 delimitated Archaeorhizomycetes SHs (red circles labeled SH_number, A.fin, A.bor or Xnumber (for single itASVs)). The soil horizons O (organic), E (elluvial mineral soil) and B (illuvial mineral soil) capturing the measured abundances are marked with their respective convex hulls. Stress value for the nMDS is 0.2058808.   (9802)] (Axelsson & Bråkenhielm, 1980) (Fig. S1). Between 1974 and1990, 30 x 30 m plots received one of four different treatments: irrigation and fertilization (IF), irrigation (I), un-manipulated control (UM), as well as clear cut plots (CC). The clear-cut was part of a second study following the initial experiment and represented one plot from each previous treatment. Forest has since started to regenerate in CC plots.
To account for small-scale variability in soil fungal communities, we collected five soil cores (5 cm diameter and 15 cm deep) in each plot after visually dividing the plot into four quadrants; one core each were taken from the middle of each quadrant and from the middle of the plot after peeling back the top shrub and moss layer (incl. most of the litter layer). Soil cores were separated into visually distinct podzol soil layers: organic soil (O, approximately 0-5 cm depth), mineral elluvial soil (E, 5-8 cm) and mineral illuvial soil (B, 8-15 cm), before pooling the layers for each plot. We sampled four types of plots: three treatments (I: 11,16,17;IF: 4,12,13;O: 21,23,24) as well as the clear cuts (CC: 1, 2, 3) (Fig. S1, Table S1). This sampling rendered a total of 36 soil samples that were separately homogenized in Ziploc bags before separating a 15 mL sample from each that was transported back to the laboratory on ice and stored at -20°C.
An extensive culturing effort was performed during summer and fall of 2013 and 2014, attempting to isolate species in Archaeorhizomycetes from Ivantjärnsheden field station. New soil samples were collected for isolations in 2014. The isolation protocol was based on previous successful isolations of Archaeorhizomycetes (Grelet et al., 2010;Menkis et al., 2014;Rosling et al., 2011). Roots were separated from soil samples collected as described above. Under a stereomicroscope, healthy and dead root tips about 1 cm long were selected, and superficially disinfected in 30 % peroxide for 30 s and rinsed in sterile, deionized water for 2 min. Then, root tips were cut in ~ 2 mm long pieces and placed in 10 cm diameter petri dishes containing modified Melin Norkrans media (MMN) (Marx, 1969) with half strength of glucose. Up to 20 root fragments were placed in each dish and incubated up to five months in darkness at room temperature. Since available Archaeorhizomycetes cultures are slow-growing, all fungal colonies that grew from the root fragment during the first three months were discarded by cutting them out of the plate using a sterile scalpel. Fungal colonies emerging from the root tips after this period were sub-cultivated on new dishes with the same MMN media. Days after plating were recorded for all transfers. Approximately 2,000 root tips were surface-sterilized and plated, resulting in just over 160 cultures somewhat resembling those of A. borealis or A. finlayi. DNA was extracted from these for amplification of the rDNA ITS and LSU region, using the forward primers ITS1 (White et al., 1990) and reverse LR3 (Hopple Jr & Vilgalys, 1994), and then amplified and sequenced by Sanger technology, also including the reverse primer ITS4 (Gardes & Bruns, 1993). The 98 usable sequences generated were assigned to SH in UNITE (Kõljalg et al., 2013) and BLASTed against GenBank (Altschul et al., 1990). None of the sequences matched Archaeorhizomycetes, but many were identified as root associated fungi in the genera Coemansia, Meliniomyces, and Phialocephala (among others). The sequences were deposited in GenBank (Sayers et al., 2020), with accession numbers MH843963-MH844060.

Method S2. Soil DNA extraction
From each composite soil sample, two sub-samples of approximately 0.5 g wet weight were collected into separate 2.0 mL microtube containing 750 μL of lysis buffer (Xpedition TM Soil/Fecal DNA miniprep, Zymo Research Corporation, Irvine, California, USA). Total soil DNA extraction was performed after homogenization for 30 s using a TerraLyser TM (Zymo Research Corporation), following the manufacturer's protocol. DNA concentration and integrity were verified by 0.8 % agarose gel electrophoresis on 0.5 % Tris Acetate-EDTA buffer (Sigma-Aldrich, St. Louis, Missouri, USA) stained with 1 × GelRed TM (Biotium Inc., Hayward, California, USA).

Method S3. Generating the "phylogenetic" sequence dataset
Approximately 1,500 bp of the rDNA ITS and LSU region was amplified from all soil DNA extracts using the primer set ITS1F (Gardes & Bruns, 1993) and LR5 (Hopple Jr & Vilgalys, 1994), with Phusion High-Fidelity DNA polymerase (Thermo Fisher Scientific, Waltham, Massachusetts, USA). We ran a thermo-cycling protocol as follows: an initial denaturation at 95 °C for 10 min followed by 30 cycles of denaturation at 95 °C for 45 s, annealing at 58 °C for 45 s, and elongation at 72 °C for 90 s, with a final elongation at 72 °C for 10 min. A total of 5, 8 and 3 samples successfully amplified by PCR for O, E and B horizons, respectively (Table S1). PCR products from the separate soil horizons were pooled and quantified by Nanodrop 2000C (ThermoScientific, Waltham US), and gel electrophoresis was performed to generate three amplicon libraries (SwO, SwE and SwB) for sequencing at SciLifeLab/NGI (Uppsala, Sweden) on a PacBio RS II system (Pacific Biosciences, Menlo Park, CA, USA). Sequences were delivered to us as error-corrected FASTQ files. Raw reads for the current study are available in ENA (samples ERS3508481-ERS3508483).
The "phylogenetic" sequences dataset was filtered in four steps for downstream phylogenetic analysis of the Archaeorhizomycetes diversity at the site. First, the raw sequence reads were filtered and trimmed using the tool cutadapt (version 1.18) (Martin, 2011) to keep only reads with both primers present, and to remove the actual primer sequences from the reads. Amplicons sequenced in reverse, were reverse complemented before continuing the analyses. Secondly, a quality-controlled, long read sequence dataset of all amplified sequence variants (ASVs) was generated using DADA2 (version 1.9.3) (Callahan et al., 2016). Default parameters were used for filtering the reads, but discarding sequences with more than 12 "expected errors" (maxEE=12). In addition, sequences shorter than 50 bp and longer than 3,000 bp were removed as spurious. The option 'pooled' was used both for denoising and chimera detection, in order to consider the sequences in all three samples/data sets together. For chimera removal, the option 'AllowOneOff' was used to allow for one mismatch or indel. Thirdly, the tool ITSx (version 1.1-beta)  was used to identify different regions of rDNA within each ASV. Seven of the ASVs were considered incomplete reads and removed from further analysis because no ITS2 region was detected. BLASTing of these sequences suggest that they represent protists. For the remaining ASVs, taxonomy was predicted with the ITS2 region using the SINTAX classifier (Edgar, 2010) (Kõljalg et al., 2013). This reference dataset was customized by replacing unassigned species level taxonomy with UNITE SH when available. Geneious (version 11.1.4) (Kearse et al., 2012) was used to align the LSU region of the 276 ASVs with the LSU sequences of A. finlayi (JF836022), A. borealis (KF993708) and the uncultured lineage GS31 (KY687760) which represents a sister class of Archaeorhizomycetes (Tedersoo et al., 2017). The Geneious alignment algorithm was set to the Global alignment with free end gaps option, a cost matrix of 65% similarity (5.0/-4.0) with gap open penalty 12 and gap extension penalty 3. The alignment was visually inspected, and three ASVs were removed because two were truncated (ASV_279 and ASV_196) and one (ASV_280) was chimeric with a possible protist sequence and represent 5 reads.
A final alignment across 1,186bp from 273 ASV and 3 reference sequences was uploaded in the CIPRES portal (Miller et al., 2010) and a maximum likelihood tree was built using the online version of RAxML XSED2 (version 8) (Stamatakis, 2014). The GTRGAMMA model following recommendations on parametrization in (Kelchner & Thomas, 2007), and 1000 iterations for the calculation of bootstrap support. The tree was visualized in FigTree (version 1.4.4) and rooted with two Rozellomycota sequences (ASV_229 and ASV_237). Taxonomy predictions with a confidence value of 0.8 or higher using class when available, or else phyla or domain was added to the ASV name in the tree file using a customized script. Forty-two ASVs representing Archaeorhizomycetes were identified as those forming a wellsupported clade together with A. borealis and A. finlayi, and distinct from GS31 (Fig. S2). The full-length sequences of these 42 ASVs were aligned in Geneious, as described above, and a ML tree was generated in the CIPRES portal as described above, and ASVs on long branches were visually inspected in the alignment. This procedure identified ASV_135 as a chimera formed from parent sequences ASV106 and ASV8, and was removed from the dataset. With these steps, we generated a high quality Archaeorhizomycetes rDNA sequence dataset consisting of 41 ASVs ranging in length from 1,373-1,801bp and representing 52,274 reads (Fig. S2, Table S2, S3). In the end, 272 ASV sequences, together with the most specific taxonomy level with a confidence value of 0.8 or above were published in GBIF (http://doi.org/10.15468/8zymuf) and are available in UNITE with accession numbers UDB0779092-UDB0779901.

Method S4. Generating the "ecological" sequence dataset
The ITS2 region of the rDNA genes was amplified using primers gITS7 forward (Ihrmark et al., 2012) and modified ITS4m reverse ) (a 1:1 mixture of ITS4 (White et al., 1990) and a modified version of it: 5ʹ-TCCTCGCCTTATTGATATGC-3ʹ), with both primers containing adequate barcode sequences for single-ended sequencing (Table S1). Modifications on the reverse primer ITS4 were included to reduce its known bias against the class Archaeorhizomycetes (Schadt & Rosling, 2015). PCR was performed in a final volume of 20 μL reactions with 10-20 ng DNA, 1 × SSoAdvanced TM Universal SYBR® Green Supermix (Bio-Rad Laboratories, Hercules, California, USA), and 0.8 nM of each primer. PCR amplifications were carried out on a CFR96 Touch TM Real/Time PCR Detection system (Bio-Rad Laboratories) with 10 min pre-denaturation at 95 °C, 1 min DNA denaturation at 95 °C, 45 s annealing at 50, 54 and 58 °C (performed in separate tubes) to compensate for primer binding bias (Schmidt et al., 2013), 50 s of extension at 72 °C and 3 min final extension at 72 °C. Monitoring amplification by qPCR allowed us to adjust the number of cycles between 23-27 for each plate, in order to ensure that amplification did not plateau and to minimize chimera formation. All reactions were carried out in parallel on the duplicate DNA extracts from each sample, combining all six PCR products before purification using the ZR-96 DNA Clean & Concentrator™-5 (Zymo Research Corporation). Duplicate quantification of PCR products was performed using the Quant-iT™ PicoGreen® dsDNA Assay Kit (Life Technologies Corporation, Carlsbad, California, USA) on a TECAN F500 microplate reader and the band purity and length were checked by electrophoresis in 2 % agarose gel 0.5 × TAE buffer. A sequencing library was prepared by pooling 35 ng PCR products (when available) from each sample including both negative and positive template controls, loaded onto a 318 chip for PGM Ion Torrent sequencing technology (Life Technologies Corporation, Carlsbad, CA, US), and sequenced at SciLifeLab/NGI (Uppsala, Sweden). The current study was sequenced on the same IonTorrent chip as four other studies with samples from different ecosystems including forest soils across Puerto Rico , mixed deciduous forest in Indiana, US  and two un-published datasets from west Africa (Benin and Burkina Faso) and central Asia (India and Kyrgyzstan). The 4,805,942 raw sequence reads were demultiplexed by the sequencing facility and provided as 96 FASTQ files. Raw reads for the current study are available in ENA (samples ERS4600640-ERS4600675).
The software package DADA2 (version 1.14.0) (Callahan et al., 2016) for R (version 3.6.1) (R Core Team, 2019) was used to quality filter the raw reads and infer ASVs. Prior to ASV inference, primer and barcode sequences were trimmed from the reads using cutadapt (version 2.6 with Python version 3.7.5) (Martin, 2011), allowing for up to two mismatches, and requiring detection of both forward and reverse primers, as well as a minimum primer/read overlap of 10 bp. After primer trimming, 1,905,188 reads (39.6% of raw reads) were retained. The `filterAndTrim` function of the DADA2 package was then used to trim the first 15 bp (`trimLeft = 15`), truncate the reads at any quality score of two (`tuncQ = 2`), and filter out reads that are less than 75 bp in length (`minLen = 75`) or have more than five expected errors (`maxEE = 5`). Next, error learning and denoising (via the `learnErrors` and `dada` functions, respectively) of the reads were done using default parameters, except that a homopolymer gap penalty of 1 was specified for the denoising step (`HOMOPOLYMER_GAP_PENALTY = -1`), and the alignment band size was increased to 32 for both error learning and denoising steps (`BAND_SIZE = 32`). All 96 samples from the chip were pooled for ASV inference (`pool = TRUE`) and `removeBimeraDenovo` function was used to detect and remove chimeras (`method = "pooled"`) and allowing for a one-off mismatch (`allowOneOff = TRUE), resulting in a total of 4,822 ASVs, representing 1,804,814 reads (37.6% of raw reads). For clarity, ASVs generated from IonTorrent data are called itASVs throughout the text.
Two methods were used to identify itASVs as putative Archaeorhizomycetes sequences. In the first method, taxonomy was predicted using a bootstrap support cutoff of 0.8 for all itASVs using the SINTAX classifier in USEARCH (v11.0.667_i86osx32) (Edgar, 2010) with a modified version of the UNITE USEARCH/UTAX reference dataset for all Eukaryotes (version 8.0) (UNITE Community, 2019). Since the SINTAX classifier cannot handle blank fields in taxonomy strings, the UNITE reference dataset was modified to replace blank fields with '<lowest_name_ided>_<rank>_Incertae_sedis', where <lowest_name_ided> is the identity of the lowest identified supra-taxonomic rank, and <rank> is the taxonomic rank of the blank field, using the custom python script add_Incertae_sedis.py. These taxonomy predictions were also used to detect non-fungal sequences (based on a 0.8 bootstrap cutoff). In the second method, itASVs were BLASTed against a reference dataset consisting of the 41 Archaeorhizomycetes ASVs from the phylogenetic dataset, plus five sequences of the undescribed class-level lineage GS31 (GenBank Accession numbers KY687669, KY687744, KY687760, KY687776, and KY687785) (Tedersoo et al., 2017), using the blastn command line tool (version 2.9.0) (Camacho et al., 2009) with a minimum of 70% query coverage per highscoring sepment pair (`-qcov_hsp_perc 70`). An itASV was identified as putatively Archaeorhizomycetes if it was detected via either method (i.e., any itASV with a blast hit to one of the Archaeorhizomycetes ASVs of the phylogenetic dataset or with a taxonomic prediction to Archaeorhizomycetes). A total of 182 and 282 itASVs were identified across the 96 samples via the SINTAX method and the BLAST method, respectively, and all 182 itASVs detected via the SINTAX method were also identified via the BLAST method.
After removing non-fungal itASVs as well as the sequence from the positive control DNA (itASV_1), the dataset consisted of 4,461 itASVs and 1,680,044 reads, of which 282 itASVs/425,349 reads were putatively Archaeorhizomycetes (25.3% of the fungal reads) across the entire sequencing run. The `metaMDS` function from the R package vegan (version 2.5-6) (Oksanen et al., 2013) was used to conduct an nMDS ordination of the complete itASV occurrence matrix across all samples (Fig. S3). The ordination was based on a distance matrix, calculated after transforming the matrix to per-sample relative abundances using the Bray-Curtis dissimilarity index, and a maximum of 200 random starts was specified.
The results indicate that sample 24B may suffer from tag-switching with one of the samples from a different study, however we still choose to include this sample in all downstream analyses, since only seven rare Archaeorhizomycetes itASV were unique to sample 24B, and therefore the Archaeorhizomycetes community was not notably different from the other Jädraås samples. Finally, the "ecological" dataset (an itASV count per sample matrix for the Jädraås samples) was generated by removing all samples from other studies and positive/negative controls. Further, itASVs occurring only once across the 36 Jädraås samples were filtered out. This "ecological" dataset consisted of 1,664 itASVs (619,176 reads) (Supplementary datafile 1), sequences are published in GenBank (accession numbers MT926458 -MT928121), and 123 of these itASVs (233,667 reads; 37.7% of Jädraås fungal reads) were putatively Archaeorhizomycetes (Table S4, Supplementary datafile 2).

Method S5. Delimiting phylogenetic species hypotheses
Phylogenetic species recognition We generated an alignment with the 123 itASVs from the "ecological" dataset and the 41 ASVs from the "phylogenetic" dataset including reference rDNA sequences of A. borealis (KF993708), A. finlayi (JF836021 and JF836022) and the uncultured lineage GS31 (KY687760) as well as several 5.8S and LSU rDNA sequences representing outgroup taxa in Taphrinomycotina  (DQ470973) and Taphrina wiesneri (Ráthay) Mix (NG_027620)) using Geneious as above. To avoid duplication of data, we identified and removed 25 itASVs that were identical to the ITS2 regions of a long read ASV in the alignment (Supplementary datafile 3). A ML tree was inferred from the final alignment with 146 sequences (including 41 ASVs covering ITS and LSU and 98 itASVs covering the ITS2) as described above. SHs were delimited using both Bayesian and Maximum likelihood implementations of the Poisson tree processes (PTP) model, using the online web server (Zhang et al., 2013) (Supplementary datafiles 4). FigTree v1.4.4 was used to visualize the resulting tree, and Maximum likelihood solution SHs were collapsed to their stem node for display. Only SHs supported by long reads were further analyzed for niche distribution. All alignments and trees are made available in TreeBASE, Study ID S26320.
Analysis of relative sequence read abundance and distribution of Archaeorhizomycetes A 2-way ANOVA (function aov in R 3.4.1) (R Core Team, 2019) was used to test for general trends in total fungal reads, number of Archaeorhizomycetes itASVs, number of Archaeorhizomycetes SHs, and relative abundance of Archaeorhizomycetes across soil horizons and treatments administered in previous studies (Table S5, Supplementary datafile 5). Terms were added sequentially and post-hoc Tukey test with HSD p-value correction for multiple comparisons were performed when significant effects were detected. The overall Archaeorhizomycetes community composition in relation to soil horizon was visualized using nMDS ordination (metaMDS function in the vegan R package) based on the relative abundance of the 68 delimited Archaeorhizomycetes SHs using the default Bray-Curtis distance. Two dimensions were assigned to the analysis with a maximum of 100 random starts. The significance of soil horizon, plot and treatments in shaping the Archaeorhizomycetes community composition was subsequently tested using permutational multivariate analysis of variance (PERMANOVA) implemented in the anova.cca function in the vegan R package (Oksanen et al., 2013) (Table S6). This analysis was based on relative abundance of the nine SHs represented by long read, and 999 permutations were performed. We also tested for soil horizon specificity of the nine SH using a Kruskal-Wallis test in R (kruskal.test function in R version 3.4.1) (R Core Team, 2019). The multiple comparisons p-values were corrected with the Benjamini & Hochberg-method (Table S7).
IonTorrent sequencing generated 4.5 times more reads than PacBio sequencing and we thus expected the "ecological" dataset to represent a more exhaustive sampling compared to the phylogenetic dataset. We used the "ecological" dataset to estimate how much of the Archaeorhizomycetes species richness and abundance was represented by the "phylogenetic" dataset, by calculating the proportion of SHs including long read ASVs out of all SHs (Fig. 1, Table S4). Read abundance of SHs in the "ecological" dataset was used to estimate the detection limit of the "phylogenetic" dataset by determining the rarest SH including long read ASVs. However, detection limits are not absolute since not all soil samples were included in the "phylogenetic" dataset (Table S1).
Ecological species recognition Based on the phylogenetic analysis of SHs detected at the site, we selected two pairs of sister SHs (SH_2 vs. SH_3 and SH_7 vs. SH_8) that were supported by long reads from the "phylogenetic" dataset and frequently observed in the "ecological" dataset ( Fig. S4, Table  S8). We tested these pairs for niche-specific distribution using relative abundance in the "ecological" dataset with soil horizons as operationally identified niches. Orthogonal contrasts for each pair were given as explanatory variables against soil horizon relative abundances in the statistical model, which was run with 200 permutations (Table S9).

Method S6. Placement of SHs with published sequences
To place Archaeorhizomycetes SHs from the current study in a larger phylogenetic context, an alignment that included publicly available environmental sequences previously identified as belonging to the Archaeorhizomycetes (Menkis et al., 2014), as well as new sequences affiliated with the class as identified by BLAST search in UNITE (Altschul et al., 1990). Environmental sequences were included if they covered at least two of the three rDNA regions ITS1, ITS2 or LSU. Duplicate sequences from individual studies were excluded. Sequences were aligned including the outgroup previously described and visually inspected using the Geneious software package (version 11.1.4) (Kearse et al., 2012) to remove suspected chimeras. The alignment was truncated after the LSU D3 region where some sequences had long inserts (Holst-Jensen et al., 1999) that was not covered by most previously published sequences. Further, insertions present in only two or less sequences were deleted because these positions often originated in previously published sequences where we had no mean to control for sequencing errors. Maximum likelihood trees were built using the online version of RAxML XSED2 (version 8) (Stamatakis, 2014) in CIPRES portal (Miller et al., 2010), specifying a GTRGAMMA model and 1000 iterations for the calculation of bootstrap support. To focus the analysis on SHs from the current study, a series of alignments and trees were generated through a process of stepwise removal of published sequences that separated on deep nodes without sequences generated in the current study. The final alignment included a total of 172 Archaeorhizomycetes sequences in addition to the 41 ASVs generated in the current study and six outgroup sequences (alignment and tree are available in TreeBASE Study S26320). SHs were delimitated across the tree using the bPTP portal as above and referred to as global SHs and visualized in TreeView by collapsing nodes corresponding to SHs according to the maximum likelihood solution. All included Archaeorhizomycetes sequences were mapped to UNITE species hypotheses at 98.5% by massBLAST of their ITS region (Supplementary datafile 6). The generated tree allowed us to evaluate the robustness of phylogenetic species delimitation in our local dataset and to visualize global sister clade relationships. Further, the larger Archaeorhizomycetes alignment was used to visually inspect and identify diagnostic sequences regions in both the ITS1 and ITS2 region for two novel species first hypothesized as SH_7 and SH_8 (Fig. 1).