Addressing widespread misidentifications of traditional medicinal mushrooms in Sanghuangporus (Basidiomycota) through ITS barcoding and designation of reference sequences

“Sanghuang” refers to a group of important traditionally-used medicinal mushrooms belonging to the genus Sanghuangporus. In practice, species of Sanghuangporus referred to in medicinal studies and industry are now differentiated mainly by a BLAST search of GenBank with the ITS barcoding region as a query. However, inappropriately labeled ITS sequences of “Sanghuang” in GenBank restrict accurate species identification and, to some extent, the utilization of these species as medicinal resources. We examined all available 271 ITS sequences related to “Sanghuang” in GenBank including 31 newly submitted sequences from this study. Of these sequences, more than half were mislabeled so we have now corrected the corresponding species names. The mislabeled sequences mainly came from strains utilized by non-taxonomists. Based on the analyses of ITS sequences submitted by taxonomists as well as morphological characters, we separate the newly described Sanghuangporus subbaumii from S. baumii and treat S. toxicodendri as a later synonym of S. quercicola. Fourteen species of Sanghuangporus are accepted, with intraspecific distances up to 1.30% (except in S. vaninii, S. weirianus and S. zonatus) and interspecific distances above 1.30% (except between S. alpinus and S. lonicerinus, and S. baumii and S. subbaumii). To stabilize the concept of these 14 species of Sanghuangporus, their taxonomic information and reliable ITS reference sequences are provided. Moreover, ten potential diagnostic sequences are provided for Hyperbranched Rolling Circle Amplification to rapidly confirm three common commercial species, viz. S. baumii, S. sanghuang, and S. vaninii. Our results provide a practical method for ITS barcoding-based species identification of Sanghuangporus and will promote medicinal studies and commercial development from taxonomically correct material. Supplementary Information The online version contains supplementary material available at 10.1186/s43008-021-00059-x.


INTRODUCTION
Many macrofungi are established in traditional medicine and possess diverse properties (Wu et al. 2019a). "Sanghuang" comprises an important group of wood-inhabiting mushrooms that have been utilized in traditional medicine in China and adjacent countries for 2000 years. Modern scientific studies have revealed several medicinal attributes of "Sanghuang", including antitumor, antioxidant, anti-inflammatory, and immunomodulatory activities. This fungal resource has also attracted the attention of fungal chemists and pharmacologists outside Asia (Chepkirui et al. 2018; Cheng et al. 2019). Natural products such as polysaccharides, polyphenols, pyrones and terpenes are the bioactive compounds responsible for the medicinal properties of "Sanghuang". Today, "Sanghuang" is mainly consumed as a tea brewed from small pieces of cultivated basidiomes or, occasionally, from powdered mycelia.
As with other wood-inhabiting traditional medicinal mushrooms, such as "Lingzhi" (Cao et al. 2012; Wang et al. 2012; Yao et al. 2013, 2020; Dai et al. 2017), "Niuchangchih" (Wu et al. 2012b, 2012c) and "Fuhling" (Redhead and Ginns 2006), there has been much debate about the taxonomic identity of "Sanghuang". Most fungal taxonomists now agree that "Sanghuang" is represented by species of Sanghuangporus. Fourteen species have been described and accepted as members of Sanghuangporus: 11 species in Asia, and one in each of Africa, Europe, and North America. In addition, more new species await description from Africa (Chepkirui et al. 2018; Cheng et al. 2019) and perhaps other parts of the world. Besides morphological and ecological (host preference) characters, the ITS barcoding region provides the most powerful tool for differentiating species of the genus. For example, more than half of the known species of Sanghuangporus were discovered with the aid of the ITS region alone (Wu et al. 2012a, 2019b; Tian et al. 2013; Ghobad-Nejhad 2015; Tomšovský 2015; Zhu et al. 2017). Moreover, the reliability of the ITS region for species differentiation in the genus has been substantiated by a multilocus-based phylogenetic analysis (Zhu et al. 2019). Consequently, Zhou et al. (2020) reported ITS sequences from reliably identified voucher collections of the known species in the genus.
Transdisciplinary studies on Sanghuangporus have been performed to promote the utilization of this medicinal resource (Cai et al. 2019; Zhu et al. 2019; Shao et al. 2020). Most of these studies aimed to identify their materials via a BLAST search of GenBank (https://www.ncbi.nlm.nih.gov/genbank/) using the ITS barcoding region as the query. However, even though each of the 14 species of Sanghuangporus has a reliable ITS sequence accession number, it is not always easy to identify material in hand by a simple ITS-based BLAST search. This is a consequence of redundant and even incorrectly labeled ITS sequences in GenBank (Nilsson et al. 2006; Hofstetter et al. 2019). With inaccurately identified sequences emerging as potential matches, more collections will inevitably be misidentified, and the ITS sequences generated from those collections will in turn be submitted to GenBank, compounding the issue and presenting new obstacles to later accurate identification. This means that there is a high likelihood of medicinal and other attributes being attributed to incorrectly named species of "Sanghuang". Meanwhile, before the erection of the genus Sanghuangporus, ITS sequences generated from "Sanghuang" were labeled under other generic names, such as Inonotus and Phellinus, albeit with the correct epithets. This phenomenon confuses researchers who lack taxonomic knowledge and results in the misapplication of species names to medicinal properties, which in turn hinders obtaining permission from regulatory authorities for commercial development (Zhou 2020).
As stated by Zhou (2020), the use of correct scientific names for fungal species is crucial to studies of traditional Chinese medicine and their commercial exploitation. To facilitate the rational medicinal utilization of Sanghuangporus, all ITS sequences related to "Sanghuang" in GenBank should be re-examined to assist species identification. The aim of the current study is therefore to assess the utility of the ITS region for species discrimination in Sanghuangporus, and reset the species circumscriptions on the basis of the ITS barcoding region, in order to facilitate the correction of previously mislabeled ITS sequences in GenBank, and to provide candidate diagnostic ITS sequences for use in rapid species identification of Sanghuangporus using Hyperbranched Rolling Circle Amplification (HRCA).

Morphological examination
The newly sequenced specimens and strains are deposited in HMAS, IFP and BJFC. The specimens were observed with an Olympus BX43 light microscope (Tokyo, Japan) at magnifications up to 1000×. Microscopic procedure followed . Specimen sections were prepared in Cotton blue (CB), Melzer's reagent (IKI), and 5% potassium hydroxide (KOH). All measurements were made from material mounted in heated CB. When presenting the variation of basidiospore sizes, 5% of the measurements were excluded from each end of the range and are given in parentheses. Drawings were made with the aid of a drawing tube. In the text, L = mean basidiospore length (arithmetic average of all measured basidiospores), W = mean basidiospore width (arithmetic average of all measured basidiospores), Q = variation in the L/W ratios between the studied specimens, and (a/b) = number of basidiospores (a) measured from given number (b) of specimens.

Molecular sequencing
A small piece of the basidiome or culture was taken for DNA extraction, which was performed using a CTAB rapid plant genome extraction kit-DN14 (Aidlab Biotechnologies, Beijing). The crude DNA was used as templates for the PCR amplifications of the ITS region. The primer pairs ITS1F/ITS4 and ITS5/ITS4 (White et al. 1990;Gardes and Bruns 1993) were selected for amplification and subsequent sequencing at the Beijing Genomics Institute. The PCR procedure was as follows: initial denaturation at 95°C for 3 min, followed by 34 cycles at 94°C for 40 s, 57.2°C for 45 s and 72°C for 1 min, and a final extension at 72°C for 10 min. All newly generated sequences are deposited in GenBank (Table 1).

Downloading sequences from GenBank
The genus name Sanghuangporus and the epithets of 14 Sanghuangporus species were used first as queries to search GenBank. Meanwhile, the reliable sequences of 14 Sanghuangporus species were used as queries to perform BLAST searches in GenBank. The cut-off value of similarity for the resulting sequences was set as 95%. All the ITS sequences matching these queries that had been deposited until 30 April 2020 were retrieved from GenBank (Table 1). In addition, recently published papers related to the taxonomy of Sanghuangporus were checked for supplementary information on collections generating these sequences (Wu et al. 2012a, 2019b; Zhou and Qin 2012; Tian et al. 2013; Ghobad-Nejhad 2015; Tomšovský 2015; Han et al. 2016; Zhu et al. 2019; Huo et al. 2020; Shao et al. 2020).
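The 95% similarity cut-off can be illustrated with a minimal sketch. It assumes, purely for illustration, that BLAST hits are available as tabular text in the standard 12-column layout (`-outfmt 6` in command-line BLAST+), in which the second column is the subject identifier and the third is percent identity. The study does not state how its searches were run or filtered, and the function name `filter_hits` is hypothetical.

```python
import csv
import io


def filter_hits(tabular_text, min_identity=95.0):
    """Return (subject_id, percent_identity) pairs at or above the cut-off.

    Assumption: input follows the BLAST tabular layout, where column 2 is
    the subject identifier and column 3 is percent identity.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank lines and comment lines (present in -outfmt 7 output).
        if not row or row[0].startswith("#"):
            continue
        subject, identity = row[1], float(row[2])
        if identity >= min_identity:
            hits.append((subject, identity))
    return hits
```

Applied to a hit table, the function keeps only subjects whose identity to the query reaches the chosen threshold; equating "similarity" with percent identity is itself an assumption of this sketch.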

Phylogenetic analyses
Two datasets of ITS sequences were assembled, one consisting of all sequences recovered from searches of GenBank and newly generated sequences, and the other consisting of the subset of sequences originating from material identified by taxonomists. The datasets were separately aligned using MAFFT 7.110 (Katoh and Standley 2013) under the G-INS-i option (Katoh et al. 2005). All resulting alignments are deposited in TreeBASE (http://www.treebase.org; accession number S26272). jModelTest (Guindon and Gascuel 2003; Posada 2008) was used to estimate the best-fit evolutionary model for each alignment with calculations made under the corrected Akaike information criterion. Following the estimated models, Maximum Likelihood (ML) and Bayesian Inference (BI) algorithms were used to construct midpoint-rooted trees for the alignments. The ML algorithm was performed using raxmlGUI 2.0 (Stamatakis 2014; Edler et al. 2021), and the bootstrap (BS) replicates were calculated under the auto FC option (Pattengale et al. 2010). The BI algorithm was performed using MrBayes 3.2 (Ronquist et al. 2012), which employed two independent runs each with four chains and starting from random trees. Trees were sampled every 1000th generation, of which the first 25% were removed as burn-in and the other 75% were retained for constructing a 50% majority consensus tree and calculating Bayesian posterior probabilities (BPPs). Tracer 1.5 (http://tree.bio.ed.ac.uk/software/tracer/) was used to judge the convergence of the chains.
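For readers unfamiliar with model selection under the corrected Akaike information criterion, the following sketch shows the standard formula, AICc = 2k - 2 ln L + 2k(k + 1)/(n - k - 1), and how the lowest-scoring model would be chosen. It is not the jModelTest implementation; the function names are hypothetical, and treating the alignment length as the sample size n is an assumption of this example.

```python
def aicc(log_likelihood, k, n):
    """Corrected Akaike information criterion.

    log_likelihood: maximized ln L of the model
    k: number of free parameters
    n: sample size (assumed here to be the number of alignment columns)
    """
    if n - k - 1 <= 0:
        raise ValueError("sample size too small for the AICc correction")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def best_model(candidates, n):
    """Return the model name with the lowest AICc.

    candidates: dict mapping model name -> (ln L, number of parameters)
    """
    return min(candidates, key=lambda name: aicc(*candidates[name], n))
```

A more heavily parameterized model is preferred only when its gain in likelihood outweighs the penalty terms.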

Evaluation of molecular species delimitation
Molecular species delimitation was estimated using multi-rate Poisson Tree Processes (mPTP) method (Kapli et al. 2017). The Newick tree file generated from the ML algorithm was directly uploaded to the web-service version (https://mptp.h-its.org/#/tree) with no outgroup taxon.

Evaluation of genetic distances of ITS sequences
The genetic distances of an alignment of ITS sequences were estimated using MEGA X (Kumar et al. 2018;Stecher et al. 2020). For genetic distances between and within species of Sanghuangporus, the parameters were set as follows: a BS method of variance estimation with 1000 BS replications, a p-distance substitution model including transitions and transversions, uniform rates among sites, and a pairwise deletion treatment of gaps and missing data.
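The p-distance with pairwise deletion can be sketched as follows: for each pair of aligned sequences, columns in which either sequence has a gap or missing character are skipped, and the distance is the proportion of the remaining columns that differ. This is an illustrative reimplementation rather than MEGA X code; the function name and the set of characters treated as missing are assumptions.

```python
def p_distance(seq_a, seq_b, missing="-?N"):
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: a column is ignored only when one of the two
    sequences being compared has a gap or missing character there.
    Transitions and transversions are counted alike.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared
```

Multiplying the result by 100 gives the percentage values reported for within- and between-species comparisons.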

Identification of diagnostic ITS sequences
Diagnostic ITS sequences were identified from the alignment of ITS sequences generated using MAFFT 7.110 (Katoh and Standley 2013) under the G-INS-i option (Katoh et al. 2005): a fragment more than one nucleotide long that was unique to one species and invariant within that species was treated as a potential diagnostic sequence for that species.
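The criterion for diagnostic fragments can be approximated by a simple scan over alignment columns. The sketch below is a simplified, fixed-window version under stated assumptions: the input is a hypothetical dictionary mapping species names to lists of aligned sequences of equal length, the window length k is chosen by the user, and windows containing gaps are skipped. The study does not describe an automated procedure, so this is only one possible reading of the criterion.

```python
def diagnostic_fragments(aligned, target, k=2):
    """Find candidate diagnostic fragments for one species.

    aligned: dict mapping species name -> list of aligned sequences
    target:  species for which fragments are sought
    k:       fragment length in alignment columns (more than one nucleotide)

    Returns (start_column, fragment) pairs where the fragment is identical
    in every target sequence and differs from every sequence of all other
    species at the same columns.
    """
    target_seqs = aligned[target]
    others = [s for sp, seqs in aligned.items() if sp != target for s in seqs]
    length = len(target_seqs[0])
    results = []
    for start in range(length - k + 1):
        windows = {s[start:start + k] for s in target_seqs}
        if len(windows) != 1:
            continue  # variable within the target species
        fragment = next(iter(windows))
        if "-" in fragment:
            continue  # skip windows containing alignment gaps
        if all(o[start:start + k] != fragment for o in others):
            results.append((start, fragment))
    return results
```

Overlapping windows that pass the test would in practice be merged into longer fragments and checked by eye against the alignment.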

RESULTS
A total of 13 specimens and 18 strains were newly sequenced, and the resulting ITS sequences were submitted to GenBank (Table 1). According to our criteria, 240 ITS sequences were downloaded from GenBank, but two sequences (HQ845057 and KP974834, originally identified as Inonotus vaninii and Sanghuangporus baumii, respectively) showed unexpectedly large differences from other sequences of Sanghuangporus by BLAST search, and thus were considered not to belong to the genus and were excluded from subsequent phylogenetic analyses (Table 1). Eventually, a dataset of all available 269 ITS sequences (31 newly sequenced and 238 downloaded from GenBank) from Sanghuangporus species was used to construct a preliminary phylogenetic framework for this genus. An alignment of 941 characters resulted from this dataset, and HKY + G was estimated as the best-fit evolutionary model for phylogenetic analysis. The ML search stopped after 850 bootstrap replicates. All chains in BI converged after ten million generations, which is indicated by the estimated sample sizes (ESSs) of all parameters above 500 and the potential scale reduction factors (PSRFs) close to 1.000. The ML and BI algorithms generated nearly congruent topologies in the main lineages (Additional file 1: Tree S1, Additional file 2: Tree S2). Therefore, only the topology from the ML algorithm is visualized in circular form here; the midpoint-rooted tree recovered 13 species and four undescribed lineages of Sanghuangporus (Fig. 1). The one species gap compared with the 14 accepted species is a result of collections previously identified as S. quercicola and S. toxicodendri (this species is represented by collections Wu 1805-2, Wu 1805-3, Wu 1805-5, Wu 1807-2, Wu 1807-3 and Wu 1807-4) nesting within a single clade (Fig. 1). Of the 13 recovered species of Sanghuangporus, the clades of S. lonicericola and S. sanghuang did not receive good statistical support, the clade of S. alpinus was strongly supported just by the BI algorithm, and the other species were all strongly supported by both the ML and the BI algorithms (Additional file 1: Tree S1, Additional file 2: Tree S2). Sanghuangporus microcystideus merged with S. sp. 1 in the tree inferred from the ML algorithm (Fig. 1, Additional file 1: Tree S1), but was separated from S. sp. 1 in the BI tree (Additional file 2: Tree S2). The relationship between S.  (Fig. 1). In GenBank, species names from 10 out of 77 phylogenetically analyzed specimens were misapplied (tips labeled in green in Fig. 1), while those from 134 out of 192 phylogenetically analyzed strains were wrongly identified to species level (tips labeled in red in Fig. 1). Furthermore, two ITS sequences (HQ845057 and KP974834) of strains labeled as species of Sanghuangporus were extremely deviant and did not belong to the genus (Table 1). Most of these errors came from submissions by non-taxonomists.
Fig. 1 The phylogenetic tree inferred from 269 ITS sequences. The topology was generated from the maximum likelihood algorithm. The tips in green represent mislabeled specimens, while those in red represent mislabeled strains.
Shen et al. IMA Fungus (2021) 12:10
Therefore, to circumscribe species in Sanghuangporus, we selected the ITS sequences submitted to GenBank by taxonomists for a new round of phylogenetic analysis (Table 1). The new dataset included 122 ITS sequences and resulted in an alignment of 871 characters with HKY + I + G as the best-fit evolutionary model. The ML search stopped after 450 bootstrap replicates. All chains in BI converged after four million generations, which is indicated by the ESSs of all parameters above 1000 and the PSRFs close to 1.000. The ML and BI algorithms generated nearly congruent topologies in the main lineages, and so only the midpoint-rooted ML tree is presented along with the BPPs at the nodes (Fig. 2). As in Fig. 1, this tree also recovered 13 species of Sanghuangporus with S. quercicola and S. toxicodendri nested within a single clade (Fig. 2). Among these 13 species, the clade of S. lonicericola was still not strongly supported, and the clades of S. alpinus and S. sanghuang were moderately supported by the ML algorithm and fully supported by the BI algorithm, while the clades of all other species received strong statistical support from both the ML and the BI algorithms (Fig. 2). Moreover, of the seven collections of the undescribed lineage close to S. baumii in Fig. 1, four were sampled in the new dataset, and the independence of these four collections and their affinity to S. baumii were also strongly supported (Fig. 2). Therefore, this undescribed lineage is described as a new species, S. subbaumii, below.
Molecular species delimitation was estimated on the tree generated from the new dataset with 122 selected ITS sequences. The mPTP method supported the independence of 11 species, while Sanghuangporus alpinus, S. lonicerinus and S. weigelae were recovered as a single species (Additional file 3: Fig. S1).
To further explore the species relationships among Sanghuangporus, the alignment with 122 selected ITS sequences underwent a genetic distance analysis. The ranges of the within and between species genetic distances are mostly non-overlapping (Additional file 4: Table S1). Sanghuangporus microcystideus and S. pilatii, each represented by a single collection, were excluded from the within species analysis. Regarding other species of Sanghuangporus, the genetic distances within S. vaninii, S. weirianus and S. zonatus were 0-1.72%, 2.68% and 0-1.71%, respectively, whereas those within other species were no more than 1.30% and as low as 0.00% within S. ligneus (Additional file 4: Table S1). Regarding the genetic distances between species, all were above 1.30% except that those between S. alpinus and S. lonicerinus, and S. baumii and S. subbaumii were 1.03-2.86% and 1.19-3.07%, respectively. Across all pairwise comparisons between species, most (84 of 91) had distances above the maximum within species distance of 2.68% (Additional file 4: Table S1). Furthermore, distances between S. microcystideus and all other species were more than 8.90% and those between S. pilatii and all other species were more than 2.69% (Additional file 4: Table S1).
Based on an integrative taxonomic approach, 14 species of Sanghuangporus are accepted here. Their taxonomic information and reliable ITS sequences (from holotypes where possible) are provided below. Regarding S. baumii, S. lonicericola, S. lonicerinus, S. microcystideus, S. pilatii, S. vaninii, and S. weirianus, their holotypes were too old (50 years old or more) and so were unlikely to be successfully sequenced. Moreover, certain institutions did not make holotypes available for sequencing. Therefore, we use ITS sequences from other reference collections as reliable ITS sequences for those species.
Fifty-four ITS sequences of S. baumii, S. sanghuang and S. vaninii, the most common species in medicinal studies and products, were further retrieved from the dataset with 122 selected sequences. These 54 sequences were realigned and the alignment is presented with shaded background (Additional file 5: Fig. S2). From this alignment, ten potential diagnostic sequences with two to six nucleotide differences were identified for HRCA to differentiate species: two for S. baumii, two for S. sanghuang and six for S. vaninii (Additional file 5: Fig. S2, Table 2).
Fig. 2 The phylogenetic tree inferred from ITS sequences submitted by taxonomists. The topology was generated from the maximum likelihood algorithm, and bootstrap values and Bayesian posterior probabilities simultaneously above 50% and 0.8, respectively, are presented at the nodes.
Sanghuangporus baumii (Pilát) L.W. Zhou & Y.C. Dai, Fungal Diversity 77: 340 (2016).
Basionym: Phellinus baumii Pilát, Bull. trimest. Soc. mycol. Fr. 48: 25 (1932).

DISCUSSION
In this study, we summarized all available ITS barcoding sequences bearing the name "Sanghuang" in GenBank. A total of 271 ITS sequences related to "Sanghuang", including 31 newly generated sequences from this study, were analyzed. In association with previous information of morphology, hosts, and multilocus-based phylogeny, 14 species are accepted as members of Sanghuangporus including the new species S. subbaumii described herein. We also synonymize S. toxicodendri under S. quercicola. Sanghuangporus subbaumii has a phylogenetically close relationship to S. baumii; however, these two species form two distinct lineages with strong support (Additional file 1: Tree S1, Additional file 2: Tree S2, Fig. 2). Moreover, S. subbaumii and S. baumii were also estimated as two independent species using the mPTP method (Additional file 3: Fig. S1), and for ITS the interspecific distance is 1.19-3.07%, generally above the cutoff value of interspecific distances (1.30%) within Sanghuangporus (Additional file 4: Table S1). Besides molecular evidence, morphological differences between these two species are also clear. Geographically, S. subbaumii is only known from North China, whereas Chinese collections of S. baumii are distributed in north-east China (Table 1).
Sanghuangporus toxicodendri was recently described from specimens collected from Toxicodendron sp. in Hubei, central China (Wu et al. 2019b) and resembles S. quercicola, another species originally described from central China (Zhu et al. 2017). However, in the publication introducing S. toxicodendri (Wu et al. 2019b) the separation from S. quercicola was not well-supported phylogenetically. Moreover, the morphological differences between these two species are slight (such as for basidiospore length) or involve variable characters that do not have taxonomic signal (such as the surface color of the pileal margin) (Zhu et al. 2017;Wu et al. 2019b). In the current phylogenetic analyses, the six specimens of S. toxicodendri, three specimens of S. quercicola and four additional collections merged in a fully supported clade (Additional file 1: Tree S1, Additional file 2: Tree S2, Fig. 2). The mPTP-based estimation of species delimitation also treated S. toxicodendri and S. quercicola as a single species (Additional file 3: Fig. S1) and the intraspecific distances among ITS sequences under both names were 0-1.11%, well below the threshold of 1.30% (Additional file 4: Table S1). Therefore, S. toxicodendri and S. quercicola are considered conspecific, and S. quercicola has priority by publication date over S. toxicodendri.
The clade of S. lonicericola was present but not well-supported in our phylogenetic analyses (Additional file 1: Tree S1, Additional file 2: Tree S2, Fig. 2). Similarly, the clades of S. alpinus and S. sanghuang were not strongly supported by the ML algorithm (Fig. 2). For S. lonicericola and S. sanghuang, despite the lack of support in one or both analyses, each formed a distinct clade, and for both species distances to other species were above the threshold of 1.30% (S. lonicericola minimum 2.19% and S. sanghuang minimum 2.90%; Additional file 4: Table S1). In addition, S. alpinus, S. lonicerinus, and S. weigelae, even though forming three independent lineages, were considered conspecific by the mPTP method (Additional file 3: Fig. S1). However, the interspecific distances for ITS between S. weigelae and each of S. alpinus and S. lonicerinus are above the cut-off value of interspecific distances (1.30%) within Sanghuangporus (Additional file 4: Table S1). Regarding the pair of S. alpinus and S. lonicerinus, for ITS the between-species distance (1.03-2.86%) was generally above the intraspecific distances within either species (0-1.08% and 0-1.18%, respectively; Additional file 4: Table S1). Moreover, the monophyly of S. alpinus was strongly supported by the BI algorithm and that of S. lonicerinus was strongly supported by both the ML and the BI algorithms (Fig. 2). In addition, morphological delimitations among these five species are stable (Wu et al. 2012a; Tian et al. 2013). Taking all this into account, we accept S. alpinus, S. lonicericola, S. lonicerinus, S. sanghuang, and S. weigelae as five independent species.
Sanghuangporus vaninii, S. weirianus, and S. zonatus are the only three species with intraspecific ITS