IMA genome-F18

Visagie, Cobus M.; Magistà, Donato; Ferrara, Massimo; Balocchi, Felipe; Duong, Tuan A.; Eichmeier, Ales; Gramaje, David; Aylward, Janneke; Baker, Scott E.; Barnes, Irene; Calhoun, Sara; De Angelis, Maria; Frisvad, Jens C.; Hakalova, Eliska; Hayes, Richard D.; Houbraken, Jos; Grigoriev, Igor V.; LaButti, Kurt; Leal, Catarina; Lipzen, Anna; Ng, Vivian; Pangilinan, Jasmyn; Pecenka, Jakub; Perrone, Giancarlo; Piso, Anja; Savage, Emily; Spetik, Milan; Wingfield, Michael J.; Zhang, Yu; Wingfield, Brenda D.

doi:10.1186/s43008-023-00121-w

Fungal Genomes
Open access
Published: 06 October 2023

The re-identification of Penicillium genomes available in NCBI and draft genomes for Penicillium species from dry cured meat, Penicillium biforme, P. brevicompactum, P. solitum, and P. cvjetkovicii, Pewenomyces kutranfy, Pew. lalenivora, Pew. tapulicola, Pew. kalosus, Teratosphaeria carnegiei, and Trichoderma atroviride SC1

Cobus M. Visagie¹,
Donato Magistà²,
Massimo Ferrara²,
Felipe Balocchi¹²,
Tuan A. Duong¹,
Ales Eichmeier⁴,
David Gramaje⁴,
Janneke Aylward¹,⁵,
Scott E. Baker⁶,⁷,
Irene Barnes¹,
Sara Calhoun⁸,
Maria De Angelis¹³,
Jens C. Frisvad⁹,
Eliska Hakalova³,
Richard D. Hayes⁸,
Jos Houbraken¹⁰,
Igor V. Grigoriev⁸,¹¹,
Kurt LaButti⁸,
Catarina Leal⁴,
Anna Lipzen⁸,
Vivian Ng⁸,
Jasmyn Pangilinan⁸,
Jakub Pecenka³,
Giancarlo Perrone²,
Anja Piso¹,
Emily Savage⁸,
Milan Spetik³,
Michael J. Wingfield¹,
Yu Zhang⁸ &
Brenda D. Wingfield¹

IMA Fungus volume 14, Article number: 21 (2023)

Introduction

Sequencing fungal genomes has become commonplace, and the list of genomes in this manuscript reflects this. Particularly relevant is that the first announcement is a re-identification of Penicillium genomes available on NCBI. That more than 100 of these genomes were deposited without the correct species name speaks volumes about the need to continue training fungal taxonomists and about the importance of the International Mycological Association (after which this journal is named). When we started the genome series in 2013, one essential requirement was that each manuscript include a phylogenetic tree. This requirement arose from a discussion with colleagues at NCBI who were, at the time, trying to deal with the very many incorrectly identified bacterial genomes that had been submitted to NCBI. We are now in the same position with fungal genomes. Sequencing a fungal genome is all too easy, but providing a correct species name and ensuring that the fungus has in fact been correctly identified seems to be more difficult. We know that there are thousands of fungi that have not yet been described. The availability of sequence data has made the identification of fungi easier, but it also highlights the need for a fungal taxonomist in each project to ensure that mistakes are not made.

IMA GENOME‐F 18A

The re-identification of Penicillium genomes available in NCBI

Introduction

Penicillium, with its 536 accepted species, is one of the most commonly occurring and important fungal genera (Houbraken et al. 2020; Visagie et al. 2014). In recent years, whole genome sequencing efforts have increased, and hundreds of Penicillium genomes are now publicly available in the NCBI genome database (https://www.ncbi.nlm.nih.gov/datasets/genome). Studying these genomes is important, for example, to gain a better understanding of the biology of certain species. However, such studies, their communication and the conclusions drawn from them depend on the genomes carrying the correct name. Analyses such as genome comparisons that are based on incorrect identifications lead to incorrect conclusions. The problem of misidentified genomes has already been highlighted by Houbraken et al. (2021), who also made several recommendations to prevent misidentifications in future. To support future studies using the genomes currently available in NCBI under the name Penicillium, we re-identify them here using the modern taxonomy of the genus published by Houbraken et al. (2020), which provides an accepted species list and an updated classification at the subgenus, section and series levels.

Materials and methods

A Penicillium reference dataset was compiled mainly based on the most recent taxonomy and accepted species list published by Houbraken et al. (2020). The six gene regions included in the dataset were beta-tubulin (BenA), calmodulin (CaM), RNA polymerase II second largest subunit (RPB2), RNA polymerase II largest subunit (RPB1), the subunit of the cytosolic chaperonin Cct ring complex (Cct8), and Tsr1, the protein required for processing 20S pre-rRNA in the cytoplasm. Using Geneious Prime v. 2023.1.2, these gene regions were extracted from the Penicillium genomes downloaded from the NCBI Genome Portal and added to the dataset.

In our multi-gene phylogenetic analysis, each gene region was treated as a separate partition, and introns and exons were taken into consideration where appropriate. Datasets were aligned using MAFFT v. 7.490 with the G-INS-i option (Katoh and Standley 2013). Alignments were trimmed or adjusted as needed and then concatenated in Geneious Prime. The General Time Reversible nucleotide substitution model with gamma-distributed rate variation and a proportion of invariant sites (GTR + G + I) was chosen for all partitions. Maximum likelihood trees were calculated in IQ-TREE v. 2.1.3 (Minh et al. 2020), subsequently visualised in TreeViewer v. 2.0.1 (https://treeviewer.org/) and edited in Affinity Publisher v. 2 (Serif (Europe), Nottingham, UK). The reference datasets, alignments and tree files were uploaded to the University of Pretoria research data repository hosted on Figshare (https://www.doi.org/10.25403/UPresearchdata.24004071).
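The concatenation and partitioning step described above can be illustrated with a minimal Python sketch. This is not the authors' workflow (they concatenated alignments in Geneious Prime). The function names, the toy sequences and the strain labels are all hypothetical, and padding missing genes with gaps is an assumption made for the illustration. The sketch joins per-gene alignments into one supermatrix and records the column range of each gene as a NEXUS `charset`, a partition format that IQ-TREE accepts.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]          # taxon label -> aligned sequence
Partition = Tuple[str, int, int]    # gene name, first column, last column (1-based)


def concatenate(alignments: Dict[str, Alignment]) -> Tuple[Alignment, List[Partition]]:
    """Join per-gene alignments into one supermatrix and track column ranges."""
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    partitions: List[Partition] = []
    start = 1
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon lacking this gene is padded with gap characters.
            supermatrix[taxon] += aln.get(taxon, "-" * length)
        partitions.append((gene, start, start + length - 1))
        start += length
    return supermatrix, partitions


def nexus_partitions(partitions: List[Partition]) -> str:
    """Render the column ranges as a NEXUS sets block, one charset per gene."""
    lines = ["#nexus", "begin sets;"]
    lines += [f"    charset {gene} = {first}-{last};" for gene, first, last in partitions]
    lines.append("end;")
    return "\n".join(lines)


# Toy data only; real alignments would come from MAFFT output.
toy = {
    "BenA": {"strain_A": "ATG-C", "strain_B": "ATGGC"},
    "CaM": {"strain_A": "TTA", "strain_B": "TTG", "strain_C": "TTA"},
}
supermatrix, parts = concatenate(toy)
partition_text = nexus_partitions(parts)
print(partition_text)

# A possible follow-up call (an assumption, not taken from the paper):
#   iqtree2 -s concat.fasta -p partitions.nex -m GTR+I+G
```

Keeping the column ranges alongside the supermatrix is what allows each gene region to be treated as its own partition while the same substitution model is applied to all of them.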

Results and discussion

Of the 426 genomes analysed in this study, 281 were correctly identified, 87 were misnamed, 12 were misidentified and 33 were submitted as Penicillium without a species name (see Table 1, Additional file 1: Table 1 and Figs 1, 2). Of the correctly identified strains, 27 resolved in the P. camemberti species complex in the series Camembertiorum. This group is economically important and is typically used for the production of cheeses such as Brie and Camembert (Thom 1906). Taxonomically, this group and its six accepted species need to be revised, but this is complicated by several past domestications (Ropars et al. 2020a, b). As there is little to no phylogenetic variation to guide identifications, we accept the name under which genomes from this group were submitted. Of the misidentified genomes, five belong to different genera, including: GCA_023625675, which we believe to be a Candida species; GCA_023627405, which belongs to Aspergillus ustus; GCA_011750695, which belongs to Talaromyces minnesotensis; and GCA_002382835 and GCA_002382855, which belong to Talaromyces pinophilus. Six genomes were labelled with old names that have been synonymised, including: GCA_028828285 belonging to P. solitum (= P. majusculum) (Frisvad and Samson 2004); GCA_025586815 belonging to P. desertorum (= P. glycyrrhizacola); GCA_015585885, GCA_015586035 and GCA_015585865 belonging to P. chrysogenum (= P. griseoroseum) (Houbraken et al. 2012); and GCA_028829675 belonging to P. glabrum (= P. tannophilum) (Houbraken et al. 2014). GCA_028974045 was submitted as a potential new species closely related to P. viridicatum and is identical to the recently described P. mali-pumilae (Hyde et al. 2019). Based on our analyses, we have identified three new species, including: GCA_028828675 in section Sclerotiora series Herqueorum; GCA_028827225 in section Fasciculata series Viridicata; and GCA_028826995, GCA_028974015 and GCA_028827235 in section Robsamsonia series Urticicola.
Among the misidentified genomes were 12 that belong to different sections, including: GCA_000943775 and GCA_000943765 belonging to P. canescens in section Canescentia (not P. capsulatum in section Ramigena); GCA_015585765 and GCA_015585785 belonging to P. chrysogenum in section Chrysogena (not P. dipodomyicola in section Robsamsonia); GCA_028828875 and GCA_028826875 belonging to P. malacaense in section Idahoensia (not P. capsulatum in section Ramigena); GCA_015585975 belonging to P. rubens in section Chrysogena (not P. dipodomyicola in section Robsamsonia); GCA_020284065, GCA_019827435 and GCA_019828795 belonging to P. rubens in section Chrysogena (not P. fimorum in section Robsamsonia); GCA_015585905 belonging to P. rubens in section Chrysogena (not P. polonicum in section Fasciculata); and GCA_019804565 belonging to P. solitum in section Fasciculata (not P. robsamsonii in section Robsamsonia). We consider 87 genomes misnamed, with the submitted name being classified in the same series as our re-identified name. An example of this is the large number of genomes submitted as P. chrysogenum that belong to its closest relative, P. rubens in the series Chrysogena.
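The distinction drawn above between misnamed genomes (submitted and re-identified names in the same series) and misidentified genomes (names in different sections or genera) can be expressed as a small decision rule. The Python sketch below is only an illustration of that rule, not the procedure used in the study. The lookup tables contain just a few placements taken from the text, `classify` is a hypothetical helper, and the "unresolved" outcome for names in the same section whose series is not compared is an assumption, because the text does not define that case.

```python
from typing import Dict, NamedTuple, Optional


class Placement(NamedTuple):
    section: str
    series: Optional[str]  # None where the text above does not state the series


# Placements quoted from the text; a real lookup would cover all accepted species.
TAXONOMY: Dict[str, Placement] = {
    "Penicillium chrysogenum": Placement("Chrysogena", "Chrysogena"),
    "Penicillium rubens": Placement("Chrysogena", "Chrysogena"),
    "Penicillium capsulatum": Placement("Ramigena", None),
    "Penicillium canescens": Placement("Canescentia", None),
}

# Old name -> currently accepted name (one example from the text).
SYNONYMS: Dict[str, str] = {"Penicillium griseoroseum": "Penicillium chrysogenum"}


def classify(submitted: str, reidentified: str) -> str:
    """Assign a submitted genome name to one of the categories used above."""
    words = submitted.split()
    if len(words) < 2 or words[1] in {"sp.", "sp"}:
        return "no species name"
    if submitted == reidentified:
        return "correct"
    if SYNONYMS.get(submitted) == reidentified:
        return "synonym"
    if words[0] != reidentified.split()[0]:
        return "misidentified"  # different genus
    old, new = TAXONOMY[submitted], TAXONOMY[reidentified]
    if old.section != new.section:
        return "misidentified"  # different section
    if old.series is not None and old.series == new.series:
        return "misnamed"       # same series
    return "unresolved"         # assumption: case not defined in the text


print(classify("Penicillium capsulatum", "Penicillium canescens"))
print(classify("Penicillium chrysogenum", "Penicillium rubens"))
print(classify("Penicillium griseoroseum", "Penicillium chrysogenum"))
```

The three calls correspond to cases named in the text: a genome submitted as P. capsulatum that belongs to P. canescens, a genome submitted as P. chrysogenum that belongs to P. rubens, and a genome submitted under the synonymised name P. griseoroseum.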

Table 1 Summary of genomes re-identified during this study. See Additional file 1: Table 1 for the full list of strains analysed during this study.

There are many reasons why genome sequences may have been submitted with names with which we disagree. The aim of this revision was not to criticise the submitters. Rather, we want to make our opinions known about the species to which available genomes belong, thus making the already very important resource that the submitters have created even more valuable. Based on our re-identifications, Penicillium genomes are currently available for 103 of 536 accepted species, representing both subgenera, 22 of 33 sections and 51 of 101 series.
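As a quick check on the coverage figures quoted above, the fractions can be converted to percentages. The short Python sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
# (genomes available, accepted total) at each taxonomic rank, as stated in the text
coverage = {
    "species": (103, 536),
    "sections": (22, 33),
    "series": (51, 101),
}

# Percentage of each rank that has at least one genome available
percentages = {
    rank: round(100 * available / total, 1)
    for rank, (available, total) in coverage.items()
}

for rank, value in percentages.items():
    available, total = coverage[rank]
    print(f"{rank}: {available}/{total} = {value}%")
```

On these counts, genomes exist for roughly a fifth of the accepted species, two thirds of the sections and half of the series.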

Authors: Cobus M. Visagie*, Jens C. Frisvad, and Jos Houbraken.

*Contact: Cobus.Visagie@fabi.up.ac.za.

IMA GENOME‐F 18B