IMA Genome - F16

© The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

IMA GENOME-F 16A


Draft genome assembly of Fusarium marasasianum

Introduction
Many plants are thought to have at least one Fusarium-associated disease, with more than 80% of economically important plants affected by at least one Fusarium disease (Leslie and Summerell 2006). The socioeconomic importance of Fusarium is particularly evident when considering the Fusarium fujikuroi species complex (FFSC, sensu Geiser et al. 2021). This monophyletic group contains 65 species and numerous cryptic species (Yilmaz et al. 2021). More than 50 species in the FFSC have publicly available genomes (www.ncbi.nlm.nih.gov), indicative of their economic importance.
A number of recent studies showed that the FFSC contains four large clades (Herron et al. 2015; Sandoval-Denis et al. 2018; Yilmaz et al. 2021). One of these corresponds to the so-called "American" clade that was initially proposed to reflect the biogeography of the species it contains (O'Donnell et al. 1998). For example, Fusarium circinatum, the pine pitch canker pathogen, is thought to be native to Mexico and Central America (Drenkhan et al. 2020), where it likely co-evolved with its Pinus hosts (Herron et al. 2015; O'Donnell et al. 1998; Wikler and Gordon 2000). The American clade also includes five additional species associated with Pinus species in Colombia. These species are F. fracticaudum, F. pininemorale, F. parvisorum, F. marasasianum, and F. sororula, of which F. parvisorum, F. marasasianum, and F. sororula displayed levels of pathogenicity that were comparable to those of F. circinatum on susceptible Pinus species (Herron et al. 2015).
The risk that the various American clade species pose to forestry in Colombia and globally has provided the impetus for projects aiming to sequence their genomes. To complement the genomic resources available for F. circinatum (Fulton et al. 2020; van der Nest et al. 2014a; Van Wyk et al. 2018; Wingfield et al. 2012, 2018a), the genomes of F. pininemorale (Wingfield et al. 2017), F. fracticaudum (Wingfield et al. 2018b) and F. sororula (van der Nest et al. 2021) have been published. Here we present the whole genome sequence for the pine pathogen F. marasasianum, named after the late South African professor Walter "Wally" F.O. Marasas (Wingfield and Crous 2012), who specialised in the taxonomy of Fusarium species and their associated mycotoxins.

Materials and methods
Fusarium marasasianum CMW 25512 was grown on ½ potato dextrose agar (PDA) medium consisting of 20% w/v PDA and 5% w/v agar at 25 °C. Genomic DNA was extracted as described previously (van der Nest et al. 2021) and used to generate one paired-end library (550 bp insert size and read length of 251 bp) that was then sequenced using the Illumina HiSeq 2500 platform at Macrogen (Seoul, Korea). After duplicate and poor-quality reads were removed using the Qiagen Genomics Workbench v. 20.0.4 (CLCBio, Aarhus), the remaining reads were assembled using SPAdes v. 3.13.0 (Bankevich et al. 2012). The completeness of the genome assembly was determined with BUSCO v. 4.0.6 utilising the "hypocreales" dataset (Manni et al. 2021). We used the MAKER annotation pipeline (Cantarel et al. 2008), which uses Augustus (Stanke et al. 2006), Genemark ES (Ter-Hovhannisyan et al. 2008) and SNAP (Korf 2004) to annotate the assembly. In these procedures, annotation data from F. circinatum, F. fujikuroi (Wiemann et al. 2013), F. verticillioides (Ma et al. 2010), F. mangiferae and F. proliferatum (Niehaus et al. 2017) were included as supporting evidence for gene models.
Placement of F. marasasianum CMW 25512 within the FFSC was verified using phylogenetic analysis of a dataset containing translation elongation factor 1-α and β-tubulin gene sequences for relevant FFSC taxa. For this purpose, sequences were aligned using MAFFT v. 7.487 (Katoh et al. 2019), concatenated and subjected to maximum likelihood phylogenetic analysis in PhyML v. 3.1 (Guindon et al. 2010). As indicated by jModelTest v. 2.1.10 (Darriba et al. 2012), the analysis employed the generalised time reversible (GTR) model (Tavare 1986) with a proportion of invariable sites and gamma correction for among-site rate variation.
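The concatenation step mentioned above can be illustrated with a minimal Python sketch. It assumes each gene alignment is already available as a mapping from taxon name to aligned sequence; the function name, the gap-padding rule for taxa missing from a gene, and the toy sequences are all hypothetical and are not taken from the study.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments (dicts of taxon -> aligned sequence) end to end."""
    # Collect every taxon that occurs in at least one alignment.
    taxa = sorted(set().union(*alignments))
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon missing from a gene is padded with gaps.
            supermatrix[taxon] += aln.get(taxon, "-" * width)
    return supermatrix


# Toy data only; these are not real TEF-1α or β-tubulin sequences.
tef1 = {"isolate_A": "ATG-CA", "isolate_B": "ATGACA"}
tub2 = {"isolate_A": "GGTC", "isolate_B": "GG-C"}
combined = concatenate_alignments([tef1, tub2])
```

Every row of the resulting matrix has the same length, which is what a maximum likelihood program expects of a concatenated dataset.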

Results and discussion
Assembly of the F. marasasianum genome yielded a total genome size of 47,207,981 bp with a G + C content of 46.25%. The assembly consisted of 166 contigs with an N50 of 1,535,275 bp, and phylogenetic analysis confirmed the taxonomic identity of the sequenced genome as F. marasasianum (Fig. 1). Genome completeness was estimated to be 99.8%, corresponding to 99.6% complete and single-copy BUSCOs, 0.2% complete and duplicated BUSCOs and 0.2% missing BUSCOs (n = 4494). A total of 15,564 gene models were predicted in the F. marasasianum assembly, with a density of 329.69 ORFs/Mbp. Sequence analysis showed that the twelve chromosomes typically present in species from the FFSC are found in F. marasasianum CMW 25512.
In addition to F. circinatum, the genomic resource presented here represents the fifth genome (van der Nest et al. 2021; Wingfield et al. 2017, 2018b) for species associated with pitch canker-like symptoms on Pinus spp. (Herron et al. 2015). It has been suggested that these Fusarium species diversified alongside pines in Mexico/Central America (Herron et al. 2015; O'Donnell et al. 1998) and that their distribution is driven by international trade, thereby posing significant quarantine risks (Drenkhan et al. 2020). Availability of genomic resources for these fungi will facilitate and stimulate research aimed at resolving questions regarding their shared evolutionary history, ecology and pathogenicity.

Introduction
The family Ceratocystidaceae was redefined in 2014 to accommodate genera previously treated as species complexes in the genus Ceratocystis s. lat. (de Beer et al. 2014; Wingfield et al. 2013). One of these genera, Huntiella, now accommodates species previously placed in the C. moniliformis species complex and includes 31 species (Liu et al. 2018, 2020; Marin-Felix et al. 2019). Amongst the genera in Ceratocystidaceae, Huntiella species are distinguished by being typically saprobic, whereas many other species in this family are important plant pathogens or agents of blue stain (de Beer et al. 2014).
While most of the recognized Huntiella species are heterothallic, a small number have been described as unisexual (Wilson et al. 2015, 2021a). In heterothallic species such as H. omanensis and H. bhutanensis, individuals harbour either the MAT1-1 or MAT1-2 idiomorph, which confer the MAT1-1 and MAT1-2 mating types, respectively (Wilson et al. 2015, 2021a). Sexual development consequently requires an interaction between two individuals of opposite mating type, as in other heterothallic ascomycetes (Wilson et al. 2021b). In contrast, only MAT1-2 isolates of H. moniliformis and H. fecunda have been discovered and, despite the absence of the MAT1-1 mating type, these species are capable of independent sexual reproduction via the unisexual pathway (Liu et al. 2018; Wilson et al. 2015).
Wingfield et al. IMA Fungus (2022) 13:3
In this study, we sequenced the genome of Huntiella abstrusa using both the PacBio and Illumina platforms in an effort to generate a high-quality draft genome sequence. This species was described from a single isolate and was originally reported as unisexual (Marin-Felix et al. 2019). The results from this study showed that the isolate used in the original description represented a mixed culture, with individuals of both the MAT1-1 and MAT1-2 mating types present. This species is thus heterothallic.

Nucleotide sequence accession number
This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession: JAJNMT000000000. The version described in this paper is version JAJNMT000000000.

Materials and methods
Genomic DNA was extracted from H. abstrusa using a rapid salt-extraction protocol (Aljanabi and Martinez 1997), with modification (Duong et al. 2013). For the long-read sequencing, a library was constructed from DNA extracted from a mixed mating type culture of H. abstrusa and sequencing was conducted at Macrogen (Seoul, Korea) using PacBio RSII 10 Kb SMRTbell template libraries and the DNA polymerase binding kit P6 v. 2. For the short-read sequencing, a library was constructed from DNA extracted from a single MAT1-2 individual that was isolated from the mixed mating type culture used for the long-read sequencing. An Illumina library was prepared using the TruSeq PCR-free library kit with 550 bp median insert size and sequenced at Macrogen (Seoul, Korea) using the HiSeq 2500 platform, generating paired-end reads of 251 bp.
Fig. 1 Maximum likelihood tree based on the partial gene sequences of translation elongation factor 1-α and β-tubulin (Herron et al. 2015; Wingfield et al. 2015a, 2018a). Values at branch nodes are the bootstrapping confidence values with those ≥ 85% shown. The F. marasasianum isolate sequenced in this study is indicated in bold; F. marasasianum CBS 137238 is ex-holotype
The PacBio reads were assembled using Flye v. 2.8.1 (Kolmogorov et al. 2019). This assembly was subsequently polished with the trimmed Illumina reads using Pilon v. 1.23 (Walker et al. 2014). Three iterations of polishing were done to generate the final assembly. Genome statistics were summarized using Quast v. 5.1 (Gurevich et al. 2013). Genome completeness was evaluated using BUSCO v. 4.0.6, using the fungi_odb10, ascomycota_odb10, and sordariomycetes_odb10 lineage datasets (Simão et al. 2015). AUGUSTUS v. 3.2.3 was used to annotate protein coding genes, using the Fusarium graminearum gene models (Stanke et al. 2006).
Phylogenetic analyses were conducted to confirm the identity of the isolate used for genome sequencing. The sequences for three gene regions (ITS, BT1 and TEF-1α) were extracted from the sequenced genome and combined with homologous sequences from seven other Huntiella species (Liu et al. 2020). Each gene region was aligned independently using the online version of MAFFT v. 7.0 (Katoh and Standley 2013) with default settings. MrModelTest2 v. 2.4 (Nylander 2004) was used to conduct model testing on each alignment, after which the alignments were concatenated into a single file. MrBayes v. 3.2.7 was subsequently used for Bayesian inference analyses. This analysis was run for 500,000 generations, with 10 parallel runs, 4 chains, and using the models as identified by MrModelTest2. Trees were sampled every 100 generations and 25% of the sampled trees were discarded as burn-in. Posterior probabilities were calculated from the remaining trees.
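The sampling scheme described for the Bayesian analysis implies a fixed number of trees per run. The short Python sketch below works through that arithmetic. It assumes that the starting tree at generation 0 is also sampled and that the burn-in count is rounded down; neither detail is stated in the text, and the function name is hypothetical.

```python
def mcmc_sample_counts(generations, sample_every, burnin_fraction, include_start=True):
    """Return (sampled, discarded, kept) tree counts for a single MCMC run."""
    # Assumption: the starting state (generation 0) is counted as a sample.
    sampled = generations // sample_every + (1 if include_start else 0)
    # Assumption: the burn-in count is truncated to a whole number of trees.
    discarded = int(sampled * burnin_fraction)
    return sampled, discarded, sampled - discarded


# Settings reported in the text: 500,000 generations, sampling every 100, 25% burn-in.
sampled, discarded, kept = mcmc_sample_counts(500_000, 100, 0.25)
```

Under these assumptions each run contributes a little under four thousand trees to the posterior sample.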

Results and discussion
The genome sequence of H. abstrusa was 29.5 Mb, assembled into a total of 287 contigs, 274 of which were above 1000 bp in length. The N50 and N90 values were 420,278 bp and 67,803 bp, respectively, and the L50 and L90 values were 18 and 76, respectively. The GC content was 48.5%. The BUSCO analysis showed that the assembly was 97.9%, 96.3%, and 86.9% complete with respect to the fungi, Ascomycota, and Sordariomycetes datasets, respectively. A total of 7,952 protein coding genes were predicted by AUGUSTUS. The phylogenetic placement of this isolate was confirmed (Fig. 2), showing that H. abstrusa is most closely related to H. microbasis and others in the so-called Asian clade.
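For readers unfamiliar with the contiguity statistics quoted above, the following Python sketch shows one common way to compute Nx and Lx values from a list of contig lengths. The function name and the toy contig lengths are illustrative assumptions; the statistics in this study were produced with Quast, and this is not a reproduction of that tool.

```python
def nx_lx(contig_lengths, fraction):
    """Return (Nx, Lx) for a set of contig lengths.

    Nx is the length of the contig at which the cumulative length of the
    contigs, sorted from longest to shortest, first reaches `fraction` of the
    total assembly length; Lx is the number of contigs needed to get there.
    """
    ordered = sorted(contig_lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Toy assembly of five contigs (total length 300); not data from this study.
toy_contigs = [100, 80, 60, 40, 20]
n50, l50 = nx_lx(toy_contigs, 0.5)
n90, l90 = nx_lx(toy_contigs, 0.9)
```

In the toy example, two contigs cover half of the assembly and four cover 90%, which is how an L50 of 18 and an L90 of 76 should be read for the H. abstrusa assembly.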
The results showed that the isolate used to describe H. abstrusa and used for the long-read genome sequencing in this study represented a mixed culture. This isolate consisted of individuals of both the MAT1-1 and MAT1-2 mating types. Assembly with the PacBio reads preferentially assembled the MAT1-2 idiomorph due to low coverage of the MAT1-1 idiomorph. Furthermore, the short-read genome sequencing was conducted using DNA from a single MAT1-2 isolate. Thus, the final genome, assembled using PacBio reads and polished using Illumina reads, represented a single MAT1-2 isolate. As expected, the MAT locus was associated with SLA2 and APC, genes that are frequently present near the MAT1 locus in Pezizomycotina species (Wilken et al. 2017).
Fig. 2 Identity confirmation of the Huntiella abstrusa isolate sequenced in this study. Three gene regions (ITS, BT1 and TEF-1α) were extracted from the assembled genome and compared to other species within the Asian clade of the genus Huntiella. The phylogeny was produced using Bayesian Inference and posterior probabilities are indicated at the nodes
The H. abstrusa genome is slightly larger than some of the other Huntiella genomes that have been sequenced to date (Table 1), including that of H. moniliformis (van der Nest et al. 2014b), but is smaller than that of H. omanensis (van der Nest et al. 2014a). This size difference can be accounted for by the difference in the number of genes in each of these genomes, with the average number of ORFs per Mb remaining stable across the species. Future research may focus on constructing the pangenome of these species, identifying the genes that are differentially present/absent in these species and determining whether their functions can be linked to their lifestyles.
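The ORFs-per-Mb comparison above rests on a simple ratio of gene count to assembly size. The Python sketch below applies it to figures quoted in this article; the function name is hypothetical, and because the H. abstrusa genome size is given only as a rounded 29.5 Mb, the resulting density of about 270 ORFs/Mb is approximate.

```python
def gene_density(gene_count, assembly_size_bp):
    """Return predicted genes (ORFs) per megabase of assembled sequence."""
    return gene_count / (assembly_size_bp / 1_000_000)


# H. abstrusa: 7,952 predicted genes in an assembly reported as 29.5 Mb (rounded).
h_abstrusa_density = gene_density(7_952, 29_500_000)

# Sanity check against the F. marasasianum figures reported earlier in the article:
# 15,564 gene models in 47,207,981 bp, stated there as 329.69 ORFs/Mbp.
f_marasasianum_density = gene_density(15_564, 47_207_981)
```

Values for the other Huntiella genomes would come from Table 1, which is not reproduced here.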
The genome sequence of H. abstrusa presented here is the fifth to be published for a species of Huntiella (van der Nest et al. 2014a, b; Wingfield et al. 2016, 2017) and is one of more than 15 genomes for species in Ceratocystidaceae (van der Nest et al. 2014a; Wilken et al. 2013; Wingfield et al. 2016). Numerous Huntiella species have recently been the subject of investigations considering their mating behaviours and sexual strategies as well as the gene content and distribution of the MAT loci and pheromone response pathway (Wilson et al. 2020, 2021a). The availability of the genome of H. abstrusa will allow for further genome comparisons between species that exhibit different sexual strategies. Furthermore, there are notable differences between Huntiella species, which are almost exclusively saprobic, and species from other Ceratocystidaceae genera, which are typically pathogenic (de Beer et al. 2014). Thus, this genome and others like it will contribute towards a better understanding of the underlying genetic mechanisms that govern the ecology of these fungi, especially pathogenicity, virulence, and host specificity.

Draft genome sequences for two different isolates of the stem canker pathogen Immersiporthe knoxdaviesiana

Introduction
The Cryphonectriaceae include many important tree pathogens (Gryzenhout et al. 2009), notably the causal agent of the devastating chestnut blight, Cryphonectria parasitica. Immersiporthe knoxdaviesiana was first reported causing a serious stem canker disease on native Rapanea melanophloeos in a botanical garden in the Western Cape Province of South Africa (Chen et al. 2013). The disease was apparently new to the area and was observed to be spreading rapidly. Pathogenicity trials showed that I. knoxdaviesiana is aggressive on R. melanophloeos and able to kill trees in a short period of time (Chen et al. 2013). This has led to the suggestion that the fungus is an introduced pathogen (Wingfield et al. 2020). However, population genetic studies, for example using microsatellite markers, would be needed to confirm this hypothesis and thus clarify the origin of the pathogen.
Twelve genome sequences are currently available for species of Cryphonectriaceae. These include Chrysoporthe austroafricana (Wingfield et al. 2015b), C. cubensis, C. deuterocubensis (Wingfield et al. 2015a), C. puriensis (van der Nest et al. 2021), Celoporthe dispersa (Liu et al. 2019), as well as seven species of Cryphonectria (https://www.ncbi.nlm.nih.gov/genome/). The aim of this study was to provide genome sequence data for I. knoxdaviesiana, a fourth genus of Cryphonectriaceae. This study also included a newly collected isolate of the pathogen from an area distant from where the canker disease was first recorded.

Nucleotide sequence accession number
The genomic sequences of Immersiporthe knoxdaviesiana (CMW 37318 and CMW 55904) have been deposited at DDBJ/EMBL/GenBank under the accession numbers ASM2111731v1 and JAJNGR000000000, respectively.

Materials and methods
Immersiporthe knoxdaviesiana isolates CMW 37318 and CMW 55904 were obtained from the culture collection (CMW) of the Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, and the latter culture has also been preserved in the collection of the China Eucalypt Research Centre (CERC), Chinese Academy of Forestry (CAF), Zhanjiang, Guangdong Province, China. Genomic DNA was extracted from single hyphal tip cultures grown on malt yeast broth (2% malt extract, 0.5% yeast extract) using the method described by Duong et al. (2013). To verify the identification of the isolates, the internal transcribed spacer (ITS) region and the partial β-tubulin gene (tub1 and tub2) regions were sequenced. The reference sequences were obtained from GenBank and the sequence dataset was aligned using an online version of MAFFT v. 7 (Katoh and Standley 2013). Phylogenetic analysis using maximum likelihood (ML) was performed with RAxML v. 8 (Stamatakis 2014). Branch support was calculated using 1000 bootstrap replicates.
Nanopore sequencing was performed for the Western Cape isolate (CMW 37318) using the MinION sequencing device. The sequencing library was prepared using the Genomic DNA by Ligation (SQK-LSK109) protocol. The library was loaded on a MinION flow cell (R10.3) and the sequencing run was carried out for 48 h. Base calling was conducted using the ONT Guppy base calling software v. 4.0.14 (https://community.nanoporetech.com). Nanopore reads were trimmed using Porechop (https://github.com/rrwick/Porechop). The genome was assembled using Flye v.
Illumina sequencing was performed for the Eastern Cape isolate CMW 55904. The genomic DNA was submitted to Novogene (Beijing, China) for sequencing using the Illumina HiSeq 2500 platform. A paired-end library with 550 bp median insert size was generated and 250 bp paired-end reads were sequenced. Poor-quality data and adapters were removed using the program Trimmomatic v. 0.36 (Bolger et al. 2014). The program SPAdes v. 3.14 (Bankevich et al. 2012) was used to assemble the genome.

Results and discussion
Phylogenetic analysis confirmed the taxonomic identity of the two isolates as I. knoxdaviesiana (Fig. 3). The assembled genome size of the Western Cape isolate (CMW 37318) was 38,985,688 bp, with an N50 of 3,955,278 bp and L50 of 4, while the Eastern Cape isolate (CMW 55904) had an assembled genome size of 39,205,829 bp, with an N50 of 339,827 bp and L50 of 37. The assembly of the CMW 37318 isolate had 12 contigs and that of CMW 55904 had 421 contigs, of which 212 were longer than 1 Kb. The GC content was 53.9% for both isolates. AUGUSTUS predicted 11,116 and 10,984 protein coding gene models for isolates CMW 37318 and CMW 55904, respectively. BUSCO analysis showed that the assembled genome of isolate CMW 37318 had an 87.9% completeness score. Of the 3817 BUSCO groups searched, 42 BUSCO orthologs were fragmented and 419 BUSCO orthologs were missing. Isolate CMW 55904 had a 98.3% BUSCO completeness score. Nine BUSCO orthologs were fragmented and 55 orthologs were missing.
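The BUSCO completeness scores above follow directly from the fragmented and missing counts. The Python sketch below reproduces the arithmetic; the function name is hypothetical, and it assumes that the same 3817 BUSCO groups were searched for both isolates, which the text states explicitly only for CMW 37318.

```python
def busco_completeness(total_groups, fragmented, missing):
    """Return (complete_count, percent_complete) from BUSCO summary counts."""
    # Complete BUSCOs are those that are neither fragmented nor missing.
    complete = total_groups - fragmented - missing
    return complete, round(100 * complete / total_groups, 1)


# Counts as reported in the text; 3817 groups for CMW 55904 is an assumption.
cmw_37318 = busco_completeness(3817, fragmented=42, missing=419)
cmw_55904 = busco_completeness(3817, fragmented=9, missing=55)
```

Both percentages agree with the scores reported for the two isolates.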
The results of BLASTx showed that both the MAT1-1 and MAT1-2 idiomorphs were located on contig 4 of isolate CMW 37318 and on scaffold 39 of isolate CMW 55904, which suggests that I. knoxdaviesiana has a homothallic reproductive system (Fig. 4).

Draft genome assembly of Macrophomina pseudophaseolina strain WAC 2767, and ex-epitype strain of M. phaseolina

Introduction
The genus Macrophomina includes several economically important plant pathogenic species causing damping-off, seedling blight, and stem and dry root rot (also known as charcoal rot) on a broad range of broadacre, horticultural, and vegetable crops worldwide (Kaur et al. 2012; Marquez et al. 2021). Macrophomina species are soil-borne pathogens and may survive in soil or plant debris for more than four years by forming resting structures called microsclerotia. Currently, five species of Macrophomina are known, viz. M. phaseolina and M. pseudophaseolina (Sarr et al. 2014), M. euphorbiicola (Machado et al. 2019), M. vaccinii, and M. tecta. Among these, M. phaseolina and M. pseudophaseolina (Sarr et al. 2014) have been reported on a broad range of host plants, with M. phaseolina alone reported on > 800 plant species (Farr and Rossman 2021; Sarr et al. 2014). However, the underlying molecular mechanisms allowing these two Macrophomina species to infect a wide range of plants are poorly understood.
In this study, we construct the draft genome assembly and annotation for a M. pseudophaseolina strain and provide genome assembly and annotation for the ex-epitype strain of M. phaseolina. These resources will facilitate comparative genomic studies of Macrophomina species to identify virulence-related factors, such as effectors and secondary metabolites, which will enhance future studies to better understand the evolution of Macrophomina species, host-pathogen interactions, and underlying infection mechanisms.

Nucleotide sequence accession numbers
The Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the Accession Numbers JAJJIC000000000 and JAJJID000000000 (BioProject PRJNA780220; and BioSamples SAMN23133107 and SAMN23133108). The versions described in this paper are JAJJIC010000000 and JAJJID010000000.

Materials and methods
The M. pseudophaseolina strain (WAC 2767) sequenced here was obtained from the Western Australian Plant Pathology Reference Culture Collection (WAC, Perth, WA), and grown for 7 d in potato dextrose broth (Amyl Media, Australia) at room temperature (approx. 20 °C) at 220 rpm. Genomic DNA of strain WAC 2767 was extracted using a DNeasy Plant Mini Kit (Qiagen, Australia) according to manufacturer's instructions. The DNA extracted from the ex-epitype strain of M. phaseolina (CBS 205.47) was obtained from the Westerdijk Fungal Biodiversity Institute in Utrecht, The Netherlands (formerly the CBS-KNAW Fungal Biodiversity Centre). DNA samples were quantified using a Qubit v.3.0 fluorometer (Thermo Fisher Scientific, Australia). Gel electrophoresis on a 0.8% agarose gel was used to assess DNA integrity.

Results and discussion
Illumina paired-end (2 × 300 bp) sequencing of Macrophomina pseudophaseolina strain WAC 2767 resulted in a total of 15,392,968 reads, with an estimated genome size of ~ 48.46 Mb based on Jellyfish analysis, which indicated a genome coverage of 96 ×. Unicycler-based assembly resulted in a higher quality genome based on the BUSCO result of 94.5% completeness (3,551 complete and single-copy BUSCOs, 27 complete and duplicated BUSCOs, 78 fragmented BUSCOs, 130 missing BUSCOs), and was selected for further analyses and annotation. The final assembly included 3,799 contigs/scaffolds with an N50 value of 75.77 Kb and the largest contig of size 355.92 Kb. The GC content of the genome was 53.87%. BRAKER2 predicted 12,698 protein coding genes in the WAC 2767 genome.
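As a rough cross-check of the coverage figure above, sequencing depth can be approximated as the total number of sequenced bases divided by the estimated genome size. The Python sketch below is a back-of-the-envelope calculation only: it assumes that each of the reported reads is a full-length 300 bp read, which the text does not state, and the function name is hypothetical.

```python
def estimated_coverage(read_count, read_length_bp, genome_size_bp):
    """Approximate sequencing depth as total sequenced bases over genome size."""
    return read_count * read_length_bp / genome_size_bp


# WAC 2767: 15,392,968 reads, assumed to be 300 bp each, against the
# Jellyfish genome size estimate of ~48.46 Mb.
wac_2767_coverage = estimated_coverage(15_392_968, 300, 48_460_000)
```

Under these assumptions the estimate comes out at roughly 95 ×, close to the reported 96 ×; the exact value depends on how the reads were counted and trimmed, which is not described here.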
Illumina sequencing of M. phaseolina strain CBS 205.47 resulted in a total of 27,631,588 reads. The estimated genome size based on Jellyfish was ~ 49.4 Mb, which is comparable to published genomes of M. phaseolina strains from jute, strawberry and sorghum (Burkhardt et al. 2019; Islam et al. 2012; Purushotham et al. 2020). An approximate genome coverage of 169 × was, therefore, achieved for strain CBS 205.47. Unicycler-based
Sarr et al. (2014). Alignment and tree were submitted to TreeBase (No. 28996). The tree was constructed using RAxML v.8 (Stamatakis 2014), based on the GTR substitution model with gamma-distribution rate variation for individual partitions. The tip labels in bold represent ex-type strains, and asterisks denote strains sequenced in the current study. Bootstrap support values > 80% are shown at the branches. The tree is rooted to Botryosphaeria dothidea
This is the first published genome for M. pseudophaseolina and will be valuable for future comparative studies of Macrophomina species. While multiple genome assemblies of M. phaseolina are publicly available, the genome sequence of the ex-epitype published here will serve as a reference for future phylogenetic and comparative genomics studies to further understand the biology and evolution of M. phaseolina.

Draft genome sequence of the basidiomycetous yeast Naganishia randhawae CBS 16859 isolated from avian guano in South Africa

Introduction
The genus Naganishia was initially proposed by Goto (1963) to accommodate the yeast Naganishia globosus. This species was subsequently subsumed into Cryptococcus saitoi, based on ribosomal RNA (rRNA) sequence analysis, which led to the synonymization of the genus (Fonseca et al. 2000). However, the genus was re-established with the purpose of resolving the diversity and heterogeneity of the yeast genus Cryptococcus (Fell et al. 1999; Fonseca et al. 2000; Liu et al. 2015; Scorzetti et al. 2002). Naganishia belongs to the class Tremellomycetes, order Filobasidiales, and family Filobasidiaceae, and comprises 16 species. Taxa in this genus reproduce by budding, and sexual reproduction has to date not been observed (Kurtzman et al. 2011; Liu et al. 2015). They produce starch-like compounds and utilize caffeic, ferulic, hydroxybenzoic, L-malic, p-coumaric, protocatechuic, and vanillic acids (Fotedar et al. 2018; Liu et al. 2015). Nitrate is utilized and fermentation has not been observed (Liu et al. 2015).
Species in Naganishia have a global distribution, particularly in extreme terrestrial environments characterized by cold temperatures and recurrent diurnal freeze/thaw cycles (Costello et al. 2009; Lynch et al. 2012; Pulschen et al. 2015; Schmidt et al. 2017; Solon et al. 2018). Several of these species are known to cause disease in immunocompromised individuals. Naganishia albida has been implicated in cutaneous lesions, encephalitis, keratitis, onychomycosis, and pneumonia (Burnik et al. 2007; Lee et al. 2004; Ragupathi and Reyna 2015); N. diffluens is associated with subcutaneous infections (Kantarcioǧlu et al. 2007); N. friedmannii has been confirmed as an etiologic agent of onychomycosis (Ekhtiari et al. 2017); and N. uzbekistanensis has been isolated from the bone marrow of lymphoma patients (Powel et al. 2012).
Members of this genus also show great biotechnological value. Naganishia liquefaciens and N. adeliensis display an ability to accumulate lipids (Selvakumar et al. 2019; Selvakumar and Sivashanmugam 2018), while the draft genome of N. albida NRRLY-1402 incorporates several genes that play a role in lipid biosynthesis (Vajpeyi and Chandran 2016). Lipids produced by yeast are considered an alternative feedstock for biodiesel production, thereby contributing to the fight against climate change and the development of sustainable practices (Luque et al. 2010). Similarly, the microbial production of single cell oils (SCOs) has received considerable attention in recent years. Microbial production offers a variety of advantages over the use of animal or plant sources; for example, it is not limited by climatic conditions or geographical location. It also offers a shorter processing time and allows for a greater variety of substrate utilization, including industrial waste (Luque et al. 2010; Ward and Singh 2005).
Despite their potential economic and environmental importance, the mechanism underlying the diverse functions of Naganishia species is poorly understood. Here we report the first genome sequence of an isolate of N. randhawae CBS 16859, which was isolated from avian guano in South Africa. This genome will contribute towards an increased understanding of the biology of the genus Naganishia.

Nucleotide sequence accession number
This Whole Genome Shotgun project and the internal transcribed spacer (ITS) sequence of Naganishia randhawae CBS 16859 have been deposited at DDBJ/ENA/GenBank under the accessions JABRPJ000000000 and MT542688, respectively.

Culture conditions
Our isolate differed from the described type material (CBS 10160; Khan et al. 2010) in that melanin was produced on bird seed agar (BSA; Staib and Seeliger 1966) and growth was inhibited at 37 °C. Naganishia randhawae CBS 16859 was maintained by periodic transfer on yeast peptone dextrose (YPD, pH 5.5) agar (Ausubel et al. 1989) supplemented with 0.2 g/L of chloramphenicol (Sigma) and incubated at 30 °C.

Taxonomic placement
The taxonomic placement of N. randhawae CBS 16859 was investigated by constructing a Maximum Likelihood (ML) phylogeny with the fungal ITS 1 and 2 regions (including the 5.8S rRNA gene). Genomic DNA was extracted using the Quick-DNA Fungal/Bacterial Kit (Zymo Research) as per the manufacturer's instructions. The ITS region was amplified using the primers ITS1 and ITS4 and a previously described protocol (White et al. 1990). The manually curated ITS sequence of N. randhawae CBS 16859 was deposited in GenBank (Accession No. MT542688). Comparison of the amplified ITS nucleotide sequence against the National Center for Biotechnology Information (NCBI) nucleotide database indicated that this strain belongs to N. randhawae, sharing 99% nucleotide identity with the type isolate CBS 10160.
The ITS sequences of representative isolates in this genus were obtained from the NCBI nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore). The ITS sequence from Filobasidium wieringae CBS1937 (AF444373.1) was included as outgroup. All sequences were aligned using the M-Coffee webserver (Moretti et al. 2007), prior to trimming of the unaligned 5′ and 3′ ends. An ML phylogeny was then constructed using PhyML-SMS with smart model selection (Guindon et al. 2010; Lefort et al. 2017) with 1000 bootstrap replicates.

DNA isolation, genome sequencing and assembly
Genomic DNA was extracted as described above. Library preparation and sequencing were performed using the Illumina NovaSeq 6000 platform (paired-end read approach, 2 × 250 bp) by MR DNA (Texas, USA). Adapter sequences and low-quality (< Q28) reads were removed using the FastQC toolkit v. 0.11.8 (Andrews 2010). The trimmed reads were de novo assembled with SPAdes v. 3.9.0 (Bankevich et al. 2012). The assembled contigs were further refined with local Blastn analysis using BioEdit v. 7.0.5.3 (Hall 1999) and contig extension using the Integrated Genome Browser v. 9.0.2 (Nicol et al. 2009).
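The Q28 read filter mentioned above can be illustrated with a minimal sketch. The text does not state how the threshold was applied, so this example assumes a mean-quality criterion and Phred+33 encoded quality strings; the function names `mean_phred` and `keep_read` are hypothetical and are not part of any tool cited here.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_read(quality: str, min_mean_q: float = 28.0) -> bool:
    """Keep a read only if its mean Phred score reaches the threshold.

    Assumption: reads with mean quality below Q28 are discarded; the
    original pipeline may have applied the cut-off differently.
    """
    return bool(quality) and mean_phred(quality) >= min_mean_q


if __name__ == "__main__":
    for qual in ("IIIIIIIIII", "##########"):
        print(qual, keep_read(qual))
```

A per-base or sliding-window criterion would be an equally plausible reading of the threshold; the mean is used here only because it is the simplest to show.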

Results and discussion
The taxonomic placement of N. randhawae CBS 16859 within the genus Naganishia is illustrated in Fig. 6.
The draft genome sequence of N. randhawae CBS 16859 comprises 386 contigs with a total size of 20,271,596 bp and an average G + C content of 51.71%. As such, it is approximately 0.58 Mb larger than the genome of N. vishniacii ANT03-52 and 0.41 Mb smaller than that of N. albida JCM2334 (data not shown). The genome of N. randhawae CBS 16859 codes for 6,775 proteins and 168 rRNA sequences and incorporates 86.8% of the Basidiomycota BUSCO gene models (Simão et al. 2015), showing a relatively high level of completeness.
The genome of N. randhawae CBS 16859 was annotated using eggNOG according to the COGs database (Huerta-Cepas et al. 2019). Among proteins with known functions, the predominant functional categories were "posttranslational modification, protein turnover, chaperones" (O) with 8%, followed by "carbohydrate transport and metabolism" (G) with 7% (Fig. 7). Metabolic variation in yeast holds practical importance as it provides insight into the growth rate, biotechnological importance, and level of pathogenicity of the species (Breunig et al. 2014; Gibney et al. 2013).
Carbohydrate-Active Enzymes (CAZymes) are responsible for the synthesis and degradation of glycoconjugates, oligo- and polysaccharides and are also active in immune and host-pathogenic interactions. Analysis of CAZymes showed that the largest number of genes in the auxiliary activity (AAs) family was encoded by N. randhawae CBS 16859 (Table 2; Lombard et al. 2014). The AAs family consists of multicopper oxidases (MCOs) that include some of the essential enzymes involved in melanin biosynthesis (Langfelder et al. 2003). Interestingly, the genome of N. randhawae CBS 16859 encodes a protein which is identical to the MCO laccase enzyme (KEP53184.1) found in the well-known melanin-producing plant pathogen Rhizoctonia solani (Shu et al. 2019). Laccase activity and pigment have been shown to contribute towards pathogenesis in N. albida and N. diffluens (Ikeda et al. 2002), by inhibiting phagocytosis by macrophages, decreasing susceptibility to killing by free radicals and increasing resistance towards antifungal agents such as amphotericin B (Nosanchuk and Casadevall 2003). Similarly, the N. randhawae CBS 16859 genome also houses three unique copies of the ERG24 gene, which encodes the enzyme Delta-(14)-sterol reductase.
Overexpression of this gene, and erg24 mutations, are associated with fungal resistance towards three classes of ergosterol inhibitors, specifically the allylamines, azoles and morpholines (Almeida-Paes et al. 2017; Li et al. 2016; van de Sande et al. 2007). Coupled with these potential pathogenicity factors, analysis of the secreted proteases using the MEROPS database reveals that metallo- and serine protease-encoding genes were among the most abundant peptidase-encoding genes in the genome of N. randhawae CBS 16859 (Fig. 8). Both enzyme families are important pathogenicity factors among Naganishia spp. and other fungal dermatophytes (Monod et al. 2002; Yike 2011).
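Summary figures such as contig count, total size and G + C content can be derived directly from the contig sequences. The following is a minimal sketch of that calculation, not the authors' workflow; the function name `assembly_stats` and the toy sequences are illustrative assumptions, and N50 is included only as a commonly reported companion statistic.

```python
def assembly_stats(contigs):
    """Return contig count, total length, G+C percentage and N50.

    `contigs` is an iterable of nucleotide strings. Ambiguous bases (e.g. N)
    count towards length but not towards G+C in this simple sketch.
    """
    seqs = [c.upper() for c in contigs]
    lengths = sorted((len(s) for s in seqs), reverse=True)
    total = sum(lengths)
    gc = sum(s.count("G") + s.count("C") for s in seqs)

    # N50: length of the contig at which the cumulative length, taken from
    # the longest contig downwards, first reaches half of the total size.
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    return {
        "contigs": len(lengths),
        "total_bp": total,
        "gc_percent": 100 * gc / total if total else 0.0,
        "n50": n50,
    }


if __name__ == "__main__":
    print(assembly_stats(["GGCC", "ATAT", "GC"]))
```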
In conclusion, the genome sequence of N. randhawae CBS 16859 represents the first genome for this species and will serve as a valuable genomic resource to deepen
Fig. 6 Maximum likelihood (ML) phylogeny indicating the taxonomic placement of Naganishia randhawae CBS 16859. The tree was constructed on the basis of the internal transcribed spacer region sequences using PhyML-SMS, with the best fit evolutionary model HKY85 + G (Guindon et al. 2010; Lefort et al. 2017). Bootstrap support values (n = 1000 replicates) greater than 500 are indicated at the nodes

IMA GENOME-F 16F

Draft genome sequence of Pseudocercospora cruenta causing black leaf mould of cowpea

Introduction
Cowpea (Vigna unguiculata) is a widely cultivated legume in tropical and subtropical regions, especially in Africa, Asia, and some parts of America. It serves as a significant source of carbohydrates, protein, minerals and vitamins for human and livestock nutrition in the tropical world (Duangsong et al. 2016; Longe 1980; Singh et al. 2003). In many countries, cowpea is grown as a major component in cropping systems because of its rapid growth, drought tolerance, and ability to fix atmospheric nitrogen (Duangsong et al. 2016; Omoigui et al. 2019). Cowpeas are planted on an estimated 14.5 million ha of land per year, with a total yield of 6.2 million metric tons/year (Kebede and Bekeko 2020). India is one of the major contributors to world cowpea production. However, yields in India are significantly lower than the world average due to the unavailability of high-yielding varieties and the occurrence of biotic and abiotic stresses (Raina et al. 2020).
Cowpea yield is lowered not only by the unavailability of high-yielding varieties, but also by diseases caused by several pathogenic organisms, such as viruses, bacteria, and fungi. Fungal diseases are devastating for the growth, development, and yield of cowpea. About 40 fungal species have been reported to cause diseases associated with different cowpea varieties (Bailey et al. 1990). One such disease is 'black leaf mould' of cowpea caused by Pseudocercospora cruenta (formerly Cercospora cruenta). Pseudocercospora belongs to Mycosphaerellaceae (Capnodiales, Dothideomycetes), and several species have Mycosphaerella-like sexual morphs (Hyde et al. 2013; Kirk et al. 2013). It is a cosmopolitan genus of phytopathogenic fungi associated with many plant species, including several economically relevant hosts (Bakhshi et al. 2014; Crous et al. 2013). P. cruenta represents a distinct pathogen specific to Vigna and Phaseolus species (Crous and Braun 2003; Hsieh and Goh 1990; Kamal 2010). Black leaf mould of cowpea is prevalent in the rainy season during high moisture and warm temperatures (Heng et al. 2020). In India, the disease is so severe in September-October that farmers cannot harvest a single pod due to complete defoliation within a few days of infection (Pandey 2002). The disease causes 35 to 40% yield loss in susceptible varieties (Fery et al. 1976; Schneider et al. 1976). In addition, black leaf mould incidence on cowpea limits the leaf area available for photosynthesis, resulting in reduced yield (Booker and Umaharan 2007; Ekhuemelo et al. 2019). Therefore, there is a need to understand the genome organization of Pseudocercospora cruenta, which could assist in identifying the virulence gene(s) that control the disease.

Nucleotide sequence accession numbers
The genome sequence of P. cruenta has been deposited in DDBJ/ENA/GenBank databases under the accession number JAASFE000000000; Bioproject PRJNA613165; Biosample SAMN14395397. The version described in this paper is version JAASFE010000000. The raw Illumina HiSeq sequence reads are deposited in the NCBI Sequence Read Archive (SRA) under accession SRX7980600. The genome annotation and data on predicted genes and effectors have been deposited in Mendeley Data with DOI number https://doi.org/10.17632/g39mv8yp87.1.
Fig. 8 Classification of peptidases, according to the MEROPS database, encoded within the Naganishia randhawae CBS 16859 genome (Rawlings et al. 2016)

Materials and methods
Identification, isolation and DNA extraction
Pseudocercospora cruenta infected cowpea leaves were collected from Banaras Hindu University agricultural farm, Varanasi, India, in 2009. Typical symptoms present on the leaves were observed and photographed (Nikon D5200, Nikon, Japan), and microscopic examination was carried out by scraping the growth of the pathogen from the infected spots. Photographs of conidia and conidiophores were taken at 20 × and 40 × magnification using NIS-Elements imaging software. The pathogen was identified by comparing the microscopic characteristics of conidia and conidiophores with the MycoBank database. P. cruenta was isolated aseptically on Potato Dextrose Agar (PDA) medium, and colony characters were studied. P. cruenta was submitted to an International Depositary Authority (IDA) recognized repository, the National Centre for Microbial Resource, National Centre for Cell Science (NCMR-NCCS), Pune, India, with the accession number MCC 9095. Liquid culture of the monoconidial isolate was grown in 30 ml Potato Dextrose Broth (PDB) medium and incubated for 7 d at 25 ± 1 °C. Fungal mycelium was harvested aseptically after 7 d, and genomic DNA was extracted using a modified Cetyl Trimethyl Ammonium Bromide (CTAB) extraction protocol (Murray and Thompson 1980). Quantification of DNA was carried out using an Eppendorf BioPhotometer® D30. The amplified products were visualized on 1.5% agarose gel.
Genome sequencing, assembly, and annotation
The library was prepared for sequencing on a HiSeq 2500 (Illumina) using 2 × 100 bp and 2 × 250 bp paired-end chemistry at the AgriGenome Labs (Kochi, India). Raw paired-end reads were quality-checked using FastQC (Andrews 2010). Adapters and low-quality reads with an average quality score of less than 30 were removed using AdapterRemoval v. 2.3.1 (Schubert et al. 2016). FastUniq v. 1.1 (Xu et al. 2012) was used to remove duplicates in paired short reads. Velvet v. 1.2.10 was used for de novo assembly (Zerbino and Birney 2008). A range of k-mers from 31 to 95 was used for Velvet assembly. Assembly statistics were assessed in QUAST v. 4.6 (Gurevich et al. 2013). The quality and completeness of the assembly were assessed with Benchmarking Universal Single Copy Orthologs (BUSCO v. 2.0) using the Ascomycota odb9 dataset (Simão et al. 2015). AUGUSTUS (Stanke and Morgenstern 2005) was used to predict protein-coding genes from the assembled genome. Using the BLASTX v. 2.6.0 tool (https://blast.ncbi.nlm.nih.gov/) and an E-value cut-off of 10⁻³, the predicted genes were compared with the UniProt (The UniProt Consortium 2021) and the NCBI databases to assign functions. The best BLASTX hit for each gene was chosen based on query coverage, identity, similarity score, and gene description. The predicted genes were annotated in terms of molecular functions, cellular components, and biological processes using the UniProt and NCBI databases for gene ontology. The protein dataset was submitted to the CUPP webserver (Barrett and Lange 2019) to predict and classify Carbohydrate Active Enzymes (CAZymes).
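The best-hit selection described above can be sketched as follows. This is a hedged illustration only: the hit records are assumed to be already parsed into dictionaries, the field names (`gene`, `subject`, `evalue`, `coverage`, `identity`) are hypothetical, and ranking by coverage and then identity is a simplification of the criteria listed in the text, which also include similarity score and gene description.

```python
def best_hits(hits, max_evalue=1e-3):
    """Pick one hit per gene among those passing the E-value cut-off.

    Assumption: hits are ranked by query coverage, then percent identity.
    The original analysis also weighed similarity score and description.
    """
    best = {}
    for hit in hits:
        if hit["evalue"] > max_evalue:
            continue  # fails the E-value cut-off
        current = best.get(hit["gene"])
        rank = (hit["coverage"], hit["identity"])
        if current is None or rank > (current["coverage"], current["identity"]):
            best[hit["gene"]] = hit
    return best


if __name__ == "__main__":
    example = [
        {"gene": "g1", "subject": "A", "evalue": 1e-50, "coverage": 90, "identity": 70},
        {"gene": "g1", "subject": "B", "evalue": 1e-40, "coverage": 95, "identity": 60},
        {"gene": "g2", "subject": "C", "evalue": 0.5, "coverage": 99, "identity": 99},
    ]
    for gene, hit in best_hits(example).items():
        print(gene, hit["subject"])
```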
Effector prediction and annotation
To identify putative effector genes, the following pipeline was used. The predicted proteome was analyzed using SignalP v. 5.0 (Armenteros et al. 2019) to filter out non-secretory proteins. The secretory protein dataset was then subjected to TMHMM v. 2.0 (Krogh et al. 2001) to eliminate proteins with one or more transmembrane helices, followed by PredGPI (Pierleoni et al. 2008) to exclude glycophosphatidylinositol (GPI) anchored proteins, which likely represent surface proteins rather than secreted effectors. WoLF PSORT (Horton et al. 2007) and DeepLoc v. 1.0 (Armenteros et al. 2017) were used to eliminate proteins destined for organelles. Cysteine-rich proteins (≥ 4 cysteine residues) were identified and subjected to EffectorP v. 2.0 (Sperschneider et al. 2018) to identify potential effectors. The predicted effectors were functionally annotated using OmicsBox (Götz et al. 2008).
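The successive filters in this pipeline can be sketched as a single pass over pre-computed annotations. The example below is illustrative only and does not call any of the cited tools: the `Protein` record and its fields are hypothetical containers for results that would come from SignalP, TMHMM, PredGPI, WoLF PSORT/DeepLoc and EffectorP.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    name: str
    sequence: str
    has_signal_peptide: bool  # assumed to come from a signal peptide predictor
    tm_helices: int           # assumed number of predicted transmembrane helices
    gpi_anchored: bool        # assumed GPI-anchor prediction
    organelle_targeted: bool  # assumed subcellular localization call
    effector_call: bool       # assumed effector classifier result


def candidate_effectors(proteins, min_cysteines=4):
    """Apply the filters in the order given in the text and return survivors."""
    kept = []
    for p in proteins:
        if not p.has_signal_peptide:
            continue  # non-secretory
        if p.tm_helices >= 1:
            continue  # one or more transmembrane helices
        if p.gpi_anchored:
            continue  # likely a surface protein
        if p.organelle_targeted:
            continue  # destined for an organelle
        if p.sequence.upper().count("C") < min_cysteines:
            continue  # not cysteine-rich
        if not p.effector_call:
            continue  # not predicted as an effector
        kept.append(p)
    return kept
```

Keeping each criterion as a separate check mirrors the stepwise counts reported for such pipelines, so the number of proteins removed at each stage can be tallied if needed.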
Phylogenetic analysis
The phylogenetic tree was constructed using the ITS region sequence extracted from the assembled genome sequence. The ITS region sequences of ex-type and reliably named strains, and sequences from Bakhshi et al. (2014) and Crous et al. (2013), were acquired from GenBank based on the closest similarity in a BLASTn search. MAFFT v. 7.0 (Katoh and Standley 2013) was used to align the sequences. MEGA v. 7.0 (Kumar et al. 2016) was used for the phylogenetic analysis. During the sequence alignment, gaps and missing data were removed. The phylogenetic tree was built using the Neighbour-Joining method (Saitou and Nei 1987). Bootstrap analysis was performed using 1000 repetitions to calculate the confidence levels for each branch. Bootstrap values less than 50% were not considered.

Results
Morphological and molecular identification
The black leaf mould lesions were rusty brown to reddish, sometimes almost grey, and orbicular in shape. Symptoms were produced initially on the abaxial surface of older leaves. Mature leaves developed grey symptoms due to the superficial growth of the pathogen and its dark conidia (Fig. 9a). P. cruenta was characterized by the presence of rudimentary stromata and subhyaline to pale olivaceous brown conidiophores with 0-3 septa (Fig. 9b). Conidia were cylindric or cylindro-obclavate, subhyaline to very pale olivaceous brown, straight to mildly curved, with 3-14 septa (Fig. 9c). The cultured colony appeared dark grey to black on the PDA medium (Fig. 9d). The phylogenetic inference based on the ITS region confirmed that the sequenced draft genome belongs to P. cruenta (Fig. 10).
Genome assembly and annotation
The draft genome assembly of P. cruenta resulted in a genome of 40.39 Mb with an overall GC content of 46.6%. There were 241 contigs larger than 50 kb. Statistics of genome sequencing and assembly are summarised in Table 3. Gene prediction analysis yielded a total of 12,606 protein-coding genes. The number of predicted genes with a significant BLASTX match (E-value ≤ 1e−3 and similarity score ≥ 40%) with UniProt was 11,741. Genome completeness analysis identified 284/290 (97.93%) complete BUSCOs in the database of ascomycetes. Detailed characterization of the draft genome suggested that 502 genes were predicted to encode CAZymes, including 82 enzymes involved in auxiliary activities, 13 carbohydrate esterases, 311 glycoside hydrolases, 89 glycosyl transferases, and 7 polysaccharide lyases.
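The completeness percentage and the CAZyme total quoted above follow from simple arithmetic, which can be checked with a short sketch. The helper name `busco_percent` and the dictionary layout are illustrative choices; the numbers are those reported in the text.

```python
def busco_percent(complete: int, searched: int) -> float:
    """Percentage of complete BUSCOs among all groups searched."""
    return round(100 * complete / searched, 2)


# CAZyme class counts as reported in the text.
cazymes = {
    "auxiliary activities": 82,
    "carbohydrate esterases": 13,
    "glycoside hydrolases": 311,
    "glycosyl transferases": 89,
    "polysaccharide lyases": 7,
}

if __name__ == "__main__":
    print(busco_percent(284, 290))  # completeness of the assembly
    print(sum(cazymes.values()))    # total predicted CAZyme genes
```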
Gene ontology analysis mapped 1038 terms associated with molecular functions such as ATP binding, oxidoreductase activity, metal ion binding and nucleic acid binding; 339 terms associated with cellular components, where the maximum number of hits was linked to integral membrane transport; and 948 terms related to biological processes such as transmembrane transport, metabolic processes, DNA repair and transcription.
Prediction and annotation of effectors
The genome of P. cruenta encoded 12,606 predicted proteins. Of these, 818 were classically secreted proteins with a signal peptide. From these 818 proteins, 86 proteins with one or more transmembrane helices and 68 proteins with GPI-anchor motifs were eliminated. Of the remaining 664 proteins, WoLF PSORT and DeepLoc identified 456 extracellular and cytoplasmic proteins, of which 93 were predicted to be effectors. Functional annotation of the predicted effectors identified 44 (47%) proteins with hypothetical function. The remaining proteins were associated with biochemical functions such as degradation of large molecules (hydrolases and peptidases), metabolic processes, metal ion binding, transcriptional co-activation, biosynthetic processes, necrosis induction, and proteolysis.
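The counts in this section are internally consistent, as the following minimal check illustrates. The function names are hypothetical; all numbers are taken from the text.

```python
def remaining_after_filters(secreted: int, transmembrane: int, gpi_anchored: int) -> int:
    """Secreted proteins left after removing transmembrane and GPI-anchored ones."""
    return secreted - transmembrane - gpi_anchored


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# GO terms per ontology as reported in the text.
go_terms = {"molecular function": 1038, "cellular component": 339, "biological process": 948}

if __name__ == "__main__":
    print(remaining_after_filters(818, 86, 68))  # proteins passed on to localization filters
    print(percent(44, 93))                       # share of effectors with hypothetical function
    print(sum(go_terms.values()))                # total GO term assignments
```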

Discussion
Pseudocercospora cruenta has a broad host range, which allows infections to be carried over to the next growing season (Omoigui et al. 2019). Therefore, black leaf mould management is difficult due to inoculum presence from multiple hosts. Furthermore, continuous use of fungicides for pathogen management causes harmful effects

Genes assigned to GO terms: 2,325
Complete BUSCOs (C): 284
Complete and single-copy BUSCOs (S): 283
Complete and duplicated BUSCOs (D): 1
Fragmented BUSCOs (F): 1
Missing BUSCOs (M): 5
Total BUSCO groups searched: 290