1. INTRODUCTION
The family Platycephalidae, commonly known as flatheads, comprises 86 species within 17 genera worldwide (Fricke et al. 2024). Of these, 8 species in 6 genera are known from the Korean Peninsula (Yoon et al. 2022). Although many species have been identified globally, larvae of only 11 platycephalid species have been reported from the northwestern Pacific: one species each of the genera Inegocia, Rogadius, and Suggrundus, two species of Onigocia, one unidentified species of Platycephalus, and five unidentified species (Okiyama 2014). The smaller number of larval records compared with adults may reflect the difficulty of identifying larvae, which are morphologically similar across species during early developmental stages (Okiyama 2014).
Fish species identification has traditionally relied on the morphological characteristics of adults (Yoon et al. 2022). In fish larvae, however, morphological traits change rapidly during growth, so species identification must rely on stage-specific morphological changes and interspecific differences (Kendall et al. 1984; Leis and Carson-Ewart 2000; Richards 2005; Okiyama 2014). The lack of such stage-specific information, combined with physical damage and protein degradation during specimen collection, makes morphology-based identification difficult. The problem is most pronounced in early larval stages, when species are so similar morphologically that species-level identification from morphology alone is exceedingly difficult.
To address these challenges, DNA barcoding has gained increasing attention as a powerful tool for species identification and phylogenetic inference, using common mitochondrial barcode regions such as MT-CO1, MT-CYB, MT-RNR1, and MT-RNR2 (Kim et al. 2008a; Kim et al. 2008b; Wang et al. 2023). This approach allows species-level identification even of fish eggs, which are otherwise difficult to identify from morphology alone (Aoyama et al. 1999; Akimoto et al. 2002; Rodriguez-Graña et al. 2004; Kim et al. 2008a; Kim et al. 2008b). For single specimens, the barcode region is typically sequenced by Sanger sequencing (Kim et al. 2008a; Kim et al. 2008b; Ko et al. 2013).
Samples collected in the wild with a zooplankton net typically contain a mixture of biological groups (Kim et al. 2021; Song et al. 2021; Choi et al. 2024; Song et al. 2024). During collection, physical contact between the net and the organisms, or among the organisms themselves, can damage specimens and cause cross-contamination between species, so an individual specimen may already carry DNA from other organisms when it is isolated. Such contamination complicates the analysis of a single specimen by DNA barcoding with Sanger sequencing, because a Sanger read reflects the combined signal of all templates in the reaction. For example, in dietary analyses of small marine animals such as copepods, considerable effort is required to remove contaminants from the chitinous body surface (Yeh et al. 2020). Fish larvae, in contrast, lack developed scales and are highly vulnerable to physical damage, which increases the likelihood of contamination by other organisms during net collection or during subsequent handling for species identification. For this reason, DNA metabarcoding by amplicon sequencing may be more advantageous than Sanger sequencing for specimens at high risk of contamination.
DNA metabarcoding, based on amplicon or whole-genome sequencing, is particularly effective for analyzing mixed samples of fish eggs or larvae and for determining their species composition (Kimmerling et al. 2018; Duke and Burton 2020; Alshari et al. 2021; Ratcliffe et al. 2021; Kim et al. 2023). The method is also valuable for identifying individual larvae, for analyzing the remains of organisms ingested by small marine predators, and for identifying species from environmental DNA (eDNA) samples (Song et al. 2012; Park et al. 2022; Choi et al. 2023; Choi and Kim 2023; Choi et al. 2024). Because larval fish samples are mixed with other species, separating them and observing them under a microscope inevitably involves physical contact between specimens and potential contamination from the surrounding environment. In this context, DNA metabarcoding is expected to offer significant advantages for identifying species in samples at high risk of contamination.
In this study, we applied amplicon sequencing with high-throughput sequencing (HTS) to a larval specimen, presumed to belong to an unidentified species of the family Platycephalidae, collected off the coast of the Korean Peninsula. By analyzing the resulting sequence data together with morphological observations, we identified the specimen to species level and report the findings here.
2. MATERIALS AND METHODS
2.1. Sample collection and initial processing
In this study, a zooplankton sample was collected on August 23, 2019, from station W58 (35.91°N, 126.34°E), in the coastal waters of the western Korean Peninsula, using a larval fish net with an 80 cm mouth diameter and 300 μm mesh size. The collected sample was preserved in 95% ethanol solution and then transported to the laboratory. In the laboratory, a dissection microscope (Stemi 2000-C; Zeiss, Germany) was used to observe and isolate the larvae.
2.2. Morphological observation and measurements
Among the isolated larvae, one specimen (AYL19_113) with a body shape similar to that of the family Platycephalidae was selected for further analysis. The specimen was photographed with a digital camera (EOS-1D X Mark II; Canon, Japan) attached to the dissection microscope (10×), and measurements were taken to the nearest 0.1 mm. Body-part terminology, counts, and morphometric traits followed Okiyama (2014).
2.3. Genomic DNA extraction and library preparation
Genomic DNA (gDNA) was extracted from the entire body of the selected larval specimen. The ethanol was first removed, and the specimen was washed twice with phosphate-buffered saline. gDNA was extracted with the TIANamp Marine Animals DNA Kit (Tiangen, China) following the manufacturer’s protocol. The extracted gDNA was then used to prepare a library for MT-CO1 DNA metabarcoding by a two-step PCR, following Amplicon et al. (2013). The first-PCR primers (mlCOIintF and jgHCO2198; Geller et al. 2013; Leray et al. 2013), designed for marine invertebrates and applicable to all-taxa biotic surveys, carried the adapter sequences required for MiSeq (Illumina, USA) library preparation. The second-PCR primers carried index sequences that tag the first-PCR products so that they can be distinguished after pooling (Amplicon et al. 2013; Choi et al. 2022).
The conditions for the first PCR were as follows: an initial denaturation at 95°C for 3 minutes, followed by 40 cycles of denaturation at 95°C for 30 seconds, annealing at 46°C for 1 minute, and extension at 72°C for 1 minute. A final extension was performed at 72°C for 10 minutes, after which the reaction was held at 4°C. The conditions for the second PCR were as follows: an initial denaturation at 95°C for 3 minutes, followed by 8 cycles of denaturation at 95°C for 30 seconds, annealing at 55°C for 1 minute, and extension at 72°C for 1 minute. A final extension was performed at 72°C for 10 minutes.
2.4. Amplicon sequencing and ASV analysis
The second-PCR products were purified, pooled at equal concentrations, and sequenced on the MiSeq platform (Illumina, USA). MiSeq sequencing generated a total of 87,868 MT-CO1 reads, each 301 bp long (accession number: SRX26025590). The reads were processed in QIIME 2 (ver. 2023.5; Bolyen et al. 2019) into representative sequences, referred to as amplicon sequence variants (ASVs), which were taxonomically assigned by comparison against a custom Animal Kingdom MT-CO1 reference library. The ASVs were then searched with BLAST (Johnson et al. 2008) to retrieve the sequences of closely related species and confirm the identifications.
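The idea of collapsing reads into representative variants, and of expressing each variant as a share of the retained reads (as in the per-ASV percentages reported in the Results), can be illustrated with a simplified sketch. It is not the QIIME 2 workflow used here: it groups only reads that are exactly identical and discards variants below a hypothetical `min_count` cutoff, whereas QIIME 2 denoising plugins such as DADA2 also model sequencing errors. The function names are illustrative only.

```python
from collections import Counter


def dereplicate(reads, min_count=2):
    """Collapse exactly identical reads into sequence variants.

    Returns (sequence, count) pairs sorted by decreasing count, keeping only
    variants observed at least `min_count` times (a hypothetical cutoff).
    """
    counts = Counter(read.upper() for read in reads)
    variants = [(seq, n) for seq, n in counts.items() if n >= min_count]
    # Most abundant first; ties broken alphabetically for a stable order.
    variants.sort(key=lambda item: (-item[1], item[0]))
    return variants


def relative_abundance(variants):
    """Percentage of retained reads contributed by each variant."""
    total = sum(n for _, n in variants)
    if total == 0:
        return []
    return [(seq, 100.0 * n / total) for seq, n in variants]
```

For example, `dereplicate(["ACGT", "ACGT", "acgt", "ACGA", "ACGA", "TTTT"])` keeps `ACGT` (3 reads) and `ACGA` (2 reads) and drops the singleton, and `relative_abundance` then reports them as 60% and 40% of the retained reads.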
2.5. Construction of the MT-CO1 reference library
The MT-CO1 reference library was constructed from 602,592 MT-CO1 sequences in the MetaZooGene Mode-C database (Bucklin et al. 2021). After removal of non-Animal Kingdom taxa from the Mode-C MT-CO1 dataset, including 692 Protozoa, 19,990 Chromista, 19,434 Plantae, and 954 Bacteria sequences, 561,518 sequences remained. These sequences were mapped to the MT-CO1 sequence of Abyssogena phaseoliformis (AP014557) using the Mapper tool in Geneious 11.1.5 with the “Highest Sensitivity/Slow” setting. During mapping, some MT-CO1 sequences deposited in the 3ʹ-5ʹ orientation were reverse-complemented into the 5ʹ-3ʹ orientation.
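Geneious handles orientation internally during mapping; the sketch below only illustrates the underlying operation. It reverse-complements sequences (including IUPAC ambiguity codes) and, as a heuristic chosen purely for illustration, keeps whichever orientation shares more k-mers with the reference. The k-mer length of 8 and the function names are assumptions, not the Geneious algorithm.

```python
# IUPAC-aware complement table (upper and lower case).
COMPLEMENT = str.maketrans(
    "ACGTRYMKSWBDHVNacgtrymkswbdhvn",
    "TGCAYRKMSWVHDBNtgcayrkmswvhdbn",
)


def reverse_complement(seq):
    """Reverse-complement a DNA sequence, including ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(seq, reference, k=8):
    """Return seq in whichever orientation shares more k-mers with reference."""
    ref_kmers = kmers(reference.upper(), k)
    forward_hits = len(kmers(seq.upper(), k) & ref_kmers)
    rc_seq = reverse_complement(seq)
    reverse_hits = len(kmers(rc_seq.upper(), k) & ref_kmers)
    return rc_seq if reverse_hits > forward_hits else seq
```

A fragment supplied in reverse orientation is returned reverse-complemented, while a fragment already in the reference orientation is returned unchanged.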
A total of 536,470 contig sequences, representing 95.5% of the Animal Kingdom sequences, were mapped to the MT-CO1 sequence of Abyssogena phaseoliformis. Flanking regions outside the mlCOIintF and jgHCO2198 primer sites were then trimmed from the mapped sequences in Geneious 11.1.5. The trimmed sequences ranged from 10 to 678 bp in length (mean 317±54 bp), and 225,759 unique sequences of at least 250 bp were extracted with Geneious 11.1.5.
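Because the sequences were first mapped to a single reference, trimming to the primer-bounded region can be sketched as slicing the alignment at reference coordinates, followed by the 250-bp length filter. This sketch assumes the mapped sequences are available as gapped rows of equal length aligned to the reference row, and that the primer coordinates on AP014557 (not given in the text) are supplied as a half-open interval `[start, end)`. All names are hypothetical, and this is not the Geneious implementation.

```python
def reference_columns(aligned_ref):
    """Alignment column index of each ungapped reference position."""
    return [col for col, base in enumerate(aligned_ref) if base != "-"]


def trim_alignment(aligned_ref, aligned_rows, start, end, min_len=250):
    """Trim aligned rows to reference positions [start, end) and filter by length.

    aligned_ref and every row in aligned_rows must be gapped strings of equal
    length. Insertions relative to the reference inside the window are kept;
    gaps are removed from the trimmed output.
    """
    cols = reference_columns(aligned_ref)
    if not 0 <= start < end <= len(cols):
        raise ValueError("window lies outside the reference")
    first, last = cols[start], cols[end - 1]
    kept = {}
    for seq_id, row in aligned_rows.items():
        if len(row) != len(aligned_ref):
            raise ValueError(f"{seq_id} is not aligned to the reference")
        trimmed = row[first:last + 1].replace("-", "")
        if len(trimmed) >= min_len:
            kept[seq_id] = trimmed
    return kept
```

Rows that only partly span the window come out shorter and are removed by the same length cutoff, which mirrors how short trimmed sequences were excluded above.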
Unique sequences containing any of 11 degenerate base codes (R, Y, M, K, S, W, V, H, B, D, N) were then removed, leaving 221,194 unique sequences of at least 250 bp without degenerate bases. These sequences ranged from 250 to 678 bp in length (mean 330±26 bp). The corresponding taxonomic hierarchies were extracted from the mothur-formatted Mode-C files and imported into the QIIME 2 pipeline together with the MT-CO1 reference sequences.
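The ambiguity and length filters, and the matching of taxonomy records to the retained sequences, can be sketched as follows. The sketch assumes the Mode-C taxonomy is a mothur-style file with one `ID<TAB>Rank1;Rank2;...;` line per sequence; the function names are hypothetical. The output is a two-column, headerless TSV, the layout QIIME 2 accepts when importing `FeatureData[Taxonomy]` with `--input-format HeaderlessTSVTaxonomyFormat`.

```python
AMBIGUITY_CODES = set("RYMKSWVHBDN")  # the 11 degenerate bases listed in the text


def has_ambiguity(seq):
    """True if seq contains any IUPAC degenerate base."""
    return any(base in AMBIGUITY_CODES for base in seq.upper())


def parse_mothur_taxonomy(lines):
    """Parse 'ID<TAB>Rank1;Rank2;...;' lines into {ID: 'Rank1; Rank2; ...'}."""
    taxonomy = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        seq_id, lineage = line.split("\t", 1)
        ranks = [rank.strip() for rank in lineage.split(";") if rank.strip()]
        taxonomy[seq_id] = "; ".join(ranks)
    return taxonomy


def filter_reference(seqs, taxonomy, min_len=250):
    """Drop short or ambiguous sequences; keep taxonomy only for survivors."""
    kept = {
        seq_id: seq
        for seq_id, seq in seqs.items()
        if len(seq) >= min_len and not has_ambiguity(seq)
    }
    kept_taxonomy = {seq_id: taxonomy[seq_id] for seq_id in kept if seq_id in taxonomy}
    return kept, kept_taxonomy


def to_headerless_tsv(taxonomy):
    """Two-column, headerless TSV text (ID, lineage), sorted by ID."""
    return "".join(f"{seq_id}\t{lineage}\n" for seq_id, lineage in sorted(taxonomy.items()))
```

Filtering the taxonomy by the retained IDs keeps the sequence and taxonomy inputs consistent, so every reference sequence has exactly one lineage at import.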
2.6. Validation using fish-specific primers and MinION MK1C sequencing
To validate the DNA metabarcoding results obtained on the MiSeq platform, the gDNA of the larval specimen (AYL19_113) was amplified by PCR with the fish-specific primers FishF2_t1 and FishR2_t1 (Ward et al. 2005). The PCR conditions were as follows: an initial denaturation at 94°C for 3 minutes, followed by 35 cycles of denaturation at 94°C for 30 seconds, annealing at 52°C for 40 seconds, and extension at 72°C for 1 minute, with a final extension at 72°C for 7 minutes and a hold at 4°C. The PCR products were prepared for sequencing with the Ligation Sequencing Kit and native barcoding (SQK-LSK109 with EXP-NBD104 and EXP-NBD114; Oxford Nanopore Technologies, UK) and sequenced on a MinION MK1C device (Oxford Nanopore Technologies, UK). Sequencing produced 394,218 reads with an average length of 808±170 bp (accession number: SRX26025591). The reads were mapped to the four ASV sequences produced by the QIIME 2 pipeline (ver. 2023.5; Bolyen et al. 2019) using the Mapper tool in Geneious (ver. 11.1.5) to generate consensus sequences.
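Because individual nanopore reads carry many errors, a consensus across the reads mapped to each ASV is more reliable than any single read. The sketch below shows a simple column-wise majority vote, which is not necessarily the consensus rule Geneious applies. It assumes the reads have already been aligned to a common length (for example, exported from a mapping), with `-` for gaps and `.` for positions a read does not cover; the depth and frequency thresholds are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads, min_depth=3, min_freq=0.5):
    """Column-wise majority-rule consensus of reads aligned to one reference.

    Reads are equal-length strings in which '-' is a gap and '.' marks a
    position the read does not cover. A column becomes 'N' if fewer than
    min_depth reads cover it or if its most common call does not exceed
    min_freq of the covering reads; gap-majority columns are omitted.
    """
    if not aligned_reads:
        raise ValueError("no reads supplied")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must share a common alignment length")
    consensus = []
    for col in range(length):
        calls = Counter(read[col].upper() for read in aligned_reads if read[col] != ".")
        depth = sum(calls.values())
        if depth < min_depth:
            consensus.append("N")
            continue
        call, count = calls.most_common(1)[0]
        if count / depth <= min_freq:
            consensus.append("N")  # no clear majority, including exact ties
        elif call != "-":
            consensus.append(call)
    return "".join(consensus)
```

For the reads `["ACGT-A", "ACGTTA", "ACCT-A", "..GT-A"]`, the single-read substitution (C) and insertion (T) are outvoted and the consensus is `ACGTA`.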
2.7. Phylogenetic analysis and species identification
To identify the species, the consensus sequences obtained from the MinION MK1C reads were compared with genetically similar sequences retrieved from NCBI GenBank (Clark et al. 2016). Sequences were aligned with CLUSTAL W (Thompson et al. 1994). Genetic distances within and between species were calculated, and a maximum-likelihood tree was constructed under the GTR+G+I model in MEGA 11 (Nei and Kumar 2000; Tamura et al. 2021). The species identity of the larval specimen was determined from these analyses.
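The text does not state which distance model was used in MEGA 11, so the sketch below uses the simplest option, the uncorrected p-distance (the proportion of differing sites), purely for illustration. It assumes the sequences are already aligned (for example, by CLUSTAL W) and skips sites with a gap or `N` in either sequence (pairwise deletion); the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has a gap ('-') or an 'N' are skipped
    (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(aligned):
    """p-distances for every pair in a {name: aligned_sequence} dict."""
    return {
        (a, b): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
```

Genetic similarity, as reported in the Results, is then 100 × (1 − p); model-corrected distances such as Kimura two-parameter would give slightly larger values for divergent pairs.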
3. RESULTS AND DISCUSSION
The larval fish specimen analyzed in this study measured 10.0 mm in standard length (SL) and exhibited distinctive features, including a large, somewhat flattened head with rounded eyes and a slightly laterally compressed body and tail. The fin formula was D I-VIII - 11; A 11; P 20; V I, 5. The first and second dorsal fins were separate, and the posterior margin of the preopercle bore five sharp spines, the uppermost being the most prominent. Supraorbital, supraoccipital, and parietal spines were also present (Fig. 1).
Melanophores were distributed from the snout to the areas between the dorsal and anal fins, as well as on the pectoral fins and the first dorsal fin. They were absent in front of the first dorsal fin, below the pectoral fins, and on the second dorsal fin, anal fin, abdomen, and caudal peduncle. However, faint melanophores were present at the base and on the lower part of the caudal fin, and a single melanophore was located on the isthmus in front of the pelvic-fin bases (Fig. 1).
The specimen closely resembled Platycephalidae sp. 5 in meristic and morphological characteristics, particularly in eye shape and melanophore distribution. Compared with morphologically similar larvae, such as Sebastes hubbsi, S. oblongus, S. schlegelii, and Platycephalus sp. 2, the specimen most closely matched Platycephalidae sp. 5 (Okiyama 2014), suggesting that it belongs to the same species.
In this study, amplicon sequences were obtained from the gDNA of the larval specimen on two platforms, MiSeq and MinION MK1C. The MiSeq analysis produced 43,934 read pairs (87,868 single reads, each 301 bp long) amplified with PCR primers (Leray et al. 2013) designed for metabarcoding metazoan diversity and fish gut contents. These reads were processed in QIIME 2 into four ASVs (Table 1), although only 1.2% of the read pairs were valid and used to generate them. The ASVs were assigned to three fish species (ASV_1: Cynoglossus abbreviatus; ASV_3: Scomberomorus niphonius; ASV_4: Cociella crocodilus) and one copepod species (ASV_2: Paracalanus parvus).
ASV_1, identified as Cynoglossus abbreviatus, accounted for 85.2% of the valid reads. However, the described larva of this species (11.2 mm SL; Okiyama 2014) does not match the morphological characteristics of the study specimen (Fig. 1). Likewise, ASV_3 (Scomberomorus niphonius, 11.4 mm SL; Okiyama 2014) and ASV_2 (Paracalanus parvus) are clearly morphologically distinct from the study specimen (Fig. 1).
In contrast, although limited morphological information is available for Cociella crocodilus (ASV_4), it shows a high degree of morphological similarity to Platycephalidae sp. 5 (Okiyama 2014). The mismatch between the dominant MiSeq ASV and the morphological identification may be attributable to PCR errors or low gDNA quality. The fact that only 1.2% of the read pairs were valid and used to generate the ASVs (Table 1) further suggests that these factors may have reduced the accuracy of species identification.
Moreover, larval fish specimens are highly susceptible to cross-contamination while individual specimens are sorted from zooplankton samples, stored, and examined under the microscope. DNA metabarcoding of such specimens is therefore prone to detecting sequences derived from other organisms or samples. In such cases, morphological information is crucial for ensuring accurate species identification.
Further analysis on the MinION MK1C platform yielded consensus sequences for two fish species, Cociella crocodilus and Scomberomorus niphonius. A total of 242,964 reads (61.6% of the raw reads) were used to generate these consensus sequences, of which 99.98% mapped to Cociella crocodilus (MZ895550), providing strong support for this identification (Table 1).
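As a quick check on the proportions reported in this section, the MinION fraction follows directly from the stated read counts, and the MiSeq fraction corresponds to an approximate number of valid read pairs (approximate because 1.2% is a rounded value):

```latex
\begin{aligned}
\text{MinION MK1C:}\quad & \frac{242{,}964}{394{,}218} \approx 0.616 = 61.6\% \text{ of raw reads used for consensus} \\
\text{MiSeq:}\quad & 0.012 \times 43{,}934 \approx 527 \text{ valid read pairs}
\end{aligned}
```

The retained fraction is therefore roughly fifty times higher for the MinION data (61.6% vs. 1.2%).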
Although this proportion is substantially higher than in the MiSeq ASV results, the presence of S. niphonius reads also indicates possible cross-contamination with other species in the material used for gDNA extraction. Similar results have been reported for species identification of individual small marine animals and for the analysis of their prey (Choi and Kim 2023).
In DNA metabarcoding, ASV sequences generated by QIIME 2 or PhyloOTU sequences from mothur are used to identify species based on their genetic distance from a reference library (Schloss et al. 2009;Bolyen et al. 2019). For this study, we constructed an MT-CO1 reference sequence library using MZGdb, a database specialized for marine animals (Bucklin et al. 2021). This library was instrumental in identifying candidate species based on ASV sequences from the MiSeq data.
To further validate the identification, we constructed a maximum-likelihood (ML) tree from the MT-CO1 barcode sequences of species closely related to Cociella crocodilus and examined their phylogenetic relationships. Cociella crocodilus sequences formed distinct clades, with genetic similarity within these clades exceeding 97%. Specifically, the CC clade, which included sequence PQ013178, consisted exclusively of Cociella crocodilus, with a genetic similarity of 99.58±0.27%. The presence in this clade of a sequence from a morphologically verified specimen (MT985364; Zhao et al. 2021) further supports the identification of the study specimen as Cociella crocodilus (Fig. 2).
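A within-clade figure such as 99.58±0.27% is a mean ± standard deviation of pairwise similarities among clade members. The sketch below computes that summary, together with a check against the 97% criterion, from a table of pairwise p-distances (for example, exported from MEGA 11). The use of the sample standard deviation and the function name are assumptions, since the text does not state how the summary was calculated.

```python
from itertools import combinations
from statistics import mean, stdev


def clade_similarity(distances, members, threshold=97.0):
    """Summarise pairwise similarity (%) among the members of one clade.

    `distances` maps (name_a, name_b) to a p-distance; either key order works.
    """
    sims = []
    for a, b in combinations(members, 2):
        d = distances.get((a, b), distances.get((b, a)))
        if d is None:
            raise KeyError(f"missing distance for {a} / {b}")
        sims.append(100.0 * (1.0 - d))
    if not sims:
        raise ValueError("a clade needs at least two members")
    return {
        "mean": mean(sims),
        "sd": stdev(sims) if len(sims) > 1 else 0.0,  # sample SD
        "min": min(sims),
        "above_threshold": min(sims) > threshold,
    }
```

Using the minimum pairwise similarity for the threshold check ensures that every pair in the clade, not just the average, exceeds 97%.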
However, we also identified potential misidentifications among other sequences: KU94339 and MT021463 grouped with Inegocia guttata and Kumococius rodericensis, respectively. These findings highlight the importance of verifying reference sequences in DNA barcode analysis. In addition, sequences JX488156 and K777581, identified as Thysanophrys celebica, formed the TC1 clade, whereas JQ349911 and KP267636, identified as Cociella crocodilus, grouped in the TC2 clade with Thysanophrys celebica (JX488201). This suggests that JQ349911 and KP267636 may also have been misidentified.
Based on the phylogenetic relationships in the ML tree and the morphological traits consistent with Platycephalidae sp. 5, we conclude that the larval specimen in this study (Fig. 1) is highly likely to be Cociella crocodilus. This conclusion is further supported by Zhao et al. (2021), whose comprehensive analyses of reference sequences and detailed morphological examinations provide additional evidence for the identity of this species. Furthermore, the possibility of cross-contamination during sampling highlights the value of amplicon sequencing as an analytical tool for species identification of single marine animal specimens.
In this study, a Cociella crocodilus larva was identified by comparing the species detected in the MiSeq analysis with the morphological traits of the specimen. To verify the identification, the specimen was re-analyzed on the MinION MK1C. For this analysis, fish-specific PCR primers (FishF2_t1 and FishR2_t1; Ward et al. 2005) were used, and the PCR annealing temperature was raised from 46°C (first PCR of the MiSeq library) to 52°C. Because the experimental conditions differed, species identification accuracy cannot be compared directly between the two NGS platforms.
NGS was chosen for the analysis of a single larval specimen because of the high risk of cross-contamination during larval morphological examination. This study underscores the importance of incorporating species-specific larval morphological information to improve the accuracy of species identification by NGS-based DNA metabarcoding. In general, MiSeq produces short reads with about 99.9% accuracy, whereas the portable MinION MK1C produces long reads with approximately 95% accuracy.
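To see why the long MinION reads were used through consensus sequences rather than individually, the quoted accuracies can be converted to Phred-scaled quality scores and to expected errors per read. This assumes, as an idealization, independent errors at a uniform per-base rate $e$, and uses the 301-bp MiSeq read length and the 808-bp mean MinION read length reported above:

```latex
\begin{aligned}
Q &= -10\log_{10} e \\
\text{MiSeq: } e = 0.001 &\;\Rightarrow\; Q = 30, \qquad 301 \times 0.001 \approx 0.3 \text{ errors per read} \\
\text{MinION MK1C: } e = 0.05 &\;\Rightarrow\; Q \approx 13, \qquad 808 \times 0.05 \approx 40 \text{ errors per read}
\end{aligned}
```

Random errors at this rate tend to cancel when many reads are combined into a consensus, although systematic errors would not.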
This study provides insights into the early life stages of Cociella crocodilus and represents the first identification of this species at the larval stage. Accurate larval identification of this kind provides essential information for locating nursery grounds and, indirectly, potential spawning areas. Supporting evidence includes a weak C. crocodilus signal detected in amplicon sequences (accession number: SAMN44510075) from mixed fish eggs collected on June 1, 2021, at 35.40°N, 126.25°E, in waters near the site where the C. crocodilus larva was collected (35.91°N, 126.34°E; Fig. 1). Furthermore, knowledge of the morphological and genetic characteristics of fish larvae contributes to biodiversity monitoring and conservation in the coastal ecosystems of the western Korean Peninsula.