Assembly and annotation of chloroplast genomes

The assembly yielded a complete cp genome sequence of C. hirtinoda with a length of 139,561 bp (Fig. 1), consisting of an 83,166 bp large single-copy (LSC) region, a 20,811 bp small single-copy (SSC) region, and two inverted repeat (IR) regions of 21,792 bp, forming the quadripartite structure typical of land plants. The cp genome of C. hirtinoda was annotated with 130 genes, including 85 protein-coding genes, 37 tRNA genes, and 8 rRNA genes (Table 1). Most of the 15 genes listed for the C. hirtinoda cp genome contain introns. Of these, 13 genes contain one intron (atpF, ndhA, ndhB, petB, petD, rpl2, rpl16, rps16, trnA-UGC, trnI-GAU, trnK-UUU, trnL-UAA, trnV-UAC), only ycf3 contains two introns, and clpP has lost its intron (Supplementary Table S1). The rps12 gene was present in two copies, and its three exons were joined by trans-splicing18.

Figure 1

Chloroplast genome map of C. hirtinoda. Different colors represent different functional gene groups. Genes outside the circle are transcribed counterclockwise and genes inside the circle are transcribed clockwise. The thick black lines on the outer circle represent the two IR regions. The dark gray area inside the ring shows the GC content.

Table 1 Summary of the chloroplast genome of C. hirtinoda.

The accD, ycf1, and ycf2 genes were missing from the cp genome of C. hirtinoda, and the introns of the clpP and rpoC1 genes were lost. This is consistent with previous studies of the evolution of genome structure in the family Poaceae19. Gene loss has also been reported in other plants20,21,22,23.

The total GC content of the C. hirtinoda cp genome was 38.90%, and the contents of the four bases A, T, G, and C were 30.63%, 30.46%, 19.57%, and 19.33%, respectively (Table 2). The GC content of the LSC region (36.98%) and the SSC region (33.21%) was much lower than that of the IR region (44.23%), indicating a non-uniform distribution of base content across the cp genome. This is probably due to the four rRNA genes located in the IR region, which raise its GC content. These values were similar to those previously reported for the cp genomes of some Poaceae plants24,25.
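The GC figures above follow directly from base counts. The sketch below is illustrative only: `gc_content` is a hypothetical helper, not part of the authors' pipeline, and the check at the end simply confirms that the reported G and C percentages sum to the reported total of 38.90%.

```python
def gc_content(seq):
    """Return GC content (%) of a nucleotide sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (counts["G"] + counts["C"]) / total


# Base percentages as reported in the text (Table 2).
reported = {"A": 30.63, "T": 30.46, "G": 19.57, "C": 19.33}

# Total GC is the sum of the G and C percentages.
gc_from_table = round(reported["G"] + reported["C"], 2)
print(gc_from_table)
```

Applying `gc_content` separately to the LSC, SSC and IR subsequences of a genome would give the regional values compared in the text.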

Table 2 Base composition of the C. hirtinoda chloroplast genome.

Repeat sequences and codon analysis

Simple sequence repeats (SSRs) consist of base repeats 10 bp long and are widely used to study phylogenetic evolution and genetic diversity26,27,28,29.

A total of 48 SSRs were detected in C. hirtinoda (Fig. 2A). In terms of distribution, the majority of SSRs (38; 79%) were found in the LSC region, while 6 SSRs (13%) were found in the IR region and 4 SSRs (8%) in the SSC region (Fig. 2B). Previous research suggests that the distribution of SSR numbers among regions and the differences in GC content between locations are related to expansion or contraction of the IR boundary30.
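The text does not name the SSR detection tool or its thresholds, so the following is only a minimal regex-based sketch of how perfect SSRs can be located. The function `find_ssrs` and the minimum repeat numbers passed to it are assumptions made for illustration.

```python
import re


def find_ssrs(seq, min_repeats):
    """Find perfect SSRs in seq.

    min_repeats maps motif length -> minimum number of tandem copies
    (hypothetical thresholds, not taken from the paper).
    Returns a list of (start, motif, copies) tuples.
    """
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-base motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(1)) // k))
    return hits


example = "GC" + "A" * 10 + "GCC" + "AT" * 5 + "G"
ssrs = find_ssrs(example, {1: 10, 2: 5})
```

Counting the hits whose start coordinate falls in each of the LSC, SSC and IR intervals would give a regional breakdown like the one reported above.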

Figure 2

Analysis of simple sequence repeats (SSRs) in the C. hirtinoda cp genome. (A) The percentage distribution of 45 SSRs in the LSC, SSC and IR regions. (B).

The REPuter program identified 61 repeats in the cp genome of C. hirtinoda, consisting of 15 palindromic and 19 direct repeats, with no reverse or complementary repeats (Fig. 3). Repeat analysis of the three Chimonobambusa species showed 61–65 repeats per genome, with a single reverse repeat in C. hejiangensis. Most repeats were between 30 and 100 bp long, and the repeat sequences were located in the IR or LSC region31 (Supplementary Table S2).

Figure 3

Information on the chloroplast genome repeats of the Chimonobambusa species.

We identified 20,180 codons in the coding regions of C. hirtinoda (Fig. 4, Supplementary Table S3). The Ile codon AUU was the most frequently used (817), and the termination codon UAG was the least used (19). Leu was the most frequently encoded amino acid (2,170), and termination codons (TER) were the least frequent (85). A Relative Synonymous Codon Usage (RSCU) value greater than 1.0 means that a codon is used more often than expected under equal usage of synonymous codons32. RSCU values exceeded 1 for 31 codons in the C. hirtinoda cp genome, and 29 of these (93.55%) had A/U at the third position. The codons AUG and UGG showed no usage bias (RSCU = 1).
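RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below assumes the standard genetic code; the function names `count_codons` and `rscu` and the toy sequence are illustrative and not taken from the authors' analysis.

```python
from collections import Counter
from itertools import product

# Standard genetic code in UCAG order ("*" marks termination codons).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codons(cds):
    """Count in-frame codons of a coding sequence (DNA or RNA alphabet)."""
    rna = cds.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    codons = (rna[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def rscu(counts):
    """RSCU = observed count / mean count over the synonymous codon family."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        expected = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values


# Toy sequence: Met, Ile (AUU x3, AUC x1), Trp.
toy = rscu(count_codons("ATGATTATTATTATCTGG"))
```

Because Met and Trp are each encoded by a single codon, AUG and UGG always have RSCU = 1, which is why they show no bias in the text.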

Figure 4

Amino acid frequencies in the protein-coding sequences of the C. hirtinoda cp genome. The bars show the number of codons for each amino acid, and the line shows the proportion of codons for each amino acid.

Comparative analysis of genome structure

Nucleotide variability (Pi) values among the cp genomes of the three Chimonobambusa species ranged from 0 to 0.021, with a mean value of 0.000544, as calculated with DnaSP 5.10. Five peaks were observed in the two single-copy regions, and the highest peak was in the trnT-trnE-trnY region of the LSC (Fig. 5). Pi values for the LSC and SSC regions were significantly higher than those for the IR region. No highly divergent sequences were observed in the IR region, which is highly conserved. Such highly variable regions have also been reported in other plants, where they were used for species identification, phylogenetic analysis, and population genetics research33,34,35.
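Pi in each window is the average number of pairwise differences per site among the aligned sequences. The sketch below illustrates that calculation on a pre-aligned set of sequences. It is not DnaSP itself, the `window` and `step` values are hypothetical parameters (the text does not state them), and columns containing gaps or ambiguous bases are simply skipped.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site; columns with non-ACGT symbols are skipped."""
    seqs = [s.upper() for s in seqs]
    columns = [col for col in zip(*seqs) if all(b in "ACGT" for b in col)]
    if len(seqs) < 2 or not columns:
        return 0.0
    pairs = list(combinations(range(len(seqs)), 2))
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


def sliding_pi(seqs, window, step):
    """Return (window midpoint, Pi) for each window along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


# Toy alignment of three sequences, 8 columns long.
alignment = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT"]
profile = sliding_pi(alignment, window=4, step=4)
```

Plotting the midpoints against the Pi values gives a profile of the kind shown in Fig. 5, where peaks mark the most variable regions.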

Figure 5

Sliding window analysis of the complete chloroplast genome sequences of the Chimonobambusa species. X-axis: position of the midpoint of a window; Y-axis: nucleotide diversity of each window.

Comparison of the complete cp genome structures of the three Chimonobambusa species showed that sequences in most regions are conserved (Fig. 6). The LSC and SSC regions show markedly more variation than the IR region, and non-coding regions are more variable than coding regions. In non-coding regions, the loci at 7–9 kb, 28–30 kb, 36 kb, and elsewhere differed significantly. Within the protein-coding regions, genes such as rpoC2, rps19, and ndhJ differ. However, the tRNA and rRNA regions are 100% identical. A similar phenomenon has also been reported by others36.

Figure 6

Visualization of the alignment of the three chloroplast genome sequences using Chimonobambusa hejiangensis as a reference. The vertical scale indicates percent identity, ranging from 50 to 100%. The horizontal axis shows the coordinates in the cp genome. Colors represent protein-coding, intron, mRNA, and conserved non-coding sequences, respectively.

IR contraction and expansion in the chloroplast genome

Because of the circular structure of the cp genome, there are four junctions between the LSC/IRb/SSC/IRa regions. During the evolution of species, expansion and contraction of the IR regions have to some extent maintained the sequence stability of the two IR regions, and this adjustment is the main reason for variation in the length of the chloroplast genome37,38.

The IR/SC boundaries of the three Chimonobambusa chloroplast genomes were very similar in organization, gene content, and gene order. The size of the IR ranges from 21,797 bp (C. tumidissinoda) to 21,835 bp (C. hejiangensis). The ndhH gene spans the SSC/IRa boundary, extending 181 to 224 bp into the IRa region in all three Chimonobambusa species. The rps19 gene extended from the IRb toward the LSC region, with a gap of 31–35 bp. The rpl12 gene was located in the LSC region of all genomes, 35–36 bp away from the LSC/IRb boundary (Fig. 7).

Figure 7

Comparison of the LSC, SSC and IR boundaries of the chloroplast genomes of the three Chimonobambusa species. The LSC, SSC and IR regions are represented by different colors. JLB, JSB, JSA and JLA denote the junctions between the corresponding regions of the genome. Genes are represented by boxes.

The three chloroplast genomes of the genus Chimonobambusa were compared using Mauve alignment. The results showed that all sequences are perfectly syntenic, with no inversions or rearrangements (Fig. 8).

Figure 8

Rearrangement analysis of the chloroplast genomes of the three Chimonobambusa species using MAUVE software. Locally collinear blocks (LCBs) are represented by blocks of the same color connected by lines. The vertical line indicates the degree of conservation among positions. The small red bar represents rRNA.

Phylogenetic analysis

We performed phylogenetic analyses using the complete chloroplast genomes and the matK gene to determine the phylogenetic position of C. hirtinoda. Maximum likelihood (ML) analysis based on complete chloroplast genomes recovered seven nodes with full support (100% bootstrap value). However, the relationships among the three Chimonobambusa species received only moderate support because few samples were used, with C. hirtinoda more closely related to C. tumidissinoda (bootstrap value 62%) than to C. hejiangensis. The phylogenetic tree based on the matK gene showed that the Chimonobambusa species grouped in one branch, in agreement with the tree constructed from the complete cp genomes (Fig. 9). The results show that the complete chloroplast genome resolved related species better than the matK gene, consistent with a previous study39.

Figure 9

Maximum likelihood phylogenetic tree based on complete chloroplast genomes (A) and the matK gene (B).
