The Qarhan Salt Lake is the second-largest salt lake in the world and harbors a rich and unique range of extremophiles that require in-depth exploration. Halophilic microorganisms are promising resources for biotechnology owing to their flexibility and survivability. In the present study, a novel strain, Halobacillus trueperi S61, was first isolated from the Qarhan Salt Lake; whole-genome sequencing and comparative genomics were then performed using third-generation PacBio combined with second-generation Illumina technology. Sequencing of H. trueperi S61 yielded 57,549 reads, and the genome consists of a complete circular chromosome of 4,047,887 bp with 43.86% guanine-cytosine (GC) content and no gaps. A total of 139 non-coding ribonucleic acids (RNAs) (86 tRNA, 30 rRNA, and 23 sRNA), 16 gene islands totaling 260,275 bp, and two prophages (82,682 bp in length) were predicted. A total of 3,982 protein-coding genes were annotated, of which 3,980, 3,667, 2,998, and 2,303 genes were annotated using the Nr, SwissProt, KOG, and KEGG databases, respectively. In addition, 561 carbohydrate-active enzymes and 4,416 pathogen–host interaction-related genes were identified. The protein functions of H. trueperi S61 were focused on biological processes, mainly gene transcription and amino acid and carbohydrate metabolism. The novel strain H. trueperi S61 isolated from the Qarhan Salt Lake was primarily associated with protein biological processes and antibiotic resistance, providing a potential resource for biotechnology.

  • The whole genome of Halobacillus trueperi S61 isolated from the Qarhan Salt Lake was identified.

  • The Halobacillus trueperi S61 genome was predicted to contain 3,982 genes with a total length of 3,567,510 bp and 44.57% GC content.

  • Basic annotation was summarized for 3,982 protein-coding genes of Halobacillus trueperi S61.

  • Halobacillus trueperi S61 was primarily associated with protein biological processes and antibiotic resistance.

Graphical Abstract


The marine environment is a rich pool of resources and contains numerous halotolerant or psychrophilic microorganisms that have inevitably evolved physiological and genomic adaptations to extreme conditions. Among them, halophilic microbes are valued for their flexibility and survivability, which offer good prospects in biotechnology (Poli et al. 2017; Hong et al. 2019; Zhang et al. 2022). For instance, Halobacillus members are an important source of halotolerant extracellular enzymes for industrial production, and Halobacillus trueperi RSK CAS9 was optimized for lipase production in the marine fish industry (Sathishkumar et al. 2015; Treves et al. 2018; Park et al. 2020).

Notably, H. trueperi is moderately halophilic (salt concentration of 0.5–2.5 mol·L−1), aerobic, and heterotrophic, and it was first isolated from the Great Salt Lake (Utah) (Spring et al. 1996). Since then, H. trueperi has attracted increasing attention and has been studied further. Lu et al. (2004) isolated H. trueperi from saltwater in the western Himalayas and reported that H. trueperi DSM10404 was able to accumulate glycine, glutamate, and betaine as salt-tolerant compatible solutes. Gupta et al. (2019) isolated H. trueperi SS1 from Lunsu saltwater, and Kharangate-Lad & Bhosle (2016) isolated H. trueperi MXM-16 from mangrove plant litter, which is capable of producing hydroxamate siderophore and carotenoid pigments to chelate iron. Rivadeneyra et al. (2004) isolated H. trueperi ATCC 700077 from solid and liquid media of different salinities in the southern Sahara region of Tunisia and reported it as a major ecosystem-adaptive microorganism. Although several halophilic bacteria have been widely reported, their unique features depend on the natural environments in which they occur. Importantly, the Qarhan Salt Lake is the second-largest salt lake in the world and the largest in China (Shen et al. 2022), and it contains rich and unique halophilic microbial resources that require in-depth exploration and have broad prospects (Li et al. 2020).

With the development of biotechnology, researchers have used emerging techniques to identify organisms (Shaikh et al. 2020). Whole-genome sequencing is now used as a novel, culture-independent technique to explore the genetic diversity and evolutionary history of microorganisms, and various genome projects have been performed (Thirugnanasambandam et al. 2017; Edward et al. 2018; Xu et al. 2020; Zhang et al. 2020; Chen et al. 2021; Wang et al. 2021a, 2021b). Although second-generation Illumina sequencing offers large throughput, high accuracy, and low cost, its read length is relatively short (150–400 bp). Third-generation single-molecule real-time PacBio sequencing offers ultra-long read lengths (average of 10–12 kb), high throughput (5–10 Gb of data), no GC bias, and direct detection of various types of DNA methylation. These advanced sequencing technologies will provide novel insights into the metabolic profiles of microorganisms (Buermans & den Dunnen 2014; Kang et al. 2015; Williamson et al. 2016).

Building on a novel strain that was isolated for the first time from the Qarhan Salt Lake and identified as H. trueperi S61 (Shen et al. 2022), this study performed whole-genome sequencing and comparative genomics by combining the advantages of third-generation PacBio and second-generation Illumina technology. A complete and accurate genome assembly, together with the predicted active secondary metabolites, helps in understanding the genome properties of H. trueperi S61 and contributes to genetics and potential biocontrol applications.

Sample collection and pretreatment as well as microbe isolation

Fresh water and soil were collected from the Qarhan Salt Lake in Qinghai, Tibet Plateau, China (36°18′–36°45′N, 99°02′E). Fifteen sampling points were sampled according to the five-point sampling method. All samples were kept in portable freezers and transported to the laboratory for pretreatment. The water samples were mixed and filtered through a 0.22 μm pore-size filter, and bacteria were enriched on the filter membrane under sterile conditions. After 30 ml of water had been filtered, the filter membrane was removed and placed in a glass test tube containing 3 ml of sterile seawater, giving the 10−1 water sample. For soil pretreatment, an appropriate amount of soil was air-dried and treated at 120 °C for 1 h. Then 10 g of treated soil was weighed, 90 ml of sterile seawater was added, and the mixture was placed in a sterilized triangular flask with glass beads and shaken thoroughly for 30 min, after which the supernatant was collected. This supernatant was the 10−1 soil sample. The 10−1 samples were serially diluted to 10−2 and 10−3, and 150 μl of each dilution was spread on ATCC213 medium plates (10 g MgSO4·7H2O, 0.2 g CaCl2·2H2O, 2.5 g peptone, 10 g yeast extract, 5 g KCl, 30 g NaCl, and 12 g agar powder in 1,000 ml of distilled water, with the pH adjusted to 7.2–7.4) in triplicate. The plates were inverted and incubated at 28 and 37 °C, respectively. Once colonies were observed, single colonies were picked and purified to obtain pure strains.

Molecular identification and genome analysis of H. trueperi S61

H. trueperi S61 was isolated and purified by dilution-plate enrichment culture. Morphologically, it was Gram-positive, spherical, and capsular, with peritrichous flagella and a cell size of 0.6–0.8 × 0.4–0.6 μm. For molecular identification, DNA was extracted following the instructions of the column bacterial DNA extraction kit from Sangon Biotech Co., Ltd (Shanghai, China). The bacterial universal primers were F27 and P1541 (5′-AGAGTTTGATCCTGGCTCAGG-3′ and 5′-AAGGAGGTGGTGATGCCGCA-3′). The 50 μl reaction was run for 30 cycles of denaturation at 94 °C for 45 s, annealing at 50 °C for 45 s, and extension at 72 °C for 75 s. The PCR products were detected by agarose gel electrophoresis, the sequence was obtained by cloning, and the sequence was then uploaded to https://www.ezbiocloud.net for comparison. Finally, the matched genus was determined and the strain was named H. trueperi S61 (preservation number GDMCC No: 60078).
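As a rough illustration of how the two primers above could be sanity-checked before PCR, the following Python sketch reports primer length, GC content, and a melting temperature estimate. The helper names are hypothetical, and the Wallace rule, Tm = 2(A+T) + 4(G+C), is an assumption chosen for simplicity; it is a coarse approximation for short oligonucleotides and is not described in the study.

```python
# Illustrative sketch only: helper names are hypothetical, and the Wallace rule
# is a coarse estimate that is not part of the protocol described in the text.

# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "F27": "AGAGTTTGATCCTGGCTCAGG",
    "P1541": "AAGGAGGTGGTGATGCCGCA",
}


def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Estimate melting temperature (deg C) with the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def primer_summary(primers: dict) -> dict:
    """Collect length, GC content, and Tm estimate for each primer."""
    return {
        name: {
            "length": len(seq),
            "gc_percent": round(gc_percent(seq), 2),
            "tm_wallace": wallace_tm(seq),
        }
        for name, seq in primers.items()
    }


if __name__ == "__main__":
    for name, stats in primer_summary(PRIMERS).items():
        print(name, stats)
```

Such estimates are only a first check; annealing temperatures are normally tuned empirically, as with the 50 °C used here.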

Genome sequencing was carried out using third-generation PacBio and second-generation Illumina platforms with the assistance of Genedenovo Biotechnology Co., Ltd (Guangzhou, China). First, DNA was extracted and its quality was checked by concentration and electrophoresis tests. The RS II and Sequel platforms from Pacific Biosciences were then used for single-molecule real-time (SMRT) sequencing. After library construction, Qubit was used for quality detection and an Agilent 2100 was used to evaluate insert sizes before PacBio sequencing was performed. In parallel, Illumina sequencing was performed on a HiSeq X Ten after library construction and quality detection.

Genome assembly and function annotation

Genome assembly was performed using the third-generation sequencing data, and the second-generation data were then used to correct the assembly. Genome component analysis and functional annotation were based on the corrected assembly. First, for quality control, the raw data from PacBio and Illumina sequencing were filtered to obtain clean data. The third-generation reads were then spliced and assembled with Falcon, and the coverage and GC distribution of the assembly were calculated. Based on the assembled genome sequence and the predicted coding genes, a genome circle diagram was drawn to display the features of the genome comprehensively.
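The GC distribution mentioned above can be illustrated with a minimal Python sketch that computes overall GC content and GC content in fixed-size windows along a sequence. This is not the pipeline used in the study; the function names, window size, and toy sequence are assumptions for illustration, and a real analysis would read the assembled chromosome from a FASTA file and pair each window with its sequencing depth.

```python
# Minimal sketch, not the study's pipeline: function names, window size, and
# the toy sequence are illustrative assumptions.


def gc_content(seq: str) -> float:
    """Return GC content (%) of a sequence, ignoring non-ACGT characters."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def windowed_gc(seq: str, window: int, step: int) -> list:
    """Return (start, GC%) pairs for each full window along the sequence."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "GGCCATATGCAT"  # toy sequence, not real genome data
    print("overall GC%:", gc_content(toy))
    for start, gc in windowed_gc(toy, window=4, step=4):
        print(start, gc)
```

Plotting windowed GC content against per-window read depth gives the kind of GC-Depth view used to check an assembly for bias or contamination.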

Genome component analysis followed. Coding genes were predicted using the National Center for Biotechnology Information (NCBI) database, and RNAmmer, tRNAscan, and cmscan (against the Rfam database) were used to predict non-coding ribosomal RNA (rRNA), transfer RNA (tRNA), and small RNA (sRNA). Furthermore, interspersed and tandem repeat sequences in the bacterial genome were predicted using RepeatMasker and TRF software. CRISPRFinder was used to predict Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) in the genome. TransposonPSI (version: 20100822), IslandViewer 4, and Phage_Finder were used to predict transposons, Gene Islands (GIs), and prophages, respectively.

In addition, basic and advanced functional annotations were performed. For basic annotation, the amino acid sequences encoded by the genes were compared against the non-redundant protein database (Nr) and SwissProt using blastp and diamond. The Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene Ontology (GO), Nr, and Cluster of Orthologous Groups (COG) databases were used to obtain the corresponding annotation results and classify the genes accordingly. For advanced analysis, Pfam Scan (https://www.ebi.ac.uk/Tools/pfa/pfamscan/) was run against the protein families database of alignments and hidden Markov models (Pfam) to provide complete and accurate protein family and domain classification. blastp was used for the pathogen–host interactions database (PHI-base) and carbohydrate-active enzymes database (CAZy) analyses. Protein sequences of the predicted genes were analyzed using SignalP 4.1 to identify signal peptides, and transmembrane proteins and effector proteins were predicted with TMHMM and EffectiveT3. Resistance gene identifier (RGI), blastn, and antiSMASH 4.1.0 were used to predict antibiotic resistance ontology, virulence factors of pathogenic bacteria (VFDB), and secondary metabolism gene clusters, respectively.

Genome assembly and component features of the H. trueperi S61 genome

Second- and third-generation sequencing techniques were used for deep sequencing of H. trueperi S61, resulting in a detailed map of the circular chromosome. After PacBio quality control, 57,549 reads were obtained, and the genome of H. trueperi S61 consists of a complete circular chromosome of 4,047,887 bp with 43.86% GC content and no gaps (Figure 1). The GC-Depth analysis demonstrated that strain H. trueperi S61 showed a Poisson distribution of GC content with no significant bias, together with a scattered region of 20–30% GC content, possibly affected by mitochondrial DNA (Figure 2(a)). For comparison, Gupta et al. (2019) reported that the genome of H. trueperi SS1 has 4,329 sequences totaling 4.14 Mbp, with 42.15% GC content and 35 RNA genes.
Figure 1

The Genome Halobacillus trueperi S61 circle diagram.

Figure 2

The Halobacillus trueperi S61 subreads (a) length distribution and (b) assembly result of GC-Depth distribution.


A total of 3,982 genes were predicted in the genome of H. trueperi S61, with a total length of 3,567,510 bp and a GC content of 44.57%. Non-coding RNA (ncRNA) does not encode protein but performs various biological functions at the RNA level (Chen et al. 2021). In this study, 139 ncRNAs were identified, including 86 tRNA, 30 rRNA, and 23 sRNA (Table 1). Among them, tRNA was the most numerous, accounting for 0.16% of the total sequence length, indicating the important role of tRNA in expression and regulation in H. trueperi S61 cells (Figure 2(b)). Repeated sequences, as components of gene regulatory networks, affect evolution, heredity, and mutation (Li et al. 2021a). This study predicted 58 interspersed repeats totaling 3,909 bp in five types. The most abundant were LINEs and SINEs, with 28 and 18 elements totaling 1,856 and 1,149 bp, respectively, while DNA and LTR elements accounted for smaller proportions, with 497 and 218 bp, respectively. In addition, three types of transposons (including helitronORF and LINE) were predicted.

Table 1

The statistics of non-coding RNA prediction of H. trueperi S61

Type        Number   Average length (bp)   Total length (bp)   In genome (%)
tRNA        86       77                    6,648               0.16
16S_rRNA    10       1,538                 15,380              0.38
5S_rRNA     10       115                   1,150               0.03
23S_rRNA    10       2,926                 29,260              0.72
sRNA        23       123                   2,847               0.07
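The "In genome (%)" column of Table 1 appears to be each total length divided by the 4,047,887 bp chromosome length. The short Python sketch below reproduces that arithmetic from the values reported in the table; the variable and function names are illustrative, and the choice of denominator is an assumption inferred from the numbers rather than something stated in the text.

```python
# Sketch reproducing the "In genome (%)" column of Table 1. The denominator
# (chromosome length) is an assumption inferred from the reported values.

GENOME_LENGTH_BP = 4_047_887

# type: (number of genes, total length in bp), as reported in Table 1
NCRNA = {
    "tRNA": (86, 6_648),
    "16S_rRNA": (10, 15_380),
    "5S_rRNA": (10, 1_150),
    "23S_rRNA": (10, 29_260),
    "sRNA": (23, 2_847),
}


def genome_share(total_length_bp: int, genome_length_bp: int = GENOME_LENGTH_BP) -> float:
    """Return the percentage of the genome covered, rounded to two decimals."""
    return round(100.0 * total_length_bp / genome_length_bp, 2)


def summarize(ncrna: dict) -> dict:
    """Map each ncRNA type to its share of the genome in percent."""
    return {name: genome_share(total) for name, (_, total) in ncrna.items()}


if __name__ == "__main__":
    for name, share in summarize(NCRNA).items():
        print(f"{name}: {share}%")
    print("total ncRNAs:", sum(count for count, _ in NCRNA.values()))
```

Under this assumption the computed shares agree with the table, and the gene counts sum to the 139 ncRNAs reported in the text.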

Clustered regularly interspaced short palindromic repeats (CRISPR) act as a genetic weapon or natural immune system in most bacteria and archaea because they confer resistance to foreign plasmids and phage sequences (Zhang et al. 2021). Two CRISPRs were predicted in the genome of H. trueperi S61, Crispr 1 (AGAAAACAAAACCAACAATCAGCTG) and Crispr 2 (TGATGGGAATCGAACCCACGACAT), indicating that strain H. trueperi S61 provides the corresponding acquired immunity through the CRISPR pathway. Gene islands (GIs) are considered mobile genetic elements because of their relation to various biological functions, especially the horizontal transfer of genes (Lekota et al. 2018). The predicted GI regions in H. trueperi S61 may contain antibiotic resistance genes and bacteriostatic gene fragments. A total of 16 gene islands totaling 260,275 bp were predicted in the whole genome of H. trueperi S61, which may support microbial adaptation to distinct abiotic stresses and antimicrobial environments. In addition, a prophage, as a carrier of genetic information, can integrate into the genome of the infected microbe after infection. Previous studies have found that bacteriophages are capable of lysing certain pathogenic microorganisms, which may be beneficial for disease healing, while also lysing beneficial or other harmful microorganisms. As a result, they are widely used as carriers for horizontal transfer in beneficial microorganisms (Zhang et al. 2020). The present study identified two prophages with a total length of 82,682 bp in H. trueperi S61, containing 44 and 61 coding sequences (CDSs) with 44.85 and 38.43% GC content, respectively. Therefore, it is speculated that H. trueperi S61 has the ability to lyse pathogens, although this requires further validation.

Essential functional annotation of the genome of H. trueperi S61

Basic annotations were summarized for the 3,982 protein-coding genes in the whole-genome sequence of H. trueperi S61. To improve functional prediction, 3,980, 3,667, 2,998, and 2,303 genes were annotated with the Nr, SwissProt, KOG, and KEGG databases, respectively. Specifically, 3,668 genes were annotated with the COG function database (Figure 3(a)). The protein functions were mainly distributed in amino acid transport and metabolism (E, 9.95%), carbohydrate transport and metabolism (G, 8.02%), and transcription (K, 8.53%), with 365, 294, and 313 genes, respectively. A total of 13.71% of genes were predicted for general function only, and 9.13% of genes had unknown protein functions, which require further evaluation. In addition, 107 genes were involved in secondary metabolite biosynthesis, transport, and catabolism, and 222 genes were related to inorganic ion transport and metabolism (P), while other categories accounted for smaller proportions. There were 7,829 genes with GO annotations (Figure 3(b)). Biological process accounted for 52% and was dominated by metabolic process, cellular process, and single-organism process (1,016, 953, and 752, respectively), as well as localization and biological regulation (318 and 298, respectively). Molecular function accounted for 23% and was mainly assigned to catalytic activity and binding (888 and 657), followed by transporter activity and nucleic acid binding transcription factor activity (114 and 103). Cellular component accounted for 25% and was mostly distributed in the membrane, membrane part, and cell (537, 484, and 376, respectively). This result indicated that the gene products of strain H. trueperi S61 were primarily focused on biological processes.
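The COG percentages above follow from dividing each category's gene count by the 3,668 COG-annotated genes. A minimal Python sketch of that arithmetic, using only the counts given in the text, is shown below; the names are illustrative and the snippet is a consistency check rather than part of the annotation workflow.

```python
# Consistency check for the reported COG shares; names are illustrative.

COG_ANNOTATED_GENES = 3_668

# COG category letter: gene count, as reported in the text
COG_COUNTS = {
    "E": 365,  # amino acid transport and metabolism
    "G": 294,  # carbohydrate transport and metabolism
    "K": 313,  # transcription
}


def category_share(count: int, total: int = COG_ANNOTATED_GENES) -> float:
    """Return a category's share of annotated genes in percent (two decimals)."""
    return round(100.0 * count / total, 2)


def cog_shares(counts: dict) -> dict:
    """Map each COG category to its percentage of COG-annotated genes."""
    return {category: category_share(n) for category, n in counts.items()}


if __name__ == "__main__":
    for category, share in cog_shares(COG_COUNTS).items():
        print(f"{category}: {share}%")
```

The same calculation applies to any other category once its gene count is read from Figure 3(a).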
Figure 3

The function classification of Halobacillus trueperi S61 according to cluster of orthologous groups of (a) proteins (COG) and (b) gene ontology (GO) database.

KEGG pathway annotation can identify the functional genes that are up- or down-regulated for target metabolites (Yu et al. 2020; Wang et al. 2021b). In this study, 3,672 genes of H. trueperi S61 were annotated with KEGG and classified into five types: 80.94% metabolism, 8.66% environmental information processing, 5.80% genetic information processing, 4.36% cellular processes, and 0.25% organismal systems. Within metabolism, 602 genes were related to metabolic pathways, 274 and 215 genes were associated with the biosynthesis of secondary metabolites and antibiotics, respectively, 183 genes were related to microbial metabolism in diverse environments, 117 genes were associated with the biosynthesis of amino acids, and 96 genes with carbon metabolism (Figure 4(a)). Furthermore, 76 genes were related to microbial viability (replication and repair) and were distributed across six KEGG pathways, indicating that H. trueperi S61 may play a role in the homologous recombination, mismatch repair, DNA replication, base excision repair, nucleotide excision repair, and non-homologous end-joining pathways (Li et al. 2021b). Additionally, the Nr database annotated 3,980 genes, of which 88.32% (3,515 genes) matched Bacillus subtilis; 134 and 130 genes matched Bacillus sp. EGD-AK10 and Streptococcus pneumoniae, followed by 66 for Bacillus sp. YP1, 22 for Bacillus sp. CMAA 1185, 17 for Bacillus sp. LM 4–2, and 13 for Bacillus and Bacillus sp. JS (Figure 4(b)). Overall, the GO, COG, KEGG, and Nr annotations of protein-coding genes indicated that the protein functions of H. trueperi S61 were primarily focused on biological processes, with an emphasis on gene transcription and amino acid and carbohydrate metabolism.
Figure 4

The function classification of Halobacillus trueperi S61 according to (a) Kyoto Encyclopedia of Genes and Genomes (KEGG) and (b) non-redundant protein database (Nr) database.

In addition, bacterial comparative genomic analysis was performed between the Bacillus velezensis genome (Hal61) and the target genome of H. trueperi (Htr), together with those of Halobacillus litoralis (Hli), Halobacillus kuroshimensis (Hku), Halobacillus dabanensis (Hda), and Rossellomorea vietnamensis (Rvi). The parallel collinearity and two-dimensional collinearity results indicated that Rvi and Hal61 have higher coverage than the other genomes (Figure 5(a) and (c)). The gene family analysis showed that Htr, Hli, Hku, Hda, Rvi, and Hal61 had 3,820, 4,065, 3,872, 3,981, 4,319, and 3,982 genes in total, with 474, 463, 506, 407, 1,393, and 1,290 gene families, respectively (Figure 5(b)). Overall, Bacillus velezensis is more closely related to Rossellomorea vietnamensis.
Figure 5

The comparative genomic analysis of genome Bacillus velezensis (Hal61) and Rossellomorea vietnamensis (Rvi) of (a) parallel multicollinearity graph and (b) two-dimensional collinearity map and (c) gene family.


The advanced function annotation of the genome of H. trueperi S61

Carbohydrate-active enzymes (CAZymes) are essential when pathogens pass through the primary barrier, the cell wall, after the host is attacked, and they function in the biosynthesis and decomposition of carbohydrates and glycoconjugates (Yu et al. 2020). H. trueperi S61 contains 561 CAZymes, dominated by glycoside hydrolases (GH) and glycosyltransferases (GT) with 35.29 and 31.37%, respectively, followed by 19.61% carbohydrate-binding modules (CBM), 12.66% carbohydrate esterases (CE), 0.89% auxiliary activities (AAs), and 0.18% polysaccharide lyases (PL) (Figure 6). Importantly, GT and GH play a crucial role in metabolic processes: GT is associated with nucleotide and amino sugar metabolism, and GH is associated with glycogen, maltose, and N-acetylglucosamine degradation, which are possibly favorable for nutrient acquisition and for maintaining the structure needed for the survival of H. trueperi S61 in salt seas (Li et al. 2021a). Woo et al. (2017) annotated the Halobacillus mangrovi KTB 131 genome and pointed out that most genes were distributed in secondary metabolite biosynthesis, catabolism, and transport. Additionally, secreted proteins include enzymes, antibodies, and some hormones; among the 3,982 predicted proteins, 273 had signal peptides, 138 were transmembrane proteins, and 135 were secreted proteins. Effector proteins are critical in bacterial secretion systems: pathogens secrete effector proteins into the extracellular space or host cells through TNSS (type N secretion systems, type I-VII), which affects various important cellular processes, such as immune response and cell death, and further causes pathological reactions. Four effector gene symbols were identified: yfjA (Hal61 00834), yueC (Hal61 03072), yueB (Hal61 03073), and yukC (Hal61 03075), all of which belong to Bacillus subtilis.
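The text reports CAZyme classes as percentages of the 561 predicted CAZymes. As a hedged illustration, the Python sketch below back-calculates approximate per-class counts from those percentages. The resulting counts are estimates derived here by rounding, not values reported by the study, and the names are illustrative.

```python
# Back-calculation sketch: the counts below are estimates obtained by rounding
# the reported percentages; they are not values reported in the study.

TOTAL_CAZYMES = 561

# CAZy class: reported share of all CAZymes (%)
CLASS_SHARE_PERCENT = {
    "GH": 35.29,   # glycoside hydrolases
    "GT": 31.37,   # glycosyltransferases
    "CBM": 19.61,  # carbohydrate-binding modules
    "CE": 12.66,   # carbohydrate esterases
    "AA": 0.89,    # auxiliary activities
    "PL": 0.18,    # polysaccharide lyases
}


def estimate_counts(total: int, shares: dict) -> dict:
    """Estimate per-class counts by rounding total * share to the nearest integer."""
    return {name: round(total * pct / 100.0) for name, pct in shares.items()}


if __name__ == "__main__":
    counts = estimate_counts(TOTAL_CAZYMES, CLASS_SHARE_PERCENT)
    for name, n in counts.items():
        print(name, n)
    print("sum of estimated counts:", sum(counts.values()))
```

A useful check on such a back-calculation is that the estimated counts sum to the reported total, which indicates the percentages are internally consistent.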
Figure 6

The advanced function classification of Halobacillus trueperi S61 according to carbohydrate-active enzymes (CAZy) database.


The pathogen–host interaction database (PHI-base) includes diverse pathogenic genes related to different types of hosts and is crucial for finding target genes for drug intervention (Zhang et al. 2020). Gene annotation showed that strain H. trueperi S61 has 4,416 PHI-related genes. The dominant pathogen species was Burkholderia glumae, which causes bacterial grain rot disease, with DNA gyrase (bacterial topoisomerase II), followed by Flavobacterium psychrophilum (DNA gyrase), Cryptococcus neoformans (GTP biosynthesis), Bacillus anthracis (tellurite resistance), and Pectobacterium wasabiae (posttranscriptional regulator), which cause bacterial cold-water disease, meningoencephalitis, anthrax, and soft rot, respectively. Among these, 742 pathogenic factor genes were derived from Magnaporthe oryzae (related to Magnaporthe grisea), and 367, 262, 216, and 208 pathogenic genes were related to Fusarium graminearum (related to Gibberella zeae), Aspergillus fumigatus, Alternaria alternata, and Candida albicans, respectively. Moreover, the virulence factors of pathogenic bacteria (VFDB) database annotated 15 factors from Listeria monocytogenes, Legionella pneumophila Philadelphia, Chlamydia trachomatis, Salmonella enterica, Escherichia coli, Bacillus anthracis, and Mycobacterium tuberculosis. In addition, the prediction results for secondary metabolism gene clusters showed ten gene cluster types, including nrps, terpene, nrps-transatpks-otherks, t3pks, lantipeptide, and sactipeptide head_to_tail.

The Comprehensive Antibiotic Resistance Database (CARD) was used to associate antibiotic modules with their targets, resistance mechanisms, and genetic mutations (Lekota et al. 2018). Eleven efflux pump complexes or subunits conferring antibiotic resistance were predicted, including lmrB, ykkD, TaeA, sav1866, ykkC, lmrD, TriC, bmr, and blt. The four antibiotic inactivation enzymes predicted included aadK, VgbC, rphB, BLA1, and mphI, and one antibiotic target protection protein (mfd) was also predicted. Other predicted determinants included Enterococcus faecium cls conferring resistance to daptomycin, antibiotic-resistant fabI, mecA, Bacillus subtilis mprF, Escherichia coli EF-Tu mutants conferring resistance to kirromycin, Staphylococcus aureus rpoB mutants conferring resistance to rifampicin, Mycobacterium tuberculosis intrinsic murA conferring resistance to fosfomycin, and a determinant of resistance to nucleoside antibiotics (tmrB). Treves et al. (2018) evaluated the draft genome of Halobacillus sp. BBL2006 and identified 4,331 open reading frames, which included heavy metal and antibiotic resistance genes. Although the coding genes were annotated using different databases, the results were consistent and mainly pointed to protein biological processes and antibiotic resistance, which provides a potential resource for biotechnology.

Whole-genome assembly and annotation were performed on the novel strain H. trueperi S61 isolated from the Qarhan Salt Lake. A total of 3,982 genes were predicted in the genome of H. trueperi S61, with a total length of 3,567,510 bp and a GC content of 44.57%. A total of 3,668 genes were annotated with COG, and the protein functions were mainly distributed in amino acid transport and metabolism (9.95%, 365 genes). There were 7,829 genes with GO annotations, of which biological processes accounted for 52% and were dominated by metabolic process (1,016 genes). A total of 3,672 genes were annotated with KEGG and were dominated by metabolism (80.94%), and the Nr database annotated 3,980 genes, 88.32% of which matched Bacillus subtilis. Bacillus velezensis was most closely associated with Rossellomorea vietnamensis. Overall, the gene functions of strain H. trueperi S61 were mainly focused on biological processes.

We are grateful for funding support from the General Project of the Natural Science Foundation of Qinghai Science and Technology Department (2019-ZJ-914) and the National Modern Agricultural Technology System (CARS-10).

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Buermans H. & den Dunnen J. 2014 Next generation sequencing technology: advances and applications. Biochimica Et Biophysica Acta-Molecular Basis of Disease 1842, 1932–1941.
Chen K., Wang L., Chen H., Zhang C., Wang S., Chu P., Li S., Fu H., Sun T., Liu M., Yang Q., Zou H. & Zhuang W. 2021 Complete genome sequence analysis of the peanut pathogen Ralstonia solanacearum strain Rs-P.362200. BMC Microbiology 21, 118.
Edward L., Ignatius B., Joseph M., Rees J., Muchadeyi F., Madoroba E. & Heerden H. 2018 Whole genome sequencing and identification of Bacillus endophyticus and B. anthracis isolated from anthrax outbreaks in South Africa. BMC Microbiology 18, 67.
Gupta S., Sharma A., Dev A., Baumler D. & Sourirajan A. 2019 Draft genome sequence of Halobacillus trueperi SS1, isolated from Lunsu, a saltwater body in the Northwest Himalayas. Microbiology Resource Announcements 8, e01710–e01718.
Hong J., Song H., Moon Y., Hong Y., Bhatia S., Jung H., Choi T., Yang S., Park H. & Choi Y. 2019 Polyhydroxybutyrate production in halophilic marine bacteria Vibrio proteolyticus isolated from the Korean peninsula. Bioprocess and Biosystems Engineering 42, 603–610.
Lekota K., Bezuidt O., Mafofo J., Rees J., Muchadeyi F., Madoroba E. & Heerden H. 2018 Whole genome sequencing and identification of Bacillus endophyticus and B. anthracis isolated from anthrax outbreaks in South Africa. BMC Microbiology 18, 67.
Li T., Zhang X., Guo L., Qi T., Tang H., Wang H., Qiao X., Zhang M., Zhang B., Feng J., Zuo Z., Zhang Y., Xing C. & Wu J. 2021a Single-molecule real-time transcript sequencing of developing cotton anthers facilitates genome annotation and fertility restoration candidate gene discovery. Genomics 113, 4245–4253.
Park Y., Choi T., Han Y., Song H., Park J., Bhatia S., Gurav R., Choi K., Kim Y. & Yang Y. 2020 Effects of osmolytes on salt resistance of Halomonas socia CKY01 and identification of osmolytes-related genes by genome sequencing. Journal of Biotechnology 322, 21–28.
Poli A., Finore I., Romano I., Gioiello A., Lama L. & Nicolaus B. 2017 Microbial diversity in extreme marine habitats and their biomolecules. Microorganisms 5, 25.
Rivadeneyra M., Parraga J., Delgado R., Cormenzana A. & Delgado G. 2004 Biomineralization of carbonates by Halobacillus trueperi in solid and liquid media with different salinities. FEMS Microbiology Ecology 48, 39–46.
Thirugnanasambandam R., Inbakandan D., Abraham L., Kumar C., Sundaram S., Subashni B., Vasantharaja R., Kumar A., Kirubagaran R., Khan S. & Balasubramanian T. 2017 De novo assembly and annotation of the whole genomic analysis of Vibrio campbellii RT-1 strain, from infected shrimp: Litopenaeus vannamei. Microbial Pathogenesis 113, 372–377.
Wang J., Yang C., Zhang C., Mao X. & Li Z. 2021a Complete genome sequence of the Clostridium difficile LCL126. Bioengineered 12, 745–754.
Williamson A., De Santi C., Altermark B., Karlsen C. & Hjerde E. 2016 Complete genome sequence of Halomonas sp. R5-57. Standards in Genomic Sciences 11, 62.
Woo M., Park S., Park K., Park M., Kim J., Lee H., Sohn J., Lee D., Nam G., Shin K. & Lee S. 2017 Draft genome sequence of the halophilic Halobacillus mangrovi KTB 131 isolated from Topan salt of the Jeon-nam in Korea. Genomics Data 14, 18–20.
Xu P., Zhang X., Su H., Liu X. & Hong G. 2020 Genome-wide analysis of PYL-PP2C-SnRK2s family in Camellia sinensis. Bioengineered 11, 103–115.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).