Mx. BIOME: A bioinformatics and big data analysis platform in a data explosion era

With the development of next-generation sequencing and other technologies, the number of available DNA, RNA, and protein sequences and protein structures has increased exponentially over the last two decades, paralleled by the advancement of user-friendly bioinformatics tools. Furthermore, the enormous improvement in computational power and the accumulation of massive amounts of data have given rise to big data analysis, which can unveil novel patterns, associations, and trends. In view of this unprecedented biological data explosion, we have recently developed a platform known as Mx. BIOME, which provides a collection of some of the most popular and cutting-edge tools in the fields of bioinformatics and big data analysis. Such a collection helps end users identify suitable tools for analyzing their specific datasets. Further studies and reviews of the various in silico tools are necessary to compare their advantages and limitations.