A comprehensive statistical analysis of COVID-19 trends: Global and United States insights through ARIMA, regression, and spatial models

The COVID-19 pandemic has created a need for accurate data analysis and forecasting to support public health decision-making. This study applied autoregressive integrated moving average (ARIMA) models, with and without exogenous variables, to predict short-term trends in confirmed COVID-19 cases across several regions, including the United States of America, Asia, Europe, and Africa. The performance of ARIMA models was compared with that of models chosen by the automated model selection function auto.arima, and anomaly detection was used to investigate discrepancies between predicted and observed case numbers. The study also explored the relationship between vaccination rates and new case trends, and examined the influence of socioeconomic factors (such as gross domestic product per capita, human development index, and healthcare resource availability) on COVID-19 incidence across countries. The findings offer insight into the effectiveness of these predictive models and highlight the significant role of socioeconomic factors in the spread of the virus, which can inform more effective strategies for future epidemic prevention and control.
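
As a rough illustration of the forecasting and anomaly-detection idea summarized above, the sketch below fits the simplest non-trivial ARIMA specification, ARIMA(1,1,0) without drift, by conditional least squares and flags observations whose one-step-ahead residual is unusually large. This is not the study's implementation and does not reproduce auto.arima: the model order, the threshold `k`, the function names, and the synthetic series are all assumptions chosen for illustration.

```python
from statistics import pstdev


def fit_phi(series):
    """Estimate the AR(1) coefficient on first differences (ARIMA(1,1,0), no drift)
    by conditional least squares: regress each difference on the previous one."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    lagged, current = diffs[:-1], diffs[1:]
    denom = sum(v * v for v in lagged)
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(lagged, current)) / denom


def forecast(series, phi, steps):
    """Point forecasts: the predicted difference decays by phi at each step
    and is added back to the last level."""
    level = series[-1]
    diff = series[-1] - series[-2]
    out = []
    for _ in range(steps):
        diff = phi * diff
        level = level + diff
        out.append(level)
    return out


def flag_anomalies(series, phi, k=3.0):
    """Return the indices whose one-step-ahead residual exceeds k times the
    standard deviation of all residuals (k is a hypothetical threshold)."""
    idx = list(range(2, len(series)))
    residuals = [
        series[t] - (series[t - 1] + phi * (series[t - 1] - series[t - 2]))
        for t in idx
    ]
    sigma = pstdev(residuals)
    if sigma == 0:
        return []
    return [t for t, r in zip(idx, residuals) if abs(r) > k * sigma]


# Synthetic daily counts (hypothetical data): a linear trend with one reporting spike.
cases = [100 + 5 * t for t in range(30)]
cases[20] += 200

phi = fit_phi(cases)
print("phi:", round(phi, 3))
print("flagged indices:", flag_anomalies(cases, phi))
print("next 3 forecasts:", [round(v, 1) for v in forecast(cases, phi, 3)])
```

Because the spike is included when estimating `phi` and the residual spread, it distorts both. In practice one might fit on a window believed to be clean, or use a robust scale estimate, before flagging discrepancies between predicted and observed counts.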