Wasserwirtschaft und Hydrosystemmodellierung

Prediction of odour in sewer systems using data-driven methods

Introduction and background

Wastewater treatment plants (WWTPs) and collection systems generate considerable air emissions and offensive odours. Several chemical compounds emitted in collection systems, headworks and WWTPs can cause odour, such as sulphur-based compounds (e.g. H2S) and nitrogen-based compounds (e.g. N2O). The main role of sewer systems is to collect and convey wastewater to treatment facilities. Sulphide build-up is one of the major problems in wastewater systems. The consequences of the resulting septic conditions include toxicity, corrosion and odour nuisance. Corrosion is often a major failure mechanism for concrete sewers, and in such circumstances the sewer service life is largely determined by the progression of microbially induced concrete corrosion. Predicting corrosion in sewer systems is therefore a vital element in planning sewer pipes and shafts as well as rehabilitation programmes.

Although some empirical models have produced satisfactory results in simulating and predicting odour and corrosion, there are areas in which they are not robust. For example, they cannot predict short-term variations in sulphide concentrations. Another problem common to all these models is that they consider only a few of the many factors affecting H2S production, which results in highly case-specific parameter values. Such problems leave room for further research and development in odour and corrosion management in sewers. The use of data-driven techniques, in particular those based on artificial intelligence (AI), in modelling engineering phenomena has drawn much attention from the scientific and research community in the past few decades.


Recent studies have suggested that good predictions of environmental parameters can be obtained with artificial intelligence based models such as ANNs (Pires et al., 2010; Moazami et al., 2016; Biancofiore et al., 2017; Zounemat-Kermani et al., 2018; Dotse et al., 2018). Despite these promising premises, the real effectiveness of AI-based models has been verified in only a few case studies, and the performance of time-dependent ANNs (e.g. NARX) for modelling emission factors has not been assessed. Hence, a comprehensive assessment of the capability and effectiveness of AI models in simulating emission factors of wastewater facilities helps to fill this gap in a largely unexplored area of research. This study carries out that task by predicting the emission factor (H2S) using multivariate nonlinear autoregressive exogenous ANNs and regression methods. In other words, the aim of this research is to evaluate the feasibility of using emission factors as a tool to predict H2S emissions from four different wastewater treatment plants (WWTPs) in Louisiana, USA.

Methods (Multivariate NARX neural network)

Inspired by animal neural systems, artificial neural networks (ANNs) are parallel distributed processors made up of simple processing units called neurons. ANNs are machine learning models that exhibit good learning and generalization capabilities for a variety of classification, prediction and simulation purposes. Depending on the nature of the phenomenon being modelled, different categories of ANNs can be used, such as the multi-layer perceptron neural network (Cigizoglu, 2004), the generalized regression neural network (Zounemat-Kermani, 2014) and the deep learning neural network (Schmidhuber, 2015).

One of the ANN categories that can be considered for modelling time-dependent problems is the recurrent neural network. In recurrent neural networks, the temporal dynamic behaviour of a time sequence is captured by the network structure during the training process. Nonlinear autoregressive with exogenous inputs (NARX) neural networks are a type of dynamically driven recurrent ANN. NARX networks have one or several local or global feedback loops, so the same structure can be used to build different recurrent models. In other words, NARX networks include summary information on the exogenous variables, which leads to a smaller number of residuals. NARX networks therefore have a reasonable computational cost in comparison to classical feed-forward ANNs (Cadenas et al., 2016).
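As an illustration of the NARX idea, a commonly used general form of the one-step-ahead predictor is sketched below. The symbols are illustrative assumptions and are not taken from the study's own equation: y is the output series (here the emission factor), u is the vector of exogenous inputs, d_y and d_u are the numbers of delays applied to each, and f is the nonlinear mapping learned by the network.

```latex
\[
  \hat{y}(t+1) = f\bigl(y(t),\, y(t-1),\, \dots,\, y(t-d_y+1),\;
                        \mathbf{u}(t),\, \mathbf{u}(t-1),\, \dots,\, \mathbf{u}(t-d_u+1)\bigr)
\]
```

With d_y = 3 and d_u = 1, this reduces to a model driven by three delayed outputs and the current exogenous inputs only.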

Figure 1 depicts the final, optimised architecture of one of the four NARX models developed in this study (the East Bank WWTP), with eight input variables, nine hidden neurons and one output variable. As can be seen from Figure 1, the model uses five independent input variables (Q, T, pH, BOD and TS) at time t together with three delayed values of the emission factor at times t, t-1 and t-2. Detailed information about the input selection procedure (F in Equation 1) and the proper delay time (n in Equation 1), determined using principal component analysis and the average mutual function, is presented in the following sections.
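The input layout described above (five exogenous variables at time t plus three delayed emission-factor values, giving eight inputs, nine hidden neurons and one output) can be sketched in a few lines of Python. This is a minimal sketch using only the standard library, not the study's implementation. The function names `build_narx_inputs` and `narx_forward` are hypothetical, and two details are assumptions because the text does not state them: the prediction target is taken to be the emission factor at t+1, and the hidden units use a tanh activation with a linear output.

```python
import math


def build_narx_inputs(exog, ef, n_delays=3):
    """Build open-loop NARX training rows.

    exog: list of per-time-step lists, e.g. [Q, T, pH, BOD, TS]
    ef:   list with the emission factor at each time step
    Each row holds exog(t) followed by ef(t), ef(t-1), ..., ef(t-n_delays+1).
    The target for that row is ef(t+1) (an assumption of this sketch).
    """
    n_steps = len(ef)
    if len(exog) != n_steps:
        raise ValueError("exog and ef must cover the same time steps")
    if n_steps <= n_delays:
        raise ValueError("series too short for the requested number of delays")

    inputs, targets = [], []
    for t in range(n_delays - 1, n_steps - 1):
        lags = [ef[t - d] for d in range(n_delays)]  # ef(t), ef(t-1), ...
        inputs.append(list(exog[t]) + lags)
        targets.append(ef[t + 1])
    return inputs, targets


def narx_forward(row, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer network for a single input row.

    w_hidden has one weight list per hidden neuron (9 x 8 for the
    architecture described in the text); tanh is an assumed activation.
    """
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, row)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Synthetic placeholder data (not measurements): 10 time steps, 5 exogenous variables.
exog = [[float(5 * t + j) for j in range(5)] for t in range(10)]
ef = [float(t) for t in range(10)]
X, y = build_narx_inputs(exog, ef)  # 7 rows, each with 8 inputs
```

In open-loop (series-parallel) form, as here, the delayed values are observed emission factors. For multi-step forecasting the delayed observations would be replaced by the model's own earlier predictions.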

Application and results

Figure 2 compares the observed values with the simulated (training phase) and predicted (testing phase) values of all applied models for the four WWTPs. This comparison shows that the weekly gaseous H2S values simulated and predicted by the NARX models are generally closer to the observed values than those of the other models.

Figure 3 displays the NARX results for simulated and predicted EF-Flow against the observed values as scatter plots for the training and testing datasets. The scatter plots show that the simulated and predicted values of the NARX models follow the observed values in both the training and testing phases. In addition, the simulated and predicted values for the four WWTPs show little tendency towards systematic over- or underestimation of the gaseous H2S.


References

Biancofiore, F., Busilacchio, M., Verdecchia, M., Tomassetti, B., Aruffo, E., Bianco, S., ... & Di Carlo, P. (2017). Recursive neural network model for analysis and forecast of PM10 and PM2.5. Atmospheric Pollution Research, 8(4), 652-659.

Cadenas, E., Rivera, W., Campos-Amezcua, R., & Cadenas, R. (2016). Wind speed forecasting using the NARX model, case: La Mata, Oaxaca, México. Neural Computing and Applications, 27(8), 2417-2428.

Cigizoglu, H. K. (2004). Estimation and forecasting of daily suspended sediment data by multi-layer perceptrons. Advances in Water Resources, 27(2), 185-195.

Dotse, S. Q., Petra, M. I., Dagar, L., & De Silva, L. C. (2018). Application of computational intelligence techniques to forecast daily PM10 exceedances in Brunei Darussalam. Atmospheric Pollution Research, 9(2), 358-368.

Moazami, S., Noori, R., Amiri, B. J., Yeganeh, B., Partani, S., & Safavi, S. (2016). Reliable prediction of carbon monoxide using developed support vector machine. Atmospheric Pollution Research, 7(3), 412-418.

Pires, J. C., Alvim-Ferraz, M. C., Pereira, M. C., & Martins, F. G. (2010). Prediction of PM10 concentrations through multi-gene genetic programming. Atmospheric Pollution Research, 1(4), 305-310.

Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117.

Zounemat-Kermani, M. (2014). Principal component analysis (PCA) for estimating chlorophyll concentration using forward and generalized regression neural networks. Applied Artificial Intelligence, 28(1), 16-29.

Zounemat-Kermani, M., Ramezani-Charmahineh, A., Adamowski, J., & Kisi, O. (2018). Investigating the management performance of disinfection analysis of water distribution networks using data mining approaches. Environmental Monitoring and Assessment, 190(7), 397.

Georg Forster Research Fellowship for Experienced Researchers (Alexander von Humboldt Foundation):

Associate Prof. Mohammad Zounemat-Kermani


The aim of this research was to assess the feasibility of developing soft computing models for gaseous H2S emissions, expressed as an emission factor, from four WWTPs. To this end, multivariate nonlinear autoregressive exogenous neural networks (NARX ANNs) as a soft computing model, together with standard regression models (multiple linear regression, MLR; stepwise regression, SR; and two-variate logarithmic regression, TVLR), were developed for predicting the emission factor (gaseous H2S).

Based on several statistical measures of model accuracy (RMSE, MAPE, PCC, NSE and GRI), the NARX models predicted the emitted H2S with better accuracy (average RMSE = 15.3 mg/m3 and MAPE = 55.7%), similarity (average PCC = 0.91 and NSE = 0.92) and reliability (GRI = 1.65) than the regression models. Among the standard regression models (MLR, SR and TVLR), the MLR models showed an acceptable degree of accuracy in the training and testing phases (average NSE = 0.85 and MAPE = 185.9%). Analysis of the various input combinations using correlation coefficients and PCA showed that ambient temperature and average daily flow have the greatest effect on H2S prediction accuracy, whereas pH was identified as the least important independent variable for gaseous H2S emission.
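For readers who want to reproduce this kind of comparison, four of the measures named above can be computed as in the following sketch. It uses the standard textbook definitions of RMSE, MAPE, the Pearson correlation coefficient (PCC) and the Nash-Sutcliffe efficiency (NSE), which are assumed to match those used in the study. GRI is left out because its exact formulation is not given here. The function names are hypothetical and only the Python standard library is used.

```python
import math


def rmse(obs, sim):
    """Root mean square error, in the units of the observations."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mape(obs, sim):
    """Mean absolute percentage error in percent; observations must be non-zero."""
    return 100.0 * sum(abs((o - s) / o) for o, s in zip(obs, sim)) / len(obs)


def pcc(obs, sim):
    """Pearson correlation coefficient between observed and simulated values."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_o = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - residual / spread


# Synthetic placeholder values (not data from the study): a constant offset of 1.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
scores = {
    "RMSE": rmse(observed, simulated),
    "MAPE": mape(observed, simulated),
    "PCC": pcc(observed, simulated),
    "NSE": nse(observed, simulated),
}
```

Because MAPE divides each error by the observed value, it can become very large when observations are close to zero even if NSE remains high, which is worth keeping in mind when reading the two measures side by side.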