This work was carried out between 2008 and 2014 at the Chair of Environmental Informatics, Faculty of Environmental Sciences and Process Engineering, Brandenburg University of Technology Cottbus-Senftenberg.
Date of oral defense (day of scientific discussion): 4 December 2014
Since the Z3, the first working programmable and fully automatic computer, appeared in 1941, computers have become an indispensable tool in a wide variety of engineering research, studies, and applications. In the field of hydroinformatics, a number of tools exist for data collection and management, data analysis, numerical simulation, model coupling, post-processing, etc., at different temporal and spatial scales. However, one crucial step is still missing: bridging the gap between the large volumes of available raw data and the simulation tools.
In this research work, a general software framework for time series scenario composition is proposed to address this issue. The framework is designed to facilitate simulation tasks by providing input data sets, e.g. Boundary Conditions (BCs), generated for user-specified what-if scenarios. These scenarios are based on the available raw data from different sources, such as field and laboratory measurements and simulation results. In addition, the framework monitors the workflow by tracking the related metadata to ensure traceability.
This framework is data-driven and semi-automatic. It contains four basic modules: data pre-processing, event identification, process identification, and scenario composition. These modules mainly rely on Time Series Knowledge Mining (TSKM), fuzzy logic, and Multivariate Adaptive Regression Splines (MARS) to extract features from the collected data and to link the modules with one another. Together with other statistical information, the extracted features form the most fundamental elements for scenario composition and subsequent time series generation: the MetaEvents. MetaEvents are extracted from a set of raw time series in semi-automatic steps that successively form Aspects, Primitive Patterns, Successions, and Events. Furthermore, different state variables are interconnected by the physical relationships derived from process identification. Rather than treating each time series in isolation, the MetaEvents therefore represent complementary features of the available time series data from different sources and account for the identified physical relationships among state variables. The composed scenarios can then be converted into sets of time series, for example BCs, to facilitate numerical simulations.
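As an illustration of the first TSKM steps only, the following minimal Java sketch shows how one Aspect (a single raw time series, here a hypothetical discharge record) might be turned into Primitive Patterns with fuzzy logic: each sample receives the fuzzy label with the highest membership degree, and runs of identical labels are merged into labeled intervals. The trapezoidal fuzzy sets, their breakpoints, and all class and method names (`PrimitivePatternSketch`, `Interval`, `label`, `primitivePatterns`) are assumptions made for this example, not the prototype's actual implementation; the Succession, Event, and MARS-based process identification steps are not shown. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PrimitivePatternSketch {

    // Hypothetical stand-in for a Primitive Pattern: a label holding over samples start..end (inclusive).
    record Interval(String label, int start, int end) {}

    // Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear ramps in between.
    static double trapezoid(double x, double a, double b, double c, double d) {
        if (x <= a || x >= d) return 0.0;
        if (x >= b && x <= c) return 1.0;
        return x < b ? (x - a) / (b - a) : (d - x) / (d - c);
    }

    // Illustrative fuzzy sets for one Aspect; breakpoints {a, b, c, d} are assumed, not from the thesis.
    static final Map<String, double[]> SETS = new LinkedHashMap<>();
    static {
        SETS.put("LOW", new double[] {Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY, 2, 4});
        SETS.put("MEDIUM", new double[] {2, 4, 6, 8});
        SETS.put("HIGH", new double[] {6, 8, Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY});
    }

    // Assign the label with the highest membership degree (the first label in SETS wins ties).
    static String label(double x) {
        String best = null;
        double bestDegree = -1.0;
        for (Map.Entry<String, double[]> e : SETS.entrySet()) {
            double[] p = e.getValue();
            double degree = trapezoid(x, p[0], p[1], p[2], p[3]);
            if (degree > bestDegree) {
                bestDegree = degree;
                best = e.getKey();
            }
        }
        return best;
    }

    // Merge runs of consecutive identical labels into labeled intervals.
    static List<Interval> primitivePatterns(double[] series) {
        List<Interval> result = new ArrayList<>();
        if (series.length == 0) return result;
        String current = label(series[0]);
        int start = 0;
        for (int i = 1; i < series.length; i++) {
            String next = label(series[i]);
            if (!next.equals(current)) {
                result.add(new Interval(current, start, i - 1));
                current = next;
                start = i;
            }
        }
        result.add(new Interval(current, start, series.length - 1));
        return result;
    }

    public static void main(String[] args) {
        double[] discharge = {1.0, 1.5, 5.0, 5.5, 9.0, 8.5, 5.0, 1.0};
        for (Interval iv : primitivePatterns(discharge)) {
            System.out.println(iv);
        }
    }
}
```

For the sample series in `main`, the sketch yields five intervals: LOW over samples 0 to 1, MEDIUM over 2 to 3, HIGH over 4 to 5, MEDIUM at 6, and LOW at 7. Keeping the fuzzy breakpoints as explicit, editable parameters reflects the semi-automatic idea that a domain expert, rather than a black box, decides what "low" or "high" means for a given variable.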
A software prototype of this framework was designed and implemented using the Java and R software technologies. The concept is demonstrated with this prototype and four application examples, which use data generated from mathematical functions, synthetic hydrological data produced by a model, and measured hydrological and hydrodynamic data. The results of the application examples show that the framework can reproduce, from specific scenarios, time series patterns similar to the original ones, and that it can generate artificial time series from scenarios composed according to the interests of users such as numerical modelers. In combination with simulation tools, the concept can therefore be used to assess the impacts of what-if scenarios. The semi-automatic design of the prototype also guards against inappropriate black-box use and allows the knowledge and experience of domain experts to be incorporated. Overall, by narrowing the gap between raw data and simulation tools in a manner suited to engineering practice, the framework is a valuable step towards holistic hydroinformatics systems.