Article

Quantifying the Information Content of a Water Quality Monitoring Network Using Principal Component Analysis: A Case Study of the Freiberger Mulde River Basin, Germany

1
Institute for Integrated Management of Material Fluxes and of Resources (UNU-FLORES), United Nations University, Ammonstrasse 74, 01067 Dresden, Germany
2
Institute for Urban Water Management, Department of Hydrosciences, Technische Universität Dresden (TU Dresden), Bergstrasse 66, 01069 Dresden, Germany
*
Author to whom correspondence should be addressed.
Water 2020, 12(2), 420; https://doi.org/10.3390/w12020420
Submission received: 11 November 2019 / Revised: 29 January 2020 / Accepted: 3 February 2020 / Published: 5 February 2020
(This article belongs to the Section Water Quality and Contamination)

Abstract
Although river water quality monitoring (WQM) networks play an important role in water management, their effectiveness is rarely evaluated. This study aims to evaluate and optimize water quality variables and monitoring sites to explain the spatial and temporal variation of water quality in rivers, using principal component analysis (PCA). A complex water quality dataset from the Freiberger Mulde (FM) river basin in Saxony, Germany, was analyzed; it included 23 water quality (WQ) parameters monitored at 151 monitoring sites from 2006 to 2016. The results showed that the water quality of the FM river basin is mainly impacted by weathering processes, historical mining and industrial activities, agriculture, and municipal discharges. Monitoring 14 critical parameters (boron, calcium, chloride, potassium, sulphate, total inorganic carbon, fluoride, arsenic, zinc, nickel, temperature, oxygen, total organic carbon, and manganese) could explain 75.1% of water quality variability. Both spatial and temporal effects were observed: mineral contents varied mainly between sampling locations, whereas organic matter and oxygen content varied mainly with the sampling month. The most critical monitoring sites were located in the vicinity of the city of Freiberg, and the months of July and September contributed the most to explaining the data variability. In terms of cost-effectiveness, monitoring more parameters at fewer sites would be more economical than monitoring fewer parameters at more sites. This study illustrates a simple yet reliable approach to support water managers in identifying optimum monitoring strategies based on existing monitoring data when monitoring costs need to be reduced.

1. Introduction

Rivers are the main inland freshwater source for domestic, industrial, and agricultural purposes [1]. As a result of the deleterious effects of human activities and population growth, about one-third of the river stretches in Latin America, Africa, and Asia have been affected by severe pathogen contamination, and one-seventh by organic pollution [2]. Additionally, natural processes such as precipitation, erosion, and weathering of crustal materials can contribute to the impairment of water quality in rivers [1,3]. In European water bodies, the emphasis on chemical water quality assessment has shifted to trace contaminants [4]. As a result, over 50% of all European Union rivers, including all rivers in some countries, fail to achieve a good chemical status [5]. To protect and properly manage the rivers, the monitoring of water quality is critical [6,7]. Water quality monitoring (WQM) is defined as the effort to obtain quantitative information on the physical, chemical, and biological characteristics of water bodies via representative sampling [6]. In view of the spatial and temporal variations in the hydrochemistry of rivers, regular monitoring programs are required for reliable estimates of the water quality [1]. This often results in complex datasets of various physicochemical variables, which do not always easily convey meaningful information [8,9]. Researchers have highlighted two main reasons for this “data-rich, but information-poor” syndrome: (a) unclearly defined monitoring objectives [8,10,11] and (b) a lack of specific methods to design WQM networks [10,12].
Previous studies used a variety of methods to assess and optimize the water quality monitoring networks in rivers, including artificial neural networks [13,14], genetic algorithm [15,16], and simple descriptive statistical analysis [17]. These methods seek to identify the water quality parameters, monitoring frequencies, number of samples and location of monitoring sites. According to Nguyen et al. [18], among the available methods, multivariate statistics are the most widely-used techniques to assess the variability of water quality and the efficiency of water quality monitoring networks worldwide. Principal component analysis (PCA) has been a particularly popular method to extract important information from the complicated datasets, with studies dating back to the 1930s [19]. In the field of water quality research, PCA and factor analysis (FA) were usually applied together to identify critical water quality parameters that are responsible for temporal and spatial variations of river water quality [20,21,22,23,24,25,26,27]. However, the application of PCA to identify principal water quality monitoring stations was rarely reported in the literature. Some studies that used PCA for this purpose include Ouyang [28], who used PCA to evaluate the effectiveness of ambient monitoring stations on St. Johns River in Florida, USA, and Wang et al. [29] who combined PCA/FA with cluster analysis to implement the selection of the principal monitoring sites for Tamsui river in Taiwan. Although both these studies indicated the potential of improving the efficiency and economy of the monitoring network [28] through the reduction of monitored parameters and stations, the cost-effectiveness of the proposed monitoring network has not been specifically quantified and deciding upon an optimum option for the monitoring network remains a challenge.
In this study, we aim to identify the relevant water quality parameters and monitoring stations that are responsible for the spatial and temporal variations of basin-wide river water quality. For this purpose, a thorough analysis of the complex water quality data collected from the Freiberger Mulde river basin in eastern Germany was conducted using PCA. Based on this analysis, we propose a different and simple approach to quantify the “information” of the monitoring network, alongside the visualization and interpretation of the PCA outcomes. This study also intends to provide an adoptable approach to evaluate the trade-offs between information provided by the monitoring network and the expenses of the monitoring activities.

2. Materials and Methods

2.1. Study Area

Freiberger Mulde (FM) is a 124-km long siliceous river with a catchment area of 2985 km² [30]. It is the headstream of the three main tributaries of the Mulde River, which is one of the important western tributaries of the Elbe River in Germany. Rising in the Ore Mountains in the Czech Republic and running northwest, the FM river has been historically polluted with heavy metals due to both geogenic and human activities, especially ore mining [31]. Even now, the river basin is still considered a major source of heavy metals to the Elbe River [30].
The monitoring program for surface water of the FM river basin is conducted in the context of the Water Framework Directive (WFD) and aims at collecting data for a status assessment of biological, chemical, and physicochemical water quality elements. A total of 463 water quality parameters have been monitored in the FM river basin since 1999, including general physicochemical parameters, industrial pollutants, pesticides, herbicides, and pharmaceuticals. The monitoring network in the FM river basin comprises 364 measuring points, of which 27 are on the mainstream of the FM River and 337 are on the tributaries of its river network (Figure 1).

2.2. Data Selection and Preparation

The monitoring data for the FM river basin, which have been collected by the Free State of Saxony since 1999 and are freely accessible on its water quality database platform, were used for this research [32]. Preparing the dataset for multivariate statistical analysis consisted of selecting water quality variables and monitoring stations while minimizing missing values. The water quality parameters considered in the analysis include chemical and physicochemical elements that explain the catchment processes, such as the influence of both the drainage basin and local environmental conditions. For this reason, long-term and frequently monitored parameters were prioritized. Parameters with data availability of less than 30%, or with more than 15% censored data (concentrations below the detection limits and/or below the quantitation limits of the analytical methods), were excluded. Censored records were replaced by half of the corresponding detection or quantitation limit. According to the United States Environmental Protection Agency [33], this percentage of censored data is acceptable for a substitution method. The soluble concentrations in total water samples were used for the analysis. Maps of the river basin and monitoring stations were obtained from Saxony’s open-access geodata portal [34].
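The screening and substitution rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the record layout `(value, limit, is_censored)`, the function name `screen_parameter`, and the sample values are hypothetical, and the censored share is assumed to be computed relative to the available (non-missing) records.

```python
def screen_parameter(records, min_availability=0.30, max_censored=0.15):
    """Return substituted values, or None if the parameter is excluded.

    Each record is (value, limit, is_censored); None marks a missing sample.
    This layout is an assumption made for the example.
    """
    available = [r for r in records if r is not None]
    # Exclude parameters with less than 30% data availability.
    if len(available) / len(records) < min_availability:
        return None
    # Exclude parameters with more than 15% censored records
    # (assumed here to be a share of the available records).
    censored_share = sum(1 for _, _, c in available if c) / len(available)
    if censored_share > max_censored:
        return None
    # Substitute censored results with half of the detection/quantitation limit.
    return [limit / 2 if c else value for value, limit, c in available]


# Hypothetical zinc series: one missing sample, one result below the limit.
zinc = [(12.0, 5.0, False), (5.0, 5.0, True), None, (8.0, 5.0, False),
        (20.0, 5.0, False), (9.0, 5.0, False), (11.0, 5.0, False),
        (7.0, 5.0, False)]
cleaned = screen_parameter(zinc)
```

Here one of seven available records is censored (about 14%), so the parameter passes and the censored value is replaced by 2.5.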
The selection of monitoring stations was based on the availability of monitoring data. In this study, we considered the monitoring stations that had at least three years of monitoring data and 12 sampling events (Figure 1). The continuity of the monitoring years was not necessarily required, because the variables were assumed to be independent and identically distributed.
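The station criterion can be expressed as a simple filter. The sketch below assumes that "three years of monitoring data" means three distinct calendar years with at least one sample and that the 12 sampling events are counted in total; the function name `eligible` and the dates are hypothetical.

```python
from datetime import date


def eligible(sampling_dates, min_years=3, min_events=12):
    """True if a station has enough distinct monitoring years and events."""
    years = {d.year for d in sampling_dates}
    return len(years) >= min_years and len(sampling_dates) >= min_events


# Hypothetical station sampled four times a year in 2008, 2011, and 2014.
# The years are deliberately non-consecutive, which the criterion allows.
station = [date(y, m, 15) for y in (2008, 2011, 2014) for m in (2, 5, 8, 11)]
short_record = station[:8]  # only two years and eight events

keep_station = eligible(station)
keep_short = eligible(short_record)
```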

2.3. Principal Component Analysis

Principal component analysis (PCA) is a popular multivariate statistical technique used for dimension reduction [35]. PCA provides information on the most meaningful variables, thus describing the whole dataset and rendering data reduction with a minimum loss of the original information [1]. PCA transforms the original variables into new, uncorrelated variables called the principal components (PCs) [36]. The calculation to obtain PCs is given in Abdi and Williams [19]. In this study, PCA is implemented based on the correlation matrix. In instances where the variables are highly correlated, the first few principal components may be sufficient to describe most of the variability of the dataset [37]. The importance of a component is reflected by its eigenvalue. PCs with eigenvalues less than one are commonly recommended to be ignored [1,22,38]. To strengthen the interpretation, PCs with eigenvalues more than one are subjected to the varimax rotation, which generates rotated components (RCs). RCs further simplify the data structure coming from PCA [1,22]. The varimax rotation technique prevents multiple variables from being loaded to a single component, allowing for easy interpretation of significant variables [24]. Because these rotations are performed in a subspace, the new rotated components explain less variance than the original principal components, but the total variance remains the same after rotation [19].
In PCA, the correlations between a variable and a component are called loadings, which estimate the information that they share. For interpretation, variables with absolute loadings greater than or equal to 0.7 are strongly correlated, those from 0.5 to 0.7 are moderately correlated, and those less than 0.5 are weakly correlated to the component [1,22]. In other words, the larger the absolute loading, the more important that variable is for explaining the component. The projections of the observations onto the components are the factor scores. The importance of an observation for a component can be obtained by the ratio of the squared factor score of this observation to the sum of squared factor scores of all observations in the component [19]. This ratio is called the contribution of the observation to the component. Details of the equations to calculate loadings, factor scores, and observation contributions can be found in Abdi and Williams [19]. For a given component, the sum of the contributions of all observations is equal to 1. Thus, the larger the value of the contribution, the more the observation contributes to explaining the component [19]. In this study, the observation contributions in percentage are used to calculate the importance of the monitoring sites in explaining the spatial variability and the importance of the monitoring months in explaining the temporal variability of the water quality. On each component, the contribution of a monitoring site is calculated as the sum of the contributions of all observations at that site during the whole monitoring period. Similarly, the contribution per monitoring month is calculated as the sum of the contributions of all observations from all sites in that month. The variance explained by a monitoring site on any component is quantified by the product of its contribution and the variance explained by that component.
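As a minimal sketch of the contribution calculation described above, the following Python snippet computes observation contributions from factor scores on one component and aggregates them by site and by month. The factor scores, site labels, and function names are hypothetical; only the 37.6% figure for PC1 comes from the study.

```python
from collections import defaultdict


def contributions(scores):
    """Percent contribution of each observation to one component:
    squared factor score divided by the sum of squared factor scores."""
    total = sum(f * f for f in scores)
    return [100.0 * f * f / total for f in scores]


def aggregate(ctr, labels):
    """Sum observation contributions by label (site name or month)."""
    sums = defaultdict(float)
    for c, label in zip(ctr, labels):
        sums[label] += c
    return dict(sums)


# Hypothetical factor scores on one component for six observations.
scores = [2.0, -1.0, 0.5, -0.5, 1.0, -2.0]
sites = ["A", "A", "B", "B", "C", "C"]
months = [1, 7, 1, 7, 1, 7]

ctr = contributions(scores)
by_site = aggregate(ctr, sites)
by_month = aggregate(ctr, months)

# Variance explained by each site on this component, taking the 37.6%
# of total variance that the study reports for PC1.
component_variance = 37.6
site_variance = {s: c / 100.0 * component_variance for s, c in by_site.items()}
```

By construction, the site contributions sum to 100 and the site variances sum to the variance explained by the component.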
PCA was carried out in R, and varimax rotation was performed with the R package psych [39]. The factor scores, loadings, and contributions of observations can be extracted directly using the R package FactoMineR [40]. Map visualizations of the PCA results were produced in QGIS (version 2.18.16) [41].

3. Results and Discussion

3.1. Data Screening and Descriptive Statistics

A thorough review of the existing dataset revealed that the timing of the sample collection was routine and not intended to capture any specific event. Of the monitoring sites of small tributaries, some were dismissed in 2006 and some were added after 2007. Only the main river and big tributaries such as Zschopau, Flöha, Gimlitz, Pockau, and Hüttenbach have been monitored long enough to obtain continuous data series from 1999 to 2016. This could be a result of the WFD implementation in Germany, where one of the WFD-mandated deadlines for “setting up networks and putting them into operation” was December 2006 [42]. For this reason, the monitoring period considered in this study is restricted to 2006–2016.
Although more than 80 water quality parameters were screened, only 23 parameters were selected based on the criteria mentioned in Section 2.2. After the initial screening, the selected database included 7541 sampling events covering 23 parameters at 151 monitoring sites over a period of 11 years (2006 to 2016). A descriptive statistics summary with the percentages of censored data is presented in Table 1. The monitoring sites cover the large streams of Freiberger Mulde, Zschopau, Große Striegis, Flöha, Bobritzsch, Aschbach, and 75 other smaller tributaries. Most of the parameters do not follow a normal distribution and show high standard deviation and skewness, with the exceptions of oxygen and temperature (Table 1). For principal component analysis, non-normal data are log-transformed and then standardized to zero mean and unit variance to avoid misclassification arising from the different scales and units of the monitored variables.
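The preprocessing step can be illustrated as follows. This is a sketch under stated assumptions: the text does not give the logarithm base or the variance estimator, so base 10 and the sample standard deviation (n − 1) are assumed, and the concentrations are hypothetical.

```python
import math
import statistics


def log_standardize(values, log_transform=True):
    """Optionally log-transform, then scale to zero mean and unit variance."""
    # Base-10 logarithm is an assumption; values must be strictly positive.
    x = [math.log10(v) for v in values] if log_transform else list(values)
    mean = statistics.fmean(x)
    sd = statistics.stdev(x)  # sample standard deviation (n - 1), assumed
    return [(v - mean) / sd for v in x]


# Hypothetical right-skewed concentrations for one parameter.
zinc = [0.01, 0.02, 0.02, 0.05, 0.10, 1.50]
z = log_standardize(zinc)

# Approximately normal parameters (e.g. temperature) skip the log step.
temperature = [2.1, 5.4, 9.8, 14.2, 17.5, 11.0]
t = log_standardize(temperature, log_transform=False)
```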

3.2. Characterized Water Quality Parameters and Sources Identification Based on Factor Loadings

The results of the principal component analysis for the data matrix of (7541 observations × 23 variables) in the FM river basin are shown in Table 2. There are five principal components (PCs) with the eigenvalue more than one, explaining 75.1% of the water quality variability (Figure 2). PC1 is strongly and positively correlated to HCO3, Ca2+, Cl, Mg2+, K+, Na+, SO42−, Boron and TIC and moderately correlated to NO3 and TON. The sources of these ionic concentrations may have multiple origins: rainfall, weathering of silicate and carbonate minerals, dissolved minerals contained in some sedimentary rocks, or leaching from the soil surface during rainstorms [43]. The first PC represents the weathering process and explains 37.6% of the total water quality variability in the river basin. The second component accounts for 12.9% of the observed data variability and has strong negative loadings on zinc and moderate loadings on nickel, fluoride, and arsenic. These trace metals and anions appear naturally in river waters through the weathering of minerals and also anthropogenically through the mixing of industrial effluents into the river streams and non-point pollution sources [44]. Taking into account the historical mining activities in the FM river basin [30], the major sources of PC2 are likely related to abandoned mines in the Ore Mountains. The third component explains 11.9% of the total variance and has negative strong loadings on dissolved organic carbon and total organic carbon and positive moderate loadings on turbidity and NO3. PC3 represents organic matter, which could originate from the natural decomposition of organic material, as well as anthropogenic activities including agriculture and domestic wastewater discharges [45]. PC4 shows the inverse relationship between temperature and oxygen, representing the seasonal effects and explaining 7.4% of the variance. 
PC5 accounts for only 5.3% of the data variability and does not show strong or moderate correlation to any variable.
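As a quick consistency check on the figures above: for a PCA based on the correlation matrix, the eigenvalues sum to the number of variables, so each reported percentage implies an approximate eigenvalue. The sketch below recovers these values for the 23 variables and applies the eigenvalue-greater-than-one rule; the results are approximate because the published percentages are rounded.

```python
# Percent of variance explained by PC1..PC5, as reported above.
explained_pct = [37.6, 12.9, 11.9, 7.4, 5.3]
n_variables = 23  # correlation-matrix PCA: eigenvalues sum to 23

# Implied eigenvalues (approximate, since the percentages are rounded).
eigenvalues = [pct / 100.0 * n_variables for pct in explained_pct]

# Retain components whose eigenvalue exceeds one.
retained = [k + 1 for k, ev in enumerate(eigenvalues) if ev > 1.0]

# Cumulative percentage of variance explained.
cumulative_pct = []
running = 0.0
for pct in explained_pct:
    running += pct
    cumulative_pct.append(round(running, 1))
```

An eigenvalue of one corresponds to 1/23 of the total variance, or roughly 4.3%, which is why PC5 at 5.3% is still retained.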
To strengthen the interpretation, varimax rotation was applied to the first five principal components and resulted in a new set of loadings (Table 2). For the first four components, PCA and varimax-rotated PCA give the same interpretation of the hidden factors that affect the water quality variability of the FM river basin. The first rotated component (RC) links to the major ions and total inorganic carbon but only explains 34.9% instead of 37.6% of the data variability. RC2 also relates to mining activities and weathering processes, but with strong loadings on arsenic and fluoride and moderate loading on zinc, it explains only 11.2% of the total variance. RC3 conveys the same strong correlation to organic carbon and turbidity and accounts for 11.4% of the observed variability. RC4 shows the seasonal effects with strong positive loading on temperature and negative loading on oxygen, which explains 9% of the total variance. It is only on the fifth component that the varimax rotation shows a strong loading of manganese and moderate loadings of nickel and zinc; this component explains 8.7% of the total variance. Therefore, the sources of RC5 could also be the weathering processes and the historic mining activities. Combining the results from PCA and varimax rotation, the major sources of surface water quality variation in the FM river include weathering processes, mining activities, agriculture, seasonality, and wastewater effluents. For the first five components, 75.1% of the variance can be explained by 18 parameters: HCO3, Ca2+, Cl, Mg2+, K+, Na+, SO42−, Boron, TIC, Arsenic, Zinc, Nickel, Fluoride, DOC, TOC, Temperature, Oxygen, and Manganese. Notably, PCA does not give a substantial data reduction, since more than 78% of the parameters (18 out of 23) are needed to explain 75.1% of the data variation.
PCA does not explicitly account for the redundancy of correlated variables. To further reduce the number of monitoring parameters, Pearson’s correlation coefficients for all 23 parameters were computed for the entire monitoring period (Figure 3). If the correlation coefficient is between 0.9 and 1 (or −0.9 and −1), the two variables are highly correlated and can be represented by a linear relationship. Thus, for the paired variables that have correlation coefficients of more than 0.9, one of them could be discarded to reduce the redundancy of the information, e.g., Cl − Na+ (0.92), HCO3 − TIC (0.96), Ca2+ − Mg2+ (0.92), and DOC − TOC (0.94), with the variable of higher loading on the principal component being kept. Consequently, combining the PCA results and Pearson correlation analysis, four parameters (Na+, HCO3, Mg2+, DOC) can be further discarded. As a result, 14 variables (Boron, Calcium, Chloride, Potassium, Sulphate, TIC, Fluoride, Arsenic, Zinc, Nickel, Temperature, Oxygen, TOC, and Manganese) now explain 75% of the total variance, and therefore, should remain under observation.
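The pruning rule described above (for each pair with a correlation above 0.9, keep the variable with the higher loading) can be sketched as follows. The correlation coefficients are those quoted in the text; the loadings are hypothetical placeholders chosen only to illustrate the rule, since the actual values are in Table 2, and the function name is an assumption.

```python
def prune_redundant(pairs, loadings, threshold=0.9):
    """For each highly correlated pair, drop the variable whose absolute
    loading on the principal component is lower."""
    dropped = set()
    for (a, b), r in pairs.items():
        if abs(r) > threshold:
            weaker = a if abs(loadings[a]) < abs(loadings[b]) else b
            dropped.add(weaker)
    return dropped


# Pearson correlation coefficients reported in the text.
pairs = {
    ("Cl", "Na"): 0.92,
    ("HCO3", "TIC"): 0.96,
    ("Ca", "Mg"): 0.92,
    ("DOC", "TOC"): 0.94,
}

# HYPOTHETICAL loadings for illustration only (not taken from Table 2).
loadings = {"Cl": 0.90, "Na": 0.88, "HCO3": 0.85, "TIC": 0.87,
            "Ca": 0.91, "Mg": 0.89, "DOC": -0.80, "TOC": -0.82}

dropped = prune_redundant(pairs, loadings)
remaining = 18 - len(dropped)  # 18 parameters before pruning
```

With these placeholder loadings, the rule reproduces the reduction from 18 to 14 parameters described in the text.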

3.3. Spatial and Temporal Variability of Water Quality Based on the Contribution of Observations

Variation of water quality is captured and represented by sampling points (geographical or pollution effect) and sampling months (seasonal effect) [20]. In this study, the contributions of monitoring sites were used to visualize the spatial variation of water quality in the FM river basin (Figure 4). For a given component, the contributions of all monitoring sites, expressed in percent, sum to 100, and the variances explained by the individual sites sum to the variance explained by that component. A contribution of one percent is recommended as the threshold for deciding whether a monitoring site is critical on a specific component. Accordingly, 25 monitoring sites contribute more than one percent to PC1. They are located mostly on the upstream tributaries (16 sites) and partly on the FM river and its small first-order streams (9 sites). Together, these monitoring sites account for 23 of the 37.6 percentage points of variance explained by the first component. The weathering process is therefore spatially dependent and plays an important role in the upstream and mainstream of the FM river basin in explaining the water quality variance. The second component shows higher contributions in three places: Wilisch (in the upper west of the river basin) and the Roter Graben and Münzbach streams, which are close to Freiberg city where abandoned mines and heavy industries are located. In contrast, the contributions to the third component (organic matter) are homogeneously distributed across the river basin, with only minor variations between monitoring sites. Notably, the site with the highest contribution to PC3 (10.1%) is located on Lampertsbach, which is a 5.6-km long stream running through the populated area of Cranzahl and connected to the Sehma river. The site with the second-highest contribution to organic matter is on the Schwarze Pockau, derived from bog water in the Ore Mountains.
Like PC3, the fourth component (PC4) is evenly distributed among the monitoring sites, with a maximum contribution of only 1.75% on the Zschopau river, again showing that temperature and oxygen contribute only minor spatial variation. The most relevant sites for the fifth component (PC5) are located on the Graben and Münzbach streams, both of which are affected by mining discharges from the industry of Freiberg city.
The temporal variability of each principal component was analyzed using monthly contributions, calculated by summing the contributions of all observations in each calendar month. For each component, the contributions over the 12 months sum to 100%; if every month contributes equally, the sampling time has a negligible impact on that component. Fluctuating contributions across months indicate that the component is subject to temporal variation, with a higher contribution suggesting a stronger influence of that time period. Figure 5 shows the monthly contributions (January to December), aggregated over the entire monitoring period, for the first five components. The first component (PC1) remains almost constant over the 12 months, with a minor contribution shift observed during the cold season of November to March, potentially because the rate of chemical weathering decreases with decreasing temperature [46]. This also indicates the minor influence of the sampling months on the mineral contents in the FM river basin. For the second component, contributions are lower during warm periods and follow a peculiar alternating pattern, with low contributions in even months and high contributions in odd months. While the reason for this oscillating contribution remains unclear, the observations suggest that the monitoring schemes favored odd months over even months. The fluctuations in components three and four are quite similar: in PC3, the maximum variation of organic matter is observed in July (12.1%), which is about twice the contribution of April (5.6%) and December (5.9%). The higher contribution of organic matter from June to September could be related to lower and more variable flow during summertime. 
In PC4, the extreme warm (July to September) and cold months (January to March) play a bigger role than the milder months of April, May, and October in demonstrating the variation. The fifth component resembles patterns of the first and second component, with less seasonal variation of manganese in the FM river basin.
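One simple way to read the monthly contributions is to compare them with the share each month would have if sampling time had no effect, that is, 100/12 ≈ 8.3%. The sketch below does this for the three PC3 values quoted above; the "relative contribution" ratio is an illustrative device introduced here, not a quantity defined in the study.

```python
# Monthly contributions to PC3 in percent, as quoted in the text.
# Values for the other months are not given in the text and are omitted.
pc3_contrib = {"April": 5.6, "July": 12.1, "December": 5.9}

# Share per month if every month contributed equally.
uniform_share = 100.0 / 12

# Ratio above 1: the month contributes more than an equal share.
relative = {m: c / uniform_share for m, c in pc3_contrib.items()}

above_average = sorted(m for m, r in relative.items() if r > 1.0)
```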
Based on the contributions of observations from PCA, the mineral contents (major ions) in the FM river basin are mainly impacted by sampling locations rather than seasonality. In contrast, sampling months play a more important role in explaining the variation of organic matter, temperature, and oxygen than sampling locations. Both monitoring schemes and locations influence the heavy metals variation in the FM river basin. Temporally, July and September contribute the most and December the least in explaining the data variability. As an implication, monitoring strategies should focus more on the warmer months to capture the most variability of water quality in the FM river basin. The significance of discharge variability for concentration variability should be further studied. Spatially, areas close to Freiberg city and to the upper west of the river basin are the hotspots in terms of heavy metals and mineral contents in explaining the water quality variations.

3.4. Cost-Effectiveness of Proposed Water Quality Monitoring Network Based on PCA Results

Although the principal component analysis helped to identify the critical factors, variables, and monitoring sites that explain the water quality variability, this information still does not constitute a criterion to decide if the proposed variables and sites present the optimum options. This section strives to provide a solution to this problem by quantifying the cost-effectiveness of the monitoring network based on the results from PCA. According to Harmancioglu, et al. [10], a possible way of measuring the benefits of monitoring practice can be the information conveyed by the collected data. This study was conducted under the assumption that the “effectiveness” or the “information” of a monitoring network corresponds to the water quality variance deriving from the monitoring data collected. The information is therefore equivalent to the variance explained by the principal components: specifically, if only strong loading parameters on the first component (Ca2+, Cl, Mg2+, K+, Na+, SO42−, Boron, and TIC) are monitored for all monitoring sites, then only 37.6% of water quality variability is preserved. Cumulatively, if all 10 strong loading variables on PC1 and PC2 are monitored (at all 151 monitoring sites), then the monitoring network retains 50.5% of its information. Depending on the monitoring requirement, the water managers can select the parameters for observations on specific components accordingly.
Monitoring costs in the state of Saxony are program-based and the monitoring prices of different parameters are not available. Therefore, we estimated the monitoring costs based on the 2019 services’ price list of Brandenburg, another State in Germany that neighbors Saxony [47]. These estimations include the cost of transportation (for an average of 10 monitoring sites per day), sampling, and laboratory analysis. Detailed prices are given in Table 3. If only the laboratory cost was considered, monitoring of organic matter (PC3), temperature and oxygen (PC4), and inorganic contents (PC1) would be more economical compared to the heavy metals (PC2 and PC5), with the percentage of information achieved per euro being 0.71, 0.68, 0.58, and 0.19, respectively. Furthermore, monitoring of all 23 variables appeared to be less economical than monitoring the 14 critical variables of the first five components.
Each monitoring site has a different contribution to a component to explain the total variance; the variance explained by a monitoring site i on j components, denoted $v_{i,j}$, is calculated as:
$$v_{i,j} = ctr_{i,1} \times v_{1} + ctr_{i,2} \times v_{2} + \dots + ctr_{i,j} \times v_{j}$$
where $ctr_{i,j}$ is the contribution of monitoring site i on component j, and $v_{j}$ is the variance explained by the j-th component. The variance explained by monitoring sites on each component is given in Annex 1. According to the monitoring variables and number of monitoring sites, sampling and monitoring costs are estimated for one monitoring event. To quantify the cost at different levels of information achieved, the monitoring costs were estimated for five scenarios:
  • PC1: monitoring of six variables strongly correlated to PC1 (Ca2+, Cl, K+, SO42−, Boron, and TIC) and obtaining 37.6% of information accordingly;
  • PC1,2: monitoring of 10 variables strongly correlated to PC1 and PC2 (Ca2+, Cl, K+, SO42−, Boron, TIC, Fluoride, Arsenic, Zinc, Nickel) and obtaining 50.5% of information accordingly;
  • PC1,3,4: monitoring of nine variables strongly correlated to PC1, PC3, and PC4 (Ca2+, Cl, K+, SO42−, Boron, TIC, TOC, temperature, oxygen) and obtaining 56.9% of information accordingly;
  • PC1-5: monitoring of 14 variables correlated to the first five components (Ca2+, Cl, K+, SO42−, Boron, TIC, Fluoride, Arsenic, Zinc, Nickel, TOC, Oxygen, Temperature, Manganese) and obtaining 75.1% of information accordingly; and
  • All PC: monitoring of all 23 variables and obtaining 100% of the information.
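The information levels of the scenarios above follow directly from the variance explained by each component, and the same component variances weight the site contributions in the formula for the variance explained by a site. The following Python sketch reproduces both calculations; the function names and the example site contributions are hypothetical, and `site_variance` is generalized to an arbitrary set of components so that it also covers the PC1,3,4 scenario.

```python
# Percent of total variance explained by PC1..PC5 (from the PCA results).
v = {1: 37.6, 2: 12.9, 3: 11.9, 4: 7.4, 5: 5.3}


def information(components):
    """Information (percent of variance) retained by a set of components."""
    return round(sum(v[k] for k in components), 1)


scenarios = {
    "PC1": [1],
    "PC1,2": [1, 2],
    "PC1,3,4": [1, 3, 4],
    "PC1-5": [1, 2, 3, 4, 5],
}
info = {name: information(ks) for name, ks in scenarios.items()}


def site_variance(ctr_pct, components):
    """Variance explained by one site over the selected components.

    ctr_pct maps component number to the site's contribution in percent.
    """
    return sum(ctr_pct[k] / 100.0 * v[k] for k in components)


# HYPOTHETICAL contributions (percent) of one site to PC1 and PC2.
example_site = {1: 2.0, 2: 5.0}
v_site = site_variance(example_site, [1, 2])
```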
An adaptation of the cost-effectiveness plane illustrating the information and the costs of different monitoring options is shown in Figure 6. Monitoring sites are ranked in descending order of their contributions to the variance explained (given in Appendix A) to calculate the cumulative variance. The cost-effectiveness plane in our case consists of four quadrants: high information/low cost, high information/high cost, low information/high cost, and low information/low cost. Five strategies of variable selection according to the variance explained by principal components are also displayed in the same diagram (Figure 6). The current monitoring practice of 23 variables at 151 monitoring sites would give 100% of information on data variability at an estimated 51,507 euro per monitoring event (equivalent to 100% cost). A reduction of monitoring sites or WQ variables would result in a decrease in the information achieved as well as the monitoring costs. As such, monitoring of six variables of PC1 (Ca2+, Cl, K+, SO42−, Boron, and TIC) at 151 sites would cost 17,487 euro (40% of the total cost) but would only give 37.6% of the information. Monitoring the 10 variables of PC1 and PC2 (Ca2+, Cl, K+, SO42−, Boron, TIC, Fluoride, Arsenic, Zinc, Nickel) at 151 sites would explain 50.5% of the information at the cost of 26,411 euro (~51.2% compared to the total cost). The PC1 curve lies completely in the low information/low cost quadrant, while the PC1,2 curve exceeds 50% of the cost at 148 sites. The combination of three components (PC1, 3, and 4) with cost-effective variables explains 57% of the data variability at the cost of 21,669 euro for all sites, which provides more information at less cost than the combination of PC1 and 2. The high laboratory costs of heavy metals make the cost of the PC1,2 curve increase faster than the information added, as compared to the PC1 and PC1,3,4 curves. 
Although the PC1-5 and All PC curves extend across three quadrants, monitoring more variables at fewer sites is deemed more effective than the opposite practice. For example, to achieve 75.1% of the information, measuring 14 variables at 151 sites would cost 34,837 euro, whereas monitoring all variables at 72 sites provides the same amount of information for only 24,616 euro. It is noteworthy that the strategies covering all parameters (All PC), the main parameters (PC1-5), and the cost-effective parameters (PC1,3,4) perform similarly up to 45% of the information, although they differ in their emphasis on the number of sites versus the number of parameters. Other options with different levels of cost and information can be compared easily using the ranking of monitoring sites in Appendix A (Table A1) and the price list in Table 3.
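The site ranking behind these curves can be illustrated with a short Python sketch. It assumes that the cumulative information is simply the running sum of site contributions sorted in descending order; the function name `sites_needed` is hypothetical, and the data are the "All PCs" contributions of five of the highest-ranked sites in Table A1.

```python
def sites_needed(contributions, target):
    """Smallest number of top-ranked sites whose summed contribution (%)
    reaches the target; returns None if the target is never reached."""
    running = 0.0
    for count, value in enumerate(sorted(contributions, reverse=True), start=1):
        running += value
        if running >= target:
            return count
    return None


# "All PCs" contributions (%) of five sites, copied from Table A1.
top_sites = {
    "OBF35601": 3.774,  # Lampertsbach
    "OBF32900": 3.363,  # Münzbach
    "OBF33010": 3.273,  # Roter Graben
    "OBF34404": 3.170,  # Greifenbach
    "OBF38200": 2.695,  # Rote Pockau
}

n_sites = sites_needed(top_sites.values(), target=10.0)
print(n_sites)
```

Applying the same function to the full "All PCs" column of Table A1 with a target of 75.1 would give the number of sites required to match the information of the PC1-5 strategy.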
The most challenging aspect of this approach is selecting the representative variables for the principal components. In this study, variables with strong loadings (> 0.7) and a low correlation coefficient (< 0.9) were selected; because the other variables were ignored, part of the variance explained is inevitably lost. In addition, PCA requires input without missing data; thus, this approach is data-dependent and only applicable when a decision must be made to remove monitoring sites or WQ parameters. Quantifying information by the variance explained limits the objective of the designed monitoring network to determining changes in water quality, without consideration of other specific objectives such as trend detection and compliance monitoring. Finally, the cost estimation was simplified to a single monitoring event and does not consider the other fixed and operational costs of a monitoring program. To address these limitations and provide a more effective monitoring network design, future research should consider quantifying multi-objective monitoring networks (data quality, information accuracy, statistical methods, monitoring costs, stakeholder views, social factors, etc.) and monitoring frequencies.

4. Conclusions

This study demonstrates the usefulness of principal component analysis (PCA) for analyzing a complex dataset to support water quality management in rivers. PCA proved useful for the analysis of 11 years of irregular monitoring data from the Freiberger Mulde (FM) river basin, comprising 23 water quality parameters and 151 monitoring sites. A combination of PCA and Pearson’s correlation analysis allowed the identification of 14 critical parameters that explain 75.1% of the data variability in the river basin. Weathering processes, historical mining, wastewater discharges, and seasonality are the main causes of the river water quality variability. The contributions calculated from factor scores are very insightful for interpreting the spatial and temporal sources of water quality variations. Heavy metals, for example, are impacted by both sampling location and sampling time. Specifically, the Wilisch (in the upper west of the river basin), Roter Graben, and Münzbach streams, which are close to the city of Freiberg, appear to be the best selections for monitoring heavy metals. Monitoring of those significant sites is recommended to guarantee the continuity of effective water quality monitoring in the future. The mineral contents play an important role in explaining the water quality variations of the FM river basin and are impacted more by the sampling locations than by the sampling months. The variation of organic matter, oxygen, and temperature, in contrast, depends more on the sampling months than on the sampling locations, with July and September contributing the highest variability in water quality. Temporally, the five major factors explaining the water quality of the FM river basin vary the most in July and September and the least in December; hence, the future monitoring scheme should concentrate more on the warmer months.
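How contributions derived from factor scores can be grouped by site or by month is sketched below in Python. This is a minimal illustration that assumes the common PCA definition of contribution (the squared factor score of an observation divided by the sum of squared scores on that component); it is not necessarily the authors' exact procedure, and the scores, labels, and function name are hypothetical.

```python
from collections import defaultdict


def contributions_by_group(scores, groups):
    """Share (%) of one component's variance attributable to each group.

    scores: factor scores of the observations on a single component
    groups: group label (e.g. site ID or sampling month) of each observation
    """
    total = sum(score * score for score in scores)
    shares = defaultdict(float)
    for score, group in zip(scores, groups):
        shares[group] += 100.0 * score * score / total
    return dict(shares)


# Hypothetical factor scores on one component, labeled by sampling month.
demo_scores = [2.0, -1.0, 0.5, -0.5, 1.0, -2.0]
demo_months = ["Jul", "Jul", "Dec", "Dec", "Sep", "Sep"]

monthly = contributions_by_group(demo_scores, demo_months)
for month, share in monthly.items():
    print(f"{month}: {share:.1f}%")
```

In this made-up example the July and September samples carry most of the component's variance, which is the kind of pattern the temporal interpretation above relies on.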
This work establishes a simple framework for quantifying the cost-effectiveness of monitoring networks based on PCA results for the FM river basin. Under the current monitoring-intense conditions, preserving monitoring variables rather than sites seems to be more economical than the opposite practice. To explain 75% of the variance, monitoring 23 parameters at 72 sites is recommended over monitoring 14 parameters at 151 sites; the first option costs 24,616 euro per monitoring event instead of 34,837 euro, a saving equivalent to about 20% of the current total cost. The choice of variable selection strategy becomes more important as the required cost reduction grows. Up to 40% of the information can be retained for less than 15% of current costs, at 21 sites with all variables, 31 sites with the main variables (PCs 1-5), or 50 sites with the more economical variables (PCs 1, 3, and 4). This approach is restricted to quantifying the basin-wide variability of water quality based on the previously established water quality variables and sampling sites. Monitoring frequencies still need to be quantified in order to assess the effectiveness of the monitoring network. Often, monitoring is intended to assess the state or development of a water body. Objectives such as trend detection or compliance assessment require evaluation criteria other than the information explained. The monitoring costs in this study were estimated only from laboratory, transportation, and sampling costs, but the costs of the whole monitoring program can easily be incorporated into the presented approach if data on the monitoring period, frequencies, and other costs (logistics, personnel, maintenance, etc.) are available. This approach may support water managers and practitioners in selecting the optimum monitoring sites and variables through a rational understanding of the dynamic sources of water quality when there is a need to reduce monitoring costs.

Author Contributions

Conceptualization: T.H.N., B.H.; data analysis: T.H.N.; writing and first draft preparation: T.H.N.; reviewing: B.H., H.H., S.C.; research supervision: H.H., P.K. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the German Academic Exchange Service (DAAD) and the United Nations University Institute for Integrated Management of Material Fluxes and of Resources (UNU-FLORES). Open Access funding was provided by the Publication Fund of the TU Dresden.

Acknowledgments

We would like to thank our colleagues Stacy Roden, Louisa Andrews, and Mahesh Jampani for proofreading this manuscript. We also appreciate the open data policy of the Saxon State Office for the Environment, Agriculture and Geology as well as the provision of monitoring costs by the Berlin-Brandenburg state laboratory.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. The variance (in percentage) explained by each monitoring site on selected principal components.
Site | River | PC1 | PC2 | PC3 | PC4 | PC5 | PC1,2 | PC1,3,4 | PC1-5 | All PCs
OBF31301 | Freiberger Mulde | 0.206 | 0.015 | 0.125 | 0.092 | 0.059 | 0.222 | 0.424 | 0.498 | 0.689
OBF31302 | Zethaubach | 0.066 | 0.037 | 0.054 | 0.032 | 0.004 | 0.102 | 0.151 | 0.192 | 0.283
OBF31303 | Helbigsdorfer Bach | 0.032 | 0.078 | 0.037 | 0.048 | 0.009 | 0.110 | 0.117 | 0.205 | 0.296
OBF31400 | Freiberger Mulde | 0.081 | 0.041 | 0.076 | 0.043 | 0.036 | 0.122 | 0.200 | 0.277 | 0.422
OBF31500 | Freiberger Mulde | 0.094 | 0.011 | 0.053 | 0.036 | 0.040 | 0.105 | 0.183 | 0.234 | 0.573
OBF31510 | Freiberger Mulde | 0.156 | 0.059 | 0.052 | 0.030 | 0.046 | 0.215 | 0.238 | 0.343 | 0.650
OBF31520 | Freiberger Mulde | 0.042 | 0.039 | 0.012 | 0.009 | 0.010 | 0.081 | 0.063 | 0.112 | 0.182
OBF31530 | Stangenbergbach | 0.075 | 0.220 | 0.042 | 0.012 | 0.045 | 0.295 | 0.130 | 0.395 | 0.544
OBF31540 | Hüttenbach | 0.312 | 0.069 | 0.071 | 0.032 | 0.038 | 0.381 | 0.415 | 0.522 | 0.793
OBF31600 | Freiberger Mulde | 0.181 | 0.198 | 0.037 | 0.040 | 0.027 | 0.379 | 0.258 | 0.483 | 0.806
OBF31601 | Kleinwaltersdorfer Bach | 0.020 | 0.011 | 0.023 | 0.056 | 0.010 | 0.031 | 0.100 | 0.121 | 0.213
OBF31610 | Freiberger Mulde | 0.212 | 0.073 | 0.019 | 0.023 | 0.035 | 0.285 | 0.255 | 0.362 | 0.429
OBF31700 | Freiberger Mulde | 0.739 | 0.255 | 0.070 | 0.070 | 0.112 | 0.994 | 0.880 | 1.246 | 1.510
OBF31701 | Freiberger Mulde | 0.178 | 0.049 | 0.017 | 0.008 | 0.028 | 0.227 | 0.204 | 0.281 | 0.333
OBF31710 | Freiberger Mulde | 0.215 | 0.039 | 0.028 | 0.017 | 0.029 | 0.253 | 0.260 | 0.328 | 0.395
OBF31711 | Pitzschebach | 0.151 | 0.013 | 0.057 | 0.089 | 0.118 | 0.164 | 0.298 | 0.428 | 0.686
OBF31800 | Freiberger Mulde | 0.189 | 0.035 | 0.021 | 0.024 | 0.023 | 0.224 | 0.233 | 0.291 | 0.346
OBF31801 | Marienbach | 0.178 | 0.035 | 0.043 | 0.045 | 0.041 | 0.212 | 0.266 | 0.342 | 0.442
OBF31900 | Freiberger Mulde | 0.195 | 0.034 | 0.022 | 0.024 | 0.022 | 0.229 | 0.241 | 0.297 | 0.358
OBF31950 | Freiberger Mulde | 0.182 | 0.029 | 0.029 | 0.031 | 0.015 | 0.211 | 0.241 | 0.285 | 0.357
OBF32000 | Freiberger Mulde | 0.462 | 0.065 | 0.070 | 0.076 | 0.038 | 0.527 | 0.608 | 0.711 | 0.859
OBF32001 | Gärtitzer Bach | 0.598 | 0.083 | 0.043 | 0.074 | 0.063 | 0.682 | 0.715 | 0.862 | 1.005
OBF32201 | Görnitzbach | 0.790 | 0.247 | 0.170 | 0.086 | 0.059 | 1.036 | 1.046 | 1.351 | 1.571
OBF32202 | Schickelsbach | 0.347 | 0.062 | 0.075 | 0.036 | 0.042 | 0.409 | 0.458 | 0.562 | 0.690
OBF32203 | Polkenbach | 0.559 | 0.143 | 0.050 | 0.063 | 0.045 | 0.702 | 0.672 | 0.860 | 0.991
OBF32204 | Polkenbach | 0.334 | 0.055 | 0.019 | 0.047 | 0.052 | 0.389 | 0.399 | 0.507 | 0.598
OBF32205 | Fritzschenbach | 0.365 | 0.064 | 0.073 | 0.072 | 0.070 | 0.429 | 0.511 | 0.645 | 0.778
OBF32206 | Schanzenbach | 0.493 | 0.232 | 0.217 | 0.088 | 0.086 | 0.726 | 0.798 | 1.117 | 1.299
OBF32300 | Freiberger Mulde | 0.251 | 0.032 | 0.092 | 0.092 | 0.057 | 0.282 | 0.434 | 0.523 | 0.742
OBF32600 | Chemnitzbach | 0.079 | 0.035 | 0.115 | 0.044 | 0.022 | 0.114 | 0.237 | 0.294 | 0.406
OBF32601 | Voigtsdorfer Bach | 0.087 | 0.005 | 0.063 | 0.026 | 0.008 | 0.092 | 0.176 | 0.189 | 0.342
OBF32700 | Grosshartmannsdorfer Bach | 0.064 | 0.110 | 0.096 | 0.081 | 0.011 | 0.175 | 0.242 | 0.363 | 0.568
OBF32750 | Gimmlitz | 0.293 | 0.027 | 0.116 | 0.067 | 0.008 | 0.320 | 0.476 | 0.511 | 0.657
OBF32800 | Gimmlitz | 0.100 | 0.048 | 0.105 | 0.043 | 0.015 | 0.148 | 0.249 | 0.312 | 0.452
OBF32900 | Münzbach | 2.102 | 0.153 | 0.243 | 0.089 | 0.206 | 2.255 | 2.434 | 2.793 | 3.363
OBF32901 | Münzbach | 0.223 | 0.391 | 0.157 | 0.063 | 0.045 | 0.613 | 0.442 | 0.878 | 1.398
OBF32903 | Münzbach | 0.349 | 0.050 | 0.039 | 0.024 | 0.038 | 0.399 | 0.413 | 0.501 | 0.728
OBF33010 | Roter Graben | 0.157 | 1.909 | 0.055 | 0.054 | 0.675 | 2.066 | 0.266 | 2.849 | 3.273
OBF33020 | Roter Graben | 0.414 | 1.154 | 0.035 | 0.054 | 0.425 | 1.568 | 0.504 | 2.082 | 2.405
OBF33090 | Bobritzsch | 0.033 | 0.003 | 0.245 | 0.103 | 0.017 | 0.036 | 0.381 | 0.401 | 0.699
OBF33100 | Bobritzsch | 0.018 | 0.048 | 0.086 | 0.060 | 0.073 | 0.066 | 0.164 | 0.285 | 0.515
OBF33111 | Dittmannsdorfer Bach | 0.180 | 0.040 | 0.066 | 0.046 | 0.005 | 0.219 | 0.291 | 0.336 | 0.465
OBF33200 | Bobritzsch | 0.051 | 0.061 | 0.100 | 0.106 | 0.100 | 0.112 | 0.257 | 0.418 | 0.657
OBF33300 | Sohrbach | 0.018 | 0.018 | 0.042 | 0.042 | 0.015 | 0.036 | 0.101 | 0.134 | 0.455
OBF33400 | Colmnitzbach | 0.023 | 0.017 | 0.033 | 0.048 | 0.020 | 0.040 | 0.105 | 0.142 | 0.234
OBF33500 | Rodelandbach | 0.039 | 0.015 | 0.071 | 0.061 | 0.007 | 0.054 | 0.170 | 0.193 | 0.319
OBF33601 | Erbisdorfer Wasser | 0.046 | 0.063 | 0.058 | 0.035 | 0.007 | 0.109 | 0.139 | 0.210 | 0.350
OBF33650 | Grosse Striegis | 0.007 | 0.064 | 0.052 | 0.005 | 0.093 | 0.071 | 0.065 | 0.222 | 0.296
OBF33701 | Oberreichenbacher Bach | 0.025 | 0.031 | 0.059 | 0.043 | 0.019 | 0.055 | 0.126 | 0.176 | 0.255
OBF33702 | Schirmbach | 0.007 | 0.011 | 0.030 | 0.046 | 0.005 | 0.018 | 0.083 | 0.099 | 0.207
OBF33703 | Kemnitzbach | 0.014 | 0.024 | 0.120 | 0.056 | 0.011 | 0.038 | 0.190 | 0.225 | 0.406
OBF33710 | Grosse Striegis | 0.041 | 0.009 | 0.032 | 0.035 | 0.005 | 0.051 | 0.108 | 0.123 | 0.254
OBF33711 | Langhennersdorfer Bach | 0.057 | 0.037 | 0.045 | 0.034 | 0.007 | 0.094 | 0.136 | 0.181 | 0.243
OBF33713 | Aschbach | 0.058 | 0.029 | 0.027 | 0.068 | 0.037 | 0.088 | 0.154 | 0.220 | 0.468
OBF33800 | Grosse Striegis | 0.096 | 0.012 | 0.061 | 0.066 | 0.008 | 0.108 | 0.223 | 0.243 | 0.422
OBF33900 | Grosse Striegis | 0.249 | 0.027 | 0.085 | 0.093 | 0.010 | 0.276 | 0.427 | 0.464 | 0.676
OBF34101 | Pahlbach | 0.043 | 0.025 | 0.037 | 0.035 | 0.028 | 0.069 | 0.116 | 0.169 | 0.287
OBF34200 | Kleine Striegis | 0.178 | 0.054 | 0.034 | 0.056 | 0.006 | 0.232 | 0.267 | 0.328 | 0.425
OBF34300 | Klatschbach | 0.506 | 0.104 | 0.085 | 0.107 | 0.015 | 0.611 | 0.698 | 0.818 | 1.219
OBF34390 | Geyerbach | 0.271 | 0.340 | 0.006 | 0.006 | 0.050 | 0.611 | 0.283 | 0.673 | 0.831
OBF34400 | Zschopau | 1.173 | 0.049 | 0.097 | 0.022 | 0.023 | 1.222 | 1.292 | 1.364 | 1.553
OBF34401 | Geyerbach | 0.158 | 0.338 | 0.057 | 0.058 | 0.013 | 0.496 | 0.273 | 0.625 | 0.719
OBF34403 | Greifenbach | 0.271 | 0.165 | 0.052 | 0.083 | 0.012 | 0.437 | 0.407 | 0.584 | 0.699
OBF34404 | Greifenbach | 1.937 | 0.322 | 0.326 | 0.021 | 0.039 | 2.258 | 2.283 | 2.644 | 3.170
OBF34405 | Zschopau | 0.203 | 0.015 | 0.028 | 0.008 | 0.009 | 0.218 | 0.239 | 0.262 | 0.352
OBF34409 | Zschopau | 0.043 | 0.038 | 0.043 | 0.030 | 0.013 | 0.081 | 0.117 | 0.167 | 0.260
OBF34601 | Hüttenbach | 0.124 | 0.046 | 0.112 | 0.107 | 0.103 | 0.170 | 0.343 | 0.492 | 0.690
OBF34700 | Zschopau | 0.016 | 0.013 | 0.026 | 0.022 | 0.011 | 0.029 | 0.064 | 0.088 | 0.135
OBF34701 | Venusberger Dorfbach | 0.056 | 0.010 | 0.067 | 0.032 | 0.005 | 0.066 | 0.155 | 0.171 | 0.282
OBF34710 | Zschopau | 0.006 | 0.007 | 0.016 | 0.014 | 0.009 | 0.013 | 0.036 | 0.053 | 0.079
OBF34801 | Dittmannsdorfer Bach | 0.009 | 0.014 | 0.052 | 0.027 | 0.005 | 0.023 | 0.089 | 0.108 | 0.183
OBF34802 | Schwarzbach | 0.010 | 0.010 | 0.085 | 0.028 | 0.040 | 0.021 | 0.124 | 0.174 | 0.272
OBF34890 | Zschopau | 0.013 | 0.010 | 0.029 | 0.035 | 0.016 | 0.023 | 0.077 | 0.103 | 0.155
OBF34900 | Zschopau | 0.024 | 0.025 | 0.058 | 0.050 | 0.022 | 0.049 | 0.132 | 0.179 | 0.287
OBF34901 | Eubaer Bach | 0.382 | 0.020 | 0.052 | 0.046 | 0.007 | 0.403 | 0.479 | 0.507 | 0.673
OBF34910 | Zschopau | 0.026 | 0.019 | 0.053 | 0.072 | 0.035 | 0.045 | 0.151 | 0.206 | 0.321
OBF35001 | Mühlbach | 0.016 | 0.020 | 0.051 | 0.026 | 0.032 | 0.036 | 0.093 | 0.145 | 0.231
OBF35002 | Lützelbach | 0.181 | 0.017 | 0.019 | 0.040 | 0.008 | 0.199 | 0.240 | 0.266 | 0.365
OBF35003 | Holzbach | 0.132 | 0.013 | 0.036 | 0.034 | 0.013 | 0.146 | 0.203 | 0.229 | 0.308
OBF35101 | Ottendorfer Bach | 0.097 | 0.028 | 0.038 | 0.046 | 0.011 | 0.125 | 0.181 | 0.220 | 0.306
OBF35102 | Altmittweidaer Bach | 0.339 | 0.050 | 0.061 | 0.076 | 0.011 | 0.390 | 0.476 | 0.538 | 0.685
OBF35103 | Auenbach | 0.103 | 0.048 | 0.036 | 0.041 | 0.010 | 0.151 | 0.180 | 0.239 | 0.320
OBF35200 | Zschopau | 0.041 | 0.020 | 0.090 | 0.078 | 0.024 | 0.061 | 0.209 | 0.252 | 0.390
OBF35251 | Schweikershainer Bach | 0.151 | 0.059 | 0.054 | 0.063 | 0.016 | 0.210 | 0.267 | 0.343 | 0.445
OBF35252 | Richzenhainer Bach | 0.324 | 0.092 | 0.069 | 0.066 | 0.012 | 0.416 | 0.459 | 0.564 | 0.736
OBF35253 | Richzenhainer Bach | 0.595 | 0.034 | 0.055 | 0.073 | 0.056 | 0.629 | 0.723 | 0.813 | 0.998
OBF35254 | Gebersbach | 0.340 | 0.084 | 0.089 | 0.069 | 0.051 | 0.425 | 0.498 | 0.634 | 0.834
OBF35255 | Eulitzbach | 0.374 | 0.059 | 0.137 | 0.089 | 0.094 | 0.433 | 0.599 | 0.752 | 1.012
OBF35257 | Mortelbach | 0.222 | 0.072 | 0.028 | 0.030 | 0.006 | 0.294 | 0.280 | 0.358 | 0.456
OBF35258 | Mortelbach | 0.182 | 0.027 | 0.055 | 0.096 | 0.048 | 0.209 | 0.333 | 0.408 | 0.688
OBF35310 | Zschopau | 0.008 | 0.004 | 0.017 | 0.010 | 0.005 | 0.012 | 0.035 | 0.044 | 0.070
OBF35350 | Zschopau | 0.075 | 0.063 | 0.137 | 0.130 | 0.042 | 0.138 | 0.341 | 0.447 | 0.683
OBF35391 | Rote Pfütze | 0.007 | 0.119 | 0.025 | 0.016 | 0.149 | 0.126 | 0.047 | 0.315 | 0.445
OBF35400 | Rote Pfütze | 0.110 | 0.009 | 0.077 | 0.041 | 0.008 | 0.119 | 0.228 | 0.245 | 0.358
OBF35490 | Sehma | 1.355 | 0.013 | 0.088 | 0.057 | 0.020 | 1.368 | 1.500 | 1.533 | 1.736
OBF35600 | Sehma | 0.100 | 0.024 | 0.025 | 0.019 | 0.003 | 0.124 | 0.143 | 0.171 | 0.223
OBF35601 | Lampertsbach | 1.127 | 0.320 | 1.209 | 0.117 | 0.019 | 1.446 | 2.453 | 2.791 | 3.774
OBF35602 | Lampertsbach | 0.119 | 0.007 | 0.007 | 0.009 | 0.003 | 0.125 | 0.135 | 0.145 | 0.216
OBF35650 | Sehma | 0.046 | 0.020 | 0.014 | 0.007 | 0.003 | 0.066 | 0.067 | 0.090 | 0.142
OBF35800 | Sehma | 0.070 | 0.036 | 0.056 | 0.064 | 0.053 | 0.106 | 0.190 | 0.279 | 0.572
OBF35802 | Sehma | 0.102 | 0.193 | 0.023 | 0.077 | 0.003 | 0.295 | 0.201 | 0.397 | 0.565
OBF36000 | Pöhlbach | 0.051 | 0.023 | 0.035 | 0.014 | 0.007 | 0.074 | 0.101 | 0.130 | 0.289
OBF36100 | Pöhlbach | 0.031 | 0.007 | 0.021 | 0.014 | 0.009 | 0.038 | 0.066 | 0.081 | 0.207
OBF36200 | Pöhlbach | 0.037 | 0.019 | 0.055 | 0.055 | 0.008 | 0.056 | 0.147 | 0.173 | 0.442
OBF36300 | Pöhlbach | 0.026 | 0.012 | 0.058 | 0.035 | 0.007 | 0.038 | 0.119 | 0.138 | 0.266
OBF36400 | Pressnitz | 1.101 | 0.015 | 0.078 | 0.031 | 0.048 | 1.116 | 1.210 | 1.274 | 1.572
OBF36402 | Steinbach | 0.262 | 0.024 | 0.036 | 0.038 | 0.013 | 0.285 | 0.335 | 0.372 | 0.494
OBF36403 | Haselbach | 0.384 | 0.011 | 0.024 | 0.028 | 0.004 | 0.394 | 0.436 | 0.450 | 0.629
OBF36404 | Sandbach | 0.015 | 0.019 | 0.040 | 0.021 | 0.010 | 0.033 | 0.076 | 0.104 | 0.200
OBF36450 | Pressnitz | 0.122 | 0.004 | 0.018 | 0.012 | 0.002 | 0.126 | 0.151 | 0.158 | 0.189
OBF36500 | Pressnitz | 0.292 | 0.022 | 0.075 | 0.056 | 0.014 | 0.314 | 0.423 | 0.458 | 0.598
OBF36600 | Jöhstädter Schwarzwasser | 0.553 | 0.023 | 0.047 | 0.039 | 0.015 | 0.575 | 0.639 | 0.677 | 0.952
OBF36601 | Jöhstädter Schwarzwasser | 0.230 | 0.014 | 0.028 | 0.031 | 0.004 | 0.243 | 0.288 | 0.306 | 0.396
OBF36700 | Rauschenbach | 0.118 | 0.029 | 0.079 | 0.028 | 0.014 | 0.147 | 0.225 | 0.268 | 0.435
OBF36793 | Wilisch | 0.036 | 0.091 | 0.085 | 0.057 | 0.046 | 0.126 | 0.178 | 0.315 | 0.444
OBF36794 | Wilisch | 0.131 | 0.874 | 0.032 | 0.060 | 0.010 | 1.004 | 0.224 | 1.107 | 1.469
OBF36795 | Wilisch | 0.029 | 0.287 | 0.022 | 0.029 | 0.006 | 0.316 | 0.079 | 0.372 | 0.520
OBF36797 | Wilisch | 0.015 | 0.062 | 0.048 | 0.031 | 0.017 | 0.077 | 0.094 | 0.173 | 0.238
OBF36800 | Wilisch | 0.065 | 0.116 | 0.051 | 0.110 | 0.060 | 0.182 | 0.227 | 0.404 | 0.714
OBF36801 | Jahnsbach | 0.022 | 0.017 | 0.066 | 0.040 | 0.006 | 0.039 | 0.127 | 0.151 | 0.259
OBF36803 | Jahnsbach | 0.271 | 0.192 | 0.015 | 0.062 | 0.005 | 0.463 | 0.349 | 0.546 | 0.647
OBF36850 | Flöha | 0.787 | 0.024 | 0.034 | 0.022 | 0.006 | 0.811 | 0.843 | 0.873 | 0.997
OBF36911 | Cämmerswalder Dorfbach | 0.120 | 0.031 | 0.089 | 0.028 | 0.006 | 0.151 | 0.236 | 0.274 | 0.378
OBF36912 | Mortelbach | 0.098 | 0.031 | 0.073 | 0.032 | 0.006 | 0.129 | 0.203 | 0.241 | 0.355
OBF37000 | Flöha | 0.446 | 0.033 | 0.101 | 0.065 | 0.018 | 0.479 | 0.612 | 0.663 | 0.801
OBF37001 | Rungstockbach | 0.509 | 0.009 | 0.023 | 0.028 | 0.021 | 0.518 | 0.560 | 0.590 | 0.721
OBF37010 | Flöha | 0.236 | 0.015 | 0.057 | 0.049 | 0.011 | 0.251 | 0.342 | 0.368 | 0.499
OBF37101 | Saidenbach | 0.064 | 0.055 | 0.026 | 0.022 | 0.018 | 0.120 | 0.112 | 0.185 | 0.266
OBF37103 | Saidenbach | 0.064 | 0.079 | 0.058 | 0.046 | 0.020 | 0.143 | 0.168 | 0.267 | 0.359
OBF37104 | Haselbach | 0.167 | 0.076 | 0.064 | 0.038 | 0.019 | 0.243 | 0.269 | 0.364 | 0.475
OBF37105 | Lautenbach | 0.288 | 0.054 | 0.056 | 0.060 | 0.079 | 0.342 | 0.404 | 0.538 | 0.691
OBF37106 | Röthenbach | 0.118 | 0.043 | 0.057 | 0.054 | 0.015 | 0.161 | 0.229 | 0.287 | 0.371
OBF37300 | Flöha | 0.097 | 0.035 | 0.142 | 0.080 | 0.018 | 0.131 | 0.318 | 0.371 | 0.544
OBF37400 | Schweinitz | 0.479 | 0.025 | 0.220 | 0.074 | 0.012 | 0.504 | 0.773 | 0.810 | 0.990
OBF37401 | Seiffener Bach | 0.023 | 0.013 | 0.030 | 0.074 | 0.061 | 0.037 | 0.127 | 0.201 | 0.325
OBF37404 | Seiffener Bach | 0.006 | 0.033 | 0.066 | 0.125 | 0.008 | 0.038 | 0.197 | 0.237 | 0.353
OBF37450 | Natzschung | 0.300 | 0.003 | 0.030 | 0.013 | 0.002 | 0.303 | 0.343 | 0.349 | 0.390
OBF37500 | Natzschung | 1.465 | 0.024 | 0.178 | 0.079 | 0.010 | 1.489 | 1.722 | 1.756 | 1.970
OBF37600 | Bielabach | 0.039 | 0.049 | 0.043 | 0.025 | 0.009 | 0.089 | 0.107 | 0.165 | 0.251
OBF37800 | Schwarze Pockau | 0.908 | 0.012 | 0.351 | 0.051 | 0.007 | 0.919 | 1.310 | 1.328 | 1.503
OBF37910 | Schwarze Pockau | 1.196 | 0.022 | 0.423 | 0.089 | 0.009 | 1.219 | 1.708 | 1.740 | 1.976
OBF38000 | Schwarze Pockau | 0.149 | 0.063 | 0.220 | 0.071 | 0.052 | 0.212 | 0.440 | 0.555 | 0.725
OBF38100 | Rote Pockau | 0.024 | 0.033 | 0.084 | 0.025 | 0.046 | 0.057 | 0.133 | 0.212 | 0.300
OBF38101 | Rote Pockau | 0.058 | 0.179 | 0.032 | 0.026 | 0.001 | 0.237 | 0.115 | 0.295 | 0.337
OBF38190 | Rote Pockau | 0.001 | 0.118 | 0.013 | 0.032 | 0.006 | 0.118 | 0.046 | 0.170 | 0.210
OBF38200 | Rote Pockau | 1.732 | 0.025 | 0.436 | 0.050 | 0.017 | 1.757 | 2.219 | 2.260 | 2.695
OBF38201 | Schlettenbach | 0.088 | 0.015 | 0.020 | 0.034 | 0.034 | 0.103 | 0.143 | 0.192 | 0.305
OBF38400 | Grosse Lössnitz | 0.019 | 0.080 | 0.097 | 0.052 | 0.019 | 0.099 | 0.168 | 0.266 | 0.477
OBF38401 | Gahlenzer Bach | 0.025 | 0.027 | 0.049 | 0.045 | 0.022 | 0.052 | 0.120 | 0.169 | 0.304
OBF38402 | Weissbach | 0.058 | 0.049 | 0.055 | 0.040 | 0.073 | 0.106 | 0.153 | 0.274 | 0.469
OBF38500 | Hetzbach | 0.037 | 0.044 | 0.136 | 0.068 | 0.034 | 0.082 | 0.242 | 0.320 | 0.504
Total variance explained (%) | | 37.6 | 12.9 | 11.9 | 7.4 | 5.3 | 50.5 | 56.9 | 75.1 | 100
Values in bold show the important sites on the principal components. PC3 (organic matter) and PC4 (seasonality) show a minor dependence on the sampling locations, with the variance distributed quite homogeneously among the sites. On the first five components, the contribution of a monitoring site can be calculated as the quotient of its variance and the total variance explained by the component.
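As a worked example of this quotient, the short Python sketch below uses two entries from Table A1. The function name is hypothetical and the snippet is only an illustration of the arithmetic.

```python
def site_contribution(site_variance, component_variance):
    """Contribution (%) of one site to a component: site variance divided by
    the total variance explained by that component (both in % as in Table A1)."""
    return 100.0 * site_variance / component_variance


# Münzbach (OBF32900) on PC1: 2.102 of the 37.6% explained by PC1.
muenzbach_pc1 = site_contribution(2.102, 37.6)

# Roter Graben (OBF33010) on PC2: 1.909 of the 12.9% explained by PC2.
roter_graben_pc2 = site_contribution(1.909, 12.9)

print(f"Münzbach on PC1: {muenzbach_pc1:.2f}%")
print(f"Roter Graben on PC2: {roter_graben_pc2:.2f}%")
```

Under this reading, the Münzbach site alone accounts for about 5.6% of PC1 and the Roter Graben site for about 14.8% of PC2.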

References

  1. Singh:, K.P.; Malik, A.; Mohan, D.; Sinha, S. Multivariate statistical techniques for the evaluation of spatial and temporal variations in water quality of Gomti River (India)—A case study. Water Res. 2004, 38, 3980–3992. [Google Scholar] [CrossRef]
  2. UNEP. A Snapshot of the World’s Water Quality: Towards a Global Assessment; United Nations Environment Programme: Nairobi, Kenya, 2016; p. 162. [Google Scholar]
  3. Carpenter, S.R.; Caraco, N.F.; Correll, D.L.; Howarth, R.W.; Sharpley, A.N.; Smith, V.H. Nonpoint Pollution of Surface Waters with Phosphorus and Nitrogen. Ecol. Appl. 1998, 8, 559–568. [Google Scholar] [CrossRef]
  4. Arle, J.; Mohaupt, V.; Kirst, I. Monitoring of Surface Waters in Germany under the Water Framework Directive—A Review of Approaches, Methods and Results. Water 2016, 8, 217. [Google Scholar] [CrossRef]
  5. European Environment Agency Chemical Status of Surface Water Bodies. Available online: https://www.eea.europa.eu/themes/water/european-waters/water-quality-and-water-assessment/water-assessments/chemical-status-of-surface-water-bodies (accessed on 3 January 2020).
  6. Sanders, T.G. Design of Networks for Monitoring Water Quality; Water Resources Publication: Colorado, CO, USA, 1983. [Google Scholar]
  7. United Nations, World Health Organization. Water Quality Monitoring: A Practical Guide to the Design and Implementation of Freshwater Quality Studies and Monitoring Programmes, 1st ed.; Bartram, J., Ballance, R., Eds.; E & FN Spon: London, UK; New York, NY, USA, 1996; ISBN 978-0-419-22320-7. [Google Scholar]
  8. Ward, R.C.; Loftis, J.C.; McBride, G.B. The “Data rich but Information-poor” Syndrome in Water Quality Monitoring. Environ. Manag. 1986, 10, 291–297. [Google Scholar] [CrossRef]
  9. Dixon, W.; Chiswell, B. Review of aquatic monitoring program design. Water Res. 1996, 30, 1935–1948. [Google Scholar] [CrossRef]
  10. Harmancioglu, N.B.; Alpaslan, N. Water quality monitoring network design: A problem of multi-objective decision making. JAWRA J. Am. Water Resour. Assoc. 1992, 28, 179–192. [Google Scholar] [CrossRef]
  11. Behmel, S.; Damour, M.; Ludwig, R.; Rodriguez, M.J. Water quality monitoring strategies—A review and future perspectives. Sci. Total Environ. 2016, 571, 1312–1329. [Google Scholar] [CrossRef]
  12. Strobl, R.O.; Robillard, P.D. Network design for water quality monitoring of surface freshwaters: A review. J. Environ. Manag. 2008, 87, 639–648. [Google Scholar] [CrossRef]
  13. Singh, K.P.; Basant, A.; Malik, A.; Jain, G. Artificial neural network modeling of the river water quality—A case study. Ecol. Model. 2009, 220, 888–895. [Google Scholar] [CrossRef]
  14. Pérez, C.J.; Vega-Rodríguez, M.A.; Reder, K.; Flörke, M. A Multi-Objective Artificial Bee Colony-based optimization approach to design water quality monitoring networks in river basins. J. Clean. Prod. 2017, 166, 579–589. [Google Scholar] [CrossRef]
  15. Park, S.Y.; Choi, J.H.; Wang, S.; Park, S.S. Design of a water quality monitoring network in a large river system using the genetic algorithm. Ecol. Model. 2006, 199, 289–297. [Google Scholar] [CrossRef]
  16. Puri, D.; Borel, K.; Vance, C.; Karthikeyan, R. Optimization of a Water Quality Monitoring Network Using a Spatially Referenced Water Quality Model and a Genetic Algorithm. Water 2017, 9, 704. [Google Scholar] [CrossRef] [Green Version]
  17. Couto, C.M.C.M.; Ribeiro, C.; Maia, A.; Santos, M.; Tiritan, M.E.; Ribeiro, A.R.; Pinto, E.; Almeida, A. Assessment of Douro and Ave River (Portugal) lower basin water quality focusing on physicochemical and trace element spatiotemporal changes. J. Environ. Sci. Health Part A 2018, 0, 1–11. [Google Scholar] [CrossRef] [PubMed]
  18. Nguyen, T.H.; Helm, B.; Hettiarachchi, H.; Caucci, S.; Krebs, P. The selection of design methods for river water quality monitoring networks: A review. Environ. Earth Sci. 2019, 78. [Google Scholar] [CrossRef]
  19. Abdi, H.; Williams, L.J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459. [Google Scholar] [CrossRef]
  20. Vega, M.; Pardo, R.; Barrado, E.; Debán, L. Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis. Water Res. 1998, 32, 3581–3592. [Google Scholar] [CrossRef]
  21. Simeonov, V.; Stratis, J.A.; Samara, C.; Zachariadis, G.; Voutsa, D.; Anthemidis, A.; Sofoniou, M.; Kouimtzis, T. Assessment of the surface water quality in Northern Greece. Water Res. 2003, 37, 4119–4124. [Google Scholar] [CrossRef]
  22. Singh, K.P.; Malik, A.; Sinha, S. Water quality assessment and apportionment of pollution sources of Gomti river (India) using multivariate statistical techniques—a case study. Anal. Chim. Acta 2005, 538, 355–374. [Google Scholar] [CrossRef]
  23. Zhang, X.; Wang, Q.; Liu, Y.; Wu, J.; Yu, M. Application of multivariate statistical techniques in the assessment of water quality in the Southwest New Territories and Kowloon, Hong Kong. Environ. Monit. Assess. 2011, 173, 17–27. [Google Scholar] [CrossRef]
  24. Kim, M.; Kim, Y.; Kim, H.; Piao, W.; Kim, C. Enhanced monitoring of water quality variation in Nakdong River downstream using multivariate statistical techniques. Desalination Water Treat. 2016, 57, 12508–12517. [Google Scholar] [CrossRef]
  25. Calazans, G.M.; Pinto, C.C.; da Costa, E.P.; Perini, A.F.; Oliveira, S.C. Using multivariate techniques as a strategy to guide optimization projects for the surface water quality network monitoring in the Velhas river basin, Brazil. Environ. Monit. Assess. 2018, 190, 726. [Google Scholar] [CrossRef] [PubMed]
  26. Pinto, C.C.; Calazans, G.M.; Oliveira, S.C. Assessment of spatial variations in the surface water quality of the Velhas River Basin, Brazil, using multivariate statistical analysis and nonparametric statistics. Environ. Monit. Assess. 2019, 191, 164. [Google Scholar] [CrossRef] [PubMed]
  27. Peña-Guzmán, C.A.; Soto, L.; Diaz, A. A Proposal for Redesigning the Water Quality Network of the Tunjuelo River in Bogotá, Colombia through a Spatio-Temporal Analysis. Resources 2019, 8, 64. [Google Scholar] [CrossRef] [Green Version]
  28. Ouyang, Y. Evaluation of river water quality monitoring stations by principal component analysis. Water Res. 2005, 39, 2621–2635. [Google Scholar] [CrossRef]
  29. Wang, Y.B.; Liu, C.W.; Liao, P.Y.; Lee, J.J. Spatial pattern assessment of river water quality: Implications of reducing the number of monitoring stations and chemical parameters. Environ. Monit. Assess. 2014, 186, 1781–1792. [Google Scholar] [CrossRef]
  30. Greif, A. The impact of mining activities in the Ore Mountains on the Mulde river catchment upstream of the Mulde reservoir lake. Hydrol. Wasserbewirtsch. 2015, 59, 318–331. [Google Scholar]
  31. Klemm, W.; Greif, A.; Broekaert, J.A.C.; Siemens, V.; Junge, F.W.; van der Veen, A.; Schultze, M.; Duffek, A. A Study on Arsenic and the Heavy Metals in the Mulde River System. Acta Hydrochim. Hydrobiol. 2005, 33, 475–491. [Google Scholar] [CrossRef]
  32. Dimmer, R. Gewässergütedaten. Available online: https://www.umwelt.sachsen.de/umwelt/wasser/7112.htm (accessed on 28 June 2019).
  33. US EPA. Guidance for Data Quality Assessment-Practical Methods for Data Analysis; United States Environmental Protection Agency: Washington, WA, USA, 2000.
  34. Löwig, M. Geodatendownload des Fachbereichs Wasser. Available online: https://www.umwelt.sachsen.de/umwelt/wasser/10002.htm?data=beschaffenheit (accessed on 18 June 2019).
  35. Hubert, M.; Reynkens, T.; Schmitt, E.; Verdonck, T. Sparse PCA for High-Dimensional Data with Outliers. Technometrics 2016, 58, 424–434. [Google Scholar] [CrossRef]
  36. Odom, K.R. Assessment and redesign of the synoptic water quality monitoring network in the Great Smoky Mountains National Park. Ph.D. Thesis, University of Tennessee, Knoxville, Tennessee, 2003. [Google Scholar]
  37. Guigues, N.; Desenfant, M.; Hance, E. Combining multivariate statistics and analysis of variance to redesign a water quality monitoring network. Environ. Sci. Process. Impacts 2013, 15, 1692. [Google Scholar] [CrossRef]
  38. Shrestha, S.; Kazama, F. Assessment of surface water quality using multivariate statistical techniques: A case study of the Fuji river basin, Japan. Environ. Model. Softw. 2007, 22, 464–475. [Google Scholar] [CrossRef]
  39. Revelle, W. Psych: Procedures for Psychological, Psychometric, and Personality Research; 2019; Software; Available online: https://cran.r-project.org/web/packages/psych/index.html (accessed on 8 January 2020).
  40. Husson, F.; Josse, J.; Le, S.; Mazet, J.; Husson, M.F. Package ‘FactoMineR.’. In Package FactorMineR; 2019; Software; Available online: http://factominer.free.fr/.
  41. QGIS Development Team. QGIS Geographic Information System; Open Source Geospatial Foundation Project, 2019; Software; Available online: https://www.qgis.org/en/site/.
  42. LAWA German Guidance Document for the Implementation of the EC Water Framework Directive; Bund/Länder-Arbeitsgemeinschaft Wasser: Berlin, Germany, 2003.
  43. Meybeck, M. Global occurrence of major elements in rivers. Treatise Geochem. 2003, 5, 207–223. [Google Scholar]
  44. Salomons, W.; Förstner, U. Metals in the Hydrocycle; Springer: Berlin/Heidelberg, Germany, 1984; ISBN 978-3-642-69327-4. [Google Scholar]
  45. Imai, A.; Fukushima, T.; Matsushige, K.; Hwan Kim, Y. Fractionation and characterization of dissolved organic matter in a shallow eutrophic lake, its inflowing rivers, and other organic matter sources. Water Res. 2001, 35, 4019–4028. [Google Scholar] [CrossRef]
  46. Kump, L.R.; Brantley, S.L.; Arthur, M.A. Chemical weathering, atmospheric CO2, and climate. Annu. Rev. Earth Planet. Sci. 2000, 28, 611–667. [Google Scholar] [CrossRef] [Green Version]
  47. Landeslabor Berlin-Brandenburg Leistungs-verzeichnis. Available online: https://www.landeslabor.berlin-brandenburg.de/sixcms/detail.php/883783 (accessed on 10 November 2019).
Figure 1. Location map of the study area with number of sampling events per water quality monitoring (WQM) site during the monitoring period of 2006 to 2016 on the Freiberger Mulde river basin in eastern Germany.
Figure 2. Eigenvalues and cumulative variance explained for 23 principal components from principal component analysis (PCA).
Figure 3. Pearson correlation matrix of 23 parameters during the whole monitoring period. The numbers are correlation coefficients.
Figure 4. Spatial variation of the five major factors in explaining the data variability in Freiberger Mulde river basin based on the contribution of monitoring sites to (a) Principal component 1—weathering and leaching processes, (b) Principal component 2—industrial and mining impacts, (c) Principal component 3—organic matter, (d) Principal component 4—seasonality, (e) Principal component 5—weathering and mining impacts.
Figure 5. Temporal variation of the five major factors in explaining the data variability in Freiberger Mulde river basin over the whole monitoring period of 2006 to 2016 based on the contribution of 12 monitoring months.
Figure 6. The cost-information plane for different monitoring strategies. PC1 (37.6%)—monitoring 6 water quality variables at one to 151 sites, PC1,2 (50.5%)—monitoring 10 variables at one to 151 sites, PC1,3,4—monitoring 9 variables at one to 151 sites, PC1-5 (75.1%)—monitoring 14 variables at one to 151 sites, and All PC (100%)—monitoring 23 variables at one to 151 sites.
Table 1. Statistical summary of 23 analyzed water quality parameters in the Freiberger Mulde river basin from 2006 to 2016.
ParameterUnitMeanSDMedianMinMaxSkewKurtosisCensor Data (%)
Arsenicµg/L7.6926.6220.214809.84117.643
Bariumµg/L50.7625.464634806.2282.290
Bicarbonate (HCO3)mg/L55.3452.233905602.679.983.1
Boronµg/L32.8746.36222.8310007.8191.140.4
Calcium (Ca2+)mg/L30.6222.37252.11802.135.750
Chloride (Cl)mg/L31.2133.53231.1150013.24497.140
Dissolved organic carbon (DOC)mg/L3.743.023.10.35656.3268.790.5
Fluoridemg/L0.260.360.20.04109.1140.361.09
Magnesium (Mg2+)mg/L7.85.236.20.8502.145.410
Manganeseµg/L98.85523.2200.7184009.62101.350.5
Nickelµg/L3.566.232.20.35956.1244.766.1
Nitrate (NO3)mg/L20.1212.6318.50.491000.680.230
Oxygenmg/L10.861.5110.72.316.9−0.080.710
pH(-)7.360.57.44.39.8−1.76.90
Potassium (K+)mg/L4.536.593.40.436024.151146.830
Sodium (Na+)mg/L20.6526.67151.2100010.67268.960
Sulphate (SO42−)mg/L51.540.493975503.8121.850
Temperature°C9.364.999.4−1.126.40.14-0.750
Total inorganic carbon (TIC)mg/L9.629.926.450.351002.7210.371.2
Total organic carbon (TOC)mg/L4.634.663.70.351209.12149.020.3
Total organic nitrogen (TON)mg/L0.90.870.60.07152.7317.059.3
TurbidityTE/F7.8725.453.40110022.91781.860.6
Zincµg/L187.91203.27132.122100011.05140.116.4
Table 2. Loadings of the variables on the first five components according to PCA and varimax-rotated PCA.
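Columns PC1 to PC5: PCA. Columns RC1, RC3, RC2, RC5, RC4: Varimax-Rotated PCA.

| Parameters | PC1 | PC2 | PC3 | PC4 | PC5 | RC1 | RC3 | RC2 | RC5 | RC4 |
|---|---|---|---|---|---|---|---|---|---|---|
| Arsenic | 0.31 | −0.52 | −0.27 | 0.37 | 0.35 | 0.09 | 0.01 | 0.8 | 0.03 | 0.23 |
| Barium | 0.34 | 0.03 | 0.23 | 0.17 | 0.36 | 0.35 | −0.12 | 0.26 | −0.34 | −0.07 |
| Bicarbonate | 0.88 | 0.34 | 0.03 | 0.02 | −0.01 | 0.88 | 0.2 | −0.03 | −0.17 | 0.19 |
| Boron | 0.81 | −0.17 | −0.12 | 0.04 | 0.1 | 0.7 | 0.16 | 0.4 | 0.1 | 0.15 |
| Calcium | 0.91 | 0.03 | 0.25 | −0.06 | −0.19 | 0.95 | −0.08 | 0.01 | 0.13 | 0.08 |
| Chloride | 0.9 | −0.17 | 0.12 | −0.06 | 0.06 | 0.87 | 0.01 | 0.32 | 0.12 | −0.01 |
| DOC | 0.11 | 0.38 | −0.77 | −0.31 | 0.15 | −0.02 | 0.92 | −0.06 | −0.05 | 0.09 |
| Fluoride | 0.35 | −0.63 | −0.2 | 0.23 | 0.33 | 0.14 | −0.03 | 0.82 | 0.15 | 0.09 |
| Magnesium | 0.83 | 0.02 | 0.28 | −0.08 | −0.36 | 0.89 | −0.15 | −0.12 | 0.24 | 0.11 |
| Manganese | 0.22 | −0.55 | −0.32 | −0.44 | −0.34 | 0.13 | 0.22 | 0.18 | 0.81 | −0.09 |
| Nickel | 0.19 | −0.67 | −0.11 | −0.1 | −0.2 | 0.08 | −0.1 | 0.38 | 0.62 | 0.00 |
| Nitrate | 0.58 | 0.1 | 0.5 | −0.09 | 0.13 | 0.69 | −0.23 | 0.00 | −0.18 | −0.23 |
| Oxygen | −0.29 | −0.12 | 0.48 | −0.66 | 0.36 | −0.12 | −0.11 | −0.07 | −0.02 | −0.93 |
| pH | 0.6 | 0.47 | 0.17 | 0.02 | 0.29 | 0.66 | 0.16 | −0.04 | −0.49 | −0.02 |
| Potassium | 0.89 | 0.03 | −0.09 | 0.00 | 0.13 | 0.82 | 0.24 | 0.29 | −0.03 | 0.12 |
| Sodium | 0.89 | −0.09 | −0.03 | −0.07 | 0.15 | 0.82 | 0.18 | 0.35 | 0.05 | 0.02 |
| Sulphate | 0.84 | −0.11 | 0.11 | −0.12 | −0.35 | 0.84 | −0.03 | 0.01 | 0.36 | 0.13 |
| Temperature | 0.33 | 0.14 | −0.47 | 0.69 | −0.22 | 0.15 | 0.14 | 0.15 | −0.1 | 0.9 |
| TIC | 0.88 | 0.3 | 0.08 | 0.05 | −0.02 | 0.89 | 0.13 | −0.01 | −0.16 | 0.19 |
| TOC | 0.12 | 0.36 | −0.81 | −0.34 | 0.15 | −0.02 | 0.96 | −0.04 | −0.02 | 0.09 |
| TON | 0.55 | 0.1 | −0.17 | −0.14 | 0.12 | 0.5 | 0.33 | 0.13 | −0.02 | 0.01 |
| Turbidity | 0.38 | 0.09 | −0.5 | −0.29 | −0.07 | 0.28 | 0.6 | 0.01 | 0.23 | 0.1 |
| Zinc | 0.18 | −0.84 | −0.11 | −0.1 | 0.14 | 0.03 | −0.09 | 0.69 | 0.51 | −0.17 |
| Eigenvalue | 8.646 | 2.973 | 2.743 | 1.703 | 1.214 | 8.027 | 2.632 | 2.565 | 2.062 | 1.993 |
| Variance | 0.376 | 0.129 | 0.119 | 0.074 | 0.053 | 0.349 | 0.114 | 0.112 | 0.09 | 0.087 |
| Cumulative Variance | 0.376 | 0.505 | 0.624 | 0.698 | 0.751 | 0.349 | 0.463 | 0.575 | 0.665 | 0.751 |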
Values in bold are strong loadings, values in italic are moderate loadings.
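The Variance and Cumulative Variance rows of Table 2 follow directly from the eigenvalues. As a minimal sketch, assuming the PCA was run on 23 standardized variables (so the total variance equals 23), each share is the eigenvalue divided by 23. The eigenvalues are copied from Table 2; the function name `variance_shares` is hypothetical.

```python
# Eigenvalues copied from Table 2 (unrotated PCs and varimax-rotated RCs).
N_VARIABLES = 23  # number of water quality parameters in the analysis
pca_eigenvalues = [8.646, 2.973, 2.743, 1.703, 1.214]
rotated_eigenvalues = [8.027, 2.632, 2.565, 2.062, 1.993]


def variance_shares(eigenvalues, n_variables=N_VARIABLES):
    """Return (share, cumulative share) per component, rounded to 3 decimals.

    Assumes standardized variables, so total variance equals n_variables.
    """
    shares, cumulative, running = [], [], 0.0
    for eigenvalue in eigenvalues:
        running += eigenvalue
        shares.append(round(eigenvalue / n_variables, 3))
        cumulative.append(round(running / n_variables, 3))
    return shares, cumulative


pca_shares, pca_cumulative = variance_shares(pca_eigenvalues)
rc_shares, rc_cumulative = variance_shares(rotated_eigenvalues)
```

Both sets of eigenvalues sum to the same total, which is why the rotation redistributes variance among the five components but leaves the cumulative 75.1% unchanged.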
Table 3. Price in euro for transportation, sampling, and laboratory analysis according to Brandenburg services price list in 2019.
| Items | Related Principal Component | Price (Euro) | Analytical Method | Price per Principal Component (Euro) | Variance per Principal Component (%) | Information per Price (%/Euro) |
|---|---|---|---|---|---|---|
| Total inorganic carbon | PC1 | 16.8 | | 64.6 | 37.6 | 0.58 |
| Boron | PC1 | 19.7 | DIN EN ISO 17294-2 2005-02 (E 29) | | | |
| Chloride, Sulphate, Calcium, Sodium, Potassium, Fluoride, Magnesium, Nitrate | PC1 | 28.1 * | DIN EN ISO 10304-1:2009-07 (D 20) | | | |
| Arsenic | PC2 | 19.7 | DIN EN ISO 17294-2 2005-02 (E 29) | 59.1 | 12.9 | 0.22 |
| Zinc | PC2 | 19.7 | DIN EN ISO 17294-2 2005-02 (E 29) | | | |
| Nickel | PC2 | 19.7 | DIN EN ISO 17294-2 2005-02 (E 29) | | | |
| Total organic carbon | PC3 | 16.8 | DIN EN 12260:1996 (H 34); DIN EN 1484: 1997-08 (H 3) | 16.8 | 11.9 | 0.71 |
| Temperature | PC4 | 1.9 | DIN 38404 Teil 4 (C4) | 10.9 | 7.4 | 0.68 |
| Oxygen | PC4 | 9 | EN 25814:1992 (G22) DIN 3840-G23 | | | |
| Manganese | PC5 | 28.1 | DIN 38406-E Serie | 28.1 | 5.3 | 0.19 |
| First 5 PCs | | | | 179.5 | 75.1 | 0.42 |
| pH | All PC | 9 | DIN 38404-5:2009-07 (C5) | 290 | 100 | 0.34 |
| Turbidity | All PC | 9 | DIN EN ISO 7027: 2000-04 | | | |
| Barium | All PC | 19.7 | DIN EN ISO 17294-2 2005-02 (E 29) | | | |
| DOC | All PC | 35.4 | DIN EN 1484: 1997-08 (H 3) | | | |
| Bicarbonate | All PC | 1.9 | DEV D8: 1971 | | | |
| TON | All PC | 35.4 | DIN EN 1484: 1997-08 (H 3) | | | |
| Transportation from 1 km to 100 km | | 152 | | | | |
| Sampling with basic efforts | | 35.1 | | | | |
* Price for all the listed parameters in a single analysis; price is assumed to be equivalent to TOC.
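The last column of Table 3 is the explained variance of each principal component divided by the laboratory price of its parameters. The sketch below reproduces that arithmetic from the values in the table; the dictionary layout and the names `components` and `information_per_price` are assumptions made for illustration, and transportation and sampling costs are deliberately left out.

```python
# (price per principal component in euro, explained variance in %), from Table 3.
components = {
    "PC1": (64.6, 37.6),
    "PC2": (59.1, 12.9),
    "PC3": (16.8, 11.9),
    "PC4": (10.9, 7.4),
    "PC5": (28.1, 5.3),
}


def information_per_price(price_euro, variance_pct):
    """Explained variance (%) bought per euro of laboratory analysis."""
    return variance_pct / price_euro


# Information per price for each component, rounded as in the table.
ratios = {
    name: round(information_per_price(price, variance), 2)
    for name, (price, variance) in components.items()
}

# Totals for the first five components.
total_price = round(sum(price for price, _ in components.values()), 1)
total_variance = round(sum(variance for _, variance in components.values()), 1)
total_ratio = round(total_variance / total_price, 2)

# Components ordered from most to least information per euro.
ranking = sorted(components, key=lambda name: information_per_price(*components[name]), reverse=True)
```

On these figures, PC3 (total organic carbon) and PC4 (temperature and oxygen) deliver the most explained variance per euro, while PC5 (manganese) delivers the least.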

Nguyen, T.H.; Helm, B.; Hettiarachchi, H.; Caucci, S.; Krebs, P. Quantifying the Information Content of a Water Quality Monitoring Network Using Principal Component Analysis: A Case Study of the Freiberger Mulde River Basin, Germany. Water 2020, 12, 420. https://doi.org/10.3390/w12020420
