A comprehensive procedure to develop water quality index: A case study to the Huong river in Thua Thien Hue province, Central Vietnam

This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

This work proposed a novel procedure for Water Quality Index (WQI) development that could be used for practical applications on a local or regional scale, based on available monitoring data. Principal component analysis (PCA) was applied to the monthly data of 11 water quality parameters (pH, conductivity (EC), total suspended solids (TSS), dissolved oxygen (DO), five-day biological oxygen demand (BOD), chemical oxygen demand (COD), ammonia (N-NH4), nitrate (N-NO3), phosphate (P-PO4), total coliform, and total dissolved iron) monitored at 11 sites on the Huong river in the years 2014–2016. The three principal components extracted by the PCA explained 67% of the total variance of the original variables. The weight (wi) of each parameter was determined from the set of communality values. Linear sub-index functions were established from the permissible limits in the National Technical Regulations on Surface Water Quality set up by the Vietnam Environment Agency (VEA) to derive the sub-index (qi) of each parameter. The multiplicative formula, that is, the product of the sub-indices (qi) raised to their respective weights (wi), was used to calculate the final WQI values. The proposed index (WQI) was then applied to the river with quarterly data of the 11 parameters monitored at ten sites in the years 2017–2020. The WQI representatively reflected the actual status of the overall river water quality: 97.8% of the WQI values belonged to the grades EXCELLENT and GOOD, and 2.2% to the grade MODERATE. Comparison of the river water quality evaluations obtained from the developed WQI with those from the WQI adopted by the National Sanitation Foundation (NSF-WQI) and the index issued by the Vietnam Environment Agency (VN-WQI) indicated that the proposed WQI was more suitable for river quality assessment.

With the aim of developing a comprehensive and simple WQI procedure that uses available monitoring data, this study is based on the following approaches: (i) a mixed system (basic and additional parameters) is used in parameter selection; (ii) PCA is applied to estimate the relative weights of the parameters; (iii) sub-indices are determined from linear equations derived from national water quality guidelines; (iv) a multiplicative formula is used as the aggregation method to calculate the final WQI. This WQI procedure is then applied to the Huong river in Thua Thien Hue province, Central Vietnam.

The aggregation method used to create the final WQI value must be selected so that it avoids the problems of eclipsing and ambiguity [2]. Eclipsing arises when the final index value does not represent the actual state of overall water quality because the low values of one or a few sub-indices are dominated by the higher values of the other sub-indices, or vice versa. Ambiguity occurs when the actual water quality is good but the final WQI indicates that it is bad, or vice versa [4, 17, 19, 39, 40].

Index aggregation is conducted after the assignment of weights to obtain the final WQI value. The two most common methods to aggregate the sub-indices are the additive (arithmetic) and multiplicative (geometric) methods. There are also other modified versions of the two methods [2, 4]. Mixed aggregation methods (combinations of the additive and geometric methods) have been proposed by some researchers [16, 23, 30]. The multiplicative method, shown in Eq (1), has been adopted for final aggregation in many WQIs [1, 11, 18, 37, 38]:

$$WQI = \prod_{i=1}^{n} q_i^{w_i}$$

(Eq 1)

where qi is the sub-index of parameter i, wi is its weight, and n is the number of parameters.
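The eclipsing problem described above can be illustrated with a small numerical sketch. The code below is not taken from the study: the function names, the equal weights, and the set of sub-index values are hypothetical and chosen only to compare the additive form (sum of wi × qi) with the multiplicative form (product of qi raised to wi).

```python
import math

def additive_wqi(q, w):
    """Weighted arithmetic mean of the sub-indices (weights sum to 1)."""
    return sum(wi * qi for qi, wi in zip(q, w))

def multiplicative_wqi(q, w):
    """Weighted geometric mean: product of q_i ** w_i (weights sum to 1)."""
    return math.prod(qi ** wi for qi, wi in zip(q, w))

# Hypothetical case: ten parameters are ideal (q = 100), one is at its worst (q = 1).
q = [100.0] * 10 + [1.0]
w = [1.0 / len(q)] * len(q)   # equal weights, used here only for illustration

additive = additive_wqi(q, w)
multiplicative = multiplicative_wqi(q, w)
print(round(additive, 1), round(multiplicative, 1))
```

In this hypothetical case the additive index stays at about 91 while the multiplicative index drops to about 66, so the single very poor sub-index is masked far less by the multiplicative form.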

Weights are assigned to the selected parameters according to their relative importance and their influence on the final index value [2, 4]. The weights can be either equal or unequal. A few WQIs used equal weights in the calculation [13, 14, 20, 23, 28–30], whereas many others were calculated with unequal weights. Unequal weights were commonly defined either by a participatory procedure, such as the Delphi method [1] or the Analytical Hierarchy Process [31], or by multivariate statistical analysis, mainly PCA and FA. To avoid the subjective judgment of experts in participatory procedures, index developers have suggested using PCA and FA to define parameter weights by different approaches [11, 22, 24, 32–34]. Exploratory factor analysis (FA) is a dimension reduction method that is similar in some respects to PCA, but different enough that the two should not be considered equivalent [35]. In practice, PCA is a relatively simple technique compared to FA. Because FA involves many options and complexities, the outcome of an analysis may differ depending on how many factors are retained in the solution [35, 36]. A major issue with FA is the non-uniqueness of loadings: how well a given variable loads onto a given factor often depends on how many factors were extracted [35, 36]. Unlike in FA, the loading of a given variable onto an extracted principal component in PCA is unique [35]. The variable loadings obtained from PCA are therefore taken to reflect the intrinsic influence or importance of the variables for the water body under study. Thus, a comprehensive and unique approach that defines the weights of water quality parameters from PCA results alone is needed for WQI development.

The sub-index determination step aims to transform the concentrations of the selected water quality parameters into a standardized, unitless common scale, typically within an identical range, i.e. 0 (poorest) – 100 (best) or 0 (poorest) – 1 (best), called the sub-index [2]. To define sub-index values, WQI developers have established sub-index functions or rating curves for the different parameters [4, 9]. Three methods are usually employed: (i) expert judgment, as in the NSF-WQI [1], the Oregon Index [12], and Almeida's Index [18]; (ii) use of water quality standards or guidelines [12–14, 16, 23, 25–27]; and (iii) statistical methods. The use of water quality standards or guidelines facilitates the sub-division of sub-index values and provides more information for the users [12]. Several procedures calculate the WQI directly from the parameters without transforming them into a common scale. For instance, the CCME-WQI development process [20] uses a specific mathematical equation to aggregate the index directly.

Based on a review of 30 existing WQIs, the parameter sets used to calculate WQIs were divided into three types: fixed, open, and mixed systems [4]. Most of those WQIs have used a fixed set of parameters, commonly called "basic", because the selected parameters are the most significant ones for water quality evaluation in the study site or region [1, 2, 12–18]. The fixed system (e.g. the NSF-WQI with 9 parameters) allows users to compare water quality status among sites or rivers, but does not allow new parameters needed for the assessment of water quality to be added [19]. Some WQIs use an open system that has no guidelines for the selection of parameters, for example, the WQI developed by the Canadian Council of Ministers of the Environment [20]. This system makes comparisons among monitored sites and among river basins difficult [21]. The mixed system consists of basic and additional parameters. The selection of the additional parameters incorporated into the WQI calculation depends on their sub-index values or their importance in reflecting river water quality [13]. Many studies indicated that the objective (less subjective) way to select parameters for the development of a WQI is based on the results of statistical analysis of available monitoring data, such as correlation analysis and multivariate analysis techniques (principal component analysis/PCA, factor analysis/FA) [2–4, 22–24]. The issues mentioned above regarding parameter selection indicate that a mixed system should be chosen to avoid 'rigidity', and that the selected parameters should be ones that are monitored routinely and are of great importance in reflecting river water quality.

Establishing a WQI involves transforming the concentrations of selected water quality parameters (or variables), which have different units and dimensions, into dimensionless sub-indices, and choosing an aggregation method to generate the numerical value of the index [2, 4, 10]. The general procedure to create a WQI consists of the following steps [2, 4, 5]: (i) selection of water quality parameters; (ii) computation of sub-index values through a transformation of the parameters to a standard scaling factor; (iii) estimation of weights for all parameters; (iv) aggregation of the sub-index values and weights to obtain the final WQI.

From the reviews mentioned above, the following remarks can be drawn [4, 10, 12]: (i) although many WQIs are available, there is still a need for an overall WQI that can incorporate the available data and describe the water quality for different uses; (ii) significant discrepancies were observed in water quality classifications obtained from different methodologies; (iii) the most challenging aspect is that WQIs are developed for a specific region and are source-specific; therefore, there is a continuing interest in developing accurate WQIs that suit a local or regional area; (iv) no single WQI has been globally accepted; (v) there is no worldwide accepted method that guides the steps of WQI development; thus, further work in this field is still necessary to address the limitations of the WQIs developed worldwide. These conclusions indicate the need to develop a method and a water quality index for practical applications on a local or regional scale, based on available monitoring data.

Water quality is important information in water resources management. Different uses of water require different water quality parameters, including physical, chemical, and biological ones. For water quality assessment, water quality standards or guidelines have been established on international and regional scales. However, they evaluate individual parameters and do not give a general picture of the water quality at the sites or in the regions under study [1–5]. The development of water quality assessment methods based on a quantitative and comprehensive index has therefore attracted great interest from scientists. The Water Quality Index (WQI) is a mathematical tool that converts water quality parameters into a single integer value depicting the overall health status of a water body [2, 6–8]. The WQI developed by Brown et al. [1] was proposed by the National Sanitation Foundation (NSF-WQI) to assess surface water quality. The NSF-WQI has been applied worldwide, either as originally proposed or in modified form [2, 9–11]. Many reviews of developed WQIs [2, 4, 5, 9] indicated that WQIs have been widely used as an efficient tool to assess surface and underground water quality.

Materials and methods

Study area

Hue City (in Thua Thien Hue province) was the capital of Vietnam under the Nguyen Dynasty, which lasted from 1802 to 1945, and has been a political and cultural center of Central Vietnam since then. It is a noted sightseeing destination that has been registered as a World Cultural Heritage site since 1993. The Huong river, with a catchment area of 2830 km² and a population of 540,000 in its basin, is formed by two branches (Ta Trach and Huu Trach) that originate in the mountains in the west of the province and join at Tuan confluence. The main part of the river, 32 km long, divides the city into a north part (old city) and a south part (new city), meets the Bo river at Sinh confluence (15 km west of Hue city), flows into the Tam Giang-Cau Hai lagoon (which runs along the coast), and finally reaches the East Sea at Thuan An outlet (Fig 1). The average width and depth of the main river part are 200 m and 2–8 m, respectively. The Binh Dien hydropower plant, with a capacity of 423.7 million m³, located upstream on the Huu Trach branch, has been in operation since 2009. The Ta Trach reservoir, with a capacity of 646 million m³, located upstream on the Ta Trach branch, was built for flood control and has been in place since 2013. A dam (Thao Long dam) was built at the river mouth in 2006 to prevent saline intrusion from the sea via the lagoon. The Huong river is the most important surface water source in the province and is used for domestic activities, industry, irrigation, navigation, tourism, aquaculture, etc. Van Nien and Gia Vien are currently the water intakes for two water treatment plants in the city. Wastewater discharged into the river, floods in the wet season (September–December), and saline intrusion in the dry season (January–August) are the environmental concerns in the river basin. Air temperature in the province ranges from 21 to 38°C, with an average of 24.8°C. Annual rainfall in the province ranges from 2700 mm to 3800 mm, of which 60% falls in the wet season. The average river flow ranged from 428 m³/s in the dry season to 553 m³/s in the wet season, corresponding to median flows of 189 m³/s and 214 m³/s, respectively (calculated from monitoring data in the years 2014–2016).


Collection of water quality data

The water quality dataset used in this study covers seven years of monitoring (2014–2020). It was divided into two sets: the dataset of the years 2014–2016 was used for developing the WQI procedure, while the dataset of 2017–2020 was employed for testing the developed procedure and assessing the water quality of the Huong river. The water quality monitoring program was performed by the Institute of Natural Resources, Environment, and Biotechnology (IREB), Hue University, with the support of the Ministry of Training and Education, Vietnam. The data consist of monthly measurements on surface water samples collected at 11 monitoring sites (Hto, HT, Tto, TT, SH1–SH3, and SH5–SH8, shown in Fig 1) over a period of 3 years (2014–2016). Fourteen parameters were routinely monitored: temperature, pH, electrical conductivity (EC), total suspended solids (TSS), dissolved oxygen (DO), 5-day biochemical oxygen demand (BOD), chemical oxygen demand (COD), ammonium (N-NH4), nitrate (N-NO3), phosphate (P-PO4), total coliform (TC), total dissolved iron (Fe), river velocity, and flow rate. Several total dissolved heavy metals (HgII, CdII, AsIII,V, CrVI, PbII, CuII, ZnII) and organochlorine pesticides (DDTs, HCHs) were monitored once or twice per year.

The river water quality was also monitored quarterly (in February, May, August and November) at ten sampling sites (HT, TT, and SH1–SH8, Fig 1) by the Center for Natural Resources and Environment Monitoring (CREM), with the support of the Thua Thien Hue Province People's Committee, in the years 2017–2020. The monitored parameters were the same as those mentioned above.

Analytical methods for the water quality parameters were adopted from Standard Methods for the Examination of Water and Waste Water [41]. Quality assurance and quality control procedures were conducted during monitoring and analysis to confirm the data quality. Quality control, consisting of checks of repeatability, trueness, linearity, limit of detection (LOD), and blanks, was routinely undertaken to confirm confidence in the monitoring and analysis results [41].

Procedure of WQI development

The procedure of WQI development conducted in this study is described in Fig 2.



  • Parameter selection:

Ten basic parameters (pH, EC, TSS, DO, BOD, COD, N-NH4, N-NO3, P-PO4, TC) and one additional parameter (Fe) were selected for the river WQI development. The parameters pH, EC, TSS and DO represent physical characteristics of the river. The parameters BOD and COD indicate the organic pollution level of the river, and N-NH4, N-NO3 and P-PO4 indicate its eutrophication level. The parameter TC describes the fecal bacteria pollution level of the river. Iron commonly occurs in river waters due to erosion and washing from the soil in river basins and is therefore selected as an additional parameter in the WQI model. The heavy metals and organochlorides were not selected for the river WQI development because their concentrations (collected from the available monitoring data) were very low, i.e. lower than the detection limit (LOD) or much lower than the limits of the national guidelines on surface water quality [42] set up by the Vietnam Ministry of Natural Resources and Environment (MONRE). The data set of the 11 parameters collected by IREB in the years 2014–2016 was used for the river WQI development. The original data set of the 11 water quality parameters is supplied in S1 Data.

The data set of the 11 parameters (n = 11) collected by CREM in the years 2017–2020 (S2 Data) was used for testing the proposed WQI model and assessing the river water quality.

  • Estimation of weights:

Principal component analysis can reduce the dimensionality of a multivariate data set while maintaining its original structure to the maximum extent possible, and it is therefore often used with environmental data. PCA reduces the total number of original variables to a smaller set of new variables (factors or components) while preserving the variability with a minimal loss of information. The method extracts the components/factors from the correlation matrix that are needed to explain the variance structure through linear combinations of the original variables [35]. For the PCA calculation, the original variables are commonly transformed into normalized variables with zero mean and unit variance, to remove the effects of variable unit and scale [35]. The eigenvalue of each component (or factor) is the amount of variance in the data set that is accounted for (or explained) by the component. The PCA calculation also gives the factor loading of each variable. Each factor loading represents the degree of contribution of the variable to the formation of the factor. The variables with the highest factor loadings are considered of greater importance and have more influence on the factor [11, 35]. In this study, the communality, which is the sum of the squared loadings of a variable on the retained principal components (PCs), was used to calculate the weight in the WQI procedure. The variable with the highest communality is considered the most important, and vice versa. The PCA calculations were performed using the free software R, version 4.0.3/64-bit (10-10-2020), with R-Studio and the package factoextra (version 1.0.7).
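The communality calculation described above can be sketched in a few lines. The study performed its PCA in R; the Python snippet below is only an illustration that starts from loadings already produced by a PCA. The loading values are hypothetical (not the study's results), and the step that turns communalities into weights, dividing each communality by their total so that the weights sum to one, is an assumption consistent with the requirement that the weights sum to one.

```python
def communalities(loadings):
    """Sum of squared loadings over the retained PCs, per variable."""
    return {name: sum(l * l for l in row) for name, row in loadings.items()}

def weights_from_communalities(h2):
    """Assumption: each weight is the communality divided by the total."""
    total = sum(h2.values())
    return {name: value / total for name, value in h2.items()}

# Hypothetical loadings on three retained PCs (not the study's results).
loadings = {
    "DO":  (0.80, 0.10, -0.20),
    "BOD": (0.60, 0.50, 0.10),
    "TSS": (0.30, 0.70, 0.40),
    "pH":  (0.20, -0.10, 0.50),
}

h2 = communalities(loadings)          # e.g. DO: 0.64 + 0.01 + 0.04 = 0.69
w = weights_from_communalities(h2)    # weights sum to one
for name in loadings:
    print(name, round(h2[name], 2), round(w[name], 3))
```

With these hypothetical loadings, TSS has the highest communality and therefore the largest weight, and pH has the smallest.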

  • Determination of sub-index values:

For the convenience of WQI users in defining the sub-index of each selected parameter (or variable), linear sub-index functions are established from the permissible limits in the National Technical Regulations on Surface Water Quality (QCVN 08:2015-MT/BTNMT) [42] set up by the Vietnam Ministry of Natural Resources and Environment (MONRE) in 2015. The linear functional form for each variable (x) is:

$$y = a + b \times x$$

(Eq 2)

where y (or q) is the sub-index calculated from the monitored concentration of the variable x, and a and b are derived from the two linear equations:

$$100 = a + b \times (\text{limit of class A1})$$

(Eq 3)

where y = 100 corresponds to the best quality for variable x (≤ the limit of class A1 indicated in the regulation), and

$$1 = a + b \times (\text{limit of class B2})$$

(Eq 4)

where y = 1 corresponds to the worst quality for variable x (≥ the limit of class B2 in the regulation).
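As a minimal sketch of how Eqs 3 and 4 determine a and b, the snippet below solves the two equations for a parameter whose quality worsens as its concentration increases and then evaluates the sub-index. The function names are hypothetical. The BOD limits (4 mg/L for class A1 and 25 mg/L for class B2) are the QCVN 08:2015 values; the monitored value of 14.5 mg/L is an invented example.

```python
def linear_coefficients(limit_a1, limit_b2):
    """Solve 100 = a + b*limit_a1 and 1 = a + b*limit_b2 for (a, b)."""
    b = (1.0 - 100.0) / (limit_b2 - limit_a1)
    a = 100.0 - b * limit_a1
    return a, b

def sub_index(x, limit_a1, limit_b2):
    """Sub-index y for a parameter whose quality worsens as x increases."""
    if x <= limit_a1:
        return 100.0      # best quality: at or below the class A1 limit
    if x >= limit_b2:
        return 1.0        # worst quality: at or above the class B2 limit
    a, b = linear_coefficients(limit_a1, limit_b2)
    return a + b * x

# BOD limits from QCVN 08:2015: A1 = 4 mg/L, B2 = 25 mg/L.
a, b = linear_coefficients(4.0, 25.0)
y = sub_index(14.5, 4.0, 25.0)   # hypothetical monitored BOD of 14.5 mg/L
print(round(a, 3), round(b, 3), round(y, 1))
```

A concentration halfway between the two limits gives a sub-index halfway between 100 and 1, as expected for a linear function.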

The water quality limits for the selected parameters, extracted from QCVN 08:2015-MT/BTNMT, are shown in Table 1.

Table 1

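| Statistic / class | pH | EC (μS/cm) | TSS (mg/L) | DO (mg/L) | BOD (mg/L) | COD (mg/L) | N-NH4 (mg/L) | N-NO3 (mg/L) | P-PO4 (mg/L) | Fe (mg/L) | TC (MPN/100 mL) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| min | 5.4 | 16 | 1 | 4.4 | 2 | 5 | 0.07 | 0.1 | 0.05 | 0.19 | 130 |
| max | 7.2 | 955 | 76 | 7.8 | 9 | 22.6 | 0.42 | 0.7 | 0.17 | 0.88 | 7500 |
| mean | 6.4 | 70 | 13 | 6.2 | 3.6 | 10 | 0.16 | 0.32 | 0.08 | 0.43 | 1618 |
| STDV | 0.3 | 93 | 12 | 0.5 | 1.3 | 4 | 0.06 | 0.11 | 0.03 | 0.11 | 1177 |
| median | 6.3 | 49 | 8 | 6.2 | 3 | 9.0 | 0.15 | 0.3 | 0.08 | 0.42 | 1500 |
| MAD | 0.2 | 14 | 4 | 0.3 | 1 | 3.1 | 0.03 | 0.1 | 0.02 | 0.08 | 1020 |
| quartile 1 | 6.2 | 38 | 5 | 5.8 | 3 | 6.5 | 0.12 | 0.22 | 0.06 | 0.35 | 480 |
| quartile 3 | 6.5 | 70 | 14 | 6.5 | 5 | 13.0 | 0.2 | 0.4 | 0.1 | 0.51 | 2525 |
| CV (%) | 5 | 133 | 99 | 8 | 35 | 40 | 35 | 34 | 33 | 26 | 73 |
| CL95% | 0.03 | 9 | 1.2 | 0.05 | 0.12 | 0.4 | 0.01 | 0.01 | 0.01 | 0.01 | 116 |
| QCVN 08: A1 | 6–8.5 | 1153 | 20 | ≥ 6 | 4 | 10 | 0.3 | 2 | 0.1 | 0.5 | 2500 |
| QCVN 08: A2 | 6–8.5 | 1538 | 30 | ≥ 5 | 6 | 15 | 0.3 | 5 | 0.2 | 1 | 5000 |
| QCVN 08: B1 | 5.5–9 | 3077 | 50 | ≥ 4 | 15 | 30 | 0.9 | 10 | 0.3 | 1.5 | 7500 |
| QCVN 08: B2 | 5.5–9 |  | 100 | ≥ 2 | 25 | 50 | 0.9 | 15 | 0.5 | 2 | 10000 |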

DO concentrations higher than saturation indicate excessive algal photosynthesis in eutrophic waters, which leads to a reduction in water quality. The saturated DO concentration at 20°C and an air pressure of 760 mmHg is 9 mg/L. The sub-index (y) therefore equals 100 for DO concentrations in the range of 6–9 mg/L (i.e. from the limit A1 to saturation, accepting that the lowest river water temperature was 20°C). If the DO concentration is lower than 6 mg/L, the linear sub-index function for DO is determined following Eqs 3 and 4. If the DO concentration is over 9 mg/L (oversaturation), a and b are derived from the two equations:

$$100 = a + b \times 9$$

(Eq 5)

and

$$1 = a + b \times 12$$

(Eq 6)
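The resulting DO sub-index is piecewise linear, which can be sketched as follows. The function name is hypothetical, and holding the sub-index at 1 for concentrations at or below 2 mg/L (the class B2 limit) or at or above 12 mg/L is an assumption that extends the stated worst-quality value of 1 beyond the two end points.

```python
def do_sub_index(do_mg_l):
    """Piecewise-linear sub-index for dissolved oxygen (mg/L)."""
    if do_mg_l <= 2.0 or do_mg_l >= 12.0:
        return 1.0                      # at or beyond the worst limits (assumed clamp)
    if do_mg_l < 6.0:
        # Line through (6, 100) and (2, 1): Eqs 3 and 4 with A1 = 6 and B2 = 2.
        b = (100.0 - 1.0) / (6.0 - 2.0)
        return 100.0 + b * (do_mg_l - 6.0)
    if do_mg_l <= 9.0:
        return 100.0                    # between the A1 limit and saturation
    # Line through (9, 100) and (12, 1): Eqs 5 and 6.
    b = (1.0 - 100.0) / (12.0 - 9.0)
    return 100.0 + b * (do_mg_l - 9.0)

for value in (1.0, 4.0, 7.0, 10.5, 13.0):
    print(value, do_sub_index(value))
```

Both undersaturated and oversaturated samples are penalised: 4 mg/L and 10.5 mg/L each lie halfway along their linear segment and receive the same sub-index.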

The pH limits of classes A1 and A2 stated in the regulation range from 6 to 8.5, corresponding to a sub-index of 100. For pH lower than 5.5 (limit B1) or higher than 9 (limit B2), the sub-index is equal to 1. This means that there are two sub-index functions for the parameter pH. Because EC is not regulated in QCVN 08:2015-MT/BTNMT [42], the linear sub-index function for EC is established from the limits for TDS required in other regulations, assuming approximately that [43]:

$$\text{TDS (mg/L)} = 0.65 \times \text{EC (μS/cm)}$$

(Eq 7)

According to the National Technical Regulations on Drinking Water Quality (QCVN 01:2009/BYT) [44] set up by the Vietnam Ministry of Health, the limit for TDS is lower than 1000 mg/L, approximately EC < 1538 μS/cm, which corresponds to a sub-index (y) of 100. According to the National Technical Regulations on Water Quality for Irrigation (QCVN 39:2011/BTNMT) [45] set up by the Vietnam MONRE, the limit for TDS is lower than 2000 mg/L, approximately EC < 3077 μS/cm, which corresponds to a sub-index (y) of 1. This means that, in the linear sub-index equation for EC, a and b are derived from the two equations:

$$100 = a + b \times 1538$$

(Eq 8)

and

$$1 = a + b \times 3077$$

(Eq 9)
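As a worked check of Eqs 8 and 9 (using the rounded EC limits above; the EC value of 2000 μS/cm in the last line is a hypothetical example, not a monitored value), subtracting Eq 8 from Eq 9 gives b, and substituting back into Eq 8 gives a:

$$b = \frac{1 - 100}{3077 - 1538} = -\frac{99}{1539} \approx -0.0643$$

$$a = 100 - b \times 1538 \approx 100 + 98.94 = 198.94$$

$$y(2000) = a + b \times 2000 \approx 198.94 - 128.65 \approx 70.3$$

For EC below 1538 μS/cm the sub-index is 100, and at 3077 μS/cm it is 1.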

  • Aggregation of the sub-index values into final WQI:

The multiplicative method (Eq 1, mentioned above) is used to calculate the final WQI, where qi is the parameter sub-index, ranging from 1 (the worst quality) to 100 (the best quality), and wi is the parameter weight defined from the PCA procedure, ranging from 0 to 1; the sum of the weights equals one.
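A minimal sketch of this aggregation step is given below. The function name, the input checks, and the three-parameter example are hypothetical; only the formula (the product of qi raised to wi, with qi between 1 and 100 and the weights summing to one) follows the description above.

```python
import math

def wqi_multiplicative(sub_indices, weights):
    """WQI = product of q_i ** w_i, with q_i in [1, 100] and sum(w_i) = 1."""
    if set(sub_indices) != set(weights):
        raise ValueError("sub-indices and weights must cover the same parameters")
    if abs(sum(weights.values()) - 1.0) > 1e-6:
        raise ValueError("weights must sum to one")
    if any(not 1.0 <= q <= 100.0 for q in sub_indices.values()):
        raise ValueError("each sub-index must lie between 1 and 100")
    # Summing w_i * ln(q_i) and exponentiating is equivalent to the product.
    log_wqi = sum(weights[p] * math.log(sub_indices[p]) for p in sub_indices)
    return math.exp(log_wqi)

# Hypothetical sub-indices and weights for three parameters (illustration only).
q = {"DO": 100.0, "BOD": 50.5, "TC": 80.0}
w = {"DO": 0.5, "BOD": 0.3, "TC": 0.2}
wqi = wqi_multiplicative(q, w)
print(round(wqi, 1))
```

Because every qi is at least 1, the logarithms are always defined and the resulting WQI also stays between 1 and 100.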

Water quality assessment based on WQI grade

The grades representing the river water quality vary from 1 to 100. The classification of the river water quality, based on the WQI values, in this study is similar to the classification regulated in the VN-WQI model [30] (see S1 Text), as follows: grades 91–100 (EXCELLENT, color BLUE); 76–90 (GOOD, color GREEN); 51–75 (MODERATE, color YELLOW); 26–50 (POOR, color ORANGE); 10–25 (VERY POOR, color RED); < 10 (HIGHLY POLLUTED, color BROWN).
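The classification above can be expressed as a simple lookup. The sketch below is illustrative: the function name is hypothetical, and rounding a non-integer WQI to the nearest integer before grading is an assumption, since the grade ranges are stated for integer values only.

```python
# Lower bound of each grade, from best to worst, as listed in the classification.
GRADES = [
    (91, "EXCELLENT", "BLUE"),
    (76, "GOOD", "GREEN"),
    (51, "MODERATE", "YELLOW"),
    (26, "POOR", "ORANGE"),
    (10, "VERY POOR", "RED"),
]

def classify_wqi(wqi):
    """Return (grade, color) for a WQI value between 1 and 100."""
    # Assumption: round to the nearest integer first (Python rounds halves to even).
    value = round(wqi)
    for lower_bound, grade, color in GRADES:
        if value >= lower_bound:
            return grade, color
    return "HIGHLY POLLUTED", "BROWN"

for example in (95, 90.4, 51, 25, 9):
    print(example, classify_wqi(example))
```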

The proposed WQI was then applied to evaluate the river water quality using the dataset of the years 2017–2020. The river water quality evaluations resulting from the proposed WQI were compared with those from the NSF-WQI and VN-WQI in several critical cases (parameter concentrations above or below the limits) to examine the ambiguity and eclipsing of the indices in reflecting the river water quality. The NSF-WQI is an index calculated according to either the multiplicative formula (Eq 1) or the additive one (Eq 10) with nine selected parameters (n = 9): temperature change (ΔT), pH, turbidity (Tur), total solids (TS), DO, BOD5, N-NO3, P-PO4 and fecal coliform (NSF-WQI, 1970). The additive formula includes the parameter weights:

$$WQI = \sum_{i=1}^{9} w_i q_i$$

(Eq 10)

In this study, the NSF-WQI was calculated according to both formulas (Eqs 1 and 10).

The original data set of the nine water quality parameters mentioned above and the results of the NSF-WQI calculation are supplied in S3 Data. The parameter sub-index (qi) was derived from the respective rating curve. The DO concentration (mg/L) at a given water temperature (extracted from S2 Data) was converted into DO saturation (%) to define the sub-index for DO. The parameter ΔT was obtained by subtracting the upstream temperature from the downstream temperature and recording the result as the temperature change (°C). The parameter TS was taken to be the sum of TDS and TSS: TS = TDS + TSS, where the TDS (total dissolved solids) concentration was estimated by TDS (mg/L) = 0.65 × EC (μS/cm); the parameters EC and TSS were extracted from S2 Data. The fecal coliform concentration was replaced by the total coliform (TC) concentration for the NSF-WQI calculation. The relative weights of the parameters (wi in parentheses) are as follows, in decreasing order of wi: DO (0.17), TC (0.16), pH (0.11), BOD (0.11), ΔT (0.10), N-NO3 (0.10), P-PO4 (0.10), Tur (0.08), TS (0.07).
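The NSF-WQI calculation described above can be sketched as follows, using the nine weights listed in the text. The function names, the sub-index values (which in practice come from the NSF rating curves, not reproduced here), and the EC and TSS inputs are hypothetical and serve only as an illustration.

```python
import math

# Relative weights listed in the text for the nine NSF-WQI parameters.
NSF_WEIGHTS = {
    "DO": 0.17, "TC": 0.16, "pH": 0.11, "BOD": 0.11, "dT": 0.10,
    "N-NO3": 0.10, "P-PO4": 0.10, "Tur": 0.08, "TS": 0.07,
}

def total_solids(ec_us_cm, tss_mg_l):
    """TS = TDS + TSS, with TDS (mg/L) estimated as 0.65 * EC (uS/cm)."""
    return 0.65 * ec_us_cm + tss_mg_l

def nsf_wqi(q):
    """Return (additive, multiplicative) NSF-WQI from the sub-indices q."""
    additive = sum(NSF_WEIGHTS[p] * q[p] for p in NSF_WEIGHTS)
    multiplicative = math.prod(q[p] ** NSF_WEIGHTS[p] for p in NSF_WEIGHTS)
    return additive, multiplicative

# Hypothetical sub-indices, as if already read from the rating curves.
q = {"DO": 95, "TC": 40, "pH": 90, "BOD": 70, "dT": 93,
     "N-NO3": 96, "P-PO4": 85, "Tur": 80, "TS": 84}
additive, multiplicative = nsf_wqi(q)
ts = total_solids(70, 13)   # illustrative EC (uS/cm) and TSS (mg/L) values
print(round(additive, 2), round(multiplicative, 2), ts)
```

The multiplicative value is never larger than the additive one for the same inputs, so a single low sub-index (here TC) lowers the multiplicative index more.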

The VN-WQI is an index without parameter weights, meaning that the selected parameters have equal weight (the weights are all equal to one). The sub-index value of each parameter is defined from the normalized scales given in the appropriate table. The sub-index for DO is derived from a given equation using the monitored water temperature. The final VN-WQI value is calculated with both multiplicative and additive methods (the VN-WQI model is provided in S1 Text). In this study, the VN-WQI applied to the river was calculated from eight parameters (n = 8): pH (Group I); DO, BOD, COD, N-NH4, N-NO3 and P-PO4 (Group IV); and TC (Group V). The heavy metals As, Cd, Pb, CrVI, Cu, Zn and Hg (Group III) and organochlorides such as aldrin, BHCs, dieldrin, DDTs, heptachlor and heptachlor epoxide (Group II) were not selected for the VN-WQI calculation because their concentrations monitored in the river samples in the years 2017–2020 were lower than the detection limits (LODs) or much lower than the limits regulated by the Vietnam MONRE (QCVN 08-MT:2015/BTNMT) [42].