The identification of long- and short-term trends is important to many stakeholders, and these trends are important components of the Houston-Galveston Area Council’s (H-GAC) work, particularly in relation to the evaluation and revision of regional monitoring efforts and priorities. H-GAC staff used several methods of analysis to characterize surface water quality in the H-GAC region. Trend analysis can identify cases where the value of a water quality parameter is changing over time. Statistical tests are performed to distinguish statistically significant trends from random and seasonal variation. While it might seem reasonable to use all the data available for these analyses, as the amount of data increases, the likelihood of finding a statistically significant but unimportant trend also increases. To limit this effect, H-GAC performed trend analysis on the most recent 15 years (June 2002–May 2017) of Texas Commission on Environmental Quality (TCEQ)-validated data to highlight recent trends in water quality in the region. All data management and statistical analyses were performed using Statistical Analysis System (SAS) version 9.3.
Data Selection and Processing
For analyses in this report, H-GAC staff selected water quality data collected between June 1, 2002 and May 31, 2017 from data downloaded from the Surface Water Quality Monitoring Information System (SWQMIS) on November 14, 2017. SWQMIS is a database that serves as the repository for surface water quality data for the state of Texas. All data used for these analyses were collected under a TCEQ-approved Quality Assurance Project Plan (QAPP). Qualified data (data added to SWQMIS with qualifier codes that identify quality, sampling, or other problems that may render the data unsuitable) were excluded from the download. All data for all stations in the H-GAC Clean Rivers Program region (in general, basins 9, 10, 11, 13, and 24) were combined. Available flow data from U.S. Geological Survey (USGS) gaging stations in the Segment 1007 watershed were downloaded from the USGS website on January 8, 2018.
Variables in each data set were transformed as appropriate, and new variables were created to facilitate analysis and graphical display of results. In some cases, data from two or more STORET (method) codes were combined because the results obtained from each method can be considered equivalent. Any data collected at a depth greater than 0.3 meters, or not collected under a routine ambient monitoring program, were deleted.
Censored data (data reported as < [parameter limit of quantitation (LOQ)]) were transformed to a value of one-half the parameter LOQ associated with the data, with some important exceptions. Because nutrient quantitation limits have been lowered over time, the presence of data censored at many different LOQs in the same dataset poses several problems. If the data for a given parameter are censored at values well above a later, lower LOQ value, trend analysis could suggest a trend where no real water quality trend is present. There is no ideal solution to this problem. Editing the censored data alone would limit, but not eliminate, false trends. In cases where some of the data reflected use of a lower LOQ than the current H-GAC Clean Rivers Program LOQ, values were transformed to one-half of the H-GAC Clean Rivers Program LOQ to minimize the identification of trends caused by changing analytical methods. H-GAC does not believe the impact from this transformation is significant. The impact would be most pronounced for parameters typically found at concentrations at or near the quantitation limit in a given water body.
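The substitution rule described above can be sketched in a few lines of Python. This is an illustration only, not the SAS code H-GAC used. The function name, its arguments, and the example LOQ of 0.06 are hypothetical. The sketch also assumes that any result below the current program LOQ, whether censored or quantified, is set to one-half of the program LOQ, which is one reading of the text.

```python
def substitute_censored(value, censored, program_loq):
    """Return the value to use in trend analysis for a single result.

    value       -- reported result, or the reported LOQ when the result is censored
    censored    -- True when the result was reported as "< LOQ"
    program_loq -- current program LOQ for the parameter (hypothetical input)
    """
    if value < program_loq:
        # Assumption: results tied to a lower, older-or-newer LOQ than the
        # program LOQ are all set to one-half of the program LOQ.
        return program_loq / 2.0
    if censored:
        # Default rule: one-half of the LOQ reported with the result.
        return value / 2.0
    return value


# Hypothetical results for one parameter with a program LOQ of 0.06
results = [(0.10, True), (0.02, True), (0.04, False), (0.25, False)]
adjusted = [substitute_censored(v, c, program_loq=0.06) for v, c in results]
print(adjusted)
```

Applying one floor to every result below the program LOQ keeps a change in laboratory sensitivity from appearing as a step change in the time series.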
The following parameters were selected for analysis:
Data Selection for Trend Analysis
H-GAC staff performed segment-level trend analysis on a 15-year data series (if available) from the most downstream station in each classified and unclassified segment. If that station did not have a significant series of flow data associated with sampling events, and the next station upstream had a significant flow data series (preferably from a USGS gaging station), the next upstream station was selected instead. Trends were also evaluated at the assessment unit (AU) level, and graphs showing results from individual stations within each AU were also produced for review.
Trend Analysis Methodology
The first stage of trend analysis for both segments and AUs was nonparametric correlation analysis (Kendall’s tau-b) of the parameter value with the sample collection date to identify correlations that were significant at p < 0.05. These potential trends were then evaluated with up to four other methods. Simple linear regression of the natural log of the parameter value on the time variable was performed for all data in the subset selected by H-GAC for trend analysis. Flow-adjusted trends were obtained through correlation of residuals from LOESS (locally weighted least squares) regression in cases where instantaneous flow data were available. If there were no temporal gaps in the time series (missing years, consistently missing seasons), seasonal Kendall/Sen slope estimation/Theil regression was run. If 15 percent or more of the data were censored at the analytical limit of quantitation, survival analysis (Tobit analysis in SAS PROC LIFEREG) was performed.
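As a rough illustration of the first-stage test, the sketch below computes Kendall's tau-b between sampling time and parameter value, a two-sided p-value, and a Sen (Theil) slope in plain Python. It is not the SAS procedure H-GAC ran. The p-value uses the common Mann-Kendall normal approximation with a tie correction, which is an assumption and may differ slightly from SAS output. The sketch also assumes every sampling time is distinct and ignores seasonality; the function name and example data are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def kendall_trend(times, values):
    """Return (tau_b, two-sided p-value, Sen slope) for values against times.

    Assumes all times are distinct; tied values are allowed.
    """
    n = len(values)
    concordant = discordant = 0
    slopes = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            dt = times[j] - times[i]
            dv = values[j] - values[i]
            slopes.append(dv / dt)          # pairwise slope for the Sen estimate
            if dv * dt > 0:
                concordant += 1
            elif dv * dt < 0:
                discordant += 1
    s = concordant - discordant
    n_pairs = n * (n - 1) / 2
    tie_sizes = list(Counter(values).values())
    tied_pairs = sum(t * (t - 1) / 2 for t in tie_sizes)
    if n_pairs == tied_pairs:
        # Every value is identical: no trend can be detected.
        return 0.0, 1.0, 0.0
    tau_b = s / math.sqrt(n_pairs * (n_pairs - tied_pairs))
    # Variance of S with the standard correction for tied values.
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18
    # Continuity-corrected normal approximation.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return tau_b, p_value, median(slopes)


# Hypothetical series: decimal years and concentrations
years = [2002.5, 2004.0, 2005.5, 2007.0, 2008.5, 2010.0, 2011.5, 2013.0, 2014.5, 2016.0]
conc = [1.2, 1.1, 1.4, 1.3, 1.6, 1.5, 1.9, 1.8, 2.1, 2.2]
tau, p, slope = kendall_trend(years, conc)
print(f"tau-b={tau:.3f}, p={p:.4f}, Sen slope={slope:.4f} per year")
```

Because the test uses only the ordering of the values, a single extreme result changes the statistic far less than it would change a least-squares slope.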
Plots of selected statistically significant trends were produced for segments and AUs in each of the three watersheds selected for this report. Each graph includes an inset showing the results of multiple trend analyses. If the trend is described as Increasing or Decreasing, the calculated p-value is below the threshold of 0.05 selected by H-GAC. Trends identified as Stable have a calculated p-value greater than 0.05. When evaluating the results of several trend analyses of a given parameter, H-GAC placed the most weight on the Kendall correlation because nonparametric methods are relatively insensitive to outliers in the time series. However, if the Kendall correlation differed from the results of seasonal trend analysis or flow-adjusted analysis, the data were further evaluated. If no flow data were available, the flow-adjusted trend appears as Not Calculated (indicating no flow data are available) or Insufficient Data (indicating only one flow value exists and a correlation could not be calculated). If the seasonal Kendall/Sen slope trend was not calculated due to gaps (missing seasons) in the time series, the seasonal Kendall trend appears as Not Calculated. Survival analysis was only applied in those cases where the amount of censored data could bias the results of the other methods. H-GAC set the threshold at 15 percent or more censored data. If fewer than 15 percent of the data were censored, survival analysis was not performed, and the trend appears as Not Applicable on graphs.
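The labeling rules for the graph insets can be summarized as a small set of functions. This is a sketch only: the function and constant names are hypothetical, and treating a p-value exactly equal to 0.05 as Stable is an assumption, since the text does not address that case.

```python
ALPHA = 0.05               # significance threshold stated in the text
CENSORED_THRESHOLD = 0.15  # fraction of censored data that triggers survival analysis


def trend_label(p_value, slope):
    """Label one test result from its p-value and the sign of its slope."""
    if p_value < ALPHA:
        return "Increasing" if slope > 0 else "Decreasing"
    return "Stable"  # assumption: p == ALPHA is also reported as Stable


def flow_adjusted_label(n_flow_values, p_value=None, slope=None):
    """Label the flow-adjusted trend given the number of paired flow values."""
    if n_flow_values == 0:
        return "Not Calculated"      # no flow data available
    if n_flow_values == 1:
        return "Insufficient Data"   # a correlation needs more than one value
    return trend_label(p_value, slope)


def seasonal_label(has_gaps, p_value=None, slope=None):
    """Label the seasonal Kendall trend; gaps in the series prevent the test."""
    if has_gaps:
        return "Not Calculated"
    return trend_label(p_value, slope)


def survival_label(fraction_censored, p_value=None, slope=None):
    """Label the survival-analysis trend; it runs only at or above the threshold."""
    if fraction_censored < CENSORED_THRESHOLD:
        return "Not Applicable"
    return trend_label(p_value, slope)


print(trend_label(0.01, -0.3))             # Decreasing
print(flow_adjusted_label(1))              # Insufficient Data
print(survival_label(0.15, 0.20, 0.1))     # Stable
```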
H-GAC staff conducted a variety of targeted analyses showing the relationship between parameter values and flow conditions for each monitoring station. These analyses supplemented interpretation of observed trends, and in some cases suggested relationships that might not be evident from trend analysis alone. Graphs of statistically significant flow dependencies were produced in cases where instantaneous flow data were available.
Trend Analysis for the Regional Water Quality Summary
In 2015, H-GAC staff compiled a subset of stations in classified segments believed to be most representative of segment water quality by selecting one to three stations that were statistically representative of a given parameter in a given segment. Means and standard deviations of parameter values were calculated for each station, and those stations with means and standard deviations closest to the overall mean and standard deviation for the segment and parameter combination were selected. Preference was given to stations where stream flow was measured, and final selections were reviewed for reasonableness. In most cases, the station or stations at the most downstream location of the segment were the most statistically representative. Selection relied on SAS procedures PROC MEANS and PROC RANK. The same subset of stations has been used since 2015 to allow consistent comparisons across regional water quality summaries created for different years.
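One way to picture the station selection is the Python sketch below. The text does not say how closeness in mean and closeness in standard deviation were combined, so summing the two ranks is an assumption made for illustration. The function name, station identifiers, and data are hypothetical, and the preference for stations with flow data and the manual review step are not modeled. Each station needs at least two results for a standard deviation to be defined.

```python
from statistics import mean, stdev


def representative_stations(results_by_station, max_stations=3):
    """Rank stations by closeness to the segment-wide mean and standard deviation.

    results_by_station -- dict mapping station ID to a list of parameter values
    """
    all_values = [v for vals in results_by_station.values() for v in vals]
    seg_mean, seg_sd = mean(all_values), stdev(all_values)

    # Absolute distance of each station's mean and SD from the segment values.
    distance = {
        sid: (abs(mean(vals) - seg_mean), abs(stdev(vals) - seg_sd))
        for sid, vals in results_by_station.items()
    }

    def ranks(index):
        ordered = sorted(distance, key=lambda sid: distance[sid][index])
        return {sid: pos for pos, sid in enumerate(ordered, start=1)}

    mean_rank, sd_rank = ranks(0), ranks(1)
    # Assumption: the combined score is the sum of the two ranks.
    ordered = sorted(distance, key=lambda sid: (mean_rank[sid] + sd_rank[sid], sid))
    return ordered[:max_stations]


# Hypothetical results for three stations in one segment
data = {"A": [1, 2, 3], "B": [10, 11, 12], "C": [4, 6, 8]}
print(representative_stations(data, max_stations=1))
```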
A conservative trend analysis was performed using seven years of recent data (June 1, 2010 – May 31, 2017) at the selected representative monitoring stations in the classified portion of each watershed to detect trends at the watershed level for the H-GAC Regional Water Quality Summary. Trends were identified by nonparametric correlation analysis and simple linear regression. Because nonparametric methods are less sensitive to extreme values in the data than parametric techniques like linear regression, trends that were suggested by linear regression analysis alone were not included in the chart.
Trends (for the “Frog Chart” analysis) were considered statistically significant if the p-value was below 0.05. A significance level of 0.05 is the standard used in most applications; H-GAC feels that selecting all results with p-values less than or equal to 0.10 produces too many real, but unimportant, trends. In part, this is due to the large amount of data collected in our region: the more data one analyzes, the more likely it is that one will find a result, and identify a “trend”, that is statistically different from randomness (“no trend”). Note that 0.0545 rounds to 0.055, which in arithmetic rounding becomes 0.06 when expressed as one significant figure.
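The rounding arithmetic in the last sentence can be checked with Python's decimal module. The sketch assumes "arithmetic rounding" means rounding halves up, and it adds a comparison that is not in the text: rounding 0.0545 to one significant figure in a single step gives 0.05, not 0.06.

```python
from decimal import Decimal, ROUND_HALF_UP

p = Decimal("0.0545")

# Sequential rounding, as described in the text
step1 = p.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
step2 = step1.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Single-step rounding of the original value, for comparison
direct = p.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(step1, step2, direct)
```

The difference matters near a threshold, so p-values are best compared with the significance level before any rounding for display.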
Moving Geometric Mean Plots
In addition to trend analysis, H-GAC created plots of seven-year geometric means for indicator bacteria for each segment. These are a type of moving- or rolling-average plot, and they are constructed by calculating the geometric mean of all data collected in the seven years up to and including a given sample and plotting it (on the y-axis) against the collection date (on the x-axis) of the last sample in the series. A smoothed line (penalized B-spline) is fitted to the time series. One can assess the change in bacterial density over time from this sort of plot more easily than from a simple plot of density versus time. These plots are more meaningful for segments with historical bacteria data than for segments recently added to monitoring schedules (typically unclassified segments).
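A minimal Python sketch of the moving geometric mean calculation follows. It is not the SAS code used for the report, and it omits the penalized B-spline smoothing. The function name and example data are hypothetical. The sketch assumes all values are positive and that the window excludes its start date and includes its end date.

```python
import math
from datetime import date


def moving_geometric_means(samples, window_years=7):
    """Return (collection date, geometric mean) pairs for a trailing window.

    samples -- list of (date, value) tuples with value > 0
    """
    samples = sorted(samples)
    points = []
    for end_date, _ in samples:
        try:
            start = end_date.replace(year=end_date.year - window_years)
        except ValueError:
            # February 29 has no counterpart seven years earlier; use February 28.
            start = end_date.replace(year=end_date.year - window_years, day=28)
        # Geometric mean = exp(mean of logs) over the samples inside the window.
        logs = [math.log(v) for d, v in samples if start < d <= end_date]
        points.append((end_date, math.exp(sum(logs) / len(logs))))
    return points


# Hypothetical bacteria densities
samples = [
    (date(2010, 1, 1), 10.0),
    (date(2012, 1, 1), 1000.0),
    (date(2018, 6, 1), 100.0),
]
for d, gm in moving_geometric_means(samples):
    print(d.isoformat(), round(gm, 1))
```

In this example the 2010 sample drops out of the window for the 2018 point, so that point reflects only the two most recent results.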
Watershed Characterizations
H-GAC used SAS to produce tables showing impairments and concerns for each AU, monitoring stations in each AU and segment, and a variety of other summary data to aid in the characterization of water quality issues in each watershed. In most cases, the source of the tabulated information was TCEQ (Integrated Reports and assessment results, the Coordinated Monitoring Schedule, station inventory reports, AU and segment GIS shapefiles).