Efficient multi-objective calibration of a computationally intensive hydrologic model with parallel computing software in Python

Efficient multi-objective calibration of a computationally intensive hydrologic model with parallel computing software in Python

Environmental Modelling & Software 46 (2013) 208e218 Contents lists available at SciVerse ScienceDirect Environmental Modelling & Software journal h...

2MB Sizes 0 Downloads 53 Views

Environmental Modelling & Software 46 (2013) 208e218

Contents lists available at SciVerse ScienceDirect

Environmental Modelling & Software journal homepage: www.elsevier.com/locate/envsoft

Efficient multi-objective calibration of a computationally intensive hydrologic model with parallel computing software in Python Xuesong Zhang a, *, Peter Beeson b, Robert Link a, David Manowitz a, Roberto C. Izaurralde a, Ali Sadeghi b, Allison M. Thomson a, Ritvik Sahajpal c, Raghavan Srinivasan d, Jeffrey G. Arnold e a

Joint Global Change Research Institute, Pacific Northwest National Laboratory, University of Maryland, College Park, MD 20740, USA Agricultural Research Service, United Stated Department of Agriculture, Beltsville, MD 20705, USA Department of Geographical Sciences, University of Maryland, College Park, MD 20740, USA d Spatial Sciences Laboratory in the Department of Ecosystem Science and Management, Texas A&M University, College Stations, TX 77845, USA e Grassland, Soil & Water Research Laboratory USDA-ARS, Temple, TX 76502, USA b c

a r t i c l e i n f o

a b s t r a c t

Article history: Received 18 September 2012 Received in revised form 27 January 2013 Accepted 13 March 2013 Available online 28 April 2013

With enhanced data availability, distributed watershed models for large areas with high spatial and temporal resolution are increasingly used to understand water budgets and examine effects of human activities and climate change/variability on water resources. Developing parallel computing software to improve calibration efficiency has received growing attention of the watershed modeling community as it is very time demanding to run iteratively complex models for calibration. In this research, we introduce a Python-based parallel computing package, PP-SWAT, for efficient calibration of the Soil and Water Assessment Tool (SWAT) model. This software employs Python, MPI for Python (mpi4py) and OpenMPI to parallelize A Multi-method Genetically Adaptive Multi-objective Optimization Algorithm (AMALGAM), allowing for simultaneously addressing multiple objectives in calibrating SWAT. Test results on a Linux computer cluster showed that PP-SWAT can achieve a speedup of 45e109 depending on model complexity. Increasing the processor count beyond a certain threshold does not necessarily improve efficiency, because intensified resource competition may result in an I/O bottleneck. The efficiency achieved by PP-SWAT also makes it practical to implement multiple parameter adjustment schemes operating at different scales in affordable time, which helps provide SWAT users with a wider range of options of parameter sets to choose from for model(s) selection. PP-SWAT was not designed to address errors associated with other sources (e.g. model structure) and cautious supervision of its power should be exercised in order to attain physically meaningful calibration results. Ó 2013 Elsevier Ltd. All rights reserved.

Keywords: Parallel processing Evolutionary multi-objective optimization High performance computer Soil and water assessment tool Parameter calibration

Software availability Software: PP-SWAT Developer: Xuesong Zhang, Joint Global Change Research Institute, Pacific Northwest National Laboratory and University of Maryland, 5825 University Research Court Suite 1200, College Park, MD 20740, USA Operating systems: Linux Dependent software: Python 2.6 or above, mpi4py, and OpenMPI Availability: Free of charge; contact the first author to request a copy. * Corresponding author. E-mail address: [email protected] (X. Zhang). 1364-8152/$ e see front matter Ó 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.envsoft.2013.03.013

1. Introduction Watershed models have been widely used by researchers and decision makers to understand hydrological, ecological, and biogeochemical processes and to examine effects of human activities and climate change/variability on water quantity and quality. These models, however, require careful calibration of a large number of parameters mostly due to measurement limitations and scaling issues (Beven, 2000). Manual and/or automatic parameter optimization has become a standard procedure by the researchers to reduce the disagreement between one or more model predicted and observed variables. With the advancement of powerful geographic information system (GIS) technology and availability of high spatial-resolution data, the last few decades have seen a substantial increase in process complexity and resolution (both

X. Zhang et al. / Environmental Modelling & Software 46 (2013) 208e218

spatial and temporal) in applications of watershed models (Gupta et al., 1998). While enabling better representation of intricate watershed dynamics, these advancements demand more intensive computation, making manual calibration an increasingly labor intensive and time consuming task. Therefore, automatic calibration procedures, sometimes in conjunction with manual calibration, have become more and more an acceptable and popular procedure (Gupta et al., 1999; Van Liew et al., 2005). As model calibration often involves thousands of iterative model runs, an implementation of automatic parameter optimization still may take days, weeks, and even months for some computationally demanding process-based watershed models if sequentially executed (Zhang et al. 2008; Razavi et al. 2012). Thus, computational burden is a major concern in the application of complex watershed models. Recent studies have demonstrated the applicability of combining parallel computation techniques and High-Performance Computing (HPC) systems to enhance the efficiency of calibrating watershed models (e.g. Vrugt et al., 2006; Tang et al., 2007). One popular type of methods used distributed-memory HPC systems, in which a large application was decomposed into a number of parallel tasks which communicated with each other through the Message Passing Interface (MPI), to either simultaneously execute multiple model runs during parameter optimization or parallelize the computation tasks within one model run. Both strategies have shown significant time savings in the application of water resource models. Many examples on expediting parameter optimization can be found in the literature. For example, Whittaker (2004) and Confesor and Whittaker (2007) developed a parallel genetic algorithm library (PGAPACK) in FORTRAN that combines the clusterbased parallel computing technique with the non-dominated sorting genetic algorithm II (NSGAII, Deb et al., 2002) to calibrate SWAT and successfully tested this method on a Beowulf cluster consisting of a server and 12 computation nodes. Vrugt et al. (2006) described a parallel implementation of the Shuffled Complex Evolution Metropolis (SCEM-UA) using Matlab with the MPITB tool box for GNU Octave. Their test results showed that the speedup (calculated as the ratio of wall clock time required by sequential computation to that of the enhanced parallel computing algorithm) gained was nearly linear to the number of processors allocated for running the Sacramento Soil Moisture Accounting (SAC-SMA) model on a distributed computer system. Tang et al. (2007) tested a parallelized version of the epsilon-NSGAII with the Argonne National Laboratory’s Infinicon MPI on a Linux cluster for the SACSMA model and a long-term groundwater monitoring (LTM) application and also found that linear speedup can be achieved by increasing the number of processors with a MastereSlave (MS) parallelization scheme. Sulis (2009) attained significant time savings via a grid-computing approach to optimizing multi-reservoir operating rules efficiently and effectively in an application to a water system in Southern Italy. Engineering the model code to carry out computation tasks simultaneously on multiple computational units has also demonstrated its potential in terms of time savings. For instance, Zhan (2005) developed Fortran-based MPI software that can invert the Laplace transform in parallel for groundwater modeling problems involving oscillatory water levels. Li et al. (2011) used MPI with Cþþ in the parallelization of the Digital Yellow River Model through dynamic dispatching of simulation tasks to a series of computational units. Along with the recent advancement in the multi-core hardware, the shared-memory systems in which the generated parallel tasks collaborate through shared variables in single memory subsystem, have been increasingly embraced by the watershed modeling community. Rouholahnejad et al. (2012) developed a parallel calibration framework for the SWAT model, which has the advantage of

209

fully utilizing the power of modern multi-core computers. On a Windows server with 24 cores, their method can achieve a speedup of around 10 folds. Wu and Liu (2012) developed parallel parameter estimation program for SWAT using the R statistical system (www. r-project.org). Zhang et al. (2012) developed a Python-based optimization software for SWAT which can improve the efficiency of SWAT optimization by about eight-fold on a Linux server with 16 cores. The recent development in graphical processing units (GPUs) with support for general computing tasks has seen increased applications in environmental modeling (Singh et al., 2011; Kalyanapu et al., 2011; Bryan, 2013). In addition, cloud computing is expected to play an increasingly important role in scientific computing in the future (Miller, 2009). Given the continuously improving availability of diverse parallel computing techniques and resources, there has been and will continue to be a pressing need for new methods and software allowing for executing routine modeling exercises in parallel mode on different types of HPC systems. Here, we introduce a Python-based Parallel computing software package for efficient multi-objective calibration of SWAT (PPSWAT), which combines Python (python.org), mpi4py (Dalcin et al., 2011), OpenMPI (www.open-mpi.org), and AMALGAM (Vrugt and Robinson, 2007). We selected the SWAT model (Arnold et al., 1998), a continuous, long term, distributed-parameter hydrologic model, because it has been incorporated into the U.S. Environmental Protection Agency (US EPA) Better Assessment Science Integrating Point & Nonpoint Sources (BASINS) software package, and is also being considered as a core watershed model by the United States Department of Agriculture (USDA) for applications in the Conservation Effects Assessment Project (CEAP) (Richardson et al., 2008) across watersheds in the U.S. For multi-objective optimization problems, multiple trials of the optimization procedures (each with over ten thousand SWAT runs) are required to approximate Pareto optimal parameter sets that are not dominated by other parameter solutions in the parameter space (Zhang et al., 2010). As one run of SWAT may take minutes or even hours (e.g. Zhang et al., 2009; Rouholahnejad et al., 2012), a greater degree of efficiency is needed when carrying out parameter estimation of a high resolution SWAT study on a large scale watershed. In previous studies, speedups of around 15 or less have been reported for parallelized parameter optimization of SWAT (e.g. Whittaker, 2004; Rouholahnejad et al., 2012; Zhang et al., 2012). It is of significant practical value to explore the use of super computers with hundreds of computation units for efficient calibration of the SWAT model. As our main objective, on a Linux cluster, we carried out a suite of numerical experiments using PPSWAT to demonstrate its capability to improve the efficiency of calibrating SWAT. With the efficiency achieved by PP-SWAT, we assessed the effects of adjusting parameter at subbasin and grouped-subbasin scales on SWAT calibration, as compared to the commonly used watershed scale parameter adjustment scheme. Through an experiment examining the implications of model structure setup for calibrating SWAT, we also discussed the importance of using auxiliary information and expert knowledge in building an SWAT model to represent adequately the behavior of a specific watershed. Overall, PP-SWAT is expected to serve as a useful tool for efficiently calibrating the SWAT model and help SWAT users find parameter solutions that represent well the watershed behaviors. 2. Materials and methods 2.1. Evergreen computing cluster The Evergreen computing cluster and clusters of similar design are popular, inexpensive, and widely available in sizes ranging from a few dozen processors up to world-class systems of more than 100,000 processors. Therefore, we expect the

210

X. Zhang et al. / Environmental Modelling & Software 46 (2013) 208e218

lessons learned from running PP-SWAT on Evergreen to be applicable to current and future research programs using SWAT and related models. In brief, the Evergreen computing cluster comprises 284 compute nodes, plus a handful of I/O, login, and miscellaneous service nodes. All user jobs are run on the compute nodes, while the auxiliary nodes provide services such as interactive logins, network access to the outside world, and access to long-term storage. Each compute node hosts a pair of Intel Xeon 5560 (quad-core) or 5660 (hex-core) microprocessors running at 2.8 GHz. The system has a total of 2512 cores available. Each core has a peak throughput (Rpeak) of 11.2 GFLOPS (billion floating-point operations per second). Thus, the total Rpeak for the system is 27686 GFLOPS, or approximately 27.7 TFLOPS. Each node has 48 GB of memory, for a total system memory of 13.6 TB Fig. 1. Two user-accessible networks link the compute nodes. The primary network is a QDR InfiniBand (IB) network running at 40 gigabits per second (Gbps) line rate (32 Gbps data rate). The round-trip latency (i.e. the time required to send a zerolength message and receive a response) on the IB network is 1.7 ms. Thus the IB network provides both high throughput and rapid response times. It is ideal for running distributed calculations in which the compute nodes performing the calculations must exchange data frequently. The secondary network runs on gigabit ethernet. This network has much lower throughput (approximately A Gbps line rate) and much higher latency (typically around 100 ms). It is unsuitable for calculations that require extensive communication between compute nodes. However, the Ethernet network’s inferior latency and throughput are not an impediment to loosely-coupled applications that require only infrequent, short communication between processes. Therefore, in practice, Evergreen’s ethernet network does see use in calculations that for technical reasons are unable to use the IB network. PP-SWAT is an example of an application that cannot use the IB network. PPSWAT uses Python to manage much of the calculation, including the MPI communication and the AMALGAM algorithm implementation. SWAT evaluations were launched from the Python code via the POSIX fork(2) and exec(3) system calls (packaged together in the Python os.system() function). When using this technique in together with MPI, the user must guarantee that the child code does not touch any of the special “registered” memory used by the InfiniBand stack before calling exec(3). When the driver code is written in Python, it does not seem to be possible to respect this guarantee because parts of the Python runtime are not under the user’s control (see more detailed discussion in the Supplement). Therefore, we found that the PP-SWAT only runs reliably over the Ethernet network (which lacks this requirement). We believe that this workaround will be necessary in any software stack that uses MPI with a Python driver and uses os.system() to launch a code written in a lower-level language, unless steps are taken to address these issues by the HPC community. Similar issues would probably arise when using a driver written in any other high-level language with an extensive runtime, including popular choices like Perl and Ruby. Evergreen users’ storage needs are supported by a 1.4 PB Lustre filesystem that is accessible from all nodes (compute and auxiliary) in the system. Lustre is designed to allow parallel access to files from any or all compute nodes simultaneously. This

capability supports higher read/write performance than is achievable with a conventional network filesystem. The filesystem is fully backed up and is available for long-term storage for user data. 2.2. Description of SWAT The SWAT is a watershed scale, continuous, long term, distributed-parameter model that can simulate surface and subsurface flow, soil erosion and sediment deposition, and nutrient fate and movement through watersheds (Arnold et al., 1998). SWAT divides a watershed into subbasins connected by a stream network, and further delineates Hydrologic Response Units (HRUs) consisting of unique combinations of land cover and soils in each subbasins. The major hydrologic routines within SWAT account for snow fall and melt, surface runoff, vadose zone processes (i.e. infiltration, evaporation, plant uptake, lateral flows, and percolation), groundwater flows, and river routing. SWAT has been applied worldwide to assist in hydrologic water resource availability, pollutant fate and transport, and sustainable watershed management (Gassman et al., 2007). 2.3. Structure and functions of PP-SWAT PP-SWAT was programmed in Python, a free and powerful programming language that runs across multiple platforms such as Windows and Linux/Unix. Free software is available to provide Python with bindings to Message Passing Interface (MPI). PP-SWAT employs AMALGAM to resolve the multi-objective optimization problems and uses a combination of mpi4py and OpenMPI to allow for accessing HPC parallel computing resources. We selected AMALGAM in PP-SWAT, because it performed the best among several evolutionary multi-objective optimization methods tested for SWAT over a suite of watersheds representing a wide range of watershed characteristics (e.g. climate, land use, soils, and size) (Zhang et al., 2010). AMALGAM adaptively and simultaneously employs multiple Evolution Multi-objective Optimizers (EMOs) to ensure a fast, reliable, and computationally efficient solution to multi-objective optimization problems (Vrugt and Robinson, 2007). It starts with a random initial population Pt of size N, where the subscript t represents the iteration number ranging from 0 (initial population) to T (the maximum number of iteration). The fast non-domination sorting (FNS, Deb et al., 2002) is used to calculate the rank of each individual in Pt. An offspring population P t of size N is subsequently created by implementing each candidate algorithm within AMALGAM to generate a prescribed number of offspring parameter solutions, N ¼ fNt1 ; Nt2 ; .; Ntk g, from Pt, where k denotes the ID of each candidate EMO within AMALGAM. The best N solutions within Rt ¼ Pt WP t are selected into Ptþ1, which is further evolved by AMALGAM until the maximum number of iterations is reached. The two key procedures in AMALGAM are simultaneous multi-method search and self-adaptive offspring creation. AMALGAM dynamically changes the number of parameter solutions contributed by each EMO into the offspring population based on its instantaneous

Fig. 1. Illustration of the structure of EVERGREEN. There are a total of 2512 processors, 825 linked by a 40 Gbps Infiniband network and a 1 Gbps Ethernet network. The Lustre filesystem 826 has 1.4 PB of available storage.

X. Zhang et al. / Environmental Modelling & Software 46 (2013) 208e218 is updated according to performance. That is, fNt1 ; Nt2 ; .; Ntk g P i n Þ, where Oi =N i Þ= kn ¼ 1 ðOnt =Nt1 Nti ¼ N  ðOit =Nt1 t t1 is the ratio of the number of offspring points an algorithm contributes to the new population (Oit ) and the cori ). For responding number that the algorithm created in the previous generation (Nt1 the first generation, N01 ¼ N02 ¼ . ¼ N0k ¼ N=K, where K is the number of i candidate EMOs used in AMALGAM. The minimum value of Nt is 5. Basic controls of AMALGAM were determined following Vrugt and Robinson (2007). Here we used four candidate EMO algorithms in AMALGAM, including NSGA-II, Particle Swarm Optimization (PSO, Kennedy and Eberhart, 2001), adaptive metropolis search (AMS, Haario et al., 2001), differential evolution (DE, Storn and Price, 1997). The outputs from AMALGAM are not one single optimal parameter solution, instead multiple non-dominated parameters solutions are generated. If one parameter solution is not inferior to the other solutions in terms of all the objective functions to be optimized, then it is called non-dominated. The use of multi-objective optimizers and the concept of non-dominance in SWAT calibration have been discussed in previous research (e.g. Zhang et al., 2010) and were not detailed here. Binding Python with OpenMPI through mpi4py enables PP-SWAT to simultaneously submit jobs to cores distributed across computing nodes in a Master-Slave configuration (Fig. 2). The key steps implemented in PP-SWAT are similar to that of a sequential version of AMALGAM, except that PP-SWAT performs the tasks highlighted in the shaded boxes in parallel while a sequential version of AMALGAM executes these operations one after another. Basically, PP-SWAT sequentially implements program initialization (reading in default parameters and specifications of objective functions) and the AMALGAM operators (parameter initialization, performance ranking, changing the number of parameter solutions contributed by each EMO, and producing offspring population of parameter sets) as introduced in the above paragraph. Two tasks were specially treated to be performed in parallel. The first parallelized task is creating C (equal to the number of Slave cores) copies of SWAT input files, which allows the C Slave cores to simultaneously and separately run SWAT. The other parallelized task is the execution of SWAT, which contains three major operations. We developed three modules to perform these three operations, respectively. [1] a Modifying Parameters (MP) module to adjust parameters of each HRU, subbasin, and reach during each iteration of the AMALGAM; [2] a Run SWAT

211

(RS) module to execute the SWAT model for the specified simulation period and print out the simulation results at a specific subbasin or reach that are compared with observations; and [3] a Calculation of Objective functions (COF) module compares the simulated variable of interest (e.g. streamflow) at a specific location (e.g. subbasin or reach) with the corresponding observed data for a specified time period and calculates values of the specified objective function types (e.g. NashSutcliffe efficiency). The COF module does not only calculate objective functions using simulated and observed streamflow but also fractions of different water budget components (e.g. tile flow percentage of total water yield). The MP module in PP-SWAT provides modelers the flexibility of adjusting parameters at different spatial scales. Each dimension of the parameters to be calibrated is defined by a unique combination of parameter name (e.g. CN2 and ESCO), subbasin/groupede subbasin number, parameter change approach (i.e. percentage change, addend, and replacement), range of parameter change (e.g. 10% or 20 mm), upper and lower limits of the parameter, and the extension of the files storing the parameter. This parameter identification strategy is very similar to that employed in the SWAT_Edit.exe program described by Yang et al. (2007), except that PP-SWAT not differentiate parameters for each soil layer, but allows for varying parameters at a grouped-subbasin scale. In many cases, a set of parameter adjustments (multiplier or addend) is applied to the HRUs across the entire watershed during calibrating SWAT, omitting spatial heterogeneity of parameters. Currently, PP-SWAT can adjust parameters at three scales, i.e. watershed scale parameter adjustment (WSPA), grouped-subbasin scale parameter adjustment (GSPA), and subbasin scale parameter adjustment (SSPA). The refining of parameter adjustment scales often result in a substantial parameter dimension increase, which can result in much more complex optimization problems and consequently longer iterations of the parameter optimization process. In this context, improving the efficiency of executing SWAT through parallel computing is important for making practical fine resolution parameter adjustments. 2.4. Description of test watersheds We selected two watersheds to demonstrate the functions and performance of PP-SWAT. The first is the Little River Experimental Watershed (LREW) that includes the upper 334 km2 of the Little River in southwest Georgia and has been a subject of a long-term hydrologic and water quality research by USDA-ARS and their collaborators (Sheridan, 1997). Briefly, this region is gently sloping and has a humid subtropical climate. Soils in LREW are predominantly sandy and sandy loams underlain by shallow, relatively impermeable subsurface horizons (Sheridan, 1997). Land use within the watershed is approximately 50% woodland, 31% cropland, 10% pasture, 2% water, and 7%urban. The land use, soil, and climate data used to build the SWAT project are obtained from Bosch et al. (2007). The LREW represents a relatively simple watershed simulation case, because we only delineated 27 subbasins and defined 42 HRUs and prepared 479 files for the LREW SWAT model. Streamflow at three monitoring stations is used to calibrate SWAT (Fig. 3a). The second is the South Fork Watershed (SFW) of the Iowa River which covers about 788 km2 and is one of the USDA CEAP benchmark watersheds (Richardson et al., 2008). The watershed is located on the Des Moines Lobe. The main branch of the South Fork River is gauged by a U.S. Geologic Survey gauge (USGS 05451210, here named SF450). The other three gauges (SF400, BC450, and TC325) are maintained by the USDA-ARS National Laboratory for Agriculture and the Environment (NLAE) (Fig. 3b; Tomer et al., 2008). In the SFW, SWAT model setup has been structured and simulated based on Beeson et al. (2011). Like most of the Des Moines Lobe, the watershed is dominated by pothole depressions and artificial subsurface tile drainage, needed to drain the hydric soils which cover nearly 54% of the watershed (Tomer and James, 2004). In the SWAT model setup, the drain tiles are assigned to cropland HRUs based on limited tile maps (hand-drawn on historic aerial photos and on hydric soil areas) and the location on the landscape. This process generates 759 tiled HRUs out of 1017 HRUs (588 km2 or 75% of the watershed). This is consistent with Green et al. (2006), who reported that 80% of the watershed was tiled and that tile discharge represents 60e75% of the total streamflow (Tomer and James, 2004; Green et al., 2006). Land use in the watershed is 84% cropland, and the rest is mostly pasture or forest with limited urban areas. Although the area of SFW is only twice as that of the LREW, we delineated 86 subbasins and defined 1017 HRUs with 7000 input files for SWAT in the SFW. This disproportional increase in the number of HRUs is mainly caused by (1) the use of the Soil Survey Geographic (SSURGO, soils.usda.gov/survey/geography/ssurgo) data with a fine scale of 1:24,000 as compared to the State Soil Geographic (STATSGO, soils.usda.gov/survey/geography/statsgo) data with a scale of 1:250,000 used in LREW; and (2) the subdivision of cropland by considering crop rotation and tillage practices (Beeson et al., 2011). In contrast to the LREW, the SFW SWAT model represents a relatively complex test case. 2.5. Efficiency improvement evaluation metrics and optimization objective functions

Fig. 2. Diagram on parallel execution of AMALGAM for SWAT on distributed HPCs with a Master-Slave configuration. (Note: although general discussion has been provided by Zhang et al. (2009; 2010), the maximum iteration required to achieve satisfactory calibration results is expected to vary across watersheds and is dependent on multiple factors such as parameter dimension and ranges and complexity of hydrological processes).

We used speedup (Pacheco, 2011) to measure the efficiency improvement achieved by PP-SWAT. This metric is calculated as:

Sp ¼

T1 Tp

(1)

212

X. Zhang et al. / Environmental Modelling & Software 46 (2013) 208e218

Fig. 3. Location of the Little River Experimental Watershed in Georgia (a) and South Fork Watershed in Iowa (b). where T1 is the time taken by the program to run on 1 processor, and Tp represents the time it takes to run the same code on P processors. Speedup measures how much faster a program runs in parallel. We executed PP-SWAT five times for each test case. For each execution, we set 2000 SWAT runs as the limit, because this number was sufficiently larger than the maximum number of cores (500) available for this research while not too large to consume too much time and resources. The results were averaged to derive the speedup of PP-SWAT with different number of processors. We employed three optimization objective functions (OOFs) in the LREW and five in the SFW. The OOFs in the LREW are Nash-Sutcliffe efficiency (NSE, Nash and

Sutcliffe, 1970) at I, J, and B monitoring stations calculated using daily streamflow data from 1997 to 2004. In the SFW, OOFs included four NSEs at the SF400, SF450, BC350, and TC325 monitoring stations calculated using daily streamflow from 2000 to 2011. In both watersheds a two-year warm-up was used to initialize watershed state variables such as soil water. 2.6. Parameters used to calibrate SWAT The objective of the original SWAT model’s design was to operate in large-scale ungauged basins with little or no calibration efforts (Arnold et al., 1998; Srinivasan

X. Zhang et al. / Environmental Modelling & Software 46 (2013) 208e218 et al., 2010). Therefore, most of the SWAT parameters can be estimated automatically using the GIS interface and meteorological information, combined with internal model databases (Manguerra and Engel, 1998; Srinivasan et al., 1998; Arnold et al., 1999; Fontaine et al., 2002). The parameters and associated allowable adjustment ranges, default values from ArcSWAT database (Winchell et al., 2010), and initial values obtained by literature review are shown in Table 1. These settings were based on Arnold et al. (2011), Van Griensven et al. (2006), Green et al. (2006), Van Liew et al. (2005, 2007), Zhang et al. (2009) and Beeson et al. (2011), and recommendations by Drs. Jeffrey G. Arnold and Mauro DiLuzio at USDA-ARS at Temple, Texas (Pers. Comm., 2012). Specific model structure and parameter settings are described in the results and discussion section for each experiment. 2.7. Test cases We designed four sets of test experiments to: 1) demonstrate the potential of PP-SWAT to improve the efficiency of SWAT calibration (T1); 2) examine PP-SWAT performance as influenced by parameter adjustment scales (T2); and 3) understand implications of model structure setup for calibrating SWAT to represent watershed behavior (T3). In T1, we examined the efficiency achieved by PP-SWAT in the LREW with simulation periods of 8 (1997e2004) and 3 (1997e1999) years and in the SFW with a simulation period of 14 (1998e2011) years. The speedup was calculated using a series of number of cores. For T2, we applied three parameter adjustment scales, i.e. watershed scale parameter adjustment (WSPA), groupedsubbasin scale parameter adjustment (GSPA), and subbasin scale parameter adjustment (SSPA), in both LREW and SFW to derive results for analysis. We aggregated subbasins in LREW and SFW into three and four groups, respectively. Those subbasins that are contributing to the same stream gauge are marked with a unique color and taken as one group (Fig. 3). In LREW, stream gauges I, F and B correspond to 5, 5, and 17 subbasins, respectively. The stream gauges SF400, BC350, SF450, and TC325 in SFW correspond to 34, 15, 14, and 23 subbasins, respectively. For T3, we launched PP-SWAT with two model structures with and without consideration of tile drainage and pothole to assess effects of omitting important hydrologic processes on model calibration. The importance of incorporating expert knowledge of watershed characteristics also was examined for improving model calibration to attain better representation of watershed behavior. In T2 and T3, we selected a population size of 100 for AMALGAM following Zhang et al. (2010) and used 51 cores based on our speedup vs. number of cores used analysis derived for the SFW case.

3. Results and discussion 3.1. Speedup achieved by PP-SWAT The speedup and wall time consumed by one run of SWAT were plotted against the number of cores in Fig. 4. A sequential run of SWAT in LREW took around 14.3 and 4.7 s respectively for 8-year

213

and 2-year simulations (LREW-8y and LREW-2y). While using small numbers of cores (e.g. 20 and 40), both the 8y and 2y curves were close to each other and scaled approximately linearly with the number of cores used (Fig. 4a). For LREW-2y, the speedup did not improve after 150 cores. The LREW-8y experiment achieved better parallel scaling, but eventually saturated around 200e300 processors. Speedup even declined in absolute terms for processor counts greater than about 450. This phenomenon is an extreme example of I/O saturation, in which the I/O demands so greatly exceed the system’s capacity that certain elements of the filesystem (notably, the metadata server) begin to perform less efficiently. This effect caused a further drop in performance, which accounted for the decline in speedup for very high processor counts. Note that, there was a small drop in speedup around 100 cores before reaching saturation for the LREW-2y experiment, which might be caused by the interference of other jobs involving intensive I/O operations. In SFW, a sequential run of SWAT with one core took about 120 s. By increasing the number of cores to 20, 40, and 60, we observed a nearly linear increase of speedup from 19, to 36, and to 46 (Fig. 4b), respectively. However, performance saturated at 60 processors and declined thereafter. This confirmed our finding in LREW that the benefit of using more cores can be canceled out by the penalty on performance caused by I/O overload. In the SFW case, adding more processors beyond 60 not only failed to provide any benefit; it was counterproductive. Overall, PP-SWAT showed its potential to reduce the time required to calibrate SWAT by a factor of 40e110 on Evergreen. For SFW, this improvement allowed us to complete 20,000 runs of SWAT in roughly 14 h with 60 cores compared to 644 h when using only one core. If multiple trials (five times in this research) were employed to approximate better the Pareto front that comprised of all non-dominated parameter solutions, the time saving benefit could reach about 3153 h (131 days). Even for the relatively simple LREW-8y experiment, PP-SWAT reduced one trial of SWAT calibration from 80 h to less than one hour. One SWAT run on a typical personal computer with multiple cores demanded less time than using one core on Evergreen. We observed that personal computers with two, four, and eight cores reduced the time required by one

Table 1 Parameters for calibration in the SWAT model. Code

Parameter

Description

Range of parameter values/ change percentage

Default

Initial adjustments

1

CN2

Curve number

ESCO GW_REVAP REVAPMN

) ) )

5

GWQMN

0e5000

1.0

)

6 7 8

GW_DELAY RCHRG_DP CH_K2

0e50 0e1 0.0e150

31 0.05 0.0

) ) )

9 10 11 12 13 14 15

SURLAG DDRAIN TDRAIN GDRAIN POT_FR POT_TILE POT_VOLX

Soil evaporation compensation factor Ground water re-evaporation coefficient Threshold depth of water in the shallow aquifer for re-evaporation to occur (mm) Threshold depth of water in the shallow aquifer required for return flow to occur (mm) Groundwater delay (days) Deep aquifer percolation fraction Effective hydraulic conductivity in main channel alluvium (mm hr1) Surface runoff lag coefficient (day) Depth to subsurface tile drain (mm) Time to drain soil to field capacity (hr) Drain tile lag time (hr) Fraction of HRU that drains into pothole Average daily outflow to main channel from tile flow (mm) Maximum volume of water stored in the pothole (mm)

55e77a 59e87b 0.95 0.02 1.0

)

2 3 4

10% to 10%a 50%e10%b 0.75e1.0 0.02e0.2 0e500

0.5e10 20e20%@,* [email protected] [email protected] [email protected] [email protected] 90%e200%@

4.0 na na na na na na

) 1000y 24y 96y 0.0e0.2$ 25$ 200$

Note: ) denote inheritance from default values; @ pothole and tile drainage related parameters; only applied in SFW; *values of DDRAIN are not allowed to exceed maximum soil depth 200 mm (Beeson et al., 2013); y from Green et al. (2006); $ from Beeson et al. (2013). a For LREW. b For SFW.

214

X. Zhang et al. / Environmental Modelling & Software 46 (2013) 208e218

vary at watershed scale, therefore, the parameter dimensions of SSPA and GSPA were less than 15 (subbasin or subbasin group number). Although we repeatedly simulated PP-SWAT with long iterations, the intricacy of solving high dimension problems made it difficult for PP-SWAT to provide better performance consistently with SSPA as compared to the other two schemes. Objective function values of the non-dominated solutions were shown in Figs. 5 and 6 to illustrate the sensitivity of SWAT calibration to different parameter adjustment scales. For SFW, all nondominated parameter solutions achieved separately by WSPA, GSPA, and SSPA had a tile flow fraction larger than 0.56 and less than 0.70, matching well with the expected range of tile flow fractions 0.60e0.75. As we did not have accurate basin-wide measurements of tile flow, here we left tile flow fraction out of our analysis but focused on comparing NSEs at the four streamflow monitoring stations. With different parameter adjustment scales, substantial variation was exhibited in terms of the distribution and range of the objective functions’ values (Fig. 5), indicating that parameter adjustment scale was an important factor that influenced SWAT calibration. The best NSE values at TC325, BC350, and

Fig. 4. PP-SWAT parallel speedup as a function of the number of cores used in the run. (a for LREW and b for SFW).

SWAT run by 5e25%, depending on specific computer configurations. This alerted us that the actual speedup benefit of using PPSWAT on Evergreen, for one execution, was over estimated as compared to SWAT on typical personal computers. The experimental results also showed that using too many cores may overstress the I/O system and lead to speedup decline. Conservatively selecting the number of cores helped avoid this negative effect but could not guarantee reaching the maximum speedup. Redesigning the I/O format of SWAT could further improve the efficiency of utilizing SWAT in HPC applications. 3.2. Effects of parameter adjustment scales on SWAT calibration We executed PP-SWAT five times, each with 20,000 SWAT runs, to optimize multiple objective functions simultaneously and derive non-dominated solutions for analysis. For SFW, PP-SWAT started from the initial parameter settings described in Table 1 while LREW-8y started from the default settings. Theoretically, SSPA should obtain objective function values at least as good as those achieved by the GSPA and WSPA, because the parameter solution space of WSPA was a subset of that of GSPA, which was further encompassed in the parameter solution space of SSPA. However, this was not always true because adjusting parameters at a finer scale led to optimization problems with higher parameter dimensions and more complex mapping relationships between parameter solutions and objective function values. For example, in SFW, the parameter dimensions for SSPA, GSPA, and WSPA were 1205, 57, and 15 respectively. Note, SURLAG was only allowed to

Fig. 5. Dominance relationship between non-dominated parameter solutions obtained with different parameter adjustment scales in LREW.

X. Zhang et al. / Environmental Modelling & Software 46 (2013) 208e218

SF400 obtained by SSPA were pronouncedly higher than those found by GSPA and WSPA. At SF450, GSPA achieved higher NSE values than SSPA. The best NSE values reached by WSPA were consistently lower than those of GSPA and SSPA, though, suggesting the importance of addressing the spatial heterogeneity of watershed characteristics in model calibration. For LREW, we also observed substantial difference among the distribution and range of NSE values of the non-dominated parameter solutions attained with the three parameter adjustment schemes (Fig. 6). Similar to the SFW results, SSPA did not always achieve better NSE values than the other two schemes, while WSPA never attained one of the best NSE values at the three monitoring stations. We further analyzed the dominance relationship between nondominated solutions obtained by different parameter adjustment schemes. We used three sub-plots in Fig. 5 to illustrate the dominance relationships in LREW. To save space, only two sub-plots in Fig. 6 were used in SFW. For SFW, by combining and comparing the non-dominated solutions obtained by the WSPA, GSPA, and SSPA, a total of 105 parameter sets were selected into the final nondominated solutions, among which SSPA and GSPA contributed 90 and 15, respectively. All parameter solutions from the WSPA were dominated. Similar results were obtained for LREW. That is, 37 and 43 members of the final non-dominated parameter solutions were contributed by SSPA and GSPA respectively, while none from WSPA. 3.3. Impacts of model structure setup for calibrating SWAT to represent watershed behavior Parallel computing based auto-calibration software, such as PPSWAT, can benefit modelers by significantly improving calibration efficiency. However, to ensure that calibration results are

215

physically meaningful, selecting model structures that appropriately represent key hydrologic processes is a critical procedure. In SFW, using the default parameters from the SWAT database (Table 1) which did not characterize the potholes or tile drainage, none of the NSE values at the four monitoring stations was larger than 0.5 (Table 2). Through literature review and initial parameterization of pothole and tile drainage characteristics, all four objective function values improved, especially at SF400. However, the SWAT performance was still not satisfactory. The default and initial parameterization did not yet provide credible tile flow simulation, with tile flow fractions of 0% and 6.5% respectively. These results suggest that careful parameter calibration is important to further improve SWAT performance, especially regarding tile flow. Starting from each of the default and initial sets of parameters, we implemented PP-SWAT with SSPA to calibrate SWAT. Note, the experiment based on default parameter settings only needed to adjust 9 parameters (Table 1) and test four objective functions, because tile drainage and pothole-related processes were not addressed. This experiment was referred to as SFW-default-SSPA. We also repeated the calibration procedures five times (each with 20,000 runs) to derive non-dominated parameter solutions. The results obtained by the SFW-default-SSPA experiment shown in Fig. 7 were compared with those from the SFW-initial-SSPA experiment, which denoted the experiment executing PP-SWAT with SSPA and the initial parameter settings. Results of the SFWinitial-SSPA experiment were obtained from the previous PPSWAT executions in SFW as shown in Fig. 6. To facilitate comparisons between SFW-default-SSPA and SFW-initial-SSPA, objective function values of the non-dominated parameter solutions from these two experiments were both plotted in Fig. 7. Overall, the non-dominated solutions found by the SFWdefault-SSPA experiment were better than those obtained by SFW-initial-SSPA in terms of NSEs at the four monitoring stations (Fig. 7). All NSE values obtained by the SFW-default-SSPA experiment were larger than 0.50 and can be taken as satisfactory in terms of streamflow simulation (Moriasi et al., 2007). However, the contribution of tile flow was zero in this experiment due to the omission of tile drainage structures. In the SFW-default-SSPA experiment, the calibrated parameters compensated for the incorrect model structure to nevertheless yield a good match with observed streamflow. In contrast, the SFW-initial-SSPA experiment arrived at over 50 parameter solutions that not only had NSE values larger than 0.50 at all four streamflow stations but also simulated well tile flow contribution fractions (calculated as tile flow divided by total water yield) between 0.62 and 0.69, which fell within the expected range of 60e75%. This result corroborates previous research (e.g. Wagener and Gupta, 2005) and illustrates the importance of verifying model structures prior to launching PPSWAT. Otherwise, the parameter calibration may lead to a false impression that the model performs satisfactorily, though it essentially omits significant hydrological processes in a specific watershed. Inadequate consideration of model structure may lead to non-meaningful calibration results (Kollat et al., 2012).

Table 2 Objective function values at the four monitoring stations and tile flow fractions in the SFW with default and initial settings of parameters.

Fig. 6. Illustration of dominance relationship between non-dominated parameter solutions obtained with different parameter adjustment scales in SFW.

Objective functions

Default

Initial

NSE NSE NSE NSE TFF

10.89 0.42 0.21 0.24 0

0.02 0.45 0.24 0.26 0.065

at at at at

SF400 BC350 SF450 TC325

216

X. Zhang et al. / Environmental Modelling & Software 46 (2013) 208e218

Fig. 7. Comparison of objective function values of non-dominated parameter solutions as impacted by changing initial model settings.

3.4. Discussion PP-SWAT is the first parallel computing software programmed in Python for multi-objective calibration of SWAT. During the testing of PP-SWAT, we found that the combination of Python and OpenMPI caused jobs to fail when using InfiniBand for connections between the compute nodes; however, these jobs were able to run successfully on the Ethernet network. We believe this to be an inevitable result of mixing Python and MPI when worker processes spawn additional processes using os.system() or a similar facility, and should therefore be taken into consideration by any researchers attempting a similar setup. Through combining Python, mpi4py, and OpenMPI, PP-SWAT achieved speedups around or larger than 50 for two test watersheds on a distributed computing cluster, which are much higher than the speedups reported in recent literature on parallel parameter calibration of SWAT (e.g. Rouholahnejad et al., 2012; Zhang et al., 2012; Yalew et al., 2013), illustrating the high potential of using distributed clusters to run SWAT efficiently. The efficiency test results vary widely depending on the run configuration, but one feature evident in all of the calculations was a point beyond which adding more processors would not increase performance. This phenomenon was a result of saturating the cluster’s I/O subsystem. As the rate of I/O requests approaches the system’s capacity, data read operations begin to take longer to complete, adding a source of overhead that limits parallel performance. SWAT I/O demands are particularly intense, due to the large number of relatively small input files. The code incurs I/O overhead not just for the read operations, but for file open and close operations. Furthermore, breaking data into multiple files inhibits the buffering and caching that the Linux operating system normally performs automatically on the user’s behalf. We also observed the phenomenon that efficiency of parallel computing may decline when too many processors were used,

which has not been presented and discussed in previous SWAT literature. This result has significant implications for the application of PP-SWAT on clusters with similar configurations to Evergreen. It is important to size jobs of SWAT model runs so as to avoid overstressing the system. However, it is difficult to find a simple indicator to decide on the number of cores where maximum speedup occurs. I/O load is related to many factors, such as input files (number and size) and length of simulation period. Overall I/O performance is influenced by the capacity of metadata server and its sensitivity to I/O load increases. In addition, overhead associated with managing the task pool, interference from other jobs involving intensive I/O, and intensified memory and cache competition may further complicate the determination of an appropriate number of cores for achieving the maximum speedup. From the test results obtained in SFW and LREW, a conservative selection of 60 cores or fewer appears to achieve a speedup scaling well with the number of cores. If users want to fully exploit the power of a cluster, preliminary experiments such as these are required. For those clusters with similar configurations to Evergreen, we suggest using less than 200 cores for numerical experiments. The design of SWAT’s input files are not ideal for large-scale computing. The scaling limitations we encountered here could be greatly mitigated by consolidating the input data into a single structured input file (using, for example, tools like netCDF) that would allow the input to be read in using large block reads. Redesigning the SWAT I/O format along these lines would help eliminate I/O as a scaling bottleneck. The power of PP-SWAT allowed us to perform numerical experiments that require long iterations of the computationally intensive SWAT in affordable time to examine calibration issues that have not been fully addressed in the SWAT literature. For example, with the multiple parameter adjustment schemes in PP-SWAT that operates at different scales, we conducted a total of 600,000 SWAT runs (300,000 in LREW and 300,000 in SFW) in to enhance our understanding of the impacts of parameter adjusting resolution on SWAT calibration. Overall, addressing parameter variability at grouped-subbasin and subbasin scales helped calibrate SWAT to achieve better statistical objective function values. However, with limited computational resources and time, using finer parameter adjustment scales does not guarantee better calibration results, because the associated increase in parameter dimensionality led to more complex optimization problems to be resolved. Similar to the results reported by Zhang et al. (2009, 2010) on comparing different optimization algorithms for SWAT, we observed that the relative performance of WSPA, GSPA, and SSPA varied across different watersheds, making it difficult to generalize improvements resulting from using finer scale parameter adjustment. In this context, the parallel computing efficiency achieved by PP-SWAT allows significant time savings and makes practical applying multiple parameter adjustment scales to provide a wider range of calibration results that modelers can further evaluate and choose from. Through another numerical experiment, we emphasized the importance of appropriately setting up the structure of SWAT before launching PP-SWAT, which is often not exercised in SWAT applications. The amount of information contained in streamflow data may not be adequate to determine a large number of parameters (Jakeman and Hornberger, 1993). PP-SWAT can obtain satisfactory parameter solutions in terms of streamflow simulations, even when key hydrologic processes are overlooked. Thus, in addition to fitting streamflow data, expert knowledge and observed data of other hydrological processes are important for calibrating hydrologic models to mimic watershed behavior properly (Moriasi et al., 2007; Zhang et al., 2011; Wagener and Montanari, 2011). To obtain valid calibration results, it is suggested to examine carefully whether critical hydrologic components are incorporated into model structure before launching PP-SWAT to consume precious

X. Zhang et al. / Environmental Modelling & Software 46 (2013) 208e218

computational resources. Also progress has been made to derive new statistics that better represent information contained in streamflow data. For example, Joseph et al. (2013) showed that NSE does not address the asynchronicity between simulated and observed response hydrographs and may lead to significant underestimation of surface runoff generation when used as the objective function to be optimized, while a new event-adaptive approach they proposed can help cope with the event-flow synchronization errors and achieve more realistic estimation of water budget components. Incorporating such objective functions into PP-SWAT holds the promise to further improve the reliability of auto-calibration and deserves further research in the future. PP-SWAT is a highly modularized program (Fig. 2), whereby making it readily to be modified for different watershed models. In general, two components of PP-SWAT require revisions. First, the procedures used to read in default parameter values of distributed modeling units is expected to be revised substantially, as the organization and format of input files for other models may be completely different from those of SWAT. As well, the three modules (i.e. MP, RS, and COF modules as described in Section 2.3) that were particularly designed for executing SWAT are also expected to undergo changes. The MP module, which involves intensive input files reading/writing, should be modified to be in line with other models. For RS, we only need to change the name of the executable program to that of the model to be executed. In addition, moderate modifications of the COF modules should be conducted in order to make PP-SWAT to perform post-processing the outputs of other models and compare them with corresponding observed data. Depending on the actual difference between SWAT and another model to be used, minor efforts may be required to revise some other procedures in PP-SWAT. Overall, we do expect fundamental changes in the structure to occur in order to adapt PP-SWAT for other hydrologic/watershed models. As to replacing AMALGAM with other algorithms, the following changes are required: (1) replacing the operators employed in AMALGAM to evolve the parameter solutions and (2) including objective functions that are employed by a specific algorithm. As most of the codes in PP-SWAT (3500 lines out of a total 3900 lines) were programmed to deal with non-AMALGAM operations, such as modifying SWAT parameters, running SWAT in parallel, and calculating objective function values, we expect the efforts required to substituting other algorithm for AMALGM are much less than adapting PP-SWAT to other models. However, the actual workload is dependent on the complexity of the target algorithm. We also expect minimal efforts required for PP-SWAT to include other types of objective functions, because COF module has already possesses the functions to read in the outputs from SWAT and the corresponding observations and simple changes of equations used to compare the observed and simulation time series can easily result in different types of objective functions. 4. Conclusions In this investigation, we developed a parallel computing software package in Python, PP-SWAT, for efficient calibration of the SWAT model. This software employed Python 2.6, mpi4py and OpenMPI to parallelize the AMALGAM algorithm and allowed for simultaneously optimizing multiple objectives, such as simulations of streamflow at multiple sites within a watershed. We tested PPSWAT in two watersheds on a mid-sized cluster with a Lustre filesystem. The results showed that PP-SWAT achieved speedups of 45e110 depending on the complexity of the watershed model and capacity of the metadata server dealing with I/O operations, beyond a threshold. Increasing the number of processors did not necessarily help improve efficiency. For example in the South Fork Watershed in Iowa, using 60 processors provided the maximum

217

speedup, after which efficiency began to decline. Some of these performance limitations appeared to be related to how the SWAT input data are organized, and they could be improved upon by reorganizing the data into a more HPC-friendly format. Through experiments in LREW and SFW, we found that a widely expanded pool of parameters can be derived for modelers to choose among in comparison to only utilizing watershed scale parameter adjustment and adjusting parameters on a finer scale holds the promise to provide better model performance than applying a uniform parameter addend/multiplier for the entire watershed, as it was shown in the SFW test cases. As finer resolution parameter adjustment leads to more complex optimization problems with increased parameter dimension to resolve, the power of PP-SWAT in achieving significant time savings for long sequences of SWAT runs makes practical exercising multiple parameter adjustment schemes operating at different scales in affordable time for SWAT calibration. PP-SWAT still can yield calibration results matching observed data even when important hydrologic features, such as tile drainage, are entirely overlooked. In order to attain calibrated SWAT appropriately representing watershed behaviors, caution should be exercised in choosing appropriate model structures before executing PP-SWAT, especially for complex SWAT models that could consume days or weeks to calibrate even on a supercomputer. Overall, PP-SWAT developed here represents a unique contribution to and diversifies the existing base of SWAT automatic calibration software. The functions of PP-SWAT to assess tradeoffs among multiple optimization objectives and adjust parameters at different scales, in conjunction with its capability to achieve substantial time savings, makes it an efficient and useful tool for helping SWAT users find parameter solutions that represent well the hydrological processes of a watershed. Acknowledgments We sincerely appreciate the valuable comments provided by the four anonymous reviewers, which substantially improved the quality of this paper. This work was partially funded by the DOE Great Lakes Bioenergy Research Center (DOE BER Office of Science DE-FC02-07ER64494, DOE BER Office of Science KP1601050, DOE EERE OBP 20469-19145), and NASA (NNH08ZDA001N and NNH12AU03I). Appendix A. Supplementary data Supplementary data related to this article can be found at http:// dx.doi.org/10.1016/j.envsoft.2013.03.013. References Arnold, J.G., Srinivasan, R., Muttiah, R.S., Williams, J.R., 1998. Large area hydrologic modelling and assessment part I: model development. Journal of the American Water Resources Association 34, 73e89. Arnold, J.G., Srinivasan, R., Muttiah, R.S., Allen, P.M., 1999. Continental scale simulation of the hydrologic balance. Journal of the American Water Resources Association 35, 1037e1052. Arnold, J.G., Kiniry, J.R., Srinivasan, R., Williams, J.R., Haney, E.B., Neitsch, S.L., 2011. Soil and Water Assessment Tool Input/Output File Documentation Version 2009. Blackland Research Center, Texas Water Resources Institute, College Station, Texas. Technical Report No. 365. Available at: http://swat.tamu.edu/ media/19754/swat-io-2009.pdf (Verified on August 21, 2012). Beeson, P.C., Doraiswamy, P.C., Sadeghi, A.M., Di Luzio, M., Tomer, M.D., Arnold, J.G., Daughtry, C.S.T., 2011. Treatments of precipitation inputs to SWAT models. Transactions of ASBAE 54, 2011e2020. Beeson, P.C., Sadeghi, A.M., Lang, M.W., Tomer, M.D., Daughtry, C.S.T., 2013. Sediment Delivery Estimates in Water Quality Models Altered by Resolution and Source of Topographic Data. Journal of Environmental Quality. http://dx.doi.org/ 10.2134/jeq2012.0148. Beven, K.J., 2000. Rainfall-Runoff Modeling: the Primer. John Wiley & Sons Press, New York.

218

X. Zhang et al. / Environmental Modelling & Software 46 (2013) 208e218

Bosch, D.D., Sheridan, J.M., Lowrance, R.R., Hubbard, R.K., Strickland, T.C., Feyereisen, G.W., Sullivan, D.G., 2007. Little river experimental watershed database. Water Resources Research 43, W09470. http://dx.doi.org/10.1029/ 2006WR005844. Bryan, B.A., 2013. High-performance computing tools for the integrated assessment and modeling of social-ecological systems. Environmental Modelling and Software 39, 295e303. http://dx.doi.org/10.1016/j.envsoft.2012.02.006. Confesor, R.B., Whittaker, G.W., 2007. Automatic calibration of hydrologic models with multi-objective evolutionary algorithm and pareto optimization. Journal of the American Water Resources Association 43, 981e989. Dalcin, L.D., Paz, R.R., Kler, P.A., Cosimo, A., 2011. Parallel distributed computing using Python. Advances in Water Resources 34, 1124e1139. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T., 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6, 182e197. Fontaine, T.A., Cruickshank, T.S., Arnold, J.G., Hotchkiss, R.H., 2002. Development of a snowfall-snowmelt routine for mountainous terrain for the soil water assessment tool (SWAT). Journal of Hydrology 262, 209e223. Gassman, P.W., Reyes, M., Green, C.H., Arnold, J.G., 2007. The soil and water assessment tool: historical development, applications, and future directions. Transactions of ASABE 50, 1212e1250. Green, C.H., Tomer, M.D., Di Luzio, M., Arnold, J.G., 2006. Hydrologic evaluation of the soil and water assessment tool for a large tile-drained watershed in Iowa. Transactions of ASABE 49, 413e422. Gupta, H.V., Sorooshian, S., Yapo, P.O., 1998. Toward improved calibration of hydrologic models: multiple and noncommensurable measures of information. Water Resources Research 34, 751e763. Gupta, H.V., Sorooshian, S., Yapo, P.O., 1999. Status of automatic calibration for hydrologic models: comparison with multilevel expert calibration. Journal Hydrologic Engineering 4, 135e143. Haario, H., Saksman, E., Tamminen, J., 2001. An adaptive metropolis algorithm. Bernoulli 7, 223e242. Jakeman, A.J., Hornberger, G.M., 1993. How much complexity is warranted in a rainfall-runoff model? Water Resources Research 29, 2637e2649. Joseph, J.F., Sharif, H.O., Arnold, J.G., Bosch, D.D., 2013. The impact of asynchronicity on event flow estimation in basin-scale hydrologic model calibration. Journal of the American Water Resources Association. http://dx.doi.org/10.1111/jawr.12011. Kalyanapu, A.J., Shankar, S., Pardyjak, E.R., Judi, D.R., Burian, S.J., 2011. Assessment of GPU computational enhancement to a 2D flood model. Environmental Modelling and Software 26, 1009e1016. Kennedy, J., Eberhart, R.C., 2001. Swarm Intelligence. Morgan Kaufmann, San Mateo, CA. Kollat, J.B., Reed, P.M., Wagener, T., 2012. When are multi-objective calibration tradeoffs in hydrologic models meaningful? Water Resources Research 48, W03520 http://dx.doi.org/10.1029/2011WR011534. Li, T., Wang, G., Chen, J., Wang, H., 2011. Dynamic parallelization of hydrological model simulations. Environmental Modelling and Software 26, 1736e1746. Manguerra, H.B., Engel, B.A., 1998. Hydrologic parameterization of watersheds for runoff prediction using SWAT. Journal of the American Water Resources Association 34, 1149e1162. Miller, M., 2009. Cloud Computing Pros and Cons for End Users. Available at: http:// www.informit.com/articles/article.aspx?p¼1324280 (accessed on 18.08.12.). Moriasi, D.N., Arnold, J.G., Van Liew, M.W., Bingner, R.L., Harmel, R.D., Veith, T.L., 2007. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Transactions of ASABE 50, 885e900. Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models: part I. A discussion of principles. Journal of Hydrology 10, 282e290. Pacheco, P., 2011. An Introduction to Parallel Programming, first ed. Morgan Kaufmann, Massachusetts, USA. Razavi, S., Tolson, B.A., Burn, D.H., 2012. Numerical assessment of metamodelling strategies in computationally intensive optimization. Environmental Modelling and Software 34, 67e86. Richardson, C.W., Bucks, D.A., Sadler, E.J., 2008. The conservation effects assessment project benchmark watersheds: synthesis of preliminary findings. Journal of Soil and Water Conservation 63, 590e604. Rouholahnejad, E., Abbaspour, K.C., Vejdani, M., Srinivasan, R., Schulin, R., Lehmann, A., 2012. A parallelization framework for calibration of hydrological models. Environmental Modelling Software 31, 28e36. Sheridan, J.M., 1997. Rainfall-streamflow relations for coastal plain watersheds. Transactions of ASAE 13, 333e344. Singh, B., Pardyjak, E.R., Norgren, A., Willemsen, P., 2011. Accelerating urban fast response Lagrangian dispersion simulation using inexpensive graphics processor parallelism. Environmental Modelling and Software 26, 739e750.

Srinivasan, R., Ramanarayanan, T.S., Arnold, J.G., Bednarz, S.T., 1998. Large area hydrologic modeling and assessment: part II e model application. Journal of American Water Resources Association 34, 91e102. Srinivasan, R., Zhang, X., Arnold, J.G., 2010. SWAT ungauged: hydrological budget and crop yield predictions in the upper Mississippi River Basin. Transactions of the ASABE 53, 1533e1546. Storn, R., Price, K., 1997. Differential evolution e a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11, 341e359. Sulis, A., 2009. GRID computing approach for multireservoir operating rules with uncertainty. Environmental Modelling and Software 24, 859e864. Tang, Y., Reed, P.M., Kollat, J.B., 2007. Parallelization strategies for rapid and robust evolutionary multiobjective optimization in water resources applications. Advances in Water Resources 30, 335e353. Tomer, M.D., James, D.E., 2004. Do soil surveys and terrain analyses identify similar priority sites for conservation? Soil Science Society of America Journal 68, 1905e1915. Tomer, M.D., Moorman, T.B., Rossi, C.G., 2008. Assessment of the Iowa River’s South Fork watershed: part 1. Water quality. Journal of Soil and Water Conservation 63 (6), 360e370. Van Griensven, A., Meixner, T., Grunwald, S., Bishop, T., Di luzio, M., Srinivasan, R., 2006. A global sensitivity analysis tool for the parameters of multi-variable catchment models. Journal of Hydrology 324, 10e23. Van Liew, M.W., Arnold, J.G., Bosch, D.D., 2005. Problems and potential of autocalibrating a hydrologic model. Transactions of ASAE 48, 1025e1040. Van Liew, M.W., Veith, T.L., Bosch, D.D., Arnold, J.G., 2007. Suitability of SWAT for the conservation effects assessment project: a comparison on USDA ARS watersheds. Journal of Hydrologic Engineering 12, 173e189. Vrugt, J.A., Robinson, B.A., 2007. Improved evolutionary optimization from genetically adaptive multimethod search. Proceedings of National Academic of Science 104, 708e711. Vrugt, J.A., Nualláin, B.Ó., Robinson, B.A., Bouten, W., Dekker, S.C., Sloot, P.M.A., 2006. Application of parallel computing to stochastic parameter estimation in environmental models. Computers Geosciences 32, 1139e1155. Wagener, T., Gupta, H.V., 2005. Model identification for hydrological forecasting under uncertainty. Stochastic Environmental Research and Risk Assessment 19, 378e387. Wagener, T., Montanari, A., 2011. Convergence of approaches toward reducing uncertainty in predictions in ungauged basins. Water Resources Research 47, W06301. http://dx.doi.org/10.1029/2010WR009469. Whittaker, G., 2004. Use of a Beowulf cluster for estimation of risk using SWAT. Agronomy Journal 96, 1495e1497. Winchell, M., Srinivasan, R., Di Luzio, M., Arnold, J.G., 2010. ArcSWAT Interface for SWAT2009, User’s Guide. Blackland Research and Extension Center Texas AgriLife Research, Temple, Texas 76502, p. 495. Wu, Y., Liu, S., 2012. Automating calibration, sensitivity and uncertainty analysis of complex models using the R package Flexible Modeling Environment (FME): SWAT as an example. Environmental Modelling Software 31, 99e109. Yalew, S., van Griensven, A., Ray, N., Kokoszkiewicz, L., Betrie, G.D., 2013. Distributed computation of large scale SWAT models on the Grid. Environmental Modelling and Software 41, 223e230. http://dx.doi.org/10.1016/j.envsoft.2012.08.002. Yang, J., Reichert, P., Abbaspour, K.C., 2007. Bayesian uncertainty analysis in distributed hydrologic modeling: a case study in the Thur River Basin (Switzerland). Water Resources Research 43, W10401. Zhan, X., 2005. Parallel Fortran-MPI software for numerical inversion of the Laplace transform and its application to oscillatory water levels in groundwater environments. Environmental Modelling and Software 20, 1736e1746. Zhang, X., Srinivasan, R., Debele, B., Hao, F., 2008. Runoff simulation of the headwaters of the Yellow River using the SWAT model with three snowmelt algorithms. Journal of the American Water Resources Association 44, 48e61. Zhang, X., Srinivasan, R., Zhao, K., Van Liew, M., 2009. Evaluation of global optimization algorithms for parameter calibration of a computationally intensive hydrologic model. Hydrological Processes 23, 430e441. Zhang, X., Srinivasan, R., Van Liew, M., 2010. On the use of multi-algorithm, genetically adaptive multi-objective method for multi-site calibration of the SWAT model. Hydrological Processes 24, 955e969. Zhang, X., Srinivasan, R., Arnold, J.G., Izaurralde, R.C., Bosch, D.D., 2011. Simultaneous calibration of surface flow and baseflow simulations: a revisit of the SWAT model calibration framework. Hydrological Processes 25, 2313e2320. Zhang, X., Izaurralde, R.C., Zong, Z., Zhao, K., Thomson, A.M., 2012. Evaluating the efficiency of a multi-core aware multi-objective optimization tool for calibrating the SWAT model. Transactions of ASABE 55 (5), 1723e1731.