Reply on RC1

The authors of this manuscript discuss the potential utility of the isotopic general circulation model (iGCM) product isoGSM in assisting large-scale hydrological modelling. It was found that spatial precipitation isotope data from isoGSM can help reduce modelling uncertainty and improve parameter identifiability in comparison with a calibration method using only discharge and snow cover area fraction, without any water isotope information. The suggested isotopic tracer-aided hydrological model showed high value for robustly representing runoff processes in large mountainous catchments with sparse observations in high mountain Asia. This topic is closely matched to the journal, and the results can be interesting for the hydrological modelling community. Additionally, the manuscript is well written, logically organized, and easy to follow. The reviewer would like to point out two main concerns that may help to generalize the results and improve the paper.

studies, global climate change is altering streamflow regimes and groundwater storage in cold alpine regions on the TP (e.g., Xu et al., 2019; Lin et al., 2020; Yong et al., 2021). Under warming, for example, frozen ground on the TP is experiencing significant degradation, which will modify the storage capacity of soil and groundwater and even the flow pathways. Hence, the question is whether it is enough to just constrain parameters, or whether the model structure should be changed simultaneously in the study basin.

Response 2:
Thanks for your comment. We agree that the model structure may change under warming. However, we would like to make three points here. First, some of the changing underlying surface conditions can be reflected not only by model structure but also by the parameters themselves. For example, frozen ground degradation can lead to a larger water storage capacity and higher hydraulic conductivity, which can be reflected by the parameters WM, KKA and KKD in the model. Second, the tracer-aided modelling method can actually help diagnose model structure as well (e.g., Birkel et al., 2011), but such work has only been conducted in small basins because precipitation isotope input data are limited at large scales. This study mainly explored the utility of isotopic GCM data for driving a tracer-aided model in a large basin, and thus provides the potential to carry out work on improving model structure at the large-basin scale. Third, we applied the model over a relatively short period (less than one decade), during which structural change is not significant.
We will add a discussion of this issue to the revised manuscript.

Comment 3:
Results suggested that the model driven by the corrected isoGSM data can provide a more reliable ratio in determining the contributions of runoff components, especially regarding the overestimation of glacier melt. The authors have compared the results with other assessments (e.g., Immerzeel et al., 2010). I know that accurate estimation of runoff components in a macro-basin is a tough task due to sparse observations, and thus may be controversial in high mountain Asia basins. However, the reviewer suggests that more evidence (e.g., isotopic results, sub-basin results or neighboring observed data) besides the modelling results should be added and compared to justify the results. Statistical results about glacial retreat in the YTR may also help as further evidence for determining runoff components. In addition, more physical explanation of the adopted assumptions and equations (e.g., Equation (1)) should be supplied.

Response 3:
Thanks for your suggestion. We will try to find more evidence to verify our results in the following ways. First, we will verify the glacier melt estimates by comparing the calibrated DDF with reported values estimated physically from glacier mass balance measurements (Zhang et al., 2006). Second, since this study also estimated the contribution of different runoff generation pathways (surface and subsurface runoff), we will compare the contribution of baseflow with the result estimated by a groundwater model independently of the hydrological modelling approach (Yao et al., 2021). An accurate baseflow estimate can lead to a more reliable estimate of water sources by constraining some of the parameters.
In addition, we will provide more physical explanations of the assumptions and equations in the revised manuscript.

Comment 4:
One of the reasons limiting the application of tracer-aided models in larger-scale catchments lies in their lumped conceptual model structures. The reviewer therefore suggests that more information about the model structure be added to the Introduction and Methodology sections. How is a larger-scale basin delineated into response units in your model to fully capture the heterogeneous nature of the basin? And how is the model structure organized to consider the strong spatial variability of runoff generation, especially in the vertical direction?

Response 4:
Thanks for your comment. We will provide more information about the THREW model structure in the revised paper. This study adopted the spatially semi-distributed conceptualization of the representative elementary watershed (REW) to enable the model to simulate runoff processes at a large scale. The heterogeneous nature of the basin was captured by the distributed input data, including climate factors, vegetation, soil and topography, which affect the runoff generation processes.
However, we would point out that developing a distributed tracer-aided model is more difficult than developing a lumped one (because the tracer processes need to be combined with a rather complex description of runoff processes), but it is not the key problem. Some distributed tracer-aided models already exist (e.g., isoWASA, adopted in He et al. (2019)). The challenge is that such models cannot be applied in large basins because of the input data problem, which is the focus of our study. We will clarify this issue in the revised paper.

Comment 5:
As pointed out by the authors, runoff in this region is highly vulnerable under climate warming, and so are the land cover, soils and groundwater aquifers. How are these changing environmental factors considered in the hydrological modelling?

Response 5:
Thanks for your comment. We think that the changing factors can be reflected by changes in model structure, parameters and input data. We acknowledge that the changing conditions are far from adequately represented in the current model, owing to the limited understanding of how changing conditions influence runoff generation mechanisms. Some of the changes can be represented by model parameters, so they can be captured by tuning those parameters, including with the help of isotopic data. But more studies are required to understand the influencing mechanisms.
However, this study applied the model over a relatively short period (less than one decade), during which changing conditions are not a major issue. This study mainly focused on finding more accurate parameters for a given period.

Comment 6:
Could you provide more details about how the model parameters are constrained or calibrated in terms of isotopic data?

Response 6:
Thanks for your question. The parameters were constrained by including the performance of the isotope simulation in the optimization objective of the calibration process. We will provide more detail on this in the introduction of the revised paper.
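One way to picture how isotope data can enter the calibration is as an extra term in the objective. The sketch below is illustrative only: it uses NSE for discharge and MAE for river isotope (the two metrics referred to elsewhere in this reply), while the way they are combined, the function name `combined_objective` and the scaling `mae_scale` are assumptions made for this example and are not taken from the manuscript.

```python
def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 means a perfect fit."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def mae(sim, obs):
    """Mean absolute error, e.g. in per mil for isotope composition."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def combined_objective(q_sim, q_obs, iso_sim, iso_obs, mae_scale=1.0):
    """Hypothetical score to maximise: discharge NSE penalised by isotope MAE.

    mae_scale is an assumed normalisation, not a value from the paper.
    """
    return nse(q_sim, q_obs) - mae(iso_sim, iso_obs) / mae_scale
```

With such a score, two parameter sets that fit discharge equally well are ranked by how closely they reproduce the measured river isotope.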

Comment 7:
Does the TPSCE data include glacier in snow cover, or not?

Response 7:
Yes. The TPSCE data were generated by merging multisource snow cover datasets (Chen et al., 2018) and include the glacier area. The snow cover area in the KR catchment has a minimum value that is close to the glacier-covered area ratio.

Comment 8:
As is known precipitation condensing at cooler temperatures tends to be more depleted in the heavier stable isotopes, thus precipitation falling at higher latitudes, at higher elevations, and further inland tends to be isotopically depleted (Yang et al., 2020). So try to explain the physical meaning and extent of the coefficients (e.g., x, y) in Equation (1).

Response 8:
Thanks for your comment. Equation (1) was used to capture the elevation effect and the continental effect. The measurement stations were at approximately the same latitude, and the YTR basin spans only a small range of latitude (Fig. 1 in the article), so latitude was not chosen as a regression variable. Longitude can reflect the distance from the station to the mainland border, so the coefficient y is expected to be greater than 0. The coefficient x reflects the altitudinal lapse of precipitation isotope composition and is thus expected to be less than 0. The estimated coefficients were as expected (x = -0.003, y = 0.574).
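The signs of the two coefficients can be illustrated with a small sketch. It assumes that Equation (1) is linear in elevation and longitude, that elevation is in metres and longitude in degrees, and it uses a placeholder intercept; none of these details are stated in this reply, so they should be read as assumptions.

```python
X_ELEV = -0.003  # coefficient x from the response (assumed to be per metre of elevation)
Y_LON = 0.574    # coefficient y from the response (assumed to be per degree of longitude)


def precip_delta(elev_m, lon_deg, intercept):
    """Assumed linear form of Equation (1); the intercept is a placeholder."""
    return X_ELEV * elev_m + Y_LON * lon_deg + intercept


# Differences between two sites do not depend on the intercept,
# so a placeholder value of 0.0 is enough to show the two effects.
d_elev = precip_delta(5000.0, 90.0, 0.0) - precip_delta(4000.0, 90.0, 0.0)
d_lon = precip_delta(4000.0, 91.0, 0.0) - precip_delta(4000.0, 90.0, 0.0)
print(round(d_elev, 3), round(d_lon, 3))
```

Under these assumptions, a site 1000 m higher is about 3 units more depleted, and a site one degree further east is about 0.57 units more enriched, in the units of Equation (1).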

Comment 9:
The isotopic composition of glacier meltwater in this catchment was assumed to be -18.9‰. Why was a constant value adopted here? The uncertainty of the isotopic data for glacier melt as well as precipitation in hydrological modelling should be discussed.

Response 9:
Thanks for your question. The uncertainty that isotope input data bring to the hydrological model is an important issue to discuss. A large number of studies indicate that the isotope composition of glacier melt has very small variability and that its values are much lower than those of precipitation (e.g., He et al., 2019; Cable et al., 2011; Rai et al., 2019; Wang et al., 2016). It is therefore reasonable to assume a constant isotope composition for glacier melt when no measurement data are available. However, the assumed value will affect the model, especially the estimated contribution of water sources, and we will discuss this issue in the revised paper.

Comment 10:
Equation (3) is similar to Equation (1). However, the longitude term has been dropped here. Why?

Response 10:
Equation (1) was used to interpolate the point-scale measurement data to the whole basin, and the longitude term reflected the continental effect. Equation (3) was used to correct the output of the isotopic GCM, which tends to have larger errors in regions of higher elevation because the complex regional topography cannot be captured well by the coarse spatial resolution of the GCM. We see no mechanism that would make the GCM error change with longitude, so this term was dropped in Equation (3). However, the choice of terms in the regression and bias correction will undoubtedly have an important influence on the modelling result. Consequently, much work is still needed to explore a general way of driving tracer-aided models with isotopic GCM data (e.g., a better understanding of the bias characteristics of the iGCM data).

Comment 11:
What is the standard for REW delineation? Why is the whole YTR sub-divided into 63 units, but the much smaller KR catchment into 41?

Response 11:
The REW (representative elementary watershed) approach is based on the self-similar characteristics of a watershed and its sub-watersheds (Reggiani et al., 1999). The REW is regarded as the fundamental unit of hydrological processes and modelling, for which a series of balance equations is established. Consequently, either a whole watershed or a sub-watershed of a certain level can be regarded as a REW. The REW scale is chosen mainly according to the scale of interest, the modelling purpose, and the data availability (Tian et al., 2006; Tian et al., 2008). The REW division sets the trade-off between accuracy and simulation efficiency (the more REWs, the higher the accuracy but the lower the efficiency). Efficiency is especially important for multi-objective calibration, so this study divided the YTR basin into only 63 REWs. This division was also adopted by Tian et al. (2020), who obtained good hydrological simulation performance in the YTR basin. For the smaller KR catchment, accuracy and efficiency can be balanced by dividing the catchment into 41 REWs, as in a previous work (Nan et al., 2021). We will add the above explanations to the revised manuscript.

Comment 12:
Why is the NSE threshold significantly larger in the macro-scale YTR than in the smaller KR?

Response 12:
We found that it was easier to obtain a good simulation and high NSE (> 0.9) in YTR than in KR (where the highest NSE was only around 0.85). We attribute this result to two reasons. The first is the input data. In the large YTR basin, we used large-scale gridded data to reflect the spatial variability of climate factors. In the small KR catchment (which nevertheless has a large elevation range), we could only infer precipitation and temperature at high elevations from the climate data at the catchment outlet and the reported elevation lapse rates estimated from data at lower elevations. The input data were thus presumably better in the YTR basin than in the KR catchment. Second, in the large YTR basin the runoff processes are complex and the effects of multiple processes can compensate for each other, so a good simulation is easier to obtain. In the small KR catchment the runoff process is relatively simple, so it is more difficult to simulate if the controlling runoff process is not captured well.

Comment 13:
The authors could refer to reported contemporaneous isotopic data if possible, and add some sporadically distributed data as additional evidence besides the continuous observations in 2005.

Response 13:
Thanks for your suggestion. We will try to find more evidence to verify our results. However, we think that discontinuous and sporadically distributed isotope data are not suitable for verification, because our results indicate that the model performs better at capturing the seasonal variation of river isotope than at simulating the isotope signature for a given date or short period. Consequently, in the paper we divided the limited available isotope data into two groups: the data at the outlet station for calibration, and the data at internal stations for validation. Nonetheless, we will use other evidence to verify the plausibility of our results, as mentioned in Response 3.

Comment 14:
In which stations the model performance in YTR was shown in Table 3?

Response 14:
Nuxia for discharge, the whole YTR basin for SCA, and all the four stations for isotope. We will make this clearer in the revised paper.

Comment 15:
Why did the dual-objective calibration obtain the best results while, on the other hand, producing the worst MAE values? The two scenarios that adopt isotopic data as a supplement could, however, get better results for runoff components. More detail should be given on why the latter two scenarios calibrate the model at the cost of precision. In order to obtain more accurate predictions, part of the hydrological processes must have been distorted in the dual-objective calibration to compensate for other misrepresentations in the process simulation.

Response 15:
Thanks for your comment. This is indeed an important issue to illustrate the role of isotope data to improve the model behavior.
MAE-iso was not included in the optimization objective of the dual-objective calibration, so the worst MAE-iso values were obtained. We analyzed the relationship between the performance of the discharge and isotope simulations (NSE-dis and MAE-iso) obtained by dual-objective calibration, and found a trade-off between the two (Fig. R1a, see attachment). The highest NSE-dis reaches around 0.93, but MAE-iso is poor at the same time. When MAE-iso reaches its best values, NSE-dis is around 0.9, which is still a high level of performance. We further found that the contribution of glacier melt was estimated at around 0.35 to 0.4 when the highest NSE-dis was obtained, but at around 0.2 when the best MAE-iso was obtained (Fig. R1b and c). The isotope composition of glacier melt was assumed to be lower than that of precipitation, so an overestimated contribution of glacier melt leads to a simulated river isotope composition lower than measured. Consequently, calibration focusing only on discharge may result in overestimated glacier melt, which can be rejected on the basis of the isotope simulation.
It is notable that the isotope simulation is more sensitive than the discharge simulation to the runoff components and internal processes. For example, for glacier melt contributions anywhere in the wide range of 10-40%, NSE-dis can be calibrated to a high value (> 0.9) by adjusting other parameters, whereas MAE-iso worsens significantly once the contribution of water sources deviates from the proper value.
We will discuss this issue with more details and provide some corresponding figures in the revised paper.
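The link between glacier melt share and simulated river isotope can be illustrated with a simple two-component mass balance. This is only a sketch, not the THREW tracer module: it ignores evaporation fractionation and storage mixing, and the value used for all non-glacier water (`delta_other`) is hypothetical. Only the glacier melt value of -18.9‰ comes from the manuscript.

```python
DELTA_GLACIER = -18.9  # per mil, constant glacier-melt value assumed in the manuscript


def river_delta(glacier_fraction, delta_other):
    """Two-component mixing; delta_other is a hypothetical value for non-glacier water."""
    return glacier_fraction * DELTA_GLACIER + (1.0 - glacier_fraction) * delta_other


delta_other = -15.0  # hypothetical, chosen only to be higher than glacier melt
for fraction in (0.2, 0.4):
    print(fraction, round(river_delta(fraction, delta_other), 2))
```

With these illustrative numbers, raising the glacier melt share from 0.2 to 0.4 lowers the mixed river value by about 0.8‰, which is the kind of shift that an isotope-based objective can detect while discharge alone cannot.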

Comment 16:
Provide a spatial distribution map of precipitation isotope.

Response 16:
Thanks for your suggestion. We will add the spatial distribution map of precipitation isotope in the revised paper.

Comment 17:
Why did the dual-objective calibration obtain good results in predicting discharge at the outlet station, while the other two scenarios adopting isotopic data got better results at internal stations?

Response 17:
Thanks for your question. In the dual-objective calibration the model was optimized only according to the NSE at the outlet station, which does not necessarily result in a proper representation of internal processes. However, the fact that the model can simultaneously satisfy multiple calibration objectives gives confidence in the model realization (McDonnell and Beven, 2014). Better performance at internal stations is not necessarily consistent, but the fact that most of them perform reasonably shows the robustness of the modelling results. We will discuss this further in the revised paper.

Comment 18:
What is the meaning of "consistently estimated lower proportions of glacier melt than the dual-objective calibration, which can be attributed to the role of isotope data in regulating the contribution of strong-evaporated surface runoff component fed by glacier melt to streamflow"? And what is the proportion of glacier evaporation in glacier melting?

Response 18:
Nan et al. (2021) provided a more detailed explanation of this. The THREW model assumes that glacier melt contributes to the river channel through surface runoff, together with other surface components (precipitation falling on saturated and impermeable areas). The impermeable area in the KR catchment is large because of the large glacier-covered area, resulting in a large contribution of surface runoff. The evaporation of surface water is closely related to surface runoff, and consequently to the contribution of glacier melt. Surface evaporation results in a higher isotope composition of the surface runoff component because of isotopic fractionation. Our results indicate that the model performs well on isotope simulation only when the proportion of evaporation to total surface runoff is around 30%.

Comment 19:
The largest differences in the winter season only show that the isotopic constraint is working. But have the predictions also been improved?

Response 19:
Thanks for your question. We attribute the large differences in winter to the extremely small total water input, because the contribution of each water source is calculated by dividing its amount by the total water input. Nonetheless, the difference has a negligible effect on the prediction of total runoff, because winter water contributes extremely little to the annual total (<1%).
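The effect of a small denominator can be shown with a short arithmetic sketch. All amounts below are hypothetical monthly inputs in mm, chosen only to illustrate the point; in each season the two runs differ by the same 1 mm of glacier melt.

```python
def contributions(sources):
    """Share of each water source in the total water input."""
    total = sum(sources.values())
    return {name: amount / total for name, amount in sources.items()}


# Hypothetical inputs (mm); run "b" has 1 mm more glacier melt than run "a".
summer_a = {"rain": 80.0, "snow": 10.0, "glacier": 10.0}
summer_b = {"rain": 80.0, "snow": 10.0, "glacier": 11.0}
winter_a = {"rain": 0.5, "snow": 1.0, "glacier": 0.5}
winter_b = {"rain": 0.5, "snow": 1.0, "glacier": 1.5}

summer_diff = contributions(summer_b)["glacier"] - contributions(summer_a)["glacier"]
winter_diff = contributions(winter_b)["glacier"] - contributions(winter_a)["glacier"]
print(round(summer_diff, 3), round(winter_diff, 3))
```

The same absolute difference changes the glacier share by less than one percentage point in the hypothetical summer month but by 25 percentage points in the winter month, even though the winter volumes barely matter for the annual total.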

Comment 20:
The uncertainty of isotopic data for hydrological modelling should be discussed quantitatively and in depth. For instance, the distribution map of precipitation isotope is coarse, and vertical effects may not be considered in detail in the present scenarios.

Response 20:
Thanks for your suggestion. We will discuss the uncertainty of isotopic data for hydrological modelling in more depth in the revised paper. However, a quantitative evaluation of isotopic uncertainty requires much more work, and we plan to prepare a separate paper on this issue following this work.

TECHNICAL CORRECTIONS
P2L26: change "was first corrected" to "was firstly corrected".
P3L52: change "Zongxing et al., 2019" to "Li et al., 2019"? The same applies to later occurrences.
P3L74-79: This sentence is quite long; shorter sentences would make the gist easier to follow.
P27: keep the x- and y-axes on the same scale.
P39: would "calibration scenarios" make more sense than "calibration variant"?

Response:
Thanks for your corrections; we will make these revisions in the new version of the paper. The term "calibration variant" follows Tong et al. (2021), who similarly conducted several calibration scenarios to explore the value of soil and snow data.