Reply on RC1


Authors:
The main aim of this manuscript is to report on the implementation of the concurrent radiation scheme, which offers a way to improve the performance of the model. Although we expect to provide a thorough analysis of this approach in a follow-up paper, a few points are worth mentioning here. In the concurrent radiation scheme, the radiation component is separated from the main model, which enables the component to use a domain decomposition different from the one adopted by the main model. This feature, however, requires a reordering of data between the MPI processes assigned to the main model and those assigned to the radiation component, which is fully handled by the YAXT library.
This approach is aimed solely at improving the overall performance of the model and at balancing the load between the MPI processes assigned to the main model and those assigned to the component. Notably, the accuracy of the model is not affected if the radiation component and the main model adopt different domain decompositions. This is because the temporal and spatial resolutions of the model and the radiation component are unchanged, and thus the simulation results are expected to remain bit-wise identical.
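As a minimal illustration of why a change of domain decomposition alone leaves the values untouched, the following Python sketch moves a field from a 4-way to a 3-way block decomposition and back. It is a plain-Python stand-in written for this reply, not the YAXT API and not the ECHAM code; the function names `block_ranges` and `redistribute`, the process counts and the field values are all hypothetical.

```python
def block_ranges(n_cells, n_procs):
    """Split n_cells global indices into n_procs contiguous blocks."""
    base, extra = divmod(n_cells, n_procs)
    ranges, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def redistribute(src_chunks, src_ranges, dst_ranges):
    """Reorder per-process chunks from one decomposition to another.

    Values are only copied by global index, never recomputed, so the
    data are unchanged by the move.
    """
    global_field = {}
    for chunk, indices in zip(src_chunks, src_ranges):
        for g, value in zip(indices, chunk):
            global_field[g] = value
    return [[global_field[g] for g in indices] for indices in dst_ranges]


# Hypothetical setup: 10 grid cells, 4 processes for the main model,
# 3 processes for the radiation component.
n_cells = 10
field = [0.1 * i for i in range(n_cells)]
atm_ranges = block_ranges(n_cells, 4)
rad_ranges = block_ranges(n_cells, 3)

atm_chunks = [[field[g] for g in r] for r in atm_ranges]
rad_chunks = redistribute(atm_chunks, atm_ranges, rad_ranges)
back = redistribute(rad_chunks, rad_ranges, atm_ranges)
```

In this sketch the round trip returns exactly the original chunks, because redistribution only copies values between owners. In the model itself, the exchange takes place between MPI processes and is delegated to YAXT.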
Changing the temporal resolution of the radiation component is, on the other hand, motivated by the need to narrow the gap between ∆t_rad and ∆t_atm in order to make the model more consistent. Since the radiation component scales better than the main model, assigning equal numbers of MPI processes to the calculation of radiative transfer and to the rest of the model leaves the MPI processes assigned to the radiation component idle for part of the time and thus creates a load imbalance in the model, as shown in Fig. 13. This idle time offers ample opportunity to reduce the radiation time step and calculate radiative transfer more often, without increasing the total execution time of the model or the resource usage. Although this is expected to improve the load balance in the model, it should be noted that its primary purpose is a more consistent atmospheric model rather than load balancing.
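The argument about idle time can be made concrete with a small back-of-the-envelope sketch. It assumes a strongly simplified cost model in which one radiation call runs concurrently with a fixed number of atmosphere steps and communication costs are ignored; the timings and the function names `idle_time` and `smallest_interval` are hypothetical and are not taken from the measurements in the manuscript.

```python
import math


def idle_time(t_atm_step, t_rad_call, steps_per_rad_call):
    """Idle time of the radiation processes per radiation interval.

    The radiation call runs concurrently with `steps_per_rad_call`
    atmosphere steps; whatever remains after the call finishes is idle.
    """
    return steps_per_rad_call * t_atm_step - t_rad_call


def smallest_interval(t_atm_step, t_rad_call):
    """Smallest number of atmosphere steps per radiation call for which
    the main model never has to wait for the radiation component."""
    return max(1, math.ceil(t_rad_call / t_atm_step))


# Hypothetical timings in seconds (not measured values).
t_atm_step = 1.0   # wall time of one atmosphere step
t_rad_call = 2.5   # wall time of one radiation call

idle_before = idle_time(t_atm_step, t_rad_call, steps_per_rad_call=8)
n_min = smallest_interval(t_atm_step, t_rad_call)
idle_after = idle_time(t_atm_step, t_rad_call, steps_per_rad_call=n_min)
```

With these assumed numbers, calling radiation every 8 atmosphere steps leaves the radiation processes idle for 5.5 s per interval, whereas calling it every 3 steps reduces the idle time to 0.5 s without delaying the main model.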
In a nutshell, adopting a coarser or finer domain decomposition is an effective means of load balancing that does not affect the accuracy of the model. Changing the temporal resolution of the radiation component, by contrast, is aimed at improving the accuracy of the model, although it can also contribute to load balancing.
(2.)
Referee: Related to this, I'm missing a more systematic assessment in section 5 which currently is somewhat qualitative (e.g., "This implies that a further tuning for the concurrent radiation scheme may be needed", lines 321-322: what does this actually mean?)
Authors: Model tuning generally refers to the adjustment of a set of model parameters towards the end of model development. It is an approach that is commonly used to obtain certain properties, e.g., temperature, cloud feedback, climate sensitivity, in good agreement with observational records. Without tuning, models may drift away from the realistic state of Earth's climate. As shown by Mauritsen and Roeckner (2020) [1], a set of parameters concerning shallow convection, critical relative humidity in the fractional cloud scheme and mixed-phase clouds are carefully tuned in ECHAM6.3.
Here we adopt the concurrent radiation scheme in the model, which may strongly affect the radiation budget, yet the parameters concerning the cloud scheme are not adjusted. Figure 22 shows that the biases in cloud radiative forcing of the concurrent radiation scheme are larger than those of the sequential radiation scheme. This suggests that a specific tuning for the concurrent radiation scheme is needed, especially of the relative humidity threshold for cloud formation in the upper troposphere and at the lowest model level (Mauritsen, personal communication).
We have reformulated the sentence in lines 321-322 as follows: "This implies that model parameters concerning cloud formation should be carefully adjusted for the concurrent radiation scheme. Specifically, the relative humidity threshold for cloud formation in the upper troposphere and the lowest model level should be changed to improve the match with observational records (Mauritsen and Roeckner, 2020)." In this manuscript, we intend to illustrate the concept of the concurrent radiation scheme and its implementation in ECHAM6.3.05. A more systematic assessment of the concurrent radiation scheme is beyond the scope of this study. We will present a thorough assessment in a separate paper together with the tuned model.

(3.)
Referee: How specific is the "tuning" for the specific model setup?
Authors: Please refer to our response in (2.). Parameters concerning the relative humidity threshold for cloud formation in the upper troposphere and the lowest model level should be changed.

Referee:
The new scheme also introduces a change in the operator splitting technique, the potential effects of which are not discussed nor systematically assessed in the paper: In the classical radiation scheme, the radiation timestep takes the most up-to-date quantities of the atmospheric physics as input, whereas in the concurrent radiation scheme the input quantities systematically lag behind by one ATM timestep (cf. Figs 2 and 6).
Authors: This is one of the biggest theoretical questions to be answered, and we would be very happy to discuss it in a follow-up paper together with climate scientists and mathematicians. The current paper, however, should be seen only as a proof of concept that concentrates on the implementation.
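To make the lag described by the referee concrete, the following toy Python sketch lists which atmospheric state each radiation call would read under the two schemes. It is only an illustration of the indexing, under the assumption that the very first call reads the initial state in both cases; it is not the ECHAM time-stepping code, and the function name `radiation_inputs` and the temperature values are hypothetical.

```python
def radiation_inputs(states, rad_every, lag):
    """Return (step, input state) for every radiation call.

    lag = 0 mimics the classical scheme (most recent state),
    lag = 1 mimics an input that is one ATM timestep old.
    The first call is clamped to the initial state (an assumption).
    """
    return [
        (k, states[max(k - lag, 0)])
        for k in range(0, len(states), rad_every)
    ]


# Hypothetical state sequence: one value per ATM timestep.
states = [280.0 + 0.5 * k for k in range(7)]

sequential = radiation_inputs(states, rad_every=3, lag=0)
concurrent = radiation_inputs(states, rad_every=3, lag=1)
```

In this toy setting, the call at step 3 reads the state of step 3 in the classical scheme and the state of step 2 in the lagged variant. The sketch says nothing about the size of the resulting effect in the model, which remains the open question noted above.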

Referee (minor points):
line 143: "receives feedback ... upon the request" what does this mean?
line 178: specify type of InfiniBand (EDR, HDR, or alike)
line 205: "... is adopted to ..." what does this mean?
typo in line 408: scalability
Figs 2, 6, 13-16: some of the labelling is hard to read, in particular red font on blue background
Authors: All the minor points have been addressed in the revised version of the manuscript, which is ready to be submitted upon request.