Assessment of NBSs effectiveness for flood risk management: The Isar River case study

Nature-based solutions (NBSs) are increasingly implemented to mitigate natural risks in urban and rural contexts, from coastal to mountainous areas. Nevertheless, the lack of quantitative approaches to assess NBSs' effectiveness limits their technical, social and cultural acceptance. Within the PHUSICOS project (EU H2020 Innovation Action; Grant Agreement nr. 776681) a comprehensive assessment framework tool (AFT) has been developed to fill this gap. This paper presents an ex-post analysis with the PHUSICOS AFT applied to the Isar River case study. The restoration of the urban reach of the Isar River, in the city of Munich, was implemented in the early 2000s and represents a successful example of an ecosystem- and user-friendly flood risk management plan. The performance of the NBS measures implemented to manage the flood risk and improve the ecological status of the river (NBS scenario) is assessed in comparison with an alternative scenario with traditional hard engineering measures (grey scenario, GS). Results underscore the NBS as a competitive alternative. The ex-post analysis shows the potential of the PHUSICOS AFT for NBS performance assessment, providing guidance on indicator selection, stakeholders' management and performance assessment. The application discussed here is expected to aid professionals and researchers involved in the design, implementation, monitoring and evaluation of NBSs.


INTRODUCTION
Extreme natural hazards, such as floods, droughts and landslides, are occurring more frequently throughout the world, exacerbated by climate change and uncontrolled land use (Wilby & Keenan 2012; Edwards et al. 2021; Feldmeyer et al. 2021). Considering the impact, cost and massiveness of traditional grey solutions for disaster risk reduction (DRR), natural and eco-friendly alternatives are encouraging more sustainable approaches in urban (Shafique & Kim 2015; De Paola et al. 2018a, 2018b), rural and mountainous areas (Ingold et al. 2010; Scarano 2017; Bhatt et al. 2020). The European Commission fosters the development of ecosystem approaches and, namely, of nature-based solutions (NBSs), to promote DRR while enhancing the restoration and conservation of biodiversity and safeguarding human well-being and health (Faivre et al. 2018).
NBSs are intended as 'solutions that are inspired and supported by nature, which are cost-effective, simultaneously provide environmental, social and economic benefits and help build resilience' (European Commission 2016). The application of NBSs fosters citizens' participation, instilling trust among stakeholders (SHs) during both implementation and operation (Kumar et al. 2020). Thus, they represent a living solution, inspired by nature, resulting in economic viability and enhancing socio-economic and environmental benefits (Maes & Jacobs 2015). NBSs can be classified as green, blue or hybrid when based on vegetation, on waterbodies or on a combination with grey infrastructure, respectively (Debele et al. 2019). Unlike grey solutions, which are usually mono-functional, NBSs are able to address socio-economic challenges, supporting the reduction of economic vulnerability and developing a sense of place (Lupp et al. 2021). NBSs implicitly increase the sense of identity, fostering the promotion and safeguarding of ecosystem services (Rowiński et al. 2018). In comparison to grey solutions, properly designed NBSs can balance lower performance with cheaper maintenance costs, with greater cost-effectiveness and efficiency over the long-term horizon (Naumann et al. 2014).
Finally, investments in NBSs can boost sustainable tourism and lead to new green jobs for operators, entrepreneurs and local producers (Boyle & Kuhl 2021).
One of the main concerns for the assessment of NBSs' effectiveness relates to the monitoring of their performance. This can be achieved by comparing the results against predefined targets and adapting the design and maintenance activities in cooperation with policymakers, who are actively involved in the decision-making process (Kumar et al. 2021). Monitoring represents a crucial stage of both the ex-ante and the ex-post project assessment, through the coupling of record datasets from public authorities, statistical databases, literature reviews and workshops with technical, environmental and socio-economic parameters (Andrés et al. 2021). In the literature, several sets of indicators have been proposed as a function of the territorial context (Nicholson et al. 2020), the monitoring instrument (Li et al. 2019) and the applied technique (Vinten et al. 2019).
Owing to their multi-functional nature, the design of NBSs for DRR cannot neglect the assessment of the related co-benefits, by implementing tools for the quantitative evaluation of their effectiveness under different perspectives: technical, socio-economic and environmental (Albert et al. 2020).
Several procedures have been proposed to assess NBS performance in urban areas (Liquete et al. 2016; EKLIPSE 2017; Narayan et al. 2017; Raymond et al. 2017; Zölch et al. 2017; Calliari et al. 2019), remarking upon the strict connection between NBS effectiveness and the urban (Figurek 2021) and biophysical (Pinto et al. 2021) characteristics of the area. Moreover, many governmental agencies have been developing protocols and methods to evaluate strategies, plans, programmes and projects where different types of solutions (including grey and also nature-based) are considered in the decision-making process for public investments, in order to compare the effects relative to the environmental, economic and social goals (Interior 2013; King et al. 2021). Among them, the U.S. Army Corps of Engineers (USACE) developed criteria and guidelines to design and manage natural and nature-based features (NNBFs) for flood risk reduction. Such guidelines focus on how to effectively include natural solutions in planning and policy rules. There, NNBFs are considered as standalone solutions or as integrated with other structural and non-structural measures (Carter & Lipiec 2020).
In the frame of NBS application against natural hazards, the PHUSICOS project pays specific attention to rural mountainous areas, for which peculiar technical, environmental and socio-economic features can be identified (Baills et al. 2021; Strout et al. 2021). Indeed, these environments deal with natural hazards such as flooding, landslides, rockfalls and snow avalanches (Haritashya et al. 2006; Keiler & Fuchs 2018). Moreover, unlike urban areas, rural and mountainous areas show specific socio-economic and environmental features, caused by depopulation and limited economic development. NBS approaches may thus provide proper co-benefits to take on these challenges (Solheim et al. 2021). Furthermore, while few pieces of research on NBS focus on the mountainous landscape (Palomo et al. 2021), crucial and exposed transportation routes, lifelines and critical infrastructures are located in these areas.
Many governance models may enable NBS implementation; however, the most effective approach to plan, design and implement NBSs entails the application of polycentric and collaborative planning (Zingraff-Hamed et al. 2020a; Dumitru et al. 2021), involving experts and SH dialogues (Scolobig et al. 2016). Indeed, several studies have pointed out that such participative approaches may enhance the effective implementation of NBSs (Pauleit et al. 2017; Frantzeskaki et al. 2019; Zingraff-Hamed et al. 2020b). The PHUSICOS project tests the Living Lab approach to select, implement, monitor and assess nature-based and hybrid solutions in mountainous and rural areas (Fohlmeister et al. 2017a, 2017b; Lupp et al. 2021).
Owing to the multi-functional nature of NBSs, their effectiveness assessment requires the application of multi-criteria decision analysis (MCDA) (Ishizaka & Nemery 2013; Ruangpan et al. 2020). MCDA allows different co-benefits to be managed and SH involvement to be included in both the weighting and assessment processes. However, studies showed that many methods include only ecological and risk mitigation features while neglecting economic and social indicators (Perosa et al. 2021b). Furthermore, the assessment of suitable solutions before their implementation is essential to support the decision-making process (Zingraff-Hamed et al. 2018; Perosa et al. 2021a).
Within the frame of the PHUSICOS project, a multi-level assessment tool was developed (Autuori et al. 2019), aimed at applying an MCDA for evaluating the effectiveness of nature-based, hybrid and grey solutions. The assessment framework tool (AFT) addresses the planning, design, implementation and monitoring of design scenarios (DS) through multi-stakeholder involvement and a multi-disciplinary approach. Owing to its flexible structure, it can be customized and refined for several operative fields and socio-economic and environmental contexts.
In this paper, the PHUSICOS AFT was tested on the Isar River case study, for an ex-post assessment. The aim of this work is two-fold: (i) to present the PHUSICOS AFT and its application to a real case study; (ii) to assess whether the NBSs implemented at the Isar River were competitive relative to more common traditional grey solutions. The DS of the river flood risk management plan, implemented in the areas surrounding the Municipality of Munich in the early 2000s, was analysed and compared with a hypothetical grey scenario (GS). Results allow observation of the benefits or detriments of the post-intervention scenario, in comparison with the pre-intervention configuration (baseline scenario).
Section 2 reports the features and the structure of the AFT. Section 3 introduces the Isar River case study and the application of the AFT. Section 4 discusses the results of the assessment. Section 5 provides the summary and the concluding remarks.

THE PHUSICOS PROJECT ASSESSMENT FRAMEWORK TOOL (AFT)
The AFT is composed of a set of key performance indicators (KPIs) used to assess the effectiveness of the DSs (e.g., nature-based, hybrid and grey solutions) from different perspectives: technical, environmental, social and economic. KPIs are selected and used to properly evaluate key ecosystem services, co-benefits and costs. The assessment is based on the estimation of the KPIs, which are selected and quantified for the different DSs. Once the KPIs are quantified, a multi-level aggregative and weighting approach is applied and the score of each DS is calculated as a numerical measure of its overall performance. Once a score is evaluated for each alternative DS, the alternatives can be quantitatively compared, and the results can aid the selection of the best-suited option (Figure 1). The AFT has a multi-level hierarchic structure composed of KPIs suitable for rural and mountainous areas. Unlike urban contexts, rural and mountainous areas have specific features in terms of the natural hazards acting on them and of their socio-economic and environmental characteristics. These areas are usually more susceptible to flooding, rockfall and landslide hazards. From the socio-economic perspective, they are affected by youth drain and by limited infrastructural services and accessibility.
Within the PHUSICOS AFT, KPIs are grouped in hierarchically ordered categories. Such categories are ambits (first level), criteria (second level), sub-criteria (third level) and the actual KPIs (fourth level). In the application of the AFT, KPIs are selected using a top-down approach, based on the introduced hierarchic structure. The ambits were identified to assess (1) the DS performance against one or multiple natural risks (Ambit 1: risk reduction); (2) the DS affordability (Ambit 2: technical and feasibility aspects); (3) the environmental co-benefits of the DS (Ambit 3: environment and ecosystems); (4) the implications for society (Ambit 4: society); (5) the impacts on the economy at the local scale (Ambit 5: local economy).
Each ambit is composed of one or more criteria, in turn including sub-criteria (Table 1). Each sub-criterion consists of one or more KPIs. Namely, the complete AFT includes 5 ambits, 14 criteria, 40 sub-criteria and 98 KPIs. For more details refer to the deliverable D4.1 of the H2020 PHUSICOS project (Autuori et al. 2019). However, KPIs and different categories can be included or customized depending on the specific features of the investigated case study.
The core of the AFT structure is a matrix of 13 columns with the number of rows corresponding to the number of considered indicators. Ambits, criteria and sub-criteria are reported in columns 1, 2 and 3, respectively, whereas the KPIs are specified in column 4. The overall structure of the matrix is summarized in Table 2, whereas Table 3 shows a simplified sketch of the AFT matrix.
The AFT matrix may be intended as a flexible tool, to be properly tailored to each specific case study. Indeed, the KPIs, sub-criteria and criteria to be included in the assessment strictly depend on the specificity of the investigated case study, namely, setting status, type of risk faced and solutions suggested. Nevertheless, to ensure a comprehensive approach, at least one KPI per ambit, or better per criterion, should be included in the analysis. Specifically, the main natural hazards occurring in the area have to be identified, to exclude the non-relevant KPIs. Then, the data availability and the suitability of measuring and monitoring the KPIs at both the DSs and the baseline scenario have to be considered to select the spectrum of criteria to be included. In addition, the AFT tailoring depends on the chronological and climatic contexts wherein the assessment operates. To implement measures dealing with DRR, such as NBS, hybrid or grey solutions, two assessment stages can be considered: an ex-ante stage, before the design and implementation of the measure, and an ex-post stage, after the implementation, for monitoring the performance of the implemented interventions. The AFT application is different for the two stages. For the ex-ante stage, either a simplified matrix, with very few KPIs, useful for a quick and preliminary assessment of different DSs (NBS, hybrid or grey), or an extended matrix can be considered for assessing the effectiveness of a DS before its implementation. Conversely, at the ex-post stage, more KPIs are needed for the proper analysis of the robustness of the implemented DS, either towards specific goals (assessment factor matrix) or for extensively monitoring its performance (extended matrix).
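The four-level hierarchy and the coverage rule stated above (at least one KPI per ambit) can be illustrated with a small data structure. This is only a sketch: the class names, the helper functions and the sub-criterion and KPI labels are hypothetical and are not taken from the PHUSICOS deliverable.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SubCriterion:
    name: str
    kpis: List[str] = field(default_factory=list)  # fourth level: KPI names


@dataclass
class Criterion:
    name: str
    sub_criteria: List[SubCriterion] = field(default_factory=list)  # third level


@dataclass
class Ambit:
    name: str
    criteria: List[Criterion] = field(default_factory=list)  # second level


def count_levels(ambits: List[Ambit]) -> Tuple[int, int, int, int]:
    """Return the number of (ambits, criteria, sub-criteria, KPIs)."""
    n_criteria = sum(len(a.criteria) for a in ambits)
    n_sub = sum(len(c.sub_criteria) for a in ambits for c in a.criteria)
    n_kpis = sum(
        len(s.kpis) for a in ambits for c in a.criteria for s in c.sub_criteria
    )
    return len(ambits), n_criteria, n_sub, n_kpis


def ambits_without_kpis(ambits: List[Ambit]) -> List[str]:
    """Names of ambits for which no KPI has been selected (coverage check)."""
    missing = []
    for a in ambits:
        n = sum(len(s.kpis) for c in a.criteria for s in c.sub_criteria)
        if n == 0:
            missing.append(a.name)
    return missing


# Illustrative tailored selection; sub-criterion and KPI labels are placeholders.
selection = [
    Ambit("Risk reduction", [
        Criterion("Hazard", [SubCriterion("placeholder sub-criterion",
                                          ["KPI A", "KPI B"])]),
    ]),
    Ambit("Local economy", [
        Criterion("placeholder criterion", [SubCriterion("placeholder", [])]),
    ]),
]
```

With this selection, `ambits_without_kpis(selection)` flags the "Local economy" ambit, signalling that the tailored matrix would not yet satisfy the one-KPI-per-ambit recommendation.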

KPIs normalization, weighting and aggregation
The AFT is based on the KPIs' estimation at each DS, in comparison with their value at the baseline scenario. The performance of each DS is estimated by means of the overall scoring, quantified as the multi-level weighted sum of the KPI scores.
Owing to the heterogeneous features and metric of the KPIs, once each KPI is estimated, a normalization approach is applied, as detailed in the following.
Let us consider a set of $n$ DSs $K_i$, with $i = 1, \dots, n$, in which a set of $m$ indicators $KPI_j$ is accounted for, with $j = 1, \dots, m$. To normalize the KPIs within the range from 0 to 100, Equations (1) and (2) are applied:
where $\overline{KPI}_{j,i}$ is the normalized value of the indicator $KPI_{j,i}$ and $KPI_{j,\max}$ ($KPI_{j,\min}$) is the maximum (minimum) value achievable by $KPI_j$ at the baseline scenario. Equation (1) is considered when the KPI is maximized for the optimization (→ symbol in column number 7 of the AFT matrix of Table 2), whereas Equation (2) is considered when it is minimized (← symbol in column number 7 of the AFT matrix of Table 2). Thus, the $\overline{KPI}_{j,i}$ parameter states the beneficial/detrimental extent provided by the $i$-th DS on the $j$-th KPI, in comparison with the baseline scenario.
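As an illustration of the normalization step, the sketch below assumes a conventional min-max form scaled to 0-100, with the direction flipped for KPIs that are to be minimized. This form is an assumption chosen to match the definitions above, not a transcription of Equations (1) and (2); the function name and parameters are hypothetical.

```python
def normalize_kpi(value, kpi_min, kpi_max, maximize=True):
    """Min-max normalization to a 0-100 scale (assumed form).

    value    : raw KPI value for a given design scenario
    kpi_min  : minimum achievable value of the KPI
    kpi_max  : maximum achievable value of the KPI
    maximize : True if larger raw values are better, False if smaller are better
    """
    span = kpi_max - kpi_min
    if span == 0:
        raise ValueError("kpi_max and kpi_min must differ")
    if maximize:
        # KPI to be maximized: the score grows with the raw value.
        return 100.0 * (value - kpi_min) / span
    # KPI to be minimized: the score grows as the raw value decreases.
    return 100.0 * (kpi_max - value) / span
```

No clipping is applied in this sketch, so a raw value outside the interval between `kpi_min` and `kpi_max` yields a score below 0 or above 100.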
A weighting procedure is then applied, in order to consider how some features, and in turn some KPIs, may be intended as having greater or lesser importance for the assessment process, depending on both the involved SHs and the case study specificity. To this aim, the AFT, developed in the frame of the PHUSICOS project, is based on a multi-level weighting procedure to assess the score of the $j$-th KPI at the $i$-th DS with Equation (3):
with:
where $w_{I,j}$, with $j = 1, \dots, m$, is the weight of the $j$-th KPI at the $i$-th DS (first level of weighting), $w_{II,l}$, with $l = 1, \dots, c$ and $c$ the number of defined criteria, is the weight of the $l$-th criterion including the $j$-th KPI (second level of weighting) and $w_{III,a}$, with $a = 1, \dots, p$ and $p$ the number of considered ambits, is the weight of the $a$-th ambit (third level of weighting).
Owing to the different skills and backgrounds of SHs involved in the weighting procedure, the most suitable approach to set the multi-level weights entails coupling the Likert scale (Likert 1932) and the uniform weighting procedure. According to Likert's approach, SHs or specialists should rate their preferences (e.g., via surveys), giving a vote from 1 ('not at all important to me') to 5 ('very important to me'), which makes the votes easier to compare. Once the preferences of all the voters are collected, the weights can be properly normalized for the final assessment. Given the significant number of KPIs and, in some cases, their technical specificity, which is not easily understood by heterogeneous groups of SHs, the Likert scale approach is actually suitable only for the second and the third level of weighting, namely, the criteria and the ambits. The uniform approach can instead be adopted for the KPIs (the fourth level of the hierarchy), by assuming a uniform rate between all the objects belonging to the same level. The weight is thus estimated as the reciprocal of W, where W is the number of selected KPIs. Nevertheless, alternative approaches can be chosen, such as the pairwise comparison (Kou et al. 2016), although these are not easily applicable when a significant number of objects W is included in the assessment.
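The two weighting schemes can be sketched as follows. The text does not specify how the collected Likert votes are normalized, so this example assumes that the mean vote of each item is divided by the sum of the mean votes, so that the weights of a level sum to one. Function names and the sample votes are hypothetical.

```python
def likert_weights(votes):
    """Turn Likert votes (1-5) into weights that sum to one.

    votes: dict mapping an item (ambit or criterion) to a list of integer votes.
    Assumption: weight = mean vote of the item / sum of mean votes of all items.
    """
    for item, item_votes in votes.items():
        if not item_votes:
            raise ValueError(f"no votes for {item!r}")
        if any(v < 1 or v > 5 for v in item_votes):
            raise ValueError(f"votes for {item!r} must lie between 1 and 5")
    means = {item: sum(v) / len(v) for item, v in votes.items()}
    total = sum(means.values())
    return {item: mean / total for item, mean in means.items()}


def uniform_weight(n_selected_kpis):
    """Uniform KPI weight: the reciprocal of the number of selected KPIs (W)."""
    if n_selected_kpis <= 0:
        raise ValueError("at least one KPI must be selected")
    return 1.0 / n_selected_kpis


# Hypothetical votes from two stakeholders on three ambits.
example_votes = {
    "risk reduction": [5, 5],
    "society": [3, 3],
    "local economy": [2, 2],
}
example_weights = likert_weights(example_votes)
```

Under this assumption the three example ambits receive weights of 0.5, 0.3 and 0.2, and a matrix of 28 selected KPIs gives each KPI a uniform weight of 1/28.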
Compared with the tool proposed by Autuori et al. (2019) in the PHUSICOS deliverable and applied by Pugliese et al. (2020), an improved version of the AFT was developed to properly account for the number of KPIs belonging to each criterion and ambit. The total score $R_i$ of the $i$-th scenario is thus estimated by summing up and averaging the weighted KPIs of each ambit (Equation (4)):
where $NA_a$ is the number of KPIs belonging to the $a$-th ambit, $w_{I,j}$ is the weight of the $j$-th KPI, and $w_{II,j}$ and $w_{III,j}$ are the weights of the criterion and of the ambit to which the $j$-th KPI belongs, respectively. Owing to the multi-level weighting approach, the criterion score $R_{Ci}$ and the ambit score $R_{Ai}$ are calculated with Equations (5) and (6), respectively:
where $NC_l$ is the number of KPIs belonging to the $l$-th criterion and $NCA_{l,a}$ is the number of criteria belonging to the $a$-th ambit.
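One possible reading of the aggregation described above is sketched below: each normalized KPI is multiplied by its KPI, criterion and ambit weights, the weighted values are averaged within each ambit over the number of KPIs in that ambit, and the ambit averages are summed to give the scenario score. This is an assumed form consistent with the verbal description, not a transcription of Equations (4)-(6); the record layout and names are hypothetical.

```python
from collections import defaultdict


def scenario_score(kpi_records):
    """Aggregate weighted, normalized KPIs into ambit scores and a total score.

    kpi_records: list of dicts with keys
        'ambit'       : name of the ambit the KPI belongs to
        'norm'        : normalized KPI value for the scenario
        'w_kpi'       : first-level (KPI) weight
        'w_criterion' : second-level (criterion) weight
        'w_ambit'     : third-level (ambit) weight
    Returns (total_score, ambit_scores).
    """
    weighted_sum = defaultdict(float)
    n_kpis = defaultdict(int)
    for rec in kpi_records:
        weighted = rec["w_kpi"] * rec["w_criterion"] * rec["w_ambit"] * rec["norm"]
        weighted_sum[rec["ambit"]] += weighted
        n_kpis[rec["ambit"]] += 1
    # Average within each ambit so that ambits with many KPIs do not dominate.
    ambit_scores = {a: weighted_sum[a] / n_kpis[a] for a in weighted_sum}
    return sum(ambit_scores.values()), ambit_scores


# Hypothetical records: two KPIs in ambit "A", one (detrimental) KPI in ambit "B".
records = [
    {"ambit": "A", "norm": 50.0, "w_kpi": 0.5, "w_criterion": 1.0, "w_ambit": 0.5},
    {"ambit": "A", "norm": 100.0, "w_kpi": 0.5, "w_criterion": 1.0, "w_ambit": 0.5},
    {"ambit": "B", "norm": -20.0, "w_kpi": 1.0, "w_criterion": 1.0, "w_ambit": 0.5},
]
total, per_ambit = scenario_score(records)
```

Dividing by the number of KPIs per ambit is what keeps the comparison independent of how many indicators happen to be available for a given ambit, which is the stated motivation for the improved version of the tool.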
The hierarchic AFT can be applied to calculate the partial scores of DSs, while focusing the attention on specific ambits or criteria and thus performing the comparative analysis between alternatives of specific aspects (e.g., technical, environmental or socio-economic ambits).

The Isar River restoration
The Isar River is an alpine tributary of the Danube River. It rises in the Austrian Alps, flows north through southern Germany and, after 292.3 km, joins the Danube River in Deggendorf (Figure 2). The Isar's catchment area is 8,964.57 km² and precipitation in the Alps causes high rainfall volumes and resulting floods during the summer. The Isar River has a nival regime with high discharge variability throughout the year. In Munich, discharges range between ∼10 m³/s during drought periods and 1,050 m³/s during extreme flooding (www.hnd.bayern.de). Flooding of the Isar usually occurs within a few hours and recedes within a few days (Egger et al. 2019). Climate change models suggest an increase in extreme weather in the Alps. The rain volume is predicted to rise by up to 25%, with an increase of 12% in the 100-year return period maximum discharge of the alpine rivers (DKRZ 2017; Wagner et al. 2017). Owing to both the Isar's fast-flowing water with hydro-electrical potential and its furious and repetitive flooding patterns threatening human settlements, the river has been channelized and regulated. Hydromorphological modifications started in the late 19th century resulted in a degraded morphological status of the river and related ecological and social losses. These triggered serious concerns and regrets from civil society, which demanded its restoration (Zingraff-Hamed et al. 2019). The first restoration plan was designed in 1970 but it took more than 40 years of intense efforts before the implementation of the Isar River restoration in Munich was completed. From 1970 to 1990, civil society and lobbyists demanded a near-natural river in Munich and pressured politicians, with very little success at the beginning. Since 1985, they have gradually gained recognition.
The Isar restoration planning took a decisive turn with the result of hydraulic calculations investigating Munich's exposure to HQ100, indicating that urgent actions were needed to prevent damage by such flooding events (Döring & Binder 2010). All SHs were involved in a comprehensive planning process to design the 8 km-long Isar-Plan in Munich with a three-fold goal: mitigate flood risk, improve the recreational potential and provide suitable habitats for fauna and flora. It was clear that the measures would not follow the conventional approach of heightening dams and dykes for flood protection, but would target an ecological restoration of the river. The implementation took 11 years (2000–2011) and the project had a cost of approximately 35 M€, funded by the State of Bavaria and the City of Munich. Reshaping the section of the river with lowered and near-natural banks allowed water-based recreation activities (Figure 3), new riparian gravel structures and the establishment of typical riparian vegetation, while higher water volumes could be retained compared to the former trapezoidal river morphology. The project won the first German award 'Gewässerentwicklungspreis' for river development (Binder 2010). The collaborative planning and especially the in-depth involvement of NGOs and the public authorities was a key success factor for the planning and implementation of the project. Even though the Isar River restoration is a famous 'good practice' to follow, until today, few investigations have assessed the overall success of this measure.

The weighting procedure through a Living Lab approach
The different groups of SHs involved in the co-creation processes of the Isar River restoration participated in a workshop in January 2019, held to carry out ex-post reflections on the measures designed, planned and implemented in Munich. The workshop was composed of different work blocks, e.g., 'reflect about the Isar evolution and future', 'weight the goals of the Isar River restoration project' and 'assess project success'. For the weighting procedure, prior to the workshop, the list of all suggested criteria and KPIs was screened by the research team and their partners to reduce the full set to the criteria relevant for the Isar case. This resulted in a selection of ten criteria distributed within the five ambits of the AFT. During the workshop, a total of 29 SHs from NGOs, public and private organizations weighted the ambits and the preselected criteria of each ambit using Likert scales from 1 to 5. The SHs used dot-stickers on posters fixed on a wall. The sticker colour depended on which of the core SH groups the SH belonged to. While the individual stickers were anonymous, the voting was not conducted in private, and SHs were able to talk while voting to exchange opinions and encourage discussion.

AFT application to the Isar River case study
The AFT described in Section 2 was applied to the Isar River case study to carry out the ex-post performance assessment of the project. The effectiveness of the implemented river restoration project was evaluated in comparison with an alternative project scenario. Specifically, two alternative DSs were considered: (1) the NBS scenario, i.e., the implemented Isar River restoration project described in Section 3.1, and (2) a GS, consisting of the construction of a 1.5 m-high dyke and the removal of vegetation at the existing dykes, simulated as a low-cost traditional flood risk reduction measure. Data documenting the post-restoration status were collected from existing data repositories (Table 4). The baseline data were collected from published data documenting the status prior to restoration, with the exception of the flood protection capacity, which was assumed to achieve the flood protection target, namely, a flood capacity of 1,200 m³/s. All the data were sourced from the monitoring procedure performed within the 8 km-long restored river reach.
Owing to the flexibility of the approach, the KPIs were selected depending on the site-specific characteristics. With reference to this case study of morphological restoration, the indicators should specifically address the aspects related to the hydromorphological assessment of riverine areas (González del Tánago et al. 2021). Thus, in the environment and ecosystems ambit, fish diversity and count, invertebrate count, sediment size, sinuosity, floodplain features and water quality were considered as relevant KPIs. Analogously, in the society ambit, the social acceptability, the political obstacles and the human recreation were included as KPIs. Moreover, further environmental aspects, such as the pollution from excavation machinery, mining riprap and transportation of materials to the site, should be included in the assessment. Overall, considering the data availability limitations and the negligible variation of some indicators at the two considered DSs, a final matrix of 28 KPIs, as reported in Table 4, was considered in the assessment. The selected KPIs, belonging to all five ambits of the matrix, were identified and distributed between ten criteria and 15 sub-criteria. The sixth column of Table 4 reports the data source of the corresponding indicator.
As defined in Section 3.2, through a Living Lab approach, three different SH groups, namely, NGOs, public and private organizations, rated each ambit and criterion from 1 to 5, according to the Likert scale (Likert 1932), whereas the uniform criterion was applied to weight the KPIs. A fourth, neutral SH group was included, whose weights were set equal to the overall average of all the SHs' weights.
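The construction of the neutral group can be sketched as a pooled average. The text states only that the neutral weights equal the overall average of all the SHs' weights, so this example assumes that the individual votes of all groups are pooled before averaging; the function name, group labels and votes are hypothetical.

```python
def neutral_group_means(group_votes):
    """Mean Likert vote per item over all stakeholders, pooled across groups.

    group_votes: dict mapping a group name to a dict {item: list of votes}.
    Assumption: every individual vote counts equally, regardless of group.
    """
    pooled = {}
    for items in group_votes.values():
        for item, votes in items.items():
            pooled.setdefault(item, []).extend(votes)
    return {item: sum(votes) / len(votes) for item, votes in pooled.items()}


# Hypothetical votes on a single ambit from groups of different sizes.
votes_by_group = {
    "NGO": {"risk reduction": [5, 4]},
    "public": {"risk reduction": [3]},
    "private": {"risk reduction": [2, 1]},
}
neutral = neutral_group_means(votes_by_group)
```

When the groups differ in size, a pooled average of this kind generally differs from the average of the group means, so the choice between the two should be stated when reporting the neutral weights.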

RESULTS AND DISCUSSION
Tables 5 and 6 summarize the average weights of ambits and criteria, respectively.
The KPIs and the corresponding normalized values were estimated for the NBS and the GS, as reported in Table 7. Equations (4)-(6) were applied to calculate the scenario, ambit and criterion scoring, respectively. Results of the ambit comparison are given in Table 8 and Figure 4.
For the NBS scenario, the NGO was the only SH group returning its highest score for the technical and feasibility aspects ambit, given the high weight assigned to this category. The public, the private and the neutral SHs instead returned the highest score for the society ambit. The risk reduction ambit earned the lowest value with all the SH groups, given the lower capacity of the riverbed to convey the discharge, about three times lower than that of the GS.
For the GS, both the environment and ecosystems and the society ambits earned negative scores, given the increase of temperature and the reduction of usability of the river for recreation, respectively, provoked by the simulated intervention. The technical and feasibility aspects ambit exhibited the highest value with all the SH groups, because of the initial investment costs, about three times lower than those of the NBS scenario. This ambit scored higher with the NGO stakeholders than with the private organizations, given the greater relative weight.
Table 9 and Figure 5 show the criteria comparison between the two DSs. Notwithstanding the higher hazard score achieved by the GS, the NBS solution returned significantly greater scores for the remaining criteria, given the effectiveness of this solution in fostering the environmental and socio-economic co-benefits. Indeed, although the hazard score of the NBS was three times lower, the benefits in terms of soil, vegetation and biodiversity were negligible for the grey solution. Moreover, the new area for recreational use, the new pedestrian and cycling paths, the areas easily accessible for people with disabilities and the increased property values balanced the lower usability of the river for recreation. Concerning the revitalization of marginal areas criterion, the NBS solution doubled the benefits related to the implementation of the GS.
The comparison of the overall scenario scoring between the NBS and GS is plotted in Figure 6.
The NBS reached an overall scenario score significantly higher than that of the grey solution for all the considered SH groups, being about five times greater in each case. It varied from 3.33 to 4.09, against a range between 0.61 and 0.74 for the grey solution (Figure 6). Indeed, given the relevant environmental and socio-economic co-benefits, despite the lower score of the risk reduction ambit, the NBS scenario was preferable in comparison with the GS. This result is interesting because it shows that GSs are perceived as having a higher risk mitigation capacity than NBS. However, as a matter of fact, the NBS and the GS have similar performance considering risk mitigation potential (Gerwien 2020).
It is worth noting that the comparison between the two DSs led to similar results across the SH groups. The highest score was observed with the private organizations, given the higher weights assigned to the environment and ecosystems, society and local economy ambits. The discrepancies in scoring between SHs are due to the differences in both ambit and criterion scoring. Indeed, the private SHs gave a low ranking to the risk reduction aspects, whereas results between NGO and public SHs are comparable. This matches with studies in the literature showing that the non-public sector and, namely, civil society, is inclined to underestimate the flood risk more than NGO and public authorities (Baan & Klijn 2004). Conversely, they significantly take into consideration the effects on ecological and social co-benefits. Indeed, NBSs are more readily perceived as supporting biodiversity and wildlife, by assuring better air quality and climate regulation (Ferreira et al. 2020). Moreover, they are intended as limiting social inequalities more than traditional grey solutions, given the ability to promote recreational activities, sociocultural initiatives and quality of life (Conedera et al. 2015).
The local economy had no primary importance for any of the investigated SH categories; both the new job opportunities and the economic activities were perceived to be less evidently related to the application of an NBS approach. This is presumably due to the different time span and order of magnitude of the economic benefits, compared with those of the environmental and social effects.
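The factor of about five can be checked against the reported score ranges alone. Since the pairing of NBS and GS scores by SH group is given only in Figure 6, the sketch below computes the bounds that hold for any pairing; the variable names are hypothetical.

```python
# Reported ranges of the overall scenario score across SH groups.
nbs_min, nbs_max = 3.33, 4.09
gs_min, gs_max = 0.61, 0.74

# Bounds on the NBS/GS ratio that hold whatever the pairing by SH group.
ratio_lower = nbs_min / gs_max   # smallest NBS score over largest GS score
ratio_upper = nbs_max / gs_min   # largest NBS score over smallest GS score

# Ratios obtained if the lowest scores and the highest scores are paired.
ratio_low_pair = nbs_min / gs_min
ratio_high_pair = nbs_max / gs_max

print(f"ratio bounds: {ratio_lower:.2f} to {ratio_upper:.2f}")
print(f"paired extremes: {ratio_low_pair:.2f} and {ratio_high_pair:.2f}")
```

Whatever the pairing, the ratio lies between roughly 4.5 and 6.7, and pairing the extremes of the two ranges gives about 5.5 in both cases, which is consistent with the statement that the NBS score is about five times that of the GS.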
Concerning the GS, the results of the different SHs were comparable, given the few improvements it showed for the local economy and its detrimental effects on the environmental and societal co-benefits. This may be corroborated by the established knowledge of the long-run effects of grey solutions, given the large number of case studies implemented long ago. Indeed, compared with traditional grey applications, NBSs can be regarded as a relatively novel approach, so the assessment of their effectiveness relies mainly on SH perception rather than on consolidated lessons learned. As observed in further studies (Goeldner-Gianella et al. 2015; Potočki et al. 2021), when perceived risks are high, the lack of evidence, novelty and complexity of NBSs tend to favour traditional grey approaches for risk reduction.
The AFT application to the Isar River case study showed some limitations of the implemented approach. The dependency on data availability for the assessment of both the baseline and the DSs was noted. A more extensive dataset allows more KPIs to be estimated across the different criteria, which limits the sensitivity of the assessment to the number of indicators considered. The limited data available for both the NBS and the design scenarios forced the omission of some KPIs that could have been relevant for the assessment. In this case study, specific indicators of the environmental and socio-economic properties of the site (e.g., fish diversity and number, water quality, political obstacles, pollution from excavation machinery) were thus not included. The criteria for selecting the KPIs from the overall matrix should properly account for the robustness of the available datasets and the specificity of the investigated site. Indeed, the KPIs can be tailored to both the occurring hazards and the environmental and socio-economic features of the site. Nevertheless, the effectiveness of the assessment requires a specific evaluation of the biases arising from the input data of the model.
Moreover, the results were shown to be slightly sensitive to the SH weighting. The interaction between SHs during the Living Lab session, which was open to the exchange of opinions and stimulated discussion, improved the participants' familiarity with the nomenclature and the meaning of the ambits and criteria to be voted upon. Their different backgrounds and skills would not have allowed them to correctly understand all indicators, which justifies the choice of applying uniform weighting at the KPI level.
However, involving wider SH groups may limit the biases between scorings and improve the robustness of the weighting outcomes. In addition, as observed in previous studies (Giordano et al. 2020), coupling the Living Lab experience with effective territory control, through the implementation of complementary socio-institutional actions, could improve the SHs' awareness and limit the differing perceptions of heterogeneous SH groups.
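The weighting scheme discussed above can be illustrated with a minimal sketch. It assumes a simple linear aggregation, which may differ from the exact AFT formulation: KPI scores are averaged uniformly within each criterion, and the criterion scores are then combined with SH-assigned weights. The function names, criteria and numbers are hypothetical and are not taken from the Isar River dataset.

```python
from statistics import mean


def criterion_scores(kpi_scores):
    """Uniform KPI weighting: each criterion score is the plain mean of its KPI scores."""
    return {criterion: mean(scores) for criterion, scores in kpi_scores.items()}


def weighted_total(crit_scores, crit_weights):
    """Combine criterion scores with SH-assigned weights, normalised to sum to 1."""
    total_weight = sum(crit_weights.values())
    return sum(crit_scores[c] * w / total_weight for c, w in crit_weights.items())


# Hypothetical KPI scores per criterion (illustrative only).
kpis = {
    "flood_protection": [2.0, 4.0],
    "quality_of_life": [5.0, 3.0, 4.0],
}
# Hypothetical criteria weights voted by one SH group.
weights = {"flood_protection": 1.0, "quality_of_life": 3.0}

scores = criterion_scores(kpis)
total = weighted_total(scores, weights)
print(scores, total)
```

Because only the criteria weights vary between SH groups in this sketch, any difference in the total between groups is attributable to the weighting alone and not to the KPI estimates.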

SUMMARY AND CONCLUSIONS
NBSs are increasingly considered as competitive alternatives for managing natural risks in urban, rural and mountainous areas, given their potential to reduce risk while providing socio-economic and environmental co-benefits. The performance assessment of NBSs still represents a critical step towards their wider adoption. Within the H2020 PHUSICOS project, an AFT was developed to quantitatively assess the effectiveness of NBSs, hybrid solutions and GSs via a comprehensive multi-criteria approach.
This paper provides the results of one of the first studies in the literature focused on the quantitative estimation of the effectiveness of the Isar River restoration, analysing the strengths and weaknesses of this measure through a comprehensive multi-disciplinary approach. This case study is the first application of the PHUSICOS AFT to an ex-post analysis, providing deeper insight into the reliability of the framework at the monitoring stage of designed NBS measures. The results were illustrated and discussed, with a specific focus on the inclusion of SHs in the evaluation process. The tool was tailored to the available data and the site-specific hazards. In greater detail, the preferences of three different SH categories, coupled with a fourth neutral SH category, were surveyed through a Living Lab approach, and the results were integrated within the weighting procedure. For the ex-post performance assessment, two DSs, an NBS and a grey one, were compared by estimating 28 KPIs belonging to all the ambits of the PHUSICOS AFT.
Comparable results were observed between the different SH groups, with the NBS scenario emerging as the most advisable solution, with an overall score about three times higher, regardless of the SHs' weighting. Specifically, the NGO stakeholder category achieved a total score of 3.86 for the NBS DS against 0.74 for the grey one. The public organizations returned an NBS score of 3.34 against 0.61 for the GS, whereas the private SHs totalled 4.09 versus 0.74. The differences in SH scores for the NBS scenario were mainly due to the weights given by these categories to both the environment and ecosystems and the local economy ambits. Thus, the higher weights assigned by the private SH group resulted in the greater score achieved by this category.
Nevertheless, despite the lower benefits in terms of risk reduction capability, the NBS approach attained the highest scores in the environmental and societal ambits. The GS provided higher benefits only in technical terms, reaching ambit scores about three times higher than the NBS ones for all the SH categories, as a consequence of the greater volume capacity of the riverbed against flooding events than the prior solution. An increase in water capacity of 100% was indeed estimated for the GS, against 33% for the NBS. Conversely, the NBS selection was mainly sustained by the higher relevance given to the quality of life, environment and ecosystem, and landscape and heritage improvement criteria in addition to flood protection, showing that the NBS approach was the best trade-off between the technical outcomes and the environmental and socio-economic ones. Even if the scores of all the SH categories emphasized the inclination to select the NBS scenario, biases between the SHs' results were observed, given the variable weighting of the different framework categories.
This is, on the one hand, a limitation of the proposed approach, since it does not provide a single solution for decision-makers. On the other hand, it can represent a strength of the method because it allows a quantitative viewpoint to be obtained from different social categories, providing guidance on indicator selection, SHs' management and performance estimation. This application is thus expected to aid professionals and researchers involved in the design, implementation, monitoring and evaluation of NBSs. The involvement of a wider set of SHs could allow the robustness of the procedure against the SH weighting to be assessed, limiting the biases observed in this study. For a preliminary, swift assessment, the neutral SH category, obtained by averaging the weights of the SHs, can be regarded as an effective solution for prompt decision-making. Indeed, in this work, the simulated neutral SH offered a good trade-off by handling the diversity in perception of the different surveyed SH groups. It can thus be considered for the final evaluation, providing the comprehensive result of the AFT, including the SHs' weighting.
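The construction of the neutral SH category can be sketched as follows, assuming that it is a plain arithmetic mean of the weights expressed by the surveyed groups. The group labels, ambit names and weight values are hypothetical placeholders, not the values collected in the Living Lab.

```python
def neutral_weights(group_weights):
    """Average each weight across SH groups to build a 'neutral' SH profile."""
    groups = list(group_weights.values())
    keys = groups[0].keys()
    return {k: sum(g[k] for g in groups) / len(groups) for k in keys}


# Hypothetical ambit weights per SH group (illustrative only).
group_weights = {
    "NGO": {"risk_reduction": 0.2, "environment": 0.5, "local_economy": 0.3},
    "public": {"risk_reduction": 0.5, "environment": 0.3, "local_economy": 0.2},
    "private": {"risk_reduction": 0.2, "environment": 0.4, "local_economy": 0.4},
}

neutral = neutral_weights(group_weights)
print(neutral)
```

If each group's weights sum to one, the averaged profile also sums to one, so it can be used in the same weighting procedure as any surveyed group.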
A further limitation of the approach concerns the KPI selection and estimation. The consistency of the observed results can certainly be improved by including more KPIs in the assessment. However, this depends on the availability of data to estimate the KPIs for both the NBS and the design scenarios, which can also support further modelling activities to narrow the gap between different perceptions. On the other hand, given the considerable flexibility of the framework, the selection of the KPIs can be properly customized as a function of the site specificity and the data availability.
The application of the framework to the Isar River case study was useful to assess its effectiveness for an ex-post analysis, providing relevant support to further approaches available in the literature. Indeed, the tool is able to couple a systematic quantitative multi-criteria analysis with the flexibility of including the perception of different SH categories. One of the most distinctive features of the framework is its focus on mountainous and rural areas, which present specific environmental and socio-economic features.
Finally, given the multi-criteria approach, the AFT does not facilitate inter-temporal comparisons. To distinguish the impacts during project construction from those at the operational stage, a discounting technique may thus be adopted to compare benefits and costs occurring over different time spans. In addition, several causal interactions and the environmental and socio-economic co-benefits may be analysed at different time scales, enabling a robust inclusion of the temporal dimension and a proper analysis of the processes related to NBS effectiveness that evolve over time.
In the next steps of the research, further improvements will thus focus on the analysis of the different time scales characterizing the AFT indicators.
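As an illustration of the discounting technique mentioned above, the following minimal sketch computes a net present value from yearly benefits and costs. It assumes a constant annual discount rate and yearly time steps; the function names, cash flows and rate are hypothetical and do not come from the case study.

```python
def present_value(yearly_values, rate):
    """Discount a series of yearly values (year 0 first) to present value."""
    return sum(v / (1.0 + rate) ** year for year, v in enumerate(yearly_values))


def net_present_value(benefits, costs, rate):
    """Discounted benefits minus discounted costs over the same time horizon."""
    return present_value(benefits, rate) - present_value(costs, rate)


# Hypothetical flows: costs fall in the construction year (year 0),
# benefits accrue during the operational stage (years 1 and 2).
costs = [100.0, 0.0, 0.0]
benefits = [0.0, 60.0, 60.0]

npv = net_present_value(benefits, costs, rate=0.03)
print(round(npv, 2))
```

In this sketch, a higher discount rate reduces the weight of late operational benefits relative to early construction costs, which is why the choice of rate would need to be agreed before comparing scenarios.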

Figure 1 | Conceptual scheme for project scenarios' comparative performance assessment.

Figure 2 | Map of the Isar River restored area (courtesy of Marcelian Grace).

Figure 3 | River floodplain after the urban-reach restoration. Caption of the reshaped banks of the Isar River (courtesy of Marcelian Grace).

Figure 4 | Ambits' comparison between NBS and GS for the Isar River case study.

Figure 6 | NBS and GS scoring for the Isar River case study.

Table 2 | Key to read the 13 columns composing the AFT matrix

Table 3 | Structure of the framework matrix

Table 6 | Results of criteria weighting procedure for the Isar River case study

Table 7 | KPIs estimation at the baseline, NBS and GS for the Isar River case study

Table 9 | Criteria scores of the NBS and GS for the Isar River case study

Figure 5 | Criteria comparison between (a) NBS and (b) GS for the Isar River case study.

AQUA - Water Infrastructure, Ecosystems and Society Vol 71 No 1