network resilience

The reality is that much of the energy sector’s power infrastructure is old and prone to frequent failures. Catastrophic events, such as those experienced in 2020, exacerbate this reality, making it all the more important for utilities to strive for network resilience.

The article first appeared in ESI Africa Issue 4-2020.

The capability of the electricity grid to carry out its function is continually being challenged by a number of elements, from the expected natural disasters to the new threat of cyber-attacks. The need to bounce back from such challenges makes network resilience increasingly important. Power system resilience today is receiving more attention from regulators and the utility industry as a key factor in the defence against the high-impact, low-probability (HILP) events that result from major catastrophes, which have significant economic and societal impact.

This article makes use of a paper that analyses historical outage data for transmission system components and discusses the implications of nearby overlapping outages with respect to resilience of the power system. The paper carries out a risk-based assessment using the North American Electric Reliability Corporation (NERC) Transmission Availability Data System (TADS) for the North American bulk power system (BPS).

Analysis of an outage data methodology

The frequent effects of catastrophic events have shown the vulnerability of the electric grid and the lack of adequate methodologies for evaluating resilience under these HILP events. Understanding the comprehensive risks associated with extreme events is important because it affects the ability of companies and individuals to plan for resilience in areas such as prevention, adaptation, and recovery. Resilience analysis has to consider both the impact and the probability of an event. Efforts continue to develop methodologies and tools that mitigate or minimise the risk from HILP events and improve electric grid resilience in planning and operation, but they are still far from providing comprehensive solutions.

Enhancing the resilience of a power system needs to be coordinated across all segments (generation, transmission, distribution, customers), and solutions are required for physical- and cyber-types of extreme contingencies. The paper discusses the power system resilience concept in operation planning by evaluating the historical cluster outages of multiple transmission elements (e.g. lines, transformers) recorded within a 2-min time interval. According to the paper, this type of outage threatens the ability of each utility transmission operator (TOP) to operate to a single-contingency reliability criterion.

The paper further proposes a methodology using the TADS data for assessing the resilience of BPS under these nearby overlapping outages. To gain a better understanding of how clusters of nearby outages can impact the system resilience in the future, this study examines both sustained and momentary outages.

The research team performed a comprehensive analysis of the North American combined inventory and cluster outage data for both automatic sustained and momentary outages within a 2-min window. The analysis aims to identify actionable information from outage data statistics that could be helpful in preventing or mitigating the consequences of newly studied overlapping outage clusters. In addition, the paper presents a methodology to evaluate the likelihood of clusters of different sizes, and of clusters overall, for a transmission owner (TO) based on its transmission inventory.

Operational grid resilience

Detecting and preventing multiple outages is critical to maintaining power system reliability and resilience. Operation planning engineers, as well as control room operators, face complex situations resulting from these multiple events. When power grids have high volumes of renewable energy sources or are heavily stressed with high power transfers, it becomes increasingly challenging to keep electricity grids efficient, reliable, and resilient. A growing body of publications in recent years presents the concept of resilience by assessing the impact of, and mitigation measures for, major disturbances caused by adverse weather, natural disasters, hurricanes, earthquakes, and cyberattacks. Increasing system resilience requires an understanding of a wide range of preparatory, preventive, and remedial actions, as well as how these affect planning, operation and restoration over the entire life cycle of different kinds of grid failures.

Risk-based methodology

With enough data available, one can also use these datasets as a source of features for training modern machine learning approaches to predict and quantify risk. Machine learning and artificial intelligence approaches can also provide timely recommendations to the operator in charge of remedial actions. Common sources of data that recent research increasingly incorporates into risk and reliability studies are those characterising renewable resources. There is no single definition of resilience today, but the majority of published definitions focus on the power system’s ability to anticipate, absorb, and rapidly recover from an external, high-impact, low-probability event. A conceptual framework of power system resilience covers the following steps:

Step 1: Threat/event characterisation,

Step 2: Vulnerability of system’s components,

Step 3: System response, and

Step 4: System restoration.

The key attributes of a resilient power system are robustness, resistance, resourcefulness, and redundancy. Due to the limitations mentioned above, the study does not cover all these attributes, and the results are primarily related to Step 1. To measure the robustness of a power system, for example, a comprehensive study needs to be performed to establish a threshold value of the consequence beyond which the performance of the system is considered unacceptable.

Risk analysis depends on characterisation of the threats, vulnerabilities, and consequences of adverse events to determine the expected loss of critical functionality. Due to the scope of the paper and the datasets the research team had control of, the authors do not apply the traditional risk-based methodology but focus on the risk factors that come from the outages included in the analysis. The authors have therefore used the results on cluster statistics to evaluate an aggregated risk at operating entity level, which could help TOs identify mitigation measures to prevent or minimise the impacts of those outages.

Transmission availability data system (TADS)

For this analysis, TADS automatic (momentary and sustained) outages of TADS elements of 200kV and above for years 2013–2019 were grouped by TO. These outages were sorted in chronological order, then examined to select groups of outages inside a TO with starting times of two consecutive outages separated by at most 2 minutes. This process resulted in 4,246 groups that contained 10,501 outages (or 32.6% of all TADS automatic outages over a seven-year period).
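The grouping step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the record layout (owner name plus start time), the function name `group_by_start_gap` and the sample timestamps are all assumptions made for illustration, and real TADS records carry many more fields.

```python
from datetime import datetime, timedelta
from itertools import groupby

WINDOW = timedelta(minutes=2)  # maximum gap between consecutive start times


def group_by_start_gap(outages, window=WINDOW):
    """Group (owner, start_time) records per owner by gaps between consecutive starts.

    Only groups with two or more outages are returned.
    """
    groups = []
    ordered = sorted(outages)  # sorts by owner, then by start time
    for _, owner_outages in groupby(ordered, key=lambda o: o[0]):
        current = []
        for outage in owner_outages:
            # A gap longer than the window closes the current group.
            if current and outage[1] - current[-1][1] > window:
                if len(current) > 1:
                    groups.append(current)
                current = []
            current.append(outage)
        if len(current) > 1:
            groups.append(current)
    return groups


# Hypothetical records: (transmission owner, outage start time).
t0 = datetime(2019, 6, 1, 14, 0, 0)
sample = [
    ("TO_A", t0),
    ("TO_A", t0 + timedelta(seconds=90)),
    ("TO_A", t0 + timedelta(seconds=200)),
    ("TO_A", t0 + timedelta(minutes=10)),
    ("TO_B", t0 + timedelta(seconds=10)),
    ("TO_B", t0 + timedelta(seconds=100)),
]
groups = group_by_start_gap(sample)
print([len(g) for g in groups])
```

In this sketch the fourth TO_A outage starts more than 2 minutes after the third, so it ends up alone and is dropped. Outages are never grouped across owners.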

Next, these groups were examined to detect outages that do not overlap in time with at least one other outage in the group. (Overlapping outages are defined here as outages that overlap in time, for any period of time. Namely, if two outages start at the same time, they overlap; if one of the outages starts earlier, the second outage should start before the first one ends for them to overlap.) These outages were removed from the study, and groups were redefined to contain only outages that overlap with one or more outages in the group. The resulting sets of outages are called clusters.

A cluster is a set of automatic outages of transmission elements in the same company that satisfies the following conditions: (a) when sorted by their start time, the difference between the start times of any two consecutive outages does not exceed 2 minutes; (b) each outage in the cluster overlaps in time with at least one other outage in the cluster. Condition (b) implies that outages in each cluster are “continuous”, i.e., at any moment from the earliest start of all outages in the cluster to the latest end of all outages at least one outage continues.
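As a rough illustration, conditions (a) and (b) can be written as a small check in Python. The `Outage` record, the function names and the sample times below are hypothetical and are not taken from the paper or from TADS; the overlap test follows the definition given in the text (same start time, or the later outage starts before the earlier one ends).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Outage:
    element: str      # hypothetical element label
    start: datetime
    end: datetime


def overlaps(a, b):
    """Two outages overlap if they start together or the later one starts before the earlier ends."""
    first, second = (a, b) if a.start <= b.start else (b, a)
    return first.start == second.start or second.start < first.end


def is_cluster(outages, window=timedelta(minutes=2)):
    """Check conditions (a) and (b) for outages assumed to belong to one company."""
    ordered = sorted(outages, key=lambda o: o.start)
    if len(ordered) < 2:
        return False  # condition (b) needs at least one other outage
    # (a) consecutive start times are at most `window` apart
    gaps_ok = all(b.start - a.start <= window for a, b in zip(ordered, ordered[1:]))
    # (b) every outage overlaps at least one other outage in the set
    overlap_ok = all(
        any(overlaps(o, other) for other in ordered if other is not o)
        for o in ordered
    )
    return gaps_ok and overlap_ok


t0 = datetime(2019, 1, 1, 12, 0, 0)
line_1 = Outage("line 1", t0, t0 + timedelta(minutes=10))
line_2 = Outage("line 2", t0 + timedelta(seconds=60), t0 + timedelta(minutes=5))
xfmr_1 = Outage("transformer 1", t0 + timedelta(minutes=30), t0 + timedelta(minutes=31))

print(is_cluster([line_1, line_2]))          # both conditions hold
print(is_cluster([line_1, line_2, xfmr_1]))  # the transformer outage starts too late and overlaps nothing
```

The check only validates a candidate set. Building clusters from raw groups, as the study does, would additionally require removing non-overlapping outages and regrouping the remainder.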

The size of a cluster is defined as the number of outages it contains. For any cluster of size 2 and greater, the operator has at least one N-2 contingency, but depending on the cluster size may have multiple N-2, N-3, N-4 . . . contingencies. The final data set processed for this study consists of 2,918 clusters comprising 6,952 automatic outages (or 21.6% of all 32,198 automatic outages of TADS elements 200kV and above from 2013 to 2019). Table 1 illustrates a breakdown of the outages in clusters by transmission element type and by voltage class as reported in TADS. For transformers, the voltage class is the high-side voltage. Voltages are operating voltages.

Analysis of clusters

The outages listed in Table 1 are grouped together into clusters as summarised in Table 2. The inclusion of automatic outages for all TADS elements allows the capture of more nearby overlapping outages and a better evaluation of their risks to dynamic stability and resilience of the transmission system.

Table 1: Transmission Availability Data System (TADS) automatic outages in clusters by element type and voltage class.

Table 2: TADS automatic outages indicated by element type and voltage class.

Table 2 indicates that, with the exception of 2014, the number of clusters in North America stayed consistent during the study period. In 2014 the number was significantly lower, and the largest cluster contained only seven outages. Overall, most clusters (76%) consist of two outages, with several outliers (clusters with sizes between 11 and 18). The average size of a cluster equals 2.4 outages. An empirical distribution of the cluster size is illustrated in Figure 1.

Figure 1: Distribution of cluster sizes (2013 – 2019)

The 6,952 outages in clusters are divided into 2,007 momentary outages and 4,945 sustained outages (i.e. outages lasting at least 1 min). The percentage of sustained outages in clusters is significantly higher than in the total population of automatic outages for years 2013–2019 (71% versus 58%). Figure 2 lists the outages by TADS initiating cause. Several of the smallest groups are not shown (together they contain less than 1% of outages).
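The headline statistics can be cross-checked with a few lines of arithmetic. The sketch below uses only counts quoted in the article (2,918 clusters, 2,007 momentary and 4,945 sustained outages in clusters, 32,198 automatic outages in total); the variable names are illustrative.

```python
clusters = 2918            # clusters in the final data set
momentary = 2007           # momentary outages in clusters
sustained = 4945           # sustained outages in clusters
total_automatic = 32198    # all TADS automatic outages, 200kV and above, 2013-2019

outages_in_clusters = momentary + sustained
average_size = outages_in_clusters / clusters
sustained_share = sustained / outages_in_clusters
cluster_share = outages_in_clusters / total_automatic

print(outages_in_clusters)
print(round(average_size, 1))          # average cluster size
print(round(sustained_share * 100))    # % of clustered outages that are sustained
print(round(cluster_share * 100, 1))   # % of all automatic outages that fall in clusters
```

The rounded results agree with the reported average cluster size of 2.4, the 71% sustained share and the 21.6% share of all automatic outages.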

Figure 2: Outages in clusters by initiating cause (2013 – 2019)

Lightning initiates the largest number of outages in clusters, but the majority of them are momentary. In contrast, failed AC substation equipment is the leading cause of sustained outages in clusters but it initiates a relatively small number of momentary outages. Power system condition is the third largest group. Lightning, the top cause of outages in clusters, is the second leading cause of all automatic outages in TADS, but it initiates only 8% of outages in large clusters.

Unknown, the leading cause of TADS outages, ranks relatively low for clusters: it initiates 9% of outages in clusters and only 3% of outages in large clusters, because causes of larger transmission events tend to be better investigated and reported. Prominently, power system condition causes 25% of outages in large clusters while in TADS it ranks low (4% of TADS outages). This cause is reported for automatic outages caused by power system conditions such as instability, overload trip, out-of-step, abnormal voltage, abnormal frequency, or unique system configurations (e.g., an abnormal terminal configuration due to an existing condition with one breaker already out of service).

Company risk assessment

The cluster statistics presented in the previous sections can be used to evaluate a company risk caused by clusters of overlapping outages. The impact of a cluster can be defined, for example, as its size or, in a more sophisticated way, as the sum of equivalent MVA values of transmission elements in this cluster. The likelihood of a cluster can be estimated as follows. The expected number n_k(7) of clusters of size k over 7 years for a company A is estimated by: n_k(7) = N_k(7) × Inv(A) / Inv(TADS)
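A minimal Python sketch of this inventory-scaling estimate follows. The article's text breaks off after the formula, so the reading of the symbols is an assumption: N_k(7) is taken to be the number of size-k clusters observed across TADS over the seven years, Inv(A) the transmission inventory of company A, and Inv(TADS) the total TADS inventory. All numbers are invented for illustration, and the size-weighted total at the end is just one possible way of combining likelihood with the simple "impact equals cluster size" definition, not a result from the paper.

```python
def expected_clusters(n_k_tads, inv_company, inv_tads):
    """Scale the TADS-wide count of size-k clusters by the company's share of inventory."""
    return n_k_tads * inv_company / inv_tads


# Hypothetical inputs (not from the paper).
tads_clusters_by_size = {2: 2000, 3: 500}   # assumed N_k(7) for k = 2 and k = 3
inv_company = 150                           # assumed Inv(A), number of elements
inv_tads = 10000                            # assumed Inv(TADS), number of elements

expected = {
    k: expected_clusters(n_k, inv_company, inv_tads)
    for k, n_k in tads_clusters_by_size.items()
}

# Assumption: impact of a cluster is its size, so weight each expected count by k.
size_weighted_total = sum(k * n for k, n in expected.items())

print(expected)
print(size_weighted_total)
```

Replacing the size k with the summed equivalent MVA of the elements in each cluster would give the more sophisticated impact measure mentioned in the text.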


The comprehensive historical data analysis of cluster outages provides an operating entity with a quantitative method to identify the outages with the highest risks. The knowledge gained from this study will help companies to understand potential risks and to identify mitigation measures to prevent or minimise the impacts of those outages. A final word from the paper is that future research on outage prediction based on machine learning algorithms is needed to cope proactively with overlapping, electrically close outages and to improve grid resilience.


This article is based on an adaptation of a 2020 paper titled A Risk-Based Approach to Assess the Operational Resilience of Transmission Grids, written by Milorad Papic, Svetlana Ekisheva and Eduardo Cotilla-Sanchez. View online for a full list of references and diagrams. All tables and figures are attributed to the paper.