Electricity distribution always involves some loss of power, through either technical or non-technical losses (NTL). This challenge sparked the idea of a new methodology for NTL detection that combines fine-grained smart meter consumption data with cellular phone data records.
The original idea proposed by Carlo Papa at Enel Foundation was instantiated in a collective paper recently presented at CIRED 2019 in Madrid.
Smart meter data (SMD), with consumption readings as frequent as every 15 minutes for each customer, is a very rich source of information for better understanding energy consumption patterns and, accordingly, for planning and operating the energy distribution network. However, the size of the data set, composed of thousands of readings per year for each individual customer, requires the use of sophisticated data processing methods and machine learning algorithms to extract useful information.
While SMD has been used to assist in NTL detection, this type of data is not sufficient by itself to enable effective detection. Broadly speaking, NTL detection would be strengthened by a comparison between (a) the expected energy activity in a certain area, and (b) the amount of energy use that is actually recorded by the smart meters. SMD can be used only for (b), while (a) requires access to other data sets that can be considered a proxy for human activity.
Among existing data sets, cellular phone Call Data Records (CDRs) have been extensively used to characterise human activities from different perspectives [2,3,4]. Compared to other data sets such as credit card transactions, vehicle GPS traces, and so on, CDRs have the advantage of (i) covering the vast majority of the population, engaged in (ii) general, day-to-day activity. As for (i), we observe that the penetration rate of cellular phones approaches 100% of the active population, not only in developed countries but increasingly also in the developing world.
As for (ii), we observe that while other data sets record human traces related to a particular activity (e.g., buying goods with a credit card, or travelling in a GPS-equipped vehicle), CDRs record the time and location of users engaged in general daily activities, and can thus be considered a better predictor of general human activity in an area than other data sets. In particular, it has been extensively shown in the literature that CDRs allow a very accurate prediction of home and work locations, which are the primary loci where NTL activity would take place.
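One simple heuristic for the home and work inference mentioned above can be sketched as follows. This is a minimal illustration, not the method of the cited studies: the record layout (user, tower, timestamp), the night window (22:00 to 06:00) and the working-hours window (weekdays, 09:00 to 17:00) are all assumptions chosen for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical CDR layout: (user_id, tower_id, timestamp).
# Real CDR schemas differ from operator to operator.
def infer_home_work(records):
    """Return {user_id: (home_tower, work_tower)} from most-frequent-tower counts."""
    night = defaultdict(Counter)   # towers seen between 22:00 and 06:00 (assumed window)
    office = defaultdict(Counter)  # towers seen on weekdays 09:00-17:00 (assumed window)
    for user, tower, ts in records:
        if ts.hour >= 22 or ts.hour < 6:
            night[user][tower] += 1
        elif ts.weekday() < 5 and 9 <= ts.hour < 17:
            office[user][tower] += 1
    result = {}
    for user in set(night) | set(office):
        # None means the user had no records in the corresponding window.
        home = night[user].most_common(1)[0][0] if night[user] else None
        work = office[user].most_common(1)[0][0] if office[user] else None
        result[user] = (home, work)
    return result
```

A user with most night-time records at tower "A" and most weekday working-hours records at tower "B" would be assigned the pair ("A", "B"). Records outside both windows are ignored by this sketch.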
The proposed methodology
To the best of our knowledge, the approach reported here is the first attempt to use SMD in combination with CDRs to improve the effectiveness of NTL detection. A high-level description of the proposed methodology is given in Figure 1.
At step 1, spatio-temporal features of both SMD and CDR are analysed. To increase the potential of NTL detection, both the spatial and the temporal features of the recorded data should be exploited. For SMD, this implies adopting clustering and machine learning methods that aim at detecting temporal consumption patterns, such as those reported in the literature. The same holds for CDR, for which methods for extracting temporal features of human activity have also been proposed.
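As a minimal sketch of the temporal feature extraction in step 1, the function below collapses a series of readings into a normalised 24-hour shape. The input layout, a list of (timestamp, value) pairs, and the function name are assumptions made for illustration. The clustering methods mentioned above would operate on shapes of this kind; the clustering itself is not shown.

```python
from collections import defaultdict
from datetime import datetime

def hourly_shape(readings):
    """Collapse (timestamp, value) readings into a 24-slot profile that sums to 1.

    The same function can serve 15-minute kWh readings from a smart meter
    or CDR events (pass value = 1 for each record).
    """
    totals = defaultdict(float)
    for ts, value in readings:
        totals[ts.hour] += value
    grand_total = sum(totals.values())
    if grand_total == 0:
        return [0.0] * 24  # no consumption or activity recorded
    return [totals[h] / grand_total for h in range(24)]
```

Normalising to a unit sum removes the difference in units between kWh and record counts, so that the shapes of the energy and human activity profiles can be compared directly.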
The outcome of step 1 is a collection of spatially distributed energy profiles (from SMD) and human activity profiles (from CDR). The goal of step 2, the key step of the methodology, is to establish a framework for comparing these profiles in order to identify outliers, i.e., regions of the city or area of interest where statistically significant deviations between the energy and human activity profiles are detected. The set of outliers identified in this way is the input to the next step of the methodology, where traditional energy loss investigation methods (e.g., sending a crew into the field) are applied in the identified regions to verify whether the detected discrepancy between energy and human activity behaviour is actually the result of NTLs.
Comparing profiles
Step 2 of the methodology consists of performing a statistically accurate comparison of the energy and human activity profiles built in the previous step. The two sets of profiles should be built using similar (ideally, the same) units for temporal and spatial analysis, so as to ease a direct comparison. A crucial choice in this step is selecting features of the profiles that enable an effective identification of outliers. A promising candidate is the activity pattern occurring at a user's home and work locations.
The rationale is that NTLs are likely to occur at either residential or commercial locations, most likely overlapping with somebody’s home or work location. Thus, by building human activity profiles from only those records captured at home or work locations, it should be possible to remove noise and obtain a more accurate characterisation of human activity in an area. Separate profiles should also be built for weekdays and weekends, as well as for night time and daytime.
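The separation into weekday/weekend and daytime/night-time profiles described above could look like the sketch below. The event layout (spatial unit, timestamp, value) and the 07:00 to 19:00 daytime window are assumptions for illustration; the CDR events are assumed to have been filtered to home or work locations beforehand.

```python
from collections import defaultdict
from datetime import datetime

def bucket_key(ts, day_start=7, day_end=19):
    """Label a timestamp as (weekday|weekend, day|night).

    The 07:00-19:00 daytime window is an assumption, not taken from the paper.
    """
    day_type = "weekday" if ts.weekday() < 5 else "weekend"
    period = "day" if day_start <= ts.hour < day_end else "night"
    return day_type, period

def build_profiles(events):
    """Sum values per (spatial_unit, day_type, period).

    `events` holds (spatial_unit, timestamp, value) tuples: kWh for SMD,
    or 1 per home/work-located record for CDR.
    """
    profiles = defaultdict(float)
    for unit, ts, value in events:
        profiles[(unit, *bucket_key(ts))] += value
    return dict(profiles)
```

Running the same function over both data sets yields energy and activity aggregates keyed by identical spatial and temporal units, which is what the comparison in step 2 requires.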
The outcome of step 1 is the input to step 2. Depending on the shape of the distribution obtained, the most appropriate outlier detection method can be selected, drawing upon the vast literature on this topic.
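The article does not prescribe a specific outlier detection method. As one possible illustration, the sketch below flags spatial units using the modified z-score (based on the median and the median absolute deviation) of the log ratio between recorded energy and human activity. The choice of statistic, the log ratio, the conventional 3.5 threshold and the input dictionaries are all assumptions, not part of the proposed methodology.

```python
import math
from statistics import median

def flag_outliers(energy, activity, threshold=3.5):
    """Return spatial units whose log(energy / activity) deviates strongly from the median.

    `energy` and `activity` map spatial unit -> positive aggregate (assumed inputs).
    Uses the modified z-score 0.6745 * (x - median) / MAD.
    """
    units = [u for u in energy if u in activity and energy[u] > 0 and activity[u] > 0]
    ratios = {u: math.log(energy[u] / activity[u]) for u in units}
    if not ratios:
        return []
    med = median(ratios.values())
    mad = median(abs(r - med) for r in ratios.values())
    if mad == 0:
        return []  # no spread to measure deviations against
    return sorted(
        u for u, r in ratios.items()
        if abs(0.6745 * (r - med) / mad) > threshold
    )
```

This sketch flags deviations in both directions. A heavy-tailed or multi-modal distribution of ratios would call for a different method, which is the selection step the paragraph above refers to.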
The final step of the methodology consists of using traditional NTL fraud detection techniques, but, instead of looking at the entire area of the city/region, looking only at the outlier spatial units obtained after step 2. By the very definition of outlier, those would account for only a very small fraction of the entire area under study, typically well below 5%.
Thus, the potential of the methodology lies in its ability to significantly narrow down the scope of application of traditional, expensive NTL detection techniques, with a corresponding reduction in cost. On the downside, NTL could also occur outside the outlier regions identified by our proposed methodology.
However, these losses are likely to be smaller in magnitude, or more spatially dispersed, than those occurring in the outlier regions, making the use of traditional NTL detection techniques in those areas ineffective or too expensive.
For Enel Foundation, the next step in the research will be the validation of the proposed methodology with real data sets by means of a concrete application. The use of data sets different from those usually available to electricity distribution companies is a promising field, where new data science expertise can provide meaningful results in areas such as electricity demand forecasting or the security of the system.
This article is based on the paper TLC Pointer – The use of Geospatial Data for Non-Technical Loss Detection, which was presented at the 25th International Conference on Electricity Distribution (CIRED) in Madrid in June 2019. The paper was made available to ESI Africa by Enel Foundation.
[1] T. Teeraratkul, D. O’Neil, S. Lall, 2018, “Shape-based approach to household electric load curve clustering and prediction”, IEEE Trans. on Smart Grid, vol. 9, n. 5, pp. 5196-5206.
[2] K. Kung, K. Greco, S. Sobolevsky, C. Ratti, 2014, “Exploring universal patterns in human home-work commuting from mobile phone data”, PLOS One, vol. 9, n. 6, e96180.
[3] M. Gonzalez, C. Hidalgo, A. Barabasi, 2008, “Understanding individual human mobility patterns”, Nature, vol. 453, n. 7196, pp. 779.
[4] L. Dong, S. Chen, Y. Chen, Z. Wu, C. Li, H. Wu, 2017, “Measuring economic activity in China with mobile big data”, EPJ Data Science, vol. 6, n. 1, pp. 29.
[5] S. Grauwin, S. Sobolevsky, S. Moritz, I. Godor, C. Ratti, 2014, “Towards a comparative science of cities: using mobile traffic records in New York, London and Hong Kong”, Computational Approaches for Urban Environments, Springer, pp. 363-387.