Estimating Solar PV Capacity & Building Energy Use with Satellite Imagery | Dr. Kyle Bradbury
- Hello and welcome back from the break. I'm Kyle Bradbury from Duke University. As a co-organizer of this event, I'm thrilled to kick off this session of the 2020 Energy Data Analytics Symposium on remote resources and infrastructure. Remote sensing isn't historically a field that's been closely linked with energy systems.
Yes, there have been remote sensing applications: finding suitable sites for planning generation development, some use of aerial imagery to monitor transmission rights of way and vegetation encroachment, and applications for oil and gas exploration. Beyond a handful of niche applications, though, that was pretty much it. All that is changing. Now more than ever, remote sensing data of various types are being used as a way to understand, plan, and manage energy systems in novel, and I think exciting, ways. The speakers today present a glimpse into the vanguard of the space, demonstrating the potential of remote sensing data for energy systems. So, in this session, I'll kick us off discussing how distributed generation and consumption can be estimated using only satellite and aerial imagery.
Martha Morrissey will then take it away, demonstrating how Development Seed is using machine learning to help map out the connective tissue of electricity systems: transmission infrastructure. Then we'll close out the session with Dr. Heather Couture from Pixel Scientia Labs, talking about how satellite imagery can be used to better understand the impacts of energy systems by monitoring greenhouse gas emissions. So, a few other things to mention here. Actually, before we jump into that: generation, transmission, and end-use emissions. We're about to see what we can learn about the full electricity system value chain from the convenience of our nearest satellites.
I'm an assistant research professor in electrical and computer engineering here at Duke and managing director of the Energy Data Analytics Lab at the Duke University Energy Initiative. My work is really at the intersection of how machine learning techniques can be used to better understand and plan sustainable energy systems. What I wanted to talk with you all about today is a question that I get quite regularly.
Well, what can we learn about energy systems using remote sensing data? Okay, so take a look at this image here. If we think about what we can learn from that, there's actually quite a bit that we can learn from a satellite image. So, first of all, you see buildings all over the place. The sizes and shapes of these buildings may differ, but there's something we often can learn from these buildings, which is, first of all, that the energy being consumed by a building is often related to the size of that building, right? So, the larger the building, the more thermal comfort-related energy consumption is usually going on inside of the building. Similarly, we can see solar PV arrays here, right? We can see distributed generation, as long as it's been installed properly, from above. Obviously, if trees are covering those up, they're probably not in an ideal location.
And then there are also slightly more subtle indicators here of energy system-related information. You can see, might be fairly faint down here, but in the corner you can see things like streetlights, distribution lines, transmission lines, information about electricity access, and even vehicles talking about transportation and energy consumption through potentially electric vehicles and other types of energy consumption activity. So, there's actually quite a bit of activity that we can explore with remote sensing data alone.
So, our research vision, that of my collaborators and me, is to put together a global automated assessment of energy infrastructure to develop pathways to sustainably meet energy needs, everything from renewable integration and maintaining system reliability to energy access and development of electricity systems for the first time. So, we've been working on a series of techniques at the Energy Data Analytics Lab to explore, really, the electricity value chain, right? Looking at energy supply, looking at the presence of solar PV arrays. And not just where they are, but potentially how much electricity they can generate, right? And where is that growth happening? Similarly, electricity demand. Building energy consumption. Can we get a better sense of how much electricity is being consumed by buildings through remote sensing technologies? And of course, the connective tissue in between: transmission and distribution. I won't talk much about this.
Martha will be discussing some of her work on that soon. But this is another area that we've been working on as well. How do you identify the topology of transmission and distribution lines? Then, another thing that we've been working on: applying these types of techniques to diverse environments can be rather challenging, especially if you train your algorithms to automatically do these assessments of energy infrastructure in one location and want to apply them to another. So, another area of work is exploring how synthetic imagery, artificially created scenes, can be used to help us overcome some of those challenges with data limitations and having limited training data that's labeled for these types of activities. So, today specifically, we're gonna focus on these supply and demand components, but we'll briefly touch on some of the directions with synthetic imagery at the end. First of all, distributed energy supply, solar PV identification.
You know, the EIA estimates that by 2050, renewables will represent 38% of US electricity generation, half of which will be provided by solar PV. Additionally, information on solar PV is quite important for system integration planning to ensure reliability and economic efficiency, although we don't have a great deal of information on solar PV at anything below the state or national level.
And so, through this work, we wanna better estimate solar array locations, power capacity, and energy generated using remote sensing technologies, and do this automatically in a way that can be scaled up for frequent analysis to provide these data. So, what's the process that we're looking at here? Well, we take some input satellite imagery, and we apply convolutional neural networks to it that have been trained on some data, examples that provide a sense of what solar PV arrays look like in imagery. We then take the output of that, and that provides us with a polygon that is kind of the outline of that solar array. And we can use that polygon to say, "Oh, here's where we have solar PV."
We also can use that polygon to say, "Oh, look, we can measure the size of that solar PV array in square meters." And then we map that to, first of all, how much power it can generate, because power capacity is proportional to size. And then we combine that with regional insolation data to estimate how much energy can be generated from a given amount of installed solar PV. This will allow us to create high-resolution estimates of solar energy: the locations of those arrays at a very, very high-resolution level, the capacity that is represented by those, and the energy that could be generated through those, and that historically has been generated. For this, of course, we need to start with a foundation of some underlying data. We worked with a great group of students a few years back through a program called Data+ and another program called Bass Connections that are at Duke.
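To make the chain from polygon to area to power capacity to energy concrete, here is a minimal Python sketch. It is only an illustration: the power density, insolation figure, and performance ratio are placeholder assumptions rather than values from the talk, and the function names are hypothetical.

```python
# Illustrative sketch only. All three constants are assumptions for the
# example, not figures reported in the talk.
ASSUMED_KW_PER_M2 = 0.15         # hypothetical panel power density (kW per m^2)
ASSUMED_PEAK_SUN_HOURS = 5.0     # hypothetical regional insolation (kWh/m^2/day)
ASSUMED_PERFORMANCE_RATIO = 0.8  # hypothetical allowance for system losses


def polygon_area_m2(vertices):
    """Shoelace formula. Vertices are (x, y) pairs in meters, in outline order."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def capacity_kw(area_m2, kw_per_m2=ASSUMED_KW_PER_M2):
    """Power capacity taken as proportional to visible array area."""
    return area_m2 * kw_per_m2


def annual_energy_kwh(cap_kw,
                      peak_sun_hours=ASSUMED_PEAK_SUN_HOURS,
                      performance_ratio=ASSUMED_PERFORMANCE_RATIO):
    """Combine capacity with insolation to get a rough yearly energy estimate."""
    return cap_kw * peak_sun_hours * 365 * performance_ratio


# A hypothetical detected array outline: a 10 m x 4 m rectangle.
array_outline = [(0.0, 0.0), (10.0, 0.0), (10.0, 4.0), (0.0, 4.0)]
area = polygon_area_m2(array_outline)
cap = capacity_kw(area)
energy = annual_energy_kwh(cap)
print(f"area={area:.1f} m^2, capacity={cap:.2f} kW, energy={energy:.0f} kWh/yr")
```

In practice the polygon would come from the network's output in image coordinates, so it would first need converting to meters using the imagery's ground resolution.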
These are programs for students to engage in research while they're going through their studies. And the students produced a hand-labeled set, so, meaning they drew boxes around solar arrays in satellite imagery for over 19,000 solar arrays from across four California cities. They went through hundreds of square kilometers of imagery data to do this exhaustively over those areas and produce the data set available to download at the links shared there.
Now, using these data, we were able to train up the algorithms mentioned before in order to identify solar arrays in satellite and aerial imagery. And so, what does this look like in terms of some sample output? Well, here, what we're showing are, in green, the true positives, meaning solar arrays that we actually correctly found; false negatives, of which you can see one here, which is essentially a solar array that was present, but we missed it. We didn't identify that. And false positives, which were things that we said were solar arrays that actually were not.
You can see that for the vast majority here, we're getting the solar arrays that were present, with a few errors along the way. These examples are fairly representative of some of our results from a case study in San Diego, California that was in collaboration with partners at Lawrence Berkeley National Labs. And what we found is that we were able to achieve a recall of 84%, meaning we found 84% of the solar PV arrays that were present. And of the things that we called solar arrays, 90% actually were solar arrays; that's the precision. We're really pleased with these results, and I think they demonstrate the potential to automatically identify solar PV using these techniques.
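For readers less familiar with these two metrics, here is a small Python sketch of how precision and recall follow from true positive, false positive, and false negative counts. The counts are hypothetical, chosen only so the results land near the figures quoted above; they are not the study's actual tallies.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from detection counts."""
    detected = true_positives + false_positives  # everything we called a solar array
    actual = true_positives + false_negatives    # every solar array really present
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for illustration only.
tp, fp, fn = 84, 9, 16
precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Recall answers "of the arrays that exist, how many did we find?", while precision answers "of the things we flagged, how many were real?".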
Additionally, we've looked at how solar array area, the visible area from above, is connected to capacity. It's not a perfect connection because, of course, there can be tilts and adjustments to the panels, and different types of solar panels that could be in play. But on the whole, it's a pretty good correlate for solar array capacity. And we can combine that with regional insolation data to be able to create maps of location, capacity, or energy generation at the state level or beyond.
We've done some work like this in Connecticut and in San Diego, and we'll be publishing some more on that in the very near future. So, with solar arrays, distributed generation, we can get a handle on that using remote sensing methods with the right training data. We can also learn quite a bit about building energy consumption. And in this type of work and with all of these projects, these are big team efforts and I've listed the collaborators on each of these slides as appropriate. Shout outs to everyone on the team for the great work on this.
Building energy consumption is another area where we can learn a lot from remote sensing data. So, first of all, buildings use about 40% of total US energy, and efficiency improvements could reduce this by 20%. What does that mean? Well, 8% of all energy use and emissions could potentially be avoided with efficiency improvements to these buildings, but high resolution and up-to-date data on building energy consumption aren't generally available. So, what can we do? Well, we can apply these types of techniques to estimate building energy consumption as well. Let's look at how we might do this for residential building energy consumption estimation.
We can start with taking a satellite image, that of a region. We segment those buildings, meaning we identify where the buildings are in each of those images pixel-wise. We can then classify each of those detected buildings as being either commercial buildings or residential buildings. Then, if we're estimating residential building consumption, we take those residential buildings and extract features from those buildings. Maybe it's area. You know, the size of the building is gonna be, again, highly correlated with thermal comfort-related issues, one of the biggest consumers of energy in most residential homes.
Maybe things like perimeter, which could help to explain the complexity of a building, or other related features. We included a number of additional features in our work. Some of them, for those of you familiar with convolutional neural networks, were ResNet-extracted features.
These are automatically extracted features that may have semantic meaning, or may not. Could be anything from the color of the rooftop to the proximity of a swimming pool. They're various features that are machine-generated. And we take all of those features, area, perimeter, whatever else we have, and put them through a random forest classification model, excuse me, regression model, to estimate energy consumption. What does each of those buildings consume with respect to energy? And we've evaluated the performance of each of these steps.
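As an illustration of the hand-crafted part of that feature set, the Python sketch below computes area and perimeter from a building footprint polygon. The function name and the example footprints are hypothetical, and the coordinates are assumed to already be in meters. The resulting feature rows are the kind of input one could pass to a random forest regressor (for example, scikit-learn's `RandomForestRegressor`), which is left out here to keep the example dependency-free.

```python
import math


def footprint_features(vertices):
    """Return area (m^2) and perimeter (m) for a footprint polygon.

    Vertices are (x, y) pairs in meters, listed in order around the outline.
    """
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1          # shoelace term
        perimeter += math.hypot(x2 - x1, y2 - y1)  # edge length
    return {"area_m2": abs(twice_area) / 2.0, "perimeter_m": perimeter}


# Two hypothetical footprints with the same perimeter but different areas:
rectangle = [(0, 0), (10, 0), (10, 8), (0, 8)]
l_shape = [(0, 0), (10, 0), (10, 4), (4, 4), (4, 8), (0, 8)]

rect_features = footprint_features(rectangle)
l_features = footprint_features(l_shape)
print(rect_features)
print(l_features)
```

The two example shapes show why perimeter adds information beyond area: the L-shaped building has less floor area for the same length of exterior wall, which hints at a more complex footprint.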
So, for identifying the buildings, we're finding an intersection over union of 0.76. This measures how much our annotations of the building overlap with the building itself. If it's perfect, it'd be one. If it's completely disjoint, it would be zero. So, we were well pleased with that. Then we classify each of those buildings by type, getting that correct 99% of the time for residential buildings and 74% of the time for commercial buildings. And lastly, we measure our estimation of the energy consumption itself.
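Intersection over union can be computed directly from two pixel masks. The Python sketch below is a minimal version under the assumption that both masks are same-sized grids of 0/1 values; the tiny example masks are made up, and returning 1.0 when both masks are empty is a convention chosen for this example.

```python
def intersection_over_union(predicted_mask, reference_mask):
    """IoU of two same-sized binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, ref_row in zip(predicted_mask, reference_mask):
        for p, r in zip(pred_row, ref_row):
            if p and r:
                intersection += 1  # pixel marked as building in both masks
            if p or r:
                union += 1         # pixel marked as building in either mask
    # Convention assumed here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Hypothetical 3x4 masks: the prediction misses the right-hand column.
predicted = [[0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
reference = [[0, 1, 1, 1],
             [0, 1, 1, 1],
             [0, 0, 0, 0]]

iou = intersection_over_union(predicted, reference)
print(f"IoU = {iou:.2f}")
```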
So, how well can we estimate building energy consumption beyond the regional average? That was our baseline that we compared against. You know, you can get a sense of, in an area, how much the typical building consumes. But how do we translate that into specific estimates for these buildings? Over that baseline average, we were able to improve individual building energy consumption estimation by 7% to 15%. And with small amounts of aggregation on the one square kilometer basis, we were able to improve that estimate by 28% to 42%.
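One way to express an improvement over the regional-average baseline is as the relative reduction in error. The talk does not say which error metric was used, so the Python sketch below assumes mean absolute error purely for illustration; the consumption values and model predictions are invented.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(actual, model_predictions):
    """Fractional error reduction versus predicting the regional average."""
    regional_average = sum(actual) / len(actual)
    baseline_predictions = [regional_average] * len(actual)
    baseline_error = mean_absolute_error(actual, baseline_predictions)
    model_error = mean_absolute_error(actual, model_predictions)
    return (baseline_error - model_error) / baseline_error


# Hypothetical annual consumption per building (MWh) and model estimates.
actual_mwh = [8, 12, 10, 14]
model_mwh = [10, 12, 11, 12]

improvement = relative_improvement(actual_mwh, model_mwh)
print(f"improvement over baseline: {improvement:.1%}")
```

The same calculation could be repeated after summing buildings within each one square kilometer cell, which is the kind of aggregation described above.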
So, this is an area that we're continuing to explore, and it definitely has potential going forward. One last thing I'll mention: with all of these techniques, solar PV, building energy consumption, transferring what we've learned from one location to another can be challenging because the data look different, as you can see here. And getting labeled training data from new locations, which is often required for good performance, can be expensive. So, we've been developing techniques to overcome this issue by using synthetic satellite imagery for training some deep learning models.
And so, we're generating these synthetic data to enhance our training data corpus in order to be able to provide a boost there to performance in these techniques. I recently released a paper on this, and Jordan Malof was leading the way on some of that charge. So, I encourage you to read that if you're interested.
Just to wrap up, what I hope I've demonstrated with this is that we can use remote sensing data and machine learning coupled together to assess a number of different energy system characteristics, all the way from generation to end use consumption. And these data can be gathered without necessarily the need for manual surveys, with the assumption that we have access to the remote sensing data, right? And we're gonna be working to overcome some of the challenges of applying this to diverse geographies in order to facilitate being able to scale this up to ever-wider areas of application. So, I know that was a whirlwind tour, but with that, I'll thank you and open it up to questions.
- Thank you, Kyle. That was great, and we encourage questions in the Q&A tab.
We've got a few minutes for questions. I'm actually really curious, Kyle, what the next step of this is. You talked about demand and looking at building demand. Pools were actually referenced in the chat. Could you identify specific characteristics of a building, whether there's a pool or maybe a commercial building with a large suite of EV charging stations? Although those are smaller assets than a building, could that help inform estimates of a building's potential energy consumption?
- Yes, so that's a great point.
With the building energy consumption piece, I think one of the natural next steps is to extend some of that work from the residential sector to the commercial sector. You know, the commercial sector introduces a whole bunch of additional challenges in that the buildings and what's contained within them could be much more heterogeneous, right? And the other challenge is that there's a lot less publicly available training data to learn from for those buildings. So, I think that's one natural connection. I think there certainly is potential in the space around improving residential energy consumption estimates, using additional features, using other types of information, maybe demographic information, that could be useful for estimating energy consumption in neighborhoods. If you know an area was built a couple of years ago, 'cause the materials are newer, then those homes are probably better insulated, so they might be a little bit more energy efficient.
I think there definitely are areas to move forward with that: commercial buildings and additional refinements in the individual building energy estimation process.
- Oh, that's good to know. And one question that's getting upvoted in the Q&A is, folks are curious, going back to supply, are there other supply technologies besides solar that can be identified with this technique?
- So, I think there are a number of other aspects of the energy system that weren't mentioned that could be explored through this. One area that I think is interesting, and others today may talk about it as well, is around generators themselves, large power plants.
I mean, for example, in the US, through the Energy Information Administration and other organizations, we have a pretty robust set of information on where generators are. That's not true throughout the world, so there's still an opportunity there to identify additional generators for places that we don't have information about. But there is also the potential for assessing what type of equipment is present in different power plants. What are the emissions remediation technologies that may be there, and what other characteristics of the system might we be able to learn about through that type of technology? You know, I think that there's also a strong case on the transmission and distribution side for evaluating the location of those components in the system, especially from the energy access perspective. So, I think that's another really exciting application of these techniques in the space.
- That's great. I think we have time for one more quick question. Do you think that this technique works for less industrialized regions of the world and geographies that you haven't actually specifically looked at with some of the data that you were mentioning today?
- Absolutely, absolutely.
And I think for certain applications, that's where the benefit is the greatest. You know, if we're talking about questions around energy access and communities transitioning to electricity access for the first time, getting information on the existing transmission infrastructure and its proximity to communities that are looking to electrify, that's extremely useful information for planning purposes. You know, in the US, we think about the existing transmission infrastructure that we may have, and we might be able to find those data; you may just have to pay for them. And that becomes one of the challenges around a lot of this, too: the availability of the data. Sometimes that just takes purchasing it from a data provider.
But sometimes there are security issues and sensitivities around that. Understandably, we wanna make sure these systems remain reliable and secure. So, I think that there's actually a lot of room in this space for not just interesting applications, but thinking through ways of being able to provide data on these systems, or close proxies for them, that would allow us to answer interesting research questions and interesting public policy questions without sacrificing resolution or security.