The Potential of Simulation Technologies: Multi-Agent Simulation and Reinforcement Learning
SATOKI FUJITA: Hello, everyone. Thank you all very much for coming today. I am Satoki Fujita from Shionogi, a pharmaceutical company. It's been about two years since I joined the workforce.
Currently, as a data scientist, I wrestle every day with a variety of data, including real-world data, to contribute to data-driven decision-making. Today, I would like to give a presentation titled "The Potential of Simulation Technologies: Multi-Agent Simulation & Reinforcement Learning." First is the Introduction. My motivation for this presentation is the hope that it will be an opportunity to reaffirm the usefulness of simulation technology. In this era of rapid change, simulations that can reproduce various things in the virtual space of a computer are very important. Today, I'd like to introduce Multi-Agent Simulation, which can flexibly reproduce various situations, and Reinforcement Learning, which can be used to search for optimal intervention strategies, and then show their power through several examples of infectious disease spread.
First, let me explain the background. In our society, on scales large and small, the surrounding conditions and environment are changing every moment. Examples include pandemics, traffic jams, earthquakes, and launches of new products, and we are required to grasp the phenomena they cause and make decisions in a timely manner. In such a case, what should we do? The answer is the "simulation" you all know. By formulating a phenomenon in the real world, we can translate it into a model in the "Theoretical" world, where it can be simulated and reproduced in virtual space, and its behavior can be confirmed.
Then, with the results obtained there as prior knowledge, it's possible to respond appropriately to the actual phenomenon. However, it's often difficult to model the entire phenomenon as it is. Take the spread of infectious diseases, for example. The people who carry the infection differ in behavior, thinking, and characteristics, and their susceptibility to infection varies accordingly.
It's difficult to predict how these people will interact with each other and how the infection will progress as a result. However, even if the whole phenomenon is difficult to model, the behavior of the people who are its components is relatively easy to model. Multi-Agent Simulation can be used in such cases.
Multi-Agent Simulation, as the name implies, is agent-based simulation. By modeling the micro components rather than the entire macro phenomenon, it reproduces the overall phenomenon that results from their interaction. Taking a pandemic as an example, by modeling each person as an agent that acts autonomously, we can see how the infection spreads with each contact. I will show you a concrete example later. Furthermore, there may be times when you want to find the optimal intervention based on the phenomenon reproduced in this way.
Reinforcement Learning is a technique that can be used in such cases. It is a machine-learning method that learns the optimal action strategy through trial and error. The Agent takes an Action, and then the Environment returns a Reward and the next State.
And the system learns the actions that earn more of that Reward. Together, Multi-Agent Simulation and Reinforcement Learning enable a faithful reproduction of real-world phenomena and the derivation of optimal decisions about them. An example is the optimization of measures by local governments for pandemic control.
In the framework of Reinforcement Learning, the Environment is the world simulated by Multi-Agent Simulation, and the local government is the Agent. Then, using Reinforcement Learning, we search for the optimal Action, which here means the regulation to impose, so as to control the spread of infection while affecting the economy as little as possible. The introduction has run long, but as a first step toward using simulation technologies in situations where it's difficult to see ahead, such as a pandemic, we will cover the following topics.
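As a rough illustration of the Agent and Environment loop described above, here is a minimal sketch of tabular Q-learning in Python. The toy chain environment, the function names `env_step` and `train`, and all hyperparameter values are assumptions made for this sketch; they are not the setting used in this presentation, where the Environment is the Multi-Agent Simulation.

```python
import random

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def env_step(state, action):
    """Toy Environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error (tabular Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Explore with probability epsilon, or when the values are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env_step(state, action)
            # Move the estimate toward reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

The Agent only ever sees the State, its own Action, and the Reward that comes back, which is the same interface the local government has with the simulated community later in the talk.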
This time, we'll show our results in the infectious disease area. So we'll first introduce the SEIR model, which is a standard infectious disease model. Following that, application examples of Multi-Agent Simulation and Reinforcement Learning are shown. Here is a brief explanation of the SEIR model. In the SEIR model, the population is divided into four states: first, the S (Susceptible) state, those who don't have immunity; the E (Exposed) state, those who are infected and in the incubation period; the I (Infectious) state, those who have developed the disease; and last, the R (Recovered) state, those who have recovered. The transitions between states are expressed by ordinary differential equations.
By setting the parameter values (beta, lambda, and gamma) that govern the speed of transition between states, the SEIR model can express how the infection spreads over time. The figure below shows an example of the spread of infection, using the SEIR model.
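For readers who want to reproduce curves of this kind, here is a minimal sketch of an SEIR model integrated with a simple Euler scheme. It assumes the textbook form of the equations, in which beta is the transmission rate from contact with I state people, lambda (written `lam`, since `lambda` is a Python keyword) is the rate of moving from E to I, and gamma is the recovery rate. The exact equations and parameter values behind the figure are not given in the talk, so everything numeric below is illustrative only.

```python
def simulate_seir(beta, lam, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Euler integration of the assumed textbook SEIR equations:
         dS/dt = -beta * S * I / N
         dE/dt =  beta * S * I / N - lam * E
         dI/dt =  lam * E - gamma * I
         dR/dt =  gamma * I
    Returns a list of (time, S, E, I, R) tuples.
    """
    n = s0 + e0 + i0 + r0
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    history = [(0.0, s, e, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_exposed = beta * s * i / n * dt   # S -> E
        new_infectious = lam * e * dt         # E -> I
        new_recovered = gamma * i * dt        # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((k * dt, s, e, i, r))
    return history


# Illustrative values only (not taken from the presentation).
history = simulate_seir(beta=0.5, lam=0.2, gamma=0.1,
                        s0=1199, e0=0, i0=1, r0=0, days=200)
t, s, e, i, r = history[-1]
print(f"day {t:.0f}: S={s:.0f} E={e:.0f} I={i:.0f} R={r:.0f}")
```

Plotting the S, E, I, and R columns of `history` against time gives a chart with time on the x-axis and the number of people in each state on the y-axis.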
The x-axis is time, and the y-axis is the number of people in each state. In the figure, as the number of infected people increases, the Susceptible population decreases. And at the end, most people have been infected and have recovered. However, in the real world, people with various characteristics move according to their own behavioral patterns, and it's doubtful that a single ordinary differential equation model can express the result of their interaction.
One of the technologies that addresses this problem is the Multi-Agent Simulation described next. Now I'll explain a practical example of Multi-Agent Simulation. Using the four states introduced in the explanation of the SEIR model, the spread of infection can be expressed in Multi-Agent Simulation as follows. First, place each person on a grid-like field of any size and let them act step by step.
When an S state person comes into contact with an E state person, the S state person is infected with a certain probability and becomes E state. Then, after a certain period of time, the person develops the disease and becomes I state. And after another period of time, the person recovers and becomes R state.
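The rules just described can be sketched as a small grid simulation. This is a minimal sketch, not the implementation used for the video: the random-walk movement, the definition of contact as sharing a grid cell, the class and function names, and all numeric values (`p_infect`, `incubation_steps`, `illness_steps`, grid size, population) are assumptions made for illustration.

```python
import random

S, E, I, R = "S", "E", "I", "R"


class Person:
    def __init__(self, x, y, state=S):
        self.x, self.y = x, y
        self.state = state
        self.timer = 0  # steps spent so far in the current E or I state


def step(people, size, rng, p_infect=0.05, incubation_steps=5, illness_steps=10):
    """Advance every agent by one step (all parameter values are hypothetical)."""
    # 1) Each person moves at most one cell in each direction, staying on the grid.
    for p in people:
        p.x = min(size - 1, max(0, p.x + rng.choice((-1, 0, 1))))
        p.y = min(size - 1, max(0, p.y + rng.choice((-1, 0, 1))))
    # 2) Infection comes only from E state people sharing the same cell.
    exposed_cells = {(p.x, p.y) for p in people if p.state == E}
    # 3) State transitions: S -> E (with probability), E -> I, I -> R (after a delay).
    for p in people:
        if p.state == S:
            if (p.x, p.y) in exposed_cells and rng.random() < p_infect:
                p.state, p.timer = E, 0
        elif p.state == E:
            p.timer += 1
            if p.timer >= incubation_steps:
                p.state, p.timer = I, 0
        elif p.state == I:
            p.timer += 1
            if p.timer >= illness_steps:
                p.state = R


def count_states(people):
    counts = {S: 0, E: 0, I: 0, R: 0}
    for p in people:
        counts[p.state] += 1
    return counts


rng = random.Random(0)
size = 10
people = [Person(rng.randrange(size), rng.randrange(size)) for _ in range(100)]
people[0].state = E  # seed the outbreak with one E state person
history = [count_states(people)]
for _ in range(200):
    step(people, size, rng)
    history.append(count_states(people))
print(history[-1])
```

One simplification to note: a susceptible person in a cell faces the same infection probability whether one or several E state people are present. A per-contact probability would be an equally reasonable choice.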
By repeating the steps in such a setting, you can see how the infection spreads. One point to be aware of: this time it's assumed that I state people take appropriate measures so as not to infect other people, so infection is set to occur only from E state people. This video shows the spread of an infectious disease in the Multi-Agent Simulation.
It may be hard to see, but the colored squares that represent each agent move every step. You can see that the infection has gradually spread, and the number of people who have been infected once and recovered, which is shown in red, is increasing. Based on this, I would like to reproduce a more realistic example by Multi-Agent Simulation.
Imagine a small community consisting of a Home area, Society area, School area, and Office area. By reproducing the movements of people in that community, let's see how the infection spreads in it. Please note in advance that we are not assuming any particular real situation in this example. Assuming a grid-like field as before, we set the Home area, Society area, School area, and Office area in it. In addition, we prepare a Hospital area, so that I state people will move there. One step is assumed to be one hour, and people act in each step.
In addition, it's assumed that infection is possible between people in the same square. One square in the Home area represents one house, which holds a family of one to five people. Each person is either a Worker, Student, or Housemaker, and travels between home and other areas depending on the time of day. On weekdays, Workers go to the Office area, and Students go to the School area.
Workers living alone may make a detour to the Society area for several hours after leaving work. Housemakers may be in the Society area for up to two hours. On holidays, everyone may spend hours in the Society area. The total number of people is about 1,200.
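The daily movement rules above can be sketched as a function that returns where a person is during each hour of the day. The roles, the areas, and the "up to two hours" limit for Housemakers come from the talk. The specific clock hours, the length of the Worker detour, the holiday outing length, and the function name `daily_plan` are hypothetical values chosen only to make the sketch concrete.

```python
import random

HOME, OFFICE, SCHOOL, SOCIETY = "Home", "Office", "School", "Society"


def daily_plan(role, is_holiday, lives_alone, rng):
    """Return a list of 24 area names, one per hourly step (hours are assumed)."""
    plan = [HOME] * 24
    if is_holiday:
        # On holidays anyone may spend some hours in the Society area.
        outing = rng.randint(0, 4)          # assumed range
        for hour in range(13, 13 + outing):
            plan[hour] = SOCIETY
        return plan
    if role == "Worker":
        for hour in range(9, 18):           # assumed office hours
            plan[hour] = OFFICE
        if lives_alone:
            detour = rng.randint(0, 3)      # "several hours", assumed range
            for hour in range(18, 18 + detour):
                plan[hour] = SOCIETY
    elif role == "Student":
        for hour in range(9, 16):           # assumed school hours
            plan[hour] = SCHOOL
    elif role == "Housemaker":
        outing = rng.randint(0, 2)          # up to two hours, as in the talk
        for hour in range(14, 14 + outing):
            plan[hour] = SOCIETY
    return plan


rng = random.Random(1)
print(daily_plan("Worker", is_holiday=False, lives_alone=True, rng=rng))
```

Drawing the outing length from `rng.randint` gives a uniform distribution over whole hours, while office and school hours are identical for everyone, which keeps the model simple.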
I tried running the Multi-Agent Simulation under these settings. Let's see the behavior by Multi-Agent Simulation. You can see how each person goes to Office, School, and Society from their home and spends a few hours there and then returns home.
During their stay in the School area, Office area, and Society area, each person moves around within that area. On weekdays, Housemakers spend most of their time at home, while Workers and Students go to Office and School, respectively, and come into contact with others there. Prioritizing simplicity, the time spent in the School area and Office area is the same for everyone, and the time spent in the Society area is uniformly distributed.
As the number of infected people increases, the beds in the Hospital area fill up. Looking at the graph on the far right, which shows the cumulative number of infected people in each area, we can see that many infections occur, mainly within the family. In the Society area, it seems that infection progresses all at once, mainly when people go out on holidays. The final transition graph looks like this. Next, let's see what happens if we add some restrictions to this community.
We consider facial coverings, self-isolation, teleworking, outing restrictions, and school closure as regulations, and add them one by one as the regulation level goes up. So regulation level 0 is the normal state where nothing is regulated, and regulation level 5 is the most regulated state. From now on, let's increase the regulation level one by one and see what happens to the infection situation. First, assuming that facial coverings are worn, the result is as shown in the figure on the right. Lowering the infection rate at the time of contact from 5% to 3% by wearing facial coverings reduced the number of infected people by about 200.
So it can be said that it's important to have a little awareness of infection prevention, even in close contact. At regulation level two, which adds to level one a 14-day stay at home after contact with an E state person is found, the infection spread is as shown in the figure on the right. As you can see, a large number of people are still in the S state at the end of the infection spread, so the effect of the regulation is great. In addition, regulation level three imposes telework on about half of the Workers. The number of people who have experienced the infection decreased by about 200, reaffirming the importance of shutting off the main source of infection.
Furthermore, at level four, we restrict going out to the Society area, but there was no big change. With this setting, the impact may have been small because there were not many infections in the Society area. At level five, which is the highest level of regulation, school closures are added. As a result, infection in the School area disappears completely, and the infection ends without an epidemic. From the viewpoint of preventing the spread of infection, it's ideal to continue regulation level five, but from the viewpoint of economic effect, it's the worst. In reality, you want to find the optimal regulation strategy that reduces the negative impact on the economy while still controlling the spread of the infection to some extent.
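The cumulative regulation levels described above can be summarized in a small lookup. The measures and the numbers (infection rate 5% to 3%, 14 days at home, about half of Workers teleworking) are taken from the talk. The function names and the dictionary keys are hypothetical, and how each setting plugs into the simulation is left out of this sketch.

```python
# Measures in the order they are added as the regulation level rises (1 to 5).
MEASURES = [
    "facial coverings",
    "self-isolation",
    "teleworking",
    "outing restrictions",
    "school closure",
]


def active_measures(level):
    """Level 0 means no regulation; level k enables the first k measures."""
    if not 0 <= level <= len(MEASURES):
        raise ValueError(f"regulation level must be 0..{len(MEASURES)}")
    return MEASURES[:level]


def model_settings(level):
    """Translate a regulation level into simulation settings (keys are assumed)."""
    active = active_measures(level)
    return {
        "infection_rate": 0.03 if "facial coverings" in active else 0.05,
        "isolation_days": 14 if "self-isolation" in active else 0,
        "telework_fraction": 0.5 if "teleworking" in active else 0.0,
        "society_open": "outing restrictions" not in active,
        "school_open": "school closure" not in active,
    }


for level in range(6):
    print(level, model_settings(level))
```

Because each level is a superset of the one below it, a single integer is enough to describe the whole regulation state, which is convenient when the level later becomes the Action in Reinforcement Learning.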
Reinforcement Learning can be used in such cases, which will be described in the next section. Now, let's see how Reinforcement Learning is used for decision-making. The Environment in the framework of Reinforcement Learning is the small community constructed earlier by Multi-Agent Simulation, and the Agent that takes action is the local government that imposes regulations.
Furthermore, it's necessary to set the Reward, State, and Action that are exchanged between them. The Action is, of course, one of the six regulation levels mentioned earlier (zero, one, two, three, four, and five). The local government chooses which one to apply on a weekly basis. We want to select the best action to suppress the spread of infection while affecting the economy as little as possible, so the Reward is set in this way to reflect that. When the number of I state people exceeds the number of beds, the first term becomes smaller, and when the regulations are tightened, the second term becomes smaller.
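The Reward formula itself was shown on the slide and is not part of this transcript. The following is only a hypothetical form that has the two properties described above: a first term that shrinks when I state people outnumber the beds, and a second term that shrinks as the regulation level rises. The function name, the linear shape of each term, and the weights are all assumptions.

```python
def reward(num_infectious, num_beds, regulation_level,
           bed_weight=1.0, regulation_weight=1.0):
    """Hypothetical weekly Reward; NOT the formula used in the presentation."""
    # First term: penalize only the I state people who exceed hospital capacity.
    overflow = max(0, num_infectious - num_beds)
    bed_term = -bed_weight * overflow
    # Second term: stricter regulation stands in for a larger economic cost.
    regulation_term = -regulation_weight * regulation_level
    return bed_term + regulation_term


print(reward(num_infectious=30, num_beds=20, regulation_level=2))
```

The ratio between `bed_weight` and `regulation_weight` decides how much economic cost the Agent will accept to keep the number of I state people under the number of beds, so in practice it would need to be tuned.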
The observations are the numbers of S, E, and I state people. We compared the strategy learned by Reinforcement Learning with two simple strategies, A and B, that are easy to think of. Strategy A raises the regulation level by one if the number of Infectious state people has increased compared to a week ago, and lowers it by one otherwise. Strategy B sets the regulation level to three if the number of Infectious state people has increased compared to a week ago, and lowers it by one otherwise. Looking at the figure, it can be seen that the number of Infectious state people is suppressed under the strategy learned by Reinforcement Learning and under strategy B, but exceeds the number of beds under strategy A.
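The two baseline strategies are simple enough to write down directly. This is a minimal sketch that follows the rules as stated; clamping the level to the range 0 to 5 and the function names are assumptions, since the talk does not say what happens at the boundaries.

```python
MIN_LEVEL, MAX_LEVEL = 0, 5


def strategy_a(level, infectious_now, infectious_week_ago):
    """Raise the level by one if infections grew over the week, else lower by one."""
    if infectious_now > infectious_week_ago:
        return min(MAX_LEVEL, level + 1)
    return max(MIN_LEVEL, level - 1)


def strategy_b(level, infectious_now, infectious_week_ago):
    """Jump to level three if infections grew over the week, else lower by one."""
    if infectious_now > infectious_week_ago:
        return 3
    return max(MIN_LEVEL, level - 1)


# Example: weekly counts of Infectious state people (made-up numbers).
weekly_infectious = [2, 5, 12, 20, 18, 10, 4]
level_a = level_b = 0
for previous, current in zip(weekly_infectious, weekly_infectious[1:]):
    level_a = strategy_a(level_a, current, previous)
    level_b = strategy_b(level_b, current, previous)
    print(current, level_a, level_b)
```

Strategy A reacts gradually and needs several consecutive weeks of growth to reach strict regulation, while strategy B jumps straight to level three at the first increase.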
The economic impact was also calculated as "Cost", using the second term of the Reward. The strategy learned by Reinforcement Learning has the least impact. We can say that the other two, simpler strategies may be better in terms of early convergence of the pandemic.
But the strategy learned by Reinforcement Learning is tuned so that it doesn't exceed the maximum number of beds, while limiting its economic impact. This time, we applied Reinforcement Learning with a very simple setting, but since research on deep reinforcement learning, multi-agent reinforcement learning, and so on is also active, I think it will be possible to handle even more complicated situations. SAS Viya also provides an action set for Reinforcement Learning, so please try it. In summary, by using Multi-Agent Simulation, which can flexibly compose phenomena from micro elements, it becomes possible to reproduce and experiment with various things on a computer.
Furthermore, by using Reinforcement Learning, it's possible to derive an optimal intervention strategy. In this rapidly changing era, we'd like to utilize these technologies to look ahead. This time, we introduced an application that assumes a simple virtual situation. But in the future, we'd like to consider applying it to actual specific problems. We'd be glad if you felt the potential of technologies such as Multi-Agent Simulation and Reinforcement Learning, and if this presentation became a starting point for utilizing these simulation technologies in various fields, not limited to infectious diseases.
That's all. If you have any comments, please contact us. Thank you for your attention.