# AI Disruption of Quantitative Finance: From Forecasting to Generative Models to Optimization


Or any other function that describes the investment goal. Objective functions are, in general, some function of the projected distribution of the asset returns at the end of the investment horizon. This means that the portfolio optimization problem is naturally a stochastic optimization problem. One can solve it in two ways. You can set up a static optimization, where you project the estimate of the next-period returns all the way forward to the end of the investment horizon, come up with the distribution of your returns at the end of the horizon, and solve the optimization just once for that. Or you can estimate the next-period returns dynamically and solve the optimization problem sequentially, which sets up a dynamic optimization paradigm.

It was Markowitz who pioneered the attempts to solve the stochastic optimization problem described on the last slide. His framework consists of two steps. In the first step, no matter which objective function you have, you solve the mean-variance optimization problem: a constrained quadratic optimization that minimizes the portfolio variance while constraining the expected portfolio return at a target level. The solutions for different values of the target mean define a curve in the mean-variance plane called the efficient frontier. Once you have the efficient frontier, you can solve a one-dimensional optimization problem: optimizing the custom objective function along that curve in the mean-variance plane.

So one might think of using this method to periodically solve the static optimization problem, to make it a dynamic setup. Well, it seems a viable option until you try it in the real world, because in the real world you have transaction costs.
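The two-step procedure just described can be sketched in a few lines of Python. The expected returns and covariance matrix below are illustrative assumptions, the closed form allows short positions, and transaction costs are ignored:

```python
import numpy as np

def mean_variance_weights(mu, cov, target):
    """Step 1: minimize w' cov w subject to w' mu = target and sum(w) = 1
    (closed form via Lagrange multipliers; shorting allowed)."""
    inv = np.linalg.inv(cov)
    ones = np.ones(len(mu))
    A = ones @ inv @ ones
    B = ones @ inv @ mu
    C = mu @ inv @ mu
    lam, gam = np.linalg.solve(np.array([[C, B], [B, A]]),
                               np.array([target, 1.0]))
    return lam * (inv @ mu) + gam * (inv @ ones)

# Illustrative inputs (assumed numbers, not estimates from real data).
mu = np.array([0.05, 0.08, 0.12])
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])

# Trace the efficient frontier for a range of target means, then
# step 2: a one-dimensional search of a custom objective along it
# (here a Sharpe-like ratio, as an example objective).
targets = np.linspace(0.05, 0.12, 50)
frontier = [(m, mean_variance_weights(mu, cov, m)) for m in targets]
best_m, best_w = max(frontier,
                     key=lambda p: p[0] / np.sqrt(p[1] @ cov @ p[1]))
```

Any other custom objective that depends only on the mean and variance could be plugged into the `key` function of the outer one-dimensional search.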
The problem is that the solutions to the optimization for two consecutive periods might be so far apart that rebalancing the portfolio, with the associated transaction costs, leads to a sub-optimal policy. In fact, myopically optimal actions can cause sub-optimal cumulative rewards at the end of the period. Now that we have talked a little about the portfolio optimization problem, how it is formulated in terms of stochastic optimization, and the attempts that were made to solve those problems, we can discuss how to formulate portfolio optimization as a Markov decision process and apply some of the methods of reinforcement learning

to solve the portfolio optimization problem. So, first of all, how is an MDP defined? Let's assume a setup where, at each time step, an agent starts from an initial state and takes an action, which is some kind of interaction with the environment; the environment then gives a reward to the agent and changes the state. If the state transition probability, which is determined by the environment, is a function only of the current state and not of the whole history up to this point in time, the dynamical system is called a Markov decision process.

How does this look for a trading agent? At the beginning of each period, the agent has to rebalance the portfolio and come up with a vector of asset holdings. This defines the action: the action of a trading bot is directly the vector of portfolio weights it comes up with at the end of each period. What about the reward, the reward function of the environment? In general, identifying the reward is a little more challenging. A reward is a scalar value which fully specifies the goals of the agent, such that maximizing the expected cumulative reward over many steps leads to the optimal solution of the task. Let's look at some examples. Take games: the goal of the agent is very well defined, either you win or you lose, and it can be cleanly divided into separate reward signals for each time step. If you win the game at the end of a time step you get a reward of one, if you lose you get a reward of minus one, and you get a reward of zero otherwise. So the goal is very well defined and very well divisible into separate time steps. However, take a trading agent who wants to maximize the return but at the same time does not want to expose the fund
to extreme market downturns and crashes. The agent does this, for example, by managing the

value-at-risk of the portfolio. So the objective of the agent is clearly defined, but dividing this objective into sequential reward signals can be a very challenging task.

Now let's talk about the state and the observation. At any step we can only observe asset prices, so the observation is given by the prices of all assets; this much is clear. We also know that one-period prices do not fully capture the state of the market: you cannot infer the whole state of the market by just looking at yesterday's prices, for example. This makes financial markets a little more challenging. In general, financial markets are not a fully observable Markov decision process but only a partially observable one, because as agents we can only observe the prices. What this means is that the state the agent has is different from the state of the environment, and there are a few ways to build the environment state from what the agent observes. The most obvious solution is to build the state of the environment from the whole history of observations, which is not scalable. Alternatively, we can approximate the environment state by some parameterized function of past observations. When working with time series, as we are in financial markets, it is natural to assume that the state-generating function is a function not only of the observations but also of the past states; in other words, we think of models which have some kind of memory. Let's look at some examples. GARCH models are widely used in quantitative finance, and they are constructed in exactly this way.
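To make the trading MDP concrete, here is a minimal sketch in Python. The interface, the lognormal price simulator, and all numbers are illustrative assumptions, not a real trading API: the agent observes only prices, acts with a vector of portfolio weights, and receives the portfolio log return as reward.

```python
import numpy as np

class TradingEnv:
    """Minimal sketch of a partially observable trading MDP.

    Observation: current asset prices only (the rest of the market
    state stays hidden). Action: portfolio weights summing to one.
    Reward: log return of the rebalanced portfolio. Prices follow an
    illustrative i.i.d. lognormal model.
    """

    def __init__(self, n_assets=3, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_assets = n_assets
        self.prices = np.ones(n_assets)

    def reset(self):
        self.prices = np.ones(self.n_assets)
        return self.prices.copy()          # observation = prices only

    def step(self, weights):
        assert np.isclose(weights.sum(), 1.0)
        log_r = self.rng.normal(0.0005, 0.01, self.n_assets)
        new_prices = self.prices * np.exp(log_r)
        # Reward: log of the gross return of the rebalanced portfolio.
        gross = weights @ (new_prices / self.prices)
        reward = np.log(gross)
        self.prices = new_prices
        return self.prices.copy(), reward

env = TradingEnv()
obs = env.reset()
obs, r = env.step(np.array([0.4, 0.4, 0.2]))
```

Note that the observation exposes only prices; any richer state with memory, such as volatilities, has to be constructed by the agent from past observations.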
Assume that the state of the market at each time can be fully represented by the volatilities of the individual assets; this assumption says that if you know the volatilities, you know the full state of the market. A GARCH model then builds a rather simple mapping from past volatilities and current observations, which are the prices, to the volatilities for the current time step, and in this way it can fully build the state of the market from the past observations and past states. Other models do the same. In the continuous domain, stochastic volatility models also build the volatilities, which are hidden states of the market, by fitting a stochastic process to them; in this way they are able to generate the hidden states and a full representation of the market. But obviously one can use a more sophisticated featurization of the hidden state of the market. It does not have to be as simple as volatilities: one can have a complicated representation, and neural networks, for example, can build those kinds of complicated models of the market state. The common thing among all these models is that the state of the environment is built using past observations and past states, and the agent's current observation alone is not enough to come up with the whole state of the financial market, or with the returns for the next period.

Okay, now that we have talked a little about the MDP formulation of portfolio optimization, I want to go through some of the main components of reinforcement learning in this part, to put us in the position to come up with the algorithms that we will eventually implement.
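The GARCH construction just described can be sketched as follows; this is a plain GARCH(1,1) recursion with illustrative fixed parameters rather than fitted ones:

```python
import numpy as np

def garch11_volatility(returns, omega=1e-6, alpha=0.1, beta=0.85):
    """GARCH(1,1): build the hidden state (the variance) from the past
    observation (the squared return) and the past state:
        sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]
    Parameter values are assumptions for illustration only.
    """
    sigma2 = np.empty(len(returns))
    sigma2[0] = np.var(returns)            # initial hidden state
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 \
                    + beta * sigma2[t - 1]
    return np.sqrt(sigma2)

rng = np.random.default_rng(1)
r = rng.normal(0, 0.01, 500)               # illustrative daily returns
vol = garch11_volatility(r)
```

Each `sigma2[t]` depends on the previous observation and the previous state, which is exactly the kind of memory described above.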

Policies. A policy is simply a mapping from a state the agent experiences to an action it takes. It can be a deterministic policy, meaning that if the agent finds itself in a certain state it will always take a certain action, or a probabilistic policy, meaning that it chooses an action from the spectrum of all possible actions with some predefined probability.

Next, the concept of a value function. A value function is defined as the expected amount of reward one can get from an MDP, starting from a given state and following a certain policy. For example, if we define the reward of a trading bot to be just the log return of the portfolio at the end of each time step, the value function is the expected cumulative return at the end of the investment horizon.

And models. A model is the agent's representation of the environment, and it defines the transition probabilities of the environment's states. For example, if you assume that the next-step returns of the financial time series follow a Gaussian distribution, the model of the environment is fully defined via the transition probability of a Gaussian distribution.

So now that we have all the ingredients in place, let's talk about model-based reinforcement learning for portfolio optimization: how the setup looks and how we can build algorithms on top of it. We start from our familiar MDP setup, where an agent interacts with the environment and gets rewards based on the actions it takes. But now the idea is that the agent first tries to learn a model of the environment from the transitions it has been experiencing. It is not going
to optimize the policy directly from experience; it first tries to learn a model from the transitions it has been experiencing, and then, based on that model, it solves some kind of optimization. So, at each time step, the agent first predicts the next state, since it has a model of the environment, and the reward it will be getting, based on the action it took. Then it observes the real transition and the real reward from the environment, and it can incrementally update its model, because it has a model and a loss function to train the model on.

What are the advantages of that kind of paradigm? There are some, especially in financial portfolio optimization. The most important one is that there have been many studies of the behavior of financial markets and the properties of financial time series data, and it is very easy to implement those findings directly in a model-based reinforcement learning paradigm. You can put all those findings explicitly into a model and end up with a model that best describes financial market transitions. Things like volatility clustering, heavy tails of the returns, tail dependence among different assets, the existence of jumps, and non-stationarity can be directly modeled and learned from the data. But then, obviously, there are some disadvantages. Because you have an explicit model that you first have to learn, certain errors and approximations come in: if your model is not an accurate representation of the environment,
the optimal policies that you learn based on that model won't be optimal at all, because your model does not describe the market as well as it could or should.

So let's formulate everything we have been discussing about model-based reinforcement learning. What should we do? Well, in general, to use reinforcement learning, or model-based reinforcement learning, we need to gather some experience by interacting with the environment and figure out the model from the experience we have been gathering. But in finance it is a little easier, because of the interactions that we make with the environment.
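As a toy illustration of the predict, observe, update loop described above, here is a minimal sketch. The environment, the Gaussian returns model, the myopic one-asset planner, and all numbers are illustrative assumptions, not a recommended algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative "true" environment, unknown to the agent:
# i.i.d. Gaussian asset returns.
true_mu = np.array([0.001, 0.005, -0.001])
true_sigma = 0.01

# The agent's model of the environment: a running estimate of the mean
# return vector (the simplest possible Gaussian model).
model_mu = np.zeros(3)
count = 0
pred_errors = []

for t in range(2000):
    # Plan with the learned model: all weight on the asset the model
    # currently predicts to be best (a deliberately myopic policy).
    weights = np.eye(3)[np.argmax(model_mu)]

    # 1) Predict the reward using the model.
    predicted_reward = weights @ model_mu
    # 2) Observe the real transition and reward from the environment.
    returns = rng.normal(true_mu, true_sigma)
    reward = weights @ returns
    pred_errors.append(reward - predicted_reward)
    # 3) Incrementally update the model from the observed transition.
    count += 1
    model_mu += (returns - model_mu) / count
```

After enough interactions the learned model concentrates around the true mean returns and the planner settles on the genuinely best asset; with a mis-specified model, which is exactly the disadvantage discussed above, the same loop would converge to a sub-optimal policy.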