Optimal market making (arXiv:1605.01862)


We also plan to compare the performance of the Alpha-AS models with that of leading RL models in the literature that do not work with the Avellaneda-Stoikov procedure. Post-hoc Mann-Whitney tests were conducted to analyse selected pairwise differences between the models regarding these performance indicators. Table 6 compares the results of the Alpha-AS models, combined, against the two baseline models and Gen-AS. The figures represent the percentage of wins of any model in one group against all the models in the other group, for the corresponding performance indicator. To start filling the Alpha-AS memory replay buffer and training the model (Section 5.2). Therefore, by choosing a Skew value, the Alpha-AS agent can shift the output price upwards or downwards by up to 10%.


DRL has generally been used to determine the actions of placing bid and ask quotes directly [23–26], that is, to decide when to place a buy or sell order and at what price, without relying on the AS model. Spooner proposed an RL system in which the agent could choose from a set of 10 spread sizes on the buy and the sell side, with the asymmetric dampened P&L as the reward function (instead of the plain P&L). Combining a deep Q-network (see Section 4.1.7) with a convolutional neural network, Juchli achieved improved performance over previous benchmarks.

Market Making via Reinforcement Learning

This limits the influence of a single observation on the Q-value to which it contributes. Closing_time – Here, you set how long each “trading session” will take. After choosing the exchange and the pair you will trade, the next question is whether you want to let the bot calculate the risk factor and order book depth parameter. If you set this to false, you will be asked to enter both parameter values.

But suppose you have fun reading intricate scientific papers (I do!). In that case, the original article is easy to find on a quick internet search, or you can find the original publication here. This article will explain the idea behind the classic paper released by Marco Avellaneda and Sasha Stoikov in 2008 and how we implemented it in Hummingbot.

Other indicators, such as the Sortino ratio, can also be used in the reward function itself. Another approach is to explore risk management policies that include discretionary rules. Alternatively, experimenting with further layers to learn such policies autonomously may ultimately yield greater benefits, as indeed may simply altering the number of layers and neurons, or the loss functions, in the current architecture. Maximum drawdown is the largest loss of portfolio value recorded between any two points of a full day of trading. Similarly, on the Sortino ratio, one or the other of the two Alpha-AS models performed better, that is, obtained better downside-risk-adjusted returns, than all the baseline models on 25 (12+13) of the 30 days. Again, on 9 of the 12 days for which Alpha-AS-1 had the best Sharpe ratio, Alpha-AS-2 had the second best; and for 10 of the 13 test days for which Alpha-AS-2 obtained the best Sortino ratio, Alpha-AS-1 performed second best.
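To make the two indicators concrete, here is a minimal sketch of how a Sortino ratio and a maximum drawdown could be computed from a day of results. The function names, the zero target return and the choice to measure drawdown in absolute portfolio value are assumptions made for illustration; the text does not specify how the authors computed them.

```python
import math

def sortino_ratio(returns, target=0.0):
    """Mean excess return divided by the downside deviation.

    Only returns below `target` contribute to the deviation
    (assumption: target return of 0, population-style average).
    """
    n = len(returns)
    downside_sq = [min(0.0, r - target) ** 2 for r in returns]
    downside_dev = math.sqrt(sum(downside_sq) / n)
    mean_excess = sum(returns) / n - target
    if downside_dev == 0.0:
        # No losing periods: the ratio is undefined, reported here as infinity.
        return float("inf")
    return mean_excess / downside_dev

def max_drawdown(portfolio_values):
    """Largest peak-to-trough loss of value, in the same units as the input."""
    peak = portfolio_values[0]
    worst = 0.0
    for value in portfolio_values:
        peak = max(peak, value)           # highest value seen so far
        worst = max(worst, peak - value)  # largest drop from that peak
    return worst

ratio = sortino_ratio([0.02, -0.01, 0.03, -0.02])
drawdown = max_drawdown([100.0, 120.0, 90.0, 110.0, 80.0])
```

In this toy series the drawdown is measured from the peak of 120 down to the final value of 80, not from the starting value.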

In contrast, we propose maintaining the Avellaneda-Stoikov procedure as the basis upon which to determine the orders to be placed. We use a reinforcement learning algorithm, a double DQN, to adjust, at each trading step, the values of the parameters that are modelled as constants in the AS procedure. The actions performed by our RL agent are the setting of the AS parameter values for the next execution cycle.
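As a rough illustration of what the agent is tuning, the sketch below computes quotes from the standard closed-form approximation in the Avellaneda-Stoikov paper: a reservation price r = s − qγσ²(T − t) and a total spread γσ²(T − t) + (2/γ)ln(1 + γ/κ). The function name, argument names and numeric values are assumptions for illustration only, and this is not the Alpha-AS implementation; in the approach described above, an RL agent would choose parameters such as γ at each step instead of holding them constant.

```python
import math

def as_quotes(mid, inventory, gamma, sigma, kappa, time_left):
    """Return (reservation_price, bid, ask) from the AS closed-form approximation.

    mid       -- current market mid-price s
    inventory -- signed inventory q (positive = long)
    gamma     -- risk aversion parameter
    sigma     -- price volatility
    kappa     -- order book liquidity parameter
    time_left -- remaining fraction of the session, T - t
    """
    # A long inventory pulls the reservation price below the mid-price.
    reservation = mid - inventory * gamma * sigma ** 2 * time_left
    spread = gamma * sigma ** 2 * time_left + (2.0 / gamma) * math.log(1.0 + gamma / kappa)
    return reservation, reservation - spread / 2.0, reservation + spread / 2.0

r, bid, ask = as_quotes(mid=100.0, inventory=2.0, gamma=0.1,
                        sigma=2.0, kappa=1.5, time_left=0.5)
```

With a positive inventory the reservation price falls below the mid-price, so both quotes shift downwards and the agent is more likely to sell than to buy.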

Bertram’s pairs trading strategy with bounded risk

Comparison of values for Max DD and P&L-to-MAP between the Gen-AS model and the Alpha-AS models (αAS1 and αAS2). Table 8 provides further insight combining the results for Max DD and P&L-to-MAP. From the negative values in the Max DD columns, we see that Alpha-AS-1 had a larger Max DD (i.e., performed worse) than Gen-AS on 16 of the 30 test days.

memory replay buffer

Reducing the number of features considered by the RL agent in turn dramatically reduces the number of states. This helps the algorithm learn and improves its performance by reducing latency and memory requirements. The ranges of possible values of the features that are defined in relation to the market mid-price are truncated to the interval [−1, 1] (i.e., if a value exceeds 1 in magnitude, it is set to 1 if it is positive or −1 if negative). So, as the trading session gets closer to its end, order spreads will be smaller, and the reservation price position will be more “aggressive” in rebalancing the BTC inventory. The reasoning behind this parameter is that, as the trading session draws to a close, the market maker wants to have an inventory position similar to the one he had when the trading session started.
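The truncation of mid-price-relative features to [−1, 1] described above amounts to a simple clip. The helper below is a hypothetical illustration of that one step, not code from the paper; how the raw feature values are scaled before clipping is not stated in the text and is left out here.

```python
def clip_feature(value, low=-1.0, high=1.0):
    """Truncate a feature value to the closed interval [low, high]."""
    return max(low, min(high, value))

# Hypothetical raw feature values, already expressed relative to the mid-price.
raw_features = [0.25, -3.0, 1.7, -0.4]
clipped = [clip_feature(v) for v in raw_features]
```

Values inside the interval pass through unchanged, while anything beyond it is set to the nearest bound, so information about how far it exceeded the bound is lost.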

[Level 1] Basic Concepts of Crypto Trading

It can then start exploiting this knowledge to apply an action selection policy that takes it closer to achieving its reward maximization goal. With the risk aversion parameter, you tell the bot how much inventory risk you want to take. A value close to 1 will indicate that you don’t want to take too much inventory risk, and hummingbot will “push” the reservation price more to reach the inventory target.

Among the members of this class we denote the best performing one as the LIN strategy. Note that this class of strategies also includes the state-of-the-art Gueant-Lehalle-Fernandez-Tapia approximations. Additionally, we consider a simple strategy that always places limit orders precisely at the best bid and the best ask. Regardless of the approach used, faithful LOB modeling, ideally accounting for the empirical properties and stylized facts of market microstructure as well as the discrete nature of the LOB itself, is pivotal to obtaining high-performing MM controllers. For example, in the original AS model, price movements are assumed to be completely independent of the arrivals of market orders and the LOB dynamics, while the subsequent approaches only partly address such inconsistencies. To ameliorate this, a novel weakly-consistent pure-jump market model has been proposed that ensures that the price dynamics are consistent with the LOB dynamics with respect to direction and timing.

Low-rank approximation algorithms aim to utilize convex nuclear norm constraint of linear matrices to recover ill-conditioned entries caused by multi-sampling rates, sensor drop-out. However, these existing algorithms are often limited in solving high-dimensionality and rank minimization relaxation. In this paper, a robust kernel factorization embedding graph regularization method is developed to statically impute missing measurements. Specifically, the implicit high-dimensional feature space of ill-conditioned data is factorized by kernel sparse dictionary.

  • This is the default mode when you create a new strategy, but if you have your model to determine these values, you can deactivate the “easy” mode by setting config parameters_based_on_spread to False.
  • 1 illustrates the bid and ask prices and their 5-level queues for a stock at two consecutive time points.
  • We discuss a potential application of order flow imbalance as a measure of adverse selection in limit order executions, and demonstrate how it can be used to analyze intraday volatility dynamics.

The goal of this paper is first to propose an optimal quoting strategy that accounts for stochastic volatility, the drift effect and the market impact of the amount and type of orders on the price dynamics. We also consider the case of market impact occurring through jumps in the volatility dynamics. We derive closed-form solutions for the optimal quotes and solve the corresponding nonlinear HJB equations using the finite difference discretization method, which enables us to evaluate the spread values and carry out various simulation analyses. Furthermore, we explore risk and normality tests of the models depending on their strategies. Lastly, we compare the models that we have derived in this paper with existing optimal market making models in the literature under both quadratic and exponential utility functions. We have designed a market making agent that relies on the Avellaneda-Stoikov procedure to minimize inventory risk.

But this kind of approach, depending on the market situation, might skew the market maker's inventory in one direction, leaving the trader in a losing position as the asset value moves against him. In this part, we run the simulations under the quadratic utility function for all the models introduced here for comparison purposes, although they were defined with different utility criteria and solved under different settings in their original papers. This is a small inventory-risk aversion value but is enough to force the inventory process to revert to zero at the end of trading. Of exists and is unique that should be guaranteed by the verification theorem so that this classical solution is the value function of the HJB equation and the spreads, defined by , are indeed the optimal ones. Is Markovian, the optimization problem can be solved using the stochastic control approach (Bates 2016; Björk 2012; Pham 2009). The solution will be based on two different choices of utility functions, quadratic and exponential, in the sequel.

Gen-AS had the best P&L-to-MAP ratio on only 2 of the test days, coming second on another 4. The mean and the median P&L-to-MAP ratio were very significantly better for both Alpha-AS models than for the Gen-AS model. On this performance indicator, Gen-AS was the overall best performing model, winning on 11 days.

  • Again, the probability of selecting a specific individual for parenthood is proportional to the Sharpe ratio it has achieved.
  • We show that, over short time intervals, price changes are mainly driven by the order flow imbalance, defined as the imbalance between supply and demand at the best bid and ask prices.
  • For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available.
  • Figures in parentheses are the number of days the Alpha-AS model in question was second best only to the other Alpha-AS model (and therefore would have scored another overall ‘win’ had it competed alone against the baseline and Gen-AS models).
  • Clients also benefit, as internalisation reduces market impact.
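The parent-selection rule mentioned in the list above (selection probability proportional to the Sharpe ratio achieved) is commonly called roulette-wheel or fitness-proportionate selection. The sketch below is one hedged way to write it; the function name is hypothetical, and treating negative Sharpe ratios as zero weight, with a uniform fallback when no individual has a positive ratio, is an assumption that the text does not address.

```python
import random

def select_parent(population, sharpe_ratios, rng):
    """Pick one individual with probability proportional to its Sharpe ratio.

    Assumption: negative Sharpe ratios get zero weight; if no individual
    has a positive ratio, selection falls back to a uniform choice.
    """
    weights = [max(0.0, s) for s in sharpe_ratios]
    if sum(weights) == 0.0:
        return rng.choice(population)
    return rng.choices(population, weights=weights, k=1)[0]

rng = random.Random(0)  # seeded so that runs are repeatable
population = ["params_a", "params_b", "params_c"]
parent = select_parent(population, [0.5, 1.5, -0.2], rng)
```

Under these assumptions "params_b" is three times as likely to be chosen as "params_a", and "params_c" is never chosen.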

R is the latest reward obtained from state s by taking action a.
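To show where the reward R obtained from state s by taking action a enters the learning rule, here is a minimal tabular Q-learning update. It is a simplified stand-in: the approach discussed in this article uses a double DQN with a neural network, and the state labels, action labels, learning rate and discount factor below are all hypothetical.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, discount=0.9):
    """Apply one Q-learning step and return the updated Q(state, action).

    The learning rate `alpha` limits how far a single observation
    can move the Q-value it contributes to.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + discount * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
new_value = q_update(q_table, "s0", "widen", reward=1.0,
                     next_state="s1", actions=["widen", "narrow"])
```

Starting from an all-zero table, a reward of 1.0 moves the Q-value only a tenth of the way towards the target, which is the damping effect of the learning rate.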

Note that this is the percentage of the total inventory value that you want to have allocated to the base asset. For example, if you are trading BTC-USD but want to focus on keeping your inventory 100% in BTC, you set this value to 100. Starting with the strategy name, you have to enter avellaneda_market_making to use this new strategy. After that, use config order_book_depth_factor and config risk_factor to set your custom values.

However, existing methods fail to achieve both the two goals simultaneously. To fill this gap, this paper presents an interpretable intuitionistic fuzzy inference model, dubbed as IIFI. While retaining the prediction accuracy, the interpretable module in IIFI can automatically calculate the feature contribution based on the intuitionistic fuzzy set, which provides high interpretability of the model. Also, most of the existing training algorithms, such as LightGBM, XGBoost, DNN, Stacking, etc, can be embedded in the inference module of our proposed model and achieve better prediction results.


Are they scaled by some scaling parameter beforehand, and what data is this parameter estimated from? If not, how much data is lost by only using the price differences with absolute values smaller than 1? Also, if the market candle features are “divided by the open mid-price for the candle”, does this mean that all of those higher than the mid-price would be truncated to 1? The methodology might be more sound than this, but the text simply does not offer answers to these questions.

Both Alpha-AS models performed better than the rest on 19 days. Meanwhile, Gen-AS, again the best of the rest, won on Sortino on only 3 test days. The mean and the median of the Sortino ratio were better for both Alpha-AS models than for the Gen-AS model, and for the latter it was significantly better than for the two non-AS baselines. Thus, the Alpha-AS models came 1st and 2nd on 20 out of the 30 test days (67%). The btc-usd data for 7th December 2020 was used to obtain the feature importance values with the MDI, MDA and SFI metrics, to select the most important features to use as input to the Alpha-AS neural network model. The data for the first use of the genetic algorithm was the full day of trading on 8th December 2020.


Mann-Whitney tests comparing the four daily performance indicator values (Sharpe, Sortino, Max DD and P&L-to-MAP) obtained for the Gen-AS model with the corresponding values obtained for the other models, over the 30 test days. Number of days either Alpha-AS-1 or Alpha-AS-2 scored best out of all tested models, for each of the four performance indicators. For every day of data, the number of ticks occurring in each 5-second interval had a positively skewed, long-tailed distribution. The means of these thirty-two distributions ranged from 33 to 110 ticks per 5-second interval, the standard deviations from 21 to 67, the minimums from 0 to 20, the maximums from 233 to 1338, and the skew from 1.0 to 4.4. The dataset used contains the L2 orderbook updates and market trades from the btc-usd (bitcoin–dollar pair), for the period from 7th December 2020 to 8th January 2021, with 12 hours of trading data recorded for each day. Most of the data, the Java source code and the results are accessible from the project’s GitHub repository.
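For readers unfamiliar with the Mann-Whitney test, the sketch below computes its U statistic directly from the definition: the number of cross-group pairs in which a value from the first group beats a value from the second, with ties counted as half. Dividing U by the number of pairs gives a share of pairwise wins. The daily values are made up for illustration and no p-value is computed; a statistics library would be needed for that.

```python
def mann_whitney_u(xs, ys):
    """U statistic for xs: count of (x, y) pairs with x > y, ties counted as 0.5."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

def win_share(xs, ys):
    """Fraction of cross-group pairs won by xs (U divided by the pair count)."""
    return mann_whitney_u(xs, ys) / (len(xs) * len(ys))

# Made-up daily indicator values for two models (not data from the study).
model_a = [3.0, 5.0, 7.0]
model_b = [1.0, 4.0, 6.0]
u_stat = mann_whitney_u(model_a, model_b)
share = win_share(model_a, model_b)
```

The two U statistics of a pair of groups always sum to the number of cross-group pairs, so a win share of 0.5 means neither group tends to outrank the other.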
