LSTM-Based Dynamic Pricing in Home Retail

17/09/2021

LSTM-Based Dynamic Pricing in Home Retail

It is no secret that machine learning methods have become widespread in various business areas: optimizing, improving and even creating new business processes. One important area is the question of setting the price, and here, with enough data, ML helps to do what was previously difficult to achieve — to reconstruct a multi-factorial demand curve from the data. Thanks to the restored demand curve, it became possible to build dynamic pricing systems that allow you to optimize the price depending on the goal of pricing — to increase revenue or profit. This article is a compilation of research case, in which it has been developed and tested the LSTM-ANN dynamic pricing model for one of the products of a home goods retailer.

I want to note right away that in this article I will not disclose the name of the company in which the study was conducted (nevertheless, it is one of the companies from the list presented in the background), instead I will name it — Retailer.

Background

In the home goods retail market, there is a price leader — Leroy Merlin. The sales volumes of this network allow them to maintain a minimum price strategy for the entire product range, which leads to price pressure on other market players.

Revenue and profit of the main retailers in St. Petersburg as of December 31, 2017:

Name	Revenue, RUB mln.	Profit, RUB mln.
Leroy Merlin Vostok OOO	226 687	5 131
OBI FC LLC	15 559	(318)
STD «Petrovich» LLC	37 087	1 456
Castorama RUS LLC	32 489	(2 815)
Maksidom LLC	20 892	2 224

In this regard, the Retailer uses a different approach to pricing:

— the price is set at the level of the lowest of the competitors;

— limit on the price below: purchase price + minimum markup, reflecting the approximate costs per unit of goods.

This approach is a combination of the costly pricing method and focus on competitors’ prices. However, it is not perfect — it does not directly take into account consumer demand directly.

Due to the fact that the dynamic pricing model takes into account many factors (demand, seasonality, promotions, competitor prices), and also allows you to impose restrictions on the proposed price (for example, from below — cost coverage), this system potentially gets rid of all one-sidedness and disadvantages of other pricing methods.

Data

For research, the company provided data from January 2015 to July 2017 (920 days / 131 weeks). This data included:

daily sales, including weekends, for 470 products (16 item groups);
days of promotions in the store;
days in which discounts on goods were provided;
prices for each of the 470 items;
daily data on the number of checks throughout the network in St. Petersburg;
prices of the main competitors for most of the 470 products (data was taken once a week).

In addition to this data, I also added calendar dummy variables:

season of the year (autumn/winter/summer/spring);
month;
quarter;
day of the week;
h

As well as weather variables:

precipitation;
temperature;
deviation of temperature from the average in the season.

By directly analyzing the daily sales of goods, I found that:

Only about 30% of the products were sold all the time, all other products were either put on sale later than 2015 or removed from sale earlier than 2017, which led to a significant limitation in the choice of goods for research and price experiment. This also leads us to the fact that due to the constant change of products in the store line, it becomes difficult to create a complex per-item pricing system, however, there are some ways to get around this problem, which will be discussed later.

Price recommendation system

To build a price recommendation system for a product for the next period based on a model that predicts demand, I came up with the following scheme:

Feeding various prices as input to the trained model, we will receive sales estimation. Using such model we can optimize the price to achieve the desired result — to maximize the expected revenue or expected profit. It remains only to train a model that could predict sales well.

What didn’t worked

After selecting one of the products to research, I used XGBoost before going directly to the LSTM model.

I did this in the hope that XGBoost will help me discard a lot of unnecessary factors (this happens automatically), and use the remaining ones for LSTM models. I used this approach consciously, because I wanted to get a strong and, at the same time, a simple justification for the choice of factors for the model, on the one hand, and on the other, to simplify the development. In addition, I received a ready-made, draft model, on which I could quickly test different ideas in the study. And after that, having reached the final understanding of what will work and what will not, make the final LSTM model.

To understand the task of forecasting, I will give a graph of the daily sales of the first selected product:

The entire sales time series on the chart was divided by the average sales for the period, to not reveal the real values, but to keep the view.

In general, there is a lot of noise, while there are pronounced bursts — this is the conduct of promotions at the network level. Since this was my first experience building machine learning models, I had to spend quite a lot of time reading various articles and documentation in order for me to eventually get something.

Initial list of factors that affects sales:

Data on daily sales of other goods from this group, total sales in the group in pieces and the number of checks for all chain stores in St. Petersburg with lags 1, 2, 3, 7, 14, 21, 28;
Data on the prices of other goods from the group;
The ratio of the price of the studied product with the prices of other products from the group;
The lowest price among all competitors (data was taken once a week, and I made the assumption that such prices will be valid for the next week);
The ratio of the price of the studied product with the lowest price from competitors;
Sales lags by group (in pieces);
Simple average and RSI based on sales lags of the group’s products, total sales in the group, and the number of receipts.

A total of 380 factors were obtained. (2.42 observations per factor). Thus, the problem of cutting off non-significant factors was indeed high, but XGBoost helped to cut the number of factors to 23 (40 observations per factor).

The best result after hyperparameters greed search looks like this:

R^2-adj = 0.4 on the test set

The data was divided into training and test samples without mixing (since this is a time series). As a metric, I used the R^2 indicator, adjusted deliberately, since it’s most common for regression tasks and easy to understand.

The final results reduced my faith in success, because the result of R^2-adj 0.4 meant only that the prediction system would not be able to predict demand for the next day well enough, and the price recommendation would differ little from the random system.

Additionally, I decided to check how effective the use of XGBoost would be for predicting daily sales for a group of goods (in pieces) and predicting the number of checks in the whole network.

Sales by product group:

Checks:

I think the reason that sales data for a particular product could not be predicted is clear from the graphs presented — noise. Individual sales of goods turned out to be too random, so the regression method was not effective. At the same time, by aggregating the data, we removed the influence of randomness and got good predictive capabilities.

In order to finally convince myself that predicting demand one day ahead is a meaningless exercise, I used the SARIMAX model (the statsmodels package for python) for daily sales:

In fact, the results do not differ in any way from those obtained using XGBoost, which suggests that the use of a complex model in this case is not justified.

At the same time, I also want to note that weather factors were not significant for either XGBoost or SARIMAX.

Building the final model

The solution to improve the quality of the prediction was to aggregate the data to a weekly level. This made it possible to reduce the influence of random factors, however, it significantly reduced the number of observed data: if there were 920 daily data, then only 131 weekly data. Decreased greatly.

In addition, my task was complicated by the fact that at that time, the company decided to change the product on which the experiment will be carried out to new product -primer CERESIT, so I had to develop the model from scratch. The product was changed to a product with a pronounced seasonality:

Due to the transition to weekly sales, a question arose: is it generally adequate to use the LSTM model on such a small amount of data? I decided to find out in practice and, first of all, to reduce the number of factors. I threw out all the factors that are calculated on the basis of sales lags (averages, RSI), weather factors (on the daily data, the weather did not play a role, and the conversion to the weekly level, all the more, lost any meaning). After that, I used XGBoost to cut off other unimportant factors. Subsequently, I additionally cut out a few more factors, based on the LSTM model, simply excluding the factors one by one, retraining the model and comparing the results.

The final list of factors looks like this:

The ratio of the price per kilogram of the investigated product and the primer CERESIT ST 17 10 liters;
The ratio of the price of the investigated product and the product and primer CERESIT ST 17 10 l;
The ratio of the price of the investigated product and the primer EURO PRIMER 3 l;
The ratio of the price of the studied product and the minimum price of competitors;
Dummy variables for three promotions at the network level;
Dummy variables for spring, summer, autumn seasons;
Lags 1 — 5 weekly sales of the studied product.

There are 15 factors in total (9 observations per factor).

The final LSTM model was written using Keras, included 2 hidden layers (25 and 20 neurons, respectively), and an activator — a sigmoid.

Final LSTM model code using Keras:

model = Sequential()

model.add(LSTM(25, return_sequences=True, input_shape=(1, trainX.shape[2])))

model.add(LSTM(20))

model.add(Dense(1, activation=’sigmoid’))

model.compile(loss=’mean_squared_error’, optimizer=’adam’)

model.fit(trainX, trainY, epochs=40, batch_size=1, verbose=2)

model.save(‘LSTM_W.h5’)

Result:

Index	Train forecast	Test forecast
R^2-adj	0,92	0,83
RMSE	10,22	15,85

The quality of the prediction on the test sample looked quite convincing in terms of the metric, however, in my opinion, it still fell short of the ideal, because, despite the fairly accurate definition of the average sales level, spikes in individual weeks could deviate quite a lot from » average» level of sales, which gave a strong deviation of the sales forecast from reality on certain days (up to 50%). However, I have already used this model directly for the experiment in practice.

It is also interesting to see what the reconstructed price demand curve looks like. To do this, I ran the model over a range of prices and, based on the predicted sales, built a demand curve:

Experiment

Each week, I have been provided sales data for the previous week in St. Petersburg, as well as prices from competitors. Based on this data, I performed price optimization to maximize the expected profit based on trained LSTM model, and this price was used in real world for the next week. This went on for 4 weeks (the period was agreed with the retailer).

Profit maximization was carried out with restrictions: the minimum price was the purchase price + fix. surcharge, the maximum price was limited to the price of a Primer from the same manufacturer.

The results of the experiment are presented in the tables below:

Sales volume prediction:

	standard price	set price	predicted sales volume	actual sales	deviation, times
1 week	357	386	0,66	2,15	3,28
2 week	357	479	0,29	2,02	6,85
3 week	357	505	0,43	0,97	2,26
4 week	357	504	0,45	1,38	3,05

Profit prediction:

	predicted profit	real profit	deviation, times
1 week	44,68	146,37	3,28
2 week	47,82	324,68	6,85
3 week	80,51	182,20	2,26
4 week	84,29	257,08	3,05

In order to evaluate the impact of the new pricing system on sales, I compared sales for the same period, only for previous years.

Summary results for 4 weeks:

	2015	2016	2017 (experiment)	to 2015	to 2016
sales, pieces	6,04	5,75	6,52	+7,9%	+13,4%
sales weighted price	339	459	458	+35,0%	-0,3%
revenue	2050,83	2641,59	2985,40	45,6%	13,0%
profit	127,04	811,47	910,30	616,5%	12,2%

As a result, we get a twofold picture: completely unrealistic forecasts in terms of sales, but, at the same time, purely positive results in terms of economic indicators (both in terms of profit and revenue).

The explanation, in my opinion, is that in this case, the model, while incorrectly predicting the volume of sales, nevertheless, caught the right idea — the price elasticity for this product was below 1, which means that the price could be increased, not fearing for a drop in sales, which we saw (sales in pieces remained approximately at the same level as last year and the year before).

But you should also not forget that 4 weeks is a short period and the experiment was conducted on only one product. In the long run, overpricing goods in a store usually leads to a drop in sales in the store as a whole. To confirm my guess on this, I decided to use XGBoost to check if consumers have a “memory” for prices for previous periods (if in the past it was more expensive “in general” than competitors, the consumer goes to competitors). Whether the average price level for the group for the last 1, 3 and 6 months will affect sales by product groups?

Indeed, the guess was confirmed: one way or another, the average price level for previous periods affects sales in the current period. This means that it is not enough to optimize the price in the current period for a single product — it is also necessary to take into account the general price level in the long term. Which, in general, leads to a situation where tactics (profit maximization now) contradict strategy (survival in the competition). This, however, is already better left to marketers.

Given the results obtained and the experience in my opinion, the most optimal pricing system based on the sales forecast could look like this:

Aggregate sales of product nomenclature – using cluster analysis grouping by similarity and predict sales and set the price not for a single product, but for subgroup — this way we will avoid the problem of constantly removing and adding product nomenclatures.
Consider long-term effects. To do this, you can use a model that predicts sales for the whole retailer. Fortunately, it turned out to be impressively accurate even on daily sales.

LSTM-Based Dynamic Pricing in Home Retail

LSTM-Based Dynamic Pricing in Home Retail

Newsletter subscription

System brochure