Discussion Papers 2024

CIRJE-F-1234	"Probability-based A/B testing with Adaptive Minimax Regret (AMR) Criterion for Long-Term Customer Metrics"
Author Name	Abe, Makoto
Date	August 2024
Full Paper	PDF file
Remarks

Abstract
In economics, uncertainty is divided into two types: risk, which can be evaluated in terms of probability, and ambiguity, in which the probability is unknown. In decision making under risk, the rational course of action is to choose the option that maximizes expected utility, that is, the utility of each event weighted by its probability. How, then, should decisions be made under ambiguity, where the probability is unknown? We first introduce Minimax Regret, a decision-making criterion for ambiguity in which the probability is unknown but an interval containing it is known. As a concrete example, consider two slot machines: one existing and one new. The winning probability of the former is known, while for the latter only an interval for the winning probability is given. In this case, the optimal strategy under the Minimax Regret criterion is to randomize, pulling each of the two slot machines with a certain probability. Next, when utility is measured by a long-term metric, the interval of uncertainty for this metric narrows over time. To address this, we introduce the Adaptive Minimax Regret (AMR) approach, which maximizes utility by updating the probabilities according to the Minimax Regret criterion based on the information available at each point in time. In simulations of the two-slot-machine case described above, AMR achieved high performance comparable to that of bandit algorithms. As an application of AMR in marketing, we propose sequential campaign strategies and probabilistic A/B testing aimed at maximizing the average customer lifetime (utility) of the target audience.
Keywords: Minimax Regret, Bandit Algorithm, Probabilistic A/B Testing
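
The two-slot-machine example in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes that utility is simply the winning probability, that the existing machine wins with known probability p_known, and that the new machine's winning probability lies somewhere in [lo, hi]. Under those assumptions, pulling the new machine with probability q gives a worst-case regret of max((1 - q)(hi - p_known), q(p_known - lo)), and equalizing the two terms yields q = (hi - p_known) / (hi - lo) when lo < p_known < hi. The function and parameter names are hypothetical.

```python
def minimax_regret_mix(p_known: float, lo: float, hi: float) -> float:
    """Probability of pulling the NEW machine that minimizes worst-case regret.

    Assumptions (illustrative, not from the paper): utility equals winning
    probability, the existing machine wins with probability p_known, and the
    new machine's winning probability is only known to lie in [lo, hi].
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    if hi <= p_known:   # the new machine can never beat the existing one
        return 0.0
    if lo >= p_known:   # the new machine is never worse than the existing one
        return 1.0
    # Equalize the two worst cases: (1 - q) * (hi - p_known) == q * (p_known - lo)
    return (hi - p_known) / (hi - lo)


def worst_case_regret(q: float, p_known: float, lo: float, hi: float) -> float:
    """Largest regret over all p in [lo, hi] when the new machine is pulled w.p. q."""
    regret_if_new_is_better = (1.0 - q) * max(hi - p_known, 0.0)
    regret_if_old_is_better = q * max(p_known - lo, 0.0)
    return max(regret_if_new_is_better, regret_if_old_is_better)


if __name__ == "__main__":
    q_star = minimax_regret_mix(p_known=0.5, lo=0.2, hi=0.9)
    print("mixing probability for the new machine:", q_star)
    print("worst-case regret at that mix:", worst_case_regret(q_star, 0.5, 0.2, 0.9))
```

As the interval [lo, hi] narrows, the same formula can be re-evaluated with the updated bounds, which is the general idea behind an adaptive scheme; how the paper actually updates the interval is not specified in the abstract and is not modeled here.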
