On Asymptotic and Finite-Time Optimality of Bayesian Predictors
Daniil Ryabko; 20(149):1-24, 2019.
Abstract
The problem is that of sequential probability forecasting for finite-valued time series. The data are generated by an unknown probability distribution over the space of all one-way infinite sequences. Two settings are considered: the realizable and the non-realizable. Assume first that the probability measure generating the sequence belongs to a given set $C$ (the realizable case), where $C$ itself is completely arbitrary (uncountably infinite, with no structure given). It is shown that the minimax asymptotic average loss, which may be positive, is always attainable, and that it is attained by a Bayesian predictor whose prior is discrete and concentrated on $C$. Moreover, the finite-time loss of this Bayesian predictor is also optimal up to an additive $\log n$ term (where $n$ is the time step). This upper bound is complemented by a lower bound that goes to infinity but may do so arbitrarily slowly. Passing to the non-realizable setting, let the probability measure generating the data be arbitrary, and consider the given set $C$ as a set of experts to compete with. The goal is to minimize the regret with respect to the experts. It is shown that in this setting it is possible that all Bayesian strategies are strictly suboptimal, even asymptotically. In other words, a sublinear regret may be attainable, but the regret of every Bayesian predictor is linear. A very general recommendation for choosing a model can be made based on these results: it is better to take a model large enough to include the process that generates the data, even if this entails a positive asymptotic average loss, since otherwise any combination of predictors in the model class may be useless.
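To make the notion of a Bayesian predictor with a discrete prior concrete, here is a minimal sketch. It assumes, purely for illustration, that $C$ is a small finite set of i.i.d. Bernoulli measures and that the loss is cumulative log loss in nats. The paper's set $C$ is arbitrary and its loss may be defined differently, and the helper names (`bayes_mixture_log_loss`, `model_log_loss`) are hypothetical. The check at the end uses the standard mixture bound: since $\rho(x_{1..n}) = \sum_k w_k \mu_k(x_{1..n}) \ge w_k \mu_k(x_{1..n})$, the mixture's cumulative log loss exceeds that of any single $\mu_k$ by at most $\log(1/w_k)$.

```python
import math


def bayes_mixture_log_loss(sequence, thetas, weights):
    """Cumulative log loss (nats) of a Bayesian mixture over i.i.d. Bernoulli(theta) measures.

    Illustrative assumption: the model class is the finite list `thetas`
    (each strictly between 0 and 1) with a discrete prior `weights` summing to 1.
    """
    posterior = list(weights)
    total_loss = 0.0
    for x in sequence:
        # Predictive probability that the next symbol is 1 under the current posterior.
        p_one = sum(w * t for w, t in zip(posterior, thetas))
        p_x = p_one if x == 1 else 1.0 - p_one
        total_loss += -math.log(p_x)
        # Bayes update: reweight each model by the probability it gave to x.
        unnormalized = [w * (t if x == 1 else 1.0 - t) for w, t in zip(posterior, thetas)]
        z = sum(unnormalized)
        posterior = [u / z for u in unnormalized]
    return total_loss


def model_log_loss(sequence, theta):
    """Cumulative log loss (nats) of a single Bernoulli(theta) predictor."""
    return sum(-math.log(theta if x == 1 else 1.0 - theta) for x in sequence)


# Hypothetical example: three Bernoulli models with a uniform prior.
thetas = [0.1, 0.5, 0.9]
weights = [1 / 3, 1 / 3, 1 / 3]
seq = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

mixture_loss = bayes_mixture_log_loss(seq, thetas, weights)
for theta, w in zip(thetas, weights):
    # Mixture bound: never worse than any model by more than log(1/w).
    assert mixture_loss <= model_log_loss(seq, theta) + math.log(1 / w) + 1e-9
```

Sequential Bayes updating is used here because the product of the one-step predictive probabilities equals the mixture probability $\sum_k w_k \mu_k(x_{1..n})$, so the accumulated loss is exactly $-\log \rho(x_{1..n})$.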
© JMLR 2019.

