btfit
fits the Bradley-Terry model on (potentially) large and sparse datasets.
btfit(btdata, a, MAP_by_component = FALSE, subset = NULL, maxit = 10000, epsilon = 0.001)
btdata | An object of class "btdata", typically the result ob of ob <- btdata(..). See |
---|---|
a | Must be >= 1. When |
MAP_by_component | Logical. Only considered if a > 1. Then, if FALSE, the MAP estimate will be found on the full dataset. If TRUE, the MAP estimate will be found separately for each fully-connected component. |
subset | A condition for selecting a subset of the components. This can either be a character vector of names of the components, a single predicate function (that takes a component as its argument), or a logical vector of the same length as the number of components). |
maxit | The maximum number of iterations for the algorithm. If returning \(\pi\) by component, this will be the maximum number of iterations for each component. |
epsilon | Determines when the algorithm is deemed to have converged. (See Details.) |
btfit
returns an S3 object of class "btfit". It is a list containing the following components:
The matched call
A list of length \(M\), where \(M\) is the number of fully-connected components of the comparison graph \(G_W\) (or the requested subset) of two or more items. The \(m\)-th list item is a named vector \(\pi\), the strength parameter, for the items in the \(m\)-th fully connected component, \(m = 1, \ldots, M\). These are sorted in descending order.
A vector of length \(M\). The \(m\)-th entry is the number of iterations it took for the algorithm to converge for the \(m\)-th component, for \(m = 1, \ldots, M\). Note that if the algorithm has not converged in any component, a warning will be produced.
A logical vector of length \(M\), indicating whether the algorithm has converged for the \(m\)-th component in maxit
iterations.
A list of length \(M\). The \(m\)-th list item is a matrix where each element \(n_{ij}\) is the number of times item \(i\) played against item \(j\), for the items in the \(m\)-th component. The rows and columns are arranged in the same order as the ordered pi vector(s).
A list of length \(M\). The \(m\)-th item is a vector of the diagonal elements of the btdata$wins
matrix, for the items in the \(m\)-th fully-connected component. These values are used as the fitted values for the diagonal of the matrix output in fitted.btfit
.
The names of the dimnames of the original btdata$wins
matrix.
Let there be \(K\) items, let \(\pi_k\) be the Bradley-Terry strength parameter of item \(k\), for \(k = 1, \ldots, K\) and let \(\pi\) be the vector of all the \(\pi_k\). Let \(w_{ij}\) be the number of times item \(i\) wins against item \(j\), let \(n_{ij} = w_{ij} + w_{ji}\) be the number of times they play, with \(w_{ii} = 0\) by convention and let \(W_i = \sum_{j=1}^K w_{ij}\). Then the Bradley-Terry model states that the probability of item \(i\) beating item \(j\), \(p_{ij}\), is: $$p_{ij} = \frac{\pi_i}{\pi_i + \pi_j}.$$
The comparison graph, \(G_W\), has the \(K\) players as the nodes and a directed edge from node \(i\) to node \(j\) whenever item \(i\) has beaten item \(j\) at least once. The MLE of the Bradley-Terry model exists and is finite if and only if the comparison graph is fully-connected (i.e. if there is a directed path from node \(i\) to node \(j\) for all items \(i\) and \(j\)).
Assuming that the comparison graph of the data is fully-connected, the MLE of the Bradley-Terry model can be found using the MM-algorithm (Hunter, 2004).
If the comparison graph of the data is not fully-connected, there are two principled options for fitting the Bradley-Terry model. One is to find the MLE within each fully-connected component. The other is to find the Bayesian MAP estimate, as suggested by Caron & Doucet (2012), where a \(Gamma(a,b)\) gamma prior is placed on each \(\pi_i\), and the product of these is taken as a prior on \(\pi\). The MAP estimate can then be found with an EM-algorithm. When \(a = 1\) and \(b = 0\), the EM and MM-algorithms are equivalent and the MAP estimate and MLE are identical. The rate parameter of the Gamma prior, \(b\), is not likelihood identifiable. When \(a > 1\), \(b\) is set to \(aK - 1\), where \(K\) is the number of items in the component; this choice of \(b\) minimises the number of iterations needed for the algorithm to converge.
The likelihood equations give $$a - 1 + W_i = b\pi_i + \sum_{j \neq i} \frac{n_{ij}\pi_i}{\pi_i + \pi_j},$$
for \(i = 1, \ldots, K\). For the algorithm to have converged, we want \(\pi\) to be such that the LHS and RHS of this equation are close for all \(i\). Therefore, we set the convergence criteria as $$\left|\frac{a - 1 + W_i}{b\pi_i + \sum_{j \neq i} \frac{n_{ij}\pi_i}{\pi_i + \pi_j}} - 1\right| < \epsilon,$$
for all \(i\).
Since the equations do not typeset well within the R help window, we recommend reading this section online: https://ellakaye.github.io/BradleyTerryScalable/reference/btfit.html.
Caron, F. and Doucet, A. (2012) Efficient Bayesian Inference for Generalized Bradley-Terry Models. Journal of Computational and Graphical Statistics, 21(1), 174-196.
Hunter, D. R. (2004) MM Algorithms for Generalized Bradley-Terry Models. The Annals of Statistics, 32(1), 384-406.
btdata
, summary.btfit
, coef.btfit
, fitted.btfit
, btprob
, vcov.btfit
, simulate.btfit
citations_btdata <- btdata(BradleyTerryScalable::citations) fit1 <- btfit(citations_btdata, 1) summary(fit1)#> $call #> btfit(btdata = citations_btdata, a = 1) #> #> $item_summary #> # A tibble: 4 x 3 #> component item estimate #> <chr> <chr> <dbl> #> 1 full_dataset JRSS-B 1.0604065 #> 2 full_dataset Biometrika 0.7897537 #> 3 full_dataset JASA 0.3095638 #> 4 full_dataset Comm Statist -2.1597241 #> #> $component_summary #> # A tibble: 1 x 4 #> component num_items iters converged #> <chr> <int> <int> <lgl> #> 1 full_dataset 4 2 TRUE #>toy_df_4col <- codes_to_counts(BradleyTerryScalable::toy_data, c("W1", "W2", "D")) toy_btdata <- btdata(toy_df_4col) fit2a <- btfit(toy_btdata, 1) summary(fit2a)#> $call #> btfit(btdata = toy_btdata, a = 1) #> #> $item_summary #> # A tibble: 7 x 3 #> component item estimate #> <chr> <chr> <dbl> #> 1 2 Han 0.69564153 #> 2 2 Gal 0.41253614 #> 3 2 Fin -1.10817768 #> 4 3 Cyd 0.59239992 #> 5 3 Amy 0.03250119 #> 6 3 Ben -0.24307179 #> 7 3 Dan -0.38182932 #> #> $component_summary #> # A tibble: 2 x 4 #> component num_items iters converged #> <chr> <int> <int> <lgl> #> 1 2 3 6 TRUE #> 2 3 4 10 TRUE #>fit2b <- btfit(toy_btdata, 1.1) summary(fit2b)#> $call #> btfit(btdata = toy_btdata, a = 1.1) #> #> $item_summary #> # A tibble: 8 x 3 #> component item estimate #> <chr> <chr> <dbl> #> 1 full_dataset Eve 1.90113420 #> 2 full_dataset Cyd 0.47237293 #> 3 full_dataset Han 0.24535391 #> 4 full_dataset Amy -0.07655328 #> 5 full_dataset Gal -0.10175687 #> 6 full_dataset Ben -0.42296697 #> 7 full_dataset Dan -0.53638389 #> 8 full_dataset Fin -1.48120003 #> #> $component_summary #> # A tibble: 1 x 4 #> component num_items iters converged #> <chr> <int> <int> <lgl> #> 1 full_dataset 8 101 TRUE #>fit2c <- btfit(toy_btdata, 1, subset = function(x) length(x) > 3) summary(fit2c)#> $call #> btfit(btdata = toy_btdata, a = 1, subset = function(x) length(x) > #> 3) #> #> $item_summary #> # A tibble: 4 x 3 #> component item estimate #> <chr> <chr> <dbl> #> 1 3 Cyd 0.59239992 #> 2 3 Amy 0.03250119 #> 3 3 Ben -0.24307179 #> 4 3 Dan -0.38182932 #> #> $component_summary #> # A tibble: 1 x 4 #> component num_items iters converged #> <chr> <int> <int> <lgl> #> 1 3 4 10 TRUE #>