btfit fits the Bradley-Terry model on (potentially) large and sparse datasets.

btfit(btdata, a, MAP_by_component = FALSE, subset = NULL, maxit = 10000,
epsilon = 0.001)

## Arguments

btdata An object of class "btdata", typically the result ob of ob <- btdata(..). See btdata. Must be >= 1. When a = 1, the function returns the maximum likelihood estimate (MLE) of $$\pi$$ (by component, if necessary). When a > 1, a is the shape parameter for the Gamma prior. See Details. Logical. Only considered if a > 1. Then, if FALSE, the MAP estimate will be found on the full dataset. If TRUE, the MAP estimate will be found separately for each fully-connected component. A condition for selecting a subset of the components. This can either be a character vector of names of the components, a single predicate function (that takes a component as its argument), or a logical vector of the same length as the number of components). The maximum number of iterations for the algorithm. If returning $$\pi$$ by component, this will be the maximum number of iterations for each component. Determines when the algorithm is deemed to have converged. (See Details.)

## Value

btfit returns an S3 object of class "btfit". It is a list containing the following components:

call

The matched call

pi

A list of length $$M$$, where $$M$$ is the number of fully-connected components of the comparison graph $$G_W$$ (or the requested subset) of two or more items. The $$m$$-th list item is a named vector $$\pi$$, the strength parameter, for the items in the $$m$$-th fully connected component, $$m = 1, \ldots, M$$. These are sorted in descending order.

iters

A vector of length $$M$$. The $$m$$-th entry is the number of iterations it took for the algorithm to converge for the $$m$$-th component, for $$m = 1, \ldots, M$$. Note that if the algorithm has not converged in any component, a warning will be produced.

converged

A logical vector of length $$M$$, indicating whether the algorithm has converged for the $$m$$-th component in maxit iterations.

N

A list of length $$M$$. The $$m$$-th list item is a matrix where each element $$n_{ij}$$ is the number of times item $$i$$ played against item $$j$$, for the items in the $$m$$-th component. The rows and columns are arranged in the same order as the ordered pi vector(s).

diagonal

A list of length $$M$$. The $$m$$-th item is a vector of the diagonal elements of the btdata$wins matrix, for the items in the $$m$$-th fully-connected component. These values are used as the fitted values for the diagonal of the matrix output in fitted.btfit. names_dimnames The names of the dimnames of the original btdata$wins matrix.

## Details

Let there be $$K$$ items, let $$\pi_k$$ be the Bradley-Terry strength parameter of item $$k$$, for $$k = 1, \ldots, K$$ and let $$\pi$$ be the vector of all the $$\pi_k$$. Let $$w_{ij}$$ be the number of times item $$i$$ wins against item $$j$$, let $$n_{ij} = w_{ij} + w_{ji}$$ be the number of times they play, with $$w_{ii} = 0$$ by convention and let $$W_i = \sum_{j=1}^K w_{ij}$$. Then the Bradley-Terry model states that the probability of item $$i$$ beating item $$j$$, $$p_{ij}$$, is: $$p_{ij} = \frac{\pi_i}{\pi_i + \pi_j}.$$

The comparison graph, $$G_W$$, has the $$K$$ players as the nodes and a directed edge from node $$i$$ to node $$j$$ whenever item $$i$$ has beaten item $$j$$ at least once. The MLE of the Bradley-Terry model exists and is finite if and only if the comparison graph is fully-connected (i.e. if there is a directed path from node $$i$$ to node $$j$$ for all items $$i$$ and $$j$$).

Assuming that the comparison graph of the data is fully-connected, the MLE of the Bradley-Terry model can be found using the MM-algorithm (Hunter, 2004).

If the comparison graph of the data is not fully-connected, there are two principled options for fitting the Bradley-Terry model. One is to find the MLE within each fully-connected component. The other is to find the Bayesian MAP estimate, as suggested by Caron & Doucet (2012), where a $$Gamma(a,b)$$ gamma prior is placed on each $$\pi_i$$, and the product of these is taken as a prior on $$\pi$$. The MAP estimate can then be found with an EM-algorithm. When $$a = 1$$ and $$b = 0$$, the EM and MM-algorithms are equivalent and the MAP estimate and MLE are identical. The rate parameter of the Gamma prior, $$b$$, is not likelihood identifiable. When $$a > 1$$, $$b$$ is set to $$aK - 1$$, where $$K$$ is the number of items in the component; this choice of $$b$$ minimises the number of iterations needed for the algorithm to converge.

The likelihood equations give $$a - 1 + W_i = b\pi_i + \sum_{j \neq i} \frac{n_{ij}\pi_i}{\pi_i + \pi_j},$$

for $$i = 1, \ldots, K$$. For the algorithm to have converged, we want $$\pi$$ to be such that the LHS and RHS of this equation are close for all $$i$$. Therefore, we set the convergence criteria as $$\left|\frac{a - 1 + W_i}{b\pi_i + \sum_{j \neq i} \frac{n_{ij}\pi_i}{\pi_i + \pi_j}} - 1\right| < \epsilon,$$

for all $$i$$.

Since the equations do not typeset well within the R help window, we recommend reading this section online: https://ellakaye.github.io/BradleyTerryScalable/reference/btfit.html.

## References

Caron, F. and Doucet, A. (2012) Efficient Bayesian Inference for Generalized Bradley-Terry Models. Journal of Computational and Graphical Statistics, 21(1), 174-196.

Hunter, D. R. (2004) MM Algorithms for Generalized Bradley-Terry Models. The Annals of Statistics, 32(1), 384-406.

btdata, summary.btfit, coef.btfit, fitted.btfit, btprob, vcov.btfit, simulate.btfit

## Examples

citations_btdata <- btdata(BradleyTerryScalable::citations)
fit1 <- btfit(citations_btdata, 1)
summary(fit1)#> $call #> btfit(btdata = citations_btdata, a = 1) #> #>$item_summary
#> # A tibble: 4 x 3
#>      component         item   estimate
#>          <chr>        <chr>      <dbl>
#> 2 full_dataset   Biometrika  0.7897537
#> 3 full_dataset         JASA  0.3095638
#> 4 full_dataset Comm Statist -2.1597241
#>
#> $component_summary #> # A tibble: 1 x 4 #> component num_items iters converged #> <chr> <int> <int> <lgl> #> 1 full_dataset 4 2 TRUE #> toy_df_4col <- codes_to_counts(BradleyTerryScalable::toy_data, c("W1", "W2", "D")) toy_btdata <- btdata(toy_df_4col) fit2a <- btfit(toy_btdata, 1) summary(fit2a)#>$call
#> btfit(btdata = toy_btdata, a = 1)
#>
#> $item_summary #> # A tibble: 7 x 3 #> component item estimate #> <chr> <chr> <dbl> #> 1 2 Han 0.69564153 #> 2 2 Gal 0.41253614 #> 3 2 Fin -1.10817768 #> 4 3 Cyd 0.59239992 #> 5 3 Amy 0.03250119 #> 6 3 Ben -0.24307179 #> 7 3 Dan -0.38182932 #> #>$component_summary
#> # A tibble: 2 x 4
#>   component num_items iters converged
#>       <chr>     <int> <int>     <lgl>
#> 1         2         3     6      TRUE
#> 2         3         4    10      TRUE
#> fit2b <- btfit(toy_btdata, 1.1)
summary(fit2b)#> $call #> btfit(btdata = toy_btdata, a = 1.1) #> #>$item_summary
#> # A tibble: 8 x 3
#>      component  item    estimate
#>          <chr> <chr>       <dbl>
#> 1 full_dataset   Eve  1.90113420
#> 2 full_dataset   Cyd  0.47237293
#> 3 full_dataset   Han  0.24535391
#> 4 full_dataset   Amy -0.07655328
#> 5 full_dataset   Gal -0.10175687
#> 6 full_dataset   Ben -0.42296697
#> 7 full_dataset   Dan -0.53638389
#> 8 full_dataset   Fin -1.48120003
#>
#> $component_summary #> # A tibble: 1 x 4 #> component num_items iters converged #> <chr> <int> <int> <lgl> #> 1 full_dataset 8 101 TRUE #> fit2c <- btfit(toy_btdata, 1, subset = function(x) length(x) > 3) summary(fit2c)#>$call
#> btfit(btdata = toy_btdata, a = 1, subset = function(x) length(x) >
#>     3)
#>
#> $item_summary #> # A tibble: 4 x 3 #> component item estimate #> <chr> <chr> <dbl> #> 1 3 Cyd 0.59239992 #> 2 3 Amy 0.03250119 #> 3 3 Ben -0.24307179 #> 4 3 Dan -0.38182932 #> #>$component_summary
#> # A tibble: 1 x 4
#>   component num_items iters converged
#>       <chr>     <int> <int>     <lgl>
#> 1         3         4    10      TRUE
#>