Calculates the Bradley-Terry probabilities of each item in a fully-connected component of the comparison graph, \(G_W\), winning against every other item in that component (see Details).

btprob(object, subset = NULL, as_df = FALSE)

Arguments

object

An object of class "btfit", typically the result ob of ob <- btfit(..). See btfit.

subset

A condition for selecting one or more of the components. This can be a character vector of component names (i.e. a subset of names(object$pi)), a single predicate function (that takes one element of object$pi, i.e. the strength vector of a single component, as its argument), or a logical vector with one element per component (i.e. of length length(object$pi)).

as_df

Logical scalar, determining class of output. If TRUE, the function returns a data frame. If FALSE (the default), the function returns a matrix (or list of matrices). Note that setting as_df = TRUE can have a significant computational cost when any of the components have a large number of items.
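
The three forms of subset can be illustrated in base R without the package. This is only a sketch of the selection semantics, not the package's implementation: pi_list is a hypothetical stand-in for object$pi, and its strength values are made up for illustration.

```r
# Hypothetical stand-in for object$pi: one named strength vector per component.
# The numeric values are invented for illustration only.
pi_list <- list(
  `2` = c(Han = 1.2, Gal = 0.9, Fin = 0.2),
  `3` = c(Cyd = 1.5, Amy = 0.9, Ben = 0.7, Dan = 0.6)
)

# 1. Character vector of component names
by_name <- pi_list["3"]

# 2. Predicate function applied to each component's strength vector
by_predicate <- Filter(function(x) "Amy" %in% names(x), pi_list)

# 3. Logical vector with one element per component
by_logical <- pi_list[c(FALSE, TRUE)]

# All three select the same single component
stopifnot(identical(by_name, by_predicate), identical(by_name, by_logical))
```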

Value

If as_df = FALSE, returns a matrix where the \(i,j\)-th element is the Bradley-Terry probability \(p_{ij}\), or, if the comparison graph, \(G_W\), is not fully connected and btfit has been run with a = 1, a list of such matrices, one for each fully-connected component of \(G_W\). If as_df = TRUE, returns a five-column data frame: the first column is the component that the two items are in, the second column is item 1, the third column is item 2, the fourth column is the Bradley-Terry probability that item 1 beats item 2, and the fifth column is the Bradley-Terry probability that item 2 beats item 1. If the original btdata$wins matrix has named dimnames, these will be the colnames for columns two and three. See Details.
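
The relationship between the matrix and data frame outputs can be sketched in base R. The probabilities below are copied from the btprob(fit2a) output for component 2 in the Examples; the conversion itself is an assumption about how the long format lines up with the matrix, not the package's own code, and it uses a plain data.frame rather than a tibble.

```r
# Probability matrix for one component (values taken from the Examples output)
items <- c("Han", "Gal", "Fin")
p <- matrix(
  c(NA,        0.5703074, 0.8586132,
    0.4296926, NA,        0.8206436,
    0.1413868, 0.1793564, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(player1 = items, player2 = items)
)

# Each unordered pair appears once: walk the upper triangle column by column
idx <- which(upper.tri(p), arr.ind = TRUE)

long <- data.frame(
  component = "2",
  player1   = rownames(p)[idx[, 1]],
  player2   = colnames(p)[idx[, 2]],
  prob1wins = p[idx],      # p[i, j]: item 1 beats item 2
  prob2wins = t(p)[idx],   # p[j, i]: item 2 beats item 1
  stringsAsFactors = FALSE
)
print(long)
```

The two probability columns in each row sum to one, since \(p_{ij} + p_{ji} = 1\).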

Details

Consider a set of \(K\) items. Let the items be nodes in a graph and let there be a directed edge \((i, j)\) when \(i\) has won against \(j\) at least once. We call this the comparison graph of the data, and denote it by \(G_W\). Assuming that \(G_W\) is fully connected, the Bradley-Terry model states that the probability that item \(i\) beats item \(j\) is $$p_{ij} = \frac{\pi_i}{\pi_i + \pi_j},$$ where \(\pi_i\) and \(\pi_j\) are positive-valued parameters representing the skills (strengths) of items \(i\) and \(j\), for \(1 \le i, j \le K\). The function btfit can be used to estimate the strength parameter \(\pi\). It produces a "btfit" object that can then be passed to btprob to obtain the Bradley-Terry probabilities \(p_{ij}\).
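
As a worked illustration of the formula, the following base R sketch computes every \(p_{ij}\) from a vector of strengths. The vector strength and its values are hypothetical and chosen only to make the arithmetic easy to follow; this is not how btprob is implemented.

```r
# Hypothetical strength parameters for three items (illustrative values only)
strength <- c(A = 2, B = 1, C = 0.5)

# p[i, j] = strength_i / (strength_i + strength_j)
p <- outer(strength, strength, function(s_i, s_j) s_i / (s_i + s_j))

# An item is never compared with itself
diag(p) <- NA

print(round(p, 3))
```

For example, item A beats item B with probability 2 / (2 + 1) = 2/3, and B beats A with probability 1/3.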

If \(G_W\) is not fully connected, then a penalised strength parameter can be obtained using the method of Caron and Doucet (2012) (see btfit, with a > 1), which allows for a Bradley-Terry probability of any of the \(K\) items beating any of the others. Alternatively, the MLE can be found for each fully connected component of \(G_W\) (see btfit, with a = 1), and the probability of each item in each component beating any other item in that component can be found.
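
To make the notion of a fully connected component concrete, here is a small base R sketch that splits a wins matrix into groups of items that can each reach one another along directed edges of \(G_W\). The wins matrix is hypothetical, and the transitive-closure approach is just one simple way to do this; it is not claimed to be the method the package uses.

```r
# Hypothetical wins matrix: wins[i, j] = number of times i has beaten j
items <- c("A", "B", "C", "D")
wins <- matrix(0, 4, 4, dimnames = list(items, items))
wins["A", "B"] <- 2; wins["B", "A"] <- 1   # A and B have each beaten the other
wins["C", "D"] <- 1; wins["D", "C"] <- 3   # C and D have each beaten the other
wins["A", "C"] <- 1                        # but nobody in {C, D} has beaten A or B

# Directed edge (i, j) when i has beaten j at least once
adj <- wins > 0

# Transitive closure (Warshall's algorithm): reach[i, j] is TRUE when
# there is a directed path from i to j
reach <- adj | (diag(nrow(adj)) > 0)
for (k in seq_len(nrow(adj))) {
  reach <- reach | outer(reach[, k], reach[k, ], "&")
}

# Two items share a component when each can reach the other
mutual <- reach & t(reach)
component <- apply(mutual, 1, function(r) paste(names(which(r)), collapse = ","))
print(split(names(component), component))
```

Here \(G_W\) is not fully connected, so with a = 1 the MLE would be found separately for {A, B} and {C, D}, and no probability would be reported for, say, A against C.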

References

Bradley, R. A. and Terry, M. E. (1952). Rank analysis of incomplete block designs: 1. The method of paired comparisons. Biometrika, 39(3/4), 324-345.

Caron, F. and Doucet, A. (2012). Efficient Bayesian Inference for Generalized Bradley-Terry Models. Journal of Computational and Graphical Statistics, 21(1), 174-196.

See also

btfit, btdata

Examples

citations_btdata <- btdata(BradleyTerryScalable::citations)
fit1 <- btfit(citations_btdata, 1)
btprob(fit1)
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>               citing
#> cited          JRSS-B     Biometrika JASA       Comm Statist
#>   JRSS-B       .          0.5672532  0.67936229 0.9615848
#>   Biometrika   0.43274683 .          0.61779270 0.9502388
#>   JASA         0.32063771 0.3822073  .          0.9219605
#>   Comm Statist 0.03841516 0.0497612  0.07803945 .
btprob(fit1, as_df = TRUE)
#> # A tibble: 6 x 5
#>   component    cited      citing       prob1wins prob2wins
#>   <chr>        <chr>      <chr>        <dbl>     <dbl>
#> 1 full_dataset JRSS-B     Biometrika   0.5672532 0.43274683
#> 2 full_dataset JRSS-B     JASA         0.6793623 0.32063771
#> 3 full_dataset Biometrika JASA         0.6177927 0.38220730
#> 4 full_dataset JRSS-B     Comm Statist 0.9615848 0.03841516
#> 5 full_dataset Biometrika Comm Statist 0.9502388 0.04976120
#> 6 full_dataset JASA       Comm Statist 0.9219605 0.07803945
toy_df_4col <- codes_to_counts(BradleyTerryScalable::toy_data, c("W1", "W2", "D"))
toy_btdata <- btdata(toy_df_4col)
fit2a <- btfit(toy_btdata, 1)
btprob(fit2a)
#> $`2`
#> 3 x 3 sparse Matrix of class "dgCMatrix"
#>        player2
#> player1 Han       Gal       Fin
#>     Han .         0.5703074 0.8586132
#>     Gal 0.4296926 .         0.8206436
#>     Fin 0.1413868 0.1793564 .
#>
#> $`3`
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>        player2
#> player1 Cyd       Amy       Ben       Dan
#>     Cyd .         0.6364291 0.6975107 0.7259617
#>     Amy 0.3635709 .         0.5684605 0.6021258
#>     Ben 0.3024893 0.4315395 .         0.5346338
#>     Dan 0.2740383 0.3978742 0.4653662 .
#>
btprob(fit2a, as_df = TRUE)
#> # A tibble: 9 x 5
#>   component player1 player2 prob1wins prob2wins
#>   <chr>     <chr>   <chr>   <dbl>     <dbl>
#> 1 2         Han     Gal     0.5703074 0.4296926
#> 2 2         Han     Fin     0.8586132 0.1413868
#> 3 2         Gal     Fin     0.8206436 0.1793564
#> 4 3         Cyd     Amy     0.6364291 0.3635709
#> 5 3         Cyd     Ben     0.6975107 0.3024893
#> 6 3         Amy     Ben     0.5684605 0.4315395
#> 7 3         Cyd     Dan     0.7259617 0.2740383
#> 8 3         Amy     Dan     0.6021258 0.3978742
#> 9 3         Ben     Dan     0.5346338 0.4653662
btprob(fit2a, subset = function(x) "Amy" %in% names(x))
#> $`3`
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>        player2
#> player1 Cyd       Amy       Ben       Dan
#>     Cyd .         0.6364291 0.6975107 0.7259617
#>     Amy 0.3635709 .         0.5684605 0.6021258
#>     Ben 0.3024893 0.4315395 .         0.5346338
#>     Dan 0.2740383 0.3978742 0.4653662 .
#>
fit2b <- btfit(toy_btdata, 1.1)
btprob(fit2b, as_df = TRUE)
#> # A tibble: 28 x 5
#>    component    player1 player2 prob1wins prob2wins
#>    <chr>        <chr>   <chr>   <dbl>     <dbl>
#>  1 full_dataset Eve     Cyd     0.8067082 0.1932918
#>  2 full_dataset Eve     Han     0.8396707 0.1603293
#>  3 full_dataset Cyd     Han     0.5565123 0.4434877
#>  4 full_dataset Eve     Amy     0.8784344 0.1215656
#>  5 full_dataset Cyd     Amy     0.6338864 0.3661136
#>  6 full_dataset Han     Amy     0.5797890 0.4202110
#>  7 full_dataset Eve     Gal     0.8811003 0.1188997
#>  8 full_dataset Cyd     Gal     0.6397156 0.3602844
#>  9 full_dataset Han     Gal     0.5859168 0.4140832
#> 10 full_dataset Amy     Gal     0.5063006 0.4936994
#> # ... with 18 more rows