Search This Blog

Probability Problem - Competition with 5 players

This problem is from the book of M.G.Bulmer "Principles of Statistics"

Problem 2.6, page 28.

Five players enter a competition in which each plays a game against each of the others. 

The two players in each game have an equal chance of winning it; 

a game must end in a win for one of the players who then scores a point. 

The competition is won by the player or players scoring most points. 

What is the probability that a particular player will

(a) win the competition outright;

(b) share in winning with other players?

I could not solve this by paper and pencil so I wrote a Python program which solves it.

Here's the program. 


import pprint

def main():

    mapping = {}
    mp = {}
    
    # There are 10 games in total.
    # So there are 2^10 = 1024 possible outcomes
    # from the competition as a whole.
    
    k = 0
    for i in range(5):
        for j in range(5):
            if (i<j):
                mapping[k] = (i,j)
                k = k+1
    
    # print(mapping)
    for num in range(1024):
        lst = []
        deg = 1
        for q in range(10):
            lst.append(1 if (num & deg) else 0)
            deg = deg << 1
        mp[num] = lst
    
    pprint.pp(mp)
    
    
    player_0_wins = 0
    player_0_wins_outright = 0
    
    for competition in mp:
        
        print()
        print()
        print()
        
        points = {}
    
        n = 0
        
        for game in mp[competition]:
            
            (p,q) = mapping[n]
            
            if not p in points:
                points[p] = 0
            if not q in points:
                points[q] = 0
                
            points[p] = points[p] + game
            points[q] = points[q] + (1-game)
            
            print("Player", p, ":", "Player", q, " = ", game, " : ", 1-game)
            
            n = n+1
        
            
        print("points = ", points)
        
        # Here we suppose that:
        c1 = 1 # player 0 wins
        c2 = 1 # player 0 wins outright
        
        # Now we check if these suppositions are true:
        for p1 in points:
            if (p1 != 0) and points[p1] >= points[0]:
                c2 = 0 # does not win outright; p1 has the same number of points or more points than player 0
                break
                
        for p1 in points:
            if (p1 != 0) and points[p1] > points[0]:
                c1 = 0 # does not win; p1 has more points than player 0
                break
        
        print("Player 0 wins: ", 1==c1)
        print("Player 0 wins outright: ", 1==c2)
                
        player_0_wins = player_0_wins + c1
        player_0_wins_outright = player_0_wins_outright + c2
        
    
    print()
    print()
    print()
    
    print("Player 0 wins outright probability: ", player_0_wins_outright, '/', len(mp))
        
    print("Player 0 shares the win probability: ", player_0_wins - player_0_wins_outright, '/', len(mp))
    
    print()
    print()
    print()
    
    
    
if __name__ == "__main__":
    main()






Continuous Differentiability

If we have a single-variable function $$f : \mathbb{R} \rightarrow \mathbb{R}$$ it is quite clear what it means when we say $f$ is $C^k$ (or of class $C^k$), $k \in \mathbb{Z}, k \ge 0$: 

  • when $k \ge 1$ that means all the derivatives of $f$ up to the $k$-th one exist, and in addition to that $f^{(k)}$ is continuous 
  • when $k=0$ that means simply that $f$ is continuous


Now suppose we have the multi-variable function $$f : \mathbb{R}^m \rightarrow \mathbb{R}$$ 

Here $m \in \mathbb{N}$ and $m \ge 2$.


In that case what does it mean when we say that $f$ is of class $C^k$, $k \in \mathbb{Z}, k \ge 0$? 

I met this concept on page 145 of the Vector Calculus book by Baxandall and Liebeck, and I realized that this concept is not quite clear to me. So I did some research on that, and it turned out this concept pertains to the partial derivatives of $f$.


The best definition that I found is the one given here.


So in the multi-variable case, it turns out that

  • when we say $f$ is of class $C^k$, $k \in \mathbb{N}$, that means that all the partial derivatives of $f$ of order $s \le k$ exist and are continuous
  • when we say $f$ is of class $C$ or $C^0$, that means just that $f$ is continuous

Finally, when the function $f$ is of class $C^1$, it is also called continuously differentiable.


Partial Derivatives Notation

Suppose that we have the function $$f : \mathbb{R}^2 \rightarrow \mathbb{R}$$ which is a function of two real variables. 


We denote its second order partial derivatives in the following way

$$\frac{\partial^2 f}{\partial x^2}$$

$$\frac{\partial^2 f}{\partial y^2}$$

$$\frac{\partial^2 f}{\partial x \partial y}$$

$$\frac{\partial^2 f}{\partial y \partial x}$$


In this notation it is important to remember that by definition

$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial f}{\partial x}\left(\frac{\partial f}{\partial y}\right)$$

$$\frac{\partial^2 f}{\partial y \partial x} = \frac{\partial f}{\partial y}\left(\frac{\partial f}{\partial x}\right)$$

So if we write x first in the denominator, that means x is the second variable we differentiate by, 

and if we write y first in the denominator, that means y is the second variable we differentiate by. 

In other words, first we differentiate by the right-most variable (from the denominator), second we differentiate by the second right-most variable (from the denominator) and so on. Of course these are not really denominators (in the algebraic sense) but OK, they look like denominators.


The same applies to functions of more than two variables.


Suppose $$g : \mathbb{R}^3 \rightarrow \mathbb{R}$$ is a function of three real variables. 


Then by definition the expression 

$$\frac{\partial^3 g}{\partial x \partial y \partial z}$$ means the following


$$\frac{\partial^3 g}{\partial x \partial y \partial z} = \frac{\partial g}{\partial x}\left(\frac{\partial g}{\partial y} \left(\frac{\partial g}{\partial z}\right)\right)$$


In other words here we have differentiated first by $z$, then by $y$, and finally by $x$.



Apache Spark - Word Count Example (using RDDs)

This is a simple PySpark script which counts how many times each word occurs in a given text file. Thе script is not really perfectly polished or production-ready. Why? Well, for example words like "don't" are treated as two words "don" and "t". For producing a more professional version of this code we might want to use some specialized Python NLP library like e.g. NLTK which should be able to extract the words from the text file in a better way. Still, this script is quite useful for illustration purposes.


import re

from pyspark import SparkConf, SparkContext



def get_all_words_in_lowercase(textLine):

    # >>> The input value is a line of text from the book

    # >>> The returned value is a list of all the words from that line in lowercase

    return re.compile(r'\W+').split(textLine.lower())



conf = SparkConf().setMaster("local").setAppName("WordCount")

sc = SparkContext(conf = conf)



input = sc.textFile("file:///D:/PlayGround/book.txt")

# Now in input we have all lines from the text file. Each element of the RDD is a line of text.



rdd1 = input.flatMap(get_all_words_in_lowercase);

# We map each line to a list of all words which occur in that line.

# OK, now in rdd1 we have all words from the text file.



rdd2 = rdd1.map(lambda word : (word, 1));

# We map each word to the couple (word, 1)

# This way we produce rdd2



rdd3 = rdd2.reduceByKey(lambda x,y : x+y);

# We now count how many times each word occurs in rdd2.

# This way we produce rdd3 which contains

# couples of the form (word, word_count)



rdd4 = rdd3.map(lambda couple : (couple[1], couple[0]))

# We now change the roles of key and value.

# This way from rdd3 we get rdd4 which contains

# couples of the form (word_count, word)



rdd5 = rdd4.sortByKey(ascending=True)


lst5 = rdd5.collect()


print("Number of unique words ---> " + str(len(lst5)))


for pair in lst5:

    print(str(pair[1]) + " ---> " + str(pair[0]) + " occurrences")