
Intro to Computational BioStatistics with R (Fall 2018) (Old site; new site is at https://scinet.courses)

Wednesday May 29, 2024 - 00:00

5.5 Assignment 5

Due date: Thursday, October 25th at midnight (Thursday night).


For this assignment we will explore a powerful computational technique known as Monte Carlo. At the core of Monte Carlo (MC) techniques is the generation of random samples from probability distributions and the use of those samples to determine the probability distributions of new quantities. This relies heavily on statistical concepts such as the law of large numbers and the central limit theorem. Monte Carlo methods are used in a wide variety of contexts, from numerical integration and simulations (including simulations of genetic variation) to sophisticated prediction algorithms. Combined with another technique known as Markov chains, it gives rise to Markov Chain Monte Carlo (MCMC), which has been recognized as one of the most powerful algorithms of the last century.
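As a minimal illustration of the idea (not part of the assignment itself), the R sketch below estimates a probability whose exact value is known: P(T >= 0.2) for T drawn from an exponential distribution with rate 5, which equals exp(-1). The helper name `mcTailProb` is hypothetical. As the number of samples grows, the estimate approaches exp(-1) (law of large numbers), and its standard error shrinks roughly like 1/sqrt(n) (central limit theorem).

```r
# Hypothetical helper: Monte Carlo estimate of P(T >= threshold), T ~ Exponential(rate).
mcTailProb <- function(n, rate = 5, threshold = 0.2) {
  samples <- rexp(n, rate = rate)  # n random draws from the distribution
  mean(samples >= threshold)       # fraction of draws that satisfy the condition
}

set.seed(2018)          # fixed seed so the run is reproducible
exact <- exp(-5 * 0.2)  # analytic value, exp(-1)

for (n in c(100L, 10000L, 1000000L)) {
  pHat <- mcTailProb(n)
  se   <- sqrt(pHat * (1 - pHat) / n)  # CLT-based standard error of the estimate
  cat(sprintf("n = %7d  estimate = %.4f +/- %.4f  (exact %.4f)\n",
              n, pHat, se, exact))
}
```

Running it should show the estimates tightening around exp(-1) as n increases; the exact digits depend on the seed.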

For this assignment we will implement a simple version of an MC algorithm and apply it to calculating the likelihood of a specific event occurring. The example we will use is based on these two manuscripts from the British Journal of Sports Medicine: Rafferty et al., Br J Sports Med (2018) and Cross et al., Br J Sports Med (2017). In these papers the authors reported that the likelihood of a rugby player getting a concussion depends, on the one hand, on the number of games in which a player participates during the season and, on the other hand, on several other factors such as player acceleration, tackler speed, head contact type and tackle type. For our analysis we will assume that the only factors affecting concussion likelihood are the number of games played and the momentum (speed times mass) of the tackler. We will assume that the number of games a typical player participates in per season follows a Poisson distribution with a mean of 25. The momentum of a tackler will follow an exponential distribution with a rate of 5, in dimensionless units. Assuming that a concussion occurs when a player participates in at least 20 games and is tackled by a player carrying a momentum of at least 0.2, we would like to know the likelihood of a player having a concussion during the season. To answer this we will perform the following steps:
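Assuming the number of games G and the tackler momentum T are independent (they are sampled separately in the simulation), the probability the Monte Carlo calculation estimates also has a closed form, P(G >= 20) * P(T >= 0.2), which can serve as a cross-check. This is a sketch under that independence assumption; the variable names are illustrative only.

```r
# Closed-form reference value, ASSUMING games and momentum are independent:
#   G ~ Poisson(mean = 25), T ~ Exponential(rate = 5)
#   P(concussion) = P(G >= 20) * P(T >= 0.2)
pGames   <- ppois(19, lambda = 25, lower.tail = FALSE)  # P(G > 19), i.e. P(G >= 20)
pTackler <- pexp(0.2, rate = 5, lower.tail = FALSE)     # P(T > 0.2) = exp(-1)
pExact   <- pGames * pTackler
print(pExact)
```

Since T is continuous, P(T > 0.2) and P(T >= 0.2) coincide; for the discrete G, the `ppois(19, ...)` upper tail is what makes the condition "at least 20".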


Warning: Read this section!

0) You will continue using version control ("git") in this assignment as you develop your scripts, but instead of creating a repo from scratch you will clone one that is already initialized and use it as the starting point for your assignment's version control. You will treat this repository as your own.
This repository contains a utilities file with a plotting tool that you will use in the assignment.

---------------------------------------------------

$ pwd
/Users/mponce/MSC1090

$ ls
assignment0 assignment1 assignment2  assignment3 assignment4 

$ git clone https://gitrepos.scinet.utoronto.ca/public/MSC1090-Assignment5.git
Cloning into 'MSC1090-Assignment5'...

$ ls
assignment0 assignment1 assignment2  assignment3 assignment4 MSC1090-Assignment5

$ cd MSC1090-Assignment5

$ ls
README.md plottingTools.R

$ git log
commit 8ade1456105ae113bace1aa7a9f53fe15614672a (HEAD -> master, origin/master, origin/HEAD)
Author: Marcelo Ponce <mponce@scinet.utoronto.ca>
Date: Wed Oct 17 21:39:37 2018 -0400

adding utility file containing a plotting function to visualize convergence of MC calculation

commit ba132caed76b05e0dec3f95e9dda8ca3d4463319
Author: Marcelo Ponce <mponce@scinet.utoronto.ca>
Date: Wed Oct 17 21:23:45 2018 -0400

Starting project 'MSC1090-Assignment5': Repository for assignment #5: MC probability calculation

---------------------------------------------------

Do not run 'git init' for this assignment! As you can see above, running 'git clone ...' creates a copy ('clone') of an existing git repository, which contains the files needed for this assignment.
Be sure to continue using version control ("git") as you develop your scripts. Run "git add ..." and "git commit" repeatedly as you add to your scripts.
You will hand in the output of "git log" for your assignment repository as part of the assignment. In addition to the original commits in this repo, you must have a significant number of commits representing the modifications, alterations and changes to your scripts.
If your log does not show a significant and meaningful number of commits (in addition to the original commits!), you will lose points.


1) Create a file named myPDFs.R containing the following 3 functions:

  • a) A function named SamplingGames which will generate and return N samples drawn from a Poisson distribution with a specific mean. The function should return a vector with the random samples. The function will receive two arguments:
    1. a number indicating the number of samples to generate
    2. an optional argument indicating the mean of the distribution, with a default value of 25.
  • b) A function named SamplingTacklers which will generate and return N samples drawn from an exponential distribution with a specific rate. The function should return a vector with the random samples. The function will receive two arguments:
    1. a number indicating the number of samples to generate.
    2. an optional argument indicating the rate of the distribution, with a default value of 5.
  • c) A function named SimProbConcusion which, using the two functions created above, computes the probability of a rugby player having a concussion. The function will receive three arguments:
    1. a number representing the number of samples to consider.
    2. an optional argument that will determine the minimum number of games, thGs, at which a player will suffer a concussion; the value to be used if this argument is not specified is 20.
    3. another optional argument that will determine the minimum momentum of the tackler, thTs, (in dimensionless units) at which a player will suffer a concussion; the value to be used if this argument is not specified is 0.2.
    The function will compute the probability by following these steps:
    1. it will generate N random numbers of games played, using the function SamplingGames.
    2. it will generate N random values for the momentum of the tacklers, using the function SamplingTacklers.
    3. it will store these values in a data frame.
    4. it will estimate the probability of a concussion as the number of cases where the number of games is at least the minimum number of games needed to have a concussion (thGs) and the tackler's momentum is at least the minimum momentum needed for a concussion to occur (thTs), divided by the total number of cases, N.
    5. it will return the value computed for the probability.
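A minimal sketch of myPDFs.R following the steps listed above. The function names and the threshold arguments thGs and thTs come from the text; the other argument names (n, mu, rate) are assumptions. The threshold test uses "at least" (>=), matching the problem statement.

```r
# myPDFs.R (sketch)

# Draw n samples from a Poisson distribution: number of games played per season.
SamplingGames <- function(n, mu = 25) {
  rpois(n, lambda = mu)
}

# Draw n samples from an exponential distribution: tackler momentum (dimensionless).
SamplingTacklers <- function(n, rate = 5) {
  rexp(n, rate = rate)
}

# Estimate P(concussion) as the fraction of simulated cases in which the player
# plays at least thGs games AND is tackled with a momentum of at least thTs.
SimProbConcusion <- function(n, thGs = 20, thTs = 0.2) {
  games    <- SamplingGames(n)
  momentum <- SamplingTacklers(n)
  sims     <- data.frame(games = games, momentum = momentum)  # one row per simulated case
  hits     <- sum(sims$games >= thGs & sims$momentum >= thTs) # cases meeting both conditions
  hits / n
}
```

Storing the draws in a data frame is not strictly necessary for the calculation, but it keeps each simulated case (games, momentum) together, which is convenient if you later want to inspect or plot the samples.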

2) Create an R script called concussionMC.R that will perform the following steps:

  1. source your utilities file myPDFs.R
  2. source the auxiliary file plottingTools.R provided in the cloned repository.
  3. create a vector with values for the number of samples to consider ranging from 1000 to 1000000, in steps of 1000.
  4. compute the probability of a concussion for each case in the vector from step 3, using one of the *apply family of functions.
  5. execute the function plotMC() (you are welcome to create your own function for this point or use the one provided in the repository for this assignment), passing the vector with the sample sizes and the computed probabilities from the previous step.
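The steps of concussionMC.R can be sketched as follows. To keep the snippet self-contained it defines a compact stand-in estimator, `simProbStandIn` (a hypothetical name), instead of sourcing myPDFs.R, and it uses a shorter range of sample sizes than the assignment asks for so that it runs quickly. The call `plotMC(sizes, probs)` assumes the function takes the sample sizes first and the probabilities second, as described in step 5; base R `plot()` is used when `plotMC()` is not available.

```r
# concussionMC.R (sketch). In the real script the first steps would be:
#   source("myPDFs.R")
#   source("plottingTools.R")
# Here a compact, hypothetical stand-in estimator is defined instead.
simProbStandIn <- function(n, thGs = 20, thTs = 0.2) {
  mean(rpois(n, lambda = 25) >= thGs & rexp(n, rate = 5) >= thTs)
}

# Step 3: sample sizes. The assignment asks for seq(1000, 1000000, by = 1000);
# a shorter range is used here so the sketch runs quickly.
sizes <- seq(1000, 100000, by = 1000)

# Step 4: one probability estimate per sample size, using the *apply family.
probs <- sapply(sizes, simProbStandIn)

# Step 5: estimate vs. sample size.
if (exists("plotMC")) {
  plotMC(sizes, probs)  # assumed argument order: sizes, then probabilities
} else {
  plot(sizes, probs, type = "l",
       xlab = "Number of samples", ylab = "Estimated probability of concussion")
}
```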

Notice that since this script does not take any command-line arguments, it can be run either from the shell using Rscript or from within R. If you run it from the shell, the plotMC() function will generate a file named "Rplot.pdf" containing the plot, while if you run the script within R, the plot will pop up on your screen.
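If you would rather choose the output file yourself, one option is to open a PDF device explicitly whenever the session is non-interactive (as under Rscript), as in this sketch. The file name concussionMC.pdf is hypothetical, and the curve drawn (the exponential momentum density) is only a placeholder for your convergence plot.

```r
# Send the plot to a named PDF file when running non-interactively (e.g. via Rscript);
# in an interactive R session the plot goes to the usual on-screen device.
outFile <- "concussionMC.pdf"  # hypothetical file name

if (!interactive()) {
  pdf(outFile)
}

# Placeholder plot: density of the tackler momentum, Exponential(rate = 5).
curve(dexp(x, rate = 5), from = 0, to = 1,
      xlab = "Tackler momentum (dimensionless)", ylab = "Density")

if (!interactive()) {
  invisible(dev.off())  # close the device so the PDF is written to disk
}
```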

The following is an example of how your final plot might look:

How do you interpret the plot, and what do you think is the actual probability of a rugby player getting a concussion in a season?


Submit your "concussionMC.R" and "myPDFs.R" scripts and the output of "git log" from your assignment repository to the 'Assignment Dropbox'. Both R scripts must be added and committed frequently to the repository. To capture the output of 'git log', use redirection (git log > git.log) and hand in the "git.log" file. Assignments will be graded on a 10-point basis. The due date is October 25th 2018 (midnight), with a penalty of 0.5 points per day for late submissions until the cut-off date of November 1st 2018, at 1:00pm.


Last Modified: Monday Oct 22, 2018 - 16:29. Revision: 71. Release Date: Thursday Oct 18, 2018 - 12:00.

