
Intro to Computational BioStatistics with R  (Fall 2018) (Old site; new site is at https://scinet.courses)


5.8 Assignment 8

Due date: Thursday, November 15th at midnight


Be sure to use version control ("git") as you develop your script. Run "git add ...., git commit" repeatedly as you add to your script. You will hand in the output of "git log" for your assignment repository as part of the assignment.


In this assignment we would like to learn a little about your own research and field of study.
To that end, we invite you to use a representative set of your *own* data. It need not be new or unpublished, just something that resembles the data you actually deal with in your research.
If you don't have any data available, you can use other data close to your interests, either from the R data sets or from other sources, such as Open Data Toronto.
If you use any of the R data sets, do *not* use the ones we have presented and discussed in class!

The goal of this assignment is for you to combine several of the tools and techniques we have discussed in the course so far.

Mandatory points:

  • you must create a git repository
  • you must have at least two modules: a main driver script and a utilities file where the functions used by the main driver are defined.
  • the functions should have arguments and return statements, to receive their inputs and return their results.
  • you must have one loading function to load the data, whether your own or from another source.
  • no global variables of any kind! i.e., functions cannot access variables that are not passed to them!
  • you must incorporate at least four of the statistical techniques discussed in class, each one in its own function:
    • probability/statistical estimators computations
    • model fitting
    • model diagnostics
    • Monte Carlo calculations
    • statistical hypothesis testing
    • statistical power analysis
    • survival analysis
    • PCA
    • . . .
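
As an illustration only, the structural requirements above could be sketched as follows. This is a minimal sketch, not a required layout: the file names (utilities.R, main.R), the function names, and the column names "value" and "group" are all hypothetical, and the three techniques shown (estimators, a hypothesis test, a bootstrap Monte Carlo) are just examples of the kind of function the list asks for. Both "files" are shown in one block so that it runs on its own; in an actual submission the driver would load the utilities with source().

```r
# ---- utilities.R (hypothetical file name) ----

# Loading function: read a CSV file and return it as a data frame.
load_data <- function(path) {
  stopifnot(file.exists(path))
  data <- read.csv(path, stringsAsFactors = FALSE)
  return(data)
}

# Statistical estimators: sample size, mean, standard deviation and
# standard error of a numeric vector (missing values are dropped).
compute_estimators <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  return(list(n = n, mean = mean(x), sd = sd(x), se = sd(x) / sqrt(n)))
}

# Hypothesis testing: Welch two-sample t-test of `values` between the
# two levels of `group`; returns the p-value.
test_group_difference <- function(values, group) {
  result <- t.test(values ~ group)
  return(result$p.value)
}

# Monte Carlo: bootstrap estimate of the standard error of the mean,
# based on `n_rep` resamples drawn with replacement.
bootstrap_se <- function(x, n_rep = 1000) {
  boot_means <- replicate(n_rep, mean(sample(x, size = length(x), replace = TRUE)))
  return(sd(boot_means))
}

# ---- main.R (hypothetical file name) ----
# In a two-file layout this script would start with:
#   source("utilities.R")

# Main driver: everything each function needs is passed as an argument,
# so no function relies on global variables.
main <- function(path) {
  data <- load_data(path)
  estimators <- compute_estimators(data$value)
  p_value <- test_group_difference(data$value, data$group)
  boot_se <- bootstrap_se(data$value)
  return(list(estimators = estimators, p_value = p_value, boot_se = boot_se))
}
```

The driver would then be called with the path to the data file, for example main("mydata.csv"), where the file name is again a placeholder. Each analysis step lives in its own function and communicates only through its arguments and return value.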

Additionally:

  • you may use some of the functions you created for assignments #5, #6 and #7, or modifications of those.
  • you are welcome to include other statistical methods that we haven't discussed in class, but you will need to briefly explain them and also incorporate them as functions.
  • you can also include shell scripting, in case you need to handle several files, as we did in assignment #4.
  • you may re-use no more than two types of analysis from previous assignments. 

You will have to submit:

  • the git log for the repository you created
  • any data file used in the analysis
  • your main driver and utilities file
  • a short report, including the following sections:
    • Introduction: briefly introduce the field you work in, describe the data you will use, and state the goal of your analysis.
    • Methods: describe the statistical methods you will use to analyze your data. If you use a method not discussed in class, please provide a short description and a justification for choosing it.
    • Implementation: describe how you implemented the methods discussed in the previous section.
    • Results: present the results you obtained, interpreting the actual numerical values in the context of the data. If you have figures, add them here with a brief description and discussion of each.
    • Discussion: explain what advantages or disadvantages you found in using this implementation. We are especially interested in how it compares to other tools such as SPSS, STATA, SAS, G*Power, etc., particularly if you use those in your lab/group.
    • References: include references (if any) for your data and/or statistical methods.

Submit your main driver script and utilities file, as well as any data set you decided to use, your report and the output of "git log" from your assignment repository, to the 'Assignment Dropbox'.

To capture the output of 'git log' use redirection, git log > git.log, and hand in the "git.log" file.

Assignments will be graded on a 10 point basis.
The due date is November 15th, 2018 (midnight), with a penalty of 0.5 points per day for late submissions until the cut-off date of November 22nd, 2018, at 11:00am.

Last Modified: Thursday Nov 15, 2018 - 13:19. Revision: 5. Release Date: Wednesday Nov 7, 2018 - 17:00.

