
Intro to Computational BioStatistics with R  (Fall 2020) (Old site; new site is at https://scinet.courses)


6.9 Assignment 9

Due date: Thursday, November 19th at midnight


Be sure to use version control ("git") as you develop your script.  Run "git add ..." and "git commit" repeatedly as you add to your script.  You will hand in the output of "git log" for your assignment repository as part of the assignment.


In this assignment you will write two functions, each producing a professional-looking plot.
Ideally you could use this approach for any figure you need to produce for your papers, thesis, etc.
To that end, we invite you to use a representative set of your own data. It does not need to be unpublished or new data, just something that resembles the data you actually deal with in your research.
If you don't have any data available, you can use other data close to your interests, either from the R data sets or from other websites, such as Open Data Toronto.
If you use one of the R data sets, do *not* use the ones we have presented and discussed in class, and try to avoid overlapping with other students as well!

If you need some inspiration, we invite you to visit our "Visualization Gallery", which is composed entirely of outstanding submissions from students in previous years. Maybe next year your plots could be displayed there as well.

Your script, named "generatePlots.R", should receive two command line arguments:

  • the first one, as usual, specifies the name of the file containing your data
  • the second one indicates which type of plot will be generated
Depending on the value of the second argument, the script will perform the following actions:

  1. if the 2nd command line argument is 2D, then the script will generate a professional/publication quality two-dimensional plot, preferably using your own data, following the criteria and conventions discussed in class.
  2. if the 2nd command line argument is 3D, then the script will generate a higher-dimensional (e.g. contour/3D/heatmap) professional quality plot, again preferably using data from your own research.
    You may use the same data used in part 1). Please make sure your plot follows the criteria outlined in class.

    The plots, in particular for part 1), should contain more than one graphical representation, i.e. a plot can *not* be just dots representing the data; it should be something like the data points + a fit, or a boxplot or barplots. At least two graphical representations should be present!
    Please select an appropriate file type for saving the plots generated in 1) and 2), one that preserves the quality of your figure!
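
As an illustration of the two plot types described above, here is a minimal sketch of two helpers that could live in plottingTools.R. It is not the expected solution: the function names (plot2D, plot3D), their arguments (xcol, ycol, z, zlab, outfile) and the choice of a linear fit and a filled contour are all assumptions made for the example. The figures are written to PDF, a vector format, which is one reasonable way to preserve quality.

```r
# Sketch of two helpers that could live in plottingTools.R.
# All names (plot2D, plot3D, xcol, ycol, z, zlab, outfile) are hypothetical.

# 2D figure: data points plus a least-squares line (two graphical
# representations), written to a vector PDF so it scales without loss.
plot2D <- function(data, xcol, ycol, outfile = "plot2D.pdf") {
  stopifnot(is.data.frame(data), xcol %in% names(data), ycol %in% names(data))
  x <- data[[xcol]]
  y <- data[[ycol]]
  fit <- lm(y ~ x)                       # simple linear fit, assumed adequate
  pdf(outfile, width = 6, height = 4.5)  # figure size in inches
  on.exit(dev.off(), add = TRUE)         # close the device even on error
  plot(x, y, pch = 19, col = "grey30", xlab = xcol, ylab = ycol,
       main = paste(ycol, "vs.", xcol))
  abline(fit, col = "firebrick", lwd = 2)
  legend("topleft", legend = c("data", "linear fit"),
         pch = c(19, NA), lty = c(NA, 1), lwd = 2,
         col = c("grey30", "firebrick"), bty = "n")
  invisible(outfile)                     # return the path to the caller
}

# Higher-dimensional figure: filled contour of a numeric matrix z,
# with both axes rescaled to [0, 1] (the filled.contour default).
plot3D <- function(z, outfile = "plot3D.pdf", zlab = "z") {
  stopifnot(is.matrix(z), is.numeric(z))
  pdf(outfile, width = 6, height = 5)
  on.exit(dev.off(), add = TRUE)
  filled.contour(z = z, color.palette = terrain.colors,
                 plot.title = title(main = "Filled contour",
                                    xlab = "x (scaled)", ylab = "y (scaled)"),
                 key.title = title(main = zlab))
  invisible(outfile)
}
```

Both functions receive everything they need through their arguments and return the output file name, so the caller can report or check where the figure was saved.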


Within your script, add comments to briefly describe what data or analysis you are using, and how you are plotting it.

Additionally,
  • you will have to create a git repository
  • your script should implement defensive programming strategies for dealing with the command line arguments
  • you will need at least two modules: a main driver script and a utilities file (named 'plottingTools.R') where the functions used for plotting in the main driver are defined.
  • the functions should take arguments for receiving information, and use return statements where they need to communicate information back to the rest of the code.
  • you must have a loading function to load the data, whether it is your own or comes from another source.
  • no global variables of any kind! i.e. functions can not access variables that are not passed to them!
  • you can also use any of the functions you have developed in previous assignments, in case you need to perform any statistical analysis to generate your plots.
  • include the usual defensive programming for command line arguments and data file checks
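
The requirements above (defensive handling of the command line arguments, a loading function, and no global variables) could be approached along the following lines. This is only a sketch under stated assumptions: the function names checkArguments and loadData are hypothetical, and the loader assumes the data is a CSV file with a header row, which may not match your data.

```r
# Sketch of driver-side helpers for generatePlots.R.
# The names checkArguments and loadData are hypothetical.

# Validate the command line arguments and return them as a named list.
checkArguments <- function(args) {
  if (length(args) != 2) {
    stop("Usage: Rscript generatePlots.R <datafile> <2D|3D>", call. = FALSE)
  }
  filename <- args[1]
  plotType <- args[2]
  if (!file.exists(filename) || dir.exists(filename)) {
    stop("Data file not found: ", filename, call. = FALSE)
  }
  if (!(plotType %in% c("2D", "3D"))) {
    stop("Second argument must be 2D or 3D, got: ", plotType, call. = FALSE)
  }
  list(filename = filename, plotType = plotType)
}

# Load the data (assumption: a CSV file with a header row) and
# return it as a data frame; nothing is stored in a global variable.
loadData <- function(filename) {
  data <- tryCatch(
    read.csv(filename, stringsAsFactors = FALSE),
    error = function(e) {
      stop("Could not read '", filename, "': ", conditionMessage(e),
           call. = FALSE)
    }
  )
  if (nrow(data) == 0) {
    stop("Data file contains no rows: ", filename, call. = FALSE)
  }
  data
}
```

In the driver these would be used as opts <- checkArguments(commandArgs(trailingOnly = TRUE)) followed by data <- loadData(opts$filename). The script can then source("plottingTools.R") and choose the plotting function based on opts$plotType, passing the data in as an argument.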

Please submit:

  • 1) your generatePlots.R script, your plottingTools.R script, and the utilities script, if you used any additional functions located in it.
  • 2) the final products of your R script, i.e. two plot files
  • 3) your data, so that the script runs successfully. If your data is too big to submit, contact us so that another means of getting us the data can be arranged. If you download the data from the internet, you may download it directly from within your script. The point is that, however you accomplish it, the script must run successfully on our computers, without modification!
  • 4) the output of 'git log' for this assignment.

Submit your main driver script and utilities file, as well as any data set you decided to use, and the output of "git log" from your assignment repository, to the 'Assignment Dropbox'.

To capture the output of 'git log', use redirection, i.e. git log > git.log, and hand in the "git.log" file.
It could be a good idea to submit the final version of your plots too.

Assignments will be graded on a 10 point basis.
Due date is November 19th, 2020 (midnight), with a penalty of 0.5 points per day for late submissions until the cut-off date of November 26th, 2020, at noon.

Last Modified: Thursday Nov 12, 2020 - 12:05. Revision: 6. Release Date: Thursday Nov 12, 2020 - 02:00.

