6.4 Assignment 4
Due date: Thursday, February 11, 2021 at 11:55 pm.
You must use version control (`git`) as you develop your scripts. Start by creating a new directory and use the following commands to initialize the git repository:
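As a minimal sketch of this setup step (the directory name `assignment4` is a placeholder chosen for illustration, not a name required by the assignment), the repository could be created like this:

```bash
# Hypothetical directory name; any name works.
mkdir assignment4
cd assignment4

# Create an empty git repository in the new directory.
git init
```

After this, `git status` run inside the directory should report an empty repository with no commits yet.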
Perform `git add` and `git commit` repeatedly as you add code to your scripts. You will hand in the output of `git log` for your assignment repository as part of the assignment. Your log must show a significant number of commits that reflect the modifications made to your scripts. If your log does not show a significant and meaningful number of commits, you will lose marks.
For this assignment we will be working with real 311 Service Request data from the City of Toronto. Details about the data can be found on the City of Toronto 311 Service Requests page.
Part 1
The data is stored in the zip file located here: https://support.scinet.utoronto.ca/~alexey/T311_2010-2020.zip. This file contains Comma Separated Values (CSV) files.
To download and uncompress the data set, use the following commands at the Linux command line:
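One possible way to do this, sketched under the assumption that `wget` and `unzip` are available on your system (the helper name `fetch_311_data` is made up for this example and is not part of the assignment):

```bash
# Assumption: wget and unzip are installed; these are not necessarily
# the exact commands the course intended.
data_url="https://support.scinet.utoronto.ca/~alexey/T311_2010-2020.zip"

# Derive the archive name from the last component of the URL.
zip_file="$(basename "$data_url")"

# Hypothetical helper: download the archive, then unpack it
# in the current directory.
fetch_311_data() {
    wget "$data_url"
    unzip "$zip_file"
}
```

Calling `fetch_311_data` from inside your assignment directory runs the two commands in order; the same two commands can also be typed directly at the prompt.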
The files contain the Toronto 311 Service Request data for the years 2010 to 2020. Each file contains the data corresponding to the year specified in its name, e.g. SR2010.csv, ..., SR2020.csv.
Note that it is a good idea to do some initial exploration of the data (read the data in, use `str()` to examine the names of the columns) before you proceed to the next section.
Part 2
Write an R script, called `process311.R`, which performs the following steps.
Your script should output the following message when run from the shell terminal:
Note that part 2c) is the only part that should have a loop. All other questions should be answered using slicing.
Note the following code, which may inspire your answers for some of the above sections:
Part 3
Finally, write a shell script named `processALLyears.sh` that loops over all CSV files in your directory and calls the previous R script so that all the years are processed sequentially. The following is the skeleton of a `for` loop in bash. This code should inspire your shell script.
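As a generic sketch of the shape of a bash `for` loop (the variable names and the list of items are placeholders for illustration, not the course's own skeleton):

```bash
#!/bin/bash

# Counter used only to report how many items were visited.
count=0

# Loop over a fixed list of placeholder items; a real script
# would loop over file names instead.
for item in alpha beta gamma
do
    echo "Processing item: ${item}"
    count=$((count + 1))
done

echo "Processed ${count} items"
```

The body between `do` and `done` runs once per item, with `${item}` holding the current value on each pass.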
Start with this, then remove and add the necessary commands so that this script executes your R script for all the SR20XX.csv files. Be sure to comment your code, indent your code blocks, and use meaningful variable names.
Submit your `process311.R` and `processALLyears.sh` scripts and the output of `git log` from your assignment repository to the 'Assignment Dropbox'. To capture the output of `git log`, use redirection (`git log > git.log`) and hand in the `git.log` file.
Assignments will be graded on a 10-point basis. The due date is February 11, 2021 at 11:55 pm, with a 0.5-point penalty per day for late submission until the cut-off date of February 18, 2021 at 11:00 am.
Last Modified: Friday Feb 5, 2021 - 07:58. Revision: 6. Release Date: Thursday Feb 4, 2021 - 11:00.