Quantitative Applications for Data Analysis  (Winter 2017) (Old site; new site is at https://scinet.courses)

5.4 Assignment 4

Due date: Friday, February 10th at midnight (Friday night)


 

0) Be sure to use version control ("git") as you develop your scripts. Run "git add ..." and "git commit" repeatedly as you add to your scripts. You will hand in the output of "git log" for your assignment repository as part of the assignment.

 


 

1)  Consider the product of two variables, sample1 and sample2, which are drawn from known distributions.
This product is assigned to a new variable: x = sample1 * sample2.
sample1 is drawn from a normal (Gaussian) distribution, centered on the value m1 with a standard deviation value given by sd1.
sample2 is drawn from a uniform distribution between x0 and x1.

a) Create an R script, "myFns4.R", that defines a function 'myDistrib()' accepting 5 arguments: m1, sd1, x0, x1 and N.
The function should return the vector x with N values generated as defined above.
If no values are passed to the function for its arguments, the function should use the following defaults:
m1=0.0, sd1=1.0, x0=-1.0, x1=1.0 and N=100.
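
As a rough illustration of default arguments in R, the sketch below shows one possible reading of part a). It assumes that both draws have length N and are multiplied element-wise, which the text does not state explicitly; the internal names sample1 and sample2 simply mirror the description above.

```r
# One possible sketch of part a); assumes N draws of each variable, multiplied element-wise.
myDistrib <- function(m1 = 0.0, sd1 = 1.0, x0 = -1.0, x1 = 1.0, N = 100) {
  sample1 <- rnorm(N, mean = m1, sd = sd1)   # normal draws centered on m1
  sample2 <- runif(N, min = x0, max = x1)    # uniform draws between x0 and x1
  sample1 * sample2                          # element-wise product, returned as x
}

xDefault <- myDistrib()        # no arguments: all defaults are used
xShort   <- myDistrib(N = 5)   # override only N, keep the other defaults
```

Naming the argument in the call (N = 5) lets you override a single default without repeating the others.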

b) Write an R script, named "computeSamples.R", that sources the "myFns4.R" file, loading the function 'myDistrib()' you wrote for part a).
The script will generate a vector, NS, with sample sizes for the argument N of the function 'myDistrib()' with the following values:
1 10 100 100000 1000000 10000000

Using myDistrib(), generate samples with lengths given by the vector NS, using the following values for the rest of the parameters:
m1=1.0, sd1=1.25, x0=1.0, x1=2.0

You will get full credit for this last part of the question if you find a way of implementing it using one of the *apply family functions (investigate the function mapply).
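
To give an idea of how mapply can vary one argument while holding the others fixed, here is a minimal sketch. The function drawProduct is a hypothetical stand-in for your own myDistrib(), and NS is deliberately shortened so the example runs quickly; neither is part of the assignment text.

```r
# Hypothetical stand-in for the sampling function; replace it with your own myDistrib().
drawProduct <- function(m1, sd1, x0, x1, N) {
  rnorm(N, mean = m1, sd = sd1) * runif(N, min = x0, max = x1)
}

NS <- c(1, 10, 100)   # shortened list of sample sizes, for illustration only

# mapply calls drawProduct once per element of NS;
# MoreArgs holds the arguments that stay the same in every call.
samples <- mapply(drawProduct, N = NS,
                  MoreArgs = list(m1 = 1.0, sd1 = 1.25, x0 = 1.0, x1 = 2.0),
                  SIMPLIFY = FALSE)

sampleLengths <- lengths(samples)   # one length per requested sample size
```

SIMPLIFY = FALSE keeps the result as a list, which is convenient here because the samples have different lengths.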

Use the command hist() to explore the samples generated and check whether the results are consistent with what you would expect from the definition of x.
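
One simple sanity check, sketched below, compares the sample mean with the theoretical one. It assumes sample1 and sample2 are independent, in which case E[x] = m1 * (x0 + x1) / 2; the seed and the sample size are arbitrary choices made for this illustration.

```r
set.seed(1)    # arbitrary seed, only so the run is reproducible
N <- 100000    # illustrative sample size
x <- rnorm(N, mean = 1.0, sd = 1.25) * runif(N, min = 1.0, max = 2.0)

# For independent draws, E[x] = E[sample1] * E[sample2] = m1 * (x0 + x1) / 2
expectedMean <- 1.0 * (1.0 + 2.0) / 2
sampleMean <- mean(x)

# Compute the histogram bins without drawing; drop plot = FALSE to see the figure
h <- hist(x, breaks = 50, plot = FALSE)
```

For large N the sample mean should land close to expectedMean, while for very small N it can be far off.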


2) For the following problem we will use the R package “ape” (Analyses of Phylogenetics and Evolution), which contains a function called read.GenBank() that connects to GenBank (http://www.ncbi.nlm.nih.gov/) and downloads sequences into R for you.

a) To use the “ape” package, be sure to have it installed in your R environment. You only need to install it once, so do not include the installation in your script.
Note, though, that the package must be loaded in the script for it to work properly!

For the following parts, create an R script, named “AnalizeNucleotide.R”, that performs the following steps:

b) Download from GenBank the nucleotide sequence with accession number 'NM_005368'.
Explore the options of read.GenBank() so that the script gets the DNA sequence expressed as base letters (A, T, C, G).
Explore the structure of the object returned by read.GenBank() and the information available.

c) Create a table of counts for the bases in the sequence obtained. Explore the function table() for this.
Compute, for each base, the probability of finding it within this particular sequence.
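
As a small illustration of table(), the sketch below counts the letters in a made-up toy sequence and turns the counts into proportions. The vector toyBases is invented for this example and is not the GenBank data.

```r
# Made-up toy sequence, for illustration only (not the real NM_005368 data)
toyBases <- c("a", "c", "g", "t", "a", "a", "c", "g")

counts <- table(toyBases)        # number of occurrences of each letter
probs  <- counts / sum(counts)   # relative frequencies; prop.table(counts) is equivalent
```

The entries of a table are sorted by label (here a, c, g, t), which matters when pairing the counts with a vector of probabilities.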

d) Using the base counts for the sequence 'NM_005368', perform a chi-square test assuming equal probabilities for the 4 bases, i.e. the default probabilities for the test.
Now suppose that the bases A and C are twice as probable as T and G (hint: recall that the probabilities must add up to 1, so the smallest probabilities would be 1/6). What would the result of the test be in this hypothetical case?
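
The sketch below shows how chisq.test() can be called with and without the p argument. The counts in toyCounts are invented for illustration and are not the real counts for 'NM_005368'; the order of p is assumed to follow the order of the counts (a, c, g, t).

```r
# Invented counts, for illustration only (not the real NM_005368 counts)
toyCounts <- c(a = 30, c = 20, g = 25, t = 25)

# Default: all categories are assumed equally probable
testEqual <- chisq.test(toyCounts)

# A and C twice as probable as G and T; p must follow the order of toyCounts
pAlt <- c(2, 2, 1, 1) / 6
testAlt <- chisq.test(toyCounts, p = pAlt)

testEqual$p.value   # p-value under equal probabilities
testAlt$p.value     # p-value under the alternative probabilities
```

A small p-value suggests that the observed counts are unlikely under the assumed probabilities.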


Submit your 'myFns4.R', 'computeSamples.R' and 'AnalizeNucleotide.R' files, and the output of "git log" from your assignment repository, to the 'Assignment Dropbox'. To capture the output of 'git log' use redirection, as described in lecture 2 (git log > git.log, and hand in the "git.log" file).

Assignments will be graded on a 10 point basis.
The due date is February 10th, 2017 (midnight), with a penalty of 0.5 points per day for late submissions until the cut-off date of February 17, 2017, at 10:00am.

Last Modified: Friday Feb 10, 2017 - 18:07. Revision: 8. Release Date: Friday Feb 3, 2017 - 10:00.

