Assignment 1 R Programming Tutorials

This experiment implements a function that takes a directory of data files and a threshold for complete cases. It calculates the correlation between sulfate and nitrate for each monitor location where the number of completely observed cases (on all variables) is greater than the threshold. The function returns a vector of correlations for the monitors that meet the threshold requirement. If no monitors meet the threshold requirement, the function returns a numeric vector of length 0. This is part of the [Introduction to R programming course](https://www.coursera.org/course/rprog).

The experiment graph for this assignment is shown below. For this experiment, we uploaded the dataset as a zip file, created sample test input with the Enter Data module, and wrote the function as a script in the **Execute-R** module.

![](http://neerajkh.blob.core.windows.net/images/ass1part3_1.PNG)

The following figures show the correlation between sulfate and nitrate pollutants for the monitors whose number of complete cases exceeds the threshold.

![](http://neerajkh.blob.core.windows.net/images/ass1part3_2.PNG) ![](http://neerajkh.blob.core.windows.net/images/ass1part3_3.PNG)

Created by a Microsoft Employee
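Below is a minimal R sketch of the function described above. It is not the script used in the experiment. It assumes that every `.csv` file in `directory` holds one monitor's data, with numeric columns named `sulfate` and `nitrate`. The name `corr` is also an assumption.

```r
## Hypothetical sketch: assumes each *.csv file in `directory` holds one
## monitor's data with numeric columns named "sulfate" and "nitrate".
corr <- function(directory, threshold = 0) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  ## start with an empty numeric vector, so numeric(0) is returned
  ## when no monitor meets the threshold
  correlations <- numeric(0)
  for (f in files) {
    data <- read.csv(f)
    ## keep only rows with no missing value in any column
    complete_rows <- data[complete.cases(data), ]
    ## strictly greater than the threshold, as the description states
    if (nrow(complete_rows) > threshold) {
      correlations <- c(correlations,
                        cor(complete_rows$sulfate, complete_rows$nitrate))
    }
  }
  correlations
}
```

Initializing the result with `numeric(0)` is what makes the "no monitors qualify" case return a numeric vector of length 0 without a special branch.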

The assignment for week 2 is kind of tough if you have not used R before, and the video lectures alone do not prepare you for it. If you have not taken the swirl tutorial, I strongly recommend finishing it at the beginning of week 2. You will also want to start working on the assignment as soon as possible.

Derek Franks wrote a great tutorial. If you follow his step-by-step tutorial closely, you should have no trouble finishing several of the problems in assignment 1. Here is the link to the tutorial:


The second challenge I had with this assignment was that I did not know how to return a data frame from a function. After experimenting a bit, I finally got it to work. Here is the code for returning a data frame from a function.

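    ## hypothetical wrapper: the function name and arguments are assumed
    complete <- function(directory, id) {
        ## initiate the data frame
        results <- data.frame()
        ## loop through the files
        for (i in id) {
            ## read file and get completed cases
            ## (assumes files are named 001.csv, 002.csv, ... in directory)
            data <- read.csv(file.path(directory, sprintf("%03d.csv", i)))
            completed_cases <- sum(complete.cases(data))
            ## add to the data frame
            results <- rbind(results, data.frame(id = i, nobs = completed_cases))
        }
        ## return the data frame
        return(results)
    }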
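Growing a data frame with `rbind` inside a loop works, but R copies the accumulated data frame on every iteration. A common alternative is to build one small data frame per id and combine them once with `do.call(rbind, ...)`. The sketch below uses a hypothetical `count_complete()` helper that takes in-memory data frames instead of files, so it runs without the assignment data.

```r
## Hypothetical helper: given a named list of data frames (one per monitor id),
## return a data frame with one row per id and its number of complete cases.
count_complete <- function(monitors) {
  rows <- lapply(names(monitors), function(id) {
    data.frame(id = as.integer(id), nobs = sum(complete.cases(monitors[[id]])))
  })
  ## combine all the one-row data frames in a single rbind call
  do.call(rbind, rows)
}

## usage with made-up toy data
monitors <- list(
  "1" = data.frame(sulfate = c(1, NA, 3), nitrate = c(2, 5, NA)),
  "2" = data.frame(sulfate = c(4, 5), nitrate = c(1, 2))
)
result <- count_complete(monitors)
print(result)
```

`do.call(rbind, rows)` is equivalent to writing `rbind(rows[[1]], rows[[2]], ...)`, so the data frame is assembled once at the end rather than copied on every pass.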

The function cor is needed for one of the problems, but it is not covered in the lectures; you are expected to figure it out yourself. Its usage is actually quite simple. Suppose you read a file and store it in a data frame called data. To calculate the correlation between column 2 and column 3, call cor like this:

    cor(data[, 2], data[, 3])
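One pitfall: if either column contains NA values, cor returns NA by default. Passing `use = "complete.obs"` makes it drop incomplete rows first. Here is a small self-contained example with made-up numbers. The column layout, with sulfate in column 2 and nitrate in column 3, is an assumption for illustration.

```r
## made-up data frame; column 2 = sulfate, column 3 = nitrate
data <- data.frame(
  Date = c("2003-01-01", "2003-01-02", "2003-01-03", "2003-01-04", "2003-01-05"),
  sulfate = c(1, 2, 3, NA, 5),
  nitrate = c(2, 4, 6, 8, NA)
)

## default (use = "everything"): any NA makes the result NA
cor(data[, 2], data[, 3])
## [1] NA

## drop rows where either value is missing, then correlate
cor(data[, 2], data[, 3], use = "complete.obs")
## [1] 1
```

Only the first three rows are complete, and in them nitrate is exactly twice sulfate, which is why the second call gives a correlation of 1.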

