This post is notes from the Coursera Data Analysis Course.

Here are some R commands that are helpful for cleaning data.

#### String Replacement

- **sub()** replaces the first occurrence of a pattern in each string
- **gsub()** replaces all occurrences
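
To make the difference concrete, here is a minimal sketch using made-up strings (the vector `x` is hypothetical, not course data). Note that the pattern is treated as a regular expression unless you pass `fixed = TRUE`.

```r
# Hypothetical example strings
x <- c("a_b_c", "no underscores", "x_y")

# sub() replaces only the first match in each element
first_only <- sub("_", " ", x)

# gsub() replaces every match in each element
all_matches <- gsub("_", " ", x)

print(first_only)
print(all_matches)
```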

#### Quantitative Variables in Ranges

- **cut(data$col, seq(0,100, by=10))** breaks the data up by the range each value falls into; in this example, whether the observation is between 0 and 10, 10 and 20, 20 and 30, and so on
- **cut2(data$col, g=6)** (from the Hmisc package) returns a factor variable with 6 groups
- **cut2(data$col, m=25)** returns a factor variable with at least 25 observations in each group
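
A short sketch of how the binning behaves, assuming a made-up vector `scores`. By default `cut()` builds intervals that are open on the left and closed on the right, so a value equal to the lowest break becomes `NA` unless you set `include.lowest = TRUE`. The `cut2()` call is guarded because it only runs if Hmisc happens to be installed.

```r
# Hypothetical scores between 0 and 100
scores <- c(5, 12, 37, 50, 88, 100)

# Bin into (0,10], (10,20], ..., (90,100]
bins <- cut(scores, seq(0, 100, by = 10))
print(table(bins))

# A value on the lowest break falls outside the default intervals
zero_default <- cut(0, seq(0, 100, by = 10))
zero_included <- cut(0, seq(0, 100, by = 10), include.lowest = TRUE)

# cut2() lives in Hmisc; run it only if the package is available
if (requireNamespace("Hmisc", quietly = TRUE)) {
  groups <- Hmisc::cut2(scores, g = 3)  # roughly equal-sized groups
  print(table(groups))
}
```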

#### Manipulating Rows/Columns

- **merge()** combines data frames
- **sort()** sorts a vector
- **order(data$col, na.last=T)** returns the indexes of the rows in sorted order
- **data[order(data$col, na.last=T),]** reorders the entire data frame based on *col*
- **melt()** in the reshape2 package reshapes data
- **rbind()** adds more rows to a data frame
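
These functions can be sketched together on two small data frames. The frames and column names (`people`, `ages`, `wide`, `id`, `name`, `age`) are hypothetical and only for illustration; the `melt()` call is guarded because it needs reshape2 installed.

```r
# Hypothetical data frames sharing an "id" column
people <- data.frame(id = c(3, 1, 2), name = c("Cy", "Al", "Bo"),
                     stringsAsFactors = FALSE)
ages <- data.frame(id = c(1, 2, 3), age = c(30, NA, 25))

# merge() joins on the shared column; the result is sorted by id
combined <- merge(people, ages, by = "id")

# sort() returns the sorted values and drops NA by default
sorted_ages <- sort(ages$age)

# order() returns row indexes; na.last = TRUE puts missing values at the end
idx <- order(combined$age, na.last = TRUE)
by_age <- combined[idx, ]

# rbind() appends rows with matching columns
more <- rbind(people, data.frame(id = 4, name = "Di", stringsAsFactors = FALSE))

# melt() turns wide data into long (id, variable, value) form
if (requireNamespace("reshape2", quietly = TRUE)) {
  wide <- data.frame(id = 1:2, height = c(170, 180), weight = c(60, 80))
  long <- reshape2::melt(wide, id.vars = "id")
  print(long)
}

print(by_age)
```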

Obviously, these functions have other parameters to do a lot more. There are also a number of other helpful R functions, but these are enough to get you started. Check the R help (?functionname) for more details.
