Reading Data from files in R

Lets imagine that we have a file of data in a .csv (comma seperated value), or .tsv (tab seperated value) file and We want to load that information in R and treat it as a dataframe. In a previous class we worked with data regarding UFOs, so lets use that data for now.

Download the following files:

Now if we want to read the data into R, we can do this with the read.table function, remember to provide information on whether or not the first line is a header line, as well as what the delimeter is between the columns for each row, usually this will be "," for commas, or \t for tabs.

ufo100 = read.table("100.csv", header=TRUE, sep=",")

We should first look at some of the information about our dataframe:

Why is that interesting? Because it will help us write a for loop to walk through the data. Imagine that we want the city and state from this dataframe, we have a few methods to retrieve this information.

for (i in 1:nrow(ufo100)) {
  print(ufo100[i, 2])
  print(ufo100[i, 3])
}
for (i in 1:nrow(ufo100)) {
  print(ufo100[i,]$city)
  print(ufo100[i,]$state)
}
If you remember our first week of the class, we had multiple options for subsetting our data, in this case we want to walk through each row, and then subset by specific columns.

Working with our data from MySQL

Below is a quick example of writing a function that works with our database.

ufo_shapes = function(db, state){
  query = "SELECT DISTINCT shape FROM ufo WHERE state=?"
  shapes = dbGetQuery(db, query, c(state))
  return(shapes)
}
print(ufo_shapes(db,'tx'))
     shape
1   cylinder
2      light
3     circle
4      other
5     sphere
6    unknown
7      cigar
8    diamond
9       disk
10     flash
11 formation
12      oval
13  changing
14  fireball
15  triangle
16      <NA>
17 rectangle
18       egg
19   chevron
20     cross
21  teardrop
22      cone
23     delta

Continuing our previous example, lets have use the loop through the CSV file as input into our new function.

for (i in 1:nrow(ufo100)) {
  state = ufo100[i,]$state
  shapes = ufo_shapes(db, state)
}

Hopefully you can see that we could easily modify this loop to perform actions like creating SQL INSERT statements. In even more advanced cases we could create a function that would walk through our CSV file, query the database to see if work needed to be done, and then do some work on the database based on the file.

Problems

  1. Try the following and describe the difference in the results, we are just changing the small portions of the for loop.

    shapes = ufo_shapes(db, 'tx')
    
    for (s in shapes) {
      print('---')
      print(s)
    }
    for (s in shapes[,1]) {
      print('---')
      print(s)
    }
    for (s in shapes[,]) {
      print('---')
      print(s)
    }
  2. Do the same for the following code samples

    for (s in ufo100) {
    print('---')
    print(s)
    }
    for (s in ufo100[,1]) {
    print('---')
    print(s)
    }
    for (s in ufo100[,]) {
    print('---')
    print(s)
    }
  3. Create a table in your database called ufo_shape that just stores the shape column.
  4. Write a for loop that walks through the ufo100 data from the CSV file, and creates a SQL INSERT statement that inserts the shape found into a ufo_shape table. Make sure to use prepared statements (the one that uses the ?)