Lets imagine that we have a file of data in a .csv (comma seperated value), or .tsv (tab seperated value) file and
We want to load that information in R and treat it as a dataframe. In a previous class we worked with data regarding UFOs, so lets use
that data for now.
Download the following files:
Now if we want to read the data into R, we can do this with the read.table function,
remember to provide information on whether or not the first line is a header line, as well as
what the delimeter is between the columns for each row, usually this will be "," for commas,
or \t for tabs.
ufo100 = read.table("100.csv", header=TRUE, sep=",")
We should first look at some of the information about our dataframe:
dim() we see that we have 100 rows and 11 columns
dim(ufo100)
[1] 100 11
nrow() we see that we have 100 rows.
nrow(ufo100)
[1] 100
for loop
to walk through the data. Imagine that we want the city and state from this dataframe,
we have a few methods to retrieve this information.
for (i in 1:nrow(ufo100)) {
print(ufo100[i, 2])
print(ufo100[i, 3])
}
for (i in 1:nrow(ufo100)) {
print(ufo100[i,]$city)
print(ufo100[i,]$state)
}
If you remember our first week of the class, we had multiple options for subsetting our
data, in this case we want to walk through each row, and then subset by specific columns.
Below is a quick example of writing a function that works with our database.
ufo_shapes = function(db, state){
query = "SELECT DISTINCT shape FROM ufo WHERE state=?"
shapes = dbGetQuery(db, query, c(state))
return(shapes)
}
print(ufo_shapes(db,'tx'))
shape
1 cylinder
2 light
3 circle
4 other
5 sphere
6 unknown
7 cigar
8 diamond
9 disk
10 flash
11 formation
12 oval
13 changing
14 fireball
15 triangle
16 <NA>
17 rectangle
18 egg
19 chevron
20 cross
21 teardrop
22 cone
23 delta
Continuing our previous example, lets have use the loop through the CSV file as input into our new function.
for (i in 1:nrow(ufo100)) {
state = ufo100[i,]$state
shapes = ufo_shapes(db, state)
}
Hopefully you can see that we could easily modify this loop to perform actions like creating SQL INSERT statements. In even more advanced cases we could create a function that would walk through our CSV file, query the database to see if work needed to be done, and then do some work on the database based on the file.
for loop.
shapes = ufo_shapes(db, 'tx')
for (s in shapes) {
print('---')
print(s)
}
for (s in shapes[,1]) {
print('---')
print(s)
}
for (s in shapes[,]) {
print('---')
print(s)
}
for (s in ufo100) {
print('---')
print(s)
}
for (s in ufo100[,1]) {
print('---')
print(s)
}
for (s in ufo100[,]) {
print('---')
print(s)
}
ufo_shape that just
stores the shape column.