JSON and Scraping

We covered two types of Web queries in our prior two lectures: using an API to retrieve JSON, and scraping HTML web pages. Today puts these two side-by-side for a hands-on comparison with in-class exercises.

Days Until Christmas

Let's start with a JSON review to illustrate what is available. Do a Web search for "current time json" and you'll see a few hits! Right away we can see that APIs are available for retrieving accurate times. Let's write a program that will always tell you how many days and hours we have until Christmas.

The first thing to do is to understand the JSON format. This WorldTimeAPI service has a very nice webpage that shows examples! Sometimes an API's documentation will have sufficient information, but sometimes it's hard to find what you need. The direct approach is to simply type an example JSON request into your Web browser, in this case: http://worldtimeapi.org/api/timezone/America/New_York

{"abbreviation":"EST","client_ip":"136.160.90.6","datetime":"2022-11-22T11:15:51.994797-05:00","day_of_week":2,"day_of_year":326,"dst":false,"dst_from":null,"dst_offset":0,"dst_until":null,"raw_offset":-18000,"timezone":"America/New_York","unixtime":1669133751,"utc_datetime":"2022-11-22T16:15:51.994797+00:00","utc_offset":"-05:00","week_number":47}
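What the browser shows is just text. In Python, the standard json module turns that text into ordinary Python objects, which is essentially what requests does for us when we call .json() on a response. A minimal offline sketch, using the sample response above pasted in as a string:

```python
import json

# The sample response shown above, pasted in as a string.
raw = ('{"abbreviation":"EST","client_ip":"136.160.90.6",'
       '"datetime":"2022-11-22T11:15:51.994797-05:00","day_of_week":2,'
       '"day_of_year":326,"dst":false,"dst_from":null,"dst_offset":0,'
       '"dst_until":null,"raw_offset":-18000,"timezone":"America/New_York",'
       '"unixtime":1669133751,"utc_datetime":"2022-11-22T16:15:51.994797+00:00",'
       '"utc_offset":"-05:00","week_number":47}')

# json.loads turns JSON text into Python objects: JSON objects become dicts,
# true/false become True/False, and null becomes None.
info = json.loads(raw)

print(info["timezone"])               # America/New_York
print(info["datetime"])               # 2022-11-22T11:15:51.994797-05:00
print(info["dst"], info["dst_from"])  # False None
```

Note that the "datetime" value is still a plain string after parsing; turning it into a real datetime object is a separate step.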

Looks good! I see a nice 'datetime' field which sure sounds familiar (Python datetime module?). We can use that to do our datetime math. Here is a starter program:

import requests
from datetime import datetime

tz = input('What timezone ("local" also an option)? ')

if tz.lower() == 'local':
    # treat "local" as US Eastern time
    tz = "America/New_York"

# Make the JSON query; .json() parses the response body into a dict
response = requests.get("http://worldtimeapi.org/api/timezone/" + tz)
info = response.json()

# Create the NOW datetime object.
...

# Create a Christmas datetime, subtract, and print!
...
  

We'll show this in class. A full solution is here.
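For reference, here is one possible sketch of the datetime math, run offline on the "datetime" string from the sample response. It assumes Python 3.7+ (for datetime.fromisoformat), and days_until_christmas is a hypothetical helper name, not part of any library:

```python
from datetime import datetime

def days_until_christmas(now):
    """Return (days, hours) from the timezone-aware datetime `now` until Christmas."""
    # Midnight on Dec 25, using the same UTC offset as `now`.
    christmas = datetime(now.year, 12, 25, tzinfo=now.tzinfo)
    if now >= christmas:
        # Christmas Day has already started this year, so count to next year's.
        christmas = christmas.replace(year=now.year + 1)
    delta = christmas - now
    # A timedelta stores whole days plus leftover seconds (less than one day).
    return delta.days, delta.seconds // 3600

# The "datetime" field from the sample response above.
now = datetime.fromisoformat("2022-11-22T11:15:51.994797-05:00")
days, hours = days_until_christmas(now)
print(f"{days} days and {hours} hours until Christmas")
# prints "32 days and 12 hours until Christmas"
```

Because fromisoformat produces a fixed UTC offset (here -05:00), the sketch assumes that same offset still applies on Christmas; it does not look up daylight-saving rules for the timezone.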

Crime Watch: JSON

The US Department of Justice has created several API services that enable the public to take part in solving crimes. These can be found at https://www.justice.gov/developer.

We are going to take a look at the FBI Crime Data API (additional documentation on use cases). I personally enjoy learning about art crimes, so we are going to focus on that. How would one go about searching the data for art crimes? Well, we could look through the art crime directory, or we can go directly to the source using the API the FBI created:

import requests

# If we know the id of the art crime, append it to the endpoint:
art_id = "9e37f643d8cd4ff29b442f901530a1ef"
query = "https://api.fbi.gov/@artcrimes/" + art_id
page = requests.get(query)
record = page.json()

# If we know the reference number, pass it as a query parameter:
reference = "00747"
query = "https://api.fbi.gov/@artcrimes?referenceNumber=" + reference
page = requests.get(query)
results = page.json()
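The two lookups above follow different URL patterns: a known id goes on the end of the path, while a reference number goes in the query string after "?". A small sketch that builds either form is below; art_crime_url is a hypothetical helper, and it uses the standard library's urlencode so that special characters in a value are escaped. With requests you can get the same query-string encoding by passing params={"referenceNumber": reference} to requests.get.

```python
from urllib.parse import urlencode

BASE = "https://api.fbi.gov/@artcrimes"

def art_crime_url(art_id=None, reference=None):
    """Build a lookup URL by record id (path) or by reference number (query)."""
    if art_id is not None:
        # A known id goes straight onto the end of the path.
        return BASE + "/" + art_id
    if reference is not None:
        # urlencode builds "key=value" and escapes any special characters.
        return BASE + "?" + urlencode({"referenceNumber": reference})
    raise ValueError("need an art_id or a reference number")

print(art_crime_url(art_id="9e37f643d8cd4ff29b442f901530a1ef"))
print(art_crime_url(reference="00747"))
# https://api.fbi.gov/@artcrimes/9e37f643d8cd4ff29b442f901530a1ef
# https://api.fbi.gov/@artcrimes?referenceNumber=00747
```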

Bonus Task: Read the documentation and write code that finds all of the stolen art in the crime category of paintings. Then filter the list by medium (e.g., oil, watercolor) and print the titles of all paintings whose medium includes oil. Solution found here.
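The filtering step can be done entirely in Python once you have a list of records. Below is a rough offline sketch. The field names "title", "crimeCategory", and "materials" are assumptions about how each record is laid out, so check them against the API documentation; the sample records are made up for illustration, and oil_painting_titles is a hypothetical helper name.

```python
def oil_painting_titles(records):
    """Titles of records in the Paintings category whose medium mentions oil.

    ASSUMPTION: each record is a dict with "title", "crimeCategory", and
    "materials" keys; verify the real field names in the API documentation.
    """
    titles = []
    for record in records:
        # Missing or null fields are treated as empty strings.
        category = (record.get("crimeCategory") or "").lower()
        medium = (record.get("materials") or "").lower()
        if category == "paintings" and "oil" in medium:
            titles.append(record.get("title"))
    return titles

# Made-up records standing in for the API's search results.
sample = [
    {"title": "Harbor at Dusk", "crimeCategory": "Paintings", "materials": "Oil on canvas"},
    {"title": "Spring Garden", "crimeCategory": "Paintings", "materials": "Watercolor on paper"},
    {"title": "Bronze Horse", "crimeCategory": "Sculpture", "materials": "Bronze"},
    {"title": "Portrait of a Lady", "crimeCategory": "Paintings", "materials": None},
]

for title in oil_painting_titles(sample):
    print(title)  # prints "Harbor at Dusk"
```

The case-insensitive substring test is deliberately simple; it would also match words such as "foil", so a stricter check (for example, splitting the medium into words) may be worth trying on the real data.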