In this lab, you will get practice using libraries to deal with real-world data files, and along the way you will get more practice using structs as well.
To install the libxml2 library, run the following commands and follow the instructions (using your Ubuntu password):
sudo apt update
sudo apt install libxml2-dev
~/ic210/lab09. Untar the file as follows:
tar -xvf lab09.tar
~/bin/submit -c=IC210 -p=lab09 part*.cpp
You can create an input stream from a string.
This type istringstream will be useful when you need
to analyze a given string.
For example, istringstream allows you to extract strings or numbers from a string. You are expected to understand the example code in ex_iss.cpp (in the provided files for the lab); try to compile and run it.
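As a minimal sketch of the idea (this is not the contents of ex_iss.cpp; the helper name parse_record and the sample line format are made up for illustration), an istringstream can be read with >> exactly the way you read from cin:

```cpp
#include <sstream>
#include <string>
using namespace std;

// Hypothetical helper: pull one word and two numbers out of a line of text.
// Returns false if the line does not have that shape.
bool parse_record(const string& line, string& name, int& count, double& price)
{
  istringstream iss(line);  // wrap the string in an input stream
  // operator>> behaves as it does with cin: it skips whitespace and
  // converts each token to the type of the variable being read into
  return static_cast<bool>(iss >> name >> count >> price);
}
```

Calling parse_record("widget 3 4.5", name, count, price) fills in the three variables, while a line such as "widget three" makes the function return false because "three" cannot be read as an int.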
While people use many different apps nowadays to access podcasts, the standard format for podcasts to tell those apps what episodes are available is called RSS.
RSS is part of a family of formats called XML, which you might recognize as
kind of similar to HTML documents that are used for web pages. When you untar
lab09.tar, you will see a few examples of .rss files;
feel free to explore them in your text editor.
The simplest example is the ic210.rss file, which starts like:
So you can see, an rss file mostly consists of a number of
rss.h and rss.cpp.
The files rss.h and
rss.cpp provide a simple RSS-parsing library. These rely on the libxml2 library that you installed
using the instructions at the beginning of the lab.
first10.cpp.
first10.cpp is a simple program that processes a
given RSS file and displays the first 10 entries.
The main code fragments are shown below:
// Notice, we call open_rss to open a file instead of using ifstream.
RssFile* rss = open_rss(fname);
if (!rss)
{
  cout << "ERROR: invalid rss file" << endl;
  return 1;
}
cout << "First 10 episodes in the file are below." << endl << endl;
// Declare strings to hold the info for each episode
string title, url, date;
// loop through the first 10 episodes using next_episode()
for(int episode_index = 1; episode_index <= 10; episode_index++)
{
  // fetch the next episode into rss
  bool success = next_episode(rss);
  if( !success ) // no more episode to fetch
    break;
  // call functions from rss.h to get episode information
  title = episode_title(rss);
  date = episode_date(rss);
  url = episode_url(rss);
  // display the information we just looked up
  cout << episode_index << ". " << title << endl;
  cout << "date: " << date << endl;
  cout << "url: " << url << endl << endl;
}
Read first10.cpp and rss.h carefully (including comments)!
Compile first10.cpp (along with rss.cpp) by running the following in your lab directory:
g++ `xml2-config --cflags` first10.cpp rss.cpp `xml2-config --libs` -lm -o first10
A Makefile is also provided that allows you to compile the code more conveniently by
simply typing the following:
make first10
This command compiles the file
first10.cpp along with all the options and files needed, and
produces the executable first10 that you can run as usual:
./first10
All of your code for this lab can be compiled using make.
Write a program called part1.cpp (compiling command: make
part1) that searches for a given word in the titles of the
episodes in an RSS file. (You might want to start by copying the
first10.cpp file.) Once you are finished, your search
program should work like this:
~$ ./part1
RSS filename: snap.rss
Search for: break
title: Outbreak at San Quentin
url: https://www.podtrac.com/pts/redirect.mp3/dovetail.prxu.org/320/e5e64e25-7a85-4dd1-a915-be864481c36e/1124_OutbreakAtSanQuentin_Segment1.mp3
title: Breakfast Of Champions
url: https://www.podtrac.com/pts/redirect.mp3/dovetail.prxu.org/320/d2613ab7-f4e8-4777-95dd-0bd06fe5cb87/911_BreakfastOfChamps_Podcast%2520Master.mp3

~$ ./part1
RSS filename: corec.rss
Search for: function
title: We are teaching Functional Programming Wrong
url: https://chtbl.com/track/7D91G/traffic.libsyn.com/secure/corecursive/055_-_Teaching_FP_With_Richard_Feldman.mp3?dest-id=628353
title: Http4s and Functional Web Development With Ross Baker
url: https://chtbl.com/track/7D91G/traffic.libsyn.com/secure/corecursive/017_-_Http4s.mp3?dest-id=628353
title: Algebraic Domain Modeling using Functions With Debashish Ghosh
url: https://chtbl.com/track/7D91G/traffic.libsyn.com/secure/corecursive/005_FP_Domain_Model.mp3?dest-id=628353

~$ ./part1
RSS filename: ic210.rss
Search for: party
No matches found
Note: find() works for strings.
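In the sample runs above, searching for "break" matches both "Outbreak" and "Breakfast", which suggests the comparison ignores case. One possible sketch of such a check is shown here; the helper names to_lower and title_contains are hypothetical and are not part of rss.h:

```cpp
#include <cctype>
#include <string>
using namespace std;

// Hypothetical helper: return a lowercase copy of s.
string to_lower(string s)
{
  for (char& c : s)
    c = static_cast<char>(tolower(static_cast<unsigned char>(c)));
  return s;
}

// Hypothetical helper: true if word occurs anywhere in title, ignoring case.
bool title_contains(const string& title, const string& word)
{
  // find() returns the index of the first match, or string::npos if there is none
  return to_lower(title).find(to_lower(word)) != string::npos;
}
```

With this sketch, title_contains("Breakfast Of Champions", "break") is true, and title_contains("Gotcha", "break") is false.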
Write a program called part2.cpp (compiling command:
make part2) that reports the total length of time (in days, hours,
minutes and seconds) between the oldest and newest episodes of a podcast based
on the RSS file.
You will need to utilize the episode_date function from
rss.h, which will give you a string like this for each episode:
Fri, 25 Sep 2020 00:24:00 -0000
Now, your task is to compare these dates/times to get the total duration from the oldest to the newest episode. Your code should account for leap years as well.
Now, you could do all this yourself (scanning in each part of the date/time string, doing a bunch of math with the months and leap years, etc.), but that seems pretty painful.
Instead, let’s use a standard library to do the hard work for us! It may take some effort to figure out how to use this library, but once we do, we can be confident it’s giving the right answers.
The library you want to use is called <ctime>.
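The things you will find most helpful in this library include the tm struct, the time_t type, and the functions mktime and difftime, all of which appear in the steps below.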
So basically, what you will need to do for each episode in the RSS feed is:
1. Get the date string using episode_date from the rss.h library.
2. Create a tm object.
3. Fill in the tm object by reading the date string with an istringstream.
4. Convert the tm object to a time_t using mktime.
5. Compare two times using difftime. Note that difftime can also be used to check which time is earlier or later (i.e., depending on whether the difference is positive or negative).
Take it step by step and pay careful attention! Be sure to debug your program as you go and don’t try to do everything all at once.
Your program should ultimately work like this:
Note: If you look at ic210.rss, the latest time and the earliest time are
as follows:
Write a program called part3.cpp (compiling command: make
part3) that reads in an RSS file, asks the
user for a desired episode duration, and then finds the episode whose length is
closest to that desired duration without going over.
To do this, you will need to utilize the duration tag in each
item of the RSS feed. (Important: the tag name is
qualified as itunes:duration in the example files, but when you
parse it using libxml2, you just need to specify the name
"duration".)
Unlike with the title, publication date, and URL, fetching the duration is
not provided by any of the functions in rss.h. So you have to
write it yourself! (Write the function in part3.cpp.) Here are some tips:
Look at how the functions like episode_title() work in
rss.cpp and start by imitating them! You are doing something
similar, but looking for a tag whose name is “duration”.
In some files, the duration is specified just as total seconds, for
example 412. In other files, the duration is specified in minutes
and seconds like 6:52. Some even have hours, minutes, and seconds
like 01:01:58.
The duration that the user specifies to search for can similarly either be just the seconds, or minutes:seconds, or hours:minutes:seconds. Your program must handle all three cases automatically.
At the end, print the title and URL of the closest episode, as well as how many seconds shorter than the desired duration it is.
If there are not any episodes shorter than the desired duration, then
your program should just print the message "No shorter episodes
found".
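The "closest without going over" search can be sketched as a single pass that remembers the best candidate seen so far. The function name closest_not_over is hypothetical, and this sketch assumes the durations are already in total seconds and that an episode exactly as long as the desired duration counts as a match; the real program could do the same comparison while looping with next_episode() instead of storing an array.

```cpp
// Hypothetical helper: return the index of the longest duration that does not
// exceed desired, or -1 if every duration is too long.
// Assumption: a duration equal to desired is accepted.
int closest_not_over(const int* durations, int n, int desired)
{
  int best = -1;  // -1 means no acceptable episode has been seen yet
  for (int i = 0; i < n; i++)
  {
    bool fits = durations[i] <= desired;
    if (fits && (best == -1 || durations[i] > durations[best]))
      best = i;
  }
  return best;
}
```

If the result is -1, the program prints "No shorter episodes found"; otherwise desired minus the chosen duration is the number of seconds to report.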
Suggestion: write a function that reads in a string that has seconds, or minutes:seconds, or hours:minutes:seconds, (you may want to check first how many ':'s there are in a given string) and no matter what just returns the total number of seconds. That way, your program logic can just worry about comparing total numbers of seconds.
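One way such a function might look is sketched below. The name to_seconds is hypothetical, and instead of counting the ':' characters first, this version reads one field at a time and multiplies the running total by 60 for each additional field, which covers all three formats:

```cpp
#include <sstream>
#include <string>
using namespace std;

// Hypothetical helper: convert "SS", "MM:SS", or "HH:MM:SS" to total seconds.
int to_seconds(const string& text)
{
  istringstream iss(text);
  int total = 0, part = 0;
  char colon;
  iss >> total;  // the first field, whatever unit it turns out to be
  // Each further ":NN" field shifts everything read so far up by a factor of 60
  while (iss >> colon >> part)
    total = total * 60 + part;
  return total;
}
```

For example, to_seconds("412") and to_seconds("6:52") both return 412, and to_seconds("01:01:58") returns 3718.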
Here are some example runs:
~$ ./part3
RSS filename: science.rss
Desired duration: 10:00
title: 7 Minute Workout: Fit or Fad?
url: https://traffic.megaphone.fm/GLT5377688075.mp3
12 seconds shorter

~$ ./part3
RSS filename: snap.rss
Desired duration: 1:10:00
title: Gotcha
url: https://www.podtrac.com/pts/redirect.mp3/dovetail.prxu.org/320/6d494d92-02e1-4df1-a038-090ec9c30767/1125_Gotcha_Segment1.mp3
164 seconds shorter

~$ ./part3
RSS filename: corec.rss
Desired duration: 45:45
title: Open Source Health and Diversity with Heather C Miller
url: https://chtbl.com/track/7D91G/traffic.libsyn.com/secure/corecursive/038_-_Open_Source.mp3?dest-id=628353
257 seconds shorter

~$ ./part3
RSS filename: corec.rss
Desired duration: 10:00
No shorter episodes found