Today starts your final lecture series of the semester: interactive python and notebooks.
You've learned the basics (and some not-so-basics!) of programming with the Python language, and now it is time to learn about two Python environments that are commonly used by Python programmers and data scientists.
The first is the interactive python shell, and the second is the Jupyter Notebook.
Re-read Chapter 1: jump to the "Terminology: interpreter and compiler" section for a refresher.
Read the ipython tutorial: this official tutorial of IPython
Read a jupyter notebook tutorial: this official tutorial of IPython
You've created many Python programs in this class. Your programs are the .py text files. What happens when you click that Play button in VSCode? It doesn't run your Python program directly. It calls the Python interpreter as the real program that is executed, and the interpreter reads (and executes) your Python code line-by-line. The Play button is just a convenience that is executing this command in the terminal:
python3 your_program.py
python3 is the actual Python interpreter. You don't need the Play button at all: you can type this command into the terminal yourself, if you haven't already noticed.
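To see that the interpreter is the program really being run, here is a minimal sketch that launches a second interpreter from Python itself, much as the Play button launches python3 for you. It uses only the standard library, and the one-line program passed with -c is just a stand-in for your_program.py.

```python
import subprocess
import sys

# sys.executable is the path of the interpreter running this code,
# i.e. the program that "python3" refers to on your machine.
print("Interpreter:", sys.executable)

# Launch a separate interpreter process and hand it a tiny program.
# The -c flag passes the program as a string instead of a .py file.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a new interpreter')"],
    capture_output=True,
    text=True,
)

# The child interpreter's output comes back to us as text.
print(result.stdout.strip())
```

The point is only that running Python code always means starting the interpreter and handing it your code; the Play button hides that step.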
Re-read Chapter 1: jump to the "Terminology: interpreter and compiler" section for a refresher.
Reading: this official tutorial of IPython
Open Ubuntu separately from VS Code. This gives us a bigger window to work in. Then type ipython3 on the command line instead of python3:
ipython3
You'll enter a shell where you can type arbitrary Python commands. This program is called an interactive Python shell, and it runs the same Python interpreter that you've been using for your programs. The difference is the interface: you type commands one at a time, and the shell executes them as you go. You can define functions, write loops, or whatever else to your heart's content. The shell is somewhat intelligent: it tries to guess whether you want to execute a command or keep writing more nested code. It also performs syntax highlighting, offers tab completion, and has some other bells and whistles.
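How might a shell guess whether a command is finished? IPython has its own logic for this, but the idea can be sketched with the standard library's codeop module, which is what Python's plain interactive console is built on. The helper name is_complete is made up for this illustration.

```python
import codeop

def is_complete(source):
    """Return True if source is a finished command, False if more lines are expected."""
    # compile_command returns None when the input is valid so far but unfinished.
    return codeop.compile_command(source) is not None

# A finished one-line statement
print(is_complete("x = 1"))

# A function header that is still waiting for its body
print(is_complete("def tempy(x):"))

# Header plus body, ended by a newline (like pressing Enter on a blank line)
print(is_complete("def tempy(x):\n    print('hello', x)\n"))
```

This is only an approximation of what ipython3 does, but it shows why the shell waits for more input after a line ending in a colon.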
Try defining a function like this and then calling it later:
def tempy(x):
    print('hello', x)
The session remembers everything you've defined, and lets you "pause" as you write your program. You can import other Python programs as well, which is useful for testing on the fly.
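As a rough sketch of such a session, the lines below are what you might type one at a time at the prompt. The argument 'world' and the use of the math module are arbitrary choices for illustration.

```python
# Typed first: define the function. Nothing is printed yet.
def tempy(x):
    print('hello', x)

# Typed later: the session still remembers tempy, so we can call it.
tempy('world')

# Imports persist in the same way for the rest of the session.
import math
root = math.sqrt(16)
print(root)
```

In the shell each statement runs as soon as you finish it, so you can inspect a value, change your mind, and redefine tempy without restarting anything.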
The Jupyter Notebook is a combined Python interpreter, visualizer, and document manager. It runs an interactive Python shell like above, but it creates a wrapper around it to let developers create Python programs interleaved with descriptive text, data, charts, etc. Professor Chambers likes to refer to it as point-and-click execution. You still write Python programs, but it doesn't run like a normal program from top to bottom without hesitation. You instead split your program's logic into chunks, and you click buttons to tell each section to run when you're ready for it. As you'll see in the next couple of lectures, this gives you flexibility to debug your program along the way, make changes quickly, and analyze your data.
Create your first Notebook: Open VS Code and select "New File..." from your File menu. You should see Jupyter Notebook as an option in the subsequent drop down menu. Select that and a Notebook is created!
You'll see a text editor similar to before, but this time in a new box. The Play button is on the left-hand side of the box now. Add a couple of print() statements:
Click the Play button to run it! You might get a message about installing an IPython kernel. If you do, open a terminal, make sure you're in your SD211 conda environment, and install it:
conda activate sd211
conda install ipykernel
You should see your print statements run, but look what happened. The output is not in a Terminal shell like normal. Instead, Python ran in the background, and your program's output is added to your Notebook as text.
Move your mouse over the text, and you'll see a + Code button appear. See my image above. Click that, and it starts a new cell of your program. This cell can be run separately from the first, but it builds on the first and uses whatever has been declared so far. Let's try adding two cells, and you try to mimic this:
When things get confusing: you have a lot of power with this cell-programming structure, but it can also lead to some confusion. Make a change to your lists in the 2nd cell, but don't execute that cell. Instead, execute the 3rd cell again. Your lists are NOT updated with the 2nd cell's changes. They still contain the prior data. In fact, hit the 3rd cell's Play button several times, and your list will be appended to each time! The output no longer corresponds to what a single execution of your "program" would ever produce.
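The behavior described above can be imitated in plain Python. This is a minimal sketch, not how Jupyter is implemented: a single dictionary stands in for the notebook's memory, each "cell" is a string of code, and the list name and its contents are invented.

```python
# One shared namespace stands in for the memory of a running notebook.
namespace = {}

# Hypothetical cell contents (the names and values are made up).
cell_2 = "names = ['alice', 'bob']"
cell_3 = "names.append('carol')\nprint(names)"

exec(cell_2, namespace)   # run cell 2: creates the list
exec(cell_3, namespace)   # run cell 3: appends once

# Edit cell 2 but do NOT run it: the list in memory is unchanged.
cell_2 = "names = ['dave']"

exec(cell_3, namespace)   # run cell 3 again: appends to the OLD list
exec(cell_3, namespace)   # and once more

final_names = namespace['names']
```

After these steps the list holds the original two names followed by three copies of 'carol', and 'dave' never appears, because the edited cell was never executed.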
So what?: data science deals with lots of data. We've already had a few labs that pushed the limits of your computers, such as the Twitter lab and the Authorship lab. If we run the data-loading part only once, then we can run the second computation part as often as we wish without having to wait for the data to be re-loaded each time. This lets you debug problems much more easily, and it lets you play with visualizations more efficiently.
We finished the JSON lecture series with an example that queries stocks. Let's develop that solution in a Notebook instead of a standalone Python program. Create a new blank Notebook and copy this starter code into the first cell:
import requests

stock = input('Stock Ticker? ').lower()

# Build the JSON query
query = 'http://query1.finance.yahoo.com/v11/finance/quoteSummary/' + stock + '?modules=financialData'

# Browser-like headers sent along with the request
header = {'Connection': 'keep-alive',
          'Expires': '-1',
          'Upgrade-Insecure-Requests': '1',
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) '
                        'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'
          }

# Make the request inside a session, then parse the reply as JSON
with requests.Session() as session:
    website = session.get(query, headers=header)
    json = website.json()
Hit play on that cell and enter "GME" as the ticker. You just queried for the JSON, and the result is in your json variable.
Now make a new code cell separate from the first:
print(json)
You can now play with that command over and over again, hitting play on its cell only, and it will always operate on the same json that was returned. You are not making a Web query anymore, instead keeping the data in memory while you develop and debug. See if you can recover how to extract the current price using this approach.
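One way to explore a nested result like this, cell by cell, is to print its keys level by level. The sketch below is only an illustration: show_keys is a made-up helper, and the sample dictionary is invented and is not the structure that the stock query returns.

```python
def show_keys(data, indent=0):
    """Print the keys of a nested dict/list structure, indenting each level."""
    if isinstance(data, dict):
        for key, value in data.items():
            print(' ' * indent + str(key))
            show_keys(value, indent + 2)
    elif isinstance(data, list):
        # Lists have no keys of their own, so look inside each item.
        for item in data:
            show_keys(item, indent)

# Invented stand-in data, NOT the real response format.
sample = {'outer': {'items': [{'price': {'raw': 12.5, 'fmt': '12.50'}}]}}

show_keys(sample)

# Once you know the path of keys, index your way down to the value.
price = sample['outer']['items'][0]['price']['raw']
print(price)
```

In a notebook you could put show_keys in one cell and call it on your own json variable in another, re-running only the second cell as you work out the path to the value you want.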
Pro-tip: hit <shift>-enter with your cursor anywhere in a cell, and it will execute the entire cell.
The big limitation you'll realize soon enough is that you can't have loops that span multiple cells. In the stock example above, there is no loop that allows you to keep entering stocks. Notebooks are a step removed from a user-interface program for this reason. You might write a really long notebook with lots of cells, and then realize you want to run it a few times on different input. Unfortunately you're sort of stuck. The solution is to merge all the cells into one, and then put a loop around it, but you might realize that you've essentially turned it into a regular Python program at that point.
A potential fix to the above is to make each cell a function definition, and then write a new cell with a loop that just calls all the functions in sequence. If that feels strange...that's because it sort of is.
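Here is a rough sketch of that idea. The function names load and summarize are hypothetical, and load returns invented numbers instead of making a Web query so that the example runs on its own.

```python
def load(ticker):
    """First cell as a function: stand-in for the Web query (returns fake data)."""
    fake_prices = {'gme': 10.0, 'aapl': 20.0}   # invented numbers, no network access
    return {'ticker': ticker, 'price': fake_prices[ticker.lower()]}

def summarize(data):
    """Second cell as a function: turn the loaded data into a line of text."""
    return data['ticker'].upper() + ' costs ' + str(data['price'])

# Final cell: a loop that calls the functions in sequence for each input.
summaries = []
for ticker in ['GME', 'AAPL']:
    data = load(ticker)
    summaries.append(summarize(data))
    print(summaries[-1])
```

Each function can still be developed and tested in its own cell, while the last cell gives you the loop that separate cells cannot express.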
Notebooks are great for single-sequence data processing projects, and you'll end up really enjoying them. Just keep in mind their limitations.