Winning Wordle with Data Science
POSTED ON: Monday, April 25, 2022 9:28 AM by MC1 Jordyn Diomede
The popular new game Wordle has been trending across the globe, and here at the U.S. Naval Academy (USNA), professor Joel Esposito has built the game into his new data science and artificial intelligence (AI) elective course.
“As you may be aware there is a big push on embracing AI and data science in all areas of industry and government,” said Esposito. “The Navy is no exception, and this year, USNA is standing up a data science major.”
In his new data science and AI course, the first class project was on web scraping, which entails writing computer programs that automatically collect data from websites. Data gathered this way are often used to train “deep learning” AI models. He said examples of those projects included collecting COVID-19 data from the CDC website, analyzing population data from Wikipedia entries, and collecting thousands of images of common objects from Amazon.
As someone who enjoys word puzzles, Esposito was hooked as soon as he discovered Wordle. He noted that he wasn’t the only one, as the game has become immensely popular over the last couple of months.
Within a week, midshipmen in his class went from having never heard of the game to talking about that day’s puzzle prior to the start of class, bringing up a commonly debated question, “what opening guess is best?”
“In Wordle you open with a blind guess,” he said. “People try to pick words with common letters or the most vowels, and they seem to have very strong emotions about this issue, often having their own favorite choices with little evidence to back it up.”
Esposito said he saw this as the perfect opportunity to take a topic the midshipmen were clearly interested in and passionate about and show them how the skills they were acquiring in the elective could be used outside the classroom.
“So I wrote a ‘web scraping’ computer program to visit a popular Scrabble website and download a list of 8,913 5-letter English words,” he said. “The program then filters out all repeated letter words – which are not good first guesses – leaving about 5,823 words. It then computes the frequency of appearance of each letter of the alphabet.”
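The filtering and counting steps Esposito describes can be sketched in a few lines of Python. This is only an illustration, not his program: the short `sample_words` list stands in for the downloaded Scrabble list, and the function names are hypothetical. Because words with repeated letters are removed first, counting letters per word and counting them per occurrence give the same result.

```python
from collections import Counter


def has_repeated_letter(word):
    """True if any letter appears more than once in the word."""
    return len(set(word)) < len(word)


def letter_frequencies(words):
    """Fraction of the given words that contain each letter."""
    counts = Counter()
    for word in words:
        counts.update(set(word))  # count each letter once per word
    total = len(words)
    return {letter: n / total for letter, n in counts.items()}


# Tiny stand-in list (assumption); the article's program used 8,913 words.
sample_words = ["raise", "stair", "clone", "dumpy", "xylyl", "geese", "arise"]

# Step 1: drop words with repeated letters.
candidates = [w for w in sample_words if not has_repeated_letter(w)]

# Step 2: compute how often each letter appears among the remaining words.
freqs = letter_frequencies(candidates)

# Most common letters first; ties broken alphabetically.
top_letters = sorted(freqs, key=lambda letter: (-freqs[letter], letter))[:5]
print(candidates)
print(top_letters)
```

On a full word list, the same two steps would produce the letter ranking that a first-guess recommendation could be built on.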
A big part of data science is the presentation of complex data in an easy-to-digest format. Esposito took that mindset and created a graphic showing the optimal word guesses based on how many blind guesses the user was willing to make.
He determined through his research that the word RAISE has a 42% chance of discovering at least one letter or eliminating popular letters to narrow possible answers. Additionally, by entering the words STAIR, CLONE or DUMPY, users have more than an 80% chance of getting an easy win on the fourth guess out of the six potential opportunities available.
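The article does not say exactly how these percentages were calculated. One simple reading, offered here only as an assumption, is to measure the share of possible answers that have at least one letter in common with the opening guess or guesses. The function name `reveal_chance` and the five-word `answers` list below are hypothetical, and the sketch is not expected to reproduce the figures quoted above.

```python
def reveal_chance(guesses, answers):
    """Share of answers that contain at least one letter from the guesses.

    Assumption: a guess "reveals a letter" when it shares any letter
    with the hidden answer, regardless of position.
    """
    letters = set("".join(guesses).lower())
    hits = sum(1 for answer in answers if letters & set(answer.lower()))
    return hits / len(answers)


# Hypothetical answer list, far smaller than any real word list.
answers = ["crane", "moist", "pudgy", "wheat", "fjord"]

print(reveal_chance(["RAISE"], answers))
print(reveal_chance(["XYLYL"], answers))
print(reveal_chance(["STAIR", "CLONE", "DUMPY"], answers))
```

Under this reading, adding more blind guesses can only keep the chance the same or raise it, since each extra word widens the set of letters being tested.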
“Whatever you pick can’t be worse than XYLYL,” said Esposito. “Yes, that is a real five-letter word with only a 10% chance of revealing a letter.”
From doing the New York Times crossword puzzle every day to winning Wordle with data science, Esposito continues to do his part to educate midshipmen.
“The Navy is committed to leveraging data science, and USNA has recently made a huge commitment to educating the brigade,” said Esposito. This has been done by introducing not only a new data science major, but also electives that support this field of study and various other majors like robotics and control engineering.
For more information about the Naval Academy, please see www.usna.edu or our Facebook page at https://www.facebook.com/USNavalAcademy.