This page provides extra details for the data used in my NAACL-2018 paper:
Using Social Media Text to Detect Denial-of-Service Attacks: Applying NLP to Network Security
Nathanael Chambers and Ben Fry and James McMasters
NAACL-2018, New Orleans, USA. June 2018.
PDF download
Unfortunately, Twitter's terms of service prohibit us from making our collected tweets available. I will not send raw tweets, so please do not request them from me. You can reconstruct our dataset with Twitter's web-based search tool, using the company name as the keyword and constraining the search dates.
The tweets collected for this project focus on 20-day windows around known historical DDoS attacks. We searched old news articles for past attacks and kept only those whose attack date could be established with certainty from the news or related sources. The following services and dates are the final set of attacks. Note that the dates are the dates of the attacks themselves, not the publication dates of the news articles. This distinction matters because news articles also generate Twitter chatter, but the days they appear are not necessarily attack days (in fact, they often follow the day of the attack).
Organization Name | Attack Date (YYYY-MM-DD) |
---|---|
Ancestry.com | 2014-06-16 |
Ancestry.com | 2014-06-17 |
Bank of America | 2012-09-19 |
BBC Website | 2015-03-14 |
BBC Website | 2015-12-31 |
Bitcoin | 2014-02-11 |
Blizzard | 2016-08-03 |
Blizzard | 2016-08-23 |
Blizzard | 2016-08-24 |
Blizzard | 2016-08-31 |
Call of Duty | 2014-09-20 |
Chase Bank | 2012-09-19 |
Department of Justice (USA) | 2012-01-19 |
DNS | 2016-10-21 |
Evernote | 2014-06-10 |
Feedly | 2014-06-11 |
Federal Bureau of Investigation (FBI) | 2012-01-19 |
Femsplain | 2015-03-08 |
Github | 2015-03-27 |
GetResponse | 2014-04-26 |
GetResponse | 2014-04-27 |
GoDaddy | 2012-09-10 |
Hadopi | 2012-01-19 |
JANET Network | 2015-12-08 |
Komodia | 2015-02-20 |
Komodia | 2015-02-21 |
Komodia | 2015-02-22 |
Library of Congress | 2016-07-18 |
Library of Congress | 2016-07-19 |
MPAA | 2012-01-19 |
NameCheap | 2014-02-20 |
Newsweek | 2016-09-29 |
Pirate Bay | 2012-05-16 |
Pirate Bay | 2012-11-13 |
Planned Parenthood | 2015-07-29 |
Playstation | 2014-12-25 |
PNC Bank | 2012-09-19 |
PNC Bank | 2012-09-26 |
PNC Bank | 2012-09-27 |
PNC Bank | 2013-04-19 |
RIAA | 2012-01-19 |
Spamhaus | 2013-03-18 |
Spamhaus | 2013-03-19 |
Spamhaus | 2013-03-22 |
Tor Network | 2014-12-26 |
Universal Music | 2012-01-19 |
ustream | 2012-05-09 |
Wells Fargo | 2012-09-19 |
Wells Fargo | 2012-09-25 |
Wells Fargo | 2012-09-25 |
XBox Live | 2014-12-25 |
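For anyone rebuilding the dataset, the search described above (company name as keyword, dates constrained to a window around each attack) can be sketched in a few lines of Python. This is only an illustration, not the script we used. The function names `search_window` and `build_query` are hypothetical, and the split of 10 days before and 10 days after the attack is an assumption, since this page does not say how the 20 days are placed around the attack date. The `since:` and `until:` operators are the date filters accepted by Twitter's search box; check how the service treats the boundary days before relying on the exact window.

```python
from datetime import date, timedelta

# A few (organization, attack date) pairs copied from the table above.
ATTACKS = [
    ("Ancestry.com", date(2014, 6, 16)),
    ("GoDaddy", date(2012, 9, 10)),
    ("Spamhaus", date(2013, 3, 18)),
]


def search_window(attack_day, days_before=10, days_after=10):
    """Return (start, end) dates of a window around attack_day.

    ASSUMPTION: 10 days before / 10 days after is a placeholder split;
    adjust both parameters to match the window you want to collect.
    """
    start = attack_day - timedelta(days=days_before)
    end = attack_day + timedelta(days=days_after)
    return start, end


def build_query(keyword, start, end):
    """Build a search string: quoted keyword plus since:/until: date filters."""
    # Quoting the keyword asks for an exact phrase match,
    # which matters for multi-word names such as "Bank of America".
    return f'"{keyword}" since:{start.isoformat()} until:{end.isoformat()}'


if __name__ == "__main__":
    # Print one query string per attack; paste each into the search tool.
    for name, attack_day in ATTACKS:
        start, end = search_window(attack_day)
        print(build_query(name, start, end))
```

Each printed string can be pasted into the web-based search tool. Organizations with several attack dates close together (for example Blizzard in August 2016) will produce overlapping windows, so deduplicate tweets by ID if you merge the results.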
Our neural network models (Neural1 and Neural2 from the paper) are written in Python and use DyNet for learning.
Our generative model is a modified version of Partially Labeled LDA, which we implemented in Java. It is not currently packaged as an easy-to-use library. If you are interested in using it, please send me an email.
Other questions can be sent to Nate Chambers. The two co-authors, Ben Fry and James McMasters, were undergraduate students at the time of this project.