Detecting Denial-of-Service Attacks on Social Media:
Applying NLP to Network Security

This page provides extra details for the data used in my NAACL-2018 paper:

Using Social Media Text to Detect Denial-of-Service Attacks: Applying NLP to Network Security
Nathanael Chambers and Ben Fry and James McMasters
NAACL-2018, New Orleans, USA. June 2018.
PDF download

Tweets Corpus

Unfortunately, Twitter's terms of service prohibit us from making our collected tweets available. I will not send raw tweets, so please do not request them from me. You can duplicate our dataset by using Twitter's web-based search tool with date constraints and the company name as a keyword.
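As a rough illustration of the search described above, the sketch below builds a query string from a company name and a date range. The function name `build_search_query` is hypothetical and not part of our code. It assumes the `since:` and `until:` operators of Twitter's search box, and it assumes `until:` excludes the given day.

```python
from datetime import date, timedelta


def build_search_query(company: str, start: date, end: date) -> str:
    """Return a search string for tweets mentioning `company` from start to end, inclusive."""
    # Assumption: `until:` matches tweets sent before the given date,
    # so we advance one day to keep `end` itself in the results.
    until = end + timedelta(days=1)
    return f'"{company}" since:{start.isoformat()} until:{until.isoformat()}'


if __name__ == "__main__":
    # Example dates only; choose the range that matches the window you want.
    print(build_search_query("Github", date(2015, 3, 17), date(2015, 4, 5)))
```

Quoting the company name keeps multi-word names such as "Bank of America" together as one phrase.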

Attack Dates List

The tweets collected for this project focus on 20-day windows around known historical DDoS attacks. We searched old news articles for past attacks and compiled a list of those whose date could be determined with certainty from the news or related sources. The following services and dates are the final set of attacks. Note that these are the dates of the attacks themselves, not the publication dates of the news articles. The distinction matters because news articles also generate Twitter chatter, but the days on which they appear are not necessarily attack days (in fact, they often follow the day of the attack).

Organization Name                       Attack Date (YYYY-MM-DD)
Ancestry.com                            2014-06-16
                                        2014-06-17
Bank of America                         2012-09-19
BBC Website                             2015-03-14
                                        2015-12-31
Bitcoin                                 2014-02-11
Blizzard                                2016-08-03
                                        2016-08-23
                                        2016-08-24
                                        2016-08-31
Call of Duty                            2014-09-20
Chase Bank                              2012-09-19
Department of Justice (USA)             2012-01-19
DNS                                     2016-10-21
Evernote                                2014-06-10
Feedly                                  2014-06-11
Federal Bureau of Investigation (FBI)   2012-01-19
Femsplain                               2015-03-08
Github                                  2015-03-27
GetResponse                             2014-04-26
                                        2014-04-27
GoDaddy                                 2012-09-10
Hadopi                                  2012-01-19
JANET Network                           2015-12-08
Komodia                                 2015-02-20
                                        2015-02-21
                                        2015-02-22
Library of Congress                     2016-07-18
                                        2016-07-19
MPAA                                    2012-01-19
NameCheap                               2014-02-20
Newsweek                                2016-09-29
Pirate Bay                              2012-05-16
                                        2012-11-13
Planned Parenthood                      2015-07-29
Playstation                             2014-12-25
PNC Bank                                2012-09-19
                                        2012-09-26
                                        2012-09-27
Reddit                                  2013-04-19
RIAA                                    2012-01-19
Spamhaus                                2013-03-18
                                        2013-03-19
                                        2013-03-22
Tor Network                             2014-12-26
Universal Music                         2012-01-19
ustream                                 2012-05-09
Wells Fargo                             2012-09-19
                                        2012-09-25
                                        2012-09-25
XBox Live                               2014-12-25
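
For readers who want to work with the list programmatically, here is a minimal sketch that reads rows like those above and computes a 20-day window for each attack date. The helper names `parse_attacks` and `window` are hypothetical and not taken from our code. The placement of the window (10 days before the attack through 9 days after) is only an assumption for illustration; adjust `days_before` to match the setup described in the paper.

```python
import re
from datetime import date, timedelta

# A row ends with a date; any text before it is the organization name.
DATE_RE = re.compile(r"(\d{4}-\d{2}-\d{2})\s*$")


def parse_attacks(lines):
    """Yield (organization, attack_date); date-only rows reuse the previous name."""
    current = None
    for line in lines:
        match = DATE_RE.search(line)
        if match is None:
            continue  # header or blank line
        name = line[:match.start()].strip()
        if name:
            current = name
        yield current, date.fromisoformat(match.group(1))


def window(attack_date, length=20, days_before=10):
    """Return (first_day, last_day) of a `length`-day window, inclusive.

    Assumption: the window starts `days_before` days before the attack.
    """
    first = attack_date - timedelta(days=days_before)
    last = first + timedelta(days=length - 1)
    return first, last


if __name__ == "__main__":
    rows = ["Ancestry.com 2014-06-16", "2014-06-17", "Bank of America 2012-09-19"]
    for org, day in parse_attacks(rows):
        first, last = window(day)
        print(org, day, first, last)
```

Days inside a window that are not themselves in the list can then be treated as non-attack days for the same organization.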

Neural Network Code

Our neural network models (Neural1 and Neural2 in the paper) are written in Python and use DyNet for learning.

PLDAttack: Partially Labeled LDA

Our generative model is a modified version of Partially Labeled LDA, implemented in Java. It is not currently packaged as an easy-to-use library. If you are interested in using it, please send me an email.

Questions?

Other questions can be sent to Nate Chambers. The other two authors of the paper were undergraduate students at the time of this project.