In my previous posts I produced an infographic to convey the demographic details behind terrorist attacks. Gathering the data is painstaking, and the design decisions behind the infographic take some expertise, but for the most part anyone can produce one.
Let’s take the dataset a step further and see if machine learning and artificial intelligence can give us more insight into terrorists. You can create an account at BigML.com and follow along.
First, I’m using data gathered from the project “Terrorism in America After 9/11”. You can download and clean that data just as I have.
Before modeling, it helps to have a “target” field: the outcome you want BigML.com to predict. In this case we’ll use whether or not the attack was prevented. (Remember that of the nearly 300 terrorist plots since 9/11, only 25 have actually been executed, but those have left hundreds dead and wounded.) In short, we’re trying to find out whether there are demographic clusters that can help us understand who these terrorists are.
Once you have the data mapped out in Excel you may want to add a binary field in addition to the text field (1=not prevented / 0=prevented).
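If you would rather script this step than do it by hand in Excel, here is a minimal sketch using only Python's standard library. The column names (`plot_status`, `plot_not_prevented`) and the three sample rows are placeholders I made up for illustration; your cleaned file will have its own headers and values.

```python
import csv
import io

# Made-up sample rows for illustration only; the headers are assumptions,
# not the real column names from the "Terrorism in America After 9/11" data.
SAMPLE_CSV = """name,age,plot_status
A,24,prevented
B,31,not prevented
C,45,prevented
"""

# Mapping from the text field to the binary target (1 = not prevented, 0 = prevented).
STATUS_TO_FLAG = {"not prevented": 1, "prevented": 0}


def add_binary_target(rows, text_field="plot_status", binary_field="plot_not_prevented"):
    """Return copies of the rows with an extra 1/0 field derived from the text field."""
    out = []
    for row in rows:
        status = row[text_field].strip().lower()
        if status not in STATUS_TO_FLAG:
            # Fail loudly instead of silently guessing a label.
            raise ValueError(f"Unexpected status value: {row[text_field]!r}")
        new_row = dict(row)
        new_row[binary_field] = STATUS_TO_FLAG[status]
        out.append(new_row)
    return out


rows = add_binary_target(list(csv.DictReader(io.StringIO(SAMPLE_CSV))))
for row in rows:
    print(row["name"], row["plot_status"], row["plot_not_prevented"])
```

Keeping the original text field alongside the new numeric one means you can still use either as the target later.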
Next, drag and drop your Excel or CSV file into the dashboard at BigML.com. From there, create a dataset. (There are tutorials to walk you through some of this, or see my previous posts about using BigML.com.)
At this point you might try using the Dynamic Scatterplot. Basically, this is a pivot table visualization tool across your fields. You can find the button in the upper right hand corner once you create your dataset.
Because we imported the data with the binary numerical field for plot success, we can do some correlation. As we run this function, however, we note that it won’t prove very helpful because there are very few numerical fields. We can, however, view the plot results with age:
The cluster on the left shows the number of plots by age on the Y-axis and the 1 or 0 plot success on the X-axis. From here we can see what we knew already: terrorists are typically on the younger side. The 25+ plots that were not prevented (on the right side) do seem to skew younger, and none of them involved anyone over the age of 50.
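The age-versus-outcome relationship can also be checked as a single number. Below is a rough sketch of a Pearson correlation between age and the 1/0 flag, written by hand so it needs no extra libraries. The ages and flags are invented placeholders, not values from the real dataset, and I'm not claiming this is the exact calculation BigML runs.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("Need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("Correlation is undefined when one field is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented placeholder values: age of the plotter and the 1/0 flag
# (1 = not prevented, 0 = prevented).
ages = [22, 25, 31, 38, 47, 52]
not_prevented = [1, 1, 0, 0, 0, 0]

r = pearson(ages, not_prevented)
print(f"correlation between age and not_prevented: {r:.2f}")
```

A negative value would line up with the scatterplot's suggestion that plots which were not prevented skew younger, though with only about 25 such plots any coefficient should be read cautiously.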
Going back to our dataset we can start creating models to help us understand the importance of various demographics in finding commonalities. Here’s a basic “all-in” model:
This screenshot from the model shows us that, with nearly 80% confidence, the unsuccessful plots were most strongly associated with the following fields and values:
- They had no US military experience
- They were not refugees or asylum seekers
- They were not targeting Jews
- They were between the ages of 23 and 34
Some inference can be made about the people who DID succeed but the confidence level of this model is telling us that it’s easier to find clusters around those plots that failed.
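To get a feel for what a confidence figure on a rule like this can mean, here is a hedged sketch that filters rows matching the four conditions above and scores the share that were prevented. I'm using the lower bound of the Wilson score interval as a conservative confidence measure; that choice is my assumption for illustration, not a statement of how the model's figure was computed. The field names and the five sample rows are invented, and I'm treating the age range as inclusive.

```python
import math


def wilson_lower_bound(successes, total, z=1.96):
    """Lower bound of the Wilson score interval for a proportion (z=1.96 is about 95%)."""
    if total == 0:
        return 0.0
    p = successes / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


def matches_rule(row):
    """The four conditions listed above; field names are hypothetical."""
    return (
        not row["us_military"]
        and not row["refugee_or_asylum"]
        and not row["targeted_jews"]
        and 23 <= row["age"] <= 34
    )


def rule_confidence(rows):
    """Return (rows matching the rule, how many of those were prevented, Wilson lower bound)."""
    matched = [r for r in rows if matches_rule(r)]
    prevented = sum(1 for r in matched if r["not_prevented"] == 0)
    return len(matched), prevented, wilson_lower_bound(prevented, len(matched))


# Invented placeholder rows, not real records.
sample_rows = [
    {"age": 25, "us_military": False, "refugee_or_asylum": False, "targeted_jews": False, "not_prevented": 0},
    {"age": 30, "us_military": False, "refugee_or_asylum": False, "targeted_jews": False, "not_prevented": 0},
    {"age": 28, "us_military": False, "refugee_or_asylum": False, "targeted_jews": False, "not_prevented": 1},
    {"age": 45, "us_military": False, "refugee_or_asylum": False, "targeted_jews": False, "not_prevented": 0},
    {"age": 24, "us_military": True, "refugee_or_asylum": False, "targeted_jews": False, "not_prevented": 0},
]

matched, prevented, confidence = rule_confidence(sample_rows)
print(f"{prevented} of {matched} matching plots were prevented; lower bound {confidence:.2f}")
```

The lower bound shrinks quickly when few rows match a rule, which is one reason a small dataset produces weaker findings for the plots that succeeded than for the ones that failed.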
Right away we realize that our dataset lacks the breadth and depth to yield a lot of information, given the relatively small number of actual plots.
We can use the clustering reports to find some correlation, as in this screenshot:
One thing we can see is that marital status and age were an important part of the commonalities among the terrorists who succeeded in causing harm. The other variables provided did not seem to add much to that prediction.
For our next post we’re going to widen our net, grab data from the Global Terrorism Database, and see if it yields more predictive insights.