
Unsupervised Machine Learning for Global Terrorism Data using Apache Spark

  • HackerLab, Sacramento, CA

This presentation will demonstrate how to use Apache Spark for data science by applying it to the Global Terrorism Database (GTDB). It will cover the required data preparation steps (feature selection, cleaning, and transformation) before moving on to clustering, anomaly detection, and model evaluation. The entire demonstration will be presented in an Apache Zeppelin notebook and will include a brief introduction to Apache Spark and Apache Zeppelin. Attendees can follow along using the Hortonworks HDP Sandbox, a free single-node Apache Hadoop environment.

https://www.kaggle.com/START-UMD/gtd 
http://spark.apache.org/ 
http://spark.apache.org/mllib/ 
https://zeppelin.apache.org/ 
https://hortonworks.com/products/sandbox/?gclid=CMr9tJ7a8tMCFYmffgodizMGSQ 
https://en.wikipedia.org/wiki/Anomaly_detection 
https://en.wikipedia.org/wiki/Cluster_analysis

6:30 - 7:00 Networking
7:00 - 8:00 Presentation