Portfolio

Below is a list of recent projects I've completed. Most are data science projects, but some involve mapping software and consultation. The projects draw on the following skills:
Python, Pandas, Scikit-learn, Machine Learning, Statistical Modeling, Scala, Spark, TSQL, HTML, CSS, Quick Base, MS Excel (VBA), Project Management, Research Design, Google Adwords, Google Analytics, Zapier, GoFormz, Workato, MS Suite

FEATURED

UN General Assembly: A Cluster Analysis

Despite all the recent bellicose criticism of the United Nations (some of it deserved), I happen to think the United Nations is one of the world's most important institutions. In this project I wanted to gain some insight into voting patterns for each year from the General Assembly's inception in 1946 to 2017.

I use a Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) model to identify nation-state clusters for each year. For the same time period I use Principal Component Analysis (PCA) to visualize the clusters in two dimensions. In this case the first PCA component carries over 100 times the weight of the second, which lets me identify the resolutions responsible for the greatest variance. The end result is a slider widget that allows each year to be selected, outputting a cluster visualization, the silhouette, a list of the countries in each cluster, and descriptions and classifications of the resolutions with the greatest impact on clustering.

Right now it's just the presentation, but stay tuned for the following additions:

Interactive notebook online (so you can play with the slider)
Geographic visualization on world map
Natural language processing to identify categories of resolutions
Further analysis of other data sets connected to the yearly clusters

Will my Reddit post go viral?

I'm guessing interesting content is the main predictor. That aside, it looks like we can predict which posts will be popular to a surprisingly high degree. In this project I look at how we can use the "subreddit" and the words in a post's title to predict "hotness" (that's the scientific term).

I look at whether the words "Dog" and "Cat" affect the hotness, then check the importance of all the other words in the title. We'll look at how useful these features are in a K-nearest neighbors model and then compare its performance to a random forest model.

Hate Mosquitoes? We do too...

This was a group project with my colleagues Will Long, Ryan Metz and Harmony Lee. The goal was to predict the most effective locations for the City of Chicago to spray for mosquito abatement. This was part of a Kaggle competition that can be found here: https://www.kaggle.com/c/predict-west-nile-virus

We also conducted a cost-benefit analysis. The slides below are based on the assumption that we are presenting to the City of Chicago. It's a little tongue-in-cheek in some places, but it's a serious analysis.

Let's start your project today