Data Science Resources

Code and Analysis Resources

eagles: This is a Python package I developed, available via PyPI, that is meant to help practitioners perform analyses and develop models. For examples of how to use the package, see the example Jupyter notebooks in the eagles GitHub repository.

Data Science Tools and Examples: In this GitHub repository I provide various workflows and examples of performing different types of analyses, including examples using neural networks, regression and others.

CDA for NonNets - In this RPub I provide an example of how to use community detection analysis (CDA) to cluster data that are not network data. I also compare it to other clustering methods (i.e. k-means and hierarchical clustering) to show its advantages over these algorithms. Functions for running the analysis described in the article can be found here.

  • Follow this link to see an example of combining multiple imputations and clustering.
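The general idea behind clustering non-network data with community detection can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method from the RPub or its linked functions: it assumes the data are first turned into a symmetric k-nearest-neighbour graph, and it uses label propagation as the community detection step purely as an illustrative choice. The function names (`knn_graph`, `label_propagation`, `cda_cluster`) and the parameter `k` are hypothetical.

```python
import math
from collections import Counter


def knn_graph(points, k):
    """Build a symmetric, unweighted k-nearest-neighbour graph as adjacency sets.

    Assumption: this graph construction is illustrative; the RPub may build
    its graph from the data differently.
    """
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        # Sort the other points by (distance, index) so ties break deterministically.
        others = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for _, j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrise: connect i and j if either picks the other
    return adj


def label_propagation(adj, max_iter=100):
    """Asynchronous label propagation: each node adopts its neighbours' majority label.

    Assumption: label propagation stands in for whatever community detection
    algorithm the original analysis uses.
    """
    labels = list(range(len(adj)))  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in range(len(adj)):
            if not adj[node]:
                continue  # an isolated node keeps its own label
            counts = Counter(labels[nb] for nb in adj[node])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            # Keep the current label if it is already among the most common;
            # otherwise take the smallest candidate so runs are reproducible.
            if labels[node] not in best:
                labels[node] = min(best)
                changed = True
        if not changed:
            break
    # Renumber communities 0, 1, 2, ... in order of first appearance.
    remap = {}
    return [remap.setdefault(lab, len(remap)) for lab in labels]


def cda_cluster(points, k=3):
    """Cluster plain (non-network) data by community detection on a k-NN graph."""
    return label_propagation(knn_graph(points, k))


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(cda_cluster(points, k=2))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

One practical difference from k-means in this sketch is that the number of clusters is not specified up front; it falls out of the graph structure, which is controlled here by the neighbourhood size `k`.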

ggplot cheatsheet - ggplot is a very useful tool for plotting and visualizing data in R. The link provided takes you to a simple cheat sheet for using ggplot.

Learn Data Science

Data School - This is a great resource for those who want to get started in data science but are not sure how. Among the many resources provided are links to videos that teach the basics of data science in Python using scikit-learn.

Introduction to Statistical Learning - This is a freely available book that covers a wide range of topics, from the basic principles of linear regression to higher-level topics such as smoothing splines and principal component analysis. You can also find simple R scripts on the website for performing the analyses described in each chapter.

Kaggle - Although kaggle.com doesn't provide direct or formal training in data science techniques, it hosts data science competitions and also provides freely available datasets for people to practice and hone their skills.