I am a graduate of General Assembly's full-time data science boot camp. I also work for Arlington County, Virginia, handling GIS-related tasks for the Department of Transportation Engineering and Operations. Not long ago, I completed a certificate in GIS from George Mason University. Before that, I was the technical operations manager for CitiBike in New York. Further back, I was a Peace Corps volunteer teaching computer literacy and building an intranet for the Southern Regional Health Authority of Jamaica.
I look forward to being part of a team working on interesting problems for the public good. Keep scrolling to see some of my projects and thoughts on data science-related issues.
I was inspired to make this map by Greater Greater Washington's coverage of the 2016 DC zoning code. After reading about the possibility of corner stores in DC, I had fond memories of the tiny grocery that was right around the corner from my home in Brooklyn. Even though I barely cook, I was amazed at the efficiency of that store. It seemed to have everything you needed and nothing else, including free space.
This is a function that applies every Scikit-learn clustering algorithm to a single set of 2D data and visualizes the result. It's useful for doing a quick check to see whether a pair of variables is a good candidate for clustering.
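A minimal sketch of the idea, not the function from the post: it runs only three of Scikit-learn's clustering algorithms and skips the plotting step. The helper name `cluster_all`, the parameter values, and the synthetic `make_blobs` data are all assumptions made for illustration.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs


def cluster_all(X, n_clusters=3):
    """Fit several clustering models on the same 2D data (hypothetical helper)."""
    models = {
        "KMeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
        "AgglomerativeClustering": AgglomerativeClustering(n_clusters=n_clusters),
        "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
    }
    # fit_predict returns one cluster label per row of X
    return {name: model.fit_predict(X) for name, model in models.items()}


# Synthetic stand-in for a pair of variables from a real dataset
X, _ = make_blobs(n_samples=150, centers=3, n_features=2, random_state=0)
labels = cluster_all(X)
```

Each entry in `labels` can then be passed as the color argument of a scatter plot to compare the algorithms side by side.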
This is the introduction to list comprehensions that I wish I had read when I first encountered them. There are other good introductions to list comprehensions, but I wrote this one based on a very simple concept: every list comprehension in it evaluates to the list [1,2,3]. This should make it easy to follow the examples as they go from very trivial to more complex.
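To give a flavor of that concept, here are a few comprehensions of increasing complexity. These are illustrative examples written for this page, not necessarily the ones used in the post, and every one of them evaluates to [1, 2, 3].

```python
# Copy a list element by element
trivial = [x for x in [1, 2, 3]]

# Transform each element of a range
from_range = [x + 1 for x in range(3)]

# Keep only the elements that pass a condition
filtered = [x for x in range(1, 7) if x <= 3]

# Call a function on each element
transformed = [len(word) for word in ["a", "bb", "ccc"]]

# Pull the diagonal out of a nested list
nested = [row[i] for i, row in enumerate([[1, 0, 0], [0, 2, 0], [0, 0, 3]])]

assert trivial == from_range == filtered == transformed == nested == [1, 2, 3]
```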
As with any data science endeavor, you have to start by getting the shape of the data. This function does that for several columns over several tables using SQL. The trick here is that it puts the output in one easy-to-read pandas data frame.
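A rough sketch of the approach, assuming an SQLite database and using only the standard library. The function name `column_shapes`, the summary statistics chosen, and the `trips` and `stations` tables are hypothetical. The result is a list of dictionaries, which `pandas.DataFrame(summary)` would turn into a single data frame.

```python
import sqlite3


def column_shapes(conn, columns_by_table):
    """Return one summary row per (table, column) pair (hypothetical helper)."""
    rows = []
    for table, columns in columns_by_table.items():
        for column in columns:
            # Identifiers cannot be bound as query parameters, so they are
            # interpolated here; only pass table and column names you trust.
            query = (
                f'SELECT COUNT(*), COUNT("{column}"), COUNT(DISTINCT "{column}"), '
                f'MIN("{column}"), MAX("{column}") FROM "{table}"'
            )
            total, non_null, distinct, lowest, highest = conn.execute(query).fetchone()
            rows.append({
                "table": table, "column": column, "rows": total,
                "non_null": non_null, "distinct": distinct,
                "min": lowest, "max": highest,
            })
    return rows


# Made-up tables, for illustration only
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (duration INTEGER, station TEXT);
    INSERT INTO trips VALUES (300, 'A'), (450, 'B'), (NULL, 'A');
    CREATE TABLE stations (name TEXT, docks INTEGER);
    INSERT INTO stations VALUES ('A', 15), ('B', 23);
""")
summary = column_shapes(conn, {"trips": ["duration", "station"], "stations": ["docks"]})
```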
To make good models you often have to make a lot of dummy variables. I’ve learned there are smart ways to create them and dumb ways to create them.
In a previous post I showed you how to make quick and dirty dummy variables with list comprehensions. I called that the dumb way because you have to hard-code what goes in the column(s). This is about how to do it a smarter way, where the data itself dictates what dummy variables get made. All examples use the Titanic dataset from Kaggle.
If you’re reading this blog, then you probably spend a lot of time using a keyboard to tell computers what to do. Here is how you can do less of the former and more of the latter, plus it makes you look like a hacker on TV.
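As a small illustration of letting the data dictate the dummy variables, here is a plain-Python sketch. It is not the code from the post: `make_dummies` is a hypothetical helper, and the four passenger rows are made up, though `Embarked` is a real column in the Titanic data. In practice, `pandas.get_dummies` does this kind of work on a data frame.

```python
def make_dummies(rows, column):
    """Build one 0/1 indicator per distinct value found in the data."""
    # The categories come from the data itself, so nothing is hard-coded.
    categories = sorted({row[column] for row in rows})
    return [
        {f"{column}_{category}": int(row[column] == category) for category in categories}
        for row in rows
    ]


# Made-up rows shaped like the Titanic `Embarked` column
passengers = [{"Embarked": "S"}, {"Embarked": "C"}, {"Embarked": "S"}, {"Embarked": "Q"}]
dummies = make_dummies(passengers, "Embarked")
```

If a new port showed up in the data, a new indicator column would appear without any change to the code.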
This ArcGIS Python toolbox will draw lines between any number of features that have a duplicate field value. This works for any field in the feature class. I created it to resolve hundreds of duplicates in a data set with thousands of points spread over 28 square miles of urban streets. Working through those duplicates one by one in ArcGIS was slow and confusing. Few of the duplicates were in predictable locations, and many were halfway across the county from each other. After using DupeLines, several coworkers and I could resolve the duplicates by just looking at each line and deciding which end was correct and how to fix the other end. The feature class of lines also worked nicely as a to-do list.
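The core grouping logic can be sketched in plain Python without ArcGIS. This is not the DupeLines toolbox code: the function name `dupe_lines`, the `asset_id` field, the coordinates, and the choice to connect every pair of features that share a value are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def dupe_lines(features, field):
    """Return a line (start, end) for each pair of features sharing a field value."""
    groups = defaultdict(list)
    for feature in features:
        groups[feature[field]].append(feature["xy"])

    lines = []
    for value, points in groups.items():
        # A value that occurs only once yields no pairs, so it produces no line.
        for start, end in combinations(points, 2):
            lines.append({"value": value, "start": start, "end": end})
    return lines


# Made-up point features; "xy" stands in for the point geometry
signs = [
    {"asset_id": "A1", "xy": (0.0, 0.0)},
    {"asset_id": "B2", "xy": (1.0, 1.0)},
    {"asset_id": "A1", "xy": (5.0, 2.0)},
    {"asset_id": "C3", "xy": (2.0, 4.0)},
    {"asset_id": "C3", "xy": (3.0, 3.0)},
    {"asset_id": "C3", "xy": (9.0, 9.0)},
]
lines = dupe_lines(signs, "asset_id")
```

Here `A1` yields one line, `B2` yields none, and `C3` yields three, so the output doubles as a checklist of duplicates to resolve.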