Brice Wolfgang
a Data scientist


Creating this tool was interesting because of how few inputs it requires and how many places each of those inputs is used in the code. It was a great way to learn how versatile a Python toolbox can be. For example, the field that is being checked for duplicates is passed to the search cursor, appears in the SQL clause that sorts that cursor, is checked twice in the for loop at the heart of the tool, and is re-created in two feature classes.
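To make the loop at the heart of the tool concrete, here is a minimal sketch of one way the pairing could work. It is not the code from the repo. It assumes the rows arrive already sorted by the duplicate field, as they might from `arcpy.da.SearchCursor` with an `ORDER BY` in the `sql_clause` postfix, and it uses plain tuples in place of a cursor so it runs without arcpy. The name `duplicate_segments` is made up for this example.

```python
def duplicate_segments(rows):
    """Yield (value, start_xy, end_xy) for consecutive rows that share a value.

    `rows` is an iterable of (value, (x, y)) tuples already sorted by value.
    With arcpy this could plausibly come from something like:
        arcpy.da.SearchCursor(fc, [field, "SHAPE@XY"],
                              sql_clause=(None, "ORDER BY " + field))
    (an assumption, not necessarily how the tool itself is written).
    """
    previous = None
    for value, xy in rows:
        # Because the rows are sorted, duplicates are always neighbours,
        # so comparing with the previous row is enough.
        if previous is not None and previous[0] == value:
            yield value, previous[1], xy
        previous = (value, xy)


# Hypothetical sample data standing in for cursor rows.
sample_rows = [
    ("A1", (0.0, 0.0)),
    ("A1", (1.0, 1.0)),
    ("B2", (5.0, 5.0)),
    ("C3", (2.0, 2.0)),
    ("C3", (3.0, 3.0)),
    ("C3", (4.0, 4.0)),
]

for value, start, end in duplicate_segments(sample_rows):
    print(value, start, end)
```

Each yielded pair of points could then become the start and end of one output line. A value that appears three times produces two chained segments in this sketch; connecting every duplicate back to the first occurrence would be an equally reasonable choice.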

Making a generally useful tool was more complicated than just drawing lines between duplicate points. Instead of hard-coding the one field I care about, the tool handles any field of any data type and outputs a feature class that carries over the duplicated values from that field, so the new field must have the same data type as the original. Recreating the field that contains duplicate information was not trivial. Finding the start and end point for each line, surprisingly, was easy. arcpy.da.SearchCursor returns the centroid of any feature through the "SHAPE@XY" token, and according to the documentation it’s a fast way to get to that data. Since everything has a centroid, it works for any feature type. For one test I drew lines between all the annotations with the same font size.
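Part of what makes recreating the field fiddly is that arcpy reports a field's type with one vocabulary (the `type` property of the objects returned by `arcpy.ListFields`, such as "String" or "Integer") while `AddField` expects another ("TEXT", "LONG"). The sketch below shows one way to translate between the two. The helper name `addfield_type` and the decision to store OID values as LONG are assumptions made for this example, not something taken from the tool.

```python
# Field.type values as reported by arcpy.ListFields, mapped to the
# field_type keywords that AddField accepts.
FIELD_TYPE_TO_ADDFIELD = {
    "String": "TEXT",
    "Integer": "LONG",
    "SmallInteger": "SHORT",
    "Double": "DOUBLE",
    "Single": "FLOAT",
    "Date": "DATE",
    "Guid": "GUID",
}


def addfield_type(field_type):
    """Return the AddField keyword matching a Field.type string."""
    # An ObjectID field cannot be added to a new feature class as an OID,
    # so this sketch stores its values in a LONG field instead (assumption).
    if field_type == "OID":
        return "LONG"
    try:
        return FIELD_TYPE_TO_ADDFIELD[field_type]
    except KeyError:
        # Geometry, Blob, and Raster fields are poor candidates for a
        # duplicate check, so this sketch refuses them.
        raise ValueError("Unsupported field type: " + field_type)


print(addfield_type("String"))   # TEXT
print(addfield_type("Integer"))  # LONG
```

For text fields the original length would probably need to be carried over as well, which is left out here.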

The tool is feature-type agnostic and does not care what type of field you choose for checking duplicates. To keep things consistent, the output has the same spatial reference as the input. This seemed like the most widely useful tool I could make on my way to fixing my own problem, which seems like a great way to learn about handling GIS data.

The data set I used this for has ~200 duplicate pole numbers but also ~400 "No Pole Number" features. To make a map that was not covered in 400 useless lines, I counted how many times each duplicate is found. Then, when making the map, I ignore any line whose duplicate count is greater than 4. It may have been easier to just select out the "No Pole Number" features beforehand, but knowing the level of duplication seems like it might be useful for other applications. Also, I can make fun of my coworker for once creating the same pole number three times.
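The counting and filtering described above can be sketched in a few lines of plain Python. This is an illustration rather than the tool's own code: the function names and the sample pole numbers are made up, and only the threshold of 4 comes from the map described here.

```python
from collections import Counter


def count_duplicates(values):
    """Return {value: count} for every value that occurs more than once."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}


def values_to_map(dupe_counts, max_count=4):
    """Keep only the duplicated values that occur at most max_count times."""
    return [value for value, n in dupe_counts.items() if n <= max_count]


# Hypothetical pole numbers: two small duplicate groups plus a large
# block of placeholder values.
poles = ["P1", "P1", "P2", "P3", "P3", "P3"] + ["No Pole Number"] * 400

dupes = count_duplicates(poles)
print(dupes["No Pole Number"])       # 400
print(sorted(values_to_map(dupes)))  # ['P1', 'P3']
```

Storing the count on each output line, rather than filtering while the lines are created, keeps the threshold a mapping decision that can be changed later with a definition query.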

DupeLines repo on GitHub