Associate GPS coordinates with a street address

Obtain street addresses

We would like to associate trees with street addresses when possible. When collecting tree data, we could also make note of the nearest address. Since we are collecting longitude/latitude data anyway, it would be convenient to have a list of street addresses with their associated longitude/latitude. This would allow us to auto-populate an address field from the longitude/latitude data.

How does one obtain a list of all street addresses in town with longitude/latitude data? Let’s start with obtaining the addresses alone. Addresses are available commercially, e.g. from the post office, but for a substantial fee. Since this is a volunteer effort, I want to see what I can get for free. Our town publishes property assessments on a regular basis, but only as hardcopy in the local paper. Assessments are also available online, but only through a query interface that returns one property at a time, with no obvious way to download the entire list. I was able to find our town’s street sweeping schedule, which supplies a list of all streets in HTML format:

Street Name         Rte   Location
Aberdeen Road       16    Tanager St. to Dundee Rd.
Academy Street      25    734 Mass. Ave. to Irving St.
Acorn Park          n/a   30 Concord Tpk., 100' swly
Acton Street        20    21 Appleton St. to Appleton Pl.
Adamian Park        7     Baker Rd. to 100' S'LY OF Upland Rd.
Adams Street        32    319 Mass. Ave. to 216 Bdwy.
Addison Street      27    106 Pleasant St., 800' sely
Aerial Street       2     169 Forest St., 375' nely
Aerial Street       2     375' nely of Forest St. to Carl Rd.
Aerial Street       2     Carl Rd. to 288 Wash. St.
Albermarle Street   21    Walnut St. to Mt. Vernon St.
Alfred Road         28    97 Lake St. to Princeton Rd.
Allen Street        32    339 Mass. Ave. to 70 Warren St.
Alpine Street       15    26 Park Ave. Ext., 350' nly
Alpine Street       15    350' nly Park Ave. Ext. to 300'swly Branch Av
Alpine Street       15    300' swly of Branch Ave. to Summer St
Alpine Terrace      1     Huntington Rd., 286' swly
Alton Street        35    295 Bdwy. to 158 Warren St.
Amherst Street      11    14 River St. to Rawson Rd.
Amsden Street       34    107 Mass. Ave. to Waldo Rd.
Andrew Street       32    Foster St. to Allen St.
Apache Trail        3     Lantern Ln to 150' wly of Cntry Clb Dr

etc.




After copying and pasting this into a text editor, and with a little manipulation in R, I can get a vector of street names:

> rm(list=ls(all=TRUE))
> library(rjson)
> library(plyr)
>
> con <- file("streets-modified.csv", open="rt", encoding="latin1")
> all.streets <-unique( readLines(con))
> head(all.streets)
[1] "street" "Aberdeen Rd " "Academy St " "Acton St "
[5] "Adamian Park " "Adams St "
> length(all.streets)
[1] 561
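The manipulation step itself is not shown. As a minimal sketch of one way it could be done, assume the pasted schedule arrives with one table cell per line (street, route, location, repeating) and that the long street types were shortened by hand. The `raw` sample and the `abbrev` mapping below are illustrative assumptions, not the actual file or the actual abbreviations used.

```r
# Illustrative sample of the pasted text: one table cell per line (assumption).
raw <- c("Street Name", "Rte", "Location",
         "Aberdeen Road", "16", "Tanager St. to Dundee Rd.",
         "Aerial Street", "2", "169 Forest St., 375' nely",
         "Aerial Street", "2", "375' nely of Forest St. to Carl Rd.")

# Every third line, starting at the first, is a street name.
street_col <- raw[seq(1, length(raw), by = 3)]

# Drop the header cell and collapse streets listed in several segments.
streets <- unique(street_col[-1])

# Hypothetical abbreviation table; extend it to cover the street types in use.
abbrev <- c(Road = "Rd", Street = "St")
for (long in names(abbrev)) {
  streets <- sub(paste0(" ", long, "$"), paste0(" ", abbrev[[long]]), streets)
}

streets
```

Writing `streets` out with `writeLines()` would give a one-column file similar in spirit to the `streets-modified.csv` read above.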

Looks like about 560 streets in total (the 561 entries include the "street" header). From my hardcopy assessment I learn that:

  • Most streets don’t begin with 1 as the first address
  • Addresses aren’t consecutive - there can be many gaps in numbering
  • Most streets have fewer than 200 addresses
  • A few of the main streets have up to 1600 addresses

Query for GPS coordinates

To obtain street addresses with lng/lat data I will use the DataScienceToolkit Google-style geocoder. To obtain coordinates I need to submit an address that looks like:

1 Aberdeen Road, Arlington, MA

This will return a data set that looks like:

[{"geometry":{"viewport":{"northeast":{"lng":-71.188745,"lat":42.424604},"southwest":{"lng":-71.190745,"lat":42.422604}},"location_type":"ROOFTOP","location":{"lng":-71.189745,"lat":42.423604}},"types":["street_address"],"address_components":[{"types":["street_number"],"short_name":"1","long_name":"1"},{"types":["route"],"short_name":"Aberdeen Rd","long_name":"Aberdeen Rd"},{"types":["locality","political"],"short_name":"Arlington","long_name":"Arlington"},{"types":["administrative_area_level_1","political"],"short_name":"MA","long_name":"MA"},{"types":["country","political"],"short_name":"US","long_name":"United States"}],"formatted_address":"1 Aberdeen Road Arlington MA"}]
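To show where the coordinates sit in that structure, here is a small sketch that parses a trimmed copy of the response with rjson. The `sample_json` string is cut down from the output above for brevity. The check for a `results` wrapper is an assumption to cover both the bare array shown here and a response that nests the array under `results`.

```r
library(rjson)

# Trimmed copy of the response above (only the fields used here).
sample_json <- '[{"geometry":{"location_type":"ROOFTOP","location":{"lng":-71.189745,"lat":42.423604}},"formatted_address":"1 Aberdeen Road Arlington MA"}]'

doc <- fromJSON(json_str = sample_json)

# Accept either a bare array or an object with a "results" element (assumption).
results <- if (!is.null(doc$results)) doc$results else doc

# The point for the address is under geometry$location, not the viewport.
loc <- results[[1]]$geometry$location
c(lng = loc$lng, lat = loc$lat)
```

The `viewport` values describe the corners of a box around the address; `location` is the point to store.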

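You can see in the location element that lng=-71.189745 and lat=42.423604 (the viewport values are the corners of a box around that point). I want to submit addresses programmatically, using a request to their URL. The request will look like: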

http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address=1+Aberdeen+Rd+Arlington+MA
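A URL of this shape can be assembled in base R. The helper below is a sketch: `geocode_url` and its `town` and `state` defaults are names made up for illustration. It only handles spaces, so street names containing other special characters would need proper URL encoding.

```r
prefix <- "http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address="

# Hypothetical helper: build the query URL for one house number on one street.
geocode_url <- function(number, street, town = "Arlington", state = "MA") {
  # trimws() removes stray leading/trailing spaces in the street name.
  address <- paste(number, trimws(street), town, state)
  # The service expects "+" in place of spaces.
  address <- gsub(" +", "+", address)
  paste0(prefix, address)
}

geocode_url(1, "Aberdeen Rd ")
# "http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address=1+Aberdeen+Rd+Arlington+MA"
```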

I also want to count up the street numbers beginning at 1. This will give me a lng/lat for every address, though the result depends on how the geocoder handles addresses that don’t exist. Let’s find out.

> 
>
> prefix <- 'http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address='
> suffix <- "Arlington+MA"
>
> arl.addresses <- data.frame( lng = numeric(), lat = numeric(), num=numeric(), street = character())
>
> for( i in 1:20){
+ middle <- gsub(" ","+",all.streets[10]) #replace the spaces with + signs
+ url <- paste( prefix, i, "+", middle, suffix, sep="") #house number + street + town
+ doc <- fromJSON(file=url, method='C')
+ arow <- data.frame( lng=doc$results[[1]]$geometry$location$lng,
+ lat=doc$results[[1]]$geometry$location$lat,
+ num=doc$results[[1]]$address_components[[1]]$short_name,
+ street=doc$results[[1]]$address_components[[2]]$short_name)
+
+ arl.addresses <- rbind(arl.addresses, arow)
+
+ }
There were 20 warnings (use warnings() to see them)
>
> arl.addresses
         lng      lat num   street
1  -71.14672 42.41162   1 Allen St
2  -71.14672 42.41162   2 Allen St
3  -71.14672 42.41151   3 Allen St
4  -71.14684 42.41160   4 Allen St
5  -71.14678 42.41145   5 Allen St
6  -71.14690 42.41153   6 Allen St
7  -71.14683 42.41139   7 Allen St
8  -71.14697 42.41146   8 Allen St
9  -71.14688 42.41132   9 Allen St
10 -71.14703 42.41139  10 Allen St
11 -71.14694 42.41126  11 Allen St
12 -71.14709 42.41132  12 Allen St
13 -71.14699 42.41120  13 Allen St
14 -71.14715 42.41125  14 Allen St
15 -71.14705 42.41114  15 Allen St
16 -71.14722 42.41118  16 Allen St
17 -71.14710 42.41108  17 Allen St
18 -71.14728 42.41110  18 Allen St
19 -71.14716 42.41102  19 Allen St
20 -71.14734 42.41103  20 Allen St
>

Allen Street addresses begin at 12, so it looks like lng/lat values are assigned to nonexistent addresses. Numbers 1 and 2 share the same lng/lat. This also happens at the high end, e.g. if the highest address is 100, numbers 1 through 100 will be assigned unique lng/lats, and beyond that the lng/lat stays constant. The next step is to modify the code to stop querying once both lng and lat stop changing.
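One possible shape for that modification is sketched below. The function name `collect_addresses`, the `patience` and `max_number` parameters, and the stand-in `fake_geocode` are all assumptions made for illustration. Because 1 and 2 can share a lng/lat at the low end, the sketch waits for several consecutive repeats (three by default) before deciding the end of the street has been reached.

```r
# Hypothetical helper: query successive house numbers on one street and stop
# once the coordinates have repeated `patience` times in a row.
# `geocode` is any function(number, street) returning list(lng = , lat = );
# in real use it would wrap the fromJSON() request to the geocoder.
collect_addresses <- function(street, geocode, max_number = 2000, patience = 3) {
  rows <- list()
  prev <- NULL
  unchanged <- 0
  for (i in seq_len(max_number)) {
    pt <- geocode(i, street)
    same <- !is.null(prev) && pt$lng == prev$lng && pt$lat == prev$lat
    unchanged <- if (same) unchanged + 1 else 0
    if (unchanged >= patience) {
      # Drop the repeated rows stored while waiting for confirmation.
      rows <- rows[seq_len(length(rows) - (patience - 1))]
      break
    }
    rows[[length(rows) + 1]] <- data.frame(lng = pt$lng, lat = pt$lat, num = i,
                                           street = street, stringsAsFactors = FALSE)
    prev <- pt
  }
  do.call(rbind, rows)
}

# Stand-in geocoder for an offline check (not the real service):
# coordinates move until number 5, then stay constant.
fake_geocode <- function(number, street) {
  k <- min(number, 5)
  list(lng = -71.14 - k * 1e-4, lat = 42.41 + k * 1e-4)
}

found <- collect_addresses("Allen St", fake_geocode)
nrow(found)
```

With the stand-in geocoder the loop stops at number 8 and keeps rows 1 through 5. The `max_number` cap is a safety net for streets where the coordinates never settle.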