in application storing geohash of users in table , want find neighbors of user using geohashes.
as per info gathered geohash on wiki:
when used in database, structure of geohashed data has 2 advantages. first, data indexed geohash have points given rectangular area in contiguous slices (the number of slices depends on precision required , presence of geohash "fault lines"). useful in database systems queries on single index easier or faster multiple-index queries. second, index structure can used quick-and-dirty proximity search - closest points among closest geohashes.
so e.g. find neighbors of "sj8101b085", had planned search hashes doing:
select * users geohash 'sj8101b085%' and firing same query reducing hash length 1 one i.e. "sj8101b08%", "sj8101b0%" , on till desired number of neighbors. under impression need do.
but found c library libgeohash referred @ bottom of same article. library has function called geohash_get_adjacent gives neighboring hashes of given hash. geohash string represents rectangle area on earth. , function returns geohashes representing neighboring rectangles. means got run function in recursion (neighbors , neighbors of neighbors , on) till desired number of neighbors.
now confused how shall write search algorithm? using first approach or using second?
a geohash bit-string, bits represent longitude , odd bits represent latitude. each bit of representation of longitude, example, selects half of feasible area. initial feasible area [-180, 180], , if first bit longitude 0, next feasible area becomes [-180, 0], , if 1, becomes [0, 180]. first 2 bits, taken together, select half of earth above or below equator half of earth left or right of prime meridian. can think of "rectangular area", called in wikipedia link. first 4 bits, taken together, select half of northern or southern hemisphere, half of eastern or western hemisphere. et cetera.
the geohash, ezs42, presented in link base 32, each character represents 5 bits of geohash. implication of example hash being 5 characters, geohash 25 bits, 13 of longitude, , 12 of latitude. means longitude divided in half 13 times, while latitude divided in half 12 times, , geohash selects both 1 out of twelve latitudinal extents , 1 out of thirteen longitudinal extents. each character remove end of hash eliminates 5 bits geohash; amounts either 3 divisions longitude , 2 divisions latitude, or vice versa. said otherwise, increases longitudinal extent factor of 8, , latitudinal extent factor of 4, or vice versa. querying geohash gives points fall within corresponding "rectangular" area.
i not familiar libgeohash; however, description, sounds though give geohash , gives collection of geohashes represent neighboring "rectangular" areas @ granularity implied input. presumably, if use find nearest neighbors, you'll need keep track of geohashes have visited , have not, , you'll have repeatedly ask neighbors, until find points looking. visually, fanning out initial geohash of "rectangles" size of original "rectangle". you'll need careful not consider first point find in 1 of neighboring areas, neighboring area might have point closer query point; is, you'll need consider points all neighbors before searching k nearest query point (this means, example, need ask , query points neighbors of 8 neighbors of original "rectangle" before looking k nearest on second iteration of neighbor approach).
considering libgeohash neighbor approach, if original "rectangle" small (say, inches inches), , points sparse enough, take enormous amount of time until cover enough of earth fanning out technique until find points. prefix approach, on other hand, might points dense enough increasing extents 4 , 8 times gives large number of points consider. in either case, if you're looking k nearest neighbors, you'll still need test resulting points distance select k of them nearest. in end, choice depend upon data; however, i'd suggest starting prefix approach, since simpler neighboring "rectangular" area approach.
Comments
Post a Comment