[openskills-dev] Transition to the SkillsTree

lists at bespoke.homelinux.net lists at bespoke.homelinux.net
Fri Aug 31 00:46:15 BST 2007


On Thu, Aug 30, 2007 at 05:32:48PM +0100, Bruce Badger wrote:

> For this purpose I guess we would need to record accurate boundary
> information for a location.

That's the hard way to do it... 

The easy way is to just store the center point of the location
plus a radius. All locations are circular. When there's overlap,
take the closest so boundaries between locations are straight lines.
This is not good enough for electoral boundaries but still provides
for some neat search capabilities. PostgreSQL already supports this
geometry (including index capabilities).

The next step up is for each location to be an ellipse, that's a
centre point plus two vectors (long radius and short radius).
An ellipse can blot over most of the "funny shape" regions
(e.g. Redfern in Sydney).

Then you can build a more complex region out of sub-regions to
get elbow-shapes. There are methods to construct boundaries out
of line segments and curves but that is totally overkill for a
job search.

> I imagined the location link being used for a simple point ...

A point plus a radius (i.e. every region is circular) still offers
good search capabilities. You can (for example) be completely sure
that a given point is outside a given region and you can be about
80% sure that a given point is inside a given region.

> but Wikipedia seems better having thought about it.

The Wikipedia link is useful, but not for searching because
Wikipedia is currently only keyword searchable. Thus, if you want
to do tricky geometry you have to get lat/long values into your own
index. You can probably write a script to fetch back lat/long data
from the wikipedia pages once you have the links because usually
the wikipedia pages are in a consistent format -- not really a 
format designed for machine parsing but probably acceptable.

> I would expect the more sophisticated physical information to be in
> addition to the more fuzzy thing which then begs questions about
> exactly what we would hold and how we would support the searching you
> suggest.  I think this is one for the future, though :-]

Wikipedia provides an area measurement for some places,
you could estimate the radius as some proportion of the square
root of the area and come up with something moderately accurate.
Again, machine-parsing wikipedia pages might be a bit hairy but
probably would work.

Circles are native in PostgreSQL using "(( x, y ), r )" syntax.

The operators that are most useful would be "<->" which is linear
distance (i.e. you can sort by this result) and "&&" which is overlap
(i.e. use this in the WHERE clause).

e.g.:

   circle '((0,0),1)' <-> circle '((5,0),1)'


There are comments on the Rtree index efficiency here:

   http://www.bespoke.homelinux.net/geo/

The newer versions are trying to push GiST index types, I haven't
tried them but I'm guessing they would be pretty close to the Rtree.

So actually, fast geometric search is not as difficult as you might
initially expect, provided we are willing to go with a few approximations.


	- Tel



More information about the OpenSkills-dev mailing list