Ancestry.com, BehindTheName.com, and WeRelate.org
announce an improved approach to finding variant names in genealogy
searches. Up to now, most genealogy websites have had to rely upon
Soundex to return variant names in response to searches.
Soundex-based matching often misses variants that should be returned, and
it includes names that aren't very similar.
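For readers unfamiliar with Soundex, here is a minimal Python sketch of the standard American Soundex rules: keep the first letter, code the remaining consonants as digits, drop vowels, collapse repeated codes, and pad to four characters. It is only an illustration, not the code any of these sites actually run. It shows both failure modes: "Catherine" and "Katherine" get different codes because the first letter is kept verbatim, while "Smith" and "Sneed" share the code S530.

```python
# Digit codes for consonants under the standard American Soundex rules.
# Vowels (a, e, i, o, u) and y have no code; h and w are handled separately.
# Letters outside this table (including non-ASCII ones) get no code.
CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()          # the first letter is kept as-is
    prev = CODES.get(first, "")     # its code still suppresses an identical next code
    for c in letters[1:]:
        if c in "hw":
            # h and w do not separate consonants with the same code
            continue
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code                 # a vowel resets prev, so a repeated code counts again
        if len(result) == 4:
            break
    return result.ljust(4, "0")     # pad short codes with zeros


for a, b in [("Catherine", "Katherine"), ("Smith", "Sneed")]:
    print(a, soundex(a), b, soundex(b))
# Catherine C365 Katherine K365   <- real variants, different codes (missed)
# Smith S530 Sneed S530           <- different names, same code (false match)
```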
Ancestry.com, BehindTheName.com, and WeRelate.org
have created an open-source database of name variants that is free for
any website or genealogy software developer to use. Tested against pairs
of names provided by Ancestry.com, it reduces
the number of missed name variants by over 25% in comparison with
Soundex.
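The 25% figure is a relative drop in the number of missed variants. A minimal sketch of how such a comparison could be computed, assuming a list of name pairs already known to be variants. The pairs and the tiny variant table below are made-up toy data, not Ancestry.com's test set or the project's database, and plain exact matching stands in for the baseline (instead of Soundex) to keep the block short.

```python
from typing import Callable, Iterable, Tuple

# A matcher decides whether two names should be treated as variants.
Matcher = Callable[[str, str], bool]


def count_misses(pairs: Iterable[Tuple[str, str]], matches: Matcher) -> int:
    """Count known-variant pairs that the matcher fails to connect."""
    return sum(1 for a, b in pairs if not matches(a, b))


def relative_reduction(baseline_misses: int, new_misses: int) -> float:
    """Fraction by which misses dropped; 0.25 means 25% fewer missed variants."""
    if baseline_misses == 0:
        return 0.0
    return (baseline_misses - new_misses) / baseline_misses


# Toy data for illustration only (hypothetical, not the real test set).
KNOWN_VARIANT_PAIRS = [
    ("Catherine", "Katherine"),
    ("Smith", "Smyth"),
    ("Johnson", "Jonson"),
    ("Schmidt", "Smith"),
]
TOY_VARIANT_TABLE = {
    frozenset({"catherine", "katherine"}),
    frozenset({"smith", "smyth"}),
    frozenset({"johnson", "jonson"}),
}


def exact_match(a: str, b: str) -> bool:
    return a.lower() == b.lower()


def table_match(a: str, b: str) -> bool:
    return exact_match(a, b) or frozenset({a.lower(), b.lower()}) in TOY_VARIANT_TABLE


baseline = count_misses(KNOWN_VARIANT_PAIRS, exact_match)  # 4 misses
improved = count_misses(KNOWN_VARIANT_PAIRS, table_match)  # 1 miss (Schmidt/Smith)
print(f"{relative_reduction(baseline, improved):.0%} fewer missed variants")  # 75%
```

Misses are only half the picture: a matcher that returns every name never misses, so a fair comparison also counts returned pairs that are not real variants.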
How you can help: A large portion of genealogical
expertise involves learning variant spellings for the surnames in your
tree. Why not share your knowledge with others? When you add your variant
spellings to the database, searches on any website
that uses it will include them automatically. You can
review and add variant spellings here:
http://www.werelate.org/wiki/Special:Names
In addition, we need people to review the changes
that others have made to the database, so that several
pairs of eyes check every name that is added or
removed. You can review those changes
here: http://www.werelate.org/wiki/Special:NamesLog
If you are a website or software developer: The database and source code are available at:
https://github.com/DallanQ/Names
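For developers, the basic idea is query expansion: look up the searched name's variants and match records against the whole set. The sketch below uses a hypothetical in-memory `VariantIndex` class as a stand-in; it does not reflect the actual file format or API of the DallanQ/Names repository. It also shows why contributed spellings show up in searches automatically: once a pair is added to the table, the next search picks it up.

```python
from __future__ import annotations

from collections import defaultdict


class VariantIndex:
    """Hypothetical in-memory stand-in for a name-variant database."""

    def __init__(self) -> None:
        self._variants: defaultdict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _key(name: str) -> str:
        # Compare names case-insensitively, ignoring surrounding whitespace.
        return name.strip().lower()

    def add_pair(self, a: str, b: str) -> None:
        # Store the relation in both directions: if a ~ b, then b ~ a.
        ka, kb = self._key(a), self._key(b)
        self._variants[ka].add(kb)
        self._variants[kb].add(ka)

    def expand(self, name: str) -> set[str]:
        # The name itself plus its direct variants (no transitive closure).
        key = self._key(name)
        return {key} | self._variants.get(key, set())


def search(records: list[str], query: str, index: VariantIndex) -> list[str]:
    """Return records whose name matches the query or any listed variant."""
    wanted = index.expand(query)
    return [r for r in records if r.strip().lower() in wanted]


index = VariantIndex()
records = ["Smith", "Smyth", "Schmidt", "Jones"]
print(search(records, "Smith", index))  # ['Smith']
index.add_pair("Smith", "Smyth")        # a contributor adds a variant
print(search(records, "Smith", index))  # ['Smith', 'Smyth']
```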
Besides the database of name variants, the
source code includes a function that returns a similarity score
for any two names. This function has proven useful for duplicate
detection.
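The project's own similarity function is not reproduced here. As a rough illustration of how a pairwise name score feeds duplicate detection, the sketch below substitutes Python's standard `difflib.SequenceMatcher` ratio (a generic string measure, not the project's algorithm) and flags pairs at or above a threshold. The 0.75 cutoff is an arbitrary assumption.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Score in [0, 1]; a generic string ratio, NOT the project's scoring function."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def likely_duplicates(names: list[str], threshold: float = 0.75) -> list[tuple[str, str, float]]:
    """Return name pairs whose similarity meets the (assumed) threshold."""
    hits = []
    for a, b in combinations(names, 2):
        score = name_similarity(a, b)
        if score >= threshold:
            hits.append((a, b, score))
    return hits


print(likely_duplicates(["Smith", "Smyth", "Jones"]))  # [('Smith', 'Smyth', 0.8)]
```

Comparing every pair is quadratic, so with many records you would normally score names only for records that already agree on something else, such as a birth year or place.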
More information about the project can be found at:
http://www.werelate.org/wiki/WeRelate:Variant_names_project