Duplicate Data Removal Tool

Background

The 'de-duper' is provided to help users whose data the service has inadvertently duplicated. Generally this only happens as a result of software bugs, either in our service or in the device client software. The service has extensive provisions for detecting and avoiding duplicates created by the device software, but they are not perfect. For example, if a firmware upgrade introduces a new duplication bug in one of the devices, the service won't be aware of that problem initially.

Status

The de-duper is not exposed anywhere in the Nuevasync web site. This is deliberate. We don't think the tool is sufficiently refined to be used at will by all users. However, it's significantly easier than manually deleting duplicate data. The de-duper does absolutely no 'merging' between records. It only looks for what it considers to be identical records and deletes all except one of them.
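
The behaviour described above (find identical records, keep one, merge nothing) can be illustrated with a minimal Python sketch. This is not the service's actual code: the record layout (flat dictionaries of strings) and the function name remove_exact_duplicates are assumptions made only for illustration.

```python
def remove_exact_duplicates(records):
    """Keep the first record of each group of identical records.

    Assumption: a record is a flat dict of hashable values; the real
    service's record format is not documented here.
    """
    seen = set()
    kept = []
    for record in records:
        # Two records count as identical only if every field matches.
        key = frozenset(record.items())
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


contacts = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Ann Lee", "email": "ann@example.com", "mobile": "555-0100"},
]
deduped = remove_exact_duplicates(contacts)
```

In this sketch the third contact survives alongside the first, because its extra mobile number means it is not identical to the other two.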

How to use it

Simply log into the web site and visit this page. Then follow the instructions.

Experimental Fuzzy Matching

The tool has code designed to remove duplicate records that are similar but not identical. These are typically created when setting up sync with a device that does not delete its existing data at that time. For example, a Windows Mobile phone already has contacts, and sync is enabled with Google, where the same set (or at least an overlapping set) of contacts exists. In this case the WM device will send its contacts to the service, and thence to Google. Similarly, the Google contacts will be synced to the device. There will now be two sets of contacts. These may be slightly different since they have distinct provenance. For example, one copy of a contact may have the mobile phone number while the other doesn't.

The regular de-duper will not see such records as duplicates. However, if the string ?strictness=low is appended to the de-duper URL (like this), an experimental fuzzy match is used. This should detect the kinds of similar but not identical contacts described above.

When deciding which records to delete, the tool deletes those containing less information. Note that no merging is done, which means that some data can be lost. For example, one contact with a postal address and a home phone number will be retained while a duplicate containing only a mobile phone number will be deleted. The mobile phone number won't be added to the retained record.

This fuzzy duplicate matching feature is experimental and unsupported.
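
To make the data-loss caveat concrete, here is a rough Python sketch of a "keep the richer record, merge nothing" policy. The actual matching rules used by the tool are not documented on this page. The similarity test used here (same name after lowercasing and collapsing whitespace), the field names, and the function names are all assumptions chosen for illustration.

```python
def normalized_name(record):
    # Assumption: contacts are "similar" when their names match after
    # lowercasing and collapsing whitespace. The real rule is unknown.
    return " ".join(record.get("name", "").lower().split())


def information_count(record):
    # Count the non-empty fields as a crude measure of "information".
    return sum(1 for value in record.values() if value)


def remove_fuzzy_duplicates(records):
    """Keep, for each name, only the record with the most non-empty fields."""
    best = {}     # normalized name -> richest record seen so far
    unnamed = []  # records without a name are never treated as duplicates
    for record in records:
        key = normalized_name(record)
        if not key:
            unnamed.append(record)
            continue
        current = best.get(key)
        if current is None or information_count(record) > information_count(current):
            best[key] = record
    # No merging: fields of the discarded records are simply lost.
    return list(best.values()) + unnamed


contacts = [
    {"name": "Ann Lee", "address": "1 Main St", "home_phone": "555-0101"},
    {"name": "ann  lee", "mobile": "555-0100"},
]
kept = remove_fuzzy_duplicates(contacts)
```

Here the record with the postal address and home phone number is retained and the mobile number is lost, mirroring the example in the text. It is worth checking that the records to be deleted hold nothing you need before using the low strictness setting.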

last modified by David Boreham on 2010/01/07 10:10

Creator: David Boreham on 2009/04/09 10:22
Copyright 2008-2011 Nuevasync, Inc