When files are uploaded into Logikcull, the file content (represented by an MD5 hash) and the metadata are compared; if both are identical, the file is identified as a duplicate. An MD5 hash acts as a fingerprint of the file.
Logikcull deduplication uses the following fields to calculate the hash value for email data:
- Email Subject
- Sent Date + Time
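To illustrate how a field-based fingerprint behaves, here is a minimal Python sketch. It is not Logikcull's actual implementation: the helper name `email_fingerprint`, the separator, and the date format are all assumptions made for the example. It only shows that identical field values produce the same MD5 hash, while any difference produces a different one.

```python
import hashlib

def email_fingerprint(subject: str, sent: str) -> str:
    """Hypothetical helper: MD5 over the email fields listed above."""
    # Assumption: fields are joined with a unit separator so that
    # ("ab", "c") and ("a", "bc") cannot produce the same input string.
    joined = "\x1f".join([subject, sent])
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Two copies of the same email, and one sent a minute later.
first = email_fingerprint("Q3 report", "2021-04-05T09:30:00Z")
second = email_fingerprint("Q3 report", "2021-04-05T09:30:00Z")
later = email_fingerprint("Q3 report", "2021-04-05T09:31:00Z")

print(first == second)  # same fields, same hash
print(first == later)   # different sent time, different hash
```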
For E-docs and non-mail items, the hash is calculated at the binary level, bit by bit, from the file content. It is based mainly on the following:
- Content/body of the image
- Created Date
- File Size
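A content-based hash of this kind can be sketched in Python as follows. This is an assumption-laden illustration rather than Logikcull's code: `content_md5` is a hypothetical helper, and in-memory byte streams stand in for uploaded files. The hash is computed over the raw bytes in chunks, so even a one-character change yields a different fingerprint.

```python
import hashlib
import io

def content_md5(stream, chunk_size: int = 65536) -> str:
    """Hypothetical helper: MD5 over the raw bytes of a binary stream."""
    digest = hashlib.md5()
    # Read in chunks so large files do not have to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

# In-memory stand-ins for three uploaded files.
original_md5 = content_md5(io.BytesIO(b"contract draft v1"))
copy_md5 = content_md5(io.BytesIO(b"contract draft v1"))
edited_md5 = content_md5(io.BytesIO(b"contract draft v2"))

print(original_md5 == copy_md5)    # identical bytes, identical hash
print(original_md5 == edited_md5)  # one byte differs, hash differs
```

With real files, the same function can be called on a file opened in binary mode, for example `open(path, "rb")`.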
Please keep in mind that for calendar invitations, a special set of fields is also used to calculate the hash values for these documents:
- Attachment Name
If you have recurring calendar invite entries, we typically recommend reviewing them outside of Logikcull or, alternatively, uploading these entries as part of a database upload with the following fields populated in your metadata load file: Appointment Start Date, Appointment Start Time, Appointment End Date, and Appointment End Time.
Logikcull de-duplicates at the family level: as long as the fields referenced above match, a family is identified as a duplicate. Exact duplicates are hidden by default at the family level. For example, if a Word document is in a folder and the same Word document is also attached to an email, you’ll see the document twice because one copy stands alone and the other is part of a family as an attachment. We preserve these family-level relationships because a file’s context may differ when it is part of a family.
As another example, if an email with the same attachment is sent to two different parties with different body text, the attachment would be identified as a duplicate, but since the parent emails are different, both families would remain in the de-dupe view.
If you're looking to see only one instance of every document (regardless of family structure), you can run the following search syntax: file_duplicate:false
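The family-level behavior described above can be modeled with a short Python sketch. This is a simplified illustration under stated assumptions, not Logikcull's implementation: each family is represented as a hypothetical list of member hashes (parent first, then attachments), the hash strings are placeholders, and a family is treated as a duplicate only when every member matches.

```python
def dedupe_families(families):
    """Hypothetical helper: keep the first copy of each identical family."""
    seen = set()
    kept = []
    for name, member_hashes in families:
        key = tuple(member_hashes)  # the whole family must match
        if key in seen:
            continue  # exact duplicate family: hidden by default
        seen.add(key)
        kept.append(name)
    return kept

# Placeholder hashes: the same Word document appears in every family.
families = [
    ("standalone_doc", ["doc_hash"]),
    ("email_to_alice", ["email_a_hash", "doc_hash"]),
    ("email_to_bob", ["email_b_hash", "doc_hash"]),
    ("email_to_alice_copy", ["email_a_hash", "doc_hash"]),
]

kept = dedupe_families(families)
print(kept)

# Counting unique documents regardless of family structure.
unique_documents = {h for _, hashes in families for h in hashes}
print(len(unique_documents))
```

In this model the standalone document and both distinct emails remain visible, and only the exact copy of the first email family is hidden. The final count shows how many documents would remain if family structure were ignored.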
The “Has Duplicate” tag (under QC tags) applies to both the duplicates and the original copy. The tag is based on a metadata comparison: any document with an exact match receives it. In the example above, the attachment is tagged as having a duplicate because both copies have the same metadata.
If you want to tag all duplicates of a document the same way for consistency, hover over the ellipsis (...) next to the tag name in the document info panel. As a reminder, this option appears only if the document currently open in the image viewer has duplicates.