When files are uploaded into Logikcull, the file content (represented by an MD5 hash) and the metadata are compared. If both are identical, the file is identified as a duplicate. An MD5 hash acts as a fingerprint of the file's content.

How is email uniqueness determined?

How is non-email uniqueness determined?

How are calendar invitations deduplicated?

Family status impacts the dedupe view

What is the "Has Duplicate" QC Tag?

How to bulk tag duplicates

How is email uniqueness determined?

Logikcull deduplication uses the following fields to calculate the hash value for email data:

  • From

  • To

  • CC

  • BCC

  • Email Subject

  • Sent Date + Time
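The sketch below shows the idea of hashing only the email fields listed above. The field order, the joining of recipient lists, the separator character, and the dictionary layout are all hypothetical choices made for illustration; this is not Logikcull's actual implementation.

```python
import hashlib
from datetime import datetime

# Hypothetical field order and serialization; Logikcull does not publish its exact scheme.
EMAIL_FIELDS = ("from", "to", "cc", "bcc", "subject", "sent")

def email_dedupe_hash(email: dict) -> str:
    """Return an MD5 hex digest computed only from the email fields listed above."""
    parts = []
    for field in EMAIL_FIELDS:
        value = email.get(field, "")
        if isinstance(value, (list, tuple)):
            value = ";".join(value)  # recipient lists joined in their given order
        elif isinstance(value, datetime):
            value = value.isoformat()  # Sent Date + Time as one timestamp
        parts.append(str(value))
    # A control-character separator keeps adjacent fields from running together.
    return hashlib.md5("\x1f".join(parts).encode("utf-8")).hexdigest()

original = {"from": "ann@example.com", "to": ["bob@example.com"], "cc": [], "bcc": [],
            "subject": "Q3 forecast", "sent": datetime(2023, 7, 3, 9, 30)}
same_email_collected_twice = dict(original)
print(email_dedupe_hash(original) == email_dedupe_hash(same_email_collected_twice))  # True
```

In this sketch, any change to one of the listed fields (for example, adding a BCC recipient) produces a different hash. That means the two messages would not be treated as duplicates.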

How is non-email uniqueness determined?

For e-docs and other non-email items, the MD5 hash value is calculated bit by bit from the file's binary content. In practice, uniqueness is based mainly on the following:

  • Content/body of the image

  • Created Date

  • File Size
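As a rough illustration, the sketch below computes an MD5 hash over a file's raw bytes in chunks, using Python's standard hashlib module. It then pairs that hash with a Created Date and File Size to form a comparison key. Combining the three values into one tuple is an assumption made for this example; the exact way Logikcull weighs these inputs is not described here.

```python
import hashlib
import io
from datetime import datetime
from typing import BinaryIO, Tuple

def content_md5(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """MD5 over the raw bytes, read in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def edoc_dedupe_key(data: bytes, created: datetime) -> Tuple[str, str, int]:
    """Hypothetical key combining the content hash, Created Date, and File Size."""
    return (content_md5(io.BytesIO(data)), created.isoformat(), len(data))

created = datetime(2021, 5, 4, 14, 0)
print(edoc_dedupe_key(b"report", created) == edoc_dedupe_key(b"report", created))  # True
print(edoc_dedupe_key(b"report", created) == edoc_dedupe_key(b"Report", created))  # False
```

Because MD5 is computed over every bit of the content, a change to a single character yields a completely different hash.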

How are calendar invitations deduplicated?

Keep in mind that for calendar invitations, a special set of fields is also used to calculate the hash values for these documents:

  • From

  • To

  • CC

  • Subject

  • Attachment Name

If you have recurring calendar invite entries, we typically recommend reviewing them outside of Logikcull or, alternatively, uploading these entries as part of a database upload with the following fields populated in your metadata load file: Appointment Start Date, Appointment Start Time, Appointment End Date, and Appointment End Time.
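The sketch below hashes only the calendar fields listed above. It helps show why recurring entries deserve the extra attention described in the recommendation. Two occurrences of the same recurring meeting typically share From, To, CC, Subject, and Attachment Name, so the sketch gives them the same hash even though their appointment dates differ. The serialization and the sample invite data are hypothetical.

```python
import hashlib

# Fields listed above for calendar invitations; the serialization is a hypothetical choice.
CALENDAR_FIELDS = ("from", "to", "cc", "subject", "attachment_name")

def calendar_dedupe_hash(invite: dict) -> str:
    """MD5 over the calendar fields only; appointment dates are not part of the input."""
    parts = [str(invite.get(field, "")) for field in CALENDAR_FIELDS]
    return hashlib.md5("\x1f".join(parts).encode("utf-8")).hexdigest()

# Two occurrences of a hypothetical weekly meeting: only the appointment date differs.
week1 = {"from": "lead@example.com", "to": "team@example.com", "cc": "",
         "subject": "Weekly sync", "attachment_name": "",
         "appointment_start_date": "2023-03-06"}
week2 = dict(week1, appointment_start_date="2023-03-13")

print(calendar_dedupe_hash(week1) == calendar_dedupe_hash(week2))  # True
```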

Family status impacts the dedupe view

Logikcull deduplicates at the family level: as long as the fields referenced above match, a document is identified as a duplicate, and exact duplicates are hidden by default at the family level. For example, if a Word document sits in a folder and the same Word document is also attached to an email, you'll see the document twice. One copy stands alone, while the other is part of a family (as an attachment). We preserve these family-level relationships because a file's context may differ when it is part of a family. Similarly, suppose two emails with different body text are sent to two different parties, each carrying the same attachment. The attachment is identified as a duplicate, but because the parent emails are different, both families remain in the dedupe view.
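To make the family-level behavior concrete, the sketch below models each family as the ordered hashes of its members (parent first, then attachments). It hides a family only when an identical family has already been seen. This data model and the "first occurrence wins" rule are simplifying assumptions for illustration, not a description of Logikcull internals.

```python
from typing import Dict, List, Tuple

# A family is modeled as its ordered member hashes: parent first, then attachments.
Family = Tuple[str, ...]

def dedupe_view(families: Dict[str, Family]) -> List[str]:
    """Return IDs of families shown in a family-level dedupe view (first occurrence wins)."""
    seen = set()
    visible = []
    for family_id, members in families.items():
        if members in seen:
            continue  # exact duplicate family: hidden by default
        seen.add(members)
        visible.append(family_id)
    return visible

families = {
    "loose_doc": ("h_word",),                 # Word document sitting in a folder
    "email_1": ("h_email_a", "h_word"),       # same Word document as an attachment
    "email_1_copy": ("h_email_a", "h_word"),  # the same email family collected twice
    "email_2": ("h_email_b", "h_word"),       # different parent email, same attachment
}
print(dedupe_view(families))  # ['loose_doc', 'email_1', 'email_2']
```

The Word document (`h_word`) still appears in three visible families, matching the examples above. Only the repeated `email_1` family is hidden.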

If you want to see only one instance of every document (regardless of family structure), you can run a search using the syntax file_duplicate:false.

What is the "Has Duplicate" QC Tag?

The “Has Duplicate” QC Tag is applied to both the duplicates and the original copy. The tag compares metadata, and any document with an exact match appears under it. In the attachment example above, the attachment has a duplicate because both copies have the same metadata.
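Because the tag covers the original as well as its copies, one way to picture it is to count how often each dedupe key occurs and then flag every document whose key appears more than once. The document IDs and keys below are hypothetical. The sketch illustrates only the "both original and duplicates" behavior; it does not claim to reproduce how Logikcull applies the tag.

```python
from collections import Counter
from typing import Dict, Set

def has_duplicate(doc_keys: Dict[str, str]) -> Set[str]:
    """Return IDs of every document whose dedupe key occurs more than once.

    The original and its copies are all returned, since none of them is singled out.
    """
    counts = Counter(doc_keys.values())
    return {doc_id for doc_id, key in doc_keys.items() if counts[key] > 1}

docs = {"DOC-1": "k_attach", "DOC-2": "k_attach", "DOC-3": "k_memo"}
print(sorted(has_duplicate(docs)))  # ['DOC-1', 'DOC-2']
```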

How to bulk tag duplicates

If you want to tag all of the duplicates the same way for consistency, hover over the ellipsis (...) next to the tag name in the document info panel. As a reminder, this option only appears if the document currently open in the image viewer has duplicates.
