This feature is available to subscription customers on projects created on or after June 15, 2021.

What is the Similar Docs tool?

Similar Docs allows you to quickly and easily view near duplicates.

A near duplicate is a document that shares overlapping text in the same order as other documents in your project. For example, a contract with redlines on it would be a near duplicate of the original, non-redlined contract.

Similar Docs is an efficient way to pull up related documents so you can better understand how a document evolved and how data developed in your collection.

To access Similar Docs, make sure it's added to the right-side toolbar.
Learn more about customizing document tools here.

How does it work?

During indexing, Logikcull analyzes your document’s text to extract information that can be used to quickly compare it to other documents in your workspace. Matching documents are retrieved and assigned an overlap strength score to help you assess similarity at a glance.

The percentage score reflects how much text the two documents share in the same order, based on Logikcull's analysis of the text extracted from each document.
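To build intuition for an order-sensitive overlap percentage, here is a minimal sketch in Python. It is not Logikcull's algorithm, which is not described here. It assumes a simple word-level comparison using the standard library's difflib.SequenceMatcher, and the function name overlap_score and the sample texts are hypothetical.

```python
from difflib import SequenceMatcher


def overlap_score(text_a: str, text_b: str) -> float:
    """Return an illustrative percentage of in-order word overlap.

    Assumption: a word-level SequenceMatcher ratio stands in for the
    real scoring method, which this article does not specify.
    """
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    # autojunk=False disables the heuristic that ignores very frequent items.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return round(matcher.ratio() * 100, 1)


# Hypothetical sample texts: a contract clause, a redlined version, and an unrelated note.
original = "The supplier shall deliver the goods within thirty days of the order date"
redlined = "The supplier shall deliver the goods within sixty days of the order date"
unrelated = "Lunch is at noon on Friday"

print(overlap_score(original, redlined))   # high score: one word changed
print(overlap_score(original, unrelated))  # no shared words
```

Because the comparison follows word order, the same words in a scrambled order would score lower than a lightly edited copy, which matches the idea of "overlapping text in the same order."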

How do I see the differences between documents?

To view the differences between the document in your search results/review set and a document listed in Similar Docs:

  1. Select a similar document from the right sidebar.

  2. Click the "Difference Viewer" toggle.

  3. Review the document text, which appears side by side with the differences highlighted.
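For readers curious about what a side-by-side comparison computes, the sketch below lists the word-level differences between two texts. It is an illustration only and not how the Difference Viewer is implemented. It assumes Python's standard difflib module, and the function name word_differences and the sample texts are hypothetical.

```python
from difflib import SequenceMatcher


def word_differences(text_a: str, text_b: str) -> list[tuple[str, str, str]]:
    """List (change type, words in A, words in B) for each differing span.

    Assumption: differences are found word by word; a real viewer may
    compare at a different granularity.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    changes = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":  # keep only replaced, inserted, or deleted spans
            changes.append(
                (tag, " ".join(words_a[a_start:a_end]), " ".join(words_b[b_start:b_end]))
            )
    return changes


# Hypothetical sample texts: an original clause and its redlined version.
original = "The supplier shall deliver the goods within thirty days of the order date"
redlined = "The supplier shall deliver the goods within sixty days of the order date"

for change in word_differences(original, redlined):
    print(change)
```

Each tuple names the kind of change and the text on each side, which is the information a viewer would highlight.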

Can I take bulk actions with similar docs (tag, assign, cull)?

Yes. You can select documents manually using the checkboxes, or click the "Select all" button, then use the kebab menu (three vertical dots) to choose a bulk action for your selection. To remove all checkmarks, click "Select none."

What kind of data does Logikcull look at to determine similar docs?

Logikcull looks specifically at the text of your documents to determine Similar Docs. This makes it particularly powerful for documents with a lot of text, even if the file types are different (e.g., content from a PDF compared with content from an email or Word document).

For example, a redlined contract would receive a strong near-duplicate score against its original because so much of the text appears in the same order in both.

Logikcull does not examine raw image, audio, or video files, but it does review any available transcripts from those files for near-duplicate analysis.
