This feature is available to subscription customers on projects created on or after June 15, 2021.

What is the Similar Docs tool?

Similar Docs allows you to quickly and easily view near duplicates.

A near duplicate is a document that shares overlapping text, in the same order, with other documents in your project. For example, a redlined contract would be a near duplicate of the original, non-redlined contract.

Similar Docs is an efficient way to pull up related documents so you can better understand how a document evolved, how data developed in your collection, and more!

To access Similar Docs, make sure it's added to the right-side toolbar of the document viewer.
Learn more about customizing document tools here.

How does it work?

While your upload is being processed, Logikcull analyzes your document’s text to extract information that can be used to quickly compare it to other documents in your workspace. Matching documents are retrieved and assigned an overlap strength score to help you assess similarity at a glance.

The percentage score reflects how much of the extracted text in the two documents overlaps and appears in the same order.
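To get a feel for what an ordered-overlap percentage means, here is a minimal Python sketch. It is not Logikcull's actual algorithm, which is not described here; it swaps in the standard library's `difflib.SequenceMatcher` as a stand-in, and the function name `overlap_score` and the sample sentences are made up for illustration.

```python
from difflib import SequenceMatcher


def overlap_score(text_a: str, text_b: str) -> float:
    """Rough percentage of words two texts share in the same order.

    Illustrative stand-in only; not how Logikcull computes its score.
    """
    # Compare word by word, ignoring case and extra whitespace.
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    # ratio() is 2 * (matched words) / (total words in both texts).
    return round(100 * matcher.ratio(), 1)


# Hypothetical sample texts: an original clause, a redlined version, and an unrelated note.
original = "The supplier shall deliver the goods within thirty days of the order date"
redlined = "The supplier shall deliver the goods within sixty days of the order date"
unrelated = "Lunch is scheduled for noon on Friday in the main conference room"

print(overlap_score(original, redlined))   # high score: only one word changed
print(overlap_score(original, unrelated))  # low score: almost nothing in common
```

In this sketch, the redlined clause scores high against the original because nearly every word appears in the same sequence, while the unrelated note scores low even though it shares a common word such as "the".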

How do I see the differences between documents?

To view the differences between the document in your search results or review set and a document listed in Similar Docs, click "Compare" on that document in the Similar Docs panel to see a side-by-side comparison.

Can I take bulk actions with similar docs (tag, assign, cull)?

Yes. You can select documents manually using the checkboxes, or click the "Select all" button, then use the kebab menu (three vertical dots) to choose a bulk action for your selection.

What kind of data does Logikcull look at to determine similar docs?

Logikcull looks specifically at the text of your documents to determine Similar Docs. This makes it particularly powerful for documents with a lot of text, even if the file formats are different (e.g., content from a PDF compared to content from an email or Word document).

For example, a redlined contract would receive a strong near-duplicate score against its original because so much of the text appears in the same order in both.

Logikcull will not examine raw image, audio, or video files, but it will review any available transcripts from those files for near-duplicate analysis.
