BookTranslator
BookTranslator

ᖃᓄᖅ ᐊᔾᔨᙳᐊᓂ PDF ᑐᑭᓕᐅᕐᓂᖅ: OCR + ᑐᑭᓕᐅᕐᓂᖅ ᐃᓘᓐᓇᖓ ᐅᖃᐅᓯᒃᓴᖅ

Scanned PDF-ᑦ ᑎᑎᕋᐅᓯᕐᓂᒃ ᐱᔭᐅᔪᓐᓇᖅᑐᓂᒃ ᐃᓗᓕᖃᙱᓚᑦ; ᑎᑎᕋᐅᓰᑦ ᐊᔾᔨᙳᐊᖑᓗᑎᒃ ᑕᑯᔭᐅᓲᑦ — ᑕᐃᒪᐃᒻᒪᑦ Google Translate ᐊᓯᔾᔨᓯᒪᙱᓪᓗᓂ ᐅᑎᖅᑎᑦᑎᓲᖅ. ᐅᕙᓂ OCR + AI ᐱᓕᕆᐊᖅ ᑕᒪᓐᓇ ᐋᖅᑭᒃᓱᓲᖅ.

BookTranslator

BookTranslator Team

Tarjumiqarnermik Ilitsersuutit8 min read

ᓱᒃᑲᐃᑦᑐᖅ ᑭᐅᔾᔪᑦ: ᐊᔾᔨᙳᐊᓂ PDF ᑐᑭᓕᐅᕆᓂᐊᕈᕕᑦ OCR ᓯᕗᓪᓕᐅᔭᕆᐊᓕᒃ

ᐊᔾᔨᙳᐊᓂ PDF ᑐᑭᓕᐅᕆᓂᐊᕈᕕᑦ, ᓯᕗᓪᓕᖅ OCR ᐊᐅᓚᑦᑎᓗᒍ ᒪᑉᐱᒐᐃᑦ ᐊᔾᔨᙳᐊᖏᑦ ᓂᕈᐊᕐᓗᒋᑦ ᑎᑎᕋᐅᓯᕐᓄᑦ ᐊᓯᔾᔨᖅᓯᓗᒋᑦ. ᑭᖑᓂᐊᒍᑦ OCR-ᒧᑦ ᐱᓕᕆᐊᕆᔭᐅᓯᒪᔪᖅ PDF PDF Translator-ᒥ ᑎᑎᕋᖅᑎᒃᓴᐅᑉ ᑐᑭᓕᐅᕈᑎᒥ ᑐᑭᓕᐅᕐᓗᒍ. OCR ᒪᒥᓴᙱᒃᑯᕕᐅᒃ, ᐊᒥᓱᑦ ᑐᑭᓕᐅᕈᑎᑦ ᓯᕗᓪᓕᐅᔪᖅ file ᐊᓯᔾᔨᙱᓪᓗᓂ ᐅᑎᖅᑎᑦᑎᓲᑦ, ᒪᑉᐱᒐᓂᒃ ᐊᓂᒍᐃᓲᑦ, ᐅᕝᕙᓘᓐᓃᑦ text layer-ᓕᖕᒥᑐᐊᖅ ᐃᓚᖏᓐᓂᒃ ᑐᑭᓕᐅᕆᓲᑦ.

ᐅᕙᓂ ᐱᓕᕆᐊᖅ ᐊᑐᕐᓗᒍ:

  1. PDF ᐅᒃᑯᐃᕐᓗᒍ ᐊᒻᒪ ᐅᖃᐅᓯᖅ ᐃᓚᖓ ᓂᕈᐊᕐᓗᒍ ᐆᒃᑐᕐᓗᑎᑦ.
  2. ᑎᑎᕋᐅᓯᕐᒥᒃ ᓂᕈᐊᕈᓐᓇᙱᒃᑯᕕᑦ, OCR ᐊᐅᓚᑦᑎᓗᒍ.
  3. OCR-ᒧᑦ ᐱᔭᐅᓯᒪᔪᖅ ᑎᑎᕋᐅᓯᖅ ᑐᑭᓕᐅᕆᓂᐊᙱᓂᕐᓂ ᕿᒥᕐᕈᓗᒍ.
  4. OCR-processed PDF PDF Translator-ᒧᑦ ᐃᓕᓗᒍ.
  5. ᑐᑭᓕᐅᕆᔭᐅᔪᖅ ᐱᔭᐅᔪᖅ ᓯᕗᓪᓕᐅᔪᒥᒃ scan-ᒥᒃ ᓴᓂᓕᕇᑦᑎᓗᒍ ᕿᒥᕐᕈᓗᒍ.

PDF-ᐃᑦ ᓂᕈᐊᕈᓐᓇᖅᑐᒥᒃ ᑎᑎᕋᐅᓯᖃᓕᖅᐸ ᐊᒻᒪ ᐊᔪᕐᓇᕈᑎᖓ layout-ᐅᑉ ᐊᑕᐅᓯᐅᖏᓐᓇᕐᓂᖓᐅᑉ, formatting ᐊᓯᔾᔨᙱᓪᓗᒍ PDF ᑐᑭᓕᐅᕐᓂᖅ ᐅᖃᐅᓯᒃᓴᖓ ᐊᑐᕐᓗᒍ.

ᓱᒻᒪᑦ Scanned PDF-ᑦ ᑐᑭᓕᐅᕈᑎᓂ ᐊᔪᕐᓇᖅᓯᓲᖑᕙᑦ

Scanned PDF ᐊᒥᓱᓂᒃ ᒪᑉᐱᒐᐅᑉ ᐊᔾᔨᙳᐊᖏᓐᓂᒃ PDF container-ᒥ ᐃᓗᓕᖃᖅᑐᖅ ᑭᓯᐊᓂᐅᒐᔪᒃᐳᖅ. ᒪᑉᐱᒐᖅ ᐅᖃᐅᓯᕐᓂᒃ ᐃᓄᒻᒧᑦ ᑕᐅᑐᒃᑎᑦᑎᔪᓐᓇᖅᑐᖅ, ᑭᓯᐊᓂ file-ᖓ software-ᒧᑦ ᐱᔭᐅᔪᓐᓇᖅᑐᒥᒃ ᑎᑎᕋᐅᓯᕐᒥᒃ ᐃᓗᓕᖃᕐᓂᖏᓐᓇᕐᓗᓂ.

ᑕᒪᓐᓇ ᓇᐃᑦᑐᒥᒃ ᐊᔪᕐᓇᓯᓂᕐᒥᒃ ᓴᖅᑭᑦᑎᓲᖅ:

File ᐊᐃᑉᐸᖓᑐᑭᓕᐅᕈᑎᐅᑉ ᑕᐅᑐᒃᑕᖓᓱᓕᔪᖅ
ᑎᑎᕋᐅᓯᕐᓂᒃ ᐃᓗᓕᓕᒃ PDFᑎᑎᕋᐅᓯᖅ ᐊᒻᒪ layout dataᑐᑭᓕᐅᕐᓂᖅ ᑕᐃᒪᐃᓕᓴᕋᐃᒍᓐᓇᖅᑐᖅ.
ᐊᔾᔨᙳᐊᑐᐃᓐᓇᖅ scanned PDFᒪᑉᐱᒐᐃᑦ ᐊᔾᔨᙳᐊᖏᑦOCR ᓯᕗᓪᓕᐅᔭᕆᐊᓕᒃ.
ᑎᑎᕋᐅᓯᖅ-ᐊᔾᔨᙳᐊᖅ PDFscan ᐊᔾᔨᙳᐊᖅ ᐊᒻᒪ ᒫᓐᓇ OCR text layerᑐᑭᓕᐅᕆᓂᖅ ᐊᐅᓚᔪᓐᓇᖅᑐᖅ, ᑭᓯᐊᓂ OCR ᑕᖅᑳᖅᑐᑦ ᐱᐅᓂᖓᓂ ᐊᑦᑕᕐᓇᖅᐳᑦ.

ᐱᐅᓛᖅ ᖃᐅᔨᓴᐅᑦ technical-ᒥᑐᐊᖅ ᐊᑐᙱᓚᖅ:

  1. PDF ᐅᒃᑯᐃᕐᓗᒍ.
  2. ᐅᖃᐅᓯᕐᓂᒃ ᐃᓚᐃᓐᓇᖏᓐᓂᒃ highlight ᖃᐅᔨᓴᕐᓗᒍ.
  3. ᐅᖃᐅᓯᖅ ᐊᑕᐅᓯᖅ copy-ᕐᓗᒍ.
  4. ᑎᑎᕋᖅᕕᒻᒧᑦ paste-ᕐᓗᒍ.

ᐅᖃᐅᓯᖅ ᓈᒻᒪᒃᑐᒥᒃ paste-ᖅᑕᐅᒃᐸ, PDF text layer-ᖃᖅᐳᖅ. ᐱᓯᒪᙱᒃᐸ, ᐅᕝᕙᓘᓐᓃᑦ ᒪᑉᐱᒐᖅ ᐊᑕᐅᓯᖅ ᐊᔾᔨᙳᐊᖑᔮᖅᐸ, PDF OCR-ᒥᒃ ᐱᔭᕆᐊᖃᖅᐳᖅ.

OCR ᐱᒋᐊᙱᑦᑐᓐᓇᙱᓚᖅ

OCR ᑕᐃᔭᐅᕗᖅ optical character recognition-ᒥᒃ. ᐊᔾᔨᙳᐊᕐᒥᑦ ᑎᑎᕋᐅᓯᕐᒥᒃ ᐊᓂᑎᑦᑎᓲᖅ ᐊᒻᒪ machine-readable ᑎᑎᕋᐅᓯᕐᒥᒃ ᓴᖅᑭᑦᑎᓲᖅ. PDF ᑐᑭᓕᐅᕐᓂᕐᒧᑦ, OCR ᐊᒥᓲᓂᒃ scanned ᒪᑉᐱᒐᐅᑉ ᖁᓛᒍᑦ ᑕᐅᑐᒃᑕᐅᙱᑦᑐᒥᒃ text layer-ᒥᒃ ᓴᖅᑭᑦᑎᓲᖅ.

text layer ᑕᒪᓐᓇ ᑐᑭᓕᐅᕐᓂᐅᑉ source-ᖓᓂᒃ ᐊᑐᖅᑕᐅᓲᖅ. OCR ᑕᖅᑳᖅᑐᖃᖅᐸ, ᑐᑭᓕᐅᕐᓂᖅ ᑕᒪᑦᑎᒍᑦ ᑕᖅᑳᖅᑐᓂᒃ ᒪᓕᓲᖅ.

ᐊᒥᓲᓂ OCR ᑕᖅᑳᖅᑐᑦ:

OCR ᑕᖅᑳᖅᑐᖓᑐᑭᓕᐅᕐᓂᕐᒧᑦ ᐊᑦᑕᕐᓇᖅᑐᖅ
rn m-ᒥᒃ ᐅᖃᓕᒫᕐᓂᖅᐅᖃᐅᓰᑦ ᑐᑭᖏᑦ ᐊᓯᔾᔨᓲᑦ.
1 l-ᒥᒃ ᐅᖃᓕᒫᕐᓂᖅᓈᓴᐅᑏᑦ, references, ᐅᕝᕙᓘᓐᓃᑦ codes ᑕᖅᑳᖅᓯᓲᑦ.
O 0-ᒥᒃ ᐅᖃᓕᒫᕐᓂᖅIDs, formulas, ᐊᒻᒪ ᐊᑎᖏᑦ ᐊᔪᕐᓇᖅᓯᓲᑦ.
accents ᐃᖏᕐᕋᔭᐅᙱᓐᓂᖏᑦᐊᑎᖏᑦ ᐊᒻᒪ terms ᓈᒻᒪᙱᓕᓲᑦ.
columns ᐊᑕᐅᓯᐅᖅᑕᐅᓂᖏᑦᐅᖃᐅᓰᑦ ᓈᒻᒪᙱᑦᑐᒥᒃ ᑐᓕᖓᓂᖃᖅᑐᒥᒃ ᑐᑭᓕᐅᕆᓲᑦ.
table cells ᓈᒻᒪᙱᑦᑐᒥᒃ row-by-row ᐅᖃᓕᒫᕐᓂᖅdata labels ᓈᒻᒪᙱᑦᑐᒧᑦ values-ᓄᑦ ᑐᓕᖓᓯᐊᙱᓲᑦ.
footnotes body text-ᒥ ᓄᖅᑲᖅᑕᐅᓂᖏᑦcitations ᐊᒻᒪ notes ᓈᒻᒪᙱᑦᑐᒧᑦ context-ᒧᑦ ᓄᒃᑎᑕᐅᓲᑦ.

ᑕᐃᒪᐃᒻᒪᑦ OCR ᕿᒥᕐᕈᓂᖅ ᐊᑑᑎᖃᖅᐳᖅ. ᐲᔭᐅᓯᒪᔪᖅ ᑎᑎᕋᐅᓯᖅ spot-check ᖃᐅᔨᓴᓚᐅᙱᓂᕐᓂ ᐊᔾᔨᙳᐊᓂ ᑎᑎᕋᖅᑎᒃᓴᖅ ᑐᑭᓕᐅᕆᕙᙱᓪᓗᒍ.

OCR ᓯᕗᓪᓕᐅᔪᖅ ᐱᓕᕆᐊᖅ

ᐱᓕᕆᐊᖅ 1: PDF ᐊᐃᑉᐸᖓ ᖃᐅᔨᓗᒍ

ᑎᑎᕋᐅᓯᕐᒥᒃ ᓂᕈᐊᕐᓗᑎᑦ ᐆᒃᑐᕐᓗᑎᑦ. ᓂᕈᐊᕐᓂᖅ ᐊᐅᓚᒃᐸ, OCR ᐊᑐᕆᐊᖃᕋᔭᙱᓚᑎᑦ. ᓂᕈᐊᕈᓐᓇᙱᒃᑯᕕᑦ, file ᐊᔾᔨᙳᐊᑐᐃᓐᓇᐅᔪᖅ ᓈᒻᒪᓯᐅᕐᓗᒍ.

ᒪᑉᐱᒐᖅ ᑕᐅᑐᒃᓴᕆᓗᒍ ᐊᓯᐊᒍᑦ ᕿᒥᕐᕈᒻᒥᓗᒍ:

  • ᓵᙱᑦᑐᑦ ᒪᑉᐱᒐᐃᑦ scan-ᒥᒃ ᓇᓗᓇᐃᕐᓯᓲᑦ.
  • ᖁᖅᓱᖅᑐᖅ ᐸᐃᑉᐹᑉ texture-ᖓ scan-ᒥᒃ ᓇᓗᓇᐃᕐᓯᓲᖅ.
  • spine-ᒧᑦ ᖃᓂᒃᓴᖓᓂ shadows-ᑦ ᐊᔾᔨᙳᐊᖅᑕᐅᔪᒥᒃ ᐅᖃᓕᒫᒐᕐᒥᒃ ᓇᓗᓇᐃᕐᓯᓲᑦ.
  • ᓈᒻᒪᙱᑦᑐᖅ contrast photocopy-ᐅᓂᖓᓂᒃ ᓇᓗᓇᐃᕐᓯᓲᖅ.
  • search-ᒧᑦ ᑕᑯᔭᐅᔪᓂᒃ ᐅᖃᐅᓯᕐᓂᒃ ᓇᓂᓯᙱᒃᑯᓂ, text layer ᐃᓗᓕᖃᙱᓂᖓ ᓇᓗᓇᐃᕐᓯᓲᖅ.

ᐱᓕᕆᐊᖅ 2: scan ᐱᐅᓂᖓ ᐱᐅᓯᒋᐊᕐᓗᒍ ᐱᔪᓐᓇᕈᕕᑦ

OCR ᐱᐅᓂᖓ ᐊᔾᔨᙳᐊᑉ ᐱᐅᓂᖓᓂᑦ ᐱᒋᐊᓲᖅ. scan ᓄᑖᒥᒃ ᐱᔪᓐᓇᕈᕕᑦ, OCR ᑕᖅᑳᖅᑐᓂᒃ ᖃᐅᔨᓴᓂᕐᒥᒃ ᓯᕗᓂᐊᒍᑦ ᑕᒪᓐᓇ ᐱᓕᕆᐊᕆᓗᒍ.

ᐅᕙᓂ image-quality checklist ᐊᑐᕐᓗᒍ:

  • ᒥᑭᔪᓂᒃ ᑎᑎᕋᐅᓯᕐᓂᒃ ᓈᒻᒪᒃᑐᒥᒃ ᑕᑯᔭᐅᓯᐊᕈᓐᓇᖅᑐᒥᒃ resolution-ᒥ scan-ᕐᓗᒍ.
  • ᒪᑉᐱᒐᐃᑦ ᓴᕐᕙᖓᔪᑦ ᐊᒻᒪ ᓇᖏᖅᑎᑕᐅᓯᒪᓗᑎᒃ ᑎᒍᓗᒋᑦ.
  • spine-ᒧᑦ ᖃᓂᒃᓴᖓᓂ shadows-ᑦ ᐱᖃᖅᑕᐅᙱᓪᓗᑎᒃ.
  • table edge-ᑦ, ᐊᒡᒐᐃᑦ, ᐅᕝᕙᓘᓐᓃᑦ background clutter ᐲᕐᓗᒋᑦ.
  • ᑎᑎᕋᐅᓯᖅ ᐊᒻᒪ ᒪᑉᐱᒐᖅ ᐊᑯᓐᓂᖓᓂ ᓴᙱᔪᒥᒃ contrast-ᒥ ᐊᑐᕐᓗᒍ.
  • ᐅᖃᐅᓯᖅ ᑎᑎᕋᕐᓂᖓ ᐃᓘᓐᓇᖓ ᑕᑯᔭᐅᑎᓪᓗᒍ.
  • ᒪᑉᐱᒐᖅ ᓈᒻᒪᒃᑐᒥᒃ orientation-ᒥ ᐃᓕᓗᒍ.
  • ᐊᔾᔨᙳᐊᖅ ᓱᒃᑯᑦᑎᑕᐅᓗᐊᙱᓪᓗᓂ ᑎᑎᕋᐅᓰᑦ ᖃᐅᓴᖅᑕᐅᖏᓪᓗᑎᒃ.

ᐅᖃᓕᒫᒐᑐᖃᓄᑦ ᐊᒻᒪ photocopy-ᓄᑦ, deskewing, contrast correction, ᐊᒻᒪ focus-ᒥ ᐱᐅᙱᑦᑐᓂᒃ ᒪᑉᐱᒐᓂᒃ ᓄᑖᖅ scan-ᓂᖅ ᐊᒥᓱᖅ ᐱᐅᓛᓂᒃ ᐱᔭᐅᓲᑦ.

ᐱᓕᕆᐊᖅ 3: OCR ᐊᐅᓚᑦᑎᓗᒍ

OCR tool ᓂᕈᐊᕐᓗᒍ document-ᒥᑦ ᐊᑐᕐᓗᑎᑦ, brand-ᒥᑦ ᐊᑐᙱᓪᓗᑎᑦ.

OCR optionᐱᐅᓛᖅ ᐅᑯᓄᖓᖃᐅᔨᒋᐊᓕᑦ
Adobe Acrobat OCRᐊᑕᐅᓯᕐᒥ business scan-ᑦ ᐊᒻᒪ PDF cleanupᒫᓐᓇ plan-ᒥ ᐊᑐᕈᓐᓇᕐᓂᖓ ᖃᐅᔨᓗᒍ.
ABBYY FineReaderᐊᔪᕐᓇᖅᑐᑦ scan-ᑦ, tables, columns, ᐊᒻᒪ ᐋᖅᑭᒃᓱᐃᓂᑦᒫᓐᓇᓗᐊᙱᑦᑐᖅ; ᐃᓄᒻᒧᑦ ᕿᒥᕐᕈᓂᖅ ᓱᓕ ᐱᔭᕆᐊᖃᖅᐳᖅ.
Tesseract or OCRmyPDFlocal-ᒥ, technical-ᒥ, ᐅᑎᖅᑕᐅᔪᓐᓇᖅᑐᓂ OCR workflow-ᓄᑦcommand-line tools-ᓄᑦ ᓈᒻᒪᒋᔭᖃᕆᐊᖃᖅᑐᖅ.
Online OCR toolsᐊᑦᑎᑦᑐᒥᒃ risk-ᖃᖅᑐᓂ occasional files-ᓄᑦprivacy, file limits, ᐊᒻᒪ quality ᐊᓯᔾᔨᓲᑦ.
Phone scanning appsᓄᑖᒥᒃ scan-ᒥᒃ ᓱᒃᑲᐃᑦᑐᒥᒃ ᑎᒍᓯᓂᕐᒧᑦperspective distortion OCR-ᒥᒃ ᐱᐅᙱᓕᕆᓲᖅ.

ᓇᒻᒥᓂᖅ contracts-ᓄᑦ, medical records-ᓄᑦ, financial documents-ᓄᑦ, unpublished manuscripts-ᓄᑦ, ᐅᕝᕙᓘᓐᓃᑦ academic work-ᒧᑦ review-ᒥ ᐃᓗᓕᓐᓄᑦ, local OCR workflow ᐅᕝᕙᓘᓐᓃᑦ ᑐᓐᖓᓇᖅᑐᒥ trusted environment-ᒥ ᐊᑐᕐᓗᑎᑦ ᐱᐅᓂᖅᓴᐅᕗᖅ. sensitive scan-ᑦ random free OCR site-ᓄᑦ ᐃᓕᕙᙱᓪᓗᒋᑦ.

ᐱᓕᕆᐊᖅ 4: OCR ᑎᑎᕋᐅᓯᖅ ᕿᒥᕐᕈᓗᒍ

ᕿᒥᕐᕈᓂᖅ ᑐᑭᓕᐅᕆᓂᐊᙱᓂᕐᓂ ᐱᓕᕆᐊᕆᓗᒍ, ᑭᖑᓂᐊᒍᑦ ᐊᑐᙱᓪᓗᒍ. ᐊᔪᕐᓇᖅᑐᓂᒃ ᒪᑉᐱᒐᓂᒃ ᐃᓚᖏᑦ copy-ᕐᓗᒋᑦ ᐊᒻᒪ ᐅᖃᓕᒫᕈᓐᓇᖅᑐᒥᒃ ᐱᖃᕐᒪᖔᑕ ᖃᐅᔨᓴᕐᓗᒋᑦ.

ᕿᒥᕐᕈᕕᒋᐊᓖᑦ ᒪᑉᐱᒐᐃᑦ:

  • title page.
  • ᐊᒥᓱᓂᒃ ᑎᑎᕋᐅᓯᖃᖅᑐᖅ body page.
  • table-ᓕᒃ ᒪᑉᐱᒐᖅ.
  • footnotes-ᓕᒃ ᒪᑉᐱᒐᖅ.
  • ᒥᑭᔪᓂᒃ ᑎᑎᕋᐅᓯᖃᖅᑐᖅ ᒪᑉᐱᒐᖅ.
  • stamps, handwriting, ᐅᕝᕙᓘᓐᓃᑦ marginal notes-ᓕᒃ ᒪᑉᐱᒐᖅ.
  • document multilingual-ᐅᒃᐸ, ᐅᖃᐅᓯᕆᔭᐅᔪᓕᒫᓂ ᒪᑉᐱᒐᖅ ᐃᓚᖓ.

ᐅᑯᓂᖓ ᕿᒥᕐᕈᓗᑎᑦ:

  • ᐱᑕᖃᙱᑦᑐᑦ paragraphs.
  • ᐊᑕᐅᓯᐅᖅᑕᐅᓯᒪᔪᑦ columns.
  • ᓄᖅᑲᖅᑕᐅᔪᑦ words.
  • ᑕᖅᑳᖅᑐᑦ characters.
  • ᐱᓯᒪᙱᑦᑐᑦ diacritics.
  • table labels values-ᓂᑦ ᐊᕕᒃᓯᒪᔪᑦ.
  • headers body text-ᒧᑦ ᐃᓕᔭᐅᔪᑦ.
  • page numbers ᐅᖃᐅᓯᕐᓄᑦ ᐃᓚᐅᔪᑦ.

OCR ᐱᐅᓂᖓ ᐱᐅᙱᒃᐸ, ᑐᑭᓕᐅᕆᓂᐊᙱᓂᕐᓂ ᐋᖅᑭᒃᓱᕐᓗᒍ. OCR ᐱᔭᐅᙱᑦᑐᖅ ᑐᑭ translator-ᒧᑦ ᓈᒻᒪᒃᑐᒥᒃ ᐅᑎᕐᓂᐅᓯᐊᕈᓐᓇᙱᓚᖅ.

ᐱᓕᕆᐊᖅ 5: OCR-Processed PDF ᑐᑭᓕᐅᕐᓗᒍ

PDF clean-ᒥᒃ text layer-ᖃᓕᖅᐸ, PDF Translator-ᒧᑦ ᐃᓕᓗᒍ. ᒫᓐᓇ ᑐᑭᓕᐅᕐᓂᖅ ᒪᑉᐱᒐᐅᑉ ᐊᔾᔨᙳᐊᖏᓐᓂᒃ ᐊᑐᙱᓪᓗᓂ ᑎᑎᕋᐅᓯᕐᓂᒃ ᐊᑐᕈᓐᓇᖅᑐᖅ.

ᑐᑭᓕᐅᕆᔭᐅᔪᖅ ᐱᔭᐅᓯᒪᓕᖅᐸ, ᓴᓂᓕᕇᑦᑎᓗᒋᑦ:

  • ᓯᕗᓪᓕᐅᔪᖅ scan
  • OCR text layer
  • ᑐᑭᓕᐅᕆᔭᐅᔪᖅ PDF

ᐅᕙᓂ three-way review ᑕᖅᑳᖅᑐᖅ OCR-ᒥᖔᕐᒪᖔᖅ ᐅᕝᕙᓘᓐᓃᑦ ᑐᑭᓕᐅᕐᓂᕐᒥᖔᕐᒪᖔᖅ ᖃᐅᔨᓯᓲᖅ. OCR ᑎᑎᕋᐅᓯᖓ ᑕᖅᑳᖅᑐᖃᖅᐸ, OCR ᐃᓕᑕᕆᓯᒋᐊᕐᓗᒍ. OCR ᑎᑎᕋᐅᓯᖓ ᓈᒻᒪᒃᐸ ᑭᓯᐊᓂ ᑐᑭᓕᐅᕐᓂᖅ ᓈᒻᒪᙱᒃᐸ, ᑐᑭᓕᐅᕐᓂᖅ ᐋᖅᑭᒃᓱᕐᓗᒍ.

ᐱᓕᕆᐊᖅ 6: High-Risk ᐃᓗᓕᖏᑦ ᕿᒥᕐᕈᓗᒋᑦ

Scanned documents ᐊᒥᓱᐊᖅ ᑕᐃᒪᐃᑦᑐᓂᒃ ᓈᒻᒪᓯᐊᖅᑐᒥᒃ ᕿᒥᕐᕈᓂᖃᕆᐊᓕᖕᓂᒃ ᐃᓗᓕᖃᓲᖑᕗᑦ: ᐅᖃᓕᒫᒐᑐᖃᑦ contracts-ᑦ, government forms-ᑦ, academic papers-ᑦ, manuals-ᑦ, historical documents-ᑦ, ᐊᒻᒪ ᐅᖃᓕᒫᒐᐅᑉ ᒪᑉᐱᒐᖏᑦ.

ᐅᑯᐊ ᐃᓄᒻᒧᑦ ᐊᒡᓚᒃᑲᓐᓂᕐᓗᒋᑦ:

  • ᐊᑏᑦ
  • ᐅᓪᓗᖏᑦ
  • ᓈᓴᐅᑏᑦ
  • ᑐᕌᕈᑏᑦ
  • product codes
  • legal references
  • citations
  • table labels
  • units
  • equations
  • captions
  • footnotes

research ᐊᒻᒪ academic files-ᓄᑦ, academic research papers ᑐᑭᓕᐅᕐᓂᖏᑦ ᐅᖃᐅᓯᒃᓴᖓ ᐅᖃᓕᒪᓗᒍ, ᐊᔾᔨᙳᐊᓂ academic PDF-ᑦ OCR ᐊᑦᑕᕐᓇᖅᑐᓂᒃ ᐃᓚᓯᓲᖑᒻᒪᑕ citations-ᓂᒃ ᐊᒻᒪ layout-ᒥᒃ ᐊᑦᑕᕐᓇᖅᑐᓂᒃ.

ᓴᓂᓕᕇᖅᑐᑦ ᐊᔪᕐᓇᕈᑎᑦ ᐆᒃᑑᑏᑦ

OCR output ᕿᒥᕐᕈᑎᓪᓗᒍ ᐅᕙᓂ ᑕᐃᑯᖓ table ᐊᑐᕐᓗᒍ.

ᓯᕗᓪᓕᐅᔪᖅ scan ᑕᐅᑐᒃᓴᕆᔪᒻᒪᕆᒃᑕᖓᐱᐅᙱᑦᑐᖅ OCR outputᓱᒻᒪᑦ ᐊᑑᑎᖃᖅᐸ
modernmodemᑐᑭᖓ ᐊᓯᔾᔨᑲᓪᓚᒃᐳᖅ.
Section 10Section IOlegal ᐅᕝᕙᓘᓐᓃᑦ technical references ᐊᔪᕐᓇᖅᓯᓲᑦ.
20262O26ᐅᓪᓗᖏᑦ ᐊᒻᒪ IDs ᓇᓗᓇᐃᖅᑕᐅᔪᓐᓇᙱᓕᓲᑦ.
patientpatlentmedical ᐅᕝᕙᓘᓐᓃᑦ technical terms ᓈᒻᒪᙱᓲᑦ.
ᒪᕐᕉᒃ ᐊᕕᒃᑐᖅᓯᒪᔫᒃ columnsᐊᑕᐅᓯᖅ ᐊᑕᐅᓯᐅᖅᑐᖅ paragraphᐅᖃᐅᓰᑦ ᓈᒻᒪᙱᑦᑐᒥᒃ ᑐᓕᖓᓂᖃᖅᑐᒥᒃ ᑐᑭᓕᐅᕆᓲᑦ.
table row ᓇᓗᓇᐃᕈᑎᓕᒃ ᐊᒻᒪ values-ᓕᒃᐊᑕᐅᓯᖅ ᐊᑕᖏᖅᑐᖅ ᑎᑎᕋᐅᓯᖅdata ᓈᒻᒪᒃᑐᒧᑦ label-ᒧᑦ ᑐᓕᖓᔪᓐᓇᙱᓚᖅ.
footnote marker 1letter lnotes ᓈᒻᒪᙱᑦᑐᒧᑦ sentence-ᒧᑦ ᐃᓚᐅᓕᓲᑦ.

OCR layer-ᒥ ᐅᑯᐊ ᑕᑯᒍᕕᒋᑦ, ᑐᑭᓕᐅᕆᓂᐊᙱᓂᕐᓂ OCR ᐋᖅᑭᒃᓱᕐᓗᒍ.

ᓇᓕᐊᒃ Tool ᐊᑐᕆᐊᖃᖅᐱᑦ?

document ᐊᔪᕐᓇᕐᓂᖓ ᒪᓕᓪᓗᒍ ᓂᕈᐊᕐᓗᒍ.

Documentᐊᑐᕆᐊᓕᒃ ᐱᓕᕆᐊᖅ
ᓴᓗᒪᔪᖅ business scanAcrobat-ᒥ ᐅᕝᕙᓘᓐᓃᑦ ᑐᓐᖓᓇᖅᑐᒥ OCR tool-ᒥ OCR ᐊᐅᓚᑦᑎᓗᒍ, ᑭᖑᓂᐊᒍᑦ PDF Translator.
ᐅᖃᓕᒫᒐᑐᖃᖅ scandeskew ᐊᒻᒪ contrast ᐱᐅᓯᒋᐊᕐᓗᒍ, OCR ᓈᒻᒪᒃᑐᒥᒃ ᐊᐅᓚᑦᑎᓗᒍ, ᑭᖑᓂᐊᒍᑦ ᑐᑭᓕᐅᕐᓗᒍ.
academic paper scanOCR, equations/citations/tables ᕿᒥᕐᕈᓗᒋᑦ, ᑭᖑᓂᐊᒍᑦ layout review-ᓕᒃ ᑐᑭᓕᐅᕐᓗᒍ.
handwriting notesᐃᓄᒻᒧᑦ transcription ᐱᔭᕆᐊᖃᕋᔭᖅᑐᖅ ᑐᑭᓕᐅᕆᓂᐊᙱᓂᕐᓂ.
ᐊᔪᙱᑦᑐᖅ ᓇᒻᒥᓂᖅ documentrisk ᐊᑦᑎᒃᐸ, online OCR ᐊᑐᕈᓐᓇᖅᑐᖅ.
sensitive documentlocal OCR ᐅᕝᕙᓘᓐᓃᑦ trusted controlled workflow ᐊᑐᕐᓗᒍ.

ᐊᓯᖏᑦ tool-ᑦ ᓴᓂᓕᕇᒃᑲᓐᓂᕈᒪᒍᕕᑦ, ᐱᐅᓛᖅ PDF ᑐᑭᓕᐅᕈᑎᑦ ᐅᖃᐅᓯᒃᓴᖅ ᑕᑯᓗᒍ.

ᐊᒥᓱᐊᖅ ᓴᖅᑭᓲᑦ Scanned PDF ᐊᔪᕐᓇᕈᑏᑦ

ᐊᑦᑎᑦᑐᖅ Resolution-ᓕᒃ ᒪᑉᐱᒐᐃᑦ

ᐊᑦᑎᑦᑐᖅ resolution-ᓕᒃ scan-ᑦ ᑎᑎᕋᐅᓰᑦ ᐊᑕᐅᓯᐅᓕᖅᑎᓲᑦ. OCR rn ᐊᒻᒪ m, cl ᐊᒻᒪ d, ᐅᕝᕙᓘᓐᓃᑦ punctuation ᐊᒻᒪ dust ᐊᑯᓐᓂᖓᓂ ᓇᓗᓕᓲᖅ.

ᐋᖅᑭᒃᓱᕈᑦ: ᐱᔪᓐᓇᕈᕕᑦ ᓄᑖᒥᒃ scan-ᕐᓗᒍ. ᑕᐃᒪᐃᙱᒃᑯᕕᑦ, contrast ᓴᙱᒃᓯᒋᐊᕐᓗᒍ ᐊᒻᒪ OCR ᐃᓕᑕᕆᓯᒋᐊᕐᓗᒍ.

ᓵᙱᑦᑐᑦ ᐅᕝᕙᓘᓐᓃᑦ ᖁᕐᓗᖅᑐᑦ ᒪᑉᐱᒐᐃᑦ

ᐅᖃᓕᒫᒐᓂᒃ scan-ᖃᖅᑐᖅ ᐊᒥᓱᐊᖅ spine-ᒧᑦ ᖃᓂᒃᓴᖓᓂ ᖁᕐᓗᖅᑐᓂᒃ ᐱᖃᓲᖑᕗᖅ. OCR ᖁᕐᓗᖅᑐᓂᒃ line-ᓂᒃ ᓈᒻᒪᙱᑦᑐᒥᒃ ᐅᖃᓕᒫᓲᖅ ᐊᒻᒪ ᑎᑎᕋᐅᓯᐅᑉ ᑐᓕᖓᓂᖓ ᐊᓯᔾᔨᓲᖅ.

ᐋᖅᑭᒃᓱᕈᑦ: ᒪᑉᐱᒐᖅ ᓴᕐᕙᖓᓗᒍ, ᓄᑖᒥᒃ scan-ᕐᓗᒍ, ᐅᕝᕙᓘᓐᓃᑦ deskew ᐊᒻᒪ dewarping-ᓕᒃ OCR tool ᐊᑐᕐᓗᒍ.

ᐊᒥᓱᓂᒃ Column-ᓕᒃ Layout

OCR ᓴᐅᒥᐊᓂ ᐊᒻᒪ ᑕᓕᖅᐱᐊᓂ columns-ᓂᒃ ᐊᑕᐅᓯᖅ sentence stream-ᒧᑦ ᐊᑕᐅᓯᐅᖅᓯᓲᖅ.

ᐋᖅᑭᒃᓱᕈᑦ: ᑐᑭᓕᐅᕆᓂᐊᙱᓂᕐᓂ reading order ᕿᒥᕐᕈᓗᒍ. academic papers ᐅᕙᓂ ᐊᒥᓱᖅ ᓱᖏᐅᑎᒋᐊᓖᑦ.

Tables

Tables ᐊᔪᕐᓇᖅᑐᑦ, OCR-ᒧᑦ ᑎᑎᕋᐅᓯᕐᒥᒃ ᐊᒻᒪ ᐋᖅᑭᒃᓱᐃᓂᕐᒥᒃ ᑲᑎᙵᓂᖓᓂ ᑕᑯᔭᕆᐊᖃᖅᑐᒍᑦ. table ᑕᐅᑐᒃᓴᕆᓪᓗᒍ ᓈᒻᒪᒃᑐᔮᖅᑐᖅ, ᑭᓯᐊᓂ text layer-ᖓ ᓈᒻᒪᙱᑦᑐᖅ ᐃᓕᒋᔭᐅᔪᓐᓇᖅᑐᖅ.

ᐋᖅᑭᒃᓱᕈᑦ: table-ᒥ OCR ᑎᑎᕋᐅᓯᖅ copy-ᕐᓗᒍ ᐊᒻᒪ labels ᓱᓕ values-ᓄᑦ ᑐᓕᖓᓯᐊᕐᒪᖔᑕ ᖃᐅᔨᓴᕐᓗᒍ.

Handwriting ᐊᒻᒪ Signatures

printed text OCR handwriting recognition-ᒥᑦ ᐅᖓᑖᓂ ᑐᓐᖓᓂᖅᓴᐅᕗᖅ. ᐊᒡᓚᒃᓯᒪᔪᑦ margin note-ᑦ, signatures, ᐊᒻᒪ ᐅᓪᓗᒥ ᑕᑕᑎᖅᑕᐅᓯᒪᔪᑦ forms-ᑦ ᓇᓂᔭᐅᙱᓲᑦ ᐅᕝᕙᓘᓐᓃᑦ ᓇᓗᓇᙱᓂᖃᙱᑦᑐᒥᒃ ᓴᖅᑭᓲᑦ.

ᐋᖅᑭᒃᓱᕈᑦ: ᐊᑐᕆᐊᓕᒃ handwriting ᐃᓄᒻᒧᑦ transcription-ᒧᑦ ᓯᕗᓂᐊᒍᑦ ᓴᖅᑭᑦᑎᓗᒍ.

ᐊᔾᔨᒌᙱᑦᑐᑦ ᐅᖃᐅᓰᑦ

OCR ᓯᕗᓪᓕᐅᔪᖅ ᐅᖃᐅᓯᖅ ᖃᐅᔨᒪᑉᐸ ᐱᐅᓯᐊᖅᓴᕋᔭᖅᐳᖅ. scan ᐃᓗᓕᖃᖅᐸ English-ᒥᒃ, French-ᒥᒃ, ᐊᒻᒪ Chinese-ᒥᒃ, OCR ᐊᑕᐅᓯᕐᒥᑐᐊᖅ ᐅᖃᐅᓯᕐᒥ ᐋᖅᑭᒃᓱᐃᓯᒪᒃᐸ ᐊᔪᕐᓇᖅᓯᓲᖅ.

ᐋᖅᑭᒃᓱᕈᑦ: tool-ᖓ ᐱᔪᓐᓇᖅᐸᑦ, OCR ᐅᖃᐅᓯᖏᑦ ᐃᓘᓐᓇᑎᒃ ᓂᕈᐊᕐᓗᒋᑦ, ᑭᖑᓂᐊᒍᑦ ᐅᖃᐅᓯᓕᒫᓂ ᐃᓚᖏᑦ spot-check ᕿᒥᕐᕈᓗᒋᑦ.

Privacy ᐊᒻᒪ Security Checklist

scanned PDF ᓇᒧᙵᐃᓐᓇᖅ upload ᐊᐅᓚᑦᑎᓂᐊᙱᓂᕐᓂ, ᐅᑯᓂᖓ ᐊᐱᕆᓗᑎᑦ:

  • document ᓇᒻᒥᓂᖅ data-ᖃᖅᐸ?
  • medical, legal, financial, academic, ᐅᕝᕙᓘᓐᓃᑦ unpublished material-ᓂᒃ ᐃᓗᓕᖃᖅᐸ?
  • client agreement-ᒧᑦ ᐅᕝᕙᓘᓐᓃᑦ school policy-ᒧᑦ ᐊᑕᓂᖃᖅᐸ?
  • online OCR service ᐅᕙᓂ document-ᒧᑦ ᐊᑐᕈᓐᓇᖅᐸ?
  • local workflow ᐱᔭᕆᐊᖃᖅᐱᑦ ᐊᓯᖔᖅ?
  • ᒪᑉᐱᒐᐃᑦ ᑐᑭᓕᐅᕆᐊᖃᙱᑦᑐᑦ ᐲᕈᓐᓇᖅᐱᒋᑦ?

Scanned PDF-ᑦ ᐊᒥᓱᐊᖅ sensitive-ᖑᓲᑦ ᐱᔾᔪᑎᖃᕐᒪᑕ contracts-ᓂᒃ, IDs-ᓂᒃ, forms-ᓂᒃ, research drafts-ᓂᒃ, ᐊᒻᒪ internal archives-ᓂᒃ. OCR upload ᓂᕈᐊᕐᓂᐅᔪᑦ ᓯᕗᓪᓕᐅᔪᖅ document ᐱᔾᔪᑎᖓᑎᑐᑦ ᑲᒪᒋᓗᒋᑦ.

FAQ

ᖃᓄᖅ scanned PDF ᑐᑭᓕᐅᕐᓂᖅ?

ᓯᕗᓪᓕᖅ OCR ᐊᐅᓚᑦᑎᓗᒍ text layer ᓴᖅᑭᑦᑎᓗᒍ, OCR output ᕿᒥᕐᕈᓗᒍ, ᑭᖑᓂᐊᒍᑦ OCR-processed PDF PDF Translator-ᒥ ᑐᑭᓕᐅᕐᓗᒍ. OCR review step ᒪᒥᓴᙱᓪᓗᒍ.

ᓱᒻᒪᑦ Google Translate scanned PDF-ᓐᓂᒃ ᑐᑭᓕᐅᕆᙱᓚᖅ?

PDF image-only-ᖑᔪᓐᓇᖅᐳᖅ. text layer ᐃᓗᓕᖃᙱᒃᐸ, Google Translate ᑎᑎᕋᐅᓯᕐᒥᒃ ᐲᔭᐅᔪᓐᓇᖅᑐᒥᒃ ᐱᓯᒪᙱᓚᖅ. ᓯᕗᓪᓕᖅ OCR ᐊᐅᓚᑦᑎᓗᒍ, ᑭᖑᓂᐊᒍᑦ ᑐᑭᓕᐅᕐᓗᒍ. Google-ᒧᑦ ᑐᓕᖓᔪᖅ workflow Google Translate PDF ᐅᖃᐅᓯᒃᓴᖅ-ᒥ ᐊᐅᓚᔭᐅᕗᖅ.

ChatGPT scanned PDF ᑐᑭᓕᐅᕈᓐᓇᖅᐹ?

ChatGPT ᐊᔾᔨᙳᐊᖅ ᐃᓚᖓᓂᒃ ᐅᕝᕙᓘᓐᓃᑦ ᐲᔭᐅᓯᒪᔪᖅ ᑎᑎᕋᐅᓯᖅ ᐊᑐᖅᑐᒍ ᐃᑲᔪᕈᓐᓇᖅᑐᖅ, ᑭᓯᐊᓂ multi-page scanned PDF ᓱᓕ OCR-ᒥᒃ ᐊᒻᒪ ᕿᒥᕐᕈᓂᕐᒥᒃ ᐱᔭᕆᐊᖃᖅᐳᖅ. full document workflow-ᒧᑦ, ᓯᕗᓪᓕᖅ OCR, ᑭᖑᓂᐊᒍᑦ PDF ᑐᑭᓕᐅᕐᓂᕐᒧᑦ workflow ᐊᑐᕐᓗᒍ.

scanned PDF-ᓄᑦ ᐱᐅᓛᖅ OCR tool ᓇᓕᐊᒃ?

document-ᒥ ᑐᓕᖓᕗᖅ. Acrobat ᐊᒻᒪ ABBYY-ᒥᐅᔪᑦ tool-ᑦ ᐊᑕᐅᓯᕐᒧᑦ ᐊᒻᒪ ᐊᔪᕐᓇᖅᑐᓄᑦ scan-ᓄᑦ ᐱᐅᕗᑦ. Tesseract ᐅᕝᕙᓘᓐᓃᑦ OCRmyPDF local-ᒥ technical workflow-ᓄᑦ ᐱᐅᕗᖅ. online OCR ᐊᑦᑎᑦᑐᒥᒃ risk-ᖃᖅᑐᓄ simple files-ᓄᑦ ᓈᒻᒪᓲᖅ, ᑭᓯᐊᓂ privacy ᐊᒻᒪ quality ᐊᓯᔾᔨᓲᑦ.

OCR formatting ᐊᑕᐅᓯᐅᖏᓐᓇᕐᓂᖓᓂᒃ ᐱᔭᕆᓲᖑᕙ?

OCR text layer-ᒥᒃ ᓴᖅᑭᑦᑎᓲᖅ ᐊᒻᒪ ᐃᓚᖓᓂ reading order-ᒥᒃ ᐅᑎᖅᑎᑦᑎᓲᖅ, ᑭᓯᐊᓂ ᓯᕗᓪᓕᐅᔪᖅ translated layout ᐊᑕᐅᓯᐅᖏᓐᓇᕐᓂᖓᓄᑦ ᓇᓕᒧᙱᓚᖅ. OCR ᑭᖑᓂᐊᒍᑦ, PDF ᑐᑭᓕᐅᕐᓂᕐᒧᑦ workflow ᐊᑐᕐᓗᒍ ᐊᒻᒪ output ᓯᕗᓪᓕᐅᔪᒧᑦ ᓴᓂᓕᕇᑦᑎᓗᒍ ᕿᒥᕐᕈᓗᒍ.

OCR ᐱᐅᓂᖓ ᐱᐅᙱᒃᐸ ᓱᓕᔪᖅ?

scan ᑐᑭᓕᐅᕆᓂᐊᙱᓂᕐᓂ ᐱᐅᓯᒋᐊᕐᓗᒍ. ᐱᔪᓐᓇᕈᕕᑦ ᓄᑖᒥᒃ scan-ᕐᓗᒍ, ᒪᑉᐱᒐᐃᑦ deskew-ᕐᓗᒋᑦ, contrast ᓴᙱᒃᓯᒋᐊᕐᓗᒍ, clutter crop-ᕐᓗᒍ, ᓈᒻᒪᒃᑐᖅ OCR ᐅᖃᐅᓯᖅ ᓂᕈᐊᕐᓗᒍ, ᐊᒻᒪ ᐊᔪᕐᓇᖅᑐᑦ ᒪᑉᐱᒐᐃᑦ ᕿᒥᕐᕈᒃᑲᓐᓂᕐᓗᒋᑦ.