Dedoc

Dedoc is an open universal system for converting textual documents of different formats to a unified output representation.

See the Dedoc documentation for more information about Dedoc and its API parameters.

Parameter configuration

Type of document structure parsing

document_type, patterns, structure_type, return_format
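As a minimal sketch, the structure-parsing options above can be collected into a plain dictionary before being handed to Dedoc. The helper name structure_parameters and the values "other", "tree", "linear" and "json" are assumptions for illustration (only document_type="other" is mentioned on this page); the Dedoc documentation lists the values that are actually accepted. The patterns parameter is left out because its format is not described here.

```python
def structure_parameters(document_type="other", structure_type="tree", return_format="json"):
    """Collect the structure-parsing options into a dict of strings.

    The default values are assumptions for illustration, not confirmed defaults.
    """
    return {
        "document_type": document_type,
        "structure_type": structure_type,
        "return_format": return_format,
    }


# Override one option and keep the (assumed) defaults for the rest.
params = structure_parameters(structure_type="linear")
print(params)

# Possible usage with the dedoc Python package (not executed here, and the
# exact call should be checked against the Dedoc documentation):
# from dedoc import DedocManager
# result = DedocManager().parse(file_path="example.docx", parameters=params)
```

Keeping the options in one dictionary makes it easy to merge them with the other parameter groups listed in the following sections.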

Patterns for default structure extractor (document_type="other")

Attachments handling

with_attachments, need_content_analysis, recursion_deep_attachments, return_base64
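When parameters are sent as form fields, every value has to be a string. The sketch below assumes that boolean options are written as "true" and "false" and that recursion_deep_attachments takes an integer; the helper to_form_fields and the example values are hypothetical and only illustrate the conversion.

```python
def to_form_fields(options):
    """Convert Python values to the string form used for form fields.

    Booleans become "true"/"false" (an assumed convention); everything
    else is passed through str().
    """
    fields = {}
    for name, value in options.items():
        if isinstance(value, bool):
            fields[name] = "true" if value else "false"
        else:
            fields[name] = str(value)
    return fields


# Example values only; see the Dedoc documentation for the real defaults.
attachment_options = {
    "with_attachments": True,
    "need_content_analysis": False,
    "recursion_deep_attachments": 2,
    "return_base64": True,
}

fields = to_form_fields(attachment_options)
print(fields)
```

The bool check comes first because bool is a subclass of int in Python, so str(True) would otherwise produce "True" rather than the lowercase form.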

Tables handling

need_pdf_table_analysis, orient_analysis_cells, orient_cell_angle

PDF handling

pdf_with_text_layer, fast_textual_layer_detection, language, pages, is_one_column_document, document_orientation, need_header_footer_analysis, need_binarization
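The pages parameter selects part of a PDF. The sketch below assumes it is written as a "start:end" slice in which either bound may be omitted (so ":" means all pages); the helper page_slice is hypothetical, and the values "auto" and "eng" are assumptions that should be checked against the Dedoc documentation.

```python
def page_slice(first=None, last=None):
    """Build a 'start:end' page slice; either bound may be omitted."""
    if first is not None and first < 1:
        raise ValueError("first page must be 1 or greater")
    if first is not None and last is not None and last < first:
        raise ValueError("last page must not be before first page")
    start = "" if first is None else str(first)
    end = "" if last is None else str(last)
    return f"{start}:{end}"


# Example PDF options; the values are illustrative assumptions.
pdf_parameters = {
    "pdf_with_text_layer": "auto",
    "language": "eng",
    "pages": page_slice(2, 5),
    "need_header_footer_analysis": "true",
}
print(pdf_parameters["pages"])
```

Validating the bounds before building the string catches mistakes such as a reversed range before the document is sent for processing.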

Other formats handling

delimiter, encoding, handle_invisible_table


Useful links