Install and Configure the Plugin
Install the plugin
Log in to your Dify platform, navigate to Tools → Plugin Marketplace, search for SoMark, and add the plugin.
Configure plugin settings
After installation, open the plugin configuration page:
- Base URL: For SoMark API, fill in https://somark.tech/api/v1; for self-hosted deployment, fill in your local Base URL.
- API Key: Required for SoMark API; leave blank for self-hosted deployment.
No API Key yet? Go to the SoMark workbench to claim free quota.
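
The rule for these two settings can be sketched as a small check. This is only an illustration of the rule described above, not code from the plugin: the function name `check_plugin_settings` and the error messages are hypothetical, and the only value taken from this page is the hosted Base URL.

```python
from typing import List, Optional

# Hosted SoMark API endpoint, as given in the settings above.
HOSTED_BASE_URL = "https://somark.tech/api/v1"


def check_plugin_settings(base_url: str, api_key: Optional[str]) -> List[str]:
    """Hypothetical helper: return a list of problems (empty means consistent)."""
    problems = []
    url = base_url.strip().rstrip("/")
    if not url.startswith(("http://", "https://")):
        problems.append("Base URL must start with http:// or https://")
    # The API Key is required only when the hosted SoMark API is used;
    # a self-hosted deployment may leave it blank.
    if url == HOSTED_BASE_URL and not (api_key or "").strip():
        problems.append("API Key is required for the SoMark API")
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(check_plugin_settings("https://somark.tech/api/v1", ""))
    print(check_plugin_settings("http://localhost:8000/api/v1", None))
```
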
Using SoMark in a Workflow
Add the SoMark Document Parser tool node
In the Dify workflow editor, click + to add a new node, choose Tool, then find and add the SoMark > SoMark Document Parser node.

Configure input variables
Click the variable icon {x} in the File input field and select the file variable provided by an upstream node, such as sys.files from your Start node.
You can also set optional parameters (for example, Output Formats, Image Format, and Table Format) as needed. If you leave Output Formats empty, the node returns both Markdown and JSON. See Input Parameters below for details.
Base URL and API Key are injected automatically from the plugin configuration. You do not need to enter them in the node.
Parameters and Outputs
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| File | File | ✅ | Supported files: PDF, PNG, JPG, JPEG, BMP, TIFF, JP2, DIB, PPM, PGM, PBM, GIF, HEIC, HEIF, WEBP, XPM, TGA, DDS, XBM, DOC, DOCX, PPT, PPTX. Max 200 MB / 300 pages. |
| Output Formats | Multi-select | ❌ | Select one or more output formats. Supported options: Markdown, JSON. If left empty, the default outputs are Markdown and JSON. |
| Image Format | Single-select | ❌ | Image output format. Supported options: URL, Base64, None. Default: URL. |
| Formula Format | Single-select | ❌ | Formula output format. Supported options: LaTeX, MathML, ASCII. Default: LaTeX. |
| Table Format | Single-select | ❌ | Table output format. Supported options: HTML, Markdown, Image. Default: HTML. In Markdown mode, merged cells are expanded into individual cells with duplicated content. |
| Chemical Structure Formula Format | Single-select | ❌ | Chemical structure output format. Supported options: Image. Default: Image. |
| Enable Text Cross Page | True / False | ❌ | Merge text that spans across pages into a continuous paragraph. Default: False. |
| Enable Table Cross Page | True / False | ❌ | Merge tables that span across pages into a continuous table. Default: False. |
| Enable Title Level Recognition | True / False | ❌ | Recognize heading hierarchy such as H1/H2/H3. Default: False. |
| Enable Inline Image | True / False | ❌ | Return images embedded in text paragraphs. Default: False. |
| Enable Table Image | True / False | ❌ | Return images embedded in table cells. Default: True. |
| Enable Image Understanding | True / False | ❌ | Perform semantic understanding and structured description for images in the document. Default: True. |
| Keep Header Footer | True / False | ❌ | Keep page headers and footers instead of filtering them out. Default: False. |
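
The Table Format note about Markdown mode (merged cells expanded into individual cells with duplicated content) can be illustrated with a small sketch. This is not SoMark's implementation: the input shape, a list of rows whose cells are `(text, rowspan, colspan)` tuples, and both function names are assumptions made for the example.

```python
def expand_merged_cells(cells):
    """Expand merged cells into a full grid, duplicating their content.

    `cells` is a list of rows; each cell is (text, rowspan, colspan).
    This input shape is hypothetical, not SoMark's internal format.
    """
    grid = {}  # (row, col) -> text
    for r, row in enumerate(cells):
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already filled by a cell spanning down from above.
            while (r, c) in grid:
                c += 1
            # Copy the same text into every position the merged cell covers.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


def to_markdown(grid):
    """Render a rectangular grid as a Markdown table; the first row is the header."""
    header, *body = grid
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


if __name__ == "__main__":
    # "Sales" spans two columns and "North" spans two rows.
    table = [
        [("Region", 1, 1), ("Sales", 1, 2)],
        [("North", 2, 1), ("Q1", 1, 1), ("10", 1, 1)],
        [("Q2", 1, 1), ("12", 1, 1)],
    ]
    print(to_markdown(expand_merged_cells(table)))
```

In the sample table, "Sales" appears in two header cells and "North" appears in both body rows after expansion. If the merged-cell structure matters downstream, the HTML option keeps it, since Markdown tables have no syntax for spans.
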
Output Variables
| Variable | Description |
|---|---|
| markdown | Parsed document content in Markdown format, preserving the original layout including headings, tables, lists, equations, and images |
| json_str | Parsed result as a JSON string, containing structured data such as text blocks, tables, equations, images, bounding boxes, and page numbers. Parse it in a code node for advanced processing |
| text | Dify built-in variable. This plugin does not populate it |
| files | Dify built-in variable. This plugin does not populate it |
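
As a starting point for parsing json_str in a code node, the sketch below assumes Dify's usual Python Code node convention: a function named `main` whose arguments are the mapped input variables and whose returned dict keys match the output variables you declare on the node. The schema of the SoMark JSON is not described on this page, so the code only inspects the top level and does not assume any field names; the output keys (`is_valid`, `top_level_type`, `top_level_keys`, `item_count`) are illustrative names.

```python
import json


def main(json_str: str) -> dict:
    """Code node entry point: map the parser's json_str output to this argument."""
    try:
        parsed = json.loads(json_str)
        is_valid = True
    except (json.JSONDecodeError, TypeError):
        parsed = None
        is_valid = False

    # Only the top level is inspected, because the JSON schema is not assumed here.
    is_container = isinstance(parsed, (dict, list))
    return {
        "is_valid": is_valid,
        "top_level_type": type(parsed).__name__ if is_valid else "invalid",
        "top_level_keys": sorted(parsed.keys()) if isinstance(parsed, dict) else [],
        "item_count": len(parsed) if is_container else 0,
    }
```

Running this once against a real document shows which keys the JSON actually contains, and the node can then be extended to pull out the fields you need.
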


