POST /parse/sync
Python
import json
import requests

url = "https://somark.tech/api/v1/parse/sync"

data = {
    "output_formats": ["markdown", "json"],
    "api_key": "sk-***",
    "element_formats": json.dumps({
        "image": "url",
        "formula": "latex",
        "table": "html",
        "cs": "image",
    }),
    "feature_config": json.dumps({
        "enable_text_cross_page": False,
        "enable_table_cross_page": False,
        "enable_title_level_recognition": False,
        "enable_inline_image": False,
        "enable_table_image": True,
        "enable_image_understanding": True,
        "keep_header_footer": False,
    }),
}

# Open the file in a context manager so the handle is closed after the upload.
with open("example.pdf", "rb") as f:
    files = {"file": ("example.pdf", f)}
    response = requests.post(url, data=data, files=files)

print(response.json())
{
  "code": 0,
  "message": "任务成功",
  "data": {
    "task_id": "a1b2c3d4e5f6",
    "error": null,
    "metadata": {
      "page_num": 10,
      "file_type": ".pdf"
    },
    "result": {
      "file_name": "document.pdf",
      "outputs": {
        "markdown": "# 第一章 引言\n\n本文档介绍了...",
        "json": {
          "pages": [
            {
              "page_num": 0,
              "blocks": [
                {
                  "idx": 0,
                  "type": "title",
                  "bbox": [
                    72,
                    50,
                    540,
                    80
                  ],
                  "content": "第一章 引言",
                  "format": "text",
                  "captions": [],
                  "img_url": "",
                  "title_level": 1
                },
                {
                  "idx": 1,
                  "type": "text",
                  "bbox": [
                    72,
                    100,
                    540,
                    200
                  ],
                  "content": "本文档介绍了...",
                  "format": "text",
                  "captions": [],
                  "img_url": ""
                }
              ],
              "page_size": {
                "h": 1684,
                "w": 1190
              },
              "merge_content_from_pre_page": false
            }
          ]
        }
      }
    }
  }
}
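The json output in the sample response above nests blocks inside pages. A minimal sketch of walking that structure follows. It assumes only the fields visible in the sample (pages, page_num, blocks, idx, type, content); the helper name iter_blocks is hypothetical and not part of the API.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_blocks(parsed: Dict[str, Any]) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (page_num, block) pairs from a response shaped like the sample.

    Hypothetical helper: it relies only on fields shown in the sample response.
    """
    pages = parsed["data"]["result"]["outputs"]["json"]["pages"]
    for page in pages:
        # Sort by idx so blocks come out in the order the sample numbers them.
        for block in sorted(page["blocks"], key=lambda b: b["idx"]):
            yield page["page_num"], block


# A trimmed copy of the sample response, kept small for illustration.
sample = {
    "code": 0,
    "data": {
        "result": {
            "outputs": {
                "json": {
                    "pages": [
                        {
                            "page_num": 0,
                            "blocks": [
                                {"idx": 0, "type": "title", "content": "第一章 引言", "title_level": 1},
                                {"idx": 1, "type": "text", "content": "本文档介绍了..."},
                            ],
                        }
                    ]
                }
            }
        }
    },
}

# Collect the content of every title block.
titles = [block["content"] for _, block in iter_blocks(sample) if block["type"] == "title"]
```

The same loop can filter on any other type value, for example to gather only text blocks.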
Path change: This endpoint path has been changed from /extract/acc_sync to /parse/sync. The old path will be discontinued on December 31, 2026. Please migrate to the new path before then.

Parameter change: extract_config has been renamed to feature_config. Please replace extract_config with feature_config in your requests.
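For callers that still build requests the old way, the two renames above can be applied mechanically. The following is a sketch only: migrate_request is a hypothetical helper, and the old full URL in the usage line is assumed to share the same base as the new one.

```python
from typing import Any, Dict, Tuple

OLD_PATH = "/extract/acc_sync"
NEW_PATH = "/parse/sync"


def migrate_request(url: str, data: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (url, data) rewritten for the new path and parameter name."""
    new_url = url.replace(OLD_PATH, NEW_PATH)
    new_data = dict(data)  # shallow copy so the caller's dict is left untouched
    if "extract_config" in new_data:
        old_value = new_data.pop("extract_config")
        # Keep an explicit feature_config if the caller already set one.
        new_data.setdefault("feature_config", old_value)
    return new_url, new_data


# Assumed old URL: same base as the new endpoint, with the old path.
new_url, new_data = migrate_request(
    "https://somark.tech/api/v1/extract/acc_sync",
    {"api_key": "sk-***", "extract_config": "{}"},
)
```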
output_formats

| Available output formats | Default | Description |
| --- | --- | --- |
| json / markdown / zip | ["markdown", "json"] | Multiple selections supported. Uses the default when omitted. zip packages the Markdown output and all image files into an archive. When output_formats includes zip, element_formats.image must be file. |
element_formats

| Field | Available output formats | Default | Description |
| --- | --- | --- | --- |
| image | url / base64 / file / none | url | Single selection only. When image is set to file, output_formats must include zip. none means images are not returned. |
| formula | latex / mathml / ascii | latex | Single selection only. Specifies the output format for formulas. |
| table | markdown / html / image | html | Single selection only. In markdown mode, merged cells are automatically split into independent cells and filled with the same content. |
| cs | image | image | Single selection only. Output format for chemical structures; smiles format is coming soon. |
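The pairing rule stated above (zip requires image set to file, and image set to file requires zip) can be checked on the client before sending a request. This is a minimal sketch; check_zip_image_rule is a hypothetical helper, and how the server itself reports a violation is not documented here.

```python
from typing import Dict, List


def check_zip_image_rule(output_formats: List[str], element_formats: Dict[str, str]) -> None:
    """Raise ValueError if the zip / image=file pairing is violated."""
    wants_zip = "zip" in output_formats
    # "url" is the documented default when image is omitted.
    image_is_file = element_formats.get("image", "url") == "file"
    if wants_zip and not image_is_file:
        raise ValueError('output_formats includes "zip", so element_formats.image must be "file"')
    if image_is_file and not wants_zip:
        raise ValueError('element_formats.image is "file", so output_formats must include "zip"')


# Valid combinations pass silently.
check_zip_image_rule(["markdown", "json"], {"image": "url"})
check_zip_image_rule(["markdown", "zip"], {"image": "file"})
```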

feature_config

| Field | Default | Description |
| --- | --- | --- |
| enable_text_cross_page | false | Cross-page text merging: merge text blocks spanning pages into continuous paragraphs |
| enable_table_cross_page | false | Cross-page table merging: merge tables spanning pages into a single table |
| enable_title_level_recognition | false | Heading level recognition: detect document heading hierarchy (H1/H2/H3…) |
| enable_inline_image | false | Inline images: return images inside text paragraphs |
| enable_table_image | true | Images in tables: return images inside table cells |
| enable_image_understanding | true | Image understanding: perform semantic understanding and structured description of document images |
| keep_header_footer | false | Keep headers and footers: headers and footers are filtered by default; enable this if you need to preserve them |
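Since feature_config is sent as a JSON string (as in the request example), one way to build it is to start from the documented defaults and override only what changes. The sketch below assumes that approach; build_feature_config and FEATURE_DEFAULTS are hypothetical names, and whether the server accepts a partial object is not stated here, so the full set is always sent.

```python
import json

# Defaults copied from the feature_config field list above.
FEATURE_DEFAULTS = {
    "enable_text_cross_page": False,
    "enable_table_cross_page": False,
    "enable_title_level_recognition": False,
    "enable_inline_image": False,
    "enable_table_image": True,
    "enable_image_understanding": True,
    "keep_header_footer": False,
}


def build_feature_config(**overrides: bool) -> str:
    """Return a JSON string of the defaults with the given overrides applied."""
    unknown = set(overrides) - set(FEATURE_DEFAULTS)
    if unknown:
        # Catch misspelled field names before the request is sent.
        raise KeyError(f"unknown feature_config fields: {sorted(unknown)}")
    return json.dumps({**FEATURE_DEFAULTS, **overrides})


# Example: merge tables across pages and keep headers and footers.
feature_config = build_feature_config(enable_table_cross_page=True, keep_header_footer=True)
```

The resulting string can be passed as the feature_config form field.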
If you need auth, usage limits, or sync vs async guidance, read the API overview first. For large files and batch jobs, switch to Async parsing — Submit Task.

Body

multipart/form-data
file
file
required

The file to parse. Supported formats: PDF, images, Word, PPT, and Excel.

api_key
string
required

API key, in the format sk-***

output_formats
enum<string>[]

Output formats; multiple selections supported. Defaults to ["markdown", "json"] when omitted. Supports json / markdown / zip, where zip packages all output files into an archive.

Available options: json, markdown, zip

element_formats
object

Element format configuration; controls the output format of each element type.

feature_config
object

Feature configuration (this parameter has been renamed from extract_config to feature_config).

Response

200 - application/json

Parsing succeeded

code
integer

Status code. 0 indicates success; any non-zero value is an error code.

Example:

0

message
string
Example:

"任务成功"

data
object
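Because a code of 0 means success and any other value is an error code, a caller will usually check code before reading data. The following is a minimal sketch of that check; ParseError and unwrap are hypothetical names, and the specific error codes are not listed here.

```python
from typing import Any, Dict


class ParseError(RuntimeError):
    """Hypothetical exception type for a non-zero response code."""


def unwrap(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return body["data"] when code is 0, otherwise raise ParseError."""
    code = body.get("code")
    if code != 0:
        raise ParseError(f"parse failed: code={code!r}, message={body.get('message')!r}")
    return body["data"]


# Example with a body shaped like the sample response.
data = unwrap({"code": 0, "message": "任务成功", "data": {"task_id": "a1b2c3d4e5f6"}})
```

With requests, the body would typically come from response.json().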