somark-py and somark-js are independent implementations: each follows its own language ecosystem, async model, and engineering conventions, without glue code or unnecessary dependencies.
somark-py
Python implementation. One package provides both the SDK and the somark CLI.
somark-js
JavaScript / TypeScript implementation. One package provides both the SDK and the somark CLI.
One capability set, two entry points
CLI
Best for terminals, scripts, CI, agent platforms, and every “process this file now” scenario.
SDK
Best for product features, backend services, async jobs, data pipelines, and engineering systems that need to run reliably over time.
Sync principle
The SoMark API is the highest-priority source of truth. Whatever the API supports, the CLI and SDK follow. The CLI does not create a separate workflow, and the SDK does not invent a parallel universe. The Python and JavaScript implementations keep parsing capabilities fully aligned, but some derived capabilities differ because of ecosystem differences:
- can only be imported from the JS SDK because SoMarkDown itself is implemented in JavaScript. SoMarkDown Preview is available in both Python and JS workflows.
- PDF processing reaches its broadest capability range on the Python side because it depends on , whose implementation and lower-level dependencies are more complete in Python.
When to use CLI
Agent platforms
Use somark as a clean external tool in Claude Code, Codex, OpenClaw, and similar environments.
Terminal batches
Scan folders, run scripts, and process batches of papers, reports, receipts, or contracts without building an app first.
One-off conversion
Temporarily convert files to Markdown, JSON, SoMarkDown, or ZIP, then move on with the result.
Automation and diagnostics
Put it in CI, scheduled jobs, local preview, usage checks, and installation diagnostics. This is what command lines are good at.
When to use SDK
In-product parsing
After users upload files, return readable, searchable, and reusable structured content inside your app.
Backend services
Connect parsing to APIs, workers, webhooks, or internal services so documents become part of your system.
Async queue control
Use it when large files, batch files, or long-running tasks need fine-grained async control after entering a queue.
Data pipelines
Parse, clean, store, search, and vectorize content with a deep SoMark integration, so unstructured files become usable data.
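The "Async queue control" case above can be sketched in general terms. The following is a minimal illustration of bounded concurrency with Python's standard asyncio, not SoMark's actual API: `parse_all`, `parse_one`, `max_concurrency`, and `fake_parse` are hypothetical names, and `fake_parse` is only a stand-in for whatever async parse call the SDK really provides.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple, Union

Result = Union[str, Exception]


async def parse_all(
    paths: Iterable[str],
    parse_one: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
) -> Dict[str, Result]:
    """Run parse_one over paths with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(path: str) -> Tuple[str, Result]:
        async with semaphore:
            try:
                return path, await parse_one(path)
            except Exception as exc:
                # Record the failure so one bad file does not abort the batch.
                return path, exc

    # gather returns results in the same order as the input paths.
    pairs = await asyncio.gather(*(guarded(p) for p in paths))
    return dict(pairs)


async def fake_parse(path: str) -> str:
    # Hypothetical stand-in for a real SDK call; replace with the actual one.
    await asyncio.sleep(0.01)
    if path.endswith(".bad"):
        raise ValueError(f"cannot parse {path}")
    return f"# parsed {path}"
```

Calling `asyncio.run(parse_all(["a.pdf", "b.bad"], fake_parse, max_concurrency=2))` returns a dict that maps each path to either its parsed text or the exception it raised, so the caller can retry or log failures per file.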

