For the complete documentation index, see llms.txt. This page is also available as Markdown.

SmartConnector SQL Processing

Overview

The SQL processor is where raw input data can be reshaped before it enters the typed variable and load step Workflow. It runs inside an isolated, ClickHouse-powered environment and supports complex transformations including joins, window functions, aggregations, and lookups against live Kizen data. The goal is to ensure that what reaches variable mapping is already clean, correctly structured, and ready for matching and loading.

What SQL Processing Does

SQL processing is a transformation layer between input data and execution variables. Input rows flow into queryable tables, the SQL script transforms them, and the results are written to named output tables that downstream steps read from.

SQL processing is required for some trigger types and optional for others. Multi-page spreadsheets always require SQL processing.

When SQL is optional, common reasons to use it include:

  • Multi-Object loads requiring separate output tables per destination

  • Webhook payloads that need parsing before variable mapping

  • Any scenario requiring joins against existing Kizen Records

The full range of ClickHouse analytical SQL is available, including joins, window functions, CTEs, and aggregations.


CHDB Runtime Versions

SmartConnectors run SQL against an embedded ClickHouse runtime called CHDB. Multiple versions of CHDB are available, and the version is selected per SmartConnector.

  • New SmartConnectors: use the latest available version.

  • Existing SmartConnectors: stay on the version the SmartConnector was built against. Upgrading CHDB mid-life can introduce behavioral differences that surface as regressions in production.


Provided Input Tables

The tables available in the SQL environment depend on the SmartConnector type and the SmartConnector's configuration.

  • File upload SmartConnectors expose the uploaded file as a queryable table. Multi-page workbooks (CSV, XLS, XLSX) are supported and split into separate input tables per sheet.

  • Webhook SmartConnectors ingest incoming webhook payloads in batches on a schedule (for example, every 5 minutes). Each batch is exposed in SQL through two tables: input.webhooks, which contains typed payload data for straightforward parsing, and input.webhooks_raw, which contains the original payload as raw strings for cases where manual parsing is needed.

  • Reference data makes existing Kizen data queryable from inside SQL, enabling lookups and joins against live Records. Reference data is configured separately on the SmartConnector.

  • External data sources, including external databases (Postgres, MySQL), S3, and URL/HTTP endpoints, are available in all SmartConnector types. They are accessed from within the SQL script using ClickHouse's built-in table functions, not provided as input tables to the pipeline.

    A common pattern is to use a scheduled SmartConnector that reads from an external source directly in its SQL.

For the full list of available functions, see the ClickHouse table functions reference.


Output Tables

SQL must produce one or more named output tables in the output database. Each output database corresponds to a variable scope, and the tables within it become the data source for that scope's execution variables.

The required syntax is:

A single SQL script can declare multiple output tables (output.vendors, output.products, output.invoices, and so on). Each table is independently mapped to execution variables.

Never use SELECT * in an output table: Column order in SELECT * is not guaranteed, and a silent reordering between runs has caused real production data integrity issues; values landing in the wrong fields on live Records, with no error surfaced. Always name columns explicitly in every output table, even when selecting every column.


SQL Parameters

Parameters are static values defined on the SmartConnector and made available inside SQL. They are useful for thresholds, feature flags, lookup keys, and any configuration value that should be adjustable without editing the SQL script itself.

Parameters are referenced in SQL via the meta.parameters table. A typical pattern is to join or cross-join against meta.parameters to pull a value into the query, or to read a parameter value in a WHERE clause for filtering.

For example, to retrieve a parameter value as an integer:


Integration Secrets

Integration secrets, including API keys, database passwords, and tokens, can be stored securely on the SmartConnector and referenced in SQL without exposing their values. Integration secrets are masked in execution logs, so they will not leak through test runs, run reports, or error output.

Integration secrets are referenced in SQL using ClickHouse's named collection pattern inside table functions. Rather than passing a credential directly as an argument to a function like postgresql() or s3(), the named collection holding the secret is referenced by name, and ClickHouse substitutes the credential at execution time.


Writing and Testing SQL

SQL scripts have two states: Draft (the working copy being edited and tested) and Live (the published version used in production runs). Draft changes have no effect on live behavior until promoted.

Saving a SQL change disables the SmartConnector until it is re-enabled. This is intentional. It forces the related configuration changes (variable declarations, mapping rules, load steps) to be threaded through before the SmartConnector runs against live data again.

Test executions run the Draft SQL against the reference or sample file and stop after SQL output is produced. No Records are created, updated, or deleted in Kizen. SQL errors (syntax, type mismatches, missing columns, runtime failures) surface in the test result before any data is written.

The template SQL generated when a SmartConnector is first created is always executable as-is. Regardless of reference data, parameters, or secrets configured, it's a valid starting point that can be run immediately.

The Live tab shows the SQL output from the most recent test execution promoted to Live. This is useful for auditing which sample data was used when the current Live script was published.


Reference Data

Reference data makes existing Kizen Records queryable from inside SQL. Once configured, the selected Records appear as a table in the SQL environment and can be joined against input data, used for lookups, or referenced in conditional logic.

  • Filter groups: a saved filter group can be applied to limit which Kizen Records are included. This is the recommended way to keep reference data scoped to just the Records relevant to the SmartConnector.

  • Column selection: the reference data configuration limits which fields are exposed to SQL. Fields are referenced by their API name, not their display label.

  • Use cases: lookups against existing entities, joins to enrich input rows with live Record IDs, conditional branching based on the current state of Records in Kizen.

Reference data with large datasets significantly increases test execution time. Every test run pulls the reference data fresh, and large filter groups can push test cycles from seconds to minutes. When iterating on SQL against large reference datasets, use the dev package for local testing to avoid repeatedly paying that cost in the UI.


SQL Output Download

The full set of output tables can be downloaded as a zip of CSV files during or after a run, including before the full run report is available. Each output table is exported as its own CSV.

This is particularly useful for long-running SmartConnectors, where waiting for the full run report is not practical. Downloading mid-execution allows the SQL output to be inspected while the load step is still processing, which shortens the feedback loop when investigating issues.


Timeline / SQL Version History

Every saved change to a SQL script is recorded in the SmartConnectors's timeline, with both the old and new script versions captured in the entry. Only the SQL script itself is versioned this way; other SmartConnector configuration (settings, load steps, variable mappings) is not captured in these Timeline entries.

If a script is accidentally overwritten or a recent change needs to be rolled back, the previous version can be recovered directly from the Timeline. No version is lost when a script is saved over.

Note: These timeline entries are currently accessible only through the backend APIs, not the SmartConnector UI.


Security Model

SQL runs in an isolated environment without access to Kizen's internal infrastructure or platform credentials. The runner has internet access and can read from and write to any internet-accessible database or service that the SQL connects to. Input and output files are virus-scanned before reaching the load step.

A SmartConnector's SQL can only touch the data explicitly provided to it such as input files, webhook batches, reference data, along with any external sources it explicitly connects to.


What's Next?

With SQL processing configured, you can shape input data into the structure your downstream steps need. Continue to SmartConnector External Data Sources to learn how to connect to external databases, S3, and HTTP endpoints from within your SQL using ClickHouse table functions.

Last updated

Was this helpful?