# SmartConnector SQL Processing

{% hint style="success" %}
**Audience**: Developers, Integrators, and Solution Architects

**Purpose**: Explains how the ClickHouse-powered SQL processor works, what inputs and outputs it supports, and how to write, test, and maintain SQL reliably
{% endhint %}

## Overview

The SQL processor is where raw input data can be reshaped before it enters the typed variable and load step <code class="expression">space.vars.workflow</code>. It runs inside an isolated, ClickHouse-powered environment and supports complex transformations including joins, window functions, aggregations, and lookups against live <code class="expression">space.vars.Kizen\_company\_name</code> data. The goal is to ensure that what reaches variable mapping is already clean, correctly structured, and ready for matching and loading.

### What SQL Processing Does

SQL processing is a transformation layer between input data and execution variables. Input rows flow into queryable tables, the SQL script transforms them, and the results are written to named output tables that downstream steps read from.

SQL processing is required for some trigger types and optional for others. Multi-page spreadsheets always require SQL processing.

When SQL is optional, common reasons to use it include:

* Multi-<code class="expression">space.vars.object</code> loads requiring separate output tables per destination
* Webhook payloads that need parsing before variable mapping
* Any scenario requiring joins against existing <code class="expression">space.vars.Kizen\_company\_name</code> <code class="expression">space.vars.entities</code>

The full range of ClickHouse analytical SQL is available, including joins, window functions, CTEs, and aggregations.

***

## CHDB Runtime Versions

<code class="expression">space.vars.smartconnectors</code> run SQL against an embedded ClickHouse runtime called CHDB. Multiple versions of CHDB are available, and the version is selected per <code class="expression">space.vars.smartconnector</code>.

* **New SmartConnectors:** use the latest available version.
* **Existing SmartConnectors:** stay on the version the <code class="expression">space.vars.smartconnector</code> was built against. Upgrading CHDB mid-life can introduce behavioral differences that surface as regressions in production.

{% hint style="warning" %}
**Caution**: Avoid runtime versions prior to 3.1. Out-of-bounds date handling changed significantly at 3.1: earlier versions return `null` for any date outside the supported range, making dates too far in the future and dates too far in the past indistinguishable. From version 3.1 onward, the runtime distinguishes between the two cases, which materially changes how downstream logic can handle invalid dates. Version 3.1 is the minimum recommended runtime.
{% endhint %}

***

## Provided Input Tables

The tables available in the SQL environment depend on the <code class="expression">space.vars.smartconnector</code> type and the <code class="expression">space.vars.smartconnector</code>'s configuration.

* **File upload SmartConnectors** expose the uploaded file as a queryable table. Multi-page workbooks (CSV, XLS, XLSX) are supported and split into separate input tables per sheet.
* **Webhook SmartConnectors** ingest incoming webhook payloads in batches on a schedule (for example, every 5 minutes). Each batch is exposed in SQL through two tables: `input.webhooks`, which contains typed payload data for straightforward parsing, and `input.webhooks_raw`, which contains the original payload as raw strings for cases where manual parsing is needed.
* **Reference data** makes existing <code class="expression">space.vars.Kizen\_company\_name</code> data queryable from inside SQL, enabling lookups and joins against live <code class="expression">space.vars.entities</code>. Reference data is configured separately on the <code class="expression">space.vars.smartconnector</code>.
* **External data sources**, including external databases (Postgres, MySQL), S3, and URL/HTTP endpoints, are available in all <code class="expression">space.vars.smartconnector</code> types. They are accessed from within the SQL script using ClickHouse's built-in table functions, not provided as input tables to the pipeline.

  A common pattern is to use a scheduled <code class="expression">space.vars.smartconnector</code> that reads from an external source directly in its SQL.

For the full list of available functions, see the [ClickHouse table functions reference](https://clickhouse.com/docs/sql-reference/table-functions).

***

## Output Tables

SQL must produce one or more named output tables in the `output` database. Each output database corresponds to a variable scope, and the tables within it become the data source for that scope's execution variables.

The required syntax is:

```sql
CREATE TABLE output.vendors ENGINE Log() AS
SELECT
    vendor_id,
    vendor_name,
    region_code
FROM input.file;
```

A single SQL script can declare multiple output tables (`output.vendors`, `output.products`, `output.invoices`, and so on). Each table is independently mapped to execution variables.

**Never use `SELECT *` in an output table:** Column order in `SELECT *` is not guaranteed, and a silent reordering between runs has caused real production data integrity issues; values landing in the wrong fields on live <code class="expression">space.vars.entities</code>, with no error surfaced. **Always name columns explicitly** in every output table, even when selecting every column.

***

## SQL Parameters

Parameters are static values defined on the <code class="expression">space.vars.smartconnector</code> and made available inside SQL. They are useful for thresholds, feature flags, lookup keys, and any configuration value that should be adjustable without editing the SQL script itself.

Parameters are referenced in SQL via the `meta.parameters` table. A typical pattern is to join or cross-join against `meta.parameters` to pull a value into the query, or to read a parameter value in a `WHERE` clause for filtering.

For example, to retrieve a parameter value as an integer:

```sql
select toInt32(value)
from meta.parameters
where name = 'max_results';
```

***

## Integration Secrets

Integration secrets, including API keys, database passwords, and tokens, can be stored securely on the <code class="expression">space.vars.smartconnector</code> and referenced in SQL without exposing their values. Integration secrets are masked in execution logs, so they will not leak through test runs, run reports, or error output.

Integration secrets are referenced in SQL using ClickHouse's [named collection](https://clickhouse.com/docs/operations/named-collections) pattern inside [table functions](https://clickhouse.com/docs/sql-reference/table-functions). Rather than passing a credential directly as an argument to a function like `postgresql()` or `s3()`, the named collection holding the secret is referenced by name, and ClickHouse substitutes the credential at execution time.

***

## Writing and Testing SQL

SQL scripts have two states: **Draft** (the working copy being edited and tested) and **Live** (the published version used in production runs). Draft changes have no effect on live behavior until promoted.

**Saving a SQL change disables the SmartConnector** until it is re-enabled. This is intentional. It forces the related configuration changes (variable declarations, mapping rules, load steps) to be threaded through before the <code class="expression">space.vars.smartconnector</code> runs against live data again.

**Test executions** run the Draft SQL against the reference or sample file and stop after SQL output is produced. No <code class="expression">space.vars.entities</code> are created, updated, or deleted in <code class="expression">space.vars.Kizen\_company\_name</code>. SQL errors (syntax, type mismatches, missing columns, runtime failures) surface in the test result before any data is written.

**The template SQL** generated when a <code class="expression">space.vars.smartconnector</code> is first created is always executable as-is. Regardless of reference data, parameters, or secrets configured, it's a valid starting point that can be run immediately.

**The Live tab** shows the SQL output from the most recent test execution promoted to Live. This is useful for auditing which sample data was used when the current Live script was published.

***

## Reference Data

Reference data makes existing <code class="expression">space.vars.Kizen\_company\_name</code> <code class="expression">space.vars.entities</code> queryable from inside SQL. Once configured, the selected <code class="expression">space.vars.entities</code> appear as a table in the SQL environment and can be joined against input data, used for lookups, or referenced in conditional logic.

* **Filter groups:** a saved filter group can be applied to limit which <code class="expression">space.vars.Kizen\_company\_name</code> <code class="expression">space.vars.entities</code> are included. This is the recommended way to keep reference data scoped to just the <code class="expression">space.vars.entities</code> relevant to the <code class="expression">space.vars.smartconnector</code>.
* **Column selection:** the reference data configuration limits which fields are exposed to SQL. Fields are referenced by their **API name**, not their display label.
* **Use cases:** lookups against existing entities, joins to enrich input rows with live <code class="expression">space.vars.entity</code> IDs, conditional branching based on the current state of <code class="expression">space.vars.entities</code> in <code class="expression">space.vars.Kizen\_company\_name</code>.

**Reference data with large datasets significantly increases test execution time.** Every test run pulls the reference data fresh, and large filter groups can push test cycles from seconds to minutes. When iterating on SQL against large reference datasets, use the dev package for local testing to avoid repeatedly paying that cost in the UI.

***

## SQL Output Download

The full set of output tables can be downloaded as a zip of CSV files during or after a run, including before the full run report is available. Each output table is exported as its own CSV.

This is particularly useful for long-running <code class="expression">space.vars.smartconnectors</code>, where waiting for the full run report is not practical. Downloading mid-execution allows the SQL output to be inspected while the load step is still processing, which shortens the feedback loop when investigating issues.

***

## Timeline / SQL Version History

Every saved change to a SQL script is recorded in the <code class="expression">space.vars.smartconnectors</code>'s timeline, with both the old and new script versions captured in the entry. Only the SQL script itself is versioned this way; other <code class="expression">space.vars.smartconnector</code> configuration (settings, load steps, variable mappings) is not captured in these <code class="expression">space.vars.timeline</code> entries.

If a script is accidentally overwritten or a recent change needs to be rolled back, the previous version can be recovered directly from the <code class="expression">space.vars.timeline</code>. No version is lost when a script is saved over.

{% hint style="info" %}
**Note**: These timeline entries are currently accessible only through the backend APIs, not the <code class="expression">space.vars.smartconnector</code> UI.
{% endhint %}

***

## Security Model

SQL runs in an isolated environment without access to <code class="expression">space.vars.Kizen\_company\_name</code>'s internal infrastructure or platform credentials. The runner has internet access and can read from and write to any internet-accessible database or service that the SQL connects to. Input and output files are virus-scanned before reaching the load step.

A <code class="expression">space.vars.smartconnector</code>'s SQL can only touch the data explicitly provided to it such as input files, webhook batches, reference data, along with any external sources it explicitly connects to.

***

## What's Next?

With SQL processing configured, you can shape input data into the structure your downstream steps need. Continue to [SmartConnector External Data Sources](/docs/concepts/smartconnectors/smartconnector-external-data-sources.md) to learn how to connect to external databases, S3, and HTTP endpoints from within your SQL using ClickHouse table functions.

<details>

<summary>Related Topics</summary>

* [SmartConnector Execution Variables](/docs/concepts/smartconnectors/smartconnector-execution-variables.md)
* [SmartConnector Load Steps](/docs/concepts/smartconnectors/smartconnector-load-steps.md)
* [SmartConnector Diff Checking](/docs/concepts/smartconnectors/smartconnector-diff-checking.md)
* [Running a SmartConnector](/docs/concepts/smartconnectors/running-a-smartconnector.md)

</details>


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://developer.kizen.com/docs/concepts/smartconnectors/smartconnector-sql-processing.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
