For the complete documentation index, see llms.txt. This page is also available as Markdown.

SmartConnector External Data Sources

Overview

SmartConnectors can query external databases and cloud storage directly from the SQL processing environment. This allows a Workflow to pull reference data from a PostgreSQL or MySQL database, read a file from an S3 bucket, or fetch a remote CSV over HTTP without uploading a file to the SmartConnector. External data source connections are batch-oriented and intended for scheduled or event-driven Workflows that need to enrich, join, or transform data against systems outside of Kizen.

Credentials for external sources are stored as JSON-encoded integration secrets and referenced from SQL by name. Sensitive values are never written directly into a SQL statement.


Supported External Sources

The following external data source connections are supported within the SmartConnector SQL environment. Each is accessed through a ClickHouse table function referenced directly in SQL. For full reference on each function's arguments and behavior, see the ClickHouse documentation linked in each section.

PostgreSQL

Query a PostgreSQL database using the postgresql() table function, referencing an integration secret that holds the connection parameters.

SELECT *
FROM postgresql(my_postgres_secret, table='customers');

See Configuring Credentials below for the JSON shape of the secret.

MySQL

Query a MySQL database using the mysql() table function, referencing an integration secret. The MySQL function uses the same named-collection pattern as Postgres.

S3

Read Objects from an S3 bucket using the s3() table function. Public buckets require the NOSIGN parameter so ClickHouse does not attempt to sign the request with credentials.

For private buckets, store AWS credentials as an integration secret and reference it in the table function.

URL / HTTP

Fetch data from an HTTP or HTTPS endpoint using the url() table function. CSV and other common tabular formats are supported.


Configuring Credentials

Credentials for external data sources are stored as JSON-encoded integration secrets and referenced from SQL. This keeps sensitive values out of Workflow definitions, out of dry run reports, and out of execution history.

The configuration flow has three steps: identify the parameters the table function needs, encode them as JSON, and save that JSON as an integration secret your SQL can reference.

1

Identify the required parameters

Each ClickHouse table function takes a defined set of connection parameters. For PostgreSQL, the parameters are:

Parameter
Description

host

Hostname or IP of the PostgreSQL server

port

Port number (typically 5432)

database

Database name

user

Username with read access to the target table

password

Password for the user

schema

(Optional) Schema name; defaults to public

For MySQL parameters, see the mysql() table function reference.

2

Encode the parameters as JSON

Format the connection parameters as a JSON Object.

3

Save as an integration secret

Save the JSON as an integration secret in Kizen so it can be referenced by name from SQL. Once saved, the secret name is what you reference in your SQL table function. See the per-source examples above.

Note: Credentials stored as integration secrets are masked in execution logs and never appear in SQL output, dry run reports, XLS output reports, or execution history. Hard-coded credentials in SQL are not supported.


Known Limitations

Kafka is not supported: Kafka integration cannot be used within the SmartConnector SQL environment because the required SASL cryptography libraries are not available in the Kizen environment. Kafka connector code that works outside of Kizen will not execute successfully inside a SSmartConnector.

Streaming data sources are out of scope for SmartConnectors: SmartConnectors are batch-oriented by design and are not intended to consume continuous event streams. Workflows that require streaming ingestion should be handled outside of the SmartConnector framework.

Security Considerations

External connections rely on integration secrets for credential management. Credentials are masked in execution logs and are never exposed in SQL output, so sensitive values do not appear in dry run reports, XLS output reports, or execution history.

Credentials must never be written directly into SQL.


What's Next

Once SQL is reading from the right inputs, whether from input tables or an external source, the next step in the Workflow is declaring how that output flows downstream. Execution Variables are the typed contract between SQL output and the rest of the Workflow. Continue to SmartConnector Execution Variables.

Last updated

Was this helpful?