# SmartConnector Diff Checking

{% hint style="success" %}
**Audience**: Admins, Developers, Solution Architects

**Purpose**: Explain what diff checking is, how it works, how to enable and disable it, and when its built-in behavior is not appropriate for a given use case.
{% endhint %}

## Overview

Diff checking is an optional optimization that prevents <code class="expression">space.vars.smartconnectors</code> from re-processing rows that have not changed since the previous run. In recurring imports, diff checking often speeds up processing by 10x or more, but understanding exactly what it compares, and what it does not, is essential to using it correctly.

### What Is Diff Checking?

Diff checking is a mechanism that compares each incoming row against the previous run's output before deciding whether to send that row to the load step. If the contents of a row are the same as the last run, it is skipped. Only rows whose contents have changed are sent through to the load step.

This is most valuable for recurring imports that ingest complete dataset files on a schedule. When most rows are identical from run to run, diff checking eliminates the overhead of re-processing thousands of unchanged records.

One important detail: diff checking occurs *after* variable resolution. The hash is computed on the mapped execution variable values. If you want a value to be considered by the diff, map it as a variable. This works even if the variable is not used by a later load step: any mapped variable contributes to the diff.
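The comparison described above can be sketched in a few lines of Python. This is only an illustration: the hashing scheme (SHA-256 over a key-sorted JSON dump), the function names `row_fingerprint` and `rows_to_load`, and the sample variables are all assumptions made for the example, not the product's actual implementation.

```python
import hashlib
import json


def row_fingerprint(mapped_vars: dict) -> str:
    """Hash the mapped variable values of one row (hypothetical scheme)."""
    # Sort keys so the same values always serialize the same way.
    canonical = json.dumps(mapped_vars, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def rows_to_load(rows: list[dict], previous_fingerprints: set[str]) -> list[dict]:
    """Keep only rows whose fingerprint was not seen in the previous run."""
    return [r for r in rows if row_fingerprint(r) not in previous_fingerprints]


# Previous run: two rows, each already reduced to its mapped variables.
previous_run = [
    {"email": "a@example.com", "plan": "basic"},
    {"email": "b@example.com", "plan": "pro"},
]
previous_fingerprints = {row_fingerprint(r) for r in previous_run}

# Current run: the first row is unchanged, the second changed its plan.
current_run = [
    {"email": "a@example.com", "plan": "basic"},
    {"email": "b@example.com", "plan": "basic"},
]
changed = rows_to_load(current_run, previous_fingerprints)
print(changed)  # [{'email': 'b@example.com', 'plan': 'basic'}]
```

Because the fingerprint in this sketch covers every mapped variable, adding a variable to the mapping changes the fingerprint even when no load step reads that variable.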

***

## How To Enable Diff Checking

Diff checking is configured at the <code class="expression">space.vars.smartconnector</code> level and applies to the entire <code class="expression">space.vars.smartconnector</code> when enabled. It is not set per output table.

Diff checking can also be toggled directly in the Run GUI at the time of starting a run. Turning it off before a run forces a full re-ingestion regardless of what was processed in previous runs. Every row is sent to the load step as if it were new.

{% hint style="info" %}
**Troubleshooting Tip**: If a <code class="expression">space.vars.smartconnector</code> ran successfully but data is not updating as expected, check two things. First, check the load step conflict resolution rules (for example, "only update if blank" on a field that is not blank). Second, check whether diff check skipped the row: the run report shows which rows were skipped, or you can toggle diff check off and re-run to rule it out.
{% endhint %}

***

## Behavior and Limitations

Diff check compares the current run's output against the previous successful run's output, not against the current state of <code class="expression">space.vars.entities</code> in <code class="expression">space.vars.Kizen\_company\_name</code>. If the previous run failed, it is not used for comparison, and the next run will diff against the most recent successful run before it. This distinction matters in any environment where <code class="expression">space.vars.entities</code> can be modified between runs.

This behavior is by design for straightforward ingestion <code class="expression">space.vars.workflows</code>, where the source file is the authoritative data source and <code class="expression">space.vars.Kizen\_company\_name</code> <code class="expression">space.vars.entities</code> are not expected to be modified independently. In those cases, skipping unchanged rows is safe and efficient.

The built-in diff check is not the right tool for every use case. If your pipeline needs to detect and correct changes that have been made directly to <code class="expression">space.vars.Kizen\_company\_name</code> <code class="expression">space.vars.entities</code> between runs, diff checking will miss those changes entirely.
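The following Python sketch models the two behaviors described in this section: the baseline is the most recent successful run, and a change made directly to the current state of <code class="expression">space.vars.entities</code> goes unnoticed. The `Run` class, the status labels, the row keys, and the use of plain equality in place of hashing are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    run_id: int
    status: str   # hypothetical labels: "success" or "failed"
    output: dict  # row key -> mapped values produced by that run


def diff_baseline(history: list[Run]) -> Optional[Run]:
    """Return the most recent successful run, or None if there is none."""
    for run in reversed(history):  # history is ordered oldest to newest
        if run.status == "success":
            return run
    return None


row = {"email": "a@example.com", "plan": "pro"}
history = [
    Run(1, "success", {"row-1": row}),
    Run(2, "failed", {}),  # ignored when choosing the baseline
]

# Someone edited the record by hand after run 1.
entity_state = {"row-1": {"email": "a@example.com", "plan": "basic"}}

# The next source file still carries the same values as run 1.
incoming = {"row-1": dict(row)}

baseline = diff_baseline(history)
previous = baseline.output if baseline else {}
skipped = [k for k, v in incoming.items() if previous.get(k) == v]
out_of_sync = [k for k in skipped if entity_state[k] != incoming[k]]

print(baseline.run_id)  # 1
print(skipped)          # ['row-1']
print(out_of_sync)      # ['row-1']
```

The row is skipped because it matches the output of run 1, even though the stored record no longer matches the source file.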

### Custom Diff Checking

For cases where the built-in diff check is not sufficient, you can use SQL processing to compare incoming data against reference data from <code class="expression">space.vars.Kizen\_company\_name</code>. This allows more complex comparisons, such as checking against the current state of <code class="expression">space.vars.entities</code> in <code class="expression">space.vars.Kizen\_company\_name</code> rather than the previous run's output.

The most common reason to reach for a custom diff is to handle <code class="expression">space.vars.entities</code> that are missing from the source data. For example, if one run's file contains <code class="expression">space.vars.entities</code> A, B, and C, and the next run's file contains B, C, and D, a custom diff can detect that A is no longer present and take action on it (such as expiring or archiving the <code class="expression">space.vars.entity</code>). The built-in diff check cannot do this, because it only evaluates rows that are in the incoming file.

This is a power-user feature. It requires significant SQL ability to implement correctly, and most <code class="expression">space.vars.smartconnectors</code> will not need it.
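As a rough illustration of the A, B, C versus B, C, D example above, the sketch below runs an anti-join in an in-memory SQLite database from Python. SQLite stands in for the SQL processing step only for the sake of a runnable example; the table names `reference_entities` and `incoming_rows`, the `entity_key` column, and the SQL dialect are assumptions, so adapt the query to the tables actually available in your SQL processing step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables: reference data and the incoming file.
    CREATE TABLE reference_entities (entity_key TEXT PRIMARY KEY);
    CREATE TABLE incoming_rows (entity_key TEXT PRIMARY KEY);
    INSERT INTO reference_entities VALUES ('A'), ('B'), ('C');
    INSERT INTO incoming_rows VALUES ('B'), ('C'), ('D');
    """
)

# Keys present in the reference data but absent from the incoming file.
missing = [
    row[0]
    for row in conn.execute(
        """
        SELECT r.entity_key
        FROM reference_entities AS r
        LEFT JOIN incoming_rows AS i ON i.entity_key = r.entity_key
        WHERE i.entity_key IS NULL
        ORDER BY r.entity_key
        """
    )
]

# Keys present in the incoming file but not yet in the reference data.
new = [
    row[0]
    for row in conn.execute(
        """
        SELECT i.entity_key
        FROM incoming_rows AS i
        LEFT JOIN reference_entities AS r ON r.entity_key = i.entity_key
        WHERE r.entity_key IS NULL
        ORDER BY i.entity_key
        """
    )
]
conn.close()

print(missing)  # ['A']
print(new)      # ['D']
```

Rows returned by the first query are the candidates for expiring or archiving. The second query is included only to show that the same join pattern works in the opposite direction.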

{% hint style="warning" %}
**Caution**: Diff checking has no visibility into changes made directly to <code class="expression">space.vars.Kizen\_company\_name</code> <code class="expression">space.vars.entities</code> between runs. If a user, <code class="expression">space.vars.automation</code>, or other process updates a <code class="expression">space.vars.entity</code> after the last run, the <code class="expression">space.vars.smartconnector</code> will not detect that change. The row will hash to the same value as before and will be skipped on the next run.
{% endhint %}

***

## What's Next

With diff checking configured, you have everything you need to run your <code class="expression">space.vars.smartconnector</code> reliably. Continue to [Running a SmartConnector](/docs/concepts/smartconnectors/running-a-smartconnector.md) to learn how to activate your <code class="expression">space.vars.smartconnector</code>, execute a dry run, interpret the XLS output report, and understand what each execution status means.

<details>

<summary>Related Topics</summary>

* [SmartConnector SQL Processing](/docs/concepts/smartconnectors/smartconnector-sql-processing.md)
* [SmartConnector External Data Sources](/docs/concepts/smartconnectors/smartconnector-external-data-sources.md)
* [SmartConnector Execution Variables](/docs/concepts/smartconnectors/smartconnector-execution-variables.md)
* [SmartConnector Load Steps](/docs/concepts/smartconnectors/smartconnector-load-steps.md)

</details>


