Implementing an asynchronous Kafka consumer for ClickHouse ingestion via Airflow

Hey, I would like an opinion on this approach. I have a source database, e.g. PostgreSQL, and I am using the JDBC Source Connector to send messages into Kafka partitions. The next step is to ingest those records into an OLAP database, e.g. ClickHouse.
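For context, the source side is registered roughly like this (a sketch, assuming the Confluent JDBC Source Connector and the Kafka Connect REST API; hosts, tables, and credentials are placeholders):

```python
# Sketch: registering a JDBC source connector via the Kafka Connect
# REST API. Hostnames, table, and column names are placeholders.
import requests

source_config = {
    "name": "postgres-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://postgres.internal:5432/app",
        "connection.user": "replicator",
        "connection.password": "<secret>",
        "mode": "incrementing",          # or "timestamp+incrementing"
        "incrementing.column.name": "id",
        "table.whitelist": "orders",
        "topic.prefix": "postgres.",     # produces topic "postgres.orders"
        "tasks.max": "1",
    },
}

resp = requests.post("http://connect.internal:8083/connectors",
                     json=source_config, timeout=30)
resp.raise_for_status()
```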

Since I want to use Airflow for orchestration, the idea is a deferrable sensor task that runs an async Kafka consumer, waits until a target number of records has accumulated, captures the latest offset, and then triggers the task that does the actual ingestion. In other words, ingest records in batches, since it's inefficient to insert row by row into OLAP systems.
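Roughly what I have in mind for the trigger side; this is an untested sketch assuming aiokafka as the async consumer library, and the class name, classpath, and thresholds are all mine:

```python
import asyncio
from typing import Any, AsyncIterator, Dict, Tuple

from aiokafka import AIOKafkaConsumer
from airflow.triggers.base import BaseTrigger, TriggerEvent


class KafkaBatchTrigger(BaseTrigger):
    """Fires once at least `min_records` are available on the topic."""

    def __init__(self, topic: str, bootstrap_servers: str, group_id: str,
                 min_records: int = 10_000, poll_interval: float = 10.0):
        super().__init__()
        self.topic = topic
        self.bootstrap_servers = bootstrap_servers
        self.group_id = group_id
        self.min_records = min_records
        self.poll_interval = poll_interval

    def serialize(self) -> Tuple[str, Dict[str, Any]]:
        # Airflow re-creates the trigger in the triggerer process from
        # this classpath + kwargs pair.
        return ("my_dags.triggers.KafkaBatchTrigger", {
            "topic": self.topic,
            "bootstrap_servers": self.bootstrap_servers,
            "group_id": self.group_id,
            "min_records": self.min_records,
            "poll_interval": self.poll_interval,
        })

    async def run(self) -> AsyncIterator[TriggerEvent]:
        consumer = AIOKafkaConsumer(
            self.topic,
            bootstrap_servers=self.bootstrap_servers,
            group_id=self.group_id,
            # Don't commit here; the ingestion task owns offset progress.
            enable_auto_commit=False,
        )
        await consumer.start()
        try:
            seen = 0
            latest_offsets: Dict[int, int] = {}
            while seen < self.min_records:
                batches = await consumer.getmany(timeout_ms=1_000)
                for tp, messages in batches.items():
                    seen += len(messages)
                    latest_offsets[tp.partition] = messages[-1].offset
                if not batches:
                    await asyncio.sleep(self.poll_interval)
            # Hand the per-partition end offsets to the ingestion task.
            yield TriggerEvent({"record_count": seen,
                                "latest_offsets": latest_offsets})
        finally:
            await consumer.stop()
```

The sensor would defer with `self.defer(trigger=KafkaBatchTrigger(...), method_name="execute_complete")`, and the ingestion task would read from the committed offsets up to `latest_offsets` and do one bulk insert into ClickHouse.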

Does anyone have experience with a setup like this?

Is ClickHouse the specific sink database that you’re using? There is a sink connector for it worth checking out: https://www.confluent.io/hub/clickhouse/clickhouse-kafka-connect

The connector batches writes to ClickHouse, and there's documentation on tuning for larger batch sizes here: https://clickhouse.com/docs/integrations/kafka/clickhouse-kafka-connect-sink#tuning-performance
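As a hedged sketch, the connector config plus the Kafka Connect consumer overrides might look like this; the values are illustrative only, and the `consumer.override.*` keys take effect only if the Connect worker's `connector.client.config.override.policy` allows client overrides:

```python
# Illustrative config for the ClickHouse Kafka Connect sink; POST it to
# the Connect REST API like any other connector. Values are placeholders.
sink_config = {
    "name": "clickhouse-sink",
    "config": {
        "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
        "topics": "postgres.orders",
        "hostname": "clickhouse.internal",
        "port": "8443",
        "database": "default",
        "username": "default",
        "password": "<secret>",
        "ssl": "true",
        "tasks.max": "1",
        # Bigger fetches from Kafka => bigger batches per ClickHouse insert.
        "consumer.override.fetch.min.bytes": "1048576",  # ~1 MiB per fetch
        "consumer.override.fetch.max.wait.ms": "3000",
        "consumer.override.max.poll.records": "10000",
    },
}
```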