Resources to evaluate ClickHouse

Any good post/talk about clickhouse and how to operate it?
Exploring the idea of setting it up for a mostly write-once-read-a-lot datastore. We’re currently using a MySQL DB and hitting some limits regarding table size and, especially, cost. It seems the use case lends itself to ClickHouse, but before going forward I want to better understand what it could potentially look like.

From the docs i see that you can probably put it in front of a database and it should just work, but there’s also the option of leveraging S3 storage for colder data

You are aware of the difference in between OLTP and OLAP workloads?

> From the docs i see that you can probably put it in front of a database and it should just work, but there’s also the option of leveraging S3 storage for colder data
From what I know about it, it should be not the case. The whole power of CH is the performance ratio you can get from it by leaning hard on data compression and vectorization, which usually requires a lot of specialized adaptation

The simplest good post you can find is about how CH can handle millions of sensor time series on one local notebook

So the workload is pretty OLAP: we’re storing payment events and then aggregating them to monthly/daily granularity based on a model.

> From what I know about it, it should be not the case.
I understood that it was possible to set up connection access (this), but I’m not sure how to move to a more native CH storage (which i think is the good long-term solution)

is this https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse the post you mean?

I meant this one https://altinity.com/blog/2020/1/1/clickhouse-cost-efficiency-in-action-analyzing-500-billion-rows-on-an-intel-nuc

Those guys are good, and there are many articles on operating it around the net

https://www.slideshare.net/Altinity/clickhouse-data-warehouse-101-the-first-billion-rows-by-alexander-zaitsev-and-robert-hodges-altinity

They even have a k8s operator if you need one

Yeah, i’ve been looking at that. From what i’ve been told it just works, so that sounds very promising

The MySQL thing you are linking to would be pretty useless from what I understand. Its use case is more about joining some ClickHouse data with MySQL: say you run some analytics on data in ClickHouse and you want to enrich the data after aggregation. To get ClickHouse performance you need to find a way to move the data into ClickHouse MergeTree-family storage
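
For a rough idea of what "MergeTree family storage" could look like for the payment events mentioned earlier, here is a minimal sketch in Python that only holds the DDL as strings. The table and column names (payment_events, payments_daily, merchant_id, amount) are made up for illustration and not taken from anyone's real schema. Sending the statements to a server is left out on purpose.

```python
# Sketch only: table and column names are hypothetical, adjust to the real schema.

# Raw, append-only events. ORDER BY is the sort key that drives both
# compression and query speed in MergeTree tables.
RAW_EVENTS_DDL = """
CREATE TABLE payment_events
(
    event_time  DateTime,
    merchant_id UInt64,
    amount      Decimal(18, 2)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (merchant_id, event_time)
"""

# Daily rollup target. SummingMergeTree sums numeric columns of rows
# sharing the same sort key when parts are merged in the background.
DAILY_ROLLUP_DDL = """
CREATE TABLE payments_daily
(
    day         Date,
    merchant_id UInt64,
    total       Decimal(38, 2)
)
ENGINE = SummingMergeTree
ORDER BY (merchant_id, day)
"""

# Materialized view that feeds the rollup on every insert into the raw table.
DAILY_ROLLUP_MV = """
CREATE MATERIALIZED VIEW payments_daily_mv TO payments_daily AS
SELECT
    toDate(event_time) AS day,
    merchant_id,
    sum(amount) AS total
FROM payment_events
GROUP BY day, merchant_id
"""


def build_statements() -> list[str]:
    """Return the DDL in the order it has to be applied."""
    return [s.strip() for s in (RAW_EVENTS_DDL, DAILY_ROLLUP_DDL, DAILY_ROLLUP_MV)]


if __name__ == "__main__":
    for statement in build_statements():
        print(statement + ";\n")
```

One thing to keep in mind with this layout: SummingMergeTree only collapses rows at merge time, so queries against the rollup should still use sum(total) with a GROUP BY rather than assume one row per key.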

There is an engine doing this for you, it’s experimental and I can’t say much about it https://clickhouse.com/docs/en/engines/database-engines/materialized-mysql

Alternatively, CDC to Kafka then Kafka to ClickHouse (via https://clickhouse.com/blog/kafka-connect-connector-clickhouse-with-exactly-once). That would be something I would be comfortable with running in production.
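
To make the CDC route a bit more concrete, here is a minimal Python sketch of the mapping step between a change event and a ClickHouse row. It assumes a Debezium-style envelope (op, after, ts_ms), which is an assumption since no CDC tool is named here, and the column names are hypothetical. In a real setup the linked Kafka Connect sink would do the actual writing; this only shows the shape of the data.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def change_event_to_row(event: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Flatten one change event into a row for an append-only table.

    Assumes a Debezium-style envelope: "op" is c/r/u/d, "after" holds the
    new row image, "ts_ms" is the event timestamp in milliseconds.
    Column names (id, merchant_id, amount) are hypothetical.
    """
    # Deletes carry no "after" image; for write-once data we simply skip them.
    if event.get("op") not in ("c", "r", "u"):
        return None

    after = event["after"]
    ts_ms = event["ts_ms"]
    return {
        "payment_id": after["id"],
        "merchant_id": after["merchant_id"],
        "amount": after["amount"],
        "event_time": datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc),
        # A version column lets later updates win if a deduplicating
        # engine such as ReplacingMergeTree is used downstream.
        "version": ts_ms,
    }


if __name__ == "__main__":
    sample = {
        "op": "c",
        "ts_ms": 1700000000000,
        "after": {"id": 1, "merchant_id": 42, "amount": "19.99"},
    }
    print(change_event_to_row(sample))
```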

Operating ClickHouse beyond a few tables and a few nodes becomes a bit of a pain with or without the operator, and any change is ops-intensive. It was initially designed as a single-node thing; replication, distributed query execution, and distributed schema changes were then added/slapped on top, and the current ops experience isn’t too pretty. So if you are embarking on this journey, be prepared to invest time in keeping it running smoothly.

So the operator is key for minimizing ops/toil?

For the CH <-> MySQL thing i didn’t have too many expectations. I understand that there might be a lot of data model reworking before getting to an operational state

I’d say so, but don’t expect magic.

I’m not too familiar with its state now. But last time I checked, 2y ago (that’s like a lifetime ago in the CH ecosystem), a lot of things were done on a “best effort” basis. ClickHouse didn’t expose good primitives for automating cluster management (e.g. removing replicas, adding replicas, changing tables in a distributed manner).

I’ve heard from folks at Chronosphere that it’s a pretty boring system and they’re using it for TB-size loads, with traces being stored there

My experience is from Cloudflare. It wasn’t boring at all there :slight_smile:

Maybe it’s ok for TB scale but struggles with PB scale now :smile: