Any good posts/talks about ClickHouse and how to operate it?
I’m exploring the idea of setting it up as a mostly write-once-read-a-lot datastore. We’re currently using a MySQL DB and hitting some limits regarding table size, and especially cost. It seems the use case lends itself to ClickHouse, but before going forward I want to better understand what it could potentially look like.
From the docs I see that you can probably put it in front of a database and it should just work, but there’s also the option of leveraging S3 storage for colder data.
> From the docs I see that you can probably put it in front of a database and it should just work, but there’s also the option of leveraging S3 storage for colder data.
From what I know about it, it should be not the case. The whole power of CH is the performance you can get from it by leaning hard on data compression and vectorization, and that usually requires a lot of specialized adaptation.
So the workload is pretty OLAP: we’re storing payment events and then aggregating them to monthly/daily based on a model.
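To make the shape of that workload concrete, here is a minimal Python sketch of a daily/monthly rollup over payment events. The event fields (`timestamp`, `account`, `amount`) and the sample rows are made up for illustration, not taken from the actual model. In ClickHouse the same thing would typically be a `GROUP BY` over something like `toStartOfDay(...)` or `toStartOfMonth(...)`.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# Hypothetical payment events: (ISO timestamp, account id, amount).
EVENTS = [
    ("2024-01-03T10:15:00", "acct-1", Decimal("10.00")),
    ("2024-01-03T18:40:00", "acct-1", Decimal("5.50")),
    ("2024-01-20T09:00:00", "acct-2", Decimal("7.25")),
    ("2024-02-01T12:00:00", "acct-1", Decimal("3.00")),
]


def rollup(events, bucket_format):
    """Sum amounts per (time bucket, account).

    bucket_format is a strftime pattern, e.g. "%Y-%m-%d" for daily
    buckets or "%Y-%m" for monthly buckets.
    """
    totals = defaultdict(Decimal)  # Decimal() starts each bucket at 0
    for timestamp, account, amount in events:
        bucket = datetime.fromisoformat(timestamp).strftime(bucket_format)
        totals[(bucket, account)] += amount
    return dict(totals)


daily = rollup(EVENTS, "%Y-%m-%d")
monthly = rollup(EVENTS, "%Y-%m")

if __name__ == "__main__":
    for key, total in sorted(monthly.items()):
        print(key, total)
```

The events are written once and only ever read back through aggregates like these, which is the access pattern a columnar store is built for.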
> From what I know about it, it should be not the case.
I understood that it was possible to set up connection access (this), but I’m not sure how to move to more native CH storage (which I think is the better long-term solution).
The MySQL thing you are linking to would be pretty useless from what I understand. Its use case is more about joining some ClickHouse data with MySQL: say you run some analytics on data in ClickHouse and you want to enrich the data after aggregation. To get ClickHouse performance you need to find a way to move the data into ClickHouse MergeTree-family storage.
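As a rough sketch of what "moving data to MergeTree" could look like, the Python below only builds the SQL text for a native table and a one-off backfill through ClickHouse's `mysql(...)` table function. It doesn't connect to anything. All table, column, host, and user names are hypothetical, and the partition and sort keys are guesses that would need to follow the real query patterns.

```python
def merge_tree_ddl(table: str) -> str:
    """Build a CREATE TABLE statement for a native MergeTree table.

    The columns are placeholders for a payment-events schema.
    """
    return (
        f"CREATE TABLE {table} (\n"
        "    event_time DateTime,\n"
        "    account_id String,\n"
        "    amount Decimal(18, 2)\n"
        ") ENGINE = MergeTree\n"
        "PARTITION BY toYYYYMM(event_time)\n"
        "ORDER BY (account_id, event_time)"
    )


def backfill_sql(table: str, host: str, database: str,
                 source_table: str, user: str, password: str) -> str:
    """Build an INSERT ... SELECT that copies rows out of MySQL.

    Plain string formatting is used for illustration only; do not
    build SQL this way from untrusted input.
    """
    return (
        f"INSERT INTO {table}\n"
        "SELECT event_time, account_id, amount\n"
        f"FROM mysql('{host}', '{database}', '{source_table}', "
        f"'{user}', '{password}')"
    )


# Hypothetical names throughout.
DDL = merge_tree_ddl("payments.events")
BACKFILL = backfill_sql(
    "payments.events", "mysql-host:3306", "payments_db",
    "payment_events", "reader", "<password>",
)

if __name__ == "__main__":
    print(DDL)
    print()
    print(BACKFILL)
```

After a backfill like this, queries hit the MergeTree table directly, and the MySQL connection is only needed for the copy (or for ongoing sync, which is a separate problem).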
Operating ClickHouse beyond a few tables and a few nodes becomes a bit of a pain, with or without the operator, and any change is ops-intensive. It was initially designed as a single-node thing; replication, distributed query execution, and distributed schema changes were then added/slapped on top, and the current ops experience isn’t too pretty. So if you are embarking on this journey, be prepared to invest time in keeping it running smoothly.
For the CH <-> MySQL thing I didn’t have too many expectations. I understand that there might be a lot of data model reworking before getting to an operational state.
I’m not too familiar with its state now. But last time I checked, 2y ago (that’s like a lifetime ago in the CH ecosystem), a lot of things were done on a “best effort” basis. ClickHouse didn’t expose good primitives for automating cluster management (e.g. removing replicas, adding replicas, changing tables in a distributed manner).