Trading Data Platform from Scratch
Stood up an end-to-end data platform for a trading startup: batch and streaming ingestion, a Data Vault DWH, and Airflow-on-Kubernetes orchestration.
Senior Data Engineer · NDA, Trading startup
ClickHouse · Ceph · Kafka · Elasticsearch · Spark · Airflow · Kubernetes · Data Vault
A trading startup needed a real data platform and didn’t have one yet. I built it from the ground up, from the storage layer to orchestration.
What I did
- Built the platform from scratch on ClickHouse and Ceph.
- Migrated data from MS SQL, S3-based storage and BigQuery into the new platform.
- Deployed Airflow 2.9.2 on Kubernetes for orchestration.
- Built streaming pipelines from Kafka topics and Elasticsearch into ClickHouse.
- Wrote Spark pipelines that automate creating ClickHouse objects and loading data into them.
- Tuned ClickHouse tables (schemas, primary and ORDER BY keys, table engines), cutting storage by about 20%.
- Designed the DWH using Data Vault methodology for a scalable, auditable model.
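Data Vault keeps business keys in hubs, relationships in links and descriptive history in satellites, typically joined on hash keys derived from the business keys. The project itself is under NDA, so the following is only a minimal sketch of how such keys could be derived for a single trade record. The entity and column names (`trade_id`, `account_id`, `price`, ...), the `||` delimiter, MD5, and the trim/upper-case normalization are all illustrative assumptions, not the platform's actual model.

```python
import hashlib
from datetime import datetime, timezone
from typing import Any, Dict

DELIMITER = "||"  # assumed separator between concatenated key parts


def _normalize(value: Any) -> str:
    # Assumed normalization: NULL -> empty string, trim whitespace, upper-case.
    return "" if value is None else str(value).strip().upper()


def hash_key(*business_keys: Any) -> str:
    """Hub/link hash key: MD5 of the normalized, delimited business key(s)."""
    joined = DELIMITER.join(_normalize(k) for k in business_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def hash_diff(attributes: Dict[str, Any]) -> str:
    """Satellite hashdiff over descriptive attributes, in sorted column order."""
    joined = DELIMITER.join(_normalize(attributes[c]) for c in sorted(attributes))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def trade_to_vault(trade: Dict[str, Any], record_source: str) -> Dict[str, Dict[str, Any]]:
    """Split one hypothetical trade record into hub, link and satellite rows."""
    meta = {"load_ts": datetime.now(timezone.utc), "record_source": record_source}
    hk_trade = hash_key(trade["trade_id"])
    hk_account = hash_key(trade["account_id"])
    attrs = {k: trade[k] for k in ("price", "quantity", "side")}
    return {
        "hub_trade": {"hk_trade": hk_trade, "trade_id": trade["trade_id"], **meta},
        "hub_account": {"hk_account": hk_account, "account_id": trade["account_id"], **meta},
        "link_trade_account": {
            "hk_trade_account": hash_key(trade["trade_id"], trade["account_id"]),
            "hk_trade": hk_trade,
            "hk_account": hk_account,
            **meta,
        },
        "sat_trade": {"hk_trade": hk_trade, "hash_diff": hash_diff(attrs), **attrs, **meta},
    }
```

In this pattern a new satellite row is written only when its `hash_diff` differs from the latest one stored for the same hash key, which keeps the full change history auditable without duplicating unchanged rows.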
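ClickHouse's documentation recommends inserting rows in large batches rather than one at a time, so a streaming consumer usually buffers messages and flushes them once a size or age threshold is reached. The sketch below shows that micro-batching pattern in plain Python. `MicroBatcher`, its thresholds, and the `sink` callable (standing in for one ClickHouse INSERT per batch) are hypothetical; the actual pipelines may just as well have relied on ClickHouse's Kafka table engine with a materialized view.

```python
import time
from typing import Any, Callable, List, Optional


class MicroBatcher:
    """Buffers rows and hands them to `sink` in batches (size- or age-triggered)."""

    def __init__(self, sink: Callable[[List[Any]], None], max_rows: int = 10_000,
                 max_age_s: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self.sink = sink            # hypothetical: e.g. a function issuing one INSERT
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.clock = clock          # injectable so tests can control time
        self._buf: List[Any] = []
        self._first_ts: Optional[float] = None

    def _due(self) -> bool:
        if not self._buf:
            return False
        too_big = len(self._buf) >= self.max_rows
        too_old = self.clock() - self._first_ts >= self.max_age_s
        return too_big or too_old

    def add(self, row: Any) -> None:
        if not self._buf:
            self._first_ts = self.clock()   # age is measured from the oldest buffered row
        self._buf.append(row)
        self.poll()

    def poll(self) -> None:
        """Call periodically (e.g. when the consumer poll returns nothing) so old rows still flush."""
        if self._due():
            self.flush()

    def flush(self) -> None:
        if not self._buf:
            return
        self.sink(list(self._buf))  # if this raises, rows stay buffered for a retry
        self._buf.clear()
        self._first_ts = None
```

Because the buffer is cleared only after the sink returns, a failed insert leaves the rows in place; committing Kafka offsets only after a successful `flush` then gives at-least-once delivery.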
Impact
A working data platform — batch and streaming — that took the startup from no central data to a queryable, well-modeled warehouse the team could build analytics on.