Delivered 2024

Trading Data Platform from Scratch

Stood up an end-to-end data platform for a trading startup: batch and streaming ingestion, a Data Vault DWH, and Airflow-on-Kubernetes orchestration.

Senior Data Engineer · NDA, Trading startup

ClickHouse · Ceph · Kafka · Elasticsearch · Spark · Airflow · Kubernetes · Data Vault

A trading startup needed a real data platform and had none. I built it from the ground up, from the storage layer to orchestration.

What I did

  • Built the platform from zero on ClickHouse + Ceph.
  • Migrated data from MS SQL, S3-based storage and BigQuery into the new platform.
  • Deployed Airflow 2.9.2 on Kubernetes for orchestration.
  • Built streaming pipelines from Kafka topics and Elasticsearch into ClickHouse.
  • Authored Spark pipelines that automate creating objects in ClickHouse and loading data into them.
  • Tuned ClickHouse tables (schema, primary and ORDER BY keys, table engines), cutting storage by ~20%.
  • Designed the DWH using Data Vault methodology for a scalable, auditable model.
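
To make the Data Vault and table-design points above more concrete, here is a minimal Python sketch of how a hub hash key and a matching ClickHouse hub table could be generated. It is an illustration under stated assumptions, not the project's actual code: the MD5 hash, the `||` delimiter, the names `hub_instrument`, `hub_instrument_hk` and `instrument_bk`, and the choice of `ReplacingMergeTree` are all hypothetical.

```python
import hashlib


def hub_hash_key(*business_key_parts: str) -> str:
    """Deterministic hub key: MD5 over trimmed, upper-cased, delimited parts.

    MD5 and the '||' delimiter are assumptions; any stable hash would do.
    """
    normalised = "||".join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def hub_ddl(table: str, hash_key: str, business_key: str) -> str:
    """Render ClickHouse DDL for a Data Vault hub (hypothetical layout).

    ReplacingMergeTree is assumed here so that repeated loads of the same
    hash key are collapsed during background merges.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        f"    {hash_key} FixedString(32),\n"
        f"    {business_key} String,\n"
        f"    load_ts DateTime64(3, 'UTC'),\n"
        f"    record_source LowCardinality(String)\n"
        f") ENGINE = ReplacingMergeTree\n"
        f"ORDER BY {hash_key}"
    )


if __name__ == "__main__":
    # The same instrument yields the same key regardless of case or padding.
    print(hub_hash_key(" aapl "), hub_hash_key("AAPL"))
    print(hub_ddl("hub_instrument", "hub_instrument_hk", "instrument_bk"))
```

Normalising the business key before hashing lets every source system (MS SQL, BigQuery, Kafka) arrive at the same hub key for the same entity, and ordering the table by that key keeps lookups and joins on the hub cheap.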

Impact

A working batch and streaming data platform that took the startup from no central data to a queryable, well-modeled warehouse the team could build analytics on.