Schema & Storage Models
- Schema-on-Read: Structure applied at query time for flexible ingestion.
- Schema-on-Write: Structure enforced at load time for consistent BI performance.
- Row-Oriented Storage: Records stored row by row for OLTP workloads.
- Columnar Storage: Columns stored together for OLAP queries, better compression.
- In-Memory Storage: Data in RAM for ultra-low latency analytics.
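
As a rough illustration of the row-oriented versus columnar layouts above, the sketch below holds the same records both ways in plain Python. The sample data, field names, and the `to_columnar` helper are hypothetical; real engines add compression, encoding, and on-disk formats that are not modeled here.

```python
# Hypothetical sample data; field names are illustrative only.
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 25.5},
    {"id": 3, "region": "EU", "amount": 4.5},
]

def to_columnar(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns

columns = to_columnar(rows)

# OLTP-style point lookup: the row layout returns a whole record at once.
order_2 = next(r for r in rows if r["id"] == 2)

# OLAP-style aggregate: the columnar layout touches only the one column needed.
total_amount = sum(columns["amount"])
```

Storing each column contiguously also puts similar values next to each other (for example the repeated "EU" in `region`), which is one reason columnar formats tend to compress well.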
Processing Patterns
- ETL (Extract, Transform, Load): Batch process: extract, cleanse, load into DWH.
- ELT (Extract, Load, Transform): Load raw data then transform in-place (lakehouse).
- Batch vs Streaming: Scheduled chunks vs real-time events.
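
The difference between ETL and ELT is mostly about where the transform runs. The sketch below is a minimal illustration using Python's built-in `sqlite3` module as a stand-in for a warehouse; the `RAW` feed, the table names, and the cleansing rules are assumptions made up for the example.

```python
import sqlite3

# Hypothetical raw feed: (name, amount-as-text) with messy whitespace and one bad row.
RAW = [(" alice ", "10.5"), ("BOB", "n/a"), ("carol", "7")]

def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def etl(conn):
    """ETL: cleanse in application code, then load only the clean rows."""
    clean = [(name.strip().lower(), float(amt)) for name, amt in RAW if is_number(amt)]
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", clean)

def elt(conn):
    """ELT: load raw rows untouched, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE raw_sales (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", RAW)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT lower(trim(name)) AS name, CAST(amount AS REAL) AS amount
        FROM raw_sales
        WHERE amount GLOB '[0-9]*'
        """
    )

conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
etl_rows = conn.execute("SELECT name, amount FROM sales_etl ORDER BY name").fetchall()
elt_rows = conn.execute("SELECT name, amount FROM sales_elt ORDER BY name").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
```

Both paths end with the same clean table, but only the ELT path keeps the raw rows (including the malformed one) available for later reprocessing.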
Access & Governance
- OLTP: Online Transaction Processing for high-volume writes.
- OLAP: Online Analytical Processing for complex read queries.
- Data Virtualization: Unified, virtual views without copying data.
- Metadata & Data Catalog: Searchable repository of schemas, lineage, quality.
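
A SQL view is perhaps the smallest possible illustration of data virtualization: consumers query one logical object while the rows stay in their source tables. The sketch below uses `sqlite3` with two hypothetical source tables (`crm_customers`, `shop_customers`); real virtualization layers federate across separate systems, which this single-database example does not attempt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical "sources", modeled here as separate tables.
conn.execute("CREATE TABLE crm_customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE shop_customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO crm_customers VALUES (1, 'Ann')")
conn.execute("INSERT INTO shop_customers VALUES (2, 'Bob')")

# The view stores only the query definition, not a copy of the rows.
conn.execute(
    """
    CREATE VIEW all_customers AS
    SELECT id, name, 'crm' AS source FROM crm_customers
    UNION ALL
    SELECT id, name, 'shop' AS source FROM shop_customers
    """
)

before = conn.execute("SELECT COUNT(*) FROM all_customers").fetchone()[0]
conn.execute("INSERT INTO shop_customers VALUES (3, 'Cy')")
# New source rows appear through the view immediately, with no reload step.
after = conn.execute("SELECT COUNT(*) FROM all_customers").fetchone()[0]
```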
Transaction & Consistency
- ACID Transactions: Atomicity, Consistency, Isolation, Durability for reliable updates.
- Idempotent Processing: Re-running a transform yields the same result, so retried jobs behave as if they ran exactly once.
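
The two ideas above can be sketched together with `sqlite3`: a transaction makes a batch all-or-nothing, and keyed writes make a replayed batch harmless. The `balances` table, the `apply_batch` helper, and the sample batches are hypothetical and only illustrate the pattern.

```python
import sqlite3

def apply_batch(conn, batch):
    """Apply a batch atomically; keyed writes make a replay a no-op."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT OR REPLACE INTO balances (account_id, balance) VALUES (?, ?)",
            batch,
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balances (account_id TEXT PRIMARY KEY, balance REAL NOT NULL)"
)

batch = [("a1", 100.0), ("a2", 50.0)]
apply_batch(conn, batch)
apply_batch(conn, batch)  # simulated retry of the same batch

# Atomicity: the second row violates NOT NULL, so the first row is rolled back too.
try:
    apply_batch(conn, [("a3", 1.0), ("a4", None)])
    failed = False
except sqlite3.IntegrityError:
    failed = True

state = conn.execute(
    "SELECT account_id, balance FROM balances ORDER BY account_id"
).fetchall()
```

Writing absolute values keyed by `account_id` is what makes the retry safe; a transform that did `balance = balance + x` would not be idempotent without extra deduplication.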
Advanced Concepts
- Data Lineage: Full lifecycle tracking from origin to report for auditability.
- Data Contract: Schema & SLA definitions between producers & consumers.
- Medallion Architecture: Layered refinement: Bronze (raw) → Silver (cleansed) → Gold (business-ready).
- Polyglot Persistence: Multiple storage technologies for workload-fit solutions.
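
The medallion layers listed above can be sketched as three small steps in plain Python. The records, the validation rule, and the per-user total are all assumptions chosen for illustration; in practice each layer would typically be a set of tables in a lakehouse.

```python
# Bronze: raw events as ingested (hypothetical records, kept unmodified).
bronze = [
    {"user": "Ann", "amount": "12.50"},
    {"user": "ann", "amount": "7.50"},
    {"user": "Bob", "amount": "oops"},  # malformed amount
    {"user": "bob", "amount": "3.00"},
]

def to_silver(records):
    """Silver: validated, typed, and normalized rows; bad rows are dropped."""
    silver = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # skip rows whose amount does not parse
        silver.append({"user": rec["user"].lower(), "amount": amount})
    return silver

def to_gold(records):
    """Gold: business-level aggregate (total amount per user)."""
    totals = {}
    for rec in records:
        totals[rec["user"]] = totals.get(rec["user"], 0.0) + rec["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
```

Because bronze is never modified, the silver and gold layers can be rebuilt from it whenever the cleansing or aggregation rules change.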