Business User's Guide to Data Architecture

A comprehensive reference for understanding modern data architecture terminology

📊 Storage & Schema Models

Schema-on-Read

What it means: Data structure is applied when you query the data, not when you store it.

Business Impact: Faster data ingestion, more flexibility to explore data in different ways.
Example: Storing customer feedback as raw text files, then analyzing sentiment when needed.
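To make the idea concrete, here is a minimal Python sketch that assumes feedback arrives as JSON lines. The raw text is kept exactly as received. A hypothetical `read_with_schema` function decides which fields to pick and which types to convert to only when a query runs, so records that lack a field are tolerated instead of being rejected at load time.

```python
import json

# Raw feedback stored exactly as received; no structure is enforced at write time.
RAW_STORE = [
    '{"customer": "C1", "rating": "5", "text": "Great service"}',
    '{"customer": "C2", "text": "Delivery was late"}',
    '{"customer": "C3", "rating": "2", "text": "Item damaged", "channel": "email"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at query time: pick the requested fields and convert their types."""
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            # A missing field becomes None instead of making the whole load fail.
            row[field] = cast(value) if value is not None else None
        yield row


# Two different views over the same raw data, each decided at read time.
ratings_view = list(read_with_schema(RAW_STORE, {"customer": str, "rating": int}))
text_view = list(read_with_schema(RAW_STORE, {"customer": str, "text": str}))

print(ratings_view)
# [{'customer': 'C1', 'rating': 5}, {'customer': 'C2', 'rating': None}, {'customer': 'C3', 'rating': 2}]
```

Because the schema lives in the query, a new analysis can read the same raw records with a different field list without reloading anything.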
Schema-on-Write

What it means: Data structure is enforced when data is loaded into the system.

Business Impact: Consistent, reliable reports but slower to adapt to new data types.
Example: Financial transactions must follow a strict format before they enter the accounting system.
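By contrast, a schema-on-write system checks every record before it is stored. This minimal Python sketch assumes a hypothetical `Transaction` schema with three required fields; the hypothetical `load_transaction` function rejects anything that does not fit, so only valid, typed records ever reach the ledger.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class Transaction:
    """Hypothetical schema that every stored transaction must follow."""
    transaction_id: str
    account: str
    amount: Decimal


ledger: list[Transaction] = []  # stands in for the accounting system's storage


def load_transaction(raw: dict) -> Transaction:
    """Validate a record against the schema at write time; reject it if it does not fit."""
    missing = [f for f in ("transaction_id", "account", "amount") if f not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    try:
        amount = Decimal(str(raw["amount"]))
    except InvalidOperation:
        raise ValueError(f"invalid amount: {raw['amount']!r}") from None
    txn = Transaction(str(raw["transaction_id"]), str(raw["account"]), amount)
    ledger.append(txn)  # only records that passed validation are stored
    return txn


accepted = load_transaction({"transaction_id": "T1", "account": "ACC-9", "amount": "125.50"})
try:
    load_transaction({"transaction_id": "T2", "amount": "10.00"})
except ValueError as err:
    print("rejected:", err)  # rejected: missing fields: ['account']
```

The trade-off described above shows up directly: a new kind of record needs a schema change before it can be loaded at all.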
Columnar Storage

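What it means: Data is stored column by column (all values of one column kept together, like a column in Excel) rather than row by row.

Business Impact: Much faster analytics and reporting on large datasets, because a query reads only the columns it needs.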
Example: Analyzing total sales across millions of transactions.
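The difference between the two layouts can be illustrated in plain Python. This is an in-memory sketch with hypothetical order data; it is not how a real columnar file format such as Apache Parquet arranges bytes on disk, and it leaves out the compression benefits columnar systems also get.

```python
# Hypothetical orders in a row-oriented layout: each record's values sit together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 45.5},
]


def to_columnar(records):
    """Regroup row-oriented records so each column's values are stored together."""
    return {key: [record[key] for record in records] for key in records[0]}


columns = to_columnar(rows)
# {'order_id': [1, 2, 3], 'region': ['EU', 'US', 'EU'], 'amount': [120.0, 80.0, 45.5]}

# Row layout: the total visits every record; on disk, a row store reads each full record.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: the same total reads one list and never touches the other columns.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)  # 245.5 245.5
```

Both layouts give the same answer; the saving in a real system comes from the storage layer, which can skip reading the `order_id` and `region` data entirely.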

⚙️ Processing Patterns

ETL (Extract, Transform, Load)

What it means: Traditional approach - clean and structure data before storing it.

Business Impact: High data quality but slower time-to-insight.
Example: Nightly processing of sales data into clean reports.
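A minimal ETL sketch in Python, assuming an in-memory SQLite database stands in for the reporting system and using hypothetical sales fields. Cleaning happens in `transform` before anything is loaded, so invalid records never reach the target table. Scheduling (the "nightly" part) is not shown.

```python
import sqlite3

# Extract: hypothetical raw sales records with inconsistent formatting.
raw_sales = [
    {"sale_id": "1", "store": " north ", "amount": "19.99"},
    {"sale_id": "2", "store": "SOUTH", "amount": "5.00"},
    {"sale_id": "3", "store": "north", "amount": "not available"},
]


def transform(records):
    """Clean and type the records in application code, before loading."""
    clean = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # drop records that fail validation
        clean.append((int(rec["sale_id"]), rec["store"].strip().lower(), amount))
    return clean


def load(conn, records):
    """Load only the already-cleaned records into the reporting table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (sale_id INTEGER PRIMARY KEY, store TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(raw_sales))
totals = conn.execute("SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store").fetchall()
print(totals)  # [('north', 19.99), ('south', 5.0)]
```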
ELT (Extract, Load, Transform)

What it means: Modern approach - store raw data first, clean it when needed.

Business Impact: Faster data availability, more flexibility for different analyses.
Example: Loading all customer interactions immediately, analyzing patterns later.
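For comparison, a minimal ELT sketch, again assuming an in-memory SQLite database as the target and hypothetical event data. The raw events are loaded untouched, and cleaning and aggregation happen later inside the database as SQL views, one per analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw interaction events go in as-is, with no cleaning up front.
conn.execute("CREATE TABLE raw_events (customer TEXT, event TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [
        ("c1", "page_view", "2024-01-05T10:00:00"),
        ("c1", "PURCHASE", "2024-01-05T10:05:00"),
        ("c2", "page_view", "2024-01-06T09:30:00"),
        ("c2", " purchase", "2024-01-07T14:00:00"),
    ],
)

# Transform (later, inside the database): each analysis defines its own cleaned view.
conn.execute(
    """
    CREATE VIEW purchases_per_customer AS
    SELECT customer, COUNT(*) AS purchases
    FROM raw_events
    WHERE LOWER(TRIM(event)) = 'purchase'
    GROUP BY customer
    """
)
conn.execute(
    """
    CREATE VIEW events_per_day AS
    SELECT SUBSTR(ts, 1, 10) AS day, COUNT(*) AS events
    FROM raw_events
    GROUP BY day
    """
)

purchases = conn.execute("SELECT * FROM purchases_per_customer ORDER BY customer").fetchall()
daily = conn.execute("SELECT * FROM events_per_day ORDER BY day").fetchall()
print(purchases)  # [('c1', 1), ('c2', 1)]
print(daily)      # [('2024-01-05', 2), ('2024-01-06', 1), ('2024-01-07', 1)]
```

The raw table keeps every original event, so a new question only needs a new view, not a new load.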
Stream Processing

What it means: Processing data continuously as it arrives.

Business Impact: Real-time insights but higher complexity and cost.
Example: Live website personalization, instant fraud alerts.
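The core idea of stream processing (handle each event as it arrives while keeping a small amount of running state) can be sketched in Python with a generator. The event format, the 60-second window, and the spending limit are hypothetical. Production stream processors also deal with late or out-of-order events, failures, and scaling across machines, none of which is shown here.

```python
from collections import defaultdict, deque
from typing import Iterable, Iterator


def fraud_alerts(
    events: Iterable[tuple[float, str, float]],
    window_seconds: float = 60.0,
    limit: float = 1000.0,
) -> Iterator[tuple[float, str, float]]:
    """Consume (timestamp, card, amount) events one at a time.

    Keeps a sliding window of recent spend per card and yields an alert as soon
    as the total inside the window exceeds the limit.
    """
    recent = defaultdict(deque)  # card -> deque of (timestamp, amount)
    totals = defaultdict(float)  # card -> spend inside the current window
    for ts, card, amount in events:
        window = recent[card]
        window.append((ts, amount))
        totals[card] += amount
        # Evict events that have fallen out of the time window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_amount = window.popleft()
            totals[card] -= old_amount
        if totals[card] > limit:
            yield ts, card, totals[card]


# A list stands in for a live feed; the generator never needs the whole stream at once.
stream = [
    (0.0, "card-A", 400.0),
    (10.0, "card-B", 50.0),
    (20.0, "card-A", 700.0),   # card-A spends 1100 within 60 seconds
    (200.0, "card-A", 300.0),  # earlier spend has expired from the window
]
alerts = list(fraud_alerts(iter(stream)))
print(alerts)  # [(20.0, 'card-A', 1100.0)]
```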

🏗️ Data Architecture Patterns

Data Lake

What it means: Central repository storing all types of raw data at low cost.

Business Impact: Flexible, cost-effective storage for future analytics needs.
Best for: Organizations with diverse data types and exploratory analytics needs.
Data Warehouse

What it means: Structured repository optimized for business reporting and analytics.

Business Impact: Reliable, fast reports but requires upfront planning.
Best for: Organizations with well-defined reporting requirements and compliance needs.
Data Lakehouse

What it means: Combines flexibility of data lakes with reliability of data warehouses.

Business Impact: Aims to combine low-cost, flexible storage with warehouse-style reliability and query performance in one system.
Best for: Organizations wanting both exploratory analytics and reliable reporting.
Data Mesh

What it means: Decentralized approach where business domains own their data as products.

Business Impact: Scales with organization size, improves data quality through ownership.
Best for: Large organizations with distinct business domains and mature data teams.

🔧 Modern Table Formats

Delta Lake

What it means: Open-source table format that adds database-like reliability features, such as ACID transactions, to the files in a data lake.

Business Impact: Reliable data updates and historical tracking in data lakes.
Key benefit: Data can be updated or deleted reliably, and earlier versions of the data remain viewable.
Apache Iceberg

What it means: Open table format for large analytic datasets that many analytics engines (such as Apache Spark, Trino, and Apache Flink) can read and write.

Business Impact: Avoid vendor lock-in, use best tools for different tasks.
Key benefit: Freedom to choose different analytics tools without data migration.

🔒 Advanced Concepts

ACID Transactions

What it means: Database properties ensuring data reliability (Atomicity, Consistency, Isolation, Durability).

Business Impact: Prevents half-finished updates and corrupted data, even if a system fails in the middle of an operation.
Example: A bank transfer either completes fully or not at all; there are no partial transfers.
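Atomicity can be demonstrated with SQLite, which ships with Python. This sketch assumes hypothetical accounts and uses a `CHECK` constraint so that an overdraft fails partway through a transfer; the whole transaction is then rolled back. It illustrates atomicity and consistency only, not isolation or durability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between two accounts as a single all-or-nothing transaction."""
    try:
        with conn:  # commits if both updates succeed, rolls back if either fails
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, target))
            # Debiting second means an overdraft fails after the credit was already applied.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, source))
        return True
    except sqlite3.IntegrityError:  # raised when CHECK (balance >= 0) is violated
        return False


first = transfer(conn, "alice", "bob", 30)    # succeeds
second = transfer(conn, "alice", "bob", 500)  # overdraft: rolled back entirely
balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(first, second, balances)  # True False {'alice': 70, 'bob': 80}
```

The second transfer credits Bob before the debit fails, yet his balance afterwards is unchanged, because the rollback undoes both steps.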
Time Travel

What it means: Ability to query data as it existed at an earlier point in time, as far back as the system keeps history.

Business Impact: Audit trails, debugging, regulatory compliance.
Example: "Show me customer data as it was on December 31st for year-end reporting."
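Table formats such as Delta Lake and Apache Iceberg keep older versions of a table available so they can be queried later. The Python sketch below imitates that idea with a hypothetical `VersionedTable` class that stores a full copy of the table at each commit; real systems record changes far more efficiently and can remove old history after a retention period.

```python
from bisect import bisect_right
from datetime import datetime


class VersionedTable:
    """Hypothetical table that keeps every committed version so older states stay queryable."""

    def __init__(self):
        self._timestamps: list[datetime] = []
        self._snapshots: list[dict] = []

    def commit(self, at: datetime, rows: dict) -> None:
        """Record a new version of the table as of time `at`."""
        if self._timestamps and at <= self._timestamps[-1]:
            raise ValueError("commits must be in time order")
        self._timestamps.append(at)
        self._snapshots.append(dict(rows))  # copy, so later edits cannot rewrite history

    def as_of(self, at: datetime) -> dict:
        """Return the table as it was at the given moment (latest version not after `at`)."""
        index = bisect_right(self._timestamps, at) - 1
        if index < 0:
            raise LookupError("no version exists at or before that time")
        return dict(self._snapshots[index])


customers = VersionedTable()
customers.commit(datetime(2023, 12, 1), {"C1": "Gold", "C2": "Silver"})
customers.commit(datetime(2024, 1, 15), {"C1": "Platinum", "C2": "Silver", "C3": "Bronze"})

year_end = customers.as_of(datetime(2023, 12, 31))
print(year_end)  # {'C1': 'Gold', 'C2': 'Silver'}
```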

🎯 Business Decision Framework

When to Choose Each Architecture:
Key Questions for Architecture Selection:
  1. Data Volume: How much data do you have? (GB, TB, PB)
  2. Data Variety: How many different data types and sources?
  3. Speed Requirements: Real-time, hourly, daily, or weekly updates?
  4. Team Size: How many people will work with the data?
  5. Budget: What's your cost tolerance for storage and compute?
  6. Compliance: What regulatory requirements must you meet?

This glossary serves as a reference for business stakeholders to understand and participate in data architecture discussions.