Overview
A Data Fabric provides real-time, virtualized access to distributed data sources without physically moving the data. Governance is enforced through metadata-driven automation.
Data Organization
- Virtual Views: Logical tables over SQL, NoSQL, file stores
- Metadata Layer: Cataloging, lineage, data profiles
- API Gateway: Standardized REST/GraphQL endpoints
- Streaming Bus: Real-time connectors (Kafka, JMS)
- Policy Engine: Central security & masking rules
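A minimal sketch of the Virtual Views idea, assuming an in-memory SQLite table stands in for a SQL source and a list of dicts stands in for a NoSQL store. The names `VirtualView`, `customer_orders`, and `ORDERS_DOCS` are hypothetical and not tied to any product. Rows are resolved from the sources on every read, so nothing is copied into a new store:

```python
import sqlite3
from typing import Callable, Dict, Iterator, List


def make_sql_source() -> sqlite3.Connection:
    """Hypothetical SQL source: an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])
    return conn


# Hypothetical NoSQL source: a list of documents.
ORDERS_DOCS: List[Dict] = [
    {"customer_id": 1, "total": 40.0},
    {"customer_id": 1, "total": 15.5},
    {"customer_id": 2, "total": 99.0},
]


class VirtualView:
    """A logical table whose rows are computed from the sources on each read."""

    def __init__(self, name: str, resolver: Callable[[], Iterator[Dict]]):
        self.name = name
        self._resolver = resolver

    def rows(self) -> Iterator[Dict]:
        return self._resolver()


def customer_orders(conn: sqlite3.Connection, docs: List[Dict]) -> VirtualView:
    def resolve() -> Iterator[Dict]:
        # Read the current state of each source at query time.
        names = dict(conn.execute("SELECT id, name FROM customers"))
        for doc in docs:
            yield {"name": names.get(doc["customer_id"]), "total": doc["total"]}

    return VirtualView("customer_orders", resolve)
```

Because the view holds only a resolver, a document appended to `ORDERS_DOCS` shows up on the next call to `rows()` with no refresh step.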
Key Components & Patterns
Components
- Virtualization Engines: Denodo, TIBCO Data Virtualization, IBM Cloud Pak for Data
- Metadata Catalogs: Informatica EDC, Collibra
- API Layer: Data product interfaces
- Governance: Central policy enforcement
- Streaming Connectors: Kafka, Kinesis, Event Hubs
Patterns
- Virtualization for real-time integration
- Metadata-driven automation
- API-First data products
- Federated Policy enforcement
- Cache Acceleration for performance
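The Cache Acceleration pattern can be illustrated with a small time-to-live cache placed in front of a slow source. This is a sketch under stated assumptions, not how any particular virtualization engine caches: `TTLCache`, `fetch`, and `ttl_seconds` are hypothetical names, and real engines typically add invalidation and size limits.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Serve repeated queries from memory until the entry is older than the TTL."""

    def __init__(
        self,
        fetch: Callable[[Hashable], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch          # hypothetical call into the underlying source
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]
        # Missing or expired: go back to the source and store a fresh copy.
        self.misses += 1
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

The trade-off is freshness: a longer TTL cuts load on the sources but lets cached results lag behind them, which weakens the real-time guarantee for those queries.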
Use Cases
- Patient Records: Federated EMR, lab, imaging (FHIR API)
- Fraud Detection: Real-time virtual joins across payments
- Customer 360: Unified customer view from CRM, web, social
Pros & Cons
Pros
- ✅ Instant access without data copies
- ✅ Centralized governance & security policies
Cons
- ⚠️ Query latency due to virtualization layer
- ⚠️ Complexity in policy management
Day-to-Day Operations
- Real-Time Queries: On-demand federation
- Metadata Harvesting: Auto-catalog updates
- Policy Application: Runtime masking & access control
- Cache Management: Accelerate frequent queries
- API Monitoring: Data product usage metrics
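Runtime masking and access control from the list above could look roughly like the following. This is a sketch only: `Policy`, `apply_policy`, `mask_all_but_last4`, and the role names are hypothetical, and a real policy engine would load its rules from the metadata layer rather than from code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


def mask_all_but_last4(value: str) -> str:
    """Replace every character except the last four with '*'."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


@dataclass
class Policy:
    allowed_roles: Set[str]                      # roles that may read the view at all
    unmasked_roles: Set[str]                     # roles that see raw values
    masks: Dict[str, Callable[[str], str]] = field(default_factory=dict)


def apply_policy(rows: Iterable[Dict], role: str, policy: Policy) -> List[Dict]:
    """Enforce access control first, then mask sensitive columns per row."""
    if role not in policy.allowed_roles:
        raise PermissionError(f"role {role!r} may not read this view")
    if role in policy.unmasked_roles:
        return [dict(row) for row in rows]
    return [
        {k: policy.masks[k](v) if k in policy.masks else v for k, v in row.items()}
        for row in rows
    ]
```

Applying the policy at read time, instead of storing masked copies, means a rule change takes effect on the next query for every consumer.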