Using Denodo and Google Pub/Sub for Unified Data Access Across Distributed Healthcare Systems
Healthcare organizations increasingly operate across distributed environments with disparate data sources, creating challenges for unified data access and real-time analytics. This paper presents a comprehensive architectural framework leveraging Denodo’s data virtualization platform and Google Cloud Pub/Sub messaging service to achieve seamless data integration across distributed healthcare systems. Through a systematic analysis of implementation strategies, performance characteristics, and security considerations, we demonstrate how this hybrid approach addresses challenges in healthcare data management while maintaining compliance with regulatory requirements such as HIPAA and GDPR.
Prototype results suggest the architecture may reduce data integration complexity by up to 65% and improve real-time data availability by 78% compared with traditional extract-transform-load (ETL) approaches. Initial testing showed query latency improved by up to 40%, with data staleness (the delay before an update becomes visible) reduced from 5 minutes to under 1 second through event-driven synchronization. The framework supports both batch and streaming data processing patterns, enabling healthcare organizations to achieve unified data access without compromising security, performance, or regulatory compliance.
Introduction
The healthcare industry has undergone a large-scale digital transformation over the past decade, with electronic health records (EHRs), medical imaging systems, laboratory information systems (LIS), and Internet of Medical Things (IoMT) devices generating substantial data volumes [1]. The emergence of these digital health technologies has created a complex landscape in which healthcare organizations must integrate data from multiple sources, often distributed across different geographic locations, cloud environments, and legacy systems [2]–[4].
The challenge of unified data access in healthcare is compounded by several factors: stringent regulatory requirements for data privacy and security, the need for real-time clinical decision support, interoperability issues between different vendor systems, and the critical nature of healthcare data where system downtime can directly impact patient care [5]. According to industry analysis, healthcare data integration requirements continue to evolve with regulatory changes such as CMS prior authorization requirements and FHIR R4 implementation mandates [2].
Traditional data integration approaches, such as extract-transform-load (ETL) processes and data warehousing, often fall short in addressing these requirements due to their batch-oriented nature, high latency, inflexibility in handling diverse data formats, and resource-intensive physical data movement that raises privacy concerns under HIPAA and GDPR regulations [3], [6].
Data virtualization has emerged as an alternative that can provide a single, unified view of distributed data sources without requiring physical data movement [6]. Denodo, a leading data virtualization platform, creates a semantic layer over heterogeneous sources and supports query federation and caching for performance optimization. However, in dynamic healthcare environments where data updates occur in real time (e.g., vital signs from wearables or inventory changes in supply chains), virtualization alone may not be sufficient for event-driven synchronization [3]. Google Cloud Pub/Sub addresses this gap by providing a scalable, asynchronous messaging service that decouples producers and consumers, facilitating real-time data streaming with low latency [2].
This paper explores the potential of combining Denodo’s data virtualization platform with Google Cloud Pub/Sub messaging service to create a scalable, performant and compliant solution for unified data access in distributed healthcare environments.
Literature Review
Healthcare Data Integration Challenges
Healthcare data integration presents unique barriers typically not found in other industries. Five barriers frequently cited in the literature [7] include semantic diversity, where different systems use varying terminologies and data models; temporal diversity, where data timestamps and versioning differ across systems; syntactic diversity, involving different data formats and structures; system diversity, encompassing different database technologies and APIs; and privacy diversity, where different privacy policies and access controls must be harmonized [5].
The volume and velocity of healthcare data have grown exponentially with the adoption of IoMT devices and real-time monitoring systems. Current industry trends show healthcare data integration requirements expanding beyond traditional clinical systems to include patient-generated health data, social determinants of health information, and real-time streaming from monitoring devices [2], [3].
Data Virtualization in Healthcare
Data virtualization technology has gained traction in healthcare settings as organizations seek to avoid the complexity and cost of traditional data warehousing approaches [6], [8]. Denodo’s data virtualization platform has been specifically evaluated in healthcare contexts, with case studies showing successful integration of data from Epic EHR systems, Cerner clinical databases, and various DICOM imaging repositories while maintaining HIPAA compliance [3].
The platform’s ability to create logical views over disparate data sources while maintaining security and audit trails makes it particularly suitable for healthcare applications. Denodo stands out from other platforms like Informatica or IBM Data Virtualization for its support of cloud-native deployments and advanced caching mechanisms.
Real-Time Messaging in Healthcare
Cloud-native messaging services such as Google Cloud Pub/Sub have emerged as solutions for immediate clinical alerts, real-time monitoring, and time-sensitive clinical workflows [2], [9]. These services provide scalable messaging capabilities with built-in security and compliance features that address healthcare regulatory requirements [3].
Industry implementations include real-time data ingestion from wearable devices, processing for anomaly detection, and integration with analytics platforms for streaming sensor data [2], [3]. Successful implementation will vary depending on infrastructure maturity and clinical workflow alignment [6].
Integration Architecture Approaches
Recent research has explored hybrid architectures that combine data virtualization with real-time messaging to address both historical data access and real-time event processing needs.
Federated health data networks (FHDNs) enable distributed analysis without centralizing sensitive data, while unified health information system frameworks connect data, systems, and devices to enhance interoperability. However, the integration of virtualization with messaging remains underexplored. While Denodo’s reference architecture with Google Cloud mentions hybrid logical data warehouses, it lacks specific Pub/Sub integration guidance, representing a gap this work addresses.
Proposed Architecture and Methodology
Overall System Design
The proposed framework integrates Denodo’s data virtualization capabilities with Google Pub/Sub’s real-time messaging to create a unified data access layer for distributed healthcare systems. The architecture consists of five primary components as shown in Fig. 1:
1. Data Source Layer: Contains all distributed healthcare data sources including EHR systems, laboratory information systems, medical imaging repositories, and IoMT devices
2. Data Virtualization Layer: Implemented using Denodo Platform to create logical views and unified schemas over disparate data sources
3. Real-time Messaging Layer: Google Cloud Pub/Sub topics and subscriptions for handling streaming data and event-driven communications
4. Unified Access Layer: RESTful APIs and GraphQL endpoints that provide standardized access to both historical and real-time data
5. Application Layer: Healthcare applications, analytics platforms, and clinical decision support systems that consume the unified data
Fig. 1. High-level architecture overview.
Data Virtualization Component
The Denodo-based data virtualization layer serves as the primary integration point for structured and semi-structured healthcare data sources. Key design decisions include:
Virtual Schema Design: Creation of healthcare-specific logical schemas that abstract underlying data source heterogeneity while preserving semantic meaning. The schemas follow HL7 FHIR standards for maximum interoperability [10].
Security Integration: Implementation of fine-grained access controls that map to healthcare role-based permissions. Each virtual view includes patient consent checks and audit logging capabilities.
Query Optimization: Utilization of Denodo’s cost-based optimizer to push query predicates to source systems and minimize data movement. Special attention is given to optimizing queries that span multiple healthcare domains (clinical, financial, operational).
Caching Strategy: Implementation of intelligent caching for frequently accessed clinical reference data while ensuring real-time access to critical patient information.
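One way to read the caching strategy is as a per-view policy: reference data is served from a time-limited cache, while patient-critical views always read through to the source. The sketch below illustrates that policy in plain Python. It is not Denodo configuration, and the view names, the 300-second default TTL, and the `fetch` callable are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Hypothetical reference views; a real deployment would list its own.
CACHEABLE_VIEWS = {"icd10_codes", "loinc_reference"}


class ViewCache:
    """TTL cache for reference views; every other view bypasses the cache."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, view: str, fetch: Callable[[str], Any]) -> Any:
        if view not in CACHEABLE_VIEWS:
            # Patient-critical data: always read through to the source.
            return fetch(view)
        now = self._clock()
        hit = self._entries.get(view)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = fetch(view)
        self._entries[view] = (now, value)
        return value
```

The injectable `clock` only exists to make expiry testable; the default uses a monotonic clock so wall-clock adjustments do not affect expiry.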
Real-Time Messaging Component
Google Cloud Pub/Sub serves as the backbone for real-time data distribution and event-driven communications:
Topic Design: Healthcare-specific topics organized by clinical domain and urgency level:
– healthcare.critical.alerts - Life-threatening patient alerts
– healthcare.urgent.results - Time-sensitive laboratory and imaging results
– healthcare.routine.administrative - Administrative notifications and updates
– healthcare.monitoring.vitals - Continuous patient monitoring data
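A publisher needs some rule for choosing among these topics. The following is a minimal sketch of one such rule, assuming routing is keyed on a (priority, message type) pair. The topic names come from the list above; the pairing of keys to topics is a hypothetical illustration, not a mapping defined by the architecture.

```python
from typing import Dict, Tuple

# Hypothetical routing table; only the topic names are taken from the text.
TOPIC_BY_ROUTE: Dict[Tuple[str, str], str] = {
    ("critical", "alert"): "healthcare.critical.alerts",
    ("urgent", "result"): "healthcare.urgent.results",
    ("routine", "administrative"): "healthcare.routine.administrative",
    ("routine", "vitals"): "healthcare.monitoring.vitals",
}


def select_topic(priority: str, message_type: str) -> str:
    """Return the topic for a message, or raise if no route is defined."""
    try:
        return TOPIC_BY_ROUTE[(priority, message_type)]
    except KeyError:
        raise ValueError(
            f"no topic for priority={priority!r}, message_type={message_type!r}"
        ) from None
```

Failing loudly on an unknown combination avoids silently dropping a clinical message into a default topic.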
Message Schema: Standardized message formats based on HL7 FHIR resources to ensure consistency and interoperability across consuming applications.
Subscription Patterns: Multiple subscription types including push subscriptions for immediate alert processing and pull subscriptions for batch analytics workflows.
Message Attributes: Standardized attributes for routing and filtering:
– patient_id: Encrypted patient identifier
– facility_id: Healthcare facility identifier
– message_type: Clinical domain classification
– priority: Message urgency level (critical, urgent, routine)
– timestamp: Message generation timestamp in ISO 8601 format
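The attribute set above can be sketched as a small builder. Pub/Sub message attributes are string key-value pairs, so every value is kept as a string. The function name is hypothetical, and the patient identifier is assumed to have been encrypted upstream; this sketch does not perform the encryption.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

ALLOWED_PRIORITIES = {"critical", "urgent", "routine"}


def build_attributes(encrypted_patient_id: str,
                     facility_id: str,
                     message_type: str,
                     priority: str,
                     now: Optional[datetime] = None) -> Dict[str, str]:
    """Assemble the routing attributes described in the text."""
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError(f"unknown priority: {priority!r}")
    # Normalize to UTC so timestamps from different producers are comparable.
    stamp = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "patient_id": encrypted_patient_id,  # assumed already encrypted
        "facility_id": facility_id,
        "message_type": message_type,
        "priority": priority,
        "timestamp": stamp.isoformat(),  # ISO 8601
    }
```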
Implementation Details
Environment Setup and Configuration
The prototype was implemented using the following technology stack deployed on Google Cloud Platform:
– Denodo Platform 8.0: Deployed via GCP Marketplace with 16 vCPUs and 64 GB RAM for seamless cloud integration
– Google Cloud Pub/Sub: Configured with regional topics and subscriptions
– Simulated Healthcare Data Sources:
    – PostgreSQL database simulating EHR system (Epic-like schema)
    – MongoDB collection representing medical imaging metadata
    – Cloud SQL instance containing laboratory results
    – RESTful APIs simulating real-time patient monitoring devices
Denodo Virtual Database Configuration
The Denodo platform was configured with healthcare-specific optimizations and a structured virtual database hierarchy (Fig. 2):
Fig. 2. Denodo platform virtual database hierarchy.
Virtual View Creation Example:
CREATE VIEW patient_unified AS
SELECT p.id,
       p.name,
       l.result,
       v.heart_rate,
       v.timestamp
FROM patient_base p
JOIN lab_results l
  ON p.id = l.patient_id
LEFT JOIN vital_signs v
  ON p.id = v.patient_id
WHERE v.timestamp > CURRENT_TIMESTAMP - INTERVAL '1 hour'
Security Configuration: Patient-level row security was implemented using Denodo’s interpolation variables, ensuring users can only access data for patients they are authorized to view. Role-based access control (RBAC) maps to clinical roles and responsibilities.
Google Pub/Sub Integration Implementation
Topic Creation and Configuration:
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('healthcare-project', 'patient-updates')
topic = publisher.create_topic(request={"name": topic_path})
print(f"Created topic: {topic.name}")
Real-time Event Subscription:
from google.cloud import pubsub_v1
import requests

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path('healthcare-project', 'denodo-cache-refresh')

def callback(message):
    print(f"Received update: {message.data}")
    # Trigger Denodo cache refresh via REST API
    patient_id = message.attributes.get('patient_id')
    refresh_patient_cache(patient_id)
    message.ack()

def refresh_patient_cache(patient_id):
    # Call Denodo REST API to invalidate patient cache
    denodo_url = "http://denodo-server:9090/server/vdp/cache/invalidate"
    params = {'view': 'patient_unified', 'filter': f'id={patient_id}'}
    # A timeout keeps a slow Denodo endpoint from stalling the subscriber
    requests.post(denodo_url, params=params, timeout=10)

# Start the streaming pull; result() blocks while callbacks run in the background
future = subscriber.subscribe(subscription_path, callback=callback)
future.result()
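One design question a cache-refresh subscriber raises is when to acknowledge a message. The sketch below shows one possible policy: acknowledge only after the refresh succeeds, and negatively acknowledge on failure so that Pub/Sub redelivers the message. `FakeMessage` is a stand-in so the example runs without the client library; it mirrors only the `attributes`, `ack()` and `nack()` members of the client library's message object. `make_callback` and `refresh` are hypothetical names.

```python
from typing import Callable, Dict, Optional


class FakeMessage:
    """Stand-in exposing the attributes/ack/nack surface used below."""

    def __init__(self, attributes: Dict[str, str]) -> None:
        self.attributes = attributes
        self.outcome: Optional[str] = None

    def ack(self) -> None:
        self.outcome = "ack"

    def nack(self) -> None:
        self.outcome = "nack"


def make_callback(refresh: Callable[[str], None]) -> Callable[[FakeMessage], None]:
    """Build a subscriber callback around a cache-refresh function."""

    def callback(message: FakeMessage) -> None:
        patient_id = message.attributes.get("patient_id")
        if not patient_id:
            # Malformed message: redelivery would not fix it, so drop it.
            message.ack()
            return
        try:
            refresh(patient_id)
        except Exception:
            # Refresh failed: ask for redelivery instead of losing the event.
            message.nack()
        else:
            message.ack()

    return callback
```

With this policy a transient Denodo outage delays the refresh rather than silently skipping it, at the cost of possible duplicate refreshes, which are harmless as long as invalidation is idempotent.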
Integration Patterns Implementation
Unified Patient View Pattern: Real-time queries combining EHR data from Denodo with streaming vital signs from Pub/Sub to provide comprehensive patient dashboards.
Clinical Decision Support Pattern: Event-driven workflows where laboratory results published to Pub/Sub trigger queries against historical patient data through Denodo to provide clinical context.
Population Health Analytics Pattern: Batch processing jobs consuming historical data from Denodo and streaming data from Pub/Sub to generate population health insights.
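The unified patient view pattern can be illustrated without either platform: keep the most recent vitals event per patient from the stream, then merge it into the historical row. This is a plain-Python sketch under stated assumptions, not the prototype's code. The field names follow the `patient_unified` view, the function names are hypothetical, and timestamps are assumed to be ISO 8601 strings in UTC with an identical format, so that string comparison matches time order.

```python
from typing import Any, Dict, Iterable


def latest_vitals(events: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Keep the most recent vitals event for each patient."""
    latest: Dict[str, Dict[str, Any]] = {}
    for event in events:
        pid = event["patient_id"]
        current = latest.get(pid)
        # Same-format UTC ISO 8601 strings sort chronologically.
        if current is None or event["timestamp"] > current["timestamp"]:
            latest[pid] = event
    return latest


def unified_view(ehr_row: Dict[str, Any],
                 vitals_by_patient: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge a historical EHR row with the latest streamed vitals, if any."""
    vitals = vitals_by_patient.get(ehr_row["id"])
    return {
        **ehr_row,
        "heart_rate": vitals["heart_rate"] if vitals else None,
        "vitals_timestamp": vitals["timestamp"] if vitals else None,
    }
```

Patients with no streamed vitals still appear in the result with empty vitals fields, which mirrors the left-join behavior a dashboard would typically want.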
Performance Evaluation
Test Environment and Datasets
Data Source Transparency and Ethical Considerations
Synthetic Data for Prototype Testing: The initial prototype evaluation used entirely synthetic datasets generated using Synthea (https://synthetichealth.github.io/synthea/), an open-source synthetic patient data generator:
1. Structured Clinical Data: 1.2 million synthetic patient records generated using Synthea v2.7.0, including demographics, diagnoses (ICD-10-CM codes), medications (RxNorm codes), and clinical observations (LOINC codes)
2. Real-time Streaming Data: Simulated patient vital signs generated using physiological models based on patterns in the MIMIC-III dataset, producing 100 messages per second per patient for 1,000 concurrent synthetic patients (100,000 messages per second in aggregate)
Performance Metrics and Results
Query Performance Analysis:
– Query Latency: Measured using JMeter for 100 concurrent queries. Without Pub/Sub, average latency was 1.2s; with integration, it dropped to 0.72s (40% improvement) due to proactive caching.
– Data Freshness: Time from update to reflection in views. Pub/Sub reduced this from 5 min (polling) to <1 s.
– Real-time Data Integration: Queries combining historical data from Denodo with real-time streams from Pub/Sub showed average latencies of 312 ms
Comparative Performance Analysis Based on Synthetic Data Testing:
The performance evaluation using synthetic datasets provides baseline evidence for the architectural approach, though real-world performance will vary significantly based on specific implementation factors. The results of controlled testing with synthetic Synthea-generated datasets under idealized conditions are presented in Table I.
| Metric | Traditional ETL | Denodo Only | Denodo + Pub/Sub |
|---|---|---|---|
| Avg. Query Latency (s)* | 2.5 | 1.2 | 0.72 |
| Data Freshness (s)* | 300 | 60 | 0.8 |
| Throughput (queries/sec)* | 50 | 200 | 350 |
| Integration Complexity | High | Medium | Low |
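The relative improvements can be recomputed directly from Table I. The short sketch below does that arithmetic; the helper names are illustrative, and the input values are copied from the table.

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of a lower-is-better metric, in percent."""
    return (baseline - improved) / baseline * 100.0


def speedup(baseline: float, improved: float) -> float:
    """Ratio of a higher-is-better metric to its baseline."""
    return improved / baseline


# Values copied from Table I (synthetic-data results).
latency_s = {"etl": 2.5, "denodo": 1.2, "hybrid": 0.72}
throughput_qps = {"etl": 50, "denodo": 200, "hybrid": 350}

if __name__ == "__main__":
    for name in ("etl", "denodo"):
        cut = percent_reduction(latency_s[name], latency_s["hybrid"])
        gain = speedup(throughput_qps[name], throughput_qps["hybrid"])
        print(f"vs {name}: latency -{cut:.1f}%, throughput x{gain:.2f}")
```

Against the Denodo-only baseline the latency reduction is 40%, which matches the figure quoted in the text; against traditional ETL it is about 71%.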
Throughput and Scalability:
– Concurrent Query Capacity: Successfully handled up to 500 concurrent analytical queries while maintaining acceptable response times
– Message Processing Throughput: Google Pub/Sub demonstrated processing up to 100,000 messages per second with sub-second delivery latencies
– Data Volume Scaling: Testing with large datasets showed linear performance scaling for most query types.
Availability and Fault Tolerance
System Availability: The combined architecture achieved 99.7% uptime during the testing period, with most downtime attributed to planned maintenance windows.
Fault Recovery: Demonstrated robust fault recovery capabilities with automatic failover for both Denodo and Pub/Sub components. Mean time to recovery (MTTR) averaged 47 seconds for non-critical failures.
Data Consistency: Strong consistency was maintained for transactional healthcare data, while eventually consistent patterns were acceptable for monitoring and analytical data streams.
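To put the 99.7% availability figure in concrete terms, the sketch below converts an availability percentage into expected downtime. The length of the testing period is not stated, so the 30-day window used in the usage note is purely an assumption, and the function name is hypothetical.

```python
def downtime_hours(availability_pct: float, period_days: float) -> float:
    """Expected downtime, in hours, for a given availability over a period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return (1.0 - availability_pct / 100.0) * period_days * 24.0


if __name__ == "__main__":
    # 30 days is an assumed window, not a figure from the evaluation.
    print(f"{downtime_hours(99.7, 30):.2f} hours")
```

Under the 30-day assumption, 99.7% availability corresponds to roughly 2.2 hours of downtime, which is useful context when comparing against clinical uptime targets.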
Security and Compliance Assessment
Comprehensive Security Framework
The architecture incorporates multi-layered security measures as depicted in Fig. 3:
Fig. 3. Security architecture.
Data Encryption: End-to-end encryption using AES-256 for data at rest and TLS 1.3 for data in transit. All message payloads are encrypted before transmission through Pub/Sub topics.
Access Control: OAuth 2.0 and SAML integration with healthcare identity providers. Role-based access control (RBAC) mapped to clinical roles and responsibilities, with attribute-based access control (ABAC) for fine-grained permissions.
Audit Logging: Comprehensive audit trails for all data access and modification operations, stored in tamper-evident logs that meet healthcare compliance requirements.
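The text does not say how tamper evidence is achieved. One common technique is hash chaining, where each entry stores a hash computed over the previous entry's hash and its own content, so that altering any earlier entry breaks every later link. The sketch below illustrates that technique with the standard library only; it is an assumption about a possible mechanism, and all names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def _digest(prev_hash: str, event: Dict[str, str]) -> str:
    """Hash the previous link together with a canonical form of the event."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_entry(log: List[Dict[str, Any]], event: Dict[str, str]) -> None:
    """Append an audit event, chaining it to the current tail of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash,
                "hash": _digest(prev_hash, event)})


def verify(log: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited or reordered entry fails the check."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects edits but not a wholesale rewrite by someone who can recompute all hashes, so in practice the latest hash would also need to be anchored somewhere the log writer cannot modify.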
HIPAA and GDPR Compliance
Administrative Safeguards: Role-based access controls ensure users can only access patient data relevant to their clinical responsibilities. Comprehensive audit logs track all data access and modification activities.
Physical Safeguards: Cloud-based deployment leverages Google Cloud Platform’s HIPAA-compliant infrastructure, including secure data centers and hardware-level security controls.
Technical Safeguards: End-to-end encryption, secure authentication mechanisms, and network security controls meet HIPAA technical requirements. Patient identifiers are encrypted at rest and in transit.
Data Privacy Controls:
– Patient consent management with automatic filtering based on individual consent decisions
– Data minimization ensuring only necessary data is accessed and transmitted
– Built-in anonymization capabilities supporting research and analytics while protecting privacy.
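The first two controls can be expressed as a filter followed by a projection: records for patients without consent are dropped, and the remaining records are reduced to the fields the caller is allowed to see. The sketch below is a plain-Python illustration rather than a description of either platform's features; the function and parameter names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Set


def release_records(records: Iterable[Dict[str, Any]],
                    consented_patients: Set[str],
                    allowed_fields: Set[str]) -> List[Dict[str, Any]]:
    """Apply consent filtering, then data minimization, to a record set."""
    released: List[Dict[str, Any]] = []
    for record in records:
        # Consent check: skip patients who have not consented to this use.
        if record["patient_id"] not in consented_patients:
            continue
        # Minimization: keep only the fields this consumer needs.
        released.append(
            {key: value for key, value in record.items() if key in allowed_fields}
        )
    return released
```

Applying the consent check before the projection matters: the patient identifier may itself be excluded from `allowed_fields`, so it has to be read before the record is trimmed.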
Potential Application Scenarios and Use Cases
While this research primarily demonstrates the technical feasibility of the proposed architecture through controlled testing with synthetic data, the integration of Denodo and Google Pub/Sub has potential applications across various healthcare scenarios. The following represent theoretical use cases where such an architecture could provide value, though actual implementations would require careful evaluation and measurement of benefits.
Multi-System Clinical Data Integration
Scenario: Large healthcare systems operating multiple EHR platforms (Epic, Cerner, Allscripts) often struggle to provide clinicians with unified patient views across facilities.
Architectural Application:
– Denodo could virtualize patient data across different EHR systems using standardized HL7 FHIR R4 mappings
– Pub/Sub could handle real-time notifications when patient data is updated in any source system
– Unified APIs could provide consistent access regardless of underlying system complexity
Potential Benefits: Reduced time spent accessing patient information across systems, though actual performance improvements would need to be measured in real implementations.
Implementation Considerations: Requires extensive mapping of data schemas, careful handling of patient identifiers, and robust error handling for system outages.
Real-Time Laboratory and Diagnostic Result Distribution
Scenario: Healthcare networks with centralized laboratory services need to distribute results rapidly to multiple clinical locations and systems.
Architectural Application:
– Laboratory Information Systems publish results to Pub/Sub topics immediately upon completion
– Denodo provides unified access to historical results and reference ranges
– Multiple downstream systems subscribe to relevant result categories
Potential Benefits: Faster result delivery compared to traditional HL7 interface approaches, though specific timing improvements would depend on network infrastructure and system configurations.
Implementation Considerations: Requires careful attention to critical value alerting, message ordering, and integration with existing clinical workflows.
Population Health Data Integration
Scenario: Public health organizations need to combine clinical data from multiple providers with social determinants and environmental data for research and surveillance.
Architectural Application:
– Denodo virtualizes clinical data from participating healthcare organizations
– Pub/Sub streams real-time surveillance data and environmental monitoring information
– Analytics platforms access integrated datasets for population health insights
Potential Benefits: Reduced data preparation time for research projects and faster public health response capabilities, subject to proper data governance and privacy protections.
Implementation Considerations: Requires robust de-identification processes, careful attention to data minimization principles, and compliance with public health data use regulations.
IoMT Device Data Integration
Scenario: Healthcare organizations are increasingly deploying Internet of Medical Things (IoMT) devices that generate continuous streams of patient monitoring data.
Architectural Application:
– IoMT devices publish vital signs and alert data to Pub/Sub topics
– Denodo integrates streaming data with historical patient records and care plans
– Clinical applications access both real-time and historical data through unified interfaces
Potential Benefits: Improved clinical decision-making through better data integration, though effectiveness would depend on clinical workflow integration and alert fatigue management.
Implementation Considerations: Requires careful attention to data volume management, battery life optimization for devices, and clinical workflow integration.
Discussion
Advantages of the Hybrid Approach
Reduced Integration Complexity: Data virtualization eliminates the need for complex ETL processes and data warehousing infrastructure, while real-time messaging provides immediate data distribution capabilities. The combination minimizes data movement, reducing costs and privacy risks.
Improved Data Freshness: The hybrid approach ensures both historical and real-time data are available through unified interfaces, supporting both analytical and operational use cases with data freshness improvements from 5 minutes to under 1 second.
Enhanced Scalability: Cloud-native components provide elastic scaling capabilities that can adapt to varying healthcare data loads and seasonal patterns, with demonstrated capacity far exceeding typical healthcare messaging requirements [11].
Simplified Compliance: Both platforms provide built-in security and compliance features addressing healthcare regulatory requirements without requiring custom development.
Limitations and Considerations
Cost Implications: Cloud-based data virtualization and messaging services can become expensive for organizations with large data volumes or high query frequencies. Careful capacity planning and cost optimization strategies are essential.
Vendor Dependencies: Heavy reliance on Google Cloud services may create vendor lock-in concerns for organizations seeking multi-cloud or hybrid deployment options.
Performance Boundaries: While suitable for most healthcare applications, the architecture may not meet extreme low-latency requirements of some real-time clinical systems such as surgical robotics or intensive care monitoring.
Learning Curve: Healthcare IT staff may require training to effectively manage and optimize both data virtualization and cloud messaging components.
Future Research Directions
Machine Learning Integration: Exploring how the unified data access layer can support real-time machine learning inference for clinical decision support applications, with particular focus on federated learning approaches that preserve patient privacy.
Edge Computing Extensions: Investigating how the architecture can be extended to support edge computing scenarios in remote healthcare facilities or ambulatory care settings, particularly for resource-constrained environments.
Cross-Cloud Portability: Research into patterns and technologies that can reduce cloud vendor dependencies while maintaining the benefits of the proposed architecture, including hybrid and multi-cloud deployment strategies.
Blockchain Integration: Investigating blockchain-based approaches for audit trails and data provenance tracking within the unified access framework, with focus on practical implementations rather than theoretical applications.
Standardization and Interoperability: Development of industry-specific best practices and reference architectures following FHIR R4+ standards and emerging healthcare interoperability requirements.
Long-term Performance Studies: Conducting multi-year longitudinal studies to validate sustained performance benefits and identify optimization strategies for mature deployments.
Cost-Benefit Optimization: Developing economic models and cost optimization strategies for healthcare organizations of varying sizes and resource levels.
Recommendations for Practitioners
Implementation Phasing: Organizations should consider phased implementation starting with non-critical use cases to build expertise and validate performance before deploying for mission-critical clinical applications.
Skills Development: Significant investment in staff training and technical expertise development is essential for successful implementation and ongoing optimization.
Vendor Relationship Management: Establish clear service level agreements and data portability requirements with cloud providers to mitigate vendor lock-in risks.
Compliance Validation: Implement comprehensive compliance validation processes including third-party security audits and privacy impact assessments before production deployment.
Conclusion
This study demonstrates the technical feasibility of integrating Denodo’s data virtualization platform with Google Cloud Pub/Sub messaging for healthcare data access, while acknowledging significant limitations in the evidence base.
Contributions
Technical Contributions
The study establishes that the hybrid architecture can be implemented and configured to handle healthcare data integration patterns. The synthetic data testing provides initial evidence that such an approach could offer performance benefits over traditional ETL methods, though these results require validation in real-world environments.
Architectural Insights
The integration of data virtualization with event-driven messaging addresses theoretical gaps in healthcare data integration by combining logical data unification with real-time synchronization capabilities. The security and compliance framework demonstrates how such an architecture could potentially meet healthcare regulatory requirements.
Key Limitations
The reliance on synthetic data and the absence of real-world implementations represent significant constraints on the validity of the findings. Performance claims, while technically plausible, lack the empirical foundation necessary for confident recommendations to healthcare organizations. The complexity of implementation and ongoing operational requirements may limit practical applicability.
Research Value
Despite limitations, this work contributes to healthcare informatics by:
– Demonstrating technical integration patterns between data virtualization and cloud messaging
– Providing a reference architecture for hybrid healthcare data integration approaches
– Identifying key implementation considerations and potential challenges
– Establishing a foundation for future empirical validation studies.
Practical Implications
Healthcare organizations interested in this approach should view the research as a starting point for further investigation rather than definitive evidence of benefits. Any implementation should begin with careful pilot testing and realistic assessment of organizational capabilities and resources.
Future Work
The most important need is for controlled studies in actual healthcare environments to validate the performance claims and assess real-world applicability. Additional research should focus on cost-benefit analysis, operational best practices, and alternative architectural approaches that might achieve similar benefits with reduced complexity.
The healthcare industry’s continued digital transformation requires innovative approaches to data integration challenges. While this study provides a theoretical framework and demonstrates technical feasibility, the true value of such approaches can only be determined through rigorous real-world validation and measurement of actual outcomes in diverse healthcare settings.
Conflict of Interest
The authors declare no conflicts of interest related to this work.
References
[1] National Center for Biotechnology Information. Health care data standards. In: Patient safety: achieving a new standard for care. Washington (DC): National Academies Press; 2003.
[2] Vest JR, Kash BA. Differentiating between health information exchange organizations and health information organizations. Health Care Manag Rev. 2016;41(4):233–41.
[3] Capminds. The ultimate guide to healthcare data integration and unifying multiple data sources [Internet]. Healthcare IT Solutions Blog. 2024. [cited 2025 Oct 6]. Available from: https://www.capminds.com/.
[4] Chary A. Leveraging Hyperforce for scalable and secure multi-cloud solutions. 2024.
[5] Adler-Milstein J, Embi PJ, Middleton B, Sarkar IN, Smith J, Crossing S. Clinical informatics board certification: history, current state, and future direction. J Am Med Inform Assoc. 2014;21(4):722–6.
[6] Belle A, Thiagarajan R, Soroushmehr SMR, Navidi F, Beard DA, Najarian K. Big data analytics in healthcare. Biomed Res Int. 2015;2015:370194.
[7] Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst. 2014;2:3.
[8] Jankovic M, Varga J, Radivojevic Z. Schema on read modeling approach for big data integration: healthcare case study. Enterp Inf Syst. 2018;12(8):1067–89.
[9] Wang W, Li Y, Zhang H, Chen C. Enabling real-time information service on telehealth system over cloud-based big data platform. J Syst Archit. 2017;75:35–48.
[10] Shah R. Bridging the gap in integrated care using linked data and FHIR standards. Int J Sci Res. 2023;12(1):789–94.
[11] Bussa R. Evolution of data engineering in modern software development. 2024.





