Choosing The Right Data Infrastructure for Human Performance Applications

Jun 20, 2024 10:17:24 AM


Organizations are increasingly leveraging operational data to enhance their performance. Success stories abound across various sectors:

  • E-commerce companies use customer activity data to improve user experience, recommend products, and boost revenue.
  • Datacenter and application service operators utilize system log data to optimize performance and reduce costs.

In the past, analytical applications for these use cases relied heavily on data warehouses built on relational database management systems (RDBMS). Over the last decade, however, data infrastructure has evolved significantly. New architectures, analytical capabilities, and infrastructure options have emerged, offering deeper insights, better performance, and greater cost efficiency. These advancements let organizations tailor their infrastructure to specific types of operational data and domain needs.


One area that has lagged in the deployment of data infrastructure is human performance. Historically, efforts to improve human performance through data have been limited compared to other operational enhancements. This is rapidly changing. The increased availability of relevant data, combined with advancements in data analysis infrastructure, is making human performance applications a priority for a growing number of organizations.


The following sections describe the specific needs of human data-focused applications and how they should drive infrastructure choices.

Opportunities with Human Health & Performance Data 

The sources and types of data that can provide insight into human performance are becoming increasingly plentiful. Data sources include:

  • Wearable devices
  • Instrumented tests
  • Video capture
  • Surveys and questionnaires
  • Enterprise systems
  • Medical record systems

Data types include:

  • Sleep
  • Activity
  • Cardiovascular function
  • Musculoskeletal function
  • Cognitive function
  • Body composition
  • Respiratory function
  • Performance measures

Likewise, the potential benefits of improved human health and performance insights are increasingly important to organizations. These benefits include:

  • Increased availability
  • Injury reduction
  • Better job assignment
  • Higher task performance
  • Faster recovery
  • More efficient use of resources

Human Data Platform

As organizations recognize these benefits, selecting the appropriate data infrastructure to support human performance applications becomes crucial. This involves understanding the unique requirements of human data and choosing systems that can effectively handle the volume, variety, and velocity of this data while providing actionable insights.

Critical Requirements for Human Health & Performance Platforms

Flexible & Scalable Data Collection and Storage:

  • Flexibility: Data collection must be convenient and comprehensive, gathering data from sources such as wearables and edge devices wherever people are.
  • Completeness: It's essential to capture both voluminous raw data (e.g., sensor outputs, video, images) and processed data with contextual information.
  • Scalability: The infrastructure must be able to store petabytes of data indefinitely, creating a valuable asset that can be re-mined for additional insights. Data that is not captured at the time cannot be recaptured retroactively.

We can't go back in time and capture the data we wish we had.
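
As a minimal sketch of capturing both raw and processed data, assuming raw payloads land in object storage and are referenced by URI, a record can keep a pointer to the untouched raw file alongside the processed metrics and the context needed to interpret them. The `ObservationRecord` class, its field names, the bucket path, and the example values are all hypothetical, not a standard schema.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical record shape: a pointer to the untouched raw payload
# plus the processed metrics and the context needed to interpret them.
@dataclass
class ObservationRecord:
    subject_id: str   # pseudonymous subject key, never a name
    source: str       # e.g. "wearable", "instrumented_test"
    recorded_at: str  # ISO 8601 timestamp
    raw_uri: str      # location of the raw payload, retained indefinitely
    metrics: dict = field(default_factory=dict)  # processed values
    context: dict = field(default_factory=dict)  # device, session, protocol

    def to_json(self) -> str:
        """Serialize the whole record, raw pointer included, as JSON."""
        return json.dumps(asdict(self), sort_keys=True)

# Illustrative values only.
record = ObservationRecord(
    subject_id="s-0042",
    source="wearable",
    recorded_at="2024-06-01T07:30:00Z",
    raw_uri="s3://example-bucket/raw/wearable/s-0042/2024-06-01.bin",
    metrics={"resting_hr_bpm": 52, "sleep_minutes": 431},
    context={"device_firmware": "1.8.2", "session": "overnight"},
)
restored = ObservationRecord(**json.loads(record.to_json()))
```

Because the raw payload is referenced rather than discarded, a future pipeline can reprocess it with new algorithms, while the stored context keeps older processed metrics interpretable.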

Complex and Scalable Computation:

  • Human performance analysis requires advanced computational capabilities beyond standard aggregations and statistical calculations.
  • The infrastructure should support machine learning training and deployment, high-dimensional vector computation, and time series analysis.
  • Deriving meaningful insights from human performance data is a complex big data problem that demands substantial computational power.

We need big data for big problems.
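
Time series analysis of human data often compares an individual against their own recent history. The sketch below, assuming a daily series for one person (for example a hypothetical nightly heart rate variability value in milliseconds), scores each day against the mean and standard deviation of the preceding `window` days. The function name, the seven-day default, and the sample values are illustrative choices, not a validated method.

```python
import statistics
from typing import List, Optional, Sequence

def baseline_zscores(values: Sequence[float], window: int = 7) -> List[Optional[float]]:
    """Z-score of each value against the mean/stdev of the `window` values before it.

    Returns None where there is not yet a full baseline window, or where the
    baseline has zero spread (the z-score would be undefined).
    """
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a spread")
    out: List[Optional[float]] = []
    for i, x in enumerate(values):
        if i < window:
            out.append(None)  # not enough history yet
            continue
        baseline = values[i - window:i]  # the preceding `window` days only
        sd = statistics.stdev(baseline)
        out.append(None if sd == 0 else (x - statistics.mean(baseline)) / sd)
    return out

daily_hrv_ms = [60, 62, 61, 63, 62, 61, 62, 48]  # hypothetical values
scores = baseline_zscores(daily_hrv_ms)
```

At scale, the same per-person logic would typically run inside a distributed compute engine across many people and signals rather than in a single Python loop.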

High Levels of Security, Privacy, and Governance:

  • Protecting Personally Identifiable Information (PII) and Personal Health Information (PHI) is imperative.
  • Robust governance protocols must be in place to ensure traceability of metrics from their source data through all computation layers.
  • Strong governance allows organizations to retrace steps, correct errors, and continuously improve data insights from historical data.
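
One common way to protect identities while keeping records joinable across sources is keyed pseudonymization. The sketch below, assuming a secret key held outside the analytics environment, drops direct identifiers and replaces the person ID with an HMAC-SHA256 token using Python's standard `hmac` and `hashlib` modules. The `DIRECT_IDENTIFIERS` set and the field names are hypothetical, and pseudonymization reduces, but does not eliminate, re-identification risk.

```python
import hashlib
import hmac

# Hypothetical set of direct identifiers to drop before analytics.
DIRECT_IDENTIFIERS = {"name", "email", "date_of_birth", "phone"}

def pseudonymize(record: dict, key: bytes, id_field: str = "person_id") -> dict:
    """Return a copy with direct identifiers removed and the person ID
    replaced by a keyed HMAC-SHA256 token.

    The same key always maps the same person to the same token, so records
    can still be joined; without the key, the token cannot be recomputed
    from a guessed ID.
    """
    token = hmac.new(key, str(record[id_field]).encode("utf-8"), hashlib.sha256).hexdigest()
    cleaned = {
        k: v for k, v in record.items()
        if k not in DIRECT_IDENTIFIERS and k != id_field
    }
    cleaned["subject_token"] = token
    return cleaned
```

Keeping the key separate from the analytics data means access to the pseudonymized records alone is not enough to map tokens back to people.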

These requirements highlight the importance of choosing a data infrastructure that can support the intricate demands of human health and performance platforms, ensuring comprehensive data collection, advanced computation, and stringent security and governance.


Protecting PII and PHI is non-negotiable.

Data Infrastructure for Human Performance Applications

The data infrastructure for human performance applications has significantly evolved, as illustrated in Figure 1.


Figure 1

While many traditional data warehouse-based approaches, including most commercial Athlete Management Systems (AMS), remain in use, they are increasingly being replaced by more cost-effective designs that offer superior analytical and governance capabilities. Modern data platforms for human performance applications include the following key elements:

Data Lakehouse:

  • Storage: Uses low-cost, highly durable, highly available object storage services (e.g., AWS S3).
  • Data Types: Supports a wide array of data types, including tabular data, time series data, images, video, and various forms of semi-structured and unstructured data.
  • Query Layer: Provides a flexible query layer that efficiently supports different use cases:
    • Low latency responses for dashboards and tabular data views
    • Bulk or streaming ingestion of data
    • Data export
    • Data loading for AI/ML processing
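
A lakehouse typically organizes files in object storage so the query layer can skip data it does not need. As a minimal sketch, assuming a Hive-style `column=value` directory layout (a convention read by engines such as Apache Spark), the functions below build a partitioned object key and filter keys by partition value without opening any files. The dataset name, partition columns, and function names are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, List

def object_key(dataset: str, source: str, day: date, subject_token: str, filename: str) -> str:
    """Build a Hive-style partitioned object key (column=value path segments)."""
    return (
        f"{dataset}/source={source}/date={day.isoformat()}"
        f"/subject={subject_token}/{filename}"
    )

def partition_values(key: str) -> Dict[str, str]:
    """Recover partition columns and values from the path segments of a key."""
    return dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)

def prune(keys: Iterable[str], **filters: str) -> List[str]:
    """Keep only keys whose partition values match every filter."""
    return [
        k for k in keys
        if all(partition_values(k).get(col) == val for col, val in filters.items())
    ]

keys = [
    object_key("sleep_sessions", "wearable", date(2024, 6, 1), "ab12", "part-0000.parquet"),
    object_key("sleep_sessions", "wearable", date(2024, 6, 2), "ab12", "part-0000.parquet"),
    object_key("sleep_sessions", "survey", date(2024, 6, 1), "cd34", "part-0000.parquet"),
]
june_first = prune(keys, date="2024-06-01")
```

Real query engines apply the same idea, called partition pruning, when a query filters on a partition column.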

Data Catalog:

  • Metadata: Enriched metadata that includes population distribution characteristics, computational relationships, and provenance data.
  • Diversity: Supports diverse data sources and types, along with large volumes of metadata.
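
A catalog entry enriched with population distribution characteristics and provenance can be sketched as below, assuming metric values are already available as a list. The `CatalogEntry` fields and the example metric are hypothetical; the statistics come from Python's standard `statistics` module (`quantiles` requires Python 3.8 or later).

```python
import statistics
from dataclasses import dataclass, field
from typing import Dict, List, Sequence

@dataclass
class CatalogEntry:
    name: str
    unit: str
    derived_from: List[str]  # provenance: upstream datasets
    distribution: Dict[str, float] = field(default_factory=dict)

    def profile(self, values: Sequence[float]) -> None:
        """Record population distribution characteristics for this metric."""
        if len(values) < 2:
            raise ValueError("need at least two values to profile a distribution")
        q1, median, q3 = statistics.quantiles(values, n=4)  # quartile cut points
        self.distribution = {
            "count": float(len(values)),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),
            "p25": q1,
            "median": median,
            "p75": q3,
        }

entry = CatalogEntry(name="resting_heart_rate", unit="bpm", derived_from=["raw/wearable_ppg"])
entry.profile([52, 55, 58, 61, 64])  # illustrative values only
```

Storing the profile with the metric lets an analyst see what is typical for a population before interpreting an individual's value.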

Integrated Compute Engine & AI/ML Pipeline:

  • Analysis Models: Supports complex, high-dimensional analysis models.
  • Resource Optimization: Optimizes compute resources for different workloads.

Built-in Governance Framework:

  • Lineage Traceability: Traces the lineage of metrics from source data through multiple levels of transformation.
  • Data Control: Ensures strict controls over data access and transformation.
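
Lineage traceability can be modeled as a graph in which each derived dataset or metric points to the inputs it was computed from. The sketch below, assuming a hypothetical in-memory lineage map, walks that graph upstream to find the raw sources behind a metric and downstream to find everything affected when an input is corrected. All artifact names are illustrative.

```python
from typing import Dict, List, Set

# Hypothetical lineage map: each derived artifact -> the inputs it was computed from.
LINEAGE: Dict[str, List[str]] = {
    "readiness_score": ["sleep_summary", "hrv_baseline"],
    "sleep_summary": ["raw/wearable_sleep"],
    "hrv_baseline": ["hrv_daily"],
    "hrv_daily": ["raw/wearable_ppg"],
}

def upstream(node: str, graph: Dict[str, List[str]]) -> Set[str]:
    """All artifacts `node` depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = list(graph.get(node, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, []))
    return seen

def sources(node: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Upstream artifacts with no recorded inputs, i.e. raw source data."""
    return {n for n in upstream(node, graph) if not graph.get(n)}

def downstream(node: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Every derived artifact that must be recomputed if `node` is corrected."""
    return {n for n in graph if node in upstream(n, graph)}
```

This supports the retrace-and-correct workflow described earlier: in this example map, correcting `hrv_daily` flags `hrv_baseline` and `readiness_score` for recomputation.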

These modern data platform capabilities enable organizations to manage the complex and diverse data associated with human performance applications, providing the necessary infrastructure for advanced analytics and robust governance.



Harnessing data to enhance human health and performance is beneficial for all organizations, not just elite sports or military entities. By investing in the right data infrastructure, organizations can create a robust data asset that improves operational efficiency and increases in value over time. This infrastructure enables comprehensive data collection, advanced analytics, and stringent governance, ensuring that organizations can derive actionable insights and maintain high standards of data security and privacy.