Draft:Trusted Research Environment

A Trusted Research Environment (TRE), also referred to as a Secure Data Environment (SDE) or Data Safe Haven, is a secure, governed computing environment that enables approved researchers to analyse sensitive data without the data leaving the data owner's control. Data residency, privacy, security, and auditability are enforced through federation, controlled output release (Airlock), and network-level controls, among others, allowing only approved aggregated results, not raw individual-level data, to leave the environment.[1] [2]

The term was first defined by Health Data Research UK [3] [4], which described a TRE as a platform that operationalises the Five Safes Framework, originally developed by the UK Office for National Statistics.[5] TREs emerged in the UK around 2010 as a response to the risk and inefficiency of traditional data-transfer models, and have since become widely adopted for secure biomedical and health data research at population scale.[6]

Architecture and Capabilities

A TRE represents a third architectural model distinct from both SaaS and DIY deployment models. It combines the data residency and control of a self-managed DIY approach with the managed-service capabilities of a SaaS platform. A TRE is built around the principle of bringing researchers to the data, rather than moving data to the researcher or a third-party SaaS vendor. The data custodian retains control of the data storage and compute environment at all times, and sensitive patient-level data never leaves it.

Core architectural pillars

Three architectural controls are widely recognised as defining a true TRE:

  1. Federation: A federated architecture enables in-place querying and computation across distributed environments without moving data. The compute travels to the data, not the reverse. Without federation, data must be copied or centralised, which violates the Safe Data principle.
  2. Airlock (output control): An Airlock is a mandatory control point governing all data outputs. No results leave the environment without explicit review and approval, ensuring Safe Data and Safe Outputs are enforced by design. Without an Airlock, users retain a path to directly export data, breaking both principles.
  3. Firewall enforcement: Active firewall settings block any attempt to bypass the Airlock or export data outside approved pathways. Without firewall-level enforcement, governance relies on policy rather than architecture, and Safe Data cannot be guaranteed.
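The federation pillar can be illustrated with a minimal Python sketch. It is not drawn from any particular TRE product: the Site class, the run_count method, and the sample records are all hypothetical, and a real deployment would run queries over authenticated, audited channels. The sketch only shows the shape of the idea, in which each site evaluates the query locally and returns an aggregate, so no individual-level record crosses a site boundary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Site:
    """Hypothetical data custodian environment; records never leave it."""
    name: str
    _records: List[Record]

    def run_count(self, predicate: Callable[[Record], bool]) -> int:
        # The query executes inside the site; only the count is returned.
        return sum(1 for record in self._records if predicate(record))


def federated_count(sites: List[Site], predicate: Callable[[Record], bool]) -> int:
    # Combine per-site aggregates without collecting any raw records.
    return sum(site.run_count(predicate) for site in sites)


# Illustrative (made-up) data held at two separate sites.
site_a = Site("A", [{"age": 71, "dx": "T2D"}, {"age": 45, "dx": "T2D"}])
site_b = Site("B", [{"age": 80, "dx": "T2D"}, {"age": 66, "dx": "CAD"}])

total = federated_count(
    [site_a, site_b],
    lambda r: r["dx"] == "T2D" and r["age"] > 60,
)
print(total)
```

In this sketch the coordinator only ever sees integers. In practice, per-site aggregates would typically also pass through output checking before being combined or released.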

How TREs implement the Five Safes: architectural controls

TREs are governed by the Five Safes Framework, originally developed by the Office for National Statistics[7] to provide a structured approach to enabling access to sensitive data while managing risk. The framework defines five pillars of safety. Below we describe the architectural controls a TRE uses to enforce these safety pillars:

  • Safe People means that only vetted, trained researchers are granted access. TREs enforce mandatory information governance training and require researchers to sign legally binding agreements prior to access, with clear penalties for misuse. Access is further controlled through granular role-based access controls (RBAC), which restrict each researcher to only the datasets and tools required for their approved project. This is combined with IP restrictions, which limit access to approved networks and locations, and multi-factor authentication (MFA) to verify identity at every login.
  • Safe Projects means that data use is restricted to ethically approved research projects. Independent review boards evaluate every research proposal, a process commonly referred to as Institutional Review Board (IRB) or Data Access Committee (DAC) review. This process includes ethical review and data protection impact assessments. TREs restrict each researcher to only the datasets and tools required for their approved project, through granular role-based access controls (RBAC).
  • Safe Settings means that the data custodian retains full ownership over the data storage and computing environment, including logs and audit trails. TREs enforce data residency through federated architecture, with data querying and compute happening in place. TREs further enforce safe settings through continuous monitoring and management of system health and security posture, providing real-time assurance that the environment remains compliant and tamper-evident at all times. Vulnerabilities identified across the system and firewall are remediated immediately, ensuring that no exposure window remains open and that the security perimeter is continuously hardened against emerging threats.
  • Safe Data means that:
    • data stays in place within the data custodian's controlled secure data environment. TREs enforce this via federation and capabilities that securely set up, manage and observe the system and firewall.
    • the data is encrypted at all times. TREs enforce this by providing encryption capabilities at rest, in transit, and during computation.
    • the data is de-identified (aka pseudonymized, stripped of direct identifiers, and assessed against re-identification risk to ensure that no combination of remaining variables could be used to identify an individual). TREs provide built-in data transformation tools that let researchers clean, link, and prepare datasets inside the environment without ever touching raw records. Every step is automatically logged as part of an immutable lineage record, capturing the full provenance of any output: what was done, by whom, and when. This gives custodians a complete data lineage and audit trail rather than just a snapshot of the final result.
    • data access to researchers is limited to the specific data fields and records necessary for the approved research project, and nothing beyond. TREs accomplish this via granular role-based access controls (RBAC).
    • individual-level data never leaves its source environment and only aggregate resulting data can be exported. TREs enforce this by providing Airlock (output checking) capabilities.
  • Safe Outputs means that every output, like charts, statistical summaries, derived datasets, trained models, and any other artefact produced within the environment, must pass through a mandatory review process before it can leave. No export path exists outside of this controlled process. TREs enforce this through two complementary architectural controls: the Airlock and firewall configuration. The Airlock acts as the sole authorised data export point for all outputs. Every data export request is subject to structured review (automated, human, or both) to assess whether the output could reveal sensitive or re-identifiable information before approval is granted. This ensures that Safe Outputs is enforced by architecture, not by researcher self-reporting or policy alone. Firewall configuration provides the complementary enforcement layer. Hardened firewall rules block all alternative egress paths, ensuring that no data can leave the environment through any route other than the Airlock. Together, the Airlock and firewall form a closed output perimeter: the Airlock governs what is permitted to leave, and the firewall ensures nothing else can.
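As an illustration of the Airlock control described above, the following Python sketch models an automated review step. It rests on several assumptions not taken from the source: the ExportRequest and Airlock classes are hypothetical, and the minimum cell count of 10 is only an example of a common statistical disclosure control rule, not a threshold mandated by the Five Safes Framework. Real Airlocks usually combine automated checks like this with human review.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List

# Assumed threshold for illustration; each TRE defines its own disclosure rules.
MIN_CELL_COUNT = 10


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ExportRequest:
    researcher: str
    project: str
    table: Dict[str, int]  # aggregate counts keyed by category


@dataclass
class Airlock:
    """Hypothetical single export point that reviews and logs every request."""
    audit_log: List[str] = field(default_factory=list)

    def review(self, request: ExportRequest) -> Decision:
        # Flag non-zero cells small enough to risk re-identification.
        small = [k for k, n in request.table.items() if 0 < n < MIN_CELL_COUNT]
        decision = Decision.REJECTED if small else Decision.APPROVED
        # Every decision is recorded, approved or not.
        self.audit_log.append(
            f"{request.researcher}/{request.project}: "
            f"{decision.value} (small cells: {small})"
        )
        return decision


airlock = Airlock()
ok = airlock.review(
    ExportRequest("alice", "proj-1", {"cases": 120, "controls": 340})
)
blocked = airlock.review(
    ExportRequest("alice", "proj-1", {"cases": 120, "rare_variant": 3})
)
print(ok, blocked)
```

The second request is rejected because one cell falls below the assumed threshold. Both decisions are written to the audit log, reflecting the point that review outcomes are evidenced rather than left to researcher self-reporting.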

SaaS and TRE architectural differences

A common point of confusion is the distinction between platforms that market themselves as TREs and architectures that fully satisfy the Five Safes Framework. In a SaaS model, data is copied into a vendor's cloud and researchers work inside that vendor-controlled environment. This creates three architectural problems:

  • Safe Settings fails because the environment belongs to the vendor, not the data controller.
  • Safe Data fails because a copy of the data has been moved into the vendor's infrastructure, which may fall under a different jurisdiction and a different security model.
  • Safe Outputs is weakened because researchers may retain a path to download derived data to a local machine.

In a TRE model, data stays at its source. Compute travels to the data, not the reverse, and every output passes through a mandatory Airlock review before it can leave the environment. No researcher-initiated download path exists.
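The absence of a researcher-initiated download path can be pictured as a default-deny egress policy. The Python sketch below is a simplified model under stated assumptions: the hostname airlock.internal and the function names are hypothetical, and real enforcement happens in network firewalls rather than application code.

```python
from typing import FrozenSet, List

# Hypothetical allow-list: the Airlock is the only permitted egress destination.
ALLOWED_EGRESS: FrozenSet[str] = frozenset({"airlock.internal"})

blocked_attempts: List[str] = []


def egress_permitted(destination_host: str) -> bool:
    """Default-deny: permit a connection only if the host is on the allow-list."""
    if destination_host in ALLOWED_EGRESS:
        return True
    # Record the blocked attempt so it can be audited later.
    blocked_attempts.append(destination_host)
    return False


print(egress_permitted("airlock.internal"))
print(egress_permitted("files.example.com"))
```

Because the policy denies by default, a new destination is blocked until it is explicitly approved, which mirrors the description of the firewall closing every route other than the Airlock.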

History

TREs emerged in the United Kingdom around 2010, initially developed in response to growing concerns about the risk and inefficiency of traditional data-sharing models for NHS patient data [6]. The approach was formalised by Health Data Research UK [3] [4], drawing on the Five Safes Framework developed by the Office for National Statistics [5].

The TRE model moved from concept to operational reality with the deployment of a federated TRE for Genomics England, one of the world's largest national genomic programmes, by Lifebit [8] [9] [10]. This deployment established a production-grade architecture in which federation, Airlock output controls, and active firewall enforcement operated together as a unified, enforceable standard, not as independent security controls bolted onto an existing platform, but as a coherent architectural model in which each pillar is necessary for the others to hold. The deployment established a reference implementation for TRE security and operational standards, requiring compliance across the following frameworks:

In addition, compliance is maintained through continuous security posture management rather than periodic audits alone: the environment is monitored and hardened in real time, vulnerabilities are remediated immediately, and the security state is evidenced at all times. This provides data controllers and regulators with continuous assurance rather than point-in-time attestation.

This reference model has since been adopted worldwide for secure research on genomics and real-world data [1] [2], across pharmaceutical companies, such as Boehringer Ingelheim [11], biotechs, such as Flatiron Health [12] and 23andMe [13], and government programmes, such as Singapore's national TRUST platform (Trusted Research and Real World Data Utilisation) [14] and Canada's CanPath initiative (the Canadian Partnership for Tomorrow's Health) [15].

Applications

TREs are used across a range of settings where sensitive data must be made available for research without compromising data residency, privacy, security or regulatory compliance:

  • National health programmes: Governments and national health agencies use TREs to enable approved access to population-scale health records, genomic data, and linked datasets while preserving data sovereignty.
  • Pharmaceutical and biotech research: Companies use TREs to access real-world evidence from hospital systems, biobanks, and national registries for drug development and biomarker discovery, without the data leaving the source institution.
  • Academic and cross-institutional research: TREs enable collaborative studies across multiple institutions without requiring data transfer agreements or centralisation of data. A researcher querying across several national biobanks sends the query to each environment; only aggregated, non-identifiable results are returned.
  • Population genomics programmes: TREs support compute-intensive workflows such as genome-wide association studies (GWAS), variant effect prediction (VEP), and polygenic risk scoring (PRS) on population-scale whole-genome datasets without data movement.

See also

References

  1. Alvarellos, Maria; Sheppard, Hadley E.; Knarston, Ingrid; Chatzou Dunford, Maria (2023-01-10). "Democratizing clinical-genomic data: How federated platforms can promote benefits sharing in genomics". Frontiers in Genetics. 13: 1045450. doi:10.3389/fgene.2022.1045450. PMC 9871385. PMID 36704354.
  2. Nik-Zainal, Serena; Seeger, Thorben; Chatzou Dunford, Maria (2022). Multi-party trusted research environment federation: Establishing infrastructure for secure analysis across different clinical-genomic datasets (Report). doi:10.5281/zenodo.7085536. Retrieved 2026-04-29.
  3. "New principles published to improve public confidence in access and use of data for health research through trusted research environments". Health Data Research UK. Retrieved 2021-12-08.
  4. Varma, Susheel; Hubbard, Tim; Seymour, David (2021-12-08). Building Trusted Research Environments: Principles and Best Practices; Towards TRE ecosystems (Report). UK Health Data Research Alliance. Retrieved 2026-04-29.
  5. "The Five Safes Framework". GOV.UK. Retrieved 2026-04-29.
  6. Goldacre, Ben; Morley, Jessica (2022-04-07). Better, broader, safer: using health data for research and analysis (Report). Department of Health and Social Care. Retrieved 2026-04-29.
  7. "About the Secure Research Service". Office for National Statistics. Retrieved 2026-04-29.
  8. "The data platform that breaks down barriers and transcends borders". Nature. 2022-04-11. Retrieved 2026-04-29.
  9. "Genomics England launches next-generation research platform central to UK COVID-19 response". Genomics England. 2020-07-01. Retrieved 2026-04-29.
  10. "Lifebit". Retrieved 2026-04-29.
  11. "Partnership Boehringer Ingelheim Lifebit health data". Boehringer Ingelheim. 2022-03-01. Retrieved 2026-04-29.
  12. "Lifebit and Flatiron Health announce partnership to accelerate cancer data research". Flatiron Health. 2023-05-31. Retrieved 2026-04-29.
  13. "23andMe Launches Discover23 to Help Accelerate Large-Scale Genetics Research, Powered By Lifebit's Trusted Technology". 23andMe. 2025-01-08. Retrieved 2026-04-29.
  14. "Lifebit opens R&D centre in Singapore in collaboration with Synapxe". Synapxe. 2024-02-22. Retrieved 2026-04-29.
  15. "Lifebit, CanPath and AWS collaborate to advance health research with innovative cloud-based data analytics platform". CanPath. 2024-09-17. Retrieved 2026-04-29.
