Data Access Committee
DATA ACCESS POLICY
CF Airway Metagenomics and Clinical Metadata Resource
Version 0.1 – Draft for Review
Note: This DAC has not been approved and the data is not currently available.
1. Purpose
This Data Access Policy defines the principles, governance, and operational procedures governing access to the CF Airway Metagenomics Dataset (“the Dataset”). The policy ensures that data sharing occurs ethically, securely, and in a manner consistent with participant consent, institutional ethics approvals, and relevant legislation. The overarching objective is to maximise scientific and clinical benefit while protecting participant confidentiality and preventing unauthorised use, including any research involving human genomic data.
2. Scope
This policy applies to all data generated through the CF airway metagenomics project.
2.1 Sequencing Data
Covered datasets include:
- Raw sequencing reads (FASTQ files) generated on MGI, MinION, and PromethION platforms.
- Processed microbial, viral, and functional profiles.
- Assembled contigs and metagenome-assembled genomes (MAGs), excluding any human-derived sequences.
- Derived computational features (e.g., neural network latent clusters, ML predictors).
2.2 Metadata
- De-identified clinical and laboratory metadata (e.g., antibiotic administration, culture outcomes, age categories, lung-function metrics).
- Longitudinal metadata (sampling intervals, medication trajectories), which may carry increased re-identification risk.
All metadata will be released in de-identified and, where appropriate, aggregated or date-shifted form.
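As an illustration of date-shifting, the sketch below moves every date for a participant by one fixed, secret offset, so calendar dates are obscured while the intervals between that participant's samples are unchanged. This policy does not prescribe a method: the function names, the HMAC-derived offset, the secret key, and the 180-day window are all assumptions made for the example.

```python
import hashlib
import hmac
from datetime import date, timedelta
from typing import List


def participant_offset(participant_id: str, secret_key: bytes,
                       max_days: int = 180) -> timedelta:
    """Derive a fixed per-participant shift in [-max_days, +max_days].

    The offset is computed from a keyed hash, so it is stable across runs
    but cannot be recomputed without the secret key (hypothetical scheme).
    """
    digest = hmac.new(secret_key, participant_id.encode("utf-8"),
                      hashlib.sha256).digest()
    span = 2 * max_days + 1
    days = int.from_bytes(digest[:8], "big") % span - max_days
    return timedelta(days=days)


def shift_dates(participant_id: str, dates: List[date], secret_key: bytes,
                max_days: int = 180) -> List[date]:
    """Apply the same offset to every date belonging to one participant."""
    offset = participant_offset(participant_id, secret_key, max_days)
    return [d + offset for d in dates]
```

Because one offset is applied per participant, sampling intervals survive the shift. Those intervals are themselves longitudinal metadata, so shifting alone does not remove the re-identification risk noted in Section 2.2.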
2.3 Explicit Exclusion of Human Genomic Data
The Dataset excludes human DNA sequences. Any incidental human reads present in raw sequencing files are considered unintentional artefacts and must not be analysed, interpreted, or retained.
2.4 Exclusions
This policy does not apply to:
- Identifiable participant information.
- Clinical records or hospital identifiers.
- Human genomic or genetic data of any kind.
3. Data Access Governance
Data access is overseen by a Data Access Committee (DAC).
3.1 Composition
Members are appointed by the study leadership. The DAC comprises:
- Robert Edwards, Chair
- A CF clinician or clinical scientist.
- A data custodian or bioinformatics representative.
- An ethics/governance representative.
- A consumer or community representative.
3.2 Responsibilities
The DAC will:
- Evaluate and approve/decline data access applications.
- Ensure alignment with ethics approvals and participant consent.
- Assess re-identification risk, including risks arising from longitudinal metadata.
- Enforce prohibitions on human genomic research.
- Manage compliance, investigate breaches, and update policy as required.
4. Data Access Tiers
4.1 Tier 1 — Open Access
Includes:
- Aggregated statistics.
- High-level abundance summaries.
- Publicly shareable metadata that cannot be linked to individuals.
4.2 Tier 2 — Registered Access
Available to bona fide researchers who register and agree to basic conditions.
Includes everything in Tier 1, plus:
- Raw sequencing data (with any incidental human reads removed).
- De-identified metadata without high-risk identifiers.
4.3 Tier 3 — Controlled Access
Requires full DAC review and a signed Data Use Agreement (DUA).
Includes everything in Tier 2, plus:
- Longitudinal metadata.
- Detailed clinical annotations.
- Any dataset where linkage may elevate re-identification risk.
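The tiers are cumulative: each tier includes everything in the tier below it. A minimal sketch of that structure follows, assuming hypothetical data-type labels that paraphrase the lists above; it is not an official access-control implementation.

```python
from enum import IntEnum
from typing import Dict, Set


class Tier(IntEnum):
    OPEN = 1
    REGISTERED = 2
    CONTROLLED = 3


# Data types that each tier adds on top of the tier below it.
# The labels are illustrative paraphrases of Sections 4.1 to 4.3.
TIER_ADDS: Dict[Tier, Set[str]] = {
    Tier.OPEN: {"aggregated_statistics", "abundance_summaries",
                "public_metadata"},
    Tier.REGISTERED: {"raw_sequencing_data", "deidentified_metadata"},
    Tier.CONTROLLED: {"longitudinal_metadata",
                      "detailed_clinical_annotations",
                      "linkage_sensitive_datasets"},
}


def accessible(tier: Tier) -> Set[str]:
    """Return every data type available at the given tier (cumulative)."""
    result: Set[str] = set()
    for t in Tier:
        if t <= tier:
            result |= TIER_ADDS[t]
    return result


def minimum_tier(data_type: str) -> Tier:
    """Return the lowest tier that grants access to a data type."""
    for t in Tier:
        if data_type in TIER_ADDS[t]:
            return t
    raise KeyError(f"unknown data type: {data_type}")
```

For example, `minimum_tier("longitudinal_metadata")` returns `Tier.CONTROLLED`, which is the tier an applicant would need to request for that data type.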
5. Eligibility for Access
Applicants must:
- Be affiliated with a recognised academic, clinical, government, or not-for-profit organisation (industry permitted where appropriate).
- Demonstrate scientific or methodological capability relevant to their proposed research.
- Provide a clear research plan.
- Provide evidence of ethics approval or exemption.
- Demonstrate capacity for secure storage and high-performance analysis of large sequencing datasets.
Students must list a supervisor as the responsible investigator.
6. Application Process
Applicants must submit:
- A completed Data Access Application Form.
- A research proposal (1–2 pages).
- Ethics approval documentation or exemption.
- A data security plan compliant with institutional and NHMRC guidelines.
- Agreement to the Data Use Agreement (DUA).
- A list of requested data types and access tier(s).
The DAC aims to review applications within 21 days.
7. Conditions of Data Use
7.1 Privacy and Prohibition on Re-identification
Users must:
- Not attempt to identify participants.
- Not combine the Dataset with external information for re-identification purposes.
- Not attempt to contact participants, clinicians, or health services.
7.2 Data Security
Users must:
- Store data within secure institutional environments (HPC clusters, encrypted storage).
- Not store data on unencrypted laptops, USB drives, or personal cloud storage unless explicitly approved.
- Report any data breach to the DAC within 48 hours.
7.3 Human DNA Exclusion Requirements
Because the Dataset is restricted to non-human genomic research:
- Human DNA sequences must not be analysed, interpreted, retained, or used for any purpose, including genomic, medical, ancestry, or computational analysis.
- Users must employ standard host-removal screening pipelines to identify residual human reads.
- If human sequences are detected:
  - The DAC must be notified immediately, including sequence identifiers (e.g., read IDs, contig IDs).
  - The user must delete all human-derived sequences from all working locations, archives, or backups.
  - Confirmation of deletion must be provided to the DAC.
  - No downstream dataset may incorporate information derived from these reads.
- Any deliberate use of human genomic material constitutes a breach and may trigger:
  - Revocation of data access;
  - Notification to the user’s institution;
  - Mandatory reporting to the Human Research Ethics Committee (HREC).
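The detection and deletion steps above could be supported by a small filter such as the following sketch. It assumes that a separate host-removal tool (not specified by this policy) has already produced a set of read IDs flagged as human; the function names and the `flagged_ids` input are hypothetical. The filter writes a cleaned FASTQ stream and returns the removed read IDs, which is the list a user would include when notifying the DAC. It does not itself purge archives or backups.

```python
import io
from typing import Iterator, List, Set, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, separator, quality) from four-line FASTQ."""
    while True:
        header = handle.readline()
        if header == "":
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline()
        sep = handle.readline()
        qual = handle.readline()
        if not qual:
            raise ValueError("truncated FASTQ record")
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               sep.rstrip("\n"), qual.rstrip("\n"))


def drop_flagged_reads(src: TextIO, dst: TextIO,
                       flagged_ids: Set[str]) -> List[str]:
    """Copy records whose read ID is not flagged; return removed read IDs."""
    removed: List[str] = []
    for header, seq, sep, qual in read_fastq(src):
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        fields = header[1:].split()
        read_id = fields[0] if fields else ""
        if read_id in flagged_ids:
            removed.append(read_id)  # record the ID only, never the sequence
            continue
        dst.write(f"{header}\n{seq}\n{sep}\n{qual}\n")
    return removed


if __name__ == "__main__":
    raw = io.StringIO("@r1 sample\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n")
    cleaned = io.StringIO()
    ids_for_dac = drop_flagged_reads(raw, cleaned, flagged_ids={"r2"})
    print("Read IDs to report to the DAC:", ids_for_dac)
```

Only read identifiers are collected for the report, so the notification itself does not retain any human-derived sequence.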
7.4 Redistribution
Users must not share data with third parties. Collaborators must independently obtain approval or be named on the original application.
7.5 Publication and Attribution
Publications using the Dataset must:
- Credit the Dataset, primary publications, and funding bodies.
- Use appropriate accession numbers.
- Notify the DAC on manuscript acceptance.
7.6 Project Completion and Data Destruction
At project end, users must:
- Delete or destroy all data unless an extension is granted.
- Provide confirmation of destruction if requested.
8. Ethical and Legal Compliance
Users must comply with:
- The National Statement on Ethical Conduct in Human Research (NHMRC).
- Australian Privacy Principles.
- The General Data Protection Regulation (GDPR), for users in the EU or collaborating internationally.
- Participant consent constraints, including strict prohibition on human genome research.
- All conditions set out in ethics approvals for the Dataset.
9. Incidental Findings
The Dataset is not validated for clinical diagnosis. Users:
- Must not provide clinical interpretations based on metagenomic data.
- Must report any potential clinically relevant incidental findings to the DAC only, not directly to patients or clinicians.
10. Transparency and Accountability
To support good governance:
- The DAC may maintain a public list of approved projects (without sensitive details).
- DAC processes will be reviewed annually.
- Access may be revoked for non-compliance.
11. Policy Review and Version Control
This policy will be reviewed annually or updated when:
- New data types are added;
- Ethics approvals change;
- Legal requirements evolve;
- New risks emerge.
Updates will be logged and versioned.
12. Contact Information
All correspondence and data access enquiries should be directed to Robert Edwards, robert.edwards@flinders.edu.au