CPRD has generated a number of synthetic datasets that can be used for training purposes or to improve algorithms or machine learning workflows. Two high-fidelity synthetic datasets are being made available with a nominal administrative fee:

CPRD cardiovascular disease synthetic dataset
CPRD COVID-19 symptoms and risk factors synthetic dataset

CPRD has also developed the CPRD Aurum sample dataset, a medium-fidelity synthetic dataset that resembles the real-world CPRD Aurum. An additional fee will apply for an annual teaching licence for the high-fidelity synthetic datasets. Pricing information is available from CPRD.

CPRD Aurum and CPRD GOLD sample datasets

Access to each of the three synthetic datasets requires a data sharing agreement (DSA) with the applicant's organisation, in line with advice received from the Information Commissioner's Office (ICO) Innovation Hub in response to a formal query by the MHRA.

For access to these datasets, please submit the CPRD Synthetic data access request form by email, with 'Synthetic data access request' in the email subject header. Applicants from organisations that are not existing CPRD clients will also need to submit a new client request form.

Existing multi-study licence (MSL) clients will not need a separate DSA to access the CPRD Aurum sample dataset, as this will be added to their MSL agreement. MSL clients do not need to submit a synthetic data access request form and can apply for access to the CPRD Aurum sample dataset at no additional cost by emailing CPRD.

CPRD synthetic data access request form v1.1 (Word, 410KB, 3 pages)

High-fidelity synthetic datasets

CPRD has generated high-fidelity synthetic datasets using a synthetic data generation and evaluation framework that was developed under a grant from the Regulators' Pioneer Fund, launched by the Department for Business, Energy and Industrial Strategy (BEIS) and managed by Innovate UK. Both the framework and the synthetic datasets generated with it are owned by the Medicines and Healthcare products Regulatory Agency (MHRA).

A detailed technical description of the methodology used to generate the synthetic datasets is available in the publications by Wang et al.

These high-fidelity synthetic datasets replicate the complex clinical relationships in real primary care patient data while protecting patient privacy, as they are wholly synthetic. They can be used instead of real patient data for complex statistical analyses as well as for machine learning and artificial intelligence (AI) research applications.