CCLE metadata
Overview
Metadata is data that describes other data. This page details the CCLE metadata available for viewing and filtering Cancer Cell Line Encyclopedia (CCLE) data in the Data Browser and the Datasets API. The CCLE contains Open Access sequencing data in the form of reads aligned to the hg19 reference genome for nearly 1000 cancer cell line samples, as available from CGHub on May 11, 2016.
CCLE metadata on the CGC consist of entities and their properties.
Entities are particular resources with UUIDs, such as files, cases, samples, and cell lines.
Properties can either describe an entity or relate that entity to another entity. For instance, properties include an entity's vital status, sex, data format, or experimental strategy.
Entities for CCLE include:
- CCLE Cell line, which represents data generated for each cell line. Dependent elements include biospecimen data such as Sample and clinical data such as Investigation.
- Aliquot
- File
Below, each of these three entities is followed by a table of its related properties.
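To make the entity and property model concrete, here is a minimal Python sketch of how the three CCLE entities could be represented and linked. It is an illustration only, not the CGC's actual data model: the `Entity` class, the relation names `derived_from` and `produced_from`, and all property values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List
from uuid import uuid4


@dataclass
class Entity:
    """A resource identified by a UUID (hypothetical illustration class)."""
    kind: str  # "cell_line", "aliquot", or "file"
    uuid: str = field(default_factory=lambda: str(uuid4()))
    # Properties that describe the entity, e.g. Sex or Data format.
    properties: Dict[str, str] = field(default_factory=dict)
    # Properties that relate the entity to other entities, stored as UUIDs.
    relations: Dict[str, List[str]] = field(default_factory=dict)

    def relate(self, name: str, other: "Entity") -> None:
        """Record a named relation from this entity to another entity."""
        self.relations.setdefault(name, []).append(other.uuid)


# All values below are made up for illustration.
cell_line = Entity("cell_line", properties={"Sex": "Female", "Primary site": "Lung"})
aliquot = Entity("aliquot", properties={"ID": "example-aliquot-id"})
file_ = Entity("file", properties={"Data format": "BAM", "Experimental strategy": "RNA-Seq"})

# Relation names are assumptions, not part of the CCLE schema.
aliquot.relate("derived_from", cell_line)
file_.relate("produced_from", aliquot)
```

In this sketch, descriptive properties and relational properties are kept in separate dictionaries, which mirrors the distinction drawn above between properties that describe an entity and properties that relate it to another entity.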
CCLE Cell line
The CCLE Cell line entity represents cell lines, which are permanently established cell cultures that will proliferate indefinitely given appropriate fresh medium and space. The CCLE Cell line entity contains these cell lines' clinical and biospecimen data. See the table below for clinical and biospecimen properties and descriptions of CCLE Cell line.
| Property | Description |
|---|---|
| ID | A human-readable identifier, such as a number or a string that may contain information about the entity. This identifier is often referred to as the submitter ID. |
| Program | The research program under which the data was generated. See NCI Thesaurus Code: C82662. | 
| Investigation | A value denoting the project or study that generated the data. See NCI Thesaurus Code: C41198. | 
| Sex | The collection of behaviors and attitudes that distinguish people on the basis of the societal roles expected for the two sexes. See NCI Thesaurus Code: C17357. | 
| Disease type | The type of the disease or condition studied. See NCI Thesaurus Code: C2991. | 
| Disease type abbreviation | An acronym or initials for the disease or condition studied. See NCI Thesaurus Code: C2991. |
| Primary site | The anatomical site where the primary tumor is located in the organism. See NCI Thesaurus Code: C43761. | 
| Histologic diagnosis | Diagnosis of a disease based on the type of tissue, where type is determined based on the microscopic examination of tissue. See NCI Thesaurus Code: C61478. | 
| Histology | The study of the structure of the cells and their arrangements to constitute tissues and the association among these to form organs. In pathology, the microscopic process of identifying normal and abnormal morphologic characteristics in tissues, by employing various cytochemical and immunocytochemical stains. See NCI Thesaurus Code: C16681. | 
| Note | A brief written record which provides information on cell line relations. For instance, notes mention if two cell lines come from the same patient. See NCI Thesaurus Code: C42619. | 
| Sample name | A specific name given to material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes, including but not limited to tissues, body fluids, cells, organs, embryos, body excretory products, etc. See NCI Thesaurus Code: C70713. |
| Sample type | The type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, body excretory products, etc. See NCI Thesaurus Code: C70713. | 
| Sample type code | Code that determines the type of material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes. This includes tissues, body fluids, cells, organs, embryos, body excretory products, etc. See NCI Thesaurus Code: C70713. | 
| Source | Commercial vendors or academic labs that the cell lines were obtained from. | 
Aliquot
The aliquot entity in the CCLE metadata schema refers to aliquots: products or units extracted from a portion of a sample or specimen and prepared for analysis. Members of the aliquot entity can be identified by a Universally Unique Identifier (UUID). See below for metadata properties and descriptions relating to the aliquot entity.
| Property | Description | 
|---|---|
| ID | A human-readable identifier, such as a number or a string that may contain metadata information. This identifier is often referred to as the submitter ID. |
File
The file entity in the CCLE metadata schema refers to the files in CCLE produced by aliquot analyses. See below for metadata properties and descriptions relating to the file entity.
| Property | Description | 
|---|---|
| Analyte type | Defines the type of an analyte on a molecular basis. |
| File size | Size of a file measured in bytes (B), kilobytes (KB), megabytes (MB), gigabytes (GB), terabytes (TB), and larger values. | 
| Data format | The type of format that determines data content. | 
| Experimental strategy | The method or protocol used to perform the laboratory analysis. See NCI Thesaurus Code: C43622. | 
| Platform | The version (for instance, manufacturer or model) of the technology that was used for sequencing or assaying. See NCI Thesaurus Code: C45378. | 
| Data submitting center | A string denoting the name of the center that submitted the data. |
| Data submitting center code | An alphanumeric code assigned to the center that submitted the data. |
| Last modified date | Date the file was last modified. | 
| Published date | Date the file was published. | 
| Storage path | The storage path of the file. |
| Reference genome | The reference assembly (such as hg19 or GRCh37) to which the nucleotide sequence of a case can be aligned. |
| Access level | A boolean value indicating Controlled Data or Open Data. Controlled Data is data from public datasets that has limitations on use and requires approval by dbGaP. Open Data is data from public datasets that doesn't have limitations on its use. | 
| Submitter ID | Analytical identification assigned by the center that submitted the data. | 
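As an illustration of how the file properties above can drive filtering, the following Python sketch filters a small list of in-memory file records by exact property match and formats file sizes for display. It does not call the Data Browser or the Datasets API. The records, the `filter_files` and `format_size` helpers, and the choice of decimal (1000-based) size units are all assumptions made for this example.

```python
from typing import Dict, Iterable, List

FileRecord = Dict[str, object]

# Hypothetical file records keyed by the property names in the table above.
files: List[FileRecord] = [
    {"Submitter ID": "file-a", "Data format": "BAM",
     "Experimental strategy": "RNA-Seq", "Reference genome": "hg19",
     "File size": 1_500_000_000},
    {"Submitter ID": "file-b", "Data format": "BAM",
     "Experimental strategy": "WXS", "Reference genome": "hg19",
     "File size": 512},
]


def filter_files(records: Iterable[FileRecord],
                 criteria: Dict[str, object]) -> List[FileRecord]:
    """Keep the records whose properties equal every value in criteria."""
    return [r for r in records
            if all(r.get(key) == value for key, value in criteria.items())]


def format_size(num_bytes: int) -> str:
    """Render a byte count using decimal units (an assumption: 1 KB = 1000 B)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} TB"


# Select RNA-Seq files aligned to hg19 and print their sizes.
for record in filter_files(files, {"Experimental strategy": "RNA-Seq",
                                   "Reference genome": "hg19"}):
    print(record["Submitter ID"], format_size(int(record["File size"])))
```

Passing the criteria as a dictionary, instead of keyword arguments, allows property names that contain spaces, such as "Experimental strategy", to be used unchanged.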
