WES data of 286 diffuse gliomas according to the 2021 WHO classification of tumors of the central nervous system
A total of 286 glioma tissues and matched peripheral blood samples were collected from Beijing Tiantan Hospital, Beijing Puren Hospital and Beijing Sanbo Hospital. All patients were diagnosed with diffuse glioma by consensus, based on multiple pathological reviews by independent board-certified neuropathologists and classified according to the 2021 WHO classification.
All research was approved by the Institutional Review Board (IRB) of Tiantan Hospital and conducted under IRB KY2013-017-01. Written informed consent was obtained from all patients in accordance with the requirements of the Beijing Tiantan Hospital Ethics Committee, and the principles of the Declaration of Helsinki were strictly followed.
Specimens were frozen in liquid nitrogen within 5 minutes of resection. Follow-up information for each patient was also collected, including general information, survival status, clinical therapy, neuropathological classification, and required molecular information (Supplementary Table 1).
Whole exome sequencing
Genomic DNA from tumor tissue and paired blood samples was extracted and confirmed to have high integrity by 1% agarose gel electrophoresis. DNA concentration was measured using a Qubit® DNA analysis kit in a Qubit® 2.0 fluorometer (Invitrogen, USA). A total of 0.6 μg of genomic DNA per sample was used as input material for DNA sample preparation. Sequencing libraries were generated using an Agilent SureSelect Human All Exon V6 kit (Agilent Technologies, CA, USA) following the manufacturer’s recommendations, and index codes were added to each sample. Clustering of the index-coded samples was performed on a cBot cluster generation system using a HiSeq PE cluster kit (Illumina, USA) according to the manufacturer’s instructions. After cluster generation, the DNA libraries were sequenced on an Illumina HiSeq platform and 150 bp paired-end reads were generated.
Mutation Mapping and Calling
Whole exome sequencing data were mapped to the hg19 reference genome using BWA (version 0.7.12-r1039, bwa mem)12 with default settings. We used SAMtools (version 1.2)13 to sort reads by coordinate and Picard (version 2.0.1, Broad Institute; http://broadinstitute.github.io/picard/) to mark duplicates for further analysis. SAVI2, an empirical Bayesian tool, was applied to call somatic mutations (including SNVs and short insertions/deletions) as previously described14,15. In this pipeline, SAMtools mpileup and bcftools were used to find variants, and preliminary variants were then filtered out on the following criteria: (1) insufficient sequencing depth; (2) positions with only low quality reads; (3) positions biased towards one or the other strand. Mutations were retained only if the frequency of the mutant allele in the tumor was significantly higher than in the matched normal control. Additionally, we used CNVkit16 to detect copy number changes. The entire data processing workflow is shown in Fig. 1.
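The three filtering criteria and the tumor-versus-normal comparison can be illustrated with a minimal Python sketch. This is not the SAVI2 implementation: SAVI2 uses an empirical Bayesian model, whereas the sketch substitutes a one-sided Fisher's exact test as a simple stand-in. The `SiteCounts` structure and all thresholds (`min_depth`, `max_strand_fraction`, `alpha`) are hypothetical values chosen for illustration only.

```python
from dataclasses import dataclass
from math import comb


@dataclass
class SiteCounts:
    """Read counts at one candidate site (hypothetical structure, not SAVI2's)."""
    tumor_ref: int
    tumor_alt: int
    normal_ref: int
    normal_alt: int
    alt_forward: int       # tumor alt-supporting reads on the forward strand
    alt_reverse: int       # tumor alt-supporting reads on the reverse strand
    alt_high_quality: int  # tumor alt-supporting reads passing a quality cutoff


def fisher_one_sided(tumor_alt, tumor_ref, normal_alt, normal_ref):
    """P(tumor alt count >= observed) under the hypergeometric null.

    Stand-in for SAVI2's empirical Bayesian comparison (an assumption).
    """
    n_tumor = tumor_alt + tumor_ref
    n_alt = tumor_alt + normal_alt
    n_total = n_tumor + normal_alt + normal_ref
    upper = min(n_tumor, n_alt)
    tail = sum(
        comb(n_alt, k) * comb(n_total - n_alt, n_tumor - k)
        for k in range(tumor_alt, upper + 1)
    )
    return tail / comb(n_total, n_tumor)


def keep_site(s, min_depth=10, max_strand_fraction=0.9, alpha=0.01):
    """Return True if the site passes all illustrative filters."""
    # (1) insufficient sequencing depth in tumor or normal
    if s.tumor_ref + s.tumor_alt < min_depth:
        return False
    if s.normal_ref + s.normal_alt < min_depth:
        return False
    # (2) variant supported only by low-quality reads
    if s.alt_high_quality == 0:
        return False
    # (3) variant reads biased towards one strand
    stranded = s.alt_forward + s.alt_reverse
    if stranded == 0:
        return False
    if max(s.alt_forward, s.alt_reverse) / stranded > max_strand_fraction:
        return False
    # mutant allele frequency must be significantly higher in tumor than normal
    p = fisher_one_sided(s.tumor_alt, s.tumor_ref, s.normal_alt, s.normal_ref)
    return p < alpha
```

For example, a site with 40 of 100 tumor reads and 0 of 100 normal reads supporting the variant, balanced across strands, passes, whereas a site with a similar allele frequency in tumor and normal (a likely germline variant) is rejected by the final test.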
Supplementary molecular pathological characteristics
The IDH1 and IDH2 (IDH) mutation status was collected from pathology records and examined by pyrosequencing or by immunohistochemistry with an anti-IDH1 R132H antibody. TERT promoter (TERTp) mutation information was collected from pathology records and assessed by pyrosequencing. Because some patients were enrolled early, TERTp mutation information was not available for them.
EGFR amplification, CDKN2A/B homozygous deletion, chromosome 1p/19q co-deletion, chromosome 7 amplification and chromosome 10 deletion were determined using CNVkit16 and manually verified in the Integrative Genomics Viewer by two independent molecular neuropathologists.
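To make concrete how the markers described in this section relate to the 2021 WHO categories of adult-type diffuse glioma, the following Python sketch encodes a simplified decision rule. It is an illustration under stated assumptions, not the classification procedure used for this dataset: histological features (such as necrosis or microvascular proliferation), histological grading, and pediatric-type or H3-altered entities are deliberately left out, and the `Markers` structure and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Markers:
    """Molecular markers for one patient (hypothetical structure)."""
    idh_mutant: bool
    codel_1p19q: bool
    cdkn2ab_homdel: bool = False
    tert_promoter_mutant: Optional[bool] = None  # None = not tested
    egfr_amplified: bool = False
    gain7_loss10: bool = False  # combined chromosome 7 gain and 10 loss


def classify_adult_diffuse_glioma(m: Markers) -> str:
    """Simplified, molecular-only reading of the 2021 WHO adult-type scheme."""
    if m.idh_mutant:
        if m.codel_1p19q:
            return "Oligodendroglioma, IDH-mutant and 1p/19q-codeleted"
        if m.cdkn2ab_homdel:
            # CDKN2A/B homozygous deletion assigns CNS WHO grade 4
            return "Astrocytoma, IDH-mutant, CNS WHO grade 4"
        # Grade 2 versus 3 depends on histology, which is not modeled here
        return "Astrocytoma, IDH-mutant"
    # IDH-wildtype: any one of these molecular features suffices
    if m.tert_promoter_mutant or m.egfr_amplified or m.gain7_loss10:
        return "Glioblastoma, IDH-wildtype"
    # Without these features, histology is needed to resolve the diagnosis
    return "IDH-wildtype, unresolved (histology required)"
```

A missing TERTp result (`None`) is treated as uninformative rather than as wild-type, so patients without TERTp data can still be assigned through EGFR amplification or the chromosome 7/10 pattern.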
Relevance to the TCGA and Rembrandt cohort
The Cancer Genome Atlas (TCGA) includes 916 cases of whole exome sequencing data from 401 GBMs and 515 lower grade gliomas (LGGs), and the Rembrandt cohort includes 263 cases of SNP array data from gliomas. However, Asian patients account for less than 5% of these cohorts. In addition, because the cohorts have not been updated, the included patients cannot be classified according to the new classification or used in the new search system.
Our dataset comprises whole exome sequencing data from 286 Chinese glioma patients, bridging the gap between these two cohorts. It can serve as an independent validation dataset for comparative analyses with TCGA, and it allows more focal copy number changes to be called and covers more genetic mutations than the Rembrandt cohort.
More importantly, building on the CGGA project, our dataset includes many newly reported molecular disease biomarkers, and patients can be classified according to the 2021 WHO classification, providing crucial material for diffuse glioma researchers worldwide.