Download

The PhosCancer database provides downloadable files in TXT format, including:

  1. Phosphoproteomics: The preprocessed phosphoproteome data. Data from MS-GF+ and MASIC, combined with the phosphosite localization tool Ascore, were processed by replacing all 0 values with NA, followed by median centering and then log2 transformation. In the matrix, rows represent phosphosites and columns represent samples. "Corrected with Protein" indicates that the statistical analysis is based on phosphorylation data adjusted for protein expression.

  2. Proteomics: The preprocessed protein data. Data from MS-GF+ and MASIC were processed by replacing all 0 values with NA, followed by median centering and then log2 transformation. In the matrix, rows represent proteins and columns represent samples.

  3. Hallmark activity: The matrix of hallmark activities, estimated from preprocessed protein expression using the ssGSEA method. Data from MS-GF+ and MASIC were processed by replacing all 0 values with NA, followed by median centering and then log2 transformation. Proteins with more than 50% missing values were excluded, and the remaining missing values were imputed with a k-nearest neighbors procedure. ssGSEA was then run to estimate the hallmark activities. In the matrix, rows represent hallmarks and columns represent samples.

  4. Upstream kinase: Results of the correlation analysis between phosphorylation levels and kinase expression. Phosphosites with more than 80% missing values, and kinases with fewer than 10 non-missing expression values among the samples with non-missing phosphorylation values, were excluded from the analysis.

    Column            Description
    Cancer            Cancer type
    Site              Phosphosite
    TumorSampleSize   Number of tumor samples with non-missing values
    MostKinase        The top 10 kinases with the highest absolute correlation values among those with a correlation p-value below 0.05
    SpearmanP         P value from Spearman's correlation analysis
    SpearmanFDR       FDR for the p value from Spearman's correlation analysis
    SpearmanE         Correlation coefficient from Spearman's correlation analysis
    PearsonP          P value from Pearson's correlation analysis
    PearsonFDR        FDR for the p value from Pearson's correlation analysis
    PearsonE          Correlation coefficient from Pearson's correlation analysis

  5. DE: Results of the differential analysis of phosphorylation levels between tumor and normal samples. Phosphosites with more than 80% missing values in either tumor or normal samples were excluded from this analysis.

    Column            Description
    Cancer            Cancer type
    Site              Phosphosite
    TumorSampleSize   Number of tumor samples with non-missing values
    NormalSampleSize  Number of normal samples with non-missing values
    TumorSampleMean   Mean phosphorylation level of tumor samples
    NormalSampleMean  Mean phosphorylation level of normal samples
    log2FC            log2 fold change (Tumor/Normal)
    wilcoxP           P value from the Wilcoxon test
    wilcoxFDR         FDR for the p value from the Wilcoxon test

  6. Stage: Results of the differential analysis of phosphorylation levels across tumor stages. Phosphosites with more than 80% missing values in tumor samples were excluded from this analysis.

    Column            Description
    Cancer            Cancer type
    Site              Phosphosite
    TumorSampleSize   Number of tumor samples with non-missing values
    TumorSampleMean   Mean phosphorylation level of tumor samples
    kruskalP          P value from the Kruskal-Wallis test
    kruskalFDR        FDR for the p value from the Kruskal-Wallis test

  7. Survival: Results of the survival analysis. A phosphosite was retained for this analysis if its missing values did not exceed 80% in tumor samples and more than one death event occurred among the samples with non-missing values.

    Column            Description
    Cancer            Cancer type
    Site              Phosphosite
    OSHazardRatio     Hazard ratio for overall survival
    OSCoxP            P value from Cox regression analysis for overall survival
    OSKMcutpointP     P value from the log-rank test using an optimal cutpoint for overall survival
    OSKMMedianP       P value from the log-rank test using a median cutoff for overall survival
    DFSHazardRatio    Hazard ratio for disease-free survival
    DFSCoxP           P value from Cox regression analysis for disease-free survival
    DFSKMcutpointP    P value from the log-rank test using an optimal cutpoint for disease-free survival
    DFSKMMedianP      P value from the log-rank test using a median cutoff for disease-free survival

  8. Hallmark: Results of Spearman's correlation analysis between phosphorylation levels and hallmark activities, which were estimated from protein expression using the ssGSEA method. Phosphosites with more than 80% missing values in tumor samples were excluded from this analysis.

    Column            Description
    Cancer            Cancer type
    Site              Phosphosite
    TumorSampleSize   Number of tumor samples with non-missing values
    MostHallmark      The hallmark with the highest absolute correlation value among those with a correlation p-value below 0.05
    SpearmanP         P value from Spearman's correlation analysis
    SpearmanFDR       FDR for the p value from Spearman's correlation analysis
    SpearmanE         Correlation coefficient from Spearman's correlation analysis
    PearsonP          P value from Pearson's correlation analysis
    PearsonFDR        FDR for the p value from Pearson's correlation analysis
    PearsonE          Correlation coefficient from Pearson's correlation analysis
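
Several of the result files listed above exclude phosphosites whose missing values exceed 80%. As a rough illustration of that filter, the sketch below reads a matrix in the layout described for the Phosphoproteomics file (rows are phosphosites, columns are samples) and keeps the rows whose fraction of NA values is at most 0.8. The tab delimiter, the header row, the literal string NA, the site names and the function name are all assumptions made for illustration and are not confirmed details of the downloadable files.

```python
import csv
import io


def filter_by_missing(handle, max_missing=0.8, na_token="NA"):
    """Return {site: [float or None, ...]} for rows whose missing fraction <= max_missing.

    Assumes a tab-delimited table with one header row (sample names)
    and the phosphosite identifier in the first column.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row of sample names
    kept = {}
    for row in reader:
        if len(row) < 2:
            continue  # skip blank lines or rows without sample values
        site, raw = row[0], row[1:]
        values = [None if v in (na_token, "") else float(v) for v in raw]
        missing = sum(v is None for v in values) / len(values)
        if missing <= max_missing:
            kept[site] = values
    return kept


# Hypothetical toy matrix: 2 phosphosites x 5 samples.
toy = (
    "Site\tS1\tS2\tS3\tS4\tS5\n"
    "SITE_A\t0.5\tNA\tNA\t1.2\t-0.3\n"
    "SITE_B\tNA\tNA\tNA\tNA\tNA\n"
)
kept = filter_by_missing(io.StringIO(toy))
print(sorted(kept))
```

For a real download, the same function could be called on an opened file handle instead of the in-memory string; a row with exactly 80% missing values is kept here because the text excludes only values exceeding 80%.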
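
The FDR columns in the tables above (for example SpearmanFDR, wilcoxFDR and kruskalFDR) are described only as the FDR for the corresponding p value, and the adjustment method is not stated here. Assuming the widely used Benjamini-Hochberg procedure, a minimal sketch of how such values could be derived from a list of p values is shown below. The function name and the example p values are illustrative only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Assumption: the database does not state its FDR method on this page;
    Benjamini-Hochberg is used here purely as a common default.
    """
    m = len(pvalues)
    # Indices of the p values sorted from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, idx in enumerate(order):
        rank = m - offset  # 1-based rank of this p value in ascending order
        # Enforce monotonicity by carrying the minimum down from larger p values.
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]  # hypothetical p values for four phosphosites
fdr = benjamini_hochberg(pvals)
print([round(q, 4) for q in fdr])
```

Walking from the largest p value to the smallest keeps the adjusted values monotone, so a phosphosite with a smaller p value never receives a larger adjusted value than one with a larger p value.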
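
The MostKinase column of the Upstream kinase file is described as the top 10 kinases with the highest absolute correlation values among those with a correlation p-value below 0.05. One way to express that selection rule is sketched below. The record layout (kinase name, correlation coefficient, p value), the kinase names and the numbers are hypothetical, and the sketch does not claim to reproduce the database's own code.

```python
def top_kinases(records, n=10, alpha=0.05):
    """Pick up to n kinases with the largest |correlation| among those with p < alpha.

    records: iterable of (kinase, correlation, p_value) tuples (hypothetical layout).
    """
    significant = [r for r in records if r[2] < alpha]
    # Sort by absolute correlation, strongest first.
    significant.sort(key=lambda r: abs(r[1]), reverse=True)
    return [kinase for kinase, _, _ in significant[:n]]


# Hypothetical correlations for one phosphosite.
example = [
    ("KINASE_A", 0.62, 0.001),
    ("KINASE_B", -0.71, 0.0004),
    ("KINASE_C", 0.80, 0.09),  # strongest correlation, but p >= 0.05
    ("KINASE_D", 0.15, 0.04),
]
print(top_kinases(example, n=2))
```

Calling the same function with n=1 mirrors the MostHallmark description, which keeps only the single hallmark with the highest absolute correlation.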