Introduction
Hepatocellular carcinoma (HCC) is the second major cause of cancer-related death in the world.1,2 The prognosis of HCC is still not ideal, although related research has made great progress in recent years.1,3 This is mainly related to the high heterogeneity of HCC and the high diagnostic rate of advanced HCC,4–7 even with identical pathological types and clinical stages, patients’ individual responses to the same treatment can be diversified.8,9 It is worth noting that patients with HCC usually have a background of liver cirrhosis. Nevertheless, in clinical practice, which monitoring strategy is most effective for early tumor detection is uncertain.10 There is a pressing need to identify reliable biomarkers for the diagnosis and prognosis of HCC, to improve the survival of HCC.
In recent years, growing evidence has shown that the metabolic pattern of the cell cancerization process has changed significantly, which involves many aspects, such as glycolysis, the citric acid cycle, and oxidative phosphorylation of amino acids metabolism, fatty acid metabolism and nucleic acid metabolism, etc. This phenomenon is known as the reprogramming of energy metabolism of tumor cells, which is crucial for tumor growth.11,12 Some scholars have found that metabolic abnormalities are an important factor in the pathogenesis of HCC.6,7,13 Lee et al.13 noted that the gene expression levels involved in glycolysis and oxidative metabolism in HCC livers were much higher than those in normal livers, which was indeed relevant to an increased risk of liver cancer and may represent a potential target for the prevention of HCC. Gao et al.6 conducted a multidimensional proteomics study of 159 hepatitis B virus-positive liver and para cancer samples from patients in China, and found that most of the liver-specific metabolic pathway proteins (such as sugar dysplasia, detoxification, ammonia and urea metabolism) in liver tumors were significantly reduced; however, the key enzymes of cholesterol metabolism (SOAT1, SOAT2, etc.) and glutamine metabolism-related proteins (GLS and GLUD2) expressed in tumors were increased significantly, suggesting that hepato-specific metabolic pathways are reprogrammed in hepatitis B virus-associated HCC. Some scholars have also proposed that metabolic changes in the tumor microenvironment (TME) can inhibit antitumor immunity (such as immune cell infiltration) by producing immunosuppressive metabolites.14,15 However, there is still a lack of research on genes related to metabolism in predicting the prognosis of patients with liver cancer. Investigations into the metabolic genes of HCC are expected to open up new avenues for the treatment of HCC.
This research study established a risk score (RS) based upon the expression levels of five metabolism-related genes and analyzed the diverse clinicopathological features correlated with the new RS. The correlations of the RS with tumor immune cell infiltration were also evaluated.
Methods
Data collection and extraction of metabolic genes
The clinical data and mRNA expression profiles of patients with HCC were taken from The Cancer Genome Atlas (TCGA; https://portal.gdc.cancer.gov/ ), including 374 HCC samples and 50 normal samples; the Genotype-Tissue Expression project (GTEx; www.gtexportal.org ), including 110 normal samples; the International Cancer Genome Consortium (ICGC; https://icgc.org/ ), including 215 patients with HCC; and the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/ ) (GSE14520), including 235 patients with HCC. A total of 2,752 metabolic-related genes that encoded all the known human transporters and metabolic enzymes were obtained from a previously published paper,16 for subsequent analysis. The gene expression profiles obtained from different databases were normalized with the “combat” package in R software. The collected data were used in accordance with the data access strategies of TCGA, ICGC and GEO. All research processes and analyses were conducted in compliance with relevant regulations and guidelines. HCC clinical survival data and mRNA profile data were publicly available, and approval of the local ethics committee was not required.17
Identification of differentially expressed metabolic-related genes (DEMRGs)
Using the Wilcoxon method in the R package “limma” to detect differential genes related to metabolism in HCC and normal tissues, the results of log2 fold change >1 and false discovery rate <0.05 were regarded as significantly different. The “limma” and “heatmap” packages in the R software were used to form volcano and heat maps of DEMRGs.
Annotations of DEMRGs’ functions and pathways
This research applied the R package “cluster profile” for DEMRGs annotation (gene ontology [GO] and Kyoto Encyclopedia of Genes and Genomes [KEGG] pathway)18 to evaluate the underlying biological function of DEMRGs.
Identification of prognostic-related genes and construction of the prognostic model in the TCGA cohort
We used 343 patients (survival ≥1 month) from the TCGA dataset as the training cohort, to develop the prognostic model. During the building process of the prognostic model, we combined univariate Cox regression analysis, Lasso regression analysis, and multivariate Cox regression analyses. First, univariate Cox regression analysis was applied to screen DEMRGs associated with prognosis (p-value <0.001 considered significant).19,20 Next, the least absolute shrinkage and selection operator algorithm was utilized to avoid overfitting of the prognosis-related genes. During the process of this analysis, we subsampled the dataset 1,000 times and chose the genes that were repeated >900 times. A subselection of prognosis-related genes was determined by penalty parameter tuning performed via 10-fold cross-validation. Only genes with non-zero regression coefficients were retained for subsequent multivariate Cox regression analyses19–22 (Supplementary Fig. 1). The formula of the RS was as follows: RS = the sum of each multivariate Cox regression coefficient of mRNA multiplied by each normalized mRNA expression level. According to the median RS, patients were divided into two groups: the low-risk (LR) group and the high-risk (HR) group. Utilizing the Kaplan-Meier approach in the R-package “suvminer” produces survival curves, and the log-rank test was used to contrast discrepancies between the two groups. Using the time-dependent receiver operating characteristic (ROC) curve analysis feature in the R package, “survival ROC” aimed to evaluate the prognostic ability of the RS.
Independence validation of the prognostic model
Univariate and multivariate Cox regression analyses were applied to detect whether the RS was an independent prognostic predictor. A value of p<0.05 was statistically significant.
Internal validation of the prognostic model in the TCGA cohort
We divided the patients into several subgroups for internal validation according to their pathological features (including α-fetoprotein [AFP] level, vascular invasion, histological grade, AJCC-TNM stage, new tumor after initial treatment, and individual tumor status). The analysis of survival adopted the Kaplan-Meier method, and when the log-rank test detected a value of p<0.05, it was considered statistically significant.
External validation of the prognostic model using multiple independent cohorts
We calculated the RS of patients in the validation cohort (ICGC, GSE14520) using the same formula established by the TCGA cohort. The patients were separated into a LR group and a HR group based on the same cutoff value. Kaplan-Meier survival analysis, ROC curve analysis and univariate and multivariate Cox regression analysis were conducted as described above.
Correlation analysis between the RS and clinicopathology
We used the chi-square test to analyze the correlation between the RS and clinicopathology (including gender, age, AJCC-TNM stage, Barcelona Clinic Liver Cancer [BCLC] stage, Cancer of the Liver Italian Program [CLIP] stage, main tumor size, histologic grade, AFP, and vascular tumor cell type). A value of p<0.05 was statistically significant.
Correlation analysis between RS and tumor immune cell infiltration
The CIBERSORT method (using the characteristic matrix of 547 genes to express 22 types of infiltrating immune cells) was used to measure the infiltration ratio of immune cells as a number in tumor tissues, and the samples with p<0.05 were selected for subsequent analysis.
Gene set enrichment analysis (GSEA)
To further explore the internal mechanism of the prognostic model, we conducted GSEA on the LR and HR groups of the three independent cohorts to reveal the molecular biological characteristics of the LR and HR groups.
Results
Functional enrichment analysis and survival analysis of DEMRGs
A total of 134 metabolic-related genes were differentially expressed in HCC tissues (n=374) compared with normal tissues (n=160) (Fig. 1A–B). GO enrichment analysis showed that the main functions of these differentially metabolized genes included small molecule catabolism, organic acid biosynthesis, sulfur compound metabolism, carboxylic acid biosynthesis, organic acid catabolism, fatty acid metabolism, carboxylic acid catabolism and other processes (Fig. 1C). The KEGG enrichment analysis identified these genes as being prevailingly related to chemical carcinogenesis, arachidonic acid metabolism, drug metabolism, glutathione metabolism, retinol metabolism, and carbon metabolism (Fig. 1D).
Construction of the five-metabolic gene prognostic model
To facilitate the clinical application of our prognostic model, five metabolic-related genes were identified by Lasso-penalized Cox analysis to establish a predictive model. RS=(−0.02928*BDH1 normalized expression level)+(0.04763*RRM2 normalized expression level)+(−0.0018*CYP2C9 normalized expression level)+(0.0111*PLA2G7 normalized expression level)+(0.0111*TXNRD1 normalized expression level). The median RS (0.967) of the TCGA cohort is a critical value that divides all patients with HCC into HR and LR groups. This research applied disease-specific survival (DSS), overall survival (OS), progression-free survival (PFS) and disease-free survival (DFS) to compare the prognosis of patients with different risks. The Kaplan-Meier curve showed that compared with the HR group, the PFS, DFS, DSS, and OS of the LR group were remarkably higher (p<0.001) (Fig. 2A–D).
When assessing the performance of the prognostic model by measuring the area over time under the ROC curve, the higher the area under the curve, the better was the model performance. The areas under the curve for the 1-year, 3-year, and 5-year PFS were 0.689, 0.621, and 0.683, respectively (Fig. 2D); the areas under the curve for the 1-year, 3-year, and 5-year DFS were 0.678, 0.615, and 0.691, respectively (Fig. 2C); the areas under the curve for the 1-year, 3-year, and 5-year DSS were 0.815, 0.738, and 0.674, respectively (Fig. 2B); and, the areas under the curve for the 1-year, 3-year, and 5-year OS were 0.8, 0.692, and 0.673, respectively (Fig. 2A). The RS was an independent prognostic indicator linked to PFS, DFS, DSS and OS, as presented by univariate and multivariate Cox regression analyses (Fig. 3A–D).
Internal validation of the prognostic model in the TCGA cohort
We divided the patients into several subgroups for internal validation according to their pathological features, consistent with previous results. Compared to the LR group, the HR group patients’ OS rates were notably lower (Fig. 4A–F).
External validation of the prognostic model in the ICGC and GSE14520 cohorts
Two independent datasets (ICGC, n=215; GSE14520, n=235) were used to test the prognostic value of the RS. The calculation formula of RS and the threshold value for dividing the HR and LR groups were consistent with that of the TCGA cohort. The HR patients’ OS was notably lower than that of LR patients in the two independent cohorts (Fig. 5A, D). The area under the ROC curve of the 1-year, 3-year and 5-year overall survival rates of the ICGC cohort and GSE14520 cohort were 0.750, 0.734, 0.829, 0.675, 0.671, and 0.673, respectively (Fig. 5A, D). The RS could be regarded as an independent prognostic indicator by univariate and multivariate Cox regression analyses (Fig. 5C, F). Due to the lack of relevant information about the Child/model for end-stage liver disease score, we cannot directly compare the prognostic value of the Child/model for end-stage liver disease score with the RS. However, by comparing the area under the curve values of the ROC curve between the RS and the traditional TNM staging system, we found that the RS had better performance in predicting prognosis (Supplementary Fig. 2).
Correlation of the prognostic model with clinicopathological characteristics
We performed chi-square tests on three independent cohorts (TCGA, ICGC, and GSE14520) and revealed that stage, grade, vascular tumor cell type, individual neoplasm status, main tumor size, and new tumor event after initial treatment concerned the RS of patients with HCC (Supplementary Tables 1–3).
Correlation analysis between the RS and tumor immune cell infiltration
The CIBERSORT algorithm was used to further analyze the infiltration degree of immune cell subtypes (samples were screened by p<0.05). The results showed that, compared with the LR group, the HR group had a markedly higher infiltration degree of M2 macrophages and a lower infiltration degree of M1 macrophages (Fig. 6A–C).
GSEA
As shown in Supplementary Fig. 3, the activity of metabolism-related pathways in the LR group was significantly stronger than that in the HR group, suggesting that we may be able to find new therapeutic strategies to improve the prognosis of HCC by targeting metabolic reprogramming of HCC.
Discussion
Because HCC usually occurs in the context of cirrhosis, it has high morbidity, mortality, recurrence and heterogeneity,1–3 and poses a great threat to human health. With the worldwide application of next-generation gene sequencing technology, people have gradually realized that the prognosis of patients with HCC is not only dependent on the traditional clinical staging system but is also related to molecular genetic factors.5,7,23–25 Growing evidence has shown that metabolic reprogramming exerts a huge function in the emergence and growth of HCC.11,13 The prognostic value of some metabolic genes has been validated,6,26 but they are still numerically inadequate. Thus, there remains a pressing need to identify more biomarkers related to the prognosis of HCC.20
Compared with previous studies,18–20,26,27 this study highlighted the following aspects. First, in this study, mRNA data from TCGA, ICGC, GEO, and GTEx were integrated to study the prognostic value of metabolic-related genes in HCC. Second, we used three independent cohorts (TCGA, ICGC, and GSE14520) to construct and validate the prognostic model, making the conclusion more reliable. Third, we investigated the relationship between the prognostic model and tumor immune cell infiltration.
Patients were classified into a LR group and a HR group, based on a uniform cutoff (0.967). In both the training cohort (TCGA) and the validation cohort (ICGC and GSE14520), the OS of the HR group was significantly lower than that of the LR group. The area under the curve of the ROC curve showed that the RS had higher specificity and sensitivity for the prediction of prognosis. Clinical correlation analysis showed that patients with a HR score were found to be significantly correlated with higher tumor grade, larger main tumor size (>5 cm), older (age >65 years), vessel invasion, AFP >300 ng/mL, advanced BCLC stage (B–C), advanced CLIP stage (≥2) and advanced TNM stage (III–IV), and these results suggest the high-RS patients had a higher degree of malignancy. For patients with the same clinical features, the prognosis of the HR group patients was markedly worse than for those of the LR group, which highlights the importance of our RS establishment because it can better reflect the heterogeneity of patients compared with the traditional clinical stage.
It has been reported that abnormal metabolism of purines (such as abnormal elevation of uric acid) is related to the appearance and growth of different malignant tumors, such as colorectal cancer metastasis, non-small cell lung cancer brain metastasis, and the prognosis of pancreatic cancer, etc.,28–33 but there have been few reports on HCC. Immune cells are the main non-tumor components in the TME, and earlier studies have suggested that the infiltration of immune cells (neutrophils, dendritic cells, macrophages, etc.) in tumors has a close relationship with unadvanced HCC prognosis.34–39 Macrophages are the most numerous in tumor tissues and have the most significant regulatory effect on tumors. As such, they are called tumor-associated macrophages. Studies have found that M1-type macrophages can recognize tumor antigens, and phagocytose or kill tumor cells. Type II interferon (interferon-γ) is a classic inducer of M1 macrophage polarization and tumor cell killing.36,38 M2-type macrophages inhibit the activation and proliferation of T cells and natural killer cells by producing interleukin-10, transforming growth factor-β and prostaglandin E2 (prostaglandin E-2, PGE-2), and induce immune tolerance of tumor cells, thus promoting the proliferation, invasion and metastasis of tumor cells.36 In this study, it was found that the HR group had a higher infiltration degree of M2 macrophages and a lower degree of M1 macrophages than the LR group. These results suggested that this model could be used as an effective predictor of immune cell infiltration.
At present, obesity and metabolic diseases have become important factors that induce liver cancer.40 The beige fat cells in the body have come under scrutiny for their ability to burn energy to prevent obesity.41,42 Wang et al.43 found that catabolic metabolism of n-hydroxybutyrate mediated by 3-hydroxybutyrate dehydrogenase (BDH1) is an important step in the formation of beige fat cells in the body. Martinez-Outschoorn et al.44,45 found that BDH1 is preferentially expressed in breast tumor mesenchymal cells and that overexpression of BDH1 can promote the growth of cancer cells by generating fibroblasts to drive increased mitochondrial synthesis. Saraon et al.46 also found significant up-regulation of BDH1 expression in prostate cancer tissues. The current study discovered that, compared with normal tissues, BDH1 expression in tumor tissues was markedly down-regulated (Supplementary Fig. 4). Moreover, the high expression of BDH1 in tumor tissues was correlated with better OS and earlier TNM stage and a maximum tumor diameter of ≤5 cm and AFP of ≤300 ng/mL (Supplementary Figs. 5–8). Whether the mechanism is related to the previously reported promotion of beige fat cell formation in vivo deserves further study.
The ribonucleotide reductase m2 RRM2 has been confirmed repeatedly to have a relationship with HCC prognosis in recent years.47–49 Kosakowska et al.50 found a reduction in RRM2 remarkably suppressed HCC cell proliferation, and that RRM2 catalyzes the conversion of ribonucleoside 5′-diphosphate into a corresponding 2′-deoxyribonucleotide, and since this reaction is a rate-limiting step in DNA synthesis, RRM2 has been identified as a new target for cancer therapy.51,52 Additionally, this study discovered that poor OS was responsible for high RRM2 expression and predicted low tumor differentiation (Supplementary Figs. 5–8).
The cytochrome P450 system of the liver plays an important role in drug metabolism. The CYP2 family is the largest family of the CYP450 enzyme family, among which CYP2C9 is one of the most important subtypes.53 Nebert et al.54 once reported that the expression of CYP can affect the production of arachidonic acid-derived molecules and change various downstream signal transduction pathways, thus causing cell cancerization. Yan et al.’s55 study found that the expression level of CYP was significantly destroyed during the cancer process, while the activity of CYP was highly correlated with the expression level of the protein. The current study found that the expression level of CYP2C9 in tumor tissue was remarkably lower than that in normal tissue (Supplementary Fig. 4), and patients with high CYP2C9 expression in tumor tissues had a better prognosis (Supplementary Figs. 5–8). In addition, the reduced CYP2C9 expression level had a significant relationship with higher TNM stage and higher BCLC stage, a maximum tumor diameter of >5 cm, AFP of >300 ng/mL and vessel invasion (Supplementary Figs. 5–8). The results were similar to those of CYP4A11 in the study by Eun et al.56 These findings indicate that the high CYP2C9 expression is likely to be a favorable signal for the prognosis of HCC (Supplementary Figs. 5–8). Clinically, we can consider using CYP2C9 as a therapeutic target to further improve the prognosis of HCC.
The platelet-activating factor acetylhydrolase PLA2G7 is an effective proinflammatory and anti-inflammatory molecule involved in a variety of inflammatory processes.57 Nair et al.58 found that PLA2G7 expression was significantly up-regulated in fat cell precursors in obese individuals. Hou et al.59 and Hoffmann et al.60 identified PLA2G7 as a risk factor for cardiovascular disease. The current study found that compared with the level in normal tissues, the PLA2G7 expression level in tumors was remarkably higher (Supplementary Fig. 4), and the high expression level of PLA2G7 was significantly correlated with AFP >300 ng/mL (Supplementary Figs. 5–8), so PLA2G7 may be a new diagnostic marker for HCC.
Increasing evidence shows that oxidative stress caused by the destruction of the reduction-oxidation system is closely related to the occurrence of liver cancer.61–63 The thioredoxin reductase 1 TXNRD1, as a member of the thioredoxin system, is essential for maintaining the balance of the redox state in cells.62,64 This study found that TXNRD1 was significantly correlated with poor OS and higher TNM staging (Supplementary Figs. 5–8), which was similar to the report of Fu et al.65 and Lee et al.66 Therefore, TXNRD1 may be a biomarker with important prognostic value for HCC.
Targeted sequencing based on five metabolic genes can undoubtedly significantly reduce the cost of sequencing, but there are some limitations because our research results are mainly based on the description of the phenomenon, and we need to explore its mechanism through experiments. HCC is a complex disease caused by multiple mechanisms, not just metabolic disorders. Although we made full use of data resources, the lack of some clinical data will cause inevitable limitations, for example, the adjuvant treatment methods patients receive, such as chemotherapy, targeted therapy, and immunotherapy, comorbidities of patients and whether patients have underlying cirrhosis, because these factors have a significant impact on the clinical outcome. This study was retrospective and needs to be improved upon and verified in future multicenter prospective studies.