 Abstract
  Abstract 
Background: Accumulated studies have shown that low expression of 25-hydroxyvitamin D [25(OH)D] was significantly associated with the risk of non-alcoholic fatty liver disease (NAFLD). However, the exact causality is still unknown. The aim of this study was to investigate whether levels of 25(OH)D are associated with risk of NAFLD, using a two-sample Mendelian randomization (MR).
Methods: Data from a recent large vitamin D genome-wide association study (GWAS) on 417,580 Europeans were utilized, and the largest published histology-based NAFLD GWAS study (1,483 cases and 17,781 healthy controls) for genetic variants predicted to cause NAFLD were searched. All genetic datasets for the MR analyses were obtained using publicly available summary statistics based on individuals of European ancestry from the MR-Base and NHGRI-EBI GWAS Catalog database. Inverse-variance weighted (IVW) MR approach was used to estimate causal effects in the main analysis, complemented by 4 additional methods to control for pleiotropy. Sensitivity analyses were conducted to verify whether heterogeneity and pleiotropy can bias the MR results.
Results: The MR analysis did not provide strong evidence for the causal association of circulating 25(OH)D with NAFLD by IVW method (OR = 0.746, 95%CI 0.517–1.078; P = 0.119). The results were consistent using four other MR methods. Sensitivity analysis using all different analytical approaches yielded similar results. There was no evidence for pleiotropy (MR-Egger intercept: −0.0003758, P = 0.970). The replication process also showed consistent results using IVW method (P = 0.710).
Conclusion: This study indicates that serum 25(OH)D levels did not possess an obvious effect on the risk of NAFLD. The associations in previous studies may be due to residual confounding or reverse causation.
Keywords: 25-hydroxyvitamin D, GWAS, Mendelian randomization, NAFLD, SNPs
How to cite this article:Authors Qi Sheng and Huanchen Shi contributed equally to this work.
 Introduction
  Introduction 
Non-alcoholic fatty liver disease (NAFLD) represents the most common cause of chronic liver disease and affects almost a quarter of the world's population.[1],[2] It has become an epidemic, which causes a tremendous health and economic burden. Moreover, NAFLD can evolve towards steatohepatitis (NASH), eventually cirrhosis and hepatocellular carcinoma.[3] There is still no Food and Drug Administration (FDA) approved medications available for the treatment of NAFLD. Therefore, it is particularly urgent to identify-risk factors of this disease.
Vitamin D is a sterol derivative that affects a few common chronic diseases such as bone metabolic disorders, tumors, cardiovascular diseases, diabetes and so on.[4] Serum 25-hydroxyvitamin D [25(OH)D] is a primary circulating form of vitamin D, which can best reflect vitamin D stores.[5],[6] In the last few years, several studies explored the association between vitamin D and NAFLD but arrived at controversial conclusions. An early meta-analysis linked the risk for NAFLD to lower circulating concentrations of 25(OH)D.[7] Two recent meta-analyses concluded that 25(OH)D level was not related to either NAFLD activity score or fibrosis.[8],[9] Whether vitamin D deficiency is a contributing factor to NAFLD remains unclear. However, observational studies and RCTs are limited to confounding factors, such as obesity, which affects both serum 25(OH)D concentrations and incidence of NAFLD.[10],[11],[12] Further, it is difficult to judge the sequence of exposure and outcome which may lead to the reverse causality.
Mendelian randomization (MR) has been increasingly used to assess causal inference between exposure and outcome. The principle is based on the surmise that alleles are randomly allocated at meiosis and can be used as instrumental variables (IVs) to test causation.[13] This method can strengthen the exposure-outcome association inference by robustly reducing the risk of confounding and eliminating reverse causality.[14] Two-sample MR is one particularly valuable MR method using two independent GWAS datasets. Through dividing the dataset into two (or more) samples for estimation, it can mitigate the phenomenon of the winner's curse leading to bias in MR.[15],[16]
Mendelian randomization has been used elsewhere to investigate the effect of circulating 25(OH)D levels on NAFLD risk. However, the study was limited by the fact that all participants were of Asian origin and the diagnosis was based on ultrasound.[17] In this study, we examined whether 25(OH)D is causally related to NAFLD risk using a two-sample MR approach, with recent GWAS studies of the Europeans population. Moreover, the diagnosis of NAFLD was histology-based and more IVs were extracted from recent GWAS datasets. Several sensitivity analyses were performed to critically examine the underlying assumptions of the MR approach. We also performed MR with another two GWAS datasets for replication purposes to make our results more robust.
 Materials and Methods
  Materials and Methods 
Software selection and data source
This two-sample MR was conducted using the “Two Sample MR” R package[18] following the guidelines provided by the developers (https://mrcieu.github.io/TwoSampleMR), and in-house developed R scripts (available upon request). The model based on MR is shown in [Figure 1] and needs to satisfy three assumptions [Figure 2]. In order to avoid the potential bias caused by population stratification, only the subjects with genetic background of European origin were selected. GWAS data of 25(OH)D from the Integrative Epidemiology Unit (IEU) GWAS database based on the large UK Biobank (UKB) sample (N = 417,580) were obtained. The datasets of Revez et al.'s[19] study was obtained through observing data from UK Biobank. The database was developed by the MRC IEU at the University of Bristol. It is a collection of complete GWAS summary datasets, which can be downloaded as open-source files. The NAFLD dataset was obtained from the largest histology-based NAFLD GWAS which involved 1,483 European NAFLD cases and 17,781 genetically matched controls.[20] This dataset was obtained from NHGRI-EBI GWAS Catalog database. Another large GWAS study associated with 25(OH)D (including 79,366 European-ancestry individuals) and GWAS open summary level datasets associated with NAFLD from FinnGen study were included for replication purposes.[6] All datasets were publicly available summarized results from published genome-wide association studies.
 Figure 1: Forest plot of the MR sensitivity analyses excluding SNPs with possible pleiotropic effects
Figure 1: Forest plot of the MR sensitivity analyses excluding SNPs with possible pleiotropic effects Figure 2: Graph illustrating Mendelian randomization assumptions. (1) SNPs are strongly correlated with exposure. (2) SNPs should only affect the outcomes through affecting 25(OH)D. (3) SNPs should not be associated with potential confounder
Figure 2: Graph illustrating Mendelian randomization assumptions. (1) SNPs are strongly correlated with exposure. (2) SNPs should only affect the outcomes through affecting 25(OH)D. (3) SNPs should not be associated with potential confounderSelection of the genetic instruments
Instrumental variables associated with 25(OH)D concentration were extracted from IEU GWAS database as previously mentioned, including 417,580 individuals of European ancestry.[19] One hundred and forty three loci associated with 25(OH)D concentration were identified. Instrument SNPs associated with 25(OH)D levels were chosen at a level of genome-wide significance (P < 5 × 10−8), so as to satisfy the first MR assumption that the instrument SNPs robustly associates with the exposure. Independent SNPs were selected with the clump function in the “Two Sample MR” R package, which used the 1,000 genomes project as a reference panel. Clumping process (R2 < 0.001, window size = 10,000 kb) was conducted with the samples to ensure the instruments were truly in linkage equilibrium.[21] PhenoScanner database was explored to exclude SNPs with traits inducing potential horizontal pleiotropic associations. Genetic variants of palindromic and incompatible alleles should be removed when harmonizing exposure and outcome. 25(OH)D associated variants which possess the potential effects on NAFLD were extracted and the Proxy SNPs were excluded. F-statistics was calculated to examine the strength of the instruments using an equation used in previous MR studies.[22] If the F-statistic is greater than 10, the weak instrument bias appears unlikely.[23],[24] The variance (R2) of 25(OH)D, which is explained by the instruments, was calculated using the approximation described by Bowden et al.,[24] and the calculation method was showed in the Supplementary Material Text S1.
Mendelian randomization analysis
A two-sample MR was performed for testing the causal relationship between 25(OH)D and NAFLD. Analyses were performed using Two Sample MR R package of MR-Base (https://github.com/MRCIEU/TwoSampleMR). Causal effect was estimated in the primary analyses with the inverse-variance weighted (IVW) method, which uses a meta-analysis approach to combine the Wald ratios of the causal effects of each SNP and can provide the most precise estimates.[18],[25] When the directional pleiotropy (P for MR-Egger intercept >0.05) was not found among the selected IVs, the IVW method was regarded as reliable. MR-Egger, weighted median, weighted mode methods and simple mode analysis were also performed to estimate the causal effect of exposure on outcome to make our results more robust. Furthermore, the PhenoScanner tool was used to search for other phenotypes (including BMI, alcohol, triglyceride, cholesterol, liver enzyme, waist circumference, LDL) which are related to the selected instrumental variable SNPs at risk of affecting NAFLD phenotypes. Then these SNPs were manually removed from the MR analysis to rule out possible pleiotropic effects.
Heterogeneity tests
Heterogeneity indicates potential violation of the IV assumptions and displayed the results using forest plots.[26] Cochran's Q statistics was used to estimate the heterogeneity of IVW. Cochran's Q test follows χ2 distribution, and P < 0.01 is defined as significant heterogeneity.[27]
Sensitivity analyses
To evaluate the sensitivity of IVs in MR analysis, a leave-one-out analysis was performed by discarding each SNP in turn and repeating the IVW analysis to examine whether a single SNP could drive the association.[18] In the main MR analysis, sensitivity analysis was estimated by excluding variants associated with potential confounders. Therefore, the PhenoScanner database was used to search GWAS traits of each 25(OH)D-related SNP as instrument, in order to identify the potential confounders. When the GWAS P value of the variant for a trait was below 5 × 10−8, the associations were considered as positive. Certain traits were classified into 10 larger trait categories and an unclassified group, according to the number of traits-associated SNPs [Table 1]. Although no confounders were found which may cause the horizontal pleiotropy between the vitamin D and NAFLD, we evaluated parameters that may be associated with NAFLD in sensitivity analysis. Through searching by keywords in PhenoScanner database, we identified all the SNPs of possible traits that may associate with NAFLD. The traits included lifestyle (alcohol consumption), obesity-related (waist circumference, body mass index, etc.), lipid-related (triglyceride, cholesterol, etc.), and so on. Then traits with SNPs intersecting with 25(OH)D were obtained to exclude the interference [Table 1]. Certain parameters were classified into 10 larger trait categories and an unclassified group. The top three trait categories which were ranged according to the relevancy of phenotype and SNPs were selected for sensitivity analysis. Blood cell counts, body composition and anthropometrics were included [Table 1]. In addition, a sensitivity analysis was performed by excluding all SNPs that were associated with any trait with P value below 5 × 10−8 in the PhenoScanner database. The aim of performing special sensitivity analyses by excluding traits clustered SNPs is to try to explore how these SNPs impact on 25(OH)D levels and distinguish vertical pleiotropy to adjust causal estimation.[28]
 Table 1: Results of the MR study testing causal association between low 25OHD and NAFLD
Table 1: Results of the MR study testing causal association between low 25OHD and NAFLD   Results
  Results 
Selection of genetic instrumental variables
Following the method described above, 113 significant SNPs (P < 5 × 10−8) were selected accounting for approximately 3.5% of vitamin D variation, while 10 SNPs were absent in the NAFLD GWAS datasets and 26 SNPs with NAFLD-associated traits were excluded based on PhenoScanner database [Supplementary Materials Table S1[Additional file 1]]. We also conducted another harmonizing process and excluded 10 SNPs as a result of palindromic sequences. Finally, 63 SNPs were included after removing linkage disequilibrium and palindrome sequence. Detailed information of the 63 IVs is available in the supplementary material. In replication practice, the data showed six SNPs associated with genome-wide significance (P < 5 × 10–8) with 25(OH)D. One SNP (rs12785878) did not exist in NAFLD GWAS datasets. A palindromic SNP (rs8018720) was not included. Finally, four intersected SNPs (rs10741657, rs10745742, rs17216707, rs3755967) were included in the analysis [Supplementary Material Table S2[Additional file 2]].
The influence of 25(OH)D on the risk of non-alcoholic fatty liver disease
The results of the MR analyses are shown in [Table 1] and [Figure 1]. No significant association of low vitamin D levels with increased risk of NAFLD was identified (OR = 0.746, 95%CI 0.517–1.078; P = 0.119). A potential heterogeneity was identified using the Cochran's Q test (Q = 84.9; P = 0.0283). IVW with multiplicative random effects method was utilized in case of potential heterogeneity. The total F-statistic was 198.82 and each single F statistic was larger than 10, indicating strong instrumental variables. There was no evidence for pleiotropy (MR-Egger intercept: −0.0003758, se = 0.010079; P = 0.97). Similar results were obtained using the other 4 MR methods [Figure 1]. A scatter plot was also generated [Supplementary Material Figure S1[Additional file 4]].
In replication practice, the result also showed a negative causal link between vitamin D and NAFLD using the IVW method (OR = 0.609, 95%CI: 0.045–8.307; P = 0.710), simple mode method (OR = 0.905, 95%CI: 0.0393–20.844; P = 0.954), weighted median method (OR = 1.148, 95%CI: 0.121–10.925; P = 0.954) and MR-Egger method (OR = 7.498, 95%CI: 0.201–280.004; P = 0.389). No significantly causal relationship between vitamin D and NAFLD was found under the different MR methods [Supplementary Material Figures S2[Additional file 5]] and [Supplementary Material Figures S3[Additional file 6]].
Sensitivity analysis
Although the intercept obtained from the MR-Egger regression did not show significant unbalanced horizontal pleiotropy (MR-Egger intercept: −0.0003758, se = 0.010079; P = 0.97), evidence for pleiotropy based on MR-PRESSO analysis (P value global test 0.001) was found. The result of MR estimates did not alter the inference after removing outlier SNPs [Supplementary Materials Table S3[Additional file 3]]. Leave-one-out sensitivity analysis by discarding each SNP one by one was conducted and the results remained unchanged (P > 0.05) [Supplementary Materials Figure S4[Additional file 7]]. Weighted Median, Simple Mode, Weighted Mode and MR-Egger method confirmed the lack of association [Supplementary Material Figure S5[Additional file 8]]. All the results indicated that none of the genetic variants were sensitive.
Additionally, sensitivity analyses were conducted after removing several traits-clustered SNPs, as the above methods described, referring to previous studies.[28] SNPs associated with blood cell counts, body composition and anthropometrics were included in this sensitivity analysis. Moreover, SNPs with any distinctive traits based on PhenoScanner databases were removed (P < 5 × 10−8). Sensitivity analyses demonstrated a consistent association, suggesting that our conclusions were robust [Figure 1].
In replication practice, leave-one-out sensitivity analysis by discarding each SNP in turn was conducted and the results were also consistent, [Supplementary Material Figure S6[Additional file 9]].
 Discussion
  Discussion 
In this study, the Two-sample MR was used to evaluate the association between 25(OH)D and NAFLD. The results showed that there was no genetic evidence for a causal effect of 25(OH)D levels on the risk of NAFLD, and this conclusion was consistent after the sensitivity analyses and replication. To our knowledge, this is the first two-sample MR study to examine the effect of 25(OH)D levels on histology-based NAFLD risk on a European population.
There has been a significant scientific interest in the relationship between vitamin D status and NAFLD. However, whether the serum level of vitamin D is a risk factor for NAFLD is controversial, where previous findings have been contradictory.[29],[30] Many previous studies have investigated the association between vitamin D deficiency and NAFLD/NASH. Observational studies found that vitamin D levels decreased in NAFLD.[31],[32],[33] Whether vitamin D deficiency is a contributing factor to NAFLD remains unclear. Some studies showed that low vitamin D levels were associated with the presence of NAFLD, suggesting that vitamin D may be a target for the treatment of NAFLD and NASH.[34],[35],[36],37],[38],[39] In a meta-analysis, patients in NAFLD were 26% more likely to have lower 25(OH)D level, reflecting deficiency in vitamin D, when compared with controls.[7] However, the results of other studies were inconsistent.[40],[41],[42],[43],[44]
Differences between conclusions may be attributed to a number of reasons. Initially, the failure in controlling the confounders included different sample sizes, races, BMI, lack of a control group without NAFLD on liver biopsy and so on.[30] Some confounders themselves such as BMI can apparently influence the 25(OH)D serum levels, making it difficult to distinguish whether the confounders or the 25(OH)D serum levels caused the NAFLD. On the other hand, observational studies can be hampered by confounding or reverse causation. NAFLD contributes to lower concentrations of serum 225(OH)D. The limitation of current works prevent us from establishing any temporality.[45]
MR analyses reduce confounding and reverse causality due to the random allocation of genotypes from parents to offspring.[46] To the best of our knowledge, the present MR study is the first two-sample MR to evaluate the causal role of 25(OH)D for NAFLD in the European population. We have used the largest GWAS of serum 25(OH)D traits and diagnosed NAFLD disease to investigate causal relationships. Horizontal pleiotropy means that genetic variation has many independent ways to affect the results and reject the IV assumptions of the MR model.[47] In this study, MR-Egger test showed the low intercept and high P value (−0.00038; P = 0.970). It proved that the results were un-affected by horizontal multidirectionality.
There has been a MR analysis limited to less genetic variation tools sample incorporating four vitamin D-related SNPs (rs12785878, rs10741657, rs2282679, rs6013897), which were chosen on the basis of a genome-wide association study on 25(OH)D. All participants were of Asian origin and the diagnosis of NAFLD was made by ultrasound.[17] These factors may bias the results. This Mendelian randomization study used the largest histology-based NAFLD GWAS datasets as outcome data. GWAS data of 25(OH)D from the IEU GWAS database based on the large UKB sample (N = 417,580) were obtained. The research chose a conservative threshold of 25 nmol/L for vitamin D deficiency and applied a rank-based inverse-normal transformation (RINT) to the phenotype (25(OH)D levels) and fit age at time of assessment, sex, assessment month, assessment centre, supplement-intake information, genotyping batch and the first 40 ancestry PCs as covariates in the model. The above aspects made the datasets more reliable and authoritative. The results of our Mendelian analysis were consistant and reassuring by using five MR methods (IVW, MR-Egger, weighted median, weighted mode MR and simple mode analysis) and sensitivity analyses by excluding SNPs with pleiotropic effects. In addition, another two GWAS samples were used for the replication to validate the conclusion.
This study has several limitations. Primarily, the exact function of some SNPs remains unknown, making residual bias possible while examining pleiotropy. However, we obtained consistent results using 5 MR methods making sure of pleiotropy, robust methodology, and sensitivity analyses excluding SNPs with pleiotropic effects is reassuring.[28] Secondarily, the MR results cannot be generalized to other race origins. Although both exposure and outcome GWAS were restricted to European ancestry participants, residual confounding from population stratification cannot be completely excluded, due to ethnic differences in the exposure and outcome GWAS populations.
In summary, this study provides new evidence that serum 25(OH)D levels may not be causally related to the risk of NAFLD. Our findings imply that the associations based on some previous observational studies between 25(OH)D and risk of NAFLD might be due to reverse causality and confounders, such as environment correlated with exposure to sun and skin pigmentation.[28] This assumption cannot be eliminated assuming that low concentration of vitamin D may just play an intermediary role as a real risk factor for NAFLD. This conclusion needs to be confirmed by large RCTs or future MR evidence based on updated GWAS samples.
Financial support and sponsorship
This project was supported by the National Natural Science Foundation of China (Grant No. 32171277).
Conflicts of interest
There are no conflicts of interest.
 Supplementary Material
  Supplementary Material 
Text S1: Equation For F-statistic
The total F statistic of the IVs sample in the exposure was explained by the genetic variants (R2), sample size (N) and number of instruments (K), by the formula:

We characterized instrument strength in the univariable MR setting by generating mean F-statistic. For a single variant, we used the approximation described by Bowden et al[1]:

Where γj is the SNP-exposure association and σxj is the standard deviation for SNP-exposure association for variant j.
Bowden, J., Del Greco, M.F., Minelli, C., Davey Smith, G., Sheehan, N.A., and Thompson, J.R. (2016). Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-Egger regression: the role of the I2 statistic. Int J Epidemiol 45, 1961-1974.
 References
  References 















































            
			
Correspondence Address:
Dr. Yongning   Xin
Department of Infectious Disease, Qingdao Municipal Hospital, 1 Jiaozhou Road, Qingdao 266011, Shandong Province 
China
Source of Support: None, Conflict of Interest: None
 Check
CheckDOI: 10.4103/sjg.sjg_297_22

Comments (0)