SEPLINE: Socioeconomic Position in Epidemiological Research—A National Guideline on Danish Registry Data

Introduction

Over the past decades, addressing socioeconomic differences in health has gained growing scientific, clinical, and political attention, increasing the demand for studies investigating the impact of socioeconomic position on health outcomes. The complexity of the field has led to the application of varying methods for data collection, data management, categorization, and analyses of socioeconomic indicators, which limits the comparability of results. The choice of method is often determined by local traditions and data accessibility but is also guided by the specific research question. As information about these processes is often inadequately reported in scientific papers, the reproducibility of findings is compromised. New researchers, in particular, may spend considerable time identifying suitable variables and how to apply them in observational studies.

To improve the comparability, quality, and feasibility of socioeconomic epidemiological research, the SEPLINE initiative was launched in 2024 as a collaborative effort among subject-matter experts from Danish universities and institutions. The primary outcome of this collaboration is the development of this guideline, which offers methodological recommendations and practical guidance for selecting and analyzing socioeconomic indicators from Danish population-based registries. Focusing on socioeconomic position as an exposure, the guideline is tailored to research involving the adult population. While cancer is used as a case example to illustrate key points, the methodological considerations are broadly applicable to research in other diseases and contexts. Developed through a consensus-driven process and grounded in existing evidence, the guideline aims to serve as a starting point for socioeconomic research. As the guideline does not encompass all aspects of this complex research field, researchers should carefully assess whether the suggestions provided are appropriate for their specific research question and study population.

Methods

The guideline is based on data from Danish national registries available at Statistics Denmark as of October 2024.

The Danish Welfare State

Denmark is a welfare state with universal access to education and healthcare for all residents, alongside a comprehensive social security system. The educational system is state-funded and offers free tuition at all levels from primary school to university.1 Education is mandatory for 10 years for children aged about 6–15 years, corresponding to primary and lower secondary education. To further support students, the Danish government provides financial aid through the State Educational Grant and Loan Scheme to cover basic living expenses. Despite these provisions, a strong correlation persists between parental socioeconomic factors and the highest educational level attained, indicating ongoing barriers regarding social mobility.2 The Danish social security system seeks to reduce poverty and improve economic equality through income redistribution and social benefit programs, including unemployment benefits, child allowances, housing subsidies, sick leave compensations, and pensions. Though income inequality, as measured by the GINI coefficient, has increased in Denmark since the late 1980s, Denmark is among the Organisation for Economic Co-operation and Development (OECD) countries with the lowest level of income inequality and poverty.3 Most healthcare services are provided with no out-of-pocket payments, including general practitioner visits, hospital care, certain vaccinations, cancer screening programs, and rehabilitation. Other services are only partially subsidized and require co-payment, eg, adult dental care, prescription medication, eye exams, physiotherapy, chiropractors, and psychologists. While the public healthcare system remains the dominant provider, the use and provision of private health services and private health insurance have expanded.1 Despite addressing inequality in health having been a key public health priority for decades, considerable socioeconomic differences in health outcomes and life expectancies persist in Denmark and other nations.4–6

Danish Registry Data

The Danish public system has a long tradition of comprehensive data collection and a unique ability to link data across systems, using a universal personal identification number, called the CPR number.7 Since 1968, all Danish residents have been given a CPR number at birth or residence permit. Information on many life events—including date of birth, immigration, educational attainment, employment and income, marriages and households, retirement, emigration, and death—is systematically registered in national administrative registers. Moreover, medical information is collected in national registers covering healthcare contacts, diagnoses, therapies, drug prescriptions etc.1 All these registries are linkable by the CPR number. The data are primarily collected for administrative purposes; however, Danish research institutions can be permitted to use the data for research under certain conditions. Statistics Denmark is the data holder of national registries that include socioeconomic information, and data extracts for research are accessed in a pseudo-anonymized form on the secure servers of Statistics Denmark. Through their website, Statistics Denmark offers a documentation system describing each variable including data breaches and validity.8 Researchers are encouraged to explore the website along with the information provided in this guideline. A list of the registers referred to in this guideline can be found in Table S1.

Guideline Development

This guideline was developed during two collaborative full-day workshops using a consensus driven approach with broad representation across institutions, geographical regions, seniority levels, and subject-matter expertise. Invitations were sent to research group leaders within the field as well as data-holder consultants from Danish universities and institutions, encouraging representation ranging from junior researchers to senior professors and chief consultants. Open invitations were also distributed via social media and the Danish Comprehensive Cancer Center's website. Participants spanned epidemiologists, sociologists, data consultants, clinicians, data managers, and statisticians. Participation was free of charge and open to researchers and other professionals with an interest in the topic.

Upon registration, participants received a questionnaire on their applied methodological practices, including variable selection, categorization, and data management. The responses revealed substantial variation in the choice of variables, time of measurement, handling of missing values, and categorizations. These findings formed the foundation for a preliminary guideline serving as the focal point for Workshop I. Workshop I, held in spring of 2024, focused on the choice of socioeconomic indicators, data collection, and data management. The workshop included two lectures. Chief Adviser Jarl Christian Quitzau from Statistics Denmark presented an overview of registry-based data sources for socioeconomic indicators and pitfalls in their use. Assistant Professor Linda Ejlskov from Aarhus University presented findings of a study emphasizing the analytical impact of different income measures.9 Building on the preliminary guideline, participants engaged in group discussions to draft concrete recommendations for single indicators. Where multiple—and potentially conflicting—recommendations emerged, all options were documented and systematically compared. The group recommendations were presented and discussed in a plenary session. After the workshop, group notes and plenary feedback were summarized in a revised version of the guideline. This version also included a proposed theoretical model and a new chapter on analytical methods. The updated guideline was circulated to all participants for review ahead of Workshop II. Workshop II, held in fall of 2024, focused on analytical methods. Epidemiologist Michael Green from Duke University gave a lecture on causal inference issues when working with multiple indicators of socioeconomic position. PhD student Søren Korsgaard Martiny from Aarhus University presented methodological considerations on rank-based income and wealth assessments. Participants then worked in groups to refine the theoretical model and continued building upon the work initiated in Workshop I. Outcomes from group discussions were again presented in plenum, with the aim of reaching consensus. In cases where consensus could not be reached—particularly when methodological choices were context-dependent—alternative approaches were included in the guideline, with their respective advantages and disadvantages clearly described. Before publication, the revised guideline was circulated for several rounds of review, ensuring that all SEPLINE group members endorsed the final content.

Socioeconomic Determinants of Health

Several public health theories describe how health is shaped by the social and structural conditions in which individuals are born, live, grow, work, and age. During the 19th century, with the foundation of social medicine, the lack of education, poverty, housing and working conditions were argued to be key determinants of health.10–12 Since then research has demonstrated a systematic socioeconomic gradient in health, with lower socioeconomic position being associated with poorer health outcomes at every level of this gradient. Concurrently, frameworks on the social determinants of health evolved to encompass distal and upstream societal mechanisms and emphasize the dynamic interplay between structural, contextual, and individual-level factors.10–17

In line with many current social epidemiological theories, the theoretical approach used in this guideline employs a life course perspective. This approach asserts that societal, contextual, and individual factors interact throughout life and shape different socioeconomic conditions (Figure 1). These conditions encompass both resource-based stratifying mechanisms in a society, such as material and social resources, and prestige-based mechanisms, such as occupational prestige. Socioeconomic position is a conceptualization of these conditions and refers to an individual’s or group’s social and economic position within the structure of society.10

Figure 1 Theoretical model of the relationship between structural, contextual, and individual factors and the trajectory of cancer. Adapted from Olsen MH, Kjær TK, Dalton SO, Danish Cancer Society Research Center. Social Inequality in Cancer in Denmark: White Paper. Danish Cancer Society; 2023.18

Structural factors include politics, legislation, and organization such as health policies, redistributive policies, housing policies, and structures of the educational system and healthcare system. Contextual factors encompass social norms, cultural practices and prestige, as well as the broader social, economic, and physical environment of our neighborhoods. These factors include—but are not limited to—access to and quality of education and healthcare. Individual factors include genetics and personality, early life conditions and morbidity, ethnicity, social support and resources, and capability and competencies, such as familial socioeconomic circumstances, social networks, and cognitive skills.

As illustrated in Figure 1, structural, contextual, and individual factors shape an individual’s socioeconomic position and influence different health-related factors and resources to navigate the society and healthcare system. It is through the combined interaction of these factors that the socioeconomic differences in, for example, cancer outcomes, are believed to arise.18,19 For instance, the design of healthcare initiatives, such as national screening programs, cancer treatments, or follow-up care, may interact with an individuals’ socioeconomic position or health resources, contributing to socioeconomic differences in timely diagnosis, treatment adherence, and ultimately cancer survival.18

Measures of Socioeconomic Position

Socioeconomic position is a latent and multifaceted construct that cannot be measured by a single indicator. Instead, it is approximated through various indicators that serve as proxies for overlapping, yet distinct dimensions of socioeconomic position. This guideline encompasses educational level, labor market affiliation or occupation, income, and wealth, alongside closely related sociodemographic factors such as marital or cohabitation status, ethnicity, and area-level characteristics. Though these indicators are correlated, each captures unique aspects of an individual’s socioeconomic position at different stages of life and potentially has differential effects on health.15,20Figure 2 illustrates some of these multi-dimensionalities, which can develop over the life course. The non-overlapping areas of the circles illustrate the unique contributions of each indicator, while the overlapping areas represent shared effects and a core dimension of socioeconomic position.20 For example, as educational level is correlated with later income level, observed health effects from income may be influenced by and share effects with education. Yet, the educational level may to a larger degree reflect knowledge and cultural resources, whereas income and wealth are more directly associated with the availability of economic resources.

Figure 2 The multidimensionality of indicators of socioeconomic position. Adapted from Green MJ, Popham F. Interpreting mutual adjustment for multiple indicators of socioeconomic position without committing mutual adjustment fallacies. BMC Public Health. 2019;19(1):19. Creative Commons.20

Given the unique aspects of each indicator, they are not interchangeable proxies for socioeconomic position. The choice of specific indicators should, therefore, be carefully aligned with the research question. Moreover, investigating different socioeconomic indicators and their interactions may elucidate the complex, multidimensional nature of socioeconomic position, and its influence on health outcomes. Such an approach may provide valuable insights to guide public health strategies and interventions. This guideline covers information about the selected individual socioeconomic indicators. For some settings, it may be meaningful to combine these individual indicators into a composite measure or socioeconomic index to reflect the multidimensionality of socioeconomic position, however, such approaches are not addressed in this guideline. Note that for all socioeconomic indicators, individuals with missing values may constitute a vulnerable group. Hence, it is recommended to explore this subpopulation separately in relation to the outcome.

Education

Educational attainment is considered a comprehensive indicator of socioeconomic position, which may encompass both individual-level factors and reflect the familial, contextual, and structural conditions in which one is raised. Individual-level factors include early cognitive development and abilities, physical and mental health status, intellectual capacity, motivation to learn, and personal interests and values. Familial conditions include the educational level, economic resources, living standard, and parental health or household, as well as the level of cognitive stimuli and support in early life and adolescence. Contextual and structural conditions include the quality and availability of learning environments, accessibility and structure of secondary and higher educational institutions, as well as social norms, expectations, and prestige.15,21 In regard to studies of health, the educational attainment and the abilities learned under education, are related to the level of health literacy, which encompasses the ability to understand, evaluate, and act upon health information, communicate and actively engage with healthcare professionals, and navigate in the healthcare system.22,23 Education is often defined as the highest attained (ie, completed) level of education. Although an individual’s educational level is typically attained in early adulthood, the educational level may influence future job opportunities and income level, and, therefore, also reflect aspects of these factors, such as available economic resources.15,21

Data Source and Variables

Different data sources are available to determine educational attainment, including registers on completed education programs and a longitudinal registry of ongoing activities (Table 1). The age at index date can help determine the most appropriate data sources and variables. As eg, a higher proportion of individuals below the age of 30 are still under education, it may be more appropriate for these age groups to use information from the student registry covering ongoing education (Figure S1), ie the KOTRE registry or the parental educational level.

Table 1 Data Sources for Educational Level in the Danish Registries

In Denmark, information on education has been systematically registered in student authorization registries since 1974 and maintained by Statistics Denmark.24 For some individuals, the educational data comes from other sources, as indicated by the variable HF_KILDE. Information on education attained prior to 1974 derives from the qualification registry and from a census conducted in 1970 based on self-reported information (HF_KILDE= 2). For immigrants, the available information on education can be self-reported from surveys (HF_KILDE= 3, 17) or imputed by Statistics Denmark (HF_KILDE= 9, 10, 18). Before 2018, the information came from surveys conducted in 1999, 2006, and 2016. Since 2018, information on education completed outside Denmark by individuals who have immigrated within the past year has been collected through The Immigrant Survey annually. This data is obtained from Danish language courses and the Danish Agency for Labor Market and Recruitment. However, since these sources do not cover all immigrants, Statistics Denmark supplements the data by sending a questionnaire each December via digital post to those for whom information is unavailable. For non-respondents, educational attainment is imputed. This imputation is based on population data from the past four years of surveys. Background variables from other registries are incorporated, and observations with unknown educational levels are imputed using a random forest algorithm. Therefore, the imputed data should be used with caution. Its limitations and uncertainties are more pronounced at the individual level, whereas aggregated data is generally reliable because errors in the imputed data are likely to be balanced out at the group average. Therefore, when analyzing the educational level across larger population groups—such as comparing entire municipalities or individuals in and out of the labor market—the imputed data remains useful. However, analyses involving individual-level education data are less reliable. In such cases, it may be advisable to exclude the imputed data altogether or analyze them separately (Table 2). Detailed information about the education imputation models is available in Table S2.

Table 2 Data Management of Educational Data

The highest attained educational level (HFAUDD) variable can be formatted by using formats available in the secure servers of Statistics Denmark. Each year, the format is updated with codes for new educational programs and corrections, eg, if a specific education program has changed level. There are large differences in categorization depending on which format you choose. The format most suitable for the Danish population is provided in Table 3.

Categorization

The categorization suggested in Table 3 is aligned with international classifications from the International Standard Classification of Education (ISCED) and OECD, enabling global comparisons.25 Depending on the study’s statistical power, it may be informative to treat missing values as a separate category, as this group could provide valuable insights. If the statistical power is insufficient, a complete-case analysis can be applied instead. The choice of categorization may consider that the length of education has different meanings for different birth cohorts. In 1972, compulsory education was extended from 7 to 9 years for students in 7th grade in 1971/72 and onwards, typically those born ≥1958. However, students born before 1958 increasingly attended voluntary 8th to 10th grades. In 1975, a unified school system was introduced, and a selective type of middle school (In Danish: “Realskolen”) was abolished. This educational program covered grades 6th to 10th for students who met specific grade requirements and prepared the students for careers or higher secondary education. Depending on the study population and research question, researchers may consider assigning these individuals a medium instead of short education. The last group completed the program (In Danish: “Realeksamen”) in the summer of 1978.

Table 3 Categorization of Educational Level Using Danish Registry Data

Attention Points Causality Issues Educational level can be influenced by early life health, sex, ethnicity, parental socioeconomic position, and possibly cohabitation/parenthood status. As such, researchers should be aware of both unmeasured confounding, and reverse causality.Missing Information The information on education from the census in 1970 was discarded for individuals aged 50 years and above in 1970 due to poor quality. Education is, therefore, missing for most people born in or before 1920 or available for a highly selected group only (Figure S2). In 2018, 51% of the adult population with missing information on education were immigrants. Be aware of the source of which the information is collected and consider whether imputed values should be excluded. There is a data gap with missing information on education attained between 1970 (census information) and 1974 (start of student authorization registries) (Figure S2).Inaccuracy and Misclassification By 31st December 2018, available information on the highest attained educational level was self-reported for 22% of the adult population (18% from the census in 1970 and 3.8% from surveys among immigrants). The proportion increases by decreasing birth year.26 Compared to information obtained from registries, the self-reported information may be less precise.24 Education attained outside Denmark is not systematically registered. The educational level is not necessarily proportional to income levels, eg, some blue-collar workers have a higher income than certain groups of academics (Figure S3). Besides primary education, most educations either qualify for further schooling or prepare for a specific profession. However, some educations, eg, some introductory courses, do not provide qualifications. Information about the attained qualifying level for each educational level can be found in the variable KOMP, as 1=qualifying general education, 3=vocational education and 7=higher education. Non-qualifying education and eg, privately paid education courses are not registered.Birth Cohort/Life Course Aspects The educational level in the population has increased notably over time, and the length of education may have different meanings for different birth cohorts (Figure S4). For some education programs, their level has changed over time. For example, the educational level for nurses has changed from short further education (category 40) to medium further education (category 50). For this reason, it is recommended to use the latest accessible format at Statistics Denmark. For younger cohorts, it might be appropriate to differentiate between the non-qualifying and qualifying educations. General upper secondary education (category 20) often requires further education to qualify for employment, while vocational education and training programs (category 30) are designed to directly prepare students for the labor market.Labor Market Affiliation and Occupation

This chapter offers guidance on measuring labor market affiliation and occupation type, helping researchers assess the relevance of each indicator to their specific research questions. Labor market affiliation provides insights into an individual’s social and economic standing, reflecting income and job stability, access to resources, and social networks. It reflects the dynamic interplay between work, economic resources, and social structures, making it a crucial factor in understanding social determinants of health, social mobility, and policy impacts.

Compared to education, labor market affiliation is more sensitive to economic cycles, making it particularly useful for identifying vulnerable socioeconomic groups. Labor market affiliation indicates one’s engagement with income-generating activities, including employment status and job stability. However, depending on the categorization used, it may not capture socioeconomic differences among the large group of employed individuals. In contrast, occupation type conveys information on education, skill levels, social prestige, and insight into roles and responsibilities, but this is not accurately reflected in registry data, leading to potential misclassification. Additionally, shifts in industry demands may alter the socioeconomic implications of certain occupations over time, and cultural or regional differences in occupational prestige further complicate cross-context comparisons.15,27

Data Sources and Variables

Three main data sources for labor market affiliation are summarized in Table 4.

Table 4 Data Sources for Labor Market Affiliation and Occupation in the Danish Registries

The Employment Classification Module (AKM)

In AKM, employed individuals are registered based on their main occupation during a given year, based on the activity that generated the largest amount of income.28 The registry comprises an annual primary activity status (variable: BESKST), occupation type (variable: SOCIO13), and a Danish version of the International Standard Classification of Occupations for the economically active (DISCO). The registry goes back to 1976, with several changes over time, including new criteria for classifying students with part-time jobs as well as adjustments made to how self-employed individuals are categorized. These changes have been applied to historical data, with older classifications phased out from 2014.

The Integrated Database for Labour Market Research (IDA)

IDA registers the labor market affiliation of all Danish residents on 30th November each year, among those residing in Denmark on 1st January the following year. Due to changes in the coding system, two variables may be needed to assess labor market affiliation in the IDA database. Before 2008, the information is collected in the variable PSTILL, which refers to the primary affiliation during the given year. Until 1995 employed individuals are subclassified according to their profession code. From 1996, employed individuals are classified based on their work function in accordance with recommendations from the International Labour Organization and DISCO.28 Since 2008, the information has been collected in the variable PSOC_STATUS_KODE, with different subcategories referring to employment types, unemployment, social assistance groups, etc. Therefore, the variables are not directly comparable.

The Danish Registry for Evaluation of Marginalization (DREAM)

DREAM encompasses all individuals who have received social benefits or other public transfer income, though not benefits related to the Social Services Act.29 From 2008, DREAM also includes monthly employment information, hence also including all employed. The database is longitudinal, with one new variable added per week, with a code indicating the type of benefit received (variable: y_YYMM, where Y=year without century and M=month). The codes can, for instance, indicate if an individual has been unemployed, on leave, in early retirement, on sick leave, on social assistance, enrolled in eligible student education, or in job activation. DREAM also monitors transitions to receiving a state retirement pension, emigration from Denmark, and deaths occurring before reaching the state retirement pension age. If the variable is empty for a given week, it indicates that the individual neither received any benefits during that time nor matched any other specified codes (referred to as “empty”). This will in the following be used to indicate that the individual was working or self-supporting. Be aware that this assumption may lead to misclassification, particularly for young individuals who are supported by their parents or are not entitled to unemployment benefits. To address this issue, linking salary or income data can help verify employment status. Starting in 2008, the variable Branche_YYYY_MM can be used to verify whether weeks with no recorded code (“empty”) are due to employment.

Weekly benefit information is generated whenever an individual has received a benefit for at least one day. However, only one type of weekly benefit can be recorded, necessitating automatic prioritization in cases of overlapping data. For example, unemployment benefits take precedence over social assistance, while sick leave benefits take precedence over unemployment benefits.

The choice of data source offers different options for time of measurement (Table 5). Due to the annual nature of AKM and IDA, it is recommended to collect data in the year prior to index, whereas DREAM gives the opportunity to measure labor market affiliation closer to index, eg the week before index. However, this should be done with caution as labor market affiliation may change in the time leading up to index and cause reverse causality issues. Also, DREAM data are sensitive to weekly changes, why some previous studies have categorized according to main status in the 3–12 months leading up to diagnosis.30,31

Table 5 Data Management of Labor Market Affiliation and Occupation Data

Table 6 Categorization of Labor Market Affiliation Using Danish Registry Data

Categorization

Codes from the data sources are broadly classified into four major labor market affiliation groups, which can be further divided into occupational statuses when using AKM or IDA (Table 6). Note that the validity of the occupation subcodes remains unknown. If interested in more granular groups of occupation within thematic categories (eg, agriculture, health etc.), 1st or 2nd level of the variable DISCO from Statistics Denmark can be used. The labor market affiliation groups provided may not be directly comparable across data sources; eg, in SOCIO13, flex jobs are combined with other social security programs. In contrast, IDA and DREAM classify flex jobs separately, making it possible to categorize them as “working” or “working under special conditions” if relevant for the specific research question. The main groups can be merged into being able-bodied (Category A+B) or not (Category C) among people of working age. However, this may cause some misclassification, as the reasons for labor market detachment are unknown. Other distinctions include participating in the labor market vs not and being self-supporting or on social support. The terminology used in the literature regarding Danish retirement and labor market schemes varies considerably in the literature. The terminology used in the guideline appears from Table S3.

Attention Points Causality Issues Health and diseases may impact the ability to work, leading to potential issues of reverse causality when socioeconomic position is the exposure of interest. This reverse causality will largely depend on the disease, time from onset to diagnosis and treatment, and screening.Missing Information People with missing information in DREAM are likely employed or in other ways self-supported (unless recently immigrated). Some may also be without any income, eg, young people transitioning from school to the labor market.Inaccuracy and Misclassification Labor market affiliation can change frequently, making it difficult to establish a stable measure of socioeconomic position. Standard measures of labor market affiliation may poorly capture within-group differences. Be aware that short-term (in 2025: <31 days) sick leave is not included in DREAM, as this is covered by the employer. The period of employer-paid sick leave (the period before the employer can receive reimbursement from the municipality) has changed several times, which will influence assessments over time. The periods can be found in Figure S5. When people exceed the employer-paid sick leave period (~ long-term sick leave), DREAM adds the number of weeks corresponding to the employer-paid sick leave, approximating their first day of sick leave.Birth Cohort/Life Course Aspects Labor market conditions can vary widely with shifts in economic cycles (Figure S6), affecting job availability and stability. In this connection, when examining long-term data (decades and longer), it is important to recognize the fact that economic downturns disproportionately impact the most socially vulnerable groups compared to others.32 The relevance of labor market affiliation may vary significantly at different stages of life, such as early career vs retirement. For some research questions, it may be relevant to investigate patterns and the potential multiple shifts in labor market affiliation over a longer period of time, eg, by using sequence analyses.33Income and Wealth

Income level is a widely used indicator of socioeconomic position, providing insights into the health effects of a broad spectrum of factors related to economic prestige and resources. Differences in accessibility of healthcare services and quality of care have been observed across Danish income groups.34–36 Resource-based factors include housing conditions, exposure to environmental factors, and access to more nutritious food, leisure activities, and private healthcare services.15 Various components of wealth, such as the value of properties, shares, cars, savings, or loans, also impact an individual’s available economic resources and experience of financial stability, which enables the management of unexpected expenses and economic stress. For pensioners, income may not be suitable for capturing economic prestige and resources, and, therefore, wealth may be a more relevant indicator in cohorts of older people.

Data Sources and Variables

For tax registration purposes, income levels are well-documented in Danish registries (Table 7). Various components of total income are accessible, including primary income, public transfer, private pensions, and capital income. In addition, information on taxes, interest expenses, and alimonies are also available for the calculation of net income. Furthermore, to allow for comparisons of disposable income between owners and renters, as well as between households of different sizes, the imputed value of rent for owner-occupied homes and information on the size and income of the household is also available. Finally, real estate, deposits in banks, shares, pensions, and loans are available as indicators of wealth or financial stability. Both personal and household income are assessed on an annual basis and available in Danish Kroner in the Income Statistics Registry. These income values can be converted from Danish Kroner to other currencies, as needed.37

Table 7 Data Sources for Income and Wealth in the Danish Registries

Income

Income is both assessed as individual and household income, as well as total income, and disposable income (income with subtraction of taxes, paid interest and paid alimonies). Further, an equivalized disposable income is provided, for which Statistics Denmark uses the OECD-modified equivalence scale to divide the total household income by its weighted size. The first adult in a household is assigned a weight of one, a second adult a weight of 0.5, and each child a weight of 0.3.38

As default, it is recommended to use equivalized disposable household income (FAMAEKVIVADISP_13 or AEKVIVADISP_13) unless disposable personal income better accommodates the research question. Ejlskov et al9 found a high correlation between individual and household income (both total and disposable income) in males, but not in females. As such, especially in populations including females, household income may be a better indicator of overall accessible financial resources.15 The study also found that disposable and total income were highly correlated (r=0.95).

Total pre-tax personal income may be useful in relation to questionnaire-based research. It is common for researchers to ask respondents about their pre-tax income level. The variable PERINDKIALT_13 (personal income in total) can be used to benchmark against the entire population or specific sub-populations when analyzing the representativeness of a survey population.

Wealth

As an indicator of wealth, using the variables for net assets FAMFORMREST_NY05 (household-level) or FORMREST_NY05 (individual-level) may be considered per default. These measures account for total liquid assets minus liabilities. If the household-level measure is used alongside the equivalized household-level income measure (FAMAEKVIVADISP_13), one should note that the wealth is not yet provided in equalized form in the registers. Therefore, consider manually equalizing the household-level wealth using the modified OECD-modified equivalence scale mentioned above or per adults in the household. If the study period allows the wealth variable FGNF_2020 from the FORMPERS registry should be used, as this variable is less sensitive to problems with public property assessments in the 2010s and 2020s.

In some cases, it may be relevant to look at selected financial assets rather than the net assets. For young people, for example, it may make sense to exclude pension savings because these are often not disposable. This can be done by subtracting pension assets, or by using variables not including pension, see Table S4 for variables.

Some individuals have low income, but high wealth. Therefore, considering both income and wealth may be more appropriate for determining an individual’s economic resources. A combined indicator may potentially be useful and additionally aids in classifying individuals with negative or near-negative income values who possess high wealth. However, the literature currently lacks examples on ways to most appropriately combine income and wealth in epidemiology, but inspiration can be found within mathematical sciences.39 The suggested data management practices for engaging with Danish income and wealth data are included in Table 8.

Table 8 Data Management of Income and Wealth

Categorization

As suggested by Ejlskov et al,9 an initial step in categorizing income is to examine its functional form by plotting across deciles, quintiles and quartiles. This analysis helps determine the right categorization. While dividing income into three groups based on the grouping of quintiles is often preferred to ensure adequate cell counts, as suggested in Table 9, some studies may benefit from using more categories. For research focusing on poverty, the relative poverty threshold defined as <50% of the median income can be applied.40

Table 9 Categorization of Income and Wealth Using Danish Registry Data

Continuous Variable

For some studies, modeling income as a continuous variable may be appealing. However, recent findings indicate that modeling income continuously is highly susceptible to downward bias, particularly due to the likely misclassification of wealthier individuals at the lower end of the tax-reported income spectrum.9 Moreover, the functional relationship between income and health does not follow a linear pattern but rather shows diminishing returns for higher income levels. Modelling income continuously tends to underestimate the health association for lower incomes while overestimating it for higher incomes. If including income as a continuous variable, we suggest using splines or fractional polynomials, if possible, to more accurately capture the complex relationship between income and health. These approaches allow for a more flexible representation of the underlying functional form.

Attention Points Causality Issues Illness and diseases, accidents, or family situations may for shorter or longer periods impact the ability to work and thus income.15 The time of which income is measured should, therefore, be chosen in a way that minimize reverse causality between health and income.Missing Information Most income variables available at Statistics Denmark are based on individuals residing in Denmark for the entire calendar year in question.Inaccuracy and Misclassification In 2022, 0.3% of the population had a negative income. This proportion varies across age groups and calendar year. The main causes of negative income are interest expenses (45%), losses from self-employment (27%), losses on investments (19%), delayed tax payments (5%), and alimonies (4%). Ejlskov et al9 found a high correlation between negative income and high wealth. As such, those with a negative income might belong to a wealthy population and can be misclassified if negative income is not handled. This may also apply to small positive incomes, which for the same reasons may be downward adjusted. Some individuals may in shorter or longer periods have low income due to unemployment, education, maternity leave, or because they work reduced hours.Birth Cohort/Life Course Aspects Income levels vary substantially over the life course and are highly sensitive to life events. Young people in education and training usually have lower incomes, which gradually increase as they complete their studies and gain more work experience. Likewise, the income level decreases again upon reaching retirement age, while wealth may accumulate with age. To account for societal salary developments over time, income values should be re-calculated according to inflation levels during the study period. Please refer to the suggested data management, for which income is calculated in income quintiles by calendar year and accounting for inflation. Another approach is to use a costumer price index (eg, from Statistics Denmark). The income can be adjusted to current currency values using the country-specific GDP deflator, which is available on the World Bank’s website.Cohabitation and Marital Status

Cohabitation status indicates whether an individual lives alone or with a partner, whereas marital status refers to the legal marriage status. Cohabiting or married individuals often exhibit better health outcomes and lower mortality compared to their counterparts.41 This phenomenon is likely attributed to differences in health behavior and emotional support, financial stability, and encouragement to seek medical care and adhere to treatment regimens. Additionally, cohabiting/married individuals may experience less psychological stress and better mental health, which can positively impact their overall health and ability to recover from serious illnesses like cancer. Though not a traditional indicator of socioeconomic position, incorporating cohabitation and marital status allows for a more comprehensive understanding of how social factors impact health outcomes and the identification of vulnerable groups. In Denmark, many families/couples live together without being married. By default, it is, therefore, suggested to use cohabitation status. Internationally, marital status is, however, a more common approach and can be included in supplementary analyses.

Data Sources and Variables

Marital status and address (used to define cohabitation status at Statistics Denmark) have since 1968 been systematically registered in the Danish Civil Registration System and are also available at Statistics Denmark in the population table BEF (Table 10). In the algorithm for cohabitation, Statistics Denmark defines couples as 1) Married individuals, 2) Individuals in a registered partnership, and 3) Cohabitating individuals, either defined as 3.1) living on the exact same address and ≥1 cohabitating common children, or 3.2) as living on the exact same address, and no cohabitating common children (except stepchildren), and opposite biological sex, and age difference of <15 years, and not closely related based on identifiers for children and parents, and no other adults living at the same address.

Table 10 Data Sources for Cohabitation and Marital Status and in the Danish Registries

To avoid reverse causation, it is recommended to measure cohabitation status at the year preceding the index data and marital status at index (Table 11).

Table 11 Data Management of Cohabitation and Marital Status

Categorization

The suggested categorization provided in Table 12 can be used independently of the measure used, offering the opportunity to apply more granular groups of “B: not married” if using the CIVST variable (marital status).

Table 12 Categorization of Cohabitation and Marital Status Using Danish Registry Data

Attention Points Causality Issues Cohabitation/marital status is associated with later health outcomes.41 However, illness may also lead to a change in cohabitation/marital status. Thus, it is important to consider a potential causal association in both directions.Missing Information Information on marital status is available at Statistics Denmark for individuals residing in Denmark for the calendar year in question. Individuals with missing values will often be recently immigrated.Inaccuracy and Misclassification The algorithm for cohabitation misclassifies non-hetero unmarried cohabiting couples as living alone. Moreover, cohabiting individuals not living in a romantic relationship, eg, roommates of the opposite sex, will figure as cohabiting, though not having a partner. Marital status is a valid indicator on its own, but using marital status as eg, a proxy for partnership is linked with misclassification, assuming that unmarried individuals are single.Birth Cohort/Life Course Aspects Measuring the duration of cohabitation or marital status may provide valuable additional information, both as an exposure and as a potential effect modifier in relation to relevant health outcomes. Younger cohorts may be more likely to be cohabiting and unmarried, than older cohorts.Ethnicity

Ethnicity is a complex and multifaceted concept, which is widely recognized as a social construct. It encompasses aspects such as shared culture, social structures, religion, traditions, language, and shared geographical origin. As a dynamic construct, ethnicity can evolve and shift depending on the surrounding context. Considering these factors, an individual’s ethnicity refers to the group to which an individual identifies, belongs, or is perceived to belong.42–44

The term “ethnic minority group” commonly identifies population groups distinct from the predominant population in a country. While “minority” has a numerical implication, it can also signify a less empowered position within societal hierarchies.44 In this chapter, the terms “ethnic minority group” or “ethnic minority background” are used when referring to populations numerically smaller than the majority of the country. SEPLINE does not assume any specific positions of power and does not take a stance on it. Migration can also be a determinant of health. There is no internationally accepted definition of the term “migrant” but it is commonly used as an umbrella term referring to individuals who relocate from their usual place of residence. This movement can occur within a country or across international borders, and be temporary or permanent.45

Migrants and individuals with ethnic minority backgrounds often face difficulties when accessing healthcare services, eg, due to language barriers, financial barriers, or structural barriers. When combined with varying levels of health literacy, these barriers can contribute to poorer health outcomes for these populations.46 Furthermore, many migrants and individuals with ethnic minority backgrounds occupy lower socioeconomic positions, with limited access to education, employment opportunities, and adequate income. These socioeconomic disadvantages are associated with poorer health outcomes and further challenged by cultural differences, systemic inequities, and the enduring effects of discrimination and marginalization.47–49

In epidemiological research, measurement of ethnicity is often simplified and limited to indicators such as country of origin or immigration status, which likely fail to capture the complex nuances of ethnicity. This highlights the importance of developing more nuanced and comprehensive approaches to studying ethnicity or variables related to ethnicity and clearly conceptualizing and defining the variables.

Data Sources and Variables

While the Danish registers do not contain direct measures of ethnicity, they offer several related indicators, including migration status, country of origin, residency data and length of residency in Denmark (Table 13). Researchers should select the variables that best align with the focus of their study and the information they seek to uncover. Since 1968, country of origin and immigration status have been systematically registered in the Danish Civil Registration System and are available at Statistics Denmark in the population table BEF from 1985 and onwards. These variables are derived from information on citizenship and the parents’ country of birth. An individual’s country of origin is defined as “Denmark” if at least one parent was born in Denmark and holds Danish citizenship, regardless of the individual’s own country of birth. For immigrants and descendants, the country of origin is determined by the mother’s country of birth or citizenship. If the mother’s information is unavailable, the father’s country of birth or citizenship is used. If information from both parents is unavailable, the individual’s own country of birth or citizenship is used. Detailed information regarding “country of origin” and “immigration status” and how they overlap can be found in Table S5 and Table S6.

Table 13 Data Sources for Ethnicity in the Danish Registries

Information regarding the grounds of residence and types of residence permits has been registered in the OPHG table since 1997. This can, for example, be used to identify refugees. Linkable data from before 1997 is also available in the D101200.PSD_OPHOLDSGRUND_FOER1997_SURVEY registry at Statistics Denmark. However, this registry primarily contains imputed data, supplemented with information from a survey conducted by Statistics Denmark in 2019 and non-published residence permits from 1993 to 1996. Due to these limitations, the data is not recommended for analyses of small study populations but can be considered for research on larger populations.50 In addition, studies have used country of origin to approximate refugee status by coding individuals from largest refugee-sending countries between 1986 and 1992.51 Immigration and emigration dates have been systematically documented in the VNDS table since 1973, enabling the calculation of the duration of stay in Denmark. Additionally, this information has been available in the BEF table since 2004 through the variables FOERSTE_INDVANDRING, SENESTE_INDVANDRING, OPHOLDMD_DK.

Categorization

The context-dependent nature of ethnicity requires a flexible and sensitive approach to its categorization, tailored to s

Comments (0)

No login
gif