1 Introduction

Job satisfaction is defined as “a positive emotional state of feeling resulted from jobs, thus fulfilling individuals’ value towards their jobs” [1]. Although job satisfaction can be used as an overarching term for how satisfied an individual is with their job, several components contribute to it, and these components can be intrinsic or extrinsic in nature. Intrinsic factors relate to the nature of the work itself, such as the level of autonomy, opportunities for professional development, a sense of achievement, and the recognition received for one’s efforts [2]. Extrinsic factors include external aspects such as salary, job security, working conditions, and relationships with colleagues and supervisors [1]. Psychological components such as perceived fairness, organisational support, and the alignment of personal and organisational values also play a critical role in shaping job satisfaction and overall employee well-being [3].

Job satisfaction among medical doctors varies considerably across countries and is influenced by factors such as low pay, burnout, and administrative burdens, with UK doctors reporting the highest stress (71%) and the lowest job satisfaction (24%) [4]. This dissatisfaction often has a spillover effect, negatively impacting other aspects of doctors’ lives, such as relationships and personal milestones [5]. In other countries, higher job satisfaction has been reported [6], likely owing to better working conditions and greater professional autonomy, although differences across studies may stem from varying measurement tools or country-specific factors [7].

Assessing job satisfaction is crucial, particularly in the medical sector, as it is closely linked to significant outcomes. High levels of job satisfaction among medical doctors are associated with improved well-being [8], reduced burnout [9], and increased motivation and engagement [10] at work. This, in turn, enhances the quality of patient care [11], leading to better patient outcomes such as improved safety, satisfaction, and recovery rates [12]. Job satisfaction is also critical for the functioning of medical systems: it influences workforce retention [13], reduces turnover rates [14], and minimises the costs and disruptions associated with recruitment and training [15]. Therefore, understanding and addressing job satisfaction is essential not only for individual medical doctors but also for the overall performance and sustainability of medical institutions. Several scales have been developed to measure job satisfaction in medical doctors, but there is a lack of research that comprehensively analyses these scales and provides recommendations on which can measure job satisfaction most accurately. The present research aims to address this gap through a systematic review of the psychometric properties of the existing scales, with the specific objectives of identifying: (a) which measures are used to assess job satisfaction in medical doctors, (b) the structure of these measures and their psychometric properties, and (c) which measures are recommended for assessing job satisfaction in medical doctors.

2 Methods

We conducted a systematic review of scales used to assess job satisfaction in medical doctors, identifying the structure and psychometric properties of these scales. To ensure a rigorous review process and limit inclusion bias, we adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines when identifying the studies [16].

2.1 Search strategy

A comprehensive literature search was conducted across APA PsycINFO, APA PsycArticles, APA PsycTests, MEDLINE, and EMBASE from database inception to March 2024. The search terms combined keywords and MeSH terms related to psychometric properties, job satisfaction, scales, and medical doctors. The full search string was as follows: (“Psychometric Properties” OR “Validity” OR “Reliability” OR “Psychometric” OR “Internal Consistency” OR “Test-Retest Reliability” OR “Construct Validity” OR “Convergent Validity” OR “Discriminant Validity” OR “Criterion Validity”) AND (“Job Satisfaction” OR “Employee Satisfaction” OR “Occupation* Satisfaction” OR “Work Satisfaction”) AND (“Scale*” OR “Measure” OR “Instrument” OR “Questionnaire” OR “Test*” OR “Measurement” OR “Survey*” OR “Self-report”) AND (“Medical Doctors” OR “Physicians” OR “Medical Specialists” OR “Clinicians” OR “Clinical Professionals” OR “Medical Professionals” OR “Healthcare Specialists” OR “Audiologists” OR “Allergists” OR “Andrologists” OR “Anaesthesiologists” OR “Cardiologists” OR “Dentists” OR “Dermatologists” OR “Endocrinologists” OR “Epidemiologists” OR “Family Doctors” OR “Gastroenterologists” OR “Gynaecologists” OR “Haematologists” OR “Hepatologists” OR “Immunologists” OR “Infectious Disease Specialists” OR “Internal Medicine Specialists” OR “Internists” OR “Neonatologists” OR “Nephrologists” OR “Neurologists” OR “Neurosurgeons” OR “Obstetricians” OR “Oncologists” OR “Ophthalmologists” OR “Orthopaedic Surgeons” OR “ENT Specialists” OR “Otolaryngologists” OR “Perinatologists” OR “Paleopathologists” OR “Parasitologists” OR “Pathologists” OR “Paediatricians” OR “Physiologists” OR “Physiatrists” OR “Podiatrists” OR “Psychiatrists” OR “Pulmonologists” OR “Radiologists” OR “Rheumatologists” OR “Surgeons” OR “Urologists” OR “Emergency Doctors”).
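For illustration, the block structure of this search string can be reproduced programmatically. The sketch below uses abbreviated term lists and a purely illustrative helper (or_block); it is not the exact query submitted to each database, and each platform’s field tags and truncation syntax would still require adaptation.

```python
# Illustrative sketch of how the Boolean search string is assembled:
# each concept block is OR-joined and the blocks are AND-combined.
# Term lists are abbreviated; the full string is given above.

psychometric_terms = ["Psychometric Properties", "Validity", "Reliability",
                      "Internal Consistency", "Test-Retest Reliability"]
satisfaction_terms = ["Job Satisfaction", "Employee Satisfaction",
                      "Occupation* Satisfaction", "Work Satisfaction"]
instrument_terms = ["Scale*", "Measure", "Instrument", "Questionnaire", "Survey*"]
population_terms = ["Medical Doctors", "Physicians", "Clinicians", "Surgeons"]

def or_block(terms):
    """Quote each term and join the list with OR."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

query = " AND ".join(or_block(block) for block in
                     [psychometric_terms, satisfaction_terms,
                      instrument_terms, population_terms])
print(query)
```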

2.2 Operational definitions

2.2.1 Medical Doctors

The criteria for identifying medical doctors in the articles examined for this systematic review were adopted from the World Health Organisation guidelines [17].

2.2.2 Job satisfaction

Job satisfaction has been defined by Locke [1] as “a positive emotional state of feeling resulted from jobs, thus fulfilling individuals’ value towards their jobs.”

2.3 Selection criteria

We included studies that: (i) were related to job satisfaction in medical doctors; (ii) focused on measuring a form of job satisfaction in medical doctors; (iii) reported information on at least one psychometric property of the scale used to measure job satisfaction; (iv) were based on quantitative research; and (v) were published in peer-reviewed journal articles.

We excluded: (i) qualitative systematic reviews or narrative reviews; (ii) other web articles and dissertations; (iii) studies that did not describe or provide a reference to the scale used; (iv) studies that did not provide any information about the psychometric properties of the scale used to assess job satisfaction.

2.4 Study identification

The study identification process was conducted in two phases.

In the first phase of the review, the reviewers independently screened each study based on the title and abstract. In the second phase, the reviewers screened the full text of each potentially eligible study for the final inclusion assessment.

The whole study selection process was carried out by two independent reviewers (FS and TM), and disagreements were resolved by consulting two other researchers (EM and MPC) at each stage of the review. The percentage of agreement between the reviewers was 99.30% in the first phase and 80.55% in the second phase.
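Percentage agreement here refers to the share of screening decisions on which the two reviewers concurred. The short sketch below illustrates the calculation; the decision counts are hypothetical values chosen only to reproduce the reported first-phase figure.

```python
# Minimal sketch of the percentage-agreement calculation used at each
# screening phase; the decision counts below are hypothetical.

def percent_agreement(n_agreed: int, n_total: int) -> float:
    """Share of records on which both reviewers made the same decision."""
    return 100.0 * n_agreed / n_total

# e.g. if both reviewers agreed on 993 of the 1,000 title/abstract records
print(f"{percent_agreement(993, 1000):.2f}%")   # 99.30%
```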

2.5 Data extraction

For each study included, data extraction was performed independently by one reviewer (FS), and subsequently verified for accuracy by a second reviewer (TM). The extracted data were organised into a custom data extraction table.

Data extracted included: Author, Year, Title, Country, Scale used to assess job satisfaction, Subscales, Number of items, Type of medical doctors, Psychometric properties assessed, Outcomes of psychometric properties, and Number of studies that used the scale.
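The extraction table can be represented as one record per study. The sketch below is an assumed schema based on the fields listed above; the field names and types are illustrative rather than the authors’ actual data structure.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ExtractionRecord:
    """One row of the data extraction table; field names mirror the list
    above, while the types are assumptions made for illustration."""
    author: str
    year: int
    title: str
    country: str
    scale: str
    subscales: List[str]
    n_items: int
    doctor_type: str
    psychometric_properties_assessed: List[str]
    psychometric_outcomes: str
    n_studies_using_scale: Optional[int] = None
```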

2.6 Risk of bias

A risk of bias assessment was conducted through the COSMIN Guidelines (COnsensus-based Standards for the selection of health status Measurement INstruments) [18].

2.7 Data analysis

The COSMIN Guidelines [18] were adopted to analyse the psychometric properties of each job satisfaction scale. Each scale was rated as Very good, Adequate, Doubtful, Inadequate, or N/A on the following properties: PROM development, Content validity, Structural validity, Internal consistency, Cross-cultural validity/Measurement invariance, Reliability, Measurement error, Criterion validity, Hypotheses testing for construct validity, and Responsiveness. A narrative synthesis was carried out once the ratings had been allotted to each scale.
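To make the rating procedure concrete, the sketch below shows one assumed way of recording COSMIN ratings for a single scale prior to narrative synthesis; the ratings shown are hypothetical, and the actual ratings appear in Tables 2 and 3.

```python
from collections import Counter
from enum import Enum

class CosminRating(Enum):
    VERY_GOOD = "Very good"
    ADEQUATE = "Adequate"
    DOUBTFUL = "Doubtful"
    INADEQUATE = "Inadequate"
    NA = "N/A"

# Hypothetical ratings for a single scale; the property names follow the
# COSMIN boxes listed above, but the ratings themselves are invented.
ratings = {
    "PROM development": CosminRating.ADEQUATE,
    "Content validity": CosminRating.DOUBTFUL,
    "Structural validity": CosminRating.VERY_GOOD,
    "Internal consistency": CosminRating.VERY_GOOD,
    "Cross-cultural validity/Measurement invariance": CosminRating.ADEQUATE,
    "Reliability": CosminRating.ADEQUATE,
    "Measurement error": CosminRating.NA,
    "Criterion validity": CosminRating.ADEQUATE,
    "Hypotheses testing for construct validity": CosminRating.VERY_GOOD,
    "Responsiveness": CosminRating.NA,
}

# Tally used to support the narrative synthesis for this scale.
print(Counter(r.value for r in ratings.values()))
```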

3 Results

3.1 Systematic literature search

A total of 1,368 abstracts were retrieved from six databases: Embase (342 records), PsycINFO (160 records), PsycExtra (84 records), PsycArticles (450 records), PsycTests (53 records), and Medline (279 records). After removing 368 duplicates, 1,000 records remained for screening. Figure 1 presents the flow chart of the studies and instruments identified during the literature search and selection process according to the PRISMA guidelines [16]. In the first stage, reviewers screened the 1,000 records by title and abstract. In the second stage, 71 records were screened by full text. Following this, 35 studies comprising 29 scales were included in the review.
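As a simple arithmetic check (not part of the original analysis), the reported flow counts can be reconciled as follows.

```python
# Arithmetic check of the PRISMA flow counts reported above.
db_counts = {"Embase": 342, "PsycINFO": 160, "PsycExtra": 84,
             "PsycArticles": 450, "PsycTests": 53, "Medline": 279}

retrieved = sum(db_counts.values())                            # 1,368 records
duplicates_removed = 368
screened_by_title_abstract = retrieved - duplicates_removed    # 1,000 records
screened_by_full_text = 71
included_studies, included_scales = 35, 29

assert retrieved == 1368
assert screened_by_title_abstract == 1000
print(retrieved, screened_by_title_abstract, screened_by_full_text,
      included_studies, included_scales)
```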

Fig. 1 PRISMA Flowchart of Selecting Included Studies and Scales

3.2 General characteristics of the scales

The publication years ranged from 1985 to 2023. The scales originated from various countries: Australia (n = 3), Brazil (n = 1), Canada (n = 1), China (n = 3), Ecuador (n = 1), Ethiopia (n = 1), Hong Kong (n = 1), Japan (n = 2), Netherlands (n = 1), Poland (n = 1), Mali (n = 1), Singapore (n = 1), Spain (n = 1), Switzerland (n = 1), Taiwan (n = 1), Turkey (n = 1), and the USA (n = 10). The number of items across the 29 scales ranged from 5 to 79, with the number of subscales ranging from 1 to 9. Two scales were unidimensional and 27 were multidimensional. Across the 29 scales, the most common subscales were financial compensation (n = 20) and peer support (n = 11), followed by autonomy (n = 4). Only one scale reported a completion time [19], averaging 12 minutes. Fifteen scales provided a single total score across all dimensions, 13 provided subscale scores in addition to a total score, and one scale did not clarify its scoring method (Table 1).
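Summaries of this kind follow directly from the extraction table. The sketch below illustrates the aggregation with a few hypothetical rows; the real characteristics are reported in Table 1.

```python
import pandas as pd

# Hypothetical extraction rows; the real characteristics are in Table 1.
scales = pd.DataFrame([
    {"scale": "Scale A", "country": "USA", "n_items": 34, "n_subscales": 5},
    {"scale": "Scale B", "country": "Australia", "n_items": 16, "n_subscales": 4},
    {"scale": "Scale C", "country": "USA", "n_items": 5, "n_subscales": 1},
])

print(scales["country"].value_counts())           # scales per country
print(scales["n_items"].agg(["min", "max"]))      # range of item counts
print((scales["n_subscales"] == 1).sum(), "unidimensional scale(s)")
```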

Table 1 General characteristics of job satisfaction scales identified during the systematic review

3.3 Most frequently used scales

The scales used most frequently across the 35 studies were the Job Content Questionnaire [20] and the Job Satisfaction Scale [21]. The Job Content Questionnaire has an α of 0.73 and has been widely used across the literature, demonstrating its applicability for measuring job satisfaction. However, as this scale has 34 items, it may be time-consuming for medical doctors to complete. The Job Satisfaction Scale, whose α values ranged from 0.66 to 0.83, is also recommended for measuring job satisfaction in medical doctors.

3.4 Healthcare professionals

The majority of the scales were developed for physicians (n = 16), followed by emergency physicians (n = 2), rural physicians (n = 2), general practitioners (n = 2), specialists (n = 3), clinical medical professionals (n = 1), anaesthesiologists (n = 1), cardiologists (n = 1), healthcare professionals (n = 1), paediatricians (n = 1), telemedicine practitioners (n = 1), primary care physicians (n = 1), primary healthcare professionals (n = 1), public sector medical doctors (n = 1), radiation oncologists (n = 1), rheumatologists (n = 1), and specialised care physicians (n = 1).

3.5 Psychometric properties

Tables 2 and 3 show the assessment of the psychometric properties of the scales and the risk of bias assessment according to the COSMIN checklist.

Table 2 Summary of psychometric properties of job satisfaction scales identified during the systematic review
Table 3 Risk of bias assessment using the COSMIN guidelines

3.5.1 PROM design

In terms of scale development, most scales (n = 13) described the origin of the construct and the target population for which they were developed, except for one [22]. However, most of the scales (n = 15) had inadequacies regarding the group meetings or interviews conducted to establish the PROM [23,24,25,26,27,28,29,30,31, 33,34,35].

3.5.2 Content validity

Most scales (n = 17) achieved ratings ranging from low to moderate for content validity, indicating that while they have some relevance to the constructs they aim to measure, there is room for improvement. This suggests that the items included in these scales may not fully capture the breadth and depth of the intended concepts.

3.5.3 Structural validity

The Lichtenstein Physician Job Satisfaction Scale [26] was not based on a reflective model; thus, its structural validity was not reported. Most scales (n = 20) performed confirmatory factor analysis (CFA) [36] and exploratory factor analysis (EFA) [37], but a few scales (n = 7) [20, 28, 30, 33,34,35, 38] performed these analyses inadequately. For many scales (n = 15), the Item Response Theory (IRT)/Rasch model [39] did not adequately represent the research question, affecting their structural validity. Some scales (n = 11) [20,21,22, 27, 31, 38, 40,41,42,43,44] achieved high structural validity.
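As an illustration of the factor-analytic step, the sketch below runs an exploratory factor analysis on simulated item responses using scikit-learn’s FactorAnalysis. This is only a stand-in for the analyses reported in the included studies; a confirmatory factor analysis would require dedicated structural equation modelling software.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulated Likert-style item responses: 300 respondents, 10 items driven by
# two latent factors. This stands in for the EFA step described above.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
true_loadings = rng.uniform(0.5, 0.9, size=(2, 10))
items = latent @ true_loadings + rng.normal(scale=0.5, size=(300, 10))

fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(items)
print(np.round(fa.components_, 2))   # estimated factor loadings
```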

3.5.4 Internal consistency

As the Lichtenstein Physician Job Satisfaction Scale [26] is not based on a reflective model, it did not report internal consistency. The majority of the scales (n = 25) reported good α values, ranging from 0.60 to 0.92, with most (n = 20) reporting alpha values for each subscale. However, since most scales (n = 25) used Likert scoring, the Standard Error of Theta (SE(θ)) or the reliability coefficient of the estimated latent trait value (index of subject or item separation) was not calculated.
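For reference, coefficient α is computed from the item variances and the variance of the total score. The sketch below uses simulated responses rather than data from any included study.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n respondents x k items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Six simulated items sharing a common true score.
rng = np.random.default_rng(1)
true_score = rng.normal(size=(200, 1))
responses = true_score + rng.normal(scale=0.8, size=(200, 6))
print(round(cronbach_alpha(responses), 2))
```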

3.5.5 Cross-cultural validity

Of the 29 scales, 28 reported high cross-cultural validity; the exception was the Lichtenstein Physician Job Satisfaction Scale [26], which did not report any information about cross-cultural validity.

3.5.6 Reliability

Test-retest reliability was reported for most scales (n = 18), though a few (n = 3) [25, 31, 35] did not report adequate test-retest reliability. The intraclass correlation coefficient (ICC) was calculated for only a few scales (n = 8) [21,22,23, 27, 42,43,44].
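A commonly used coefficient for test-retest data is the two-way random-effects ICC for absolute agreement, ICC(2,1). The included studies do not all specify the ICC variant used, so the sketch below is illustrative only, with hypothetical scores.

```python
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    `scores` is an (n subjects x k occasions) matrix, e.g. test and retest."""
    n, k = scores.shape
    grand = scores.mean()
    ss_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_error = ((scores - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)

# Hypothetical test-retest total scores for five doctors.
scores = np.array([[62, 60], [55, 57], [70, 68], [48, 50], [66, 65]], dtype=float)
print(round(icc_2_1(scores), 2))
```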

3.5.7 Measurement error

Despite most scales (n = 20) having Likert-based scoring, the majority failed to calculate the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC), or Limits of Agreement (LoA). Only a few scales (n = 9) [21, 22, 24, 27, 34, 42,43,44,45] calculated these metrics and the percentage agreement.
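These quantities follow from standard formulas once a reliability estimate is available: SEM = SD × √(1 − reliability), the individual-level SDC = 1.96 × √2 × SEM, and the LoA are the mean test-retest difference ± 1.96 × SD of the differences. The sketch below uses hypothetical values rather than data from any included study.

```python
import numpy as np

def sem(sd: float, reliability: float) -> float:
    """Standard Error of Measurement from the score SD and a reliability estimate."""
    return sd * np.sqrt(1 - reliability)

def sdc_individual(sem_value: float) -> float:
    """Smallest Detectable Change at the individual level."""
    return 1.96 * np.sqrt(2) * sem_value

# Hypothetical values: score SD of 8 points and a test-retest ICC of 0.85.
sem_value = sem(8.0, 0.85)
print(round(sem_value, 2), round(sdc_individual(sem_value), 2))

# Limits of Agreement from hypothetical paired test-retest scores.
test = np.array([62, 55, 70, 48, 66], dtype=float)
retest = np.array([60, 57, 68, 50, 65], dtype=float)
diff = retest - test
loa = (diff.mean() - 1.96 * diff.std(ddof=1), diff.mean() + 1.96 * diff.std(ddof=1))
print(tuple(round(x, 2) for x in loa))
```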

3.5.8 Criterion validity

Most scales (n = 24) calculated item correlations and the area under the receiver operating characteristic curve (AUC), highlighting good discriminative ability. However, a few scales (n = 5) [22, 25, 28, 41, 46] did not report these correlations. Owing to Likert scoring, only a few scales (n = 5) [22, 24, 27, 34, 43] reported sensitivity and specificity for dichotomous scores.
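The discrimination metrics mentioned here can be illustrated as follows; the criterion variable, scale scores, and cut-off in the sketch are hypothetical, and the scikit-learn functions serve only as a convenient stand-in for the analyses reported in the included studies.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical example: total scale scores used to discriminate a dichotomous
# external criterion (1 = satisfied on the criterion measure, 0 = not satisfied).
criterion = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 0])
scale_scores = np.array([40, 45, 52, 60, 66, 58, 71, 59, 63, 55])

print(round(roc_auc_score(criterion, scale_scores), 2))   # area under the ROC curve

# Sensitivity and specificity at an illustrative cut-off of 57.
predicted = (scale_scores >= 57).astype(int)
tn, fp, fn, tp = confusion_matrix(criterion, predicted).ravel()
print(tp / (tp + fn), tn / (tn + fp))   # sensitivity, specificity
```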

3.5.9 Hypotheses testing for construct validity

Most scales (n = 25) demonstrated high clarity in what the instrument measures, showed sufficient measurement properties, and provided an adequate description of important characteristics of the subgroups, except for a few scales (n = 5) [25, 28, 41, 46, 47].

4 Discussion

4.1 Key findings

Across the 29 scales reviewed, only two, the Physician Job Satisfaction Scale [43] and the CARDIOSATIS-Equipe Scale [27], achieved a high rating across all psychometric properties. The remaining scales received, on average, low to moderate ratings, although some (n = 11) achieved a high rating in certain psychometric property categories. The Physician Job Satisfaction Scale [43] assesses physician satisfaction under the Ontario Health Insurance Plan through four subscales across 16 items. It has an overall α of 0.85, indicating strong reliability. The CARDIOSATIS-Equipe Scale [27] measures physician satisfaction in pre-hospital cardiovascular care within a telehealth project. It includes 15 items across two subscales, with high reliability (α = 0.88–0.90). However, it focuses mainly on the clinical environment and lacks aspects such as wages and work-life balance.

4.2 Comparison with other literature

The Job Content Questionnaire [20] is widely recognised for its strong reliability across various work environments, making it a popular choice for assessing job satisfaction. However, its length (34 items) has been criticised, especially in contexts involving medical doctors, where time constraints are significant. This concern is echoed by Munir and Rahman [48], who argued that longer surveys might lead to lower response rates and increased respondent fatigue, ultimately affecting the accuracy of the data collected. This challenge is not unique to job satisfaction; similar issues have been noted in systematic reviews of related constructs such as employee well-being and job performance. For instance, a systematic review by Jarden et al. [49] on well-being measures highlighted that lengthy scales might not capture the nuances of employee well-being effectively due to respondent fatigue.

The Job Satisfaction Scale [21] is praised for its practicality and applicability across diverse medical environments. It efficiently covers essential aspects of job satisfaction, making it a valuable tool in healthcare settings. However, Gottert et al. [50] suggest that while this scale is useful, it might not fully address the unique challenges faced by medical professionals, such as administrative burdens and the emotional stress associated with patient care. These findings align with other studies on work-related well-being, where scales often struggle to account for the complex and multifaceted nature of job performance in high-stress environments like healthcare [51].

5 Strengths and limitations

This systematic review is the first to examine the psychometric properties of job satisfaction scales used with medical doctors. It is methodologically strong, with two reviewers involved in every stage of study selection and additional researchers to resolve disagreements, ensuring a thorough and unbiased process. The review also employed a comprehensive search strategy, covering literature from database inception to March 2024, leading to the identification of a wide range of scales.

However, the study has limitations, including the difficulty of generalising findings across all medical practitioners, since many scales were developed for specific types of doctors. Some scales lacked complete reporting of psychometric properties, and there is potential for publication bias, as scales with positive results may be more likely to be published and included, possibly skewing the findings. Moreover, conceptual heterogeneity across job satisfaction scales reduces the comparability of what is being measured, limiting both cross-study synthesis and the generalisation of findings.

5.1 Implications for future practice and research

This review recommends the Job Content Questionnaire [20] and the Job Satisfaction Scale [21] for general use in assessing job satisfaction among medical professionals. The former offers strong reliability and broad applicability, though its length may make it more suitable for research contexts. The latter is more practical for routine assessments in healthcare settings, offering a balance of reliability and conciseness.

For more specific applications, the Physician Job Satisfaction Scale [43] is appropriate for physicians working under particular healthcare plans, while the CARDIOSATIS-Equipe Scale [27] is well suited to pre-hospital cardiovascular care environments.

The limited validity (both structural and content) and inconsistent reliability observed across scales likely reflect the broad nature of the job satisfaction construct. Without a clear, shared operationalisation, different instruments may measure different facets of the construct, making cross-study comparisons problematic and limiting the generalisability of findings. Furthermore, these limitations in validity can distort the assessment of physicians’ true work experiences, resulting in ineffective organisational interventions and a failure to detect early signs of burnout. Consequently, patient-related outcomes such as care quality, safety, and satisfaction may be adversely affected. Inaccurate measurement also impedes efforts to improve physician retention and overall healthcare system performance [22].

Improving validity is warranted to address these issues. While conducting factor analyses and ensuring adequate sample size can improve construct validity, content validity can be improved by incorporating user input during the item development process. For example, cognitive interviews can help identify misinterpretations and enhance clarity, while expert panel reviews ensure comprehensive coverage of the construct. An iterative refinement based on these methods can lead to more robust, valid, and comparable instruments [36].

6 Conclusions

This systematic review identified 29 scales across 35 studies for measuring job satisfaction in medical doctors. Among these, only four scales (the Job Content Questionnaire [20], the Job Satisfaction Scale [21], the CARDIOSATIS-Equipe Scale [27], and the Physician Job Satisfaction Scale [43]) demonstrated high reliability and good validity for accurately measuring job satisfaction in medical doctors. Future research should focus on validating these scales across different medical specialties, addressing the specific stressors faced by doctors, and developing scales applicable to a broader range of healthcare workers to improve generalisability across the medical field. The findings of this study can contribute to improving job satisfaction measurement, informing healthcare policies, enhancing work environments, and ultimately improving patient care outcomes.