Screening for depression, anxiety, and general psychological distress in pre-operative surgical patients: A psychometric analysis of the Patient Health Questionnaire 4 (PHQ-4)
Léonie F. Kerper, Claudia D. Spies, Janina Tillinger, Karl Wegscheider, Anna-Lena Salz, Edith Weiss-Gerlach, Tim Neumann, Henning Krampe
About the Authors:
Department of Anesthesiology and Intensive Care Medicine, Campus Charité Mitte and Campus Virchow-Klinikum, Charité – University Medicine Berlin, Germany;
Department of Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
These authors have contributed equally to this work.
Background Although brief screening tools for depression and anxiety have proven psychometric quality in the general medical field, evidence is lacking for the perioperative setting. We investigated whether the ultra-short questionnaire PHQ-4 and its subscales Patient Health Questionnaire-2 (PHQ-2) and Generalised Anxiety Disorder Scale-2 (GAD-2) are reliable, valid and accurate screening tools for self-reported depression, anxiety and general psychological distress in surgical patients of preoperative anaesthesiological assessment clinics.
Methods This study was conducted in the context of the Bridging Intervention in Anaesthesiology programme (BRIA), which includes a computer-assisted self-assessment of psychological screening tests. In total, data of 2,852 consecutive patients were analysed. We determined Cronbach’s alpha, construct validity, factorial validity, sensitivity, specificity, negative and positive predictive value, Youden index, and ROC-AUC. As criterion measures, we used the scores of the Brief Symptom Inventory (BSI) scales for depression, anxiety, phobic anxiety, interpersonal sensitivity, and the total mean score Global Severity Index (GSI).
Results Cronbach’s alphas were 0.66, 0.78 and 0.83 for PHQ-2, GAD-2 and PHQ-4, respectively. Principal component analysis did not confirm the item allocation to PHQ-2 and GAD-2. All three scales showed good construct validity, as well as adequate accuracy with areas under the curve (AUC) between 0.80 and 0.88. PHQ-2 (≥ 3), GAD-2 (≥ 3), and PHQ-4 (≥ 6) had sensitivities between 46.4% and 61.2% and specificities between 89.4% and 94.5% with their established cut-off points and the respective BSI scales as criterion standards. With a lowered cut-off point of ≥ 4, sensitivity and specificity of the PHQ-4 total scale were 80.5% and 80.2%, respectively, for detecting clinically significant psychological distress according to the GSI.
Conclusion At a lowered cut-off point, the PHQ-4 total scale has sufficient psychometric quality to detect self-reported clinically significant psychological distress including depression and/or anxiety in surgical patients. PHQ-2 and GAD-2 are not recommended as exclusive measures of depression and anxiety in these patients.
There is ample evidence that screening for depression and anxiety is the first step of successful therapy of these disorders in patients with medical diseases (1-4). Preoperative anaesthesiological assessment clinics have proven to be an ideal setting for implementing psychosocial screening as a component of a psychotherapeutic stepped care approach for surgical patients with comorbid mental disorders (5;6). These clinics are not restricted to specific surgical fields, so a wide range of hospital patients can be addressed. However, preoperative assessment clinics are busy settings with limited resources of time and personnel, so it is necessary to search for brief screening tools that are reliable, valid, accurate and time-saving.
Although data exist that demonstrate psychometric quality of brief depression and anxiety screening instruments in the general medical field (3;7), evidence is lacking for the perioperative setting.
From the clinical perspective, an ideal stepped-care approach for surgical patients with diverse mental disorders comprises screening, detailed diagnostics, therapy and follow-up (5;8;9). In such an approach, screening should accurately identify self-reported psychological distress of any mental disorder. It is therefore especially important that the first step detects clinically significant mental distress that is associated with heterogeneous psychopathological symptoms. This is conventionally accomplished by comprehensive self-report questionnaires of psychiatric symptoms like the Brief Symptom Inventory (BSI) (10;11). As a consequence, short depression and anxiety screening tools should be highly sensitive in detecting the clinically significant depression, anxiety and general psychological distress that are measured by comprehensive self-report questionnaires. In particular, the initial screening should avoid false-negative results because, in the second step, only patients scoring positive can be examined further with a clinical interview. On the other hand, specificity should not be too low, in order to avoid an inadequately high number of diagnostic interviews with patients scoring false-positive (12).
We investigated whether the ultra-short Patient Health Questionnaire-4 (PHQ-4) and its subscales Patient Health Questionnaire-2 (PHQ-2) and Generalised Anxiety Disorder Scale-2 (GAD-2) are reliable and valid screening tools for self-reported depression, anxiety and general psychological distress of surgical patients in the preoperative anaesthesiological assessment clinic. As criterion standards, we used diverse scales of the BSI, a well-established and validated self-report questionnaire for a wide range of symptoms of psychological distress and mental disorders (10;11;13).
Material and Methods
Setting, study design and patient sample
This prospective observational study is a part of the feasibility study on BRIA which was approved by the Ethics Committee of Charité University Medicine Berlin [EA1/23/2004, Amendment April 2009] and was conducted according to the principles expressed in the Declaration of Helsinki. The full details of the setting, assessment instruments and recent sub-studies of the BRIA project are available elsewhere (5;8;9). We collected preoperative psychosocial questionnaire data with a computer-assisted self-assessment, including screening for depression, anxiety and general psychological distress. This assessment took place before the anaesthesiological examination in the preoperative assessment clinics of the Charité – University Medicine Berlin. Six months after the preoperative assessment, we obtained medical data from the electronic patient management system of the hospital.
The computer-assisted preoperative self-assessment took place from Monday to Friday between 9.00 am and 5.00 pm in order to cover the complete opening hours of the assessment clinics. Inclusion criteria were: patient in a preoperative anaesthesiological assessment clinic, sufficient knowledge of German language, age ≥18 years, written informed consent. Exclusion criteria were: urgent or emergency surgery; inability to attend the preoperative assessment clinic (bedside visit); members of the hospital staff; relatives of the study team; study participation in another clinical trial; homelessness; admitted in police custody; unwilling to use or incapable of using a computer. After having been properly instructed, patients supplied written informed consent to participate in the study.
From January 2010 to June 2010, we assessed 7,178 patients for eligibility. Of these, 3,157 were not eligible according to the inclusion/exclusion criteria, 953 refused to participate, and the datasets of 216 patients could not be used for data analysis because of missing data in the PHQ-4 and/or the BSI. As a result, data of 2,852 patients were analysed in this study. Figure 1 shows the details of the inclusion process.
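The participant flow described above can be checked with simple arithmetic. The short Python sketch below only restates the counts given in the text; the variable names are illustrative and not part of the study.

```python
# Counts as reported in the text; variable names are illustrative.
assessed = 7178      # patients assessed for eligibility
not_eligible = 3157  # excluded by inclusion/exclusion criteria
refused = 953        # declined to participate
missing_data = 216   # missing data in the PHQ-4 and/or the BSI

analysed = assessed - not_eligible - refused - missing_data
print(analysed)  # 2852
```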
The PHQ-4 is an ultra-short screening tool for depression and anxiety that combines the first two items of each of the scales Patient Health Questionnaire-9 (PHQ-9) and Generalised Anxiety Disorder-7 (GAD-7) (3). This 4-item self-report questionnaire thus consists of two 2-item subscales, the depression scale PHQ-2 and the anxiety scale GAD-2 (14). The items of the PHQ-4 measure core symptoms of depressive disorders (loss of interest, depressed mood) and generalised anxiety disorder (feeling nervous and anxious, difficulty stopping or controlling worrying). The questionnaire starts with the general question “Over the last two weeks, how often have you been bothered by the following problems?” and continues by asking about the four symptoms, which are rated on a 4-point scale from 0 (‘not at all’) to 3 (‘nearly every day’). The total sum score ranges from 0 to 12, with a range of 0 to 6 for each of the two subscales. An additional single item that is not included in any of the scale sum scores asks for the extent of the respondent’s subjective psychosocial symptom-related impairment.
A PHQ-2 score ≥ 3 indicates clinically significant depression, and a GAD-2 score ≥ 3 indicates clinically significant anxiety (3). Because it corresponds to a percentile rank of 95.7 of the normative data of a large German population sample, a PHQ-4 total score cut-off point ≥ 6 has been recommended as an indicator of the presence of a depressive or an anxiety disorder (4).
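The scoring rules and established cut-off points described above can be written down in a few lines. The following Python sketch is illustrative only: the function and class names are our own, and we assume the item order used in this article (items 1 and 2 form the PHQ-2, items 3 and 4 form the GAD-2).

```python
from dataclasses import dataclass

# Established cut-off points cited in the text.
PHQ2_CUTOFF = 3
GAD2_CUTOFF = 3
PHQ4_CUTOFF = 6


@dataclass
class Phq4Result:
    phq2: int
    gad2: int
    total: int
    depression_positive: bool
    anxiety_positive: bool
    total_positive: bool


def score_phq4(items):
    """Score four item responses coded 0-3 (items 1-2: PHQ-2, items 3-4: GAD-2)."""
    if len(items) != 4 or any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError("expected four item responses coded 0-3")
    phq2 = items[0] + items[1]  # depression subscale, range 0-6
    gad2 = items[2] + items[3]  # anxiety subscale, range 0-6
    total = phq2 + gad2         # total sum score, range 0-12
    return Phq4Result(
        phq2, gad2, total,
        phq2 >= PHQ2_CUTOFF,
        gad2 >= GAD2_CUTOFF,
        total >= PHQ4_CUTOFF,
    )


# Hypothetical respondent, for illustration only.
result = score_phq4([1, 2, 1, 1])
print(result)
```

In this hypothetical case the respondent screens positive on the PHQ-2 (score 3) but not on the GAD-2 (score 2) or the PHQ-4 total scale (score 5).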
Kroenke et al. investigated the psychometric properties of the PHQ-4 in a sample of 2,149 primary care patients (14). They reported good reliability with Cronbach’s alphas of 0.85, 0.81 and 0.82 for the total scale, the PHQ-2 and the GAD-2, respectively. Factor analysis confirmed two suggested factors with the two depression items loading highest on factor 1 and the two anxiety items loading on factor 2. Construct validity has been shown by adequate associations with diverse domains of health-related quality of life, as well as self-reported disability days and physician visits. Reliability, factorial and construct validity were confirmed in a large general population sample (4). Criterion validity of the PHQ-2 and the GAD-2 was tested in separate studies which used structured clinical interview diagnoses according to DSM-IV as criterion standards.
The PHQ-2 cut-off point of ≥ 3 had a sensitivity of 0.83 and a specificity of 0.90 for ‘major depressive disorder’, as well as a sensitivity of 0.62 and a specificity of 0.95 for ‘any depressive disorder’ in a sample of 580 US-American clinical patients (15). In a German sample of 520 medical outpatients, criterion validity of the cut-off point ≥ 3 was confirmed with a sensitivity of 0.87 and a specificity of 0.78 for ‘major depressive disorder’, as well as a sensitivity of 0.79 and a specificity of 0.86 for ‘any depressive disorder’ (16). In a clinical sample of 965 US-American patients, the GAD-2 cut-off point ≥ 3 had sensitivities of 0.86, 0.76, 0.70, 0.59 and 0.65, as well as specificities of 0.83, 0.81, 0.81, 0.81 and 0.88 for the criterion standards generalised anxiety disorder, panic disorder, social anxiety disorder, posttraumatic stress disorder and any anxiety disorder, respectively (17).
The Brief Symptom Inventory (BSI) is an internationally widely used and validated 53-item self-report scale of symptoms of psychological distress. This short form of the Symptom Checklist 90-R (SCL-90-R) has proven sound psychometric properties in community samples as well as in samples of patients with medical conditions and mental disorders (10;11;13;18). The 53 items measure severity of diverse symptoms of mental disorders during the past 7 days and are rated on a 5-point scale from 0 (not at all) to 4 (extremely). The scale consists of 9 subscales of symptom dimensions with mean scale scores ranging from 0 to 4. The total scale score Global Severity Index (GSI) is the mean of all 53 items. It reflects both the number of symptoms and intensity of perceived distress. Whereas previous studies challenged the symptom dimensions and suggested a unidimensional structure of the BSI, recent research has shown superiority of a bifactorial model with one factor reflecting general psychological distress and a second factor consisting of the domain-specific symptom dimensions (19). We used the scale scores depression, anxiety, phobic anxiety and interpersonal sensitivity, as well as the GSI. According to the test manual, the cut-off point for clinically significant symptoms is, for all scales, a T-score of T ≥ 63 relative to the normative population sample (10;11). In the present sample, the reliability coefficients of the scales ranged from sufficient to excellent with Cronbach’s alphas of 0.85, 0.79, 0.74, 0.78 and 0.96 for depression, anxiety, phobic anxiety, interpersonal sensitivity, and the total scale, respectively.
Data were entered into a computerised database and statistical analyses were performed with IBM SPSS Statistics, Version 21. Cases with any missing item in the PHQ-4 were excluded from analyses, as were, in accordance with the BSI test manual, cases with more than 1 missing item in any of the four investigated subscales or more than 13 missing items in the BSI total scale (11). Descriptive results were expressed as follows: frequencies and percent; median (Md) and range of the 25th-75th percentiles (interquartile range IQR); mean (M) and standard deviation (SD); mode; skewness, kurtosis and the respective standard errors (SE); minimum, maximum. Reliability of the scales was studied by Cronbach’s alpha, Pearson correlations and principal component analysis (PCA). Validity was studied using correlations, as well as sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), positive likelihood ratio (LR) and ROC-AUC analyses. Comparison of different cut-off points was based on the Youden index. In order to estimate which corresponding sizes of sensitivity and specificity can be considered an optimal trade-off, we followed the suggestion of Löwe et al. (2004) for a two-stage screening (screening followed by further examination, e.g. a structured clinical interview) (12): the cut-off points considered best were those with maximum sensitivity, provided that sensitivity exceeded specificity and specificity was at least 75%. All 95% confidence intervals were calculated with the confidence interval calculator (20) or SPSS. As criterion standard measures, we used clinically significant depression, anxiety, phobic anxiety, interpersonal sensitivity and general psychological distress as measured with the respective BSI scales. A two-tailed p-value < 0.05 was considered statistically significant.
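The accuracy measures named above all follow from a 2×2 table of screening result against criterion standard. The Python sketch below is illustrative only: the analyses in this study were run in SPSS, the function name is our own, and the counts are hypothetical rather than study data.

```python
def screening_metrics(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table (screening result vs. criterion standard)."""
    sensitivity = tp / (tp + fn)  # criterion-positive patients who screen positive
    specificity = tn / (tn + fp)  # criterion-negative patients who screen negative
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),    # positive predictive value
        "npv": tn / (tn + fn),    # negative predictive value
        "lr_plus": sensitivity / (1 - specificity),  # positive likelihood ratio
        "youden": sensitivity + specificity - 1,     # Youden index
    }


# Hypothetical counts for illustration only (not study data).
metrics = screening_metrics(tp=80, fp=40, fn=20, tn=160)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the predictive values, unlike sensitivity and specificity, depend on how common the criterion condition is in the sample.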
Demographic, medical and psychological characteristics of the 2,852 study participants are summarised in Table 1. The patients had a median age of 47 years, and the proportions of women and men were nearly equal. Concerning preoperative physical health, the majority of the patients were evaluated as healthy or as having mild systemic disease and no functional limitations. There were moderately higher percentages of patients living with a partner, having no university entrance qualification and being treated in the abdominal and thoracic surgical field. The frequency of clinically significant psychological distress ranged from 9.2% for interpersonal sensitivity to 14.6% for general psychological distress.
Item and scale characteristics
Detailed item and scale characteristics of PHQ-2, GAD-2 and PHQ-4 are shown in Tables 2 and 3. Concerning item intercorrelations, it is noteworthy that the PHQ-2 item 2 has stronger correlations with the GAD-2 items 3 (0.64) and 4 (0.67) than with the PHQ-2 item 1 (0.50) (Table 2). Whereas reliability in terms of Cronbach’s alpha is sufficient for the GAD-2 (0.78) and good for the PHQ-4 (0.83), it is rather low for the PHQ-2 (0.66). Using the originally established cut-off points, the scales indicate that the rates of clinically significant depression, anxiety, as well as depression or anxiety are 17.4%, 13.3% and 12.2%, respectively.
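The link between the item intercorrelation and the low alpha of the PHQ-2 can be made explicit. The Python sketch below is illustrative only (the study used SPSS): `cronbach_alpha` computes the usual raw coefficient from item-level data, here a small hypothetical data set, and `standardized_alpha` is the standardised variant based on the mean inter-item correlation, which we use as an approximation. Function names are our own.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Raw Cronbach's alpha; rows is a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standardized_alpha(k, mean_r):
    """Standardised alpha from the number of items k and the mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical two-item responses, for illustration only (not study data).
rows = [[0, 0], [1, 1], [2, 1], [3, 2], [1, 2], [0, 1]]
alpha_raw = cronbach_alpha(rows)
print(round(alpha_raw, 3))

# Reported correlation of PHQ-2 items 1 and 2 (Table 2): r = 0.50.
alpha_phq2 = standardized_alpha(2, 0.50)
print(round(alpha_phq2, 2))  # 0.67
```

For a two-item scale, a correlation of 0.50 between the items gives a standardised alpha of about 0.67, which is close to the reported 0.66; the raw coefficient can be slightly lower when the two items differ in variance.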
Principal component analysis of the PHQ-4 items indicates that there is only one factor with an eigenvalue above 1. It explains 67% of the total variance. The items 1, 2, 3 and 4 have loadings on this factor of 0.693, 0.873, 0.833 and 0.854, respectively. Including a second factor in the PCA revealed that the two factors explained 83% of the total variance. After varimax rotation, the first factor explained 55% and the second factor 28% of the total variance. However, the rotated component matrix does not confirm the original item allocation to PHQ-2 and GAD-2. Whereas item 1 loads higher on factor 2 (0.962), the items 2, 3 and 4 load higher on factor 1 (0.809, 0.861 and 0.851).
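The reported loadings and the explained variance of the single-factor solution are consistent with each other, as a short calculation shows. The Python sketch below is illustrative only and assumes a PCA on the correlation matrix of the four items, so that each item contributes a variance of 1 and the eigenvalue of an unrotated component equals the sum of its squared loadings.

```python
# Unrotated loadings of items 1-4 on the first component, as reported in the text.
loadings = [0.693, 0.873, 0.833, 0.854]

# Assumption: PCA on the correlation matrix, so total variance = number of items.
eigenvalue = sum(l ** 2 for l in loadings)
proportion = eigenvalue / len(loadings)

print(round(eigenvalue, 2), round(proportion, 2))  # 2.67 0.67
```

The resulting proportion of about 0.67 matches the 67% of total variance reported for the first component.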
Associations with parameters that are different from depression, anxiety and general psychological distress demonstrate a clear pattern (Table 4). All 3 PHQ-4 scales correlate only weakly with age, gender, partnership status, education and physical health, indicating good discriminant validity. Convergent validity is demonstrated by moderate to strong correlations with the BSI scales depression, anxiety, phobic anxiety, interpersonal sensitivity and GSI, as well as the PHQ single item of subjective psychosocial impairment (Table 4).
Table 5 summarises sensitivity, specificity, positive likelihood ratio, positive predictive value and negative predictive value of the PHQ-2, GAD-2 and PHQ-4 with the established cut-off points and the five BSI scales as criterion standards. Across all criterion standards, sensitivities of the PHQ scales are low with a range between 46.4% for the GAD-2 with the criterion BSI phobic anxiety, and 61.2% for the PHQ-2 with the criterion BSI depression. On the other hand, specificities of the PHQ scales are high with a range between 89.4% for the PHQ-2 with the criterion BSI depression, and 94.5% for the PHQ-4 with BSI-GSI as criterion. Correspondingly, positive predictive values are low and negative predictive values are high.
Sensitivities and specificities resulting from different cut-off points are visualised with ROC curves in Figure 2. PHQ-2, GAD-2 and PHQ-4 are related to clinically significant depression, anxiety and general psychological distress as measured with the respective BSI scales. The AUC as a measure of classification performance was greatest for the PHQ-4 with BSI-GSI as criterion standard (0.88). With values between 0.80 and 0.87, the AUCs of PHQ-2 and GAD-2 can also be considered as good.
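The AUC can be read as the probability that a randomly chosen patient with clinically significant symptoms on the criterion scale has a higher screening score than a randomly chosen patient without, with ties counted as one half. The Python sketch below illustrates this rank-based definition; it is not the SPSS procedure used in this study, the function name is our own, and the scores are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical PHQ-4 total scores, for illustration only (not study data).
cases = [4, 6, 3, 8, 5]             # criterion-positive patients
non_cases = [0, 1, 2, 3, 1, 0, 4]   # criterion-negative patients

auc = roc_auc(cases, non_cases)
print(round(auc, 3))
```

The pairwise loop is quadratic in the sample size, which is acceptable for a sketch; rank-based implementations give the same value more efficiently.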
Table 6 summarises sensitivity, specificity and Youden index of PHQ-2, GAD-2 and PHQ-4 at various tentatively selected cut-off points. For all three PHQ scales the highest Youden index can be observed at cut-off points that are lower than the established cut-off points. The present data indicate cut-off points of ≥ 2 for PHQ-2 and GAD-2, as well as ≥ 4 for the PHQ-4. These points yield sensitivities from 76.1% to 90.3% and specificities from 67.1% to 75.0% for PHQ-2 and GAD-2, respectively. With a cut-off point of ≥ 4, sensitivity and specificity of the PHQ-4 total scale are 80.5% and 80.2%, respectively, for detecting clinically significant psychological distress according to the GSI. Regarding the suggestion of Löwe et al. (2004) for an adequate trade-off between sensitivity and specificity (12), only the GAD-2 with criterion BSI anxiety and the PHQ-4 with the criterion BSI-GSI show specificities of at least 75% and sensitivities that are above the respective specificity. The corresponding two lines of Table 6 are highlighted in bold.
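The comparison of cut-off points can be retraced for the PHQ-4 at ≥ 4 with the BSI-GSI as criterion. The Python sketch below is illustrative only: function names are our own, the inputs are the rounded percentages reported in the text, and the predictive values are a rough back-calculation that assumes the reported 14.6% frequency of clinically significant general psychological distress as prevalence. They are not values taken from the tables.

```python
def youden_index(sens, spec):
    return sens + spec - 1


def meets_two_stage_criterion(sens, spec, min_spec=0.75):
    """Trade-off rule described in the Methods: specificity of at least 75%
    and sensitivity above specificity."""
    return spec >= min_spec and sens > spec


def predictive_values(sens, spec, prevalence):
    """PPV and NPV implied by sensitivity, specificity and prevalence."""
    tp = sens * prevalence
    fn = (1 - sens) * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# PHQ-4 at cut-off >= 4 against the BSI-GSI (values as reported in the text).
sens, spec, prevalence = 0.805, 0.802, 0.146

j = youden_index(sens, spec)
ok = meets_two_stage_criterion(sens, spec)
ppv, npv = predictive_values(sens, spec, prevalence)
print(round(j, 3), ok, round(ppv, 2), round(npv, 2))
```

Under these assumptions roughly four in ten patients scoring ≥ 4 would be criterion-positive, while a negative screening result would be correct in about 96% of cases, which illustrates why a second diagnostic stage is needed after a positive screen.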
To our knowledge, this is the first study that investigated psychometric quality of an ultra-short screening tool for self-reported depression and anxiety in surgical patients. The most important result is that the PHQ-4 proved to be a reliable and valid measure that accurately detected self-reported clinically significant general psychological distress, including depression and/or anxiety, in a sample of 2,852 preoperative surgical patients. Reliability and construct validity of the PHQ-4 total scale are demonstrated by an alpha of 0.83, weak correlations with parameters not measuring psychological distress, as well as strong correlations with the BSI-GSI, the BSI depression and anxiety scales, and the PHQ-4 psychosocial impairment item. Criterion validity is indicated by a large AUC of 0.88 with the BSI-GSI as criterion standard. Analysis of operating characteristics of the PHQ-4 total scale suggests that not the established cut-off point of ≥ 6 but a lowered cut-off point of ≥ 4 yields the best trade-off between sensitivity (80.5%) and specificity (80.2%).
Comparison with other studies
Previous studies demonstrated good psychometric properties of the PHQ-4, the PHQ-2 and the GAD-2 (4;14-17). Recent studies confirmed these results with more evidence for the PHQ-2 (21-26) than for the GAD-2 (27;28). Studies using the PHQ-4 as a stand-alone questionnaire did not re-evaluate its psychometric properties but used its subscales in the context of clinical objectives or to evaluate other instruments (8;29-31). Our results concerning PHQ-2 and GAD-2 as explicit subscales of the PHQ-4 are rather complicated and in some points inconsistent with previous research. A comprehensive discussion of the detailed item and scale characteristics is beyond the scope of this article. We decided to include these details in Tables 2 and 3 in order to stimulate further comparisons of our data with independent data sets based on clinical and population samples. In the following discussion we focus on the most important points. While reliability of the PHQ-2 was weak, it was sufficient for the GAD-2. Both scales showed good construct validity. Adequately large AUCs indicate good criterion validity. However, PCA and analyses of operating characteristics advise caution when applying the subscales PHQ-2 and GAD-2 with the established cut-off points of ≥ 3 to detect self-reported clinically significant depression and anxiety. The accuracy of both scales could be moderately increased by using cut-off points of ≥ 2. This result is consistent with findings in recent studies (22-25;27). However, the trade-off between sensitivity and specificity could not be optimised by this strategy. Sensitivities that were adequately high to avoid a large number of false-negative results in the first step of a two-stage screening were now associated with specificities hardly reaching 75%. This implies that more patients would score false-positive and would have to be examined clinically in the diagnostic stage. Finally, it has to be mentioned that, inconsistent with previous studies (4;14), the original allocation of the PHQ-4 items to PHQ-2 and GAD-2 could not be confirmed by PCA. Therefore, the subscales, at least in the present sample, do not seem to be suited to classify patients into groups of only depression, only anxiety or both depression and anxiety.
The major limitation of our study is that we could not determine diagnostic accuracy of the PHQ-4 with diagnoses of mental disorders as criterion standards. This lies in the nature of the two-stage screening because only patients scoring positive in the first step are examined in the second step of clinical diagnostics. A second limitation is that our results are based on a first in-sample evaluation of the PHQ-4 as a stand-alone questionnaire. They have not yet been tested out-of-sample using an independent data set. Interestingly, to our knowledge, there are no other psychometric analyses of the PHQ-4 as a stand-alone instrument. Previous studies on its reliability and validity assessed the items of the PHQ-4 simultaneously in the context of at least one of its parental scales PHQ-9, PHQ-8 and/or GAD-7 (4;14-17). As a consequence, the PHQ-4 as a stand-alone instrument should be re-evaluated in further psychometric studies before it is widely used in clinical research and practice.
Clinical implications and conclusion
Evidence-based recommendations point out that screening for psychological distress in patients with medical diseases is only reasonable when it is integrated in therapeutic care that comprises adequate psychological assessment, diagnostics and therapy (1;3;4). For the perioperative setting, the BRIA programme has been suggested as a feasible option of a psychotherapeutic stepped care approach that combines screening, brief intervention, motivational interviewing, basic elements of cognitive behavioral therapy and follow-up booster sessions (5;8;9). As a research instrument, the computer-assisted self-assessment of BRIA was rather long and comprehensive. The question arises to what extent the results of the present study may contribute to developing an efficient and short screening for mental distress in the clinical routine of psychotherapeutic stepped care of surgical patients.
Our data support the inclusion of the PHQ-4 in a shorter computer-assisted psychosocial self-assessment of surgical patients because its total scale showed sufficient psychometric properties to measure clinically significant general psychological distress including depression and/or anxiety. However, our data do not confirm the original item allocation of PHQ-2 and GAD-2, and they also suggest that both subscales risk producing either false-negative (cut-off ≥ 3) or false-positive (cut-off ≥ 2) screening results. As a consequence, we advise against using PHQ-2 and GAD-2 as exclusive measures of depression and anxiety in preoperative surgical patients. Previous research on depression screening demonstrated several ways of improving the accuracy of short screening tools. Among the easiest strategies are the inclusion of items asking for the patients’ subjective need of therapeutic help (32) and the combination of a first-stage screening with an ultra-short tool at a low cut-off point with an additional second-stage screening using a longer and more accurate instrument of 5 to 15 items (7). In our opinion, the PHQ-4 should be both refined and re-evaluated accordingly in further studies before reconsidering the role of its subscales PHQ-2 and GAD-2 in psychotherapeutic stepped care of surgical patients.
The authors wish to thank the teams of the preoperative anaesthesiological assessment clinics and of BRIA (Bridging Intervention in Anaesthesiology), Department of Anesthesiology and Intensive Care Medicine, Campus Charité Mitte and Campus Virchow-Klinikum, Charité - University Medicine Berlin, for the excellent help with patient care, data collection, and analysis.
Conception and design: LFK, CDS, JT, HK. Acquisition of data: LFK, JT, ALS, HK. Analysis and interpretation of data: LFK, CDS, JT, KW, ALS, EWG, TN, HK. Drafting the article: LFK, CDS, JT, ALS, HK. Revising the article critically for important intellectual content: LFK, CDS, JT, KW, ALS, EWG, TN, HK. Final approval of the version to be published: LFK, CDS, JT, KW, ALS, EWG, TN, HK.
Funding
This work was supported by the DFG (German Research Foundation, Grant KR 3836/3-1).
(1) U.S. Preventive Services Task Force: Screening for depression in adults: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med 2009; 151:784-92.
(2) Katon W, Roy-Byrne P: Anxiety disorders: efficient screening is the first step in improving outcomes. Ann Intern Med 2007; 146:390-2.
(3) Kroenke K, Spitzer RL, Williams JB, Löwe B: The Patient Health Questionnaire somatic, anxiety, and depressive symptom scales: a systematic review. Gen Hosp Psychiatry 2010; 32:345-59.
(4) Löwe B, Wahl I, Rose M, et al. A 4-item measure for depression and anxiety: Validation and standardization of the Patient Health Questionnaire-4 (PHQ-4) in the general population. J Affect Disord 2010; 122:86-95.
(5) Lange LF, Spies CD, Weiß-Gerlach E, et al. Bridging Intervention in Anaesthesiology: First results on treatment need, demand and utilization of an innovative psychotherapy program for surgical patients. Clin Health Promot 2011; 1:41-9.
(6) Linnen H, Krampe H, Neumann T, et al. Depression and essential health-risk factors in surgical patients in the preoperative anesthesiological assessment clinic. Eur J Anaesthesiol 2011; 28:733-41.
(7) Mitchell A, Coyne J: Do ultra-short screening instruments accurately detect depression in primary care? A pooled analysis and meta-analysis of 22 studies. Br J Gen Pract 2007; 57:144-51.
(8) Kerper LF, Spies CD, Lößner M, et al. Persistence of psychological distress in surgical patients with interest in psychotherapy: Results of a 6-month follow-up. PLOS ONE 2012; 7:e51167.
(9) Kerper LF, Spies CD, Buspavanich P, et al. Preoperative depression and hospital length of stay in surgical patients. Minerva Anestesiol 2013; epub ahead of print.
(10) Derogatis LR: The Brief Symptom Inventory (BSI): Administration, scoring and procedures manual (3rd ed.). 1993, Minneapolis, MN, National Computer System.
(11) Franke GH: Brief Symptom Inventory (BSI). 2000, Göttingen, Beltz Test.
(12) Löwe B, Spitzer R, Gräfe K, et al. Comparative validity of three screening questionnaires for DSM-IV depressive disorders and physicians diagnoses. J Affect Disord 2004; 78:131-40.
(13) Derogatis LR, Melisaratos N: The Brief Symptom Inventory (BSI): an introductory report. Psychol Med 1983; 13:595-605.
(14) Kroenke K, Spitzer RL, Williams JB, Monahan PO, Loewe B: An ultra-brief screening scale for anxiety and depression: The PHQ–4. Psychosomatics 2009; 50:613-21.
(15) Kroenke K, Spitzer R, Williams J: The Patient Health Questionnaire-2: Validity of a two-item depression screener. Med Care 2003; 41:1284-92.
(16) Löwe B, Kroenke K, Graefe K: Detecting and monitoring depression with a two-item questionnaire (PHQ-2). J Psychosom Res 2005; 58:163-71.
(17) Kroenke K, Spitzer RL, Williams JB, Monahan PO, Loewe B: Anxiety disorders in primary care: prevalence, impairment, comorbidity, and detection. Ann Intern Med 2007; 146:317-25.
(18) Geisheim C, Hahlweg K, Fiegenbaum W, Frank M, Schröder B, von Witzleben I: The German version of the Brief Symptom Inventory (BSI): Reliability and validity in a sample of outpatient psychotherapy patients [Das Brief Symptom Inventory (BSI) als Instrument zur Qualitätssicherung in der Psychotherapie]. Diagnostica 2002; 48:28-36.
(19) Thomas M: Rewards of bridging the divide between measurement and clinical theory: demonstration of a bifactor model for the Brief Symptom Inventory. Psychological Assessment 2012; 24:101-13.
(20) Herbert R: Confidence Interval Calculator. 2013. http://www.pedro.org.au/english/downloads/confidence-interval-calculator/ Accessed on January 25, 2014.
(21) Li C, Friedman B, Conwell Y, Fiscella K: Validity of the Patient Health Questionnaire 2 (PHQ-2) in identifying major depression in older people. J Am Geriatr Soc 2007; 55:596-602.
(22) Bennett I, Coco A, Coyne J, et al. Efficiency of a two-item pre-screen to reduce the burden of depression screening in pregnancy and postpartum: An IMPLICIT Network Study. J Am Board Fam Med 2008; 21:317-25.
(23) Thombs B, Ziegelstein R, Whooley M: Optimizing detection of major depression among patients with coronary artery disease using the Patient Health Questionnaire: Data from the Heart and Soul Study. J Gen Intern Med 2008; 23:2014-217.
(24) Arroll B, Goodyear-Smith F, Crengle S, et al. Validation of PHQ-2 and PHQ-9 to screen for major depression in the primary care population. Ann Fam Med 2010; 8:348-53.
(25) Chae S, Chae M, Tyndall A, Ramirez M, Winter R: Can we effectively use the two-item PHQ-2 to screen for postpartum depression? Fam Med 2012; 44:698-703.
(26) Allgaier A, Pietsch K, Fruehe B, Sigl-Gloeckner J, Schulte-Körne G: Screening for depression in adolescents: Validity of the Patient Health Questionnaire in pediatric care. Depress Anxiety 2012; 29:906-13.
(27) Delgadillo J, Payne S, Gilbody S, et al. Brief case finding tools for anxiety disorders: Validation of GAD-7 and GAD-2 in addictions treatment. Drug Alcohol Depend 2012; 125:37-42.
(28) García-Campayo J, Zamorano E, Ruiz M, Pérez-Páramo M, López-Gómez V, Rejas J: The assessment of generalized anxiety disorder: psychometric validation of the Spanish version of the self-administered GAD-2 scale in daily medical practice. Health Qual Life Outcomes 2012; 10:114.
(29) Bagley C, Rendas-Baum R, Maglinte G, et al. Validating migraine-specific quality of life questionnaire v2.1 in episodic and chronic migraine. Headache 2012; 52:409-21.
(30) Wang S-J, Wang P-J, Fuh J-L, Peng K-P, Ng K: Comparisons of disability, quality of life, and resource use between chronic and episodic migraineurs: A clinic-based study in Taiwan. Cephalalgia 2012; 33:171-81.
(31) Glaesmer H, Braehler E, Grande G, Hinz A, Petermann F, Romppel M: The German Version of the Hopkins Symptoms Checklist-25 (HSCL-25) - Factorial structure, psychometric properties, and population-based norms. Compr Psychiatry 2014; 55:396-403.
(32) Arroll B, Goodyear-Smith F, Kerse N, Fishman T, Gunn J: Effect of the addition of a “help” question to two screening questions on specificity for diagnosis of depression in general practice: diagnostic validity study. BMJ 2005; 331:884.