A Review of Commonly Used Self-Report Assessments of Sleep and Behavioral Measures of Sleep in an Adult Population
Question: Sleep disturbances (e.g., nightmares, insomnia, poor sleep quality) frequently co-occur with psychopathology, especially anxiety and depression (Morin & Ware, 1996). However, many studies investigating sleep disturbance have utilized self-report, as opposed to behavioral measures (e.g., actigraphy). With the rise of commercially available sleep tracking devices (e.g., FitBit, JawBone), the validity of these behavioral measures could provide objective information to primary healthcare providers on patients’ sleep patterns. First, critically review the psychometric properties of commonly used self-report sleep measures and sleep diaries compared to commercial (e.g., FitBit) and non-commercial behavioral measures. Second, discuss how the addition of behavioral measures, in conjunction with others, may help elucidate the role of sleep in the development and treatment of at least one form of psychopathology.
Sleep disturbances include a multitude of sleep problems and can refer to specific sleep complaints or specific sleep disorders. The Association of Sleep Disorders Centers (ASDC) categorizes sleep disorders into four categories: disorders of initiating and maintaining sleep, disorders of excessive somnolence (i.e., sleepiness), disorders of the sleep-wake schedule, and dysfunctions associated with sleep, sleep stages, or partial arousals (Association of Sleep Disorders Centers & the Association for the Psychophysiological Study of Sleep, 1979). The Diagnostic and Statistical Manual of Mental Disorders (5th Edition; DSM-5) includes primary sleep disorders and two specific subcategories of sleep-wake disorders: parasomnias (unusual behavior or physiological events during sleep that interrupt sleep) and breathing-related sleep disorders (American Psychiatric Association [APA], 2013). It is estimated that 50-70 million adults in the United States have a sleep disorder (Institute of Medicine, 2006). While there are many categories and types of sleep disorders, the most common problem is insomnia (APA, 2013). Each year, 33%-40% of US adults report symptoms of insomnia (Simon & VonKorff, 1997; APA, 2013) and 6%-10% meet diagnostic criteria for insomnia (APA, 2013).
Not only are sleep-wake disorders common, they are also highly comorbid with other mental health disorders. For example, 40-50% of adults with insomnia have an additional mental health diagnosis (APA, 2013). Ford and Kamerow (1989) examined the relationship between sleep and psychiatric symptoms as part of the National Institute of Mental Health Epidemiologic Catchment Area study. They found that 40% of individuals with insomnia and 47% of individuals with hypersomnia met criteria for a psychiatric disorder, while only 16% of those without sleep complaints met criteria. Similarly, parasomnias (e.g., night terrors and nightmares) are related to increased trait-anxiety and elevations on clinical scales of the Minnesota Multiphasic Personality Inventory (MMPI), specifically depression and psychasthenia, which measures phobias, obsessions, compulsions, or excessive anxiety (Kales et al., 1980).
In addition, sleep disturbances are symptoms of many mental health disorders. Decreased need for sleep is a symptom of a manic and hypomanic episode; insomnia and hypersomnia are symptoms of major depressive disorder (MDD); recurrent distressing dreams and sleep disturbance are noted as symptoms of posttraumatic stress disorder; and sleep disturbance is also noted as a symptom of generalized anxiety disorder (APA, 2013). There is evidence both that sleep disturbance may be symptomatic of an underlying mental health disorder and that sleep disturbance can lead to psychological symptoms (Morin & Ware, 1996).
In general, sleep disturbances are related to poor overall mental health quality. In a sample of elderly primary care patients, excessive daytime sleepiness was the best predictor of poor overall mental health-related quality of life (Reid et al., 2006). Another study found that adults who reported frequent sleep insufficiency were more likely to report depressive, anxiety, and pain symptoms than those without frequent sleep insufficiency (Strine & Chapman, 2005).
It is evident that sleep disturbances have a strong relationship with mental health. Thus, it is important to evaluate the validity and reliability of sleep disturbance measures to better inform how different sleep disturbances may relate to the etiology, maintenance, and treatment of mental health disorders. Historically, polysomnography has been considered the gold standard for monitoring and assessing sleep in both clinical and research settings. Traditional polysomnography is conducted in sleep laboratories and requires electroencephalography, electrooculography, chin and leg electromyography, electrocardiography, and measurement of respiratory effort and airflow (Redeker, 2000). While polysomnography provides the most accurate and complete measurement of sleep, it is expensive, requires experienced technicians to administer and score, and requires patients to sleep in an artificial environment (Redeker, 2000). In addition, polysomnography does not assess individuals’ perceptions of their sleep. Due to these limitations of polysomnography, it is important to evaluate additional forms of sleep measurement. Novel forms of objective sleep measurement, such as actigraphy and commercial sleep trackers, may allow researchers to monitor sleep in individuals’ natural settings and are more accessible than polysomnography. In addition, self-report measures of sleep disturbances require little time and are readily available. Self-report measures can also capture the subjective aspects of sleep and allow researchers to examine how objective and subjective reports of sleep may differentially relate to mental health complaints. Given the valuable information these measures can provide, it is important to fully understand their limitations and strengths. While sleep disturbances can comprise a multitude of sleep complaints, this review focuses on the most commonly used measures of sleep in US adult populations.
Non-Commercial Behavioral Measures
Polysomnography gives data on sleep stages, sleep time, sleep efficiency, arousal, cardiorespiratory events during sleep, and neuromuscular events associated with sleep. Polysomnography is considered the gold standard for sleep measurement, but requires well-trained staff and expensive equipment. Full polysomnography requires electrodes to be placed by experienced technicians and involves complex scoring techniques (Redeker, 2000). Test-retest reliability was evaluated in a sample of patients with obstructive sleep apnea between two consecutive nights of polysomnography. There were no significant differences in total sleep time, sleep efficiency, time in REM sleep, sleep latency or time in stages of sleep (Levendowski et al., 2009). Studies have demonstrated a range of interrater and intrarater reliability in scoring of sleep stage from weak to strong (Whitney et al., 1998; Collop, 2002). The validity of polysomnography has been evaluated across numerous sleep-wake disorders, including obstructive sleep apnea and REM sleep behavior disorder and has been found to be strong (e.g., Consens et al., 2005). Clinically, polysomnography is considered the diagnostic reference tool, the gold standard, and is recommended in clinical guidelines for the diagnosis of several sleep disorders (Kapur et al., 2017; Van de Water, Holmes, & Hurley, 2011).
Actigraphy is a measure of 24-hour patterns of activity, sleep, and circadian patterning that uses accelerometers to record movement several times per second. It requires individuals to wear a watch-like device on their wrist or, sometimes, ankle or waist. Actigraph data can be scored to provide information on sleep time, sleep efficiency, number of awakenings, length of awakenings, and levels of activity (Redeker, 2000; Ancoli-Israel, 2013). While actigraphy requires little effort on the part of the individual, it has some noted limitations. It can overestimate sleep time for individuals who lie awake without moving and can yield confounded data for individuals with movement disorders (e.g., tremors; Sadeh, Hauri, Kripke, & Lavie, 1995). According to Sadeh (2011), the number of publications on actigraphy has grown faster than the number on polysomnography over the last two decades, suggesting that it is now a major assessment tool for sleep research and sleep medicine.
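To make the scored outputs listed above concrete, the following is a minimal sketch of how summary variables could be derived once each epoch of an in-bed period has been labeled as sleep or wake. It does not reproduce any published actigraphy scoring algorithm; the function name, the 30-second default epoch length, and the specific definitions (e.g., counting every wake epoch after the first sleep epoch as wake after sleep onset) are illustrative assumptions.

```python
def summarize_epochs(epochs, epoch_seconds=30):
    """Summarize one in-bed period of epoch-level sleep/wake labels.

    epochs: sequence of flags, one per epoch (1 = scored sleep, 0 = scored wake).
    epoch_seconds: epoch length; 30 s is an assumed default, not a standard
    taken from the reviewed studies.
    """
    if not epochs:
        raise ValueError("need at least one epoch")
    minutes_per_epoch = epoch_seconds / 60

    time_in_bed = len(epochs) * minutes_per_epoch
    total_sleep_time = sum(epochs) * minutes_per_epoch

    # Index of the first sleep epoch; if there is none, latency spans the record.
    first_sleep = next((i for i, e in enumerate(epochs) if e == 1), len(epochs))
    sleep_onset_latency = first_sleep * minutes_per_epoch

    # Assumed definition: all wake epochs after the first sleep epoch.
    wake_after_sleep_onset = (
        sum(1 for e in epochs[first_sleep:] if e == 0) * minutes_per_epoch
    )

    # An awakening is counted at each sleep-to-wake transition.
    awakenings = sum(
        1 for prev, cur in zip(epochs, epochs[1:]) if prev == 1 and cur == 0
    )

    return {
        "time_in_bed_min": time_in_bed,
        "total_sleep_time_min": total_sleep_time,
        "sleep_onset_latency_min": sleep_onset_latency,
        "wake_after_sleep_onset_min": wake_after_sleep_onset,
        "awakenings": awakenings,
        # Sleep efficiency: percentage of time in bed scored as sleep.
        "sleep_efficiency_pct": 100 * total_sleep_time / time_in_bed,
    }
```

Because motionless wakefulness is labeled as sleep by movement-based scoring, any such summary would inherit the overestimation of sleep time noted above.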
Numerous studies have examined the psychometric properties of sleep data from actigraphy, many comparing it directly to polysomnography. A caveat of these studies, however, is that there is no standardization of equipment, procedures, or algorithms for data analysis, which makes it difficult to draw conclusions about validity across studies (Sadeh, 2011). Actigraphy was highly correlated with polysomnography for total sleep time (r=.97, p<.05; Jean-Louis et al., 1996) and had high levels of agreement for differentiating sleep and wake in adults (91%-96.5%; Sadeh et al., 1994; Jean-Louis, 2001). Actigraphy has thus demonstrated validity for measuring sleep duration compared to polysomnography; however, its concurrent validity for other sleep variables is unclear. Some studies suggest that there are strong correlations between actigraphy and polysomnography for sleep onset latency and wake after sleep onset (Shinkoda et al., 1998), but other studies reported weaker correlations for those outcomes (Jean-Louis et al., 1996; Blood, Sack, Percy, & Pen, 1997; Cole, Kripke, Gruen, Mullaney, & Gillin, 1992). In general, actigraphy has been found to have good reliability and validity for sleep outcomes in healthy adults but is less reliable for those same variables in some populations with disrupted sleep (Jean-Louis et al., 1996; Sadeh, 2011).
Across scoring algorithms and devices, specificity for detecting wake time during sleep episodes has been an area of concern. Numerous studies have reported low specificity for different algorithms’ ability to detect time awake in young children, infants, and healthy adults, despite having high sensitivity (Sitnick, Goodlin-Jones, & Anders, 2008; Insana, Gozal, & Montgomery-Downs, 2010; de Souza et al., 2003; Paquet, Kawinska, & Carrier, 2007). In response to this concern, Tryon (2004) suggested that these discrepancies are known and can therefore be accounted for. In addition, new algorithms have shown better specificity while maintaining sensitivity in infant populations (Tilmanne, Urbain, Kothare, Wouwer, & Kothare, 2009; Sazonov, Sazonova, Schuckers, Neuman, & CHIME Study Group, 2004). Not only have researchers examined new algorithms, but some have also examined new methodologies for using actigraphy (Enomoto et al., 2009; Chae et al., 2009).
Commercially Available Behavioral Measures
Fitbit (Fitbit, Inc., San Francisco, CA, US)
Fitbit offers at least nine devices (accelerometers) that measure components of sleep, sedentary behavior, activity level, and steps. With regard to sleep measurements, Fitbit devices give data on actual sleep time, latency, number of awakenings, and efficiency. The Fitbit system relies on movement and absence of movement to infer sleep and wake, similar to the system of actigraphy. The use of commercially available tracking devices in research is relatively new, but there are some studies that have evaluated Fitbit’s performance measuring sleep against actigraphy and polysomnography. While test-retest reliability has not been evaluated in the traditional sense, inter-Fitbit agreement has been shown to be strong (96.5%-99.1%; Montgomery-Downs, Insana, & Bond, 2012). When evaluating concurrent validity, Fitbit differed significantly from polysomnography and standard actigraphy for sleep efficiency and total sleep time. Fitbit overestimated sleep efficiency by an average of 8% compared to polysomnography and 5.2% when compared to actigraphy. Similarly, Fitbit overestimated total sleep time by an average of 41 to 67.1 minutes compared to polysomnography and 24.1 minutes compared to actigraphy (Montgomery-Downs et al., 2012; Meltzer, Hiruma, Avis, Montgomery-Downs, & Valentin, 2015). Other studies have also demonstrated that Fitbit overestimated total sleep time, time in bed, and sleep efficiency compared to actigraphy (Dickenson, Cazier, & Cech, 2016; Rosenberger, Buman, Haskell, McConnell, & Carstensen, 2016). One study found that Fitbit underestimated total sleep time by 105 minutes and underestimated sleep efficiency by 21% when used in sensitive mode (Meltzer et al., 2015). In sensitive mode, Meltzer et al. (2015) also found that Fitbit overestimated wake after sleep onset by 106 minutes.
Fitbit demonstrated high sensitivity for identifying sleep epochs (91.4%-98.8%, p<.05) and poor specificity for identifying wake epochs (11.8%-19.8%, p<.001; Montgomery-Downs et al., 2012). Meltzer et al. (2015) found differences in sensitivity and specificity depending on the mode used (normal mode: 87% sensitivity, 52% specificity; sensitive mode: 70% sensitivity, 79% specificity). While Fitbit devices have demonstrated good reliability, evidence suggests that in normal mode they tend to overestimate total sleep time, overestimate sleep efficiency, and underestimate wake after sleep onset.
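The sensitivity and specificity figures above come from epoch-by-epoch comparisons against a reference such as polysomnography. A minimal sketch of that calculation follows, assuming both recordings are already aligned and scored as 1 (sleep) or 0 (wake) per epoch; the function name and input format are hypothetical and are not drawn from the cited studies.

```python
def epoch_agreement(device, reference):
    """Epoch-by-epoch agreement of a device against a reference scoring.

    Both inputs are equal-length sequences of 1 (sleep) / 0 (wake).
    Sensitivity: share of reference sleep epochs the device scored as sleep.
    Specificity: share of reference wake epochs the device scored as wake.
    """
    if len(device) != len(reference):
        raise ValueError("recordings must contain the same number of epochs")

    pairs = list(zip(device, reference))
    true_sleep = sum(1 for d, r in pairs if d == 1 and r == 1)
    missed_sleep = sum(1 for d, r in pairs if d == 0 and r == 1)
    true_wake = sum(1 for d, r in pairs if d == 0 and r == 0)
    missed_wake = sum(1 for d, r in pairs if d == 1 and r == 0)

    if true_sleep + missed_sleep == 0 or true_wake + missed_wake == 0:
        raise ValueError("reference must contain both sleep and wake epochs")

    return {
        "sensitivity": true_sleep / (true_sleep + missed_sleep),
        "specificity": true_wake / (true_wake + missed_wake),
        "accuracy": (true_sleep + true_wake) / len(pairs),
    }
```

In a night that is mostly sleep, a device that labels nearly every epoch as sleep can reach high sensitivity and overall accuracy while missing most wake epochs, which is the pattern of high sensitivity and poor specificity described above.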
Jawbone (Jawbone Inc., San Francisco, CA, US)
Similar to Fitbit devices, Jawbone has six activity trackers that measure sleep variables. No studies have measured reliability in the Jawbone. Similar to Fitbit devices, Jawbone devices also overestimated total sleep time (by 10 to 26.6 minutes, ps<.001), sleep efficiency (by 1.9%, p<.001), and sleep onset latency (by 5.2 minutes, p=.005) compared to polysomnography (de Zambotti et al., 2015a; de Zambotti et al., 2015b). Jawbone underestimated wake after sleep onset by 10.6 to 31.2 minutes (p<.001; de Zambotti et al., 2015a; de Zambotti et al., 2015b). More research is needed to test the reliability and validity of Jawbone in order to fully understand its merits for measuring sleep.
Self-Report Sleep Measures
There are more than 22 unique self-report measures of sleep and sleep disturbance (Devine, Hakim, & Green, 2005). Three of the most cited measures are the Pittsburgh Sleep Quality Index, the Epworth Sleepiness Scale, and the Insomnia Severity Index. The most cited sleep diary is the Pittsburgh Sleep Diary. A common limitation of self-report sleep measures is that they rely on individuals to recall many aspects of their sleep (e.g., duration, awakening, quality) for relatively long periods of time (e.g., up to a month). This level of recall can be especially difficult for acutely ill patients or inpatient samples (Redeker, 2000).
Pittsburgh Sleep Quality Index (PSQI; Buysse et al., 1989)
The PSQI was developed to measure sleep quality and disturbances over a one-month period. It was originally developed for use in clinical populations, but is now widely used in both clinical and community populations (Buysse et al., 1989; Buysse et al., 1991). The PSQI consists of 19 items, which yield a global score and seven component scales (i.e., subjective sleep quality, sleep latency, sleep duration, sleep efficiency, daytime dysfunction, sleep fragmentation, and use of sleep aid medications). Each component scale is scored from 0 to 3, and the seven scores are summed to provide the global sleep quality index score, which ranges from 0 to 21. Higher scores indicate worse sleep quality, and global sleep quality index scores above 5 indicate impaired sleep (Buysse et al., 1989). The PSQI requires approximately 5-10 minutes to administer (Buysse et al., 1989).
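The global score arithmetic described above can be sketched as follows. This is an illustration only: it starts from already-scored component values and does not implement the item-to-component scoring rules of the published instrument. The dictionary keys and function name are hypothetical labels based on the component names listed above.

```python
# Hypothetical keys mirroring the seven component scales named in the text.
PSQI_COMPONENTS = (
    "subjective_sleep_quality",
    "sleep_latency",
    "sleep_duration",
    "sleep_efficiency",
    "daytime_dysfunction",
    "sleep_fragmentation",
    "sleep_medication_use",
)


def psqi_global(components, cutoff=5):
    """Return (global_score, impaired) from seven component scores of 0-3.

    impaired is True when the global score is above the cutoff of 5
    reported by Buysse et al. (1989).
    """
    missing = [name for name in PSQI_COMPONENTS if name not in components]
    if missing:
        raise ValueError(f"missing component scores: {missing}")
    for name in PSQI_COMPONENTS:
        if components[name] not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be an integer from 0 to 3")

    global_score = sum(components[name] for name in PSQI_COMPONENTS)
    return global_score, global_score > cutoff
```

For example, a respondent scoring 1 on six components and 2 on sleep latency would receive a global score of 8 and fall above the cutoff.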
The psychometric properties of the PSQI have been evaluated in multiple studies and populations (e.g., Tiffin et al., 1995; Gentili et al., 1995). In the original examination of the PSQI, Buysse et al. (1989) found that there was strong internal consistency between the seven component scores (overall reliability coefficient, α=.83) in a sample of 52 healthy individuals and 96 individuals with sleep problems. This level of internal consistency indicates that the seven subscales all measure some facet of an overall construct. Similarly, Carpenter and Andrykowski (1998) reported good internal consistency (α=.80) for the global rating across four clinical populations. Studies have shown moderate to high correlations between the component scales and the global score (rs=.42 to .83, ps<.001; rs=.37 to .80, ps<.0005; Carpenter & Andrykowski, 1998; Grandner, Kripke, Yoon, & Youngstedt, 2006), further suggesting that the component scales are related to an overall measure of sleep. In addition, Buysse et al. (1989) found that the PSQI had good test-retest reliability (.85, p<.001); however, they did not specify the time frame in which the measure was re-administered. Other studies have also found good test-retest reliability. For example, Gentili et al. (1995) found that when the PSQI was re-administered to a sample of nursing home residents after 19 days, it was reliable (.82).
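Assuming the internal-consistency coefficients reported above are Cronbach's alpha, they follow the standard formula α = k/(k-1) × (1 - Σ item variances / variance of total scores), where k is the number of items or components. The sketch below applies that formula to a small table of scores; the function name and data layout are illustrative, and the example values in the usage note are made up rather than drawn from any study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents.

    rows: list of lists, one inner list of k item (or component) scores
    per respondent. Uses sample variances (n - 1 denominator).
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in rows):
        raise ValueError("every respondent needs the same number of items")

    # Variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

With the made-up scores `[[0, 1], [1, 0], [2, 3], [3, 2]]`, the two item variances sum to about 3.33 against a total-score variance of about 5.33, giving an alpha of .75. Higher values indicate that the items vary together, which is the sense in which the component scores are said to reflect a shared construct.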
With regard to content validity, the PSQI underwent 18 months of field testing and was created from clinical experience, as well as from existing measures (Buysse et al., 1989). It has been shown to have good discriminative validity, as there have been significant differences in the PSQI global score between controls and individuals with disorders of initiating and maintaining sleep and between controls and individuals with disorders of excessive somnolence (Buysse et al., 1989). Using the cutoff score of five, the PSQI is able to identify individuals with a sleep disorder with a sensitivity of 89.6% and specificity of 86.5% (Buysse et al., 1989). The PSQI has also demonstrated adequate construct validity. PSQI scores have been shown to be highly correlated with other scales of sleep quality and sleep disturbances (i.e., convergent validity), such as the sleep problems of the SER (r >.65, p <.001) and sleep restlessness of the CES-D (r >.69, p <.001; Carpenter & Andrykowski, 1998). Yet, when comparing the PSQI to polysomnographic results, there were fewer and smaller significant positive correlations (Buysse et al., 1989). The global PSQI score was weakly correlated with objective sleep latency from a polysomnographic test for a combined sample of individuals with and without depression (r=.20, p<.01) and was moderately correlated with percentage of REM sleep in healthy participants (r=.34, p<.006). The PSQI global score was moderately correlated with number of arousals in participants with depression in polysomnographic tests (r=.47, p<.002). Sleep latency was also moderately correlated between the PSQI and the polysomnographic results (r=.33, p<.001; Buysse et al., 1989). Furthermore, the PSQI estimated longer sleep duration (t=9.98, p<.001) and higher sleep efficiency (t=4.50, p<.011) than polysomnography (Buysse et al., 1989).
In addition, PSQI global scores and component scores did not correlate significantly with actigraphic measures of sleep latency, wake after sleep onset, or sleep efficiency in a healthy sample of older adults or a sample of younger adults (Grandner et al., 2006). Total sleep time on the actigraphic measure was positively correlated with the Sleep Duration component scale (r=.204, p<.01) and the Sleep Latency component scale of the PSQI (r=.275, p<.05; Grandner et al., 2006). The PSQI was poorly correlated with unrelated constructs (i.e., divergent validity), such as vomiting past week of the SER (r<.26, p<.001) and appetite past week of the SEAS (r<-.37, p<.001; Carpenter & Andrykowski, 1998). While Buysse et al. (1989) proposed a one-factor structure of the PSQI, others have found evidence for a three-factor structure (e.g., Cole et al., 2006; Mariman et al., 2012). Taken together, the PSQI is a reliable measure and has adequate support for its validity. While the PSQI seems to have strong discriminant validity, it does not appear to have strong convergent validity when compared to behavioral measures of sleep. Thus, the PSQI global score may not be the best indicator of specific sleep variables, such as sleep latency. In addition, the component scores may not accurately measure sleep latency, sleep duration, or sleep efficiency, but may provide important information on perceptions of those variables. Importantly, the PSQI global score is intended to measure overall sleep quality, which is not an outcome of polysomnography. In addition, polysomnography measures specific aspects of sleep over a given time period, while the PSQI measures aspects of sleep over the past month.
The Epworth Sleepiness Scale (ESS; Johns, 1991)
The ESS is the most commonly used measure of sleepiness in sleep research and in clinical settings (Kendzerska, Smith, Brignardello-Petersen, Leung, & Tomlinson, 2014). The ESS requires individuals to rate their likelihood of falling asleep in eight unique situations on a four-point scale from 0 to 3. The eight situations represent activities in daily life, such as watching television or sitting and reading. The ESS yields a sum total score (ranging from 0 to 24), with higher scores representing more daytime sleepiness (Johns, 1991).
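The scoring rule described above reduces to a simple sum, sketched below. The function name is hypothetical, and the sketch assumes the eight situation ratings have already been collected as integers from 0 to 3.

```python
def ess_total(item_ratings):
    """Sum eight ESS situation ratings (each 0-3) into a 0-24 total."""
    if len(item_ratings) != 8:
        raise ValueError("the ESS has eight items")
    if any(rating not in (0, 1, 2, 3) for rating in item_ratings):
        raise ValueError("each rating must be an integer from 0 to 3")
    # Higher totals represent more daytime sleepiness.
    return sum(item_ratings)
```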
The ESS has been repeatedly shown to have good internal consistency in samples of college students and samples of patients with sleep-wake disorders (Cronbach’s α=.73 to .88; Johns, 1992; Johns, 1994; Nguyen et al., 2006; Smith, 2008). Test-retest reliability was found to be good (r=.82) in one study of undergraduate students over five months (Johns, 1992); however, test-retest reliability has not been evaluated in clinical populations or with a shorter time period.
The ESS has demonstrated strong discriminative validity by showing significantly different scores between healthy individuals and individuals with a variety of sleep-wake disorders that are known to cause excessive daytime sleepiness (e.g., Beaudreau et al., 2012; Spira et al., 2011). Construct validity of the ESS has also been evaluated in numerous studies. The ESS has been found to have a moderate association with the maintenance of wakefulness test (MWT), which measures an individual’s ability to stay awake in a number of situations (Spearman rank coefficient from -.39 to -.48; Kingshott, Engleman, & Deary, 1998; Banks et al., 2004; Sangal, Sangal, & Belisle, 1999). The ESS also has moderate correlations with the vitality subscale of the SF-36 (r=-.41 to -.47; Briones et al., 1996; Bennett, Barbour, Langford, Stradling, & Davies, 1999). The ESS has a weak to moderate association with the multiple sleep latency test (MSLT; -.03 to -.42; Johns, 1994; Olson, Cole, & Ambrogetti, 1998), weak correlations with the apnea hypopnea index that measures obstructive sleep apnea (r=.00002 to .07; Olson et al., 1998; Kingshott, Engleman, Deary, & Douglas, 1998), and weak associations with the PSQI (r=.008 to .16; Buysse et al., 2008b; Skibitsky et al., 2012). While the measures with moderate associations with the ESS support its construct validity, one would expect higher correlations between the ESS and the MSLT and measures of obstructive sleep apnea, as sleep latency and sleep apnea are proposed to be related to daytime sleepiness (Adult Obstructive Sleep Apnea Task Force of the American Academy of Sleep Medicine, 2009). Buysse et al. (2008b) found that the ESS was weakly correlated with polysomnography, but concluded that this lack of association should be expected since the ESS measures habitual sleep patterns and sleepiness, while polysomnography measures sleep for a distinct time period.
In addition, polysomnography does not measure an individual’s subjective feeling of sleepiness, which is the primary outcome of the ESS. There have been mixed findings on the unidimensionality of the ESS. Two studies found that the ESS had one dimension, with the caveat that three items had low factor loadings (.08-.37; Johns, 1992; Johns, 1994). Another study found three factors in the ESS (Nguyen et al., 2006).
Overall, the ESS appears to have adequate reliability. Future research should examine the test-retest reliability in a wider range of populations and should further examine the factor structure of the ESS. In addition, the ESS appears to have adequate validity. It was moderately and negatively correlated with measures of wakefulness and vitality and was able to discriminate between healthy individuals and those with sleep-wake disorders that cause daytime sleepiness.
Insomnia Severity Index (ISI; Morin, 1993)
The ISI is a seven-item self-report measure that assesses subjective symptoms and consequences of insomnia and the amount of distress caused by those symptoms (Morin, 1993). The ISI includes items that map on to the diagnostic criteria of insomnia (APA, 2013), such as items related to sleep onset, sleep maintenance, and degree of impairment caused by sleep problems. Each item is assessed on a five-point scale (from 0 to 4), and the items are summed for a total score ranging from 0 to 28. Higher scores indicate more severe insomnia (Bastien, Vallières, & Morin, 2001). In the original score interpretations for the ISI, scores of 0-7 indicated no insomnia, 8-14 indicated mild insomnia, and scores greater than 14 indicated moderate to severe insomnia (Morin, 1993).
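As a minimal sketch, the total score and the original interpretation bands described above could be computed as follows. The function name and band labels are illustrative; the sketch assumes a total of 8 belongs to the mild band, so that every possible total from 0 to 28 receives a label.

```python
def isi_total(item_ratings):
    """Return (total, band) for seven ISI item ratings, each 0-4.

    Bands follow the original interpretation attributed to Morin (1993);
    treating a total of 8 as mild is an assumption of this sketch.
    """
    if len(item_ratings) != 7:
        raise ValueError("the ISI has seven items")
    if any(rating not in (0, 1, 2, 3, 4) for rating in item_ratings):
        raise ValueError("each rating must be an integer from 0 to 4")

    total = sum(item_ratings)
    if total <= 7:
        band = "no insomnia"
    elif total <= 14:
        band = "mild insomnia"
    else:
        band = "moderate to severe insomnia"
    return total, band
```

Later work has proposed alternative cutoffs for specific settings, so the band boundaries here would need to be treated as parameters rather than fixed values in applied use.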
The ISI has adequate to strong internal consistency in clinical samples (α=.74 to .91; Bastien et al., 2001; Morin, Belleville, Belanger, & Ivers, 2011) and strong internal consistency for community samples (α=.90; Morin et al., 2011). Since it was created based on the DSM-IV criteria and the ICD classification of insomnia, the ISI has strong content validity. With regard to discriminative validity, the cutoff scores have been reexamined based on estimates of sensitivity, specificity, and ROC curves. Morin et al. (2011) found that the highest correct classification rate was obtained with a cutoff score of 10 for a community sample (86.1% sensitivity, 87.7% specificity) and a cutoff score of 11 for a clinical sample (97.2% sensitivity, 100% specificity). Another cutoff score that has been suggested is 14, which yielded a sensitivity of 82.4% and a specificity of 82.1% in a primary care setting (Gagnon, Belanger, Ivers, & Morin, 2013). Concurrent validity was measured by comparing the ISI to a sleep diary and to polysomnography given at the same point in time. The ISI was significantly correlated with sleep diary data for sleep onset, wake after sleep onset, early morning awakening, and sleep efficiency (absolute value r=.32 to .91, p<.05). It was also significantly related to polysomnographic data on sleep onset (r=.39 to .45, p<.05); however, it was only significantly related to polysomnographic data on wake after sleep onset and sleep efficiency at post-treatment in a clinical study (r=.45 and r=-.35, respectively, ps<.05; Bastien et al., 2001).
Convergent validity was found to be strong in clinical and community samples when comparing the ISI total score to the PSQI total score (r=.80, p<.05), sleep diary total sleep time (r=-.54, p<.05), and sleep diary sleep efficiency (r=-.59, p<.05); however, the ISI was weakly correlated to sleep efficiency from polysomnographic data (r=-.16, p<.05) and not significantly related to total sleep time from polysomnographic data (Morin et al., 2011). The ISI was also moderately related to total scores on the Beck Depression Inventory, the Beck Anxiety Inventory, and general fatigue on the Multidimensional Fatigue Inventory (rs>.41, ps<.05; Morin et al., 2011). Overall, the ISI appears to have strong reliability and validity. While some variables of the ISI do not strongly correlate with polysomnographic data (e.g., total sleep time), the ISI had adequate validity for other variables (e.g., sleep onset) compared to polysomnography, sleep diaries, and other self-report.
Pittsburgh Sleep Diary (PSD; Monk et al., 1994)
The PSD is a two-week sleep diary that asks individuals to document their sleep each night (bedtime) and each morning (waketime) for 14 days (Monk et al., 1994). The bedtime questionnaire consists of six items and assesses timing of meals; consumption of caffeine, alcohol, and tobacco; medication use; and exercise and napping periods. The 11 items of the waketime questionnaire assess the previous night’s sleep, including information on sleep latency, frequency of nightly awakenings, and sleep quality, among others. Some items of the PSD are hand entered (e.g., dosage of medications), others are scored on a six-point Likert scale (e.g., frequency of nightly awakenings), and some are scored on a 100-mm visual analog scale (e.g., rating of sleep quality; Monk et al., 1994).
While the PSD is one of the most widely used sleep diaries and outcome measures of insomnia, few studies have evaluated its psychometric properties. Test-retest reliability of the items on the waketime questionnaire across an average of 22 months was moderate to strong (rs=.56 to .81, ps<.001; Monk et al., 1994). The PSD has been found to be sensitive to differences between individuals diagnosed with a sleep-wake disorder and controls (Monk et al., 1994). When compared to actigraphy data, the PSD had adequate support for concurrent validity. PSD estimates were significantly correlated with actigraphy measures of total sleep time (r=.43, p<.0001). PSD estimates of sleep disruption based on wake after sleep onset time were significantly related to mean activity counts between bedtime and waketime on actigraphy (t=4.1, p<.0001; Monk et al., 1994). Weak to moderate convergent validity has been demonstrated with personality measures (e.g., time to bed with Eysenck Personality Inventory, extroversion, rho=.24, p<.05; Monk et al., 1994) and with measures of circadian type (i.e., Horne-Östberg morningness questionnaire, timing of sleep episode, rho=.6, p<.0001). In addition, Monk et al. (1994) compared the PSD to the PSQI and found weak to moderate correlations between the PSQI global scale and items of the PSD (absolute value rho=.27 to .36, ps<.01). More research is needed to support the reliability and validity of the PSD.
Sleep and Depression
As mentioned above, insomnia and hypersomnia are included as possible symptoms of depression in the DSM-5. Some studies have estimated that as many as 90% of individuals with depression complain about sleep quality (Tsuno, Besset, & Ritchie, 2005). Numerous studies have looked at the relationship between depression symptoms and objective and subjective sleep. Research has shown that individuals with depression often report longer sleep latency, more frequent awakenings, longer awakenings, shorter total sleep time, and waking earlier in the morning compared to individuals without depression (Gillin, Duncan, Pettigrew, Frankel, & Snyder, 1979). In addition, depressive disorders and symptoms are significantly related to daytime sleepiness and sleep quality (Maglione et al., 2012). A systematic review found strong support for a bidirectional relationship between sleep disturbances and depression, indicating that each contributes to the development of the other and is a consequence of the other (Alvaro, Roberts, & Harris, 2013). One study included in that review, however, suggested that insomnia symptoms predict depression more strongly than the reverse (Buysse et al., 2008b). Similarly, Ford and Kamerow (1989) found that having insomnia at two time points a year apart increased the likelihood of a new episode of major depression almost fortyfold compared to individuals without insomnia at either time point. In addition, individuals whose insomnia had resolved within the year were much less likely to develop a new episode of major depression. Research suggests that insomnia is related to an increase in severity and duration of depressive episodes in MDD (Franzen & Buysse, 2008).
Research has also evaluated how accurately individuals with depressive symptoms are able to report sleep variables. Individuals with depression have been shown to accurately report their total sleep time, as well as sleep onset latency (Argyropoulos et al., 2003). Yet, they are less accurate in reporting how many times they awake after falling asleep (Argyropoulos et al., 2003). Mayers and colleagues (2009) examined the underlying mechanisms for inaccurate reporting of sleep variables for individuals with depressive symptoms and found that sleep satisfaction and overall quality of life reports were highly related to symptoms of depression, while variance in sleep timing was highly related to the report of more anxiety symptoms. There is some evidence to suggest that inaccuracy of reporting sleep variables may increase with greater severity of depression (Argyropoulos et al., 2003). It has been hypothesized both that depressive symptoms lead an individual to negatively bias their sleep report and that sleep misperceptions may influence depressive symptoms (Ford & Cooper-Patrick, 2001).
Given that the literature suggests that individuals with depression may not be accurate in reporting some sleep variables, such as wake after sleep onset, the inclusion of behavioral measures may offer important information and improve the validity of sleep measurement in this population. McCall and McCall (2012) found that for individuals with depression and insomnia, actigraphy data and sleep diary data from the same night were significantly different for sleep onset latency, total sleep time, and wake after sleep onset. For the same night, actigraphy and polysomnography differed significantly only on sleep onset latency. In addition, actigraphy has been shown to be highly correlated with polysomnography for total sleep time and moderately correlated for sleep efficiency in individuals with MDD and either insomnia or hypersomnia (Jean-Louis et al., 2000). These studies suggest that actigraphy can be more valid than sleep diaries for measuring total sleep time and wake after sleep onset in individuals with comorbid depression and sleep disturbances.
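The same-night comparisons described above can be illustrated with a short sketch. It is not the analysis used in any of the cited studies; it only shows one simple way to line up diary and actigraphy values night by night and summarize how far they diverge. The nightly values, the variable keys (`sol`, `tst`, `waso`), and the function names are all hypothetical and were chosen for illustration.

```python
from statistics import mean, stdev

# Hypothetical values in minutes for four nights from one participant.
# These numbers are invented for illustration and are not study data.
diary = [
    {"sol": 45, "tst": 330, "waso": 20},
    {"sol": 60, "tst": 300, "waso": 15},
    {"sol": 30, "tst": 360, "waso": 30},
    {"sol": 50, "tst": 310, "waso": 25},
]
actigraphy = [
    {"sol": 20, "tst": 380, "waso": 55},
    {"sol": 25, "tst": 350, "waso": 60},
    {"sol": 15, "tst": 390, "waso": 50},
    {"sol": 30, "tst": 365, "waso": 45},
]


def paired_differences(diary_nights, device_nights, variable):
    """Return diary minus device for one variable, night by night."""
    if len(diary_nights) != len(device_nights):
        raise ValueError("Both sources must cover the same nights.")
    return [d[variable] - a[variable] for d, a in zip(diary_nights, device_nights)]


def discrepancy_report(diary_nights, device_nights, variables=("sol", "tst", "waso")):
    """Summarize the mean and standard deviation of the nightly discrepancies."""
    report = {}
    for variable in variables:
        diffs = paired_differences(diary_nights, device_nights, variable)
        report[variable] = {"mean_diff": mean(diffs), "sd_diff": stdev(diffs)}
    return report


if __name__ == "__main__":
    for variable, summary in discrepancy_report(diary, actigraphy).items():
        print(variable, summary)
```

A positive mean difference indicates that the diary value exceeds the actigraphy value for that variable, and a negative one indicates the reverse. A formal comparison would additionally require an inferential test on the paired differences, which is omitted here.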
While actigraphy appears to be a valid measure of objective sleep data in individuals with depressive disorders, it is still important to include self-report measures of sleep when evaluating depression since subjective sleep misperceptions may be related to the development of depressive symptoms. A review by Gillin (1998) provided support for a strong relationship between subjective insomnia lasting at least two weeks and later onset of depressive episodes. In addition, since sleep duration, energy level, and sleepiness are components of depression, assessing a subjective report of those variables is important.
Beyond its relevance to the etiology and maintenance of depression and insomnia, tracking sleep is also important in the treatment of depression. Treatment outcomes may be affected by subjective ratings of sleep quality prior to treatment. For example, women who had higher sleep quality ratings prior to treatment were more likely to have remitting symptoms of depression during a trial of interpersonal therapy (Buysse et al., 1999). In addition, poor sleep quality was related to poorer response to a combined pharmacological and psychological intervention for depression (Dew et al., 1997). Other studies have found that persistent insomnia may impede treatment response for depressive symptoms (e.g., Pigeon et al., 2008). Likewise, improvements in subjective sleep quality are correlated with lower rates of recurrence of depression (Buysse et al., 1996). One study examined which elderly individuals with remitted depression remained free of depressive symptoms one year after combined interpersonal therapy and pharmacological treatment, following a switch to a placebo pill. The authors found that 90% of individuals with improved sleep quality remained well, compared with only 33% of those with persistent insomnia symptoms (Reynolds et al., 1997).
Not only do sleep symptoms relate to the efficacy of treatment for depression, but sleep problems are also a common residual symptom after depressive symptoms remit. Following cognitive behavioral therapies and pharmacological treatment for depression, insomnia is the most common residual symptom (Franzen & Buysse, 2008). One study estimated that around half of individuals who responded to treatment for depression still complained of insomnia (Carney, Segal, Edinger, & Krystal, 2007). Because recurrence of depression is related to enduring sleep disturbances, assessing and improving sleep is important for the treatment of depression.
Pharmacological treatments are common for both sleep disturbances and depression; however, research has shown that selective serotonin reuptake inhibitors (SSRIs), a common class of medication for depression, affect sleep at a neurobiological level (e.g., Pace-Schott et al., 2001). SSRIs have been found to suppress rapid eye movement (REM) sleep (Winkelman & James, 2004) and have been linked to an overall decrease in sleep quality (Trivedi et al., 1999).
Since treating depression does not appear to adequately address symptoms of insomnia and other sleep disturbances, research has also examined the effect of adding sleep interventions to treatment for depression. Most research in this area has focused on the addition of sleep medication to pharmacological treatment of depression. Krystal et al. (2007) conducted an eight-week, placebo-controlled, double-blind study of patients with both MDD and insomnia, according to DSM-IV criteria. All patients received an SSRI each morning and were randomly assigned to also receive a placebo or a benzodiazepine receptor agonist in the evening. Individuals who received the sleep medication showed significant improvement in both self-reported sleep symptoms and depressive symptoms over the eight weeks. At eight weeks, patients receiving both medications showed significantly more improvement in depression, as well as improvement in sleep, suggesting that the sleep medication enhanced the response to the antidepressant. After the eight weeks, the sleep medication was replaced with a placebo for two weeks. During that time, neither insomnia symptoms nor depressive symptoms increased. Other pharmacological trials have also shown that combining SSRIs with benzodiazepine receptor agonists can improve self-reported sleep and well-being without affecting the efficacy of the antidepressant (e.g., Asnis et al., 1999).
While studies have demonstrated strong support for the effectiveness of cognitive-behavioral therapy for insomnia (CBT-I) in improving symptoms of insomnia (e.g., Morin, Culbert, & Schwartz, 1994), few have evaluated the effectiveness of CBT-I for individuals with comorbid depression and insomnia. One study compared an SSRI combined with CBT-I to an SSRI combined with a control therapy in individuals with MDD and insomnia. Those receiving CBT-I had higher remission rates for both depression and insomnia (Manber et al., 2008). Research also suggests that CBT-I without pharmacological treatment may reduce not only sleep symptoms but also depressive symptoms in individuals with mild depressive symptoms (Taylor, Lichstein, Weinstock, Sanford, & Temple, 2007). While these results are promising, more research is needed to evaluate the effectiveness of CBT-I and other non-pharmacological treatments of insomnia in reducing comorbid depression.
After insomnia, fatigue is the second most common symptom in depression and is frequently treated with a psychostimulant called modafinil (Nierenberg et al., 1999; Franzen & Buysse, 2008). In trials of modafinil and SSRIs for depression and excessive daytime sleepiness, modafinil did not reduce sleepiness, fatigue, or depressive symptoms (e.g., Fava, Thase, & DeBattista, 2005). Thus, more research is needed on the treatment of fatigue in depression.
Taken together, there is strong support for a reciprocal relationship between insomnia and depression. In addition, insomnia symptoms appear to be related to the maintenance and recurrence of depressive symptoms. Few studies have examined treatments that target symptoms of depression and insomnia concurrently, even though these symptoms commonly occur together. Given that individuals with depression may be less accurate in reporting some sleep variables, such as wake after sleep onset, the addition of behavioral measures may be especially important for studies investigating sleep as a treatment outcome. Assessment of sleep using self-report and commercially available behavioral measures (e.g., Fitbit) may also provide important information for primary care providers, as it has been suggested that individuals may be more willing to disclose sleep problems than other mental health symptoms, such as depression (Pigeon, Britton, Ilgen, Chapman, & Conner, 2012).
Conclusion and Future Directions
Sleep disturbances are common in the general population and are highly related to mental health. There are many methods of sleep measurement with varying degrees of reliability and validity. In general, there are significant differences between the data from self-report measures and behavioral measures. Thus, self-report measures should not be used to replace objective measures. However, self-report measures should be used to complement objective measures, as they provide unique data about the experience of sleep. In addition, new commercially available forms of objective sleep measurement provide opportunities to measure sleep in larger samples, as they are cheaper than polysomnography and are becoming increasingly common in the population. More data are needed on the specific devices and algorithms used in these commercial devices to support their validity. In regard to depression, research should focus on how subjective and objective changes in sleep relate to the onset, progression, maintenance, and treatment of depressive episodes. Objective measures can provide insight into how sleep duration and wake after sleep onset relate to depression, while self-report measures may inform how the subjective experience of sleep relates to the development and maintenance of depression.