Treatment Integrity of School-Based Behavior Analytic Interventions: A Review of the Research

A growing body of evidence suggests that treatment integrity of school-based behavior analytic interventions is related to intervention outcomes. These findings are of importance to behavior analysts, educators, and other practitioners working in school settings, and indicate that these professionals should be knowledgeable in the assessment of treatment integrity. In this article, we describe the methods used to measure treatment integrity in research and summarize the findings on consultation characteristics that affect treatment integrity. Based on the reviewed research, recommendations are offered to behavior analysts and school professionals to aid in the measurement and maintenance of treatment integrity in school settings.

Descriptors: School-based interventions, treatment integrity

Educational settings serve many children with learning disabilities, significant behavior problems, and developmental disabilities, creating an ongoing need for effective school-based interventions. Student response to intervention is the primary measure of an intervention's effectiveness and the basis on which scientist-practitioners determine whether to modify, intensify, or terminate an intervention (Gresham, 2004). If student behavior improves following the implementation of an intervention, many assume the intervention is effective. In contrast, if student behavior worsens or remains stable, many assume the intervention is ineffective and seek to modify it. Response to intervention, however, is a true measure of intervention effectiveness only if the intervention is implemented as intended. The degree to which the treatment is implemented as planned is referred to as treatment integrity (Noell, Gresham, & Gansle, 2002).

For years, treatment integrity was considered relevant only to research-based treatment. It was—and still is—a primary methodological concern for researchers developing effective interventions. The pertinence of treatment integrity to the application of school-based interventions for children with learning and behavior problems was largely overlooked, but recent research has demonstrated the benefits of treatment integrity measures for school personnel. Despite these findings, the use of treatment integrity measures remains low even in practice-based research. For example, a recent analysis showed that 68% of published treatment studies did not measure the accuracy of implementation (Wheeler, Baggett, Fox, & Blevins, 2006). If researchers measure treatment integrity this rarely, it is likely measured even less often in practice.

In this issue of Behavior Analysis in Practice, Vollmer, Sloman, and St. Peter Pipkin provide some practical guidelines for measuring and monitoring treatment integrity. This paper supplements theirs by reviewing research findings on treatment integrity, including factors that may influence it and how best to maintain it over time, with particular attention to school-based settings. Although treatment integrity is relevant to interventions implemented at individual, group, and organizational levels, this paper focuses on its application to individual behavior analytic interventions.

We begin by examining the relation between treatment integrity data and treatment outcome in order to highlight the importance of treatment integrity in school-based interventions. We follow this with a discussion of issues related to measurement, training, and consultation. Finally, we offer recommendations for the assessment and strengthening of treatment integrity by behavior analysts and school personnel.

Treatment Integrity and Intervention Outcome

One of the most convincing arguments for maintaining high treatment integrity is that it strengthens the effects of intervention. Although the body of research in this area is limited, results support the commonly held assumption that higher treatment integrity results in better treatment effects (Arkoosh et al., 2007; Gresham, Gansle, Noell, Cohen, & Rosenblum, 1993; Holcombe, Wolery, & Snyder, 1994; Noell et al., 2002; Northup, Fisher, Kahng, Harrel, & Kurtz, 1997; Sterling-Turner, Watson, & Moore, 2002; Vollmer, Roane, Ringdahl, & Marcus, 1999; Wilder, Atwell, & Wine, 2006).

Experimental studies in which the integrity of an intervention is systematically manipulated provide the most rigorous evaluation of this relationship. A handful of studies have examined the effects of treatment integrity on intervention outcome by implementing interventions with predetermined levels of integrity and evaluating the resulting effects on student behavior (Holcombe et al., 1994; Noell et al., 2002; Northup et al., 1997; Vollmer et al., 1999; Wilder et al., 2006). In several studies, researchers alternated between higher and lower levels of integrity of behavioral interventions and found greater improvements in behavior under conditions of high treatment integrity than under conditions of low treatment integrity (e.g., Holcombe et al., 1994; Noell et al., 2002; Wilder et al., 2006). Skills were also mastered more quickly when treatment was implemented with high integrity (Wilder et al., 2006).

It is important to note, however, that the effects of treatment integrity on outcomes often varied considerably across participants. For example, in a study examining the effects of treatment integrity on mathematics acquisition, students consistently performed well when the intervention was implemented with perfect integrity (Noell et al., 2002). Surprisingly, however, some students performed just as well when the intervention was implemented with much lower integrity. These data suggest that low treatment integrity may not uniformly decrease outcomes for all students. Some elements of an intervention may remain robust despite low treatment integrity (Vollmer et al., 1999); in other cases, treatment components implemented with high treatment integrity may compensate for those implemented with low treatment integrity (Northup et al., 1997; Wilder et al., 2006). School professionals should not assume, however, that every intervention or child will sustain improved behavior under low levels of treatment integrity. Component analyses (i.e., systematic reductions of the integrity or presence of components) are necessary to identify components that maintain behavior change even when implemented with low integrity.

In sum, the current body of research suggests an emerging relationship between levels of treatment integrity and treatment outcomes. Knowledge of the measurement and consultation factors affecting treatment integrity will aid scientist-practitioners in choosing the best set of procedures to use in the classroom.

Measurement Factors Affecting Treatment Integrity

Several components of the measurement process, including the operational definition of the treatment and methods of measurement and calculation, may influence treatment integrity and the accuracy with which it is measured.

Operational Definition of Treatment

One crucial element when measuring treatment integrity is the operational definition of the intervention. All components of a behavioral treatment should be defined to promote high adherence to the treatment (Gresham, 1989) and to encourage replication of the same treatment by multiple caregivers (Gresham et al., 1993). The behavioral interventions reported in many current research studies are operationally defined at a molecular level, in which every step of the intervention is explicitly described. An example of such a definition is the following: “Student will earn one sticker paired with verbal praise if he attempts to respond even if he responds with an incorrect answer” (DiGennaro, Martens, & Kleinmann, 2007, p. 450). By defining the intervention at a molecular level, behavior analysts may increase understanding of the intervention by those who implement it and permit a more objective assessment of treatment integrity. Thus, it is recommended that the treatment be defined for caregivers at this level.
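To make this concrete, a molecular-level definition can be stored as a simple checklist of observable steps, which can later anchor integrity scoring. The following Python sketch is purely illustrative; the component wording and variable names are hypothetical rather than drawn from the cited studies.

```python
# A hypothetical molecular-level operational definition, stored as a
# checklist of discrete, observable steps. Wording is illustrative only.
intervention_components = [
    "Present the worksheet at the start of independent seatwork",
    "Deliver one sticker paired with verbal praise for any attempted "
    "response, even if the answer is incorrect",
    "Record the number of stickers earned on the daily tally sheet",
]

# Each step is specific enough that an observer can score it as
# implemented or not implemented without inference.
for number, step in enumerate(intervention_components, start=1):
    print(f"Step {number}: {step}")
```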

Measurement of Treatment Integrity

The most frequently reported methods of treatment integrity measurement include direct observation, self-report, and permanent product recording. Direct observation is a method in which a rater observes the implementation of the intervention in vivo and records the occurrence or nonoccurrence of elements of the intervention (see Vollmer et al., in this issue, for a detailed discussion of this method). Self-report measures require teachers to indicate through interviews or questionnaires the extent to which they implemented the intervention correctly. Permanent product recording measures the finished products of the intervention. For example, if a teacher were required to provide a child with feedback on his work, the presence of the teacher's correction marks on his worksheet would indicate that she successfully implemented this component of the intervention.

Direct observation can yield accurate measurement of many components, including those that cannot be measured through permanent products. However, it can also be time-consuming (Gresham, 1989). Another difficulty with direct observation is the possibility of teacher reactivity, in which the presence of the rater inflates integrity above its typical levels. For instance, Reid, Parsons, Green, and Schepis (1991) and Brackett, Reid, and Green (2007) showed that staff implementation of interventions improved when staff were aware that their behavior was being observed and evaluated. In contrast, data collected during inconspicuous observations captured low levels of staff treatment integrity. Unscheduled, unobtrusive, and unannounced observations may therefore best reduce reactivity and increase the accuracy of treatment integrity measurement.

Although self-report and permanent product measurement may be more efficient than direct observation, they also have limitations. Noell et al. (2005) demonstrated that teachers' self-reports of treatment integrity were unrelated to observational measures and may misrepresent the true level of treatment integrity. Self-report is therefore considered the least reliable method of treatment integrity measurement (Noell, 2007). Some authors also have speculated that permanent products may be created without implementing the intervention (Noell, Duhon, Gatti, & Connell, 2002), resulting in artificially high levels of treatment integrity. A more fundamental limitation is that not all interventions include components that produce permanent products (Noell, 2007).

As a result, a combination of methods may be best for collecting treatment integrity data in the classroom setting. Permanent products are extremely useful for collecting data on a daily basis and can be easily built into an intervention. However, observation of caregiver behavior should complement permanent product recording. Initially, frequent observations may be required following teacher training to monitor treatment integrity, but observations can be faded if treatment integrity remains high and stable. Both permanent product and daily, weekly, or bi-weekly observations lend themselves well to performance feedback, a method that is highly effective for increasing and maintaining teacher implementation of behavior intervention plans.

Calculation of Treatment Integrity

One of the challenges faced by behavior analysts collecting data on treatment integrity in classroom settings is identifying the best method with which to calculate treatment integrity data. In research, many methods are used. For interventions in which the teacher has the opportunity to implement each component only once during the observation, a simple method of dividing the number of components implemented correctly by the total number of components in the intervention, and multiplying by 100, has been sufficient (e.g., DiGennaro et al., 2007; DiGennaro, Martens, & McIntyre, 2005). The same method can be used with permanent product recording by dividing the number of permanent products produced by the teacher in a session by the total number of permanent products expected to be produced during the session, and multiplying by 100 (e.g., Mortenson & Witt, 1998, 2000, Noell et al., 2005). In other studies, treatment integrity was calculated by dividing the number of intervals (e.g., 30 s) in an observation during which the teacher correctly implemented all components by the total number of intervals within the observation, and multiplying by 100 (e.g., Jones, Wickstrom, & Friman, 1997; Wood, Umbreit, Liaupsin, & Gresham, 2007).
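All three approaches reduce to the same basic percentage formula applied to different units of analysis (components, permanent products, or intervals). The following Python sketch, using hypothetical data, illustrates each calculation.

```python
def percent_integrity(correct, total):
    """Treatment integrity as a percentage: correct / total * 100."""
    return 100.0 * correct / total if total else 0.0

# Component-based: each component has one opportunity per observation
# (True = implemented correctly). Data are hypothetical.
components_correct = [True, True, False, True]
component_integrity = percent_integrity(
    sum(components_correct), len(components_correct))   # 75.0

# Permanent product: products produced vs. products expected per session.
product_integrity = percent_integrity(8, 10)            # 80.0

# Interval-based: intervals (e.g., 30 s) in which ALL components were
# implemented correctly, divided by the total intervals observed.
interval_integrity = percent_integrity(18, 20)          # 90.0

print(component_integrity, product_integrity, interval_integrity)
```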

When school professionals evaluate how to best collect treatment integrity data for interventions with multiple components that may be implemented multiple times within an observation, they should consider options that provide the most detail for individual treatment components. For example, Hagermoser Sanetti, Luiselli, and Handler (2007) examined the treatment integrity for a 27-component intervention designed to reduce problem behavior. During the observation, the observer collected treatment integrity data on each individual treatment component by rating whether the component was implemented as written for every opportunity during the observation or whether there were no opportunities to observe implementation of the component. Overall treatment integrity was then calculated by dividing the number of components scored as “implemented as written” by the total number of components observed. This approach provides detailed information on individual treatment components, allowing practitioners to identify components that are being implemented incorrectly so that specific feedback can be provided to teachers. In addition, this method allows practitioners to identify individual steps that may be crucial for improving child behavior.
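The scoring logic of this component-by-component approach can be sketched as follows. The rating categories mirror the description above, but the function, component names, and data are hypothetical illustrations rather than the instrument used by Hagermoser Sanetti et al. (2007).

```python
from enum import Enum

class Rating(Enum):
    IMPLEMENTED = "implemented as written"
    NOT_IMPLEMENTED = "not implemented as written"
    NO_OPPORTUNITY = "no opportunity to observe"

def overall_integrity(ratings):
    """Components implemented as written / components observed * 100.
    Components with no opportunity are excluded from the denominator."""
    observed = [r for r in ratings.values() if r is not Rating.NO_OPPORTUNITY]
    implemented = [r for r in observed if r is Rating.IMPLEMENTED]
    return 100.0 * len(implemented) / len(observed) if observed else None

# Hypothetical ratings for a multi-component behavior plan
ratings = {
    "deliver praise within 5 s of the target behavior": Rating.IMPLEMENTED,
    "prompt the replacement behavior": Rating.NOT_IMPLEMENTED,
    "provide the scheduled break": Rating.NO_OPPORTUNITY,
}
print(overall_integrity(ratings))  # 50.0
```

Because each component retains its own rating, the incorrectly implemented steps (here, prompting) can be singled out for specific feedback to the teacher.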

Consultation Factors Affecting Treatment Integrity

Much discussion but surprisingly little empirical research has focused on the impact of consultation characteristics on the relation between treatment integrity and intervention outcome. A number of authors have suggested characteristics (e.g., of the treatment, consultant, consultee, and environment) that may affect intervention implementation, but few have provided empirical support for these hypotheses (e.g., Dusenbury, Brannigan, Falco, & Hansen, 2003; Gresham, 1989; Noell, 2007). Consultation characteristics may be placed in two broad categories: those that are under the consultant's control and those that are not.

Controllable Consultation Characteristics

Treatment characteristics (i.e., elements of the intervention) and consultant characteristics (i.e., the skills and behavior of the consultant) are largely under the consultant's control. In addition, the training that consultants provide to caregivers is under the consultant's control.

Treatment characteristics.

The treatment characteristics most frequently hypothesized to affect treatment integrity are the complexity of the intervention and the time and materials required to implement it (e.g., Gresham, 1989; Yeaton & Sechrest, 1981). However, research findings on these factors have been inconsistent. Early research examining the relation between these characteristics and treatment integrity was based largely on teacher or caregiver reports of preference regarding these characteristics, not on the actual impact of complexity, time, or materials on treatment integrity (e.g., Witt, Martens, & Elliott, 1984). More recent research has disputed these early hypotheses. For example, Witt, Noell, LaFleur, and Mortenson (1997) found that providing the materials required for implementation did not significantly alter treatment integrity, contrary to what had been hypothesized. Additionally, studies have shown rapid deterioration of treatment integrity following training, regardless of the complexity of the intervention (e.g., DiGennaro et al., 2007; Hagermoser Sanetti et al., 2007).

Consultant characteristics.

Some researchers have debated whether consultants who work in collaboration with teachers produce more significant behavior change in the teacher than consultants who dictate intervention approaches without collaborating with teachers (e.g., Sheridan, 1992; Witt, 1990). Wickstrom, Jones, LaFleur, and Witt (1998) found the treatment integrity of interventions to be unaffected by the consultation approach used; integrity dropped quickly following training on the intervention across all teachers, regardless of whether a consultant collaborated with or prescribed an intervention to them. In fact, Noell (2007) suggested that a collaborative, co-investigator approach in consultation may even threaten treatment integrity by placing more emphasis on changing the behavior of the child rather than that of the teacher. Although the behavior of the child is a necessary focus, the behavior of the teacher in the context of treatment integrity should also receive attention.

Training and feedback.

Research indicates that detailed and direct methods of staff training are most effective in improving initial implementation of an intervention (Dusenbury et al., 2003). For instance, Sterling-Turner, Watson, Wildmon, Watkins, and Little (2001) assessed the effectiveness of didactic training and of direct modeling with performance feedback in teaching undergraduate students to implement a behavioral intervention. Modeling with feedback resulted in higher treatment integrity following training than did didactic instruction. Nonetheless, extensive training alone does not ensure prolonged high treatment integrity. In Mortenson and Witt (1998), for example, teachers demonstrated high levels of treatment integrity following an initial training consisting of component review, discussion of intervention rationale and importance, and in vivo training. Despite this intensive training, treatment integrity dropped below desired levels within a week of intervention implementation.

In response to this concern, much research has focused on the effects of continued performance feedback on teachers' treatment implementation (Gilbertson, Witt, Singletary, & VanDerHeyden, 2007; Jones et al., 1997; Noell, Duhon, et al., 2002; Noell, Witt, Gilbertson, Ranier, & Freeland, 1997; Noell et al., 2000, 2005; Witt et al., 1997). In a performance feedback model, consultants collect treatment integrity data through direct observation or permanent product recording and meet with the teacher to share data on treatment integrity and student behavior. Frequently, the data are graphed to allow for visual inspection of performance. Components implemented incorrectly during the session are identified and practiced, and the importance of missed components is reviewed. The consultant and the teacher problem-solve ways to improve treatment integrity, and praise is provided for all components of the intervention that were implemented correctly (Noell et al., 1997).
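Because graphed data are central to the feedback meeting, consultants may find it useful to plot treatment integrity alongside student behavior across sessions. The following sketch uses Python's matplotlib library with hypothetical data; the 80% reference line is included only for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical session-by-session data prepared for a feedback meeting
sessions = range(1, 9)
integrity = [95, 90, 70, 60, 85, 92, 96, 94]    # % components correct
on_task = [80, 78, 55, 50, 72, 81, 85, 84]      # % intervals on task

fig, ax = plt.subplots()
ax.plot(sessions, integrity, marker="o", label="Treatment integrity (%)")
ax.plot(sessions, on_task, marker="s", label="Student on-task behavior (%)")
ax.axhline(80, linestyle="--", linewidth=1, label="80% reference line")
ax.set_xlabel("Session")
ax.set_ylabel("Percentage")
ax.set_title("Data reviewed during performance feedback")
ax.legend()
plt.show()
```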

To illustrate, Witt et al. (1997), Noell et al. (1997), and Jones et al. (1997) compared daily performance feedback to a more traditional model of classroom consultation. Traditional consultation (didactic instruction and provision of rationale for intervention) was ineffective in maintaining teacher implementation of individualized interventions. However, daily performance feedback was associated with 80% or higher treatment integrity levels in all studies. Consistently high integrity, however, was maintained only while performance feedback was in place; treatment integrity was more variable when feedback was discontinued in the maintenance phase.

Performance feedback is also effective when provided less frequently, such as once weekly (Mortenson & Witt, 1998) or even bi-weekly (Codding, Feinberg, Dunn, & Pace, 2005). These less intensive consulting schedules allow for more prolonged provision of performance feedback, which results in greater maintenance of teacher skills (Codding et al., 2005).

Given the impressive impact of performance feedback on treatment integrity, a number of studies have focused on the essential components of the performance feedback package. Findings indicate that consultant contact alone is not sufficient to increase treatment integrity and that data review is essential (Noell et al., 2000, 2005). The data review must focus on the teacher's performance rather than on student behavior (DiGennaro et al., 2007). Performance feedback sessions that focused only on student behavior or on teacher commitment to improving student behavior were not as effective as full performance feedback sessions (DiGennaro et al., 2007; Noell et al., 2005). Finally, graphing data on teacher behavior appears to be an essential component of performance feedback: sessions conducted without visual graphs were less effective in improving and maintaining treatment integrity (Noell, Duhon, et al., 2002; Hagermoser Sanetti et al., 2007).

The mechanism by which performance feedback affects teacher behavior is as yet undetermined. The results are not merely an outcome of training, as integrity was typically high immediately following training in most studies (e.g., DiGennaro et al., 2005; Mortenson & Witt, 1998; Noell et al., 2000). Performance feedback may maintain high integrity through positive reinforcement of correct intervention implementation (i.e., praise and recognition from the consultant). Alternatively, correct implementation may be negatively reinforced by the avoidance of feedback. Research has focused largely on the latter explanation, noting that performance feedback sessions often are conducted only if the teacher achieves less than perfect treatment integrity during an observation session (e.g., DiGennaro et al., 2005, 2007; Gilbertson et al., 2007). By maintaining high levels of treatment integrity, teachers avoid performance feedback sessions, which may include such undesirable activities as discussion of errors, role play (DiGennaro et al., 2005), or directed rehearsal (Ward, Johnson, & Konukman, 1998) and can be time-consuming for the teacher. One study that examined this relationship (DiGennaro et al., 2005) failed to completely isolate the effects of negative and positive reinforcement on teacher behavior, and further research in this area is warranted.

Uncontrollable Consultation Characteristics

Several elements of the consultation process that may influence integrity appear to be beyond the consultant's control. These include consultee characteristics (e.g., beliefs, attitudes, experiences, and demographics of the treatment agent), client characteristics (e.g., characteristics of the student, including behavior), and environmental characteristics (e.g., the structure of the organization in which the intervention is implemented). For instance, research indicates that more experienced teachers implement interventions more poorly than less experienced teachers (Noell et al., 2000). In addition, poor student behavior significantly predicts low integrity (Hansen, Graham, Wolkenstein, & Rohrbach, 1991).

However, many assumptions in the literature about these uncontrollable characteristics are based largely on conjecture; the little empirical research that has emerged does not indicate that these characteristics affect treatment integrity. For example, teachers' attitudes about the acceptability of the proposed intervention, or the degree to which an individual perceives a treatment procedure as fair, reasonable, appropriate, and unintrusive (Kazdin, 1980), have been hypothesized to affect treatment integrity (Sterling-Turner & Watson, 2002). In contrast, Wickstrom et al. (1998) and Noell et al. (2005) found acceptability to be unrelated to reported integrity levels. Noell et al. also showed teacher acceptability to be unrelated to student outcome, indicating that some teachers found an intervention acceptable even though the student's behavior did not improve.

In summary, results of this literature review suggest that most of the consultation characteristics hypothesized to affect treatment integrity are uncontrollable, too poorly studied to inform practice, or, if studied, have not demonstrated an effect on treatment integrity. In the face of such ambiguity, practitioners should focus on the one characteristic with substantial evidence for its effectiveness in increasing treatment integrity: the direct training and performance feedback provided to caregivers.

Conclusions and Guidelines for Best Practice

Emerging research findings have demonstrated a clear positive relationship between treatment integrity and intervention outcome. Methods used for measuring and evaluating integrity detailed in research provide excellent models for incorporating the assessment of treatment integrity into school-based interventions, and research evidence supporting the effectiveness of performance feedback provides school professionals with the tools needed to maintain treatment integrity. Nonetheless, research is still needed to better understand the impact of consultation characteristics on the maintenance of treatment integrity and to provide practitioners with direction in the face of potential barriers to treatment integrity. Current research indicates that performance feedback may overcome many of these barriers, but further investigation is needed to determine whether additional strategies may be even more effective and efficient. In the meantime, the following guidelines are recommended for monitoring and ensuring high levels of treatment integrity in school settings:

Recognize the positive relationship between treatment integrity and intervention outcome.

Provide clear, operational definitions of the intervention components.

Develop a reliable, accurate method for measuring treatment integrity.

Choose a method for calculating integrity that will provide the most information about the individual treatment components.

Use detailed and direct training methods to teach staff.

Provide on-going consultation consisting of performance feedback to maintain treatment integrity levels over time.

Although additional research is needed to further our understanding of the role of treatment integrity in school-based interventions, behavior analysts and other school professionals now stand to answer a call from the field to include treatment integrity measurement as an integral part of the behavioral intervention process. As such, scientist-practitioners should strive to implement the practices in treatment integrity measurement and maintenance described here to strengthen and maximize the effectiveness of behavior analytic school-based interventions.

Acknowledgments

Special thanks to Sandra Harris, Lara Delmolino, and Suzannah Ferraioli for their helpful comments on earlier versions of this manuscript.
