In examining the three categories that showed statistically significant improvement, the equipment category (which focuses on the appropriate maintenance and certification of equipment as well as the overall structure of the laboratory building) showed the most improvement. Laboratories drove this positive change by completing recommendations to implement maintenance agreements and annual calibration of equipment and by instituting regular testing of backup generators. The QA/QC category focuses on processes and procedures designed to minimize error, and the management category highlights overall laboratory processes such as inventory control. The categories with statistically significant gains are therefore more procedural in nature: by simply ensuring practices were written into standard operating procedures and made available to staff, laboratories improved both their quality and their overall management practices. The five remaining categories (virology laboratory, molecular laboratory, specimen collection, safety, and NIC criteria) showed improvement that was not statistically significant. These categories depend more heavily on human resources and training. The skills needed to isolate viruses, perform molecular procedures, and collect specimens take time, staff, and training effort to build, so significant improvement may take longer to appear. The safety category reviews use of personal protective equipment (PPE) and the inclusion of safety procedures in laboratory practices. In addition, two of these categories, molecular laboratory and specimen collection, were among the highest scoring in the initial assessments, making statistically significant improvement harder to achieve.
To validate that the positive improvement reflected actual changes in work processes and procedures rather than changes in the tool or the use of different assessors, we asked assessors to review the recommendations from the first assessment and document the status of each, confirming where improvements had actually been made. Items such as the lack of equipment maintenance agreements, inventory control or ordering procedures, written crisis plans, and protocols for testing were examples of recommendations in the initial assessments. By the subsequent assessments, these items had been addressed or were in the process of being completed. As further examples of progress, laboratories had purchased equipment maintenance agreements or hired maintenance staff, and had developed and implemented written protocols for testing and biosafety. A recent article provides a detailed analysis of all the recommendations made during the first assessment and their documented status, garnered either from a follow-up assessment or from laboratory personnel updates [10].
Recommendations provided the information necessary to understand the needs and gaps in each country. In addition, comparing an individual country's percentages for each category between the first and second assessments is useful for identifying areas of need. Charts displaying each category's status across sequential assessments have been incorporated into our standard assessment reports, providing a quick visual display of gains or losses between assessments. The absence of a value for a particular year indicates a 0% score in that category for that year. A representative chart from one country is shown in Fig. 1.
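As a rough illustration of how such a chart can be produced, the Python sketch below plots category percentages for two sequential assessments side by side; the category names follow the tool's categories, but the scores are hypothetical values chosen for illustration only, not data from this study.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical category percentages for two sequential assessments
# (illustrative values only, not actual study data).
categories = ["Equipment", "QA/QC", "Management", "Virology",
              "Molecular", "Specimen collection", "Safety", "NIC criteria"]
first_assessment = [40, 55, 50, 70, 85, 80, 60, 65]    # percent scores, first visit
second_assessment = [75, 80, 78, 78, 90, 88, 72, 70]   # percent scores, second visit

x = np.arange(len(categories))   # category positions on the x-axis
width = 0.35                     # bar width for side-by-side comparison

fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(x - width / 2, first_assessment, width, label="First assessment")
ax.bar(x + width / 2, second_assessment, width, label="Second assessment")
ax.set_ylabel("Category score (%)")
ax.set_xticks(x)
ax.set_xticklabels(categories, rotation=30, ha="right")
ax.legend()
fig.tight_layout()
plt.show()
```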
A well-designed tool should identify gaps to assist in developing targeted training programs. By aggregating information from multiple countries within a Region, we were able to identify and assess common regional gaps, and the identified gaps informed the development of laboratory training content. For example, data and recommendations from the 2009 to 2011 assessments were aggregated and used to develop the curriculum for both laboratory management and biosafety trainings. These trainings included topics such as "Human Resources Basics, Biosafety for Lab Managers, Quality Assurance and Quality Control, Inventory Management…" The trainings were offered in three different regions in 2011 (South Africa), 2012 (Bangkok), and 2014 (Istanbul). Of the 17 laboratories assessed, 10 (59%) attended one of these trainings between their first and second assessments. Of these 10 countries, 9 improved their overall management scores (quality assurance, inventory management, human resources, standard operating procedures, etc.), with the remaining country maintaining the same score. In the laboratory safety category, 9 of the 10 countries likewise improved, with the remaining country's score decreasing slightly. The training topics largely match the areas of the tool in which subsequent assessments yielded statistically significant improvement overall. While we cannot attribute the gains precisely to our training, the documentation of laboratory needs followed by the design of training to address those needs is compelling. Trainings addressing additional gaps, such as PCR and virus isolation, are also conducted as needed.
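A minimal sketch of this aggregation step is shown below, assuming per-country category percentages are available in tabular form; the country labels, scores, and the 60% threshold are assumptions for illustration, not values from the study.

```python
from statistics import mean

# Hypothetical first-assessment scores per country and category
# (illustrative values only, not actual study data).
scores = {
    "Country A": {"Management": 45, "Safety": 50, "QA/QC": 40, "Molecular": 85},
    "Country B": {"Management": 55, "Safety": 48, "QA/QC": 52, "Molecular": 90},
    "Country C": {"Management": 38, "Safety": 60, "QA/QC": 44, "Molecular": 80},
}

# Average each category across countries to surface common regional gaps.
categories = next(iter(scores.values())).keys()
regional_means = {c: mean(country[c] for country in scores.values()) for c in categories}

# Flag categories below an (assumed) 60% threshold as candidate training topics,
# lowest-scoring first.
gaps = {c: s for c, s in sorted(regional_means.items(), key=lambda kv: kv[1]) if s < 60}
print("Candidate training topics:", gaps)
```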
One difficulty, and a potential limitation, in conducting assessments with a common tool and different assessors is inter-rater consistency. Guidance for assessors using the original tool was provided through a webinar that presented the purpose of the tool and the assessment, along with a brief overview and instructions. In developing the revised tool, we sought to address inter-rater consistency by creating a guidance document to minimize potential issues and support consistency across different time intervals, assessors, and laboratories. The guidance document contains specific instructions and examples to assist with answering all questions. Assessors were trained prior to performing assessments with the revised tool, including exercises on inter-rater consistency. The lack of a significant effect from the use of different assessors versus the same assessor over time suggests that the inter-rater consistency exercises, guidance document, and training have been effective. Another limitation of this study is the small number of laboratories relative to the number of parameters in the model; the small sample size resulted in the wide confidence intervals observed in this study.
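One common way to quantify agreement during such inter-rater exercises is a chance-corrected statistic such as Cohen's kappa. The sketch below assumes two assessors have independently scored the same yes/no checklist items; the item scores are hypothetical and the kappa calculation is a standard formulation, not a method described in the study.

```python
from collections import Counter

# Hypothetical yes/no checklist scores from two assessors reviewing the
# same laboratory during an inter-rater consistency exercise.
assessor_1 = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]
assessor_2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes", "no"]

n = len(assessor_1)
observed = sum(a == b for a, b in zip(assessor_1, assessor_2)) / n  # observed agreement

# Expected agreement by chance, from each assessor's marginal frequencies.
counts_1, counts_2 = Counter(assessor_1), Counter(assessor_2)
expected = sum((counts_1[c] / n) * (counts_2[c] / n)
               for c in set(assessor_1) | set(assessor_2))

kappa = (observed - expected) / (1 - expected)  # Cohen's kappa
print(f"Observed agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")
```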
We were able to document a slight effect of the length of time between assessments on score variability. Analysis shows that while there is positive movement across all assessments, there is a statistically significant difference when the interval between assessments is greater than 27 months; this cut-point was driven by the available dataset. A limitation identified within this assessment process is that improvement is driven in large part by a willingness to improve as well as by the infrastructure of the country's Ministry of Health. To allow time for impact and improvement, assessments should be conducted far enough apart for the identified recommendations to be implemented. As more data are collected, this analysis could be revisited to test whether a shorter or longer interval between assessments is statistically significant, and more data points are needed to assess the actual impact of the time span on quality improvement. More importantly, these data can help us determine the optimal interval between assessments to maximize the use of travel and assessor resources.
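A minimal sketch of how the interval effect could be re-examined as more data accumulate is shown below. It assumes each record holds the months between a pair of assessments and the change in overall score, split at the 27-month cut-point; all values are hypothetical, and the simple two-sample test stands in for whatever model the full analysis would use.

```python
from scipy import stats

# Hypothetical paired-assessment records:
# (months between assessments, change in overall % score).
records = [(14, 3), (18, 5), (22, 4), (25, 6), (30, 12), (33, 10), (36, 15), (41, 14)]

# Split score changes at the 27-month cut-point used in the current dataset.
short_interval = [delta for months, delta in records if months <= 27]
long_interval = [delta for months, delta in records if months > 27]

# Two-sample Welch t-test comparing improvement for shorter vs. longer intervals;
# with real data a mixed model or non-parametric test may be more appropriate.
t_stat, p_value = stats.ttest_ind(long_interval, short_interval, equal_var=False)
print(f"Mean change <=27 months: {sum(short_interval) / len(short_interval):.1f}")
print(f"Mean change  >27 months: {sum(long_interval) / len(long_interval):.1f}")
print(f"Welch t = {t_stat:.2f}, p = {p_value:.3f}")
```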
While programs strive for continuous improvement, revising our tool to add an analytic framework posed challenges. Performance measurement is a key component of analysis, and an applied quantitative framework that facilitates performance measurement should be included as part of the tool development process. Including this component during development allows the current status to be documented and action plans and recommendations for issue resolution to be developed.