SMS review finds flaws, proposes new statistical model
A congressionally mandated report from the National Academies of Sciences, Engineering, and Medicine recommends a replacement for the Safety Measurement System, but it also concludes that SMS’s structure is reasonable and that its method of identifying motor carriers for alert status is defendable. A copy of the National Academies report is available at www.nap.edu/read/24818.
The report, required by the 2015 Fixing America’s Surface Transportation (FAST) Act, also calls for some undoubtedly controversial additions to the types of data the Federal Motor Carrier Safety Administration collects. However, the National Academies report leaves unsettled one of the most controversial disagreements among stakeholders: whether SMS percentiles should be public. It recommends another study “to better understand the statistical operating characteristics of the percentile ranks to support decisions regarding the usability of public scores.”
A 12-member committee assembled by the National Academies acknowledged some of the major alleged data sufficiency and fairness flaws in SMS, such as geographical disparities in enforcement and limited use of data on clean inspections. But because multiple factors contribute to crashes and because crashes are relatively rare for small carriers, the committee concluded that FMCSA’s approach based on a prevention model rather than a prediction model is sensible:
…SMS has the objective of identifying carriers that give too little priority to practices indicative of safety performance. By intervening with those carriers, the hope is to encourage them to modify their behavior and, by so doing, reduce future crashes. We believe that the general approach taken by SMS is sound, and shares much with similar programs in other areas of transportation safety. Further, we have examined, to the extent possible, the various issues that have been raised in criticism of SMS. We have found, for the most part, that the current SMS implementation is defendable as being fair and not overtly biased against various types of carriers, to the extent that data on MCMIS can be used for this purpose.
The committee recommended some steps to improve data quality and increase the types of data elements FMCSA considers, but it concluded that some data flaws and limitations in today’s SMS – differing inspection performance by state, volatility of SMS data and reliance on nonpreventable crashes, for example – probably cannot realistically be fixed due to cost, dependence on state enforcement agencies or the inherent tradeoff between measuring many carriers and measuring individual carriers precisely. For example:
There is no getting around the point that providing BASIC measures to carriers that have very infrequent inspections will result in highly variable assessments of such carriers. This is simply because not much is known about the frequency of violations for small carriers. Such high variance measures can result in mischaracterizing the nature of a carrier—the high variability could result in the carrier being given alerts more or less often than what would be warranted given its behavior. On the other hand, the industry is highly skewed, being comprised of a very large number of small carriers. If the data sufficiency standards were raised, a high percentage of the industry would be excluded from measurement by SMS and therefore monitoring by FMCSA. We believe that this issue should be further investigated.
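To make the committee’s point about infrequent inspections concrete, here is a minimal Python sketch. It is not how SMS computes BASIC measures, which also apply severity and time weights. It simply assumes each inspection is an independent trial that either finds a violation or does not, and computes an approximate 95 percent Wilson score interval for the violation rate. The function name `wilson_interval` and the inspection counts are hypothetical, chosen only for illustration.

```python
import math


def wilson_interval(violations, inspections, z=1.96):
    """Approximate Wilson score interval for a violation proportion.

    Assumption (not from the report): inspections are independent
    trials, each of which either records a violation or does not.
    z=1.96 corresponds to roughly 95 percent coverage.
    """
    if inspections <= 0:
        raise ValueError("inspections must be positive")
    if not 0 <= violations <= inspections:
        raise ValueError("violations must be between 0 and inspections")

    p_hat = violations / inspections
    denom = 1 + z**2 / inspections
    center = (p_hat + z**2 / (2 * inspections)) / denom
    half_width = (
        z
        * math.sqrt(
            p_hat * (1 - p_hat) / inspections
            + z**2 / (4 * inspections**2)
        )
        / denom
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Two hypothetical carriers with the same observed 40 percent rate.
    for violations, inspections in [(2, 5), (200, 500)]:
        low, high = wilson_interval(violations, inspections)
        print(f"{violations}/{inspections}: {low:.3f} to {high:.3f}")
```

Under these assumptions, both hypothetical carriers show the same observed rate, but the interval for the carrier with five inspections is several times wider than the one for the carrier with 500. That is the tradeoff the committee describes: a measure built on very few inspections says little about the carrier’s underlying behavior.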
On the issue of nonpreventable crashes, the report acknowledges that it “is an important issue, especially for small carriers, since such events can be extremely damaging, possibly putting some small carriers out of business.” However, some considerations complicate the notion that such crashes be set aside, the report states. First, many crashes that were not the carrier’s fault “might have been prevented by drivers who took a more defensive approach to operating their vehicles.” Also, “it would be difficult to create an algorithm that would take as input the evidence at the scene of a crash and determine which crashes were and were not preventable.” And even if such information could be incorporated, there are questions about the reliability of police accident reports, the report states.
The National Academies committee struggled with the concept of safety event groups. While it recognized some statistical rationale for safety event groups, it did not accept the common justification that they reflect carriers’ differing scopes and resources – such as the ability to afford technology and fatigue management programs. “We do not feel this is a reasonable justification for this peer group stratification. After all, the public has an interest in safe operations regardless of the size of the carrier.” Another problem is that for safety event groups to work well, carriers in the groups need to be as close in size as possible to reduce variability and randomness in data, but that would be difficult, the committee concluded. Instead, it said FMCSA should consider providing a confidence interval along with relative percentiles so that insurance companies, shippers, and the public will understand that a high percentile rank could have been due to randomness.
Likewise, “it would be desirable for FMCSA to take the natural variability of the percentile ranks into consideration in the determination of which carriers receive interventions rather than just treating the percentile ranks as fixed quantities.”
Item response theory
Given limitations with SMS, the committee recommended that a new approach be developed over the next two years based on a concept known as “item response theory” (IRT). The result would be a complex statistical model designed to adjust for the flaws and uncertainties inherent in today’s system. The approach would resemble statistical models used in assessing performance related to health care – hardly surprising given that four of the 12 panel members are professors in the medical field.
The National Academies report concludes that an IRT model could have many advantages over SMS. For example, it would rely on current observed data rather than dated information and could account for the variability of scores and rankings that especially plagues small carriers’ SMS results. An IRT model also could account for the probability of a carrier being selected for an inspection and provide a basis to evaluate the structure of the current Behavior Analysis and Safety Improvement Categories (BASICs), including which violations go into which BASIC. Other advantages include assessing time weighting better than SMS and allowing for “the addition of new safety measures without having to start from scratch.”
However, carriers and even FMCSA itself basically would have to trust that the new model is better than SMS. The committee noted that “the proposed Bayesian IRT model, which involves use of 20-30 million observations and hundreds of variables to estimate hundreds of model parameters, is something that requires very specific expertise, usually found in academic statisticians who carry out research on these specific models. The sparsity of data and other aspects of the problem are likely to raise some computational complexities that would require software development.” The panel suggested that once the model was developed, “FMCSA staff would be very capable of maintaining” it.
More and better SMS data
While a new statistical model was the most fundamental change proposed in the National Academies report, the committee did recommend some steps to enhance SMS in its current form. For example, FMCSA should consider making reporting of clean inspections mandatory and should change its data sufficiency standards to accept carriers with a sufficient number of only clean inspections.
FMCSA also should continue to work with states and other agencies to improve the quality of Motor Carrier Management Information System (MCMIS) data, the report states. “Two specific data elements require immediate attention: carrier exposure and crash data.” The report found that current exposure data are missing with high frequency and that data that are collected “are likely of unsatisfactory quality.” FMCSA needs better-quality vehicle miles traveled (VMT) data, and it needs to collect that information by state and month. FMCSA potentially could get better information on exposure by working with state taxing authorities or, once the mandate takes effect in December, using data from electronic logging devices, the report states. The latter might require legislation, however, as FMCSA officials so far have interpreted a provision in the 2012 ELD mandate legislation as barring the agency from using ELD data for any purpose other than determining hours-of-service compliance.
The panel also concluded that “there is information available from police reports currently not represented on MCMIS that could be helpful in understanding the contributing factors in a crash.” Such information could help to validate assumptions linking violations to crash frequency, the committee concluded.
It recommended that FMCSA support the states in collecting more complete crash data and encourage the universal adoption of the Model Minimum Uniform Crash Criteria.
Perhaps the most controversial recommendation in the National Academies report relates to additional data elements the committee proposes for MCMIS to assess how certain carrier attributes or practices might affect safety. The least sensitive proposed data point is better information on the type of cargo carried than what is collected today through the MCS-150. Much more controversial are the report’s recommendations for information on turnover rate and driver compensation. Regarding turnover, the report states that the information “could be very predictive of a company’s treatment of its employees, which could be related to safety operations.”