By Ginger Zielinskie (Center for Civic Futures & Yale SCALE Lab), Sarah Esty (Aspen Institute Financial Security Program), Eric Giannella (Georgetown University Better Government Lab), Anthony Lollo (Yale SCALE Lab), Maria Reyes-Gaskin (U.S. Digital Response)
Partner:
With the passage of the One Big Beautiful Bill Act (OBBBA), many states are now responsible for covering a portion of SNAP benefit costs, tied to the state’s SNAP payment error rate (PER). Based on historical SNAP PERs, states will collectively face an additional $10B annually in SNAP benefit costs beginning in October 2027. The severe budget implications of these penalties could have harmful and lasting effects on the ability of states to administer SNAP. But this is not set in stone. States can act to reduce their SNAP PERs but must move quickly for their efforts to impact their share of program costs in 2027.
States have concrete opportunities to use data, evidence, and rapid cycle analytics to identify and prioritize the sources of SNAP payment errors, remediating errors before they go out the door and preventing errors upstream at the root cause. States already have internal error detection built into processing through supervisor and/or quality assurance (QA) review. But states can make this step more effective by using historical data to develop predictive models on which cases are most likely to have errors, then using those models to prioritize which files go to supervisor or QA review. Given that states have limited resources for case review (so can thoroughly review only a fraction of all cases), and that review adds a step and slows down processing (so reviewing all cases thoroughly wouldn’t be a good idea even if there were resources to do so), analytics-driven prioritization of QA can maximize the effectiveness of this step.
Effective use of historical data can also identify patterns in errors that states can use to improve their processes, communications, training, and navigation supports, preventing errors before they occur. This prevention work should operate across the full journey of a case, addressing issues at application, processing, post-certification, and renewal, and address both client- and caseworker-generated issues.
Good models can help states prioritize the most significant error types (such as a caseworker’s failure to reconcile conflicting income documents vs. a client’s failure to report a new job that substantially increases their income). Then, design research with the caseworkers and clients can help identify potential solutions (like an AI-generated flag that draws caseworkers’ attention to document conflicts or a mid-period text reminder to clients with a history of frequent job changes encouraging them to report any new jobs or increases in income). As they roll out these changes, states should deploy a rapid cycle test-and-learn approach to measure the impact of the new approaches and avoid unintended consequences.

Below, we outline how states can work with the data they have to:
Not all errors are created equal. The Payment Error Rate is only concerned with errors outside of the tolerance threshold ($58 in FY26) or wholly ineligible cases, so fixing small errors is generally less useful. States seeking to find and reduce errors driving PER should specifically focus on errors that are greater than the threshold error. This focus applies to picking which cases to use to train a predictive model, and in any performance management approaches, such as counting which errors a QA team caught and fixed.
If states spend substantial time making minor corrections to benefit allotments this could result in many fixed errors with little impact on PER. For example, one cause of payment errors could be that a case’s SSI income was not updated to reflect annual cost of living adjustments. However, given these adjustments are minor, corrected benefits would only change by $10-20 from this error—an amount which wouldn’t have counted toward PER in the first place. Prioritizing the errors leading to the largest discrepancies in allotted benefits ensures effort is being spent fixing the errors that matter for the SNAP PER.
States have significant flexibility in how they manage QA (which may also include more informal reviews like supervisors’ double-checks of their teams’ case files), which can be optimized for the right balance of effectiveness at fixing material errors, resources devoted to QA, and reduction in case processing timeliness due to additional reviews.
As states are rolling out new QA processes or upstream interventions, they need to know whether those changes are working. However, Payment Error Rate has significant limitations as a day-to-day metric. PER is calculated from a small sample of cases, usually with a 2-3 month delay. That makes it hard to tell in real time whether a specific change is working. States implementing payment accuracy improvements need faster feedback. Here are four types of metrics that can fill the gap:
Error reduction metrics track whether you’re catching and preventing errors at their source. If states are creating or expanding QA reviews of cases, prioritizing change report processing, or asking caseworkers to double-check high-risk case elements before determination, these metrics show whether it’s working. They should track QC-reported errors by source (change reports, specific income types), errors caught through pre-determination review, and the percentage of QA reviews that find errors above the threshold. At a minimum, tracking the percentage of cases pulled into QA without no errors and seeing a decline can help ensure the state’s criteria is balancing workload and actionable cases.
Specific metrics could include:
- QC-reported errors (citing periodic reports, change reports)
- Errors and error dollars found by QA or pre-QA review
- Error dollars or frequency of errors tied to specific error types identified as key drivers (for example, those related to income, verifications, or arithmetic and computation)
Processing efficiency metrics show whether workflow changes are reducing the conditions that lead to errors. If your state has created a team dedicated to change reports, improved case-based processing, or streamlined how clients report changes, these metrics reveal whether backlogs are shrinking. Growing backlogs signal growing risks with respect to errors and timeliness.
Specific metrics could include:
- Time spent sitting in queue (for periodic reporting, change reports, preauthorization reviews, and general cases)
- Average case processing time
- Processing time for client-reported changes or change reports
- Cases processed untimely
- Average queue tasks/cases per worker
Workload management metrics help ensure quality improvements don’t create unsustainable bottlenecks. If you’re reorganizing task queues, moving QA review to pre-determination, or reducing unnecessary case touches, you should track pending case volumes, tasks completed per worker, and time spent sorting through queues. The goal is improving accuracy without increasing worker burden in the long term.
Specific metrics could include:
- Average number of pending periodic reporting and change reports
- Average pending cases
- Percentage of PreAuth reviews that find an error over the threshold
Staff and client outcome metrics capture whether accuracy gains are sustainable, without burning out workers or burdening clients. Accuracy gains that erode worker morale or increase beneficiary burden aren’t durable. On the other hand, if your state is running quality retrospectives, improving feedback loops from QA/QC to frontline staff, or improving worker companion tools like job aides, reductions in errors can even show improvements in client and staff satisfaction.
Specific metrics could include:
- Caseworker job satisfaction, trust, and retention
- Client requests for additional information
- Procedural denials, including missed interviews and missing verifications
These operational metrics won’t replace PER, but they provide the real-time feedback that states need to refine interventions before annual results arrive.
In tracking the metrics above to see if interventions are working well, states will need to be able to make comparisons—either to a pre-intervention baseline, or to a control group, and tweak the intervention approaches based on real-time feedback on how things are going. These “test and learn” methods include:
Randomize the intervention or process change: Randomizing which staff or cases are subject to the change will provide the strongest evidence for the effectiveness of that change. Randomization is also a useful tool to promote equity and fairness when capacity is constrained and the intervention cannot reach every case. Although randomization is the gold standard for academic research, it doesn’t have to be run at a giant scale for a long time to produce statistically rigorous and practical results; even a short test of 1-2 weeks might be enough to show if an intervention is working well enough to roll out to the full caseload.
Stagger implementation of intervention or process change: Randomization is not always feasible. Staggering the roll-out of changes provides an evaluation alternative to randomization that offers teams the chance to understand whether or not the intervention is effective. It also allows for testing and refining an approach before scaling. And many implementations are already phased rollouts due to the need to train staff or bring new tools online, so there are built-in opportunities to measure results in places that went first.
Measure change against a baseline: If a change has to be rolled out population-wide all at once, states can still measure the impact by collecting a pre-intervention baseline and measuring outcomes at specific intervals after the rollout to track the impact. If the changes are not working as expected, the state may need to roll back the change or revise it further.
Monitor PER improvements alongside other program and operational goals: Actions taken to reduce SNAP PER may have other system impacts, such as increasing denial rates or reducing timeliness. While undertaking PER reduction efforts, states will need to monitor multiple aspects of system functioning to provide a comprehensive view of the impact of interventions and identify if rebalancing across goals may be necessary. To achieve this more holistic system evaluation, states should consider:
The OBBBA has changed the financial stakes of SNAP payment accuracy. Although reducing PER is difficult, especially with limited staffing and financial resources, states can nonetheless make progress by focusing limited capacity on the errors that most influence PER, making their QA as effective as possible, tracking proxies to measure changes in close-to-real time, and rapidly iterating on improvement initiatives based on what the data shows. The timeline is tight, but the tools are in states’ hands. The key now is moving quickly, measuring consistently, and turning data into durable operational change.
This article was originally featured on the Safety Response Network.