
Abstract

Sentencing disparity among similar offenders has increased at a disconcerting rate over the last decade. Some judges issue sentences twice as harsh as peer judges, meaning that a defendant’s sentence substantially depends on which judge is randomly assigned to a case. The old mandatory sentencing guidelines repressed disparity but only by causing unwarranted uniformity. The advisory guidelines swing the pendulum toward the opposite extreme, and this problem promises to grow worse as the lingering effect of the old regime continues to decrease.

This Article is the first to propose a system—data-driven appellate review—that curbs sentencing disparity without re-introducing unwarranted uniformity. Congress should establish a rebuttable presumption that outlier sentences among similar offenders are unreasonable. The U.S. Sentencing Commission collects data on over 70,000 criminal cases annually. This data provides the tool for defining categories of similar offenders. Culling outlier sentences through data-driven appellate review would increase judicial awareness of sentences issued by peer judges and would therefore curb the increase in inter-judge disparity without resorting to unwarranted uniformity.

I. Introduction

II. Current State of Sentencing

A. Genesis

1. The Lead-up to Booker

2. Booker and Its Progeny

B. Problems Created by Booker

C. Recent Studies Demonstrating Inter-Judge Disparity

1. Boston Study

2. Nebraska Study

3. California Study

4. Syracuse TRAC Reports

5. Multi-District Empirical Analysis

D. Limitations of These Studies

III. A Proposal for Presumed Unreasonableness

IV. Methodology for Determining Which Defendants Are Similarly Situated

A. Limitation to Specific Offense Categories

B. Sample Size

C. Choice of Factors

1. District of Sentence

2. Other Factors

3. The Complexity of Guideline Amendments

V. Reform in Action and Lessons Learned from the Data

A. Lessons and Data from Massachusetts

B. Lessons and Data from W.D. Mo.

VI. The Proposal’s Benefits

A. Distinctions from Existing Proposals

1. Systematic Reforms

2. Calls for Stronger Appellate Review

3. Trailing-Edge Guidelines

B. The Proposal Is Sound Policy

1. Direct Effect on Judges

2. Minimal Effect on the Anchoring Properties of the Guidelines

3. Effect on Non-Judicial Sources of Disparity

4. Further Positive Effects

5. The Limits of the Proposal

C. The Proposal Is Constitutional

VII. Conclusion

Appendix

I. Introduction

Federal sentencing policy leaves a substantial portion of each sentence subject to the identity of the sentencing judge. A card randomly drawn from a shuffled deck—a method some courts use to assign judges to cases[2]—can alter the length of a sentence by several years. Between 2007 and 2011, one judge in the District of Nebraska sentenced drug offenders to a median 60 months while a colleague on the same court sentenced drug offenders to a median twice that amount.[3] This inter-judge disparity has swelled since the Supreme Court decision in United States v. Booker[4] rendered the sentencing guidelines advisory, and the disparity shows signs of only increasing.

Booker nullified an important part of the Sentencing Reform Act (SRA)—the Act that created the Sentencing Commission and clothed it with authority to create the mandatory Sentencing Guidelines. Reducing inter-judge disparity was a central aim—if not the central aim[5]—of the Act.[6] Congress sought to achieve this aim by pursuing an ideal of uniformity that required judges to sentence defendants within complicated and narrow specified guideline ranges.[7] The SRA had no shortage of faults,[8] and it reduced inter-judge disparity at the cost of increasing unwarranted uniformity,[9] but it successfully reduced inter-judge disparity in measurable ways through the mandatory guidelines.[10]

In a post-Booker world where pre-defined guideline ranges are only advisory, the SRA has lost the element most responsible for reducing inter-judge disparity, and nothing remotely sufficient has replaced that element. Instead, “copious evidence” now shows that Booker decimated the SRA’s central purpose.[11] To be sure, Booker is responsible for only a slight decrease in aggregate sentence averages (although average drug sentences decreased substantially).[12] But Booker instigated significant disparity between the individual sentences that factor into those aggregate averages by augmenting judicial discretion and reducing meaningful appellate review.[13] Booker has at least doubled how important a specific judge is to the outcome of a sentence in certain districts, and the effect is even greater when controlling for mandatory minimum sentences, which create an artificial uniformity that masks sentence variation.[14] The example from the District of Nebraska also understates the extent of inter-judge disparity because judges in that district adhere to the advisory guidelines at higher rates than average and so experience below-average disparity.[15] This disparity is driven not only by differences in judicial philosophy, but also by a simple lack of awareness of sentences imposed by peer judges.[16]

This problem is likely to intensify. Although many judges continue to issue sentences within the Guidelines, that practice is attributable in part to the inertia of judges who sentenced under the mandatory regime. One should not expect newly appointed judges who never sentenced under a mandatory regime to exhibit those same tendencies.[17]

What is to be done with a system that does not achieve its central aim? No shortage of scholars have suggested abolishing and replacing the Guidelines.[18] Yet the majority of judges favor the Guidelines in their advisory state.[19] Judges tend to appreciate the Guidelines and find them helpful;[20] presumably, they disfavor revolutionary departures from the current system. Moreover, the existing proposals seek other goals, not a reduction in inter-judge disparity. This Article contributes to the literature the first reform tailored specifically to inter-judge disparity in the post-Booker world, and it does so by being the first to delve deeply into the sentencing data the Commission collects and the first to suggest a reform that supplements the system instead of replacing it.

In particular, this Article establishes that direct methods of attacking inter-judge disparity are inappropriate because they invariably produce unwarranted uniformity, which creates its own inequities. Instead, Congress should attack inter-judge disparity indirectly. By creating a rebuttable presumption that sentences are unreasonable when they stray too far from the median sentence imposed on similarly situated offenders, Congress can 1) induce judges to provide greater justifications for their sentences and 2) make judges more aware of the sentences imposed by peer judges, an awareness that is currently lacking.[21] These two effects would ameliorate inter-judge disparity. Owing to the obligation of the U.S. Sentencing Commission to collect and publish voluminous sentencing data,[22] the resources necessary to define categories of similarly situated offenders and to review the sentences of those offenders are available. This data, combined with modern technology, permits this critical proposal to move forward in a way unfathomable when Congress passed the SRA because it allows courts to readily construct categories of similarly situated offenders and calculate the range of sentences issued to that category.
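The core mechanic of the proposal, flagging sentences that stray too far from the median imposed on similarly situated offenders, can be illustrated with a minimal Python sketch. The function name `flag_outliers`, the sample sentences, and especially the 25 percent tolerance are hypothetical choices made for illustration; the passage above does not fix a numerical threshold for "too far."

```python
from statistics import median


def flag_outliers(sentences_months, tolerance=0.25):
    """Return the median sentence and the sentences outside the tolerance band.

    `tolerance` is a hypothetical parameter: a sentence is flagged (that is,
    would trigger the rebuttable presumption of unreasonableness) when it
    differs from the category median by more than this fraction.
    """
    mid = median(sentences_months)
    low, high = mid * (1 - tolerance), mid * (1 + tolerance)
    flagged = [s for s in sentences_months if s < low or s > high]
    return mid, flagged


# Hypothetical sentences (in months) for one category of similar offenders.
category = [70, 70, 70, 72, 75, 84, 120]
mid, flagged = flag_outliers(category)
print(f"median = {mid} months; presumptively unreasonable: {flagged}")
```

Under these assumed numbers only the 120-month sentence falls outside the band. A flagged sentence would not be reversed automatically; the sentencing judge could still rebut the presumption with a sufficient justification.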

Other articles have correctly identified the main problem in the post-Booker world—a lack of meaningful appellate review—but none has constructed the contours of an appellate solution or established why it is feasible. This proposal is the first to do so. Yet Booker struck down stronger appellate review in making the Guidelines advisory,[23] and several scholars believe that increasing appellate review might unconstitutionally violate the remedial holding of Booker.[24] Unlike articles that limited their analysis to generic calls for greater appellate review, this Article resolves the constitutional arguments against an increase in appellate scrutiny. Finally, the literature has not yet addressed inter-judge disparity that occurs within guideline ranges. Within-guideline sentences occur overwhelmingly at the bottom of each range—96.6% of judges sentenced at the bottom of the range in the pre-Booker era—meaning that an offender who draws a judge not in conformity with the norm will be subject to inter-judge disparity.[25]

Figure 1: Sentences for Certain Drug Convictions, D. Neb.

Figure 1[26] shows the distribution of sentences issued to defendants in the District of Nebraska bearing the same major offender characteristics: statutory sentencing guideline, guideline range (70-87 months), criminal history, acceptance of responsibility, and whether a defendant was subject to and sentenced to a mandatory minimum.[27] This figure shows the dominant practice of sentencing offenders toward the bottom of a guideline range (70 months in this example). The literature has failed to ask whether the offender who was sentenced to 84 months experienced inter-judge disparity despite being sentenced within the calculated guideline range. This Article creates a system of appellate review able to address that question.
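A distribution of the kind described for Figure 1 can be built by grouping records on the shared characteristics and counting how often each sentence length occurs. The Python sketch below assumes a hypothetical record layout; the field names and sample values are invented for illustration and are not the Commission's actual variable names or data.

```python
from collections import Counter, defaultdict

# Hypothetical field names standing in for the characteristics listed above.
KEY_FIELDS = ("guideline", "range_months", "criminal_history",
              "acceptance", "mandatory_minimum")


def sentence_distributions(records):
    """Map each category of similar offenders to a count of sentence lengths."""
    groups = defaultdict(Counter)
    for record in records:
        key = tuple(record[field] for field in KEY_FIELDS)
        groups[key][record["sentence_months"]] += 1
    return groups


# Invented records sharing one category (a 70-87 month guideline range).
base = {"guideline": "drug-guideline", "range_months": (70, 87),
        "criminal_history": "I", "acceptance": True,
        "mandatory_minimum": False}
records = [dict(base, sentence_months=m) for m in (70, 70, 70, 78, 84)]

for category, counts in sentence_distributions(records).items():
    print(category, sorted(counts.items()))
```

With the distribution in hand, a court could see at a glance where a given within-range sentence, such as one at 84 months, sits relative to the sentences peers imposed on the same category.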

In Part II, this Article describes the genesis of the Guidelines. It briefly explains Booker and the Supreme Court’s relevant subsequent decisions before synthesizing the studies that show a substantial increase in post-Booker inter-judge disparity. In Part III, this Article explains the proposal in more detail and suggests a statutory change that would lend effect to the proposal. Part IV describes how the data that the Commission collects can be used 1) to create categories of similarly situated offenders and 2) to discern what sentences were issued to those offenders. This proposal can only work under a relatively simple category formula rather than the more complicated formula proposed by at least one other author for a different reform purpose.

Part V establishes how this method of selecting similarly situated offenders works and provides several examples that display the distribution of sentences among different offenders. In Part VI, this Article discusses both the expected results of this reform on inter-judge disparity and the limits of the proposal. In doing so, it contrasts this proposal with others and shows that this reform is more able to reduce inter-judge disparity. Additionally, this Article undertakes the critical task of showing that the reform is constitutional and that other reforms may not be. This reform is constitutional because it operates outside the structure of the Guidelines.

II. Current State of Sentencing

An understanding of where we stand now on sentencing policy is incomplete without at least a general grasp of the historical developments that produced it.[28]

A. Genesis

1. The Lead-up to Booker

In early colonial history, sentencing decisions were commonly made by juries, in part because penitentiaries were not yet common. Without prisons, convictions were often linked with one specific sentence—often death—so no rigorous sentencing mechanism was needed.[29] Once penitentiaries became more common, federal law frequently provided fixed statutory sentences[30] but then moved into defining open ranges in which judges could sentence.[31] The pre-defined ranges were indeterminate, and federal law provided for virtually no appellate review, which was unusual among common law countries.[32] This made it almost impossible to develop federal common law on sentencing issues.[33]

In the 20th century, sentencing law structurally changed in correlation with a change in sentencing philosophy. Early sentencing philosophy was largely retributive.[34] By the middle of the 20th century, rehabilitative philosophies dominated and inherently required broad discretion.[35] But rehabilitation soon gave way to rationalism and scientific rigor.[36] The ideal of uniformity arrived,[37] and with it judicial discretion was less necessary.[38] A system based on rehabilitation required the careful judgment of judicial actors; a scientifically rigorous, algorithmic system, however, was naturally more retributive and made Congress and the public the experts.[39]

One example showcasing the new paradigm of scientific rigor is the American Law Institute’s attempt to “rationaliz[e] reforms in the area of crime definition” by instituting the Model Penal Code.[40] At the time, the Supreme Court was busy fundamentally reshaping criminal procedure but was only doing so at the adjudication stage, leaving sentencing largely untouched.[41] This made sentencing appear anti-scientific and non-rigorous[42] and invited strong criticism against a system that many viewed as depending on the whims of judges and parole authorities.[43]

During the push for the SRA, scholars argued that sentence variation was partly explained by the temperaments of individual judges and sensitive defendant characteristics, such as race, education, and class.[44] This culminated in a virtual crisis when Marvin Frankel, then a judge on the Southern District of New York and the “father of sentencing reform,”[45] created a questionnaire of hypothetical cases. The questionnaire was mailed to federal judges in the Second Circuit. The results evinced a “glaring disparity” in sentencing practices on identical cases.[46] In eighty percent of the hypothetical cases, judges disagreed on whether incarceration was even appropriate; in one hypothetical extortion case, the most lenient judge would have imposed only a three-year sentence compared to the twenty-year sentence of a harsher judge.[47]

The desire to rationalize sentencing and make it scientifically rigorous finally led to the SRA.[48] Only a single voter in the Senate rejected the bill.[49] But by that time, the SRA had morphed into a tool bent on creating utopia. Senator Ted Kennedy, one of the principal architects, had first tried to create advisory Guidelines,[50] but the SRA as passed instead sought to reduce inter-judge disparity through brute force by 1) limiting judges to sentences within the calculated guideline range for each specific offender and 2) granting appellate courts broad appellate review.[51] A sentence outside the calculated guideline range was to be reversed except under a few authorized departure situations.[52] Congress further ramped up enforcement of the Guidelines in 2003—in apparent reaction to higher-than-acceptable departure rates—by creating de novo appellate review.[53] Originally intended to apply to all unenumerated downward departures,[54] the 2003 statute ultimately only made it harder to depart in crimes involving pornography, sexual abuse, child sex, and child kidnapping and trafficking.[55] The SRA turned a process that had been intended to be more humane and acceptable to offenders into one designed to discourage crime and encourage cooperation in the name of uniformity. This model, centered on an ideal of uniformity, was largely rejected by the states.[56]

The SRA created the Sentencing Commission[57] and required it to create a comprehensive, binding[58] sentencing code that controlled essentially all significant sentencing decisions.[59] Congress directed the Commission to create a rigorous and comprehensive system that would “reflect every important factor relevant to sentencing for each category of offense and each category of offender, give appropriate weight to each factor, and deal with various combinations of factors.”[60] In addition to the comprehensive system of sentencing, the Commission included “commentaries” and “policy statements” that delineated the purpose of the Guidelines. The Court interpreted these to be as authoritative as the Guidelines themselves.[61]

The desire to introduce scientific rigor into sentencing certainly found its imprint on the SRA. As written, the SRA requires judges to follow a lengthy process of discerning (by preponderance of evidence) whether any number of voluminous adjustments[62] applies to an offender. A judge first determines which Guidelines category covers the offender before applying numerous “specific offense characteristics” to calculate a “base offense level” score. This process “reflect[s] the Commission’s determination to minimize the need for sentencing judges to exercise their judgment.”[63] The task is often difficult. Categories often cross-reference each other,[64] and a judge may have to compute multiple calculations. After the base offense level is determined, a judge finds the Chapters 2 and 3 adjustments for an offender’s role in the crime, the vulnerability of the victim, etc. Then a judge calculates an offender’s criminal history.

A judge then locates the intersection of the criminal history and offense level scores on the Commission’s matrix chart. Each intersection yields a narrow range where the top generally cannot exceed the bottom by more than twenty-five percent.[65] Still not yet finished, the judge then takes into account several dozen Guidelines policy prescriptions that dictate whether a judge can depart from the calculated guideline range in narrow circumstances.

2. Booker and Its Progeny

This rigorous and binding system survived for eighteen years until United States v. Booker made the Guidelines effectively advisory. The Court reasoned that a Sixth Amendment violation occurs when a maximum sentence is increased based on facts proved only by preponderance to a judge, not beyond a reasonable doubt to a jury.[66] Elements that increase the minimum sentence a court can impose must also be proved before a jury.[67]

In fashioning a remedy, the Court could have simply required prosecutors to present elements to a jury.[68] Instead, the Court, in an attempt to divine Congressional intent,[69] excised from the SRA two provisions. The first required de novo review. The Court found that the SRA, in light of the Court’s constitutional holding, implied a substitute standard of “reasonableness” review.[70] The second provision had made the Guidelines mandatory. Excising that provision, the Court required judges to both consider whether a sentence was reasonable in light of the § 3553(a) statutory factors that Congress had identified as relevant to sentencing and to calculate and consider the Guidelines before sentencing a defendant.[71] Calculating the Guidelines is important because “in the ordinary case, the Commission’s recommendation of a sentencing range will ‘reflect a rough approximation of sentences that might achieve § 3553(a)’s objectives.’”[72]

Booker quickly became a subject of both scholarly and judicial interest. One of the most-cited Supreme Court cases,[73] Booker also spawned a lengthy doctrine.[74] Most importantly, the Court expounded on its promulgation of reasonableness review shortly after Booker. The Court highlighted the close relationship between the Guidelines and the § 3553(a) statutory sentencing factors, holding that appellate courts may adopt a rebuttable presumption that within-Guidelines sentences are reasonable. The whole purpose of the Guidelines was to “seek to embody the § 3553(a) considerations.” As such, the guideline ranges “reflect a rough approximation of sentences that might achieve § 3553(a)’s objectives.” A presumption of reasonableness recognizes that both the Commission and the district court have come to the same conclusion.[75]

Although district courts cannot presume sentences to be reasonable, the Court said that within-range sentences ordinarily need no explanation other than a statement that the court is following the Guidelines rationale, but that greater explanations are often necessary the further a sentence is from the calculated range.[76] (This dictum ignored the potential problem of inter-judge disparity within a guideline range.) Yet the Court later prohibited any type of “rigid mathematical formula” from determining the strength of the justification that should be required for departures. Doing so would “come too close to creating an impermissible presumption of unreasonableness” for non-Guidelines sentences.[77] The worry is that a presumption of unreasonableness could reintroduce the Sixth Amendment problem by making the Guidelines effectively mandatory.[78] The Court explicitly declared that a heightened appellate standard “is inconsistent with” Booker’s abuse of discretion standard,[79] although courts can “consider the extent of a deviation from the Guidelines” when determining if a sentence is reasonable.[80]

Not yet done with disarming appellate review after Booker, the Court showed just how broad district court discretion is in deciding Kimbrough v. United States. There, the Court permitted district court judges to depart downward not because of any actual conduct or characteristics of the offender, but because of a policy disagreement with the 100-to-1 crack-cocaine ratio outlined in the Guidelines.[81] The Court extended this in Spears v. United States, upholding a court’s decision to not only disagree with the Guidelines ratio but to create its own.[82]

B. Problems Created by Booker

Booker’s return of judicial discretion has generally been welcomed[83] (though some find the case objectionable[84]). The mandatory system was too constraining in its utopian pursuit of an algorithm that could be used to sentence unique offenders. The system weighed loss values and drug quantities too heavily and left little room for judges to provide feedback to the Commission through their sentencing.[85]

But in its zeal to permit wide judicial discretion, the Court unfortunately coupled the return of judicial discretion with a virtual destruction of any meaningful form of appellate review. Not only is the “implied” reasonableness review of Booker a highly unusual standard for district court decisions,[86] but the Court actually removed the central functions of appellate review in order to promote judicial discretion. For instance, the presumption of reasonableness in Rita diverges from ordinary standards of review because it is an optional presumption that appellate courts may make, and Kimbrough, by requiring reasonableness review for policy disagreements with the Guidelines, departs from the ordinary norm that district court legal determinations are reviewed de novo.[87] The Court has essentially created a safe harbor for within-Guidelines sentences (Rita), has prohibited courts from rigorously defining proportional justification requirements for departures (Gall), and permits virtually limitless discretion based solely on judicial philosophy (Kimbrough/Spears).

This last point is even more severe. Although the Kimbrough Court heavily considered (arguably in dicta) that the ratio existed because of congressional inertia and despite the Commission’s public proclamation that the ratio should be changed,[88] the Court later emphasized that sentencing judges are afforded “wide discretion” and may disregard Guidelines policies in areas other than the crack-cocaine ratio, such as when a judge considers post-sentencing rehabilitative conduct during resentencing.[89] The Court has even permitted sentencing courts to not just disregard policy statements, but to substitute their own—at least with regard to crack-cocaine ratios[90]—and courts have read this to authorize vast judicial discretion to disregard Guidelines policy statements.[91] Finally, although Kimbrough hinted that “closer review” might be appropriate for certain policy disagreements,[92] the Court left open whether closer review is ever appropriate.[93] The weight of the doctrine leaves little to be imagined as to the broad scope of a sentencing judge’s discretion.

Furthermore, reasonableness review nominally permits sentences to be struck down for being substantively unreasonable, but the extremely deferential standard of review that the Court has promulgated has ensured that sentences are almost never struck down for this reason.[94] This actually overstates the matter. Many of the sentences that are struck down as substantively unreasonable are considered by other circuits to be procedural, not substantive, error.[95] Procedural error review hardly constrains a judge’s substantive decisions, there being little to prevent a judge from sentencing an offender to the same sentence after the first is remanded. Procedural reasonableness has been interpreted to only require a district judge to show that he or she has correctly calculated the applicable guideline range, has not treated the Guidelines as mandatory, has considered the § 3553(a) factors, and has not relied on clearly erroneous facts.[96] Yet after calculating the guideline ranges, a judge is fully permitted to disregard the Guidelines. The result of all this is a system of wide judicial discretion with limited appellate oversight, which was the major problem with indeterminate sentencing before the SRA.[97]

It is this reduction in meaningful appellate review that appears to have been the primary driver of increases in inter-judge disparity. Booker’s return of judicial discretion certainly had an effect, but the brunt of the increase did not take off until after Rita, Gall, and Kimbrough.[98] The needed destruction of uniformity, coupled with the unwise decimation of meaningful appellate review, has created a system in which virtually no parts are oriented toward reducing inter-judge disparity. The only aspects that do still limit inter-judge disparity are the Guidelines’ anchoring effects coupled with judicial inertia.[99] After Booker, there was a fear (or hope) that judges would issue radically disparate sentences.[100] After all, sentencing lengths tremendously increased (and non-incarceration rates significantly decreased) after the SRA and the Guidelines.[101] But the decrease since Booker, while measurable, has been far from severe. The mean and median sentences decreased by three to six months and may actually be on the rise again.[102]

But even inertia and anchoring effects are unlikely to stem the flow of inter-judge disparity. The relative stability of sentences is true only in the aggregate, not across the board. The means and medians are level, but only because firearm and economic sentences increased significantly to counter the significant drop in drug and immigration sentences.[103] Additionally, mandatory minimums cloud the data by making median sentences appear more uniform. This masks the true effect of Booker on sentences where judges are free to disregard the Guidelines.[104] Most notably, even with a high adherence to the Guidelines, geographic disparity and inter-judge disparity have increased measurably.[105] Quite apart from the hard data, even the anecdotal evidence has been tremendous.[106] Professor Frank Bowman concludes that “uncontrolled disparity will again become a rallying cry” because inter-judge disparity “should matter to anyone who believes that one important component of a just system of criminal punishment is that similarly situated offenders are treated substantially similarly.”[107]

One can only expect inter-judge disparity to increase. Immediately after Booker, the vast majority of the judges were used to the mandatory Guidelines regime and—quite apart from the anchoring effect of the Guidelines—may have remained relatively inert in their sentencing practices.[108] Fast forward to 2016, and President Obama has appointed more than one-third of the district court bench.[109] To the extent the inertia hypothesis is correct, inter-judge disparity should increase as pre-Booker judges retire.[110] If one cares about inter-judge disparity, the time for action is yesterday. A more meaningful form of appellate review is necessary to accomplish this.

C. Recent Studies Demonstrating Inter-Judge Disparity

Measuring inter-judge disparity has traditionally been notoriously difficult.[111] The Commission gathers a wealth of data, compiling extensive information for almost all of the 70,000-plus defendants each year. This includes the guidelines calculations, statutory offense category, and thousands of other possible data points. The problem is that the Commission removes all identifying information before publicly releasing the data.[112] Even where identifying information is available, studies have traditionally had difficulty controlling for relevant sentencing characteristics.[113] Nevertheless, several notable recent studies have overcome these obstacles to show the increasing significance of inter-judge disparity.

1. Boston Study

Professor Ryan Scott surveyed the data released by judges in the Boston division of the District of Massachusetts. The only district at the time to publicly release its data with judge identifiers,[114] the district opened up a critical insight into a previously unknown world. The results are striking. Limiting his analysis to judges who had sentenced at least fifty defendants, Scott found that the three most lenient judges impose sentences that average 25.5 months or less; the other two judges impose sentences that average more than double that figure (51.4 months). Not surprisingly, four judges now sentence below the guideline ranges at more than triple their pre-Booker rates.[115]

2. Nebraska Study

Following the Boston Study and increasing scrutiny regarding inter-judge disparity, the District of Nebraska voted to release its sentencing data to the public.[116] In Nebraska, sentences are more likely to be uniform because judges impose within-Guidelines sentences at rates much higher than the national average, yet one judge sentenced at twice the median rate of a second with regard to drug offenses. Although this effect might be overstated given that the stricter judge was on senior status and had sentenced fewer drug offenders, the difference in median rate between the two active judges with a comparable number of sentences was still twenty-two percent.[117]

3. California Study

The researchers in this study manually collected data by accessing the complaints and dockets in each relevant case in the Southern District of California, allowing the researchers to identify the judges.[118] But rather than look at average or median sentences, the researchers employed a new methodology. They recognized that offenders with somewhat disparate drug quantities can end up with the same Guidelines calculations. For instance, the drug quantity table assigns the same offense level to individuals who deal 15 kilograms of cocaine and to those who deal 49 kilograms of cocaine,[119] yet one would expect judges to sentence the latter somewhat more heavily. The researchers isolated their data so they were looking at offenders who differed only in type and quantity of drugs.[120] This allowed them to use regression analysis to create an “expected” sentence and to measure how far from the expected sentence each judge deviated.[121]
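The general idea of an "expected" sentence can be sketched with a one-variable least-squares fit. The Python sketch below is a deliberate simplification and not the researchers' actual model: it assumes a single drug type, uses quantity as the only predictor, and runs on invented data. The function names `fit_line` and `judge_deviations` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


def judge_deviations(cases):
    """cases: (judge, quantity_kg, sentence_months) tuples.

    Returns each judge's average fractional deviation from the expected
    sentence predicted by the pooled regression line.
    """
    a, b = fit_line([c[1] for c in cases], [c[2] for c in cases])
    by_judge = defaultdict(list)
    for judge, quantity, months in cases:
        expected = a + b * quantity
        by_judge[judge].append((months - expected) / expected)
    return {judge: mean(devs) for judge, devs in by_judge.items()}


# Invented cases: judge B sentences 20 months above judge A at each quantity.
cases = [("A", 15, 100), ("A", 49, 140), ("B", 15, 120), ("B", 49, 160)]
for judge, dev in judge_deviations(cases).items():
    print(f"Judge {judge}: {dev:+.1%} relative to the expected sentence")
```

A negative value marks a judge who sentences below the pooled expectation and a positive value one who sentences above it, which is the sense in which the study could describe individual judges as more lenient or harsher than expected.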

The researchers found “substantial disparity” despite the majority of judges being uniform.[122] Ten out of thirteen judges had low variance from the expected sentence, but two judges were noticeably more lenient (sentencing around 17-18 percent below the expectation) and one was noticeably harsher (sentencing around 30 percent above the expectation).[123] These differences “reflect individual judges’ preferences about the appropriate sentence for the same crime.”[124]

4. Syracuse TRAC Reports

The most comprehensive (and perhaps most controversial) foray into this area was accomplished by Syracuse University’s Transactional Records Access Clearinghouse (TRAC) program.[125] The study compiled most of the sentencing data covering a five-year period that amounts to roughly 370,000 cases nationwide.[126] Because the Sentencing Commission does not voluntarily release data that identifies judges, TRAC gathered this data from other sources, such as the Office of Personnel Management[127] and, evidently, PACER.[128] Like the Boston study, TRAC limited its data to judges who had sentenced at least 50 defendants, and TRAC subdivided the data into judges who had served continuously during the five-year post-Booker span and judges who had served continuously on active status during that time.[130]

The results were consistent with all the other studies. Judges’ median sentences in Dallas, Texas ranged from 60 months to 121.5 months; in Fort Worth from 102.5 months to 160 months; in the District of Columbia from 27 to 77 months; and in the Atlanta division of the Northern District of Georgia the medians spanned a total of 90 months. Some districts maintained low variance, however.[131]

5. Multi-District Empirical Analysis

The most crucial and developed study comes from Professor Crystal S. Yang, who built upon the work of the TRAC Report and Scott’s Boston Study. By merging the TRAC Report data with sentencing data that the Commission makes available, Yang matched judge identifiers to the sentences of over 400,000 offenders, thus eliminating some of the most serious hindrances to studying inter-judge disparity.[132] The most comprehensive study to date, Yang’s study found that the effect of inter-judge disparity doubled in the post-Booker period.[133]

D. Limitations of These Studies

These studies are not without their flaws, and each included caveats about interpreting the findings too broadly.[134] But each also shows an increase in inter-judge disparity; no study has found otherwise; and anecdotal evidence is abundant.[135]

The flaws do not overcome these findings. The first major flaw is that each of the studies assumes that cases are randomly assigned. Recent history indicates potentially significant problems (for the purposes of data collection) in the way cases are assigned.[136] Additionally, random assignment does not extend to prosecutors, who may tailor their conduct to the judges that they draw.

Second, each study limited itself to judges that had sentenced more than fifty defendants, but only the TRAC study further qualified its data by sentence type (though it considered 40 sentences instead of 50).[137] On the surface, this seems like a fine number to choose—after all, introductory statistics books often paint thirty as the magic number[138]—but fifty may be too small. For instance, one judge may have fifty randomly allotted sentencings while another judge may receive a 30-defendant drug trafficking case with lots of mandatory minimums. Yet these sentencing patterns are most likely rare and would largely affect judges who had sentenced relatively few offenders. Many of the judges in the studies sentenced far more than fifty offenders. The Scott study also ameliorated this difficulty by looking at the difference between the sentence imposed and the sentence suggested by the calculated guidelines.[139]

Finally, the most noteworthy objection is that the data that the researchers relied on is either not useful or is unreliable. A random spot check conducted by Federal Defenders of the TRAC Reports revealed several inaccuracies, and TRAC appears to have taken some of its data from documents that are compiled for the purpose of budgetary requests, not sentencing research. To the extent that budgetary documents are framed in a different light, this data may be less reliable.[140] What’s more, the reports that the data is taken from are sometimes filled out by courtroom deputies or probation officers, not judges,[141] and there are often multiple ways for a person filling out the paperwork to represent the same sentencing factor. But no better proposal for measuring inter-judge disparity has been floated. The Commission and Congress consider the data good enough for the Commission to satisfy its duty to research and report on sentencing patterns.[142] Moreover, as explained in greater detail below, the proposal that this Article floats does not depend on the precision of the data. Rather, the data is used as a tool to shift judges’ focus onto the sentencing patterns of their peer judges.[143] Even if the data suffers from reliability issues, it is still sufficient to effect this paradigm-shifting function because the benefit of this Article’s proposal is intended to be felt more in the aggregate than on the individual level.

Thus, there are good reasons to be somewhat skeptical of these studies. Yet the studies’ consistent findings, coupled with the knowledge that Booker took away the greatest weapon under the SRA against inter-judge disparity, make it easy to conclude with high confidence that inter-judge disparity has become more of a problem in the post-Booker world. It is not an overstatement to say that the weight of scholarly authority strongly favors the notion that Booker has caused an increase in inter-judge disparity.

III. A Proposal for Presumed Unreasonableness

The weight of the evidence from Part II shows that inter-judge disparity has increased and will likely grow worse with an increase in the number of judges who never sentenced under the mandatory Guidelines. But a return to a mandatory regime[144] would be an incalculably large mistake. Reducing inter-judge disparity should not be pursued at the expense of unwarranted uniformity. After the mandatory Guidelines era, there is little reason to suggest that sentencing can be pressed into a rigorous, scientific system that places a quantifiable weight on every single relevant sentencing factor.

Instead, reducing inter-judge disparity must be pursued via indirect means that have ancillary effects on sentencing behavior. The problem that caused an increase in inter-judge disparity was not Booker alone; it was the subsequent reduction of any meaningful system of appellate review. Thus, increasing appellate review is the obvious reform, but the details are tricky. Notably, increasing appellate review raises some constitutional issues. A system of appellate review that is 1) tied to the Guidelines and 2) so constraining as to make the Guidelines system mandatory almost certainly violates the Sixth Amendment.[145]

It follows that an effective form of stricter appellate review should be independent of the Guidelines so as to not make them mandatory. But to effectively reduce inter-judge disparity, the appellate review system must also induce judges to provide greater justifications for their sentences and make judges more aware of the sentences imposed by their peer judges. Simply knowing how a judge’s peers sentence should have an anchoring effect on any given judge, and creating a rebuttable presumption of unreasonableness should induce a judge to think more critically about the distinctions between each specific sentence, an effect that would itself reduce inter-judge disparity.

To achieve this, Congress should adopt a statutory amendment requiring appellate courts to presume sentences unreasonable if they fall too far from the median sentence among like offenders. (Part IV explores the methodology that the Commission should adopt in defining categories of similarly situated offenders.) This statute could take the following form and could be added to 18 U.S.C. § 3742 to read:

Presumption of Unreasonableness: A court of appeals shall presume a sentence is unreasonable when it departs too far from the median sentence for like offenders. This presumption can be rebutted if the court of appeals finds that the sentencing judge’s explanation is sufficiently compelling.

The Sentencing Commission shall have the authority to determine what constitutes a group of like offenders and how far a sentence may vary before being presumed unreasonable.

Furthermore, Congress would add a subsection to 28 U.S.C. § 994 to read:

The Sentencing Commission shall promulgate and distribute methods for defining categories of offenders who are similarly situated to the instant defendant. In doing so, the Commission shall detail which of the Commission’s individual offender database variables shall be used, how many years of sentencing data should be included in the sample, what the minimum sample size of similarly situated defendants should be, how to treat Guidelines amendments, and any other factors that would be appropriate to implement the appellate system of review spelled out in 18 U.S.C. § 3742.

Congress could provide the details itself, but it is likely the specific details may be crime-dependent and that the Sentencing Commission is in a better position to tailor specific formulas to specific crimes.

This reform would accomplish two important things: it would avoid constitutional problems by creating an appellate system wholly outside the Guidelines (but which uses data created by the Guidelines system), and it would create a system that has indirect ameliorating effects on inter-judge disparity. By allowing judges to overcome the presumption of unreasonableness, this reform would create a system whereby judges would be required to include more detailed and comprehensive explanations for their sentences. Although the § 3553(a) factors already require judges to avoid unwarranted disparities among similarly situated offenders,[146] this proposal would require an even greater justification when a judge decides to sentence an offender too far from the median.

This Article suggests courts presume a sentence to be unreasonable when it falls within the bottom or top fifteen percent of sentences among similarly situated offenders. This number is exclusive: a sentence length would not be presumed unreasonable for falling in the 90th percentile if the same length also fell within the 84th percentile. The fifteen-percent figure is chosen for illustrative purposes. It is likely that the figure might be too blunt to rectify inter-judge disparity across the board, or it may be that a more robust statistical metric is more appropriate, such as median absolute deviation.[147] This is why this Article proposes the above-mentioned statute, which gives the Commission the authority to fine-tune the percentile. But for the purposes of this Article, the actual number is less important than the way the review process would work.
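The fifteen-percent rule and its exclusivity condition can be sketched in a few lines. This is an illustration under stated assumptions rather than a formula the Commission has adopted: the function names, the tie-handling convention (a sentence length counts as an outlier only if every sentence of that length lies inside the tail), the cutoff `k` for the median-absolute-deviation variant, and the sample data are all hypothetical.

```python
from statistics import median


def presumed_unreasonable(sentence, peers, tail=0.15):
    """True if `sentence` lies wholly within the bottom or top `tail` share
    of `peers`, the sentences of similarly situated offenders.

    Ties count against the presumption: if the sentences at or below (or at
    or above) this length make up more than `tail` of the sample, the length
    also reaches into the middle of the distribution and is not presumed
    unreasonable.
    """
    n = len(peers)
    if n == 0:
        raise ValueError("need at least one peer sentence")
    share_at_or_below = sum(1 for p in peers if p <= sentence) / n
    share_at_or_above = sum(1 for p in peers if p >= sentence) / n
    return share_at_or_below <= tail or share_at_or_above <= tail


def mad_outlier(sentence, peers, k=3.0):
    """Alternative metric: flag sentences more than k median absolute
    deviations from the median. The cutoff k is an arbitrary assumption."""
    med = median(peers)
    mad = median(abs(p - med) for p in peers)
    if mad == 0:
        return False  # no measurable spread, so this sketch flags nothing
    return abs(sentence - med) > k * mad


# Hypothetical sample of twenty sentences, in months.
peers = [0, 12, 24, 24, 24, 30, 30, 30, 30, 30,
         37, 37, 37, 37, 37, 37, 46, 46, 46, 46]

for months in (0, 12, 24, 37, 46):
    print(months, presumed_unreasonable(months, peers), mad_outlier(months, peers))
```

In this sample a 46-month sentence is the highest imposed, but four of the twenty offenders received it, so it is not confined to the top fifteen percent and escapes the presumption. The deviation-based variant would still flag it, which shows why the choice of metric matters.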

One benefit of this reform is that it will induce greater awareness in judges of the sentencing practices of their colleagues. The best way to obtain this goal would be to create an interactive system where judges could input the six factors identified as relevant in Part IV and see the corresponding sentences for similarly situated offenders. There are good arguments for prohibiting judges from viewing this information ahead of sentencing. The data may exercise too strong an anchoring effect or may cause judicial abdication as some judges shield themselves from appellate review by not sentencing in the presumptively unreasonable ranges. But the ameliorative effect of increasing judicial sentencing awareness is more compelling. Even if some judges arbitrarily increase or decrease sentences to avoid appellate review, those judges will have voluntarily reduced inter-judge disparity, albeit for less honorable reasons.

One concern with this proposal is that it could induce litigation over whether the procedures were correctly followed to define categories of similarly situated offenders. Already, federal appellate courts deal with over 3,000 procedural reasonableness challenges a year.[148] This issue can be avoided by drawing from the habeas playbook. Defendants will not be permitted to appeal unless they are granted a certificate of appealability by a neutral federal authority whose job it is to review each sentence and notify those defendants who are eligible to appeal. Although this would entail some extra work, extra work is inherent in a proposal that calls for an increase in appellate review. Moreover, Part IV shows that calculating whether a sentence is presumptively unreasonable should take no longer than a few mouse clicks on a computer once the necessary software is running.

Permissive appeal is appropriate at least for appeals by offenders. (In order to limit prosecutorial discretion, Congress may choose to not allow permissive appeal where the government is the appellant.) This is because this proposal is designed to indirectly affect all offenders in the aggregate who may be subject to inter-judge disparity. Appeals are simply the vehicle used to create a structure that shifts the sentencing conversation, encouraging judges to be more conscientious in their sentencing habits and educating judges about the sentencing habits of their peers in ways that will mitigate the effects of inter-judge disparity on all offenders.

IV. Methodology for Determining Which Defendants Are Similarly Situated

This proposal operates as a new appellate structure outside the Guidelines, but it depends on Guidelines data to create categories of similarly situated offenders. Fortunately, the Commission’s wealth of sentencing data provides a ready opportunity to create these categories. Within thirty days of sentencing an offender, judges must provide detailed sentencing data to the Commission.[149] Data files of individual and corporate sentences are publicly available back to 2002.[150] The compliance rate is good. Courts gave the Commission every Judgment and Commitment order and 98.8% of the “Statement of Reasons” (SOR) forms.[151] Courts are required to use these forms to record the reasons they issued a specific sentence,[152] and these forms are the primary document that the Commission uses to code sentencing data. And although courts were somewhat less compliant with providing Pre-Sentence Reports (handing over about 92%), PSRs are largely used to fill data gaps in the SORs.

Even where researchers have discerned judges’ identities, their studies often suffer from being unable to control for case characteristics that affect sentencing.[153] This Article’s method benefits from not needing judge identifiers, and it allows for control of case characteristics because it uses the Commission’s data, which includes thousands of data points for each individual offender. Among other things, this includes the Guidelines code representing the statute of conviction (e.g., § 2D1.1 for drug trafficking), the presence or absence of the numerous factors that go into calculating a range, the circuit and courthouse of proceedings, and 264 possible reasons that a district court judge can give for departing from a guideline range.[154]

Although meaningful variables that can be used to define categories of similarly situated offenders can be derived from the sentencing data, there are flaws in the data. Notably, they are based on Guidelines that are themselves an inexact tool for categorizing like offenses. Yet there is no better data set, and the immensity of this data allows one to tailor the variable choices to diminish the Guidelines’ internal biases. For instance, the Guidelines arguably place too much emphasis on drug quantity, skewing the comparison between drug and non-drug cases. But the data points allow one to isolate a sample to only include drug offenders, getting rid of this bias.

This section demonstrates the methodological choice for determining which defendants are similarly situated—an area where the Commission can and should be involved. This Article compiles the set of variables to show that there is enough data to meaningfully set offenders apart into like categories.

A. Limitation to Specific Offense Categories

Measuring inter-judge disparity is already difficult; it is made practically impossible if researchers take steps that decimate population sample sizes. The more variable qualifiers that are used to define a category of similarly situated offenders, the smaller the sample size will be. As such, this proposal is limited to a few offense categories where the number of offenses is generally high enough to draw a decent sample size: notably drug, firearm, and white collar cases, which comprise roughly fifty-six percent of the federal criminal docket.[155] Although immigration offenses make up a large part of the federal docket (twenty-nine percent in fiscal year 2015[156]), they are excluded because, despite their number, they create comparatively little opportunity for inter-judge disparity: Congress’s authorization of fast-track sentencing nationwide has decreased judicial discretion.[157] All the remaining offense categories constitute only roughly fifteen percent. The population samples with these offenses will rarely be large enough to allow detection of differing sentencing patterns. That this proposal is inapplicable to certain offenses does not dampen the promise of the proposal. This proposal does not seek an all-or-nothing revolution to make all sentences just; rather, it adopts the ideal that sentencing some defendants in a more just manner is better than sentencing none in such a manner. Even still, this proposal works in part by making judges more aware of sentencing patterns within their districts. To that extent, this reform should have positive effects even on offenders who are sentenced in categories that are excluded from the appellate portion of this reform.

B. Sample Size

Certain offense categories are excluded per se due to sample size issues, but even offenses that are included must not be considered if there is not a sufficient sample of like offenders. The default choice often presented in statistics textbooks is thirty,[158] yet this is no less arbitrary than any other number. Some of the studies discussed above excluded judges who had not sentenced fifty defendants. But that larger sample size included sentences across different substantive areas of criminal law. Where samples are limited to offenders who are sentenced under the same substantive Guidelines category, such a large sample is not needed.

This Article proposes a sample size of twenty-five similarly situated offenders (as defined in the next subsection). The Commission can further tailor this sample size to specific offenses, or even to specific districts. For instance, the Commission might allow a smaller sample size in districts where there has been a worse history of inter-judge disparity, whether within that sentencing category or within other sentencing categories. This sample size ensures that some measure of sentencing pattern can be drawn in a district. Choosing a larger sample size risks eliminating many smaller districts from consideration. Even this sample size eliminates many districts—although, again, the opportunity for inter-judge disparity is comparatively smaller in districts where less sentencing occurs.

The sample should be taken from the previous three years of offenders for which there is available data. This compromises between a need to reach a sufficient sample size and the recognition that sentencing patterns are dynamically tied to changes in judges, Guidelines, and statutory law. Additionally, this compromise recognizes that the choice of year introduces complexities; Guidelines amendments take effect on November 1 rather than at the beginning of a regular calendar year.[159] Where major amendments have been added to the Guidelines, the Commission can make exceptions or create a formula by which offenders in previous years can be compared to current offenders.[160]

C. Choice of Factors

This Article defines categories of similarly situated offenders as those in the past three available years who have the following identical characteristics[161]:

District of sentencing;
Guideline section dictating which offense conduct category was used to sentence the offender;[162]
Applicable Guidelines sentencing range in months, reflecting all adjustments minus acceptance of responsibility;
Criminal history score;
Acceptance of responsibility; and
Application or nonapplication of a mandatory minimum.

Ultimately, because this reform is premised on shifting the location of the conversation, stricter precision—which would require a greater number of variables—is not necessary. Indeed, including more variables would complicate the calculations and decimate sample sizes. Just as a fully detailed map of a city would have to be the size of that city, so too the choice of factors sacrifices some precision for workability.
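The matching step described in this subsection and in Part IV.B can be sketched as a simple filter. This is a minimal illustration, not the Commission's procedure: the `Offender` record and its field names are hypothetical stand-ins for the Commission's datafile variables, and the sample records are invented. The three-year window and the minimum of twenty-five offenders come from this Article's proposal.

```python
from dataclasses import dataclass, replace

MIN_SAMPLE = 25    # minimum number of similarly situated offenders
WINDOW_YEARS = 3   # most recent years of available data to draw from


@dataclass(frozen=True)
class Offender:
    fiscal_year: int
    district: str
    guideline: str           # offense conduct guideline, e.g. "2D1.1"
    range_min_months: int    # calculated range before acceptance of responsibility
    range_max_months: int
    criminal_history: int
    acceptance_levels: int   # levels reduced for acceptance of responsibility
    mandatory_minimum: bool
    sentence_months: float

    def key(self):
        """The six matching factors; the range is stored as two fields."""
        return (self.district, self.guideline,
                self.range_min_months, self.range_max_months,
                self.criminal_history, self.acceptance_levels,
                self.mandatory_minimum)


def peer_sentences(defendant, records, latest_year):
    """Sentences of offenders identical to `defendant` on all six factors
    within the window, or None when the sample is too small to use."""
    first_year = latest_year - WINDOW_YEARS + 1
    peers = [r.sentence_months for r in records
             if first_year <= r.fiscal_year <= latest_year
             and r.key() == defendant.key()]
    return peers if len(peers) >= MIN_SAMPLE else None


# Hypothetical records: 25 matches, one from another district, one too old.
base = Offender(2014, "D. Mass.", "2D1.1", 46, 57, 1, 3, False, 46.0)
records = [replace(base, sentence_months=float(m)) for m in range(25)]
records.append(replace(base, district="W.D. Mo."))
records.append(replace(base, fiscal_year=2010))

peers = peer_sentences(base, records, latest_year=2014)
print(len(peers) if peers is not None else "sample too small")
```

Because every added matching variable multiplies the number of possible keys, each one shrinks the groups that clear `MIN_SAMPLE`, which is the workability trade-off this Article accepts.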

1. District of Sentence

This reform is oriented toward the narrow purpose of reducing inter-judge disparity, not transforming the entire system. As such, the ideal variable regarding geography should be tailored as much as practicable to include sentences issued by any judge that could ordinarily sentence an offender. It is only between those judges that an individual can suffer from or benefit from inter-judge disparity. This ideal removes judges outside the district in which an offense was committed because those judges would not ordinarily be permitted to sentence the specific offender.[163]

Descriptive and practical reasons favor not limiting geography to specific courthouses. First, offenders in some districts can be randomly assigned to any courthouse within a district.[164] Second, using a variable covering the entire district helps ensure that a sufficiently large population sample can be drawn. Finally, the location of a crime within a district is sometimes as arbitrary as the assigning of an individual offender to a specific judge. It makes no difference that a police officer might find drugs during a routine traffic stop close to the ultimate delivery location in, say, Hartford rather than merely en route in New Haven or Bridgeport. This criticism is, of course, applicable where somebody is caught in New York for an eventual delivery to Bridgeport, but interstate travel, being a superset of intrastate travel, is necessarily less common.

One could alternatively consider all offenders across separate districts so as to increase the population sample sizes for categories of similarly situated offenders. However, that consideration would be better tailored toward reducing inter-district disparity, which may, but does not necessarily, reduce inter-judge disparity. For instance, the disparity between two districts could be zero, yet inter-judge disparity within each could still be tremendous. Because this Article seeks to reduce inter-judge disparity, attacking that disparity in a more narrowly tailored fashion is appropriate.

Furthermore, some inter-district disparity is healthy to a sentencing regime, even if the SRA has rejected it, whereas there is no affirmative direct benefit to inter-judge disparity. For instance, Guidelines-identical firearms crimes do not have the same level of harm when committed in rural Wyoming as when committed in Manhattan, and it is understandable that two judges might sentence these offenders differently. In fact, it is precisely because not all statutorily identical crimes are actually equal that the Guidelines prescribe ranges in the first place. Moreover, even if the harm of a crime is identical in different districts, the harm to different localities of incarceration changes between districts. The incidental harms of incarceration are arguably worse in areas where the incarceration rate is already high, a harm that federal law has underappreciated in its rush to federalize many issues that used to be within the sole province of the states.[165] Pursuing too vigorously an ideal of inter-district uniformity also removes one of the largest checks on the harshness of the federal system and undermines state policy.[166]

2. Other Factors

The inclusion of the other five factors does not need as much justification. Their relevance is easy to see. For instance, the Guidelines offense category ensures that offenders with similar criminal conduct are compared, and the variable for acceptance of responsibility ensures that offenders with similar culpabilities are compared. (Acceptance of responsibility is not included in the variable for the calculated guideline range.) The researchers in the California Study lumped offenders into the same category if they committed drug trafficking crimes, were subject to mandatory minimums, and had received the safety valve (thus ensuring that they had the same criminal history).[167] This Article essentially takes this as its beginning point but also considers acceptance of responsibility and narrows the population samples to offenders who have the same calculated guideline ranges.[168]

It is the exclusion of other relevant factors that requires justification. Here, too, this Article’s choice reflects a compromise between precision and practicality. The drug quantity table often dictates the same offense level for two offenders who have wildly varying drug quantities (15 kgs of cocaine versus 49),[169] but judges may reasonably sentence such Guidelines-identical offenders differently. One could further narrow the categories of similarly situated offenders by controlling drug quantities more precisely, but this line-by-line comparison of every factor that the Commission collects would do nothing other than to reveal that each crime is unique—a useless exercise when trying to determine which offenders are similar.

This type of comparison would virtually ensure that an appropriate sample size never arises. Moreover, such a precise comparison is not needed. This Article employs an indirect method to reduce inter-judge disparity. By making certain sentences subject to unreasonableness presumptions, this proposal will induce judges to include greater justifications for their sentences and to become more familiar with the sentencing practices of their peer judges. By nature, it is difficult to tell who is a victim or beneficiary of inter-judge disparity simply because there is no baseline “correct” sentence from which to measure a departure. But the strength in this proposal is that it will have an ameliorating effect on practically all sentences by making judges more aware of the sentencing practices of their peers. The only requirement is that the offenders be compared to similar offenders, and this Article’s choice of factors does just that.

No good method exists to ensure greater similarity without diminishing sample sizes. For instance, rather than looking at the Guidelines offense category and the calculated range, as this proposal does, one could start with the offense category and then match every single adjustment. But this would include a voluminous number of variables. For example, the offense category for theft and embezzlement includes nineteen specific offense characteristics.[170]

Instead, this proposal recognizes that the internal logic of the Guidelines asserts that offenders are similarly situated based on the calculated guideline range alone. Because this reform, at its core, accepts the Guidelines logic, Congress could more feasibly adopt it because it departs less from current SRA policy. The conversion of these variables into a common currency is admittedly a legal fiction. In reality, the Guidelines place inordinate weight on certain factors, notably drug quantity and loss amount.[171] But many of these biases are confined to specific sentencing categories (such as drug quantity). To get rid of these biases, this Article does not fully accept the assertion of the Guidelines that any two offenders with the same calculated range can be compared. Instead, it accepts the assertion when two offenders have identical guideline ranges for the same offense. This Article further limits the sample with five other elements, which mitigates the internal biases of the Guidelines.

Other variables are omitted in part because even the Guidelines contain no method by which they can be compared. One such area is the hundreds of reasons judges may choose for departing from a guideline range, none of which are quantified and many of which are inherently incomparable. One variable asks whether a sentence “[a]fford[s] adequate deterrence to criminal conduct.”[172] That variable will mean two very different things for different offenders.

Finally, this Article lumps together all offenders who received a mandatory minimum sentence and, separately, all who did not, regardless of the reason the court did not apply a mandatory minimum. This latter choice is made because the Commission treats as equal offenders who were never subject to a mandatory minimum and those who were simply granted a reprieve.[173]

3. The Complexity of Guideline Amendments

One noticeable omission from this Article’s choice of factors is in regard to recent Guidelines amendments. Major Guidelines amendments can make it difficult to compare sentences of offenders over a course of years. For example, the new “drug-minus-two” amendment retroactively dropped the offense levels assigned to the drug quantity table for most non-career offenders.[174]

This proposal begins at a base level of generality and calls for the Commission to complicate the analysis only where appropriate. Considering amendments at the general stage risks overcomplicating this reform to the point where its use is not feasible. For one thing, fashioning a rule for all amendments would be overbroad. For another, amendments take effect on November 1 instead of the beginning of a calendar year. If amendments were taken into account, then additional variables (such as date of sentencing) would need to be included, further complicating this proposal.

This proposal permits ignoring some amendments altogether if the effect of the amendments is not too broad, recognizing that offenders sentenced before and after the amendment will often still be similar. But some amendments, like drug-minus-two, have significant impacts that should be taken into account. It is in this area that the Commission should draft special policy statements instructing judges on how specific amendments should be considered. For instance, the Commission could decide that current drug offender sentences should be compared to the equivalent guidelines of past defendants, even though those past offenders carried smaller drug quantities.[175] Or the Commission could split the difference and compare current offenders to past offenders whose offense level was one level higher.

V. Reform in Action and Lessons Learned from the Data

The previous sections in this Article have been somewhat abstract. To see how the proposal would work in practice, consider the following examples.

A. Lessons and Data from Massachusetts

The three graphs below represent sentencing patterns in the District of Massachusetts in the three-year period between 2011 and 2014. These charts represent, for three different guideline ranges, offenders sentenced in the District of Massachusetts who matched the following variables: they were sentenced under § 2D1.1 (drug trafficking), had criminal history scores of 1, received a three-level reduction for acceptance of responsibility, and did not receive a mandatory minimum sentence.

Figure 2: Guideline Minimum: 37 Months, D. Mass.

Figure 3: Guideline Minimum: 46 Months, D. Mass.

Figure 4: Guideline Minimum: 87 Months, D. Mass.

Consistent with earlier remarks on the topic, sentences tend to aggregate toward the Guidelines minimum. In each of these cases, not one offender received higher than the Guidelines minimum—indicating that there may be less inter-judge disparity within the Guidelines in this district. However, some judges frequently sentence elsewhere in a guideline range, meaning that an offender can receive a noticeably harsher intra-Guidelines sentence based entirely on judicial identity. This further establishes the need to develop a system that can deal with disparity that occurs within a calculated guideline range. Furthermore, several spikes are present in each of these graphs, especially in Figure 2 and Figure 4. Both contain a spike at the Guidelines minimum, at 0 (representing a non-prison sentence), and in between. There is also no discernible correlation between the mid-point spike and a recognizable trend in downward departures. One could understand a mid-point spike if it correlated with a reduction in an offense level, but the absence of a correlation with the bottom of a guideline range is evidence for inter-judge disparity outside the Guidelines because it means one or more judges are sentencing at a specific month without discernible input from the Guidelines. For instance, the mid-point spike in Figure 3 is 24 months, but that does not represent a Guidelines minimum or maximum from the sentencing table, and the overwhelming practice of judges is to sentence at the minimum level of a range. Only the spike in Figure 2 correlates with an actual guideline range, but it represents a six-point reduction. Guidelines policies, however, typically provide for two- and three-point reductions. As such, it is entirely possible that this mid-point spike is just as indicative of inter-judge disparity as the other midpoints.

The graphs also showcase how this reform would work. Figure 2 includes no sentences that would be presumed unreasonable because the lowest and highest sentence (0 and 37 months) actually reflect 19 out of the 42 sentences, meaning that there is no category of sentence that falls wholly within the bottom or top fifteen percent. But in Figure 3, the offenders who received sentences of 0, 1, 2,[176] 41, or 46 months would have their sentences presumed unreasonable. Sentencing judges could overcome these presumptions by providing sufficiently compelling justifications, which themselves would have ameliorating effects on the sentences given to other offenders. For Figure 4, any sentence of 0 months or 87 months would be presumed unreasonable.
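The bucket rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `presumed_unreasonable`, the `tail_percent` parameter, and the sample sentences are hypothetical and are not drawn from the Commission's data or from the figures. The sketch treats a sentence length as presumed unreasonable only when every sentence of that length falls within the bottom or top fifteen percent of the comparison group.

```python
from collections import Counter


def presumed_unreasonable(sentences, tail_percent=15):
    """Return sentence lengths whose whole bucket lies in the bottom or top tail.

    `sentences` is a list of sentence lengths in months for one group of
    similarly situated offenders (hypothetical input). A bucket is flagged
    only if all sentences of that length sit inside the tail, so a bucket
    that straddles the cutoff is never flagged.
    """
    total = len(sentences)
    counts = Counter(sentences)
    flagged = []
    at_or_below = 0
    for months in sorted(counts):
        at_or_below += counts[months]
        at_or_above = total - at_or_below + counts[months]
        # Integer arithmetic avoids floating-point edge cases at the cutoff.
        in_bottom_tail = at_or_below * 100 <= tail_percent * total
        in_top_tail = at_or_above * 100 <= tail_percent * total
        if in_bottom_tail or in_top_tail:
            flagged.append(months)
    return flagged


# Hypothetical group of twenty sentences, in months.
spread_out = [0, 0, 12] + [24] * 7 + [30] * 8 + [36, 46]
print(presumed_unreasonable(spread_out))  # [0, 12, 36, 46]

# Hypothetical group where the lowest and highest buckets are each too large
# to fall wholly within a tail, so nothing is flagged.
concentrated = [0] * 5 + [37] * 5
print(presumed_unreasonable(concentrated))  # []
```

The second group mirrors the situation described for Figure 2: when the extreme sentence lengths account for a large share of all sentences, no bucket falls wholly within either tail, and no sentence is presumed unreasonable.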

One thing that can be seen from this exhibition is that heartland offenders are somewhat less likely to have their sentences presumed unreasonable. This proposal is intended to have indirect ameliorating effects on inter-judge disparity; it does not target heartland offenders directly only because inter-judge disparity for those offenders is much harder to measure.

B. Lessons and Data from W.D. Mo.

Offenders in the same three-year span in the Western District of Missouri matched the following variables: they were sentenced under § 2D1.1 (drug trafficking), had criminal history scores of 1, received a three-level reduction in sentence due to acceptance of responsibility, and did not receive a mandatory minimum sentence.

Figure 5: Guideline Minimum: 30 Months, W. D. Mo.

Figure 6: Guideline Minimum: 37 Months, W. D. Mo.

Figure 7: Guideline Minimum: 46 Months, W. D. Mo.

Figure 8: Guideline Minimum: 70 Months, W. D. Mo.

In each of these four figures, one spike relates to the Guidelines minimum and another relates to the minimum for a two-level reduction from the applicable Guidelines minimum. Thus, unlike the data from the District of Massachusetts, there is evidence that some judges are effectively applying a two-level reduction after calculating the Guidelines, perhaps under a philosophical belief that the Guidelines are two levels too harsh. This is present in all four figures but is most clear in Figure 5 and Figure 6.[177]

No common variable is apparent among the offenders who received de facto two-level reductions. Furthermore, noticeable spikes occur in the data from the District of Massachusetts, yet those spikes do not correlate with Guidelines minimums, so those offenders likely are not receiving de facto reductions. Without an exhaustive search, one cannot rule out the null hypothesis that there is some common variable, but the commonality in spikes between this data and that from the District of Massachusetts is evidence that inter-judge disparity causes some judges to depart from the guideline ranges and that one or more judges in Missouri have simply determined that an appropriate departure rate is two levels.

It is easy to see how presumptive unreasonableness review would work in these situations as well. The sentences for offenders in the four left-most bars in Figure 5 would be presumed unreasonable, as would the 43-month sentence (but not the 30-month sentences, because that sentencing bucket, though it reaches above the 85th percentile, also includes sentences below the 85th percentile). In Figure 6, everybody sentenced to 18 months or less, as well as the individual sentenced to 41 months, would have their sentences presumed unreasonable. Note that in each of Figures 5, 6, and 7 at least one judge deviated from the norm that offenders should be sentenced at the very bottom of a range, indicating possible inter-judge disparity within the guideline ranges.

VI. The Proposal’s Benefits

A. Distinctions from Existing Proposals

Before outlining in detail the merits of this proposal and defending its constitutionality, it is worth exploring how this proposal differs from those others that have been suggested and how those proposals suffer from serious weaknesses.

1. Systematic Reforms

Immediately after Booker, some reforms were proposed that are now clearly unconstitutional,[178] but others remain. The Constitution Project Sentencing Initiative, which included then-Judge Alito, proposed that the Guidelines effectively be re-written. The sentencing ranges would be made wider; the adjustments would be found beyond a reasonable doubt by a jury (or admitted to by the offender); and judges would enjoy more permissive departures than under the Guidelines.[179] Judge William Sessions, who served on the Commission from 1999-2010, proposed a similar reform.[180] Sessions believed advisory Guidelines “are by definition unenforceable and thus allow for the emergence of sentencing disparities that motivated many American sentencing reforms in the first instance.”[181] Thus, he pushed for a binding approach that had “sub-ranges” within each of the wider ranges. The wider ranges would be mandatory, and judges would use the presence or absence of aggravating factors to sentence within sub-ranges.[182] Departures “would need to be based on truly extraordinary mitigating circumstances” and would be subject to “relatively strict” appellate review; appellate courts would review the sufficiency of the evidence for whether mandatory aggravating factors were proved to a jury.[183]

The Commission proposed several reforms. First, the Commission asked for a statutory change that would require judges to abide by a “three-step” process: judges would first calculate the Guidelines, consider the Commission’s commentary and policy statements—which are strongly oriented toward preventing departures—and finally consider the § 3553(a) sentencing factors.[184] Second, the Commission asked Congress to codify its policy statements that currently conflict with congressional statutory policy. Congress requires the Commission to “assure the guidelines . . . reflect the general inappropriateness of considering the education, vocational skills, employment record, family ties and responsibilities, and community ties of the defendant.”[185] But Congress also requires judges to consider the “history and characteristics of the defendant,”[186] which has created a “tension” between statutory law and the Guidelines.[187] Finally, and most important, the Commission asked Congress to create a presumption of reasonableness for within-Guideline sentences, require proportionally “greater justification[s]” when sentences are further outside the guideline range (overruling Gall), and require appellate courts to apply a heightened standard of review when district courts disagree with the Commission’s policy statements.[188]

The American Bar Association’s proposal is arguably the simplest. It seeks to adopt the Booker dissenters’ position of requiring elements to be presented to juries.[189] But the ABA proposes the Guidelines be greatly simplified to reduce the number of adjustments, offense levels, and other complex factors.[190] This is necessary because presenting to a jury every single relevant element of an offense can quickly become procedurally unmanageable. A prosecutor might have to allege in an indictment, “in addition to the elements of robbery, whether the defendant possessed a firearm, whether he brandished or discharged it, [or] whether he threatened death,” among other things.[191]

The problems with these reforms are numerous. First, they are poorly suited to attacking inter-judge disparity. Sessions’ proposal and the Commission’s proposal essentially seek to recreate the Guidelines in most of their complex rigor. Amy Baron-Evans and Professor Kate Stith characterize the Commission’s proposal as one meant to “undo the holdings of the Supreme Court in Booker and its progeny and to reestablish the Commission’s guidelines and policy statements as the ‘law’ of sentencing.”[192] Quite apart from their constitutional implications, these proposals are likely to produce serious unwarranted uniformity.

In permitting more discretion, the ABA model does not look as much like the old mandatory Guidelines system; but it, too, does not address inter-judge disparity. Nothing indicates that the ABA system would not be subject to just as great inter-judge disparity as today’s system, yet the SRA was geared toward reducing inter-judge disparity in the first place.

Some of these proposals also operate on constitutionally tenuous grounds. The Court has exercised tremendous influence in the area of reasonableness review under the Guidelines, and it is not necessarily clear that a stricter system would not draw out the Sixth Amendment issue that Booker tried to put away. Several scholars have interpreted the Booker doctrine as creating a delicate balance between judicial discretion and uniformity, a balance that can be easily upset.[193] Baron-Evans and Stith have directly declared Sessions’ and the Commission’s proposals to be unconstitutional.[194] And the Commission’s proposal to require appellate courts to presume within-Guidelines sentences to be reasonable directly contradicts Rita’s holding that such presumptions be permitted out of a “desire to avoid creating a bias for within-Guidelines sentences.”[193] The Commission’s proposal directly creates that bias. The constitutional issue deserves rigorous analysis, yet the proposals fail to provide it.

2. Calls for Stronger Appellate Review

Arguably, “Booker left unclear exactly how loose appellate review must be to satisfy the Sixth Amendment.”[194] Some scholars thus argue that Congress should require appellate courts to apply a sufficiency-of-the-evidence standard to judicial fact-finding.[195] Additionally, Professor Stephanos Bibas argues that Congress should create sentencing courts, prohibit defendants from waiving their rights to appeal, and require appellate panels to be politically mixed.[196] Professor Crystal S. Yang argues for courts to require a “heightened justification for more severe departures.”[197]

These advocates are on the right track. The problem with the post-Booker system is the same as the main problem during indeterminate sentencing: the system lacks meaningful appellate review. But these advocates merely put forth off-hand calls for greater appellate review. Their work neither fleshes out what the systems would look like nor defends the constitutionality of these systems, which is critical given that Booker struck down stronger appellate review and other scholars have identified serious constitutional problems with increasing appellate review.[197] Bibas readily acknowledges that Court doctrine bars the sufficiency-of-evidence proposal,[198] and Yang introduces the appellate proposal at the very end of her article. Furthermore, none of these proposals are well tailored toward reducing inter-judge disparity. Yang explicitly limits her form of appellate review to sentences outside guideline ranges, which would prevent her form from dealing with inter-judge disparity that occurs within guideline ranges.

3. Trailing-Edge Guidelines

One proposal stands out among others in that it bears some similarities to this proposal. Professor Mark Osler describes uniformity and discretion as historically antagonistic to each other. He suggests this can be rectified by creating a computer system through which judges enter specific sentencing data in order to see what sentences similarly situated offenders received. This notably would achieve some of the same goals as this system—namely, making judges more aware of the sentences issued by peer judges. Osler also suggests that appellate courts apply a heightened standard of review when sentences stray too far from sentences of peer judges.[199]

In Osler’s system, however, the common sentencing practices of judges would become the Guidelines, and the Commission’s role would be limited to defining mitigating or aggravating circumstances without placing numerical offense levels on those circumstances.[200] While Osler’s proposal has some appeal given that the Commission claims to have created the Guidelines in the first place based on historical sentencing practice,[201] the proposal in this Article pursues a different tactic in response to information derived from a review of the sentencing data.

Foremost, this Article’s proposal permits a greater degree of forward-looking Guidelines modifications. Osler’s model seeks to institute a method that would provide feedback regarding sentencing practices,[202] but it also risks entrenching past sentencing practices by ensuring that previous sentences have a strong gravitational effect on future sentences. This Article’s proposal leaves more room for the Commission to amend the Guidelines in numerical ways, such as the Commission did by reducing the drug quantity table by two levels.

Second, this Article’s proposal more strongly adheres to the ability of the Guidelines to act as an anchor. Osler’s proposal seeks to be more transformational, but by prohibiting the Commission from attaching numerical offense levels to aggravating and mitigating factors,[203] Osler’s model would make it difficult for judges to compare aggravating and mitigating factors with each other because the computer system would show only the end sentence, not how a judge arrived at it. For instance, the current Guidelines dictate that a defendant’s intent to promote terrorism should be weighed six times as heavily as physically restraining a victim,[204] and it is twice as mitigating when an offender is a “minimal” participant than when he is a “minor” participant.[205] This Article’s proposal retains this useful information.

Finally, this Article’s proposal expands on some of the limitations of Osler’s model by synchronizing the basic idea of computer-generated, data-driven sentencing with the actual data itself, something that was beyond the scope of Osler’s initial article. It turns out that the data, while enormous, is not robust enough to apply Osler’s model. As laid out in Part IV, the Commission’s data files show that, as the number of compared factors increases, the sample sizes decrease exponentially. There is not enough information to meaningfully analyze sentencing patterns when more than a few data points are controlled. For instance, Osler indicates that judges could input the specific offense characteristics associated with a crime into the computer.[206] But the data shows that this method would shrink the population sample beyond usefulness. Consider the Guidelines category for basic forms of fraud (§ 2B1.1), which includes nineteen specific offense characteristics. The presence or absence of these offense characteristics is not random, but assume for the sake of illustration that each characteristic is equally likely to be present or absent. In that scenario, only 1 in 524,288 offenders under § 2B1.1 would exactly match any other defendant[207] before Chapter 3 and 4 enhancements are even factored in. Only about 70,000 defendants are convicted each year for all Guidelines categories, much less for § 2B1.1. Of course, the presence of offense characteristics is not random, but even if including all those variables would sometimes produce a sufficiently large sample size, including more than a few variables risks overcomplicating the system enough to render it ineffective. Osler’s expansion of the data window to all districts, not just the district the offender is sentenced in, mitigates this problem, but as explained in Part IV.C.1, limiting the viewing window to the district of sentencing more readily targets inter-judge disparity.
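The arithmetic behind the 1-in-524,288 figure can be checked with a short Python sketch. It rests on the same simplifying assumption stated above, namely that each of the nineteen characteristics is independently present or absent with equal probability; the function name and the use of 70,000 as a comparison pool are illustrative choices, not part of the Commission's methodology.

```python
from fractions import Fraction


def exact_match_probability(num_characteristics):
    """Chance that two offenders agree on every yes/no characteristic.

    Assumes each characteristic is independently present or absent with
    probability 1/2, so two offenders agree on any single characteristic
    with probability 1/2.
    """
    return Fraction(1, 2) ** num_characteristics


probability = exact_match_probability(19)
print(probability)  # 1/524288

# Expected number of exact matches if all of roughly 70,000 annual
# defendants were (counterfactually) sentenced under one guideline.
expected_matches = 70_000 * probability
print(float(expected_matches))
```

Even under that generous counterfactual, the expected number of exact matches for a given offender is well below one, which is why adding variables quickly exhausts the sample.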

Thus, the main departure of this model from Osler’s is that this Article’s proposal does not seek high precision. This proposal only categorizes offenders for the purpose of shifting the sentencing conversation. It is fine if this proposal causes appellate courts to presume sentences unreasonable even if every judge would have imposed the same sentence. This proposal induces judges to ordinarily justify their sentences to higher degrees and encourages judges to become more aware of the sentences imposed by their peers—rather than requiring initial sentences to be crafted based on past sentencing decisions.

B. The Proposal Is Sound Policy

To date, nobody has proposed a reform that accomplishes what the proposal outlined in this Article would. Indeed, only Osler’s is close to targeting inter-judge disparity, which was a central aim of the SRA and continues to frame today’s debates on sentencing.[208] Following Booker, the Guidelines fall short of reducing inter-judge disparity, though they are not completely dissociated from these goals given that they provide an anchoring mechanism that begins all judges on the same page.

Moreover, this proposal addresses the problem without repeating the error that the SRA made in drastically reducing judicial discretion. Until the SRA, broad judicial discretion had been universal, rarely challenged, and explicitly upheld by the Supreme Court.[209] Congress attempted to reduce inter-judge disparity by significantly reducing discretion, but the problem was not in discretion; it was in the absence of meaningful appellate review. When Congress did increase appellate review, it did so only to enforce the mandatory nature of the Guidelines.[210]

As shown in Part II.B, this appellate standard has proven to be ineffectual. Booker implemented an unusual form of appellate court review and then gutted its central aspects to afford district court judges extremely broad discretion.[211] Those appellate forms of review that remain are toothless. District courts can disregard Guidelines policies;[212] substantive reasonableness challenges are almost never successful;[213] and procedural reasonableness review is relatively unchallenging.

As opposed to the proposals that seek a fundamental transformation of the Guidelines system, this proposal instead creates a system that supplements the Guidelines, makes judges more aware of the sentences issued by their peers, and induces judges to provide greater justifications for their sentences. Judges already must “adequately explain the chosen sentence”[214] and do so in a way that is at least somewhat proportional to the amount of departure from a guideline range.[215] But given the Court’s doctrine, “adequate” means little more than “minimal”[216] and does little to reduce inter-judge disparity. This reform goes further by ratcheting up what constitutes an “adequate” explanation.

For example, in addition to requiring higher justifications for sentences in general, this proposal also requires higher justifications for within-Guidelines inter-judge disparity. This goes far beyond what the Court considers adequate for a within-Guidelines sentence. The Court’s requirements explicitly assume away any concern for this form of disparity; the Court has even said that no explanation beyond relying on the Guidelines’ internal reasoning will generally be warranted for these sentences.[217] Congress also fails to recognize the existence of this form of disparity. The statute that covers appellate reversal for failure to provide a statement of reasons only permits a sentence to be remanded if it lies outside the calculated guideline range.[218] Yet most judges sentence at the bottom of a calculated range,[219] causing disparity whenever a judge sentences elsewhere within the range. This form of disparity is particularly harmful given that judges who sentence within a guideline range can rest easy knowing they will almost certainly be affirmed.[220]

This proposal gives teeth to the Court’s nominal requirement that judges adequately explain their sentences. Booker implied the standard of reasonableness review from the SRA (after excising its mandatory portions).[221] This proposal asks Congress to revamp that slightly to put more pressure onto sentencing judges to adequately explain their sentences. As laid out below, increasing the burden on sentencing judges to adequately justify their sentences will have both ameliorative effects on inter-judge disparity and other incidental benefits that reduce general disparity.

1. Direct Effect on Judges

Perhaps one of the biggest criticisms of this proposal is that it does not deal with the problem directly. Individuals in the top or bottom fifteen percent of sentences are not disproportionately likely to be subject to inter-judge disparity. Disparity is just as likely to happen in mine-run cases. As an initial matter, this proposal does have some direct effect on offenders. Especially where the distribution of sentences for a set of offenders is narrow, the upper and lower fifteenth percentiles will often include some mine-run offenders.

But the indirectness of this proposal is precisely why this proposal should be effective. For many reasons, the formula used in this Article to categorize similarly situated offenders is imprecise. Not only does it use only a few variables, but it also lumps into the same categories people who may not be similarly situated offenders because of prosecutor-driven charge-and-fact bargaining. But this proposal is intended to be felt more in the aggregate than on the individual scale. The SRA took a heavy-handed approach in trying to directly eliminate inter-judge disparity. That created a host of other problems. But when judges provide greater justifications for their sentences, their actions will mitigate inter-judge disparity for all offenders, even if the categories of similarly situated offenders are somewhat imprecise. In targeting harsher and more lenient sentences, this proposal will disproportionately affect harsher and more lenient judges. These judges will either find themselves reversed with great frequency (unlikely), which will convey important information to these judges, or (more likely) will learn to anticipate appeals of their sentences based on a growing awareness of the sentences issued by their peers and will pay extra attention to justifying their sentences.

Greater awareness of district-wide sentencing norms should ameliorate inter-judge disparity. But even if a few stray judges take pride in intentionally defecting from sentencing norms, increased appellate review will require these judges to provide more compelling explanations. That requirement alone should decrease unwarranted inter-judge disparity. And should appellate courts repeatedly come across cases appealed from the same district court judges, one would expect appellate courts to exercise greater skepticism in reviewing those sentences.

2. Minimal Effect on the Anchoring Properties of the Guidelines

The anchoring effect of the Guidelines remains one of the strongest tools against disparity, so one possible criticism of this proposal is that this proposal will steer the focus of litigation onto the new appellate system instead of the Guidelines, further reducing the unifying effects of the Guidelines. But this proposal is sufficiently limited to avoid those concerns. By prohibiting appeals until neutral federal authorities determine a defendant is eligible to appeal and authorities grant defendants a certificate of appealability,[222] sentencing policy can ensure that the Guidelines maintain their current locus in federal sentencing policy.

3. Effect on Non-Judicial Sources of Disparity

This reform not only brings federal sentencing practice more in line with the central purpose of the SRA; it also carries ancillary benefits. Most notably, it can put a dent in unwarranted disparity caused by non-judicial actors. Judges are responsible for relatively little sentencing disparity compared with prosecutors.[223] But reducing inter-judge disparity can have collateral effects on prosecutor-driven disparity.

Prosecutors have numerous avenues to create disparity. In determining what charges to bring and whether to stipulate to certain facts, prosecutors can significantly frame an offender’s record in relation to the actual offense. Prosecutor stipulations have increased since Booker, as have prosecutors’ demands that offenders waive their rights to appeal in plea bargaining.[224] Prosecutors are all over the map in their decisions to pursue mandatory minimum convictions. Districts ranged from zero percent of eligible defendants to three-quarters of eligible defendants being charged with mandatory minimums.[225] Prosecutors also control whether sentencing enhancement or reduction motions are filed: they have the sole authority to file § 851 motions of information that double an offender’s mandatory minimum sentence, and the vast majority of departures for substantial assistance are prosecutor-driven.[226]

The sources of prosecutor-driven disparity are voluminous, and a tailored reform like this Article’s is incapable of addressing all of them. However, this reform can address some. Although judges are often assigned to cases randomly, prosecutors are not,[227] which means prosecutors can tailor their conduct toward judges that are assigned a case. It is this disparity source that this reform can mitigate. For instance, prosecutors can often file superseding indictments to change the charges after a judge has been assigned.[228] Prosecutors who draw harsher judges will be able to press for sentences that would be scoffed at by others. And because prosecutors are repeat players, they will have incentives to tailor their conduct accordingly to help ensure beneficial future interactions.[229] But when a district court judge’s sentencing decisions are constrained not by the force of guideline ranges but by the need to construct a compelling explanation justifying a significant deviance from the median, prosecutors will have less influence on these judges.

4. Further Positive Effects

The TRAC Reports may be criticized for numerous reasons, but one is that they expose the identities of various judges. Opinions vary on whether judges need identity protection given their Article III protection,[230] but this reform benefits from being more politically feasible. It does not step on the toes of those who would rather keep judge identifiers out of public hands.

Furthermore, to the extent that many judges are largely unaware of the sentences issued by their peer judges,[231] judges on senior status are even more at risk of being unaware of district-wide sentences. Judges on senior status are able to tailor their dockets,[232] which means that some of them are less aware of the sentences that the Guidelines advise for offenses other than those a judge routinely deals with. Some of these senior judges may find themselves incrementally moving in a more lenient or harsh direction. This proposal makes this gradual transition less likely.

Additionally, recognizing that inter-judge disparity is actually a relatively small part of the disparity equation, this proposal has the benefit of not aggravating other sources of disparity. The same cannot be said of some other proposals. For instance, mandatory minimums are a significant source of disparity; therefore, proposals that increase prosecutorial power—like the ABA proposal does—make disparity worse.[233]

5. The Limits of the Proposal

Not every subpart of a system must be oriented toward the system’s overall goals; therefore, this proposal is limited to ameliorating inter-judge disparity—in a way that avoids impinging on the gains that Booker made[234] while attacking the inter-judge disparity that Booker introduced.

As previously established, this proposal makes some compromises in methodology choice. Whether different offenders are similarly situated will always be subject to debate. The definition of similarly situated offenders must be narrow enough to actually make the offenders similar, but it also must be broad enough to ensure a meaningfully large sample size. But this proposal does not require high degrees of precision. It is not the appeal that is important in this proposal. The point of this proposal is to use the appeal to induce greater justifications from judges and to make judges more aware of district-wide sentencing practices.

But this does make the proposal somewhat inequitable. The ameliorative effects of this proposal will be shared by all, but a fifteen-percent subset will receive the additional benefit of collaterally attacking their sentences, and another fifteen-percent subset will bear the burden of defending their sentences. This inequity is mitigated somewhat because this reform should largely lead to a shift in the sentencing conversation, not a huge increase in overturned sentences, but some vestiges of inequity will remain. These costs are acceptable, though, because they enable society to pursue worthwhile, broadly shared benefits.

Sample size constraints not only mean that this reform is unavailable to certain offense categories; it will also often be unavailable in smaller districts. There is less opportunity for inter-judge disparity in those districts, if only because there are fewer offenders in the first place, but the small number of judges can also create larger rates of inter-judge disparity if, say, three judges have widely divergent philosophies. The Commission can diminish this problem by tailoring this proposal in a way that allows for smaller sample sizes in districts that have a history of worse inter-judge disparity.

The final and most important limitation of this proposal is its reliance on data that the Commission collects. This data largely comes from a Statement of Reasons form that judges are supposed to fill out after sentencing an offender and must submit to the Commission within 30 days of a sentence.[235] When information is left off the form or is otherwise unavailable, the Commission pulls data from the Pre-Sentencing Reports compiled by the probation officers.[236] This can introduce inconsistencies. Some judges may be more conscientious about filling out these forms than others.

Former-Judge Nancy Gertner has declared the entire data set nearly worthless for measuring inter-judge disparity. She claims the SOR forms are too “simplistic” and were designed only to help the Commission “monitor Guideline enforcement, principally departures.” They are also sometimes filled out by courtroom deputies or probation officers.[237] The sentencing data shows that judges (or the Commission) also differ on how the same sentencing factors are coded. The vast majority of judges who find there to be no applicable mandatory minimum on the statutory form appropriately report the “statutory minimum” on the SOR as “0.” However, out of the more than 10,000 drug trafficking cases in 2014 where the judge reported there to be no applicable mandatory minimum, the data for more than 800 offenders inexplicably listed a non-zero value for the variable that shows a “statutory minimum.”[238] The data for one offender showed he was subject to a statutory minimum of life but included a contradictory variable that indicated that “no count of conviction carries a mandatory sentence.” Despite this, another variable stated that a “mandatory minimum sentence [was] imposed.” Additionally, the data for 229 offenders reported that no counts of conviction carried a mandatory sentence but that the judges had somehow departed from mandatory minimum sentences.[239]
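The kind of internal contradiction described above can be screened for mechanically. The following Python sketch is a minimal illustration, not the Commission's procedure: the record layout and the field names `case_id`, `mandatory_minimum_applies`, and `statutory_minimum_months` are hypothetical stand-ins for whatever variables the datafile actually uses, and the three records are invented for demonstration.

```python
def find_inconsistent_cases(records):
    """Return IDs of cases whose mandatory-minimum fields contradict each other.

    A record is treated as inconsistent when it reports that no mandatory
    minimum applies yet lists a non-zero statutory minimum. Field names
    are hypothetical.
    """
    return [
        record["case_id"]
        for record in records
        if not record["mandatory_minimum_applies"]
        and record["statutory_minimum_months"] != 0
    ]


# Invented records for illustration only.
sample_records = [
    {"case_id": "A", "mandatory_minimum_applies": False, "statutory_minimum_months": 0},
    {"case_id": "B", "mandatory_minimum_applies": False, "statutory_minimum_months": 60},
    {"case_id": "C", "mandatory_minimum_applies": True, "statutory_minimum_months": 120},
]
print(find_inconsistent_cases(sample_records))  # ['B']
```

A screen of this kind would not resolve which field is correct, but it would identify records that should be excluded or reviewed by hand before a comparison group is built.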

Gertner’s analysis may be overstated. Her criticisms derive from her experience on a court that may be unrepresentative of national practice. Further, the Commission takes measures to mitigate data inaccuracy. Biannually all data is compared with the sentencing datafile held by the Administrative Office of the U.S. Courts to ensure accuracy, and sentences that fall outside a guideline range are reviewed by hand.[240]

Even accepting all of Gertner’s criticism, the data remains the best source for defining categories of similarly situated offenders, and the Commission uses this data for its congressionally authorized research purposes.[241] The shortcomings are relatively rare in a scheme where more than 70,000 offenders are sentenced each year. Especially considering that high precision is not necessary for this proposal, data that is good enough for the Commission still ought to be good enough for this proposal, even if it is less than ideal.

C. The Proposal Is Constitutional

The critical weakness in many of the proposals considered above is that there are good arguments that they are unconstitutional, and none of the proposals sets out to defend itself on these grounds.[242] The constitutional concerns are two-fold. First, the Court may have set out a delicate balance: the Court promoted judicial discretion to avoid the Sixth Amendment issue but also put in place a gutted form of appellate review to promote some uniformity.[243] If, as some say, this balance is refined and delicate, then reducing judicial discretion to issue a non-Guidelines sentence might be constitutionally problematic.[244] The second point of concern is that a constraint, even if it is not tied to the Guidelines, might still have an indirect effect on the Guidelines that makes them effectively mandatory.

If a proposal reduces a judge’s discretion to issue a non-Guidelines sentence (the first point of concern), then that proposal must wade through the Booker progeny. The case friendliest toward altering the forms of review is Booker itself. In drawing up a remedy, the Booker majority (which included Justice Breyer, an architect of the original Guidelines from his days on the First Circuit[245]) explicitly sought to divine what Congress would have intended in light of the Court’s constitutional holding.[246] The Court found that Congress had implied a reasonableness review from Congress’s decision to orient the SRA toward promoting uniformity.[247] The Court made clear that Congress retains the final say when it declared “[o]urs is not the last word: The ball now lies in Congress’s court.”[248] It seems clear that Congress has at least some flexibility to alter the Court’s abuse-of-discretion reasonableness standard, and the Court’s later cases are essentially interpretations of the “reasonableness review” that it found that Congress had statutorily created.

But the majority of movement has occurred on the bench, not in Congress. In Gall, the Court prohibited courts from adopting “rigid” mathematical rules that required lower courts to justify their sentences in proportion to degree of departure. Doing so would come “too close to creating an impermissible presumption of unreasonableness for sentences outside the guideline range.”[249] The Court also prohibited a heightened standard of review for sentences lying outside the guideline range.[250] But unlike in Booker, the Court dropped its deferential language, making it unclear how much of the holding was constitutionally necessary and how much was just a common law exposition on the reasonableness review that the Court implied from the SRA in Booker. In Rita, the Court permitted courts of appeals to presume a sentence within the Guidelines is reasonable.[251] Because the presumption was non-binding, the Court rejected the Petitioner’s argument that “plac[ing] an additional burden on the district court to justify non-Guidelines sentences” is impermissible.[252]

Taken together, the Court permits presumptions if they do not bind judges to the Guidelines (Rita) but not if there is a sufficient risk that they will (Gall). But more importantly, the Court seems to care tremendously about whether an appellate standard creates presumptions that depend on whether the sentence is inside or outside the Guidelines. In Rita, the presumption was only applied because it was a Guidelines sentence, and in Gall, the Court prohibited an appellate court from applying a rigid proportionality justification or a heightened form of review because the appellate court determined the sentence was outside the Guidelines.

This is good news for this proposal, which has the distinct benefit of not tying itself to the Guidelines. This proposal ignores whether a sentence falls in or outside a guideline range. Courts of appeals would not be required to apply a heightened form of review because a sentence fell inside or outside the Guidelines. And given the common, but not universal, practice of judges to sentence at the very bottom of a guideline range, this proposal would implicate sentences both within and outside the guideline ranges. Thus, this proposal could exist even in the absence of the Guidelines, making it immune from the direct constitutional concerns in the Booker progeny. To be sure, this Article uses Guidelines-constructed data to create categories of similarly situated offenders, but that is only because there is no better data. The use of the data itself does not directly discourage judges from departing from the Guidelines.

This proposal also passes constitutional muster under the second concern: whether the reform indirectly affects the Guidelines in a way that makes them essentially mandatory. Admittedly, this proposal draws some parallels to the de novo form of review that the Court excised in Booker, but those parallels are shallow. Section 3742(e) explicitly directed an appellate court to review whether a sentence departed from a guideline range in a way not authorized by the Guidelines.[253] This reform does not care whether a sentence is inside or outside the Guidelines. And although any form of stricter appellate review constrains judges somewhat, Congress is constitutionally capable of increasing constraints on judges. To be sure, reasonableness review as interpreted in the Booker progeny is extremely permissive, but Booker does not require such review. The choice of review forms “now lies in Congress’s court.”

It is for this reason that a slightly altered version of the second constitutional concern is not problematic. The main Apprendi problem underlying Booker’s holding is that it violated the Sixth Amendment to compel judges to sentence based on factual findings that the judges were required to make. Under this Article’s proposal, the effective range in which judges could sentence would be narrowed slightly. And although Gall expressly permits appellate review of sentencing variation, it seems to reject almost any meaningful attempt to measure variance.

Such a limitation on judicial discretion cannot be per se unconstitutional. To take an extreme example, Congress could eliminate judicial discretion entirely by creating a five-year mandatory minimum and maximum for all felonies (assuming away Eighth Amendment issues).

Limiting judicial discretion somewhat based on facts found by a judge should likewise be permissible. Indeed, the whole scheme of reasonableness review (especially substantive reasonableness) expressly permits judges to do this. When a court of appeals finds that a district judge’s sentence was substantively unreasonable, the court has in effect declared that the judge’s sentence was bounded by judicially found facts. Even if the Court has carefully crafted a balance of reasonableness review that Congress could not deviate from, Booker states that Congress could do away with reasonableness review altogether. The Court’s holding in Rita that a presumption of reasonableness is not unconstitutional in part because it is not binding is instructive here. While this Article’s proposal constrains judges somewhat, it only does so to the extent that it requires greater attention to explanations of sentences. It does not actually bind judges to the Guidelines, nor does it consider them at all (except as a convenient data point for comparing the sentences of like offenders).[254]

The strongest argument against this proposal is that any reduction in the range of sentences that a judge can impose is unconstitutional; that is, a court can only review sentences for procedural reasonableness. While some scholars support this position,[255] the Court rejected this argument 7-2 when Justice Scalia (joined by Justice Thomas) advanced it,[256] and it has failed to gain traction since. But this proposal need not even be categorized as a substantive constraint. It can be more readily categorized as a procedural constraint of the kind that all the Justices (including the late Justice Scalia) have approved. Some lower courts would arguably construe this constraint as substantive,[257] but the Court’s holding in Gall explicitly describes as “procedural error” a judge’s “failing to adequately explain the chosen sentence.”[258] Boiled down, this Article’s proposal merely asks Congress to define what is “adequate.”

VII. Conclusion

Studies consistently show a significant increase in inter-judge disparity after Booker. While Booker removed many of the worst qualities of the Sentencing Reform Act, it also removed the aspect of the SRA that most effectively reduced inter-judge disparity. Cases subsequent to Booker have gutted any meaningful appellate review, which has left the federal sentencing system inept at controlling inter-judge disparity. And we can only expect that form of disparity to grow worse as newer judges join the bench.

Scholars have floated many proposals to change the Guidelines, but these have been poorly tailored to reducing inter-judge disparity. Many of them also make the same mistakes that the SRA did, and they have neglected to address serious constitutional questions raised by their proposals. Reducing inter-judge disparity requires not a new set of Guidelines but a heightened form of appellate review, and the Commission’s data set provides a terrific opportunity for creating a form of review unimaginable at the time of the SRA. This Article is the first to make a serious attempt at operationalizing a form of appellate review that targets inter-judge disparity.

Appendix

All the data that was reviewed for the purposes of this Article is publicly available on the Commission’s website, and it is this data that should be used to create categories of similarly situated offenders for this reform.[259] Courts are required to provide the Commission with detailed data regarding every single sentence within thirty days of that sentence.[260] The Commission then strips this data of identifying information (e.g., judge’s and offender’s identities, case number) but still leaves thousands of different variables regarding the sentences. These variables come from a number of sources, including the SOR form, the Pre-Sentence Report (if necessary), the Judgment and Commitment Order, and in some instances the plea agreement.[261]

The data is stored in an ASCII “.dat” file format and must be extracted using SPSS or SAS software. SPSS extraction is superior because many of the Commission’s variables are not available using SAS extraction, including potentially relevant factors such as the offender’s age, sentencing date, the loss amount, the date an offense of conviction began and ended, and (critically) whether a non-incarceration sentence was issued.[262]

After extracting the data, the relevant variables were segregated in order to create a file of manageable size. Then appropriate sentencing characteristics were isolated to find sentencing patterns. The following are the variables that were isolated:

CIRCDIST: the district in which the offender was sentenced.
GDLINEHI: the sentencing guideline, showing the section of the Guidelines under which the offender’s highest range was calculated.
XCRHISSR: the criminal history score of the offender.
GLMIN and GLMAX: the calculated guideline range, including all chapter adjustments but excluding acceptance of responsibility.
STATMIN: the existence and amount of any applicable mandatory minimum.
ACCTRESP: the existence, and offense level reduction, attributable to acceptance of responsibility.
MAND_: whether a mandatory minimum was applicable and was actually applied.
SENTTOT0: the total prison sentence in months, with probation sentences coded as “0.”
SENTMON & SENTYR: the month and year of sentencing.
REAS_: the reason(s) given for a sentence imposed outside the range.
BOOKERCD: the generalized description (12 possible values) for whether a sentence fell outside the range and for what generalized reason.[263]

After isolating the variables, the data was filtered into common factors. Similarly situated offenders were defined as those who were sentenced in the same district (CIRCDIST), under the same Guidelines section (GDLINEHI), with the same criminal history calculation (XCRHISSR), whose calculations fell within the same range (GLMIN and GLMAX), who had the same status and applicable offense level reduction for acceptance of responsibility (ACCTRESP), and who either had a mandatory minimum applied or who did not (MAND_).[264] The remaining variables (e.g., REAS_) were used to make sure that disparities and spikes in the sentencing data were not explainable by some other readily identifiable factor. For instance, after discovering consistent twin spikes in Figures 5-8, the REAS_ factors, which come in almost 300 varieties,[265] were reviewed to make sure there was no discernible pattern suggesting that one or more factors were contributing to disparately spiked sentences. Although a judge could list dozens or even hundreds of reasons for a departure, judges overwhelmingly issued either no reason at all or just one.
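The grouping step described above can be illustrated with a short Python sketch. This is not the Article's actual workflow, which relied on SPSS extraction; it is only a minimal illustration under stated assumptions. The records are assumed to have already been loaded as dictionaries keyed by the Commission's variable names. The field MAND_APPLIED is a hypothetical yes/no field standing in for the MAND_ family of variables, the min_size parameter is a hypothetical sample-size cutoff, and the three sample records are invented for illustration and are not drawn from the Commission's data.

```python
from collections import defaultdict

# Factors that define "similarly situated" offenders in the Appendix.
# MAND_APPLIED is a hypothetical boolean collapsing the MAND_ variables.
GROUP_KEYS = (
    "CIRCDIST", "GDLINEHI", "XCRHISSR",
    "GLMIN", "GLMAX", "ACCTRESP", "MAND_APPLIED",
)


def group_similar_offenders(records, min_size=1):
    """Map each combination of grouping factors to its list of sentences.

    Sentences are taken from SENTTOT0 (months, probation coded as 0).
    Groups with fewer than min_size offenders are dropped.
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[name] for name in GROUP_KEYS)
        groups[key].append(rec["SENTTOT0"])
    return {k: v for k, v in groups.items() if len(v) >= min_size}


# Invented records for illustration only (not real Commission data).
base = {
    "CIRCDIST": 1, "GDLINEHI": "2D1.1", "XCRHISSR": 1,
    "GLMIN": 70, "GLMAX": 87, "ACCTRESP": -3, "MAND_APPLIED": False,
}
records = [
    {**base, "SENTTOT0": 70},
    {**base, "SENTTOT0": 60},
    {**base, "CIRCDIST": 2, "SENTTOT0": 87},  # different district
]

groups = group_similar_offenders(records)
same_district_key = (1, "2D1.1", 1, 70, 87, -3, False)
print(groups[same_district_key])
```

The two offenders sentenced in the same district fall into one category, while the third, identical in every respect except district, forms a category of his own. Raising min_size would discard categories too small to support a meaningful comparison.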

[1] Law Clerk to the Honorable William H. Pryor, United States Court of Appeals for the Eleventh Circuit, Acting Chair, U.S. Sentencing Commission. Yale Law School, J.D. 2016. I am grateful to Kate Stith, Daniel Kelly, Marah McLeod, Derek Muller, and Mark Osler for their conversations or insightful comments on earlier drafts.

[2] Several courts use cards to randomly assign judges to cases. E.g., D. Minn. Order for Assignment of Cases 7 (Dec. 1, 2008).

[3] Richard G. Kopf, Judge-Specific Sentencing Data for the District of Nebraska, 25 Fed. Sent. R. 50, 51 app. tbl. 7 (2012). One of these judges was a senior judge and may not have had a random sampling of cases, but the average difference between active judges was still 22% in drug cases. Id.

[4] 543 U.S. 220 (2005).

[5] Kate Stith & Jose A. Cabranes, Fear of Judging: Sentencing Guidelines in the Federal Courts 104 (1998). Certainly “[e]liminating unwarranted sentencing disparity was the primary goal of the Sentencing Reform Act,” even if not inter-judge disparity in particular. U.S. Sent’g Comm’n, Fifteen Years of Guidelines Sentencing 79 (2004) [hereinafter Fifteen Years of Guidelines Sentencing].

[6] Ryan W. Scott, Inter-Judge Sentencing Disparity after Booker: A First Look, 63 Stan. L. Rev. 1 (2010) (noting that the SRA was intended to drive away sentencing disparity created by “the philosophy, politics, or biases of the sentencing judge.”).

[7] 18 U.S.C. § 3553(b) (2012).

[8] See, e.g., Stith & Cabranes, supra note 4, at 51-60, 68, 82, 94, 98, 115.

[9] Paul J. Hofer, Data, Disparity, and Sentencing Debates: Lessons from the TRAC Report on Inter-Judge Disparity, 25 Fed. Sent. R. 37, 40 (2012) (noting that Guidelines departures are sometimes necessary to prevent disparity).

[10] Fifteen Years of Guidelines Sentencing, supra note 4, at 97-99 (“The federal sentencing guidelines have made significant progress toward reducing disparity caused by judicial discretion.”); James M. Anderson, Jeffrey R. Kling & Kate Stith, Measuring Interjudge Sentencing Disparity: Before and After Federal Sentencing Guidelines, 42 J.L. & Econ. 271, 294 (1999) (finding evidence that interjudge disparity fell 36% in a six-year span following the SRA). But see id. at 98 (showing that judicial disparity actually increased for robbery and immigration offenses); Stith & Cabranes, supra note 4, at 106 (arguing that inter-judge disparity was exaggerated before the SRA).

[11] Frank O. Bowman, III, Dead Law Walking: The Surprising Tenacity of the Federal Sentencing Guidelines, 51 Hous. L. Rev. 1227, 1266 (2014).

[12] Jeffery Ulmer, Beyond Disparity: Changes in Federal Sentencing after Booker and Gall, 23 Fed. Sent. R. 333, 335 (2011); Frank O. Bowman, III, Prolegomenon on the Status of the Hopey, Changey Thing in American Criminal Justice, 23 Fed. Sent. R. 93, 94 (2010) (reporting that within-Guidelines sentences dropped from 70.9% before Booker to 54.8% in FY2010).

[13] Bowman, supra note 10, at 1261, 1266-67.

[14] Scott, supra note 5, at 5, 33 (reporting, among other things, that several judges in the post-Booker period depart three times as often as several others).

[15] Kopf, supra note 2, at 51.

[16] Id. at 50.

[17] See Nancy Gertner, Supporting Advisory Guidelines, 3 Harv. L. & Pol’y Rev. 261, 270 (2009) (identifying adherence to the Guidelines post-Booker as the result of “habits ingrained during twenty years of mandatory Guideline sentencing”).

[18] E.g., William K. Sessions III, At the Crossroads of the Three Branches: The U.S. Sentencing Commission’s Attempts to Achieve Sentencing Reform in the Midst of Inter-Branch Power Struggles, 26 J.L. & Pol. 305 (2011).

[19] U.S. Sent’g Comm’n, Results of Survey of United States District Judges January 2010 Through March 2010 tbl. 19 (2010), http://www.ussc.gov/research-and-publications/research-projects-and-surveys (reporting that more than three-quarters of judges believe the advisory system “best achieves” the § 3553(a) sentencing goals and only three percent of judges believe the mandatory Guidelines were better).

[20] Alexander Bunin, Reducing Sentencing Disparity by Increasing Judicial Discretion, 22 Fed. Sent. R. 81, 82 (2009).

[21] Kopf, supra note 2, at 50.

[22] 28 U.S.C. § 994(w) (2012).

[23] United States v. Booker, 543 U.S. 220 (2005).

[24] E.g., Amy Baron-Evans & Kate Stith, Booker Rules, 160 U. Pa. L. Rev. 1631, 1721-29, 1731-36 (2012) (arguing that there are constitutional issues with Congress legislatively creating a presumption of reasonableness); Carissa Byrne Hessick, A Critical View of the Sentencing Commission’s Recent Recommendations to “Strengthen the Guidelines System”, 51 Hous. L. Rev. 1335, 1359 (2014) (explaining Booker as striking a balance between district court discretion and fealty to the Guidelines, which creates “a real possibility that the Court might determine that [appellate review] proposals, if adopted, alter the Booker remedy to such an extent that it no longer fixes the Sixth Amendment problem”).

[25] Of 911 judges who sentenced at least 10 defendants between 1999 and 2001, sentencing at the bottom of the guideline range was the standard practice for an astounding 880 of them. Fifteen Years of Guidelines Sentencing, supra note 4, at 109.

[26] In order to reduce the complexity of the horizontal axis, this Article uses bins to organize the data. Thus the “bin” labeled “70” represents all defendants who received sentences greater than 68 months and less than or equal to 70 months. All defendants listed here received 70-month sentences.

[27] Because the Guidelines instruct judges to disregard mandatory minimums where an offender has substantially assisted the prosecution or is eligible for a safety valve, this Article analyzes all offenders together who were not issued a mandatory minimum.

[28] For an extended analysis of this history, see Stith & Cabranes, supra note 4, at 9-77, or Nancy Gertner, A Short History of American Sentencing: Too Little Law, Too Much Law, or Just Right, 100 J. Crim. L. & Criminology 691 (2010).

[29] Gertner, supra note 27, at 693-94.

[30] Note, The Admissibility of Character Evidence in Determining Sentence, 9 U. Chi. L. Rev. 715 (1942).

[31] Apprendi v. New Jersey, 530 U.S. 466, 481 (2000).

[32] Id. at 695.

[33] Id. at 23.

[34] Gertner, supra note 27, at 694.

[35] United States v. Mueffelman, 327 F. Supp. 2d 79, 83 (D. Mass. 2004) (Gertner, J.) (describing historical sentencing practices and declaring the judge’s role in rehabilitative sentencing to be “almost like a doctor or social worker exercising clinical judgment”).

[36] Id. at 11-13, 23.

[37] Michael M. O’Hear, The Original Intent of Uniformity in Federal Sentencing, 74 U. Cin. L. Rev. 749, 751 (2006).

[38] Stith & Cabranes, supra note 4, at 28-30.

[39] Gertner, supra note 27, at 691.

[40] Stith & Cabranes, supra note 4, at 29-30.

[41] Williams v. New York, 337 U.S. 241 (1949) (rejecting argument that judge’s consideration of a Pre-Sentence Report, consisting of information provided by witnesses not available for cross-examination, violated the defendant’s due process rights).

[42] Stith & Cabranes, supra note 4, at 29-30.

[43] Id. at 21; see also Marvin E. Frankel, Criminal Sentences: Law Without Order 5 (1973); Anderson, supra note 9, at 274 (regarding disparity as “that variation caused by the identity” of specific judges). See generally Kenneth Culp Davis, Discretionary Justice: A Preliminary View (1976).

[44] Stith & Cabranes, supra note 4, at 31.

[45] Id. at 35.

[46] Id. at 31.

[47] Anthony Partridge & William B. Eldridge, Fed. Judicial Ctr., The Second Circuit Sentencing Study: A Report to the Judges of the Second Circuit 6-7 (1974).

[48] See O’Hear, supra note 36, at 751 (“[C]ritics argued in the 1970s that rehabilitation was an uncertain concept that might be misused as cover for irrational and inhumane practices.”).

[49] Stith & Cabranes, supra note 4, at 43.

[50] Id. at 41 n.21.

[51] 18 U.S.C. § 3742(a) (2012).

[52] 18 U.S.C. § 3742(f)(2) (2012).

[53] PROTECT Act, Pub. L. No. 108-21, § 401(d)(2), 117 Stat. 650, 667-68 (2003) (amending 18 U.S.C. § 3742(e) to grant appellate courts de novo review over “the district court’s application of the guidelines to the facts”).

[54] H. Amend. 19 to H.R. 1104, 108th Cong. (2003) (Rep. Feeney).

[55] PROTECT Act, Pub. L. No. 108-21, § 401(a), (b), 117 Stat. 650, 667-68 (2003) (codified as amended in scattered sections of 18 U.S.C. and 42 U.S.C.) (permitting only those departures listed in U.S.S.G. § 5H, not those departures recognized solely in U.S.S.G. § 5K).

[56] O’Hear, supra note 36, at 816.

[57] 28 U.S.C. § 991(a) (2012). Notably, the Commission initially only had one person who had ever actually been involved in sentencing. Stith & Cabranes, supra note 4, at 49.

[58] Judges were prohibited from departing from the calculated guideline range except in rare and exceptional circumstances. 18 U.S.C. § 3553(b)(1) (2012) (excised in United States v. Booker, 543 U.S. 220 (2005)); U.S.S.G. § 5K1.1; id. § 5K2.0.

[59] 28 U.S.C. § 994(a) (2012). The Act also abolished parole. Kate Stith & Steve Y. Koh, The Politics of Sentencing Reform: The Legislative History of the Federal Sentencing Guidelines, 28 Wake Forest L. Rev. 223, 226-42 (1993) (examining the abolition of parole and the many proposed, but never enacted, statutory provisions that would have kept some indeterminate parole). Parole had been in place since 1910. Act of June 25, 1910, ch. 387, 36 Stat. 819.

[60] S. Rep. No. 225, 98th Cong., 1st Sess. 168 (1983).

[61] Stinson v. United States, 508 U.S. 36, 41-47 (1993) (holding commentary statements to be authoritative); Williams v. United States, 503 U.S. 193, 201 (1992) (holding policy statements to be authoritative).

[62] Among many others, these include whether a gun was possessed or fired, the amount of fiscal loss, the quantity of drugs, whether a burglared building was a residence, and whether there was bodily injury. U.S.S.G. ch. 2.

[63] Stith & Cabranes, supra note 4, at 69.

[64] U.S.S.G. § 1B1.5 (2015); U.S.S.G. § 2K2.1 (felon-in-possession of firearms guideline, instructing the judge to apply § 2X1.1—which covers attempts, solicitations, and conspiracies—if the possession was in connection with another offense and the resulting sentence is greater).

[65] 28 U.S.C. § 994(b)(2) (2012); U.S.S.G. ch 1, pt A, 2 (2015).

[66] United States v. Booker, 543 U.S. 220, 231 (2005). Because the Guidelines were mandatory before Booker, the district court was statutorily limited to imposing a sentence within the 210-262 month calculated range based on what the jury had found (minus certain statutory exceptions not applicable). But the district court exercised judicial fact-finding to increase the offender’s sentencing range to 360 months to life.

[67] Alleyne v. United States, 133 S. Ct. 2151, 2162-63 (2013).

[68] See, e.g., Booker, 543 U.S. at 325 (Thomas, J., dissenting in part).

[69] Id. at 246 (seeking to determine “what ‘Congress would have intended’ in light of the Court’s constitutional holding”).

[70] Id. at 261.

[71] Id. at 245.

[72] Kimbrough v. United States, 552 U.S. 85, 109 (2007).

[73] Courts have cited Booker more than 29,000 times compared to the just under 15,000 citations of Chevron, U.S.A., Inc. v. Nat. Res. Def. Council, Inc., 467 U.S. 837 (1984) according to WestlawNext-recognized citations (last accessed Jan. 20, 2017).

[74] As of the end of 2016, the Court has heard a number of cases expanding on or related to Booker: Molina-Martinez v. United States, 136 S. Ct. 1338 (2016) (holding that a defendant need not establish that unpreserved error affected the defendant’s substantial rights); Alleyne v. United States, 133 S. Ct. 2151, 2162-63 (2013) (holding that any fact increasing the mandatory minimum is an “element” required to be submitted to a jury); Peugh v. United States, 133 S. Ct. 2072, 2081, 2088 (2013) (holding that the Ex Post Facto Clause prohibits sentencing offenders under Guidelines promulgated after an offender has committed a crime when the newer Guidelines provide a higher sentencing range, despite the Guidelines being advisory); S. Union Co. v. United States, 132 S. Ct. 2344, 2350, 2357 (2012) (applying Apprendi to courts issuing criminal fines); Pepper v. United States, 562 U.S. 476, 500-01 (2011) (permitting district courts to consider post-sentencing rehabilitation that might support a downward departure during re-sentencing after an offender’s sentence has been overturned on appeal); Abbott v. United States, 562 U.S. 8 (2010) (holding that an offender must be sentenced to the highest mandatory minimum under 18 U.S.C. § 924(c)(1)(A), rejecting higher minimum for separate counts of conviction, unless another statute targeting the same conduct imposes a greater mandatory minimum); Skilling v. United States, 561 U.S. 358 (2010) (holding that a former Enron executive had not shown that his right to a fair trial was impinged because of pretrial publicity and limiting the “honest services” fraud provision in 18 U.S.C. § 1346 to bribery and kickbacks); United States v. O’Brien, 560 U.S. 218, 224-26, 235 (2010) (requiring firearm’s status as a machinegun to be proved beyond a reasonable doubt to a jury, not to be included as a sentencing factor to be proved to a judge); Dillon v. United States, 560 U.S. 817, 821, 824, 828-30 (2010) (holding that the Guidelines are mandatory under an 18 U.S.C. § 3582(c)(2) resentencing proceeding when the Guidelines have been altered in favor of the offender); Spears v. United States, 555 U.S. 261, 262, 265-66 (2009) (“clarifying that district courts are entitled to reject and vary categorically from the crack-cocaine Guidelines based on a policy disagreement” alone after Kimbrough); Oregon v. Ice, 555 U.S. 160, 168 (2009) (permitting states to assign to judges finding of facts regarding whether a consecutive sentence for multiple offenses should be imposed); Moore v. United States, 555 U.S. 1, 3-4 (2008) (remanding case for sentencing where the judge believed he had no discretion to reject the crack-cocaine ratio); Greenlaw v. United States, 554 U.S. 237, 243-44 (2008) (prohibiting courts of appeals from increasing an offender’s sentence absent a federal government appeal or cross-appeal); Irizarry v. United States, 553 U.S. 708, 709-10, 712-13 (2008) (holding that Fed. R. Crim. P. 32(h), requiring notice when a court is considering a non-Guidelines sentence, is not applicable for a regular sentence variance from the recommended guideline range); Kimbrough v. United States, 552 U.S. 85, 91 (2007) (permitting courts to depart from the Guidelines’ 100-to-1 crack-cocaine ratio disparity); Gall v. United States, 552 U.S. 38, 41 (2007) (requiring courts of appeals to review all federal criminal sentences under a deferential abuse-of-discretion standard); Rita v. United States, 551 U.S. 338, 341 (2007) (permitting courts of appeals to apply a presumption of reasonableness to a district court sentence within the recommended guideline range); Cunningham v. California, 549 U.S. 270, 274 (2007) (striking down California’s determinate sentencing law as violating Booker); Shepard v. United States, 544 U.S. 13, 15-16 (2005) (prohibiting a district judge, in applying the Armed Career Criminal Act, 18 U.S.C. § 924(e), from reviewing police reports or complaint applications to determine whether an earlier guilty plea admits to committing generic burglary).

[75] Rita v. United States, 551 U.S. 338 (2007).

[76] Id. at 356-57.

[77] Gall v. United States, 552 U.S. 38, 47 (2007).

[78] Brief for United States, at 34-35, Rita v. United States, 551 U.S. 338 (2007), (No. 06–5754) (implying that a presumption of unreasonableness might be problematic because it might “transform an ‘effectively advisory’ system . . . into an effectively mandatory one”).

[79] Gall, 552 U.S. at 49.

[80] Id. at 50.

[81] Kimbrough v. United States, 552 U.S. 85, 109-110 (2007).

[82] Spears v. United States, 555 U.S. 261 (2009).

[83] E.g., Baron-Evans & Stith, supra note 23, at 1632-33.

[84] E.g., Frank O. Bowman III, Nothing Is Not Enough: Fix the Absurd Post-Booker Federal Sentencing System, 24 Fed. Sent. R. 356, 356 (2012) (criticizing the post-Booker system as “retain[ing] most of the flaws of the system it replaced, while adding new ones”).

[85] Baron-Evans & Stith, supra note 23, at 1675, 1706-07.

[86] Carissa Byrne Hessick & F. Andrew Hessick, Appellate Review of Sentencing Decisions, 60 Ala. L. Rev. 1, 14 (2008).

[87] Id. at 19, 25.

[88] Id. at 99, 109. See also U.S. Sent’g Comm’n, Special Report to the Congress: Cocaine and Federal Sentencing Policy, viii (2002), http://www.ussc.gov/news/congressional-testimony-and-reports/drug-topics/report-congress-cocaine-and-federal-sentencing-policy (“[T]he Commission . . . unanimously and firmly concludes that the various congressional objectives can be achieved more effectively by decreasing substantially the 100-to-1 drug quantity ratio.”). The Commission had passed an amendment to equalize the crack/powder penalties, but President Clinton signed legislation nullifying the amendment. Id. at v.

[89] Pepper v. United States, 562 U.S. 476, 480, 500-01 (2011). In Pepper, the Guidelines had explicitly prohibited consideration of post-sentencing rehabilitation during resentencing. U.S.S.G. § 5K2.19 (2010). But the Court declared that sentencing courts only have to “give ‘respectful consideration’” to policy statements, a sharp post-Booker departure from its original holding that such policy statements were authoritative. Id. at 501; cf. Williams v. United States, 503 U.S. 193, 201 (1992) (holding policy statements to be authoritative).

[90] Spears v. United States, 555 U.S. 261 (2009).

[91] See, e.g., United States v. Henderson, 649 F.3d 955, 960 (9th Cir. 2011) (permitting downward variance because of a policy disagreement with the child pornography Guidelines); United States v. Grober, 624 F.3d 592, 608-09 (3d Cir. 2010) (same); United States v. Dorvee, 616 F.3d 174, 183 (2d Cir. 2010) (same); United States v. Herrera-Zuniga, 571 F.3d 568, 584-85 (6th Cir. 2009) (reading Kimbrough as declaring “the broad authority of sentencing judges” to “categorically reject” a calculated sentencing range); see also Clare Freeman, Spears v. U.S.—Getting the Kimbrough Point Across, Sixth Circuit Blog (Jan 22, 2009), http://circuit6.blogspot.com/2009/01/spears-v-us-getting-kimbrough-point.html. But see United States v. Pugh, 515 F.3d 1179, 1201 n.15 (11th Cir. 2008) (rejecting downward variance for disagreements with the child pornography Guidelines).

This reading of Kimbrough is consistent with Booker’s desire to create an advisory system and is the better reading after Spears and Pepper. Yet it is worth mentioning that one can plausibly read Kimbrough more narrowly given its great emphasis on the history of the crack-cocaine ratio and the legislative inertia that had, at the time, prevented a new ratio from issuing. Kimbrough, 552 U.S. at 94-100. In Pepper, too, the Court permitted the court to disregard the Guidelines policy statement in part because it found the statement to be “wholly unconvincing.” Pepper, 562 U.S. at 501.

Judge Hardiman, who sits on the Court of Appeals for the Third Circuit, argues that the Kimbrough line should not be interpreted to allow judges to vary from the Guidelines simply based on a policy disagreement. Instead, Hardiman counsels that, before varying, district court judges should consider whether the Guidelines reflect the Commission acting in its archetypal institutional role and whether the advisory sentencing range is reasonable in light of the § 3553(a) factors. Honorable Thomas M. Hardiman & Richard L. Heppner Jr., Policy Disagreements with the United States Sentencing Guidelines: A Welcome Expansion of Judicial Discretion or the Beginning of the End of the Sentencing Guidelines?, 50 Duq. L. Rev. 5, 33 (2012).

[92] Kimbrough, 552 U.S. at 109.

[93] Peugh v. United States, 133 S. Ct. 2072, 2080 n.2 (2013).

[94] Except in 2014, when they were somewhat more generous, courts of appeals reverse barely one percent of sentences that are challenged as substantively unreasonable. U.S. Sent’g Comm’n, 2011 Sourcebook of Federal Sentencing Statistics tbl. 59 (2012), http://www.ussc.gov/sites/default/files/pdf/research-and-publications/annual-reports-and-sourcebooks/2011/Table59.pdf; U.S. Sent’g Comm’n, 2012 Sourcebook of Federal Sentencing Statistics tbl. 59 (2013), http://www.ussc.gov/sites/default/files/pdf/research-and-publications/annual-reports-and-sourcebooks/2012/Table59.pdf; U.S. Sent’g Comm’n, 2013 Sourcebook of Federal Sentencing Statistics tbl. 59 (2014), http://www.ussc.gov/sites/default/files/pdf/research-and-publications/annual-reports-and-sourcebooks/2013/Table59.pdf [hereinafter 2013 Sourcebook]; U.S. Sent’g Comm’n, 2014 Sourcebook of Federal Sentencing Statistics tbl. 59 (2015), http://www.ussc.gov/sites/default/files/pdf/research-and-publications/annual-reports-and-sourcebooks/2014/Table59.pdf [hereinafter 2014 Sourcebook].

[95] For instance, the Sixth Circuit considers it to be substantively unreasonable when a court considers impermissible factors. United States v. Ward, 506 F.3d 468, 478 (6th Cir. 2007). While this case has been favorably cited elsewhere, United States v. Pugh, 515 F.3d 1179, 1192 (11th Cir. 2008), impermissible-factors cases are largely relegated to the Sixth Circuit. That is probably because Gall appears to indicate that considering impermissible factors is procedural error, not substantive error. Gall held that procedural unreasonableness includes failing to consider each of the § 3553(a) sentencing factors. Gall v. United States, 552 U.S. 38, 51 (2007). This implies a negative component that the § 3553(a) factors are exhaustive and that courts should not consider other factors.

The Sixth Circuit also categorizes as substantively unreasonable a court’s failure to adequately justify a sentence. United States v. Borho, 485 F.3d 904, 911 (6th Cir. 2007). But Gall explicitly considers this to be procedural unreasonableness. Gall, 552 U.S. at 51.

[96] Gall, 552 U.S. at 51.

[97] Stith & Cabranes, supra note 4, at 82 (identifying the lack of appellate review as one of the greatest deficiencies of the indeterminate era).

[98] Crystal S. Yang, Have Interjudge Sentencing Disparities Increased in an Advisory Guidelines Regime? Evidence from Booker, 89 N.Y.U. L. Rev. 1268, 1333 (2014).

[99] The Court has implicitly recognized this effect. E.g., Peugh v. United States, 133 S. Ct. 2072, 2081, 2088 (2013) (holding that the Ex Post Facto Clause prohibits sentencing offenders under Guidelines promulgated after an offender has committed a crime when the newer Guidelines provide a higher sentencing range, despite the Guidelines being advisory).

The effect is also well-documented in scholarship. Birte Englich, et al., Playing Dice with Criminal Sentences: The Influence of Irrelevant Anchors on Experts’ Judicial Decision Making, 32 Personality & Soc. Psychol. Bull. 188, 194 (2006) (showing that giving judges starting values, even ones known to be arbitrary, has a discernible effect on judicial sentencing); Amos Tversky & Daniel Kahneman, Judgment Under Uncertainty: Heuristics and Biases, 185 Science 1124, 1128-30 (1974) (demonstrating this psychological bias in non-judicial realms). See also, e.g., Nancy Gertner, Thoughts on Reasonableness, 19 Fed. Sent. R. 165, 167 (2007) (recognizing the effect); Stephanos Bibas et al., Policing Politics at Sentencing, 103 Nw. U. L. Rev. 1371, 1387 (2009) (same); Michael M. O’Hear, Appellate Review of Sentence Explanations: Learning from the Wisconsin and Federal Experiences, 93 Marq. L. Rev. 751, 758 (2009) (same).

[100] Carl Hulse & Adam Liptak, New Fight over Controlling Punishments Is Widely Seen, N.Y. Times, Jan. 13, 2005, http://www.nytimes.com/2005/01/13/politics/new-fight-over-controlling-punishments-is-widely-seen.html (identifying some commentators as calling Booker an “egregious overreach” and saying it will lead to “wildly inconsistent” outcomes while others praise the decision for allowing greater leniency).

[101] Stith & Cabranes, supra note 4, at 5-6.

[102] Bowman, supra note 10, at 1236.

[103] Id. at 1241.

[104] Id. at 1257.

[105] Id. at 1261.

[106] Scott, supra note 5, at 3-4.

[107] Bowman, supra note 10, at 1267-68.

[108] Scott, supra note 5, at 5 (finding two judges in the District of Massachusetts to be relatively inert in their sentencing practices). However, Scott found that there was no greater inertia effect for judges who were appointed after 1987 than those appointed before; both sentenced below the Guidelines more than 30% of the time in the post-Booker era. Id. at 43-44.

These findings do not necessarily counter the conjecture that newly appointed judges will abide by the Guidelines less. Scott’s hypothesis is that judges appointed before 1987 “might cast off the yoke of the Guidelines more readily.” Id. at 43. But there is little reason to think that’s actually the case. Even if a judge had enjoyed greater sentencing freedom before 1987, by the time Booker was decided, the judge would have been sentencing under a mandatory regime for nearly two decades. The difference between a pre-1987 judge and a post-1987 (but pre-Booker) judge is not terribly meaningful. But the difference between a pre-Booker judge and a post-Booker judge is tremendous. One should not expect a pre-1987 judge to sentence terribly differently from a judge appointed in, say, 1999. But one should expect that both will sentence very differently from a judge who never sentenced under a mandatory regime.

[109] Admin. Office of the U.S. Courts, Judgeship Appointments by President, http://www.uscourts.gov/judges-judgeships/authorized-judgeships/judgeship-appointments-president (last visited Dec. 24, 2015).

[110] See Gertner, supra note 16, at 270 (identifying adherence to the Guidelines post-Booker as the result of “habits ingrained during twenty years of mandatory Guideline sentencing”); Yang, supra note 97, at 1319 (noting that as newer judges take the bench, the Guidelines are likely to have less of an anchoring effect).

[111] Scott, supra note 5, at 21 (noting that “changes are almost impossible to detect”); Caleb Mason & David Bjerk, Inter-Judge Sentencing Disparity on the Federal Bench: An Examination of Drug Smuggling Cases in the Southern District of California, 25 Fed. Sent. R. 190, 190 (2013) (identifying as “principal shortcomings” of these measurements the difficulty in either identifying judges or controlling for “relevant offense-conduct facts”).

[112] 54 Fed. Reg. 51279 (Dec. 13, 1989) (delineating an agreement between the Commission and the Judicial Conference to remove all judge and defendant identifiers from publicly released datafiles); U.S. Sent’g Comm’n, Guide to Publications & Resources 2010/2011, at 28 (2010), http://www.ussc.gov/sites/default/files/pdf/research-and-publications/topical-index-publications/2010_Guide_to_Publications_and_Resources.pdf (“Pursuant to the policy on public access to Sentencing Commission documents and data, all case and defendant identifiers have been removed from the data.” (internal citation omitted)).

[113] E.g., Anderson et al., supra note 9; P.J. Hofer et al., The Effect of the Federal Sentencing Guidelines on Inter-Judge Sentencing Disparity, 90 J. Crim. L. & Criminology 239 (1999); J. Waldfogel, Aggregate Inter-Judge Disparity in Federal Sentencing: Evidence from Three Districts (D.Ct., S.D.N.Y., N.D.Cal.), 4 Fed. Sent. Rep. 151 (1991). But see Mason & Bjerk, supra note 110, at 191 (employing a methodology that allowed the researchers to limit the cases to offenders who had dealt in drug quantities triggering mandatory minimums but were safety valve eligible and received reduced sentences).

[114] Scott, supra note 5, at 4.

[115] Id. at 4-5.

[116] Kopf, supra note 2, at 50.

[117] Id. at 51, app. tbl. 7.

[118] Mason & Bjerk, supra note 110, at 191.

[119] U.S.S.G. § 2D1.1(c).

[120] The researchers ensured they were measuring similarly situated offenders by limiting their analysis to individuals who were given sentences below mandatory minimums despite carrying drug quantities that would trigger a mandatory minimum but for the safety valve, which is only available to offenders with criminal history scores of 1. Mason & Bjerk, supra note 110, at 191.

[121] Id. at 192.

[122] Id. at 196.

[123] Id. at 193, 195.

[124] Id. at 192.

[125] TRAC Reports, Surprising Judge-to-Judge Variations Documented in Federal Sentencing, March 5, 2012, http://trac.syr.edu/tracreports/judge/274.

[126] Susan B. Long, Trac Report: Examining Current Federal Sentencing Practices: A National Study of Differences Among Judges, 25 Fed. Sent. R. 6 (2012).

[127] Id. at 7.

[128] Hofer, supra note 8, at 37.

[130] Long, supra note 125, at 7.

[131] Id. at 9-10, 12-13.

[132] Yang, supra note 97, at 1295-96.

[133] Id. at 1275.

[134] E.g., id. at 52 (noting that the District of Massachusetts may not be representative of national trends); Long, supra note 125, at 15 (The large differences in median sentences “is not sufficient to establish that such differences are indeed unwarranted sentencing disparities.”).

[135] Scott, supra note 5, at 3-4.

[136] E.g., Benjamin Weiser & Joseph Goldstein, Federal Court Alters Rules on Judge Assignments, N.Y. Times, Dec. 23, 2013, http://www.nytimes.com/2013/12/24/nyregion/federal-court-alters-rules-on-judge-assignments.html (asserting that the stop-and-frisk litigation shows problems in supposed random assignment); Katherine A. Macfarlane, The Danger of Nonrandom Case Assignment: How the Southern District of New York’s “Related Cases” Rule Shaped Stop-and-Frisk Rulings, 19 Mich. J. Race & L. 199 (2014) (showing that one judge was randomly assigned a single case and then received all “related” cases); Joe Palazzolo, The Problem with Not-So-Random Case Assignment, Wall St. J., Nov. 4, 2013, http://blogs.wsj.com/law/2013/11/04/the-problem-with-not-so-random-case-assignment (discussing a congressionally authorized pilot program for judges to take “related” cases from other judges in the name of judicial efficiency).

[137] Long, supra note 125, at 9.

[138] E.g., Gregory W. Corder & Dale I. Foreman, Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach 2 (2009) (stating that most researchers choose sample sizes at least as large as 30); Alan Agresti & Yongyi Min, On Sample Size Guidelines for Teaching Inference about the Binomial Parameter in Introductory Statistics, 1 (2002) (unpublished manuscript), http://www.stat.ufl.edu/~aa/articles/agresti_min_binomial.pdf (identifying the similarity a population distribution obtains with a normal distribution when sample sizes are 30 or greater as the reason for suggesting a sample size of 30 in the first place).

[139] Scott, supra note 5, at 25.

[140] Sentencing Resource Counsel Project, Federal Public Defenders, TRAC’s Report Claiming “Surprising Judge-to-Judge Variation” Fails to Compare Similar Cases, Relies on Poor Quality Data, Uses an Unreliable Method of Identifying Case Type, Uses Incorrect Methods of Reporting Sentence Length, and Contains Numerous Errors, 25 Fed. Sent. R. 20, 26-27 (2012).

[141] Nancy Gertner, Judge Identifiers, TRAC, and a Perfect World, 25 Fed. Sent. R. 46, 47 (2012).

[142] 28 U.S.C. § 995(a)(12) and (14)-(16) (2012).

[143] See Part III; Part VI.A.3.

[144] Congress could constitutionally return to a mandatory regime by statutorily adopting the Booker dissent’s remedy: statutorily providing for mandatory Guidelines only when the necessary facts are either admitted to by the defendant or found beyond a reasonable doubt by a jury.

[145] See Gall v. United States, 552 U.S. 38, 47 (2007) (prohibiting a presumption of unreasonableness for sentences outside the Guidelines).

[146] 18 U.S.C. § 3553(a)(6) (2012).

[147] Using the median absolute deviation—which measures the statistical dispersion centered around the median instead of the mean—rather than the more familiar standard deviation, would likely be more appropriate as this more robust statistical measure better protects against the influence of outlier sentences. The sentencing data does not fall into a normal distribution, in which the mean and median would be the same. This is due to a number of factors: the Guidelines exercise a large anchoring effect on judges, and there is a strong inclination among judges to sentence at the very bottom of a calculated range when judges do not depart. Supra note 24 and accompanying text. Because sentencing data does not even come close to resembling a normal distribution, the median is a more appropriate tool of analysis because it diminishes the significance of outliers such as life sentences.
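The contrast drawn in this note can be illustrated with a short sketch. The sentence lengths below are invented for illustration, and the cutoff of three median absolute deviations is a hypothetical threshold, not one the proposal specifies. The sketch shows only how a median-centered measure stays stable when a single very long sentence is present.

```python
from statistics import median, mean, pstdev

def median_absolute_deviation(values):
    """Median of the absolute distances from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])

def flag_outliers(sentences, threshold=3.0):
    """Return sentences lying more than `threshold` MADs from the median.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    center = median(sentences)
    mad = median_absolute_deviation(sentences)
    if mad == 0:
        return []  # no dispersion to measure against
    return [s for s in sentences if abs(s - center) / mad > threshold]

# Invented sentence lengths in months for one category of similar offenders;
# the last value stands in for a very long sentence.
sentences_months = [46, 51, 57, 57, 57, 60, 63, 70, 120, 470]

print("median:", median(sentences_months))
print("MAD:", median_absolute_deviation(sentences_months))
print("mean:", mean(sentences_months))
print("standard deviation:", round(pstdev(sentences_months), 1))
print("flagged:", flag_outliers(sentences_months))
```

In this invented sample the one very long sentence pulls the mean and the standard deviation far above the typical sentence, while the median and the median absolute deviation barely move.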

[148] 2013 Sourcebook, supra note 93, at tbl. 59; 2014 Sourcebook, supra note 93, at tbl. 59.

[149] 28 U.S.C. § 994(w) (2012).

[150] U.S. Sent’g Comm’n, Commission Datafiles (2015), [hereinafter Commission Datafiles] http://www.ussc.gov/research-and-publications/commission-datafiles.

[151] 2014 Sourcebook, supra note 93, at tbl. 1.

[152] 28 U.S.C. § 994(w)(1)(B) (2012).

[153] See sources cited supra note 112.

[154] U.S. Sent’g Comm’n, Variable Codebook for Individual Offenders, A-11 to A-16 (April 8, 2015), http://www.ussc.gov/sites/default/files/pdf/research-and-publications/datafiles/Individual_Codebook_FY99_FY14.pdf [hereinafter Individual Variable Codebook]; U.S. Sent’g Comm’n, Variable Codebook for Organizational Cases (June 3, 2014), http://www.ussc.gov/sites/default/files/pdf/research-and-publications/datafiles/Organizational_Codebook_FY00_FY14.pdf.

[155] U.S. Sent’g Comm’n, 2015 Sourcebook of Federal Sentencing Statistics fig. A (2016), http://www.ussc.gov/research/sourcebook-2015 [hereinafter 2015 Sourcebook].

[156] Id.

[157] Protect Act, Pub. L. 108-21 § 401(m)(2)(B), 117 Stat. 650 (2003) (The Commission shall promulgate “a policy statement authorizing a downward departure of not more than 4 levels if the Government files a motion for such departure pursuant to an early disposition program authorized by the Attorney General and the United States Attorney.”); Memorandum from Deputy Attorney General James M. Cole, Department Policy on Early Disposition or “Fast-Track” Programs 2 (Jan. 31, 2012), http://www.justice.gov/sites/default/files/dag/legacy/2012/01/31/fast-track-program.pdf (permitting fast-track sentencing in all districts).

[158] See sources cited supra note 136.

[159] A possible way around this would be to include the variables that detail the sentencing date, but even this is imperfect. The sentencing date does not necessarily detail the offense date, and defense litigation often focuses on which Guidelines are applicable in the first place. Going down this rabbit hole risks drowning the usefulness of this proposal in complexity. This proposal seeks to strike a balance between pragmatic simplicity and the precision that comes from considering more variables.

[160] For example, after the changes to the drug quantity tables took place on November 1, 2014, most drug offenders will receive guideline ranges two levels below what the same offenders used to. In comparing current offenders to earlier offenders, the Commission could keep the analysis the same—that is, modern offenders would be compared with those receiving the same guideline ranges pre-2016 but who had lower drug quantities. But if there is good evidence, for example, that judges across the board found the drug penalties too harsh and that the drug quantity table was reduced to reflect modern sentencing patterns, not to reflect a policy change signifying that drug offenders are somewhat less deserving of punishment than before, then it will be appropriate to compare modern offenders with earlier offenders who had higher guideline ranges.

[161] For a list of the Commission’s variables that correlate with these traits, see the Appendix, infra.

[162] This is admittedly a slightly imperfect variable because the conduct of some offenders requires that multiple different Guidelines sections be calculated and that the chosen calculation be the highest one. Thus, the conduct of offenders with the same Guidelines calculation might diverge more than might be immediately apparent.

[163] Fed. R. Crim. P. 5(c)(3)(D). One notable exception to this is the opportunity for senior status judges to sit in other districts by designation. 28 U.S.C. § 294(d) (2012). But while they may bring their sentencing practices and biases from a different district, their sentences should be compared with other in-district sentences.

[164] D. Conn. L. Civ. R. 50 (requiring only that cases be assigned “in accordance with a general policy . . . in the interest of the effective administration of justice”). The District of Minnesota also assigns cases “without regard to the division in which the case arose.” D. Minn. Order for Assignment of Cases 7 (Dec. 1, 2008).

[165] See, e.g., Michael M. O’Hear, National Uniformity/local Uniformity: Reconsidering the Use of Departures to Reduce Federal-State Sentencing Disparities, 87 Iowa L. Rev. 721, 723 (2002).

[166] Id. at 753-65 (arguing that courts have pursued too readily an ideal of national uniformity at the expense of an ideal of local uniformity, which is equally valid under federal law).

[167] Mason & Bjerk, supra note 110, at 191.

[168] This proposal does lose some information that the California Study used. The existence of a safety valve does not just clue one into the criminal history of an offender; it also lets one know whether the offense was violent and the offender’s relative role in the offense—factors that this proposal does not directly consider. 18 U.S.C. § 3553(f).

[169] U.S.S.G. § 2D1.1(c).

[170] U.S.S.G. § 2B1.1(b) (2015).

[171] For economic crimes, the Guidelines can give much more weight to economic loss than to an offender’s actual role in the offense. In United States v. Adelson, 441 F. Supp. 2d 506, 507 (S.D.N.Y. 2006), the court rejected the government’s request for a Guidelines life sentence. Noting that the practically inevitable decline in stock prices that follows from the revelation of fraud can have a tremendously large loss impact when that decline happens across millions of shares, the court stated that the loss table “may lead to guidelines offense levels that are, quite literally, off the chart” and are “patently absurd.” Id. at 509, 515.

The loss table is located at § 2B1.1, but in a display of rigidity, the Guidelines apply the loss table to a host of rather diverse crimes, and it is far from obvious that a loss amount in one crime is comparable to an equivalent fiscal loss amount in another. For instance, the loss table comes into play for the “volume of commerce attributable” to a defendant broadcasting obscene material, § 2G3.2, or merely transporting it, § 2G3.1. But it also applies to insider trading gains, § 2B1.4, and the destruction of fish, other wildlife, and plants, § 2Q2.1. Furthermore, it’s not even clear that loss amount is an appropriate measure to begin with. For instance, embezzlement of funds that destroys a small company and puts 30 people completely out of work may be more devastating than a loss ten times as high that is spread loosely among millions of passive investors.

Also controversial is how a loss is calculated in the first place. For instance, some courts have applied a market-capitalization measure of damages, looking at the average value decrease per share multiplied by the number of shares. Cf. United States v. Olis, 429 F.3d 540, 542 (5th Cir. 2005) (reporting that district court had sentenced according to the loss in stock price to a shareholder, without regard to any other factor that might affect stock price).

This method of loss calculation is overly simplistic; it does not take into account the price at which shares were purchased or the multitude of factors that affect share price. In fact, two-thirds of the loss in Olis had occurred either before the problems were revealed or more than a week after the problems were revealed. Id. at 548. Because market capitalization can drive a sentence tremendously, some have called for a separate guideline for these types of offenses. Am. Bar Assoc. Criminal Justice Task Force, The Reform of Federal Sentencing for Economic Crimes 9 (Nov. 10, 2014), http://www.americanbar.org/content/dam/aba/uncategorized/criminal_justice/economic_crimes.authcheckdam.pdf. The Commission recently amended the Guidelines to help ameliorate this problem. Previously, the Guidelines required courts to presume that market capitalization loss provided a reasonable estimate of the actual loss. Now, the Guidelines allow courts to use “any method that is appropriate and practicable.” U.S. Sent’g Comm’n, Amendments to the Sentencing Guidelines 25, 30 (April 30, 2015).

[172] Individual Variable Codebook, supra note 152 at A-11.

[173] This is a bit imperfect. The main reasons for departing from a mandatory minimum—substantial assistance and safety valve—are not perfectly comparable. Congress has stated that it is “general[ly] appropriate[]” to impose a lower sentence than the mandatory minimum when an offender has rendered substantial assistance. 28 U.S.C. § 994(n) (2012). In contrast, a court “shall impose” a non-mandatory minimum sentence if the offender is safety-valve eligible. 18 U.S.C. § 3553(f). But Booker effectively nullified any distinction between “may” and “shall,” diminishing the distinction between the two categories. The only real difference is that with substantial assistance, the government sometimes suggests a departure amount, and the judge sometimes obliges because judges are supposed to consider “the government’s evaluation of the assistance rendered.” U.S.S.G. § 5K1.1(a)(1).

[174] U.S. Sent’g Comm’n, Amendments to the Sentencing Guidelines 29-36 (April 30, 2014), http://www.ussc.gov/sites/default/files/pdf/amendment-process/reader-friendly-amendments/20140430_RF_Amendments.pdf (displaying a two-level reduction from previous years in the base offense level assigned to various drug quantities).

[175] See supra note 158.

[176] In order to reduce the complexity of the horizontal axis, this Article uses bins to organize the data. The 2-month bin includes all sentences greater than 0 months but less than or equal to two months.

[177] The sentences in the 58-month bin of Figure 8 (all sentences greater than 56 months but less than or equal to 58 months) were all sentences of 57 months, which represents the Guidelines minimum if a judge were to apply a two-level reduction to the applicable Guidelines minimum.
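The binning rule described in notes 176 and 177 can be sketched as follows. The function name is hypothetical, and treating a zero-month sentence as its own bin is an assumption, since the notes describe only sentences greater than zero months.

```python
import math
from collections import Counter

BIN_WIDTH_MONTHS = 2  # each bin covers (upper - 2, upper] months

def bin_upper_edge(sentence_months, width=BIN_WIDTH_MONTHS):
    """Return the upper edge of the bin holding a sentence.

    A sentence greater than 0 and at most 2 months falls in the 2-month bin;
    one greater than 56 and at most 58 months falls in the 58-month bin.
    Zero-month sentences get their own bin (an assumption for this sketch).
    """
    if sentence_months <= 0:
        return 0
    return math.ceil(sentence_months / width) * width

# Invented sentences in months, used only to show the grouping.
sample = [1, 2, 3, 57, 57, 58, 60]
histogram = Counter(bin_upper_edge(s) for s in sample)
print(sorted(histogram.items()))
```

Under this rule a 57-month sentence lands in the 58-month bin, which matches the description of Figure 8 in note 177.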

[178] For instance, several individuals, including Professor Frank Bowman and then-Attorney General Alberto Gonzales, proposed “topless guidelines” where judges would be bound by the minimum calculated guideline ranges but would have advisory maximum ranges. Douglas A. Berman, Tweaking Booker: Advisory Guidelines in the Federal System, 43 Hous. L. Rev. 341, 356-64 (2006). This is undoubtedly unconstitutional after Alleyne v. United States, 133 S. Ct. 2151 (2013) (requiring any factor that increases the mandatory minimum of a crime to be proved before a jury).

[179] The Constitution Project Sentencing Initiative, Recommendations for Federal Criminal Sentencing in a Post-Booker World, 18 Fed. Sent’g Rep. 310, 314-317 (2006); see also Bowman, supra note 83, at 364 (summarizing the plan).

[180] Sessions, supra note 17, at 338-53. Bowman endorsed both Sessions’ and the Constitution Project Sentencing Initiative’s proposals. Bowman, supra note 83, at 364.

[181] Sessions, supra note 17, at 339 (citing Kevin R. Reitz, The New Sentencing Conundrum: Policy and Constitutional Law at Cross-Purposes, 105 Colum. L. Rev. 1082, 1114 (2005)).

[182] Id. at 348.

[183] Id. at 351, 354.

[184] Hearing Before the Subcomm. on Crime, Terrorism, & Homeland Sec. of the H. Comm. on the Judiciary, 112th Cong. 55-59 (2011) (statement of Judge Patti B. Saris, Chair, U.S. Sent’g Comm’n), [hereinafter Commission Testimony] http://www.ussc.gov/sites/default/files/pdf/news/congressional-testimony-and-reports/testimony/20111012_Saris_Testimony.pdf.

[185] 28 U.S.C. § 994(e) (2012). The Commission has adopted this policy. U.S.S.G. § 5H, introductory cmt.

[186] 18 U.S.C. § 3553(a)(1) (2012).

[187] Commission Testimony, supra note 182, at 57; see also Henry Bemporad, Fed. Pub. Defender for the W. Dist. of Tex., Statement Before the U.S. Sentencing Commission app. question 4 (Feb. 16, 2012), http://www.fd.org/pdf_lib/bemporad_statement_2_16_12.pdf (requesting that Congress codify the Commission’s policy statements that education, family ties, etc., are “not ordinarily relevant”).

[188] Commission Testimony, supra note 182, at 55-56.

[189] Berman, supra note 176, at 355.

[190] American Bar Association, Criminal Justice Section, Report on Booker and Recommendation, 17 Fed. Sent. R. 335, 339 (2005).

[191] United States v. Booker, 543 U.S. 220, 254 (2005).

[192] Baron-Evans & Stith, supra note 23, at 1731.

[193] Hessick, supra note 23, at 1336, 1348 (“[T]here is little doubt that this recommendation would shift the delicate balance the Court has struck after Booker away from district court discretion towards the Guidelines.”); Hessick & Hessick, supra note 85, at 3-4 (describing the balance as increasing judicial discretion to avoid the Sixth Amendment problem, but creating some appellate review to pursue uniformity).

[194] Baron-Evans & Stith, supra note 23, at 1716-1731 (arguing that Booker requires any Guidelines system to be advisory).

[193] Hessick, supra note 23, at 1347.

[194] Bibas et al., supra note 98, at 1372-73.

[195] Id.; O’Hear, supra note 98, at 752 (2009) (arguing that courts should create an “explanation review”).

[196] Bibas et al., supra note 98, at 1373.

[197] Yang, supra note 97, at 1333.

[197] Baron-Evans & Stith, supra note 23, at 1731.

[198] Bibas et al., supra note 98, at 1396 (“[T]he Court’s jurisprudence is deeply misguided. Binding guidelines and searching appellate review are needed to make sentencing more consistent and legitimate.”).

[199] Mark Osler, The Promise of Trailing-Edge Sentencing Guidelines to Resolve the Conflict Between Uniformity and Judicial Discretion, 14 N.C. J. L. & Tech. 203, 207, 234 (2012).

[200] Id. at 230, 243.

[201] U.S.S.G. ch. 1(A)(1)(3) (2015) (noting that the Commission started with an “empirical approach that used as a starting point data estimating pre-guidelines sentencing practice”).

[202] Osler, supra note 200, at 230.

[203] Id. at 229.

[204] U.S.S.G. § 3A1.3; § 3A1.4.

[205] Id. § 3B1.2.

[206] Osler, supra note 200, at 233 (including as one of the factors to be included in the search for similarly situated offenders the fact that an offender robbed a bank).

[207] The probability of matching 19 variables, the probability of the presence of each being fifty percent, is $\left(\frac{1}{2}\right)^{19} = \frac{1}{524{,}288}$.
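The arithmetic in this note can be checked with exact fractions. The sketch assumes, as the note's calculation implicitly does, that the 19 variables are independent and that each is present with probability one half.

```python
from fractions import Fraction

NUM_VARIABLES = 19
P_EACH = Fraction(1, 2)  # fifty percent chance that each variable is present

# Under independence, the chance of matching every variable is the product
# of the individual probabilities.
p_match_all = P_EACH ** NUM_VARIABLES
print(p_match_all)
```

The result is a chance of one in 2^19 that two offenders match on all 19 variables.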

[208] Hofer, supra note 8, at 39. But cf. Judge Nancy Gertner, How to Talk About Sentencing Policy-and Not Disparity, 46 Loy. U. Chi. L.J. 313 (2014) (remarking that disparity is “far, far less important than issues of sentencing fairness, of proportionality, of what works to address crime” and that rampant sentencing disparity is a myth).

[209] See supra Part II.A.

[210] 18 U.S.C. § 3742(e) (2006).

[211] Hessick & Hessick, supra note 85, at 3, 14.

[212] See supra notes 88-90 and accompanying text.

[213] See sources cited in note 93, supra.

[214] Id.; see also 18 U.S.C. § 3553(c) (2012) (requiring a court to include a statement of reasons when sentencing an offender).

[215] The holding in Rita requires sentencing judges to “set forth enough to satisfy the appellate court that he has considered the parties’ arguments and has a reasoned basis for exercising his own legal decisionmaking authority.” Rita v. United States, 551 U.S. 338, 356 (2007). The judge will “normally go further and explain why he has rejected” arguments to depart from the Guidelines, and “[w]here the judge imposes a sentence outside the Guidelines, the judge will explain why he has done so.” Id. at 2468. It is also “uncontroversial that a major departure should be supported by a more significant justification than a minor one” and that a judge must “adequately explain the chosen sentence.” Gall, 552 U.S. at 50.

[216] See supra Part II.B.

[217] Rita, 551 U.S. at 356-57.

[218] 18 U.S.C. § 3742(f) (2012). The only exceptions are when the sentence is otherwise unlawful; is incorrectly implied; or where the offense has no actual applicable guideline range, in which case the court may only remand a sentence if the sentence “is plainly unreasonable.” Id. The presence of these has been criticized as violating Booker. United States v. Booker, 543 U.S. 220, 307 n.6 (2005) (Scalia, J., dissenting).

[219] See sources cited in note 24, supra.

[220] Gertner, supra note 209, at 319.

[221] Booker, 543 U.S. at 261.

[222] See Part III, supra.

[223] William J. Stuntz, The Collapse of American Criminal Justice 6 (2011) (identifying the shift in power from judges and juries to prosecutors as one of the key reasons for current criminal justice problems); Fifteen Years of Guidelines Sentencing, supra note 4, at 101-102 (finding that, pre-Booker, statutes and Guidelines accounted for 73% of sentencing disparity, with judges only accounting for 2.9% of sentencing disparity); Anderson et al., supra note 9, at 301 (detailing the significant impact that regional prosecution policies of U.S. Attorney’s Offices have on inter-district sentencing disparity); Bunin, supra note 19, at 81 (identifying prosecutors as the single greatest source of disparity); Hofer, supra note 8, at 39 (identifying mandatory minimums, prosecutor decisions, and plea bargaining decisions as more significant sources of disparity); cf. Stith & Cabranes, supra note 4, at 130 (noting that prosecutors have gained greater relative control over sentencing than before the SRA).

[224] Ulmer, supra note 11, at 338.

[225] Yang, supra note 97, at 1323-24.

[226] Although a judge can grant a substantial assistance departure without a motion of the government, judges rarely do. Section 5K1.1 substantial assistance motions from the government were recognized in 9,482 cases last year. 2014 Sourcebook, supra note 93, at tbl. 30 n.1. Substantial assistance was only recognized in just over 400 cases without the government providing a motion. Id. at tbl. 25, tbl. 25A, tbl. 25B.

Prosecutors can also file for substantial assistance motions to reduce a sentence after an offender has been sentenced. Fed. R. Crim. P. 35(b) (authorizing such a motion up to one year following sentencing, and even beyond one year in certain situations).

[227] Hofer, supra note 8, at 41.

[228] Yang, supra note 97, at 1278 n.43.

[229] See generally Robert Axelrod, The Evolution of Cooperation (1984) (arguing that cooperation can arise from self-interested motives alone).

[230] Compare, e.g., Scott, supra note 5, at 22 (blaming the Commission’s policy of refusing to publicly disclose judge identifiers as rooted in a desire for judges to shield themselves from criticism—“an astonishing expectation for public officials who enjoy life tenure”), with Mosi Secret, Wide Sentencing Disparity Found Among U.S. Judges, N.Y. Times, March 5, 2012, http://www.nytimes.com/2012/03/06/nyregion/wide-sentencing-disparity-found-among-us-judges.html (identifying Justice Rehnquist as worrying “that collecting data on judges’ sentencing practices ‘could amount to an unwarranted and ill-considered effort to intimidate individual judges.’”).

[231] Kopf, supra note 2, at 50.

[232] 28 U.S.C. § 294(c) (2012) (requiring senior status judges to take on only as much as they are “willing and able to undertake”). District courts can adopt more concrete regulations.

[233] Yang, supra note 97, at 1279.

[234] Baron-Evans & Stith, supra note 23, at 1682, 1682-1703; see also Bunin, supra note 19, at 82-83 (identifying judicial discretion as preventing more disparity than it causes, in part because the Guidelines fail to take into account some mitigating factors).

[235] 28 U.S.C. § 994(w) (2012).

[236] To the extent possible, information is first taken from the SOR. In addition to a few other forms, the Presentence Report prepared by the probation officers under Fed. R. Crim. P. 32 is used to fill gaps in the SOR data. Individual Variable Codebook, supra note 152, at 2.

[237] Gertner, supra note 139, at 47.

[238] This Article segregates the information accordingly: the offender was sentenced under § 2D1.1, the judge issued a non-zero value for the variable that shows whether the offender was convicted under a statute bearing a statutory minimum, and a contradictory separate variable was present that signaled that “no count of conviction carries a mandatory sentence.”

[239] This Article segregates the information accordingly: the offender was sentenced under § 2D1.1, the judge issued a zero value for the variable that shows whether the offender was convicted under a statute bearing a statutory minimum, and a contradictory separate variable was present that signaled that “one or more counts of conviction carry mandatory [minimums] but the court determined it does not apply.”

[240] Lou Reedt et al., Effective Use of Federal Sentencing Data, U.S. Sent’g Comm’n 79 (2013), http://www.ussc.gov/sites/default/files/pdf/research-and-publications/datafiles/20131122-ACS-Presentation.pdf.

[241] 28 U.S.C. § 995(a)(12), (14)-(16) (2012).

[242] Baron-Evans & Stith, supra note 23, at 1716-41.

[243] Hessick & Hessick, supra note 85, at 3-4, 14.

[244] Hessick, supra note 23, at 1348; see also Gall v. United States, 552 U.S. 38, 47 (2007) (prohibiting a presumption of unreasonableness for sentences outside the Guidelines).

[245] Linda Greenhouse, Guidelines on Sentencing Are Flawed, Justice Says, N.Y. Times, Nov. 21, 1998, http://www.nytimes.com/1998/11/21/us/guidelines-on-sentencing-are-flawed-justice-says.html.

[246] United States v. Booker, 543 U.S. 220, 246 (2005) (seeking to determine “what ‘Congress would have intended’ in light of the Court’s constitutional holding”).

[247] Id. at 260, 263 (ironing out the sentencing differences).

[248] Id. at 265.

[249] Gall v. United States, 552 U.S. 38, 47 (2007).

[250] Id. at 50.

[251] Rita v. United States, 551 U.S. 338, 347 (2007).

[252] Brief for the Petitioner at 33, Rita v. United States, 551 U.S. 338 (2007) (No. 06-5754).

[253] 18 U.S.C. § 3742(e)(3)(B) (2012).

[254] Other scholars have argued that the Court has created a balance between discretion and uniformity and that a proposal is unconstitutional when it “shift[s] the delicate balance the Court has struck after Booker away from district court discretion towards the Guidelines.” Hessick, supra note 23, at 1343, 1348. Hessick identifies a uniformity ideal in the Court’s jurisprudence. This uniformity ideal, however, is itself derived from the SRA, and by adopting this proposal Congress would be shifting its own ideals away from some of those expressed in the SRA.

[255] E.g., Sarah M. R. Cravens, Judging Discretion: Contexts for Understanding the Role of Judgment, 64 U. Miami L. Rev. 947, 966 & n.73 (2010).

[256] Rita v. United States, 551 U.S. 338, 370 (2007) (Scalia, J., concurring in part and dissenting in part) (arguing that there can be no substantive component to reasonableness review at all and that “district courts must be able . . . to sentence to the maximum of the statutory range”).

[257] United States v. Borho, 485 F.3d 904, 911 (6th Cir. 2007).

[258] Gall v. United States, 552 U.S. 38, 51 (2007).

[259] Commission Datafiles, supra note 148.

[260] 28 U.S.C. § 994(w) (2012).

[261] Individual Variable Codebook, supra note 152, at 2.

[262] Id. at 22, 30, 42-43, 43.

[263] The underscore (“_”) denotes a placeholder for a number. For instance, an offender’s data can include hundreds of possible reasons for a departure under the variable REAS_ (e.g., REAS1, REAS2, REAS3).
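
As a rough illustration of this placeholder convention, the numbered fields for a single offender can be gathered by matching the prefix followed by digits. The sketch below is not drawn from the Commission’s codebook or tooling: the record layout, the sample values, and the helper name `numbered_fields` are all hypothetical.

```python
import re

def numbered_fields(record, prefix):
    """Collect the values of fields named prefix + digits (REAS1, REAS2, ...).

    Empty (None) fields are skipped; results are ordered by field number.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    matches = []
    for name, value in record.items():
        m = pattern.match(name)
        if m and value is not None:
            matches.append((int(m.group(1)), value))
    matches.sort(key=lambda pair: pair[0])
    return [value for _, value in matches]

# Hypothetical offender record; the codes are made up for illustration.
offender = {"REAS1": 19, "REAS2": 3, "REAS3": None, "REAS10": 7, "STATMIN": 60}
reasons = numbered_fields(offender, "REAS")
print(reasons)
```

Sorting on the numeric suffix rather than the field name keeps REAS10 after REAS2, which a plain alphabetical sort would not.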

[264] This was used in conjunction with the variable STATMIN. If an offender had a mandatory minimum applied, relying on both these variables helps determine whether the sentence was the mandatory minimum or whether a sentence above the mandatory minimum was applied.

[265] Individual Variable Codebook, supra note 152, at A-11 through A-16.


Professor