CLRS-Lean
Chapter 7. Quicksort
7.3. Randomized Quicksort

Imports

import CLRSLean.Chapter_07.Section_07_1_Description_Of_Quicksort
import Mathlib

CLRS Section 7.3 - Randomized quicksort

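This section defines the closed-form solution of the expected-comparison recurrence for randomized quicksort (CLRS equation (7.4)) and proves that it satisfies that recurrence. Combined with H_n = Theta(log n), the resulting bound T(n) <= 2n H_n gives the O(n log n) average-case bound, the first such result in CLRS-Lean.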

The expected number of comparisons T(n) = expectedComparisons n satisfies:

T(0) = 0, T(1) = 0
For n >= 1: T(n) = n-1 + (2/n) * sum_{k=0}^{n-1} T(k)

The closed form is T(n) = 2(n+1)H_n - 4n where H_n is the n-th harmonic number. This yields T(n) <= 2n H_n and T(n) <= n^2 (quadratic fallback).

Main results:

Lemma harmonic_succ: recurrence for harmonic numbers
Lemma harmonic_le_n: H_n <= n
Lemma sum_mul_harmonic_eq: sum_{k=1}^{n} k H_k = n(n+1)/2 H_n - n(n-1)/4
Lemma sum_expectedComparisons_eq: closed form of sum_{k=0}^{n-1} T(k)
Theorem expectedComparisons_closed_form: named CLRS closed-form formula
Theorem expectedComparisons_recurrence: closed form satisfies CLRS (7.4)
Theorem expectedComparisons_telescope: (n+1)T(n+1) = (n+2)T(n) + 2n
Theorem expectedComparisons_clrs_harmonic_bound: T(n) <= 2(n+1)H_n
Theorem expectedComparisons_harmonic_bound: T(n) <= 2n H_n
Theorem expectedComparisons_quadratic: T(n) <= n^2
Theorem expectedComparisons_monotone: T(n) <= T(n+1)

Notation conventions:

harmonic n : H_n, the n-th harmonic number in Q
expectedComparisons n : T(n), expected number of comparisons for randomized quicksort on n distinct elements

namespace CLRS
namespace Chapter07
open Chapter07

Harmonic numbers

The n-th harmonic number as a rational. H_0 = 0, H_{n+1} = H_n + 1/(n+1).

def harmonic : Nat → Rat
  | 0 => 0
  | n+1 => harmonic n + 1 / ((n+1 : Nat) : Rat)

@[simp]
theorem harmonic_zero : harmonic 0 = 0 := rfl

@[simp]
theorem harmonic_one : harmonic 1 = 1 := by
  simp [harmonic]

Recurrence for harmonic numbers: H_{n+1} = H_n + 1/(n+1).

theorem harmonic_succ (n : Nat) : harmonic (n+1) = harmonic n + (1 : Rat) / ((n+1 : Nat) : Rat) :=
  rfl
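
As a quick sanity check of the definition, the first few harmonic numbers can be evaluated directly. This is a minimal sketch that assumes it is placed inside the current namespace after harmonic; the comments give the mathematical values, and the exact printed form depends on the Repr instance for Rat.

```lean
-- Assumes `harmonic` from above is in scope; it is computable because it
-- uses only `Rat` arithmetic and casts from `Nat`.
#eval harmonic 1  -- H_1 = 1
#eval harmonic 2  -- H_2 = 1 + 1/2 = 3/2
#eval harmonic 3  -- H_3 = 3/2 + 1/3 = 11/6
#eval harmonic 4  -- H_4 = 11/6 + 1/4 = 25/12
```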

Harmonic numbers are nonnegative.


theorem harmonic_nonneg (n : Nat) : 0 ≤ harmonic n := by
  induction n with
  | zero => simp
  | succ n ih =>
      rw [harmonic_succ]
      have hpos : 0 ≤ (1 : Rat) / ((n+1 : Nat) : Rat) := by
        positivity
      nlinarith

The harmonic number is bounded by its index: H_n <= n for all n.

This crude bound is far from the true Theta(log n) growth, but it is all that expectedComparisons_harmonic_bound below needs.


theorem harmonic_le_n (n : Nat) : harmonic n ≤ (n : Rat) := by
  induction n with
  | zero => simp
  | succ n ih =>
      rw [harmonic_succ]
      push_cast
      have hdiv : (1 : Rat) / ((n : Rat) + 1) ≤ 1 :=
        (div_le_one (by positivity)).mpr (by nlinarith)
      nlinarith
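
The bound is stated with the cast ((n : Nat) : Rat) on the right-hand side, so applying it at a numeral needs a cast normalization step. A small usage sketch, assuming it sits after harmonic_le_n in the same namespace:

```lean
-- Hypothetical usage example: instantiate the bound at n = 5.
-- `exact_mod_cast` identifies the cast ((5 : Nat) : Rat) with the literal (5 : Rat).
example : harmonic 5 ≤ (5 : Rat) := by
  exact_mod_cast harmonic_le_n 5
```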

Expected comparisons: closed form

Expected number of comparisons in randomized quicksort on n distinct elements, given by the closed-form solution of CLRS recurrence (7.4):

T(n) = 2(n+1)H_n - 4n

where H_n is the n-th harmonic number. This is a computable, deterministic, rational-valued function of n. No probability space is constructed: the expectation enters only through the coefficients of recurrence (7.4).

def expectedComparisons (n : Nat) : Rat :=
  2 * ((n : Rat) + 1) * harmonic n - 4 * (n : Rat)

Named CLRS closed form for randomized-quicksort expected comparisons.

theorem expectedComparisons_closed_form (n : Nat) :
    expectedComparisons n = 2 * ((n : Rat) + 1) * harmonic n - 4 * (n : Rat) :=
  rfl

@[simp]
theorem expectedComparisons_zero : expectedComparisons 0 = 0 := by
  simp [expectedComparisons, harmonic]

@[simp]
theorem expectedComparisons_one : expectedComparisons 1 = 0 := by
  simp [expectedComparisons, harmonic]
  ring
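
Beyond the two base cases, small values of the closed form can be checked by evaluation. The sketch below assumes it is placed after expectedComparisons in the same namespace; the comments give the mathematical values, not the literal printed output.

```lean
-- Values of T(n) = 2(n+1)H_n - 4n for small n.
#eval expectedComparisons 2  -- 2*3*(3/2)   - 8  = 1
#eval expectedComparisons 3  -- 2*4*(11/6)  - 12 = 8/3
#eval expectedComparisons 4  -- 2*5*(25/12) - 16 = 29/6
```

These agree with the recurrence: for example T(3) = 2 + (2/3) * (T(0) + T(1) + T(2)) = 2 + 2/3 = 8/3.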

Explicit formula for expectedComparisons (n+1) in terms of harmonic (n+1).

theorem expectedComparisons_succ (n : Nat) :
    expectedComparisons (n+1) = 2 * ((n+1 : Rat) + 1) * harmonic (n+1) - 4 * ((n+1 : Rat)) := by
  simp [expectedComparisons]

Key combinatorial identity - sum of k times harmonic k

Central combinatorial identity for the expected-quicksort closed form:

sum_{k=1}^{n} k * H_k = (n(n+1)/2) * H_n - n(n-1)/4

This is proved by induction on n using the harmonic recurrence to express H_n in terms of H_{n+1} in the inductive step.


theorem sum_mul_harmonic_eq (n : Nat) :
    (∑ k ∈ Finset.Icc 1 n, ((k : Rat) * harmonic k)) =
    (((n : Rat) * ((n : Rat) + 1)) / 2) * harmonic n - ((n : Rat) * ((n : Rat) - 1) / 4) := by
  induction n with
  | zero =>
      simp [harmonic]
  | succ n ih =>
      rw [Finset.sum_Icc_succ_top (by omega) (fun k => (k : Rat) * harmonic k)]
      rw [ih]
      -- Now: (n(n+1)/2)*H_n - n(n-1)/4 + (n+1)*H_{n+1} = ((n+1)(n+2)/2)*H_{n+1} - (n+1)n/4
      -- Use H_n = H_{n+1} - 1/(n+1)
      have hH_n : harmonic n = harmonic (n+1) - (1 : Rat) / ((n+1 : Nat) : Rat) := by
        rw [harmonic_succ]
        ring
      rw [hH_n]
      push_cast
      ring_nf
      have hpos : ((n : Nat) : Rat) + 1 ≠ 0 := by
        intro hzero
        have hsum : ((n+1 : Nat) : Rat) = 0 := by push_cast; simpa using hzero
        exact Nat.succ_ne_zero n (by exact_mod_cast hsum)
      field_simp [hpos]
      ring
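
The identity can be spot-checked numerically at n = 3 by evaluating both sides. This is only an illustration, assuming it is placed after the theorem in the same namespace; the explicit (1 : Nat) annotation is there so that the summation index is elaborated as a natural number.

```lean
-- Left-hand side: 1*H_1 + 2*H_2 + 3*H_3 = 1 + 3 + 11/2 = 19/2
#eval ∑ k ∈ Finset.Icc (1 : Nat) 3, ((k : Rat) * harmonic k)

-- Right-hand side: (3*4/2) * H_3 - 3*2/4 = 6 * (11/6) - 3/2 = 19/2
#eval ((3 : Rat) * (3 + 1) / 2) * harmonic 3 - (3 : Rat) * (3 - 1) / 4
```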

Sum of expected comparisons

Closed form for the sum of expected comparisons up to n-1:

sum_{k=0}^{n-1} T(k) = n(n+1)*H_n - (5 n^2 - n)/2


theorem sum_expectedComparisons_eq (n : Nat) :
    (∑ k ∈ Finset.range n, expectedComparisons k) =
    ((n : Rat) * ((n : Rat) + 1)) * harmonic n - ((5 : Rat) * (n : Rat) * (n : Rat) - (n : Rat)) / 2 := by
  induction n with
  | zero => simp
  | succ n ih =>
      rw [Finset.sum_range_succ, expectedComparisons, ih]
      have hH_succ : harmonic (n+1) = harmonic n + (1 : Rat) / ((n+1 : Nat) : Rat) := harmonic_succ n
      rw [hH_succ]
      push_cast
      ring_nf
      have hpos : ((n : Nat) : Rat) + 1 ≠ 0 := by
        intro hzero
        have hsum : ((n+1 : Nat) : Rat) = 0 := by push_cast; simpa using hzero
        exact Nat.succ_ne_zero n (by exact_mod_cast hsum)
      field_simp [hpos]
      ring
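
The same kind of spot check works for this sum. The sketch evaluates both sides at n = 4, assuming it is placed after the theorem in the same namespace.

```lean
-- Left-hand side: T(0) + T(1) + T(2) + T(3) = 0 + 0 + 1 + 8/3 = 11/3
#eval ∑ k ∈ Finset.range 4, expectedComparisons k

-- Right-hand side: 4*5*H_4 - (5*16 - 4)/2 = 20 * (25/12) - 38 = 11/3
#eval ((4 : Rat) * (4 + 1)) * harmonic 4 - ((5 : Rat) * 4 * 4 - 4) / 2
```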

Recurrence verification

The closed-form expectedComparisons satisfies the CLRS expected-comparison recurrence (7.4): for n >= 1,

T(n) = n-1 + (2/n) * sum_{k=0}^{n-1} T(k).

The proof multiplies through by n and uses the closed form of the sum.


theorem expectedComparisons_recurrence (n : Nat) (hn : n ≥ 1) :
    expectedComparisons n = ((n : Rat) - 1) + (2 / (n : Rat)) *
      (∑ k ∈ Finset.range n, expectedComparisons k) := by
  have hnpos : (n : Rat) ≠ 0 := by
    intro hzero
    have : n = 0 := by exact_mod_cast hzero
    omega
  -- Clear denominator by multiplying both sides by n
  field_simp [hnpos]
  -- Goal: n * T(n) = n * (n-1) + 2 * S(n)
  rw [sum_expectedComparisons_eq n]
  rw [expectedComparisons]
  ring
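
To see how the theorem is applied, here is a hedged sketch that instantiates it at n = 3. The statement is written with the same casts as the theorem so that the term typechecks directly; the side condition 3 >= 1 is discharged by decide.

```lean
-- Hypothetical instance of recurrence (7.4) at n = 3.
-- Numerically: T(3) = 8/3 and (3 - 1) + (2/3) * (T(0) + T(1) + T(2)) = 2 + (2/3) * 1 = 8/3.
example :
    expectedComparisons 3 = (((3 : Nat) : Rat) - 1) + (2 / ((3 : Nat) : Rat)) *
      (∑ k ∈ Finset.range 3, expectedComparisons k) :=
  expectedComparisons_recurrence 3 (by decide)
```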

Alternative form of the recurrence, clearing denominators:

(n+1) * T(n+1) = (n+2) * T(n) + 2n for all n >= 0.

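This telescoping identity is the one solved to obtain the closed form. Here it is derived from the closed form and then reused in the proofs of nonnegativity, the quadratic bound, and monotonicity below.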


theorem expectedComparisons_telescope (n : Nat) :
    ((n+1 : Nat) : Rat) * expectedComparisons (n+1) =
    (((n : Rat) + 2)) * expectedComparisons n + 2 * (n : Rat) := by
  rw [expectedComparisons, expectedComparisons]
  have hH_succ : harmonic (n+1) = harmonic n + (1 : Rat) / ((n+1 : Nat) : Rat) := harmonic_succ n
  rw [hH_succ]
  push_cast
  ring_nf
  have hpos : ((n : Nat) : Rat) + 1 ≠ 0 := by
    intro hzero
    have hsum : ((n+1 : Nat) : Rat) = 0 := by push_cast; simpa using hzero
    exact Nat.succ_ne_zero n (by exact_mod_cast hsum)
  field_simp [hpos]
  ring
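
The proofs of nonnegativity and of the quadratic bound each solve the telescope identity for T(n+1) locally (as hT_expr and hT_succ). One possible refactoring, sketched here under the hypothetical name expectedComparisons_succ_eq_div (not part of the original development), states that step once. It reuses the same lemma and simp call as those proofs.

```lean
-- Hypothetical helper: the telescope identity solved for T(n+1).
theorem expectedComparisons_succ_eq_div (n : Nat) :
    expectedComparisons (n+1) =
      (((n : Rat) + 2) * expectedComparisons n + 2 * (n : Rat)) / ((n+1 : Nat) : Rat) := by
  -- The denominator n+1 is nonzero as a rational.
  have hpos : ((n+1 : Nat) : Rat) ≠ 0 :=
    Nat.cast_ne_zero.mpr (Nat.succ_ne_zero n)
  -- `eq_div_iff_mul_eq` reduces the goal to T(n+1) * (n+1) = numerator,
  -- which is the telescope identity up to commutativity.
  exact (eq_div_iff_mul_eq hpos).mpr (by
    simpa [mul_comm] using expectedComparisons_telescope n)
```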

Expected comparisons: nonnegativity

Expected comparisons are nonnegative.


theorem expectedComparisons_nonneg (n : Nat) : 0 ≤ expectedComparisons n := by
  induction n with
  | zero => simp
  | succ n ih =>
      have ht := expectedComparisons_telescope n
      -- ht: (n+1)*T(n+1) = (n+2)*T(n) + 2n
      -- RHS >= 0 since T(n) >= 0 and n >= 0, and (n+1) > 0 so T(n+1) >= 0
      have hpos_denom : ((n+1 : Nat) : Rat) ≠ 0 :=
        Nat.cast_ne_zero.mpr (Nat.succ_ne_zero n)
      have hnum_nonneg : 0 ≤ (((n : Rat) + 2)) * expectedComparisons n + 2 * (n : Rat) := by
        nlinarith
      -- From ht: T(n+1) = numerator / (n+1)
      have hT_expr : expectedComparisons (n+1) =
          ((((n : Rat) + 2)) * expectedComparisons n + 2 * (n : Rat)) / ((n+1 : Nat) : Rat) :=
        (eq_div_iff_mul_eq hpos_denom).mpr (by
          -- Need: T(n+1) * (n+1) = numerator
          -- ht gives: (n+1) * T(n+1) = numerator
          simpa [mul_comm] using ht)
      rw [hT_expr]
      refine div_nonneg hnum_nonneg (by positivity)

Bounds

Harmonic upper bound. The expected number of comparisons in randomized quicksort is at most 2 n * H_n.

Since H_n = Theta(log n), this gives T(n) = O(n log n).


theorem expectedComparisons_harmonic_bound (n : Nat) :
    expectedComparisons n ≤ 2 * (n : Rat) * harmonic n := by
  have hle : harmonic n ≤ (n : Rat) := harmonic_le_n n
  rw [expectedComparisons]
  nlinarith

CLRS-facing harmonic upper bound using the closed-form scale 2(n+1)H_n.


theorem expectedComparisons_clrs_harmonic_bound (n : Nat) :
    expectedComparisons n ≤ 2 * ((n : Rat) + 1) * harmonic n := by
  rw [expectedComparisons_closed_form]
  have hn : 0 ≤ (4 : Rat) * (n : Rat) := by positivity
  nlinarith

Quadratic upper bound. On any input of length n, the expected number of comparisons is at most n^2.

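The proof uses induction with the telescope identity: T(n+1) = ((n+2)T(n) + 2n)/(n+1). By the inductive hypothesis T(n) <= n^2, the numerator is at most (n+2)n^2 + 2n = n^3 + 2n^2 + 2n, which is at most (n+1)^3 because the difference is n^2 + n + 1 >= 0. Dividing by n+1 gives T(n+1) <= (n+1)^2.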


theorem expectedComparisons_quadratic (n : Nat) :
    expectedComparisons n ≤ (n : Rat) * (n : Rat) := by
  induction n with
  | zero => simp
  | succ n ih =>
      have ht := expectedComparisons_telescope n
      -- ht: (n+1)*T(n+1) = (n+2)*T(n) + 2n
      have hpos : ((n+1 : Nat) : Rat) ≠ 0 :=
        Nat.cast_ne_zero.mpr (Nat.succ_ne_zero n)
      -- From ht: T(n+1) = ((n+2)*T(n) + 2n) / (n+1)
      have hT_succ : expectedComparisons (n+1) =
          ((((n : Rat) + 2)) * expectedComparisons n + 2 * (n : Rat)) / ((n+1 : Nat) : Rat) :=
        (eq_div_iff_mul_eq hpos).mpr (by
          simpa [mul_comm] using ht)
      rw [hT_succ]
      -- Need: ((n+2)*T(n) + 2n) / (n+1) <= (n+1)^2
      -- First, bound the numerator using ih: T(n) <= n^2
      have hnum_bound : (((n : Rat) + 2)) * expectedComparisons n + 2 * (n : Rat) ≤
          ((n : Rat) + 1) * ((n : Rat) + 1) * ((n : Rat) + 1) := by
        -- (n+2)*T(n) + 2n <= (n+2)*n^2 + 2n = n^3 + 2n^2 + 2n
        -- <= n^3 + 3n^2 + 3n + 1 = (n+1)^3  (since n^2 + n + 1 >= 0)
        nlinarith
      -- Apply the division lemma: if a <= b and 0 <= c, then a/c <= b/c
      refine le_trans (div_le_div_of_nonneg_right hnum_bound (by positivity)) ?_
      -- Now need: (n+1)^3 / (n+1) <= (n+1)^2
      -- Since (n+1)^3 / (n+1) = (n+1)^2 exactly, this is equality
      push_cast
      have h_eq : ((n : Rat) + 1) * ((n : Rat) + 1) * ((n : Rat) + 1) / ((n : Rat) + 1) =
          ((n : Rat) + 1) * ((n : Rat) + 1) := by
        field_simp [show ((n : Rat) + 1) ≠ 0 from by positivity]
      exact h_eq.le
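
For a feel of how loose the three upper bounds are, they can be checked at a concrete size. This is an illustrative sketch at n = 10, assuming it is placed after the bounds in the same namespace; each line asks Lean to decide one inequality over Rat and is expected to evaluate to true, in line with the theorems above.

```lean
-- T(10) is about 24.4, while 2*10*H_10 is about 58.6 and 10^2 = 100.
#eval decide (expectedComparisons 10 ≤ 2 * (10 : Rat) * harmonic 10)        -- harmonic bound
#eval decide (expectedComparisons 10 ≤ 2 * ((10 : Rat) + 1) * harmonic 10)  -- CLRS-scale bound
#eval decide (expectedComparisons 10 ≤ (10 : Rat) * 10)                     -- quadratic bound
```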

Monotonicity. The expected comparison count is non-decreasing: T(n) <= T(n+1).

From the telescope identity, T(n+1) - T(n) = (T(n) + 2n)/(n+1), which is nonnegative because T(n) >= 0 by expectedComparisons_nonneg.


theorem expectedComparisons_monotone (n : Nat) : expectedComparisons n ≤ expectedComparisons (n+1) := by
  have ht := expectedComparisons_telescope n
  -- ht: (n+1)*T(n+1) = (n+2)*T(n) + 2n
  -- Rearranged: (n+1)*(T(n+1) - T(n)) = T(n) + 2n
  -- Since T(n) >= 0, RHS >= 0, so T(n+1) - T(n) >= 0
  have hpos : ((n+1 : Nat) : Rat) ≠ 0 :=
    Nat.cast_ne_zero.mpr (Nat.succ_ne_zero n)
  have hnonneg : 0 ≤ expectedComparisons n := expectedComparisons_nonneg n
  have hdiff : expectedComparisons (n+1) - expectedComparisons n =
      (expectedComparisons n + 2 * (n : Rat)) / ((n+1 : Nat) : Rat) :=
    (eq_div_iff_mul_eq hpos).mpr (by
      -- Need: (T(n+1) - T(n)) * (n+1) = T(n) + 2n
      -- Start from ht: (n+1)*T(n+1) = (n+2)*T(n) + 2n
      calc
        (expectedComparisons (n+1) - expectedComparisons n) * ((n+1 : Nat) : Rat)
            = ((n+1 : Nat) : Rat) * expectedComparisons (n+1) -
              ((n+1 : Nat) : Rat) * expectedComparisons n := by ring
        _ = (((n : Rat) + 2) * expectedComparisons n + 2 * (n : Rat)) -
              ((n+1 : Nat) : Rat) * expectedComparisons n := by rw [ht]
        _ = expectedComparisons n + 2 * (n : Rat) := by push_cast; ring
      )
  have hdiff_nonneg : 0 ≤ expectedComparisons (n+1) - expectedComparisons n := by
    rw [hdiff]
    refine div_nonneg ?_ (by positivity)
    nlinarith
  linarith
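
The one-step inequality can be lifted to monotonicity over all of Nat. A minimal sketch, assuming Mathlib's monotone_nat_of_le_succ (which turns a proof of f n <= f (n+1) for all n into Monotone f); this corollary is not stated in the original development.

```lean
-- Hypothetical corollary: T is monotone, i.e. m ≤ n implies T(m) ≤ T(n).
example : Monotone expectedComparisons :=
  monotone_nat_of_le_succ expectedComparisons_monotone
```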

end Chapter07
end CLRS