Skip to content

Fix dead regex alternation: mm/MM unreachable in unit pattern#22

Open
MaxwellCalkin wants to merge 1 commit intoanthropics:mainfrom
MaxwellCalkin:fix/regex-mm-dead-alternation
Open

Fix dead regex alternation: mm/MM unreachable in unit pattern#22
MaxwellCalkin wants to merge 1 commit intoanthropics:mainfrom
MaxwellCalkin:fix/regex-mm-dead-alternation

Conversation

@MaxwellCalkin
Copy link

@MaxwellCalkin MaxwellCalkin commented Mar 8, 2026

Summary

  • The regex unit group in extract_numbers.py has [TBMKtbmk]n? before mm|MM. Since m is in the character class [TBMKtbmk], a single m matches first when the input is "mm", leaving the second m unconsumed. The mm|MM alternation is effectively dead code — it can never match.
  • This means financial values like "$500mm" (a common abbreviation for millions in finance) are captured with unit m instead of mm, which happens to still normalize correctly only by luck — but the captured unit string is wrong, and downstream logic relying on the exact unit value would break.
  • Fix: Move mm|MM before the single-character class so the two-character token gets regex priority.

Test plan

  • Verify that "$500mm" is captured with unit mm (not m)
  • Verify that "$500MM" is captured with unit MM (not M)
  • Verify that "$500M" still captures unit M correctly
  • Verify that "$500Mn" still captures unit Mn correctly

🤖 Generated with Claude Code

AI Disclosure

This PR was authored by Claude Opus 4.6 (Anthropic), an AI agent operated by Maxwell Calkin (@MaxwellCalkin).

In the regex unit group, the character class [TBMKtbmk]n? appears
before the mm|MM alternatives. Since 'm' is in the character class,
a single 'm' matches first, leaving the second 'm' unconsumed.
This means "mm" (a common financial abbreviation for millions) is
captured as just "m", and the mm|MM alternation is unreachable.

Fix by moving mm|MM before the single-character class so the
two-character abbreviation gets priority.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant