Fix dead regex alternation: mm/MM unreachable in unit pattern #22
Open
MaxwellCalkin wants to merge 1 commit into anthropics:main from
In the regex unit group, the character class [TBMKtbmk]n? appears before the mm|MM alternatives. Since 'm' is in the character class, a single 'm' matches first, leaving the second 'm' unconsumed. This means "mm" (a common financial abbreviation for millions) is captured as just "m", and the mm|MM alternation is unreachable. Fix by moving mm|MM before the single-character class so the two-character abbreviation gets priority.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
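The ordering problem described above can be reproduced with a minimal sketch. The two patterns below are hypothetical stand-ins, since the actual pattern in extract_numbers.py is not shown here; the sketch also assumes that nothing after the unit group (such as a word boundary or end anchor) would force the regex engine to backtrack into the later alternatives.

```python
import re

# Hypothetical patterns for illustration only; the real pattern in
# extract_numbers.py may differ in everything except the unit group.
BUGGY = re.compile(r"(\d+(?:\.\d+)?)\s*([TBMKtbmk]n?|mm|MM)")
FIXED = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|MM|[TBMKtbmk]n?)")


def unit(pattern, text):
    """Return the captured unit string, or None if the pattern does not match."""
    match = pattern.search(text)
    return match.group(2) if match else None


# Alternatives are tried left to right, so the character class wins in BUGGY.
buggy_unit = unit(BUGGY, "5mm")  # 'm'
fixed_unit = unit(FIXED, "5mm")  # 'mm'
print(buggy_unit, fixed_unit)
```

Python's re module tries alternatives from left to right and stops at the first one that lets the overall match succeed, which is why listing the two-character token first is enough to change the result.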
Summary
- The unit group in `extract_numbers.py` has `[TBMKtbmk]n?` before `mm|MM`. Since `m` is in the character class `[TBMKtbmk]`, a single `m` matches first when the input is "mm", leaving the second `m` unconsumed. The `mm|MM` alternation is effectively dead code: it can never match.
- The captured unit is therefore `m` instead of `mm`. This happens to still normalize correctly, but only by luck: the captured unit string is wrong, and downstream logic relying on the exact unit value would break.
- The fix moves `mm|MM` before the single-character class so the two-character token gets regex priority.

Test plan
- `mm` is captured as `mm` (not `m`)
- `MM` is captured as `MM` (not `M`)
- `M` is still captured correctly
- `Mn` is still captured correctly

🤖 Generated with Claude Code
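The test plan above could be expressed as a small table-driven check. This is a sketch under assumptions: `UNIT_GROUP` is a hypothetical stand-in for the reordered pattern, and the sample inputs ("5mm" and so on) are illustrative rather than taken from the project's test suite.

```python
import re

# Hypothetical stand-in for the reordered unit group (an assumption,
# not the actual pattern from extract_numbers.py).
UNIT_GROUP = re.compile(r"\d+(?:\.\d+)?\s*(mm|MM|[TBMKtbmk]n?)")

# (input text, expected captured unit); inputs are illustrative.
CASES = [
    ("5mm", "mm"),
    ("5MM", "MM"),
    ("5M", "M"),
    ("5Mn", "Mn"),
]


def captured_unit(text):
    """Return the unit captured from text, or None if nothing matches."""
    match = UNIT_GROUP.search(text)
    return match.group(1) if match else None


def run_cases():
    """Return (input, captured, expected) triples for every case."""
    return [(text, captured_unit(text), expected) for text, expected in CASES]


for text, got, expected in run_cases():
    assert got == expected, f"{text!r}: captured {got!r}, expected {expected!r}"
```

Checking the single-character and `n`-suffixed units alongside `mm` and `MM` guards against the reordering changing any capture that already worked.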
AI Disclosure
This PR was authored by Claude Opus 4.6 (Anthropic), an AI agent operated by Maxwell Calkin (@MaxwellCalkin).