arm64: Refactor mov/movprfx for unmasked operations #123717

Open
ylpoonlg wants to merge 7 commits into dotnet:main from ylpoonlg:github-movprfx_refactor_1

@ylpoonlg
Contributor

@ylpoonlg ylpoonlg commented Jan 28, 2026

This PR is the first of a few contributing to #115508.

The motivation is that the JIT does not recognize read-modify-write (RMW) instructions (for example, Add with two operands) until after LSRA. The JIT needs to prefix such an instruction with mov/movprfx when the instruction cannot encode a separate dst operand. Previously, this logic was scattered across code generation. This PR moves it so that these movs are created in a single location, during emit.

  • Apply the proposed changes to the codegen and emit functions for the non-embedded masked operations, which only concern cases using unpredicated mov/movprfx.
  • Clean up RMW instructions codegen in hwintrinsiccodegenarm64.cpp.
  • Add a helper emit function to handle SVE mov/movprfx checks.
  • Move the special codegen for AddCarryWideningEven/Odd into the emit function.
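
As a rough illustration of the idea in the description, here is a minimal toy sketch of creating the prefix copy inside one emit helper instead of at every call site. It is not the JIT's code: `ToyRmwEmitter`, `emitRmw`, the integer registers, and the `op` mnemonic are all hypothetical names invented for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: registers are plain ints and emitted instructions are strings.
// None of these names exist in the real JIT.
struct ToyRmwEmitter
{
    std::vector<std::string> out;

    static std::string z(int reg)
    {
        return "z" + std::to_string(reg);
    }

    // Emits `dst = src1 OP src2` using a destructive (RMW) encoding that can
    // only express `dst = dst OP src2`.
    void emitRmw(const std::string& op, int dst, int src1, int src2)
    {
        // If dst aliased src2 (and not src1), the prefix copy would clobber
        // src2 before the operation reads it. This sketch simply rejects that case.
        assert((dst == src1) || (dst != src2));

        if (dst != src1)
        {
            // The copy is created here, at emit time, rather than by each caller.
            out.push_back("movprfx " + z(dst) + ", " + z(src1));
        }
        out.push_back(op + " " + z(dst) + ", " + z(dst) + ", " + z(src2));
    }
};
```

With this shape, a caller only states the three operands it wants; whether a prefix is needed is decided in one place.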

cc @dotnet/arm64-contrib @a74nh

* Move MOVPRFX logic from codegen to emit.
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 28, 2026
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jan 28, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @dotnet/jit-contrib
See info in area-owners.md if you want to be subscribed.

@ylpoonlg
Contributor Author

SPMI asmdiffs:

 G_M39772_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ldr     q16, [fp, #0x20]	// [V00 arg0]
             ldr     w0, [fp, #0x1C]	// [V01 arg1]
-            mov     v0.16b, v16.16b
+            movprfx z0, z16
             insr    z0.s, w0
-						;; size=16 bbWeight=1 PerfScore 8.50
+						;; size=16 bbWeight=1 PerfScore 10.00

ASIMD movs are replaced with SVE movprfxs where possible. Slight increase in PerfScore, but this would allow the uarch to fuse them with the following instruction.
Other failures look unrelated.

@a74nh
Contributor

a74nh commented Jan 30, 2026

> Slight increase in PerfScore, but this would allow the uarch to fuse them with the following instruction.

In an ideal world, the perfscore would detect instruction fusing (but let's not try to fix that here - perfscore needs fixing regardless). So, agreed, your patch is better.

@JulieLeeMSFT
Member

@dhartglassMSFT, PTAL.

return true;
}

if (ins == INS_sve_movprfx)
Contributor

@dhartglassMSFT dhartglassMSFT Mar 5, 2026

is there a reason that movprfx between the same register is not redundant, or was this more of a pre-emptive/safety thing?

Either way, can you add a comment about why this case is needed?

Also if you're up for it, the comment on line 16909 mentions "3 cases" but lists four :D

Contributor Author

> is there a reason that movprfx between the same register is not redundant, or was this more of a pre-emptive/safety thing?

I don't think there is currently any instance of movprfx being called with canSkip = false, so it was just added preemptively. It might be useful for codegen testing, or maybe @a74nh had something else in mind?

Contributor

The code below this is a set of specific optimisations that try to merge the current mov with the previous instruction. It would need some reworking to also work for movprfx.
However - movprfx is already an optimised mov, the microarch will merge it into the instruction following it. So, we probably don't want/need to try to optimise away the movprfx.

So, happy with this as is.
(Although the comment should still be updated)
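
To make the behaviour discussed in this thread concrete, here is a small, hedged sketch of a redundancy check in which movprfx takes the self-move elision but bypasses the previous-instruction peepholes. Everything in it is an assumption for illustration: `ToyIns`, `toyIsRedundantMov`, and the two peephole rules are invented and do not reproduce the emitter's actual logic.

```cpp
#include <cassert>

// Hypothetical stand-ins, not the JIT's types.
enum class ToyIns
{
    Mov,
    Movprfx
};

// Returns true when the requested move does not need to be emitted.
// prevDst/prevSrc describe the immediately preceding mov, or -1 if there is none.
bool toyIsRedundantMov(ToyIns ins, int dst, int src, bool canSkip, int prevDst, int prevSrc)
{
    // A self-move is dropped only when the caller allows it.
    if (canSkip && (dst == src))
    {
        return true;
    }

    // movprfx stops here: it is expected to fuse with the instruction that
    // follows it, so it is never merged with the previous instruction.
    if (ins == ToyIns::Movprfx)
    {
        return false;
    }

    // Assumed mov-only peepholes: a repeated mov, or a mov straight back.
    if ((prevDst == dst) && (prevSrc == src))
    {
        return true;
    }
    if ((prevDst == src) && (prevSrc == dst))
    {
        return true;
    }
    return false;
}
```

In this sketch a movprfx between the same register with `canSkip = false` is still emitted, which matches the pre-emptive case discussed above.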

instruction ins, emitAttr attr, regNumber dstReg, regNumber srcReg, bool canSkip, insOpts opt /* = INS_OPTS_NONE */)
{
emitAttr size = EA_SIZE(attr);

Contributor

An assert(IsMovInstruction(ins)); similar to the one in emitIns_Mov would make sense here too.

@dhartglassMSFT
Contributor

dhartglassMSFT commented Mar 11, 2026

Hi @ylpoonlg, I added a blurb to the PR description because I didn't find the description in 115508 clear. Thanks, Alan, for filling me in on that. Hopefully it will help anyone looking at this area in the future. Please feel free to reword it if you don't like the wording or if I got a detail wrong.

 for (helper.EmitBegin(); !helper.Done(); helper.EmitCaseEnd())
 {
-    GetEmitter()->emitIns_R_R(INS_sve_movprfx, EA_SCALABLE, reg1, reg3);
+    GetEmitter()->emitInsSve_R_R(INS_sve_movprfx, EA_SCALABLE, reg1, reg3);
Contributor

For my education, going after these would be part of the remaining work for 115508, correct?

Contributor Author

@ylpoonlg ylpoonlg Mar 11, 2026

Yes, they will be removed in the next PR. This particular instance is just to avoid hitting

if (IsMovInstruction(ins))
{
    assert(!"Please use emitIns_Mov() to correctly handle move elision");
    emitIns_Mov(ins, attr, reg1, reg2, /* canSkip */ false, opt);
}

in emitIns_R_R, since we now consider INS_sve_movprfx a mov instruction.
Maybe we could do a similar thing in emitInsSve_R_R, but that would mean moving the whole emit logic for INS_sve_movprfx/INS_sve_mov into emitInsSve_Mov.
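
The routing described in this reply can be sketched as a toy: a generic two-register emit entry point that forwards anything classified as a move to a dedicated move emitter. This is only an illustration under stated assumptions; `ToyDispatchEmitter`, `emitRR`, and `emitMov` are hypothetical names, and the debug-only assert from the snippet quoted above is deliberately left out so that the forwarding path can run.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only; these names are invented for illustration.
struct ToyDispatchEmitter
{
    std::vector<std::string> out;

    static bool isMovInstruction(const std::string& ins)
    {
        // Assumption mirrored from the thread: movprfx now counts as a move.
        return (ins == "mov") || (ins == "movprfx");
    }

    static std::string format(const std::string& ins, int dst, int src)
    {
        return ins + " r" + std::to_string(dst) + ", r" + std::to_string(src);
    }

    // Dedicated move emitter: the only place where elision is decided.
    void emitMov(const std::string& ins, int dst, int src, bool canSkip)
    {
        assert(isMovInstruction(ins));
        if (canSkip && (dst == src))
        {
            return; // elided
        }
        out.push_back(format(ins, dst, src));
    }

    // Generic two-register emitter: moves are forwarded, never encoded here.
    void emitRR(const std::string& ins, int dst, int src)
    {
        if (isMovInstruction(ins))
        {
            emitMov(ins, dst, src, /* canSkip */ false);
            return;
        }
        out.push_back(format(ins, dst, src));
    }
};
```

Forwarding with `canSkip = false` keeps the generic path conservative: a caller that wants elision has to call the move emitter directly.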

@dhartglassMSFT
Contributor

dhartglassMSFT commented Mar 11, 2026

Change lgtm, thanks for the refactor

I can merge once Alan signs off.

Labels

  • area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
  • arm-sve: Work related to arm64 SVE/SVE2 support
  • community-contribution: Indicates that the PR has been added by a community member

4 participants