hippy
Posts: 10269
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

[SOLVED] Dynamic code execution

Fri Jun 18, 2021 12:52 pm

I am presuming it's entirely possible to reserve a chunk of suitably aligned RAM on a Pico, create and write ARM instruction codes into that, jump to it and have that code execute.

Are there any examples or instructions on how to do that, and how one would call out of that code to terminate its execution or do things it doesn't handle itself then return to it ?
Last edited by hippy on Mon Jun 21, 2021 7:29 pm, edited 1 time in total.

triss64738
Posts: 31
Joined: Wed Jun 16, 2021 5:13 pm
Location: masto/fedi: sys64738@hellsite.site
Contact: Website

Re: Dynamic code execution

Fri Jun 18, 2021 1:03 pm

Wait what exactly do you mean by "reserve"? put the bytes or words in the opposite order (and thus want to execute code with a decreasing program counter)? reverse-engineer?

In general, you could put a svc or bkpt instruction at the end of a block of code, and handle it to return back to whatever else should be done. Similarly, you can also hook into the undefined instruction exception handler to handle e.g. a few extra ARM/Thumb instructions you'd want, though these mechanisms are relatively slow. You could also try a manual coroutine-style register-saving approach. Also note that due to the overhead of handling undefined instruction exceptions, if you want to use mostly custom/unsupported instructions, you may be better off with a cached/threading interpreter (a JIT is probably overkill here).

cleverca22
Posts: 3953
Joined: Sat Aug 18, 2012 2:33 pm

Re: Dynamic code execution

Fri Jun 18, 2021 1:09 pm

you can also just put a proper return opcode into the generated code, to return to whatever the link register is

triss64738
Posts: 31
Joined: Wed Jun 16, 2021 5:13 pm
Location: masto/fedi: sys64738@hellsite.site
Contact: Website

Re: Dynamic code execution

Fri Jun 18, 2021 2:15 pm

cleverca22 wrote: you can also just put a proper return opcode into the generated code, to return to whatever the link register is
That's not always reliable: leaf functions or handwritten assembly routines sometimes use it for extra storage... at least on ARM. Maybe for Thumb it's different.

jahboater
Posts: 7181
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: Dynamic code execution

Fri Jun 18, 2021 2:21 pm

May not be relevant on the Pico, but you can use mmap() to reserve some memory with execute access (just give PROT_EXEC).

dthacher
Posts: 40
Joined: Sun Jun 06, 2021 12:07 am

Re: Dynamic code execution

Fri Jun 18, 2021 2:45 pm

Cortex-M0+ is Von Neumann system bus to the best of my knowledge, unless they modified it. The compiler may support RAM function. You will likely want relative based logic.

triss64738
Posts: 31
Joined: Wed Jun 16, 2021 5:13 pm
Location: masto/fedi: sys64738@hellsite.site
Contact: Website

Re: Dynamic code execution

Fri Jun 18, 2021 3:12 pm

jahboater wrote:
Fri Jun 18, 2021 2:21 pm
May not be relevant on the Pico, but you can use mmap() to reserve some memory with execute access (just give PROT_EXEC).
mmap() (or similar) is only a thing when you're running an OS, which the Pico doesn't have. The Pico doesn't even have an MMU (iirc it does have an MPU, but I think it's unused?), so doing this also doesn't make much sense.
dthacher wrote: Cortex-M0+ is Von Neumann system bus to the best of my knowledge, unless they modified it. The compiler may support RAM function. You will likely want relative based logic.
The Pico can definitely run code from RAM (cf. the PICO_NO_FLASH CMake option, or the BOOTSEL button example). However, instruction pipelining and separate instruction and data caches mean that some care has to be taken when modifying code in RAM: changes to the modified code will only come through when you flush the data cache and invalidate the instruction cache, and when you try to modify an instruction that's already in the CPU's pipeline, the old one will still be executed.

cleverca22
Posts: 3953
Joined: Sat Aug 18, 2012 2:33 pm

Re: Dynamic code execution

Fri Jun 18, 2021 3:20 pm

triss64738 wrote:
Fri Jun 18, 2021 3:12 pm
The Pico can definitely run code from RAM (cf. the PICO_NO_FLASH CMake option, or the BOOTSEL button example). However, instruction pipelining and separate instruction and data caches mean that some care has to be taken when modifying code in RAM: changes to the modified code will only come through when you flush the data cache and invalidate the instruction cache, and when you try to modify an instruction that's already in the CPU's pipeline, the old one will still be executed.
i don't think it has any data cache, and the i-cache is basically not even there, it's more of a prefetch: it will do 32-bit reads to fetch an opcode, and if it gets 2 opcodes by accident, oh well, now it doesn't need to issue a read!

hippy
Posts: 10269
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: Dynamic code execution

Fri Jun 18, 2021 4:58 pm

triss64738 wrote:
Fri Jun 18, 2021 1:03 pm
Wait what exactly do you mean by "reserve"? put the bytes or words in the opposite order (and thus want to execute code with a decreasing program counter)? reverse-engineer?
"Reserve" not "reverse" ;)

Basically setting aside some RAM for this. My first thought was a uint16_t or uint32_t array but 'malloc' or something might do, 'mmap' if that is provided for in Pico-SDK.
triss64738 wrote:
Fri Jun 18, 2021 1:03 pm
In general, you could put a svc or bkpt instruction at the end of a block of code, and handle it to return back to whatever else should be done. Similarly, you can also hook into the undefined instruction exception handler to handle e.g. a few extra ARM/Thumb instructions you'd want, though these mechanisms are relatively slow. You could also try a manual coroutine-style register-saving approach. Also note that due to the overhead of handling undefined instruction exceptions, if you want to use mostly custom/unsupported instructions, you may be better off with a cached/threading interpreter (a JIT is probably overkill here).
Sounds about right, but I have no idea of how to do that, how to hook into exception handlers and so on, how to execute 'userland code' from within them; for example to call 'printf' from my code.

What I am trying to do is create ARM executable code in MicroPython but that would be handled within a C Extension, something like below where my "svc" will assemble to something which eventually leads to my 'Callback' routine being executed, and "end" allow the Python code to continue -

Code: Select all

def Callback(svc, r):
  if svc == 12:
    print("{} = {} + {}".format(r[1], r[2], r[3]))

def Add():
  myArmCode = array.array("W")
  ArmAsm(myArmCode, "mov r1, r2") # r1 = r2 + r3
  ArmAsm(myArmCode, "add r1, r3")
  ArmAsm(myArmCode, "svc 12")
  ArmAsm(myArmCode, "end")
  ArmRun(myArmCode, Callback, r2=22, r3=33)
  print("Done")

triss64738
Posts: 31
Joined: Wed Jun 16, 2021 5:13 pm
Location: masto/fedi: sys64738@hellsite.site
Contact: Website

Re: Dynamic code execution

Fri Jun 18, 2021 5:59 pm

hippy wrote:
Fri Jun 18, 2021 4:58 pm
What I am trying to do is create ARM executable code in MicroPython but that would be handled within a C Extension, something like below where my "svc" will assemble to something which eventually leads to my 'Callback' routine being executed, and "end" allow the Python code to continue -

Code: Select all

def Callback(svc, r):
  if svc == 12:
    print("{} = {} + {}".format(r[1], r[2], r[3]))

def Add():
  myArmCode = array.array("W")
  ArmAsm(myArmCode, "mov r1, r2") # r1 = r2 + r3
  ArmAsm(myArmCode, "add r1, r3")
  ArmAsm(myArmCode, "svc 12")
  ArmAsm(myArmCode, "end")
  ArmRun(myArmCode, Callback, r2=22, r3=33)
  print("Done")
Ok wait so, do I understand it right, that:
  • The assembled code is something you can control, or at least it will be (or is meant to be) a proper function, following the AAPCS calling convention (arguments in r0-r3, all other registers saved, return address in lr)
  • The assembly can be regular Thumb2 code (supported by the M0+), instead of the full-fledged ARM instruction set (not supported)
In this case, I think you can simply call it as an external (C-ish) function pointer when written to memory, and you can use "bx lr" etc. to return to your previous code. In regular Python on your PC, you can use ctypes to turn a bunch of bytes (including a bytearray) into a C function pointer, which can then be called like a Python function (also you need to declare the memory as read-execute on a 'real' computer)... don't know how different this is in MicroPython, but maybe you can figure out something like that.

Though, if you want to know how to override exceptions: you can declare the (C/asm) symbols "isr_svcall" and "isr_invalid" for hooking to when this happens (these will be called automatically by the CPU hardware, see pico-sdk/src/rp2_common/pico_standard_link/crt0.S). Though, these two don't behave exactly like C functions (I think, I've implemented these only on an old ARM7TDMI, maybe it's cleaner on a Cortex-M0+), you need to save registers and reinitialize the stack and choose a place to return to (can be the next instruction) etc, so you'll need to dig out an assembler and the Cortex-M0+ datasheet.

Also FYI I haven't ever used MicroPython myself, so I don't have the slightest clue on how to make C/asm stuff work with it (though I guess it'll be fine with the regular FFI if you follow the AAPCS nicely).
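
As a rough sketch of the SVC route (not tested on hardware): the Thumb `svc` instruction is encoded as 0xDF00 plus an 8-bit immediate, and on exception entry the stacked return address points at the instruction after the `svc`. A handler can therefore recover the number by reading the halfword just before that address. The helper name `svc_number_from_return` is made up for illustration, and fetching the stacked PC from the exception frame inside the handler is left out here.

```c
#include <stdint.h>

/* Hypothetical helper: 'return_pc' is the stacked return address, which
   points at the instruction following the svc. Returns the 8-bit svc
   number, or -1 if the preceding halfword is not an svc instruction. */
static int svc_number_from_return(const uint16_t *return_pc)
{
    uint16_t insn = return_pc[-1];        /* the svc instruction itself */
    if ((insn & 0xFF00u) != 0xDF00u)
        return -1;                        /* not "svc #imm8" */
    return (int)(insn & 0x00FFu);         /* low byte is the immediate */
}
```

With "svc 12" assembled as 0xDF0C, a handler built on this would see 12 and could dispatch to whatever the callback is meant to do.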

hippy
Posts: 10269
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: Dynamic code execution

Fri Jun 18, 2021 8:20 pm

triss64738 wrote:
Fri Jun 18, 2021 5:59 pm
Ok wait so, do I understand it right, that:
  • The assembled code is something you can control, or at least it will be (or is meant to be) a proper function, following the AAPCS calling convention (arguments in r0-r3, all other registers saved, return address in lr)
  • The assembly can be regular Thumb2 code (supported by the M0+), instead of the full-fledged ARM instruction set (not supported)
Yes; exactly that. I will have an array of 16-bit values, each representing a 16-bit Thumb2 instruction, two for 32-bit. Jump to the start of the array and let it rip.

I don't know what will be created in the array because it will depend on what the end-user (not me) specifies at run-time. Creating the array is easy enough. I could emulate/interpret what's in the array but it makes much more sense to have the RP2040 execute it.

I suppose I could have a C routine which just consists of 'nop' instructions, overwrite those with my actual code and then call it. Could fix-up instructions so I can call the address of another C routine which acts as an SVC handler. That might avoid the need to have actual traps and whatnot.
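
Filling such an array could look roughly like the sketch below. It only covers three 16-bit Thumb encodings for low registers, and the function names are invented for illustration rather than taken from MicroPython or the Pico SDK.

```c
#include <stdint.h>

/* ADDS Rd, Rn, #imm3 : 0001110 imm3 Rn Rd (r0-r7 only) */
static uint16_t thumb_adds_imm3(unsigned rd, unsigned rn, unsigned imm3)
{
    return (uint16_t)(0x1C00u | ((imm3 & 7u) << 6) | ((rn & 7u) << 3) | (rd & 7u));
}

/* ADDS Rd, Rn, Rm : 0001100 Rm Rn Rd (r0-r7 only) */
static uint16_t thumb_adds_reg(unsigned rd, unsigned rn, unsigned rm)
{
    return (uint16_t)(0x1800u | ((rm & 7u) << 6) | ((rn & 7u) << 3) | (rd & 7u));
}

/* BX Rm : 010001110 Rm 000 (Rm may be r0-r15, lr is r14) */
static uint16_t thumb_bx(unsigned rm)
{
    return (uint16_t)(0x4700u | ((rm & 15u) << 3));
}

/* Emit "r1 = r2 + r3; return" into buf and return the halfword count. */
static unsigned emit_add_example(uint16_t *buf)
{
    unsigned n = 0;
    buf[n++] = thumb_adds_imm3(1, 2, 0);  /* adds r1, r2, #0  (copy r2 to r1) */
    buf[n++] = thumb_adds_reg(1, 1, 3);   /* adds r1, r1, r3 */
    buf[n++] = thumb_bx(14);              /* bx lr */
    return n;
}
```

Keeping one small encoder per instruction form makes it easy to check each result against a disassembler listing.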

hippy
Posts: 10269
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: Dynamic code execution

Fri Jun 18, 2021 11:21 pm

Proof of concept coming along nicely. I have my MicroPython passing a code array and r0-r7 register values into a C Extension module, getting r0-r7 back. I am currently not using the code array, am simply calling an internal "Run()" routine making this a purely C issue.

I have decided I can ignore callbacks for now by making code before, between and after callbacks their own routines; return from one routine, do whatever the callback would do, call the next routine. Not efficient but it will do.

Where I am currently stuck is how to get C variables into ARM registers and vice-versa ...

Code: Select all

uint32_t reg[8];

void Run(void) {
  r1 = reg[1];    // Set ARM register 'r1' from C array item reg[1]
  r2 = reg[2];    // Set ARM register 'r2' from C array item reg[2]
  r1 = r2;        // Make this ARM "mov r1, r2"
  reg[1] = r1;    // Set C array item reg[1] to ARM register 'r1'
  reg[2] = r2;    // Set C array item reg[2] to ARM register 'r2'
}
I guess I need to look at how any Pico-SDK C code using inline assembler does it. But if anyone does have any ideas or experience I am all ears.

triss64738
Posts: 31
Joined: Wed Jun 16, 2021 5:13 pm
Location: masto/fedi: sys64738@hellsite.site
Contact: Website

Re: Dynamic code execution

Sat Jun 19, 2021 4:36 am

hippy wrote:
Fri Jun 18, 2021 11:21 pm
Proof of concept coming along nicely. I have my MicroPython passing a code array and r0-r7 register values into a C Extension module, getting r0-r7 back. I am currently not using the code array, am simply calling an internal "Run()" routine making this a purely C issue.

I have decided I can ignore callbacks for now by making code before, between and after callbacks their own routines; return from one routine, do whatever the callback would do, call the next routine. Not efficient but it will do.

Where I am currently stuck is how to get C variables into ARM registers and vice-versa ...

Code: Select all

uint32_t reg[8];

void Run(void) {
  r1 = reg[1];    // Set ARM register 'r1' from C array item reg[1]
  r2 = reg[2];    // Set ARM register 'r2' from C array item reg[2]
  r1 = r2;        // Make this ARM "mov r1, r2"
  reg[1] = r1;    // Set C array item reg[1] to ARM register 'r1'
  reg[2] = r2;    // Set C array item reg[2] to ARM register 'r2'
}
I guess I need to look at how any Pico-SDK C code using inline assembler does it. But if anyone does have any ideas or experience I am all ears.
You'll need some assembly, as the C code can't do this directly. Maybe something like this:

Code: Select all

uint32_t reg[8];

void Run(void) {
  PreCallback(); // whatever...
  asm volatile(
    // load the saved registers into the actual CPU registers
    "ldr r0, =reg\n" 
    "ldm r0, {r0-r7}\n"
    // 32 nops: put code here
    "nop;nop;nop;nop;nop;nop;nop;nop\n"
    "nop;nop;nop;nop;nop;nop;nop;nop\n"
    "nop;nop;nop;nop;nop;nop;nop;nop\n"
    "nop;nop;nop;nop;nop;nop;nop;nop\n"
    // save the CPU registers back into our array
    // we need to do this with 4 at a time as we need a register for destination pointer, first do the second half
    "push {r0-r3}\n"
    "ldr r0, =(reg+4*4)\n"
    "stm r0, {r4-r7}\n"
    // now the first half
    "pop {r0-r3}\n"
    "ldr r4, =(reg+0*4)\n"
    "stm r4, {r0-r3}\n"
    // tell the compiler this code uses no inputs or outputs, but it destroys all registers, and accesses memory
    ::: "r0","r1","r2","r3","r4","r5","r6","r7","memory"
  );
  PostCallback(); // whatever...
}
(I did NOT test this code, just wrote it out here. There's most likely a mistake in here somewhere.)

hippy
Posts: 10269
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: Dynamic code execution

Sat Jun 19, 2021 12:33 pm

triss64738 wrote:
Sat Jun 19, 2021 4:36 am
(I did NOT test this code, just wrote it out here. There's most likely a mistake in here somewhere.)
Thanks. There was a lot of help there; some things I would have struggled to figure out.

I am familiar with ARM assembly but no expert, and have never done any assembler within C. So, while I have a good idea of where I'm heading, I'm not so clear on what the individual steps are, what the incantations should be for each step.

I had got closer than I had expected but it was your help got me there.

I have never used "ldm" and "stm" and will read up on those. My more long-winded version seems to have worked. With the overhead insignificant compared to making the call into it from MicroPython, that's something for later -

Code: Select all

void Run(void) { __asm volatile(
  "push {r0-r7}\n"

  "ldr  r7, =reg\n"
  "ldr  r0, [r7, #0*4]\n"
  "ldr  r1, [r7, #1*4]\n"
  "ldr  r2, [r7, #2*4]\n"
  "ldr  r3, [r7, #3*4]\n"
  "ldr  r4, [r7, #4*4]\n"
  "ldr  r5, [r7, #5*4]\n"
  "ldr  r6, [r7, #6*4]\n"
  "ldr  r7, [r7, #7*4]\n"

  "mov  r1, r2\n"
  "add  r1, r3\n"

  "push {r7}\n"
  "ldr  r7, =reg\n"
  "str  r0, [r7, #0*4]\n"
  "str  r1, [r7, #1*4]\n"
  "str  r2, [r7, #2*4]\n"
  "str  r3, [r7, #3*4]\n"
  "str  r4, [r7, #4*4]\n"
  "str  r5, [r7, #5*4]\n"
  "str  r6, [r7, #6*4]\n"
  "pop  {r0}\n"
  "str  r0, [r7, #7*4]\n"

  "pop  {r0-r7}\n"

  :::   "r0","r1","r2","r3","r4","r5","r6","r7","memory" );
}

Code: Select all

import _thumb2
#      0   1   2   3   4  5  6  7
reg = (99, 11, 22, 33, 4, 5, 6, 7)
print("Before : ", reg)
reg = _thumb2.run(None, reg)
print("After  : ", reg)

Code: Select all

MicroPython v1.16 on 2021-06-19; Raspberry Pi Pico with RP2HACK
Type "help()" for more information.
>>> %Run -c $EDITOR_CONTENT

Before :  (99, 11, 22, 33, 4, 5, 6, 7)
After  :  (99, 55, 22, 33, 4, 5, 6, 7)
So, so far, looking like we're on our way.

Next step is getting my "r1 = r2 + r3" into a code array and getting the C Extension to execute that. As you said, that should be a case of creating a pointer to an array and calling it.

Once again; many thanks.

triss64738
Posts: 31
Joined: Wed Jun 16, 2021 5:13 pm
Location: masto/fedi: sys64738@hellsite.site
Contact: Website

Re: Dynamic code execution

Sat Jun 19, 2021 1:19 pm

Ah, nice!

FYI ldm and stm are "load/store multiple" instructions: they can read/write multiple registers from/to memory one after the other in one instruction. When you add the "ia"/"db" suffix (so it becomes eg. ldmia or stmdb), the address will be post-incremented or pre-decremented, respectively ("increase after"/"decrease before"), the "!" right after the address operand means "end address writeback to the address register". (Maybe the Cortex-M0+ only allows a few combinations, I'd have to check, but it'd disambiguate the exclamation mark.) At least in the full ARM ISA, push and pop are basically aliases of stmdb sp!, <regs..> and ldmia sp!, <regs..>. These instructions were mostly useful pre-ARMv6, as that was the fastest way to access memory. (Due to the increased complexity of the ARMv6 and Cortex-Asomething microarchitectures, these are now microcoded I think? But the end result is that they're quite slow, using only a single load/store unit, thus not fully saturating the memory bandwidth. VFP/NEON block copies can be quite a bit faster as those utilize all load/store units in the CPU iirc.)

kilograham
Raspberry Pi Engineer & Forum Moderator
Posts: 741
Joined: Fri Apr 12, 2019 11:00 am
Location: austin tx

Re: Dynamic code execution

Sat Jun 19, 2021 1:41 pm

LDMIA/STMIA are faster for reading multiple registers on Cortex-M0+ - it is 2 + N cycles vs 2 * N

As for your interface, not sure that inline assembler is really your friend here, i'd just use an assembly function in a .S to call into the generated code.

Also at a higher level, i'm not sure what you are trying to do, but MicroPython supports inline thumb assembly, but i guess you are getting the instruction bytes from somewhere else?

hippy
Posts: 10269
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: Dynamic code execution

Sat Jun 19, 2021 2:24 pm

It would seem to make sense to use "ldm" and "stm" but my main reluctance to do so comes down to rather useless warning messages when I do ...

Code: Select all

[ 16%] Building C object CMakeFiles/myasm.dir/myasm.c.obj
/tmp/ccgQGdx1.s: Assembler messages:
/tmp/ccgQGdx1.s:73: Warning: this instruction will write back the base register
/tmp/ccgQGdx1.s:150: Warning: this instruction will write back the base register
[ 17%] Linking CXX executable myasm.elf
[100%] Built target myasm
I really don't like seeing warning messages and haven't figured out how to turn them off. As an indicator that "you had best check that code to make sure it is doing what you want it to do and not what you don't" they seem worse than useless because '/tmp/ccgQGdx1.s' is a temporary file deleted after compilation, so there's no clue as to what the actual warnings relate to or where the pertinent code would be.

I will stick with what I have for now, may reconsider "ldm" and "stm" later.

Doing it in assembler is an option but, as what I have is working as desired, I'll hold that in reserve if things do become problematic. At least doing it in C means the MicroPython build just works, there's no need to figure out how to include assembler source with that. It should also allow the functionality to be easily delivered as a Native C Module library - I might move that up the list, do that sooner rather than later.

As to the overall project; it's a compiler for user provided source code when that user is using MicroPython. The executable code has to be generated by that compiler on-the-fly then executed. I have looking at using MicroPython's built in helpers for native thumb code generation on my list but I am okay without those, can generate the thumb instruction codes myself.

dthacher
Posts: 40
Joined: Sun Jun 06, 2021 12:07 am

Re: Dynamic code execution

Sat Jun 19, 2021 2:54 pm

Building a basic assembler should be possible in Python. Storing the logic inside of an array should allow it to execute. You can jump, call, etc. to the array. However you will be trusting the assembly which you do not write to properly manage context saves, stack management, argument passing, etc. I do not believe you will support printf/malloc/etc. currently. (There are a few approaches for this.)

If you want to build the interface into a proper function call instead of a random jump you could. If you want argument passing to the max of the register convention I guess I could see that. (Note more will require assembly logic to understand structures via pointers.) I would either build a function via C module or assembly function via C module. All caller management will be done by function call and basic callee management can/would be done here.

Code: Select all

// Option 1
typedef struct { uint32_t args[8]; } arg_t;
void Run(arg_t *args);

// Option 2
void Run(uint32_t arg0, uint32_t arg1, uint32_t arg2, uint32_t arg3, uint32_t arg4, uint32_t arg5, uint32_t arg6, uint32_t arg7);
This will allow the guts of the assembly logic to be wrapped within a function. Probably one of the arguments to the function will need to be the assembly array, which will then be called by the function using a jump and link, call, etc. instruction. This will ensure that you come back to this function. You can do this with a void function pointer to the assembly array's address.

Code: Select all

void Run(uint8_t *code, uint32_t arg0, uint32_t arg1, uint32_t arg2, uint32_t arg3, uint32_t arg4, uint32_t arg5, uint32_t arg6, uint32_t arg7) {
	// 'asm' is a keyword in GNU C, so the parameter is named 'code' here
	void (*func)(uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t);
	func = (void *) code;
	
	func(arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7);
}
Edit: Note this model here forces the assembly code to be a valid function, more or less, meaning it must return. You can support return types if you want also. The assembly for a simple add should not need to worry about stack or context logic; it just uses the input registers and writes to the output registers.

Note I have not tried this myself and may be incomplete. This is purely for educational/example purposes. This also could/would be a massive security hole. (I distrust Von Neumann and other things for this very reason.)

triss64738
Posts: 31
Joined: Wed Jun 16, 2021 5:13 pm
Location: masto/fedi: sys64738@hellsite.site
Contact: Website

Re: Dynamic code execution

Sat Jun 19, 2021 4:00 pm

hippy wrote:
Sat Jun 19, 2021 2:24 pm
It would seem to make sense to use "ldm" and "stm" but my main reluctance to do so comes down to rather useless warning messages when I do ...

Code: Select all

[ 16%] Building C object CMakeFiles/myasm.dir/myasm.c.obj
/tmp/ccgQGdx1.s: Assembler messages:
/tmp/ccgQGdx1.s:73: Warning: this instruction will write back the base register
/tmp/ccgQGdx1.s:150: Warning: this instruction will write back the base register
[ 17%] Linking CXX executable myasm.elf
[100%] Built target myasm
I really don't like seeing warning messages and haven't figured out how to turn them off. As an indicator that "you had best check that code to make sure it is doing what you want it to do and not what you don't" they seem worse than useless because '/tmp/ccgQGdx1.s' is a temporary file deleted after compilation, so there's no clue as to what the actual warnings relate to or where the pertinent code would be.

I will stick with what I have for now, may reconsider "ldm" and "stm" later.
It probably means that the assembler is automatically turning "ldm reg, {stuff}" into "ldmia reg!, {stuff}". It did that too when I wrote this code here (which originally only used ldm/stm). (This also gives you an example on how to declare a function completely in assembly, FYI.)

hippy
Posts: 10269
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: Dynamic code execution

Sat Jun 19, 2021 6:20 pm

triss64738 wrote:
Sat Jun 19, 2021 4:00 pm
It probably means that the assembler is automatically turning "ldm reg, {stuff}" into "ldmia reg!, {stuff}".
Yes, exactly that when I looked at the disassembler listing. Not something to worry about as said, and there's no real worry over what my generated code can or can't do or how it will do it. All we need for now is to get it running my provided instructions which will only manipulate registers and return.

As noted it works when my generated instructions are hard-coded into my 'Run()' routine. Moving to a function pointer, pointing to an internal routine or an array of hard-wired instructions, is proving more of a nightmare.

Adding a function pointer call unsurprisingly causes GCC to generate a jump via a register to where it points which corrupts the registers just loaded. That should be easy enough to compensate for; push r0-r7 before the call and pop r0-r7 as the first thing done in my own instructions, push lr, do my stuff, then pop pc to return. But that crashes MicroPython.

Not being a proficient C programmer I don't know if I am doing it wrong or what, maybe am not even setting my function pointer correctly. Looking at the disassembler output I'm not sure if I have an endian issue or it just looks that way ...

Code: Select all

void (*codePtr)(void); // Function pointer to the code array

uint16_t InternalCodeArray[] = {
  0xBCFF, // pop  {r0-r7}
  0xB500, // push {lr}
  0x1C11, // adds r1, r2, #0
  0x18C9, // adds r1, r1, r3
  0xBD00, // pop  {pc}
};

void Run(void) {
  __asm volatile(
    // ...snip... Set the ARM registers from reg[] C array
    "push {r0-r7}\n"
    :::   "r0","r1","r2","r3","r4","r5","r6","r7", "memory"
  );
  codePtr();
  __asm volatile(
    // ...snip... Set the reg[] C array from ARM registers
    :::   "r0","r1","r2","r3","r4","r5","r6","r7", "memory"
  );
}

// ...snip... when called by MicroPython ...
      codePtr = (void*)&InternalCodeArray[0];
      Run();
Comments added by myself ...

Code: Select all

20006f14 <InternalCodeArray>:
20006f14:       b500bcff 18c91c11                        ..........

1002a9a4 <Run>:
  GCC inserted entry code and push return address
1002a9a4:       b5f8            push    {r3, r4, r5, r6, r7, lr}
  My own push everything code
1002a9a6:       b4ff            push    {r0, r1, r2, r3, r4, r5, r6, r7}
  Set registers from reg[]
1002a9a8:       4f0d            ldr     r7, [pc, #52]   ; (1002a9e0 <Run+0x3c>)
1002a9aa:       6838            ldr     r0, [r7, #0]
1002a9ac:       6879            ldr     r1, [r7, #4]
1002a9ae:       68ba            ldr     r2, [r7, #8]
1002a9b0:       68fb            ldr     r3, [r7, #12]
1002a9b2:       693c            ldr     r4, [r7, #16]
1002a9b4:       697d            ldr     r5, [r7, #20]
1002a9b6:       69be            ldr     r6, [r7, #24]
1002a9b8:       69ff            ldr     r7, [r7, #28]
  push everything which will get popped as first thing in code array
1002a9ba:       b4ff            push    {r0, r1, r2, r3, r4, r5, r6, r7}
  GCC uses r3 for function pointer, but no pushes so TOS should be as above when entered
1002a9bc:       4b07            ldr     r3, [pc, #28]   ; (1002a9dc <Run+0x38>)
1002a9be:       681b            ldr     r3, [r3, #0]
1002a9c0:       4798            blx     r3
  Return from code array, set reg[] from registers
1002a9c2:       b480            push    {r7}
1002a9c4:       4f06            ldr     r7, [pc, #24]   ; (1002a9e0 <Run+0x3c>)
1002a9c6:       6038            str     r0, [r7, #0]
1002a9c8:       6079            str     r1, [r7, #4]
1002a9ca:       60ba            str     r2, [r7, #8]
1002a9cc:       60fb            str     r3, [r7, #12]
1002a9ce:       613c            str     r4, [r7, #16]
1002a9d0:       617d            str     r5, [r7, #20]
1002a9d2:       61be            str     r6, [r7, #24]
1002a9d4:       bc01            pop     {r0}
1002a9d6:       61f8            str     r0, [r7, #28]
  Undo my push everything
1002a9d8:       bcff            pop     {r0, r1, r2, r3, r4, r5, r6, r7}
  GCC's inserted undo entry preamble and return
1002a9da:       bdf8            pop     {r3, r4, r5, r6, r7, pc}
1002a9dc:       2003a1e4        .word   0x2003a1e4
1002a9e0:       2003a1c4        .word   0x2003a1c4

1002a9e4 <thumb2_run>:
  ... snip ...
  what I assume set codePtr=
1002aa1a:       4b19            ldr     r3, [pc, #100]  ; (1002aa80 <thumb2_run>
1002aa1c:       4a19            ldr     r2, [pc, #100]  ; (1002aa84 <thumb2_run>
1002aa1e:       601a            str     r2, [r3, #0]
  And then it calls Run()
1002aa20:       f7ff ffc0       bl      1002a9a4 <Run>

triss64738
Posts: 31
Joined: Wed Jun 16, 2021 5:13 pm
Location: masto/fedi: sys64738@hellsite.site
Contact: Website

Re: Dynamic code execution

Sat Jun 19, 2021 7:07 pm

Hmmmm... are you sure the address the MicroPython code sets it to is correct? Does the crash go away when you comment out the call to the function pointer? What if you replace the array by a pointer to a function you wrote earlier? So something like this:

Code: Select all

void TestFunc(void) {
  asm volatile(
    "pop {r0-r7}\n"
    "bx lr\n"
    :::"r0","r1","r2","r3","r4","r5","r6","r7", "memory"
  );
}
// ...
codePtr = TestFunc;
If only the above fixes it, it's probably the Cortex-M0+ that is enabled... (idk how the SDK sets this one up)

Also, what kilograham said about using plain assembly... at this point it does start feeling like this may be easier, as you don't have to guess about what the compiler may be doing.

hippy
Posts: 10269
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: Dynamic code execution

Sat Jun 19, 2021 7:49 pm

triss64738 wrote:
Sat Jun 19, 2021 7:07 pm
Hmmmm... are you sure the address the MicroPython code sets it to is correct?
I'm not using MicroPython to do anything but invoke the C code and pass the registers. The C code dictates what code to execute. My four stage plan -

1) Get the C code to accept and return registers, manipulate the registers as part of that - Success.

2) Move the manipulation of registers out of that code and do the actual manipulation via a function pointer - Failed, whether pointing to an internal C function or a C array.

3) Once executing instructions from an internal array works, make it work from an external array passed down from MicroPython - Pending.

4) Once that's working the C part is over; finish the MicroPython side of things - Pending.

triss64738 wrote:
Sat Jun 19, 2021 7:07 pm
Does the crash go away when you comment out the call to the function pointer?
Yes. And if the register manipulation is done where the call to the function pointer would be, that all works.
triss64738 wrote:
Sat Jun 19, 2021 7:07 pm
What if you replace the array by a pointer to a function you wrote earlier?
codePtr = TestFunc;
That doesn't work because GCC adds push and pop wrappers inside that routine, around what I have in that code, stops it working. I tried to compensate for that but had no success.

That's what had me focusing on an internal code array because GCC doesn't interfere with that as far as I can see.
triss64738 wrote:
Sat Jun 19, 2021 7:07 pm
Also, what kilograham said about using plain assembly... at this point it does start feeling like this may be easier, as you don't have to guess about what the compiler may be doing.
We can see exactly what the compiler is doing from the disassembly and I can understand that enough to say there should be no issues calling a code array using the function pointer. It seems to me an issue related to pointing the function pointer to where I want it, or the code array not being correct in some way.

I suppose the next step in debugging has to be to return what is in the code array and what the code pointer points to, to see if they are what I would expect them to be.

I don't see that moving to assembler at this stage is going to achieve anything, will likely just make things harder. First thing is I don't know how to integrate assembler into a MicroPython build, and I am betting bridging MicroPython to C to assembler brings its own complications.

As far as I can tell this should be entirely possible in C; it's more that I am not a C programmer and don't have the experience and knowledge to get it right or know where I am going wrong.

dthacher
Posts: 40
Joined: Sun Jun 06, 2021 12:07 am

Re: Dynamic code execution

Sat Jun 19, 2021 7:59 pm

hippy wrote:
Sat Jun 19, 2021 6:20 pm
As noted it works when my generated instructions are hard-coded into my 'Run()' routine. Moving to a function pointer, pointing to an internal routine or an array of hard-wired instructions, is proving more of a nightmare.

Adding a function pointer call unsurprisingly causes GCC to generate a jump via a register to where it points which corrupts the registers just loaded. That should be easy enough to compensate for; push r0-r7 before the call and pop r0-r7 as the first thing done in my own instructions, push lr, do my stuff, then pop pc to return. But that crashes MicroPython.

Not being a proficient C programmer I don't know if I am doing it wrong or what, maybe am not even setting my function pointer correctly. Looking at the disassembler output I'm not sure if I have an endian issue or it just looks that way ...
There are a few aspects of a function pointer which can be ignored in some cases. What you are more or less doing is telling the compiler how the function preamble works, then providing it the address of the function. The issue you have is that you are using a function without input arguments. I mentioned this before.

The registers have roles in assembly/C convention for functions. This is divided into two roles, callee and caller. Some context saving must be handled before the function call is made. The rest is required inside the function by the function called. However this can be ignored if it will never be disrupted.

Your function pointer fails to notify the C compiler about an argument list. Therefore the callee and caller do not touch them. You will need to rework your MicroPython logic to address this. Like I said, there are a few ways to solve this. I do not know your end goal. Self-modifying code?

One option is to use a struct which you assemble using C types. This will be passed as a void pointer to the C function and ultimately to the assembly logic. Your assembly logic will need to know how to pull this out. Meaning over SPI you could send two arrays over: one containing code and another containing arguments.
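A minimal sketch of the struct option, assuming a hypothetical `regs_t` block of eight 32-bit values and a hypothetical `regs_fn` pointer type (neither name comes from the code in this thread). Because the pointer type declares one parameter, the compiler knows to pass the struct's address in r0 under the ARM calling convention, and the called code can load and store through it. `test_func` is only a stand-in for generated code so the sketch can run on any host:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block shared between the C caller and the code it calls. */
typedef struct {
    uint32_t r[8];
} regs_t;

/* Declaring the parameter tells the compiler to load the pointer into the
   first argument register (r0 on ARM) before making the indirect call. */
typedef void (*regs_fn)(regs_t *regs);

/* Call the target with the register block as its only argument. */
static inline void run_code(regs_fn fn, regs_t *regs)
{
    assert(fn != NULL && regs != NULL);
    fn(regs);
}

/* Stand-in for generated code: changes r[1] and leaves the rest alone. */
static inline void test_func(regs_t *regs)
{
    regs->r[1] = 55;
}
```

Generated Thumb code used in place of `test_func` would have to follow the same convention: take the pointer from r0, preserve r4-r7 if it uses them, and return via lr.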

Another option is to make multiple function prototypes, or use a single function prototype which takes the maximum number of arguments allowed by the convention. Unused registers will get some dummy value.

Overall this will not be ideal. Normally this is not how you do things. You want to make this general purpose. However you are narrowing it down to a function, which you technically could avoid.

Note you have not even gotten to the fun parts yet.

hippy
Posts: 10269
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: Dynamic code execution

Sat Jun 19, 2021 8:12 pm

hippy wrote:
Sat Jun 19, 2021 7:49 pm
I suppose the next step in debugging has to be to return what is in the code array and what the code pointer points to, to see if they are what I would expect them to be.
That seems to deliver what I would expect, 'codePtr' contains a value which is the address of the 'InternalCodeArray', and the first 'InternalCodeArray' entry is the 0xBCFF I put there ...

Code: Select all

    codePtr = (void*)&InternalCodeArray[0];
    Run();
    reg[5] = (uint32_t)codePtr;
    reg[6] = (uint32_t)&InternalCodeArray[0];
    reg[7] = InternalCodeArray[0];
    mp_obj_t tuple[] = {
        mp_obj_new_int(reg[0]),
        ... snip ...
        mp_obj_new_int(reg[7]),
    };
    return mp_obj_new_tuple(8, tuple);

Code: Select all

#      0   1   2   3   4  5  6  7
reg = (99, 11, 22, 33, 4, 5, 6, 7)
print("Before : ", reg)
reg = _thumb2.run(None, reg)
print("After  : ", reg)
print("reg[5] codePtr               = {}".format(hex(reg[5])))
print("reg[6] &InternalCodeArray[0] = {}".format(hex(reg[6])))
print("reg[7] InternalCodeArray[0]  = {}".format(hex(reg[7])))

Code: Select all

Before :  (99, 11, 22, 33, 4, 5, 6, 7)
After  :  (99, 55, 22, 33, 4, 536899348, 536899348, 48383)
reg[5] codePtr               = 0x20006f14
reg[6] &InternalCodeArray[0] = 0x20006f14
reg[7] InternalCodeArray[0]  = 0xbcff
So I would expect 'codePtr();' to call the 'InternalCodeArray'. Either it doesn't or it does and there's something else going on. It's nicely aligned on a 32-bit boundary, so it's not alignment.
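For reference, 0xBCFF is the 16-bit Thumb encoding of pop {r0-r7}. A rough sketch of how the push/pop halfwords can be built rather than hand-typed, using hypothetical helper names (`thumb_push`, `thumb_pop`, `emit_skeleton`) that are not from the code above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit Thumb PUSH: 1011 010M rrrrrrrr, where M = 1 also pushes lr
   and each r bit selects one of r0-r7. */
static inline uint16_t thumb_push(uint8_t low_regs, int with_lr)
{
    return (uint16_t)(0xB400u | (with_lr ? 0x0100u : 0u) | low_regs);
}

/* 16-bit Thumb POP: 1011 110P rrrrrrrr, where P = 1 also pops pc. */
static inline uint16_t thumb_pop(uint8_t low_regs, int with_pc)
{
    return (uint16_t)(0xBC00u | (with_pc ? 0x0100u : 0u) | low_regs);
}

/* Hypothetical skeleton of the sequence described in this thread:
   restore r0-r7, save lr, (body would go here), return via pop {pc}.
   Returns the number of halfwords written. */
static inline size_t emit_skeleton(uint16_t *buf, size_t cap)
{
    assert(buf != NULL && cap >= 3);
    buf[0] = thumb_pop(0xFF, 0);   /* pop  {r0-r7} */
    buf[1] = thumb_push(0x00, 1);  /* push {lr}    */
    buf[2] = thumb_pop(0x00, 1);   /* pop  {pc}    */
    return 3;
}
```

This only checks that the halfwords in the array are the intended instructions; it says nothing about whether the call reaches them.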

kilograham
Raspberry Pi Engineer & Forum Moderator
Posts: 741
Joined: Fri Apr 12, 2019 11:00 am
Location: austin tx

Re: Dynamic code execution

Sat Jun 19, 2021 8:20 pm

Cortex-M0+ programmer axiom #1: "you will have forgotten to set the thumb bit"

if you are branching to an address indirectly, then the low bit of the target address must be set.
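A minimal sketch of what that means for a code array in RAM, assuming GCC and a hypothetical helper `thumb_entry` (the `InternalCodeArray` name is borrowed from the earlier post; its single entry here, 0x4770, is bx lr). With the address printed earlier in the thread, 0x20006f14, the value to call through would be 0x20006f15:

```c
#include <assert.h>
#include <stdint.h>

typedef void (*code_fn)(void);

/* Example code array: one Thumb instruction, bx lr (return to caller). */
static uint16_t InternalCodeArray[] __attribute__((aligned(4))) = {
    0x4770,
};

/* Return the address to branch to: the array's address with bit 0 set.
   The instructions themselves still live at the even address. */
static inline uintptr_t thumb_entry(const void *code)
{
    uintptr_t addr = (uintptr_t)code;
    assert((addr & 1u) == 0);   /* Thumb code must be halfword aligned */
    return addr | 1u;
}

/* Convert a code buffer into a callable function pointer (Thumb targets only). */
static inline code_fn thumb_fn(const void *code)
{
    return (code_fn)thumb_entry(code);
}
```

On the Pico this would be used as `codePtr = thumb_fn(InternalCodeArray); codePtr();`. When the pointer is taken from a real C function, GCC sets the bit itself, which is why the problem only shows up with data arrays cast to function pointers. The call only makes sense on a Thumb target; on a desktop host only the address arithmetic can be checked.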
