I can save memory by using 2-byte structs, but I don't know if I am giving up performance and speed by doing so, because the ARM CPU is 32-bit.
Thanks,
Chris
Code: Select all
uint32_t *mvect_buffer = (uint32_t*)malloc(mv_size);
struct vector_package { // package will be 32 bits wide to optimize memory transfers
uint8_t xcoord1=0;
uint8_t ycoord1=0;
uint8_t xcoord2=0;
uint8_t ycoord2=0;
} ups;
....
// Write to buffer in 4-byte chunks; otherwise there is a performance penalty
memcpy(mvect_buffer+offset,&ups,sizeof(vector_package));
// Load from buffer in 4-byte chunks
memcpy(&ups,mvect_buffer+offset,sizeof(vector_package));
while (true) { // No performance penalty here because special registers are used?
ups.xcoord1+=other_value;
ups.xcoord2+=other_value;
ups.ycoord1+=other_value;
ups.ycoord2+=other_value;
}
cmisip wrote: ↑Tue Jul 10, 2018 12:17 am
I will be using memcpy to load values into a buffer and read from it. I therefore need to create a buffer of uint32_t* to be assured that I am given an address that is 4-byte aligned. Instead of saving a 4-byte word and using bit operations, use a struct with 4 members that are one byte in size. In the loop iteration, the handling of the struct members should not be a performance penalty. It is when I copy to and from memory that I must keep the transfer 4 bytes wide. Is this correct?
Thanks,
Chris
Code: Select all
uint32_t *mvect_buffer = (uint32_t*)malloc(mv_size);
struct vector_package { //package will be 32 bit wide to optimize memory transfers
uint8_t xcoord1=0;
uint8_t ycoord1=0;
uint8_t xcoord2=0;
uint8_t ycoord2=0;
} ups;
....
//Write to buffer in 4 byte chunks otherwise, performance penalty
memcpy(mvect_buffer+offset,&ups,sizeof(vector_package));
//Load from buffer in 4 byte chunks
memcpy(&ups,mvect_buffer+offset,sizeof(vector_package));
while (true) { //No performance penalty here due to the fact that special registers are used?
ups.xcoord1+=other_value;
ups.xcoord2+=other_value;
ups.ycoord1+=other_value;
ups.ycoord2+=other_value;
}
It looks to me like you have created a clumsy data structure for the sake of optimisation that will likely make your program much more complicated while offering little if any performance improvement. One of the founding fathers of computer science, Donald Knuth, wrote: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
That's a very good idea. memcpy will always do the right thing, aligned or not, and is the preferred way of "type punning".
Code: Select all
@ try.c:22: uint32_t *mvect_buffer = (uint32_t*)malloc(4000);
bl malloc
@ try.c:39: memcpy(&ups,mvect_buffer+offset,sizeof(struct vector_package));
ldr r3, [r0, #12]
Code: Select all
while (true) { // No performance penalty here because special registers are used?
ups.xcoord1+=other_value;
ups.xcoord2+=other_value;
ups.ycoord1+=other_value;
ups.ycoord2+=other_value;
}
Sorry, no it's not. There is no function call.
Code: Select all
@ try.c:39: memcpy(&ups,mvect_buffer+offset,sizeof(struct vector_package));
ldr r3, [r0, #12]

jahboater wrote: ↑Tue Jul 10, 2018 6:34 am
Sorry, no it's not. There is no function call.
Code: Select all
@ try.c:39: memcpy(&ups,mvect_buffer+offset,sizeof(struct vector_package));
ldr r3, [r0, #12]

So presumably the compiler realises the copy is small, as it's a literal size, and does something different. Does it specifically look for memcpy/similar calls to optimise out, then?
Writing things like:
*(uint32_t*)ptr = num
will likely produce identical code to:
memcpy( ptr, &num, 4 )
but if it can't for some reason, it will always do the right thing.
memcpy is the preferred way of "type punning" like this.
I'm not sure what you are supposed to do if "num" is a literal though.
Yes indeed, exactly that.
I think it will replace quite a few different library routines where there is a hardware instruction that can do the job.