User avatar
PeterO
Posts: 4257
Joined: Sun Jul 22, 2012 4:14 pm

Dynamic sized arrays

Mon May 28, 2018 11:42 am

I'm writing some C code to parse a "wavefront" format file which contains data for a 3D object.

As part of this my code needs a two dimensional array to keep track of used verticies and normals, but the problem is that the size of the arrays is not known at compile time and can only be determined after the first pass through the file.

A quick "google" showed me that C99 introduced the ability to declare dynamically sized arrays so that's helpful, but there remains the issue of scope. I need the array to be global so that the various functions that handle the different lines in the file can all access the array.

My solution is to use a global void * to hold the address of the memory malloced for the array after the first pass, and then to assign this to a pointer to a dynamically sized array in each function that needs to access the array.

Below is my small "proof of concept" code. Does anyone have a better solution ?

Code: Select all

#include <stdio.h>
#include <stdlib.h>

void *data;    // Use this to point to the memory malloced for the array
int xs,ys;

static int testRead(int x,int y)
{
    int value;
    int (*array)[xs][ys];    // a pointer to a 2D array

    array = data;            // Use the malloced memory for the array

    value = (*array)[x][y];

    return(value);
}


static void testWrite(int x,int y,int value)
{
    int (*array)[xs][ys];    // a pointer to a 2D array

    array = data;            // Use the malloced memory for the array

    (*array)[x][y] =  value;
}


int main(__attribute__((unused)) int argc,__attribute__((unused)) char **argv)
{
    int v;

    xs = 10;    // Set required array size
    ys = 10;

    data = malloc(sizeof(int) * (unsigned) xs * (unsigned) ys);

    // Do some simple tests

    for(int x = 0; x < xs; x++)
	for(int y = 0; y < ys ; y++)
	{ 
	    v = x * y;
	    printf("WRITE (%d,%d)=%d\n",x,y,v);
	    testWrite(x,y,v);
	}

    for(int x = 0; x < xs; x++)
	for(int y = 0; y < ys;  y++)
	{ 
	    v = testRead(x,y);
	    printf("READ (%d,%d)=%d\n",x,y,v);
	}
    return(EXIT_SUCCESS);
}
I needs to be compiled with "-std=gnu99".

PeterO
Discoverer of the PI2 XENON DEATH FLASH!
Interests: C,Python,PIC,Electronics,Ham Radio (G0DZB),Aeromodelling,1960s British Computers.
"The primary requirement (as we've always seen in your examples) is that the code is readable. " Dougie Lawson

jahboater
Posts: 3051
Joined: Wed Feb 04, 2015 6:38 pm

Re: Dynamic sized arrays

Mon May 28, 2018 12:14 pm

PeterO wrote:
Mon May 28, 2018 11:42 am
It needs to be compiled with "-std=gnu99".
That is the default by the way (well the later -std=gnu11 is the default, and with gcc 8.1 the default is -std=gnu17).

Do you know what the maximum size it could possibly be is ?

I have some code that carefully allocated the minimum size for many years. It could grow at run time.
There was lots of fiddly code to check each access wasn't off the end, with calls to realloc when needed (and the odd bug when realloc passed a new pointer back). One day recently I got fed up with the complication and just malloc'd the maximum size in one go - gasp what a lot of wasted space - 5MB of it!

No, looking at the memory usage of the program it was actually using no more memory than it did before. The paging/virtual memory sorted it all out and new pages were only allocated when needed. The system was doing all the complicated work for me!

When you malloc more than 128KB it just mmap's /dev/zero. At first all the pages get mapped to a single zero'd page then, on demand, new pages are allocated.

Perhaps only one pass is needed?

User avatar
PeterO
Posts: 4257
Joined: Sun Jul 22, 2012 4:14 pm

Re: Dynamic sized arrays

Mon May 28, 2018 12:48 pm

jahboater wrote:
Mon May 28, 2018 12:14 pm
PeterO wrote:
Mon May 28, 2018 11:42 am
It needs to be compiled with "-std=gnu99".
That is the default by the way (well the later -std=gnu11 is the default, and with gcc 8.1 the default is -std=gnu17).
It certainly isn't the default on the version I'm using. If I didn't need it I wouldn't have mentioned it !
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4
Do you know what the maximum size it could possibly be is ?
No, because I have no idea of the complexity of the objects I may design in blender. Plus adding a modifier like a "sub-surface" can quickly multiply the complexity of the object's geometry.
Perhaps only one pass is needed?
No , there are several "counts" that need to be done during the first pass, number of verticies,texture verticies,faces,textures,objects....
I'm using linked lists for holding the actual data items read in pass two but I need the array to assign indexes to the used pairs of normals and vertices.

PeterO
Discoverer of the PI2 XENON DEATH FLASH!
Interests: C,Python,PIC,Electronics,Ham Radio (G0DZB),Aeromodelling,1960s British Computers.
"The primary requirement (as we've always seen in your examples) is that the code is readable. " Dougie Lawson

jahboater
Posts: 3051
Joined: Wed Feb 04, 2015 6:38 pm

Re: Dynamic sized arrays

Mon May 28, 2018 12:51 pm

PeterO wrote:
Mon May 28, 2018 12:48 pm
It certainly isn't the default on the version I'm using. If I didn't need it I wouldn't have mentioned it !
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4
Ah OK sorry. I was thinking of the compiler that comes with Raspbian.

Heater
Posts: 10013
Joined: Tue Jul 17, 2012 3:02 pm

Re: Dynamic sized arrays

Mon May 28, 2018 1:51 pm

You can malloc() all the memory available in the Pi and none of it will get used or reserved for your program until it is actually written to.

The result of this is that your program will use more and more RAM as your model grows until there is none left and it crashes out with a SEGFAULT.

Note that is not true of calloc(). calloc writes zeros to all the allocated RAM so that would instantly consume all the RAM you have and cause chaos.

User avatar
PeterO
Posts: 4257
Joined: Sun Jul 22, 2012 4:14 pm

Re: Dynamic sized arrays

Mon May 28, 2018 2:07 pm

Heater wrote:
Mon May 28, 2018 1:51 pm
You can malloc() all the memory available in the Pi and none of it will get used or reserved for your program until it is actually written to.
It's unlikely I'll ever run this code on a PI. I run blender on a x86_64 with 4Gb of RAM and plenty of swap space. The processing only needs to be done once and the results written to a file in the correct format for using with OpenGL ES.
PeterO
Discoverer of the PI2 XENON DEATH FLASH!
Interests: C,Python,PIC,Electronics,Ham Radio (G0DZB),Aeromodelling,1960s British Computers.
"The primary requirement (as we've always seen in your examples) is that the code is readable. " Dougie Lawson

Heater
Posts: 10013
Joined: Tue Jul 17, 2012 3:02 pm

Re: Dynamic sized arrays

Mon May 28, 2018 7:28 pm

Whatever I said for malloc() on the Pi is true for any Unix/Linux machine.

I lose track, what was the actual problem again?

jahboater
Posts: 3051
Joined: Wed Feb 04, 2015 6:38 pm

Re: Dynamic sized arrays

Mon May 28, 2018 9:57 pm

The OOM killer will get you ...

User avatar
PeterO
Posts: 4257
Joined: Sun Jul 22, 2012 4:14 pm

Re: Dynamic sized arrays

Mon May 28, 2018 10:36 pm

It's amazing how you guys can manage to completely miss the point of a thread :roll: Did any of you actuually look at the code and try to see what I was asking about, or did you just decide to spout off about memory management because you don't understand C ?

PeterO :evil:
Discoverer of the PI2 XENON DEATH FLASH!
Interests: C,Python,PIC,Electronics,Ham Radio (G0DZB),Aeromodelling,1960s British Computers.
"The primary requirement (as we've always seen in your examples) is that the code is readable. " Dougie Lawson

Heater
Posts: 10013
Joined: Tue Jul 17, 2012 3:02 pm

Re: Dynamic sized arrays

Tue May 29, 2018 3:08 am

PeterO,
Did any of you actuually look at the code and try to see what I was asking about, or did you just decide to spout off about memory management because you don't understand C ?
Ouch!

I think we understand C well enough.

Your problem description kicks off with talk of needing arrays of a size unknown at compile time, that can grow as data is read in. Like so:
...the problem is that the size of the arrays is not known at compile time and can only be determined after the first pass through the file.
That immediately gets one into the issues of dynamically allocating memory for data and memory management.

There is not much of a question in your post appart from:
Does anyone have a better solution ?
Well yes:

1) Do not use global variables. Pass a pointer to your data as a parameter to functions.

2) Don't forget to pass size parameters as well.

3) You might want to define some struct that holds a pointer to your data and its various dimensions. Pass a pointer to that around. Use typedef.

4) C99's variable length arrays are not dynamic. They may be different sizes at each runtime instantiation, depending on some variable, but they cannot grow as your data grows. So they don't help much.

5) Change language. All this is much easier in C++ http://www.cplusplus.com/articles/37Mf92yv/

6) Change language. C++ is a pig. All this is much easier in a language that transparently handles memory management. Suggest Javascript. Looks like C, nearly as fast. Especially if you have a lot of memory management to do.

jahboater
Posts: 3051
Joined: Wed Feb 04, 2015 6:38 pm

Re: Dynamic sized arrays

Tue May 29, 2018 6:47 am

Heater wrote:
Tue May 29, 2018 3:08 am
1) Do not use global variables. Pass a pointer to your data as a parameter to functions.
Then you can use the stack frame in main and dimension the array with a variable as C99.
By default there is an 8MB limit on the stack size in Linux though.

Obviously globals should be avoided for various reasons, but passing pointers/size around all the way down the tree just adds to the complexity which is also undesirable. A trade-of an experienced programmer will know how to make.

Yes, std::vector or one of the other new STL container classes in C++ could be handy if you want to grow the data as you read it in.

PeterO

To avoid the need for two passes:-

The data is in a file, so could you use stat or lseek to get an indication of the maximum size needed?

Or, if the data size is significantly different from the file size, and your data is being written by a program, then perhaps you could write the actual size at the start of the file so that pass one could be trivial.

User avatar
PeterO
Posts: 4257
Joined: Sun Jul 22, 2012 4:14 pm

Re: Dynamic sized arrays

Tue May 29, 2018 7:06 am

Heater wrote:
Tue May 29, 2018 3:08 am
1) Do not use global variables. Pass a pointer to your data as a parameter to functions.
I would normally agree, but here using the global leads to a much simpler solution. The way my parser works is very simple (and it's code I've reused many times) but it gets very messy if you have to pass things around via pointers. I much prefer the simplicity.
2) Don't forget to pass size parameters as well.
Again the sizes are in globals for simplicity.


3) You might want to define some struct that holds a pointer to your data and its various dimensions. Pass a pointer to that around. Use typedef.
Yes that's a good idea.

4) C99's variable length arrays are not dynamic. They may be different sizes at each runtime instantiation, depending on some variable, but they cannot grow as your data grows. So they don't help much.
I really don't like thinks that change size (and possibly move when they do). It's all to easy to leave dangling pointers behind. Maybe "dymanic" was not the right description... "Unknown" might have been better, but C99's VLA certainly do help.

5) Change language. All this is much easier in C++ http://www.cplusplus.com/articles/37Mf92yv/
C++ is NEVER the right answer for me lol: :lol: I like simplicity and C++ never provides that.

6) Change language. C++ is a pig. All this is much easier in a language that transparently handles memory management. Suggest Javascript. Looks like C, nearly as fast. Especially if you have a lot of memory management to do.
I might move to a Python solution, but at the moment I'm actually using the parsed/processed data to directly drive existing C openGL ES code in the same programme to check that the reformatted data is correct, and at the time of writing it isn't :( Somewhere I have an "out by one error" that seems to be screwing up the indices of the vertices that form each triangle in the object.

PeterO
Discoverer of the PI2 XENON DEATH FLASH!
Interests: C,Python,PIC,Electronics,Ham Radio (G0DZB),Aeromodelling,1960s British Computers.
"The primary requirement (as we've always seen in your examples) is that the code is readable. " Dougie Lawson

GuruMeditation
Posts: 33
Joined: Thu May 31, 2018 3:33 pm
Location: UK

Re: Dynamic sized arrays

Thu May 31, 2018 3:56 pm

When interacting with data files the size of the data is never known at compilation time, and it is extremely bad practice to try to highjack the system's whole memory upfront (it may also be difficult to do on a system with virtual memory).

A wavefront file is a text file with lists of different pieces of data ( vertices, normals, face elements, etc.), one per line, with a character indicating the type at the beginning of each line.

In my view the first step is for you to work out how you want to represent these data in memory. That's probably a set of structures.

For example;

typedef struct {
float x;
float y;
float x;
} vertex_normal_t;

To parse a file you indeed need to do a first pass on it, to count the number of each type of data it contains.

Then you do one or several malloc to allocate the memory you need based on the way you represent the data in memory. For example, an array of vertex_normal_t.

Finally, you do a second pass on the file and parse the contents into your in-memory structures.

I would avoid a global variable and a 'void' type.

Heater
Posts: 10013
Joined: Tue Jul 17, 2012 3:02 pm

Re: Dynamic sized arrays

Thu May 31, 2018 10:10 pm

GuruMeditation,
To parse a file you indeed need to do a first pass on it, to count the number of each type of data it contains.

Then you do one or several malloc to allocate the memory you need based on the way you represent the data in memory. For example, an array of vertex_normal_t.

Finally, you do a second pass on the file and parse the contents into your in-memory structures.
There is something I don't understand about that plan...

If you are going to dynamically allocate memory for the data in the file, using malloc or new or whatever, then why the need to do two passes? Why not malloc/new things as you go in the one pass?

What am I missing here?

GuruMeditation
Posts: 33
Joined: Thu May 31, 2018 3:33 pm
Location: UK

Re: Dynamic sized arrays

Thu May 31, 2018 10:37 pm

To allocate an array you need to know its size, i.e. the number of elements.

The first pass is to work out the number of elements in order to be able to allocate the array as needed.

You could allocate element by element as you read through the file in one pass to create, e.g, a linked list, but the result is likely to be much less practical to use and larger in memory.

Also, regarding variable length arrays in C99, it only means that they can be declared using a variable instead of a constant, but the exact size must still be known at the point of declaration. It's essentially a convenient notation to hide the call to 'malloc'.

Heater
Posts: 10013
Joined: Tue Jul 17, 2012 3:02 pm

Re: Dynamic sized arrays

Thu May 31, 2018 11:16 pm

GuruMeditation,

OK, thanks, that explanation makes sense.

I do understand the limitations of C99 variable length arrays. I think you will find there is no malloc behind C99 arrays. It's all done on the stack.

On the other hand, if I were doing this in Javascript or Python or whatever I would just keep adding things to an array as I read them in and the array would magically get bigger to accommodate them.

The same can be done in C++ or C with a bit more work.

There is no need for linked lists and the memory waste that would cause.

My guess is that at the end of the day it's a case of which is which is easier and which is faster, a single pass or two passes.

GuruMeditation
Posts: 33
Joined: Thu May 31, 2018 3:33 pm
Location: UK

Re: Dynamic sized arrays

Thu May 31, 2018 11:42 pm

Heater wrote:
Thu May 31, 2018 11:16 pm
I think you will find there is no malloc behind C99 arrays. It's all done on the stack.
If so then the feature is really only useful for very small local arrays.
Heater wrote:
Thu May 31, 2018 11:16 pm
On the other hand, if I were doing this in Javascript or Python or whatever I would just keep adding things to an array as I read them in and the array would magically get bigger to accommodate them.
These are high level interpreted languages and the magic is actually a lot of work under the bonnet.

For example in Python, although lists are not actually implemented as linked lists, they are implemented as multiple arrays of pointers to the actual elements of the list.

In C++ a vector is an array that is expanded with essentially 'realloc'. Of course the snag is that when expansion fails then a whole new array must be allocated and all the content of the old array copied into it, and the way to avoid that is... To know the size beforehand.

Basically in C you are left to implement those algorithms yourself if you so wish. That a bit of work, indeed ;)

jahboater
Posts: 3051
Joined: Wed Feb 04, 2015 6:38 pm

Re: Dynamic sized arrays

Fri Jun 01, 2018 6:34 am

GuruMeditation wrote:
Thu May 31, 2018 11:42 pm
Heater wrote:
Thu May 31, 2018 11:16 pm
I think you will find there is no malloc behind C99 arrays. It's all done on the stack.
If so then the feature is really only useful for very small local arrays.
Think of it more like alloca().
In C++ a vector is an array that is expanded with essentially 'realloc'. Of course the snag is that when expansion fails then a whole new array must be allocated and all the content of the old array copied into it, and the way to avoid that is... To know the size beforehand.

Basically in C you are left to implement those algorithms yourself if you so wish. That a bit of work, indeed ;)
Realloc() is pretty simple to use. And if you note when the ptr changes, you will see that most of the time its very fast too, it has simply adjusted the size.

GuruMeditation
Posts: 33
Joined: Thu May 31, 2018 3:33 pm
Location: UK

Re: Dynamic sized arrays

Fri Jun 01, 2018 7:20 am

jahboater wrote:
Fri Jun 01, 2018 6:34 am
Realloc() is pretty simple to use. And if you note when the ptr changes, you will see that most of the time its very fast too, it has simply adjusted the size.
Using realloc is doing what I described C++ does with vectors. IMO, it is not very suited if the array can be quite large.

Heater
Posts: 10013
Joined: Tue Jul 17, 2012 3:02 pm

Re: Dynamic sized arrays

Fri Jun 01, 2018 7:51 am

GuruMeditation,
In C++ a vector is an array that is expanded with essentially 'realloc'. Of course the snag is that when expansion fails then a whole new array must be allocated and all the content of the old array copied into it, and the way to avoid that is... To know the size beforehand.
The question then is, why is that reallocation and copying a snag and why would one want to avoid it? I can think of a couple of reasons:

1) Reliability.

You don't want your program to bail out with an out of memory error after it has started up and is doing its work. Which could be many hours, days, weeks, later. You want to know it has all the resources it needs to work in from the get go.

Of course for many applications this is not a prime requirement and if you need it simply using malloc() is not good enough to actually be sure you have the memory.

2) Performance.

Clearly all that reallocation and copying takes time. Which sounds terrible.

In practice I don't think it is though. Because:

a) That C++ vector does not do a reallocation every time you add a little thing to it. The vector's underlying capacity will be bigger than your data content. The upshot being that far fewer reallocations happen than you might think.

b) In order to know the size beforehand, in PeterO's case, one has to parse the input file first, then parse it again to extract the data. Which of course takes a long time. I'd wager that dynamically allocating the memory on the fly as one parses the file only once will be faster.

Of course all this can be done in C with malloc and realloc, it's not much work to write it. Less work than parsing the file twice.

GuruMeditation
Posts: 33
Joined: Thu May 31, 2018 3:33 pm
Location: UK

Re: Dynamic sized arrays

Fri Jun 01, 2018 8:03 am

Heater wrote:
Fri Jun 01, 2018 7:51 am
a) That C++ vector does not do a reallocation every time you add a little thing to it. The vector's underlying capacity will be bigger than your data content.
Yes, obviously. My point was about reallocation of large arrays.

Maybe I'm too old school :D
Last edited by GuruMeditation on Fri Jun 01, 2018 8:50 am, edited 1 time in total.

Heater
Posts: 10013
Joined: Tue Jul 17, 2012 3:02 pm

Re: Dynamic sized arrays

Fri Jun 01, 2018 8:38 am

GuruMeditation,
...what about reallocation of large arrays.
Good question. I'm not sure. How large is large? 100MB, 500MB...

Thing is in PeterO's case having a lot of data means doing a lot of parsing. The time lost in reallocating and copying the array in one pass is likely to be a lot less than the time spent parsing the file ahead of time.

But what happens if it's really big, big enough to take a big proportion of your available RAM? 700MB on the Pi say. Surely then the vector cannot reallocate and copy when it finds you need yet another byte. There is not enough room left to do so. I have no idea what a C++ vector does then.

This calls for an experiment...

GuruMeditation
Posts: 33
Joined: Thu May 31, 2018 3:33 pm
Location: UK

Re: Dynamic sized arrays

Fri Jun 01, 2018 8:55 am

Heater wrote:
Fri Jun 01, 2018 8:38 am
But what happens if it's really big, big enough to take a big proportion of your available RAM? 700MB on the Pi say. Surely then the vector cannot reallocate and copy when it finds you need yet another byte. There is not enough room left to do so. I have no idea what a C++ vector does then.
bad_alloc exception, I'd guess.

User avatar
PeterO
Posts: 4257
Joined: Sun Jul 22, 2012 4:14 pm

Re: Dynamic sized arrays

Fri Jun 01, 2018 9:09 am

Heater wrote:
Fri Jun 01, 2018 7:51 am
Of course all this can be done in C with malloc and realloc, it's not much work to write it. Less work than parsing the file twice.
That depends on how your parsing code works.

Code: Select all

parseFile("/opt/RPI/tmp/Sunday7.obj",passOneTokens,NULL);
parseFile("/opt/RPI/tmp/Sunday7.obj",passTwoTokens,NULL);
The tokens arrays look like this..

Code: Select all


static Token passOneTokens[] = {

    {"v",0,P1v},
    {"vn",0,P1vn},
    {"vt",0,P1vt},
    {"f",0,P1f},
    {"usemtl",0,P1usemtl},
    {"o",0,P1object},
    {NULL,0,NULL}
};
Here are the pass one and pass two handlers for a vertex

Code: Select all


static int P1v(__attribute__((unused)) int next)
{
    currentObject->positions += 1;
    masterObject.positions += 1;
    return 1;
}

static int P2v(int next)
{
    POSITION *pp;

    pp = &masterObject.Positions[masterObject.positions++];
    sscanf(getField(next+1),"%f",&pp->x);
    sscanf(getField(next+2),"%f",&pp->y);
    sscanf(getField(next+3),"%f",&pp->z);
    return 1;
}
PeterO
Discoverer of the PI2 XENON DEATH FLASH!
Interests: C,Python,PIC,Electronics,Ham Radio (G0DZB),Aeromodelling,1960s British Computers.
"The primary requirement (as we've always seen in your examples) is that the code is readable. " Dougie Lawson

ejolson
Posts: 2041
Joined: Tue Mar 18, 2014 11:47 am

Re: Dynamic sized arrays

Sat Jun 02, 2018 6:18 am

PeterO wrote:
Mon May 28, 2018 11:42 am
Does anyone have a better solution?
I think you have one too many levels on indirection. Try something like

int (*array)[ys];

and then reference the array as

array[x][y]

using the same syntax as any other two-dimension array.

Return to “C/C++”