edbird
Posts: 50
Joined: Wed Jun 25, 2014 1:11 pm

Low Level (FAST) GPIO Access

Thu Jul 24, 2014 4:15 pm

Hi,

I am working on an embedded project and am trying to streamline my main loop code. From doing some initial tests, I know that I can increase the performance of GPIO read/write operations by 5x if I use the "native C method" rather than wiringPi.

My first question is: "Is it safe to use both together?" This probably isn't good practice, but the example on the eLinux wiki http://elinux.org/RPi_Low-level_peripherals#C (see section on "native C") doesn't explain in detail how to set pull-up and pull-down resistors, or more importantly how to configure interrupts. Hence, to make my life easier, I would like to use wiringPi for the setup of the GPIO.

My second question is: "Is it possible to configure interrupts using the 'native C' method shown in this wiki article? If so, how is this done?"
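
For the pull-up/pull-down part of the question, here is a rough sketch of the register sequence as described in the BCM2835 peripherals datasheet: write the wanted mode to GPPUD, wait, clock it into the pin via GPPUDCLK, wait, then clear both. It assumes `gpio_reg` already points at the mmap()ed GPIO block; the names `gpio_set_pull` and `short_wait` are made up for this sketch and are not part of wiringPi or the wiki example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets into the GPIO register block (byte offset / 4). */
#define GPPUD      37   /* 0x94: pull-up/down control     */
#define GPPUDCLK0  38   /* 0x98: pull clock for GPIO 0-31 */

#define PUD_OFF  0
#define PUD_DOWN 1
#define PUD_UP   2

/* Crude stand-in for the 150-cycle wait the datasheet asks for. */
static void short_wait(void)
{
   volatile int i;
   for (i = 0; i < 150; i++) ;
}

/* gpio_reg: pointer to the mapped GPIO registers (assumed set up elsewhere).
   Returns 0 on success, -1 if gpio or pud is out of range. */
int gpio_set_pull(volatile uint32_t *gpio_reg, unsigned gpio, unsigned pud)
{
   assert(gpio_reg != NULL);
   if (gpio > 53 || pud > PUD_UP) return -1;

   gpio_reg[GPPUD] = pud;                                  /* select mode  */
   short_wait();
   gpio_reg[GPPUDCLK0 + (gpio >> 5)] = 1u << (gpio & 31);  /* clock it in  */
   short_wait();
   gpio_reg[GPPUD] = 0;                                    /* remove mode  */
   gpio_reg[GPPUDCLK0 + (gpio >> 5)] = 0;                  /* remove clock */
   return 0;
}
```

A call such as `gpio_set_pull(gpio_reg, 4, PUD_UP)` would then enable the pull-up on GPIO 4.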

joan
Posts: 14887
Joined: Thu Jul 05, 2012 5:09 pm
Location: UK

Re: Low Level (FAST) GPIO Access

Thu Jul 24, 2014 4:43 pm

I can't think of any reason there should be conflict.

Sample code to set up and respond to an interrupt.

Code:

/* 2014-07-06
   wfi.c

   gcc -o wfi wfi.c

   ./wfi [gpio]
*/

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>
#include <poll.h>

#define GPIO 4

int main(int argc, char *argv[])
{
   char str[256];
   struct timeval tv;
   struct pollfd pfd;
   int fd, gpio;
   char buf[8];

   /*
      Prior calls assumed.
      sudo sh -c "echo 4      >/sys/class/gpio/export"
      sudo sh -c "echo in     >/sys/class/gpio/gpio4/direction"
      sudo sh -c "echo rising >/sys/class/gpio/gpio4/edge"
   */

   if (argc > 1) gpio = atoi(argv[1]);
   else          gpio = GPIO;

   snprintf(str, sizeof str, "/sys/class/gpio/gpio%d/value", gpio);

   if ((fd = open(str, O_RDONLY)) < 0)
   {
      fprintf(stderr, "Failed, gpio %d not exported.\n", gpio);
      exit(1);
   }

   pfd.fd = fd;

   pfd.events = POLLPRI;

   lseek(fd, 0, SEEK_SET);    /* consume any prior interrupt */
   read(fd, buf, sizeof buf);

   poll(&pfd, 1, -1);         /* wait for interrupt */

   lseek(fd, 0, SEEK_SET);    /* consume interrupt */
   read(fd, buf, sizeof buf);

   exit(0);
}

AndrewS
Posts: 3625
Joined: Sun Apr 22, 2012 4:50 pm
Location: Cambridge, UK

Re: Low Level (FAST) GPIO Access

Thu Jul 24, 2014 4:53 pm

edbird wrote:My second question is "is it possible to configure interrupts in the same manner as shown in this wiki article, using the "native C" method? If so how is this done?
I guess you'd need to read through the peripherals datasheet and then prod the registers as appropriate.

joan
Posts: 14887
Joined: Thu Jul 05, 2012 5:09 pm
Location: UK

Re: Low Level (FAST) GPIO Access

Thu Jul 24, 2014 5:03 pm

AndrewS wrote:
edbird wrote:My second question is "is it possible to configure interrupts in the same manner as shown in this wiki article, using the "native C" method? If so how is this done?
I guess you'd need to read through the peripherals datasheet and then prod the registers as appropriate.
I think the OP will need to be in a bare-metal environment to be able to handle the interrupt at the register level.

edbird
Posts: 50
Joined: Wed Jun 25, 2014 1:11 pm

Re: Low Level (FAST) GPIO Access

Thu Jul 24, 2014 6:31 pm

Why does the manual (http://www.raspberrypi.org/documentatio ... herals.pdf) suggest that the GPIO base address is 0x7E200000 (page 90), when the example code uses 0x20200000?

joan
Posts: 14887
Joined: Thu Jul 05, 2012 5:09 pm
Location: UK

Re: Low Level (FAST) GPIO Access

Thu Jul 24, 2014 6:39 pm

edbird wrote:Why does the manual (http://www.raspberrypi.org/documentatio ... herals.pdf) suggest that the GPIO base address is 0x7E200000 (page 90), when the example code uses 0x20200000?
Look at the diagram on page 5 and the accompanying description on page 6.
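
As a rough illustration of what those pages describe (a reading of the datasheet, not code from this thread): on the BCM2835 the peripherals sit at bus addresses starting at 0x7E000000 and at ARM physical addresses starting at 0x20000000, so 0x7E200000 and 0x20200000 are the same GPIO block seen from two sides. The helper name `bus_to_phys` is made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define BUS_PERI_BASE  0x7E000000u  /* peripherals as addressed in the datasheet */
#define PHYS_PERI_BASE 0x20000000u  /* same peripherals as seen by the ARM       */
#define PERI_SIZE      0x01000000u  /* 16 MB peripheral window                   */

/* Translate a datasheet (bus) peripheral address to the ARM physical
   address that would be passed to mmap() on /dev/mem. */
uint32_t bus_to_phys(uint32_t bus_addr)
{
   assert(bus_addr >= BUS_PERI_BASE && bus_addr - BUS_PERI_BASE < PERI_SIZE);
   return bus_addr - BUS_PERI_BASE + PHYS_PERI_BASE;
}
```

With this, `bus_to_phys(0x7E200000u)` gives the 0x20200000 used in the example code.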

edbird
Posts: 50
Joined: Wed Jun 25, 2014 1:11 pm

Re: Low Level (FAST) GPIO Access

Thu Jul 24, 2014 6:49 pm

This thing is complicated. I'm still not sure if I understand which one is the correct address to use and why. (Except the example uses 0x20200000, and it works, therefore I must use that address...)

AndrewS
Posts: 3625
Joined: Sun Apr 22, 2012 4:50 pm
Location: Cambridge, UK

Re: Low Level (FAST) GPIO Access

Thu Jul 24, 2014 6:52 pm

edbird wrote:This thing is complicated.
Welcome to low-level programming :!: That's why people use things like WiringPi ;)

joan
Posts: 14887
Joined: Thu Jul 05, 2012 5:09 pm
Location: UK

Re: Low Level (FAST) GPIO Access

Thu Jul 24, 2014 6:56 pm

edbird wrote:This thing is complicated. I'm still not sure if I understand which one is the correct address to use and why. (Except the example uses 0x20200000, and it works, therefore I must use that address...)
Then there are physical addresses, bus addresses, Linux userland virtual addresses etc. etc. For what it's worth I use these.

Code:

#define SYST_BASE  0x20003000
#define DMA_BASE   0x20007000
#define CLK_BASE   0x20101000
#define GPIO_BASE  0x20200000
#define UART0_BASE 0x20201000
#define PCM_BASE   0x20203000
#define SPI0_BASE  0x20204000
#define I2C0_BASE  0x20205000
#define PWM_BASE   0x2020C000
#define UART1_BASE 0x20215000
#define I2C1_BASE  0x20804000
#define I2C2_BASE  0x20805000
#define DMA15_BASE 0x20E05000

#define DMA_LEN   0x1000 /* allow access to all channels */
#define CLK_LEN   0xA8
#define GPIO_LEN  0xB4
#define SYST_LEN  0x1C
#define PCM_LEN   0x24
#define PWM_LEN   0x28
#define I2C_LEN   0x1C
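
A minimal sketch of how a base/length pair like this is typically used, assuming a Linux userland program run as root: open /dev/mem and mmap() the block, then treat the result as an array of 32-bit registers. The function name `map_peripheral` is hypothetical; only the GPIO constants are repeated from the list above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define GPIO_BASE 0x20200000
#define GPIO_LEN  0xB4

/* Map len bytes of physical address space starting at base.
   Returns a pointer to the registers, or NULL on failure
   (for example when not running as root). */
volatile uint32_t *map_peripheral(off_t base, size_t len)
{
   int fd;
   void *p;

   assert(len > 0);

   fd = open("/dev/mem", O_RDWR | O_SYNC);
   if (fd < 0) return NULL;

   p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
   close(fd);   /* the mapping stays valid after the descriptor is closed */

   return (p == MAP_FAILED) ? NULL : (volatile uint32_t *)p;
}
```

Usage would be along the lines of `volatile uint32_t *gpio_reg = map_peripheral(GPIO_BASE, GPIO_LEN);` followed by a NULL check before touching any register.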

edbird
Posts: 50
Joined: Wed Jun 25, 2014 1:11 pm

Re: Low Level (FAST) GPIO Access

Thu Jul 24, 2014 7:06 pm

Huh, thanks for this, I guess all I really need to worry about is the GPIO addresses, and how to use them. More manual reading to come I think.

joan
Posts: 14887
Joined: Thu Jul 05, 2012 5:09 pm
Location: UK

Re: Low Level (FAST) GPIO Access

Thu Jul 24, 2014 7:17 pm

edbird wrote:Huh, thanks for this, I guess all I really need to worry about is the GPIO addresses, and how to use them. More manual reading to come I think.
More sample code http://abyz.co.uk/rpi/pigpio/code/minimal_gpio.zip

AndrewS
Posts: 3625
Joined: Sun Apr 22, 2012 4:50 pm
Location: Cambridge, UK

Re: Low Level (FAST) GPIO Access

Thu Jul 24, 2014 7:21 pm

joan wrote:For what it's worth I use these.

Code:

#define I2C2_BASE  0x20805000
I was really puzzled to see this as I thought the BCM2835 only had 2 I2C busses. Then I found this in the datasheet: "Note that the BSC2 master is used dedicated with the HDMI interface and should not be accessed by user programs."
And I see the signals are referred to as HDMI_SCL and HDMI_SDA on the schematics.

joan
Posts: 14887
Joined: Thu Jul 05, 2012 5:09 pm
Location: UK

Re: Low Level (FAST) GPIO Access

Thu Jul 24, 2014 7:28 pm

AndrewS wrote:
joan wrote:For what it's worth I use these.

Code:

#define I2C2_BASE  0x20805000
I was really puzzled to see this as I thought the BCM2835 only had 2 I2C busses. Then I found this in the datasheet: "Note that the BSC2 master is used dedicated with the HDMI interface and should not be accessed by user programs."
And I see the signals are referred to as HDMI_SCL and HDMI_SDA on the schematics.
I have never tried using that bus, it's just convenient to put the addresses in one place.

gordon@drogon.net
Posts: 2020
Joined: Tue Feb 07, 2012 2:14 pm
Location: Devon, UK

Re: Low Level (FAST) GPIO Access

Fri Jul 25, 2014 6:32 pm

edbird wrote:Hi,

I am working on an embedded project and am trying to streamline my main loop code. From doing some initial tests, I know that I can increase the performance of GPIO read/write operations by 5x if I use the "native C method" rather than wiringPi.

5x faster? I'm somewhat surprised to hear this. Not saying wiringPi is the fastest - I know where there are places it could be improved, but I am surprised at a 5x speed-up.

Do you have example code that you want to publish?

-Gordon
--
Gordons projects: https://projects.drogon.net/

edbird
Posts: 50
Joined: Wed Jun 25, 2014 1:11 pm

Re: Low Level (FAST) GPIO Access

Wed Aug 06, 2014 11:07 am

gordon@drogon.net wrote:
edbird wrote:Hi,

I am working on an embedded project and am trying to streamline my main loop code. From doing some initial tests, I know that I can increase the performance of GPIO read/write operations by 5x if I use the "native C method" rather than wiringPi.

5x faster? I'm somewhat surprised to hear this. Not saying wiringPi is the fastest - I know where there are places it could be improved, but I am surprised at a 5x speed-up.

Do you have example code that you want to publish?

-Gordon

Firstly results. I got my initial result from some tests someone else has done, see this link:
http://codeandlife.com/2012/07/03/bench ... pio-speed/

But I have also tested square-wave output using wiringPi myself, and can confirm essentially the same result with the -O2 flag enabled. (Just less than 5 MHz wave, if my memory serves me correctly, it was some weeks ago now.)

Do you have plans to improve it in the future? My advice would be that unless you can at least double if not multiply by 4 the performance then there probably is little point putting the work into doing it... The reason I say this is because if someone *really* needed something as fast as I am trying to achieve then they won't have a choice to use wiringPi - it will not be an option.

Again, the argument is that if you are using a Raspberry Pi and wiringPi is too slow then you need to choose something else which isn't a Raspberry Pi. Many people keep telling me this. (And this is true, but I don't know what else I would use, considering I need OpenGLES 2.0 as well as about 16 GPIO.)

Having said what I said above, I did find that only being able to read 1 pin at a time is a problem. There is a digitalWriteByte() function, but no digitalReadByte(). If you wanted to add that functionality that would be *extremely* useful, and I assume it would make reading GPIO 0 through GPIO 7 eight times faster if they can all be done at once by reading one 8-bit location? (Parallelism?)

AndrewS
Posts: 3625
Joined: Sun Apr 22, 2012 4:50 pm
Location: Cambridge, UK

Re: Low Level (FAST) GPIO Access

Wed Aug 06, 2014 11:24 am

edbird wrote:Firstly results. I got my initial result from some tests someone else has done, see this link:
http://codeandlife.com/2012/07/03/bench ... pio-speed/

Code:

while(1) {
  GPIO_SET = 1<<4;
  GPIO_CLR = 1<<4;
}
is obviously an extremely tight loop (not even any function call overheads!). I suspect that if you added any "real code" around that, you'd find the maximum frequency would rapidly drop off. Ahhh, the joys of benchmarking ;-)

I don't have any experience myself, but I've read that for ultimate GPIO speed (especially over multiple pins) you probably want to be looking at an FPGA-based solution. You could probably link the FPGA to the Pi (maybe over SPI) and still use the Pi 'just' for your OpenGLES 2.0 display?

gordon@drogon.net
Posts: 2020
Joined: Tue Feb 07, 2012 2:14 pm
Location: Devon, UK

Re: Low Level (FAST) GPIO Access

Wed Aug 06, 2014 12:42 pm

edbird wrote:
gordon@drogon.net wrote:
edbird wrote:Hi,

I am working on an embedded project and am trying to streamline my main loop code. From doing some initial tests, I know that I can increase the performance of GPIO read/write operations by 5x if I use the "native C method" rather than wiringPi.

5x faster? I'm somewhat surprised to hear this. Not saying wiringPi is the fastest - I know where there are places it could be improved, but I am surprised at a 5x speed-up.

Do you have example code that you want to publish?

-Gordon

Firstly results. I got my initial result from some tests someone else has done, see this link:
http://codeandlife.com/2012/07/03/bench ... pio-speed/

But I have also tested square-wave output using wiringPi myself, and can confirm essentially the same result with the -O2 flag enabled. (Just less than 5 MHz wave, if my memory serves me correctly, it was some weeks ago now.)
So you're saying that you can generate a 25MHz signal with your code? That's fine, but for what purpose? It's easier to program a clock generator and plumb it to a pin (and that then takes 0% CPU)
Do you have plans to improve it in the future? My advice would be that unless you can at least double if not multiply by 4 the performance then there probably is little point putting the work into doing it... The reason I say this is because if someone *really* needed something as fast as I am trying to achieve then they won't have a choice to use wiringPi - it will not be an option.
I have no plans to improve it. I don't think it's really needed and there are better (and much more stable) ways to generate high frequency outputs - e.g. using the pins configured as a hardware clock rather than wiggling them in software.

But it's hard to know what your actual application is though, so who knows.
Again, the argument is that if you are using a Raspberry Pi and wiringPi is too slow then you need to choose something else which isn't a Raspberry Pi. Many people keep telling me this. (And this is true, but I don't know what else I would use, considering I need OpenGLES 2.0 as well as about 16 GPIO.)
Maybe you need to re-think what you need? Do it differently? You won't bit-bang a stable output clock at all on the Pi. It was never designed for that. There are clock generators internal to the Pi and a means (via wiringPi and others) to program them and hook them to the GPIO pins if you need it.

Having said what I said above, I did find that only being able to read 1 pin at a time is a problem. There is a digitalWriteByte() function, but no digitalReadByte(). If you wanted to add that functionality that would be *extremely* useful, and I assume it would make reading GPIO 0 through GPIO 7 eight times faster if they can all be done at once by reading one 8-bit location? (Parallelism?)
No-one asked me for readByte() - until relatively recently (ie. the past 3-4 weeks - did you email me too?). And do note that the bits are not in consecutive order, so it's a matter of reading the 32-bit input register, then doing some bit twiddling/shuffling to assemble it into 8 consecutive bits - a bit faster than 8 x digitalReads though.
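
The "read 32 bits, then shuffle" step could look roughly like the sketch below. It is not wiringPi code: `gather_byte` is a made-up name, and the pin list in the usage note is only an example of non-consecutive GPIO numbers, so check it against your own board and numbering scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* level: value read once from the 32-bit GPIO level register (GPIO 0-31).
   pins:  eight GPIO numbers; pins[0] supplies bit 0 of the result,
          pins[7] supplies bit 7. */
uint8_t gather_byte(uint32_t level, const uint8_t pins[8])
{
   uint8_t out = 0;
   int i;

   assert(pins != NULL);
   for (i = 0; i < 8; i++) {
      assert(pins[i] < 32);
      if (level & (1u << pins[i])) out |= (uint8_t)(1u << i);
   }
   return out;
}
```

For example, with `const uint8_t pins[8] = {17, 18, 27, 22, 23, 24, 25, 4};` a single register read is turned into one byte, at the cost of eight shift-and-test steps instead of eight separate register reads.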

This is something I've been through/discussed, etc. with others several times over the past 2.5 years - the Pi is great, but it's not perfect. If you want to generate nanosecond wide pulses, then the Pi simply isn't the system to do it with. If you want to generate a stable 100 MHz clock, then it's possible with the Pi, but not in software - there is hardware to do that for you.

Maybe if we knew more about your application then we can suggest other ways to do what you need?

-Gordon

edbird
Posts: 50
Joined: Wed Jun 25, 2014 1:11 pm

Re: Low Level (FAST) GPIO Access

Wed Aug 06, 2014 4:01 pm

I keep running into the same problems. I need to be able to do this in software rather than attaching a clock source in hardware. I'm reading 2 lots of 8 bit wide data after an interrupt is triggered, so that's why I need fast GPIO... I'm not requiring a fast, constant clock source, as you suggested above.

I have had an idea today about how I can do this more efficiently. In the external hardware if I store 256 sets of 2x8 bits of data, (in shift-registers say) and then have 1 interrupt to read it all rather than 256 individual interrupts, then that's less interrupts to service, hence less wasted CPU time, or so I hope.

But that still requires reading 512 bytes of data, byte at a time, and hence I'm back to "get fast GPIO".

So anyway, having looked at the "native C" library, I see that some memory is mapped from a /dev/mem device. I don't completely understand what this does, other than create a copy of some memory, somewhere else in memory, which seems kind of pointless.

Surely there exist some memory locations on the Raspberry Pi which connect directly to the GPIO (even if some of them are control registers for the GPIO device itself?) Can I not just declare a pointer to all the relevant memory addresses and set those memory locations to values to write data to the gpio and read the value of the pointer to read data back in? (Assuming I set the correct values in control registers first to set what is an input / output etc?) Also I understand that there is some sort of issue with "hardware memory addresses" and "linux memory addresses" and possibly even a 3rd type of memory address. The hardware one being the actual address, and the linux one being the address linux uses instead of the actual address. I have no clue why this is done however...

My guess is that this would be the absolute fastest way, and that performance would then be limited by the CPU, hence under ideal situations, you would get a maximum bit-bash rate of 1/2 a giga baud. (Half of 1000 MHz overclocked R-Pi clock speed, assuming that 1 instruction was set the gpio high and the next was to set it low again and that these instructions executed in 1 cycle, which clearly they will not.)

joan
Posts: 14887
Joined: Thu Jul 05, 2012 5:09 pm
Location: UK

Re: Low Level (FAST) GPIO Access

Wed Aug 06, 2014 4:19 pm

Er, yes. I linked to code which does that. It won't advance you any further though.

AndrewS
Posts: 3625
Joined: Sun Apr 22, 2012 4:50 pm
Location: Cambridge, UK

Re: Low Level (FAST) GPIO Access

Wed Aug 06, 2014 4:20 pm

Maybe Linux is "getting in the way" and you need to try coding in bare-metal instead? :shock:

Or maybe you need to consider using alternative hardware, as already suggested? :? See also http://www.raspberrypi.org/forums/viewt ... 91&t=83830

gordon@drogon.net
Posts: 2020
Joined: Tue Feb 07, 2012 2:14 pm
Location: Devon, UK

Re: Low Level (FAST) GPIO Access

Wed Aug 06, 2014 4:28 pm

edbird wrote:I keep running into the same problems. I need to be able to do this in software rather than attaching a clock source in hardware. I'm reading 2 lots of 8 bit wide data after an interrupt is triggered, so that's why I need fast GPIO... I'm not requiring a fast, constant clock source, as you suggested above.
If you're taking an interrupt into user-land (e.g. using wiringPi's ISR code), then the rate you get these interrupts is probably not going to be fast enough - I've benchmarked them at about 66K/sec max. That's 15 µs per interrupt...

That may seem shockingly slow but that interrupt goes from the hardware through Linux which wakes up your program and lets it go... If you write a kernel module, then you get them much faster... But you need to write a kernel module... (Or go "bare bones")
I have had an idea today about how I can do this more efficiently. In the external hardware if I store 256 sets of 2x8 bits of data, (in shift-registers say) and then have 1 interrupt to read it all rather than 256 individual interrupts, then that's less interrupts to service, hence less wasted CPU time, or so I hope.

But that still requires reading 512 bytes of data, byte at a time, and hence I'm back to "get fast GPIO".

So anyway, having looked at the "native C" library, I see that some memory is mapped from a /dev/mem device. I don't completely understand what this does, other than create a copy of some memory, somewhere else in memory, which seems kind of pointless.
It doesn't quite copy memory - it maps the memory-mapped hardware registers and makes them directly accessible from user-land. This is how wiringPi works. The code in wiringPi then has direct access to the memory-mapped registers. Where wiringPi is "slow" is that it handles an indirection to map 3 different types of pin numbering to the bit-positions in the hardware registers. If you use wiringPiSetupGpio() then this is the fastest as there is no indirection, but there are still some table-lookups to work out which of the 2 32-bit register banks to use and what bit-position in that register corresponds to the output bit. This is the price (in terms of overall speed) to pay for flexibility and ease of use. If you know the bits you're reading
Surely there exist some memory locations on the Raspberry Pi which connect directly to the GPIO (even if some of them are control registers for the GPIO device itself?) Can I not just declare a pointer to all the relevant memory addresses and set those memory locations to values to write data to the gpio and read the value of the pointer to read data back in? (Assuming I set the correct values in control registers first to set what is an input / output etc?) Also I understand that there is some sort of issue with "hardware memory addresses" and "linux memory addresses" and possibly even a 3rd type of memory address. The hardware one being the actual address, and the linux one being the address linux uses instead of the actual address. I have no clue why this is done however...
Yes, these exist and that's exactly what the mmap() calls are doing. You then declare a pointer to those memory mapped regions and it goes directly to the hardware without passing 'go' or collecting £200 ...
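
To make the "declare a pointer and go straight to the hardware" idea concrete, here is a rough sketch. It assumes `reg` is the pointer returned by an mmap() of the GPIO block and uses the set, clear and level register offsets from the BCM2835 datasheet; the function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets into the GPIO register block (byte offset / 4). */
#define GPSET0 7    /* 0x1C: write 1 bits to drive GPIO 0-31 high */
#define GPCLR0 10   /* 0x28: write 1 bits to drive GPIO 0-31 low  */
#define GPLEV0 13   /* 0x34: read current level of GPIO 0-31      */

/* Drive one output pin high or low with a single register write. */
void gpio_write(volatile uint32_t *reg, unsigned gpio, int level)
{
   assert(reg != NULL && gpio < 32);
   if (level) reg[GPSET0] = 1u << gpio;
   else       reg[GPCLR0] = 1u << gpio;
}

/* Read all of GPIO 0-31 in one go; the caller masks out the bits it wants. */
uint32_t gpio_read_bank0(volatile uint32_t *reg)
{
   assert(reg != NULL);
   return reg[GPLEV0];
}
```

Because the set and clear registers only act on bits written as 1, no read-modify-write is needed to change a single pin.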
My guess is that this would be the absolute fastest way, and that performance would then be limited by the CPU, hence under ideal situations, you would get a maximum bit-bash rate of 1/2 a giga baud. (Half of 1000 MHz overclocked R-Pi clock speed, assuming that 1 instruction was set the gpio high and the next was to set it low again and that these instructions executed in 1 cycle, which clearly they will not.)
The fastest you will probably easily get via GPIO might be via the SPI interface. You can clock that at 32Mb/sec. That still falls short of what you need by the sounds of it though, however if it's 8-bit data, then maybe you can buffer the incoming data in SRAM, then pull it on the Pi, 8-bits at a time. (with some sort of clocked address generator, tri-state buffers, etc. might be fewer chips than shift-registers)

I also suggest you look at the pianalyzer project - they were doing high speed data sampling, but I have a funny feeling the maximum rate they could get to get reliable samples was not much more than 1M samples/sec. (over a number of bits)

-Gordon

edbird
Posts: 50
Joined: Wed Jun 25, 2014 1:11 pm

Re: Low Level (FAST) GPIO Access

Wed Aug 06, 2014 6:17 pm

32Mb/s is 4MB/s which allows a read-rate of 2 M samples / sec. (10 bits per sample.)

This is wayyyy more than the 44100 samples / sec I was aiming for.
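
The arithmetic above, written out as a small sketch. It assumes each 10-bit sample is carried in 2 bytes and ignores any SPI protocol or driver overhead, so the real figure would be lower; `samples_per_sec` is just an illustrative name.

```c
#include <assert.h>

/* bits_per_sec:     raw SPI clock rate in bits per second.
   bytes_per_sample: bytes transferred per sample (2 is assumed here for a
                     10-bit sample padded out to 16 bits). */
unsigned long samples_per_sec(unsigned long bits_per_sec,
                              unsigned bytes_per_sample)
{
   assert(bytes_per_sample > 0);
   return bits_per_sec / 8 / bytes_per_sample;
}
```

`samples_per_sec(32000000UL, 2)` comes to 2,000,000, which is roughly 45 times the 44,100 samples/sec target.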

Why is the SPI interface so much faster than the general GPIO then?

Also going back to the point about memory mapping... Why bother doing that? What does it gain you over just accessing the "correct" locations initially without a mapping?

PeterO
Posts: 5829
Joined: Sun Jul 22, 2012 4:14 pm

Re: Low Level (FAST) GPIO Access

Wed Aug 06, 2014 6:34 pm

edbird wrote: Also going back to the point about memory mapping... Why bother doing that? What does it gain you over just accessing the "correct" locations initially without a mapping?
Why don't you try writing some code that " just access[es] the "correct" locations" and see what happens.......

PeterO
Discoverer of the PI2 XENON DEATH FLASH!
Interests: C,Python,PIC,Electronics,Ham Radio (G0DZB),1960s British Computers.
"The primary requirement (as we've always seen in your examples) is that the code is readable. " Dougie Lawson

edbird
Posts: 50
Joined: Wed Jun 25, 2014 1:11 pm

Re: Low Level (FAST) GPIO Access

Wed Aug 06, 2014 6:43 pm

PeterO wrote:
edbird wrote: Also going back to the point about memory mapping... Why bother doing that? What does it gain you over just accessing the "correct" locations initially without a mapping?
Why don't you try writing some code that " just access[es] the "correct" locations" and see what happens.......

PeterO
I'm guessing that it wouldn't work or would break something, then?

Why don't you explain why it won't work rather than doing what you are currently doing.

PeterO
Posts: 5829
Joined: Sun Jul 22, 2012 4:14 pm

Re: Low Level (FAST) GPIO Access

Wed Aug 06, 2014 6:54 pm

Because Gordon has already told you that mmap is the way to access these locations, yet you just ignored him and kept on going on about how it seemed pointless.
