Call for beta testers: FIQ_FSM USB driver rewrite


by jdb » Mon Feb 24, 2014 6:50 pm
The latest rpi-update kernels now include fiq_fsm, my USB driver rewrite. This is a comprehensive rewrite that pushes more of the time-critical work into the FIQ in order to achieve reliable communication with problem USB devices.

Last post update: 28/04/2014

1. Installation
The rpi-update kernel has fiq_fsm included.
Code: Select all
sudo rpi-update

fiq_fsm is enabled by default on this kernel.

2. Overview of changes
Introducing FIQ_FSM: The flying spaghetti monster FIQ

The dwc_otg driver has been rewritten to include a much more fully-featured FIQ. This FIQ now handles select types of transaction by itself without any involvement from the dwc_otg host controller driver (HCD).

Two modes of operation are provided:

"NOP" FIQ which emulates the base FIQ implemented by gsh some time ago. This is an extremely fast interrupt handler designed solely to hold off SOF interrupts until USB transactions need queueing by the driver. All other driver functionality is unchanged when using this FIQ - there have been some minor tweaks that result in a small reduction in interrupt rate.

"FSM" FIQ
- Performs the entirety of all split transactions without driver intervention via a state machine design.
- A bitmask module parameter selects which features are enabled at boot-time.
- Implements its own stateless microframe "pipelining", allowing optimal use of full-speed frame bandwidth.
- Performs an entire URB's worth (32 or 64 transactions in total) of high-speed isochronous transactions to an endpoint (webcam or DVB dongle).

Advantages of pushing more work into the FIQ:
- When handling a host channel interrupt, the FIQ takes approximately 0.1x the CPU time that the HCD takes to service the interrupt and perform the next transaction. This results in an increase in CPU cycles available for everything else: the reduction typically increases available cycle count by 20% under worst-case conditions.
- For SOF interrupts, the CPU time is approx 0.03x the time the driver takes. A 490ns interrupt handler is important when the interrupt generation rate is 8000 per second.
- The memory footprint of the FSM FIQ is tiny compared to the HCD. This dramatically reduces both the effect and likelihood of L1 cache evictions when servicing a FIQ interrupt which should tangentially increase responsiveness.
- The FIQ is unaffected by system interrupt latency. The FIQ is only ever disabled in minimal critical sections where the HCD is reading or writing FIQ state information.
- There's a nice side-effect of performing transactions in lock-step in the FIQ: under heavy load, there's an interrupt-aggregation effect. This means that the total time spent in the HCD interrupt handlers levels off as workload increases, rather than increasing until the Pi grinds to a halt.
- The OTG hardware has a number of bugs in it that cause scheduling problems for periodic transactions. By precisely tracking interrupt status and host channel state, these scheduling errors can be worked around or masked.
- High-speed isochronous transactions on dwc_otg are especially vulnerable to interrupt latency: by giving control of a whole batch (usually 32 or 64) of transactions to the FIQ, they can be precisely scheduled and also reap the benefit of 0.1x the CPU time.
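As a back-of-envelope illustration of the SOF figures above (treating the quoted 490ns handler time and 0.03x ratio as the only inputs; the arithmetic is mine):

```python
# Rough cost of SOF handling at 8000 interrupts per second.
# Inputs are the figures quoted above: a ~490 ns FIQ handler that takes
# ~0.03x the time of the HCD path (so the HCD path is ~490/0.03 ns).
sof_per_sec = 8000
fiq_ns = 490
hcd_ns = fiq_ns / 0.03

fiq_cpu_pct = sof_per_sec * fiq_ns / 1e9 * 100   # % of one CPU-second
hcd_cpu_pct = sof_per_sec * hcd_ns / 1e9 * 100

print(round(fiq_cpu_pct, 2), round(hcd_cpu_pct, 1))  # 0.39 13.1
```

i.e. SOF servicing alone drops from roughly 13% of the CPU to well under 1%.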

3. Known issues / bug list
- There is a bug affecting interrupt processing with fiq_enable = 0. Use fiq_enable=1.
- If the root port is disconnected on boot, or if the root port is disconnected while the FIQ is running, USB becomes unresponsive. This is because the core changes mode from host to device which changes interrupt register meanings.
Github issues have been logged for-
If you have additional data (e.g. a device that exhibits the same behaviour but has a different vendor/product ID) then please add to those Github issues.


4. Module options in detail
  • dwc_otg.fiq_enable: Enable use of the ARM FIQ.
  • dwc_otg.fiq_fsm_enable: If 1, then the larger FSM handler is installed. If 0, the NOP FIQ is installed.
  • dwc_otg.fiq_fsm_mask: Bitmask of transaction types to perform in the FIQ. Has no effect for the NOP FIQ.
    Bit 0: Accelerate non-periodic split transactions
    Bit 1: Accelerate periodic split transactions
    Bit 2: Accelerate high-speed isochronous transactions
    The default is 0x7, i.e. all options enabled.
  • dwc_otg.nak_holdoff: default 8. Specifies the hold-off time, in microframes, before retrying a NAKed split transaction to a full-speed bulk endpoint. Useful for throttling interrupt frequency. Set to 0 to disable. This can be used with either the NOP FIQ or the FSM FIQ.
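For example, to enable the FSM FIQ but accelerate only split transactions (bits 0 and 1, i.e. mask 0x3, as several posters in this thread use), the relevant additions to /boot/cmdline.txt are:

```
dwc_otg.fiq_enable=1 dwc_otg.fiq_fsm_enable=1 dwc_otg.fiq_fsm_mask=0x3
```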

5. Adjustable NAK holdoff (bitcoin miners take note)
The NAK holdoff for bulk split transactions is now adjustable.

The NAK holdoff can be used to dramatically reduce the CPU interrupt frequency when polling full-speed bulk endpoints. The default value of 8 should be used in most cases. However, if all of the following are true:
    - The only full-speed devices attached with bulk endpoints are serial UART adapters or similar with documented tx/rx FIFO sizes
    - The data source accumulation rate is "slow" compared to USB1.1 bandwidth (slow means <0.1Mbit/s)
    - Latency of returned data is not an issue (latency in ms = nak_holdoff/8)
then it will be of benefit to increase this figure. Most serial port adapters use quite large FIFOs which can be left for quite some time to accumulate data before polling.

As an example for a device with 128-byte FIFOs and a slow baud rate, the maximum values listed below can be used without risking data loss:
Code: Select all
Baud      nak_holdoff
2400      2048
4800      1024
9600      512
19200     256
38400     128
57600     64
115200    32

Note: the default value of 8 may cause overflows if the input baud rate is >400,000 baud. Set this to 4 or even 2 if the source data rate exceeds 1Mbps.
Control endpoints are throttled to a fixed interval of 8 microframes.
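The table above is consistent with a simple rule of thumb: poll before the FIFO is half full, assuming 8N1 framing (10 bits per byte), and round down to a power of two. This is my reconstruction of how the listed values fall out, not the driver's actual logic:

```python
def max_nak_holdoff(fifo_bytes, baud, bits_per_byte=10):
    """Largest power-of-two NAK holdoff (in microframes) that still
    polls before the FIFO is half full. 8 microframes = 1 ms."""
    bytes_per_sec = baud / bits_per_byte                 # 8N1 framing
    fill_ms = (fifo_bytes / 2) / bytes_per_sec * 1000.0  # half-FIFO fill time
    microframes = fill_ms * 8                            # 8 microframes per ms
    # round down to a power of two, as the table does
    p = 1
    while p * 2 <= microframes:
        p *= 2
    return p

for baud in (2400, 4800, 9600, 19200, 38400, 57600, 115200):
    print(baud, max_nak_holdoff(128, baud))
```

Running this for a 128-byte FIFO reproduces the table exactly (2400 → 2048 down to 115200 → 32).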

6. Limitations
The FIQ is still dependent on the HCD to queue periodic transactions in a timely fashion. While each individual stage of a split transaction or high-speed isochronous transaction can be performed perfectly, endpoints can still get a longer service interval if there is a long interrupt hold-off time. With typical strenuous usage of subsystems known to cause interrupt latency (heavy write activity to filesystems/SD card, heavy ethernet use) it is possible for periodic transactions to be queued too late in a full-speed frame to be performed. The FIQ will automatically time-out any transaction that could not be started in the correct frame. In most cases, this results in the transaction being re-queued for the subsequent frame (in the case of interrupt transactions) but for Isochronous transports this will cause data loss.

There is an upper bound on the amount of USB1.1 bus bandwidth that can be used per TT. This restriction is borne of the limitations of the hardware, which conspire to reduce the throughput of all types of split transaction. In effect, we can only make use of:
    - Approx 45% of a downstream TT's non-periodic bandwidth
    - Maximum 3 periodic transactions per frame per TT, inclusive of Isochronous
    - Maximum of 752 periodic bytes per frame for an Isochronous IN or OUT endpoint
    - Using large-bandwidth Isochronous transport will reduce the number of other types of transactions that can be completed in a frame.
In the vast majority of cases, with a suitable multiple-TT hub these issues will not be noticeable. Contention will only be an issue with single-TT hubs.
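To put the 752-byte figure in perspective, here's a quick check (the audio stream parameters are my own example, not from this thread) that a 24-bit/96kHz stereo stream fits the per-frame isochronous budget:

```python
# Does a 24-bit / 96 kHz stereo audio stream fit in the 752-byte-per-frame
# isochronous budget quoted above?
sample_rate = 96_000        # Hz
bytes_per_sample = 3        # 24-bit
channels = 2

bytes_per_sec = sample_rate * bytes_per_sample * channels
bytes_per_frame = bytes_per_sec / 1000   # one full-speed frame = 1 ms
print(bytes_per_frame)                   # 576.0 -- within the 752-byte limit
```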

The driver rewrite has resulted in a much more aggressive reservation of host channels for transactions performed by the FIQ. Each host channel can theoretically be recycled for another transaction after each transfer-complete interrupt (this is what the HCD does, at the cost of significant CPU time), but for a FIQ-enabled transfer the host channel is reserved for the duration of the transfer. This imposes a greater constraint on the maximum number of active endpoints that can be communicated with simultaneously: typically, bulk endpoints start to slow down in throughput as contention for host channels occurs. In extreme cases, isochronous endpoints will lose out on host channel contention and thus miss frames.

The microframe scheduler is currently unaware of the increased reservation period necessary for host channels used by the FIQ. It also does not accurately track the frame bandwidth required when considering a full-speed transaction and the associated guard interval.
Raspberry Pi Engineer & Forum Moderator
by jbeale » Fri Feb 28, 2014 10:43 pm
For what it's worth, the new FIQ_FSM code seems to be working here normally so far, but no real testing; just booted up and checked network connections on a R-Pi model A with 4-port USB hub, keyboard, mouse, and two wifi devices working at the same time. To check things I started up X and Midori which I have not done in a while (normally I use the R-Pi headless, just interacting via ssh). For whatever reason, performance seems faster than I remember.

Code: Select all
pi@rpi10 ~ $ lsusb
Bus 001 Device 002: ID 0424:2504 Standard Microsystems Corp. USB 2.0 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 0bda:8176 Realtek Semiconductor Corp. RTL8188CUS 802.11n WLAN Adapter
Bus 001 Device 004: ID 046d:c05a Logitech, Inc. Optical Mouse M90
Bus 001 Device 005: ID 413c:2003 Dell Computer Corp. Keyboard
Bus 001 Device 006: ID 0bda:8176 Realtek Semiconductor Corp. RTL8188CUS 802.11n WLAN Adapter

pi@rpi10 ~ $ uname -a
Linux rpi10 3.10.30+ #644 PREEMPT Fri Feb 21 18:44:33 GMT 2014 armv6l GNU/Linux

pi@rpi10 ~ $ cat /boot/cmdline.txt
dwc_otg.fiq_enable=1 dwc_otg.fiq_fsm_enable=1 dwc_otg.fiq_fsm_mask=0x3 dwc_otg.lpm_enable=0 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p6 rootfstype=ext4 elevator=deadline rootwait
by DougieLawson » Sat Mar 01, 2014 12:49 am
One thing to note is the BRANCH=next rpi-update is 3.10.30+ whereas the mainline rpi-update is 3.10.32+.
Microprocessor, Raspberry Pi & Arduino Hacker
Mainframe database troubleshooter
MQTT Evangelist
RPi owner since '12.
Twitter: @DougieLawson

2B, B+, A+, 5Bs, zero, 3B

Please post ALL technical questions on the forum. Do not send private messages.
by Richard-TX » Sat Mar 01, 2014 4:41 am
I am testing. Up 22:40:16 playing music (mpc/mpd) So far so good.
Richard
Doing Unix since 1985.
The 9-25-2013 image of Wheezy can be found at:
http://downloads.raspberrypi.org/raspbian/images/raspbian-2013-09-27/2013-09-25-wheezy-raspbian.zip
by milhouse » Sat Mar 01, 2014 6:13 am
Some observations on FIQ_FSM (mostly stream of consciousness), running basic USB and network transfer tests, monitoring with bcmstat.sh (xgcd10), with and without FIQ_FSM enabled, also without the entire FIQ_FSM patch (ie. pre-FIQ_FSM). Testing with OpenELEC (kernel 3.13.5) with XBMC disabled (add a very long sleep to autostart.sh).

Overclock Settings: ARM 1000MHz, Core 500MHz, RAM 600MHz, force_turbo=1.

1) USB write[1] (USB3 Flash storage, ext2), USB read[2] and NFS (udp,rsize=32768,wsize=32768) network read[3] performance seems to be the same (or very slightly reduced) when FIQ_FSM is enabled. Tests were repeated 5 times each, min-max values shown where there is a range of values (peak values for IRQ/s).

Code: Select all
Test            FIQ_NOP        FIQ_FSM       Pre-FIQ_FSM
USB Write    12.0-12.6MB/s  11.8-12.3MB/s   11.8-12.5MB/s
               9,500 IRQ/s    2,250 IRQ/s     2,650 IRQ/s

USB Read     30.6-30.8MB/s  29.8-30.4MB/s   27.6-27.9MB/s
              11,000 IRQ/s    4,200 IRQ/s     5,000 IRQ/s

NFS Read          11.3MB/s       11.3MB/s        11.3MB/s
              16,835 IRQ/s    9,590 IRQ/s    10,816 IRQ/s

Idle           8,120 IRQ/s      750 IRQ/s       750 IRQ/s*
             
* 1,000 IRQ/s following network transfer

Given the results above, I'm not seeing any tangible benefit from FIQ_FSM - if anything performance may be slightly reduced (though statistically it's probably not significant).

That's not to say FIQ_FSM isn't a better solution; maybe it's too early to tell. It's just that USB or network transfers don't seem to show any obvious benefit. Obviously the network performance is constrained by the 100Mbit/s PHY, but CPU is similarly maxed out with/without FIQ_FSM, so again I'm not sure where or what the benefit is with USB/network storage, as the "0.1x CPU time" improvements don't seem to have any practical benefit.

2) IRQ/s (as measured by bcmstat.sh) never dips below 8120/s when FIQ_FSM is disabled. With builds that do not include the FIQ_FSM patch, IRQ/s would usually run at about 750/s while the system is idle, so this is a significant difference. When performing the network test with FIQ_FSM disabled, IRQ/s peaked at 16835/s! It appears that when FIQ_FSM is disabled there is now a baseline of 8100 IRQ/s, not including USB or network activity - is this attributable to (and expected of) the FIQ NOP behaviour?

When FIQ_FSM is enabled, idle IRQ/s peaks at 750/s which is about the same as builds where FIQ_FSM is not present. When performing the network test with FIQ_FSM enabled, IRQ/s peaked at 9590 IRQ/s, which is again comparable with a system without FIQ_FSM present (10816 IRQ/s).

Interestingly, once the system without FIQ_FSM had returned to an idle state, IRQ/s on this pre-FIQ_FSM kernel remained at 1010 IRQ/s following the network transfer, whereas the FIQ_FSM-enabled kernel would return to the original 750 IRQ/s of an idle system. This is a win for FIQ_FSM, as there seems to be no way to explain the extra ~300 IRQ/s that remain active on non-FIQ_FSM kernels following a network transfer.

So there does appear to be a slight difference in IRQ/s behaviour when FIQ_FSM is added to the kernel and then left disabled, which is seen as significantly increased IRQ/s even while idle. I'm not entirely sure what effect this has on CPU performance - presumably it's costing a few more cycles to service all those extra IRQs as the CPU never went below 5.00% load with FIQ_FSM disabled, but with FIQ_FSM enabled the CPU would idle at 1.85%, and with no FIQ_FSM present in the kernel the CPU would idle at 1.7%.

Conclusion:
FIQ_NOP (ie. FIQ_FSM disabled) incurs more CPU overhead than pre-FIQ_FSM and significantly more IRQ/s, but strangely appears to just shade pre-FIQ_FSM and FIQ_FSM enabled in terms of performance on USB/network transfers.

A kernel with FIQ_FSM enabled seems to perform roughly on par with pre-FIQ_FSM kernels, however FIQ_FSM does appear to demonstrate superior IRQ control following large network transfers.

---------------------------
Code: Select all
1. dd if=/dev/zero of=/storage/test.dat bs=1M count=1000       (x5)
2. dd if=/dev/urandom of=/storage/test.dat bs=1M count=1000
   dd if=/storage/test.dat of=/dev/null bs=1M count=1000       (x5)
3. dd if=/freenas/media/test.mkv of=/dev/null bs=1M count=1000 (x5)
by jdb » Sat Mar 01, 2014 10:00 am
Thanks for the testing.

The FSM FIQ should have a negligible impact on USB2.0 transfers - its job is to be transparent to high-speed bulk or interrupt traffic - but the larger code size of the FIQ handler may incur a slight overhead. Code paths through the FSM FIQ are not yet completely optimised: there are a lot of early returns that can be done in the passthrough states.

milhouse wrote:2) IRQ/s (as measured by bcmstat.sh) never dips below 8120/s when FIQ_FSM is disabled. With builds that do not include the FIQ_FSM patch, IRQ/s would usually run at about 750/s while the system is idle, so this is a significant difference. When performing the network test with FIQ_FSM disabled, IRQ/s peaked at 16835/s! It appears that when FIQ_FSM is disabled there is now a baseline of 8100 IRQ/s, not including USB or network activity - is this attributable to (and expected of) the FIQ NOP behaviour?


The NOP FIQ shouldn't behave like that: this is probably a bug. You should see similar interrupt numbers as the original FIQ fix.
Rockets are loud.
https://astro-pi.org
by DougieLawson » Sun Mar 02, 2014 2:06 pm
I updated yesterday

pi@pi ~/python $ uname -a
Linux pi 3.10.30+ #644 PREEMPT Fri Feb 21 18:44:33 GMT 2014 armv6l GNU/Linux
pi@pi ~/python $ vcgencmd version
Feb 21 2014 18:57:22
Copyright (c) 2012 Broadcom
version 67f299c3a92a793fb8b8efcaf94f7f553153f89a (clean) (release)
pi@pi ~/python $

I'm getting this message logged frequently:
[15687.655504] Transfer to device 6 endpoint 0x2 failed - FIQ reported NYET. Data may have been lost.

pi@pi ~/python $ lsusb
Bus 001 Device 006: ID 04f3:0103 Elan Microelectronics Corp. ActiveJet K-2024 Multimedia Keyboard
Bus 001 Device 005: ID 7392:7811 Edimax Technology Co., Ltd EW-7811Un 802.11n Wireless Adapter [Realtek RTL8188CUS]
Bus 001 Device 004: ID 0409:005a NEC Corp. HighSpeed Hub
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp. SMSC9512/9514 Fast Ethernet Adapter
Bus 001 Device 002: ID 0424:9512 Standard Microsystems Corp. LAN9500 Ethernet 10/100 Adapter / SMSC9512/9514 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
pi@pi ~/python $

pi@pi ~ $ cat /proc/cmdline
dma.dmachans=0x7f35 bcm2708_fb.fbwidth=1824 bcm2708_fb.fbheight=984 bcm2708.boardrev=0xf bcm2708.serial=0x4800069e smsc95xx.macaddr=B8:27:EB:00:06:9E sdhci-bcm2708.emmc_clock_freq=250000000 vc_mem.mem_base=0x1ec00000 vc_mem.mem_size=0x20000000 dwc_otg.lpm_enable=0 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait dwc_otg.fiq_enable=1 dwc_otg.fiq_fsm_enable=1 dwc_otg.fiq_fsm_mask=0x3
pi@pi ~ $


It's the cheapest of cheap keyboards, which I've been using with my RPi since I got the board.
by TomasC » Mon Mar 03, 2014 12:09 pm
Hi,
uptime since applying the patch: 21:47 hours.
Setup: async DAC NAD D 3020 connected to powered usb hub, also connected to the hub is my media hard drive with samba share. The powered usb hub is connected to the usb hub of the raspi. I play music usually via mpd, accessing the files on the media hard drive.

So far I have been unable to reproduce jitter and crackling, including in situations where, before applying the patch, I could reproduce it: playing music through mpd from an internet music stream while starting pyload on the raspi. Mind, I got regular crackles without stressing the raspi, but if I wanted the raspi to "crackle now", that's what I had to do. Since this patch: no crackles whatsoever. I see no noticeable performance loss. Just for sport I streamed a movie to my PC through samba from the raspi media hard drive; again, no crackles on the USB DAC. Me likes this patch :)

Tomas
by Jahangir » Tue Mar 04, 2014 2:31 pm
Code: Select all
sudo lsusb
Bus 001 Device 012: ID 045e:04ec Microsoft Corp. Windows Phone (Zune)
Bus 001 Device 009: ID 0fce:5170 Sony Ericsson Mobile Communications AB
Bus 001 Device 007: ID 0bc2:2320 Seagate RSS LLC USB 3.0 bridge [Portable Expansion Drive]
Bus 001 Device 006: ID 05e3:0608 Genesys Logic, Inc. USB-2.0 4-Port HUB
Bus 001 Device 005: ID 05e3:0608 Genesys Logic, Inc. USB-2.0 4-Port HUB
Bus 001 Device 004: ID 0cf3:9271 Atheros Communications, Inc. AR9271 802.11n
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp. SMSC9512/9514 Fast Ethernet Adapter
Bus 001 Device 002: ID 0424:9512 Standard Microsystems Corp. LAN9500 Ethernet 10/100 Adapter / SMSC9512/9514 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub


Setup :
Raspberry Pi Model B 512MB
Using the 1500 mA power source from PiHut
Powered hub 7 ports from PiHut : http://thepihut.com/products/7-port-usb ... spberry-pi
Using the Seagate USB3/2 as rootfs (ext4), swap and storage (xfs) connected to the powered hub
Using a TL-WN721N as wifi dongle with Hostapd in AP mode connected to the raspberry pi
Charging an android phone and windows phone on the powered hub
Internet connection 100MB/s = actual max speed 70 MB/s
cmdline.txt
Code: Select all
dwc_otg.lpm_enable=0 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/sda2 rootfstype=ext4 elevator=deadline rootwait ip=10.113.0.250 dwc_otg.fiq_enable=1 dwc_otg.fiq_fsm_enable=1 dwc_otg.fiq_fsm_mask=0x3

Config.txt
Code: Select all
#uncomment to overclock the arm. 700 MHz is the default.
arm_freq=1100

# for more options see http://elinux.org/RPi_config.txt
gpu_mem=8
core_freq=520
sdram_freq=600
over_voltage=6


Kernel version 3.13.5 with BFQ patch compiled using a 4.8.2 Linaro compiler
Code: Select all
uname -a
Linux raspy 3.13.5-bfq-dwc-gf4c9fa5 #1 Tue Mar 4 12:50:48 EET 2014 armv6l GNU/Linux


sysctl :
Code: Select all
kernel.printk = 3 4 1 3
vm.swappiness = 30
vm.min_free_kbytes = 65536
net.ipv4.ip_forward = 1
vm.vfs_cache_pressure = 300
net.ipv4.conf.default.rp_filter = 1
net.ipv4.ip_dynaddr = 1
fs.inotify.max_user_watches = 100000
net.core.wmem_max = 1048576
net.core.rmem_max = 1048576

Usage
NAS, torrent downloader and wifi router
Using Transmission-daemon to download torrents to the hard drive and samba to share them over the wifi.
Usual issue
Because of overloading the USB with the Ethernet and the hard drive, I was getting a lot of these messages, even when playing with vm.min_free_kbytes. It was especially true when downloading big torrent files.
Code: Select all
Mar 4 13:42:37 raspy kernel: [ 1942.213060] net_ratelimit: 7989 callbacks suppressed
Mar 4 13:42:37 raspy kernel: [ 1942.219604] smsc95xx 1-1.1:1.0 eth0: kevent 2 may have been dropped

Test
Downloading a big torrent file (9 GB) at +- 3.5Mb/s + copying another big file at 5.5Mb/s using samba on the wifi + doing a speedtest.net run (on the raspberry directly, using python) + watching a movie over samba.

No kevent drop at all, no loss of connection.
Conclusion
From my test (just a couple of hours now), my raspberry pi is now clearly more stable and seems to handle the load on the USB ports better. I know it's not a "technical" test, but I don't have that much knowledge about how the USB is handled; I just understood that my issue with smsc95xx was linked to USB load and system memory. It seems this patch is helping.
by qfg » Tue Mar 04, 2014 6:26 pm
I installed the update because of the issues I had with the Pi periodically dropping interrupts from my Bluetooth dongle. (This in turn garbled the entire Bluetooth stream from that point onward.)

The new update definitely fixed my problem.
by milhouse » Wed Mar 05, 2014 5:58 pm
This commit has fixed the 8000 IRQ/s with the NOP FIQ - thanks.
by jamesh » Wed Mar 05, 2014 7:31 pm
milhouse wrote:This commit has fixed the 8000/IRQ/s with NOP FIQ - thanks.


Have you now seen an improvement in performance?
Volunteer at the Raspberry Pi Foundation, helper at Picademy September, October, November 2014.
by milhouse » Wed Mar 05, 2014 7:38 pm
jamesh wrote:Have you now seen an improvement in performance?


I'll test again later this evening, but I didn't really see much difference between the different modes when I tested last time; I suspect my tests don't really benefit a great deal (in terms of performance) from these changes.
by jdb » Wed Mar 05, 2014 9:41 pm
milhouse wrote:I'll test again later this evening, but I didn't really see much difference between the different modes when i tested last time, I suspect my tests don't really benefit a great deal (in terms of performance) from these changes.


Typically, USB mass storage and the LAN95xx devices (along with most ethernet-USB adapters) use USB2.0 bulk transports. There's non-trivial support for this in the DWC core: the hardware will perform a bulk transfer of length N in as many packets as it takes (and as long as it takes) without software intervention.

The remaining overhead is a function of
- Length of transfer requested (directly translates into frequency of transfer complete interrupts)
- How much software processing is required for the incoming/outgoing data (e.g. Ethernet framing via USB: there's a vendor-specific padding applied to packets to delineate ethernet packet boundaries)
- Whether the device drivers are efficient in moving data around in kernel space.
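As a rough illustration of the first point (the throughput and transfer lengths here are hypothetical round numbers, not measurements from this thread):

```python
# Transfer-complete interrupt rate as a function of requested transfer
# length: fewer, larger transfers mean fewer completion interrupts.
throughput = 30e6          # bytes/s, an illustrative USB read rate
for urb_bytes in (16 * 1024, 120 * 1024, 1024 * 1024):
    irqs = throughput / urb_bytes
    print(urb_bytes, round(irqs))
```

At a fixed 30 MB/s, going from 16 KiB to 1 MiB transfers cuts the completion interrupt rate from roughly 1800/s to about 30/s.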

Of course, the CPU cycles for kernel/userspace are stolen by USB interrupt handling (which for the most part does nothing) - at 8000 interrupts per second, you will see a non-negligible difference in throughput if there is a significant fractional difference in the number of cycles it takes to handle a USB SOF interrupt, for example.
by milhouse » Thu Mar 06, 2014 1:13 am
I'm just repeating the tests from my earlier posts, and with FIQ FSM there is an odd trend emerging while performing test #2:
Code: Select all
2. dd if=/dev/urandom of=/storage/test.dat bs=1M count=1000
   dd if=/storage/test.dat of=/dev/null bs=1M count=1000


Each successive test results in higher peak IRQ/s during the test, and lower read rate. These are the results from 10 consecutive tests immediately after booting, showing *peak* IRQ/s and average read rate for each test:
Code: Select all
Test#    Peak IRQ/s   Read Rate
#1          5133      29.9 MB/s
#2          5279      29.4 MB/s
#3          5557      29.1 MB/s
#4          5770      28.8 MB/s
#5          5648      28.5 MB/s
#6          5712      28.4 MB/s
#7          5878      28.3 MB/s
#8          5854      28.1 MB/s
#9          5977      28.0 MB/s
#10         6148      27.9 MB/s
#11         6546      27.3 MB/s
#12         6510      27.6 MB/s
#13         6453      27.5 MB/s
#14         6576      27.4 MB/s
#15         6601      27.4 MB/s
#16         6737      26.7 MB/s
#17         6927      27.2 MB/s
#18         6863      27.2 MB/s
#19         6852      27.0 MB/s
#20         6932      26.8 MB/s
#21         7013      27.0 MB/s
#22         7119      26.9 MB/s
#23         7268      27.0 MB/s
#24         7130      26.6 MB/s
#25         7111      26.9 MB/s
#26         7077      26.9 MB/s
#27         7209      26.9 MB/s
#28         7226      26.8 MB/s
#29         7256      26.5 MB/s
#30         7344      26.8 MB/s


As you can see, with each test the peak IRQ/s is steadily climbing and the read rate is reducing.

Once each test is complete, the IRQ/s returns to "normal" which is about 650 IRQ/s on this 1GHz OpenELEC system.

I'm going to leave the test running repeatedly (with a 10 second interval) and see if the trend continues...

Edit: Added tests #11-#30; the trend is continuing fairly consistently. I'll leave the test cycling and make a note of its ultimate progression.
by jibbs » Thu Mar 06, 2014 4:24 am
I'm running RuneAudio 0.2 on a Rev A Raspberry Pi with an Ayre QB-9 USB DAC connected (it's using asynchronous USB). With the stock build I'd always get occasional pops and clicks, so I thought I'd give this new driver a try since it seems to be getting good results from some people playing audio. As soon as I update the driver and modify /boot/cmdline.txt as instructed in the first post, I get a kernel panic related to the USB driver at bootup.
- if I remove the DAC and reboot the Pi then there's no kernel panic
- if I remove dwc_otg.fiq_fsm_mask=0x3 from cmdline.txt and plug the DAC back in, then after a reboot there's no kernel panic and audio is streamed to the DAC but it's full of clicks and pops

Here's a screenshot of the error, in case it's of any help... http://imgur.com/b4re5Ha
by tuxx » Thu Mar 06, 2014 10:46 am
Thank you very much for this driver!!!

I applied your latest fix to two distributions:

- Volumio (debian based)
- Runeaudio (Archlinux based)

My dac is the following:

http://hifimediy.com/index.php?route=pr ... duct_id=87 (adaptive).

Before the new fix, I was unable to listen to 24/96 FLAC files both over ethernet and from USB disk, and I also got occasional pops and clicks with 16/44.1 FLACs.

After following your instructions, I can confirm that everything runs smoothly with zero problems. Feel free to ask me for any log available.
by milhouse » Thu Mar 06, 2014 4:01 pm
Here's the result of my FIQ FSM testing. After 766 iterations, peak IRQ/s had increased from 4868 (straight after booting) to 7221 (+2353), and the read rate had dropped from 30.8614MB/s to 27.8587MB/s (-3.0027MB/s).

Here's a graph of my results, showing read-rate declining rapidly then tailing off, while IRQ/s increases rapidly then stabilises at a high level over time:

Code: Select all
rpi512:~ # dmesg | grep FIQ
[    1.515785] WARN::dwc_otg_hcd_init:1047: FIQ DMA bounce buffers: virt = 0xd0d51000 dma = 0x4ef4c000 len=9024
[    1.515809] FIQ FSM acceleration enabled for :
[    1.515865] WARN::hcd_init:474: FIQ at 0xc04bb144
[    1.515877] WARN::hcd_init:475: FIQ ASM at 0xc04bb41c length 36
[    1.517104] dwc_otg: FIQ enabled
[    1.517123] dwc_otg: FIQ split-transaction FSM enabled

I've now started the test again, this time with FIQ NOP, but early indications suggest that read rate is trending downwards, and IRQ/s is increasing.

I don't know if this is expected behaviour, but I would have thought that IRQ/s and USB read rate would remain fairly constant on a system that is doing nothing else (idle CPU is ~1% when not running the tests, idle IRQ/s ~650).
by milhouse » Thu Mar 06, 2014 4:25 pm
Results are similar with FIQ NOP:

Code: Select all
rpi512:~ # dmesg|grep FIQ
[    1.516281] WARN::dwc_otg_hcd_init:1047: FIQ DMA bounce buffers: virt = 0xd0d51000 dma = 0x4ef4c000 len=9024
[    1.516341] WARN::hcd_init:474: FIQ at 0xc04bb334
[    1.516353] WARN::hcd_init:475: FIQ ASM at 0xc04bb41c length 36
[    1.517573] dwc_otg: FIQ enabled
[    1.517592] dwc_otg: FIQ split-transaction FSM disabled


Not sure another graph is necessary as it's going to look very similar to FIQ FSM.

Here's the raw results after 40 iterations:
Code: Select all
#    IRQ/s     Read Rate   dIRQ        dR/Rate          Time
001   5197  31.1258 MB/s      0    0.0000 MB/s  2014-03-06 15:51:32
002   5163  30.9458 MB/s    -34   -0.1800 MB/s  2014-03-06 15:52:17
003   5142  30.8166 MB/s    -55   -0.3092 MB/s  2014-03-06 15:53:01
004   5292  30.6665 MB/s    +95   -0.4593 MB/s  2014-03-06 15:53:46
005   5274  30.5753 MB/s    +77   -0.5505 MB/s  2014-03-06 15:54:31
006   5340  30.4718 MB/s   +143   -0.6540 MB/s  2014-03-06 15:55:16
007   5369  30.3866 MB/s   +172   -0.7392 MB/s  2014-03-06 15:56:01
008   5188  30.3249 MB/s     -9   -0.8009 MB/s  2014-03-06 15:56:46
009   5185  30.2527 MB/s    -12   -0.8731 MB/s  2014-03-06 15:57:31
010   5144  30.1710 MB/s    -53   -0.9548 MB/s  2014-03-06 15:58:16
011   5122  30.1427 MB/s    -75   -0.9831 MB/s  2014-03-06 15:59:01
012   5189  30.0643 MB/s     -8   -1.0615 MB/s  2014-03-06 15:59:47
013   5275  30.0331 MB/s    +78   -1.0927 MB/s  2014-03-06 16:00:32
014   5304  29.9681 MB/s   +107   -1.1577 MB/s  2014-03-06 16:01:18
015   5476  29.8826 MB/s   +279   -1.2432 MB/s  2014-03-06 16:02:03
016   5432  29.8093 MB/s   +235   -1.3165 MB/s  2014-03-06 16:02:49
017   5541  29.7799 MB/s   +344   -1.3459 MB/s  2014-03-06 16:03:35
018   5420  29.6940 MB/s   +223   -1.4319 MB/s  2014-03-06 16:04:20
019   5512  29.6898 MB/s   +315   -1.4360 MB/s  2014-03-06 16:05:06
020   5562  29.6463 MB/s   +365   -1.4795 MB/s  2014-03-06 16:05:52
021   5597  29.5960 MB/s   +400   -1.5298 MB/s  2014-03-06 16:06:38
022   5572  29.5276 MB/s   +375   -1.5982 MB/s  2014-03-06 16:07:24
023   5681  29.5118 MB/s   +484   -1.6140 MB/s  2014-03-06 16:08:10
024   5700  29.4652 MB/s   +503   -1.6606 MB/s  2014-03-06 16:08:56
025   5794  29.3450 MB/s   +597   -1.7808 MB/s  2014-03-06 16:09:42
026   5790  29.3795 MB/s   +593   -1.7463 MB/s  2014-03-06 16:10:28
027   5844  29.3828 MB/s   +647   -1.7430 MB/s  2014-03-06 16:11:14
028   5934  29.3811 MB/s   +737   -1.7447 MB/s  2014-03-06 16:12:01
029   5962  29.3428 MB/s   +765   -1.7830 MB/s  2014-03-06 16:12:47
030   6122  29.3228 MB/s   +925   -1.8030 MB/s  2014-03-06 16:13:33
031   6109  29.3125 MB/s   +912   -1.8133 MB/s  2014-03-06 16:14:19
032   6068  29.3099 MB/s   +871   -1.8160 MB/s  2014-03-06 16:15:05
033   5936  29.2926 MB/s   +739   -1.8332 MB/s  2014-03-06 16:15:52
034   6001  29.2896 MB/s   +804   -1.8362 MB/s  2014-03-06 16:16:38
035   6054  29.2513 MB/s   +857   -1.8745 MB/s  2014-03-06 16:17:24
036   6070  29.1696 MB/s   +873   -1.9562 MB/s  2014-03-06 16:18:11
037   6068  29.2077 MB/s   +871   -1.9181 MB/s  2014-03-06 16:18:57
038   6100  29.1800 MB/s   +903   -1.9458 MB/s  2014-03-06 16:19:43
039   6016  29.1027 MB/s   +819   -2.0232 MB/s  2014-03-06 16:20:30
040   6099  29.1237 MB/s   +902   -2.0022 MB/s  2014-03-06 16:21:16


which shows that 2.0022 MB/s has been shaved off the read rate, and 902 more IRQs are required per second, after reading 40GB of data.
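
For what it's worth, the dIRQ and dR/Rate columns are just each sample's IRQ/s and read rate minus the values from the first iteration. The script that generated the table isn't shown in the thread, so this is a hypothetical reconstruction of that calculation in awk, fed a few rows from the table above:

```shell
# Hypothetical reconstruction of the delta columns: each row's IRQ/s and
# read rate minus the values from the first row (iteration 001).
awk 'NR == 1 { base_irq = $2; base_rate = $3 }
     { printf "%s %6d %8.4f MB/s %+6d %+8.4f MB/s\n",
              $1, $2, $3, $2 - base_irq, $3 - base_rate }' <<'EOF'
001 5197 31.1258
002 5163 30.9458
040 6099 29.1237
EOF
```

(The final delta comes out as -2.0021 rather than -2.0022 here because the logged rates are already rounded to four decimal places.)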

I'll let this run a bit longer, then try the same test on a kernel without the FIQ FSM patch and see if this behaviour has always been present.
by jdb » Thu Mar 06, 2014 4:45 pm
There is one possibility that springs to mind - memory fragmentation.

The mass storage driver will queue a variable number of URBs using scatter-gather helpers. If memory is fragmented then it's likely that fewer URBs (possibly of shorter length, meaning a higher interrupt rate) get created as the amount of contiguous memory drops.

One way to see if this is linked to the slowdown is to monitor /proc/buddyinfo. Large numbers on the left-hand side point to increasing fragmentation.
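
One way to capture that (my own sketch, not a script from the thread; the interval and sample count are arbitrary choices) is to snapshot the Normal-zone line at intervals, timestamped so dips can be lined up against the read-rate log:

```shell
#!/bin/sh
# Sample the Normal zone of /proc/buddyinfo a few times, with timestamps.
# Falls back to a canned example line when /proc/buddyinfo isn't available.
SAMPLES=3
INTERVAL=1
i=1
while [ "$i" -le "$SAMPLES" ]; do
    if [ -r /proc/buddyinfo ]; then
        line=$(grep 'Normal' /proc/buddyinfo | head -n 1)
    else
        line='Node 0, zone   Normal     15    111     44     32     27      2      1      1      1      0      4'
    fi
    printf '%s  %s\n' "$(date '+%H:%M:%S')" "$line"
    i=$((i + 1))
    [ "$i" -le "$SAMPLES" ] && sleep "$INTERVAL"
done
```

For a real run you'd bump SAMPLES and INTERVAL up to cover the whole test and redirect the output to a file.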
Rockets are loud.
https://astro-pi.org
Raspberry Pi Engineer & Forum Moderator
Posts: 1544
Joined: Thu Jul 11, 2013 2:37 pm
by milhouse » Thu Mar 06, 2014 4:57 pm
Here's a sample of entries from /proc/buddyinfo taken over a period of a couple of minutes, while a test is in progress:
Code: Select all
Node 0, zone   Normal    156    355    118     27     27      2      1      1      1      0      4
Node 0, zone   Normal     15    111     44     32     27      2      1      1      1      0      4
Node 0, zone   Normal     15    295     83     34     27      2      1      1      1      0      4
Node 0, zone   Normal     15    106    114     32     27      2      1      1      1      0      4
Node 0, zone   Normal     15     36     82     28     27      2      1      1      1      0      4
Node 0, zone   Normal     15    105     66     30     27      2      1      1      1      0      4
Node 0, zone   Normal     10    289     73     28     27      2      1      1      1      0      4
Node 0, zone   Normal     34    276     94     29     27      2      1      1      1      0      4
Node 0, zone   Normal     23    285     96     29     27      2      1      1      1      0      4
Node 0, zone   Normal     24    190    141     25     23      2      1      1      1      0      4
Node 0, zone   Normal     14     37    111     27     23      2      1      1      1      0      4
Node 0, zone   Normal     15    105    118     27     23      2      1      1      1      0      4
Node 0, zone   Normal     15    127    101     24     26      2      1      1      1      0      4
Node 0, zone   Normal     15    280     62     24     26      2      1      1      1      0      4
Node 0, zone   Normal     10    504     88     25     26      2      1      1      1      0      4
Node 0, zone   Normal     70    278     69     30     26      2      1      1      1      0      4
Node 0, zone   Normal     15    229     38     45     26      2      1      1      1      0      4
Node 0, zone   Normal     14     36     52     31     26      2      1      1      1      0      4
Node 0, zone   Normal     11    266     52     33     26      2      1      1      1      0      4
Node 0, zone   Normal     37    123    153     30     26      2      1      1      1      0      4

Presumably this confirms that memory is fragmented? I guess it would explain the tailing off, with memory reaching some sort of peak/worst-case fragmentation.
by milhouse » Thu Mar 06, 2014 6:48 pm
Results from a pre-FIQ FSM kernel:
Code: Select all
rpi512:~ # lsb_release
OpenELEC (Milhouse) - Version: devel-20140224200730-r17808-gfbf7d43
rpi512:~ # uname -a
Linux rpi512 3.13.5 #1 PREEMPT Mon Feb 24 20:18:05 GMT 2014 armv6l GNU/Linux
rpi512:~ # dmesg | grep FIQ
[    1.518011] dwc_otg: FIQ enabled
[    1.518029] dwc_otg: FIQ split fix enabled


Same behaviour as with FIQ FSM and FIQ NOP: steadily increasing IRQ/s and a falling read rate. Raw results with the output of /proc/buddyinfo appended (test started after a fresh boot):
Code: Select all
#    IRQ/s     Read Rate   dIRQ        dR/Rate          Time
001   5350  28.0740 MB/s      0    0.0000 MB/s  2014-03-06 17:18:22    Node 0, zone Normal     30     45     47     55    20     5     3     2     0    1    4
002   5779  27.9404 MB/s   +429   -0.1336 MB/s  2014-03-06 17:19:10    Node 0, zone Normal     39     74     58     33    10     9     3     2     2    2    3
003   6191  27.8300 MB/s   +841   -0.2441 MB/s  2014-03-06 17:19:58    Node 0, zone Normal     53    115     62     25    11    10     2     4     1    2    3
004   5866  27.7499 MB/s   +516   -0.3241 MB/s  2014-03-06 17:20:46    Node 0, zone Normal     23     58     67     12    26    18     1     4     1    2    3
005   6155  27.7177 MB/s   +805   -0.3564 MB/s  2014-03-06 17:21:34    Node 0, zone Normal     16     50     51     17    45     6     2     3     1    2    3
006   6085  27.6414 MB/s   +735   -0.4326 MB/s  2014-03-06 17:22:22    Node 0, zone Normal     13     47    101    117    12     5     2     3     1    2    3
007   5997  27.6005 MB/s   +647   -0.4735 MB/s  2014-03-06 17:23:11    Node 0, zone Normal      9    156    186     31    24     6     2     3     1    2    3
008   5999  27.5629 MB/s   +649   -0.5112 MB/s  2014-03-06 17:23:59    Node 0, zone Normal    181    105     62     23    21     5     2     3     1    2    3
009   6018  27.4995 MB/s   +668   -0.5745 MB/s  2014-03-06 17:24:48    Node 0, zone Normal    140     83     63     29    26     6     3     2     1    2    3
010   5995  27.4639 MB/s   +645   -0.6101 MB/s  2014-03-06 17:25:36    Node 0, zone Normal     36     66     50     35    31     9     2     2     1    2    3
011   6005  27.3915 MB/s   +655   -0.6825 MB/s  2014-03-06 17:26:25    Node 0, zone Normal      8     51    108    108    12     9     2     2     1    2    3
012   5966  27.3521 MB/s   +616   -0.7219 MB/s  2014-03-06 17:27:14    Node 0, zone Normal      8     55     75     65    13     8     2     2     1    2    3
013   6103  27.3271 MB/s   +753   -0.7469 MB/s  2014-03-06 17:28:02    Node 0, zone Normal     36    166    173     79     9     8     2     2     1    2    3
014   5938  27.2683 MB/s   +588   -0.8057 MB/s  2014-03-06 17:28:51    Node 0, zone Normal      8    259     55     31    17     8     2     2     1    2    3
015   5972  27.2233 MB/s   +622   -0.8507 MB/s  2014-03-06 17:29:40    Node 0, zone Normal     20    299     44     49    13     8     2     2     1    2    3
016   5992  27.1684 MB/s   +642   -0.9056 MB/s  2014-03-06 17:30:29    Node 0, zone Normal      3    121    118     45    13     8     2     2     1    2    3
017   6142  27.1442 MB/s   +792   -0.9299 MB/s  2014-03-06 17:31:18    Node 0, zone Normal      3     89    158     20    10     8     2     2     1    2    3
018   6200  27.0824 MB/s   +850   -0.9917 MB/s  2014-03-06 17:32:07    Node 0, zone Normal      3     72    186     44     8     9     2     2     1    2    3
019   6146  27.0500 MB/s   +796   -1.0240 MB/s  2014-03-06 17:32:56    Node 0, zone Normal      3    123    185     17    10     9     2     2     1    2    3
020   6261  27.0005 MB/s   +911   -1.0735 MB/s  2014-03-06 17:33:45    Node 0, zone Normal      4    117    132     24     8     9     2     2     1    2    3
021   6314  26.9376 MB/s   +964   -1.1365 MB/s  2014-03-06 17:34:34    Node 0, zone Normal      5     90    137     29    10    10     1     2     1    2    3
022   6468  26.9044 MB/s  +1118   -1.1696 MB/s  2014-03-06 17:35:24    Node 0, zone Normal      5     79    198     34    11    12     2     1     1    2    3
023   6350  26.9041 MB/s  +1000   -1.1700 MB/s  2014-03-06 17:36:13    Node 0, zone Normal    162    205    140     20    10    12     2     1     1    2    3
024   6402  26.8663 MB/s  +1052   -1.2077 MB/s  2014-03-06 17:37:02    Node 0, zone Normal     40    276    160     19    12    11     2     1     1    2    3
025   6361  26.8727 MB/s  +1011   -1.2014 MB/s  2014-03-06 17:37:52    Node 0, zone Normal      4    123    154     43    15    10     2     1     1    2    3
026   6367  26.7928 MB/s  +1017   -1.2812 MB/s  2014-03-06 17:38:41    Node 0, zone Normal      4    169    199     39    15    10     2     1     1    2    3
027   6558  26.6938 MB/s  +1208   -1.3802 MB/s  2014-03-06 17:39:31    Node 0, zone Normal      4    115    176     19    15    10     2     1     1    2    3
028   6628  26.7765 MB/s  +1278   -1.2976 MB/s  2014-03-06 17:40:20    Node 0, zone Normal    146    455    114     13    14    10     2     1     1    2    3
029   6626  26.7282 MB/s  +1276   -1.3459 MB/s  2014-03-06 17:41:10    Node 0, zone Normal    148    477    140     11    17     9     2     1     1    2    3
030   6627  26.6842 MB/s  +1277   -1.3898 MB/s  2014-03-06 17:41:59    Node 0, zone Normal     10    227    123     30    18     9     2     1     1    2    3
031   6582  26.7059 MB/s  +1232   -1.3682 MB/s  2014-03-06 17:42:49    Node 0, zone Normal      4    113     94     14    16     9     2     1     1    2    3
032   6585  26.6750 MB/s  +1235   -1.3990 MB/s  2014-03-06 17:43:38    Node 0, zone Normal     62     93    141     42    14     7     1     1     1    2    3
033   6736  26.6485 MB/s  +1386   -1.4255 MB/s  2014-03-06 17:44:28    Node 0, zone Normal      3     99    206     30    17     7     1     1     1    2    3
034   6772  26.6137 MB/s  +1422   -1.4603 MB/s  2014-03-06 17:45:18    Node 0, zone Normal     99    172    100     34    17     7     1     1     1    2    3
035   6739  26.5978 MB/s  +1389   -1.4762 MB/s  2014-03-06 17:46:07    Node 0, zone Normal     20    278    101     32    18     7     1     1     1    2    3
036   6753  26.6099 MB/s  +1403   -1.4641 MB/s  2014-03-06 17:46:57    Node 0, zone Normal      4    206    167     23    19     7     1     1     1    2    3
037   6781  26.5997 MB/s  +1431   -1.4743 MB/s  2014-03-06 17:47:47    Node 0, zone Normal      7    180    160     24    19     6     2     1     1    2    3
038   6729  26.5532 MB/s  +1379   -1.5209 MB/s  2014-03-06 17:48:37    Node 0, zone Normal      7    290     87     31    18     7     2     1     1    2    3
039   6809  26.5195 MB/s  +1459   -1.5545 MB/s  2014-03-06 17:49:26    Node 0, zone Normal      5    220    185     26    18     7     2     1     1    2    3
040   6823  26.5291 MB/s  +1473   -1.5449 MB/s  2014-03-06 17:50:16    Node 0, zone Normal    201    339     70     18    18     7     2     1     1    2    3
041   6794  26.5377 MB/s  +1444   -1.5363 MB/s  2014-03-06 17:51:06    Node 0, zone Normal    137    189    123     19    18     7     2     1     1    2    3
042   6910  26.5138 MB/s  +1560   -1.5602 MB/s  2014-03-06 17:51:56    Node 0, zone Normal     72    155    120     26    17     7     2     1     1    2    3
043   6906  26.4825 MB/s  +1556   -1.5916 MB/s  2014-03-06 17:52:46    Node 0, zone Normal    205    127    160     24    17     7     2     1     1    2    3
044   6927  26.4576 MB/s  +1577   -1.6164 MB/s  2014-03-06 17:53:36    Node 0, zone Normal     70    164    228     22    17     7     2     1     1    2    3
045   7032  26.4654 MB/s  +1682   -1.6086 MB/s  2014-03-06 17:54:26    Node 0, zone Normal    182    237     58     22    17     7     2     1     1    2    3
046   6980  26.4469 MB/s  +1630   -1.6271 MB/s  2014-03-06 17:55:15    Node 0, zone Normal    126    223     83     24    17     8     1     1     1    2    3
047   6930  26.4427 MB/s  +1580   -1.6314 MB/s  2014-03-06 17:56:05    Node 0, zone Normal     17    145    151     23    18     8     1     1     1    2    3
048   6987  26.4538 MB/s  +1637   -1.6202 MB/s  2014-03-06 17:56:55    Node 0, zone Normal      4    143    202     22    19     8     1     1     1    2    3
049   6978  26.4363 MB/s  +1628   -1.6377 MB/s  2014-03-06 17:57:45    Node 0, zone Normal      7    437     94     19    19     8     1     1     1    2    3
050   6961  26.4047 MB/s  +1611   -1.6693 MB/s  2014-03-06 17:58:35    Node 0, zone Normal    135    203     62     25    19     8     1     1     1    2    3
051   6940  26.3986 MB/s  +1590   -1.6754 MB/s  2014-03-06 17:59:25    Node 0, zone Normal     40    404     98     18    20     8     1     1     1    2    3
052   6995  26.4086 MB/s  +1645   -1.6654 MB/s  2014-03-06 18:00:15    Node 0, zone Normal    328    283    115     22    20     8     1     1     1    2    3
053   7039  26.3989 MB/s  +1689   -1.6752 MB/s  2014-03-06 18:01:05    Node 0, zone Normal    192    112    120     21    21     8     1     1     1    2    3
054   7046  26.4076 MB/s  +1696   -1.6664 MB/s  2014-03-06 18:01:55    Node 0, zone Normal    166    122    111     23    21     8     1     1     1    2    3
055   6967  26.3780 MB/s  +1617   -1.6960 MB/s  2014-03-06 18:02:45    Node 0, zone Normal     59    402    157     15    21     8     1     1     1    2    3
056   7132  26.3942 MB/s  +1782   -1.6799 MB/s  2014-03-06 18:03:35    Node 0, zone Normal    113    499     48     25    17     8     1     1     1    2    3
057   7091  26.3983 MB/s  +1741   -1.6758 MB/s  2014-03-06 18:04:25    Node 0, zone Normal      3    289     57     23    17     8     1     1     1    2    3
058   7046  26.3791 MB/s  +1696   -1.6949 MB/s  2014-03-06 18:05:15    Node 0, zone Normal    368    202     72     23    17     8     1     1     1    2    3
059   7061  26.3726 MB/s  +1711   -1.7014 MB/s  2014-03-06 18:06:05    Node 0, zone Normal    348    283     61     23    17     8     1     1     1    2    3
060   7009  26.3649 MB/s  +1659   -1.7091 MB/s  2014-03-06 18:06:55    Node 0, zone Normal    134    106    147     24    17     8     1     1     1    2    3
061   7034  26.3377 MB/s  +1684   -1.7364 MB/s  2014-03-06 18:07:45    Node 0, zone Normal    178    112    123     29    17     8     1     1     1    2    3
062   7045  26.3367 MB/s  +1695   -1.7374 MB/s  2014-03-06 18:08:36    Node 0, zone Normal    184    124    119     36    13     7     1     1     1    2    3
063   7082  26.3235 MB/s  +1732   -1.7505 MB/s  2014-03-06 18:09:26    Node 0, zone Normal     59    126    150     37    13     7     1     1     1    2    3
064   7022  26.3183 MB/s  +1672   -1.7557 MB/s  2014-03-06 18:10:16    Node 0, zone Normal      4    301     67     36    13     7     1     1     1    2    3
065   7154  26.2767 MB/s  +1804   -1.7973 MB/s  2014-03-06 18:11:06    Node 0, zone Normal      4    120    177     37    13     7     1     1     1    2    3
066   7243  26.2742 MB/s  +1893   -1.7998 MB/s  2014-03-06 18:11:56    Node 0, zone Normal      4     82    175     35    14     7     1     1     1    2    3
067   7153  26.2611 MB/s  +1803   -1.8129 MB/s  2014-03-06 18:12:46    Node 0, zone Normal      3    226    178     31    15     7     1     1     1    2    3
068   7136  26.2626 MB/s  +1786   -1.8115 MB/s  2014-03-06 18:13:37    Node 0, zone Normal      4    245    102     36    17     6     1     1     1    2    3
069   7260  26.2538 MB/s  +1910   -1.8203 MB/s  2014-03-06 18:14:27    Node 0, zone Normal      5    151    170     31    17     6     1     1     1    2    3
070   7219  26.1716 MB/s  +1869   -1.9024 MB/s  2014-03-06 18:15:17    Node 0, zone Normal    264    316    118     32    16     7     1     1     1    2    3
071   7196  26.2510 MB/s  +1846   -1.8230 MB/s  2014-03-06 18:16:07    Node 0, zone Normal     93    328     96     30    16     7     1     1     1    2    3
072   7231  26.2525 MB/s  +1881   -1.8215 MB/s  2014-03-06 18:16:57    Node 0, zone Normal    187    357    101     41    17     4     1     1     1    2    3
073   7152  26.2494 MB/s  +1802   -1.8246 MB/s  2014-03-06 18:17:48    Node 0, zone Normal    214    222     48     38    18     4     1     1     1    2    3
074   7173  26.2321 MB/s  +1823   -1.8419 MB/s  2014-03-06 18:18:38    Node 0, zone Normal     68    265     77     38    18     4     1     1     1    2    3
075   7189  26.2341 MB/s  +1839   -1.8399 MB/s  2014-03-06 18:19:28    Node 0, zone Normal     88    201    162     39    16     5     1     1     1    2    3
076   7199  26.2188 MB/s  +1849   -1.8552 MB/s  2014-03-06 18:20:18    Node 0, zone Normal      5    333     54     37    17     5     1     1     1    2    3
077   7251  26.2204 MB/s  +1901   -1.8536 MB/s  2014-03-06 18:21:09    Node 0, zone Normal     79    134    162     33    17     5     1     1     1    2    3
078   7260  26.2083 MB/s  +1910   -1.8657 MB/s  2014-03-06 18:21:59    Node 0, zone Normal      8    300     93     34    17     5     1     1     1    2    3
079   7293  26.2074 MB/s  +1943   -1.8667 MB/s  2014-03-06 18:22:49    Node 0, zone Normal      4    420     64     34    17     5     1     1     1    2    3
080   7363  26.2225 MB/s  +2013   -1.8515 MB/s  2014-03-06 18:23:39    Node 0, zone Normal      6    307    120     34    17     5     1     1     1    2    3
081   7251  26.2087 MB/s  +1901   -1.8653 MB/s  2014-03-06 18:24:30    Node 0, zone Normal     11    237     96     39    19     4     1     1     1    2    3
082   7311  26.1829 MB/s  +1961   -1.8911 MB/s  2014-03-06 18:25:20    Node 0, zone Normal      4    236    105     37    19     4     1     1     1    2    3
083   7377  26.1861 MB/s  +2027   -1.8880 MB/s  2014-03-06 18:26:10    Node 0, zone Normal    129    369    112     36    19     4     1     1     1    2    3
084   7321  26.1701 MB/s  +1971   -1.9039 MB/s  2014-03-06 18:27:01    Node 0, zone Normal     31    133    162     25    19     4     1     1     1    2    3
085   7344  26.1800 MB/s  +1994   -1.8940 MB/s  2014-03-06 18:27:51    Node 0, zone Normal      9    231    202     34    20     4     1     1     1    2    3
086   7335  26.1769 MB/s  +1985   -1.8972 MB/s  2014-03-06 18:28:41    Node 0, zone Normal    276    228     94     31    20     4     1     1     1    2    3
087   7309  26.1909 MB/s  +1959   -1.8831 MB/s  2014-03-06 18:29:32    Node 0, zone Normal    143    239     72     43    18     4     1     1     1    2    3
088   7348  26.2066 MB/s  +1998   -1.8674 MB/s  2014-03-06 18:30:22    Node 0, zone Normal     16    459     45     38    18     4     1     1     1    2    3
089   7458  26.1878 MB/s  +2108   -1.8862 MB/s  2014-03-06 18:31:12    Node 0, zone Normal      7    330     77     31    18     4     1     1     1    2    3
090   7295  26.1682 MB/s  +1945   -1.9058 MB/s  2014-03-06 18:32:03    Node 0, zone Normal      4    451    106     41    18     4     1     1     1    2    3
091   7282  26.1651 MB/s  +1932   -1.9089 MB/s  2014-03-06 18:32:53    Node 0, zone Normal    180    207     82     29    18     4     1     1     1    2    3
092   7330  26.1583 MB/s  +1980   -1.9157 MB/s  2014-03-06 18:33:43    Node 0, zone Normal     65    326     47     40    18     4     1     1     1    2    3
093   7396  26.1627 MB/s  +2046   -1.9114 MB/s  2014-03-06 18:34:34    Node 0, zone Normal      8    237    122     40    17     4     1     1     1    2    3
094   7456  26.1495 MB/s  +2106   -1.9245 MB/s  2014-03-06 18:35:24    Node 0, zone Normal     84    496     51     42    17     4     1     1     1    2    3
095   7274  26.1359 MB/s  +1924   -1.9381 MB/s  2014-03-06 18:36:14    Node 0, zone Normal      3    357     80     46    16     4     1     1     1    2    3
096   7343  26.1536 MB/s  +1993   -1.9204 MB/s  2014-03-06 18:37:05    Node 0, zone Normal      4    144    151     43    16     4     1     1     1    2    3
097   7344  26.1510 MB/s  +1994   -1.9230 MB/s  2014-03-06 18:37:55    Node 0, zone Normal      5    336     68     44    15     4     1     1     1    2    3
098   7413  26.1336 MB/s  +2063   -1.9404 MB/s  2014-03-06 18:38:45    Node 0, zone Normal    385    261    125     43    15     4     1     1     1    2    3
099   7373  26.1218 MB/s  +2023   -1.9522 MB/s  2014-03-06 18:39:36    Node 0, zone Normal    175    186    116     43    14     4     1     1     1    2    3
100   7523  26.1330 MB/s  +2173   -1.9410 MB/s  2014-03-06 18:40:26    Node 0, zone Normal     93    210     81     45    14     4     1     1     1    2    3
by jdb » Thu Mar 06, 2014 7:33 pm
As I suspected, it seems to be broadly correlated. The columns in /proc/buddyinfo are counts of free contiguous 2^n-page blocks (n starting at 0 in the leftmost column; cached and buffer pages are not included). The leftmost two or three columns show a substantial level of thrash, which means there's a lot of reclaim and compaction going on.
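
To make the column interpretation concrete, here's a quick illustration of my own using one of the sample lines posted earlier: the counts are the fifth field onwards, and with 4 KiB pages the zone's total free memory is the sum of count_n * 2^n pages:

```shell
# Column n counts free blocks of 2^n contiguous pages; total free memory
# in the zone is the weighted sum of those counts (4 KiB per page).
echo 'Node 0, zone   Normal     15    111     44     32     27      2      1      1      1      0      4' |
awk '{ pages = 0
       for (n = 0; n <= 10; n++)
           pages += $(5 + n) * 2^n     # counts are fields 5..15
       printf "free: %d pages = %d KiB\n", pages, pages * 4 }'
```

This prints "free: 5709 pages = 22836 KiB", i.e. roughly 22 MiB free in the zone, with most of it sitting in the four order-10 blocks rather than in the churning low-order columns.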

It'd be interesting to do an "echo 1 > /proc/sys/vm/drop_caches" while running a test and see if the rate improves.
by jamesh » Thu Mar 06, 2014 7:37 pm
I'm unfamiliar with kernel coding rules - can you use a different/home-grown allocator to try to avoid fragmentation? Or is the 'problem' not fixable in that way? Is it even a problem - do these increasing numbers actually cause a slowdown?
Volunteer at the Raspberry Pi Foundation, helper at Picademy September, October, November 2014.
Forum Moderator
Posts: 15910
Joined: Sat Jul 30, 2011 7:41 pm
by milhouse » Thu Mar 06, 2014 7:43 pm
I don't know if it's really a problem; I've only become aware of it while testing these patches. If it can be explained then great, and if there's no easy solution then that's too bad.

I'll try "echo 1 > /proc/sys/vm/drop_caches" with a FIQ FSM kernel and report back.