schorsch76
Posts: 22
Joined: Sun Apr 28, 2013 8:01 am

Re: Entering aarch64 execution state

Sat Apr 02, 2016 4:50 pm

@Electron752: In my case it hangs too when I use a different USB Ethernet adapter.
Very good work :geek:

I used the mcs7830 module:
9710:7830 MosChip Semiconductor MCS7830 10/100 Mbps Ethernet adapter

EDIT: I unloaded the smsc95xx module so that only mcs7830 was loaded, and it hung too....

When you unplug the USB device (the network card), the Pi starts to respond again. ;)
USB storage devices (an SD card reader in my case) seem to work. ;)

Electron752
Posts: 142
Joined: Mon Mar 02, 2015 7:09 pm

Re: Entering aarch64 execution state

Sat Apr 02, 2016 5:21 pm

Cool, I was also able to copy a large amount of data from a USB-connected flash drive, although performance is very poor and after a while the RPI 3 gets bogged down. So part of my theory is that the RPI isn't hung; it's just extremely slow when performing USB I/O.

At this point, I can see a whole bunch of 32-bit assumptions in the dwc2 code. Just search for casts to u32 and you can see several memory pointers being cast to 32 bits. I think this is because the DMA controller is only 32-bit.

I'm wondering if anybody more familiar with the DMA/memory manager of Linux can help at this point. I'm thinking that all that is needed is to make sure all DMA addresses fall within the 32-bit or even 31-bit memory range. I think it might be possible to modify the device tree to fix this, but I don't know enough about it.

schorsch76
Posts: 22
Joined: Sun Apr 28, 2013 8:01 am

Re: Entering aarch64 execution state

Sat Apr 02, 2016 5:39 pm

In my case it wasn't so slow:

Code:

Pi_3 64_bit p2 # dd if=/dev/zero of=test.img count=1M bs=1000
1048576+0 records in
1048576+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 154.839 s, 6.8 MB/s
EDIT: I tried my various USB WLAN and LAN adapters, but all of them failed. The WLAN stick, for example:

Code:

[   44.065637] usb 1-1.5: new high-speed USB device number 4 using dwc2
[   46.638162] rtl8192cu: Chip version 0x11
[   46.772057] rtl8192cu: MAC address: e8:94:f6:1d:d1:73
[   46.772067] rtl8192cu: Board Type 0
[   46.772426] rtl_usb: rx_max_size 15360, rx_urb_num 8, in_ep 1
[   46.772493] rtl8192cu: Loading firmware rtlwifi/rtl8192cufw_TMSC.bin
[   46.817332] ieee80211 phy0: Selected rate control algorithm 'rtl_rc'
[   46.822447] usbcore: registered new interface driver rtl8192cu
[   46.849989] systemd-udevd[1713]: Process 'net.sh wlan0 start' failed with exit code 1.
[   46.927697] rtl8192cu: MAC auto ON okay!
[   46.993624] rtl8192cu: Tx queue select: 0x05
It all looks like it happens when the network interface goes up. In the 8192cu case, it doesn't come back when I disconnect it.

marcus_c
Posts: 11
Joined: Sun Mar 13, 2016 12:15 am

Re: Entering aarch64 execution state

Sat Apr 02, 2016 9:01 pm

Electron752 wrote:

Code:

--- a/drivers/usb/dwc2/core.c
+++ b/drivers/usb/dwc2/core.c
@@ -3163,9 +3163,9 @@ void dwc2_set_parameters(struct dwc2_hsotg *hsotg,
        dev_dbg(hsotg->dev, "%s()\n", __func__);
 
        dwc2_set_param_otg_cap(hsotg, params->otg_cap);
-       dwc2_set_param_dma_enable(hsotg, params->dma_enable);
-       dwc2_set_param_dma_desc_enable(hsotg, params->dma_desc_enable);
-       dwc2_set_param_dma_desc_fs_enable(hsotg, params->dma_desc_fs_enable);
+       dwc2_set_param_dma_enable(hsotg, 0);
+       dwc2_set_param_dma_desc_enable(hsotg, 0);
+       dwc2_set_param_dma_desc_fs_enable(hsotg, 0);
        dwc2_set_param_host_support_fs_ls_low_power(hsotg,
                        params->host_support_fs_ls_low_power);
        dwc2_set_param_enable_dynamic_fifo(hsotg,
Rather than modifying the code, you could just change the .dma_enable setting in params_bcm2835 in platform.c (same directory). The .dma_desc_enable and .dma_desc_fs_enable fields are already 0.

schorsch76
Posts: 22
Joined: Sun Apr 28, 2013 8:01 am

Re: Entering aarch64 execution state

Sun Apr 03, 2016 6:50 am

Neddy on the Gentoo forum told me that the framebuffer is working again with that "DMA off" patch.
https://forums.gentoo.org/viewtopic-p-7 ... ml#7901252

Electron752
Posts: 142
Joined: Mon Mar 02, 2015 7:09 pm

Re: Entering aarch64 execution state

Sun Apr 03, 2016 9:35 am

marcus_c:

The "Disable DMA" patch isn't meant to be a long-term solution. It's just a quick fix until someone figures out what needs to be done to get DMA to work correctly.
schorsch76 wrote:Neddy on the gentoo forum told me, that the framebuffer is working again with that "DMA off" patch.
https://forums.gentoo.org/viewtopic-p-7 ... ml#7901252
I would like to know which module they are using. I started trying to get the Foundation's kernel module to work, and it has a lot of dependencies. Even if DMA is disabled, it does a bunch of mailbox calls to get the screen info, such as the framebuffer address. It might be possible to simply hard-code these addresses until this whole thing gets further along. I'm going to look at how u-boot queries this information, or whether it's hard-coded there as well.

Personally, getting USB+Networking to work is more important to me since I can do most of what I want through a network connection.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5161
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Entering aarch64 execution state

Sun Apr 03, 2016 11:20 am

Electron752 wrote: I'm wondering if anybody more familiar with the DMA/Memory Manager of linux can help at this point. I'm thinking that all that is needed is to make sure all DMA addresses fall within the 32bit or even 31bit memory range. I think it might be possible to modify the device tree to fix this, but I don't know enough about this.
DMA (like the GPU) uses bus addresses for source and destination. By definition these are 32-bit physical addresses (30 real bits with top two bits used for determining caching behaviour).
So casting to u32 shouldn't lose any information. It might be worth adding some "BUG_ON" sanity checks to any addresses given to DMA controller.

More likely it's not the DMA driver itself that is the issue, but perhaps some of the virtual<->physical<->bus translation functions do not work with 64-bit build.

schorsch76
Posts: 22
Joined: Sun Apr 28, 2013 8:01 am

Re: Entering aarch64 execution state

Sun Apr 03, 2016 3:50 pm

dom wrote: More likely it's not the DMA driver itself that is the issue, but perhaps some of the virtual<->physical<->bus translation functions do not work with 64-bit build.
Is this DMA address translation architecture-dependent? I'm thinking of x86_64 (obviously 64-bit, and it works there). Or is this BCM2837-specific?

I just read about the DMA zones in my book "Linux kernel programming":
ZONE_DMA/ZONE_DMA32/ZONE_NORMAL/ZONE_HIGHMEM. See <linux/mmzone.h>.

Let's see if there is a document about the BCM2837's DMA capabilities...

Electron752
Posts: 142
Joined: Mon Mar 02, 2015 7:09 pm

Re: Entering aarch64 execution state

Mon Apr 04, 2016 3:24 am

So I now have DMA + networking working. I'm able to ping the RPI 3 (64-bit) and even run apt!!!!

I'm in the process of backing out a large number of changes, but it appears that all that is needed is to add the correct address mappings from the 32-bit version. The 64-bit version currently wasn't doing any mapping at all and was just stripping off the top 32 bits, which doesn't work.

Here is a patch, but I'm working on a more minimal set of changes. The disable-DMA patch from before is not needed.

BTW, at this point maybe we should start collecting all the changes somewhere, such as on GitHub, because I think things are far enough along now to be useful.

Code:

diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index ba437f0..02d72fd 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -19,7 +19,11 @@
 #ifdef __KERNEL__
 
 #include <linux/types.h>
-#include <linux/vmalloc.h>
+#include <linux/scatterlist.h>
+#include <linux/dma-attrs.h>
+#include <linux/dma-debug.h>
+
+#include <asm/memory.h>
 
 #include <xen/xen.h>
 #include <asm/xen/hypervisor.h>
@@ -56,6 +60,81 @@ void arch_teardown_dma_ops(struct device *dev);
 #define arch_teardown_dma_ops	arch_teardown_dma_ops
 #endif
 
+/*
+ * dma_to_pfn/pfn_to_dma/dma_to_virt/virt_to_dma are architecture private
+ * functions used internally by the DMA-mapping API to provide DMA
+ * addresses. They must not be used by drivers.
+ */
+#ifndef __arch_pfn_to_dma
+static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn)
+{
+	if (dev)
+		pfn -= dev->dma_pfn_offset;
+	return (dma_addr_t)__pfn_to_bus(pfn);
+}
+
+static inline unsigned long dma_to_pfn(struct device *dev, dma_addr_t addr)
+{
+	unsigned long pfn = __bus_to_pfn(addr);
+
+	if (dev)
+		pfn += dev->dma_pfn_offset;
+
+	return pfn;
+}
+
+static inline void *dma_to_virt(struct device *dev, dma_addr_t addr)
+{
+	if (dev) {
+		unsigned long pfn = dma_to_pfn(dev, addr);
+
+		return phys_to_virt(__pfn_to_phys(pfn));
+	}
+
+	return (void *)__bus_to_virt((unsigned long)addr);
+}
+
+static inline dma_addr_t virt_to_dma(struct device *dev, void *addr)
+{
+	if (dev)
+		return pfn_to_dma(dev, virt_to_pfn(addr));
+
+	return (dma_addr_t)__virt_to_bus((unsigned long)(addr));
+}
+
+#else
+static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn)
+{
+	return __arch_pfn_to_dma(dev, pfn);
+}
+
+static inline unsigned long dma_to_pfn(struct device *dev, dma_addr_t addr)
+{
+	return __arch_dma_to_pfn(dev, addr);
+}
+
+static inline void *dma_to_virt(struct device *dev, dma_addr_t addr)
+{
+	return __arch_dma_to_virt(dev, addr);
+}
+
+static inline dma_addr_t virt_to_dma(struct device *dev, void *addr)
+{
+	return __arch_virt_to_dma(dev, addr);
+}
+#endif
+
+/* The ARM override for dma_max_pfn() */
+static inline unsigned long dma_max_pfn(struct device *dev)
+{
+	return PHYS_PFN_OFFSET + dma_to_pfn(dev, *dev->dma_mask);
+}
+#define dma_max_pfn(dev) dma_max_pfn(dev)
+
+#define arch_setup_dma_ops arch_setup_dma_ops
+extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+			       struct iommu_ops *iommu, bool coherent);
+
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
 {
@@ -66,25 +145,36 @@ static inline bool is_device_dma_coherent(struct device *dev)
 
 static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
-	return (dma_addr_t)paddr;
+	unsigned int offset = paddr & ~PAGE_MASK;
+	return pfn_to_dma(dev, __phys_to_pfn(paddr)) + offset;
 }
 
 static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
 {
-	return (phys_addr_t)dev_addr;
+	unsigned int offset = dev_addr & ~PAGE_MASK;
+	return __pfn_to_phys(dma_to_pfn(dev, dev_addr)) + offset;
 }
 
 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
 {
+	u64 limit, mask;
+
 	if (!dev->dma_mask)
-		return false;
+		return 0;
 
-	return addr + size - 1 <= *dev->dma_mask;
-}
+	mask = *dev->dma_mask;
 
-static inline void dma_mark_clean(void *addr, size_t size)
-{
+	limit = (mask + 1) & ~mask;
+	if (limit && size > limit)
+		return 0;
+
+	if ((addr | (addr + size - 1)) & ~mask)
+		return 0;
+
+	return 1;
 }
 
+static inline void dma_mark_clean(void *addr, size_t size) { }
+
 #endif	/* __KERNEL__ */
 #endif	/* __ASM_DMA_MAPPING_H */
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 853953c..5a9e6cc 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -160,6 +160,14 @@ static inline void *phys_to_virt(phys_addr_t x)
 
 #endif
 
+#ifndef __virt_to_bus
+#define __virt_to_bus	__virt_to_phys
+#define __bus_to_virt	__phys_to_virt
+#define __pfn_to_bus(x)	__pfn_to_phys(x)
+#define __bus_to_pfn(x)	__phys_to_pfn(x)
+#endif
+
+
 #include <asm-generic/memory_model.h>
 
 #endif
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index a6e757c..eaeb046 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -208,8 +208,10 @@ static dma_addr_t __swiotlb_map_page(struct device *dev, struct page *page,
 
 	dev_addr = swiotlb_map_page(dev, page, offset, size, dir, attrs);
 	if (!is_device_dma_coherent(dev))
+        {
+//                dev_err(dev, "Mapping non coherent DMA %08lx\n", dev_addr );  
 		__dma_map_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
-
+        }
 	return dev_addr;
 }
 
@@ -219,8 +221,11 @@ static void __swiotlb_unmap_page(struct device *dev, dma_addr_t dev_addr,
 				 struct dma_attrs *attrs)
 {
 	if (!is_device_dma_coherent(dev))
+        {
+//                dev_err(dev, "Unmapping coherent DMA %08lx\n", dev_addr );
 		__dma_unmap_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
-	swiotlb_unmap_page(dev, dev_addr, size, dir, attrs);
+	}
+        swiotlb_unmap_page(dev, dev_addr, size, dir, attrs);

Electron752
Posts: 142
Joined: Mon Mar 02, 2015 7:09 pm

Re: Entering aarch64 execution state

Mon Apr 04, 2016 5:26 am

I confirmed that the previous patch is all that is needed to make DMA, USB, and networking work.

Here is a very small patch for what I think is a rather severe issue in the dwc2 driver, one that is not 64-bit specific. It looks like the bug was a simple typo, but it means that USB send/receive to unaligned memory addresses is broken in the stock driver. It may or may not be necessary to apply this fix.

Code:

+++ linux64-rpi3/drivers/usb/dwc2/hcd.c
@@ -870,8 +870,7 @@
 		chan->xfer_dma = urb->dma + urb->actual_length;
 
 		/* For non-dword aligned case */
-		if (hsotg->core_params->dma_desc_enable <= 0 &&
-		    (chan->xfer_dma & 0x3))
+		if (chan->xfer_dma & 0x3)
 			bufptr = (u8 *)urb->buf + urb->actual_length;
 	} else {
 		chan->xfer_buf = (u8 *)urb->buf + urb->actual_length;

At this point, I've tested OpenSSH (client/server) and am able to ssh into the RPI 3 (64-bit). I'm also able to sftp large files to and from the RPI 3. So I'm thinking that a lot of things are just going to work at this point.

Personally, I think getting multiprocessor support working is more useful than video, but that's a personal preference. I noticed that the MMC driver needs some debugging and is causing intermittent performance issues. The reboot command doesn't work well either: the system will halt, but the kernel is missing the magic dust to cause a system reset.

I won't be posting for a few days, since I don't want to generate a lot of noise. I will still be tinkering with this, though...

Electron752
Posts: 142
Joined: Mon Mar 02, 2015 7:09 pm

Re: Entering aarch64 execution state

Mon Apr 04, 2016 6:14 am

Since I can ssh into the RPI 3 now and am less dependent on the serial port, I reran bytecpu with all the overclocking and clock fixes removed from config.txt. Here are the new results.

RPI 3(64 bit)
===========OVERALL============
INTEGER INDEX: 17.584480
FLOATING-POINT INDEX: 7.295657
(90 MHz Dell Pentium = 1.00)
==============================

RPI 3(32 bit)
===========OVERALL============
INTEGER INDEX: 15.779668
FLOATING-POINT INDEX: 7.787629
(90 MHz Dell Pentium = 1.00)
==============================

This is about an 11% improvement over 32-bit for integer and almost nothing for floating point. Since the difference is so small, I can now see why the Foundation/Broadcom left porting Linux to arm64 as a community project.

schorsch76
Posts: 22
Joined: Sun Apr 28, 2013 8:01 am

Re: Entering aarch64 execution state

Mon Apr 04, 2016 6:56 am

The RPi3 is specified at 1.2 GHz, so that is not overclocking.

About the patches: this evening I'll create a repository on GitHub based on the marcus_c/zeldin linux tree. About the dwc2 patch: I'll try to contact the maintainers and ask for their opinion.

11% more performance would be fantastic in the Intel market. 11% more on one core is something noticeable, IMHO.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5161
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Entering aarch64 execution state

Mon Apr 04, 2016 11:41 am

Electron752 wrote: This is about a 11% improvement over 32 bit for integer and almost nothing for floating point. Since the difference is so small, I can see now why the foundation/broadcom left it as a community project to port linux to arm64.
Looks like a 6% reduction in performance for floating point.
That is possible with 64-bit builds, as code/data sizes tend to be larger, so caches may be less effective.

schorsch76
Posts: 22
Joined: Sun Apr 28, 2013 8:01 am

Re: Entering aarch64 execution state

Mon Apr 04, 2016 6:03 pm

Here is the repository with the dma patch:
https://github.com/schorsch1976/linux/tree/rpi3

I created a pull request for marcus_c as this was his branch.

fsck
Posts: 26
Joined: Mon Feb 23, 2015 4:49 pm

Re: Entering aarch64 execution state

Mon Apr 04, 2016 6:03 pm

Try running Linpack. Make sure it's compiled with all AArch64 features and NEON + VFPv4.

Electron752
Posts: 142
Joined: Mon Mar 02, 2015 7:09 pm

Re: Entering aarch64 execution state

Mon Apr 04, 2016 8:09 pm

Today, I was able to get video working in fb mode. It was rather easy once the previous fix was made. The mailbox support and firmware support are included in the stock kernel. All I had to do was add bcm2708_fb.c from the 32-bit kernel and fix up Kconfig and the Makefile. I did disable video DMA for now, just to be safe.

So at this point, I'm trying to get all 4 cores to boot and I'm a bit stumped. In what environment does u-boot launch the kernel? Are all 4 CPUs active at this point?

It looks like all I need to do is change the device tree to set enable-method to "spin-table" and use the address that u-boot is using for the release address. Seems simple enough. The machine still boots, but I get an error that CPUs 2-4 could not be brought online.

Code:

	cpus: cpus {
		#address-cells = <1>;
		#size-cells = <0>;

		v8_cpu0: cpu@0 {
			device_type = "cpu";
			compatible = "arm,cortex-a53", "arm,armv8";
			reg = <0x0>;
			clock-frequency = <1200000000>;
			enable-method = "spin-table";
			cpu-release-addr = <0 0x0FFFFFF0>;
			next-level-cache = <&L2_0>;
		};

		v8_cpu1: cpu@1 {
			device_type = "cpu";
			compatible = "arm,cortex-a53", "arm,armv8";
			reg = <0x1>;
			clock-frequency = <1200000000>;
			enable-method = "spin-table";
			cpu-release-addr = <0 0x0FFFFFF0>;
			next-level-cache = <&L2_0>;
		};

		v8_cpu2: cpu@2 {
			device_type = "cpu";
			compatible = "arm,cortex-a53", "arm,armv8";
			reg = <0x2>;
			clock-frequency = <1200000000>;
			enable-method = "spin-table";
			cpu-release-addr = <0 0x0FFFFFF0>;
			next-level-cache = <&L2_0>;
		};

		v8_cpu3: cpu@3 {
			device_type = "cpu";
			compatible = "arm,cortex-a53", "arm,armv8";
			reg = <0x3>;
			clock-frequency = <1200000000>;
			enable-method = "spin-table";
			cpu-release-addr = <0 0x0FFFFFF0>;
			next-level-cache = <&L2_0>;
		};

		L2_0: l2-cache0 {
			compatible = "cache";
		};

	};

swarren
Posts: 45
Joined: Tue Mar 01, 2016 5:56 am

Re: Entering aarch64 execution state

Mon Apr 04, 2016 8:18 pm

In the ARM stub code I wrote, the CPU release address values are (IIRC) 0xe0, 0xe8, 0xf0. BTW, Eric Anholt mentioned to me on IRC that he'd just tested the spin table (with the mainline kernel I imagine) and found it worked fine.

schorsch76
Posts: 22
Joined: Sun Apr 28, 2013 8:01 am

Re: Entering aarch64 execution state

Mon Apr 04, 2016 8:24 pm

fsck wrote:Try running Linpack. Make sure it's compiled with all AArch64 features and NEON + VFPv4.
This is my Linpack result:

Code:

gcc -O3 -march=armv8-a+crc -mtune=cortex-a53 -ftree-vectorize linpackc.c -o linpack
Pi_3 64_bit ~ $ ./linpack 
Enter array size (q to quit) [200]:  
Memory required:  315K.


LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
     128   0.63  86.31%   2.91%  10.78%  314949.156
     256   1.25  86.32%   2.91%  10.78%  314754.881
     512   2.50  86.31%   2.91%  10.77%  314805.048
    1024   5.01  86.31%   2.91%  10.77%  314749.034
    2048  10.04  86.36%   2.91%  10.74%  313756.603

Enter array size (q to quit) [200]:  400
Memory required:  1255K.


LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 400 X 400.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
      16   0.53  91.80%   1.88%   6.32%  350159.421
      32   1.06  91.78%   1.90%   6.32%  349969.477
      64   2.11  91.79%   1.91%   6.29%  349769.851
     128   4.23  91.79%   1.91%   6.30%  349723.157
     256   8.46  91.80%   1.91%   6.29%  349807.992
     512  16.91  91.79%   1.91%   6.30%  349871.049

Enter array size (q to quit) [200]:  600
Memory required:  2820K.


LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 600 X 600.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
       4   0.57  94.72%   1.34%   3.94%  264407.519
       8   1.14  94.82%   1.38%   3.80%  265486.726
      16   2.28  94.86%   1.36%   3.78%  265056.355
      32   4.56  94.84%   1.36%   3.80%  264995.685
      64   9.12  94.84%   1.37%   3.79%  265222.145
     128  18.24  94.84%   1.37%   3.79%  265174.725

Enter array size (q to quit) [200]:  1000
Memory required:  7824K.


LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 1000 X 1000.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
       1   0.97  97.41%   0.54%   2.04%  176045.740
       2   1.94  97.90%   0.55%   1.55%  175961.030
       4   3.87  97.90%   0.55%   1.55%  175904.679
       8   7.74  97.90%   0.55%   1.55%  175953.366
      16  15.49  97.90%   0.55%   1.55%  175933.069
But AFAIK Electron752 thinks these settings are overclocked....

Electron752
Posts: 142
Joined: Mon Mar 02, 2015 7:09 pm

Re: Entering aarch64 execution state

Mon Apr 04, 2016 9:15 pm

swarren wrote:In the ARM stub code I wrote, the CPU release address values are (IIRC) 0xe0, 0xe8, 0xf0. BTW, Eric Anholt mentioned to me on IRC that he'd just tested the spin table (with the mainline kernel I imagine) and found it worked fine.
I have switched to your bootloader now that I have video working. Do you have a CPU section of the device tree that works with all 4 CPUs?
schorsch76 wrote: This is my Linpack result.

schorsch76: Did you run the test in 32bit mode with the same config.txt settings? I'm curious what the improvement is.

swarren
Posts: 45
Joined: Tue Mar 01, 2016 5:56 am

Re: Entering aarch64 execution state

Mon Apr 04, 2016 9:36 pm

I've only tested the spin table with the test app in my rpi-3-aarch64-demo repo. However, Eric Anholt got it working with the mainline kernel. It looks like that code is here:

https://github.com/anholt/linux/commit/ ... 317b0ecR30
(that's the latest commit in the bcm2837-64 branch in case it gets rebased)

Electron752
Posts: 142
Joined: Mon Mar 02, 2015 7:09 pm

Re: Entering aarch64 execution state

Mon Apr 04, 2016 10:47 pm

swarren wrote:I've only tested the spin table with the test app in my rpi-3-aarch64-demo repo. However, Eric Anholt got it working with the mainline kernel. It looks like that code is here:

https://github.com/anholt/linux/commit/ ... 317b0ecR30
(that's the latest commit in the bcm2837-64 branch in case it gets rebased)
Outstanding....

It took me a few tries to figure out which branches to use and that I had to cat the stub with u-boot.bin, but I now have 4 working CPUs in 64-bit mode, plus video, USB, DMA, and networking. All an RPI addict needs.

Now if I can just figure out how to correctly build the files to automate the boot process in u-boot, so I don't have to type everything in.

What is the expected file name I need to use for the u-boot boot script? A lot of the online web pages refer to it as boot.scr.

swarren
Posts: 45
Joined: Tue Mar 01, 2016 5:56 am

Re: Entering aarch64 execution state

Mon Apr 04, 2016 10:49 pm

printenv will show you the boot scripts. IIRC it searches for both boot.scr.uimg and boot.scr in /boot and / on the partition.

schorsch76
Posts: 22
Joined: Sun Apr 28, 2013 8:01 am

Re: Entering aarch64 execution state

Tue Apr 05, 2016 6:10 am

Electron752 wrote: What is the expected file name I need to use with the boot script for u-boot? A lot of the online web pages refer to it as boot.scr.
I don't know if I understand you correctly (I'm not a native speaker).
I used this in config.txt:

Code:

cat p1/config.txt 
 kernel=u-boot.bin 
 kernel_old=1 
 arm_control=0x200 
 disable_commandline_tags=1 
and I set my bootargs to:

Code:

U-Boot> setenv linux_load 'fatload mmc 0:1 $loadaddr image' 
 U-Boot> setenv fdt_load 'fatload mmc 0:1 $fdt_addr_r $fdtfile' 
 U-Boot> setenv fdtfile 'bcm2837-rpi-3-b.dtb' 
 U-Boot> setenv bootcmd 'run fdt_load ; run linux_load ; booti $loadaddr - $fdt_addr_r' 
 U-Boot> setenv bootargs 'dwc_otg.lpm_enable=0 console=ttyAMA0,115200 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait' 
 U-Boot> saveenv 
When you enter this, it creates a uboot.env file, which will be loaded, and it starts your kernel every time you power it on after 2 seconds.

I have tried Linpack in 32-bit mode with the same optimizations, and the results were nearly half of the 64-bit results. But I haven't tried it with my 64-bit config.txt; I used the default one.

When I ran Linpack in 64-bit mode with the config.txt above, the CPU got warm. But I have also had it reach 89°C in 32-bit mode when I used all 4 cores at max frequency and max CPU load. I have no cooling device attached to the CPU, because the CPU is specified at 1.2 GHz. It was not unstable when I used all 4 cores at max load. I compiled my Gentoo userland and it runs. ;)

schorsch76
Posts: 22
Joined: Sun Apr 28, 2013 8:01 am

Re: Entering aarch64 execution state

Tue Apr 05, 2016 6:25 am

swarren wrote:I've only tested the spin table with the test app in my rpi-3-aarch64-demo repo. However, Eric Anholt got it working with the mainline kernel. It looks like that code is here:

https://github.com/anholt/linux/commit/ ... 317b0ecR30
(that's the latest commit in the bcm2837-64 branch in case it gets rebased)
Do you have any link where I can see what he did? A mailing list?
Thanks

patrikg
Posts: 168
Joined: Sun Mar 18, 2012 10:19 pm

Re: Entering aarch64 execution state

Tue Apr 05, 2016 8:39 am

schorsch76 wrote:
[...]
 U-Boot> setenv bootcmd 'run fdt_load ; run linux_load ; booti $loadaddr - $fdt_addr_r' 
[...]
The run command can take more than one argument, so you can shorten it like this if you want...
from

Code:

run fdt_load ; run linux_load
to

Code:

run fdt_load linux_load
And from what I have seen on the internet, you could also append a reset command after boot.
So if something goes wrong during boot, the board will reset and try again.
