declanmalone
Posts: 9
Joined: Fri Nov 23, 2012 5:25 pm

kexec doesn't seem to be working

Tue Dec 04, 2012 8:59 pm

I've got a small (4-Pi) cluster and I want to write my own boot manager (Linux + embedded initramfs) to replace the kernel_emergency.img file (jumpering pins 5 and 6 on the GPIO header to boot into it by default), so that I can control the boot process, back up the machine, re-image it, fix filesystems and so on over either sshd or a web interface. I've got a basic "rescue" kernel in place, but I haven't been able to get it to boot into the "real" raspbian image using kexec. Nor can I kexec into another kernel, even from the official kernel.img.
I've read all the forum posts here that mention 'kexec', and I'm pretty sure I'm doing everything correctly. I also saw that support for kexec was added back in 3.2.18 but it's just not working for me.
When I run

Code:

sudo kexec --append "$(</proc/cmdline)" /boot/kernel.img
The system shuts down normally, then I get a message about trying to load the new kernel, but I never get any messages from that new kernel at all.
I've exhausted every possible test I could think of, but nothing works. I'm running the very latest (as of yesterday) firmware/kernel that I downloaded using the rpi-update script, and it definitely has kexec support built in.

I've also tried changing around the kernel boot parameters, omitting some, adding others (particularly debug, loglevel=7, and initcall_debug) but I still don't get a peep out of the new kernel. In all cases it just hangs and I have to power cycle.
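
For anyone trying to reproduce this, one way to separate the stages is to load the kernel and execute it in two steps instead of one. The following is a minimal dry-run sketch, assuming the kexec-tools options `-l` (load), `--append` and `-e` (execute). The helper name `build_kexec_load` and the shortened command line are illustrative only and do not come from the post. It only prints the commands, so nothing is loaded or rebooted.

```shell
#!/bin/sh
# Dry-run sketch: print the two kexec steps instead of running them.
# build_kexec_load is a hypothetical helper name, not part of kexec-tools.

build_kexec_load() {
    kernel=$1
    cmdline=$2
    # -l loads the image without executing it;
    # --append sets the command line for the new kernel.
    printf 'kexec -l %s --append="%s"\n' "$kernel" "$cmdline"
}

# Step 1: load (shortened, assumed command line for illustration).
build_kexec_load /boot/kernel.img "console=ttyAMA0,115200 root=/dev/mmcblk0p2 rootwait"

# Step 2: execute the previously loaded kernel.
echo "kexec -e"
```

Note that `kexec -e` jumps to the loaded kernel immediately without running the normal shutdown scripts, so filesystems should be synced and unmounted (or remounted read-only) first if the printed commands are run for real.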

At one point early on I did manage to get kexec to come up with "Bye" after trying to load the new kernel, but I don't know how to reproduce that behaviour. Not that that would be of any use, but I thought I should mention it anyway.

Here's a dump of all the relevant info I can think of from the system:

Code:

$ uname -a
Linux hamilton 3.2.27+ #307 PREEMPT Mon Nov 26 23:22:29 GMT 2012 armv6l GNU/Linux

$ zcat /proc/config.gz | grep -i kexec
CONFIG_KEXEC=y

$ cat /proc/cmdline 
dma.dmachans=0x7f35 bcm2708_fb.fbwidth=1824 bcm2708_fb.fbheight=984 bcm2708.boardrev=0xe bcm2708.serial=0x282fc977 smsc95xx.macaddr=B8:27:EB:2F:C9:77 sdhci-bcm2708.emmc_clock_freq=100000000 vc_mem.mem_base=0x1c000000 vc_mem.mem_size=0x20000000  dwc_otg.lpm_enable=0 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait

$ cat /proc/cpuinfo 
BogoMIPS	: 697.95 (no overclock)
Revision	: 000e

$ free
             total       used       free     shared    buffers     cached
Mem:        448948      92088     356860          0      11392      60276

$ perl -nle 'print if /^\d+[= ]/' /boot/config.txt 
(no output: I'm using default config with no edits)

$ md5sum /boot/* # emergency kernel is mine, the rest are from rpi-update
a990c6bcb62a26880609b805ca9e71bb  /boot/bootcode.bin
a51b441bfdde643f237ad2fa77027052  /boot/cmdline.txt
65b1f32223e8609ff5d0e80afb435285  /boot/config.emergency
5a17b9828796861050b3b17465a8ab1d  /boot/config.txt
7c25ed1c16092063bc5fd0c64d5a0661  /boot/fixup_cd.dat
1935f75fdf989a78a9f99dc738e09626  /boot/fixup.dat
68ccad657720710325ce5c3890860a02  /boot/issue.txt
6227ede075298b5f6d124b8bfcad3032  /boot/kernel_cutdown.img
d574ffcbfbe87d9ca40fbb3caab0086f  /boot/kernel_emergency.img
a27b345e85ab3e1d8db530bd805356dc  /boot/kernel_emergency.old
c13c0fa41d86cb4f397129b11214ce65  /boot/kernel_emergency.sav
1794be00832e1f909b39c2412fb759e4  /boot/kernel.img
6e5296cb3f82310d7e8ac535bcbb387f  /boot/start_cd.elf
5d192f847eb1aa55967ed6727f9e80aa  /boot/start.elf
I'd really love to get this working so that I can move on to writing my boot manager. I know that I can just have my boot kernel modify the startup scripts to boot into raspbian, but then I have to have that system change the boot parameters back again afterwards and it's really messy. The solution with the jumpered GPIO automatically booting into the emergency kernel (and having that kexec the real kernel unless it detects it has to stick around and let me do maintenance) is so much neater, since none of the system config files need to be changed apart from my emergency kernel image. So, sorry if this is a bit long, but I'd really appreciate it if someone could help figure out why kexec isn't working and what can be done to fix it.
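
The intended flow (emergency kernel boots first, then hands over unless maintenance is wanted) could be sketched roughly as follows. This is an illustration only: the flag file path `/boot/maintenance.flag` and the function name `choose_boot_action` are hypothetical, and the post does not say how maintenance mode would actually be detected.

```shell
#!/bin/sh
# choose_boot_action FLAG_FILE
# Prints "maintenance" if the (hypothetical) flag file exists,
# otherwise prints "kexec".
choose_boot_action() {
    if [ -e "$1" ]; then
        echo maintenance
    else
        echo kexec
    fi
}

action=$(choose_boot_action /boot/maintenance.flag)
case $action in
    maintenance) echo "staying in rescue kernel (would start sshd / web interface here)" ;;
    kexec)       echo "would kexec /boot/kernel.img here" ;;
esac
```

Keeping the decision in a single function means the detection method (flag file, GPIO state, network request) can be swapped without touching the hand-over step.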
Thanks for reading!

declanmalone
Posts: 9
Joined: Fri Nov 23, 2012 5:25 pm

Re: kexec doesn't seem to be working

Wed Dec 05, 2012 2:36 am

I've done some more investigation of what's happening here. With my new emergency kernel (built using the instructions on the RPi wiki) I made a few modifications of the kernel source (my first--yay!) to show what's happening with the kexec calls. Basically I just added some printks, but also changed them from KERN_INFO to KERN_EMERG because I wasn't seeing the new messages on my console.
The changes to kernel_kexec() in kernel/kexec.c are just a couple of printks with the new loglevel (marked with '+'):

Code:

                printk(KERN_EMERG "Starting new kernel\n");
                machine_shutdown();
+                printk(KERN_EMERG "did machine_shutdown()\n");
        }

        machine_kexec(kexec_image);

#ifdef CONFIG_KEXEC_JUMP
        if (kexec_image->preserve_context) {
                syscore_resume();
 Enable_irqs:
                local_irq_enable();
 Enable_cpus:
                enable_nonboot_cpus();
                dpm_resume_noirq(PMSG_RESTORE);
 Resume_devices:
                dpm_resume_end(PMSG_RESTORE);
 Resume_console:
                resume_console();
                thaw_processes();
 Restore_console:
                pm_restore_console();
                mutex_unlock(&pm_mutex);
        }
#endif
+                printk(KERN_EMERG "did machine_kexec()\n");
The #ifdef CONFIG_KEXEC_JUMP code section doesn't seem (based on reading the entire function) to be executed, so I think it can be safely ignored.
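
On the KERN_INFO messages not showing up: whether a printk line reaches the console depends on the console loglevel, which is the first of the four values in /proc/sys/kernel/printk. Only messages whose level is numerically lower than it are shown. Below is a small sketch of that comparison. The helper name `reaches_console` is made up for illustration, and the fallback value of 7 is an assumption for systems where the file cannot be read.

```shell
#!/bin/sh
# reaches_console MSG_LEVEL CONSOLE_LOGLEVEL
# Succeeds when a message at MSG_LEVEL would be printed on the console,
# i.e. when it is numerically lower than the console loglevel.
reaches_console() {
    [ "$1" -lt "$2" ]
}

# The first field of /proc/sys/kernel/printk is the current console loglevel.
# Fall back to 7 (assumed) if the file is not readable.
console_level=7
if [ -r /proc/sys/kernel/printk ]; then
    read -r console_level _rest < /proc/sys/kernel/printk || true
fi

for lvl in 0 6; do   # 0 = KERN_EMERG, 6 = KERN_INFO
    if reaches_console "$lvl" "$console_level"; then
        echo "level $lvl: shown on console"
    else
        echo "level $lvl: suppressed on console"
    fi
done
```

KERN_EMERG (0) is lower than any usable console loglevel, which is consistent with the switch to KERN_EMERG making the messages visible.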

The other changes (just more printks) were to machine_kexec() in arch/arm/kernel/machine_kexec.c, which is called from kernel_kexec() above:

Code:

void machine_kexec(struct kimage *image)
{
        unsigned long page_list;
        unsigned long reboot_code_buffer_phys;
        void *reboot_code_buffer;


        page_list = image->head & PAGE_MASK;

        /* we need both effective and real address here */
        reboot_code_buffer_phys =
            page_to_pfn(image->control_code_page) << PAGE_SHIFT;
        reboot_code_buffer = page_address(image->control_code_page);

        /* Prepare parameters for reboot_code_buffer*/
        kexec_start_address = image->start;
        kexec_indirection_page = page_list;
        kexec_mach_type = machine_arch_type;
        kexec_boot_atags = image->start - KEXEC_ARM_ZIMAGE_OFFSET + KEXEC_ARM_ATAGS_OFFSET;
+        printk(KERN_EMERG "About to memcpy()\n");

        /* copy our kernel relocation code to the control code page */
        memcpy(reboot_code_buffer,
               relocate_new_kernel, relocate_new_kernel_size);

+        printk(KERN_EMERG "About to flush_icache_range()\n");

        flush_icache_range((unsigned long) reboot_code_buffer,
                           (unsigned long) reboot_code_buffer + KEXEC_CONTROL_PAGE_SIZE);
+        printk(KERN_EMERG "Bye!\n");

        if (kexec_reinit)
+        {
+                printk(KERN_EMERG "We must do kexec_reinit\n");
                kexec_reinit();
+                printk(KERN_EMERG "did kexec_reinit()\n");
+        }
        local_irq_disable();
        local_fiq_disable();
        setup_mm_for_reboot(0); /* mode is not used, so just pass 0*/
        flush_cache_all();
        outer_flush_all();
        outer_disable();
        cpu_proc_fin();
        outer_inv_all();
        flush_cache_all();

+        printk(KERN_EMERG "About to cpu_reset()\n");
        cpu_reset(reboot_code_buffer_phys);
}
When I rebuild the kernel and boot from it (then continue to boot into the regular raspbian using pivot_root) and try to run kexec again, I see this output:

Code:

[  52.xxx] Starting new kernel
[  52.xxx] did machine_shutdown()
[  52.xxx] About to memcpy()
[  52.xxx] About to flush_icache_range()
[  52.xxx] Bye!
[  52.xxx] About to cpu_reset()
And then it hangs as usual.

So two things are clear. First (maybe not important), the if statement deciding whether to call kexec_reinit() never triggered. Second (the main point), we can see that we called cpu_reset(), but it seems to hang or get lost somewhere in the trampoline/new kernel image/whatever it's doing in there, because we never see the "did machine_kexec()" message printed from the calling routine (which would be the next statement executed if cpu_reset() ever returned), nor anything from the new kernel.

I haven't looked at the code for cpu_reset yet. Chances are that it's beyond my ability to understand what it's doing, but at least it seems clear that that is where the kexec process is derailing.
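
As a side note on the `kexec_boot_atags` line in machine_kexec() above: it is plain address arithmetic relative to the new kernel's entry point. The sketch below reproduces it in shell. It assumes 0x8000 for KEXEC_ARM_ZIMAGE_OFFSET and 0x1000 for KEXEC_ARM_ATAGS_OFFSET (values recalled from arch/arm/include/asm/kexec.h and worth verifying against your own tree), and an entry point of 0x8000, which is not stated in the post. `atags_addr` is a hypothetical helper name.

```shell
#!/bin/sh
# atags_addr START
# Mirrors: image->start - KEXEC_ARM_ZIMAGE_OFFSET + KEXEC_ARM_ATAGS_OFFSET
# The two offsets below are assumed values; check them in your kernel source.
atags_addr() {
    printf '0x%x\n' $(( $1 - 0x8000 + 0x1000 ))
}

# Assumed entry point of 0x8000 (not taken from the post).
atags_addr 0x8000   # prints 0x1000
```

Comparing this address with where the firmware actually placed the boot tags may be a useful extra data point when debugging, though nothing in the output above shows that it is the cause of the hang.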

I hope that maybe this will help someone to narrow down the cause of the problem.

remsnet
Posts: 151
Joined: Wed Dec 19, 2012 7:32 pm
Location: Planet Gaia

Re: kexec doesn't seem to be working

Tue Dec 24, 2013 4:27 pm

As the RPi has no BIOS, it will hang; this has been a known issue for as long as the RPi has existed.
There is no working boot helper / boot menu / GRUB / ... yet.

To safely replace an RPi kernel I use this script:

Code:

#!/bin/sh

# Timestamped backup directory for the current /boot contents
D=$(date +%Y%m%d_%H%M)
mkdir "/boot.old-$D"

echo "Backing up /boot to /boot.old-$D"
cp -p /boot/* "/boot.old-$D/"

# Stop here if the kernel source tree is missing
cd /usr/src/linux || exit 1
kversion=$(make -s kernelrelease)

echo "Installing self-compiled $kversion kernel"
cp System.map "/boot/System.map-$kversion"
cp System.map /boot/System.map

make ARCH=arm modules_install INSTALL_MOD_PATH=/

make ARCH=arm INSTALL_PATH=/boot/ install

cp .config "/boot/config-$kversion"
cp ./Module.symvers "/boot/symvers-$kversion"
cp arch/arm/boot/Image /boot/kernel.img

echo "You can reboot now."
echo "In case of failures, copy back the backup from /boot.old-$D"
echo "using an SD card reader and an Arm5 live CD within an OS emulator like VMware Player."
echo


Hope this helps.
