Could not load mining kernel

Something is going horribly wrong for me in the mining kernel. I'm running this on arm64 with 32 GB of RAM and the same amount of swap.

2025-05-22T12:36:38.306762Z  INFO poke{src="timer"}:do_poke:slam:interpret: slogger: candidate block timestamp updated: 0x8000000d36cd27d6
2025-05-22T12:36:38.306762Z DEBUG next_effect: nockapp::nockapp::driver: Waiting for recv on next effect
2025-05-22T12:36:38.306814Z DEBUG next_effect: nockapp::nockapp::driver: Waiting for recv on next effect

thread 'serf' panicked at crates/nockvm/rust/nockvm/src/mem.rs:301:23:
Box<dyn Any>

thread 'tokio-runtime-worker' panicked at crates/nockchain/src/mining.rs:175:14:
Could not load mining kernel: OneshotChannelError(RecvError(()))
2025-05-22T12:36:38.309510Z  WARN nockchain::mining: Error during mining attempt: JoinError::Panic(Id(2932), "Could not load mining kernel: OneshotChannelError(RecvError(()))", ...)

It’s related to this code in mem.rs. The Box<dyn Any> panic at mem.rs:301 in the log above looks like new calling panic_any when new_ returns an error:

    /**  Initialization:
     * The initial frame is a west frame. When the stack is initialized, a number of slots is given.
     * We add three extra slots to store the “previous” frame, stack, and allocation pointer. For the
     * initial frame, the previous allocation pointer is set to the beginning (low boundary) of the
     * arena, the previous frame pointer is set to NULL, and the previous stack pointer is set to NULL.
     * size is in 64-bit (i.e. 8-byte) words.
     * top_slots is how many slots to allocate to the top stack frame.
     */
    pub fn new(size: usize, top_slots: usize) -> NockStack {
        let result = Self::new_(size, top_slots);
        match result {
            Ok((stack, _)) => stack,
            Err(e) => std::panic::panic_any(e),
        }
    }

    pub fn new_(size: usize, top_slots: usize) -> Result<(NockStack, usize), NewStackError> {
        if top_slots + RESERVED > size {
            return Err(NewStackError::StackTooSmall);
        }
        let free = size - (top_slots + RESERVED);
        #[cfg(feature = "mmap")]
        let mut memory = Memory::allocate(AllocType::Mmap, size)?;
        #[cfg(feature = "malloc")]
        let mut memory = Memory::allocate(AllocType::Malloc, size)?;
        let start = memory.as_mut_ptr() as *mut u64;

        // Here, frame_offset < alloc_offset, so the initial frame is West
        let frame_offset = RESERVED + top_slots;
        let stack_offset = frame_offset;
        // FIXME: This was alloc_offset = size; why?
        let alloc_offset = size;

        unsafe {
            // Store previous frame/stack/alloc info in reserved slots
            let prev_frame_slot = frame_offset - (FRAME + 1);
            let prev_stack_slot = frame_offset - (STACK + 1);
            let prev_alloc_slot = frame_offset - (ALLOC + 1);

            *(start.add(prev_frame_slot)) = ptr::null::<u64>() as u64; // "frame pointer" from "previous" frame
            *(start.add(prev_stack_slot)) = ptr::null::<u64>() as u64; // "stack pointer" from "previous" frame
            *(start.add(prev_alloc_slot)) = start as u64; // "alloc pointer" from "previous" frame
        };

        assert_eq!(alloc_offset - stack_offset, free);
        Ok((
            NockStack {
                start: start as *const u64,
                size,
                frame_offset,
                stack_offset,
                alloc_offset,
                memory,
                pc: false,
            },
            free,
        ))
    }

I’d love to learn more about why this happens and how to avoid it. It’s not sporadic; it happens every time I try to start PoW.

logs/min2-1747915616.log:2025-05-22T12:40:27.525940Z  INFO poke{src="libp2p"}:do_poke:slam:interpret: slogger: [%mining-on 14.013.155.469.355.287.694 17.658.163.466.538.601.719 16.139.960.547.538.818.049 13.146.085.519.865.444.801 3.604.770.390.141.248.621]
logs/min2-1747915616.log:thread 'tokio-runtime-worker' panicked at crates/nockchain/src/mining.rs:175:14:
logs/min2-1747915616.log:Could not load mining kernel: OneshotChannelError(RecvError(()))
logs/min2-1747915616.log:2025-05-22T12:40:27.692917Z  WARN nockchain::mining: Error during mining attempt: JoinError::Panic(Id(3138), "Could not load mining kernel: OneshotChannelError(RecvError(()))", ...)

What operating system are you running this from?

This is on Ubuntu 24.04. What are you thinking?


[All Irrelevant, See Next Post]

I can spin up a virtual machine and see if it happens for me.
For Reproducibility:

https://cdimage.ubuntu.com/daily-live/20240421/noble-desktop-arm64.iso

The above was strange; trying a server ISO install instead:

https://cdimage.ubuntu.com/releases/24.04/release/ubuntu-24.04.2-live-server-arm64.iso

The thing about Linux on ARM, from what I’ve seen, is that the repositories aren’t always the same, and software behaves differently, especially Rust*. Things that build on amd64 for me sometimes break on arm64.

*: Alpine Linux with its non-GNU libs has similar issues


Hey @grilledasparagus, based on your screenshot in the other thread, you also have this issue! Or at least had it this morning. Definitely let us know if it’s gone now and you’re generating actual proofs.

I am experiencing the same issue on Debian 12. My machine has 64GB of RAM.


AMD or ARM?

Mine is AMD.


So I spun up that virtual machine, but before I even started building Hoon I grepped for "mining" in the logs on my original machine and also saw this:

2025-05-22T18:09:56.265749Z  INFO poke{src="libp2p"}:do_poke:slam:interpret: slogger: [%mining-on 2.355.513.181.070.318.655 809.918.659.070.895.438 1.802.357.504.238.026 8.368.239.197.549.738.390 9.645.348.589.553.451.187]
thread 'tokio-runtime-worker' panicked at crates/nockchain/src/mining.rs:175:14:
Could not load mining kernel: OneshotChannelError(RecvError(()))
2025-05-22T18:09:56.396984Z  WARN nockchain::mining: Error during mining attempt: JoinError::Panic(Id(558), "Could not load mining kernel: OneshotChannelError(RecvError(()))", ...)

It’s around crates/nockchain/src/mining.rs:175; the error is at the kernel = expression under the "Spawns a new std::thread" comment:

pub async fn mining_attempt(candidate: NounSlab, handle: NockAppHandle) -> () {
    let snapshot_dir =
        tokio::task::spawn_blocking(|| tempdir().expect("Failed to create temporary directory"))
            .await 
            .expect("Failed to create temporary directory");
    let hot_state = zkvm_jetpack::hot::produce_prover_hot_state();
    let snapshot_path_buf = snapshot_dir.path().to_path_buf();
    let jam_paths = JamPaths::new(snapshot_dir.path());
    // Spawns a new std::thread for this mining attempt
    let kernel =
        Kernel::load_with_hot_state_huge(snapshot_path_buf, jam_paths, KERNEL, &hot_state, false)
            .await
            .expect("Could not load mining kernel");
    let effects_slab = kernel
        .poke(MiningWire::Candidate.to_wire(), candidate)
        .await
        .expect("Could not poke mining kernel with candidate");
    for effect in effects_slab.to_vec() {
        let Ok(effect_cell) = (unsafe { effect.root().as_cell() }) else {
            drop(effect);
            continue;
        };
        if effect_cell.head().eq_bytes("command") {
            handle
                .poke(MiningWire::Mined.to_wire(), effect)
                .await
                .expect("Could not poke nockchain with mined PoW");
        }
    }
}

Annoying, isn’t it? If you look at the logs in my first post, we’re dealing with the same execution path. Kernel::load_with_hot_state_huge calls SerfThread::new, asking it to allocate 32 GB of memory (or at least enough for a 32 GB Nock stack). NockStack is what actually allocates the memory.

Or in our case, it doesn’t…

I’ve already notified @logan about this error on Telegram.


It’s really hard to keep track of that chat while debugging this myself. Please update us here if you come across more info from affected users or the team. They might push a fix, but given everything they have on their hands right now, I’m sure they’d appreciate having the relevant info gathered here in this thread.


Great stuff in Telegram on this:

The issue here is that Linux disallows obvious overcommits for MAP_ANONYMOUS mappings by default. The NockStack mmaps 128 GB, which is more than most systems can commit. Anybody affected by this issue should run sudo sysctl -w vm.overcommit_memory=1 and try again.

Notably, this overcommit limit does not affect MAP_SHARED mappings, which is why we can map 1 TB for LMDB in vere without issue.
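
For anyone who wants to see the failure in isolation, here is a minimal sketch (assuming the libc crate; this is not repo code) that makes the same shape of request: one 128 GB anonymous, private mapping. With the default vm.overcommit_memory=0 on a machine with far less than 128 GB of RAM plus swap, the mmap call is refused with ENOMEM; with vm.overcommit_memory=1 it succeeds, because nothing is committed until pages are actually touched.

    use libc::{mmap, munmap, MAP_ANONYMOUS, MAP_FAILED, MAP_PRIVATE, PROT_READ, PROT_WRITE};
    use std::{io, ptr};

    fn main() {
        // 128 GB of anonymous, private address space. No pages are touched,
        // so nothing is resident yet; only the commit accounting matters.
        let size: usize = 128usize << 30;
        let addr = unsafe {
            mmap(
                ptr::null_mut(),
                size,
                PROT_READ | PROT_WRITE,
                MAP_ANONYMOUS | MAP_PRIVATE,
                -1,
                0,
            )
        };
        if addr == MAP_FAILED {
            // Under heuristic overcommit (mode 0) this is where ENOMEM shows up.
            eprintln!("mmap of 128 GB failed: {}", io::Error::last_os_error());
        } else {
            println!("mmap of 128 GB succeeded at {:p}", addr);
            unsafe { munmap(addr, size) };
        }
    }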

I would PR this to the repo README but I can’t. Here’s the commit, Logan.


This is very promising, thank you!

vm.overcommit_memory=1
Always overcommit. Appropriate for some scientific applications. Classic example is code using sparse arrays and just relying on the virtual memory consisting almost entirely of zero pages.

Clearly applies in this context.
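
One practical note: sysctl -w only changes the running kernel, so to survive a reboot the setting also needs to go into /etc/sysctl.conf or a file under /etc/sysctl.d/. If you want to verify what your machine is currently using, here is a tiny sketch (again, not part of nockchain) that just reads the policy out of procfs:

    use std::fs;

    fn main() -> std::io::Result<()> {
        // sysctl -w vm.overcommit_memory=1 writes this file; read it back to verify.
        let mode = fs::read_to_string("/proc/sys/vm/overcommit_memory")?;
        match mode.trim() {
            "0" => println!("0: heuristic overcommit (default); huge anonymous mappings may be refused"),
            "1" => println!("1: always overcommit; the 128 GB NockStack mapping should succeed"),
            "2" => println!("2: never overcommit; the mapping will almost certainly be refused"),
            other => println!("unexpected value: {other}"),
        }
        Ok(())
    }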


Works on my machine now

Many thanks. Your wisdom is appreciated.

This solved my problem - now I’m off mining. Thanks!


Love to see it
