What is a kernel soft lockup?

I keep getting errors like the following from the kernel on one of my systems:

BUG: soft lockup - CPU#0 stuck for 15s!
BUG: soft lockup - CPU#2 stuck for 24s!

As you can see, the CPU ID and the duration change, but the problem is the same. What is the main cause of this?

The Linux kernel starts a watchdog thread for each CPU core in the system. You can see these threads with ps:

$ ps uax | grep watchdog
root        10  0.0  0.0      0     0 ?        S    Feb19   0:00 [watchdog/0]
root        11  0.0  0.0      0     0 ?        S    Feb19   0:00 [watchdog/1]
root        16  0.0  0.0      0     0 ?        S    Feb19   0:00 [watchdog/2]
root        21  0.0  0.0      0     0 ?        S    Feb19   0:00 [watchdog/3]

In the example above, we can see that the kernel itself has created a total of 4 watchdog threads, one per core.

Watchdog tasks are high-priority kernel threads. Each time one is scheduled, it gets the current timestamp and saves the value in a per-CPU data structure.

If that timestamp is not updated for 2 x watchdog_thresh seconds (the soft lockup threshold), the soft lockup detector dumps useful debug information to the system log. After that, it either calls panic, if it was instructed to do so through the kernel.softlockup_panic sysctl variable, or resumes execution of other kernel code.
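
As a rough illustration of the threshold arithmetic, the sketch below reads watchdog_thresh from /proc/sys when that file is readable and doubles it. The helper name softlockup_threshold is made up for this example, and the fallback value of 10 is only a sample value used when the file is missing.

```shell
#!/bin/sh
# Sketch only: softlockup_threshold is a hypothetical helper, not a kernel tool.

# Print the soft lockup threshold (2 x watchdog_thresh) in seconds.
softlockup_threshold() {
    echo $(( 2 * $1 ))
}

# Read the live value if the file is readable; otherwise use 10 as a sample value.
thresh_file=/proc/sys/kernel/watchdog_thresh
if [ -r "$thresh_file" ]; then
    thresh=$(cat "$thresh_file")
else
    thresh=10
fi

echo "watchdog_thresh=${thresh}s, soft lockup threshold=$(softlockup_threshold "$thresh")s"

# To see whether the kernel is set to panic on a soft lockup (0 = no, 1 = yes):
#   sysctl kernel.softlockup_panic
```

With watchdog_thresh set to 10, for example, the detector would report a CPU once its timestamp has not been updated for 20 seconds.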

User-mode processes cannot cause soft lockups, because the kernel always reschedules them. Soft lockups only occur in kernel-mode code, for example in a buggy driver.
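
When tracking down the offending code, it can help to see which CPUs are reported and for how long. The sketch below pulls the CPU ID and duration out of messages in the format shown in the question. The function name summarize_lockups is hypothetical, and the pattern assumes the message wording matches the two lines quoted above.

```shell
#!/bin/sh
# Sketch only: summarize_lockups is a hypothetical helper name.

# Read kernel log text on stdin and print "CPU <id>: <seconds>s"
# for every soft lockup message; all other lines are ignored.
summarize_lockups() {
    sed -n 's/.*BUG: soft lockup - CPU#\([0-9][0-9]*\) stuck for \([0-9][0-9]*\)s!.*/CPU \1: \2s/p'
}

# Example with the two messages from the question:
printf '%s\n' \
    'BUG: soft lockup - CPU#0 stuck for 15s!' \
    'BUG: soft lockup - CPU#2 stuck for 24s!' | summarize_lockups

# On a live system the input would typically come from the kernel log:
#   dmesg | summarize_lockups
```

The lines that follow each soft lockup message in the system log are the debug information mentioned above, which is usually where the responsible kernel code shows up.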