Updated on 2025-04-08

Troubleshooting Erlang's on_load_function_failed error

Overview

While doing build and optimization work involving MongoDB, I found that the package I had just built could no longer start. The log contained the following:

{"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,fast_pbkdf2}}},{kernel,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,fast_pbkdf2}}},{kernel,start,[normal,[]]}}}

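The cause turned out to be a change in the packaging machine's environment. Running ldd on fast_pbkdf2.so in the failing package showed a dependency that could not be resolved: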

root@xxxxx:/srv/apps/xxxxx/lib/fast_pbkdf2-1.0.5/priv# ldd fast_pbkdf2.so 
        .1 =>  (0x00007ffc71bc6000)
        .1.1 => not found
        .6 => /lib/x86_64-linux-gnu/.6 (0x00007faf60b69000)
        /lib64/.2 (0x00007faf61139000)

Troubleshooting process

  • Rolled the code back to a revision whose package was known to start, then repackaged it. The new package still failed to start.
  • That pointed to a change in the packaging machine's environment, so I contacted the colleague who administers that machine.
  • Called the package that runs a and the package that does not run b, and compared the two.
  • Both a and b ran normally in the development environment.
  • Compared the symbols of fast_pbkdf2.so in a and b with the nm command.
  • Opened the crash dump in the Erlang crashdump viewer; it contained no useful information.
  • The colleague administering the packaging machine found that rolling back the MongoDB installation fixed the problem.
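As a complement to the nm comparison in the steps above, the two builds can also be compared by their direct shared-library dependencies rather than by their symbols. The following is a minimal sketch, assuming readelf from binutils is installed; the function names and the paths in the usage line are hypothetical and not taken from the original investigation.

```shell
# Read `readelf -d` output on stdin and print the names of the
# NEEDED (directly linked) shared libraries, sorted.
parse_needed() {
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p' | sort
}

# Print NEEDED entries that differ between two ELF files.
# Lines starting with "<" come from the first file, ">" from the second.
diff_needed() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    readelf -d "$1" | parse_needed > "$tmp_a"
    readelf -d "$2" | parse_needed > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    rc=$?
    rm -f "$tmp_a" "$tmp_b"
    return $rc
}
```

With both packages unpacked side by side, something like `diff_needed a/priv/fast_pbkdf2.so b/priv/fast_pbkdf2.so` (placeholder paths) would list any library version that changed between the builds.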

The ldd command revealed the real problem: after MongoDB was installed on the packaging machine, fast_pbkdf2.so was linked against .1.1, whereas it had previously been linked against .1.0.0. My local development environment has .1.1, so both a and b ran there. The production container has only .1.0.0, so b failed to start.
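The same ldd check can be applied to every shared object in a package before it ships. The sketch below is one possible way to do that, not the procedure used in the original investigation; the function names are hypothetical, and it assumes paths contain no whitespace.

```shell
# Filter `ldd` output on stdin down to the names of libraries
# that the dynamic linker could not resolve.
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}

# Run ldd on every .so file under the given directory and report
# unresolved dependencies. Returns non-zero if any were found.
# Assumption: file paths contain no whitespace.
check_release() {
    status=0
    for so in $(find "$1" -type f -name '*.so'); do
        missing=$(ldd "$so" 2>/dev/null | unresolved_libs)
        if [ -n "$missing" ]; then
            echo "$so: missing $missing"
            status=1
        fi
    done
    return $status
}
```

The check is only meaningful when it runs in the environment where the package will be deployed, or in an image that matches it. In this incident the development machine had the newer library, so the same check run there would have passed.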

Code

OTP-24.1
:1465

When a module's on_load function fails, Erlang exits with on_load_function_failed. The error carries only the name of the module that failed to load and no further detail about the cause.

run_on_load_handlers([M|Ms], Debug) ->
    debug(Debug, {running_on_load_handler,M}),
    %% Run the on_load function in a separate process; its return
    %% value becomes that process's exit reason.
    Fun = fun() ->
                  Res = erlang:call_on_load_function(M),
                  exit(Res)
          end,
    {Pid,Ref} = spawn_monitor(Fun),
    receive
        {'DOWN',Ref,process,Pid,OnLoadRes} ->
            Keep = OnLoadRes =:= ok,
            erlang:finish_after_on_load(M, Keep),
            case Keep of
                false ->
                    %% Only the module name is reported; OnLoadRes,
                    %% the actual failure reason, is not included.
                    Error = {on_load_function_failed,M},
                    debug(Debug, Error),
                    exit(Error);
                true ->
                    debug(Debug, {on_load_handler_returned_ok,M}),
                    run_on_load_handlers(Ms, Debug)
            end
    end;
run_on_load_handlers([], _) -> ok.

Conclusion

  • When Erlang reports on_load_function_failed for a module backed by a dynamic library, troubleshoot it as a C/C++ dynamic library loading problem.
  • Build packages in an environment consistent with production, for example by packaging with the same Docker image that production runs.
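One way to follow the second point is to run the build step inside the production image, so that native libraries are linked against what that image provides. This is a minimal sketch under stated assumptions: the helper name, the /build mount point, and the image and build command in the usage line are all hypothetical and would need to match your own setup.

```shell
# Run a build command inside a given image, with the current
# directory mounted at /build (a hypothetical mount point).
# The image name and the build command are supplied by the caller.
build_in_image() {
    image=$1
    shift
    # DOCKER can be overridden, for example DOCKER=echo, for a dry run
    # that prints the arguments instead of starting a container.
    ${DOCKER:-docker} run --rm -v "$PWD":/build -w /build "$image" "$@"
}
```

For example, `build_in_image my-prod-image:tag make release` (placeholder image and command) would produce the package inside a container started from that image.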
