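I am running an OpenMP program (gcc and libgomp on CentOS 8.5). Checking it with strace, I found that the clone syscall is called over and over again (part of the log is pasted below). I believe this means the OpenMP threads are being re-created repeatedly, because all the other, non-OpenMP threads are fixed in number and are all initialized at the very beginning of main.
But I also tried writing a simple OpenMP program, and there OpenMP seems to create a thread pool during initialization and reuse it afterwards.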
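One way to repeat that simple test is to record which kernel thread (the TID that strace prints as the return value of clone()) runs each OpenMP thread in several consecutive parallel regions. The following is a minimal, Linux-only sketch. The names record_tids, pool_reused, row_contains and MAX_THREADS are hypothetical, and the file also builds without -fopenmp, in which case everything simply runs on the main thread.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 64 /* hypothetical upper bound on team size */

/* Kernel thread ID of the caller: the same number strace shows as the
   return value of the clone() call that created this thread. */
static long current_tid(void)
{
    return (long)syscall(SYS_gettid);
}

/* Runs `nregions` parallel regions of `nthreads` threads each and stores,
   per region, the kernel TID observed by each OpenMP thread number. */
static void record_tids(int nregions, int nthreads, long tids[][MAX_THREADS])
{
    (void)nthreads; /* only used by the pragma when built with -fopenmp */
    for (int r = 0; r < nregions; r++) {
        for (int t = 0; t < MAX_THREADS; t++)
            tids[r][t] = 0;
#pragma omp parallel num_threads(nthreads)
        {
#ifdef _OPENMP
            int t = omp_get_thread_num();
#else
            int t = 0;
#endif
            if (t < MAX_THREADS)
                tids[r][t] = current_tid();
        }
    }
}

static bool row_contains(const long *row, long tid)
{
    for (int t = 0; t < MAX_THREADS; t++)
        if (row[t] == tid)
            return true;
    return false;
}

/* True if every TID seen in a later region already ran in the first region,
   i.e. no new kernel thread showed up between regions. */
static bool pool_reused(int nregions, long tids[][MAX_THREADS])
{
    for (int r = 1; r < nregions; r++)
        for (int t = 0; t < MAX_THREADS; t++)
            if (tids[r][t] != 0 && !row_contains(tids[0], tids[r][t]))
                return false;
    return true;
}
```

Built with gcc -fopenmp, filling long tids[10][MAX_THREADS] via record_tids(10, 5, tids) and then calling pool_reused(10, tids) should give true if libgomp keeps reusing its pool, as the simple test suggests; running the same binary under strace -f -e trace=clone shows whether only one batch of clone() calls appears. The kernel can recycle a TID after a thread exits, so a true result is a strong hint rather than proof.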
So my question is: under what circumstances do libgomp threads terminate and get re-created?
clone(child_stack=0x7f16bff89ef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265184], tls=0x7f16bff8f700, child_tidptr=0x7f16bff8f9d0) = 3265184
sched_setaffinity(3265184, 16, [8]) = 0
futex(0x7f16bff8fd18, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x4012f184, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x4012f184, FUTEX_WAKE_PRIVATE, 2147483647) = 0
clone(child_stack=0x7f16c1f8def0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265185], tls=0x7f16c1f93700, child_tidptr=0x7f16c1f939d0) = 3265185
sched_setaffinity(3265185, 16, [2]) = 0
futex(0x7f16c1f93d18, FUTEX_WAKE_PRIVATE, 1) = 1
clone(child_stack=0x7f16c078aef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265186], tls=0x7f16c0790700, child_tidptr=0x7f16c07909d0) = 3265186
sched_setaffinity(3265186, 16, [4]) = 0
futex(0x7f16c0790d18, FUTEX_WAKE_PRIVATE, 1) = 1
clone(child_stack=0x7f16bff89ef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265187], tls=0x7f16bff8f700, child_tidptr=0x7f16bff8f9d0) = 3265187
sched_setaffinity(3265187, 16, [6]) = 0
futex(0x7f16bff8fd18, FUTEX_WAKE_PRIVATE, 1) = 1
clone(child_stack=0x7f16c178cef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265188], tls=0x7f16c1792700, child_tidptr=0x7f16c17929d0) = 3265188
sched_setaffinity(3265188, 16, [8]) = 0
futex(0x7f16c1792d18, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
clone(child_stack=0x7f16c1f8def0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265189], tls=0x7f16c1f93700, child_tidptr=0x7f16c1f939d0) = 3265189
sched_setaffinity(3265189, 16, [2]) = 0
futex(0x7f16c1f93d18, FUTEX_WAKE_PRIVATE, 1) = 1
clone(child_stack=0x7f16c178cef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265190], tls=0x7f16c1792700, child_tidptr=0x7f16c17929d0) = 3265190
sched_setaffinity(3265190, 16, [4]) = 0
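To put a number on the re-creation in a longer trace, one can pull out the TID returned by every clone() line. In the excerpt above, the seven clone() calls return seven different TIDs (3265184 to 3265190), and the affinity masks [2], [4], [6], [8] suggest four workers per team, so a pool created once and then reused would account for only about four clone() calls in total. The rough C sketch below uses a hypothetical helper, collect_clone_tids, and assumes strace's usual one-call-per-line format, optionally prefixed with "[pid N] " under -f; calls split into "<unfinished ...>" and "<... clone resumed>" lines are not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Scans strace output line by line and stores the return value (the new
   TID) of every complete clone() line in tids[]; returns how many it found. */
static size_t collect_clone_tids(const char *log, long *tids, size_t max)
{
    size_t n = 0;
    const char *line = log;
    while (*line != '\0' && n < max) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *call = line;
        /* Skip an optional "[pid N] " prefix added by strace -f. */
        if (*call == '[') {
            const char *rbracket = memchr(call, ']', len);
            if (rbracket != NULL)
                call = rbracket + 1;
            while (*call == ' ')
                call++;
        }
        if (strncmp(call, "clone(", 6) == 0) {
            /* The return value follows the last " = " on the line. */
            const char *eq = NULL;
            for (const char *p = call; p + 3 <= line + len; p++)
                if (strncmp(p, " = ", 3) == 0)
                    eq = p;
            if (eq != NULL) {
                long tid = strtol(eq + 3, NULL, 10);
                if (tid > 0) /* skip failed calls such as "= -1 EAGAIN" */
                    tids[n++] = tid;
            }
        }
        if (end == NULL)
            break;
        line = end + 1;
    }
    return n;
}
```

Comparing the count against the expected team size is a quick way to tell a one-off pool creation from repeated re-creation.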
Environment variables:
export PARALLEL_ENSEMBLE_THREADS=5
export GOMP_CPU_AFFINITY=7,2,4,6,8
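GCC's documentation describes GOMP_CPU_AFFINITY as a space- or comma-separated list of CPUs (single numbers, M-N ranges, or M-N:S strided ranges) that are assigned to threads in list order, starting again from the beginning of the list when there are more threads than entries. The C sketch below expands a list that way; parse_cpu_list, cpu_for_thread and MAX_CPUS are hypothetical helpers, not libgomp functions. For GOMP_CPU_AFFINITY=7,2,4,6,8 it maps thread 0 (the initial thread) to CPU 7 and threads 1 to 4 to CPUs 2, 4, 6 and 8, which matches the sched_setaffinity(..., [2]) through [8] calls that follow each clone() in the log. PARALLEL_ENSEMBLE_THREADS is not a libgomp variable; it is assumed here to be the application's own setting for a 5-thread team.

```c
#include <ctype.h>
#include <stdlib.h>

#define MAX_CPUS 1024 /* hypothetical capacity of the expanded list */

/* Expands a GOMP_CPU_AFFINITY-style list such as "7,2,4,6,8" or
   "0 3 1-2 4-15:2" into cpus[]; returns the entry count, or -1 on bad input. */
static int parse_cpu_list(const char *s, int *cpus, int max)
{
    int n = 0;
    while (*s != '\0') {
        if (*s == ',' || isspace((unsigned char)*s)) {
            s++;
            continue;
        }
        char *end;
        long lo = strtol(s, &end, 10);
        if (end == s)
            return -1;
        long hi = lo, stride = 1;
        s = end;
        if (*s == '-') { /* range M-N */
            hi = strtol(s + 1, &end, 10);
            if (end == s + 1)
                return -1;
            s = end;
            if (*s == ':') { /* optional stride M-N:S */
                stride = strtol(s + 1, &end, 10);
                if (end == s + 1 || stride <= 0)
                    return -1;
                s = end;
            }
        }
        for (long c = lo; c <= hi && n < max; c += stride)
            cpus[n++] = (int)c;
    }
    return n;
}

/* CPU for OpenMP thread number `thread`: list order, wrapping around.
   Returns -1 for an empty list. */
static int cpu_for_thread(const int *cpus, int n, int thread)
{
    if (n <= 0)
        return -1;
    return cpus[thread % n];
}
```

Reading the log through this mapping makes it easier to spot which team member each clone() creates, since the CPU in the following sched_setaffinity call identifies its position in the team.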
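This is more of a "sysadmin" answer, but you can use strace's -k command-line option to print a stack trace showing where a given system call is made. So, for example, if you try the following: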
$ strace -etrace=clone -k -y -f -f bash -c "ls /dev/null;command echo"
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7e1b55ccaa10) = 141358
> /usr/lib/x86_64-linux-gnu/libc.so.6(_Fork+0x27) [0xee1a7]
> /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_fork+0x52) [0xf3fb2]
> /usr/bin/bash(make_child+0x19e) [0x6909e]
> /usr/bin/bash(adjust_shell_level+0x315) [0x56935]
> /usr/bin/bash(adjust_shell_level+0x1bda) [0x581fa]
> /usr/bin/bash(execute_command_internal+0xb88) [0x4ae08]
> /usr/bin/bash(execute_command+0xce) [0x4dc6e]
> /usr/bin/bash(execute_command_internal+0x2ff2) [0x4d272]
> /usr/bin/bash(parse_and_execute+0x7ab) [0xb5ffb]
> /usr/bin/bash(_rl_enable_paren_matching+0xb0ce) [0x11e2de]
> /usr/bin/bash(main+0xf78) [0x33568]
> /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_init_first+0x8a) [0x2a1ca]
> /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b) [0x2a28b]
> /usr/bin/bash(_start+0x25) [0x34385]
strace: Process 141358 attached
/dev/null
...
(You will see the SIGCHLD after that, because strace shows signals by default, but that part is irrelevant to the answer.)
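Do the same with your application (tracing only clone() and following children with -f), and it should tell you where in the libgomp code the threads are being created.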