When profiling with nsys, the CUDA graph launch segfaults with the following stack trace:
cudaProfilerStart for nsys
Capture range started in the application
Collecting data...
*** SIGSEGV (@0x20000000001) received by PID 102711 (TID 0x7f518bfff700) from PID 1; stack trace: ***
@ 0x7f542e27cf1d: eas::allspark::FailureHandler(int, siginfo_t*, void*)
@ 0x7f5365c8553d: runtime.sigfwd
@ 0x7f5365c64d85: runtime.sigfwdgo
@ 0x7f5365c635b4: runtime.sigtrampgo
@ 0x7f5365c855a3: runtime.sigtramp
@ 0x7f543d740496: XCBDisconnectStub
@ 0x7f543ed19630: _L_unlock_13
@ 0x7f5320fc51f0: cudbgMain
@ 0x7f531a84fbee: cuptiGetGraphNodeId
@ 0x7f531a86d5b6: cuptiEventGroupSetDisable
@ 0x7f531a88b950: cuptiEnumMetrics
@ 0x7f531a851a97: cuptiActivityEnable
@ 0x7f531a85cbec: cuptiActivityEnable
@ 0x7f5320dd76e6: cuEGLApiInit
@ 0x7f5320e9e24b: cuEGLStreamConsumerConnect
@ 0x7f5358a7149a: stream_executor::gpu::GpuDriver::LaunchExecutableGraph(stream_executor::gpu::GpuContext*, CUgraphExec_st*, CUstream_st*)
@ 0x7f5372b20ca6: stream_executor::gpu::GpuExecutor::LaunchExecutableGraph(stream_executor::Stream*, void*)
@ 0x7f5358957cba: stream_executor::StreamExecutor::LaunchExecutableGraph(stream_executor::Stream*, void*)
@ 0x7f535886cb97: stream_executor::Stream::ThenLaunchGraph(void*)
@ 0x7f5372aef82f: xla::gpu::GpuExecutable::ExecuteThunks(xla::ServiceExecutableRunOptions const*, xla::gpu::BufferAllocations const&, bool, xla::HloExecutionProfile*, int)
@ 0x7f5372af3976: xla::gpu::GpuExecutable::ExecuteAsyncOnStreamWithGraphCaptureImpl(xla::ServiceExecutableRunOptions const*, absl::Span<xla::ShapedBuffer const* const>*, std::vector<xla::ShapeTree<xla::MaybeOwningDeviceMemory>, std::allocator<xla::ShapeTree<xla::MaybeOwningDeviceMemory> > >*, xla::HloExecutionProfile*, int)
Without the cudaProfilerStart call, the program runs for a long time without crashing, but once cudaProfilerStart is called, the SIGSEGV occurs immediately.
nsys version: NVIDIA Nsight Systems version 2020.4.3.7-10543b6
CUDA version: V11.2.152
TensorFlow version: 1.15