
ospray causing parallel engine cores on exit  #4802

Open
@biagas

This happens when the test suite is run in parallel, in batch, on pascal. I haven't tested other systems.

While monitoring a parallel test-suite run, I noticed that the engine was creating core files at the completion of every test. I limited my run to a single test so I could try to track down the problem.
I ran plots/contour.py and piped the output to a file.

Here's what I saw being printed to the log file:

  EXIT:   Test script contour.py
  EXCODE: 111
 - - - - - - - - - - - - - - -


srun: error: pascal32: task 1: Aborted (core dumped)
srun: error: pascal32: task 0: Aborted (core dumped)

I added 'ulimit -c unlimited' to run_visit_test_suite.sh so I could get a useful core file, then reran the test. I ran gdb on the core file, and a backtrace yielded this:

(gdb) bt
#0  0x00002aaac1ad1387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00002aaac1ad2a78 in __GI_abort () at abort.c:90
#2  0x00002aaaba0cd955 in signalhandler_core (sig=11)
    at /visit/src/common/misc/DebugStreamFull.C:162
#3  <signal handler called>
#4  0x00000000000001d1 in ?? ()
#5  0x00002aaac1ad505a in __cxa_finalize (d=0x2aaab47f3440) at cxa_finalize.c:55
#6  0x00002aaab3fd5113 in __do_global_dtors_aux ()
   from /usr/workspace/wsa/visit/visit/thirdparty_shared/3.1.0/toss3/ospray/1.6.1/linux-x86_64_gcc-4.9/lib64/libospray_module_ispc.so.0
#7  0x00007fffffffb960 in ?? ()
#8  0x00002aaaaaabb07a in _dl_fini () at dl-fini.c:253
Backtrace stopped: frame did not save the PC
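
A note on reading the backtrace: frame #2 shows signalhandler_core being entered with sig=11, and frame #0 shows raise being called with sig=6 via abort(). That would explain why srun reports "Aborted" even though the original fault looks like a segfault during library finalization. The short Python sketch below just maps those two numbers to their names on the local platform; the helper name describe_signal is made up for illustration.

```python
import signal


def describe_signal(signum):
    """Return 'NAME (number)' for a signal number on this platform."""
    # signal.Signals is the standard-library enum of signal numbers.
    return f"{signal.Signals(signum).name} ({signum})"


# Frame #2: signalhandler_core(sig=11), the fault that triggered the handler.
print(describe_signal(11))
# Frame #0: raise(sig=6), issued by abort() from inside the handler.
print(describe_signal(6))
```

On Linux these should come out as SIGSEGV and SIGABRT respectively.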

I then completely recompiled VisIt without ospray (I removed the ospray libraries from quartz386.cmake), and ran the test again. This time no core files were created.
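
For repeating this with/without comparison, a minimal sketch of a core-file check is below. It assumes cores are written into the run directory with names starting with "core" (this depends on the system's core pattern setting), and the function names are hypothetical, not part of the VisIt test suite.

```python
from pathlib import Path


def core_files(run_dir):
    """Return the sorted names of files in run_dir that look like core dumps."""
    # Assumption: cores land in run_dir and are named "core" or "core.<pid>".
    return sorted(p.name for p in Path(run_dir).glob("core*") if p.is_file())


def new_core_files(before, after):
    """Return names present in `after` but not in `before`, sorted."""
    return sorted(set(after) - set(before))
```

Take a snapshot with core_files() before the test, another after it finishes, and pass both to new_core_files(); an empty list means the run left no new cores in that directory.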

FWIW, I use a couple of scripts for building VisIt and running the test suite that were basically culled from regressiontest_pascal, split for separate build and run-the-test-suite scripts, then modified for my local dirs.

    Labels

    bug: Something isn't working
    impact medium: Productivity partially degraded (not easily mitigated bug) or improved (enhancement)
    likelihood medium: Neither low nor high likelihood
    reviewed: Issue has been reviewed and labeled by a developer
