[STORM-3779] killed topology worker does not removed with warn and error that "Topology config is not localized yet..." #7561

Hi developers,

We hit a critical issue when killing a Storm topology.

We killed the topology as shown below.

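import org.apache.storm.Config;
import org.apache.storm.generated.KillOptions;
import org.apache.storm.generated.Nimbus;
import org.apache.storm.utils.NimbusClient;

// Point the client at Nimbus (placeholder seed value).
Config conf = new Config();
conf.put(Config.NIMBUS_SEEDS, "SOME_NIMBUS_SEED_STRING");

// Wait 10 seconds after deactivation before the workers are shut down.
KillOptions opt = new KillOptions();
opt.set_wait_secs_isSet(true);
opt.set_wait_secs(10);

Nimbus.Iface nimbusClient = NimbusClient.getConfiguredClient(conf).getClient();
nimbusClient.killTopologyWithOpts("TOPOLOGY_NAME", opt);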

The topology's workers were distributed across multiple supervisors.
On some supervisors, the workers died normally.

The problem is that on other supervisors the workers never died, and the supervisor logged warnings and errors like the ones below.

2021-06-29 02:58:44.284 o.a.s.d.s.Container SLOT_6707 [INFO] SET worker-user baef41a4-b5f6-4ea3-8868-5537dfba82f8 root
2021-06-29 02:58:44.284 o.a.s.d.s.Container SLOT_6707 [INFO] Creating symlinks for worker-id: baef41a4-b5f6-4ea3-8868-5537dfba82f8 storm-id: TOPOLOGY_NAME for files(1): [resources]
2021-06-29 02:58:44.284 o.a.s.d.s.BasicContainer SLOT_6707 [INFO] Launching worker with assignment LocalAssignment(topology_id:TOPOLOGY_NAME, executors:[ExecutorInfo(task_start:17, task_end:17), ExecutorInfo(task_start:29, task_end:29), ExecutorInfo(task_start:5, task_end:5)], resources:WorkerResources(mem_on_heap:6272.0, mem_off_heap:0.0, cpu:30.0, shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, resources:{offheap.memory.mb=0.0, onheap.memory.mb=6272.0, cpu.pcore.percent=30.0}, shared_resources:{}), owner:root) for this supervisor d2ee514a-e40e-40fb-b119-59763f3bb95d-10.233.112.14 on port 6707 with id baef41a4-b5f6-4ea3-8868-5537dfba82f8
2021-06-29 02:58:44.285 o.a.s.d.s.Slot SLOT_6708 [INFO] STATE kill-and-relaunch msInState: 6 topo:TOPOLOGY_NAME worker:d06bb5c5-25e2-4557-8996-4d40045022d1 -> waiting-for-worker-start msInState: 0 topo:TOPOLOGY_NAME worker:d06bb5c5-25e2-4557-8996-4d40045022d1
2021-06-29 02:58:44.286 o.a.s.d.s.Slot SLOT_6707 [INFO] STATE kill-and-relaunch msInState: 7 topo:TOPOLOGY_NAME worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8 -> waiting-for-worker-start msInState: 0 topo:TOPOLOGY_NAME worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8
2021-06-29 02:58:46.799 o.a.s.d.s.BasicContainer Thread-7269 [INFO] Worker Process d06bb5c5-25e2-4557-8996-4d40045022d1 exited with code: 254
2021-06-29 02:58:48.065 o.a.s.d.s.BasicContainer Thread-7270 [INFO] Worker Process baef41a4-b5f6-4ea3-8868-5537dfba82f8 exited with code: 254
2021-06-29 02:59:09.234 o.a.s.d.s.t.SupervisorHealthCheck timer [INFO] Running supervisor healthchecks...
2021-06-29 02:59:09.234 o.a.s.h.HealthChecker timer [INFO] The supervisor healthchecks succeeded.
2021-06-29 02:59:39.234 o.a.s.d.s.t.SupervisorHealthCheck timer [INFO] Running supervisor healthchecks...
2021-06-29 02:59:39.234 o.a.s.h.HealthChecker timer [INFO] The supervisor healthchecks succeeded.
2021-06-29 02:59:53.558 o.a.s.d.s.Supervisor pool-11-thread-9 [INFO] Got an assignments from master, will start to sync with assignments: SupervisorAssignments(...)
2021-06-29 02:59:53.936 o.a.s.d.s.Slot SLOT_6702 [INFO] SLOT 6702: Assignment Changed from LocalAssignment(topology_id:TOPOLOGY_NAME, executors:[ExecutorInfo(task_start:23, task_end:23), ExecutorInfo(task_start:11, task_end:11)], resources:WorkerResources(mem_on_heap:3200.0, mem_off_heap:0.0, cpu:20.0, shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, resources:{offheap.memory.mb=0.0, onheap.memory.mb=3200.0, cpu.pcore.percent=20.0}, shared_resources:{}), owner:root) to null
2021-06-29 02:59:53.939 o.a.s.d.s.Container SLOT_6702 [INFO] Killing d2ee514a-e40e-40fb-b119-59763f3bb95d-10.233.112.14:25976cac-9170-44ec-b835-099377cda893
2021-06-29 02:59:54.293 o.a.s.d.s.Slot SLOT_6708 [INFO] SLOT 6708: Assignment Changed from LocalAssignment(topology_id:TOPOLOGY_NAME, executors:[ExecutorInfo(task_start:10, task_end:10), ExecutorInfo(task_start:22, task_end:22)], resources:WorkerResources(mem_on_heap:3200.0, mem_off_heap:0.0, cpu:20.0, shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, resources:{offheap.memory.mb=0.0, onheap.memory.mb=3200.0, cpu.pcore.percent=20.0}, shared_resources:{}), owner:root) to null
2021-06-29 02:59:54.293 o.a.s.d.s.Slot SLOT_6707 [INFO] SLOT 6707: Assignment Changed from LocalAssignment(topology_id:TOPOLOGY_NAME, executors:[ExecutorInfo(task_start:17, task_end:17), ExecutorInfo(task_start:29, task_end:29), ExecutorInfo(task_start:5, task_end:5)], resources:WorkerResources(mem_on_heap:6272.0, mem_off_heap:0.0, cpu:30.0, shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, resources:{offheap.memory.mb=0.0, onheap.memory.mb=6272.0, cpu.pcore.percent=30.0}, shared_resources:{}), owner:root) to null
2021-06-29 02:59:54.296 o.a.s.d.s.Slot SLOT_6708 [INFO] STATE waiting-for-worker-start msInState: 70011 topo:TOPOLOGY_NAME worker:d06bb5c5-25e2-4557-8996-4d40045022d1 -> kill msInState: 0 topo:TOPOLOGY_NAME worker:d06bb5c5-25e2-4557-8996-4d40045022d1
2021-06-29 02:59:54.296 o.a.s.d.s.Slot SLOT_6707 [INFO] STATE waiting-for-worker-start msInState: 70010 topo:TOPOLOGY_NAME worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8 -> kill msInState: 0 topo:TOPOLOGY_NAME worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8
2021-06-29 02:59:54.298 o.a.s.d.s.Slot SLOT_6708 [INFO] SLOT 6708 all processes are dead...
2021-06-29 02:59:54.298 o.a.s.d.s.Container SLOT_6708 [INFO] Cleaning up d2ee514a-e40e-40fb-b119-59763f3bb95d-10.233.112.14:d06bb5c5-25e2-4557-8996-4d40045022d1
2021-06-29 02:59:54.298 o.a.s.d.s.AdvancedFSOps SLOT_6708 [INFO] Deleting path /storm/workers/d06bb5c5-25e2-4557-8996-4d40045022d1/pids/141225
2021-06-29 02:59:54.298 o.a.s.d.s.AdvancedFSOps SLOT_6708 [INFO] Deleting path /storm/workers/d06bb5c5-25e2-4557-8996-4d40045022d1/heartbeats
2021-06-29 03:00:06.452 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/stormjar.jar
2021-06-29 03:00:06.472 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/stormjar.jar.version
2021-06-29 03:00:06.472 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/resources
2021-06-29 03:00:06.472 o.a.s.l.LocalizedResourceRetentionSet AsyncLocalizer Task Executor - 1 [INFO] Deleted blob: TOPOLOGY_NAME-stormjar.jar (REMOVED FROM CLUSTER).
2021-06-29 03:00:06.475 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/stormconf.ser
2021-06-29 03:00:06.475 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/stormconf.ser.version
2021-06-29 03:00:06.475 o.a.s.l.LocalizedResourceRetentionSet AsyncLocalizer Task Executor - 1 [INFO] Deleted blob: TOPOLOGY_NAME-stormconf.ser (REMOVED FROM CLUSTER).
2021-06-29 03:00:06.477 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/stormcode.ser
2021-06-29 03:00:06.477 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/stormcode.ser.version
2021-06-29 03:00:06.478 o.a.s.l.LocalizedResourceRetentionSet AsyncLocalizer Task Executor - 1 [INFO] Deleted blob: TOPOLOGY_NAME-stormcode.ser (REMOVED FROM CLUSTER).
2021-06-29 03:00:06.478 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME
2021-06-29 03:00:07.062 o.a.s.d.s.Supervisor pool-11-thread-10 [WARN] Topology config is not localized yet...
2021-06-29 03:00:07.063 o.a.s.t.ProcessFunction pool-11-thread-10 [ERROR] Internal error processing sendSupervisorWorkerHeartbeat
org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not appear to be alive, you should probably exit
at org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448) ~[storm-server-2.2.0.jar:2.2.0]
at org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374) ~[storm-client-2.2.0.jar:2.2.0]
at org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353) ~[storm-client-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) [storm-shaded-deps-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:38) [storm-shaded-deps-2.2.0.jar:2.2.0]
at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172) [storm-client-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524) [storm-shaded-deps-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) [storm-shaded-deps-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
2021-06-29 03:00:07.064 o.a.s.t.ProcessFunction pool-11-thread-3 [ERROR] Internal error processing sendSupervisorWorkerHeartbeat
org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not appear to be alive, you should probably exit
at org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448) ~[storm-server-2.2.0.jar:2.2.0]
at org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374) ~[storm-client-2.2.0.jar:2.2.0]
at org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353) ~[storm-client-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) [storm-shaded-deps-2.2.0.jar:2.2.0]
at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172) [storm-client-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524) [storm-shaded-deps-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) [storm-shaded-deps-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
2021-06-29 03:00:08.106 o.a.s.d.s.Supervisor pool-11-thread-9 [WARN] Topology config is not localized yet...
2021-06-29 03:00:08.107 o.a.s.t.ProcessFunction pool-11-thread-9 [ERROR] Internal error processing sendSupervisorWorkerHeartbeat
org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not appear to be alive, you should probably exit
at org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448) ~[storm-server-2.2.0.jar:2.2.0]
at org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374) ~[storm-client-2.2.0.jar:2.2.0]
at org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353) ~[storm-client-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) [storm-shaded-deps-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:38) [storm-shaded-deps-2.2.0.jar:2.2.0]
at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172) [storm-client-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524) [storm-shaded-deps-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) [storm-shaded-deps-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
2021-06-29 03:00:08.108 o.a.s.d.s.Supervisor pool-11-thread-16 [WARN] Topology config is not localized yet...
2021-06-29 03:00:08.108 o.a.s.t.ProcessFunction pool-11-thread-16 [ERROR] Internal error processing sendSupervisorWorkerHeartbeat
org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not appear to be alive, you should probably exit
at org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448) ~[storm-server-2.2.0.jar:2.2.0]
at org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374) ~[storm-client-2.2.0.jar:2.2.0]
at org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353) ~[storm-client-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) [storm-shaded-deps-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:38) [storm-shaded-deps-2.2.0.jar:2.2.0]
at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172) [storm-client-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524) [storm-shaded-deps-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) [storm-shaded-deps-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]

These warnings and errors repeated indefinitely until we killed the worker process ourselves.
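
To narrow down which slots are affected, one option is to scan the supervisor log for slots that entered the kill state but never logged "all processes are dead". In the excerpt above, SLOT 6708 logs that line while no such line appears for SLOT 6707 (the excerpt may simply be incomplete). The following is a minimal sketch and not part of Storm: the class name StuckSlotFinder and both regular expressions are assumptions derived only from the log format shown above.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Storm class): finds slots that moved to the
// "kill" state but never reported "all processes are dead" afterwards.
public class StuckSlotFinder {
    // Matches "... SLOT_6707 [INFO] STATE <old-state> ... -> kill msInState: 0 ..."
    private static final Pattern ENTERED_KILL =
            Pattern.compile("SLOT_(\\d+) \\[INFO\\] STATE .* -> kill msInState");
    // Matches "... SLOT 6708 all processes are dead..."
    private static final Pattern ALL_DEAD =
            Pattern.compile("SLOT (\\d+) all processes are dead");

    public static Set<String> findStuckSlots(List<String> logLines) {
        Set<String> pending = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher kill = ENTERED_KILL.matcher(line);
            if (kill.find()) {
                pending.add(kill.group(1));      // slot is now waiting to die
                continue;
            }
            Matcher dead = ALL_DEAD.matcher(line);
            if (dead.find()) {
                pending.remove(dead.group(1));   // slot confirmed dead
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        // Lines abbreviated from the supervisor log in this report.
        List<String> sample = List.of(
            "2021-06-29 02:59:54.296 o.a.s.d.s.Slot SLOT_6708 [INFO] STATE waiting-for-worker-start msInState: 70011 topo:TOPOLOGY_NAME -> kill msInState: 0 topo:TOPOLOGY_NAME",
            "2021-06-29 02:59:54.296 o.a.s.d.s.Slot SLOT_6707 [INFO] STATE waiting-for-worker-start msInState: 70010 topo:TOPOLOGY_NAME -> kill msInState: 0 topo:TOPOLOGY_NAME",
            "2021-06-29 02:59:54.298 o.a.s.d.s.Slot SLOT_6708 [INFO] SLOT 6708 all processes are dead...");
        System.out.println(findStuckSlots(sample)); // prints [6707]
    }
}
```

A slot reported here is only a candidate: it shows that the log never confirmed the worker's death, not why.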

Originally reported by sangheee, imported from: killed topology worker does not removed with warn and error that "Topology config is not localized yet..."
  • status: Open
  • priority: Major
  • resolution: Unresolved
  • imported: 2025-01-24

Activity

jira-importer commented on Apr 23, 2022

radhikakv:

+1 to prioritizing this bug fix.

We recently migrated to v2.2.0, and this issue is severely disrupting our Storm topologies, which is affecting production runs.

Please also suggest any workarounds or clean-up scripts that could be run until the bug is fixed.
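
As an inspection aid only (not a fix, and not an official workaround), the supervisor log above shows worker pid files at /storm/workers/<worker-id>/pids/<pid>. A minimal sketch, assuming that layout holds on the affected host, could list the recorded pids and check whether each process is still alive. The class name LeftoverWorkerPids and its methods are hypothetical, and the default directory is taken from the log paths in this report.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not a Storm class): reports pids recorded under
// <workersDir>/<worker-id>/pids/ so an operator can inspect them.
public class LeftoverWorkerPids {

    /** Maps each worker-id to the pids found in its "pids" directory. */
    public static Map<String, List<Long>> recordedPids(Path workersDir) throws IOException {
        Map<String, List<Long>> result = new TreeMap<>();
        try (DirectoryStream<Path> workers = Files.newDirectoryStream(workersDir)) {
            for (Path worker : workers) {
                Path pidsDir = worker.resolve("pids");
                if (!Files.isDirectory(pidsDir)) {
                    continue; // no pid files recorded for this worker
                }
                List<Long> pids = new ArrayList<>();
                try (DirectoryStream<Path> pidFiles = Files.newDirectoryStream(pidsDir)) {
                    for (Path pidFile : pidFiles) {
                        try {
                            pids.add(Long.parseLong(pidFile.getFileName().toString()));
                        } catch (NumberFormatException ignored) {
                            // skip entries whose names are not numeric pids
                        }
                    }
                }
                Collections.sort(pids);
                result.put(worker.getFileName().toString(), pids);
            }
        }
        return result;
    }

    /** True if the JVM can see a live process with this pid (Java 9+). */
    public static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    public static void main(String[] args) throws IOException {
        // Default path is an assumption based on the log lines in this report.
        Path workersDir = Path.of(args.length > 0 ? args[0] : "/storm/workers");
        recordedPids(workersDir).forEach((workerId, pids) ->
            pids.forEach(pid ->
                System.out.println(workerId + " pid=" + pid + " alive=" + isAlive(pid))));
    }
}
```

The sketch only reads the directory and prints what it finds. Deciding whether to kill a listed process is left to the operator, since a pid can be reused by an unrelated process.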

          [STORM-3779] killed topology worker does not removed with warn and error that "Topology config is not localized yet..." · Issue #7561 · apache/storm