FIX remove link to resource_tracker._pid in child processes #450


Merged
merged 10 commits into from
Apr 22, 2025

Conversation

tomMoral
Contributor

@tomMoral tomMoral commented Apr 15, 2025

@fcharras

fcharras commented Apr 15, 2025

Thanks for investigating this so fast, @tomMoral.

From what I've read from the code so far, this looks good to me (+ the extra resulting simplification on tracker_args -> tracker_fd).

What I intended to do initially was to monkey patch resource_tracker._stop, detect when it is called from __del__ (using traceback inspection), and have it exit early in that case, to ensure only minimal differences between parent and child processes.
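For illustration, the monkey-patching idea could be sketched roughly as follows. This is a toy under stated assumptions: `ToyTracker`, `called_from`, and `patch_stop` are hypothetical names, and the class is a stand-in rather than the actual multiprocessing or loky tracker.

```python
import sys


def called_from(func_name, max_depth=10):
    """Return True if a function named `func_name` is among the callers.

    Hypothetical helper: walks up the call stack, starting at the frame
    that called this helper.
    """
    frame = sys._getframe(1)
    depth = 0
    while frame is not None and depth < max_depth:
        if frame.f_code.co_name == func_name:
            return True
        frame = frame.f_back
        depth += 1
    return False


class ToyTracker:
    """Hypothetical stand-in for a tracker whose __del__ calls _stop."""

    def __init__(self):
        self.stopped = 0

    def _stop(self):
        self.stopped += 1

    def __del__(self):
        self._stop()


def patch_stop(cls):
    """Wrap cls._stop so that it exits early when reached via __del__."""
    original = cls._stop

    def guarded_stop(self, *args, **kwargs):
        if called_from("__del__"):
            return None  # skip the stop logic during garbage collection
        return original(self, *args, **kwargs)

    cls._stop = guarded_stop
    return cls
```

An explicit `_stop()` call still runs the original logic after patching; only calls that pass through a frame named `__del__` are skipped.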

This is much simpler, and indeed setting ._pid to None prevents the wrong code path. Can we safely assume that no user code ever tries to explicitly stop the resource_tracker in loky processes? (Probably safe to assume, since _pid and _stop are not public anyway...)
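A minimal sketch of why clearing the pid is enough, assuming the stop routine returns early when no pid is recorded. `ToyTracker` and `detach_in_child` are hypothetical names for illustration, not loky's actual code.

```python
class ToyTracker:
    """Hypothetical stand-in for a tracker handle holding a pid and an fd."""

    def __init__(self, pid, fd):
        self._pid = pid
        self._fd = fd
        self.waited_on = []  # records the pids that _stop would wait for

    def _stop(self):
        # Assumed guard: nothing to stop if this process holds no pid or fd.
        if self._fd is None or self._pid is None:
            return False
        self.waited_on.append(self._pid)
        self._fd = None
        self._pid = None
        return True


def detach_in_child(tracker):
    """In a child process, forget the tracker pid so _stop becomes a no-op."""
    tracker._pid = None
    return tracker
```

With this guard, the parent can still stop the tracker, while a child that has cleared `_pid` never attempts to wait on a process it does not own, and its file descriptor is left untouched.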

(The CI failure on Windows looks like an actual remaining bug.)

@tomMoral
Contributor Author

Yes, it is safe to assume that we can modify private attributes.
But the issue we still need to investigate is the failure on Windows, which looks reproducible.

@fcharras

I have investigated the remaining issue in a Windows VM.

The current issue arises when the parent process garbage collects the loky resource tracker.

The __del__ is only called once, but for some reason it fails regardless; the process of this resource tracker must have already been terminated in some other way.

There are no issues with the resource tracker from the multiprocessing library.

Since this remaining Windows issue is about the loky resource tracker, there's an easy fix: a quick workaround could be to override the loky resource tracker's __del__ method to do nothing (or to do nothing only on Windows).

So far I don't know why the loky resource tracker process is cleaned up before the garbage collection of the resource tracker object. If we can find out, maybe there would be a cleaner fix.
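Such a workaround might look roughly like the sketch below. `BaseTracker` and `WindowsSafeTracker` are hypothetical stand-ins, not loky's actual classes, and the `skip_del` flag is an assumption made so that the behavior can be toggled.

```python
import sys


class BaseTracker:
    """Hypothetical base class whose __del__ stops the tracker."""

    def __init__(self, log):
        self._log = log

    def _stop(self):
        self._log.append("stop")

    def __del__(self):
        self._stop()


class WindowsSafeTracker(BaseTracker):
    """Hypothetical subclass that skips the stop-on-delete logic on Windows."""

    # Assumption: only Windows shows the failure, so skip only there.
    skip_del = sys.platform == "win32"

    def __del__(self):
        if self.skip_del:
            return  # do nothing; the tracker process may already be gone
        super().__del__()
```

Setting `skip_del = True` unconditionally would give the "always do nothing" variant.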

@tomMoral tomMoral merged commit 81d7d22 into joblib:master Apr 22, 2025
11 of 12 checks passed
@tomMoral tomMoral deleted the FIX_resource_tracker branch April 22, 2025 09:23

2 participants