
Add sampling to flow heuristic. #215


Open: wants to merge 1 commit into base: master


kristiandueholm

This pull request fixes mitmproxy dump files (flows) being misdetected as .har files by detect_input_format(). The error I have been seeing is:

TypeError: 'int' object is not subscriptable

The proposed fix should resolve many of the issues where passing -f flow makes the program run properly, for example #213, #171, and likely #214.

Root cause

Enabling debug mode by setting the MITMPROXY2SWAGGER_DEBUG environment variable revealed that the heuristic score computed in detect_input_format() was higher for .har even though the file was a flow dump. The main heuristic for detecting flow files is the presence of non-printable (non-ASCII-printable) characters. The underlying issue is that mitmproxy_dump_file_huristic() assumes these will appear in the first 2048 bytes. In my case those bytes were filled with certificates, which consist purely of printable characters, so the heuristic missed.
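To illustrate the failure mode, here is a minimal sketch using synthetic data. The helper count_nonprintable_in_prefix() and the fake_dump bytes are hypothetical and are not taken from the mitmproxy2swagger code; they only mimic a file whose first bytes are printable while binary data appears later.

```python
import string

# Bytes treated as printable ASCII (letters, digits, punctuation, whitespace).
PRINTABLE = set(string.printable.encode("ascii"))


def count_nonprintable_in_prefix(data: bytes, prefix_len: int = 2048) -> int:
    """Hypothetical helper: count non-printable bytes in the first prefix_len bytes."""
    return sum(1 for b in data[:prefix_len] if b not in PRINTABLE)


# Synthetic stand-in for a flow dump: 4096 bytes of printable,
# certificate-like text followed by 300 bytes of binary data.
fake_dump = b"MIIB" * 1024 + bytes([0x00, 0x01, 0xFF]) * 100

# Looking only at the first 2048 bytes finds nothing non-printable...
prefix_hits = count_nonprintable_in_prefix(fake_dump)
# ...even though the file as a whole clearly contains binary data.
total_hits = count_nonprintable_in_prefix(fake_dump, len(fake_dump))

print(prefix_hits, total_hits)
```

With this input the prefix-only check sees zero non-printable bytes, so a score based on it would not favor the flow format.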

Proposed solution

Instead of relying on the first 2048 bytes, sample throughout the file for non-printables.
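A rough sketch of what such sampling could look like follows. This is not the code in this pull request: the function names sample_offsets() and sample_nonprintable_ratio(), the number of samples, and the chunk size are all assumptions chosen for illustration.

```python
import os
import string
import tempfile

PRINTABLE = set(string.printable.encode("ascii"))


def sample_offsets(size: int, num_samples: int, chunk_size: int) -> list:
    """Hypothetical helper: evenly spaced chunk offsets from file start to end."""
    if size <= chunk_size or num_samples <= 1:
        return [0]
    last = size - chunk_size  # offset of the final full chunk
    return [last * i // (num_samples - 1) for i in range(num_samples)]


def sample_nonprintable_ratio(path: str, num_samples: int = 16,
                              chunk_size: int = 128) -> float:
    """Hypothetical helper: fraction of sampled bytes that are non-printable."""
    size = os.path.getsize(path)
    total = 0
    nonprintable = 0
    with open(path, "rb") as f:
        for offset in sample_offsets(size, num_samples, chunk_size):
            f.seek(offset)
            chunk = f.read(chunk_size)
            total += len(chunk)
            nonprintable += sum(1 for b in chunk if b not in PRINTABLE)
    return nonprintable / total if total else 0.0


def _write_temp(data: bytes) -> str:
    """Write data to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        return tmp.name


# Synthetic inputs: a dump whose binary data only starts after 4096 printable
# bytes, and a purely textual JSON-like file.
dump_path = _write_temp(b"MIIB" * 1024 + bytes([0x00, 0x01, 0xFF]) * 100)
text_path = _write_temp(b'{"log": {}}' * 400)

ratio_dump = sample_nonprintable_ratio(dump_path)
ratio_text = sample_nonprintable_ratio(text_path)

os.remove(dump_path)
os.remove(text_path)

print(ratio_dump > 0, ratio_text)
```

Because the samples are spread across the whole file, binary content that only starts after a long printable prefix still contributes to the score, while a purely textual file yields a ratio of zero. Reading a fixed number of small chunks also keeps the cost independent of file size.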
