Description
Describe the bug
The field values of the request object differ between a plain http GET request and an intercepted https GET request in an inconsistent and potentially incorrect way, which is hindering the correct implementation of a custom archive redirection plugin:
- protocol is always None. Is this supposed to carry scheme information? How can one distinguish whether the request is https or not?
- _url is incomplete (missing scheme+host) for https requests
- host is None for https requests
- port is incorrectly reported as 80 for an https request (that is actually sent to 443)
To Reproduce
The issue can be reproduced with
- running poetry install, then poetry shell, then python proxy/request_proxy.py in the root dir of https://github.com/kuefmz/https-interception-proxypy/
- curl -x http://0.0.0.0:8899/ --cacert ca-cert.pem https://www.w3id.org/simulation/ontology/
- curl -x http://0.0.0.0:8899/ --cacert ca-cert.pem http://www.w3id.org/simulation/ontology/
Expected behavior
- protocol should not be None but http or https
- _url should be the full request url (including scheme+host) for https requests that are intercepted
- host should be equivalent to the FQDN for https requests, as it is for http requests
- port should be correctly reported
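To make the expected values concrete, here is a minimal sketch in plain Python. It does not use proxy.py's API: the names expected_fields, ExpectedFields and DEFAULT_PORTS are hypothetical and only illustrate what the four fields would look like if they were derived from the scheme, the Host header and the request path. It assumes the Host header holds a hostname with an optional port and no IPv6 literal.

```python
from typing import NamedTuple

# Well-known default ports per scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


class ExpectedFields(NamedTuple):
    """Hypothetical container for the values this report expects."""
    protocol: bytes
    host: bytes
    port: int
    url: str


def expected_fields(is_https: bool, host_header: bytes, path: bytes) -> ExpectedFields:
    """Derive protocol, host, port and full URL from scheme, Host header and path.

    Assumption: host_header is "name" or "name:port" (no IPv6 literal).
    """
    scheme = "https" if is_https else "http"
    hostname, sep, port_text = host_header.decode("ascii").partition(":")
    # Fall back to the scheme's default port when the header carries none.
    port = int(port_text) if sep else DEFAULT_PORTS[scheme]
    # Omit the port from the URL when it is the default for the scheme.
    netloc = hostname if port == DEFAULT_PORTS[scheme] else f"{hostname}:{port}"
    url = f"{scheme}://{netloc}{path.decode('ascii')}"
    return ExpectedFields(scheme.encode("ascii"), hostname.encode("ascii"), port, url)
```

For the intercepted https request from the reproduction steps, expected_fields(True, b"www.w3id.org", b"/simulation/ontology/") yields port 443 and the full https URL rather than port 80 and a bare path.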
Version information
- OS: [ubuntu 20.04]
- Browser [curl]
- Device: [amd64]
- proxy.py Version [2.4.4]
Additional context
Log from the custom proxy, showing the request object and a selection of its fields:
2024-08-02 15:51:26,940 - pid:209673 [I] plugins.load:89 - Loaded plugin proxy.http.proxy.HttpProxyPlugin
2024-08-02 15:51:26,940 - pid:209673 [I] plugins.load:89 - Loaded plugin __main__.RequestPlugin
Request _url: www.w3id.org:443
Request.method: b'CONNECT'
Request protocol: None
Request host: b'www.w3id.org'
Request path: None
Request properties: {'state': 6, 'type': 1, 'protocol': None, 'host': b'www.w3id.org', 'port': 443, 'path': None, 'method': b'CONNECT', 'code': None, 'reason': None, 'version': b'HTTP/1.1', 'total_size': 116, 'buffer': None, 'headers': {b'host': (b'Host', b'www.w3id.org:443'), b'user-agent': (b'User-Agent', b'curl/7.81.0'), b'proxy-connection': (b'Proxy-Connection', b'Keep-Alive')}, 'body': None, 'chunk': None, '_url': , '_is_chunked_encoded': False, '_content_expected': False, '_is_https_tunnel': True}
Request _url: /simulation/ontology/
Request.method: b'GET'
Request protocol: None
Request host: None
Request path: b'/simulation/ontology/'
Request properties: {'state': 6, 'type': 1, 'protocol': None, 'host': None, 'port': 80, 'path': b'/simulation/ontology/', 'method': b'GET', 'code': None, 'reason': None, 'version': b'HTTP/1.1', 'total_size': 96, 'buffer': None, 'headers': {b'host': (b'Host', b'www.w3id.org'), b'user-agent': (b'User-Agent', b'curl/7.81.0'), b'accept': (b'Accept', b'/')}, 'body': None, 'chunk': None, '_url': <proxy.http.url.Url object at 0x7f50b472bc10>, '_is_chunked_encoded': False, '_content_expected': False, '_is_https_tunnel': False}
2024-08-02 15:51:35,421 - pid:209683 [I] server.access_log:388 - 127.0.0.1:59292 - CONNECT www.w3id.org:443 - 683 bytes - 680.55ms
Request _url: http://www.w3id.org/simulation/ontology/
Request.method: b'GET'
Request protocol: None
Request host: b'www.w3id.org'
Request path: b'/simulation/ontology/'
Request properties: {'state': 6, 'type': 1, 'protocol': None, 'host': b'www.w3id.org', 'port': 80, 'path': b'/simulation/ontology/', 'method': b'GET', 'code': None, 'reason': None, 'version': b'HTTP/1.1', 'total_size': 145, 'buffer': None, 'headers': {b'host': (b'Host', b'www.w3id.org'), b'user-agent': (b'User-Agent', b'curl/7.81.0'), b'accept': (b'Accept', b'/'), b'proxy-connection': (b'Proxy-Connection', b'Keep-Alive')}, 'body': None, 'chunk': None, '_url': <proxy.http.url.Url object at 0x7f50b4520610>, '_is_chunked_encoded': False, '_content_expected': False, '_is_https_tunnel': False}
2024-08-02 15:51:44,094 - pid:209688 [I] server.access_log:388 - 127.0.0.1:49428 - GET www.w3id.org:80/simulation/ontology/ - 301 Moved Permanently - 573 bytes - 479.09ms
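In the log above, only the CONNECT request carries the real host and port (and _is_https_tunnel set to True). The GET that follows inside the tunnel reports host None and port 80. A possible stopgap, sketched below in plain Python, is to remember the CONNECT target and reuse it for later requests that arrive without a host. This is not proxy.py's API: TunnelTracker and resolve are hypothetical names, the input is a dict shaped like the logged "Request properties", and the sketch assumes one tracker per client connection and that an intercepted CONNECT tunnel always carries https.

```python
from typing import Optional, Tuple


class TunnelTracker:
    """Hypothetical helper: remembers the CONNECT target of one client connection."""

    def __init__(self) -> None:
        self.tunnel_target: Optional[Tuple[bytes, int]] = None

    def resolve(self, props: dict) -> Tuple[str, Optional[bytes], int]:
        """Return (scheme, host, port) for a dict shaped like the logged properties."""
        if props["method"] == b"CONNECT":
            # Remember where the tunnel points; assumed to be https when intercepted.
            self.tunnel_target = (props["host"], props["port"])
            return ("https", props["host"], props["port"])
        if props["host"] is None and self.tunnel_target is not None:
            # Request inside the tunnel: reuse the remembered CONNECT target.
            host, port = self.tunnel_target
            return ("https", host, port)
        # Plain http request through the proxy: the fields are already usable.
        return ("http", props["host"], props["port"])


# Abbreviated versions of the three logged requests.
connect = {"method": b"CONNECT", "host": b"www.w3id.org", "port": 443, "path": None}
tunneled_get = {"method": b"GET", "host": None, "port": 80, "path": b"/simulation/ontology/"}
plain_get = {"method": b"GET", "host": b"www.w3id.org", "port": 80, "path": b"/simulation/ontology/"}

tracker = TunnelTracker()
tracker.resolve(connect)
print(tracker.resolve(tunneled_get))
print(TunnelTracker().resolve(plain_get))
```

This only works around the symptoms. It would not be needed if the request object reported protocol, host and port consistently for both cases.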