This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit 0e9ae06

authored Oct 7, 2022
Merge pull request #1 from FelixMertin/poetry_linters
[add] poetry as dependency management and linters for code quality
2 parents c1c2b42 + d8c2f7a commit 0e9ae06

7 files changed (+742 -39 lines changed)

‎.gitignore

Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
+# Ignore all generated CSV files
+*.csv
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+# For a library or package, you might want to ignore these files since the code is
+# intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
+
+# poetry
+# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+# This is especially recommended for binary packages to ensure reproducibility, and is more
+# commonly ignored for libraries.
+# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+
+# pdm
+# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+# in version control.
+# https://pdm.fming.dev/#use-with-ide
+.pdm.toml
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+# and can be added to the global gitignore or merged into this file. For a more nuclear
+# option (not recommended) you can uncomment the following to ignore the entire idea folder.
+#.idea/

‎.pre-commit-config.yaml

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.3.0
+    hooks:
+      - id: trailing-whitespace
+      - id: check-merge-conflict
+      - id: check-yaml
+        args: [--unsafe]
+      - id: check-json
+      - id: detect-private-key
+      - id: end-of-file-fixer
+
+  - repo: https://github.com/timothycrosley/isort
+    rev: 5.10.1
+    hooks:
+      - id: isort
+
+  - repo: https://github.com/psf/black
+    rev: 22.8.0
+    hooks:
+      - id: black
+
+  - repo: https://gitlab.com/pycqa/flake8
+    rev: 3.9.2
+    hooks:
+      - id: flake8

‎README.md

Lines changed: 14 additions & 3 deletions
@@ -5,9 +5,10 @@ At the end, it saves a CSV file in the current working directory. By default, al
 
 ## Running:
 1. Install Python
-2. Run `pip install requests bs4`
-3. Run as `python WpBrokenCheck.py [Domain] [CSV_FileName]`
-4. Example: `python WpBrokenCheck.py example.com example.csv`
+2. Install [Poetry](https://python-poetry.org/docs/#installation)
+3. Run `poetry install`
+4. Run as `poetry run python WpBrokenCheck.py [Domain] [CSV_FileName]`
+5. Example: `poetry run python WpBrokenCheck.py example.com example.csv`
 <p align="center">
 <img src="https://res.cloudinary.com/suleman/image/upload/v1665055858/WpBrokenCheck.png">
 </p>
@@ -16,6 +17,16 @@ At the end, it saves a CSV file in the current working directory. By default, al
 
 **Tip** : If target website has large number of posts then change `max_workers` from 5 to 10 at line 60.
 
+## Linters:
+
+There are the following Python linters:
+- black for code formatting
+- flake8 code formatting and line brakes (PEP8)
+- isort for reordering imports
+
+They are run via pre-commit as you commit the code to the repository. You can also run it manually on all files by:
+`pre-commit run --all-files`
+
 ### To Do:
 - Make it filter specific codes.
 - Make it run on customized WP sites

‎WpBrokenCheck.py

Lines changed: 58 additions & 36 deletions
@@ -1,73 +1,95 @@
-import requests
-import csv
 import concurrent.futures
-from concurrent.futures import as_completed
+import csv
 import sys
+from concurrent.futures import as_completed
+
 import bs4
+import requests
 
 domain = sys.argv[1]
 csv_file = sys.argv[2]
 sess = requests.Session()
 links404 = []
 
 headers = {
-    'authority': 'www.'+domain,
-    'referer': 'https://'+domain,
-    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Mobile Safari/537.36',
-    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
-    'sec-fetch-dest': 'document',
-    'accept-language': 'en-US,en;q=0.9,tr;q=0.8',
+    "authority": "www." + domain,
+    "referer": "https://" + domain,
+    "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Mobile Safari/537.36",
+    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
+    "sec-fetch-dest": "document",
+    "accept-language": "en-US,en;q=0.9,tr;q=0.8",
 }
-pages = int(sess.get('https://' + domain + '/wp-json/wp/v2/posts', headers=headers).headers['X-WP-TotalPages'])
+pages = int(
+    sess.get("https://" + domain + "/wp-json/wp/v2/posts", headers=headers).headers[
+        "X-WP-TotalPages"
+    ]
+)
+
 
 def prepare_csv_data(id, post_link, data):
-    for i in data:
-        links404.append({
+    for i in data:
+        links404.append(
+            {
                 "Post ID": id,
                 "Post Link": post_link,
                 "Broken Link": i[0][0],
-                "Status Code":i[1],
-                "Link Text":i[0][1]
-        })
-
+                "Status Code": i[1],
+                "Link Text": i[0][1],
+            }
+        )
+
+
 def generate_csv_report(csv_file, csv_data):
-    with open(csv_file, 'w+',encoding="utf-8") as file:
-        csvwriter = csv.DictWriter(file, fieldnames=list(csv_data[0].keys()))
-        csvwriter.writeheader()
-        csvwriter.writerows(csv_data)
-
+
+    if csv_data:
+        with open(csv_file, "w+", encoding="utf-8") as file:
+            csvwriter = csv.DictWriter(file, fieldnames=list(csv_data[0].keys()))
+            csvwriter.writeheader()
+            csvwriter.writerows(csv_data)
+
+        print("Report saved in file: ", csv_file)
+
+    if not csv_data:
+        print("There were no broken links!")
+
+
 def getLinks(rendered_content):
-    soup = bs4.BeautifulSoup(rendered_content, 'html.parser')
-    return [(link['href'],link.text) for link in soup('a') if 'href' in link.attrs]
-
+    soup = bs4.BeautifulSoup(rendered_content, "html.parser")
+    return [(link["href"], link.text) for link in soup("a") if "href" in link.attrs]
+
+
 def getStatusCode(link, headers, timeout=5):
     print(" checking: ", link[0])
     try:
         r = sess.head(link[0], headers=headers, timeout=timeout)
-    except (requests.exceptions.SSLError,
-            requests.exceptions.HTTPError,
-            requests.exceptions.ConnectionError,
-            requests.exceptions.MissingSchema,
-            requests.exceptions.Timeout,
-            requests.exceptions.InvalidSchema
-            ) as errh:
+    except (
+        requests.exceptions.SSLError,
+        requests.exceptions.HTTPError,
+        requests.exceptions.ConnectionError,
+        requests.exceptions.MissingSchema,
+        requests.exceptions.Timeout,
+        requests.exceptions.InvalidSchema,
+    ) as errh:
         print("Error in URL, ", link)
         return link, errh.__class__.__name__
     else:
         return link, str(r.status_code)
-
+
+
 def executeBrokenLinkCheck(links):
     with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
         futures = [executor.submit(getStatusCode, link, headers) for link in links]
         return [future.result() for future in as_completed(futures)]
 
+
 for i in range(pages):
-    post_data = sess.get('https://' + domain + '/wp-json/wp/v2/posts?page='+str(i+1), headers=headers).json()
+    post_data = sess.get(
+        "https://" + domain + "/wp-json/wp/v2/posts?page=" + str(i + 1), headers=headers
+    ).json()
     for data in post_data:
-        print("Checking post: ",data["link"])
+        print("Checking post: ", data["link"])
         post_links = getLinks(data["content"]["rendered"])
         checked_urls = executeBrokenLinkCheck(post_links)
         prepare_csv_data(data["id"], data["link"], checked_urls)
-
+
 generate_csv_report(csv_file, links404)
-print("Report saved in file: ", csv_file)
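Note on the change above: apart from formatting, the behavioural difference in this file is that `generate_csv_report` now skips writing when nothing was collected. Before this commit, `csv_data[0]` raised an `IndexError` on an empty list. A minimal standalone sketch of that guard follows. The `bool` return value, the `newline=""` argument, the early-return layout, and the sample row are illustrative assumptions, not part of the commit.

```python
import csv
import os
import tempfile


def generate_csv_report(csv_file, csv_data):
    """Write rows to csv_file; return True if a report was written."""
    if not csv_data:
        # Nothing to report: csv_data[0] would raise IndexError on an empty list.
        print("There were no broken links!")
        return False
    # newline="" is what the csv module documentation recommends for writer files.
    with open(csv_file, "w", encoding="utf-8", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=list(csv_data[0].keys()))
        writer.writeheader()
        writer.writerows(csv_data)
    print("Report saved in file: ", csv_file)
    return True


# Hypothetical sample row, shaped like the dicts built by prepare_csv_data.
sample_rows = [
    {
        "Post ID": 1,
        "Post Link": "https://example.com/post",
        "Broken Link": "https://example.com/missing",
        "Status Code": "404",
        "Link Text": "missing page",
    }
]

with tempfile.TemporaryDirectory() as tmp:
    report_path = os.path.join(tmp, "report.csv")
    empty_path = os.path.join(tmp, "empty.csv")
    wrote_report = generate_csv_report(report_path, sample_rows)
    wrote_empty = generate_csv_report(empty_path, [])
    empty_exists = os.path.exists(empty_path)
    # Read the report back to confirm the header and row round-trip.
    with open(report_path, encoding="utf-8", newline="") as fh:
        saved_rows = list(csv.DictReader(fh))
```

Returning early on the empty case keeps the two outcomes mutually exclusive without needing the second `if not csv_data:` test.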

‎poetry.lock

Lines changed: 458 additions & 0 deletions
Some generated files are not rendered by default.

‎pyproject.toml

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+[tool.poetry]
+name = "WpBrokenCheck"
+version = "0.1.0"
+description = "A script to check broken links on any WordPress website."
+authors = ["Suleman-Elahi <sul007@outlook.com>", "Felix Mertineit <fmertineit@gmail.com>"]
+
+[tool.poetry.dependencies]
+python = "^3.8"
+requests = "^2.28.1"
+bs4 = "^0.0.1"
+
+[tool.poetry.dev-dependencies]
+pytest = "^5.2"
+black = "^22.10.0"
+isort = "^5.10.1"
+flake8 = "^5.0.4"
+
+[build-system]
+requires = ["poetry-core>=1.0.0"]
+build-backend = "poetry.core.masonry.api"
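Note on the constraints above: the entries use Poetry's caret requirements, which allow updates that leave the left-most non-zero version component unchanged. The following is a rough sketch of how such a constraint expands into a range. `caret_bounds` is a hypothetical helper written for this note (it is not part of Poetry's API) and it assumes plain numeric versions with at most three components.

```python
from typing import Tuple


def caret_bounds(constraint: str) -> Tuple[str, str]:
    """Return (inclusive lower, exclusive upper) bounds for a caret constraint."""
    parts = [int(p) for p in constraint.lstrip("^").split(".")]
    padded = parts + [0] * (3 - len(parts))
    # Index of the left-most non-zero component among those written out;
    # if all are zero, the last written component is the one held fixed.
    bump = next((i for i, v in enumerate(parts) if v != 0), len(parts) - 1)
    upper = padded[:bump] + [padded[bump] + 1] + [0] * (len(padded) - bump - 1)
    return ".".join(map(str, padded)), ".".join(map(str, upper))


# Constraints copied from the pyproject.toml in this commit.
for name, constraint in [
    ("python", "^3.8"),
    ("requests", "^2.28.1"),
    ("bs4", "^0.0.1"),
    ("pytest", "^5.2"),
]:
    lower, upper = caret_bounds(constraint)
    print(f"{name}: >={lower},<{upper}")
```

Under this reading, `bs4 = "^0.0.1"` effectively pins version 0.0.1, because the next patch release already falls outside the range.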

‎setup.cfg

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+[flake8]
+ignore = E501, W503, E203
+max-line-length = 79
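Note on the configuration above: `E501` is pycodestyle's "line too long" check, so listing it under `ignore` means `max-line-length = 79` is not enforced by flake8's default checks. A small sketch that parses the same section with the standard-library `configparser` makes the overlap visible. The variable names are illustrative, and this is not how flake8 itself loads its options.

```python
import configparser

# Same content as the setup.cfg added in this commit.
SETUP_CFG = """\
[flake8]
ignore = E501, W503, E203
max-line-length = 79
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

section = parser["flake8"]
ignored = [code.strip() for code in section["ignore"].split(",")]
max_line_length = section.getint("max-line-length")

# E501 is the check that consumes max-line-length; when it is ignored,
# the configured limit has no effect on the default checks.
limit_enforced = "E501" not in ignored
print(f"ignored={ignored} max-line-length={max_line_length} enforced={limit_enforced}")
```

If the limit is meant to apply, removing `E501` from `ignore` and raising `max-line-length` to match Black's default of 88 would be one option.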
