Update filter #2230

Merged
merged 1 commit into from
Jun 16, 2025
@yzqzss (Contributor) commented Jun 15, 2025

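I pulled all 40,000-plus xlog posts, labeled them, and trained a simple TF-IDF spam classifier.
I then ran predictions on the 3,000 most recent blog posts (spanning three months).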
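For readers unfamiliar with the approach, here is a minimal sketch of a TF-IDF text classifier. The PR does not say which model or tokenizer was used, so everything here is an assumption for illustration: the nearest-centroid decision rule, the whitespace-style tokenizer (real xlog posts, many of them Chinese, would need proper word segmentation), the `spam`/`ham` label names, and the toy training data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens. Chinese text would need a real segmenter.
    return re.findall(r"\w+", text.lower())


def fit_idf(docs):
    # Smoothed inverse document frequency for every term in the training set.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    # L2-normalised TF-IDF vector as a sparse dict; unknown terms are dropped.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def centroid(vectors):
    # Component-wise mean of a list of sparse vectors.
    total = {}
    for vec in vectors:
        for t, v in vec.items():
            total[t] = total.get(t, 0.0) + v
    return {t: v / len(vectors) for t, v in total.items()}


def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(labeled):
    # labeled: list of (text, label) pairs. Returns (idf, per-label centroids).
    idf = fit_idf([text for text, _ in labeled])
    centroids = {}
    for label in sorted({lab for _, lab in labeled}):
        centroids[label] = centroid([tfidf(t, idf) for t, lab in labeled if lab == label])
    return idf, centroids


def predict(text, model):
    # Pick the label whose centroid is most similar to the post's TF-IDF vector.
    idf, centroids = model
    vec = tfidf(text, idf)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


# Toy, made-up training data standing in for the labeled posts.
labeled_posts = [
    ("buy cheap casino bonus now", "spam"),
    ("free casino bonus click here", "spam"),
    ("notes on my weekend hiking trip", "ham"),
    ("thoughts on rust ownership and borrowing", "ham"),
]
model = train(labeled_posts)
```

With the toy data above, `predict("casino bonus free now", model)` lands on the spam centroid, while a post sharing no vocabulary with the spam examples falls back to `ham`.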

Below is the status of the 47 newly discovered accounts among those 3,000 posts that have at least one post predicted as spam:

Unchecked entries are genuine spam accounts (manually re-verified).
Checked entries are accounts that were misclassified (manually re-verified).

That leaves 41 confirmed spam accounts that first appeared in the last three months, and I have added them to the filter.
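The account-level step described above (flag any account with at least one post predicted as spam, skip accounts already in the filter, then drop the ones cleared by manual review) can be sketched as follows. The column names `account` and `prediction`, the helper names, and the sample rows are hypothetical; the actual layout of the results CSV is not shown in this PR.

```python
import csv
import io
from collections import defaultdict


def flagged_accounts(rows):
    # Count predicted-spam posts per account; an account is flagged if the count is >= 1.
    spam_posts = defaultdict(int)
    for row in rows:
        if row["prediction"] == "spam":
            spam_posts[row["account"]] += 1
    return dict(spam_posts)


def accounts_to_add(flagged, already_filtered, false_positives):
    # New spam accounts = flagged, minus those already in the filter,
    # minus those a human reviewer marked as misclassified.
    return sorted(set(flagged) - set(already_filtered) - set(false_positives))


# Made-up sample data using the hypothetical column names.
sample_csv = """account,post_id,prediction
alice,1,ham
bob,2,spam
bob,3,ham
carol,4,spam
dave,5,spam
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
flagged = flagged_accounts(rows)
new_spam = accounts_to_add(flagged, already_filtered={"dave"}, false_positives={"carol"})
```

Here `bob` is flagged on the strength of a single spam prediction, which is why the manual review pass matters: one false positive at the post level is enough to flag a legitimate account.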


Detailed per-post results for all 3,000 posts:

3000_prediction_results.csv

vercel bot commented Jun 15, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| xlog | ⬜️ Ignored (Inspect) | Visit Preview | | Jun 15, 2025 10:38am |

@hyoban hyoban merged commit 02d1eaf into Crossbell-Box:dev Jun 16, 2025
2 of 3 checks passed
2 participants