-
Notifications
You must be signed in to change notification settings - Fork 13.1k
[ISSUE #13523] fix #13523 #13539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
[ISSUE #13523] fix #13523 #13539
Conversation
Thanks for your this PR. 🙏 感谢您提交的PR。 🙏 |
修复连接异常时未正确取消超时任务的问题并重构清理逻辑变更概述问题修复
重构
测试更新
变更文件
时序图sequenceDiagram
participant GC as GrpcConnection
participant DRF as DefaultRequestFuture
GC->>DRF: sendRequestInner()
GC->>GC: sendRequestNoAck()
GC-x GC: 捕获RemoteException
GC->>DRF: cancel(true, remoteException)
GC-->>DRF: 抛出异常
DRF->>FutureCleaner: clean()
💡 小贴士与 lingma-agents 交流的方式📜 直接回复评论
📜 在代码行处标记
📜 在讨论中提问
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔎 代码评审报告
🎯 评审意见概览
严重度 | 数量 | 说明 |
---|---|---|
🔴 Blocker | 0 | 阻断性问题,需立即修复。例如:系统崩溃、关键功能不可用或严重安全漏洞。 |
🟠 Critical | 0 | 严重问题,高优先级修复。例如:核心功能异常或性能瓶颈影响用户体验。 |
🟡 Major | 1 | 主要问题,建议修复。例如:非核心功能缺陷或代码维护性较差。 |
🟢 Minor | 0 | 次要问题,酬情优化。例如:代码格式不规范或注释缺失。 |
总计: 1 个问题
📋 评审意见详情
💡 代码实现建议
以下是文件级别的代码建议,聚焦于代码的可读性、可维护性和潜在问题。
☕ api/src/main/java/com/alibaba/nacos/api/remote/DefaultRequestFuture.java (1 💬)
- 新增的futureCleaner字段需要添加空值检查以避免潜在的空指针异常 (L76-L85)
🚀 架构设计建议
以下是对代码架构和设计的综合分析,聚焦于跨文件交互、系统一致性和潜在优化空间。
🔍1. 接口替换引入的功能一致性风险
将TimeoutInnerTrigger接口替换为FutureCleaner接口可能导致原有功能逻辑未被完整继承。原TimeoutInnerTrigger的triggerOnTimeout方法专门处理超时触发场景,而FutureCleaner的clean方法职责更泛化,可能存在清理逻辑与原有业务场景不匹配的风险。需验证所有调用点是否适配了新的接口行为,特别是涉及资源释放和状态同步的场景。
📌 关键代码
public interface FutureCleaner {
void clean();
}
if (futureCleaner != null) {
futureCleaner.clean();
}
若新接口未完全覆盖原有超时触发逻辑,可能导致连接未正确释放、资源泄漏或状态不一致等问题,特别是在高并发场景下引发连锁故障。
🔍2. 异常处理逻辑的全局一致性缺失
GrpcConnection.java中新增的RemoteException处理逻辑与DefaultRequestFuture的清理机制存在潜在冲突。当sendRequestNoAck抛出异常时,虽然调用了cancel方法,但未同步清理futureCleaner关联的资源,可能导致部分清理操作未被执行。需统一异常处理与资源释放的流程设计。
📌 关键代码
try {
sendRequestNoAck(request);
} catch (RemoteException remoteException) {
defaultPushFuture.cancel(true, remoteException);
throw remoteException;
}
异常传递过程中可能中断清理链路,导致未释放的定时任务或连接资源累积,长期运行将引发系统资源耗尽。
🔍3. 测试覆盖的边界场景缺失
新增的FutureCleaner接口未包含futureCleaner为null的边界测试用例。虽然文件级建议已指出字段空值检查,但测试代码中所有MockFutureCleaner实例均为有效对象,缺乏对null参数的负向测试,无法验证空值校验逻辑的实际效果。
📌 关键代码
MockFutureCleaner trigger = new MockFutureCleaner();
DefaultRequestFuture requestFuture = new DefaultRequestFuture(..., trigger);
未验证空值处理逻辑可能导致线上环境出现未捕获的NPE,特别是在依赖注入异常或配置错误场景下。
🔍4. 清理操作的原子性隐患
cancel方法中对timeoutFuture和futureCleaner的清理操作顺序未保证原子性。若futureCleaner.clean()抛出异常,可能导致timeoutFuture.cancel()未执行,遗留定时任务继续运行。需确保清理操作的异常隔离或顺序补偿机制。
📌 关键代码
if (futureCleaner != null) {
futureCleaner.clean();
}
if (timeoutFuture != null) {
timeoutFuture.cancel(mayInterruptIfRunning);
}
部分资源未被清理可能导致内存泄漏或重复任务执行,影响系统稳定性。
审查详情
📒 文件清单 (4 个文件)
📝 变更: 4 个文件
📝 变更文件:
api/src/main/java/com/alibaba/nacos/api/remote/DefaultRequestFuture.java
api/src/test/java/com/alibaba/nacos/api/remote/DefaultRequestFutureTest.java
config/src/main/java/com/alibaba/nacos/config/server/service/repository/extrnal/ExternalConfigInfoPersistServiceImpl.java
core/src/main/java/com/alibaba/nacos/core/remote/grpc/GrpcConnection.java
💡 小贴士
与 lingma-agents 交流的方式
📜 直接回复评论
直接回复本条评论,lingma-agents 将自动处理您的请求。例如:
-
在当前代码中添加详细的注释说明。
-
请详细介绍一下你说的 LRU 改造方案,并使用伪代码加以说明。
📜 在代码行处标记
在文件的特定位置创建评论并 @lingma-agents。例如:
-
@lingma-agents 分析这个方法的性能瓶颈并提供优化建议。
-
@lingma-agents 对这个方法生成优化代码。
📜 在讨论中提问
在任何讨论中 @lingma-agents 来获取帮助。例如:
-
@lingma-agents 请总结上述讨论并提出解决方案。
-
@lingma-agents 请根据讨论内容生成优化代码。
FutureCleaner futureCleaner) { | ||
this.timeStamp = System.currentTimeMillis(); | ||
this.requestCallBack = requestCallBack; | ||
this.requestId = requestId; | ||
this.connectionId = connectionId; | ||
if (requestCallBack != null) { | ||
this.timeoutFuture = RpcScheduledExecutor.TIMEOUT_SCHEDULER | ||
.schedule(new TimeoutHandler(), requestCallBack.getTimeout(), TimeUnit.MILLISECONDS); | ||
this.timeoutFuture = RpcScheduledExecutor.TIMEOUT_SCHEDULER.schedule(new TimeoutHandler(), | ||
requestCallBack.getTimeout(), TimeUnit.MILLISECONDS); | ||
} | ||
this.timeoutInnerTrigger = timeoutInnerTrigger; | ||
this.futureCleaner = futureCleaner; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
新增的futureCleaner字段需要添加空值检查以避免潜在的空指针异常
🟡 Major | 🐞 Bugs
📋 问题详情
在DefaultRequestFuture的构造函数中,futureCleaner参数未进行空值校验。若传入null值且后续调用clean()方法时,可能导致NullPointerException。该问题影响代码可靠性,需确保字段初始化的健壮性。
💡 解决方案
在构造函数中添加对futureCleaner参数的空值校验,避免非法初始化:
@@ -80,6 +80,8 @@
if (requestCallBack != null) {
this.timeoutFuture = RpcScheduledExecutor.TIMEOUT_SCHEDULER.schedule(new TimeoutHandler(),
requestCallBack.getTimeout(), TimeUnit.MILLISECONDS);
+ }
+ if (futureCleaner == null) {
+ throw new IllegalArgumentException("futureCleaner cannot be null");
}
this.futureCleaner = futureCleaner;
}
您的反馈对我们很重要!(建议右键在新标签页中打开以下链接)
/** | ||
* Cleaning something while request has been failed, canceled, timeout. | ||
*/ | ||
public interface FutureCleaner { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
直接修改可能有点问题, 我感觉。
Timeout的语意可能还是要保留的, 因为有可能不是因为Busy异常,而是确实客户端超时未回应的场景存在。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个解决方案并不完美。我会提供另外一个方案。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
先简单修复是否可以给Timeout添加一下cancel, 这样出现Busy异常的时候,cancel掉这个TimeoutHandler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
先简单修复是否可以给Timeout添加一下cancel, 这样出现Busy异常的时候,cancel掉这个TimeoutHandler
请大佬帮忙再次Code Review 一下。
原来的TimeoutInnerTrigger
修改为FutureTrigger
(包含timeout和cancel两种触发场景)。
当com.alibaba.nacos.core.remote.grpc.GrpcConnection#sendRequestNoAck发送异常时(此时说明请求没有发送给客户端),执行defaultPushFuture.cancel(true)
操作,取消timeoutFuture
(超时定时任务),并向上层(接口调用方)抛出异常。
@yunwz checkstyle not pass, please fix and do mvn install in local env first. |
add test unit for DefaultRequestFuture#cancel().
fixed checkstyle error and added some test unit for |
Please do not create a Pull Request without creating an issue first.
What is the purpose of the change
Cancel timeout task when happens
ConnectionBusyException
orConnectionAlreadyClosedException
.Brief changelog
XX
Verifying this change
XXXX
Follow this checklist to help us incorporate your contribution quickly and easily:
[ISSUE #123] Fix UnknownException when host config not exist
. Each commit in the pull request should have a meaningful subject line and body.mvn -B clean package apache-rat:check findbugs:findbugs -Dmaven.test.skip=true
to make sure basic checks pass. Runmvn clean install -DskipITs
to make sure unit-test pass. Runmvn clean test-compile failsafe:integration-test
to make sure integration-test pass.