DHCP Duplicate IP — Option 82 子网不匹配导致租约被提前删除
Case Summary: DHCP Duplicate IP — Option 82 子网不匹配导致租约提前删除并重分配
Product/Service: Windows Server 2022 — DHCP Server Role
1. 症状 (Symptoms)
- 客户环境中多台设备告警 Duplicate IP,同一个 IP 地址同时被两台设备使用,业务间歇性中断。
- 在 DHCP Server 上查询租约时发现:某条 IP 租约在过期时间未到之前就被删除,随后同一个 IP 被分配给了另一台设备。
- 例如:
Get-DhcpServerv4Lease -IPAddress 10.20.232.99在 12:11:34 查到租约过期时间约 12:23:17,但在 12:11:45 再查时该租约已消失;12:14:51 该 IP 已被分配给另一台设备。
- 例如:
- 问题频繁发生,涉及多个 IP 段。
2. 背景 (Background / Environment)
- DHCP Server:Windows Server 2022
- DHCP Failover:已配置,模式为 Load Balance
- MCLT(Maximum Client Lead Time):15 分钟
- Lease 时间:4 小时
- 注册表配置:
DhcpGlobalFlagForSubnetChangeDHCPRequest = 1(位于HKLM\SYSTEM\CurrentControlSet\Services\DhcpServer\Parameters) - 客户端:Linux 设备,配置双网卡,分属不同 VLAN
- 网络架构:跨子网部署,使用 DHCP Relay Agent 转发,Relay 配置了 Option 82
3. Troubleshooting 过程 (Investigation & Troubleshooting)
第一轮:租约为何提前消失?
- 做了什么:在 DHCP Server 上使用
Get-DhcpServerv4Lease反复查询问题 IP(如10.20.232.99)的租约状态。 - 发现了什么:
- 12:11:34 租约存在,过期时间约 12:23:17
- 12:11:45 同一 IP 的租约已不存在
- 12:14:51 该 IP 已被分配给另一台设备
- 得出了什么结论:DHCP Server 在租约有效期内主动删除了该租约,并将 IP 重新分配。需要确认是客户端主动释放还是 Server 端行为。
第二轮:客户端是否主动释放?
- 做了什么:抓取问题时间段的网络流量,重点筛查
DHCP Message Type = DHCP Release。 - 发现了什么:
- 没有看到客户端发送任何 DHCPRELEASE 报文
- 客户端本机上 IP 仍然在使用中
- 得出了什么结论:不是客户端主动释放 IP,删除租约的动作来自 DHCP Server 本身。
第三轮:DHCP 审计日志回溯
- 做了什么:查看 DHCP 审计日志(Audit Log),还原问题 IP 的完整事件链。
- 发现了什么:一个固定的事件模式:
- 客户端首次成功获取 IP(Assign),MCLT 为 15 分钟
- 在租约时间一半(~7.5 分钟)时客户端发起续租(DHCPREQUEST)
- DHCP Server 返回 NACK(DHCPNAK)
- 随后同一 IP 被 Assign 给其他设备
审计日志片段示例:
10,12/08/25,11:04:12,Assign,10.20.232.99,,AABBCCDD1122,... 15,12/08/25,11:11:42,NACK,10.20.232.99,,AABBCCDD1122,... 15,12/08/25,11:15:27,NACK,10.20.232.99,,AABBCCDD1122,... 15,12/08/25,11:17:19,NACK,10.20.232.99,,AABBCCDD1122,... 10,12/08/25,11:18:15,Assign,10.20.232.99,,AABBCCDD1122,... - 得出了什么结论:行为链路清晰——客户端续租 → Server NACK → Server 删除旧租约 → IP 重新分配。需要进一步确认 NACK 的原因。
第四轮:NACK 根因——Option 82 子网不匹配
- 做了什么:分析问题时间段的网络抓包和 DHCP Server ETL 日志,重点关注 DHCPREQUEST 中的 Option 82 信息。
- 发现了什么:
- 客户端首次获取的 IP 为
10.20.225.63,属于 Subnet A - 续租时 DHCPREQUEST 报文中 Option 82 携带的子网信息指向 Subnet B(
203.0.113.0) - DHCP Server ETL 日志明确记录:
REQUEST for address 10.20.225.63 REQUEST from relay agent: 198.51.100.30 Interface subnet = 198.51.100.30 Mismatch in ClientSubnet determined from request message (option 82 suboption 5 section) 203.0.113.0 and present in database 203.0.113.0 DhcpDoDynDnsCheckDelete() - Deleting record 10.20.225.63 Invalid DHCPREQUEST for 10.20.225.63 Nack'd.
- 客户端首次获取的 IP 为
- 得出了什么结论:
- 客户端拿着 Subnet A 的 IP,续租时 Option 82 却声明位于 Subnet B
- DHCP Server 发现子网不匹配,判定为”客户端已换子网”
- 由于启用了
DhcpGlobalFlagForSubnetChangeDHCPRequest = 1,Server 执行:删除旧租约 → 返回 NACK → 期待客户端重新 DISCOVER
第五轮:验证注册表行为——By Design
- 做了什么:检查 DHCP Server 注册表配置
HKLM\SYSTEM\CurrentControlSet\Services\DhcpServer\Parameters\DhcpGlobalFlagForSubnetChangeDHCPRequest。 - 发现了什么:值为 1。此设置的含义为:
- 当 DHCPREQUEST 中 Option 82 指示客户端已”换子网”时
- DHCP Server 将:找到旧租约 → 删除 → 返回 NACK → 强制客户端重新 DISCOVER
- 得出了什么结论:DHCP Server 的删除租约 + NACK 行为是 By Design 的增强行为,目的是避免客户端在换子网后仍持有旧子网 IP。Server 侧逻辑完全正常。
第六轮:客户端未按协议响应 NACK
- 做了什么:与客户确认客户端实际行为。经确认,客户端确实因双网卡配置从 Subnet A 漂移到了 Subnet B(场景 2)。
- 发现了什么:
- 按照 DHCP 协议预期:客户端收到 NACK 后应丢弃当前 IP → 重新发起 DHCPDISCOVER
- 实际行为:客户端收到 NACK 后没有重新 DISCOVER,继续使用已被 Server 回收的旧 IP
- 同时该 IP 被 Server 正常分配给另一台设备 → Duplicate IP
- 得出了什么结论:根因在客户端——Linux 设备双网卡跨 VLAN 配置问题导致对 DHCPNAK 的处理不符合协议预期。
4. Blockers 与解决 (Blockers & How They Were Resolved)
| Blocker | 影响 | 如何解决 |
|---|---|---|
| 初始怀疑方向是 DHCP Server 分配异常,需多轮日志分析才定位到 Option 82 | 延长了排查时间 | 通过 Audit Log + ETL 日志 + 网络抓包三方交叉验证,还原完整事件链 |
| 需要确认客户端是否真正发生子网漂移(场景 1 vs 场景 2) | 无法判断是 Relay 配置问题还是客户端真实漂移 | 与客户确认实际网络拓扑和客户端物理位置变化 |
| 客户端为 Linux 设备,DHCP 行为非 Windows 标准实现 | 无法直接调试客户端协议栈 | 确认为客户端侧问题后,转交客户自行与对应厂商排查 |
5. 根因与解决方案 (Root Cause & Resolution)
Root Cause
多因素叠加导致 Duplicate IP:
- 客户端(Linux,双网卡)发生子网漂移:客户端从 Subnet A 移动到 Subnet B,续租时 Option 82 携带了 Subnet B 的信息。
- DHCP Server 正确响应(By Design):由于启用了
DhcpGlobalFlagForSubnetChangeDHCPRequest = 1,Server 检测到子网不匹配后删除旧租约并返回 NACK,期待客户端重新 DISCOVER。 - 客户端未正确处理 NACK:Linux 客户端收到 NACK 后未丢弃旧 IP、未重新 DISCOVER,继续使用已被回收的 IP。
- IP 冲突产生:被回收的 IP 被 Server 分配给其他设备,导致两台设备同时使用同一 IP。
Resolution
- 确认 DHCP Server 行为正常——
DhcpGlobalFlagForSubnetChangeDHCPRequest = 1在此环境下是必要且正确的配置。 - 将问题聚焦到客户端侧:Linux 设备双网卡配置导致 DHCP 协议处理异常。
- 客户自行与 Linux 设备厂商/团队排查并修复客户端的 DHCP 行为(确保收到 NACK 后正确丢弃 IP 并重新 DISCOVER)。
Workaround
在客户端侧修复之前,可考虑的临时缓解措施:
- 为频繁出问题的设备配置 DHCP Reservation,避免 IP 被回收后分配给其他设备。
- 缩短 Lease 时间以减少冲突窗口期。
6. 经验教训 (Lessons Learned)
- 技术知识:
DhcpGlobalFlagForSubnetChangeDHCPRequest是一个关键的增强行为注册表开关。启用后,DHCP Server 在检测到续租请求中 Option 82 指示的子网与旧租约不一致时,会主动删除租约 + NACK,强制客户端重新获取 IP。这在 EVPN/VXLAN/多 VLAN 环境中尤为重要。- DHCP Option 82 是 Relay Agent 为 DHCP 报文添加的”地理标签”,帮助 Server 判断客户端的网络拓扑位置。当 Relay 配置或客户端位置发生变化时,Option 82 信息会随之改变。
- 排查方法:
- 处理 Duplicate IP 问题的推荐排查顺序:① Server 租约行为(是否提前删除、是否 NACK)→ ② 报文中的 Option 82 / GIADDR(是否子网漂移)→ ③ 客户端对 NACK 的处理逻辑(是否正确重新获取 IP)。
- DHCP Server 三重日志交叉验证法:Audit Log(事件时间线)+ ETL 日志(Server 内部决策逻辑)+ 网络抓包(实际报文内容),三者结合才能还原完整行为链。
- 流程改进:
- 在复杂网络(多 VLAN / EVPN / 双网卡)环境中部署 DHCP 时,应同时验证客户端对 DHCPNAK 的处理是否符合协议标准(RFC 2131)。
- 不要只盯着 DHCP Server——”Server 看上去像背锅侠,但实际上逻辑正常;真正的问题可能在客户端”是此类问题的典型场景。
- 预防措施:
- 在多 VLAN / 跨子网环境中启用
DhcpGlobalFlagForSubnetChangeDHCPRequest时,应确保所有 DHCP 客户端(尤其是非 Windows 客户端如 Linux / IoT 设备)能正确处理 DHCPNAK。 - 建议在部署前对各类客户端进行 NACK 响应测试。
- 在多 VLAN / 跨子网环境中启用
7. 参考文档 (References)
- DHCP Server Overview — Windows Server — Windows Server DHCP 服务概述
- DHCP Failover — Windows Server — DHCP 故障转移配置说明
- RFC 3046 — DHCP Relay Agent Information Option — DHCP Option 82 标准定义
Case Summary: DHCP Duplicate IP — Option 82 Subnet Mismatch Causing Premature Lease Deletion and Reassignment
Product/Service: Windows Server 2022 — DHCP Server Role
1. Symptoms
- Multiple devices in the customer’s environment reported Duplicate IP alerts — the same IP address was simultaneously in use by two different devices, causing intermittent business disruptions.
- Querying leases on the DHCP Server revealed that IP leases were being deleted before their expiration time, and the same IP was subsequently reassigned to another device.
- Example:
Get-DhcpServerv4Lease -IPAddress 10.20.232.99showed the lease at 12:11:34 with expiry ~12:23:17, but by 12:11:45 the lease had disappeared; by 12:14:51 the IP was assigned to a different device.
- Example:
- The issue occurred frequently across multiple IP ranges.
2. Background / Environment
- DHCP Server: Windows Server 2022
- DHCP Failover: Configured in Load Balance mode
- MCLT (Maximum Client Lead Time): 15 minutes
- Lease Duration: 4 hours
- Registry Setting:
DhcpGlobalFlagForSubnetChangeDHCPRequest = 1(located atHKLM\SYSTEM\CurrentControlSet\Services\DhcpServer\Parameters) - Clients: Linux devices with dual NICs on different VLANs
- Network Architecture: Cross-subnet deployment with DHCP Relay Agent forwarding; Relay configured with Option 82
3. Investigation & Troubleshooting
Round 1: Why Are Leases Disappearing Early?
- Action: Repeatedly queried the lease status for affected IPs (e.g.,
10.20.232.99) usingGet-DhcpServerv4Leaseon the DHCP Server. - Finding:
- At 12:11:34 — lease existed with expiry ~12:23:17
- At 12:11:45 — lease no longer present
- At 12:14:51 — IP assigned to a different device
- Conclusion: The DHCP Server proactively deleted the lease before its expiration and reassigned the IP. Need to determine if the client released it or the server acted on its own.
Round 2: Did the Client Release the IP?
- Action: Captured network traffic during the problem window, filtering for
DHCP Message Type = DHCP Release. - Finding:
- No DHCPRELEASE packets were observed from the client
- The IP was still active and in use on the client machine
- Conclusion: The client did not actively release the IP. The lease deletion was initiated by the DHCP Server itself.
Round 3: DHCP Audit Log Timeline Reconstruction
- Action: Reviewed DHCP audit logs to reconstruct the full event chain for affected IPs.
- Finding: A consistent pattern emerged:
- Client initially obtained an IP (Assign), MCLT of 15 minutes
- At the half-lease mark (~7.5 minutes), the client attempted to renew (DHCPREQUEST)
- DHCP Server responded with NACK (DHCPNAK)
- Shortly after, the same IP was assigned to another device
Audit log excerpt:
10,12/08/25,11:04:12,Assign,10.20.232.99,,AABBCCDD1122,... 15,12/08/25,11:11:42,NACK,10.20.232.99,,AABBCCDD1122,... 15,12/08/25,11:15:27,NACK,10.20.232.99,,AABBCCDD1122,... 15,12/08/25,11:17:19,NACK,10.20.232.99,,AABBCCDD1122,... 10,12/08/25,11:18:15,Assign,10.20.232.99,,AABBCCDD1122,... - Conclusion: Clear behavioral chain — client renew → server NACK → server delete old lease → IP reassigned. Need to investigate why the NACK was issued.
Round 4: Root Cause of NACK — Option 82 Subnet Mismatch
- Action: Analyzed network captures and DHCP Server ETL logs from the problem window, focusing on Option 82 information in DHCPREQUEST packets.
- Finding:
- The client originally obtained IP
10.20.225.63belonging to Subnet A - During renewal, the DHCPREQUEST packet’s Option 82 indicated Subnet B (
203.0.113.0) - DHCP Server ETL log clearly recorded:
REQUEST for address 10.20.225.63 REQUEST from relay agent: 198.51.100.30 Interface subnet = 198.51.100.30 Mismatch in ClientSubnet determined from request message (option 82 suboption 5 section) 203.0.113.0 and present in database 203.0.113.0 DhcpDoDynDnsCheckDelete() - Deleting record 10.20.225.63 Invalid DHCPREQUEST for 10.20.225.63 Nack'd.
- The client originally obtained IP
- Conclusion:
- The client held a Subnet A IP but its renewal request (via Option 82) claimed it was on Subnet B
- The DHCP Server detected the subnet mismatch and treated it as a “subnet change”
- With
DhcpGlobalFlagForSubnetChangeDHCPRequest = 1enabled, the server executed: delete old lease → return NACK → expect client to re-DISCOVER
Round 5: Registry Behavior Validation — By Design
- Action: Verified the DHCP Server registry setting at
HKLM\SYSTEM\CurrentControlSet\Services\DhcpServer\Parameters\DhcpGlobalFlagForSubnetChangeDHCPRequest. - Finding: Value was 1. When enabled:
- If a DHCPREQUEST’s Option 82 indicates the client has changed subnets
- The server will: locate the old lease → delete it → return NACK → force the client to re-DISCOVER
- Conclusion: The DHCP Server’s lease deletion + NACK behavior is By Design — an enhanced behavior to prevent clients from retaining old-subnet IPs after a subnet change. Server-side logic was completely correct.
Round 6: Client Failed to Properly Handle NACK
- Action: Confirmed with the customer that the client actually migrated from Subnet A to Subnet B (Scenario 2: genuine subnet change due to dual NIC configuration).
- Finding:
- Per DHCP protocol (RFC 2131): after receiving NACK, the client should discard the current IP → initiate a new DHCPDISCOVER
- Actual behavior: the client did not re-DISCOVER after receiving NACK and continued using the reclaimed IP
- Meanwhile, the server legitimately assigned the same IP to another device → Duplicate IP
- Conclusion: The root cause was on the client side — the Linux device’s dual-NIC cross-VLAN configuration caused improper handling of DHCPNAK, violating protocol expectations.
4. Blockers & How They Were Resolved
| Blocker | Impact | How It Was Resolved |
|---|---|---|
| Initial suspicion pointed at DHCP Server allocation anomaly; required multiple rounds of log analysis to identify Option 82 as the trigger | Extended troubleshooting time | Cross-validated using Audit Log + ETL logs + network capture to reconstruct the complete event chain |
| Needed to confirm whether the client truly changed subnets (Scenario 1: Relay misconfiguration vs. Scenario 2: actual client migration) | Could not determine whether Relay config or client movement was the cause | Confirmed with customer about actual network topology and client physical location changes |
| Client was a Linux device with non-standard DHCP implementation | Unable to directly debug the client’s protocol stack | After confirming client-side issue, handed off to the customer to troubleshoot with the device vendor |
5. Root Cause & Resolution
Root Cause
Duplicate IP caused by multiple contributing factors:
- Client (Linux, dual NIC) underwent subnet migration: The client moved from Subnet A to Subnet B; its renewal request’s Option 82 carried Subnet B information.
- DHCP Server responded correctly (By Design): With
DhcpGlobalFlagForSubnetChangeDHCPRequest = 1enabled, the server detected the subnet mismatch, deleted the old lease, and returned NACK — expecting the client to re-DISCOVER. - Client failed to properly handle NACK: The Linux client did not discard the old IP or re-DISCOVER after receiving NACK, continuing to use the reclaimed IP.
- IP conflict occurred: The reclaimed IP was reassigned by the server to another device, resulting in two devices using the same IP simultaneously.
Resolution
- Confirmed DHCP Server behavior was correct —
DhcpGlobalFlagForSubnetChangeDHCPRequest = 1is a necessary and proper configuration in this environment. - Focused the issue on the client side: Linux device dual-NIC configuration caused DHCP protocol handling anomalies.
- Customer engaged their Linux device vendor/team to investigate and fix the client’s DHCP behavior (ensuring proper IP discard and re-DISCOVER after receiving NACK).
Workaround
Temporary mitigations while the client-side fix was pending:
- Configure DHCP Reservations for frequently affected devices to prevent reclaimed IPs from being reassigned.
- Shorten lease duration to reduce the conflict window.
6. Lessons Learned
- Technical Knowledge:
DhcpGlobalFlagForSubnetChangeDHCPRequestis a critical enhanced-behavior registry key. When enabled, the DHCP Server proactively deletes leases and NACKs renewal requests where Option 82 indicates a subnet change. This is especially important in EVPN/VXLAN/multi-VLAN environments.- DHCP Option 82 serves as a “geographic tag” added by the Relay Agent, helping the Server determine the client’s network topology position. When the Relay configuration or client location changes, Option 82 information changes accordingly.
- Troubleshooting Methodology:
- Recommended investigation order for Duplicate IP issues: ① Server lease behavior (premature deletion? NACK?) → ② Option 82 / GIADDR in packets (subnet drift?) → ③ Client NACK handling (does it properly re-acquire an IP?).
- DHCP Server triple-log cross-validation: Audit Log (event timeline) + ETL log (server internal decision logic) + Network capture (actual packet content) — all three combined provide the complete behavioral chain.
- Process Improvement:
- When deploying DHCP in complex networks (multi-VLAN / EVPN / dual-NIC), verify that clients properly handle DHCPNAK per protocol standards (RFC 2131).
- Don’t only look at the DHCP Server — “The server appears to be the scapegoat, but its logic is correct; the real problem may be on the client side” is a classic pattern in these types of issues.
- Prevention:
- When enabling
DhcpGlobalFlagForSubnetChangeDHCPRequestin multi-VLAN / cross-subnet environments, ensure all DHCP clients (especially non-Windows clients such as Linux / IoT devices) can correctly handle DHCPNAK. - Recommend NACK response testing for various client types before deployment.
- When enabling
7. References
- DHCP Server Overview — Windows Server — Windows Server DHCP service overview
- DHCP Failover — Windows Server — DHCP failover configuration guide
- RFC 3046 — DHCP Relay Agent Information Option — DHCP Option 82 standard definition