I am leasing a Kimsufi dedicated server from OVH,
184.108.40.206. Since early
January 2015, TCP connections to that machine (and in
particular SSH connections) have been sporadically hanging.
Analysis of the issue
This machine is on a network whose default router
As far as I was able to determine, this is a virtual router, load-balanced using GLBP. The two actual routers are:
vss-9b-6k.fr.eu, MAC address
vss-9a-6k.fr.eu, MAC address
When attempting to establish an SSH connection from the outside
to that machine, the first data packet in the connection
appears to be dropped if sent through
This does not appear to be related to any kind of stateful firewalling system. As an experiment, I wrote a simple Scapy script that loops sending identical TCP segments, one per second, through both of the above MAC addresses, to a remote address outside OVH.
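The original script is not reproduced here, but the experiment can be sketched roughly as follows. This is only an illustration under stated assumptions: the MAC addresses, remote address, interface name, ports, flags and sequence number are all placeholders, not the values actually used. It needs Scapy installed and root privileges to send raw frames.

```python
import itertools
import time

# Placeholder values (assumptions): substitute the MAC addresses of the two
# actual routers, a remote host outside OVH, and the server's interface.
FORWARDER_MACS = ["02:00:00:00:00:01", "02:00:00:00:00:02"]
REMOTE_IP = "192.0.2.1"
IFACE = "eth0"


def mac_schedule(macs, count):
    """Return `count` destination MACs, cycling through `macs` in order."""
    return list(itertools.islice(itertools.cycle(macs), count))


def send_probes(count=6, interval=1.0):
    """Send identical TCP segments, one per `interval` seconds,
    alternating the layer-2 destination between the forwarder MACs."""
    # Imported here so the scheduling helper can be used without Scapy.
    from scapy.all import Ether, IP, TCP, sendp

    for mac in mac_schedule(FORWARDER_MACS, count):
        # Fixed ports, flags and sequence number (arbitrary choices) keep
        # every segment identical; only the Ethernet destination changes.
        frame = (
            Ether(dst=mac)
            / IP(dst=REMOTE_IP)
            / TCP(sport=40000, dport=22, flags="A", seq=1000)
        )
        sendp(frame, iface=IFACE, verbose=False)
        time.sleep(interval)


if __name__ == "__main__":
    send_probes()
```

Forcing the Ethernet destination bypasses the host's own ARP resolution of the virtual gateway, so each physical router can be exercised independently of how GLBP would otherwise assign it.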
A tcpdump on the dedicated server shows the stream of outgoing packets:
[tcpdump output not preserved]
Now observing traffic on the remote machine, we see only those packets that went through 00:07:b4:00:01:02:
[tcpdump output not preserved]
None so far. OVH has been notified of the problem (TICKET#2015010719008317) and all analysis elements in my possession have been conveyed to them, to no avail so far: the machine has been essentially unusable for the past two months and counting.
Update 2015-03-20 OVH say they identified a problem and are working on a fix. No SSH failures have been observed since the afternoon of 2015-03-18, so apparently the fix did work. Still waiting for a post-mortem explanation as to what went wrong, and why it took them so long to acknowledge, investigate, and resolve the problem.
Update 2015-04-01 Service remains stable, in that failures are not observed anymore. OVH indicates they are still discussing the underlying issue with Cisco, and the fix is not completed yet.
Update 2015-06-17 Service remains stable. OVH indicates that they have identified the origin of the issue, a fix is available, and they have scheduled its deployment.
Update 2015-09-17 At long last, OVH confirmed that the problem is indeed resolved on their side, and agreed to extend my subscription by 3 months at no cost in compensation, which is decent of them.
Thanks to Pierre Beyssac for hinting at GLBP, and to Fabian at OVH support for following up internally on this issue.