Not a week goes by without someone talking about data centres. Even the general public now knows that data centres are the foundations of the global digital economy. Handling financial transactions, hosting databases, managing video streams, running AI... All of these operations depend on a fundamental prerequisite that is rarely mentioned: precise time management.
The slightest time deviation between any of the thousands of servers in a data centre can result in data inconsistencies and create security holes or regulatory breaches. What is at stake and what are the technical challenges? And how can robust time synchronisation be implemented in a data centre environment? Read on to find out.
Why is time synchronisation critical in a data centre?
Modern applications rely on distributed databases with nodes that are spread over several geographical sites. To guarantee consistency, these systems use precise timestamping to organise events. Google, for example, has developed a system that combines atomic clocks and GPS receivers in each of its data centres to maintain a temporal uncertainty of less than 7 ms. Amazon has adopted a similar approach with its Amazon Time Sync service, which delivers microsecond-level accuracy for its distributed databases.
Another important aspect is regulatory compliance. In finance, for example, the European Markets in Financial Instruments Directive (MiFID II), and in particular its Regulatory Technical Standard (RTS) 25, requires trading system clocks to be synchronised to UTC (Coordinated Universal Time) within a specified maximum divergence. Similarly, in the US, FINRA rules call for synchronisation to within 50 milliseconds. The data centres that host these activities therefore need an appropriate synchronisation infrastructure.
Regulatory Technical Standard (RTS) 25 of the European MiFID II Directive defines several levels of accuracy depending on the type of activity. High-frequency trading systems must be synchronised to UTC to within 100 microseconds. Other electronic trading activities have a tolerance of 1 millisecond, and manual activities 1 second. The legislation also requires each timestamp to be traceable to UTC via a documented chain, and records must be kept for five years. These requirements apply to any investment firm operating in the EU.
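As a minimal sketch, the three tolerance levels listed above can be expressed as a lookup table and a check. The activity labels and the function name `is_compliant` are illustrative assumptions, not terms taken from the regulation.

```python
from datetime import timedelta

# Maximum divergence from UTC per activity type, as listed in the text above.
# The dictionary keys are illustrative labels, not official RTS 25 wording.
MAX_DIVERGENCE = {
    "high_frequency": timedelta(microseconds=100),
    "electronic": timedelta(milliseconds=1),
    "manual": timedelta(seconds=1),
}


def is_compliant(activity: str, offset_from_utc: timedelta) -> bool:
    """Return True if the measured clock offset is within the tolerance."""
    return abs(offset_from_utc) <= MAX_DIVERGENCE[activity]


# A clock that is 150 microseconds late is acceptable for ordinary
# electronic trading but not for high-frequency trading.
measured = timedelta(microseconds=-150)
print(is_compliant("electronic", measured))
print(is_compliant("high_frequency", measured))
```

A check like this only covers the accuracy threshold; the traceability and record-keeping requirements need separate evidence.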
IT security is another major constraint for data centres. Event logging is one pillar of this security. In the event of a cyberattack, security teams must be able to reconstruct the exact timeline of events from the logs of hundreds of different machines. If the clocks are not synchronised, this reconstruction cannot be trusted, making forensic analysis and incident response more difficult.
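The effect on a timeline can be illustrated with a small sketch. The `LogEvent` type, the host names and the 0.2 second clock error are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    host: str
    timestamp: float  # seconds, as recorded by the host's own clock
    message: str


def build_timeline(events, clock_offsets=None):
    """Sort events by time, optionally correcting known per-host clock offsets.

    clock_offsets maps host -> (local clock - true time) in seconds.
    """
    offsets = clock_offsets or {}
    return sorted(events, key=lambda e: e.timestamp - offsets.get(e.host, 0.0))


events = [
    LogEvent("firewall", 100.000, "connection accepted"),
    # Hypothetical: this host's clock runs 0.2 s slow.
    LogEvent("webserver", 99.950, "login succeeded"),
]

naive = [e.host for e in build_timeline(events)]
corrected = [e.host for e in build_timeline(events, {"webserver": -0.2})]
print(naive)      # the login appears to precede the connection
print(corrected)  # the order once the clock error is accounted for
```

In a real incident the per-host offsets are usually unknown after the fact, which is why the clocks need to be synchronised before the logs are written.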
These three aspects are not the only ones that matter: time synchronisation is also essential for real-time applications such as video streaming, for high-performance computing and for other applications that cause engineers headaches.
NTP and PTP: two protocols for two levels of accuracy
NTP (Network Time Protocol) is the “historical” time synchronisation protocol. Based on a tiered hierarchical architecture, NTP synchronises the software clocks of equipment with an accuracy of around 1 millisecond on a local network, and tens of milliseconds on the internet. NTP is the de facto choice for many applications due to its simplicity and robustness.
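To make the mechanism concrete, here is a minimal sketch of the standard NTP on-wire calculation, which derives the clock offset and round-trip delay from four timestamps. The function name and the sample values are assumptions for illustration; timestamps are given in integer microseconds to keep the arithmetic exact.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Compute clock offset and round-trip delay from one NTP exchange.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Example: the client is 5 ms behind the server, each direction takes 1 ms
# and the server needs 0.5 ms to reply (all values in microseconds).
offset_us, delay_us = ntp_offset_and_delay(
    t1=10_000_000, t2=10_006_000, t3=10_006_500, t4=10_002_500
)
print(offset_us, delay_us)  # 5000.0 2000
```

The offset is only exact when the outbound and return delays are equal, which is one reason accuracy degrades over the internet.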
PTP (Precision Time Protocol) meets the needs of applications that require accuracies better than a millisecond. Unlike NTP, which typically timestamps packets in software, PTP relies on hardware timestamping in network interfaces and dedicated devices. This helps to eliminate the variable delays introduced by the software stack and achieve accuracies in the region of 1 microsecond or less.
PTP operates according to a hierarchical master-slave architecture, in which a “grandmaster” reference clock distributes the time to the entire network. The grandmaster is elected using the BMCA (Best Master Clock Algorithm).
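The election can be sketched as a comparison of the attributes each candidate clock announces, where the lowest value wins at each step. This is a deliberately simplified illustration: the full algorithm in IEEE 1588 also considers topology information such as the number of hops to the grandmaster. The class name, function name and sample values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockDataset:
    priority1: int
    clock_class: int
    clock_accuracy: int
    offset_scaled_log_variance: int
    priority2: int
    clock_identity: str  # used only as a final tie-breaker

    def rank(self):
        # Attributes are compared in this order; lower is better.
        return (
            self.priority1,
            self.clock_class,
            self.clock_accuracy,
            self.offset_scaled_log_variance,
            self.priority2,
            self.clock_identity,
        )


def elect_grandmaster(candidates):
    """Return the candidate with the best (lowest) ranking tuple."""
    return min(candidates, key=ClockDataset.rank)


# Two hypothetical time servers with equal priorities: the one with the
# lower clock class (here, the one still locked to its reference) wins.
locked = ClockDataset(128, 6, 0x21, 0x4E5D, 128, "00:11:22:ff:fe:00:00:01")
holdover = ClockDataset(128, 7, 0x21, 0x4E5D, 128, "00:11:22:ff:fe:00:00:02")
print(elect_grandmaster([holdover, locked]).clock_identity)
```

Because `priority1` is compared first, an administrator can force a preferred grandmaster regardless of clock quality by giving it a lower value.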
In practice, the two protocols often coexist within the same data centre. PTP is deployed on the segments of the network that demand higher accuracy (trading, distributed databases, telecommunications), while NTP continues to serve equipment for which an accuracy of around 1 millisecond is enough (application servers, workstations, peripherals). A high-quality time server must be able to distribute both protocols simultaneously to meet all the requirements.
The technical challenges faced by data centres
Even though they are everywhere, data centres are an environment like no other. They often have highly specific synchronisation constraints.
Typically, synchronisation messages pass through numerous switches and routers. For each device they pass through, they can be delayed unpredictably. This variation in propagation time, known as PDV (Packet Delay Variation), affects synchronisation precision. The more switches the messages pass through, the greater the impact. The strategic placement of what are known as boundary clocks considerably reduces this effect by “terminating” the PTP signal at each stage and regenerating it with a fresh timestamp.
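One common way of expressing PDV is the extra delay of each packet relative to the fastest packet observed. The sketch below follows that convention; the function name and the delay samples are hypothetical.

```python
def packet_delay_variation(delays_us):
    """Return each packet's delay in excess of the fastest packet seen."""
    floor = min(delays_us)
    return [d - floor for d in delays_us]


# Hypothetical one-way delays in microseconds across a busy switch fabric.
samples = [52, 50, 75, 51, 120]
pdv = packet_delay_variation(samples)
print(pdv)       # [2, 0, 25, 1, 70]
print(max(pdv))  # 70, the peak variation over this window
```

The packets closest to the delay floor carry the least queuing noise, which is why filtering on them is a typical way of limiting the effect of PDV.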
PTP assumes that the transit time of a packet is the same in both directions. In a data centre’s complex networks, however, the outbound and return paths can differ, creating an asymmetry that introduces a time error PTP cannot detect on its own: the estimated offset is wrong by half the difference between the two one-way delays.
A network design that is as symmetrical as possible, combined with the use of dedicated algorithms, helps to mitigate this problem.
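The size of the asymmetry error can be checked with a short simulation of a PTP delay request-response exchange. The function names and the delay figures are assumptions chosen for illustration; times are in integer nanoseconds.

```python
def ptp_offset(t1, t2, t3, t4):
    """Offset of the slave clock relative to the master, assuming symmetry.

    t1: master sends Sync        (master clock)
    t2: slave receives Sync      (slave clock)
    t3: slave sends Delay_Req    (slave clock)
    t4: master receives Delay_Req (master clock)
    """
    return ((t2 - t1) - (t4 - t3)) / 2


def simulate_exchange(true_offset_ns, delay_ms_ns, delay_sm_ns):
    """Generate the four timestamps for given one-way delays."""
    t1 = 1_000_000
    t2 = t1 + delay_ms_ns + true_offset_ns      # master-to-slave path
    t3 = t2 + 10_000                            # slave replies 10 us later
    t4 = t3 - true_offset_ns + delay_sm_ns      # slave-to-master path
    return t1, t2, t3, t4


true_offset = 500  # the slave is really 500 ns ahead of the master

symmetric = ptp_offset(*simulate_exchange(true_offset, 2_000, 2_000))
asymmetric = ptp_offset(*simulate_exchange(true_offset, 2_000, 2_600))
print(symmetric)                 # 500.0, the true offset is recovered
print(asymmetric - true_offset)  # -300.0, half of the 600 ns asymmetry
```

The protocol has no way of observing this error from the timestamps alone, so it has to be avoided by design or compensated using measured path delays.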
The majority of synchronisation algorithms were developed at a time when IT infrastructures were designed and managed differently. Nowadays, most infrastructures have been virtualised.
Virtual machines and containers do not have their own hardware clocks, but instead rely on the clocks of their physical hosts. Precise synchronisation of the host therefore becomes all the more critical, as any deviation has a knock-on effect on all of the virtual machines it hosts. Care must also be taken because these resources can be moved to another host “unexpectedly”.
One final important technical aspect is that of resilience. What happens if the time server temporarily loses its external reference source (GPS receiver failure, network outage)? This is why the quality of the time server’s internal oscillator is so important. An OCXO (Oven-Controlled Crystal Oscillator) or a rubidium oscillator will help to maintain an acceptable precision for several hours, possibly even several days, in “holdover” mode.
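A rough feel for holdover performance can be obtained from a common first-order model, in which the time error grows linearly with the oscillator's initial frequency offset and quadratically with its ageing rate. The sketch below ignores temperature effects, and the stability figures are hypothetical placeholders, not datasheet values for any particular OCXO or rubidium oscillator.

```python
SECONDS_PER_DAY = 86_400


def holdover_error_seconds(elapsed_s, freq_offset, aging_per_day=0.0):
    """Estimate accumulated time error after elapsed_s seconds in holdover.

    freq_offset:   fractional frequency error at the start of holdover
    aging_per_day: fractional frequency drift per day (linear ageing)
    """
    aging_per_s = aging_per_day / SECONDS_PER_DAY
    return freq_offset * elapsed_s + 0.5 * aging_per_s * elapsed_s**2


# Hypothetical oscillator: 1e-9 initial frequency error, 1e-10 ageing per day.
after_one_day = holdover_error_seconds(SECONDS_PER_DAY, 1e-9, 1e-10)
print(f"{after_one_day * 1e6:.2f} microseconds after 24 hours")
```

Comparing the result with the accuracy required by the application gives an estimate of how long the server can stay in holdover before the reference must be restored.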
Time synchronisation security
Time synchronisation is an often underestimated yet commonly used attack vector. Manipulating the time of a computer system can have a series of ramifications: invalidating TLS certificates, bypassing authentication systems, enabling replay attacks, and even falsifying log timestamps, which makes forensic analysis ineffective.
The NTP protocol, in its standard configuration, exchanges unencrypted, unauthenticated packets. This makes it vulnerable to several types of attack:
- Spoofing
- Distributed Denial-of-Service attack (DDoS)
- Man-in-the-Middle
The Network Time Security (NTS) protocol provides a response to these vulnerabilities. NTS is based on TLS to establish the initial connection and on an authenticated encryption mechanism to protect the NTP packets exchanged during synchronisation. This guarantees both the authenticity of the source and the integrity of the time data transmitted.
Recommended architectures: redundancy and resilience
Time availability is as critical as network or power supply availability. A robust synchronisation architecture consists of several redundancy levels:
- Never rely on a single time source. Ideally, combine several independent sources, for example a GPS receiver backed by an internal oscillator for holdover. If the satellite signal is lost (interference, hardware failure), the oscillator takes over seamlessly.
- The time server itself must be designed for high availability: dual power supply, dual Ethernet ports, redundant components.
- At network level, redundancy involves the deployment of several time servers in different zones of the data centre, with automatic failover mechanisms. For NTP, server peering techniques ensure continuity of service. For PTP, the BMCA algorithm automatically selects the best available grandmaster in the event of failure.
- Finally, a complete synchronisation architecture must include a supervision system that continuously checks the actual synchronisation precision at every point of the network. This traceability is particularly important in regulated sectors.
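The multi-source and supervision points above can be combined in a simple cross-check: compare the offset reported by each source with the median of all sources and flag any that disagree. This is a simplified illustration, not the source selection algorithm that NTP itself uses; the function name, source names and threshold are assumptions.

```python
from statistics import median


def flag_suspect_sources(offsets_us, max_disagreement_us):
    """Return the sources whose offset is too far from the median."""
    mid = median(offsets_us.values())
    return sorted(
        name
        for name, offset in offsets_us.items()
        if abs(offset - mid) > max_disagreement_us
    )


# Hypothetical offsets (in microseconds) measured against the local clock.
measurements = {
    "gnss_zone_a": 2,
    "gnss_zone_b": -1,
    "ntp_peer": 4,
    "faulty_server": 850,
}
print(flag_suspect_sources(measurements, max_disagreement_us=50))
```

A median is used rather than a mean so that a single badly wrong source cannot drag the reference value towards itself.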