Guidelines for setting up a good NTP time server for synchronization of computer clocks; links for readers with general interest in time, time dissemination and computer time; issues specific to time dissemination in Slovenia.
Copyright © 2000 Mark Martinec, All Rights Reserved
This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
This document tries to cater primarily to the following interests:
This document tries to fill a niche which I feel was not well covered with other documents. It tries to avoid duplicating material that is well organized and available at the http://www.ntp.org/. It is based on my experience in setting up NTP servers at the J. Stefan Institute, making them better, and helping others to do the same.
The time unit second is one of the SI base units. Until 1956 the second was derived from the Earth's rotation around its axis, later from the Earth's motion around the Sun. In 1967 the definition of the second was changed from the astronomical to the atomic one, because the atom's resonance frequency is much more constant in time than the angular frequency of the Earth, the oscillation frequency of a pendulum or of a quartz oscillator. The new definition of the SI second is based on the (non-radioactive) caesium, 133Cs, whose atomic frequency was fixed at 9,192,631,770 Hz in 1967. See Cesium Atomic Clock page at USNO.
International Atomic Time -- Temps Atomique International (TAI) is calculated by the BIPM from the readings of more than 260 atomic clocks located in metrology institutes and observatories in more than 40 countries around the world. BIPM estimates that TAI does not lose or gain with respect to an imaginary perfect clock by more than about 100 nanoseconds per year.
Coordinated Universal Time (UTC) is the basis for legal time worldwide and follows TAI (see above) exactly except for an integral number of seconds, presently 33 (since 2006-01-01). These leap seconds are inserted on the advice of the International Earth Rotation Service (IERS) to ensure that, on average over the years, the Sun is overhead within 0.9 seconds of 12:00:00 UTC on the meridian of Greenwich. UTC is thus the modern successor of Greenwich Mean Time, GMT, which was used when the unit of time was the mean solar day.
Leap second: An intentional time step of one second used to adjust UTC to ensure approximate agreement with UT1. An inserted second is called a positive leap second (see page at NPL and at USNO), and an omitted second is called a negative leap second. A positive leap second used to be inserted about once every year and a half, but none was inserted between 1999-01-01 and 2006-01-01. Here is the historical list of leap seconds at USNO.
Clock: (a) A device for maintaining and displaying time. (b) A device that counts the number of seconds occurring from an arbitrary starting time. A clock needs three basic parts. First, a source of events to be counted. This source can be labeled a frequency standard, frequency source, or time interval standard. Second, a means of accumulating (counting, adding, integrating) these events or oscillations. Third, a means of displaying the accumulation of time.
Disciplined oscillator: An oscillator with a servo loop that has its phase and frequency locked to an external reference signal.
Accuracy: (a) The degree of conformity of a measured or calculated value to its definition or with respect to a standard reference. (b) The closeness of a measurement to the true value as fixed by a universally accepted standard.
Precision: (a) The degree of mutual agreement among a series of individual measurements. Precision is often, but not necessarily, expressed by the standard deviation of the measurements. (b) Random uncertainty of a measured value, expressed by the standard deviation or by a multiple of a standard deviation.
Resolution: The degree to which a measurement can be determined is called the resolution of the measurement. The smallest significant difference that can be measured with a given instrument. For example, a measurement made with a time interval counter might have a resolution of 10 ns.
Frequency stability: Statistical estimate of the frequency fluctuations of a signal over a given time interval. Long term stability usually involves measurement averages beyond 100 s, short term stability usually involves measurement averages from a few tenths of a second to 100 s.
NOTE: Generally, there is a distinction between systematic effects such as frequency drift, and stochastic frequency fluctuations. Systematic instabilities may be caused by temperature, humidity, pressure, radiation, orientation, magnetic and gravitational field, etc. Random or stochastic instabilities are typically characterized in the time domain or frequency domain. They are typically dependent on the measurement system bandwidth or on the sample time or integration time.
Frequency drift: The linear (first-order) component of a systematic change in frequency of an oscillator over time. Drift is due to ageing plus changes in the environment and other factors external to the oscillator.
Ageing (or Aging, both forms are correct): The systematic change in frequency over time because of internal changes in the oscillator. For example, a 100 kHz quartz oscillator may age until its frequency becomes 100.01 kHz. NOTE: Ageing is the frequency change with time when factors external to the oscillator such as environment and power supply are kept constant.
Jitter: clock phase variation, or time interval error (TIE), occurring at rates above 10 Hz. The jitter in the context of NTP is calculated as the exponential average of the first-order time differences.
Wander: clock phase variation, or time interval error (TIE), occurring at rates below 10 Hz. The wander in the context of NTP is calculated as the exponential average of the first-order frequency differences.
Allan Variance: The standard method of characterizing the frequency stability of oscillators in the time domain, both short and long term. The traditional variance describes the deviation of a set of observations from the mean, but is not defined for noise processes more divergent than white noise. The Allan variance is convergent for all noise processes associated with precision oscillators. It is fast and accurate in estimating the noise process, easy to compute and has a straightforward relationship to the power law spectral density types. The Allan Deviation is the square root of the Allan variance.
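As a rough illustration of the definition above, the following Python sketch estimates the Allan deviation at the basic sampling interval from a series of equally spaced relative frequency samples. It uses the common two-sample form, in which the variance is the sum of squared differences of adjacent samples divided by 2(M - 1) for M samples. The function name and the sample values are made up for this example.

```python
import math

def allan_deviation(y):
    """Two-sample (Allan) deviation of equally spaced relative
    frequency samples y, at the basic sampling interval."""
    if len(y) < 2:
        raise ValueError("need at least two samples")
    # Sum of squared first differences of adjacent frequency samples.
    total = sum((b - a) ** 2 for a, b in zip(y, y[1:]))
    variance = total / (2 * (len(y) - 1))
    return math.sqrt(variance)

# Made-up relative frequency samples, in ppm.
samples_ppm = [0.213, 0.212, 0.214, 0.211, 0.213]
print(allan_deviation(samples_ppm))
```

The result is in the same units as the input (ppm here). Evaluating it at longer sampling intervals would require averaging adjacent samples first, which this sketch does not do.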
Synchronization: The process of measuring the difference in time of two time scales such as the output signals generated by two clocks. In the context of timing, synchronization means to bring two clocks or data streams into phase so that their difference is 0.
Syntonization: Relative adjustment of two frequency sources with the purpose of canceling their frequency difference but not necessarily their phase difference. (Telecom almost always uses the word synchronization when they mean syntonization)
Epoch: Epoch signifies the beginning of an era (or event) or the reference date of a system of measurements.
Julian Day: Obtained by counting days from the starting point of noon on 1 January 4713 B.C. (Julian Day zero). One way of telling what day it is with the least possible ambiguity. (see Julian Date Converter)
Modified Julian Day (MJD) = Julian date - 2400000.5 . MJD zero is 17 November 1858 at 00:00. Examples: 1900-01-01 (NTP epoch) = MJD 15020, 1970-01-01 (Unix epoch) = MJD 40587, 2000-03-01 = MJD 51604.
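The MJD examples above can be checked with a few lines of Python. This is a minimal sketch that counts whole days from MJD zero using the standard datetime module; the function name mjd is chosen for this example only.

```python
from datetime import date

MJD_EPOCH = date(1858, 11, 17)  # MJD zero, at 00:00

def mjd(d):
    """Modified Julian Day number of a calendar date."""
    return (d - MJD_EPOCH).days

print(mjd(date(1900, 1, 1)))  # 15020 (NTP epoch)
print(mjd(date(1970, 1, 1)))  # 40587 (Unix epoch)
print(mjd(date(2000, 3, 1)))  # 51604
```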
See also:
As any clock, the computer clock (also known as system clock, kernel clock, software clock) has three components: a frequency source (e.g. a quartz oscillator), a means of accumulating timing events (clock interrupt mechanism and counter implemented in software), and means of displaying the accumulation of time -- the programming interface (e.g. system routines) for reading the clock counter.
The implementation of the computer clock in the operating system and the programming interface (API) differs between operating systems and between hardware platforms, but almost always the basic source of timing is an uncompensated (room temperature) quartz crystal oscillator (RTXO) and the clock interrupts it generates. The operating system may be able to compensate for some sources of timing errors or to provide hooks for an application program to do that.
Let us first briefly touch on three properties of clocks: resolution, precision and accuracy (see definitions in the previous section), and finally on time monotonicity.
Resolution of the computer clock is usually determined by the clock interrupt rate. The most common values range from 100 Hz to 1000 Hz, giving a resolution of 10 ms to 1 ms. That means we can't distinguish between two events that are closer than 10 ms (or 1 ms respectively) apart, and that the average time error is half of that interval. Many modern platforms make it possible to use other frequency sources in the computer to interpolate time between clock interrupts to a finer resolution. A common trick for precision timing is to use performance counters or CPU cycle counters for interpolation when available. The higher resolution obtained by interpolation is sometimes referred to as the clock granularity.
Precision (deviation or spread of measurements) is mostly determined by the short-term fluctuations in frequency of the computer clock oscillator and by measurement errors caused by the coarse clock resolution, interrupt latencies, and processor load.
Accuracy (the closeness of computer time to UTC in our case) is the systematic (not random) error in the time offset. When a typical computer clock is left free running, the major causes of its inaccuracy are the initial time offset error (in the old days we were setting the computer clock by looking at a wristwatch and typing in a command), and the average frequency offset error, causing time to move away from the true time at a steady rate. When the computer clock is disciplined to an external reference (e.g. by NTP protocol), the accuracy is orders of magnitude better and is determined by the accuracy of the external reference, by the offsets introduced by the time dissemination technique used (e.g. asymmetrical network delays or latencies in the interface to the attached reference clock), and by the ability of the software to properly adjust the computer clock. Depending on the time scale one has in mind, slow fluctuations in frequency of the computer clock oscillator can be counted as a contributor to imprecision or the inaccuracy of the computer clock.
Monotonicity means that each successive time reading will give time that is more in the future than the previous one (or at most the same, due to the coarse clock resolution). This is normally not an issue for precision clocks and hardware oscillators, but the software implementation of the computer clock makes it quite easy for time to be set backwards. Some applications and protocols may be much surprised after backward jumps of time, so time monotonicity is a very desirable property.
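Some programming environments offer a separate clock that is guaranteed not to go backwards, which sidesteps the problem for interval measurements. As an illustration only, the Python sketch below times a piece of work with the standard library's time.monotonic() instead of the wall clock; the helper name elapsed is made up for this example.

```python
import time

def elapsed(work):
    """Time a callable using the monotonic clock, which cannot go
    backwards even if the wall clock is stepped while work runs."""
    start = time.monotonic()
    work()
    return time.monotonic() - start

# The result is never negative, whatever happens to the wall clock.
print(elapsed(lambda: time.sleep(0.01)))
```

Such a clock is only useful for measuring intervals; it says nothing about the time of day, so time-stamps still depend on a well-behaved system clock.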
Example: The following diagram illustrates the concepts of accuracy, precision and resolution on a made-up example of time offset samples. Starting left to right: (1) not accurate and not precise computer clock, (2) accurate but not precise, (3) not accurate, but precise, (4) accurate and precise, (5) accurate, reasonably precise, but coarse resolution:
Figure 1.
In this section we'll deal mostly with accuracy and precision of computer time, but for some applications the resolution might be more important, and we won't be very precise (accurate?) to distinguish between the three terms here.
The (average) relative frequency (f-fnom)/fnom of RTXO ranges from a few ppm (parts per million) if you are lucky, to more than 100 ppm (about one minute per week). The physical properties of the quartz crystal exhibit a gradual change with time, resulting in a gradual cumulative frequency drift called ageing. But the more pronounced frequency drift is the short term drift, which is in large part due to changes in temperature. A figure of 1 ppm in relative frequency change per degree centigrade is typical, but the relationship is nonlinear, possibly showing hysteresis, and is different for each oscillator. Other factors influencing a quartz oscillator are power supply noise, magnetic fields, air humidity, air pressure, orientation, vibration, etc. Figures of more than 3 ppm/K and also less than 0.1 ppm/K are not uncommon. It is likely that your quartz-controlled wristwatch will keep more accurate time than your expensive computer.
This implies that a typical computer clock would lose or gain from several seconds to more than a minute per week if left free-running. That time offset error would be in addition to the initial offset error when clock was set. Whether this is acceptable or not depends on the applications you run, on your network environment and on your expectations. The good news is that there exists for many years now a good and essentially free solution to keep computer clocks in sync (and to compensate for the oscillator frequency error), called NTP (Network Time Protocol) -- but more on that later on.
The more forgiving applications regarding accurate time are electronic messaging systems (e-mail, Usenet news, WWW, ...). The protocols could tolerate large time errors, but a basic courtesy to your correspondents suggests keeping time offsets below a few minutes, perhaps down to a few seconds. To receive mail before somebody claims to have sent it means somebody is lying, right? Keeping accurate time to a few seconds on mailers enables one to see how well messages propagate and to spot the problematic mailers and network connections.
WWW, Usenet news and similar document/message distribution systems use time-stamps for expiration and document caching control. Wrong time on servers can cause message loss or inaccessibility.
Computer file systems mark significant events in the life of a file with time-stamps: file creation time, last modification time, time of the last backup, and perhaps others. Certain applications rely on these time stamps to be reasonably precise and accurate. If time stamp would happen to be in the future (as might happen if computer clock were set backwards), some applications are much surprised and may perform an incorrect action or even die. Performing a backup of recently changed files may incorrectly skip archiving some files. A revision control system or an automated software build (e.g. unix make) can easily make wrong decisions when comparing file modification times, leading to improperly built software. The same argument applies to records in a database system.
When these actions and data are within a single host, the absolute accuracy (against UTC) is often not essential, but it is important to keep time monotonically increasing and with sufficient granularity (resolution). Infrequently adjusting the clock in large steps of one second or more can lead to trouble. Stepping time backwards can be worse than having inaccurate time.
When file systems, databases and applications span across more than one system (multiprocessor hosts, clusters, network file systems) it is essential to keep time in sync among the systems for the same reasons just described. To open a file on a remote file server via NFS, AFS or SMB, and have a local clock not within a second or so from the file server's clock, calls for trouble.
Nowadays when one seldom encounters a group of computers completely isolated from the rest of the networked world, it is probably the easiest to achieve the monotonicity requirement and to keep the clock differences between computers small by synchronizing each computer to an accurate source of standard time (UTC time).
Being able to have events time-stamped in the same timescale with sufficient resolution across different computers across the Internet (log files, accounting records), on routers (IP accounting, debug information), firewalls (policy violations, times of session setups, authentications), and network analyzers (packet transmission times) allows us to analyze the sequence of events that led to a certain event or problem. Not being able to compare times of events is a major problem when analyzing computer break-ins. Accurate and precise time with adequate resolution to less than one second and even down to a few milliseconds is very useful -- think of tcpdump time-stamps for example.
A document time-stamping service is usually one of the certification agency (CA) services. It allows one to obtain proof from the CA that a given document was time-stamped (e.g. after being created, signed, or approved) at a particular date and time, and it can also provide proof of the chronological sequence of documents. The time-stamping service involves cryptographic techniques and requires an accurate, precise and traceable time source.
Now that the cryptographic techniques for authentication, information transfer and key management are becoming part of everyday life, it is important to realize that many such methods rely on time-stamping events and keys, sometimes to prevent replay attacks. When computers are out of sync with each other and with outside sources of cryptographic information (e.g. smartcards, token cards), the authentication and encrypted information transfer is likely to fail.
Event scheduling requirements for accurate time can range from undemanding to very demanding. Think for example of Unix cron and at methods of automatic program execution, automatic testing, industrial process control, etc.
Transaction systems and distributed databases rely on accurate and precise time-stamping of events. This is especially important in large commercial systems (financial or legal transactions, money transfer, stock exchanges, electronic data interchange (EDI), etc.).
Communications protocols and real-time applications (including graphical user interfaces) with their use of timeouts, controlled delays, event queues, and timers, rely on fine clock resolution, monotonicity and continuity (absence of large jumps) of time.
Even the non-demanding protocols (in the time sense) such as SMTP (mail transport) with their use of timeouts can easily be thrown off-balance when time is adjusted in large steps, unless special work-around techniques are carefully used, which is not a commonly encountered feature in today's programs.
In conclusion: computer applications rely to various degrees on different properties of computer time: its resolution, precision, accuracy and monotonicity. For the majority of common everyday computer applications, time accuracy within a second or some part of a second is a reasonable compromise between the good that it brings and its price. To keep time within a few milliseconds of UTC is nice in some troubleshooting operations and probably not too much to ask for from the more important servers (file servers, mail servers, web servers, firewalls, ...). Fine resolution of computer time is important for performance measurements, fast communications protocols and real-time applications. Accurate time to below a millisecond could be important for certain applications (e.g. transaction systems, redundant multiprocessor systems) and is opening possibilities for new applications (e.g. measuring network performance and delays, synchronizing multimedia sessions and real-time conferencing, long-baseline scientific experiments, ...).
NTP is a protocol (RFC1305) for synchronizing computer clocks across the network to standard time, and is also a program (ntp daemon with utilities) that implements the protocol and controls the computer clock. Both the protocol and the software were developed by Prof. David L. Mills from the University of Delaware, USA. Due to its high reliability and rigorous care in handling its most precious object the time, due to its open source nature and the activities in the Internet standards bodies, and due to the ease of setting it up, there are always people contributing to the project and lots more benefiting from it. This is an assurance it will remain the best method of synchronizing computer clocks over network for some time.
The ntp daemon (ntpd or xntpd) and its accompanying utilities run on a wide variety of computer platforms: there is a full range of Unix implementations, OpenVMS, Windows NT and Windows 2000 support, etc. The software is freely available and many proprietary operating systems include ntpd (or older xntpd) as part of the distribution. Most commercially available time servers are based on NTP software internally and support the NTP protocol externally, including some popular routers. Unfortunately MS Windows 95 and 98 do not provide the architecture and programming interface to adjust the system clock, which can only be set (stepped), so one has to settle for simpler and coarser methods for synchronizing time on these platforms, and syntonizing clock frequency is impossible.
NTP daemon talks NTP protocol to other NTP daemons across WAN or LAN while carefully selecting the best time sources and best measurement samples, controls the software aspect of computer clock by implementing a phase locked loop (PLL) and a frequency locked loop (FLL) control of the clock, and is able to communicate with local sources of accurate frequency or time (radio clocks, caesium or rubidium frequency references, GPS receivers, ...) with its wide range of clock drivers.
The concepts of precision control of computer clocks as introduced by NTP are influencing the development of new kernel clock models and implementations (kernel PLL, kernel PPS (pulse-per-second) support, nanokernel, ...), and new programming interface such as RFC1589 and RFC2783 -- see the last section for links.
There is a wealth of information on NTP available on the net. Check the links in the last section, the most important being the http://www.ntp.org/ .
NTP server can run on almost any Unix host (and on most Cisco routers, Windows NT, Windows 2000, OpenVMS and other platforms). But, if you have a choice, and prefer a good time server as opposed to a mediocre one, it is worth spending some effort on choosing the best platform for it.
There are two kinds of errors that contribute to inaccuracy of time as kept by a NTP server running on a typical computer:
To minimize clock jitter, longer measurement interval is needed (to average-out the random measurement errors), but to minimize local oscillator frequency instability contributions, shorter sampling interval is needed (during a shorter period of time the oscillator frequency has less chance of drifting away). The best compromise in the sampling interval is called Allan intercept. At that sampling interval the Allan variance is at its minimum.
NTP does a good job of filtering clock jitter and choosing a good sampling interval. It is our job to choose a computer, its environment, its operating system and external frequency standard to keep both types of timing errors low.
Resolution of a computer clock (or clock granularity, as it is sometimes called) is an important contribution to the clock jitter. If a computer clock is updated only 100 times per second, as used to be the case in older computers (and still is on Windows NT, unless some real-time thread tricks are used), your clock reading may be off by as much as 10 ms from the true time, and on average by half that much. Nowadays a more common figure for the clock interrupt rate is about 1000 Hz, giving an average clock reading error of 0.5 ms. Recent computer hardware and operating systems may utilize other tricks (e.g. using a processor cycle counter (like the Pentium RDTSC instruction), a bus cycle counter or performance counters) to give much better apparent or actual resolution of the computer clock: microsecond clock resolution is almost a standard now, and nanosecond clock representation with reasonable interpolation to a submicrosecond level is not that rare anymore (e.g. nanokernel is now standard on FreeBSD). To support higher clock resolutions, appropriate programming interface routines for reading/setting the computer clock must be provided. Look for keywords such as micro kernel, microtime, nanokernel, real-time kernel and similar.
Reading a computer clock is typically done by calling a system routine. The time needed for a process to read the computer clock is subject to process scheduling, paging and swapping, interruptions from other processes and hardware interrupts. Even if the computer clock were absolutely accurate and precise, you still wouldn't know when exactly the clock was read during the microsecond or two spent in the system routine. A faster processor minimizes this window of uncertainty.
Even though NTP server presents a very light load on the CPU, it is a good idea to avoid heavily loaded hosts with lots of paging/swapping and I/O activity to minimize this source of clock jitter. Even worse may happen on a truly busy host: clock interrupts might get lost or masked by higher priority interrupts, causing large disruptions in time. Although NTP software is able to compensate to some degree for the disruptions, you pay the price by getting less accurate time.
There is a simple program util/jitter.c in the NTP software distribution kit, which may help determine the clock resolution your hardware and operating system provide. Besides helping to determine the computer clock resolution, this program also helps to see periodic and random irregularities in the readings of the computer clock. A graphing program such as gnuplot, Grace or Dataplot comes in handy to visualize the collected data.
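The idea of reading the clock repeatedly and looking at the differences can also be tried without the NTP kit. The Python sketch below is not a port of util/jitter.c and its function name is made up; it only reads the system clock in a tight loop and reports the nonzero steps between successive readings. Interpreter overhead is likely to dominate on fast machines, so treat the numbers as an upper bound on granularity rather than a measurement of it.

```python
import time

def clock_steps(n=10000):
    """Read the system clock n times and return the nonzero differences
    between successive readings, in seconds."""
    readings = [time.time() for _ in range(n)]
    return [b - a for a, b in zip(readings, readings[1:]) if b != a]

steps = clock_steps()
if steps:
    print("smallest observed step: %.9f s" % min(abs(s) for s in steps))
    print("largest observed step:  %.9f s" % max(abs(s) for s in steps))
```

A large gap between the smallest and the largest step hints at scheduling or interrupt latencies; writing the steps to a file and plotting them shows any periodic pattern.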
As for the external frequency standards, there are two choices: other lower-stratum NTP servers accessible on the Internet (over LAN or WAN), and locally connected frequency references such as radio clocks, caesium or rubidium frequency standards, GPS receivers, etc. In both cases one should aim for the lowest possible jitter/noise: fast networks and low steady network delays on one hand, and low latencies and low random delays in local clock interfacing, along with a clock source with a low intrinsic jitter. We'll say more about this topic in the next section.
To summarize:
A clock of a typical computer is implemented by counting interrupts from a cheap uncompensated quartz oscillator (RTXO) -- see section Why accurate and precise computer time? for a brief discussion of its characteristics.
Assuming that replacing a quartz oscillator of a processor with a more stable one is not an option (but then again, maybe it is), there are two things one can do about it: choose a computer with more stable oscillator, and control or choose the environment in which the computer with NTP server will operate, minimizing the most pronounced cause of short-term frequency drift in RTXO: temperature changes.
To make it easier to compare oscillator behaviour of different oscillators running at different nominal frequencies (fnom, units: 1/s or Hz), we usually factor-out the nominal frequency at which the quartz oscillator of a given computer operates. By dividing the frequency offset = (factual - fnom) by the nominal oscillator frequency, we obtain a relative frequency = (factual - fnom) / fnom . It is a time-dependent, dimensionless quantity. For convenience, especially when dealing with quartz oscillators, we often show it multiplied by a million and append ppm (parts per million), but it is still a dimensionless quantity.
Relative frequency is also known as fractional or normalized frequency departure, y(t), or normalized frequency difference. It is also a change in the error of a clock's time divided by the elapsed time, t, over which the change occurred. NTP documentation and utilities usually call it frequency drift. Strictly speaking the relative frequency is a frequency measurement at a particular instant, while frequency drift is its change over a period of time.
While the average frequency offset is a major cause of inaccurate time in computers that are not synchronized to external source of accurate time, possibly leading to time offset of a minute per week, this can easily be compensated for by a NTP server.
Relative frequency as large as 500 ppm can be compensated for by NTP V4, and up to 100 ppm by NTP V3 (e.g.: 100 ppm = 100e-6 = 1e-4, and 1e-4 * 3600*24*7 s/week = 60.48 s/week). If you are unlucky enough to have a computer with greater than 100 ppm of relative frequency (not so unusual for a low cost PC), make sure you choose NTP V4!
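The arithmetic in the example above generalizes to any relative frequency. A minimal Python sketch, with a function name chosen for this example:

```python
SECONDS_PER_WEEK = 3600 * 24 * 7  # 604800

def drift_per_week(ppm):
    """Seconds gained or lost per week at a constant relative frequency."""
    return ppm * 1e-6 * SECONDS_PER_WEEK

print("%.2f s/week" % drift_per_week(100))  # 60.48 s/week
print("%.2f s/week" % drift_per_week(500))  # 302.40 s/week
```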
Although the just described large relative frequency is not a problem by itself, it may indicate generally poor quartz oscillator design or manufacturing, with possibly other problems such as frequency fluctuations due to poor power supply regulation. My (unproven) advice is to stay away from machines with high relative frequency when choosing platform for a NTP server -- they were obviously not designed for accurate timekeeping. Aim for a platform with a relative frequency below roughly 20 or perhaps 50 ppm.
It is quite easy to measure the relative frequency on the assumption that it is nearly constant during the measurement interval, which is close to truth when the absolute value of the relative frequency is large compared to its fluctuations (e.g. temperature fluctuations and ageing), where our concerns are at the moment.
One primitive way would be to set the time of a computer to known good time (e.g. with ntpdate -b some-reference-host), noting the wall-clock when this was done, then check it after a day or two with ntpdate -bq some-reference-host. The time offset reported, divided by the time interval between the two calls, gives the (average) relative frequency, which is the slope of (supposedly linearly) increasing time offsets. Just make sure no mechanism of controlling and adjusting the clock is active, such as calendar clock (hardware clock, real time clock, RTC, CMOS clock), a NTP server or some other time-keeping software.
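The division described above is simple enough to do by hand, but here it is as a Python sketch for completeness. The function name and the sample reading are hypothetical, not taken from a real measurement.

```python
def relative_frequency_ppm(offset_seconds, interval_seconds):
    """Average relative frequency, in ppm, from the time offset that
    accumulated over a free-running interval."""
    return offset_seconds / interval_seconds * 1e6

# Hypothetical reading: 8.64 s of accumulated offset after exactly two days.
print("%.1f ppm" % relative_frequency_ppm(8.64, 2 * 86400))  # 50.0 ppm
```

The sign of the result follows the sign of the reported offset, so it is worth noting which direction your tool treats as positive.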
An easier and more accurate method is to install NTP and let it synchronize to some sane but not necessarily very good NTP server. After a day or two check the contents of the file /etc/ntp.drift, or run the command ntpdc -c loop and look for the line frequency: xx.xxx ppm .
Say you have chosen a few candidate-hosts for the high-quality NTP server, having reasonable average relative frequency and satisfying criteria from the previous section Minimizing clock jitter.
Install NTP daemons on candidate hosts and synchronize them to a reasonably good existing NTP server, or better yet, give them a choice of three external NTP servers. At this point the quality of these external NTP servers is not that crucial, just make sure they appear sane and within say 100 ms of network round-trip delay and within a few ms of each other. The delay is reported by the ntpq -p utility, but the one reported by ping or traceroute is probably good enough for this purpose.
Enable NTP loop statistics in the configuration file of the NTP daemon (/etc/ntp.conf) by including options like:
statsdir /var/adm/
statistics loopstats
filegen loopstats file ntp-loopstats type week link enable
After an hour or so check with ntpq -p that the candidate hosts are synchronized to external servers, and check that the log files /var/adm/ntp-loopstats on all candidate hosts are growing (peek at their contents just to make sure). Let it all run for a couple of days.
Collect the loop log files. They should contain something like:
51673 55795.304 -0.000002000 0.213623 0.016504861 0.021785 8
51673 55819.534 -0.000611000 0.212753 0.024789345 0.018871 9
51673 55942.352 0.000017000 0.212784 0.029586178 0.016343 9
51673 56009.357 -0.000650000 0.212143 0.028697874 0.014157 9
51673 56266.352 -0.000421000 0.210541 0.028015846 0.012287 9
51673 56454.355 0.000622000 0.212280 0.024739002 0.010676 10
The first two fields show the date (Modified Julian Day) and time (seconds and fraction past UTC midnight). The next five fields show time offset (seconds), frequency offset (parts per million -- PPM), RMS jitter (seconds), Allan deviation (PPM) and clock discipline time constant (log2 of the polling interval for the selected synchronization source).
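For readers who prefer calendar dates, the first two fields can be converted with a few lines of Python. This is a minimal sketch; the function name is made up for illustration, and it relies only on the fact that MJD 0 corresponds to 1858-11-17 00:00 UTC.

```python
from datetime import datetime, timedelta

MJD_EPOCH = datetime(1858, 11, 17)  # MJD 0 is 1858-11-17 00:00 UTC


def loopstats_timestamp(mjd: int, seconds_past_midnight: float) -> datetime:
    """Convert the first two loopstats fields to a (naive, UTC) datetime."""
    return MJD_EPOCH + timedelta(days=mjd, seconds=seconds_past_midnight)


# First record of the sample log shown above.
print(loopstats_timestamp(51673, 55795.304))  # 2000-05-09 15:29:55.304000
```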
Take a graphing program such as gnuplot, Grace or Dataplot, or whatever you are comfortable with, and plot relative frequency (column 4) against time (first two fields: e.g. the formula column(1) - 51673 + column(2)/86400.0 would give you the (fractional) time when the sample was taken, in days since midnight UTC of the first day in the above example). It is a good idea to subtract the average of column 4 from each column 4 sample to be able to compare hosts with different average offsets on the same scale.
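If you would rather prepare the data with a script before plotting, something along these lines should do. It is only a sketch: the function names are hypothetical and the three embedded records are copied from the sample log above. The output is two columns (days since the first day, frequency minus its average) that any plotting program can read.

```python
SAMPLE = """\
51673 55795.304 -0.000002000 0.213623 0.016504861 0.021785 8
51673 55819.534 -0.000611000 0.212753 0.024789345 0.018871 9
51673 55942.352 0.000017000 0.212784 0.029586178 0.016343 9
"""


def parse_loopstats(text):
    """Return a list of (days_since_first_day, frequency_ppm) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        rows.append((int(fields[0]), float(fields[1]), float(fields[3])))
    if not rows:
        return []
    first_day = rows[0][0]
    return [(mjd - first_day + secs / 86400.0, freq) for mjd, secs, freq in rows]


def centre(points):
    """Subtract the average frequency so that hosts can share one scale."""
    mean = sum(freq for _, freq in points) / len(points)
    return [(t, freq - mean) for t, freq in points]


points = centre(parse_loopstats(SAMPLE))
for t, freq in points:
    print(f"{t:.5f} {freq:+.6f}")
```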
Now compare these graphs, looking for the following features: the amount of frequency fluctuations on each host, the suddenness of frequency changes (first derivative), the abrupt jumps in frequency.
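The suddenness of frequency changes can also be estimated numerically, as the slope between consecutive samples. The sketch below assumes the samples are (time in days, frequency in ppm) pairs sorted by time; the function names and the default limit of 1 ppm per hour are illustrative choices, not anything ntpd itself reports.

```python
def frequency_slopes(points):
    """points: list of (time_in_days, frequency_ppm), sorted by time.

    Returns a list of (time_in_days, slope_in_ppm_per_hour).
    """
    slopes = []
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        hours = (t1 - t0) * 24.0
        if hours > 0:  # ignore duplicated or out-of-order timestamps
            slopes.append((t1, (f1 - f0) / hours))
    return slopes


def fast_changes(points, limit_ppm_per_hour=1.0):
    """Return the samples where frequency changed faster than the limit."""
    return [(t, s) for t, s in frequency_slopes(points)
            if abs(s) >= limit_ppm_per_hour]
```

An empty result from fast_changes() over several days of data suggests a smooth, well-behaved loop frequency.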
NTP loop frequency graph gives a good overall impression on the behaviour of a NTP server. The relative frequency average value is not critical (this is why we suggested subtracting the average) and neither are slow changes in frequency (e.g. due to slow thermal drift or ageing of a quartz oscillator), both are easily compensated for by NTP.
Fast changes in frequency on the other hand, of say 1 ppm per hour or more, indicate the corresponding large changes in the phase (time offset). Here are some of the possible reasons for this behaviour:
Examples: the following diagrams show how the relative frequency changes through a period of several days. In all cases the average relative frequency for the period of experiment was subtracted from the samples to get it floating around zero.
The first diagram shows some well-behaved NTP servers. The blue trace is our stratum-1 server with a GPS clock and a small PLL time constant (causing fast response to frequency changes while chasing the air-conditioning cycles), the remaining traces show some other well-behaved stratum-2 and stratum-3 servers in Slovenia. Note the relative frequency does not change by more than 1 ppm during several days and that the changes are smooth:
Figure 2.
The following diagram shows the behaviour of three other stratum-2 servers, synchronized over WAN to their references. Note the Y-scale changed by an order of magnitude compared to the previous diagram. The frequency changes are larger, but ntpd still seems to be in control of the situation:
Figure 3.
The last one in this section is an example of a public stratum-2 server with serious problems, probably due to heavy CPU load. With such sudden changes in frequency compensation (upper trace in red), the resulting time offset errors (lower blue trace) routinely exceed +/- 40 ms, and +/- 80 ms on occasion:
Figure 4.
The right-hand side of the diagram starting at day 3.5 gives a nice illustration of the relationship between the frequency error and the resulting time offset. As the time is an integral of frequency, the slope (first derivative) of the time offset error is proportional to the frequency offset.
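To get a feeling for the magnitudes involved, the integration can be done by hand or with a few lines of code. This is a rough sketch assuming the uncompensated frequency error stays constant within each sampling interval; the function name and the numbers are hypothetical.

```python
def accumulated_offset_ms(freq_error_ppm, interval_s):
    """Integrate piecewise-constant frequency errors over time.

    freq_error_ppm -- uncompensated frequency error per interval, in ppm
    interval_s     -- length of each interval, in seconds
    Returns the running time offset after each interval, in milliseconds.
    """
    offset_s = 0.0
    offsets = []
    for error in freq_error_ppm:
        offset_s += error * 1e-6 * interval_s
        offsets.append(offset_s * 1e3)
    return offsets


# A 10 ppm error left uncompensated for one hour (six 10-minute samples)
# accumulates to roughly 36 ms of time offset.
print(accumulated_offset_ms([10.0] * 6, 600)[-1])
```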
NOTE: the relative frequency samples (the so called "loop frequency") were obtained by calling ntpdc -c loop every 10 minutes for each host involved. The time offset in the last example was measured against a nearby GPS-synchronized NTP server.
Choosing a computer with better thermal stability of its quartz oscillator would help to minimize frequency fluctuations and the resulting time offset errors. You could either coarsely measure the frequency change per degree centigrade of each candidate computer directly, or, if several computers share the same environment (room), you could just compare them among themselves and not worry about the absolute figures.
When measuring the temperature dependency of the oscillator frequency one has to keep in mind that the dependency is non-linear, that it depends on other factors such as air humidity, air pressure, vibration, orientation, magnetic fields and changes in power supply, and that it changes over a longer period due to ageing. So we'll have to be content with a rough estimate over a small range of expected room temperatures.
Also to be kept in mind is that it takes some time for a change in room temperature to result in a new stable temperature of the oscillator inside the computer chassis, and that it takes further time for the NTP daemon to compensate for this change (setting server maxpoll to 6 can shorten this time somewhat, but don't forget to remove this setting after the experiment is over).
The following procedure will probably do for our purpose:
The sign of the resulting figure is interesting but not important for our purpose, but its absolute value can help you choose a better platform among candidates. A figure of 1 ppm/K is typical for a RTXO quartz oscillator, a figure below say 0.2 ppm/K indicates a very good choice, a figure above 2 ppm/K is not unusual for a cheap PC platform and the higher it is, the more unstable your NTP server will be.
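As a rough illustration of the figure discussed here, the sketch below assumes the temperature coefficient is taken simply as the change in settled loop frequency divided by the change in settled temperature between two steady states. The function name and all numbers are hypothetical.

```python
def temperature_coefficient(freq_before_ppm, freq_after_ppm,
                            temp_before_c, temp_after_c):
    """Coarse temperature coefficient in ppm/K between two steady states."""
    delta_t = temp_after_c - temp_before_c
    if delta_t == 0:
        raise ValueError("the two temperatures must differ")
    return (freq_after_ppm - freq_before_ppm) / delta_t


# Hypothetical readings: the loop frequency settled at 12.4 ppm at 21.0 C
# and at 15.4 ppm at 23.5 C.
coefficient = temperature_coefficient(12.4, 15.4, 21.0, 23.5)
print(f"{abs(coefficient):.2f} ppm/K")
```

A difference in degrees centigrade is numerically the same as a difference in kelvins, so room thermometer readings can be used directly.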
Example: The following diagram shows the behaviour of two similar NTP V4 stratum-2 servers located close together in the same air-conditioned room. Their reference stratum-1 servers are accessed over WAN (about 50 ms round-trip delay), resulting in a typical polling interval of 1024 seconds and consequently large PLL time constant. The quartz oscillator of host P has a relatively large temperature dependency: judging from the diagram the temperature coefficient is about +1.2 ppm / K. The other host K has almost 20 times lower temperature coefficient (perhaps only apparently due to a better chassis design) and is shown as a reference.
Despite air-conditioning the temperature changes near the computer cabinets show daily peak-to-peak range of almost three degrees centigrade, the temperature excursion on day 3 at noon (time = 3.5) was due to other reasons. Fortunately the temperature changes are gradual and quite smooth -- no direct airflow from the air-conditioning equipment was hitting the computer cabinets.
Figure 5.
One can clearly observe the frequency changes (green traces on the top set of three traces) as a result of temperature fluctuations (the fuzzy red trace). As a consequence of temperature decrease (consider diagram starting at about t=1.7 and focus on host P shown in dark green and dark blue traces) the quartz oscillator frequency decreases (not shown) -- the computer clock gets slow, causing time to fall behind the reference time (dark blue trace goes positive: ref_time - our_time > 0), until ntpd starts compensating for the new relative frequency by changing the frequency drift compensation (dark green trace starts to fall). The lag between the temperature change and subsequent frequency drift compensation change is caused by large PLL time constant -- the lag can be estimated from the diagram to be about three hours at this polling interval.
The example clearly demonstrates that from +/- 5 ms up to 30 ms of time offset error in this example is a direct consequence of relatively large temperature coefficient of a quartz crystal coupled with temperature fluctuations in the computer room, augmented by a relatively large PLL time constant, typical for a WAN-synchronized stratum-2 server.
The other important issue is that the time offset error is proportional to the rate of change (first derivative) of the temperature: the frequency compensation reaction time (PLL time constant) stays the same, so the difference between the actual and predicted frequency increases as the rate of frequency change increases. ntpd reacts to steep frequency changes by lowering the PLL time constant, but this did not happen in our case.
NOTE: the time offset was measured against nearby GPS stratum-1 server (some random noise and an occasional missing sample in time offsets are due to this measurement method). The frequency drift compensation values were obtained by ntpdc -c loop, the temperature was sampled by a DS1820 temperature sensor near the computer cabinet, connected to the RS-232 interface via a low-cost adapter (DS9097U-S09).
Choosing a good reference for your NTP server(s) is as important as choosing a good platform. The time your NTP server will provide can only be as good as its reference sources. The choice of the reference affects the jitter (the random part of the differences between the reference time and local computer time), and can contribute to a systematic error in time offset against UTC.
The reference clock can be a local frequency reference or time reference (caesium beam tube, hydrogen maser frequency standard (aiming high!), rubidium gas cell frequency reference, GPS clock receiver, radionavigation receiver, LF or HF radio clock, ...), or another NTP server, accessible over LAN or WAN. Much of the general consideration applies to both.
We won't say much about the local sources of time and frequency, as it all depends on what is available, how much one is willing to pay for, on its interfacing to the computer, etc. More on that is available in The NTP FAQ and HOWTO. Some magic words to look for are: pulse-per-second (PPS) signal availability from the clock and its accuracy, PPS kernel support (see links section at the end of this document), serial-line (or parallel printer port) driver low-latency time-stamping support, and GPS receivers suitable to receive time (forget about NMEA-only solution). A nice alternative to PPS hassles and time-stamping of events on a serial line by a computer is an approach taken by the Palisade or Acutime2000 GPS receiver from Trimble Navigation (I'm just a happy user, no other ties to the manufacturer!), where a GPS receiver does the time-stamping of events generated by a computer.
When choosing other NTP servers or peers as our reference clocks, we need to consider two sources of time errors:
NTP was designed with LAN and WAN characteristics in mind. It can filter out much of the jitter caused by variations and bursts in the network load, by long and changing round-trip delays, temporary network congestions/outages and the like. NTP V4 (ntpd) is even better in this regard than NTP V3 (xntpd) and is therefore more suited to synchronization over WAN (talking about the version of our local NTP client, the version of the remote NTP server does not matter).
Nevertheless, NTP will give better time estimate by choosing a closer NTP reference server (as measured by round-trip delay), and by choosing faster and less congested network routes. Keep in mind that it is far more important to have more constant and predictable round-trip delays over most of the day, than to minimize the absolute round-trip delay. Positioning your NTP server on a firewall host just to gain a millisecond or two does not offset the benefit of a possibly more stable internal machine.
Just to give an idea: round trip delays of 50 to 70 ms are quite acceptable and can routinely keep your time offset below few milliseconds (especially with NTP V4), provided your platform conforms to the selection criteria as explained in the previous section.
NTP assumes the network path is symmetric: that half of the round-trip delay is spent in the outgoing direction, and half of it for the reply. This is not always entirely true and can introduce systematic time offset. In practice the errors caused by network jitter are larger than the systematic offset, so this is normally not a cause of great concern. Check your routing paths if in doubt -- sending traffic across a surface link and receiving it over a satellite will definitely cause systematic time offsets! IP over cable-TV networks, ADSL links or split-path routes might be a problem as well.
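The effect of an asymmetric path can be shown with the standard NTP offset and delay calculation from the four packet timestamps. The sketch below is a simplified simulation: the helper simulate() and the delay figures are made up for illustration, and the server is assumed to reply instantly unless told otherwise.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Offset and round-trip delay as computed by an NTP client.

    t1 -- request sent (client clock)    t2 -- request received (server clock)
    t3 -- reply sent (server clock)      t4 -- reply received (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate(true_offset, out_delay, back_delay, server_hold=0.0):
    """Build the four timestamps for given one-way delays (hypothetical helper)."""
    t1 = 0.0
    t2 = t1 + out_delay + true_offset  # server clock = client clock + true_offset
    t3 = t2 + server_hold
    t4 = t1 + out_delay + server_hold + back_delay
    return ntp_offset_delay(t1, t2, t3, t4)


# Symmetric path: the true offset of 5 ms is recovered.
print(simulate(0.005, 0.030, 0.030))
# Surface link out (20 ms), satellite link back (300 ms), clocks in agreement:
# the client sees a false offset of (20 - 300) / 2 = -140 ms.
print(simulate(0.0, 0.020, 0.300))
```

The systematic error equals half the difference between the outgoing and the returning one-way delay, and it cannot be detected from the round-trip delay alone.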
Example: the following scatter diagrams show the dependency of time offset to network round-trip delay for samples obtained by a NTP server with a local GPS clock, located in Slovenia/Europe. Data were obtained from the ntpd's peerstats log file at the server goodtime.ijs.si and collected over a one month period.
Figure 6.
There are six well-behaved European stratum-1 servers shown in the diagram (round-trip delays range from 40 to about 70 ms) and one well-behaved server in the USA (round-trip delay 150 ms and more). The most accurate time offsets come from the samples with the minimum network delay to a given server. As the delay for a given sample gets larger than the minimum possible for a given network path, the random network delays creep in. There is usually no relationship between the random part of the delay of the packet sent and the random part of the delay of the response packet received, so the samples form a wedge on a scatter diagram.
ntpd prefers the samples with the smallest delay from the set of last eight samples obtained for each reference server, effectively choosing the apex of each wedge.
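The idea of preferring the minimum-delay sample can be sketched as below. This is a deliberately simplified model, not ntpd's actual clock filter, which also takes the age and dispersion of the samples into account; the class name is hypothetical.

```python
from collections import deque


class MinDelayFilter:
    """Keep the last `size` (offset, delay) samples and prefer the one
    with the smallest round-trip delay."""

    def __init__(self, size=8):
        self.samples = deque(maxlen=size)  # the oldest sample drops out

    def add(self, offset, delay):
        self.samples.append((delay, offset))

    def best(self):
        """Return (offset, delay) of the minimum-delay sample, or None."""
        if not self.samples:
            return None
        delay, offset = min(self.samples)
        return offset, delay


f = MinDelayFilter()
for offset, delay in [(0.004, 0.070), (0.001, 0.045), (0.009, 0.120)]:
    f.add(offset, delay)
print(f.best())  # (0.001, 0.045)
```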
Now take a look at the close-up of the right-hand side of the above diagram, showing only the samples for the remote NTP server time-B.timefreq.bldrdoc.gov, located at the NIST Boulder Laboratories, Boulder, Colorado, USA.
Figure 7.
A distant transatlantic reference server and a long data-collection period were chosen purposely to illustrate how a complex network path (about 14 hops one-way) can change over time and how the systematic errors in time offset creep in. Changes in the international network topology are something one does not have much influence on. With some imagination one can see eight distinct wedges in the diagram, the result of at least eight path changes during the one month period. Even if random errors were completely filtered out, the systematic offset errors of up to 5 milliseconds in our case (the apexes of the wedges) would determine the accuracy of the local clock.
Assessing the quality of the remote NTP server (or potential peer) is more difficult than choosing a local computer platform, since it does not sit on your desk and you have limited insight into its characteristics and management. Yet, much can be inferred if it is willing to respond to NTP control messages, and indeed much of what has been said about choosing a local platform applies almost directly to choosing a remote reference server. We'll mostly explain the differences and assume you have read the section Choosing a platform for a NTP server.
Checking relative frequency of a remote server is as easy as checking it at the local NTP server: use the command ntpdc -c loop remote-server and look for frequency: xx.xxx ppm, or use the command ntpq -c rv remote-server and look for frequency=xx.xxx. I would shy away from servers with absolute values of (roughly) 50 ppm or more.
Checking for frequency fluctuations involves a bit more work since you can't just collect the loopstats log files from the remote server (or perhaps you can, if you have a friend there). It involves periodically reading the relative frequency (as just described), extracting the frequency value and logging it to a file, together with the local time when the sample was taken. A small cron script can do the job, or look into the scripts subdirectory of a NTP source distribution. Do not be too intrusive poking around someone else's NTP servers -- limit the sample rate to no more than one sample every 15 minutes or so, limit the duration of the measurement to a few days, and perhaps notify the remote manager, explaining what you are doing and asking for permission. It is essential to plot the changes of the relative frequency through time -- a single inspection can't tell you anything about possible serious disruptions in frequency due to missed clock interrupts or computer environment problems.
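A small logging script of this kind might look like the sketch below. It assumes the ntpq utility is installed and that the remote server answers ntpq -c rv with a frequency=xx.xxx variable; the function names, the log line layout and the log file path are hypothetical.

```python
import re
import subprocess
import time

# Matches e.g. "frequency=-12.345" in the output of "ntpq -c rv host".
FREQ_RE = re.compile(r"frequency=([+-]?\d+(?:\.\d+)?)")


def parse_frequency(rv_output):
    """Extract the frequency value (ppm) from ntpq output, or None."""
    match = FREQ_RE.search(rv_output)
    return float(match.group(1)) if match else None


def sample_remote(host, timeout=20):
    """Ask a remote server for its system variables and return its frequency."""
    result = subprocess.run(["ntpq", "-c", "rv", host],
                            capture_output=True, text=True, timeout=timeout)
    return parse_frequency(result.stdout)


def log_sample(host, path):
    """Append 'unix_time host frequency' to a log file (hypothetical layout)."""
    freq = sample_remote(host)
    if freq is not None:
        with open(path, "a") as log:
            log.write(f"{time.time():.0f} {host} {freq:.3f}\n")
```

Calling log_sample("remote-server", "/var/tmp/remote-freq.log") from cron every 15 minutes or less often would stay within the polite limits suggested above.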
It may happen that the reported relative frequency of the remote server is always zero. This can either be good, indicating that its computer clock is controlled in some other way by a frequency standard, or it may be bad, indicating a serious problem in the remote server. Periodically checking for time offsets against some other NTP server will tell which is which: constantly rising (or falling) offsets which are reset to zero from time to time are a telltale sign of a seriously broken server.
Another thing to look for at remote NTP server is its configuration as reported by ntpq -p remote-server or ntpdc -p remote-server. It will report the local reference clocks for stratum-1 servers (type, offset and jitter), from which one can infer what can be expected from that server. It will also report other NTP servers if configured. For a stratum-1 server it is customary to have some other stratum-1 servers configured as peers, as a way to compare times and to provide some degree of backup, should the local reference clock fail. But don't expect NTP servers of some major national standards laboratories to trust somebody else to back up their time!
Checking servers (peers) of a remote stratum-2 server is much more important than checking peers of a stratum-1 server which has its own reference clock -- after all they are the only source of time that server has. Look for their reachability (octal value 377 is perfect, a zero bit here and there is acceptable, reachability 0 (or stratum 16) means unreachable), the number of reachable servers (three as a bare minimum, four or more is recommended), the offsets to them (below a few milliseconds for the currently valid sources), round-trip delays to them (below 50 ms is nice, 100 ms could still be useful). Several configured but unreachable servers possibly indicate that this server has been neglected for a long time. When lots of reference servers are configured, it probably indicates that this server's main purpose is checking other NTP servers and not disseminating time.
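The reachability value is an eight-bit shift register printed in octal, with the most recent poll in the rightmost bit. A few lines of Python make it easier to read; the function name is hypothetical.

```python
def decode_reach(reach_octal: str):
    """Decode the 'reach' column of ntpq -p.

    Returns (number_of_answered_polls, bit_string), where the rightmost
    bit of the string corresponds to the most recent poll.
    """
    value = int(reach_octal, 8) & 0xFF
    bits = format(value, "08b")
    return bits.count("1"), bits


print(decode_reach("377"))  # (8, '11111111'): all of the last eight polls answered
print(decode_reach("376"))  # (7, '11111110'): only the most recent poll was missed
print(decode_reach("0"))    # (0, '00000000'): unreachable
```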
Still more information can be provided by other commands in the ntpq and ntpdc utilities and by other means, but we have covered the most important indicators of a stable, sane and well-configured remote NTP server.
If the remote NTP server does not respond to NTP control messages (does not react to ntpq or ntpdc queries -- but make sure your own firewall is not blocking UDP communication between local high port and remote port 123), you are cut off from the most important indicators of its behaviour and settings. You can still configure it as your reference server, together with a couple of other servers, and periodically compare offsets to each of them, but this is a poor substitute for seeing its configuration and loop behaviour. Asking its manager or perhaps checking for their web page is another option, and believing the information you get is up to you.
NTP uses a sophisticated two-stage algorithm to reject broken time sources and to dynamically pick the best from the set of available ones. An obvious reason for having more than one or two external reference time sources accessible over diverse network paths is the redundancy it provides, but this is not the only reason. Statistical methods NTP uses are meaningless and powerless if you don't give it sufficient freedom to choose from several time sources -- after all, one does not apply statistical methods to one or two observations. Four reference servers is the recommended minimum. Consider: with three servers configured and one of them fails, which of the remaining two would you believe? Generally: to protect against n bad references (malicious or just simply broken), you need 3*n+1 servers altogether.
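The 3*n+1 rule quoted above is easy to turn around to see how many bad references a given configuration can survive. The function names below are made up for illustration.

```python
def servers_needed(bad_references: int) -> int:
    """Servers required to protect against the given number of bad references."""
    return 3 * bad_references + 1


def tolerated_bad(servers: int) -> int:
    """Number of bad references a configuration of this size can protect against."""
    return max(0, (servers - 1) // 3)


for count in (1, 2, 3, 4, 7):
    print(count, "servers configured ->", tolerated_bad(count), "bad tolerated")
```

By this rule three servers protect against no bad reference at all, while four protect against one, which is why four is the recommended minimum.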
See also examples at the end of the section Frequency fluctuations (drift), they apply very much to the selection of the reference NTP servers as well.
Example: The following example wraps up the section Selecting lower-stratum NTP servers to synchronize to. By choosing a couple of relatively close stratum 2 servers of quite different quality (less than 13 ms round-trip delay), plus two primary NTP servers for comparison, one very close (goodtime.ijs.si), the other relatively far away (ntp1.ptb.de), we can demonstrate on a single scatter diagram both the quality of the server itself, plus a jitter introduced by network:
Figure 8.
The vertical spread of the apex of each wedge shows the instability of the server itself (relative to our measuring host), the horizontal rightward spread of each wedge shows the dispersion of random network delays, the horizontal position of each apex shows the network distance of each NTP server from our host.
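The same three quantities can be estimated numerically from a peerstats log. The sketch below assumes the usual peerstats field order (day, seconds, peer address, status, offset, delay, ...); the function name, the 1 ms margin that defines "the apex" and the sample records are hypothetical.

```python
from collections import defaultdict

SAMPLE = """\
51700 100.0 192.0.2.1 9614 0.001000 0.040000 0.001 0.0005
51700 200.0 192.0.2.1 9614 0.003000 0.040500 0.001 0.0005
51700 300.0 192.0.2.1 9614 0.010000 0.060000 0.001 0.0005
51700 100.0 192.0.2.2 9414 -0.002000 0.150000 0.001 0.0005
"""


def wedge_summary(peerstats_text, margin_s=0.001):
    """Summarize each peer's wedge: apex position, apex spread, delay spread."""
    by_peer = defaultdict(list)
    for line in peerstats_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        by_peer[fields[2]].append((float(fields[5]), float(fields[4])))
    summary = {}
    for peer, samples in by_peer.items():
        delays = [delay for delay, _ in samples]
        min_delay = min(delays)
        # Offsets of the samples lying within margin_s of the minimum delay.
        apex = [off for delay, off in samples if delay <= min_delay + margin_s]
        summary[peer] = {
            "min_delay": min_delay,                         # network distance
            "apex_offset_spread": max(apex) - min(apex),    # server instability
            "delay_spread": max(delays) - min_delay,        # random network delay
        }
    return summary


for peer, stats in wedge_summary(SAMPLE).items():
    print(peer, stats)
```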
The next diagram is just a close-up of the same:
Figure 9.
So, which servers would you choose as your reference servers, and from which would you shy away? Looking at the same servers from a different point in the network, the answer would probably be different.
NOTE: the data was collected from the peerstats NTP log file for a period of one week from the stratum-2 server time.ijs.si, which is nicely stable and very close to our primary NTP server. The stability of some of the servers shown has been improved after the measurement was made, so the current situation could be different.
From the evidence I could collect, it appears the awareness of the NTP in the Slovenian computing community dates back to the end of 1993, one year after the NTP V3 specifications appeared (RFC1305), when one stratum-2 server was set up at the academic network provider Arnes (host kanin.arnes.si running Unix, set up by Tomaz Borstnar), and three stratum-2 servers were set up at the computing centre of the University of Ljubljana (VAX hosts uek1.uni-lj.si, uek2 and ueksa, set up by Rok Vidmar).
In the first days of January 1994 another stratum-2 was set up by Primoz Peterlin on a HP-UX platform biofiz.mf.uni-lj.si at the Institute of Biophysics, Faculty of Medicine, University of Ljubljana. This later became the first Slovenian public stratum-2 NTP server. For the record: the original posting to the NOTES conference system from Primoz Peterlin, dated January 30, 1994 (in Slovenian language), and a posting from December 13, 1994 by Iztok Umek with section 4.8 dedicated to NTP servers in Slovenia at that time. This server had been retired from the list of public servers in January 2006.
At the "Jozef Stefan" Institute in Ljubljana two stratum-3 servers were set up on August 1, 1994 (a Convex supercomputer marvin.ijs.si running Unix, and a VAX/VMS host cathy.ijs.si), along with three internal stratum-4 hosts available for synchronization to other computers at the Institute.
Another early bird is a public stratum-2 server at the Hydrometeorological Institute of Slovenia (now Agencija RS za okolje): host hmljhp.rzs-hm.si was set up by Metod Kozelj in August 1995 (since 2001-07-20 replaced by calvus.rzs-hm.si).
Soon followed a stratum-3 server on a VAX/VMS host vedana.mzt.si at the Ministry of Science and Technology of Slovenia (now MVSZT).
A stratum-2 server with an independent Internet link to primary NTP servers was set up and made available for synchronization to Slovenian NTP peers by a commercial ISP K2.net around that time.
Three stratum-2 servers were set up in January 1997 at the University of Maribor, along with at least three internal stratum-3 servers.
The "Jozef Stefan" Institute set up a stratum-2 server time.ijs.si, and on May 29, 1998 made it available as a public server.
In February 2000 Arnes reviewed the configuration of their NTP servers. Four stratum-2 servers were set up (replacing their previous set of ad-hoc servers) and all routers in the Slovenian academic network are now synchronized to them. Service is extended to their customers and others interested. Two of these four servers are public servers, they appeared on the list of public servers on June 2, 2000.
During all these years most of the Slovenian stratum-2 servers were peered with each other, providing a common stratum-2 layer for most of the stratum-3 NTP servers in Slovenia and abroad.
May 10, 2000 is the official date when the first primary (stratum-1) NTP server in Slovenia went into regular operation at the J. Stefan Institute, following a month of experimental use and fine-tuning. (It was included in the updated list of public servers on May 25, 2000). goodtime.ijs.si is a Unix workstation synchronized to a GPS receiver in time-only solutions mode.
Since January 2009 the Slovenian Institute of Quality and Metrology (SIQ) has been offering access to their national time reference through NTP at ntp.siq.si.
Several other organizations and ISPs in Slovenia are now offering NTP service to their internal computers and to their customers, although the quality of this service is often not checked regularly and too often neglected. There is plenty of room left for improvement, even without the addition of primary clock sources: better selection of platform and its environment, pre-evaluation and selection of trustworthy stratum-1 servers, continuous monitoring of NTP server performance, etc.
I'll be glad to add additional references and correct or augment the above information. Please send me mail.
A very incomplete list of primary sources of time in Slovenia:
More information is welcome, please send me mail.
A list of world-wide public NTP servers is maintained by the NTP Public Services Project and is available at http://ntp.isc.org/Servers/. Below are the Slovenian entries from that list, along with some unlisted entries with restricted access.
Some organizations in Slovenia dedicate certain NTP servers for use only by their users, or make them available with other restrictions. Guidelines for setting up clients may be provided in the referenced documents. Some of these are:
From the charter of the mailing list slo-time@ijs.si :
A meeting place for people concerned with distribution of accurate time in Slovenia, in particular the NTP (Network Time Protocol) server administrators and keepers of primary clocks in Slovenia. Themes include legal time (e.g. time zones, daylight saving time), calendars, and software dealing with time and calendar.
See mailing list manager page at http://mailman.ijs.si/listinfo/slo-time, with archives available at http://mailman.ijs.si/pipermail/slo-time/ .
Document initially created on 2000-03-21 by
Mark Martinec
Updated on: 2000-03-22, 2000-03-23, 2000-03-27 (more links, minor updates),
2000-04-05 (links restructured), 2000-04-13 (new section: Why accurate
and precise computer time?), 2000-04-14 (new sections: The properties
of a computer clock; NTP, a Network Time Protocol and the program),
2000-05-10 (new primary NTP server goodtime.ijs.si announced,
new section: Choosing a platform for a NTP server),
2000-05-18 (new section: Private NTP servers in Slovenia, with link
to Arnes page with instructions for their users),
2000-05-24 (completed section: Choosing a platform for a NTP server,
new section: Selecting lower-stratum NTP servers to synchronize to),
2000-05-30 (links to two new public st-2 servers at Arnes),
2000-06-12 (temperature dependency example diagram),
2000-06-13 (text cleanup, spelling checked),
2000-06-14 (three new diagrams illustrating loop frequency behavior),
2000-06-15 (added scatter diagrams to the section: Jitter and systematic
offsets introduced by network; more text cleanup, rewritten some paragraphs),
2000-06-19 (added diagram: accuracy / precision / resolution),
2000-06-21 (added scatter diagram for Slovenian st-2 servers),
2000-07-03 (officially announced this document in comp.protocols.time.ntp),
2000-07-10 (a couple of links added),
2000-09-21 (minor text cleanup), 2000-10-05 (added link
to Implementation of the ISO 8601 Standard Around The World),
2000-10-17 (two more links on Daylight Saving Time),
2000-11-14 (added references to private Slovenian NTP/GPS servers
at Center vlade RS za informatiko and Ljubljana Airport),
2000-11-17 (added several references to ITU-R TF Series
Recommendations in force),
2000-11-27 (added a couple of new links),
2000-12-06 (added link to Secure Network Time Protocol),
2000-12-12 (changed text of IJS public time server entries),
2000-12-18 (updated link to Bancomm's reorganized site - thanks
to Ruedi Aschwanden),
2000-12-28 (fixed a typo, added a link to Index to NTP usage),
2001-02-13 (two new links),
2001-03-12 (fixed links to reorganized Agilent site),
2001-04-17 (added a link to Low-frequency radio time signals),
2001-05-10 (a few clarifications),
2001-05-15 (added some links),
2001-06-18 (added reference to RFC1165 and a white paper from
Cisco Systems: Essential IOS Features Every ISP Should Consider),
2001-07-23 (more links),
2001-07-24 (updated info on two Slovenian public NTP servers:
biofiz.mf.uni-lj.si and calvus.rzs-hm.si),
2001-09-04 (add link to Sun Blueprints article on NTP
by D. Deeths and G. Brunette),
2001-09-20 (stressed the importance of having 4 or more reference servers),
2001-09-26 (added NTP's definition of jitter and wander),
2002-01-18 (fixed links to ISO 8601),
2002-02-06 (fixed links to Peter Meyer's site),
2002-07-23 (fixed link to MJD conversion utility),
2004-03-05 (added link to NTP server using PC gnu/Linux and FreeBSD),
2004-05-13 (updated links to Symmetricom),
2005-02-14 (updated several links)
2005-06-16 (new links)
2006-01-05 (updated TAI-UTC offset to 33s, few minor updates)
2006-01-19 (retired server biofiz.mf.uni-lj.si, announcement by Primoz Peterlin)
2009-02-20 (new Slovenian st-1 server at SIQ, a national reference)
2010-10-07 (added a link to World Time Solutions)
2011-11-30 (added a link to TimeTools)