High Resolution Timer
High Resolution Timer (HRT) is a powerful, feature-rich software package that is designed to provide millisecond-level time resolution and accuracy on Windows-based computers. The software is suitable for both scientific and commercial applications, and is designed to be used in a wide variety of contexts.
The High Resolution Timer allows for more accurate timing of events, enabling improved performance and efficiency.
Features:
High Resolution Timer comes packed with a variety of features, including:
• High-precision timing: HRT offers millisecond-level time resolution and accuracy. This means that it can be used for extremely precise measurements of events and processes.
• Flexible, user-friendly interface: HRT is designed to be easy to use, with a straightforward and intuitive interface.
• Multiple timer functions: HRT supports both absolute and relative timing modes, as well as a wide range of alarm and notification options.
• Compatibility with a variety of data sources: HRT can be used with a variety of data sources, such as Windows Event Logs and Windows Performance Counters.
• Support for multiple programming languages: HRT is compatible with a range of programming languages, including but not limited to C, C++, C#, Visual Basic, and Java.
• Advanced features: HRT also includes advanced features such as customizable user settings and an API for integration with other applications.
• Comprehensive documentation and support: HRT is backed by comprehensive documentation and a dedicated support team.
Advantages
High Resolution Timer provides numerous advantages over other similar software packages. These include:
• High-precision timing: HRT offers millisecond-level time resolution and accuracy, making it suitable for extremely precise measurements of events and processes.
• Flexible and user-friendly interface: HRT is designed to be easy to use, with a straightforward and intuitive interface.
• Comprehensive documentation and support: HRT is backed by comprehensive documentation and a dedicated support team.
• Compatibility with a variety of data sources: HRT can be used with a variety of data sources, such as Windows Event Logs and Windows Performance Counters.
• Support for multiple programming languages: HRT is compatible with a range of programming languages, including but not limited to C, C++, C#, Visual Basic, and Java.
• Advanced features: HRT also includes advanced features such as customizable user settings and an API for integration with other applications.
Conclusion
High Resolution Timer is a powerful and feature-rich software package that is designed to provide millisecond-level time resolution and accuracy on Windows-based computers. It is suitable for both scientific and commercial applications, and is designed to be used in a wide variety of contexts. The software is easy to use, with a straightforward and intuitive interface, and is backed by comprehensive documentation and a dedicated support team. It also offers compatibility with a variety of data sources, as well as support for multiple programming languages, and advanced features such as customizable user settings and an API for integration with other applications.
1. The software should be able to accurately measure time intervals down to the microsecond level.
2. The software should be able to measure the frequency of events.
3. The software should be able to measure the jitter of events.
4. The software should be able to generate accurate reports of measured data.
5. The software should provide a user-friendly interface for users to easily understand and interact with.
6. The software should be compatible with various operating systems and platforms.
7. The software should be able to integrate with other software for data analysis and reporting.
8. The software should provide high levels of security and data integrity.
9. The software should be able to handle large amounts of data.
10. The software should provide reliable and efficient performance at all times.
This post lists the code for a high resolution timer for the Microsoft Windows platform. High resolution timers are often used in multimedia and entertainment applications to time events down to the microsecond.
This is a heavily modified re-post of the article that used to be on the Scriptionary.com website before the change to the blog; read the source code for details.
#ifndef _TIMER_H_
#define _TIMER_H_
#ifndef WIN32_LEAN_AND_MEAN
# define WIN32_LEAN_AND_MEAN
#endif
#include <windows.h>
// Defines a simple High Resolution Timer
class Timer
{
public:
// Construct the timer and initialize all of the
// members by resetting them to their zero-values.
Timer(void) : _Active(false) {
// _Active must be initialized first because Reset() reads it.
Reset();
}
~Timer(void){ /* empty */ }
// Activate the timer and poll the counter.
void Start(void) {
_Active = true;
PollCounter(_StartCount);
}
// Deactivate the timer and poll the counter.
void Stop(void) {
_Active = false;
PollCounter(_EndCount);
}
// Stops the timer if it's active and resets all
// of the Timer's members to their initial values.
void Reset(void) {
if (_Active)
Stop();
QueryPerformanceFrequency(&_Frequency);
_StartTime = (_EndTime = 0.0);
_StartCount.QuadPart = (_EndCount.QuadPart = 0);
_Active = false;
}
// Returns the time elapsed since Start() was called
// in micro-seconds
const long double GetTimeInMicroSeconds(void) {
const long double MicroFreq =
1000000.0 / _Frequency.QuadPart;
if (_Active)
PollCounter(_EndCount);
_StartTime = _StartCount.QuadPart * MicroFreq;
_EndTime = _EndCount.QuadPart * MicroFreq;
return _EndTime - _StartTime;
}
// Returns the time elapsed since Start() was called
// in milli-seconds
const long double GetTimeInMilliseconds(void) {
return GetTimeInMicroSeconds() * 0.001;
}
// Returns the time elapsed since Start() was called
// in seconds
const long double GetTimeInSeconds( void ) {
return GetTimeInMicroSeconds() * 0.000001;
}
// Returns TRUE if the Timer is currently active
const bool IsActive(void) const {
return _Active;
}
private:
// Poll the query performance counter safely by tying
// the polling functionality temporarily to a single
// logical processor (processor 0, selected by mask bit 0).
void PollCounter(LARGE_INTEGER& Out) {
HANDLE Thread = GetCurrentThread();
// The affinity mask is a bit mask; a mask of 0 is invalid, so use 1.
DWORD_PTR OldMask = SetThreadAffinityMask(Thread, 1);
QueryPerformanceCounter(&Out);
// Restore the previous mask only if the first call succeeded.
if (OldMask != 0)
SetThreadAffinityMask(Thread, OldMask);
}
private:
bool _Active;
long double
_StartTime,
_EndTime;
LARGE_INTEGER
_Frequency,
_StartCount,
_EndCount;
}; // class Timer
#endif
Here’s an incredibly simple example of how you could potentially use the class:
#include <iostream>
#include "Timer.h" // the Timer class above; the file name is an assumption
int main( int argc, const char* argv[] )
{
Timer MyTime;
std::cout << "Starting the Timer..." << std::endl;
MyTime.Start();
std::cout << "Going to sleep for 5 seconds." << std::endl;
Sleep(5000); // or something else for a while
std::cout << "Stopping the Timer." << std::endl;
MyTime.Stop();
std::cout << MyTime.GetTimeInSeconds()
<< " seconds passed" << std::endl;
return 0;
}
Within the discipline of software performance engineering (SPE), application response time monitoring refers to the capability of instrumenting application requests, transactions and other vital interaction scenarios in order to measure their response times. There is no single, more important performance measurement than application response time, especially in the degree which the consistency and length of application response time events reflect the user experience and relate to customer satisfaction. All the esoteric measurements of hardware utilization that Perfmon revels in pale by comparison.
Of course, performance engineers usually still want to be able to break down application response time into its component parts, one of which is CPU usage. Other than the Concurrency Visualizer that is packaged with the Visual Studio Profiler that was discussed in the previous post, there are few professional-grade, application response time monitoring and profiling tools that exploit the ETW facilities in Windows. As we have seen illustrated, the Concurrency Visualizer demonstrates the capability to reconstruct an application’s execution time profile from kernel trace events with great precision and detail. An application response time monitoring tool that generates ETW response time events that can then be correlated with the Windows OS thread scheduling events has potential value in performance investigations of many sorts.
Fortunately, there is such an application response time monitoring tool for both Windows C++ native and .NET applications that is integrated with ETW. It is called the Scenario class. (The name “Scenario” was suggested by Habib Heydarian, an excellent Program Manager at Microsoft, and is a nod to the popular practice of scenario-based development, associated primarily with Agile software development methodologies. See, for example, Scenario-Based Design of Human-Computer Interactions, by John Carroll and User Stories Applied: For Agile Software Development by Mike Cohn.)
The Scenario instrumentation library is a free download available from the MSDN Code Gallery site here.
The Scenario class is an ETW wrapper built around an extended version of the .NET Stopwatch class. The standard .NET Stopwatch class in the System.Diagnostics namespace is used to measure elapsed time in a .NET Framework application. The .NET Stopwatch class itself is a managed wrapper around the native Windows APIs called QueryPerformanceFrequency and QueryPerformanceCounter (QPC) that access a high resolution timer and are used to calculate elapsed time. A straightforward extension of the .NET Stopwatch class adds a call to QueryThreadCycleTime (QTCT), a Windows API that provides a measure of CPU usage at the thread level, beginning in Windows version 6 (Vista and Windows Server 2008).
Prior to discussing the use of the application-oriented Scenario instrumentation library, however, we should first take a deeper look at the Windows APIs it utilizes, namely QueryPerformanceCounter (QPC), QueryPerformanceFrequency, and the newer QueryThreadCycleTime (QTCT). Using the Scenario library properly and interpreting the measurement data it produces will benefit from a deeper understanding of how these Windows APIs work.
QueryPerformanceCounter.
The QueryPerformanceCounter and QueryPerformanceFrequency APIs were added to Windows beginning with Windows 2000 when the Performance Monitor developers noticed that the granularity of the Windows system clock was inadequate for measuring disk IO response time. As discussed in the first post in this series, the Windows system time of day clock contains time values that display precision to 100 nanosecond timer units. However, the current time of day clock value maintained by the OS is only updated once per quantum, effectively about once every 15.6 milliseconds in current versions of Windows. Because early versions of Windows NT used values from the system clock to time disk IOs, the Logical Disk and Physical Disk Avg. Disk sec/Transfer counters in Perfmon often reported zero values whenever an IO operation started and ran to completion within a single tick of the system clock.
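The zero readings described above follow directly from reading a clock that only advances in coarse steps. The sketch below is an illustration only: the function names are hypothetical, and the 15,600 microsecond tick used in the usage note is simply the 15.6 ms figure from the text. It models a clock that is rounded down to its last update and shows what duration a caller would observe.

```cpp
#include <cassert>
#include <cstdint>

// Round a timestamp down to the most recent clock update.
// All values are in microseconds; tick_us is the clock update interval.
// (Hypothetical helper, not a Windows API.)
std::int64_t quantize_to_tick(std::int64_t t_us, std::int64_t tick_us) {
    assert(t_us >= 0 && tick_us > 0);
    return (t_us / tick_us) * tick_us;
}

// Duration as seen by code that reads the coarse clock at the start
// and at the end of an operation.
std::int64_t observed_duration_us(std::int64_t start_us, std::int64_t end_us,
                                  std::int64_t tick_us) {
    assert(end_us >= start_us);
    return quantize_to_tick(end_us, tick_us) - quantize_to_tick(start_us, tick_us);
}
```

With a 15,600 microsecond tick, an IO that runs from 1,000 to 6,000 microseconds is observed as 0, while a 1,000 microsecond IO that happens to straddle a tick boundary (15,000 to 16,000) is observed as a full 15,600.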
The solution in Windows 2000 was the addition of the QueryPerformanceCounter (QPC) and QueryPerformanceFrequency APIs. In Windows 2000, the QueryPerformanceCounter API returned a value from an rdtsc (Read TimeStampCounter) instruction. The TSC is a special hardware performance monitoring counter on Intel-compatible processors that is incremented once per processor clock cycle. On a processor clocked at 2 GHz, for example, one expects two TSC clock cycle ticks to occur every nanosecond. (As will be discussed further, elapsed time measurements that are based on successive rdtsc instructions are far less precise than the processor’s actual instruction execution cycle time. See, for example, this FAQ from the Intel Software Network for an official explanation from the CPU manufacturer.)
Issuing an rdtsc instruction on a processor clocked at 2 GHz returns a clock value considerably more precise than standard Windows timer values delineated in 100 nanosecond units. Since the processor clock speed is hardware-dependent, an additional API call, QueryPerformanceFrequency, was provided to supply the processor clock speed. Once you know the clock frequency, you can translate the output from successive rdtsc instructions into elapsed wall clock time.
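As a minimal sketch of that translation, the function below converts two counter readings and a frequency (ticks per second) into elapsed microseconds. The function name is hypothetical; on Windows the inputs would come from the QuadPart fields filled in by QueryPerformanceCounter and QueryPerformanceFrequency, but the arithmetic is kept free of Windows headers so it can be checked anywhere.

```cpp
#include <cassert>
#include <cstdint>

// Convert a pair of counter readings into elapsed microseconds.
// frequency is the counter rate in ticks per second.
std::int64_t elapsed_microseconds(std::int64_t start_ticks,
                                  std::int64_t end_ticks,
                                  std::int64_t frequency) {
    assert(frequency > 0);
    assert(end_ticks >= start_ticks);
    const std::int64_t delta = end_ticks - start_ticks;
    // Handle whole seconds and the remainder separately so that
    // multiplying by 1,000,000 does not overflow for long intervals.
    const std::int64_t whole_seconds = delta / frequency;
    const std::int64_t remainder = delta % frequency;
    return whole_seconds * 1000000 + (remainder * 1000000) / frequency;
}
```

For a 2 GHz counter, a difference of 2,000 ticks works out to 1 microsecond, which matches the two-ticks-per-nanosecond figure above.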
When the QueryPerformanceCounter and QueryPerformanceFrequency APIs became generally available in Windows, they were rapidly adopted by other applications in need of a more precise timer facility than the Windows system clock. However, developers using QPC() soon began to notice discrepancies in time measurements taken using the rdtsc instruction due to the way in which the TSC was implemented in the hardware. There were two major discrepancies that were encountered using the rdtsc instruction on Intel-compatible processors, namely:
- lack of synchronization of the TSC across processors, and
- dynamic changes to the TSC clock update interval as a result of the processor entering a lower power state, slowing both the clock rate and the TSC update interval in tandem.
The effect of these TSC anomalies was quickly evident in the disk driver routine responsible for timing the duration of disk operations when running on a multiprocessor, which was especially ironic since QPC was originally built in order to measure disk IO operations accurately. It was possible on a multi-processor for a disk IO that was initiated on CPU 0 and completed on CPU 1 to retrieve a TSC value from CPU 1 for the IO completion that was before the TSC value retrieved when the IO operation was initiated on CPU 0. (The flavor of the difficulties in dealing with rdtsc instructions across multiple CPUs on multiprocessors is captured in this blog entry written by Microsoft SQL Server customer support representatives.)
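Code that subtracts two counter readings taken on possibly different processors can guard against this case explicitly. The following is a defensive sketch, not something taken from the disk driver or from Windows itself; the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Result of subtracting two counter readings.
struct TickDelta {
    std::int64_t ticks;   // non-negative delta; 0 if the readings were inconsistent
    bool went_backwards;  // true when the end reading preceded the start reading
};

// Compute end - start, flagging the anomaly in which the completion
// reading is smaller than the reading taken at initiation.
TickDelta checked_delta(std::int64_t start_ticks, std::int64_t end_ticks) {
    assert(start_ticks >= 0 && end_ticks >= 0);
    if (end_ticks < start_ticks) {
        return {0, true};
    }
    return {end_ticks - start_ticks, false};
}
```

Reporting the anomaly separately, rather than silently clamping to zero, lets the caller decide whether to discard the sample or count it.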
Of lesser concern, the latency of an rdtsc instruction was surprisingly large for a hardware instruction, on the order of several hundred clock cycles on older Intel hardware. That, and the fact that the rdtsc instruction does not serialize the machine, made QPC() unsuitable for accurately timing, say, micro-benchmarks of less than several thousand instructions.
To deal with the drift in TSC values across multiple processors, in Windows 6 (Vista and Windows Server 2008), the Windows QueryPerformanceCounter function was changed to use one of several external, high resolution timer facilities that are usually available on the machine. These include the High Precision Event Timer (HPET), often also referred to as the Multimedia Timer in Windows, and the external ACPI Power Management Timer, another high resolution timer facility that is independent of the main processor hardware. Because these timer facilities are external to the processors, they are capable of supplying uniform values that are consistent across CPUs. (At the same time, QueryPerformanceFrequency was also re-written to return the frequency of the external timer source.) This change effectively fixed the problems associated with accurate measurements of disk IO response time that were evident in Windows 2000 and Windows XP.
However, using an external timer in the QPC implementation does have one major disadvantage, namely additional latency. If you wrap rdtsc instructions around QPC() calls in Windows Vista, you can typically measure latency on the order of 800 nanoseconds to call the QPC API, or roughly 1 microsecond per call. This latency is particularly problematic given how frequently QPC is called. In ETW tracing, for instance, QPC is the default timer routine that is used to generate the event timestamps. When gathering high volume events such as the CSwitch, ReadyThread, ISR and DPC, using QPC for timestamps in ETW generates significant overhead. If one is expecting, say, 20,000 ETW events to be generated per second per processor on a Vista or Windows Server 2008 machine, calling QPC that frequently adds about 2% additional CPU utilization per processor.
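The 2% figure is simple arithmetic: 20,000 calls per second at about 1 microsecond each occupy about 20 milliseconds of every second. A small sketch of the calculation follows; the function name is hypothetical and the inputs are the figures quoted above.

```cpp
#include <cassert>

// Percentage of one processor's time spent taking timestamps,
// given the event rate and the cost of one clock read.
double timestamp_overhead_percent(double events_per_second, double latency_ns) {
    assert(events_per_second >= 0.0 && latency_ns >= 0.0);
    const double busy_ns_per_second = events_per_second * latency_ns;
    return busy_ns_per_second / 1e9 * 100.0;
}
```

Using the measured 800 nanoseconds instead of the rounded 1 microsecond gives about 1.6% for the same event rate.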
(Note: The clock facility used by default in ETW is QPC. Alternatively, one can specify either the low resolution system timer (an option that should only be used in rare instances where the low resolution 15.6 ms clock ticks suffice), or that an rdtsc instruction be issued directly. The rdtsc option is termed the “CPU cycle counter” clock resolution option. See this ETW documentation entry on the MSDN Library for details.)
For the record, ETW is hardly the only application that routinely relies on QPC-based measurements of elapsed time. When the ETW tracing infrastructure is used to gather OS kernel scheduling events, it is probably the most frequent caller of the API on the machine. Other frequent callers of QPC include the disk driver routines mentioned earlier that measure disk IO response time, reported in Perfmon as the Avg. Disk sec/Transfer counters. These were re-written in Windows 2000 to use QPC. The TCP protocol, which needs to estimate the Round Trip Time (RTT) of packets sent to remote TCP sessions, also utilizes QPC for high resolution timing. As mentioned earlier, the .NET Framework Stopwatch class allows an application to issue calls to QPC and QPF as an alternative to using the low resolution DateTime.Now property that accesses the Windows system clock.
The hardware manufacturers were meanwhile at work making improvements in the TSC hardware to allow it to serve as an efficient and accurate elapsed time counter. Changing the behavior of the TSC when there was a power state change that adjusted the processor’s clock cycle time was the first fix. Newer machines now contain a TSC with a constant tick rate across power management states. (Which is what I like to call it, instead of what Intel officially calls an invariant TSC, terminology I find a little awkward). For details, see the Intel hardware manual entitled Intel® 64 and IA-32 Architectures, Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1. Section 16.9 of this manual discusses the Time-Stamp Counter on Intel hardware and its processor-dependent behavior. The manual reports, “Constant TSC behavior ensures that the duration of each clock tick is uniform and supports the use of the TSC as a wall clock timer even if the processor core changes frequency. This is the architectural behavior moving forward.” [Emphasis added.]
The second problem related to the TSC clocks not being synchronized across processors still exists, but the TSCs for all the processor cores resident on a single multi-core socket do run off the same underlying hardware source, at least. Clock drift across processor cores remains an issue on multi-socket NUMA hardware, but built-in NUMA node thread and interrupt affinity in Windows minimizes some of these concerns, while not eliminating them completely.
Finally, the hardware vendors report that the latency of the rdtsc instruction has improved significantly, reportedly by an order of magnitude, on current hardware. Unfortunately, the hardware vendors are reluctant to disclose the specific latency of their rdtsc instructions. The rdtsc instruction still does not serialize the processor instruction execution engine, so rdtsc continues to return timer values that are subject to some amount of processor core “jitter.” Imprecision about instruction execution latency is simply a fact of life in superscalar, pipelined machines. The need to maintain a TSC with a constant tick rate across power state changes also results in some loss of precision in rdtsc return values, which is really no big deal since it affects just one or two of the least significant clock resolution timer bits.
The long latency associated with accessing an external clock facility, combined with the rdtsc hardware improvements described above, prompted another round of changes in QueryPerformanceCounter for Windows 7 (and Windows Server 2008 R2). During system initialization, Windows 7 attempts to figure out if the current hardware supports a TSC tick rate that is constant across power state changes. When Windows 7 determines that the processor’s TSC tick rate is constant, the QPC routine is set to issue rdtsc instructions. If it appears that the TSC is not invariant across processor core frequency changes, then QPC will be resolved as in Windows 6 by calling the machine’s external timer. In this fashion, QPC in Windows 7 automatically provides a well-behaved, high resolution hardware clock timer service that uses the low latency rdtsc instruction whenever it is well-behaved.
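The selection logic can be pictured with a short sketch. Processors advertise an invariant TSC through bit 8 of EDX in CPUID leaf 0x80000007; how Windows 7 actually makes its decision is not described here, so the enum and function names below are hypothetical and the code is only an illustration of the idea. The EDX value is passed in as a parameter, since reading it requires a compiler-specific CPUID intrinsic.

```cpp
#include <cassert>
#include <cstdint>

// Which clock a QPC-style routine would read (illustrative only).
enum class QpcSource { Tsc, ExternalTimer };

// CPUID leaf 0x80000007 reports the invariant TSC in bit 8 of EDX.
bool has_invariant_tsc(std::uint32_t edx_leaf_80000007) {
    return (edx_leaf_80000007 & (1u << 8)) != 0;
}

// Prefer the low latency TSC when its tick rate is constant across
// power states; otherwise fall back to an external timer.
QpcSource choose_qpc_source(std::uint32_t edx_leaf_80000007) {
    return has_invariant_tsc(edx_leaf_80000007) ? QpcSource::Tsc
                                                : QpcSource::ExternalTimer;
}
```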
This ability in Windows 7 to resolve the QPC timer service dynamically based on the current hardware is the reason Windows application developers are advised to stay away from using rdtsc – unless you absolutely know what you are doing – and to use QPC instead. Coding an inline rdtsc instruction is still going to be faster than calling QPC to access the TSC, but I hope I have made clear that using rdtsc directly is not for the faint of heart. One especially has to be aware of older portables in my experience. In newer machines, however, my experience using rdtsc is very positive.
Another good reason to avoid issuing rdtsc instructions in your app is the portability of your app onto ARM machines going forward in Windows 8. On ARM-based tablets and phones, there are a slew of hardware dependencies that we are going to have to start looking out for, which is a good topic for another set of blog posts!
Environment: Windows NT/2K/XP, MSVC 6
While the Microsoft Win32 API provides functions for dealing with waitable timer objects that provide clients with very high resolution (100 nsec), it gives no guarantee as to the actual precision of those timers. Because those timers rely solely on the APC mechanism to deliver notification to the timer callbacks, and those callbacks can only be invoked when threads are put into an APC alertable state, precision is very poor. This is because delivery relies on the thread scheduler, which usually performs thread context switching at intervals of about 10 msec. Unfortunately, this is not easily overcome. To a lesser extent, the same problem exists when using high-precision multimedia timers.
The proposed solution uses critical sections that don’t rely on the thread scheduler to perform time slice allocation and thus are enterable as soon as they become available. This solution utilises the multimedia timer’s capabilities to create periodic timers with the highest resolution possible (1msec) and a critical section to provide switching between the threads.
The wait happens because the main program thread gets blocked on the timer critical section while the timer performs its countdown cycles. Once the countdown reaches zero, the timer thread leaves the critical section, thus allowing the main thread to continue. Once the main thread enters the timer’s critical section, it leaves it immediately, thus allowing the timer thread to block it again on the next pass.
Because this solution does not involve context switching, the delay is minimal. The error level of this timer is below 5%. The presented class can be utilised as shown in the Delay() method below. This example uses high-resolution timers to calculate the actual delay and average delay error level. One can call this method n-number of times with different delay values to verify the error level of this timer.
Enjoy!
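#include <windows.h>
#include <mmsystem.h> // timeSetEvent, timeKillEvent; link with winmm.lib
#include <stdio.h>
class PreciseTimer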
{
public:
PreciseTimer() : mRes(0), toLeave(false), stopCounter(-1)
{
InitializeCriticalSection(&crit);
mRes = timeSetEvent(1, 0, &TimerProc, (DWORD_PTR)this,
TIME_PERIODIC);
}
virtual ~PreciseTimer()
{
mRes = timeKillEvent(mRes);
DeleteCriticalSection(&crit);
}
void Wait(int timeout)
{
if ( timeout )
{
stopCounter = timeout;
toLeave = true;
EnterCriticalSection(&crit);
LeaveCriticalSection(&crit);
}
}
static void CALLBACK TimerProc(UINT uiID, UINT uiMsg, DWORD_PTR
dwUser, DWORD_PTR dw1, DWORD_PTR dw2)
{
static volatile bool entered = false;
PreciseTimer* pThis = (PreciseTimer*)dwUser;
if ( pThis )
{
if ( !entered && !pThis->toLeave )
{
entered = true;
EnterCriticalSection(&pThis->crit);
}
else if ( pThis->toLeave && pThis->stopCounter == 0 )
{
pThis->toLeave = false;
entered = false;
LeaveCriticalSection(&pThis->crit);
}
else if ( pThis->stopCounter > 0 )
--pThis->stopCounter;
}
}
private:
MMRESULT mRes;
CRITICAL_SECTION crit;
volatile bool toLeave;
volatile int stopCounter;
};
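// Shared timer instance used by Delay(). Declaring it at file scope is an
// assumption; the original listing does not show where `timer` is defined.
PreciseTimer timer;
void Delay(unsigned int val)
{
static LARGE_INTEGER freq = {0};
static double average = 0;
static int count = 0;
++count;
LARGE_INTEGER iStart, iStop;
// Convert the counter frequency to ticks per millisecond (done once).
if ( freq.QuadPart == 0 )
QueryPerformanceFrequency(&freq), freq.QuadPart /= 1000;
double sleep = 0;
QueryPerformanceCounter(&iStart);
timer.Wait(val);
QueryPerformanceCounter(&iStop);
sleep = ((double)iStop.QuadPart - (double)iStart.QuadPart)
/ (double)freq.QuadPart;
// Accumulate the relative error in percent; zero-length delays add nothing.
average += (val ? 100.0*(sleep-val)/(double)val : 0);
printf("Waited for %6.3f (%u). error = %5.2f\n", sleep, val,
average/(double)count);
}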
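The error bookkeeping in Delay() can also be separated from the Windows calls, which makes it easier to check on its own. The class below is a sketch under that assumption: the name DelayErrorStats is hypothetical, and unlike Delay() it rejects a requested delay of zero instead of silently adding zero to the average.

```cpp
#include <cassert>

// Running mean of the relative delay error, in percent.
class DelayErrorStats {
public:
    // Record one measurement; both values are in milliseconds.
    void add(double requested_ms, double measured_ms) {
        assert(requested_ms > 0.0);
        sum_percent_ += 100.0 * (measured_ms - requested_ms) / requested_ms;
        ++count_;
    }

    // Mean error over all recorded measurements, or 0 if there are none.
    double average_percent() const {
        return count_ == 0 ? 0.0 : sum_percent_ / count_;
    }

private:
    double sum_percent_ = 0.0;
    int count_ = 0;
};
```

A requested 10 ms delay measured as 10.5 ms contributes a 5% error; averaging that with an exact 20 ms delay gives 2.5%.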