Block traces are widely used for system studies, model verification, and design analysis in both industry and academia. While such traces include detailed block access patterns, existing trace-driven research often fails to reach accurate conclusions because the traces lack runtime contexts, such as user idle periods and system delays, which are fundamentally linked to the characteristics of the target storage hardware. We therefore propose TraceTracker, a novel hardware/software co-evaluation method that allows users to reuse a broad range of existing block traces by preserving most of their execution contexts and user scenarios while adjusting them with new system information. Specifically, TraceTracker's software evaluation model infers CPU burst times and user idle periods from old storage traces, whereas its hardware evaluation method remasters the storage traces by integrating the inferred time information and updates all inter-arrival times to reflect the target storage system. We apply the proposed co-evaluation model to 577 traces, which were collected a decade ago by servers at different institutions and locations, and revive them on a high-performance flash-based storage array. You can download the traces here.
Target Block Traces
We reconstruct three workload categories: i) Florida International University (FIU), ii) Microsoft Production Server (MSPS), and iii) Microsoft Research Cambridge (MSRC). In addition, two Microsoft Enterprise workloads are provided in this Open Storage Trace repository: Exchange (an Exchange server) and Enterprise (a TPC-C benchmark). Brief descriptions of the workloads are given below; see the TraceTracker paper for details.
Microsoft Production Server (MSPS)
BS (Build server), 24HR (RADIUS server), 24HRS (RADIUS back-end SQL server), DADS (Display Ads Data server), DAP (Display Ads Payload server), LMBE (LiveMaps back-end server), MSNFS (MSN storage file server), CFS (MSN storage metadata server), DDR (Developer tools release server)
Microsoft Research Cambridge (MSRC)
usr (user home), src (source control), hm (HW monitoring), proj (project directory), rsrch (research proj), stg (web staging), web (web/SQL server), wdev (test web server)
Florida International University (FIU)
ikki, casa, madmax, topgun (end-user home directory), webuser (web server for user websites), webresearch (Apache web server), webmail (department mail server), online (course management system), homes (research group activities), cheetah (CS department mail server), webmail+online (webmail proxy and course management)
Trace File Format
There are multiple trace files per workload. For example, Microsoft Production Server contains 20 trace files for 24HR (24HR1, 24HR2, …, 24HR20). Because the original SNIA workloads were collected over several hours, days, or weeks, each workload is split into multiple trace files (see the SNIA workload papers for details).
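Because the workload index is a plain integer appended to the name, sorting file names as strings places 24HR10 before 24HR2. The Python sketch below groups trace names by workload and orders each group numerically. The helper name group_by_workload, the regular expression, and the assumption that file names follow the 24HR1 pattern (optionally with a file extension) are hypothetical illustrations, not part of the released traces or tooling.

```python
import os
import re
from collections import defaultdict

# Hypothetical naming pattern: workload name followed by a numeric index,
# e.g. "24HR12" -> ("24HR", 12). The lazy group keeps all trailing digits
# in the index.
_NAME = re.compile(r"^(.*?)(\d+)$")


def group_by_workload(names: list) -> dict:
    """Group trace file names by workload, sorted by numeric index."""
    groups = defaultdict(list)
    for name in names:
        # Drop any directory and extension before matching (assumption).
        stem = os.path.splitext(os.path.basename(name))[0]
        match = _NAME.match(stem)
        if match is None:
            # Names without a numeric suffix form their own group.
            groups[stem].append((0, name))
        else:
            groups[match.group(1)].append((int(match.group(2)), name))
    return {workload: [n for _, n in sorted(items)] for workload, items in groups.items()}
```

For instance, group_by_workload(["24HR10", "24HR2", "24HRS1", "24HR1"]) keeps 24HR and 24HRS apart and lists 24HR1, 24HR2, 24HR10 in numeric order.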
The format of the reconstructed trace files is shown in the figure above. Each trace file has eight columns: timestamp, request type, address, request size, access type, inter-arrival time, device time, and idle time.
1) Timestamp [unit = second]: This column has not been updated yet; it is the same as in the original disk-based SNIA block trace.
You can calculate the updated timestamp with the equation next-timestamp = previous-timestamp + inter-arrival time.
2) Request type: There are only two request types: read (RS) and write (WS).
3) Address [unit = sector]: Because the original SNIA traces were collected on disk-based storage systems, addresses are given in sectors. From this value, TraceTracker calculates the corresponding address on the flash-based storage system, which is accessed in units of pages.
4) Request size [unit = sector]: As with the address, the request size is given in sectors of 512 B.
5) Access type: There are two access patterns: sequential (seq) and random (rand). Currently, the pattern is determined simply by comparing the current request's address with the previous request's address plus the previous request's size. In other words, if a request starts exactly where the previous request ended, it is marked as sequential; otherwise, it is marked as random.
6) Inter-arrival time: The inter-arrival time follows the blktrace definition of D2D time, i.e., the interval between the dispatch (D) events of consecutive requests.
7) Device time: Like the inter-arrival time, the device time follows the blktrace definition, in this case D2C time (from dispatch to completion). Note that this value was measured on a flash-based storage system.
8) Idle time: The idle time is estimated from the original disk-based storage traces (SNIA workloads) using the TraceTracker method. Adding it to the device time, which was measured on the flash-based storage system, gives the inter-arrival time.
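The eight columns described above map naturally onto a small record type. The Python sketch below assumes that fields are whitespace-separated, appear in the listed order, and that address and size are plain integers; the separator is not stated in the text, so adjust split() if the files use commas or tabs. The names TraceRecord, parse_record, and read_trace are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TraceRecord:
    timestamp: float      # seconds, copied from the original SNIA trace
    req_type: str         # "RS" (read) or "WS" (write)
    address: int          # in 512 B sectors
    size: int             # in 512 B sectors
    access_type: str      # "seq" or "rand"
    inter_arrival: float  # D2D time
    device_time: float    # D2C time measured on flash storage
    idle_time: float      # idle period inferred by TraceTracker


def parse_record(line: str) -> TraceRecord:
    """Parse one eight-column trace line (whitespace separation assumed)."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
    return TraceRecord(
        timestamp=float(fields[0]),
        req_type=fields[1],
        address=int(fields[2]),
        size=int(fields[3]),
        access_type=fields[4],
        inter_arrival=float(fields[5]),
        device_time=float(fields[6]),
        idle_time=float(fields[7]),
    )


def read_trace(path: str) -> list[TraceRecord]:
    """Read every non-empty line of a trace file into a list of records."""
    with open(path) as f:
        return [parse_record(line) for line in f if line.strip()]
```

A made-up line such as "0.5 WS 2048 8 rand 0.001 0.0004 0.0006" parses into a write of 8 sectors starting at sector 2048; the time units of the last three columns are not stated in the format description and should be checked against the files.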
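The timestamp update rule next-timestamp = previous-timestamp + inter-arrival time is a running sum. The Python sketch below assumes that the inter-arrival value on row i is the gap between request i-1 and request i, and that it is already expressed in seconds; under that reading the value on the first row is not used. The function name rebuild_timestamps is hypothetical.

```python
from __future__ import annotations

from itertools import accumulate


def rebuild_timestamps(first_timestamp: float, inter_arrivals: list[float]) -> list[float]:
    """Apply next-timestamp = previous-timestamp + inter-arrival time.

    Assumes inter_arrivals[i] is the gap between request i-1 and request i,
    in seconds, so the first row's value is skipped.
    """
    if not inter_arrivals:
        return []
    # accumulate(..., initial=x) (Python 3.8+) yields x first, then running sums.
    return list(accumulate(inter_arrivals[1:], initial=first_timestamp))
```

If the inter-arrival column turns out to use a different unit, convert it to seconds before calling the function so that it matches the timestamp column.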
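Addresses and sizes are given in 512 B sectors, while flash storage is accessed in pages. The Python sketch below shows one straightforward sector-to-page conversion; it is not necessarily the exact calculation TraceTracker performs. The 4 KiB page size and the function name page_span are assumptions; substitute the page size of the target device.

```python
SECTOR_SIZE = 512  # bytes per sector, as stated in the trace format
PAGE_SIZE = 4096   # assumed flash page size; replace with the target device's value


def page_span(address_sectors: int, size_sectors: int, page_size: int = PAGE_SIZE) -> tuple:
    """Return (first_page, page_count) for a request given in sectors."""
    if size_sectors <= 0:
        raise ValueError("request size must be positive")
    start = address_sectors * SECTOR_SIZE          # first byte of the request
    end = start + size_sectors * SECTOR_SIZE       # one past the last byte
    first_page = start // page_size
    last_page = (end - 1) // page_size
    return first_page, last_page - first_page + 1
```

With 4 KiB pages, an 8-sector request at sector 0 fits in one page, whereas the same request at sector 4 straddles a page boundary and touches two pages, which is why unaligned disk-era requests can cost more page accesses on flash.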
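The sequential/random rule for the access type column can be written as a single pass over the requests. The Python sketch below takes (address, size) pairs in sectors and marks a request sequential only when it starts exactly where the previous request ended. Labeling the first request as random, since it has no predecessor, is an assumption, as is the function name classify_access.

```python
from __future__ import annotations


def classify_access(requests: list[tuple[int, int]]) -> list[str]:
    """Label each (address, size) request, in sectors, as "seq" or "rand"."""
    labels = []
    prev_end = None  # address just past the previous request
    for address, size in requests:
        # Sequential only if this request begins at previous address + previous size.
        labels.append("seq" if prev_end is not None and address == prev_end else "rand")
        prev_end = address + size
    return labels
```

As described, the rule compares every request with the one immediately before it regardless of request type, so an interleaved read can break an otherwise contiguous run of writes.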
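The relationship between the last three columns (inter-arrival time = device time + idle time) can be used as a consistency check when loading a trace. The Python sketch below assumes the relationship holds per row and that all three values share the same unit; the function names and tolerance values are hypothetical.

```python
import math


def expected_inter_arrival(device_time: float, idle_time: float) -> float:
    """Inter-arrival time = device time (measured on flash) + inferred idle time."""
    return device_time + idle_time


def mismatched_rows(rows, rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> list:
    """Return indices of rows whose inter-arrival column disagrees with device + idle.

    rows: iterable of (inter_arrival, device_time, idle_time) tuples in one unit.
    Floating-point sums are compared with math.isclose rather than ==.
    """
    return [
        i
        for i, (inter_arrival, device_time, idle_time) in enumerate(rows)
        if not math.isclose(
            inter_arrival,
            expected_inter_arrival(device_time, idle_time),
            rel_tol=rel_tol,
            abs_tol=abs_tol,
        )
    ]
```

An empty result suggests the columns follow the stated relationship; any reported indices point to rows worth inspecting, for example where rounding in the files exceeds the chosen tolerance.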