TARGET: AN ANALYSIS ENVIRONMENT FOR
AGGREGATED SIMULATION TRAINING DATA
Julia J. Loughran, Eric W. Johnson, Michael Kappel, Marchelle M. Stahl
loughran@ida.org, ejohnson@ida.org, mkappel@ida.org, mstahl@ida.org
Institute for Defense Analyses (IDA)
1801 N. Beauregard Street
Alexandria, VA 22311
KEYWORDS
Analysis, Data Aggregation, Data Visualization, Lessons Learned, Training Feedback, Exercise Playback
ABSTRACT
After Action Review (AAR) systems for distributed simulations typically concentrate on the analysis of a single exercise. The Institute for Defense Analyses (IDA) has developed techniques, analytic processes, and data visualization tools for analyzing simulation data from multiple exercises. This paper will highlight the different analytic approaches and visualization techniques developed under IDA’s Virtual Training Repository Case Study, as well as provide “lessons learned” for the distributed simulation community. The year-long effort was funded under DARPA’s Computer Aided Education and Training Initiative (CAETI) and used the Army’s Virtual Training Program (VTP) at Ft. Knox to supply data for the prototype repository.
The tools and approaches developed under this program are being integrated into TARGET (Training Analysis Repository and Graphical Evaluation Toolset). TARGET uses a reduced set of simulation training data and a graphical user interface to provide a means of displaying different Measures of Performance (MOPs). Graphing tools provide bar charts and event-based timelines, regression analysis tools search for relationships in the data, and a 2D Battle Viewer capability allows users to play back one or more training exercises in user-defined time steps. Color tracks show vehicle formation as well as shot, hit, and kill data.
1.0 BACKGROUND
Distributed Interactive Simulation (DIS) provides a virtual world for combat, which, in turn, provides an environment for collective military training. DIS also provides the capability to record all the data elements generated from an exercise for replay and/or analysis.
Under an FY96 task, the Institute for Defense Analyses (IDA) conducted the Virtual Training Repository (VTR) Case Study. The purpose of the VTR Case Study was threefold:
• Explore the technical issues and requirements for developing a repository of distributed simulation data;
• Demonstrate the value of analyzing simulation data collected over time; and
• Provide insights into how data collected in the virtual environment can be used to enhance the training strategy and scenarios.
IDA spent one year collecting simulation data from the VTP and developing a simulation analysis tool called “TARGET” (Training Analysis Repository and Graphical Evaluation Toolset). The results of these activities are summarized in this paper, which includes a collection of lessons learned related to archiving, visualizing, and utilizing aggregated simulation data.
1.1 Analytic Value of Simulation Exercises
In 1991, IDA Research Staff Members Richard Schwartz and Marchelle Stahl proposed the idea of creating a repository to store SIMNET and DIS data collected from training exercises and analytic experiments. Schwartz and Stahl believed that the analysis of SIMNET/DIS data would provide powerful insights into how humans behave in (simulated) battlefield situations. Human behavior is represented directly in data generated by human-operated simulators and indirectly in data generated by operator-controlled computer generated forces (CGF). In addition to analyzing human behavior, a repository of SIMNET/DIS data could be applied to analyzing the characteristics of DIS simulators and simulations, and/or issues in support of others, including the operational test and acquisition communities.
The use of previously recorded exercises for analysis eliminates the expense of organizing, planning, and generating real-time DIS exercises. The repository concept was first suggested after Schwartz and Stahl conducted a study where previously recorded SIMNET exercises were used to evaluate various non-line-of-sight (NLOS) weapon concepts.[1] In the NLOS study, new weapon systems were substituted for weapon systems in previously recorded SIMNET exercises and the effects of these changes were measured.
1.2 Simulation Analysis Data Challenges
In FY95, IDA Research Staff Member Julia Loughran led an internal research and development (IR&D) project focused on learning what distributed simulation could offer analytic communities. Thirty-five IDA researchers were interviewed as part of this study. The study identified some of the pitfalls associated with using DIS for analysis, including the following:
1. The length of time associated with developing an understanding of distributed simulation data (e.g., understanding Protocol Data Units or PDUs).
2. Difficulty in reusing simulation data recorded at other sites (e.g., differences in logger formats and software).
3. Lack of supporting information necessary to understand the purpose and objective of the simulation exercise.
4. Lack of a data repository where analysts can go to find previously recorded exercises.
5. Desire within the analytic community to use the software and analysis tools that are familiar to them and the associated difficulty in getting the simulation data into these formats.
Many of the IDA analysts agreed, however, that having access to data from man-in-the-loop simulations would be beneficial to parts of their analysis.
Based on this study, together with the research performed by Schwartz and Stahl, the concept of a DoD-wide DIS repository evolved. It grew to encompass three kinds of information:
1) DIS exercise data,
2) Supporting material required to understand and correctly interpret the exercise (e.g., the purpose of the exercise, the scenario, the types and number of forces used), and
3) Analysis and visualization tools for use with DIS data (e.g., tools for decoding PDUs into data formats compatible with other analysis software).
Beyond the inherent value of the data stored in such a repository, the repository would provide both simulation data and simulation analysis tools in a consistent format, making them readily available to analysts and other communities.
1.3 Case Study
In order to explore the utility of a DoD-wide DIS repository, Stahl and Loughran wanted to develop a prototype by using data from a military program generating large amounts of simulation data.
In FY96, the Defense Advanced Research Projects Agency (DARPA), under Dr. Kirstie Bellman’s Computer Aided Education and Training Initiative (CAETI), funded IDA to develop the Virtual Training Repository (VTR) prototype. The CAETI program’s objective is to identify advanced computer technologies that can provide significant advances in effectiveness for both kindergarten through 12th grade education and military training. One of the major thrust areas under the CAETI program is the use of virtual environments for collaborative learning.
The VTP was selected to be the source of data for the VTR prototype. The VTP provides structured task-based training, primarily to Army Reserve units and occasionally to active-duty units. Structured training refers to the fact that units train on predefined scenarios (called tables) as opposed to creating their own scenarios. The tables are designed to train specific tasks, and each successive table is more difficult than the last. The Army calls this the “crawl-walk-run” approach to training. The VTP trains tank, mechanized, and scout units at the platoon, company, and battalion levels. To limit the scope of the VTR study, only data from offensive armor platoon-level exercises was collected.
Several characteristics of the VTP made it a good source of data for archiving and analysis. These factors included:
• Structured training tables that provided comprehensive documentation,
• Many replications of the same logged tables providing the basis for developing lessons learned about the VTP, and
• A team of Observer/Controllers (O/Cs) who provided feedback to the units and who acted as Subject Matter Experts (SMEs) in helping define Measures of Performance (MOPs).
2.0 DATA ANALYSIS AND VISUALIZATION
Data from the VTP’s offensive armor platoon-level training exercises was collected for a 12-month period. During this time, IDA explored issues related to data collection and reduction, the development of analytic processes, and the exploration of data visualization tools. Although simple conceptually, this process, detailed in Figure 1, was time consuming and prone to errors.
2.1 Data Collection and Reduction
Data collection began in October 1995 and concluded in October 1996. Three kinds of data were collected: SIMNET logger files, radio traffic recordings, and subjective feedback, including assessments and comments from the O/C team. For each table trained, radio traffic and PDUs were recorded by the ModSAF (Modular Semi-Automated Forces) 1.0 logger into one file. One logger file was produced each time a unit performed a table. If a unit had trouble with a table, the table was sometimes repeated several times. Regardless of how many tables were performed, a unit was given a single Take Home Package (THP). A THP contains the O/C’s subjective assessment of the unit’s performance on the tasks trained during a given training session.
The logger file is a binary file created on a Unix workstation. THPs are Excel and Microsoft Word files created on a personal computer (PC) by the O/C responsible for training a particular unit.
The data collected included 243 logger files and 258 THP files. Data reduction and analysis tools were developed for each of these different types of data.
2.1.1 Data Reduction for THPs
The THP data was reduced from approximately 10 megabytes in its Excel format to roughly 1.5 megabytes after being aggregated and stored in a single dBASE file. To reduce the THP data, IDA implemented an Excel macro to parse the files. The macro loops through all THP files in a data directory, loading each file, one by one, into a new aggregated Excel spreadsheet. This spreadsheet is parsed, and for each cell, a formatted record is written to a single ASCII file. Each record includes the date, the O/C, the table, the task, the subtask, and the O/C’s assessment: train to sustain or train to improve. The ASCII file is then appended to a dBASE table for access by the TARGET system.
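The flattening step described above can be illustrated with a short Python sketch. This is not IDA's Excel macro; the grid layout (task, then subtask, then assessment), the field order, and the comma-separated output are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical field order, based on the fields named in the text.
FIELDS = ["date", "oc", "table", "task", "subtask", "assessment"]

def flatten_thp(date: str, oc: str, table: str,
                grid: dict[str, dict[str, str]]) -> list[dict[str, str]]:
    """Turn one THP grid (task -> subtask -> assessment) into flat records."""
    records = []
    for task, subtasks in grid.items():
        for subtask, assessment in subtasks.items():
            if not assessment:  # skip cells the O/C left blank
                continue
            records.append({"date": date, "oc": oc, "table": table,
                            "task": task, "subtask": subtask,
                            "assessment": assessment})
    return records

def write_records(records, out) -> None:
    """Append one comma-separated line per record to a text stream."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    for record in records:
        writer.writerow(record)
```

Writing one record per assessed cell keeps the aggregated file independent of any single spreadsheet layout, which is what allows it to be appended to one table.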
2.1.2 Data Reduction of Voice
The digitized voice data on the logger files was difficult to analyze. The voice traffic recordings were noisy and confusing, and it seemed unlikely that current word-spotting tools would be of much help in identifying key events (e.g., recognizing a statement like “contact”). IDA realized the importance of this data to the analysis, so some files were replayed and a staff member entered key events into a separate file using TARGET’s event capture utility. (Note: The time-consuming nature of this process prevented the collection of voice information for every exercise.)

Figure 1. Virtual Training Repository Analytic Process
IDA’s event capture utility provides graphical user interface buttons representing various types of events: orders (initial, wedge, come on-line), contact reports, situation/spot reports, calls for fire, radio transmissions, and unidentified communications. When an exercise is replayed using the ModSAF logger, the user clicks on the event button associated with the voice traffic heard. A small dialog box is presented for the user to enter relevant information about the event. For example, the situation/spot reports dialog presents a series of yes/no radio buttons to identify the unit’s accuracy in reporting enemy size, activity and location. The event capture utility writes the logger time stamp when events are identified. After users have found all of the key voice transmissions for an exercise, they select the Write button on the application and an ASCII event file for the exercise is created. The initial set of buttons and the dialog boxes for the events are completely data driven. Thus the event capture utility may be easily reconfigured to capture any set of events and any set of associated information about the events. TARGET’s event capture utility was implemented in C and Motif to be run on Unix platforms.
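The data-driven design described above can be sketched in Python. The actual utility was written in C and Motif; the event names, field names, and output line format below are hypothetical and serve only to show how a table of event definitions can drive both the prompts and the event file.

```python
from dataclasses import dataclass, field

# Hypothetical event definitions: event type -> yes/no fields to record.
# In a data-driven tool these would be loaded from a configuration file.
EVENT_TYPES = {
    "contact_report": [],
    "spot_report": ["size_correct", "activity_correct", "location_correct"],
    "call_for_fire": [],
}

@dataclass
class EventLog:
    events: list = field(default_factory=list)

    def capture(self, timestamp: float, event_type: str, **answers: bool) -> None:
        """Record one event at the given logger time stamp."""
        expected = EVENT_TYPES[event_type]  # KeyError for unknown event types
        missing = [name for name in expected if name not in answers]
        if missing:
            raise ValueError(f"missing fields for {event_type}: {missing}")
        self.events.append(
            (timestamp, event_type, {name: answers[name] for name in expected}))

    def write(self) -> str:
        """Return the ASCII event file contents, ordered by time stamp."""
        lines = []
        for timestamp, event_type, answers in sorted(self.events, key=lambda e: e[0]):
            flags = " ".join(f"{k}={'Y' if v else 'N'}" for k, v in answers.items())
            lines.append(f"{timestamp:.1f} {event_type} {flags}".rstrip())
        return "\n".join(lines) + "\n"
```

Adding a new kind of event then only requires a new entry in the definitions table, not a change to the capture or writing logic.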
2.1.3 Data Reduction of Logger Files
Logger data from the VTP exercises was collected and reduced to a standard, intermediary file format. This format, developed by IDA, is the Logged Event Analysis Format (Leaf).[2] The Leaf data reduction process discards unnecessary or unwanted data from the logger and stores the remainder in a formatted ASCII file. The data stored in a Leaf file-set includes:
• Exercise history information,
• A list of the vehicles that participated in an exercise,
• The location of each vehicle (including hull and gun direction) at user-specified time increments (the default increment is five seconds),
• Information about each shot fired during an exercise and its effect, and
• Information about the intervisibility between red and blue vehicles.
Because the Leaf standard is at an intermediate level of detail, Leaf files are much smaller than the logger files. In the VTP Case Study, approximately 5 gigabytes of logger data was collected. This data in the Leaf format was only 66 megabytes. In addition, the ASCII format is much easier to interpret than the binary logger files.
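Much of the size reduction comes from keeping vehicle locations only at fixed time increments. The following Python sketch shows one way such sampling could work; it is not the Leaf reduction code, and the input tuple layout (time, vehicle identifier, x, y) is an assumption. Hull and gun direction are omitted for brevity.

```python
def sample_positions(updates, increment=5.0):
    """Keep each vehicle's last known position at fixed time increments.

    `updates` is an iterable of (time_s, vehicle_id, x, y) tuples sorted by
    time. Returns a list of (sample_time, vehicle_id, x, y) tuples.
    """
    samples = []
    latest = {}       # vehicle_id -> (x, y) most recently reported
    start = None      # time of the first update; sampling starts here
    tick = 0          # index of the next sample time to emit
    last_time = None

    def emit(sample_time):
        for vehicle_id in sorted(latest):
            x, y = latest[vehicle_id]
            samples.append((sample_time, vehicle_id, x, y))

    for time_s, vehicle_id, x, y in updates:
        if start is None:
            start = time_s
        # Emit every sample time that passed before this update arrived.
        while start + tick * increment < time_s:
            emit(start + tick * increment)
            tick += 1
        latest[vehicle_id] = (x, y)
        last_time = time_s

    # Emit the final sample only if the data actually reaches that time.
    if start is not None and start + tick * increment <= last_time:
        emit(start + tick * increment)
    return samples
```

With the default five-second increment, a vehicle that reports its state many times per second contributes only one location record per sample time.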
After the VTP data was collected and processed (reduced), data manipulation and visualization approaches were explored.
2.2 Standard Bar Charts
Early in the task, we began an iterative process of visualizing the collected training data. Each type of chart, graph or other visual display that was produced was later shown to Subject Matter Experts (SMEs) at Ft. Knox and refined based on their feedback. Initially, we used S-Plus, a comprehensive data analysis and visualization package. Using S-Plus, standard bar charts were generated to analyze both the subjective (THP) and the objective (logger) data. Based on the interest from the Army, a more user-friendly interface to the data was developed in Visual Basic, running on a PC. This toolset, which produces graphs and charts based on user-specified criteria, became the foundation of the TARGET system.
2.2.1 Take Home Package Data
The Army was very interested in analyzing the subjective THP data so IDA developed a user-friendly system for displaying this information in a wide variety of ways. Figure 2 shows the user interface for this toolset. The interface provides an intuitive way to specify conditions for a query based on the date of the exercise (session), O/C, table, task/subtask, and/or O/C assessment. For example, the user might want to see the number of sustain and improve assessments given by all of the O/Cs on a per table basis. This would be done by choosing “Table” for the X axis and “Assessment” for the Y axis as shown in Figure 3. Multiple minicharts may also be created by grouping on a third factor.
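A query of this kind amounts to filtering records and counting assessments per category on the X axis. The Python sketch below is a rough illustration and not TARGET's Visual Basic implementation; the record keys are hypothetical.

```python
from collections import Counter

def count_assessments(records, x_field="table", **conditions):
    """Count assessments grouped by `x_field`, after filtering.

    `records` is an iterable of dicts with at least the keys `x_field` and
    "assessment". `conditions` are exact-match filters, e.g. oc="OC1".
    """
    counts = Counter()
    for record in records:
        if all(record.get(key) == value for key, value in conditions.items()):
            counts[(record[x_field], record["assessment"])] += 1
    return counts
```

For example, `count_assessments(records, x_field="table")` gives the per-table sustain and improve counts, and adding `oc="OC1"` restricts the chart to a single O/C.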
The THP data analysis highlighted a number of interesting facts about the VTP, including the following:
• Which tasks and tables were being trained most frequently,
• Which tasks and tables units appeared to have trouble with, and
• The differences between O/C assessments across tasks and tables.
Figure 2. TARGET’s THP Main Screen

Figure 3. Sample THP Chart
2.2.2 Data Logger Measures of Performance
Leaf data was used together with the S-Plus data analysis package to generate a number of graphs and timeline displays. The displays that were found to be the most useful were later incorporated into TARGET. These charts were used in sessions with the O/Cs and other Army domain experts to help in the process of specifying MOPs. Some of the calculations for MOPs include the following:
• Platoon formation calculations (e.g., wedge, line, herringbone),
• Range information (e.g., average engagement range per table, range at time of first blue shot/kill),
• Turret scanning on a per-tank basis,
• Dispersion of fire,
• Fratricide, and
• Intervisibility calculations.
TARGET includes facilities to produce MOP charts related to the calculations listed above. The user selects a particular MOP to chart and data is retrieved from a MOP dBASE table. The Y axis always gives a count of the number of exercises. For example, Figure 4 presents the number of exercises that fall into a set of ranges between each blue and red vehicle when intervisibility is achieved.
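A chart such as the one in Figure 4 is a histogram over a range measure. The Python sketch below shows the two steps involved, assuming flat 2D coordinates in meters and a hypothetical bin width; it is illustrative only and does not reproduce TARGET's MOP calculations.

```python
import math

def range_m(blue_xy, red_xy):
    """Straight-line distance between two (x, y) positions, in meters."""
    return math.hypot(blue_xy[0] - red_xy[0], blue_xy[1] - red_xy[1])

def bin_counts(values, bin_width=500.0):
    """Count values per bin; keys are the lower edge of each bin."""
    counts = {}
    for value in values:
        low = int(value // bin_width) * bin_width
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))
```

Applying `bin_counts` to one range value per exercise yields the exercise counts plotted on the Y axis.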

Figure 4. Sample MOP Chart
2.3 Time-Based Analysis
When analyzing simulation exercises, we realized it is hard to look at aggregated measures without the ability to replay a single exercise to show an MOP’s relationship to other elements of the exercise. The time and order in which events occur are important to the analysis and overall understanding of the training event. One MOP of interest to the Army was the dispersion of fire and the time for all tanks in a platoon to take action against the enemy. TARGET’s timeline charts show who is firing at whom and how long after the first tank in a platoon fires the other tanks fire. The hypothesis here is that the sooner all tanks engage in the fight, the more successful they will be. In Figure 5, the X axis shows the time of the battle and the Y axis shows lines for each red and blue vehicle (each of the blue vehicles is depicted by a different color). The diamonds along the timeline show who fired (depicted by the diamond’s color) and the effect of the shot (e.g., open diamonds are a miss, filled-in diamonds a hit, and black triangles a kill).
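The delay between the platoon's first shot and each tank's first shot can be computed directly from the shot records. The following Python sketch assumes shots are available as (time, shooter identifier) pairs; the function name and data layout are hypothetical and are not taken from TARGET.

```python
def engagement_delays(shots, platoon):
    """Seconds between the platoon's first shot and each vehicle's first shot.

    `shots` is an iterable of (time_s, shooter_id) pairs in any order.
    Vehicles in `platoon` that never fired map to None.
    """
    first_shot = {}
    for time_s, shooter in shots:
        if shooter not in platoon:
            continue  # ignore shots by vehicles outside the platoon
        if shooter not in first_shot or time_s < first_shot[shooter]:
            first_shot[shooter] = time_s
    if not first_shot:
        return {vehicle: None for vehicle in platoon}
    platoon_first = min(first_shot.values())
    return {vehicle: (first_shot[vehicle] - platoon_first
                      if vehicle in first_shot else None)
            for vehicle in platoon}
```

Small delays across all vehicles would indicate that the whole platoon joined the engagement quickly, which is the condition the hypothesis above associates with success.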
2.4 Regression Analysis
Regression analysis searches for relationships among the MOPs associated with a given table. A database was constructed for six of the VTP tables; each database has the same format: the rows represent the platoons that have performed the table and the columns are the MOPs for each platoon. MOPs have been derived from all three data sources (voice, logger files, and THPs). Some of the MOPs are designated as outcomes. Objective MOPs include information from the logger, e.g., number of Blue tanks surviving. Subjective MOPs are related to the O/C’s assessments, e.g., a train to sustain or train to improve assessment.
The regression analysis searches for relationships between the outcomes and other MOPs. Figure 6 shows a correlation between the time it takes for a unit to make a contact report and the corresponding time it takes to make a spot report.
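For a single pair of MOPs, the search reduces to an ordinary least-squares fit and a correlation coefficient. The Python sketch below shows that calculation under the assumption of one (x, y) pair per platoon; it is a generic textbook fit, not the S-Plus procedure used in the study.

```python
import math

def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept and Pearson's r."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```

Here `xs` could hold each platoon's time to make a contact report and `ys` its time to make a spot report; a value of `r` near 1 or -1 indicates a strong linear relationship.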
Unfortunately, our initial application of regression analysis has not produced particularly interesting lessons learned. IDA will pursue this more extensively as part of a 1997 data mining IR&D project. One reason more interesting correlations have not been found in the VTR data is the small number of logger files that have both a corresponding THP and a voice analysis file.
2.5 2-D Battle Viewer
The Battle Viewer is a TARGET application that provides a way to play back an exercise or set of exercises on a 2D map display. The Battle Viewer is a Visual Basic application and acquires data from a database of reduced logged training exercises. This data is derived from Leaf files that have been transferred into dBASE. Via TARGET’s graphical user interface, users can select the exercise or exercises they want to replay.
A 2D terrain map of the National Training Center (NTC) is displayed. Routes and shots of the vehicles are overlaid on this display. (Note: The VTP uses the NTC terrain database. Other DIS terrain databases could also be used as part of the Battle Viewer.) The Battle Viewer’s NTC terrain map was generated by reading ModSAF files containing coordinates for various terrain features, such as soil type and road networks. These features are then rendered graphically on a 2D color display. Additionally, grid lines are drawn for orientation. The map can be zoomed or scrolled by the user and rendition of terrain features may be toggled on or off.

Figure 5. Time-Based Analysis

Figure 6. Example of Correlation Found Using Regression Analysis
A set of buttons allows the user to select a time step and move forward or backward through the exercise. Multiple exercises may be overlaid simultaneously for comparison. This feature may be helpful when trying to understand a training scenario, e.g., determining where on the terrain the blue forces will typically come in contact with the enemy.
Each vehicle’s route is drawn on the map as a solid line. The blue platoon is color coded; each tank appears in a different color. The shots of a vehicle are portrayed as a dotted line in the same color. The result of a shot (hit, miss, or kill) is shown as a color-coded circle at the point of impact. Vehicle markings and types may also be overlaid. All exercise features (routes, shots, impacts, markings, types) may be toggled on or off to simplify the display for analysis. In Figure 7, two exercises have been replayed using the Battle Viewer. In these exercises, the red vehicles start at the upper left and move to the lower right. The blue vehicles start at the mid-right and move left. The engagement occurs in the middle of the map. Dashed lines represent shots and colored circles represent each shot’s effect.
3.0 APPLICATIONS
An original objective of the VTR was to examine the utility of a repository of simulation exercises. The logical use of the VTR data is for the training and training development communities, but other communities may find this data useful. Ft. Knox will receive the data and software tools that were generated under this task and IDA is exploring other communities that may be able to benefit from this study.
3.1 Training and Training Development Tool
The TARGET tool, in its prototype form, has been delivered to Ft. Knox’s VTP and they are reviewing the interface, MOPs, and displays. Potential ways in which TARGET could be used as a training tool have been discussed. It may be valuable to show units how they have performed on a particular task or table in comparison to all others who have performed the same. In addition, exemplary tables may be used in the unit’s After Action Review to show either a good or bad execution of the table.
The training development community at Ft. Knox may use TARGET to refine the VTP or establish a set of objective matrices for the virtual environment. Currently, a unit proceeds to a more difficult table when an O/C has decided that it should proceed. Objective MOPs, when applied to the VTP tables, may establish a more consistent training regime.

Figure 7. 2D Battle Viewer
The Center for Army Lessons Learned (CALL) and the Army Training Digital Library (ATDL) communities have also been briefed. They are looking at how they might be able to use this data to derive lessons about training in the virtual environment.
3.2 Baseline for New Equipment Analysis
One potential use of the VTR data is as a baseline for the Close Combat Tactical Trainer (CCTT) Quick Start analysis. Data from the VTR could be a good source of information to substantiate the benefits of a higher fidelity virtual environment.
3.3 Data Source for Other Analytic Applications
Other analytical communities, including the acquisition and operational test communities, may find a repository of armor platoon exercises beneficial. Because the data has been captured, reduced, and stored in an intermediate format, it facilitates access and use of the data.
4.0 NEXT STEPS
The VTR Case Study performed under DARPA’s CAETI program will conclude in February 1997 with the publication of a final document describing the project. Approaches for expanding the scope of this research are being discussed within the Army, DARPA, and other communities. These approaches are discussed in the following sections.
4.1 Scaling to Higher Echelons
Because the VTR Case Study focused solely on platoon-level exercises, a natural extension to this effort would be to collect data from company and battalion-level DIS exercises. The work involved in scaling the VTR effort to higher echelons would include defining new MOPs. It may also identify issues related to scalability and performance.
4.2 Methodology Applied to Other Environments
Currently, the VTR has collected data from only the SIMNET environment. The VTR’s scope could be expanded by applying the same data reduction and analysis approach to constructive or live instrumentation system data. Ft. Knox’s VTP uses the Janus constructive simulation wargame for battalion staff training and the Army’s Combat Training Centers (CTCs) collect instrumentation system data. CALL has had discussions with IDA about applying the data reduction approach to the CTC instrumentation system data. CALL is particularly interested in whether this data could be reduced, stored, and replayed in the Leaf format.
4.3 3D Data Visualization and Exercise Playback
Today, low-cost PCs can display 3D graphics that until recently were available only on high-end Unix workstations. This is a result of increased performance of microprocessors, combined with the availability of low-cost graphics chips and the existence of the Virtual Reality Modeling Language (VRML) for creating 3D environments.
The emergence of the World Wide Web opens up new possibilities for enabling remote access to data stored in a central repository. Using funds from an IR&D project, IDA is building a prototype VRML Battle Viewer that allows users at remote sites to replay simulation exercises in 3D.
4.4 Data Mining
Data mining techniques attempt to find patterns in large-scale databases. As mentioned earlier in this paper, IDA will apply new data mining techniques to the data collected in the VTR to identify possible patterns associated with training in the virtual environment.
5.0 LESSONS LEARNED
The VTR effort has contributed to a better understanding of how data from multiple exercises can be archived and analyzed. There are concerns when analyzing multiple exercises that are not considered when analyzing a single exercise, for instance, in conducting an After Action Review (AAR). Prior to this effort, very few communities archived DIS exercises. The VTR Case Study has identified issues to facilitate DIS data archiving and approaches for making this data more meaningful.
5.1 Lessons Learned for Archiving Data
A number of lessons learned related to archiving training data resulted from IDA’s work on this project. These include the following:
• Realization that a project’s success is contingent upon the support from the sponsor and the user. For the VTR, this included the DARPA Program Manager and the Army, particularly the VTP staff.
• Archiving required the VTP to change their work practices; file naming conventions became important and funding had to be identified to cover the additional contractor time necessary to log the exercises.
• Tension existed between the flexibility needed for good training experiences and the standardization needed for archiving and analysis.
• The process of archiving exercises is one that is error prone and requires constant oversight: ensuring correctness in the data is complex and hard.
• Identification of MOPs is difficult and requires a good understanding of the domain and access to SMEs.
• Analytic results sell the concept of archiving.
There are still a number of issues related to archiving simulation data that this case study did not address, including security, distributed data access, and storage.
5.1.1 Additional Data Needed
One important lesson for archiving simulation training data is that in order to have a complete understanding of the exercises, additional data elements are needed. This data includes:
1. Background data on the units (e.g., training histories, the military occupational specialty of each soldier in each position);
2. The degree of coaching provided by the O/Cs for each table;
3. The proficiency of the opposing forces (OPFOR), a variable set in ModSAF;
4. Automation of voice traffic analysis;
5. Exercise markers to show which exercise (when there are multiple versions of the same exercise) is the most suitable for analysis;
6. Additional PDUs, including: entity state change and event flags with text fields for descriptions.
5.1.2 Lessons for the DIS Community
As the DIS Community begins to embark on standards for the High-Level Architecture (HLA), it is important that they realize the importance of certain factors that will facilitate the analysis of DIS exercises in the future. Some of these factors include the following:
• Recognizing the importance of the DIS Community’s Data Logger Interchange File Format (DLIF).
If translation tools existed to convert from various ModSAF formats to DLIF, our tools could simply work from the DLIF files.
• Recognizing the importance of the ability to automatically analyze voice traffic (e.g., commercialization of word-spotting tools).
In the future, much information that now occurs only with radio traffic will be digitized. However, research in the area of voice recognition or the addition of an event flag where critical events could be flagged and later detailed in the logger would still be very beneficial.
• Recognizing the importance of a standard header file for loggers that could provide the capability to capture data pertinent to the exercise.
This header file could contain key information about the exercise’s purpose, date it occurred, and a point of contact (POC) for the exercise.
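As a rough illustration of the header idea, the Python sketch below writes and reads a small plain-text header carrying the three items named above. No such standard exists in the source; the field names and the "name: value" layout are purely hypothetical.

```python
# Hypothetical header fields, taken from the items suggested in the text.
HEADER_FIELDS = ("purpose", "date", "poc")

def format_header(purpose: str, date: str, poc: str) -> str:
    """Build a plain-text header; a blank line marks the end of the header."""
    values = {"purpose": purpose, "date": date, "poc": poc}
    return "".join(f"{name}: {values[name]}\n" for name in HEADER_FIELDS) + "\n"

def parse_header(text: str) -> dict:
    """Read "name: value" lines up to the first blank line."""
    header = {}
    for line in text.splitlines():
        if not line.strip():
            break  # the blank line ends the header; logged data would follow
        name, _, value = line.partition(":")
        header[name.strip()] = value.strip()
    return header
```

A self-describing header of this kind would let an analyst judge whether an archived exercise is relevant without first decoding the logged data.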
A DoD-wide simulation repository would make it possible for analysts to extract meaningful information from data that might otherwise be put to little use. For such a repository to become a reality, the data collection, reduction, and analysis issues discussed in this paper need to be addressed. The experience gained with the tools and analysis approaches developed for the VTR Case Study provides a good foundation for future efforts to build larger scale simulation data archives.