Aviation Safety
NASA's National Aviation Operations Monitoring Service Project Was Designed Appropriately, but Sampling and Other Issues Complicate Data Analysis
GAO ID: GAO-09-112, March 13, 2009
This is the accessible text file for GAO report number GAO-09-112
entitled 'Aviation Safety: NASA's National Aviation Operations
Monitoring Service Project Was Designed Appropriately, but Sampling and
Other Issues Complicate Data Analysis' which was released on April 9,
2009.
This text file was formatted by the U.S. Government Accountability
Office (GAO) to be accessible to users with visual impairments, as part
of a longer term project to improve GAO products' accessibility. Every
attempt has been made to maintain the structural and data integrity of
the original printed product. Accessibility features, such as text
descriptions of tables, consecutively numbered footnotes placed at the
end of the file, and the text of agency comment letters, are provided
but may not exactly duplicate the presentation or format of the printed
version. The portable document format (PDF) file is an exact electronic
replica of the printed version. We welcome your feedback. Please E-mail
your comments regarding the contents or accessibility features of this
document to Webmaster@gao.gov.
This is a work of the U.S. government and is not subject to copyright
protection in the United States. It may be reproduced and distributed
in its entirety without further permission from GAO. Because this work
may contain copyrighted images or other material, permission from the
copyright holder may be necessary if you wish to reproduce this
material separately.
Report to Congressional Requesters:
United States Government Accountability Office:
GAO:
March 2009:
Aviation Safety:
NASA's National Aviation Operations Monitoring Service Project Was
Designed Appropriately, but Sampling and Other Issues Complicate Data
Analysis:
GAO-09-112:
GAO Highlights:
Highlights of GAO-09-112, a report to congressional requesters.
Why GAO Did This Study:
The National Aviation Operations Monitoring Service (NAOMS), begun by
the National Aeronautics and Space Administration (NASA) in 1997, aimed
to develop a methodology that could be used to survey a wide range of
aviation personnel to monitor aviation safety. NASA expected NAOMS
surveys to be permanently implemented and to complement existing
federal and industry air safety databases by generating ongoing data to
track event rates into the future. The project never met these goals
and was curtailed in January 2007.
GAO was asked to answer these questions: (1) What were the nature and
history of NASA's NAOMS project? (2) Was the survey planned, designed,
and implemented in accordance with generally accepted survey
principles? (3) What steps would make a new survey similar to NAOMS
better and more useful?
To complete this work, GAO reviewed and analyzed material related to
the NAOMS project and interviewed officials from NASA, the Federal
Aviation Administration, and the National Transportation Safety Board.
GAO also compared the development of the NAOMS survey with guidelines
issued by the Office of Management and Budget, and asked external
experts to review and assess the survey's design and implementation.
What GAO Found:
NAOMS was intended to demonstrate the feasibility of using surveys to
identify accident precursors and potential safety issues. The project
was conceived and designed to provide broad, long-term measures on
trends and to measure the effects of new technologies and aviation
safety policies. Researchers planned to interview a range of aviation
personnel to collect data in order to generate statistically reliable
estimates of risks and trends. After planning and development, a field
trial, and eventual implementation of the air carrier pilot survey and
the development of a smaller survey of general aviation pilots, the
project effectively ended when NASA transmitted a Web-based version of
the air carrier pilot survey to the Air Line Pilots Association.
NAOMS's air carrier pilot survey was planned and designed in accordance
with generally accepted survey principles, including its research and
development, consultation with stakeholders, memory experiments to
enhance the questionnaire, and a large-scale field trial. The survey‘s
sample design and selection also met generally accepted research
principles, but there were some limitations, and the survey data may
not adequately represent the target population. Sample frame and design
decisions to maintain program independence and pilot privacy complicate
analysis of NAOMS data. Certain implementation decisions, including
extended methodological experiments and data entry issues, also
complicate analytical strategies. Also, working groups of aviation
stakeholders were convened as part of NAOMS to assess the validity and
utility of the data, but these groups never had access to the raw data
and were disbanded before achieving consensus. To date, NAOMS data have
not been fully analyzed or benchmarked against other data sources.
While NAOMS's limitations are not insurmountable, a new survey would
require more coherent planning and sampling methods, a cost-benefit
analysis, closer collaboration with potential customers, a detailed
analysis plan, a reexamination of the sampling strategy, and a detailed
project management plan to accommodate concerns inherent in any survey
endeavor. As a research and development project, NAOMS was a successful
proof of concept with many strong methodological features, but the air
carrier pilot survey could not be reinstated without revisions to
address some of its methodological limitations. The designers of a new
survey would want to supplement NAOMS where it was self-limiting.
Alternatively, a newly constituted research team might lead
operational, survey, and statistical experts in extensively analyzing
existing data to illuminate future projects.
In reviewing a draft of this report, NASA reiterated that NAOMS was a
research and development project and provided technical comments, which
GAO incorporated as appropriate. NASA also expressed concern about
protecting NAOMS respondents' confidentiality, a concern GAO shares.
However, GAO noted that other agencies have developed mechanisms for
releasing sensitive data to appropriate researchers. The Department of
Transportation had no comments.
View [hyperlink, http://www.gao.gov/products/GAO-09-112] or key
components. For more information, contact Nancy R. Kingsbury at (202)
512-2700 or kingsburyn@gao.gov, or Gerald L. Dillingham at (202) 512-
2834 or dillinghamg@gao.gov.
[End of section]
Contents:
Letter:
Scope and Methodology:
Results in Brief:
NAOMS Was Intended to Identify Accident Precursors and Potential Safety
Issues:
NAOMS's Planning and Design Were Robust, but Implementation Decisions
Complicate Data Analysis:
A New Survey Would Require Detailed Planning and Revisiting Sampling
Strategies:
Concluding Observations:
Agency Comments and Our Evaluation:
Appendix I: Technical Issues Relating to NAOMS's Development and Data:
Appendix II: Comments from the National Aeronautics and Space
Administration:
Appendix III: GAO Contacts and Staff Acknowledgments:
Bibliography:
Tables:
Table 1: NAOMS Briefings, Presentations, Workshops, and Working Group
Meetings, 1997-2005:
Table 2: Principles We Used to Assess the NAOMS Survey:
Figures:
Figure 1: NAOMS's Original Milestones, Fiscal Year 1997 to
Implementation as a Permanent Survey:
Figure 2: NAOMS's Milestones for Fiscal Years 1997-2007, from
Development to Delivery to an Operating Organization:
Figure 3: The Rationale for NAOMS's Questionnaire Structure:
Figure 4: Example of an Air Carrier Pilot Survey Drill-Down Question:
Figure 5: NAOMS's Preliminary Estimates of Pilot-Reported Flight Hours
and Flight Legs, by Aircraft Size, 2002:
Figure 6: NAOMS Air Carrier Questionnaire Section B, Question ER4 on
Uncommanded Movements:
Figure 7: NAOMS's Preliminary Findings on Pre- and Post-September 11,
2001, Event Rates:
Abbreviations:
ALPA: Air Line Pilots Association:
ASMM: Aviation System Monitoring and Modeling:
ASRS: Aviation Safety Reporting System:
ATO: Air Traffic Organization:
BTS: Bureau of Transportation Statistics:
CAST: Commercial Aviation Safety Team:
CATI: computer-assisted telephone interviewing:
FAA: Federal Aviation Administration:
FOIA: Freedom of Information Act:
GTOW: gross takeoff weight:
NAOMS: National Aviation Operations Monitoring Service:
NAS: national airspace system:
NASA: National Aeronautics and Space Administration:
NTSB: National Transportation Safety Board:
OIG: Office of Inspector General:
OMB: Office of Management and Budget:
[End of section]
United States Government Accountability Office:
Washington, DC 20548:
March 13, 2009:
Congressional Requesters:
The National Aviation Operations Monitoring Service (NAOMS) was a
National Aeronautics and Space Administration (NASA) initiative that
aimed to develop a methodology to survey a wide range of aviation
personnel to monitor safety in the national airspace system (NAS).
[Footnote 1] The foundation for the NAOMS project was President
Clinton's August 1996 White House Commission on Aviation Safety and
Security, whose principal charge was to develop, domestically and
internationally, a strategy to improve aviation safety and security.
[Footnote 2] By interviewing a probability sample of pilots and other
aviation professionals, project staff planned to collect data about the
respondents' experiences and thus make possible statistically reliable
measurements of rates and rate trends on a wide array of types of
safety events in the NAS, from passenger disturbances to engine
failures to bird strikes.[Footnote 3] Part of a larger NASA research
and development initiative on aviation safety, the NAOMS project was to
demonstrate the feasibility of and develop the capacity for using
survey research to measure the occurrence of safety events. NASA
expected surveys developed under NAOMS to complement existing federal
and industry aviation safety databases.[Footnote 4] While NASA
originally intended for NAOMS to collect data regularly from air
carrier and general aviation pilots, air traffic controllers, flight
attendants, and mechanics and to hand off the survey data collection to
a different entity for permanent implementation, the project never met
these goals.
NAOMS was essentially a survey of air carrier pilots, and it stopped
collecting data in 2004.[Footnote 5] However, neither project staff nor
other aviation safety stakeholders ever fully analyzed its data. NAOMS
was curtailed at the end of its first and only decade, when NASA
transferred a Web-based version of its data collection system to the
Air Line Pilots Association (ALPA) in January 2007. Although the hope
had been that the NAOMS project would provide a comprehensive,
systemwide, statistically sound survey mechanism for monitoring the
performance and safety of the overall NAS, ALPA did not plan to
permanently implement the air carrier pilot survey as it was designed.
The data collection
system was never fully implemented, and its future is uncertain.
Our objective in this report is to answer the following three
questions:
* What were the nature and history of NASA's NAOMS project?
* Was the survey planned, designed, and implemented in accordance with
generally accepted survey principles?
* What steps would make a new survey similar to NAOMS better and more
useful?
Scope and Methodology:
To describe the history and nature of the NAOMS project, we researched,
reviewed, and analyzed related material posted on several NASA Web
sites and provided to us directly by NASA and its contractor for NAOMS.
We reviewed relevant documents on the House of Representatives'
Committee on Science and Technology Web site. We examined relevant
documents produced by the Battelle Memorial Institute (Battelle),
National Academies, and others as well as information produced for the
National Research Council. In addition, we reviewed a number of
relevant reports, articles, correspondence, and fact sheets on the
NAOMS project and air safety. Many of the publicly available materials
we reviewed are named in the bibliography at the end of this report.
To analyze the NAOMS air carrier pilot survey's planning, design, and
implementation (including pretest, interview, and data collection
methods); interviewer training; development of survey questions,
including which safety events to include in the survey; and sampling,
we interviewed officials from NASA, the Federal Aviation Administration
(FAA), and the National Transportation Safety Board (NTSB) and NAOMS
project staff. We also reviewed relevant documents. We discussed the
survey with NAOMS team members to obtain their recollections of the
work, particularly regarding limitations, gaps, and inconsistencies in
the documentation. GAO internal experts in survey research reviewed the
Office of Management and Budget's (OMB) Standards and Guidelines for
Statistical Surveys and derived a number of survey research principles
relevant to assessing the NAOMS survey.[Footnote 6] We compared the
NAOMS survey's design and implementation with these principles.
Although OMB's standards as they are used today were not final until
2006, the vast majority of OMB's guidelines represent long-established,
generally accepted professional survey practices that preceded the 2006
standards by several decades. We also examined the potential risk for
survey error--that is, "errors inherent in the methodology which
inhibit the researchers from obtaining their goals in using surveys" or
"deviations of obtained survey results from those that are true
reflections of the population."[Footnote 7] Survey error could result
from issues related to sampling (including noncoverage of the target
population and problems with the sampling frame), measurement error,
data processing errors, and nonresponse.[Footnote 8]
We asked three external experts to review and assess the NAOMS air
carrier pilot survey's design and implementation as well as
considerations for analysis of collected data. These external reviews
and assessments were conducted independently of our own review
activities. We selected the experts for their overall knowledge and
experience in survey research methodology and, specifically, for their
expertise in measurement (particularly the aspects of memory and
recall), survey administration and management, and sampling and
estimation. The experts included Robert F. Belli, Professor, Department
of Psychology, University of Nebraska, Lincoln, Nebraska; Chester
Bowie, Senior Vice President and Director, Economics, Labor, and
Population Studies, National Opinion Research Center, Bethesda,
Maryland; and Steve Heeringa, Senior Research Scientist at the Survey
Research Center and Director of the Statistical Design Group at the
Institute for Social Research, University of Michigan, Ann Arbor,
Michigan.
To determine what steps or other considerations might improve the
quality and usefulness of a survey like NAOMS if one were to be
implemented in the future, we identified and described methodological
deviations that we found from GAO's guidance and OMB's standards. We
also obtained the views of internal and external experts on how
limitations caused by such deviations might be overcome. We assessed
the potential or known effects of design or implementation limitations
we identified.
We focused our review on the most extensively developed part of the
NAOMS effort, the air carrier pilot survey. We discuss the general
aviation study as it relates to the air carrier survey and overall
project evolution, but we do not focus on its development or
implementation.[Footnote 9] We attempted to identify both problems that
might have prevented the NAOMS survey data from producing meaningful
results and limitations that might not materially affect the survey
results but could result from accepting the reasonable risks and trade-
offs inherent in any survey research project. We note that limitations
are not necessarily weaknesses.
We conducted our work from March 2008 to March 2009 in accordance with
generally accepted government auditing standards. Those standards
require that we plan and perform the audit to obtain sufficient,
appropriate evidence to provide a reasonable basis for our findings and
conclusions based on our audit objectives. We believe that the evidence
obtained provides a reasonable basis for our findings and conclusions
based on our audit objectives.
Results in Brief:
The NAOMS project was originally intended to develop a survey
methodology to identify accident precursors and potential safety
issues. The project was conceived and designed in 1997 to provide
broad, long-term measures on trends and to measure the effect of new
technologies and policies on aviation safety. NAOMS was to supplement
other aviation safety systems by interviewing aviation personnel to
collect data that could be used to generate statistically reliable
estimates of risks and trends. The project was a developmental effort
by NASA that was part of a larger aviation safety initiative, and it
aimed to demonstrate the viability of using a survey methodology to
monitor trends in aviation safety. It did not have an investigatory
mission or aim to provide policy responses or interventions. NASA
originally intended that the permanent implementation of surveys
developed under NAOMS would generate ongoing data to track event rates
into the future. Despite initial plans to administer the survey to
pilots, air traffic controllers, flight attendants, and mechanics,
NAOMS focused its development efforts primarily on air carrier pilots.
After planning and development, a field trial, and eventual
implementation of the air carrier pilot survey and a smaller survey of
general aviation pilots, the project effectively ended when NASA
transmitted a Web-based version of the air carrier pilot data
collection system to ALPA in January 2007.
While the NAOMS air carrier pilot survey's planning and design were
robust, implementation decisions complicate data analysis. NASA's
project team planned and developed NAOMS in accordance with generally
accepted survey research principles. The team thoroughly researched the
survey's development; consulted with stakeholders in industry,
government, and academia during the project's conception and evolution;
conducted innovative memory experiments that enhanced the
questionnaire; and conducted a large-scale field trial of air carrier
pilots to answer key questions about data collection and response rate.
The survey's sample design and selection met generally accepted
research principles, with some limitations. For example, NAOMS was
handicapped by its sampling frame and filter to identify air carrier
pilots; while programmatically appropriate, the frame may not have
adequately represented the target population. Furthermore, the use of
sample selection criteria that potentially biased the data, along with
design decisions to protect pilot confidentiality and limited sample
sizes, complicate the development of analytical strategies to account
for operational differences across aircraft of different sizes and for
the potential for multiple pilots to witness the same event. Similarly,
implementation decisions met many important survey research principles
but also complicate analysis of NAOMS data. The team did not decide on
an optimal recall period for the questionnaire until approximately a
quarter of the way into the final survey; additional analysis would be
required to determine whether the data from different recall periods
are sufficiently similar to be combined. Interviewers were experienced,
and the survey attained high completion rates. However, interviewers'
skill could not overcome challenges created by problematic questions or
data entry issues. Working groups of aviation stakeholders, never
having had access to the raw NAOMS data, were disbanded before
achieving consensus on the validity and utility of these data;
consequently, data validation efforts for NAOMS were limited primarily
to preliminary assessments to gauge face validity.[Footnote 10]
Inadequate records preclude leveraging information from the sample when
analyzing NAOMS data and hinder evaluation of the project's management
and goal attainment.
A new survey would require more coherent planning and sampling methods
linked to analytical goals. Sufficient survey methodology literature
and documentation on NAOMS's memory experiments are available to
conduct another survey of NAOMS's kind with similarly strong survey
development techniques. The project's limitations are not
insurmountable, and a future effort could successfully go forward from
where NAOMS ended. Researchers would benefit from a cost-benefit
analysis to ensure that a survey like NAOMS could cost-effectively
generate essential safety information. Experimentation and testing that
the NAOMS team conducted could provide an effective foundation from
which to construct and test a new questionnaire. Closer collaboration
with potential customers to formally and specifically codify the
expected uses of the data would help ensure the data's utility.
Similarly, a detailed analysis plan specifying any likely adjustments
or weights, written in concert with the questionnaire, would help
ensure that data could be appropriately analyzed. For example,
researchers might reconsider the balance between confidentiality and
the potential benefits of a questionnaire that allowed pilots to link
reported events to particular aircraft and to identify aircraft they
flew as air carrier pilots and in other capacities. Researchers should
revisit sampling strategies to ensure that the selected frame was the
most cost-effective way of sufficiently identifying the target
population, and that potential biases could be remedied before or after
data analysis. Finally, a detailed project management plan would help
researchers accommodate the risks and trade-offs inherent in any survey
endeavor without jeopardizing eventual analysis of the data.
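For illustration only, the following short Python sketch shows the kind
of design-weight and nonresponse adjustment that such an analysis plan
might specify. Every figure in it is a hypothetical assumption, not a
value from the NAOMS sample:

frame_size = 60_000   # hypothetical count of eligible pilots in the frame
sample_size = 2_000   # hypothetical pilots selected in one quarter
respondents = 1_600   # hypothetical completed interviews

# Inverse probability of selection gives each respondent a base weight.
base_weight = frame_size / sample_size
# A simple unit-nonresponse adjustment inflates the base weight.
nr_adjusted_weight = base_weight * (sample_size / respondents)
print(f"Base weight: {base_weight:.1f}; adjusted: {nr_adjusted_weight:.1f}")
# Each respondent then stands in for roughly 37.5 pilots in population
# estimates; a real plan would also document any poststratification.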
Overall, we concluded that as a research and development project, NAOMS
was a successful proof of concept, with many strong methodological
features. For example, in using a probability sample and asking about
experiences rather than opinions, by and large, NAOMS satisfied its
stakeholders' goal of moving beyond accident-driven safety policy.
Despite having successfully demonstrated the feasibility of using a
survey to collect safety information from air carrier and general
aviation pilots, the NAOMS project never met its goal of collecting
data on an ongoing basis from a full range of aviation personnel,
including helicopter pilots, air traffic controllers, flight
attendants, and mechanics. While NASA eventually conveyed an air
carrier pilot survey data collection operation to another entity, the
project fell short of attaining permanent implementation of the
original survey to track event rates into the future. NAOMS was
essentially a survey of air carrier pilots that stopped data collection
in 2004, and it could not be reinstated without revisions to address
certain methodological limitations. NAOMS data were never fully
analyzed, and, depending on the research objective, the existing data
would require multiple adjustments for proper analysis. Although
potentially useful for historic analysis, these data are limited in
their ability to provide insight into the current health of the NAS.
While NAOMS's design, data collection methods, and implementation were
well-intentioned and strong in many respects, the designers of a new
survey would want to supplement NAOMS where it was self-limiting.
Alternatively, a newly constituted research team might lead
operational, survey, and statistical experts in extensively analyzing
existing data to illuminate future projects of the same kind.
We provided NASA and the Department of Transportation with drafts of
this report for their review and comment. NASA reiterated that NAOMS
was a research and development project and provided technical comments,
which we incorporated as appropriate. NASA also expressed concern about
protecting NAOMS respondents' confidentiality, a concern we share.
However, we noted that other agencies have developed mechanisms for
releasing sensitive data to appropriate researchers. The Department of
Transportation had no comments. We also provided a draft of this report
to Battelle (NASA's contractor for NAOMS) and the survey methodologist
for NAOMS for their review. Battelle provided no comments on the draft
report. The survey methodologist reported that he found the draft
report to be objective and detailed, and that he believed it will
contribute to the public debate on NAOMS. He also provided technical
clarifications, which we incorporated into the report as appropriate.
NAOMS Was Intended to Identify Accident Precursors and Potential Safety
Issues:
The NAOMS project was conceived and designed in 1997 to provide broad,
long-term measures on trends and to measure the effect of new
technologies and policies on aviation safety. Following the 1996
formation of the White House Commission on Aviation Safety and
Security, and the commission's 1997 report to the President committing
the government and industry to "establishing a national goal to reduce
the aviation fatal accident rate by a factor of five within ten years
and conducting safety research to support that goal," NASA worked with
FAA and NTSB to set up the Aviation Safety Investment Strategy Team
within NASA.[Footnote 11] This team organized workshops, examined
options, and recommended a strategy for improving aviation safety and
security. One of its recommendations led to NASA's Aviation System
Monitoring and Modeling (ASMM) project, a program to identify existing
accident precursors in the aviation system and to forecast and identify
potential safety issues to guide the development of safety technology.
[Footnote 12]
ASMM, within NASA's Aviation Safety and Security Program, was to
provide systemwide analytic tools for identifying and correcting the
predisposing conditions of accidents and to provide methodologies,
computational tools, and infrastructure to help experts make the best
possible decisions. ASMM was expected to accomplish this by, among
other things:
* intramural monitoring, providing air carriers and air traffic control
facilities with tools for monitoring their own performance and safety
within their own organizations, and:
* extramural monitoring, providing a comprehensive, systemwide,
statistically sound survey mechanism for monitoring the performance and
safety of the overall National Air Transportation System by seeking the
perspectives of flight crews, air traffic controllers, cabin crews,
mechanics, and other frontline operators (NAOMS was developed as the
primary mechanism for collecting this information).
Agencies, airlines, and other private organizations had realized that
quantitative and anecdotal information they had been collecting could
not be used to calculate statistically reliable risk levels. The
project team identified eight major aviation safety data sources that
were available when NAOMS was created.[Footnote 13] For example, flight
operational quality assurance data could have helped in deriving
statistically reliable estimates from digital measurements of flight
parameters, but these data do not cover all airlines or include
information on human cognition or affect. Another dataset was from the
Aviation Safety Reporting System (ASRS), which for 30 years had been
successfully collecting information from pilots, controllers,
mechanics, and other operating personnel about human behavior that
resulted in unsafe occurrences or hazardous situations.[Footnote 14]
However, because ASRS reports are submitted voluntarily, the resulting
data cannot be used to generate reliable rate estimates. Under ASRS,
pilots describe events briefly by mail or on NASA's ASRS Web site. NASA
reviews each report and enters detailed information about the events
into an anonymous database that it maintains. According to the ASRS
Director, the system is subject to volatility in reporting, as in 2006,
when reports of wrong runway use spiked following a fatal accident in
Kentucky, where pilots turned onto a runway that was too short for
their aircraft to attain lift-off speed.
[Footnote 15]
Also, ASRS is not statistically generalizable. Although it does not
constrain the types of events that can be reported, ASRS reporting is
voluntary and unlikely to cover the universe of safety events, and it
cannot be used to calculate trends. To complement this system and other
safety databases, the NAOMS project was to interview a statistical
sample of professionals participating in the air transportation system,
including pilots, about their experiences. Data from the interviews
were to enable statistically reliable measurements of rates and rate
trends for a wide array of safety events, such as fire in the cargo or
passenger compartment, severe turbulence in clear air, collisions with
birds, airframe icing, and total engine failure. As the project evolved, the
NAOMS researchers decided to deemphasize NAOMS's potential to calculate
rates in isolation, instead highlighting the project's primary
capability to identify trends worthy of investigation, thereby
complementing other data sources. The premise of the NAOMS project was
that aviation personnel were the best source of information on day-to-
day, safety-related events. In measuring the occurrence of safety
incidents that might increase the risk of an accident, rather than
accidents themselves, the project would serve a monitoring role rather
than an investigative role. Instead of directly informing policy
interventions, NASA expected that trends seen in the NAOMS data would
point aviation safety experts toward what to examine in other data
systems. However, to date, the accuracy of rate and trend estimates
based on NAOMS data has not been established.
NASA appointed two researchers with aviation safety experience to lead
a project team in developing surveys for NAOMS as a part of ASMM. The
researchers contracted with Battelle to administer the project.
Battelle, in turn, subcontracted with experts in survey methodology and
aviation safety to help with questionnaire construction and project
execution.
NASA housed the project within the external monitoring aspect of the
ASMM program, which aimed to develop a comprehensive survey methodology
for monitoring the overall state of the NAS that could, on
implementation, provide aviation decision makers with regular,
accurate, and insightful measures of the system's health, performance,
and safety. ASMM's plan discussed the importance of developing surveys
for NAOMS with methodological rigor, noting that the success of NAOMS
depended at least on the:
"1) plausibility and understandability of NAOMS statistics (e.g.,
reasonable and reliable representation of the relative frequencies with
which unwanted events occur),
"2) stability and interpretability of NAOMS statistical trends,
"3) sensitivity to industry concerns about data misuse, and:
"4) timely and appropriate disclosures of NAOMS findings."[Footnote 16]
A primary objective of NAOMS was to demonstrate that surveys of
personnel from all aspects of the aviation community could be cost-
effectively implemented to help develop a full and reliable view of the
NAS. NASA also sought to find a permanent "home" for the surveys,
having planned to develop "scientific methodologies to maximize the
useful information and minimize the cost, but not...provide for
permanent service" or funding for NAOMS.[Footnote 17]
That is, NASA intended the NAOMS project to collect data continually
from air carrier and general aviation pilots, helicopter pilots, air
traffic controllers, flight attendants, and mechanics. It sought to
design a permanent survey data collection operation that, once
implemented, could generate ongoing data to track event rates into the
future (see figure 1). NASA was to conduct the research and development
steps necessary to demonstrate a survey methodology that would
quantitatively measure aviation safety throughout the NAS, but it
expected that a different organization, possibly FAA, would permanently
implement the surveys NASA developed.
Figure 1: NAOMS's Original Milestones, Fiscal Year 1997 to
Implementation as a Permanent Survey:
[Refer to PDF for image: illustration]
FY97: Briefings to Aviation Safety Decision Makers;
FY98: NAOMS Concept Presented at NASA Data Analysis & Monitoring
Workshop; Methodological & Field Research;
FY99: NAOMS Workshop;
FY00: Field Trial; Air Carrier Survey Implemented;
FY01: ATC, Cabin Crew and Mechanic Surveys Implemented;
FY02: General Aviation Survey Implemented;
FY03: System-wide Risk Assessment Demonstrated;
FY04: Permanent Survey Implemented.
Source: Linda Connell, NAOMS Workshop: National Aviation Operations
Monitoring Service (NAOMS) (Washington, D.C.: NASA, Mar. 1, 2000), 113.
[End of figure]
NASA's project leaders outlined these objectives in briefings,
presentations, workshops, and meetings as they explained the project's
concept and progress (see table 1). The NAOMS team briefed officials
overseeing the ASRS project, for example, on NAOMS's concept as early
as 1997. In 2005, the team showed the Commercial Aviation Safety Team
(CAST) how the NAOMS air carrier pilot survey could help develop
metrics to assess the effectiveness of safety interventions.[Footnote
18]
Table 1: NAOMS Briefings, Presentations, Workshops, and Working Group
Meetings, 1997-2005:
Year: 1997;
Date: [A];
Topic or title: Concept for Monitoring;
Audience: NASA Aviation Safety Reporting System Advisory Committee;
Place: [A].
Year: 1997;
Date: [A];
Topic or title: Review of Concept for Monitoring;
Audience: International workshop participants at NASA headquarters;
Place: Washington, D.C.
Year: 1998;
Date: [A];
Topic or title: Monitoring Concept Described;
Audience: Office of System Safety, FAA;
Place: [A].
Year: 1998;
Date: March 5;
Topic or title: Creation of NAOMS: Proposed Phase 1, A Monitoring
Proposal;
Audience: Flight Safety Foundation Icarus Committee Working Group on
Flight Operational Risk Assessment;
Place: Washington, D.C.
Year: 1998;
Date: November 13;
Topic or title: Development and Proof of Concept;
Audience: NASA Aviation Safety Reporting System Advisory Subcommittee;
Place: Moffett Field, California.
Year: 1999;
Date: May 11;
Topic or title: Program Concept and Methodology Workshop;
Audience: FAA and other government agencies, and aviation industry
groups;
Place: Alexandria, Virginia.
Year: 2000;
Date: January 26;
Topic or title: Program Overview; Partial Field Trial Results;
Audience: Aviation Specialty Corporation;
Place: [A].
Year: 2000;
Date: March 1;
Topic or title: Workshop: Field Trial Results and Methodology;
Audience: FAA and other government agencies and aviation industry
groups;
Place: Washington, D.C.
Year: 2002;
Date: August 28;
Topic or title: In-Close Approach Changes Level 2 Milestone Workshop;
Audience: NASA Ames and ICAC contractors;
Place: [A].
Year: 2002;
Date: December 5;
Topic or title: Program Overview: Preliminary Results;
Audience: Aviation Safety and Security Program Office, NASA Langley
Research Center;
Place: Hampton, Virginia.
Year: 2003;
Date: April 9;
Topic or title: Program Overview and Preliminary Results;
Audience: FAA;
Place: Washington, D.C.
Year: 2003;
Date: May 7;
Topic or title: Program Review;
Audience: National Research Council Review Committee;
Place: Moffett Field, California.
Year: 2003;
Date: August 5;
Topic or title: Overview and Status;
Audience: FAA and Joint Implementation Measurement Data Analysis Team
of CAST;
Place: Newport, Rhode Island.
Year: 2003;
Date: December 18;
Topic or title: Project Overview: Background, Approach, Development,
Methodology, and Current Status;
Audience: NAOMS Working Group 1;
Place: Seattle, Washington.
Year: 2004;
Date: [A];
Topic or title: Survey Methodology and Design Decisions;
Audience: NTSB;
Place: Washington, D.C.
Year: 2004;
Date: May 5;
Topic or title: Project Status and Results Review;
Audience: NAOMS Working Group 2;
Place: Washington, D.C.
Year: 2004;
Date: June 16;
Topic or title: Construction of Joint Implementation Measurement Data
Analysis Team, Air Carrier Questionnaire Section C;
Audience: Joint Implementation Measurement Data Analysis Team of CAST;
Place: San Francisco, California.
Year: 2004;
Date: September 1;
Topic or title: Program Overview, Air Carrier Questionnaire, Section C,
In-Close Approach Changes Results;
Audience: Air Traffic Organization, FAA;
Place: Washington, D.C.
Year: 2004;
Date: September 8;
Topic or title: Project Overview;
Audience: FAA La Pointe Technical Center;
Place: Mountain View, California.
Year: 2005;
Date: January 26 and 28;
Topic or title: Joint Implementation Measurement Data Analysis Team,
Air Carrier Questionnaire Section C Results;
Audience: Joint Implementation Measurement Data Analysis Team of CAST;
Place: [A].
Source: GAO.
Note: We found no information on briefings, presentations, workshops,
or meetings in 2001.
[A] We were unable to determine the missing data in the table.
[End of table]
Another early presentation, in March 1998, demonstrated NAOMS's concept
and goals while spelling out in detail the project's phase one. Project
staff planned to profile and summarize participant demographics in a
technical document, develop a preliminary statistical design, identify
high-value survey topics, incorporate these topics into a draft survey
instrument, and analyze and validate the survey design to refine the
survey instrument.[Footnote 19] The presentation delineated four
distinct project phases:
* develop the methodology, while engaging stakeholder support;
* conduct a test survey to prove the concept;
* implement the full nationwide survey incrementally; and:
* hand off the instrument to an organization interested in operating it
over the long term.[Footnote 20]
Project staff were later to describe the first two stages as one
"methods development" phase. Figure 2 outlines the completion of these
phases as expressed first in 1997 briefings to aviation safety decision
makers in the development stage to the delivery of NAOMS's data
collection system to ALPA in January 2007. The figure reflects changes
in the NAOMS project resulting from NASA's decision to halt development
of the full array of surveys indicated in figure 1. By 2004, which was
the original target date for permanent implementation of surveys, the
team had been able to develop and begin only the pilot surveys (both
air carrier and general aviation pilots), not those for other personnel
as initially was planned.
Figure 2: NAOMS's Milestones for Fiscal Years 1997-2007, from
Development to Delivery to an Operating Organization:
[Refer to PDF for image: illustration]
NAOMS Development Timeline:
Development Phase:
FY97:
* Briefings to Aviation Safety Decision Makers;
* NAOMS Concept Presented at NASA Data Analysis & Monitoring Workshop.
FY98:
* Methodological & Field Research.
FY99:
* Pre-Field Trial Workshop.
FY00:
* Field Trial Data Collection;
* Post-Field Trial workshop.
Operational Phase:
FY01:
* Air Carrier Survey (through FY05).
FY02:
* GA Survey.
FY03:
* GA Survey Ends.
FY04:
* JIMDAT Baseline Measures;
* NAOMS Data Collection Concludes.
Handoff Phase:
FY05:
* Development of a NAOMS Web Survey Implementation;
FY06:
* Handed off to ALPA-CAST.
Source: Robert S. Dodd, "NAOMS Development and Application,"
presentation to the Aeronautics and Space Engineering Board, National
Academies (Washington, D.C.: June 9, 2008), 5.
[End of figure]
As shown in figures 1 and 2, NASA originally planned to end funding in
2004 but extended it to 2007 to "properly fund transition of the data"
to the larger safety community.[Footnote 21] A Web-based version of the
air carrier pilot survey and related information were handed off to
ALPA in January 2007.
The Survey's Development: Feasibility, Methodology, and Field Testing:
In 1998, members of the NAOMS team--NASA managers, survey
methodologists, experts in survey implementation, aviation safety
analysts, and statisticians working with support service contractors
from Battelle--began to study long-term surveys that had helped support
government policymaking since at least 1948. The team intended for
NAOMS to employ the best practices of surveys used in other policy
areas providing comparable benefits. The team members reviewed an
extensive variety of surveys used for national estimates and for risk
monitoring. These surveys included the Centers for Disease Control and
Prevention's Behavioral Risk Factor Surveillance System, which provides
information on, among others, rates of smoking, exercise, and seat-belt
use, and the Bureau of Labor Statistics' Consumer Expenditure Survey,
which provides data to construct the consumer price index. The team's
aim was to learn how the NAOMS survey could measure actual experiences.
The NAOMS team came to the conviction that the survey should collect
the information they needed from the people:
"who were watching the operation of the aviation system first-hand and
who knew what was happening in the field...[and that] this use of the
survey method was in keeping with many other long-term federally funded
survey projects that provide valuable information to monitor public
risk, identify sources of risk that could be minimized, identify upward
or downward trends in specific risk areas, to call attention to
successes, identify areas needing improvement, and thereby save
lives...."[Footnote 22]
The team decided that in a well-designed and implemented survey
process,
"only the aviation systems operators--its pilots, air traffic
controllers, mechanics, flight attendants, and others--[had] the
situational awareness and breadth of understanding to measure and track
the frequency of unwanted safety events and to provide insights on the
dynamics of the safety events they observe. The challenge was to
collect these data in a systematic and objective manner."[Footnote 23]
In 1999, the team established a plan of action that included a
feasibility assessment, with a literature review, to study
methodological issues, estimate sample size requirements, and enlist
the support of the aviation community. The assessment also planned for
research that included a series of focus groups to help determine
likely responses to a survey and a study of how pilots recall
experiences and events. It also outlined a field trial to begin in
fiscal year 1999 and, finally, a staged implementation, beginning with
air carrier pilots, progressing to a regular series of surveys, and
moving on to other aviation constituencies.[Footnote 24]
For the feasibility assessment, NAOMS researchers consulted with
industry and government safety groups, including members of CAST and
FAA and analysts with ASRS. They reviewed aviation event databases such
as ASRS, the National Airspace Information Monitoring System, and
Bureau of Transportation Statistics (BTS) data on air carrier traffic.
The team drew on information from this research, as well as team
members' own expertise, to construct and revise a preliminary
questionnaire for air carrier pilots.
After the feasibility assessment, the team conducted a large-scale
field trial from November 1999 to February 2000 to help resolve the
following issues about the air carrier pilot questionnaire:
"What risk-elevating events should we ask the pilots to count?
"How shall we gather the information from pilots--written
questionnaires, telephone interviews, or face-to-face interviews?
"How far back in the past can we ask pilots to remember without
reducing the accuracy of their recollections?
"In what order should the events be asked about in the
questionnaire?"[Footnote 25]
As a result of the 600 air carrier pilot interviews conducted for the
field trial, the researchers decided that telephone interviewing was
sufficiently cost-effective and had a high enough response rate to use
in the final survey. The field trial had tested question content that
derived from previous research and had experimented with the order of
different sections of the survey. The field trial gave the team
confidence that the NAOMS survey was a viable means of monitoring
safety information. However, the field trial did not fully resolve
questions about the period of time that would best accommodate pilots'
ability to recall their experiences or about the best data collection
strategy.
Getting the Survey Under Way:
The team had decided before the field trial that the NAOMS
questionnaire content and structure were to be governed by (1) measures
of respondent risk exposure, such as the numbers of flight hours and
flight legs flown; (2) estimates of the numbers of safety incidents and
related unwanted events respondents experienced during the recall
period; (3) answers to questions on special focus topics stakeholders
requested; and (4) feedback on the quality of the questions and the
overall survey process.[Footnote 26]
After the team analyzed the data from the field trial and conducted
further extensive research, it decided that the NAOMS survey should
address as many safety events identified during its preliminary
research as practical, that its questions should be ordered to match
clusters from the field trial based on causes and phases of flight, and
that a sample size of approximately 8,000 to 9,000 interviews per year
would provide sufficient sensitivity to detect changes in rates. The
team structured the survey in four sections in accordance with their
original expectations of what the survey should cover. NAOMS's project
managers explained the rationale for this structure, shown in figure 3,
in a 2004 presentation to FAA's Air Traffic Organization (ATO).
[Footnote 27]
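A back-of-envelope calculation suggests why a sample on the order of
8,000 to 9,000 interviews per year could detect changes in rates. The
Python sketch below uses a normal approximation to a two-sample Poisson
comparison; the base rate and hours-per-interview figures are
assumptions for illustration, not NAOMS design values:

import math

def min_detectable_rate_change(base_rate, annual_exposure,
                               z_alpha=1.96, z_power=0.84):
    """Smallest year-over-year change in a rare-event rate detectable
    at roughly 5 percent two-sided significance and 80 percent power."""
    standard_error = math.sqrt(2 * base_rate / annual_exposure)
    return (z_alpha + z_power) * standard_error

# Assume 8,500 interviews per year, each covering about 60 flight
# hours, and a baseline rate of 2 events per 1,000 flight hours.
exposure = 8_500 * 60
delta = min_detectable_rate_change(2 / 1_000, exposure)
print(f"Detectable change: about {delta * 1_000:.2f} events per 1,000 hours")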
Figure 3: The Rationale for NAOMS's Questionnaire Structure:
[Refer to PDF for image: illustration]
Questionnaire Structure:
* Section A: Operational Exposure:
– Measures operational activity levels (risk exposure).
* Section B: Safety Event Experiences (Core Questions):
– Counts standard event frequencies with long-term trends in mind.
* Section C: Focus Topics:
– Provides a moving "searchlight" that can be redirected as needed to
topics of interest.
* Section D: Participant Feedback:
– Seeks continuing feedback on the validity of the NAOMS survey process
and survey questions.
Source: Mary Connors and Linda Connell, "National Aviation Operations
Monitoring Service Project Overview: Background, Development, Approach,
and Current Status," presentation to the Air Traffic Organization
(Washington, D.C.: NASA, Sept. 1, 2004), 12.
[End of figure]
NASA's contractors began computer-assisted telephone interviewing
(CATI) data collection for the full air carrier pilot survey in March
2001. Using a sample that was drawn quarterly from a subset of a
publicly available FAA database, interviewers surveyed pilots regularly
over approximately 45 months of data collection. The survey methodology
changed during the first few months of the survey: that is, researchers
settled on which recall period to use and a cross-sectional data
collection strategy approximately 1 year after the operational survey
began. Interviewing ended in December 2004, by which time more than
25,000 air carrier pilot interviews had been completed.
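Because the recall period changed during the first year of data
collection, analysts would need to check whether responses gathered
under different recall periods can be pooled. The Python sketch below
shows one simple first check, normalizing hypothetical event counts by
recall-period length; a real analysis would also need to model recall
decay and seasonality:

from statistics import mean

# Hypothetical responses under two recall periods (events recalled).
responses_30_day = [0, 1, 0, 0, 2]
responses_60_day = [1, 0, 3, 1, 0]

# Normalize to events per day before comparing the two groups.
rate_30 = mean(responses_30_day) / 30
rate_60 = mean(responses_60_day) / 60
print(f"30-day recall: {rate_30:.4f} events/day; "
      f"60-day recall: {rate_60:.4f} events/day")
# A large gap between these rates would caution against simply
# combining the periods, since longer recall tends to undercount.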
In addition to the air carrier pilot survey, NAOMS researchers explored
elements of the original action plan for the project. They conducted
focus groups with air traffic controllers and drafted preliminary
survey questions. Building on research done for the main air carrier
survey, NAOMS staff also developed and implemented a survey for general
aviation pilots that ran for approximately 9 months in late 2002 and
early 2003. However, by the end of 2002, NASA realized that it would
not be feasible to expand the project to other aviation personnel under
its initial plan to hand off the surveys for permanent service at the
end of fiscal year 2004. NAOMS staff focused their attention on
establishing the NAOMS air carrier pilot survey as a permanent service,
noting that the system was still under development and that its
benefits had not been fully demonstrated. They suggested that it would
be difficult to find an organization willing to commit the financial
and developmental resources necessary to manage an
uncompleted project.
The Survey's Handoff and Results, and the NASA Inspector General's
Review:
NASA's documentation had repeatedly shown that the NAOMS project's
purpose was "the development of methodologies for collecting aviation
safety data," with their eventual transition "to the larger safety
community" for permanent implementation. NAOMS had met its key
objectives of demonstrating a survey methodology to quantitatively
measure aviation safety and track trends in event rates by the end of
2004, when original funding for the project had been scheduled to end.
Seeking to ensure the future of the survey while streamlining the
project, project staff tested whether Web-based data collection would
be cost-effective.
NASA established an agreement with ALPA, which planned to initiate a
Web-based version of the air carrier pilot survey on behalf of CAST and
its Joint Implementation Measurement Data Analysis Team.[Footnote 28]
NASA extended NAOMS's original funding into 2007 to accommodate the
transition to ALPA.[Footnote 29] NASA conducted training sessions for
ALPA staff on the NAOMS Web application in early fiscal year 2007 and
conveyed the operational data collection system to ALPA in January
2007. However, ALPA never fully implemented the Web survey. According
to an ALPA official in late 2007, the organization was exploring how to
modify the survey before implementing it.[Footnote 30] Although ALPA
never had access to existing NAOMS data, this official also expressed
uncertainty about what should be done with the existing data. The
project effectively ended at the point of transfer.
In October 2007, following NASA's rejection of an Associated Press
reporter's request for NAOMS data under the Freedom of Information Act
(FOIA), the House Committee on Science and Technology held hearings
about the development and execution of NAOMS. NASA's Office of
Inspector General (OIG) subsequently initiated an investigation into
NAOMS's project management. The OIG's March 2008 report, summarizing
the history and status of the NAOMS project, found that NAOMS had
achieved many of its objectives. Specifically, NAOMS had:
"demonstrated a survey methodology to quantitatively measure aviation
safety, tracked trends in event rates over time, identified effects of
new procedures introduced into the operating environment, and generated
interest and acceptance of NAOMS by some of the aviation community as
described in the Project Plans."[Footnote 31]
The OIG report identified several shortcomings of the project,
including that (1) the "contracting officers did not adequately specify
project requirements" or "hold Battelle responsible for completing the
NAOMS Project as designed or proposed"; (2) the "contractor
underestimated the level of effort required to design and implement the
NAOMS survey"; (3) "NASA had no formal agreement in place for the
transfer and permanent service of NAOMS"; and (4) "NAOMS working groups
failed to achieve their objectives of validating the survey data and
gaining consensus among aviation safety stakeholders about what NAOMS
survey data should be released."[Footnote 32] An additional deficiency,
according to the OIG, was that, as of February 2008, "NASA had not
published an analysis of the NAOMS data nor adequately publicized the
details of the NAOMS Project and its primary purpose as a contributor
to the ASMM Project."[Footnote 33]
NAOMS's Planning and Design Were Robust, but Implementation Decisions
Complicate Data Analysis:
We found that, overall, the NAOMS project followed generally accepted
survey design and implementation principles, but decisions made in
developing and executing the air carrier pilot survey complicate data
analysis. We discuss in this report each of the three major stages of
survey development--planning and design, sample design and selection,
and implementation--in turn. While we document the many strengths of
the NAOMS survey and its evolution, we also discuss limitations that
raise the risk of potential errors in various aspects of the survey's
results. We also note where design, sampling, and implementation
decisions directly or potentially affect the analysis and
interpretation of NAOMS's data.
Table 2 outlines the generally accepted survey research principles,
derived in part from OMB guidelines, that we used in our assessment.
The table is a guide primarily to how we answered our second question
on the strengths and limitations of the design, sampling, and
implementation of the NAOMS survey. However, we caution that survey
development is not a linear process; steps appearing in one section of
table 2 may also apply to other aspects of the project. Direct
fulfillment of each step, while good practice, is not sufficient to
ensure quality. Additional related practices, and the interaction of
various steps throughout the course of project development and
implementation, are essential to a successful survey effort. Table 2
should be viewed not as a simple checklist of survey requirements, but
as guiding principles that underlie the narrative of our report and our
overall evaluation of the NAOMS survey.
Table 2: Principles We Used to Assess the NAOMS Survey:
Survey element: Planning and design;
Principles:
* The survey had a clear rationale?
* A review of existing studies, surveys, reports, or other literature
informed the survey?
* Potential users were consulted to identify their requirements and
expectations?
* The scope of survey data items was defined and justified?
* A management plan preserved the survey data and documentation of
survey records?
* The design identified the frequency and timing of data collection?
* The design identified survey data collection methods?
* The questionnaire design minimized respondent burden and maximized
data quality?
* The questionnaire was pretested and all components of the final
survey system were field tested?
* The design planned for the highest practical rates of response before
data collection?
* Components of the survey were tested using focus groups, cognitive
testing, and usability testing, prior to a field test of the survey?
Survey element: Sample design and selection;
Principles:
* The proposed target population was clearly identified?
* The sample frame and design were appropriate?
* Sample design coverage issues were described and handled
appropriately?
* Sample size calculations were appropriate?
* Potential nonsampling errors were estimated?
Survey element: Implementation;
Principles:
* Sample administration and disposition monitoring were appropriate?
* Appropriate steps were taken to communicate confidentiality to
respondents and to preserve the confidentiality of their data?
* The respondents were provided with appropriate informational
materials?
* Response maximization efforts, including period of data collection
and interviewer training, were appropriate?
* Steps to ensure the quality of the data were appropriate?
* Appropriate checks and edits on the data collection system mitigated
errors?
* Actions taken during data editing or other changes to the data were
documented?
* Survey response rates were calculated using standard formulas?
* Nonresponse analysis was conducted appropriately?
* The survey system documentation included all information necessary to
analyze the data appropriately?
* The survey system documentation was sufficient to evaluate the
overall survey?
Sources: GAO and OMB, Standards and Guidelines for Statistical Surveys
(Washington, D.C.: September 2006).
[End of table]
The Survey's Planning and Design:
Early documentation of the NAOMS project shows that the project was
planned and developed in accordance with generally accepted principles
of survey planning and design. As we have previously discussed, the
project team established a clear rationale for the air carrier pilot
survey and its use for ongoing data collection at its conception. Team
members considered the survey's scope and role in light of other
sources of available data, basing the questionnaire on a solid
foundation of available data, literature, and information from aviation
stakeholders. They devised mechanisms to protect respondent
confidentiality. Researchers collected preliminary information from
focus groups and interviews that they used in conducting confirmatory
memory experiments and in developing the questionnaire to reduce
respondent burden and increase data quality. The team was also
concerned with validating the concept of NAOMS and achieving buy-in
from members of industry and others to help ensure the relevance and
usefulness of the NAOMS data to potential users, although they were not
able to fully resolve the questions some stakeholders had about the
utility of the data. The team's field trial of air carrier pilots
allowed them to
answer key questions about data collection and response rate. The field
trial was followed with supplemental steps to revise the questionnaire
before the full air carrier pilot survey.
Notwithstanding the survey design's strengths, it exhibited some
limitations, such as the failure to use the field trial to fully test
questionnaire content and order, and fragmented management plans.
[Footnote 34] We found potential for survey errors involving
measurement, although the implications for the risk of error in the
survey's data were low.
Preliminary Research Supported the Survey's Development:
In its planning, the NAOMS team extensively researched survey
methodology, existing safety databases, and literature on aviation
safety and personnel. The team also conducted interviews and focus
groups with pilots. To generate publicity and support from aviation
stakeholders, the NAOMS team made multiple presentations to and
conducted workshops with government officials and aviation stakeholders
(see table 1). The preliminary research and feedback from stakeholders
helped the team define the scope of data collection.
Literature Reviews and Planning:
Initial literature reviews focused primarily on the data collection
methods that would be most likely to ensure response accuracy, on
question wording and ordering that would maximize recall validity, and
on preventing respondents from underreporting for fear of being held
accountable for mistakes. A document summarizing several early team
memorandums addressed theories and literature on "satisficing"--or the
notion that survey respondents adopt strategies to minimize their own
burden and cognitive engagement--and the relationship between the data
collection method and respondent motivation. This document, which was
reprinted, in part, in the contractor's reference report on NAOMS, also
examined literature on social desirability, particularly how
confidentiality affects response accuracy. It included reviews of
academic literature on how interviewing methods can dampen or enhance
tendencies toward socially desirable responses.
The summary document discussed the importance of designing the
questionnaire around memory organization, using specific cues that take
full advantage of how pilots organize events in memory to minimize
response burden and maximize pilots' ability to recall and report
events in the reference period. It
outlined specific strategies that have been used to assess memory
organization. The document proposed steps the NAOMS researchers could
take to assess memory organization; identify optimal recall periods;
and construct, validate, pretest, and refine the survey questionnaire.
It also outlined a way to implement and evaluate different data
collection methods and included initial sample size calculations to
compare response rates and potential sampling frames.
Another planning document enumerated in detail the populations of
interest in addition to pilots, including air traffic controllers,
mechanics, dispatchers, and flight attendants. The project team
compiled an annotated list of sources on aviation safety and their
limitations to indicate how the survey might play a role within an
overall system to monitor national airspace safety.[Footnote 35] The
project team supplemented its research with focus groups and one-on-one
interviews with pilots to help in deciding which safety events the
questionnaire should cover. These focus groups and interviews are
discussed in more detail in appendix I.
Workshops and Consultations with Stakeholders and Potential Users:
After presentations on the NAOMS concept and its relevance to aviation
safety in March and November 1998, NAOMS staff held the project's first
major workshop on May 11, 1999. A wide range of FAA and NASA officials;
representatives from private industry, academia, and labor unions; and
methodologists discussed:
* the need for NAOMS as a way to fill gaps in safety knowledge and move
beyond accident-driven safety policy (often called the "accident du
jour" syndrome);
* government's and others' use of survey research, citing specific
surveys that are used to measure rates, trends, risks, and safety
information in other fields;
* the intent to focus NAOMS questions on individuals' experiences,
rather than on their opinions; and:
* the need to involve industry and labor stakeholders to ensure high
participation rates and relevant safety content.[Footnote 36]
In addition to introducing the concept of NAOMS and its likely form,
the team expressly sought labor and industry participation in
developing NAOMS, to help ensure high response rates, the relevance of
specific questions, and the application of the survey's output to
decision making on policies, procedures, and technology.
Several aviation stakeholders participating in the workshop offered
feedback on the survey in general and on individual questions raised in
focus groups and the early field research. For example, a summary of
comments from FAA staff raised questions about response rate, the scope
of questions, and strategies for data validation.[Footnote 37] We found
that NAOMS staff clearly thought through many of these issues,
including matters of response rate and questionnaire consistency, and
worked to address them as the project developed. However, as we discuss
in the following text, while NASA initially expected that FAA would be
a primary customer of NAOMS data, it failed to attain consensus with
the agency on the project's merits and on whether NAOMS's goal of
establishing statistically reliable rates, in addition to trends, was
possible.
Defining the Scope of the Data NAOMS Would Collect:
The NAOMS team determined that the NAOMS survey would usefully
supplement other safety resources whose goals were investigative or
were to identify causation. Unlike those resources, NAOMS was to
capture not just incidents but also precursors to accidents and "more
subtle associations that may precede safety events."[Footnote 38] The
2007 ASMM summary report noted that one must know where to look in
order to investigate precursors.[Footnote 39] NAOMS was designed to
point toward such research. The project team expected that trends seen
in the NAOMS data would point aviation safety experts toward what to
examine in other data systems. Researchers and FAA officials told us
that many data, such as radar track data and traffic collision
avoidance data, do not cover the entire NAS and were not regularly
analyzed at the time that NAOMS was being developed.
Following the 1999 workshop on the concept of NAOMS and the preliminary
air carrier pilot questionnaire, a summary of comments from FAA showed
some support for NAOMS. However, the summary expressed concern that
much of the data being gathered were too broad to permit the
development of appropriate intervention strategies. A later FAA
memorandum, following meetings with NAOMS staff in 2003, requested
extensive questionnaire revisions and suggested that certain questions
were irrelevant, were covered by other safety systems, or should be
dropped.
FAA also sought more detailed investigatory questions to assess the
causes of some events, such as engine shutdowns, and revisions to
questions that it saw as too subjective and too broad to provide real
safety insight. To ensure that question consistency over time would
enable trend calculations, NASA researchers did not make most of the
revisions. Instead, they responded that to the extent that NAOMS might
provide "a broad base of understanding about the safety performance of
the aviation system" and allow for the computation of general trends
over time, its questions could help supplement other safety systems.
[Footnote 40]
The project team's concerns about respondent confidentiality influenced
the questionnaire's design. For example, they expressed some fear that
questions that attributed blame to respondents reporting safety events
would lead to underreporting. These concerns motivated decisions to
exclude from the questionnaire most of the information that could have
identified respondents. Pilots were not asked to give dates or identify
aircraft associated with events they reported. Additionally, the
database that tracked sampling and contact information for individual
pilots recorded only the weeks in which interviews took place, not
their specific dates.
Project Management Plans Were Not Comprehensive:
The NAOMS team's project management plans were not comprehensive. From
1998 to 2001, the activities of Battelle and its subcontractors were
covered by statements of work to plan and track the survey's
development. These documents enumerated tasks, deliverables, and
projected timelines. Similar documents do not exist for the 2002 to
2003 data collection period, when NASA changed priorities for NAOMS.
Battelle developed a new implementation plan to address changes in
NASA's priorities in 2004, but plans from 2002 onward were largely
subsumed in a series of contract modifications and were not
centralized. Twenty-four base contracts and modifications contained
information to track overall progress, but, according to NASA, the
overall ASMM project plan (while in accordance with NASA policy) did
not contain sufficient detail to correlate the plan with contract task
modifications such as those used for NAOMS. The lack of a central plan
makes it difficult to evaluate specific aspects of NAOMS against
preestablished benchmarks. Furthermore, the failure to maintain
management or work plans during data collection or to adapt the initial
work plans to accommodate project changes may have contributed to the
gaps in record-keeping regarding sampling, as discussed later in this
report.
Innovative Memory Experiments Enhanced the Questionnaire:
Research demonstrates that designing a survey to accommodate the
population's predominant memory structure can reduce respondents'
cognitive burden and increase the likelihood of collecting high-quality
data. The NAOMS team conducted innovative experiments to help in
developing a survey that would reduce respondent burden and accommodate
the air carrier pilots' memory organization and their ability to recall
events, thus increasing the likelihood of accuracy. While researching
and testing hypotheses about memory organization to enhance
questionnaire design are excellent survey research practices, few
researchers have the time or resources to conduct extensive experiments
on their target population. The NAOMS survey methodologist ran
experiments from 1998 through 1999 to generate and test hypotheses that
could be incorporated into the design of the air carrier pilot survey.
Several of the project's experiments to determine pilots' recall and
memory structures were based on relatively few pilots. These were
supplemented with other experiments and additional data analysis to
validate the researchers' hypotheses. However, these experiments were
limited to the core questions on safety in the air carrier pilot survey
and did not extend to other sections of the survey or other
populations, whether general aviation pilots, mechanics, or flight
crew. The memory experiments led researchers to design the core safety
events section of the survey according to a hybrid scheme of memory
organization--that is, it used groupings and cues related to causes of
events as well as phases of flight, such as ground operations and
cruising.[Footnote 41]
After the memory experiments, the NAOMS survey methodologist
recommended that project staff undertake cognitive interviews to ensure
that the questionnaire to be used in a planned field trial could be
understood and was complete, recommending also that a final version of
the questionnaire be tested with a separate group of pilots. A
memorandum indicated that at least five cognitive interviews were held
before the field trial, but we could not identify documentation on
their effect on the questionnaire's structure or content.
A Large-Scale Field Trial Resolved Many Issues, but Not Others:
In 1999, following more than 1 year of research, experiments, and
questionnaire development, NAOMS researchers conducted a large-scale
field trial. It was to help decide the appropriate recall period for
the survey questions; major issues of order and content for the
questionnaire; and the appropriate method of survey administration to
minimize cost, while maximizing response rate and data quality. The
field trial also allowed the NAOMS team to assess whether the survey
methodology was a viable means of measuring safety events. Although
largely in accordance with generally accepted survey principles, the
field trial had some limitations and did not resolve important
questions about the survey's methodology.
To administer the trial, team members randomly assigned pilots to
various experimental conditions: three different interviewing methods
(self-administered questionnaires, CATI, and in-person interviews), six
different recall periods, and the presentation of the core safety
questions either first or following the topical focus section.
Interviewers for the CATI and in-person interviews received group and
individual training, and the researchers used widely accepted
practices, such as advance notifications and reminder letters, to
maximize response rates for the self-administered questionnaire. Their
analysis of the data appeared to show that experimental assignments
were sufficiently random, and that data quality differed enough across
conditions, to allow some decisions about response mode and recall
period--showing, for example, that different modes resulted in
different completion rates and that longer recall periods produced
higher event counts.
Recall Period Research and Testing:
The NAOMS researchers hoped to reliably measure highly infrequent
events--the severest of which pilots were likely to recall quite well-
-without jeopardizing the measurement of more frequent, less memorable
events that had safety implications. Literature on survey research did
not point to one specific reference period for events such as those in
the NAOMS survey. To evaluate the effect of recall period on a pilot's
ability to accurately remember events, the project's survey expert
asked five pilots to fill out, from memory, a calendar of the dates and
places of each of their takeoffs and landings in the past 4 weeks. Then
they were asked to fill out an identical calendar at home, using
information they had recorded in their logbooks.
The survey methodologist used these data to support his recommendation
that NAOMS use a 1-week recall period, noting that this would require a
substantial increase in sample size to measure events with the
precision NAOMS originally intended. However, because the experiment
was designed to measure only takeoffs and landings--routine activities
that were unlikely to carry the weight in memory of more severe or
infrequent safety events at the heart of the NAOMS project--the survey
methodologist added the caveat that the final decision about recall
interval would have to be informed by the particular list of events in
the final NAOMS questionnaire and the rates at which pilots witnessed
them.
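The sample size implications of a shorter recall period can be
illustrated with a rough sketch. This is our illustration, not the
methodologist's calculation: assuming reported event counts behave
roughly like Poisson counts, the precision of a rate estimate depends
on the total exposure observed (interviews multiplied by recall
window), so a 1-week window requires roughly eight to nine times as
many interviews as a 60-day window to accumulate the same exposure. The
target figures below are placeholders.

# Illustrative sketch (not NAOMS's actual calculation): under a rough
# Poisson assumption, the variance of an estimated event rate is
# inversely proportional to the total exposure observed, so shortening
# the recall window requires proportionally more interviews to hold
# precision constant.

def interviews_needed(target_exposure_days: int, recall_days: int) -> int:
    """Interviews required to accumulate a fixed total exposure."""
    return -(-target_exposure_days // recall_days)  # ceiling division

# Placeholder target: the exposure of 8,000 interviews at 60 days each.
target = 8_000 * 60
for window in (60, 30, 14, 7):
    print(f"{window:>2}-day recall: "
          f"{interviews_needed(target, window):>7,} interviews")

[End of illustration]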
Following the logbook experiment, NAOMS researchers tested several
potential recall periods in the field trial, including 1 and 2 weeks
and 1, 2, 4, and 6 months. Data from the field trial show an increase
in the number of hours flown and event reporting commensurate with
extensions of the recall period and possible overreporting for the 1-
week period relative to the others. Aside from the logbook experiment,
however, no efforts were made to validate the accuracy of field trial
reports of safety events or flight hours and legs flown in survey data
collected within different recall periods.[Footnote 42]
The project team also obtained feedback from the pilots participating
in the field trial. This feedback indicated that most who commented on
recall periods said they were too short; the pilots wanted to report
incidents that happened recently, but not within the recall period. The
researchers noted that the pilots' discomfort with a short recall
period did not necessarily mean the data collected within that period
were inaccurate; it meant only that pilots may have wanted to report
events outside the recall period to avoid giving the impression that
certain events never occurred. Researchers also studied pilots'
reported confidence in their responses as an indication of data quality
obtained with different recall periods. However, the information from
the field trial tests and respondent feedback did not resolve the
question of which recall period to use. Researchers decided to use
approximately the first 9 months of NAOMS data collection as an
experimental period to resolve questions the field trial could not
answer, and they settled on a 60-day recall period several quarters
after full data collection began.[Footnote 43]
Data Collection Methods:
The contractor administering the field trial randomly assigned pilots
to mail questionnaires, face-to-face interviewing, or CATI. Face-to-
face data collection was stopped after it proved to be too costly and
complicated. The project team then compared the costs and response
rates of the two other methods as well as the completeness of responses
as a measure of data quality. Completed mail questionnaires cost $67
each and had a response rate of 70 percent, and 4.8 percent of the
questions went unanswered. Telephone interviews cost $85 each and attained a
response rate of 81 percent, and all of the questions were answered.
[Footnote 44]
The project team decided that the CATI collection method was
preferable, given the response rate, the cost, and a tighter
relationship between the numbers of hours flown and aggregated events
reported. We found ample information to support this data collection
method. In contrast, the field trial did not provide the researchers
with an opportunity to validate the sample strategy for data
collection--either cross-sectional (drawing each sample anew over time)
or panel (surveying the same set of respondents over time). As with the
recall period, researchers used the early part of the full survey to
experiment with both panel and cross-sectional approaches. They decided
on a final data collection approach approximately 9 months after the
full survey began.
Questionnaire Order and Content:
Team members developed different versions of the field trial
questionnaire to test whether to survey pilots first about main events-
-the core safety issues in section B--or about focus events--the issues
on specific topics in section C (see figure 3). The researchers'
quantitative analysis of the field trial data suggested that different
section orders did not affect data quality. However, we found it
unusual that the field trial questionnaire did not fully incorporate
the specific question order suggested by experiments or literature in
the main events section. While questionnaires contained content areas
from the memory experiment that combined the causes of events and the
phases of flights, individual topics within the core safety events
section of the field trial survey were not ordered from least to most
severe as the survey methodologist recommended. NASA later clarified
that the NAOMS team incorporated the results of the field trial into
the final survey instrument.
Additionally, the field trial questionnaire did not contain the "drill-
down" questions that appeared in the final questionnaire--that is,
questions asking for multiple response levels (see figure 4). The
failure to include these questions appears to violate the generally
accepted survey practice of using a field trial to test a questionnaire
that has been made as similar as possible to the final questionnaire.
While questionnaires almost inevitably change between a field trial and
their final form, the results of the experiments, cognitive interviews,
and full set of questions should have been incorporated into the test
questionnaire before the development of the final survey.
Figure 4: Example of an Air Carrier Pilot Survey Drill-Down Question:
[Refer to PDF for image: illustration]
ER2. How many times during the last (Time Period) did an aircraft on
which you were a crewmember experience a spill, fire, fumes, or
aircraft damage due to transporting hazardous materials?
#HAZMAT:
If 0, Skip To ER3.
A. (How many of these [# in ER2] times were the spills, fire, fumes or
aircraft damage/Was this spill, fire, fumes or aircraft damage) in the
cargo compartment?
# In Cargo Compartment:
(The Amount In ER2A Cannot Be Greater Than The Amount In ER2).
B. (How many of these [# in ER2] times were spills, fire, fumes or
aircraft damage/Was this spill, fire, fumes or aircraft damage) in the
passenger compartment?
# In Passenger Compartment:
(The Amount In ER2A And ER2B Combined Cannot Be Greater Than The Amount
In ER2).
C. (How many of these [# in ER2] times were the spills, fire, fumes or
aircraft damage/Was the spill, fire, fumes or aircraft damage) caused
because the hazardous materials in question were out of compliance with
regulations?
# Out Of Compliance With Regulations:
(The Amount In ER2C Cannot Be Greater Than The Amount In ER2).
Source: Battelle Memorial Institute, NAOMS Reference Report: Concepts,
Methods, and Development Roadmap, prepared for the NASA Ames Research
Center (Nov. 30, 2007), app. 11-5.
[End of figure]
Supplementary Steps Led to Questionnaire Revisions before the Main
Survey:
In addition to subject matter and survey methodology research,
experiments, and field testing, NAOMS staff used other commonly used
survey research techniques to develop and revise the air carrier pilot
survey questionnaire. For example, we found that at least five
cognitive interviews were conducted before the field trial, but we
found no documentation that described these interviews or their effect.
[Footnote 45] Additional cognitive interviews were conducted after the
field trial on nearly final versions of the questionnaire before the
survey's full implementation, resulting in changes to the questionnaire
(see appendix I). The project team did not record field trial
interviews; doing so would have allowed verbal behavioral coding, which
is a supplemental means of assessing problems with survey questions for
both respondents and interviewers.
Besides the changes the team made to the questionnaire from the results
of the cognitive interviews, team members reviewed the survey
instrument in great detail, adding and deleting questions to make it
easier for the interviewers to manage and for the respondents to
understand. However, as we have previously mentioned, the questionnaire
used in the field trial did not fully incorporate the order of events
suggested by the memory experiments. This order appears to have been
addressed after the cognitive interviewing that took place just before
the final survey began.
We found evidence that the NAOMS team made some changes to the
questionnaire as a result of respondent comments on the field trial,
such as discarding a planned section on minimum equipment lists, seen
by many respondents as ambiguous and unclear, in favor of a different
set of questions. However, there is no documentation of additional
question revisions in response to empirical information from the field
trial. Additionally, except for CATI testing involving Battelle
managers and interviewers, we could not find evidence of a pretest of
the final questionnaire incorporating all order and wording changes
before the main survey was implemented. NASA recently told us that the
results of the field trial, as well as inputs from other research, were
fully incorporated into the final survey instrument.
The Survey's Sample Design and Selection:
We found that for its time, NAOMS's practices regarding sample frame
design and sample selection met generally accepted survey research
principles, with some limitations. The project team clearly identified
a target population and potential sample sources. To maintain program
independence, the team constructed the sampling frame from a publicly
available database that was known to exclude a sizable proportion of
air carrier pilots, and applied filtering criteria to the frame to
increase the likelihood that the pilots NAOMS contacted would be air
carrier pilots, rather than general aviation pilots. It is not known
for certain whether the approximately 36,000 pilots NAOMS identified
for its sample frame were representative of the roughly 100,000
believed to exist.[Footnote 46] The implications for the risk of error
were high; the most significant sources of potential survey error stem
from coverage and sampling.
In addition to increasing the risk of error, sampling decisions
potentially affect the analysis and interpretation of NAOMS data.
The calculated sample sizes may not be sufficient to generate reliable
trend estimates because of the infrequency of events that have great
safety significance and concerns about operational characteristics and
potential bias resulting from the sample filter. Additionally,
developing estimates of event counts for air carrier operations in the
NAS (which was not a primary objective of NAOMS) from a sample of
pilots is complicated by the fact that rates from NAOMS are based on
individuals' reports, rather than on direct measures of safety events.
[Footnote 47] Also, multiple individuals responding to the survey may
have observed the same event.
Potential Problems Related to the Sampling Strategy Require Additional
Assessment:
While NAOMS researchers designed and selected a sample in accordance
with generally accepted survey research principles, sampling decisions
they made to address complications influenced the nature of the data
collected. NAOMS's sampling strategy for the air carrier pilot survey
was complicated by the needs to (1) link a target population to
specific analytical goals; (2) identify an appropriate frame from which
to draw a sample; and (3) locate commercial pilots, rather than general
aviation pilots. Eventually, the team constructed a frame from a
publicly available pilot registration database that excluded some
pilots and lacked information on where pilots worked, compelling the
team to use a filter to increase the likelihood of sampling air carrier
pilots. The contractor drew a simple random sample each quarter from
the freshly updated, filtered, and cleaned database and divided the
sample into random replicates that were released weekly for
interviewing.[Footnote 48] After the first year of the air carrier
pilot survey, which adapted sampling to accommodate experiments on
recall period and panel approach to data collection, the survey sampled
approximately 3,600 air carrier pilots for most quarters of data
collection. This sampling strategy resulted in 25,720 completed
interviews by the end of the air carrier interviewing.
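The quarterly sampling and weekly release procedure can be sketched as
follows. This is a minimal illustration with placeholder identifiers
and sizes, not the contractor's code:

# Minimal sketch of the procedure described above: draw a simple random
# sample from the filtered frame, then split it into random replicates
# released weekly. The frame contents and sizes are hypothetical.
import random

def draw_quarterly_sample(frame, sample_size=3_600, weeks=13, seed=None):
    rng = random.Random(seed)
    sample = rng.sample(frame, sample_size)  # simple random sample
    rng.shuffle(sample)                      # randomize release order
    return [sample[i::weeks] for i in range(weeks)]  # weekly replicates

frame = [f"pilot-{i:06d}" for i in range(37_000)]  # hypothetical frame
replicates = draw_quarterly_sample(frame, seed=42)
print([len(r) for r in replicates])  # 13 replicates of roughly 277 each

[End of illustration]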
Identifying a Target Population:
To develop NAOMS's sampling strategy, the team first needed to identify
a target population. Although an ideal target population corresponds
directly with a specific unit of analysis of interest, researchers
often rely on proxies when they cannot directly sample the unit. With
NAOMS's goal of estimating trends of safety events per air carrier
flight hour or flight leg in the NAS, a target population might have
been all air carrier flights in the NAS. Theoretically, one could draw
a sample of all air carrier flights in the NAS, locate the pilots on
these flights, and interview them about events specific to a particular
flight.
Given that such a sample would be prohibitively resource-intensive, the
NAOMS team identified an alternative target population--namely, air
carrier pilots. Surveying air carrier pilots would provide information
on safety events as well as on how many flight hours or flight legs
that pilots flew. If the frame fully covered the population of air
carrier pilots, the team's planned simple random sample from the frame
would allow an estimation of individual air carrier pilots' rates of
events experienced per hour or leg flown. In isolation, these
individual-based estimates would fall short of cleanly characterizing
the NAS, which involves other pilots besides air carrier pilots and
other personnel, including other crew members on each flight. However,
the estimates could address NAOMS's goal of estimating rates (for
individual air carrier pilots) on the basis of risk exposure and trends
in safety events over time, to supplement other systems of information
about safety.
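An individual-based rate of the kind described is, in essence, a ratio
estimate: total events reported divided by total exposure reported. The
following minimal sketch uses invented data; a real analysis would also
need an appropriate variance estimator for the ratio:

# Sketch of an individual-based rate estimate: the ratio of total
# reported events to total reported flight hours across sampled pilots.
# The data values are invented for illustration.
def event_rate_per_hour(reports):
    """reports: list of (events_reported, hours_flown) per respondent."""
    total_events = sum(events for events, _ in reports)
    total_hours = sum(hours for _, hours in reports)
    return total_events / total_hours

sample = [(2, 95.0), (0, 103.5), (1, 82.0), (0, 110.0)]  # hypothetical
rate = event_rate_per_hour(sample)
print(f"{rate:.4f} events per flight hour "
      f"({rate * 1_000:.1f} per 1,000 hours)")

[End of illustration]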
One potential difficulty with this target population was that the
number of pilots actively employed as air carrier pilots was not known
when the project began. Although the NAOMS team extensively reviewed
the size of the pilot population, we found multiple estimates of the
target population from the NAOMS documentation. NAOMS's preliminary
research suggested that approximately 90,000 pilots were flying for
major national and regional air carriers and air cargo carriers.
[Footnote 49] Other information suggested that the population
could have been as large as 120,000 pilots. For example, the 60,000 air
carrier pilots in ALPA's membership represented "roughly one-half to
two-thirds" of all air carrier pilots, or, alternatively, up to 80
percent of the target population.[Footnote 50] In light of these
different estimates, we assume for purposes of discussion a target
population of about 100,000 air carrier pilots.
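The arithmetic behind these alternative estimates is straightforward,
as the short calculation below shows using the figures quoted above:

# Implied population sizes if ALPA's 60,000 members represent different
# shares of all air carrier pilots, per the estimates quoted above.
alpa_members = 60_000
for share in (1 / 2, 2 / 3, 0.80):
    print(f"if ALPA covers {share:.0%}: "
          f"about {alpa_members / share:,.0f} air carrier pilots")

[End of illustration]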
Constructing a Sampling Frame:
NAOMS researchers next needed to identify a source of information on
its target population to provide a sampling frame from which it could
sample air carrier pilots. As we have previously mentioned, because
there was no central list of air carrier pilots that would ensure
coverage of the target population, researchers had to choose an
alternative frame. Initially, they considered using ALPA's membership
list of air carrier pilots. However, to maintain the project's
independence and to be as inclusive of pilots as possible, regardless
of their employer or union status, they decided against using this or
any other industry list, such as personnel information from airlines.
The project team also considered using FAA's Airmen Registration
Database.[Footnote 51] Its information on pilots included certification
type and number, ratings, medical certification, and other personal
data. When the survey was first being developed, limited information
for all pilots in the Airmen Registration Database was publicly
available as the Airmen Directory Releasable File. In 2000, after the
field trial but before the full air carrier pilot survey was
implemented, FAA began allowing pilots to opt out of the publicly
releasable database. NASA officials told us that the team had
considered asking FAA for the full database but decided against
formally pursuing access to it for several reasons. These included
ensuring continuing access to a public, updated database; ensuring
access to a database that contained contact information for pilots; and
maintaining independence from FAA as an aviation regulatory agency.
Also, NASA was concerned about using the full data, because it wanted
to maintain the privacy of pilots who had removed their names from the
list explicitly to avoid contacts from solicitors, purveyors, or the
like.
For the air carrier pilot survey's field trial sample, drawn in 2000,
NAOMS staff had access to the full database, which was then still
publicly available. However, NASA officials believed that they could
not use it for
the full-scale survey from 2001 to 2004 because the nature of the
frame--in terms of how well it represented the current air carrier
pilot population--would change over time. Instead, the team decided to
use as the frame for the full-scale air carrier pilot survey the Airmen
Directory Releasable File that excluded pilots who had opted out; this
file was regularly updated over the course of the air carrier pilot
survey.[Footnote 52] The choice of frame may have been appropriate,
given programmatic constraints, but posed several challenges. First,
pilots in the publicly available Airmen Directory Releasable File were
not necessarily representative of pilots in FAA's full Airmen
Registration Database. Second, the database lacked information on
whether airmen actively flew for a commercial airline. Lastly, only a
relatively small portion of the 688,000 pilots in the database at the
time of the field trial were air carrier pilots.
Potential Effect of the Opt-out Policy:
NAOMS staff, realizing the potential limitations of using the publicly
available data, were concerned about whether the frame provided
adequate coverage of the target population or introduced bias into the
data--that is, whether pilots in the public, opt-out database were
sufficiently representative of air carrier pilots overall. For example,
ALPA had provided its membership (which comprises approximately two-
thirds of air carrier pilots) with information about the opt-out policy
and a form letter to facilitate their removal from the
list. It is, therefore, possible that ALPA pilots removed their names
from public access at a higher rate than non-ALPA pilots.
NAOMS researchers' analysis suggests that air carrier pilots may have
removed their names from the public database at a disproportionately
greater rate than did general aviation pilots.[Footnote 53] One
Battelle statistician expressed concern to other NAOMS team members
that the sample, therefore, might not represent the population of
interest. To help assess potential bias as a result of the opt-out
policy (and the filter, discussed in the following text), researchers
added a question to the survey--part way through the data collection
phase--asking pilots to identify the size category of the aircraft
fleet of the air carrier for which they flew. This information would
allow for a comparison with air carrier fleet sizes known to exist in
the NAS.[Footnote 54]
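One way such a comparison might be carried out is sketched below. The
counts and population shares are invented, and we do not suggest this
was the team's procedure; a one-way chi-square test is simply one
standard tool for comparing a sample distribution with a known
population distribution:

# Sketch of a coverage check enabled by the fleet-size question: compare
# respondents' reported carrier fleet-size distribution with the
# distribution known to exist in the NAS. All numbers are invented.
from scipy.stats import chisquare

observed = [120, 340, 290, 250]              # hypothetical respondents
population_share = [0.20, 0.30, 0.30, 0.20]  # hypothetical NAS shares
expected = [sum(observed) * p for p in population_share]

stat, pvalue = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.1f}, p = {pvalue:.4f}")

[End of illustration]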
Identifying Air Carrier Pilots from the Sampling Frame:
The database from which the project drew its sample of pilots lacked
information on where the pilots worked and, therefore, could not be
used to identify pilots flying commercial aircraft. The incidence of
air carrier pilots in the full Airmen Registration Database was fairly
low--approximately one in seven pilots would have been an air carrier
pilot. (We could not find documentation on the number or proportion of
air carrier pilots in the opt-out database, but we believe it to have
had a similarly low incidence.) Therefore, the NAOMS researchers
decided to use a filter to increase the likelihood that those contacted
for the survey would be air carrier pilots.
The filter required that pilots be U.S. residents certified for air
transport, with flight engineer certification and a multiengine rating-
-a rating that sets specific standards for pilot experience and skill
in operating a multiengine aircraft. By construction, all pilots in the
public (opt-out) Airmen Directory Releasable File who did not fulfill
these filtering requirements fell into the sampling frame to be used
for the general aviation survey. After the filter was applied, the
final frame for air carrier sampling had approximately 37,000 pilots in
the first several quarters; records on the size of the frame's later
quarters were not maintained.
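In code terms, the filter amounts to a predicate applied to each airman
record, along the lines of the sketch below. The field and value names
are hypothetical placeholders; the criteria are those described above:

# Sketch of the sampling filter: U.S. residence, air transport
# certification, flight engineer certification, and a multiengine
# rating. Record fields are hypothetical placeholders.
def passes_air_carrier_filter(airman: dict) -> bool:
    return (
        airman.get("country") == "USA"
        and "air_transport" in airman.get("certificates", ())
        and "flight_engineer" in airman.get("certificates", ())
        and "multiengine" in airman.get("ratings", ())
    )

record = {  # hypothetical Airmen Directory record
    "country": "USA",
    "certificates": ("air_transport", "flight_engineer"),
    "ratings": ("multiengine", "instrument"),
}
print(passes_air_carrier_filter(record))  # True -> air carrier frame
# Records failing the filter fell into the general aviation frame.

[End of illustration]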
With these filtering criteria, approximately 70 percent to 80 percent
of those contacted for the air carrier sample were, in fact, air
carrier pilots who had flown within the recall period specified on the
questionnaire. Although the contractor collected some information on
pilots who were contacted but deemed ineligible for the survey, the
data were not analyzed specifically to establish how effective the
filter was at identifying air carrier pilots, even if they did not
qualify for the survey. Without data identifying contacts who were
excluded because they were general aviation pilots rather than air
carrier pilots, such pilots would remain wrongly omitted from the
sampling frame for the general aviation survey.
As data collection progressed, the NAOMS team realized that the data
were biased toward more experienced pilots, pilots flying primarily as
captains, and pilots flying widebody aircraft over longer flight times.
[Footnote 55] After extensive analysis of the observed bias, the team
attributed the bias primarily to two of the four filtering criteria--
that is, that pilots were required to have both air transport and
flight engineer certifications. Team researchers explored various
strategies for addressing the observed bias and made several
recommendations for data collection and analysis. The team considered
whether using stratification to select samples according to alternative
or additional characteristics would help reduce the observed bias
toward more experienced pilots flying larger aircraft, but it
eventually decided against changing the sampling strategy midsurvey.
[Footnote 56]
To determine whether the filter systematically excluded certain types
of respondents--for example, air carrier pilots flying smaller aircraft
or pilots with less experience--the NAOMS team recommended capitalizing
on the implementation of NAOMS's general aviation portion. The sampling
frame for the general aviation survey included all pilots not filtered
into the air carrier sample. Accordingly, project staff could examine
the characteristics of air carrier pilots who fell into the general
aviation sample because they did not meet filtering requirements, to
establish whether they differed notably from those surveyed using the
filtered sample. Preliminary analysis confirmed that pilots surveyed
from the filtered sample exhibited systematic differences from air
carrier pilots in the general aviation survey. Specifically, pilots
surveyed with the air carrier sampling filters overrepresented captains
and international flights, underrepresented smaller aircraft and
airlines, and overrepresented the largest aircraft and airlines.
Following these analyses, the NAOMS team advocated incorporating
operating characteristics into all analyses to mitigate potential bias.
For the most part, the team recommended using operational size
categories--that is, small transport aircraft and medium, large, and
widebody aircraft--to stratify and possibly weight analyses, since
different types of aircraft face different event risks and since safety
issues may be more or less serious, depending on operating
characteristics or aircraft make and model.[Footnote 57] The team's
presentations of preliminary results frequently incorporated such
analyses, as shown in figure 5. While other operational stratifications
were suggested, such as specific aircraft make and model, it was
acknowledged that this kind of analysis would dramatically reduce the
effective sample size available for analysis in each category. A
smaller effective sample size would decrease the precision of estimates
from the survey, making it more difficult to detect changes in rates
over time, especially for infrequent events.
Figure 5: NAOMS's Preliminary Estimates of Pilot-Reported Flight Hours
and Flight Legs, by Aircraft Size, 2002:
[Refer to PDF for image: vertical bar graph]
Pilot-reported hours and legs for the reference period:
Small Transport: Mean Hours: approximately 82; Mean Legs: approximately
55; Mean Hours Per Leg: 1.5.
Medium Transport: Mean Hours: approximately 103; Mean Legs:
approximately 50; Mean Hours Per Leg: 2.1.
Large Transport: Mean Hours: approximately 105; Mean Legs:
approximately 35; Mean Hours Per Leg: 3.1.
Widebody: Mean Hours: approximately 95; Mean Legs: approximately 20;
Mean Hours Per Leg: 4.9.
Source: Linda Connell and Mary Connors, "National Aviation Operation
Monitoring Service (NAOMS)," presentation to the Aviation Safety and
Security Program Office (Hampton, Va.: NASA, Dec. 5, 2002), 44.
[End of figure]
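The precision cost of the stratified analysis noted above can be
illustrated with a rough calculation. This sketch uses our own Poisson
approximation and an invented event frequency, not the team's figures:
the relative standard error of a rate estimate shrinks with the square
root of the expected number of events, so splitting the sample into
four equal size-class strata roughly doubles each stratum's relative
standard error.

# Illustration (invented numbers) of the precision cost of stratifying:
# under a Poisson approximation, the relative standard error of a rate
# is roughly 1/sqrt(expected events).
import math

events_per_interview = 0.05  # hypothetical event frequency
for n in (8_000, 2_000):     # pooled sample vs. one of four equal strata
    expected_events = n * events_per_interview
    rse = 1 / math.sqrt(expected_events)
    print(f"n = {n:>5,}: expected events = {expected_events:5.0f}, "
          f"relative SE of the rate = {rse:.1%}")

[End of illustration]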
Additionally, to the extent that the data were to be analyzed as rates
per flight leg or flight hour, an analysis segregated by operational
characteristics would represent a fair description of these rates if it
were assumed that the data adequately represented aircraft and pilots
experiencing safety events within those operational categories--for
example, if the widebodies and their pilots in the sample were fairly
representative of air carrier widebody aircraft and pilots in the NAS.
Sample Size Calculations May Have Curtailed Statistically Reliable
Trend Estimates for All Questions:
NAOMS aimed to generate statistically reliable rates and trends that
would allow analysts to identify a 20 percent yearly change with 95
percent confidence. However, the ability to detect such trends depended
not only on the sample size, but also on the frequency of events. One
statistician who had worked with the project team reported recently
that detecting changes in trends of very rare events, such as complete
engine failure, would require a prohibitively large sample of
approximately 40,000 pilots. NAOMS's sample sizes were insufficient to
allow analysis of all questions on the air carrier pilot survey or to
accommodate analytical strategies that researchers eventually deemed
necessary after data collection had begun, such as analysis by aircraft
size category.
During the field trial, sample sizes were calculated to distinguish
response rates among the three data collection methods (face-to-face
interviews, telephone interviews, and mail questionnaires) to answer
questions
such as the following: Did an 81 percent completion rate for telephone
interviews differ significantly from a 70 percent response rate for
mail questionnaires? Later sample calculations for the full survey
focused more directly on establishing the ability to detect a 20
percent change in event rates over time. Data from the field trial were
analyzed to estimate how frequently an air carrier pilot experienced
each specific event, enabling the team to assess how reliably different
sample sizes could detect increases or decreases of 20 percent. From
the field trial data, the contractor estimated that 8,000 interviews
would allow detection of changes in rates with 95 percent confidence
for approximately one-half of the core safety event questions.
The team eventually settled on a sample size of approximately 8,000
cases a year, declaring in its application to OMB that this would be
the minimum size required to reliably detect a 20 percent change.
[Footnote 58] The application clarifies that just 5,000 unique pilots
would be interviewed in the first year to gather 8,000 completed
surveys (4,000 pilots in cross-sectional samples and 1,000 panel
members each interviewed in four waves), but sample size calculations
submitted to OMB do not
expressly consider the impact of the panel's smaller sample size on the
ability of NAOMS data to detect trends.[Footnote 59] In the 3 years
after data collection experiments in recall and method were
discontinued, the survey completed approximately 7,000 interviews a year.
At the time the NAOMS OMB application was submitted, project staff did
not have adequate data to know for certain how frequently individual
safety events would be reported, or to know an exact number of
interviews that could actually be attained in a year. The NAOMS OMB
application reported that pilots experience certain events quite
infrequently, without expressly calculating how well a sample size of
8,000 could generate reliable estimates for such events. The sample
size calculations in the application also assumed that the first-year
data could be aggregated across recall periods and both the panel and
cross-sectional data collection approaches that were used. NAOMS
project staff later told us that further analysis would be essential to
establish whether rates and trends generated from different recall
periods and data collection approaches were sufficiently similar to
allow combining the data. NASA believes that, even without data from
the experimental period, the subsequent 3 years of air carrier pilot
data were sufficient to demonstrate the survey's capability of
detecting trends reliably.
Partway through data collection for the full air carrier pilot survey,
NASA's contractor conducted simulations using early NAOMS data to
better establish sample sizes at which 20 percent changes in rates for
individual questions could be detected. These data confirmed that a
sample of 8,000 cases a year would be sufficient to detect a 20 percent
change for roughly one-half the core safety event questions, assuming
all cases were analyzed simultaneously. By this point, however, the
project team had already established the importance of breaking out
NAOMS's estimates according to the size category of the aircraft flown
to compensate for operational differences and the effects of the
sampling procedures that we have previously described. Thus, sample
size calculations may have overstated the ability of the NAOMS data to
reliably detect trends at given significance levels, if segregating
answers by operational characteristics is critical. Additional
simulations that accounted for likely analytical considerations would
be essential to determine whether the NAOMS project could attain its
goal of measuring 20 percent changes in rates of different safety
events with statistical confidence.
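The structure of such a simulation can be sketched as follows. This is
our own sketch, under an assumed Poisson model and invented event
frequencies, not the contractor's method; it illustrates the central
point of this section, namely that with roughly 8,000 interviews a
year, power to detect a 20 percent decline is high for common events
but falls sharply for rare ones:

# Simulation sketch (our assumptions, not the contractor's method): with
# n interviews per year and Poisson event counts, how often is a 20
# percent year-over-year decline detected at the 5 percent level?
import numpy as np

def power_to_detect(rate, n, decline=0.20, trials=10_000, seed=7):
    rng = np.random.default_rng(seed)
    year1 = rng.poisson(rate * n, size=trials)
    year2 = rng.poisson(rate * (1 - decline) * n, size=trials)
    # Normal-approximation test for a difference in Poisson totals.
    z = (year1 - year2) / np.sqrt(np.maximum(year1 + year2, 1))
    return np.mean(z > 1.645)  # one-sided test, 95 percent confidence

for rate in (0.5, 0.05, 0.005):  # hypothetical events per interview
    print(f"rate {rate}: power = {power_to_detect(rate, n=8_000):.2f}")

[End of illustration]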
Sampling and Design Decisions Bear on NAOMS's Rate Calculations and
Characterization of the National Airspace System:
When analyzing NAOMS's data, researchers must consider the effect of
several design and sampling decisions that the project team made to
accommodate pilots' confidentiality and the infeasibility of directly
sampling all flights in the NAS. For example, the likelihood that a
particular event would be reported by a pilot responding to the NAOMS
survey increased with the number of crew witnessing the event and the
number of aircraft involved. However, in designing a questionnaire to
lessen the likelihood of respondent identification, the NAOMS team
decided not to link pilots' reports of specific events to particular
aircraft flown during those events or to the dates on which those
events happened.
filter resulted in a disproportionate selection of captains relative to
other crew members. While sampling and design choices were rational in
light of concerns about confidentiality and program independence, such
decisions have had implications for how to calculate and interpret
rates from NAOMS and for whether analysts can extrapolate the data to
characterize the national airspace system. NAOMS staff failed to identify
specific analytical strategies to accommodate these issues in advance
of data collection.
Using NAOMS Data to Calculate Rates and Trends:
Survey design and sampling decisions affect how rates from NAOMS data
can be calculated. For example, the NAOMS survey has the potential to
collect multiple reports of safety events if more than one crew member
on an aircraft or crew members on different aircraft observed the same
safety event.[Footnote 60] Safety events happening on aircraft with
more crew members would also have had a greater likelihood of being
reported, since more individuals who experienced the same event could
have been subject to selection into the sample. These issues are not a
problem, unless researchers fail to address them appropriately in an
analysis.
Analytic goals must determine whether one adjusts for the potential
that an event is observed by multiple crew members in the sampled
population. Given that one of NAOMS's goals was to characterize the
rate at which individual air carrier crew members experienced events
per flight hour or flight leg, and assuming all crew members in an
aircraft were equally likely to be sampled, multiple crew members
observing an event involving one aircraft would not pose a problem.
However, other considerations bear on whether and how to make
adjustments. For example, bias resulting from the sampling frame and
filter suggests that captains were more likely to have been selected
into the air carrier sample than first officers or other crew members;
additionally, many pilots flew in more than one crew capacity during
the recall period. Events involving multiple aircraft also complicate
estimates, partly because individuals not qualified for the air carrier
pilot survey might have flown many of these aircraft. Extrapolating
from individually derived rate estimates to system counts would also
require making substantial assumptions and adjustments (see the
following text).
One potential strategy to address the possibility of multiple
observations of the same event would be to allocate events according to
the number of crew members who might have witnessed them (more details
on alternative strategies are in appendix I). For example, a report of
a bird strike from a pilot flying a widebody aircraft with two
additional crew members could be counted as one-third of a bird strike.
Appropriate allocation presumes, however, that the analyst can identify
the number of crew members present for any given report of a safety
event. In general, the NAOMS recall period extended over 60 days,
during which some pilots flew two or more types of aircraft of
different size categories, implying different numbers of crew.[Footnote
61] Additionally, the questionnaire did not allow a pilot who flew more
than one aircraft to identify which aircraft a reported safety event
was associated with or in which role he or she served as crew. Analysts
seeking to address the potential effect of multiple reports of the same
event would have to develop allocation strategies that account for
these design issues.
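A minimal sketch of such an allocation follows. The crew complements
per size class and the fallback for pilots who flew more than one class
are our assumptions for illustration:

# Sketch of proportional allocation: weight each reported event by the
# reciprocal of the number of crew members who could have reported it.
# Crew complements and the fallback for unknown aircraft are assumed.
CREW_BY_SIZE = {"small": 2, "medium": 2, "large": 2, "widebody": 3}

def allocated_count(reports):
    """reports: list of (events, size_class or None) per respondent."""
    total = 0.0
    for events, size_class in reports:
        if size_class is None:
            # Pilot flew aircraft of more than one size class; one crude
            # fallback is the average crew complement across classes.
            crew = sum(CREW_BY_SIZE.values()) / len(CREW_BY_SIZE)
        else:
            crew = CREW_BY_SIZE[size_class]
        total += events / crew
    return total

reports = [(1, "widebody"), (2, "medium"), (1, None)]  # hypothetical
print(f"allocated events: {allocated_count(reports):.2f}")

[End of illustration]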
Researchers must also develop allocation strategies for other aspects
and types of analysis using NAOMS data, such as trends or rate
estimates for different aircraft types. We have previously mentioned
that the NAOMS team recommended analyzing data by operational size
category because of sampling considerations and because the effect and
exposure to certain risks varied by class of aircraft. They also noted
the importance of seasonal variations in relation to safety events--for
example, icing is less likely to be a problem in summer than winter.
In its preliminary analysis, the NAOMS team attempted to resolve the
issue of seasonal assignment by using nonproportional allocation
strategies. The team used a midpoint date of the recall period--for
example, October 1 if an interview recall period ran from September 1
to October 30--to determine a seasonal assignment for each interview in
the analysis. For pilots flying different aircraft during the recall
period, team members assigned an operational size class, based on the
aircraft predominantly flown. For pilots who reported flying different
operational sizes of aircraft equally over the recall period, project
staff used a random number generator to determine the size class for
preliminary analysis.
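These assignment rules can be sketched as follows. The midpoint rule
and the random tie-break follow the report's description; the
implementation details, dates, and hour totals are ours:

# Sketch of the preliminary-analysis assignments: season from the
# midpoint of the recall period, and size class from the aircraft
# predominantly flown, with a random tie-break for equal flying.
import math
import random
from datetime import date, timedelta

SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "fall", 10: "fall", 11: "fall"}

def season_of_midpoint(start: date, end: date) -> str:
    midpoint = start + timedelta(days=math.ceil((end - start).days / 2))
    return SEASONS[midpoint.month]

def assign_size_class(hours_by_class: dict, rng=random) -> str:
    top = max(hours_by_class.values())
    tied = [c for c, h in hours_by_class.items() if h == top]
    return rng.choice(tied)  # random tie-break for equal flying

print(season_of_midpoint(date(2002, 9, 1), date(2002, 10, 30)))
# -> "fall" (midpoint of October 1, as in the report's example)
print(assign_size_class({"medium": 40.0, "widebody": 40.0}))

[End of illustration]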
Extrapolating to the National Airspace System:
The NAOMS team disagreed on the survey's ability to provide information
on systemwide event counts versus rates and on trends based on
individuals' risk exposure. In preliminary analysis, the contractors
often used BTS data to weight NAOMS data to generate systemwide event
counts for air carrier operations in the NAS, and to provide baseline
measures to assess potential bias resulting from sampling and filtering
procedures.[Footnote 62] Since BTS's data collection processes changed
during the NAOMS data collection period, however, the contractor
stopped using these data to weight its estimates.
Because of the distinction between NAOMS's unit of analysis and its
sampling frame, as well as other sampling issues we found, it may not
be possible to establish systemwide event counts for air carrier
flights from the NAOMS data without using an external benchmarking
dataset. However, extrapolating to systemwide event counts was not an
explicit goal of the project. To the extent that analysts seek to use
an external dataset to weight the NAOMS data in estimates of systemwide
counts, that dataset's collection procedures and reliability would
require assessment. Additionally, caution should be exercised, since
changes in data collection or editing procedures over time could
confound actual trends with changes resulting from variations in any
external weighting dataset.
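As a minimal sketch of this weighting approach, the calculation below
scales a per-hour rate from survey reports up to a systemwide count
using an external exposure total; every figure is hypothetical, and, as
noted above, the external series' reliability and consistency over time
would need to be assessed first.

    events_reported = 120        # events reported by sampled pilots
    hours_reported = 400_000     # flight hours covered by recall periods
    rate_per_hour = events_reported / hours_reported

    # Systemwide flight hours from an external benchmark such as BTS data.
    # Any change in how the benchmark is collected would feed directly
    # into the count below and could be mistaken for a safety trend.
    system_hours = 18_000_000
    print(round(rate_per_hour * system_hours))   # 5400 estimated events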
The Survey's Implementation:
We found that NAOMS researchers followed generally accepted survey
principles for many aspects of the survey's implementation, with some
limitations. Sample administration, information systems, and
confidentiality provisions appear to have been adequate, and telephone
interviewers were successful in administering technical questions and
attaining high completion rates. However, despite adequate records of
data editing and checks, analysis and interpretation of NAOMS data are
complicated by first-year experiments in recall period and data
collection approaches and CATI programming choices, along with sampling
and design decisions. Researchers did not conduct full data validation
or nonresponse bias assessments to ensure the quality of the data. We
found deficiencies in record-keeping, with moderate implications for
the risk of survey error; the potential errors involved processing,
sampling, and nonresponse.
Information Systems and Sample Management Maintained Confidentiality,
but Data Checks and Record-Keeping Were Limited:
We found several issues with NAOMS information systems. Sample
administration and management, including notification of and
informational materials for pilots and release of sample for
interviewing, met generally accepted survey principles. Project staff
were seriously concerned about pilot confidentiality, and the steps
they took to protect it appear to have been adequate. In contrast, CATI
programming and data checks, along with record-keeping, had greater
limitations.
Sample Administration and Management:
Taking its sample from the Airmen Directory Releasable File, NAOMS
sampled using pilots' certificate numbers, with a filter designed to
target air carrier pilots. After removing certificate numbers that had
entered the sample at any time in the previous year (regardless of
whether an interview was completed), the team obtained
pilots' updated addresses from the U.S. Postal Service's change-of-
address file and submitted them to Telematch to obtain telephone
numbers for each address.[Footnote 63] This process resulted in an
approximately 60 percent match of addresses to telephone numbers, which
researchers saw as sufficient because they believed the Airmen
Directory included some records for individuals who had retired or were
deceased. Each quarterly sample was then divided randomly into 13 parts
to be released weekly. On the Friday before each week's release,
project staff sent pilots a notification on NASA letterhead that
described the study and its confidentiality provisions and informed
them that an interviewer would be calling. To pilots for whom Telematch
could not provide a valid telephone number, or who had "bad" numbers
from the field trial, project staff sent postcards asking them to call
NAOMS interviewers directly or to send in an updated telephone number.
The project team monitored the disposition of the sample on a weekly or
quarterly basis, including the proportion of respondents who were
ineligible, refused, or could not be located. While between 17 and 29
percent of pilots in each quarterly sample could not be located, and
consequently were not interviewed, approximately 5 percent of the
completed interviews resulted from cases that had not been matched to a
telephone number through Telematch. The NAOMS team aimed initially for
a 6-week fielding period, or "call window," to allow interviewers
sufficient time to call back each nonresponding pilot in the sample
before assigning the case a final disposition (such as "no-locate" or
"refusal") and removing the pilot from the sample. However, researchers
found that a 3-month call window was necessary to attain a sufficient
response rate. The team did not indicate that it had compared the
answer patterns of pilots reached early in the fielding period with
those of pilots who were hard to track down, to ensure the patterns
were comparable across the full field period.
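Such a comparison, sometimes called a level-of-effort analysis, can be
sketched as follows; the records and the three-attempt threshold are
hypothetical.

    from statistics import mean

    interviews = [
        {"call_attempts": 1, "events_reported": 2},
        {"call_attempts": 2, "events_reported": 3},
        {"call_attempts": 9, "events_reported": 1},
        {"call_attempts": 12, "events_reported": 0},
    ]
    easy = [i["events_reported"] for i in interviews if i["call_attempts"] <= 3]
    hard = [i["events_reported"] for i in interviews if i["call_attempts"] > 3]
    # A large gap between these means would suggest that hard-to-reach
    # pilots differ systematically--a warning sign for nonresponse bias.
    print(mean(easy), mean(hard))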
Information Systems and Pilot Confidentiality:
The survey's management techniques and documentation for interviewers
indicate that the NAOMS project team was particularly attentive to
confidentiality. The questionnaire did not ask pilots to link safety
events to specific flights, airlines, or times. Interviewers were
informed that "Battelle [can] not link data items with individual
pilots. All reports will be presented using aggregate information."
[Footnote 64] Battelle used separate systems to track the sampling and
to store the interview data, which ensured that pilots' answers could
not be linked to any identifying information. In the system with
sampling information, the specific date of each interview was not
recorded, only the week in which it happened. The NAOMS Reference
Report described NAOMS's responses as "functionally anonymous" and
suggested that the promise of confidentiality enhanced the respondents'
rapport with the interviewers.[Footnote 65]
The NAOMS team never sought to release unedited individual-level data
from the survey. The project's OMB application describes plans for
ensuring the confidentiality of respondents, including provisions for
confidentiality statements on behalf of interviewers and staff, and
separate computer systems for sampling and interviewing so that
respondents' answers could not be linked back to identifying
information. The application also states that:
"The identity of respondents will not be revealed to anyone outside of
the study staff.
"The data presented in reports and publications will be in aggregate
form only.
"The respondent will be assured that participation is completely
voluntary and in no way affects their employment."[Footnote 66]
Among analytical products for the aviation community, researchers
planned to release summary reports and "structured, fully de-identified
datasets." According to a presentation at the first NAOMS workshop,
NAOMS products would be subject to FOIA after they were in "a finished
state."[Footnote 67] NASA officials told us that they agreed that there
would be little risk of violating pilots' confidentiality if data were
released in aggregate as initially was planned.
In meetings with NASA, as well as in the agency's written comments
responding to our draft report, officials expressed serious concern
about the importance of protecting pilots' identity, a concern we
share. The officials offered several specific examples of how they felt
NAOMS data could be used to identify individual pilots. However, many
government agencies that collect sensitive information, such as the
Institute for Education Sciences, the Census Bureau, and the National
Center for Health Statistics, have successfully allowed individual
researchers access to extremely sensitive raw data on individuals.
These agencies have effectively addressed the issue of individual
privacy by, for example, requiring researchers to attain clearance to
use data that could reveal sensitive information, to sign nondisclosure
agreements, and to submit to stiff penalties for noncompliance.
Additionally, agencies may restrict the types of analyses that can be
performed with the data, where data can be analyzed, and how the data
are reported. For example, the National Center for Health Statistics
may prevent researchers from accessing table cells that contain fewer
than five observations to lessen the likelihood that an individual
respondent can be identified.
We realize that given the evolution of data mining techniques, one
could conceive of a full, raw NAOMS dataset being linked to proprietary
information from airlines or a host of other safety systems in ways
that might enable a dedicated data analyst to identify a particular
pilot from the air carrier survey.[Footnote 68] This breach seems
unlikely to happen, however, given the relative absence of identifiable
information in the survey data and the lack of connection between the
tracking database and the CATI data. If the survey were to be
implemented as it was planned and the data released publicly only in
aggregate, the confidentiality provisions of the air carrier pilot
survey appear to have been adequate. The risk that individual pilots
might be identified from the raw data would be greater for the general
aviation survey, which involved a wider range of aircraft types,
several of which might be linked to very small populations of pilots.
NASA officials also expressed concern that pilots might have understood
NAOMS's promises of confidentiality as conferring the kind of legal
protection that voluntary reporting to a system like ASRS provides. We
found no evidence substantiating or refuting this understanding. To the
extent that confidentiality protections in NAOMS were adequate, any
fear that pilots would invoke legal protections that did not exist is
unfounded.
CATI Programming and Data Checks:
Partly because NASA emphasized the importance of not second-guessing
pilots, and partly because project staff wanted to avoid truncating
answers unnecessarily, the contractor built only limited edit checks
into the CATI data collection system, despite initial plans to the
contrary. The questionnaire used in training interviewers identified
one structured prompt for the number of hours a pilot reported having
flown during the recall period.[Footnote 69] It did not include any
other instructions to recheck values reported for specific questions if
they seemed unreasonable (perhaps indicating mistyping or an
interviewer-respondent misunderstanding).
Although the contractor documented edits and quality checks that it
performed on the collected data, the CATI system may not have included
all initially planned edit checks. The final questionnaire for
interviewer training suggests that additional edit checks were built
into the CATI system, but the contractor's data editing protocols
suggest that the edit checks were not consistently integrated into the
program. For example, when pilots were asked to break the time that
they flew different aircraft into percentages--such as 50 percent of
the time flying a Boeing 737, 25 percent flying a McDonnell Douglas MD-
80, and 25 percent flying a Boeing 727--the CATI system was supposed to
have forced interviewers to reenter information if the responses did
not add to 100 percent. If, for example, the interviewer had mistakenly
entered 25 percent for each of the three aircraft categories, the 75
percent total should have triggered the CATI system to force reentry
until the responses added to 100 percent; in a handful of cases,
however, the system did not.
[Footnote 70] Although such anomalies were extremely rare in the air
carrier pilot data, multiple managerial reviews and tests of the CATI
programming before the survey was implemented failed to identify the
anomalies in advance of survey fielding.
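For illustration, a consistently integrated version of this edit check
would resemble the following sketch; the prompt text and tolerance are
hypothetical.

    def read_percentages(aircraft_types, read_input=input):
        """Collect percentages of time flown and force reentry until
        they sum to 100, as the NAOMS protocol intended."""
        while True:
            shares = {t: float(read_input(f"Percent of time flying {t}: "))
                      for t in aircraft_types}
            total = sum(shares.values())
            if abs(total - 100.0) < 1e-6:
                return shares
            print(f"Entries sum to {total}; please reenter all values.")

    # Example: read_percentages(["Boeing 737", "MD-80", "Boeing 727"])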
For many of the questions that pilots were asked, the concern that
answers not be truncated unnecessarily by imposing predetermined edit
checks seems reasonable, given that the goal was to generate
statistically reliable information on aviation safety that was
otherwise unavailable. For other questions, such as those on total
engine failure and other rare events, input from aviation experts and
operational staff would have helped in constructing thresholds for the
checks in the CATI system. The additional data would have helped
analysts distinguish between true outliers and data entry errors and
between interviewer and respondent misunderstandings.
Survey completion rates were relatively high, and the NAOMS team
reported exceptionally few break-offs partway through the interviews.
It is impossible to know for certain whether the high completion rates
were because interviewers did not second-guess pilots by asking them to
repeat answers that researchers had deemed unlikely. To the extent that
interviewer rapport with pilots was enhanced because the pilots were
not second-guessed, the decision to limit the number of built-in CATI
edit checks may have enhanced the completion rates, at the expense of
complicating data cleaning and outlier identification.
Record-Keeping:
NAOMS record-keeping was fairly decentralized. While many of the
individual steps of the NAOMS project appear to have been documented in
some form, the project staff and contractors did not assemble a
coordinated, clear history detailing the project's management that
would facilitate evaluation of the overall air carrier pilot survey.
Information on the project's steps is largely dispersed across a series
of contracts and modifications between NASA and Battelle and internal
NAOMS team documents on individual pieces of the project. The lack of
summary documentation for various aspects of the project makes it
difficult to (1) distinguish between what was planned at the beginning
of the project and what phases were accomplished in later years,
following NASA priority changes for NAOMS's resources, and (2) assess
whether aspects of project and budget management raised the potential
risk of survey error.[Footnote 71]
Regarding the sample, the contractor kept limited information on the
size of the frame before and after filtering to identify air carrier
pilots. The size information the contractor maintained was not enough
to reconstruct the sampling fraction--the percentage of pilots sampled
each quarter from the filtered frame--for all quarters of the air
carrier pilot survey. Additionally, Battelle's procedures for
maintaining pilot confidentiality aimed to make it extraordinarily
difficult to identify which pilots were in the sample frame at any
given time. At the time of sampling, Battelle maintained enough
information to remove pilots who had already been sampled from future
samples for the next four quarters. Battelle did this partly because
the population was relatively small and partly because it did not want
to interview the same pilot more than once a year. Although the contractor
lacked formal records, it estimated that the procedure led to the
exclusion of approximately 20 percent of the filtered sampling frame in
any given year.
Regarding NAOMS data, the lack of sampling records prevents analysts
from leveraging sampling information when producing estimates or
calculating sampling errors. Furthermore, the lack of these data
hinders the kinds of nonresponse bias analysis that the project team
originally planned. Without reliable information on the proportion of
cases that were removed from the sample in any given quarter, analysts
must rely on more conservative variance estimates than might have been
necessary, making the detection of changes over time more difficult.
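The cost of the missing records can be illustrated with a simple
variance calculation: without the sampling fraction, the finite
population correction cannot be applied, leaving the analyst with the
larger, more conservative estimate. All numbers below are hypothetical.

    n, N = 2_000, 30_000   # sample size; filtered frame size (not recorded)
    s2 = 4.5               # sample variance of some event-rate measure

    var_conservative = s2 / n              # assumes an unlimited frame
    var_with_fpc = (1 - n / N) * s2 / n    # smaller, if n/N were known
    # Larger variance estimates widen confidence intervals, making real
    # changes over time harder to detect.
    print(var_conservative, var_with_fpc)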
Experiments in Data Collection and Recall Period Length May Have
Restricted the Utility of the First-Year Data:
Two main experiments that NAOMS researchers conducted in the initial
year of interviewing may have restricted the utility of first-year
data. Because the field trial had not resolved the optimal length of
time the survey's questions should cover, researchers used the final
survey to test first two and then three different recall periods for
several months. Subject matter experts on the team also advocated a
second experiment to determine the relative merits of a panel or cross-
sectional data collection approach. NASA officials told us that they
viewed the first months of the survey as part of a development phase,
rather than full implementation of the survey. Nevertheless, NAOMS
project staff have noted that adequate research on the feasibility of
combining data from the experimentation has not yet been done.
Depending on the results of such research, it may be imprudent to
evaluate NAOMS's first-year responses as if they were similar to the
trend data collected in subsequent years.[Footnote 72] Approximately
one-quarter of NAOMS air carrier pilot survey interviews were collected
under experimental conditions; the subsequent 3 years of the survey
used a cross-sectional data collection approach with a 60-day recall
period.
Panel or Cross-Sectional Approach:
As interviewing for the full survey began, project staff had not
reached consensus on whether to use a panel or a cross-sectional
approach for data collection.[Footnote 73] Panel data are observations
collected on the same sample of respondents over a period of time.
Cross-sectional data are observations collected on respondents at a
single point in time.[Footnote 74] While some team members opposed the
panel approach because of potential respondent attrition, others
thought that it might "encourage participants to become even more acute
observers of aviation system safety" and "produce a higher response
rate and higher response quality."[Footnote 75] However,
confidentiality procedures that removed the link between the sample
tracking system and respondent's answers meant that panel data would
not necessarily provide repeated observations for analysis in the NAOMS
data. According to the interviewers' manual,
"We will be asking panel members to give us a code word that we can use
to link interviews, but this code word will not be kept in our tracking
system. Pilots forgetting the word will not have their data linked."
[Footnote 76]
The NAOMS team decided to begin its first full year of air carrier data
collection using both panel and cross-sectional approaches.
After analyzing the first half-year of data, the team noted that, among
other things, the panel approach may have heightened pilots' awareness
of the timing of safety events but not the number of events recalled.
[Footnote 77] The project team decided, for the following four reasons,
to abandon the panel design in favor of cross-sectional data
collection: (1) the panel design resulted in fewer independent
observations; (2) the panel design was logistically difficult to
administer; (3) NAOMS's confidentiality procedures made analyzing
repeated observations over time impossible (the proportion of pilots
who remembered the password and thus could have data linked was not
reported); and (4) the cross-sectional design had yielded a
sufficiently high response rate to allay worries that pilots would be
unwilling to respond unless enlisted as panel members.
Recall Period:
As we have previously discussed, the lack of literature on pilots'
recall, in particular, and the wide variation in the literature's
recommended recall periods, more generally, made it difficult for the
team to decide on the most appropriate recall period. Team members had
extensively analyzed data from the field trial to determine any
differences among the recall periods tested in that survey.
Researchers' analysis showed that, as expected, respondents with longer
recall periods reported having flown more hours and legs than those
with shorter recall periods. Researchers' regression analysis also
confirmed a positive relationship between recall period and the total
number of events that pilots reported; the magnitude and statistical
significance of this relationship was strongest between 2 weeks (14
days) and 2 months (60 days). Additionally, the team examined pilots'
comments on whether their particular recall period had been
appropriate.
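A regression of this form can be sketched as follows, using
hypothetical field-trial responses; the expected positive slope
reflects that longer windows expose pilots to more flying and thus more
events.

    import numpy as np

    recall_days = np.array([7, 14, 14, 30, 30, 60, 60, 90])
    events = np.array([0, 1, 2, 3, 2, 6, 5, 7])

    # Least-squares fit of reported events on recall period length.
    slope, intercept = np.polyfit(recall_days, events, 1)
    print(f"events = {intercept:.2f} + {slope:.3f} * days")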
Despite these analyses, the team decided to delay the decision on
recall period until they had collected more data in the initial months
of the full air carrier survey. After reviewing the field trial results
and pilots' comments, the team was firm only in the belief that a 7-day
period was too short, despite a small-scale experiment suggesting this
period was optimal for pilots' memory of routine events. (However, a 7-
day period would have been too short to capture infrequent risk
events.) The team explored various tolerances for error, event
periodicity, and cost before testing 30-day and 90-day recall periods
in the survey's first two quarters of sampling.
After the first two waves of data collection, team members explored
data on the length of the recall period. Then they tested a three-way
split design, collecting an additional 2 months of cross-sectional data
to assess whether 60 days would be the best compromise between the 30-
day and 90-day periods. Using these data, the project team compared the
mean event rate over time across all core safety event questions--
noting that longer recall periods should result in pilots reporting
more events--and the standard deviation associated with these rates,
which declined as the recall period increased. However, the team did
not analyze the relationship between recall periods and specific events
or the correlation of exposure units (flight hours and flight legs) to
safety events for the different periods.[Footnote 78] Eventually, staff
chose 60 days as providing a reasonable balance between the recall of
events and avoidance of error. According to NASA officials, the
selected recall period was seen as a compromise between cost and
reliability. Despite the theoretical merits of the analyses justifying
this decision, researchers cannot independently confirm the accuracy of
reporting under different recall periods without separate data
validation efforts as part of the field trial or full survey. However,
the practicality of efforts to validate respondent accuracy depends on
the nature of the data being collected, the existence of alternative
data sources, and the design of the questionnaire. As NAOMS's survey
methodologist has observed, surveys would be unnecessary if a true
population value were known.[Footnote 79]
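The three-way comparison described above can be sketched as follows:
for each recall arm, compute the per-day event rate and its dispersion,
which the team observed to decline as the recall period lengthened. The
counts are hypothetical.

    from statistics import mean, stdev

    by_period = {30: [1, 2, 0, 3, 1],
                 60: [3, 4, 2, 5, 3],
                 90: [5, 6, 4, 7, 5]}

    for days, counts in sorted(by_period.items()):
        daily = [c / days for c in counts]   # per-day event rates
        # Longer windows yield more events per interview but, in this
        # illustration, steadier per-day rates.
        print(days, round(mean(daily), 3), round(stdev(daily), 3))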
Because NASA's objective in designing and implementing the NAOMS survey
was to develop a data collection methodology, the team was warranted in
deciding to use the first year of data analysis to resolve questions
that had not been fully answered by the field trial. This is
particularly true for their decision to test various recall periods
that would help them find an appropriate balance between recall period
and budget and sampling constraints. As we have previously mentioned,
further analysis would be required to establish whether data collected
during the experimentation can be combined with later data using only
the 60-day recall period and cross-sectional approach. However, NASA
officials told us that the subsequent 3 years of cross-sectional data
collection with a 60-day recall period was sufficient to demonstrate
the capability of the air carrier pilot survey to measure trends.
Experienced Professional Interviewers Administered Technical Questions:
Training materials, questionnaire copies and revisions, specificity in
interviewers' scripts, and cooperation among staff demonstrate that the
team selected appropriate interviewers and was sensitive to key issues
throughout the questionnaire's development. The NAOMS project team
decided not to use aviation experts as interviewers in the belief that
the "lack of expert knowledge can be a benefit since the interviewers
are only recording what they hear rather than interpreting it through
the lens of their own experiences."[Footnote 80] To mitigate issues
that might have resulted from using interviewers unfamiliar with the
subject matter, the team emphasized the importance of the clarity of
the questions and consistency in how the interviewers read them and
responded to the respondents' questions.
The project staff emphasized the importance of using professional and
experienced interviewers and giving them adequate training to
administer the survey. NAOMS's principal investigator told us that the
interviewers Battelle used for the NAOMS survey were exceptionally
professional and were accustomed to conducting interviews on sensitive
topics.[Footnote 81] Interviewers received a training manual for the
project's first year, which included the following: a background on the
rationale for the NAOMS survey, a description of how the survey could
shed light on safety systems, the survey's confidentiality protections,
and information on the survey's sampling and tracking
information.[Footnote 82] They also received a paper copy of the
questionnaire with interviewer notes, pronunciation information, and a
glossary of aviation terms.
The NAOMS team conducted a series of cognitive interviews with pilots
to learn whether they would understand the questions and whether the
incidents they reported were those that the team sought to measure.
These interviews led to questionnaire revisions to address potential
ambiguities for both respondents and interviewers. Regardless of
efforts to develop clear questions that interviewers could read
directly and respondents could easily interpret and answer, the team
acknowledged that certain questions turned out to be less reliable than
others. For example, in considering a question series on the
uncommanded movements of rudders, ailerons, spoilers, and other such
equipment (see figure 6), the team's concern was that pilots might be
unaware of these events or might interpret uncommanded movements as
including autopilot adjustments.[Footnote 83] The survey instrument did
not include instructions to interviewers to clarify the intended
meaning of this set of questions, and question standardization alone
could not overcome the questions' potential ambiguity, despite
interviewers' skill.
Figure 6: NAOMS Air Carrier Questionnaire Section B, Question ER4 on
Uncommanded Movements:
[Refer to PDF for image: illustration]
ER4. How many times during the last (Time Period) did an in-flight
aircraft on which you were a crewmember experience uncommanded
movements of any of the following devices? (Read Questions)
a. Uncommanded movements of the elevators?
# Elevators:
b. Uncommanded movements of the rudder?
# Rudder:
c. Uncommanded movements of the ailerons?
# Ailerons:
d. Uncommanded movements of the spoilers?
# Spoilers:
e. Uncommanded movements of the speedbrakes?
# Speedbrakes:
f. Uncommanded movements of the trim tabs?
# Trim Tabs:
g. Uncommanded movements of the flaps?
# Flaps:
h. Uncommanded movements of the slats?
# Slats:
i. Did any other devices have uncommanded movements during the last
(Time Period)?
Yes: 1;
NO (Skip To ER5): 0;
RF (Skip To ER5): 7;
DK (Skip To ER5): 8.
1. Which devices?
Specify:
2. For Each Device Listed In ER4i1:
How many times did (Device Listed In ER4i1) perform uncommanded
movements during the last (Time Period)?
# Uncommanded Movements:
Source: Battelle Memorial Institute, NAOMS Reference Report: Concepts,
Methods, and Development Roadmap, prepared for the NASA Ames Research
Center (Nov. 30, 2007), app. 11-6.
[End of figure]
In its quality assurance procedures, Battelle monitored and documented
approximately 10 percent of the interviews. However, it did not record
audio of the interviews. Battelle's documentation states that the
monitoring procedure took the form of live supervisory monitoring of
interviews in progress, as well as callbacks to respondents to ask
about their interviewing experience and to administer key questionnaire
items again to see whether answers were reliable. However, NASA
officials told us that the callbacks were never performed, in keeping
with the project's concerns about pilot confidentiality.
Telephone Interviews Attained High Completion Rates, but Validation
Efforts Focused Primarily on Face Validity:
While interviewers for NAOMS attained high completion rates from pilots
in the sample, limited validation efforts hinder confirmation of data
quality. Roughly 80 percent of sampled pilots thought to be eligible
for the NAOMS air carrier pilot survey completed telephone interviews,
and a notable portion of those who were contacted were found to be
ineligible. The project team decided against conducting nonresponse
bias analysis and did not pursue other formal data validation, focusing
instead on the face validity of preliminary NAOMS rates and trends.
Completion Versus Response Rates:
In public presentations and documents of air carrier pilot survey
results, NAOMS staff often discussed the rate of sample cases that were
located and the proportion of interviews completed. The completion
rate, distinct from a response rate, surpassed 80 percent by the end of
the air carrier survey. Throughout the air carrier survey,
approximately 23 percent of those contacted were deemed ineligible
because they were not commercial air carrier pilots or had not flown in
the recall period. Additionally, approximately 24 percent of cases
drawn for the air carrier sample were never located and, thus, their
eligibility for the sample could not be determined.
A survey's response rate, defined, in general, as the number of
completed interviews divided by the eligible number of reporting units
in the sample, is often used as an indicator of data quality and as a
factor in deciding to pursue nonresponse bias analyses or additional
survey follow-up.[Footnote 84] OMB's guidelines, although not yet
formal when the NAOMS survey was implemented, call for a nonresponse
bias analysis when survey response rates fall below 80 percent. OMB
guidelines cite survey industry standards for response rate
calculations; these calculations generally include either unknown
sample cases or an estimate of likely eligibles among unknown cases, in
the denominator of the calculations.[Footnote 85] A calculation of
response rates that excludes unknown cases rests on the assumption that
all of those cases would have proven ineligible. For NAOMS data, a
response rate calculation that included cases of indeterminate
eligibility in the denominator (because the pilots could not be
located) would be closer to 64 percent. If the cases not located fell
out of scope at approximately the same rate as the cases that were
located and contacted, the NAOMS response rate would be approximately
67 percent.
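The arithmetic behind these alternative rates can be sketched as
follows; the counts are hypothetical and chosen only to show how
including unknown-eligibility cases in the denominator lowers the rate,
consistent with the industry-standard calculations OMB cites.

    completed = 4_700
    eligible_contacted = 5_800     # located, contacted, confirmed eligible
    ineligible_contacted = 1_700   # located but out of scope
    unlocated = 2_400              # eligibility never determined

    completion_rate = completed / eligible_contacted

    # Conservative: treat every unlocated case as eligible.
    rr_low = completed / (eligible_contacted + unlocated)

    # Assume unlocated cases are eligible at the same rate as contacted
    # cases.
    e = eligible_contacted / (eligible_contacted + ineligible_contacted)
    rr_est = completed / (eligible_contacted + e * unlocated)

    print(f"{completion_rate:.0%} {rr_low:.0%} {rr_est:.0%}")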
NAOMS staff told us that they decided against pursuing nonresponse bias
analyses as initially planned because they thought that air carrier
completion rates were quite high for pilots who were located and
contacted and because NASA's priorities had changed, resulting in fewer
resources for staff to complete such activities. However, more
conservative calculations of response rates might have merited further
scrutiny, such as a nonresponse bias analysis or other research into
reasons for the sample's rate of unlocated pilots. Comparing sample
frame information on respondents' and unlocated pilots'
characteristics might have provided insight into any systematic
differences between the two groups.
Establishing Validity:
NAOMS project staff attempted to validate the data in a variety of
limited ways. Besides the interview monitoring, they made preliminary
calculations, such as a comparison of the hourly rate at which pilots
left the cockpit to deal with passenger disturbances. They found that,
unlike some other events, the rate dropped dramatically after September
11, 2001 (see figure 7), which demonstrated the effect of enforcing
existing rules requiring the cockpit door to be closed during flight.
Other validation attempts included checking on the seasonality of
events--for example, on whether reports of icing problems increased in
winter.
Figure 7: NAOMS's Preliminary Findings on Pre- and Post-September 11,
2001, Event Rates:
[Refer to PDF for image: illustration]
Pre- and Post-9-11 Evaluation of Sample Events:
Event: Frequent Congestion;
Event rate, Pre 9-11 (per 1 million legs): approximately 200;
Event rate, Post 9-11 (per 1 million legs): approximately 130.
Event: Pilot Leaves Cockpit;
Event rate, Pre 9-11 (per 100,000 hours): approximately 800;
Event rate, Post 9-11 (per 100,000 hours): approximately 300.
Event: Bird Strike;
Event rate, Pre 9-11 (per 1 million legs): approximately 420;
Event rate, Post 9-11 (per 1 million legs): approximately 450.
Event: Cargo Shift;
Event rate, Pre 9-11 (per 1 million legs): approximately 290;
Event rate, Post 9-11 (per 1 million legs): approximately 280.
Source: NASA, "National Aviation Operations Monitoring Service
(NAOMS)," presentation to FAA (Washington, D.C.: Apr. 9, 2003).
[End of figure]
The NAOMS staff recommended more formal validation efforts, suggesting
the examination of questions that had been included in the survey
specifically because they could be benchmarked against other FAA data
systems, such as ASRS and the Wildlife Strike Database. Such work would
have been complicated, however, by the decision to use NAOMS data to
fill in data gaps from other safety systems and not to ask questions
that directly overlapped them, even for items included for
benchmarking. For example, NAOMS asked pilots about all bird strikes
without establishing a threshold for their severity. FAA does not,
however, require pilots to report all bird strikes to its Wildlife
Strike Database, only those bird strikes that cause "significant"
damage. Additionally, aviation researchers have estimated that up to 80
percent of bird strikes with civil aircraft are not reported to FAA's
Wildlife Strike Database.[Footnote 86] Therefore, it is not surprising
that NAOMS data imply a much higher incidence of bird strikes than
other systems.
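The scale of the expected discrepancy follows directly from the
underreporting estimate, as the hypothetical calculation below shows.

    database_count = 4_000   # strikes reported to FAA in a year (hypothetical)
    unreported = 0.80        # share of strikes never reported, per the literature

    # If only 20 percent of strikes reach the database, the implied
    # true total is five times the recorded count.
    print(database_count / (1 - unreported))   # 20000.0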
In addition to considering examples such as pre- and post-September 11,
2001, rates, NAOMS staff had also examined other issues that had
intuitive appeal, such as seasonal fluctuations in reported bird
strikes.[Footnote 87] Project staff also suggested that the data
corresponded well with other data systems, citing as an example both
runway incursions--a decline in which the NAOMS team attributed to an
FAA policy change--and reserve fuel tank use--an increase in which had
reportedly been seen in ASRS.[Footnote 88] Additionally, for field
trial data, project staff examined the strength of the relationship
between the number of events reported and the hours flown or the length
of the recall period, because pilots flying more hours or recalling
events over longer recall periods should report more events than those
with fewer hours flown or shorter recall periods. In addition to having
face validity, the survey methodologist noted that the relationship
between events reported and flight hours and legs is also a measure of
construct validity, in that it demonstrated that NAOMS's measures
corresponded well with theoretical expectations. However, the
relationship does not confirm whether the events that pilots reported
actually happened. No other data validation efforts were undertaken on
the full survey.[Footnote 89] NAOMS project staff reported that several
questions in the NAOMS data had face validity, but the data still had
to be benchmarked. While such benchmarking is critical for validating
NAOMS data, it may not be sufficient to confirm the accuracy of pilot
recall for most NAOMS questions or to estimate the potential effect of
nonresponse bias.
Stakeholders Disagreed on the Utility and Value of the NAOMS Data:
The effectiveness of NAOMS as a monitoring tool depended on its ability
to provide reliable and valid estimates to address customers' concerns.
NAOMS team members promoted the survey's potential for generating rates
and trends but also debated whether the data could be used to establish
baseline counts of events for the NAS. NAOMS working groups were
started but disbanded before resolving this issue or benchmarking the
data against what was known from other safety data.
NAOMS Data and Systemwide Event Counts:
NAOMS team members agreed that the survey was designed to measure the
occurrence of events, rather than their causes. They did not clearly
agree on the survey's ability to provide systemwide counts of events,
rather than rates per flight hour or flight leg, or rate trends over
time. According to the project's leaders, NAOMS was never intended to
generate an absolute picture of the NAS (i.e., total counts of the
number of events in the NAS each year). They told us that its utility
was understood to lie in its ability to measure relative frequencies
that could be used to generate trends over time. However, NASA's OIG
found "a disparity between the stated goals of NAOMS and the manner in
which NAOMS project management initially presented the data to FAA," a
point that FAA also raised.[Footnote 90] Senior FAA officials told us
that NAOMS staff repeatedly indicated that the project would provide
"true" estimates of rates of safety events in the NAS at the project's
beginning, a capability that FAA disputed. NAOMS's emphasis on relative
trends, which FAA believed NAOMS could depict, happened only in later
stages of the project.
Regardless of whether NAOMS data were presented as counts or rates, the
survey was never designed to serve as a stand-alone system. The survey's
methodologist told us that he believed that NASA staff were always
clear about the goal of establishing rates and trends, but that in the
absence of a baseline count of how frequently safety events occurred,
these rates were insufficient to specifically quantify change from the
survey's beginning. However, in theory, such data could be used to
generate trends if the nature of any sampling and nonsampling error in
data collection remained constant over time.
Additionally, the NAOMS survey methodologist described issues that
might jeopardize inferences about trends based on hourly rates. For
example, because rates per exposure unit are a per-pilot measure,
rather than a system or aircraft measure, one could incorrectly
attribute a change in rates to a systemwide shift that might instead
have resulted from a change in technology that affected the number of
individuals in the cockpit crew. As we have previously mentioned, the
sampling frame, the filter, and potential noncoverage and nonresponse
issues would make further analysis necessary before one could conclude
that NAOMS's measures of rates per exposure unit could be generalized
to the full population of air carrier pilots.
According to NASA's researchers, when the NAOMS contractors began to
work closely with the data, they began to extrapolate and generate
systemwide count estimates. NASA reported that one contractor believed
it was essential to report system counts: that is, counts were
necessary to convey the meaning of the data from a policymaker's
perspective and rates did not convey the significance of a given
result. Battelle staff used BTS data to weight NAOMS data according to
systemwide numbers of flight hours or flight legs and used these
estimates in several presentations of NAOMS preliminary results. The
staff reported to us later that they had decided against weighting up
to the full population of aircraft types because they did not think
that it made sense to combine operational size categories of aircraft.
The early presentations of the NAOMS data raised concerns for FAA,
because the numbers presented as systemwide estimates did not match
FAA's other information sources. Several FAA and NASA officials with
whom we spoke asserted that data from several specific survey items did
not correspond with the content of other reporting systems. However,
the items cited were not intended to overlap directly with data FAA had
already collected. NASA officials conceded that NAOMS's question
wording might have contributed to one cited discrepancy. In
addition, FAA officials thought NAOMS was unable to accurately measure
systemwide rates of safety events and asked for extensive, specific
revisions to the survey to address specific questions. Among other
things, these officials wanted NAOMS to ask questions that were more
investigatory in nature than the broad monitoring concept that NASA had
envisioned. NASA did not make the changes that FAA recommended partway
through the survey. In correspondence with FAA, NAOMS researchers
emphasized that the survey's ability to measure trends required
consistent question wording. FAA officials were also concerned about
the quality of NAOMS data because the survey's questions were based
solely on pilots' perceptions.
NAOMS's Working Groups:
NASA's project leaders reported that the working groups were to play a
critical role in evaluating the validity of the NAOMS data and in
establishing whether the survey's information seemed reasonable, given
what was known about safety from other data sources.[Footnote 91] The
two working groups, established in 2003 and 2004, were distinct from
the two workshops conducted in 1999 and 2000, although the groups and
workshops were similar in that they both aimed to introduce the NAOMS
project to a wide range of stakeholders, including FAA and industry
members, and that they solicited input on the survey's goals and
questionnaires.[Footnote 92]
NASA envisioned a wide range of participants in the working groups,
including pilots; flight attendants; people familiar with alternative
data systems; and other aviation stakeholders, such as academic
researchers and industry. Project leaders told us that they did not
expect that participants would necessarily attain consensus, except to
the extent that the groups thought the NAOMS data appeared to be valid
and could publicly present the data in a way that would not be
automatically translated into systemwide extrapolation of event counts.
According to a presentation at the first working group meeting, in
December 2003, "the release of NAOMS data, and its future directions,
will be guided by the Working Group [sic]."[Footnote 93] NASA and FAA
representatives had agreed earlier that year not to release any survey
results before the working groups reviewed them and came to a consensus
on the timing, content, and level of the release of NAOMS data.
Discussing the fate of the 2003 and 2004 working groups, NASA's OIG
concluded in March 2008 that "the NAOMS working groups failed to
achieve their objectives of validating the survey data and gaining
consensus among aviation safety stakeholders about what NAOMS survey
data should be released."[Footnote 94] The working groups' limited
effect may have stemmed partly from disagreement over their
composition. NASA project leaders suggested that FAA had wanted an
existing advisory group to oversee efforts to validate the data,
whereas NASA wanted a different combination of participants--
specifically, academicians, FAA staff, subject matter experts, and
industry stakeholders.[Footnote 95] FAA officials told us that they had serious
concerns about some of NASA's proposed experts, because these experts
cited preliminary estimates from NAOMS data that FAA found not to be
credible.
Additionally, portions of the working group agendas were dedicated to
discussing the importance of survey research for reliably measuring
trends. These discussions might indicate that some working group
members doubted the core foundations of the NAOMS project or the
survey's ability to supplement aviation safety systems.[Footnote 96]
An official in NASA's OIG told us he believed that the presentations
at the working groups were, in a sense, an attempt to get the
participants on board with the NAOMS project.
NASA's project team suggested that the two working group meetings took
place necessarily late in the NAOMS project to allow for the collection
of enough preliminary data and to work through nondisclosure issues.
The team also suggested that the meetings "were largely dedicated to
organizational, procedural, and membership issues."[Footnote 97]
Moreover, presentations at the two working group meetings showed only
the contractor's preliminary aggregate analysis. Because the working
group members never had the raw data, they had no opportunity to
achieve consensus on the validity of NAOMS data or appropriate uses of
these data. NASA's project leaders have asserted, moreover, that the
"Working Group approach" was "terminated prematurely because the NAOMS
resources were re-directed to another approach."[Footnote 98] According
to the project leaders, policy changes resulted in the disbanding of
all advisory groups before a more formalized NAOMS group could be
assembled after the first two groups failed to reach their objectives.
Reestablishing any sort of advisory group would be difficult, because
NASA procedures would require prospective participants to undergo a
strict nondisclosure procedure.
Given that the working group members did not have access to the raw
data and did not agree on the groups' goals or composition, it is not
surprising that they were unable to productively pursue consensus on
the validity and utility of NAOMS data. Additionally, to the extent
that some participants rejected NAOMS's premise that a survey is a
valid and reliable way to generate safety-related data, they are not
likely to have believed that the data the project collected could be
validated. For example, while acknowledging that NAOMS had the
potential to allow reliable estimates of relative trends, FAA officials
told us that they disagreed that NAOMS could generate statistically
reliable rate estimates because of the subjectivity of NAOMS questions.
These officials questioned the ability of NAOMS's information to
generate rates or its capacity for validation by existing databases.
[Footnote 99] Additionally, FAA officials noted that they did not
believe any potential customers would have confidence in aggregate
NAOMS results unless the source data were released to the customers
directly, rather than to a working group. FAA also expressed concern
that pilots would lack causal knowledge to answer the survey's
questions. However, we have noted in this report that the questionnaire
was not designed to collect causal information. Additionally, we
believe that knowledge of why an event occurred should not be needed to
report whether a pilot witnessed or experienced a specific event.
A New Survey Would Require Detailed Planning and Revisiting Sampling
Strategies:
A new survey similar to NAOMS would require more coherent planning and
sampling methods linked to specific analytic goals. In addition, the
NAOMS survey exhibited some limitations that others might want to
avoid. Sufficient survey methodology literature and documentation on
NAOMS's memory experiments are available to conduct another survey of
its kind with similarly strong survey development techniques, built on
a similarly strong foundation.[Footnote 100] The sections that follow
suggest some elements of a new survey like NAOMS.
Conduct a Cost-Benefit Analysis:
Before undertaking a similar survey, researchers should review
developments in aviation safety, as well as the costs of, and potential
for, NAOMS-like data to enhance policymakers' ability to measure trends
and the effects of safety interventions. As NAOMS's application to OMB
observed, managers seek rational and data-driven approaches to aviation
safety, which "requires numbers that quantify the safety risks these
investments are expected to reduce, numbers that reveal trends
portending future safety problems, and still more numbers that measure
the effectiveness of past safety investments."[Footnote 101]
NAOMS air carrier data demonstrate that surveys can be used to generate
trend data measuring aspects of aviation safety, and some of the team's
researchers believe that the data's utility for monitoring the effect
of policy interventions has already been demonstrated. A survey like
NAOMS could supplement other safety information, but additional
analysis must determine whether NAOMS can be sufficiently useful and
cost-effective, given more recent events and technological
developments. For example, digital flight data could potentially
provide monitoring information, but they are not yet comprehensive or
regularly and thoroughly analyzed. Additionally, many data sources,
such as digital measurements of flight parameters, cannot illuminate
behavioral or perceptual information from operators that might bear on
aviation safety. Until such capacity exists, a survey like NAOMS may
nonetheless cost-effectively supplement other safety information and
identify where to look for other sources of safety information.
A thorough cost-benefit analysis should include the cost of additional
steps to develop the survey, such as further experiments, questionnaire
revisions, and pretesting.[Footnote 102] Such an analysis should also
address the potential costs and benefits of the survey in light of
resources required to analyze other sources of safety information. For
example, the cost of collecting and analyzing NAOMS-like data may be
small relative to the cost of thoroughly analyzing digital flight data,
but, depending on the questionnaire design, such analysis may not
identify causation.
Capitalize on Experimentation and Testing:
A future survey should build on the insights gained from NAOMS's
extensive developmental research on pilots' memory organization and
ability to recall events. The survey might undertake additional
experiments and testing to accommodate survey revisions resulting from
stakeholder interests and lessons learned from the NAOMS air carrier
pilot survey. A survey might supplement experiments with additional
cognitive interviews, behavioral coding, and reviews. Researchers
should consider the resources needed for wide-scale testing during the
survey's development. Whereas research demonstrates the benefits of
adapting a survey's content to the subject matter and population of
interest, researchers would want to consider the availability of
resources and time to conduct the experiments necessary to reduce
respondent burden and increase accuracy. Additionally, researchers
should engage in data validation efforts beyond establishing face
validity when making important design decisions, such as which recall
period to use.
Generally accepted survey practice is to use a field trial to test a
questionnaire that is as similar as possible to the final
questionnaire. Accordingly, a future survey might attempt to
incorporate the results of the experiments, cognitive interviews, and
full set of questions into a field trial questionnaire. A future survey
should also run a monitored CATI pretest on the final version of the
questionnaire, to test the automated programming and ensure that
interviewers and respondents appear to interpret questions correctly.
Collaborate with Customers in the Survey's Development:
Beyond soliciting and incorporating feedback from aviation safety
stakeholders, staff promoting a new survey like NAOMS should work
directly with the survey's presumed customers to specify the uses of
the data. While it is not essential that these data inform policy
interventions, policymakers should agree on their potential utility. A
customer's rejection of the premises of a data collection system--as
happened with FAA's rejection of the idea that NAOMS would provide a
reliable safety monitoring system--should be resolved before full data
collection begins, and consensus on the survey's goals and uses should
be formally documented. Otherwise, alternative customers should be
identified or the survey's design and goals should be revisited.
Consulting with potential customers on the wording and likely use of
specific questions would enhance the utility of the survey's data. An
analysis of the existing NAOMS data by both scientists and customers'
representatives could help demonstrate how specific analytic products
might directly or indirectly serve organizational missions.
Assess Whether Questionnaire Content Facilitates Planned Analyses:
In the NAOMS air carrier pilot survey, there is the potential for more
than one crew member on the same aircraft or on separate aircraft to
have reported the same incident. Proportional allocation or segregated
analysis of different types of crew might help address the potential
for multiple reports of the same event but can be difficult to
implement. Nevertheless, survey designers should consider their
analytic goals when designing the questionnaire--that is, are they
looking for per-crew member risk estimates or system counts? Certain
goals may require researchers to adjust the data, while others may not.
Overall, survey designers should be prepared to compare the sensitivity
of their estimates under different strategies and assumptions.
Future efforts to collect safety information from pilots in a survey
might also reconsider the potential effect of sampling pilots who fly
more than one type of aircraft during the recall period or in more than
one crew capacity. The survey designers might want to consider whether
NAOMS's confidentiality considerations outweigh the potential benefits
of allowing pilots to link reported events to particular aircraft,
given the perceived link between operational size class and risk
exposure. To facilitate estimates, the designers of a future survey
should also explore the feasibility of modifying the questionnaire to
allow pilots to identify specific aircraft and crew capacities
associated with each report of a safety event. They would benefit from
establishing an analysis plan in conjunction with the questionnaire.
Doing so would help determine the utility of adding or deleting
questions and would clarify, at the analysis stage, the effect that
such changes would have on data collection.
Detail Analytical Goals and Strategies in Advance of Fielding:
To ensure consensus on the usefulness of the data, a detailed analysis
plan should be developed. The plan should include basic information on
likely estimation strategies and uses of the data, as well as detailed
information on the adjustments or weights needed to account for the
questionnaire design and sampling. Any adjustments to the analysis plan for operational
considerations, preliminary results, policy changes, or unforeseen
circumstances should be formalized as data collection progresses.
NAOMS was intended to capture precursors to accidents and
nonsignificant risks and to supplement other aviation safety
information. It was expected that rate trends seen in the NAOMS data
would point aviation safety experts toward what to examine in data
systems. Therefore, aviation safety experts and stakeholders would have
to conduct more extensive analysis than was conducted in the NAOMS
project to establish whether rates and trends could be used for this
purpose. Additionally, for a similar survey, analysis would have to
establish whether data generated from different recall periods,
interview methods, or operational size categories were sufficiently
similar to allow data to be combined, and whether making adjustments to
sampling strategies or question wording is necessary to accommodate
analytic goals.
The NAOMS survey was intended to provide a better understanding of the
safety performance of the aviation system, and to allow for the
computation of general trends over time, in order to supplement safety
systems. A survey with a different goal--one that was investigative or
intended to understand the causes of events--would seek information
different from that sought by the NAOMS questions. Depending on the
customers' intended use of the data, developers of a future survey
might consider writing questions that asked about, for example, the
causes of engine failures or details about air crews' experience of
engine shutdowns. Whereas questions such as the latter would be
consonant with NAOMS's goal of describing precursors to safety events,
the former would be more investigative. Developing a detailed analysis
plan in conjunction with the questionnaire would help ensure that the
survey included questions relevant for specific analyses.
Revisit Sampling Strategy:
Given the proportion of out-of-scope cases drawn into NAOMS's filtered
sample, and the cost of finding and contacting them, the designers of a
future survey should reevaluate the merits of using a database like the
Airmen Registration Database as a sampling frame relative to potential
alternatives, to ensure that the database is still the most cost-
effective or programmatically viable means of identifying the target
population.[Footnote 103] Other frames, such as industry or union
lists, might be considered or alternative stratification and filtering
strategies might be used to identify air carrier pilots. Sampling
strategies must also consider whether the proliferation of cell phones
will require adjusting contact methods to target a population as mobile
as pilots.
Analysis of data such as the NAOMS data might compare different
approaches to calculating trends and exposure rates to see if
substantive conclusions were similar. Analysts might also want to
determine how their estimates relate to the overall NAS. For example,
if estimates can address only crew-based risk exposure, they probably
do not characterize the NAS, although they may provide other important
information for aviation safety monitoring. To the extent that
characterizing event levels for the NAS is a goal, a survey like NAOMS
might require a different sampling strategy than would a survey designed
primarily to monitor trends. Sampling records, including sources used
to construct a sample frame and the frame itself, should be maintained
for potential use in estimates and nonresponse bias analyses.
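The following Python sketch illustrates, with purely hypothetical
totals, how the same reported events can yield different pictures
depending on the exposure basis chosen:

# Sketch: compare an event rate computed per flight hour with the same
# events computed per flight leg, using synthetic totals.
total_events = 120           # hypothetical reported events
total_hours = 48000.0        # hypothetical flight hours of exposure
total_legs = 21000.0         # hypothetical flight legs of exposure

rate_per_100k_hours = 100000.0 * total_events / total_hours
rate_per_100k_legs = 100000.0 * total_events / total_legs
print(f"{rate_per_100k_hours:.0f} events per 100,000 flight hours")
print(f"{rate_per_100k_legs:.0f} events per 100,000 flight legs")
# If substantive conclusions (for example, how rates trend by quarter)
# differ by exposure basis, analysts must justify which denominator
# matches the operational profile of the event in question.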
Write a Detailed Implementation Plan:
A detailed implementation plan would help ensure the continuity of
management and record-keeping for the project and would help ensure
that steps like data validation and bias analyses are carried through
on a schedule. Given the risks and trade-offs inherent in any survey
endeavor, such a plan would also help to ensure that future analysis of
the data can accommodate decisions made in the face of changing
conditions or for practical considerations.
While benchmarking and face validity checks are important aspects of
data validation, they may not be sufficient to confirm the accuracy of
pilot recall or estimate the potential effect of nonresponse bias.
Accordingly, in addition to conducting quality checks on the interview
process, future
survey developers should undertake formal data validation efforts
during data collection and questionnaire development. Nonresponse bias
analyses should be planned and completed. The survey's sponsors should
allocate resources to fully benchmark the data.
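A benchmarking check can be as simple as comparing a survey-weighted
total against a published external figure, as in this Python sketch;
the totals are invented for illustration and are not NAOMS or BTS
figures:

# Sketch: benchmark a survey-based total against an external source.
survey_weighted_hours = 17.2e6     # hypothetical weighted total
external_benchmark_hours = 18.9e6  # hypothetical published benchmark

ratio = survey_weighted_hours / external_benchmark_hours
print(f"survey/benchmark ratio: {ratio:.2f}")
# Ratios far from 1.0 signal coverage or weighting problems worth
# investigating before event rates are projected to the system.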
NAOMS's confidentiality provisions appear to have been adequate.
Nevertheless, researchers interested in implementing a similar survey
might find it useful to further delineate the kinds of data that might
be released and the techniques that might be used to remove identifiers
from datasets before implementing the survey. In light of other
agencies' mechanisms for releasing individual-level data to screened
researchers in a controlled fashion, survey documentation should also
clarify the conditions under which data could be released to outside
researchers, as appropriate.
While the extended fielding period for the NAOMS survey sample may have
been necessary to attain a high response rate from a population as
mobile as pilots, future researchers should compare the nature of the
answers from pilots who were contacted with relative ease with the
answers from pilots who required greater effort to contact. These
researchers should
also consider an extended field period's implications for how quarterly
statistics are generated in light of potential changes to the sampling
frame over time.
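One common approach to this comparison is a level-of-effort analysis,
sketched below in Python with synthetic data; the contact-attempt
threshold and the choice of a t-test are illustrative assumptions:

# Sketch: compare answers from easily contacted pilots with answers
# from pilots requiring many contact attempts. Data are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
attempts = rng.integers(1, 15, size=400)      # contact attempts
event_rates = rng.gamma(2.0, 1.5, size=400)   # events per 100 hours

easy = event_rates[attempts <= 3]
hard = event_rates[attempts > 3]
t, p = stats.ttest_ind(easy, hard, equal_var=False)
print(f"easy mean={easy.mean():.2f}, hard mean={hard.mean():.2f}, p={p:.3f}")
# A significant difference would suggest that hard-to-reach pilots
# differ systematically, informing nonresponse bias adjustments.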
There is some merit to NASA's assertion that the working groups could
not conduct any data validation without access to the data. In a
future survey, such groups might be constituted earlier, so that data
are available for discussions on data validation. A future effort might
use such working groups in parallel with data collection, thus
soliciting and formalizing the participation of stakeholders. This
parallel effort might help the new effort begin validation as soon as
sufficient data are collected. It might also help circumvent disputes
over the potential uses of the survey data.
Finally, researchers pursuing efforts similar to the NAOMS project
might usefully delineate in advance exactly how rates will be
calculated, how potential issues will be clarified, and how the data
will be interpreted. A future survey might benefit from tighter
coordination between its designers and contractors to ensure that
public presentations of preliminary results, when there is still
significant debate about the validity of the results, show only the
numbers agreed to by project staff.
Concluding Observations:
As a monitoring tool, NAOMS was intended to point air safety experts
toward trends, to help show FAA and others where to look for causes or
extremely rare safety events in other datasets. As a research and
development project, NAOMS was a successful proof of concept. However,
the data that NASA collected under NAOMS have not been fully analyzed
or validated by project staff or aviation safety stakeholders.
Depending on the research objective, proper analysis of NAOMS data
would require multiple adjustments. Additionally, because of their age,
existing NAOMS data would most likely not be useful as indicators of
the current status of the NAS.
A similar project, adequately funded and appropriately planned, could
accomplish what NAOMS intended to do. According to a 2008 FAA
presentation to the National Research Council:
"The NAOMS survey could be very useful in sampling flight crew
perceptions of safety, and complementing other databases such as ASRS.
The survey data, when properly analyzed, could be used to call
attention to low-risk events that could serve as potential indicators
for further investigation in conjunction with other data sources."
[Footnote 104]
In this report, we have both described NAOMS's limitations in enough
detail to enable others to address them in a redesign and suggested
ways in which a newly undertaken project might successfully go forward.
The
planners and designers of a new survey might want to supplement it
where NAOMS was self-limiting, by incorporating research into
investigatory questions of the type that interested FAA, or to more
specifically detail its monitoring capacity in conjunction with
existing aviation safety systems. Alternatively, a newly constituted
research team might lead operational, survey, and statistical experts
in extensively analyzing existing data to validate a new survey's
utility for various purposes or to illuminate future projects of the
same type.
Agency Comments and Our Evaluation:
We provided a draft of this report to the National Aeronautics and
Space Administration and to the Department of Transportation for their
review. Transportation had no comments on the draft report. NASA
provided written comments, and appendix II contains a reprint of the
agency's letter. NASA also provided technical clarifications, which we
incorporated into the report as appropriate.
In response to the draft report's characterization of NAOMS, NASA
emphasized that NAOMS was a research and development initiative. We
revised the report to more clearly reflect this aspect of NAOMS. NASA
also stated that the draft report inappropriately asserted that NAOMS's
goals changed over time, and noted that the principal goal of the
project was always to develop a methodology to assess trends or changes
over time. While we recognize that this was a primary goal of the
project and have revised the report to clarify this issue, we believe
that the project staff were not consistent in how they presented
NAOMS's likely capabilities to other aviation stakeholders over the
life of the project. NASA was also concerned about the draft report's
discussion about maintaining pilot confidentiality, citing its own
research on the risk of pilot disclosure in the NAOMS data and the
inability to determine individuals' motivation for trying to identify a
specific pilot. We agree with NASA's concern about pilot identification
and have revised the report to highlight NASA's concern; however, we
also note that other government agencies have developed mechanisms for
releasing to appropriate researchers, in a controlled manner, extremely
sensitive raw data that carry a high risk of identifying individuals.
We also provided a draft of this report to Battelle (NASA's contractor
for NAOMS) and Jon A. Krosnick, Professor, Stanford University (the
survey methodologist for NAOMS) for their review. Battelle provided no
comments on the draft report. Dr. Krosnick reported that he found the
draft report to be objective and detailed, and that he believed it will
contribute to the public debate on NAOMS. He also provided technical
clarifications, which we incorporated into the report as appropriate.
As agreed with your offices, unless you publicly announce its contents
earlier, we plan no further distribution of this report until 30 days
after its issuance date. At that time, we will send copies of this
report to relevant congressional committees, the Administrator of the
National Aeronautics and Space Administration, the Secretary of the
Department of Transportation, the Administrator of the Federal
Aviation Administration, and other interested parties. The report also
will be available at no charge on the GAO Web site at [hyperlink,
http://www.gao.gov].
If you or your staffs have questions concerning this report, please
contact Nancy Kingsbury at (202) 512-2700, kingsburyn@gao.gov, or
Gerald Dillingham at (202) 512-2834, dillinghamg@gao.gov. Contact
points for our Offices of Congressional Relations and Public Affairs
are on the last page of the report. GAO staff who made key
contributions to this report are acknowledged in appendix III.
Signed by:
Nancy R. Kingsbury, Ph.D.
Managing Director, Applied Research and Methods:
Signed by:
Gerald L. Dillingham, Ph.D.
Director, Physical Infrastructure Issues:
List of Requesters:
The Honorable Bart Gordon:
Chairman:
Committee on Science and Technology:
House of Representatives:
The Honorable Gabrielle Giffords:
Chair:
Subcommittee on Space and Aeronautics:
Committee on Science and Technology:
House of Representatives:
The Honorable Brad Miller:
Chairman:
Subcommittee on Investigations and Oversight:
Committee on Science and Technology:
House of Representatives:
The Honorable Mark Udall:
United States Senate:
The Honorable Jerry Costello:
House of Representatives:
The Honorable Daniel Lipinski:
House of Representatives:
[End of section]
Appendix I: Technical Issues Relating to NAOMS's Development and Data:
In this appendix, we present in more detail a few topics we discuss in
the report. They are (1) the National Aviation Operations Monitoring
Service's (NAOMS) memory experiments; (2) NAOMS's cognitive interviews
with pilots; (3) estimating the effect of the sampling frame, filter,
and operational considerations; (4) outlier detection and mitigation;
and (5) allocation strategies.
Memory Experiments:
The recall and memory experiments for the core safety event section
began with three focus groups, conducted in August and September 1998
with 37 pilots in all, and with one-on-one "autobiography" interviews of 9
pilots. The autobiographies gave the team insight into pilots'
experiences and how they thought about events, enabling the team to
develop potential event clusters that matched general categories
suggested by the pilots' responses. The focus groups and
autobiographies helped in generating questions about different types of
events that would link to the major hypothesized memory structures--
flight phases, causes, and severity--and eventually a hybrid type that
contained causes and flight phases.[Footnote 105]
The NAOMS team and its subject matter experts then listed 96 events--
some based on actual experiences, some purely hypothetical--that
covered different permutations of these events. For example, they
differentiated between minor, moderate, and major problems during
takeoff, cruise, and other phases of flight, involving specific causes
and resulting in specific events. Examples were "major, approach,
weather, spatial deviation" and "minor, landing, people-problem with a
conflict or in-flight encounter." A sorting experiment used the list
derived from this process. Researchers gave 14 pilots 96 randomly
sorted cards, each containing an individual event, and asked them to
sort these cards into stacks containing events that were similar to one
another, and to label the stacks descriptively. This sorting task
further confirmed potential clusters in the pilots' memory structures.
A quantitative analysis of four competing hypotheses of organizational
schemes (cause, flight phase, combined cause and flight phase, and
severity) showed that the scheme that contained both causes and flight
phases best explained the results of the sorting experiment.
The project team also assessed the order in which pilots recalled
events. The team transcribed the 96 events onto individual sheets of
paper and randomly sorted them before presenting them to 9 pilots to
read. The pilots then were asked to solve a set of anagrams completely
unrelated to aviation--a "distraction" activity to clear their minds--
before recalling specific events from the list of 96 events. The
researchers tape-recorded what the pilots said, transcribed the
responses, and analyzed the resulting data, using an index called
"adjusted ratio of clustering" for each of the four hypothesized
schemes. Data again indicated that a scheme combining causes and phases
of flight best represented pilots' prevalent memory structures.
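A commonly cited formulation of the adjusted ratio of clustering is
ARC = (R - E[R]) / (maxR - E[R]), where R is the number of adjacent
same-category pairs in the recall order. The Python sketch below
implements that formulation; we cannot confirm that the NAOMS
researchers used this exact variant:

# Sketch: one standard formulation of the adjusted ratio of clustering
# (ARC). A value near 1 indicates near-perfect clustering of recalled
# items by category; a value near 0 indicates chance-level clustering.
from collections import Counter

def arc(recall_order):
    """recall_order: category label of each recalled item, in order."""
    n = len(recall_order)
    counts = Counter(recall_order)
    k = len(counts)
    # R: observed adjacent same-category repetitions
    r = sum(1 for a, b in zip(recall_order, recall_order[1:]) if a == b)
    expected_r = sum(c * c for c in counts.values()) / n - 1
    max_r = n - k
    return (r - expected_r) / (max_r - expected_r)

# Example: events recalled mostly in cause-based clusters.
order = ["weather", "weather", "engine", "engine", "engine", "weather"]
print(f"ARC = {arc(order):.2f}")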
For a final confirmatory test of the best organizational approach to
pilots' memory structures, the project team randomly assigned 36 pilots
to 1 of 4 experimental conditions. This test was similar to the recall
study, except that pilots in 3 of the experimental conditions were
offered cues to prompt event recall (cause, phase, or a combination of
the two). The cues that combined cause and phase appeared to optimize
the number of specific events that a pilot could recall.
A memorandum summarizing these results added a final caveat on question
order: that is, events were to be ordered from the weakest in memory to
the strongest in memory. This ordering would accord with literature
that showed that strong memories can obscure lesser ones in the same
memory cluster. The memorandum's author recommended further research
with pilots to develop a ranking of weak to strong memories. It does
not appear that formal analysis was conducted, although it is likely
that some NAOMS researchers tapped into their own flying and other
aviation experience to help sort events on the final questionnaire.
Cognitive Interviews:
For the full air carrier pilot survey, researchers interviewed four
Aviation Safety Reporting System (ASRS) analysts, all of them retired
pilots, plus seven active pilots recruited from among personal friends of
NAOMS staff. At least six of the seven active pilots were air carrier
pilots who would have been within NAOMS's target population.
The questionnaire was revised between the three separate sets of
cognitive interviews, but not between participants within a set of
interviews--the four ASRS analysts, the six air carrier pilots, and the
seventh pilot. The revisions included changes the survey methodologist
recommended to more appropriately match the memory structure that the
earlier experiments had revealed, as well as changes to accommodate
issues raised in the cognitive interviews. We do not have evidence
indicating whether the questionnaire's final version was cognitively
tested before the survey's implementation. Interviewers and Battelle
Memorial Institute (Battelle) managers did conduct a series of
interviews to test the flow of the computer-assisted telephone
interview (CATI) programming before the survey was implemented.
Estimating the Effect of the Sampling Frame, Filter, and Operational
Considerations:
The decisions that decreased the likelihood of identifying the NAOMS
survey respondents made it necessary for analysts to adjust their
estimates. In making adjustments, analysts generally look to their
analytical goals and to the likely effect of an adjustment on the
substantive interpretation of an estimate compared with an alternative.
The analysts also try to explore whether adjustments made to address
specific problems affect adjustments to address other issues. For
example, a series of adjustments to address different features or
limitations of the data may render the interpretation of estimates too
complicated for practical use. Changes in external datasets used for
benchmarking or in creating projections may affect the interpretability
of the data over time. In the case of the NAOMS data, sampling, design,
and implementation decisions complicate straightforward estimates for
either system counts or rates.
For a full analysis to account for issues related to questionnaire
design, sampling, and implementation, the NAOMS air carrier data would
require multiple adjustments and imputation. Additional analyses would
be required to determine the nature and effect of these adjustments.
Before the project's end, NAOMS researchers analyzed potential biases
that they believed resulted from the filter used to identify air
carrier pilots from the sampling frame. These analyses are critical for
determining the appropriate uses of the data. We believe that the first
priority for further analysis is to estimate the effect of the sampling
frame. That is, however appropriate NAOMS's use of the publicly
available Airmen Registration Database may have been for cost and
programmatic considerations, it has not yet been established whether
the frame sufficiently represented air carrier pilots in general,
especially in light of pilots' ability to opt out of the registry.
Potential analytic approaches to assessment include but are not limited
to the following:
* Comparing pilots' reported airline fleet characteristics in the
survey with outside data on the size of air carrier fleets. NAOMS
project staff added a question on airline fleet size to the survey
expressly to be able to gauge whether the pilots in the Airmen
Registration Database flew in fleets similar to the air carrier fleet
distribution as a whole. While this analysis might provide compelling
information about how representative the frame was, it is insufficient
to demonstrate that the frame fully represented air carrier pilots of
interest or air carrier pilots covered by the full frame. For example,
it is conceivable that the distribution of pilots' airline fleet
characteristics corresponds between NAOMS data and data derived from
other sources, but that the distribution of pilot characteristics
within each fleet size was systematically biased toward more
experienced pilots who were better able to foresee and avoid safety-
related events.
* Comparing pilot characteristics from the publicly available frame or
the sample (as a random subset of the frame) with the full database
that the Federal Aviation Administration (FAA) maintained. Ideally, the
comparison would have been made with files used for survey fielding.
However, Battelle has reported that it does not have enough data to
make such a comparison. A NAOMS team member suggested that, as an
alternative, one could compare the full FAA database with the publicly
available registry on a range of characteristics both relevant and
external to NAOMS's concerns. Even without knowledge of whether the
nature of the opt-out registry had changed over time, this analysis would help
determine whether pilot characteristics in the public frame can be
generalized to those in the full frame. However, because neither
database contains information on pilots' employment or union
membership, this analysis would be insufficient to determine whether
the frame used for NAOMS data collection was systematically biased to
include or exclude pilots from certain airlines or unions. Thus, this
approach would complement, not replace, the analysis comparing fleet
characteristics discussed in the previous bullet.
* Conducting something like a nonresponse bias assessment (illustrated
in the sketch following this list). Analysts
would take random samples of pilots within the filtered frame as it
would be constructed from the publicly available database and from the
full FAA-maintained database and would use a survey to compare pilot
characteristics for these two samples. Ideally, this would have been
done during the survey field trials; however, in the absence of
compelling evidence that the nature of the two databases had changed
over time, the comparison could still provide insight on whether pilots
in the opt-out frame were sufficiently similar to those in the full
database to treat the opt-out frame as representative of the
population. Depending on its design, a study such as this would allow
analysts to focus on characteristics that were most relevant to NAOMS,
such as career flying hours or experiences of safety events, and would
also provide a means of gauging potential bias in terms of employers,
union membership, and other factors that are not expressly collected in
the certificate database.
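As an illustration of the comparison described in the last bullet, the
following Python sketch uses a chi-square test on hypothetical counts
to compare the distribution of a categorical pilot characteristic
between the two samples:

# Sketch: compare the distribution of a categorical pilot
# characteristic (e.g., a rating category) between a sample from the
# opt-out public frame and a sample from the full FAA database.
# Counts are hypothetical.
from scipy.stats import chi2_contingency

#                rating A, rating B, rating C
public_sample = [140, 110, 50]
full_sample = [120, 125, 55]

chi2, p, dof, expected = chi2_contingency([public_sample, full_sample])
print(f"chi-square={chi2:.2f}, dof={dof}, p={p:.3f}")
# A small p-value would suggest the opt-out frame differs
# systematically from the full database on this characteristic.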
In any case, analysts of NAOMS data must pursue additional research to
determine the existence and nature of potential biases from using the
public database rather than the full database, and determine whether
and which analytic strategies will ensure that the results adequately
represent safety events in the population of interest.
In addition to adjustments for sampling considerations, other analyses
may be useful in generating estimates and necessary adjustments. For
example, to mitigate the effect of coverage bias in systemwide event
count estimates, the NAOMS team advocated using Bureau of
Transportation Statistics data related to operational size categories,
carrier size, flight hours, and flight legs as benchmarks for weighting
these data. The feasibility of using exogenous information to weight
NAOMS data depends heavily on achieving a consensus on the appropriate
and inappropriate uses of the survey regarding measuring risk exposure
and safety events in the national airspace system (NAS).
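The following Python sketch illustrates the basic arithmetic of such
weighting, using invented benchmark and sample shares for the four
operational size categories:

# Sketch: post-stratification weights that align the sample's share of
# flight activity by operational size category with external benchmark
# shares (e.g., from BTS). All numbers are hypothetical.
benchmark_share = {"small": 0.20, "medium": 0.30, "large": 0.25,
                   "widebody": 0.25}
sample_share = {"small": 0.10, "medium": 0.25, "large": 0.30,
                "widebody": 0.35}

weights = {cat: benchmark_share[cat] / sample_share[cat]
           for cat in benchmark_share}
for cat, w in weights.items():
    print(f"{cat}: weight {w:.2f}")
# Responses in underrepresented categories (weight > 1) count more;
# overrepresented categories (weight < 1) count less.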
Battelle recommended statistical modeling--in particular, generalized
linear modeling--to develop "more refined rate estimates."[Footnote
106] Generalized linear models would have allowed estimates of safety
event rates, while controlling for the independent effect of factors
such as season and operational aircraft size.[Footnote 107] Battelle
conducted preliminary modeling with generalized linear regression
models on grouped sets of data. The utility of such models is
contingent on the goals of the analysis and the nature of bias or
patterns of missing data; adjusting for independent factors may not be
appropriate when generating rate estimates to project to the
population. One Battelle statistician noted that NAOMS data lacked
important explanatory factors, and that statistical models could suffer
from omitted variable bias (which is unrelated to whether these data
can be projected to the population of interest). This criticism did not
account for the fact that NAOMS's data were not designed to be used for
an investigative process or to establish causation.[Footnote 108]
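To make the modeling approach concrete, the following Python sketch
fits a Poisson rate model with flight hours as an exposure offset,
using the statsmodels library and synthetic data; it is not Battelle's
actual specification:

# Sketch: a Poisson rate model with log flight hours as an exposure
# offset, controlling for season and operational size category.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 500
df = pd.DataFrame({
    "season": rng.choice(["winter", "spring", "summer", "fall"], size=n),
    "size_class": rng.choice(["small", "medium", "large", "widebody"],
                             size=n),
    "hours": rng.uniform(50, 300, size=n),
})
df["events"] = rng.poisson(0.01 * df["hours"])  # synthetic event process

model = smf.glm("events ~ C(season) + C(size_class)", data=df,
                family=sm.families.Poisson(),
                offset=np.log(df["hours"]))
print(model.fit().summary())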
Estimates from NAOMS are further complicated by the need to distinguish
between risk based on time exposure and risk related to the number of
takeoffs and landings. Analysts using NAOMS data might want to compare
various approaches to calculating trends and exposure rates to see if
different analyses result in similar substantive conclusions. They
should also clarify whether and how estimates relate to the overall
system--for example, if they can address only crew-based risk exposure,
one might ask whether this is sufficient for characterizing the NAS.
Outlier Detection and Mitigation:
Outliers can greatly influence the interpretation of statistical
analyses. Outlier detection and cleaning, which should consider both
statistical and operational concerns, require help from subject matter
experts who can identify whether a given data point seems "reasonable"
in context. Researchers may also consider whether data follow
statistical distributions, such as binomial or Poisson distributions,
in deciding how to identify or exclude outliers. Additionally,
researchers should consider whether the unit of analysis (whether
counts or rates) leads to identifying different cases of outliers and
the effect of various methods of outlier detection and cleaning on the
substantive interpretation of the analysis.[Footnote 109]
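As one distribution-based approach, the following Python sketch flags
reported counts that are improbable under a Poisson assumption
calibrated to each respondent's exposure; the baseline rate and
threshold are illustrative, and flagged cases would still need review
by subject matter experts:

# Sketch: flag reported event counts that are improbable under a
# Poisson assumption given each respondent's exposure.
from scipy.stats import poisson

baseline_rate = 0.002                            # events per flight hour
reports = [(120.0, 0), (200.0, 1), (150.0, 33)]  # (hours, events)

for hours, events in reports:
    expected = baseline_rate * hours
    tail_prob = poisson.sf(events - 1, expected)  # P(X >= events)
    flag = "OUTLIER?" if tail_prob < 1e-4 else "ok"
    print(f"hours={hours:.0f} events={events} "
          f"P(X>={events})={tail_prob:.2e} {flag}")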
Outliers can arise when respondents mishear or misinterpret a question
or decide not to respond truthfully. Outliers may also
reflect accurate data that do not correspond with the preponderance of
cases. For example, one Battelle researcher cited the "cowboy theory"
of aviation safety--the notion that the vast majority of accidents are
caused by a small proportion of pilots. Battelle also suggested that
some pilots might report events that they had not experienced in order
to deliver a message about safety.
Survey research data collected by CATI methods are also subject to
several types of outliers. An interviewer may mistype a response--for
example, entering 3 as 33. CATI systems often use range checks to
prevent such errors: that is, if what is typed exceeds a numerical
threshold, the interviewer is prompted to ask the question again or to
key the data again. Few hard range checks were incorporated into the
NAOMS CATI program, because NASA had instructed the contractor not to
question the veracity of pilots' responses by having interviewers re-
ask questions if a response seemed unusual. The lack of range checks
makes it more difficult to distinguish between outlying answers that
were mistyped and those that represented accurate respondent answers.
The use of free-text fields to record aircraft type may also have
complicated the identification of unreasonable answers for air carrier
pilots.
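The following Python sketch shows the kind of soft range check a CATI
system can apply; the field names and thresholds are hypothetical:

# Sketch: a soft range check of the kind a CATI system can apply. If
# an entered value falls outside a plausible range, the interviewer is
# prompted to re-ask and re-key the response.
def check_entry(field, value, soft_min, soft_max):
    """Return True if value is in range; otherwise prompt a re-ask."""
    if soft_min <= value <= soft_max:
        return True
    print(f"Prompt: '{field}' = {value} is outside {soft_min}-{soft_max}; "
          "please confirm with the respondent and re-key.")
    return False

check_entry("flight hours last 60 days", 33, 0, 400)    # passes
check_entry("flight hours last 60 days", 3300, 0, 400)  # triggers prompt
# A value confirmed after the prompt can be kept and later treated as
# a genuine outlier rather than a keying error.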
For most questions, the contractor developed an outlier cleaning method
that was thought to be both appropriate and objective.[Footnote 110]
This method was used to identify and remove cases of "doubtful quality"
(such as whether the ratio of flight hours to flight legs was
unreasonable or whether a pilot had "unreasonable" values on multiple
questions), cases lacking information in the questionnaire's fields on
flight activity, and additional outliers flagged as "not applicable."
Although the method provided a consistent means of approaching outliers
for each question, it did not account for whether reported values made
sense in an operational context. Furthermore, the method was developed
only midway through data collection. Had the method been developed
later in the collection period, more data would have been available to
help clarify whether a distribution-based approach to outlier detection
was appropriate. To more thoroughly consider statistical and operational
concerns, further strategies for data cleaning and outlier detection
would benefit from using the full data.
Allocation Strategies:
The NAOMS survey has the potential to collect multiple reports of
safety events witnessed by more than one crew member or involving
multiple aircraft. Several NAOMS researchers believe that the effect of
this issue has been overstated, particularly in light of potential
analytical strategies to remedy this problem. Additionally, such
concerns do not apply to analyses that determine per-crew member risk
exposure (as compared with systemwide projections of event counts), if
each individual crew member had an equal chance of being selected.
Strategies that researchers have suggested for addressing the potential
for multiple reports of the same event include proportionally
allocating events by the likely number of crew members on each
aircraft. However, because the number of crew members varies by
aircraft size and flight--for example, long international flights
require relief crews--this strategy is complicated by the inability to
determine for certain which aircraft was involved in a specific
incident when a pilot flew more than one aircraft during the recall
period.[Footnote 111]
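The following Python sketch illustrates the proportional allocation
arithmetic with hypothetical values, and shows how fractional event
counts arise:

# Sketch: proportionally allocate a pilot's reported events across the
# aircraft types flown during the recall period, using the reported
# share of time flown in each. Values are hypothetical.
reported_events = 3
pct_flown = {"B737": 0.75, "B757": 0.25}  # shares of recall-period flying

allocated = {ac: reported_events * share
             for ac, share in pct_flown.items()}
print(allocated)  # {'B737': 2.25, 'B757': 0.75}
# The fractional counts illustrate the "fractional degrees of freedom"
# problem discussed below: no single aircraft can be credited with a
# whole event, complicating rate calculations by aircraft type.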
An alternative strategy would be to calculate events reported by pilots
who flew as captains separately from those events reported by other
pilots--that is, first officers, flight engineers, and relief pilots.
However, this approach might also be complicated by the possibility
that pilots flew in more than one capacity over the recall period and
by the fact that the questionnaire does not allow pilots to identify
whether they were the captain when experiencing a reported safety
event. Furthermore, to the extent that sampling techniques biased the
likelihood of flying in a given capacity--that is, the so-called "left-
seat bias," a disproportionate sampling of captains thought to have
resulted from the sample filter--segregated analysis of
different crew members would require adjustments to project event
counts systemwide.
The inability to link reported safety events for pilots who flew more
than one aircraft type to a specific aircraft (and, by implication, to
a crew size) or day requires developing allocation strategies for other
aspects of the data. Before settling on the nonproportional allocation
strategies that we describe in this report, Battelle explored
alternatives for allocating aircraft among operational size categories
and seasons in its preliminary analyses of NAOMS data. For both size
category and season, Battelle first attempted to allocate reported
safety events and hours flown proportionally across the number of days
in a given season or according to the percentage flown per aircraft.
Both allocations proved unsatisfactory as it became administratively
infeasible for the NAOMS team to maintain either system as data
collection continued. Additionally, the allocations resulted in
fractional degrees of freedom, in that pilot reports that were split
across seasons or aircraft were treated as less than a full case.
Similarly, treating safety events as proportionally allocated fractions
entails theoretical difficulties--for example, is it legitimate, when
calculating rates, to count one-half or one-third of a bird strike?
While proportional allocation or segregated analysis of different types
of crews may help to account for potential reports of the same event,
these strategies may be difficult to implement because pilots could
have flown more than one aircraft type or in multiple crew capacities
during the recall period and because of seasonal patterns in the data.
As with other weights and adjustments, researchers need to consider
their analytical goals--for example, whether they are looking for per-
crew member risk estimates or system counts--and should be prepared to
compare the sensitivity of their estimates with different strategies
and different assumptions. Analysts should also assess whether and how
the necessity of multiple adjustments and allocations limits the
utility of the data for characterizing trends in air carrier aviation
safety.
[End of section]
Appendix II: Comments from the National Aeronautics and Space
Administration:
National Aeronautics and Space Administration:
Headquarters:
Washington, DC 20546-0001:
February 19, 2009:
Reply to the Attention of: Aeronautics Research Mission Directorate:
Dr. Gerald L. Dillingham:
Director:
Physical Infrastructure Issues:
U.S. Government Accountability Office:
Washington, DC 20548:
Dear Dr. Dillingham:
NASA appreciates the opportunity to comment on the National Aviation
Operations Monitoring Service (NAOMS) draft report. We wish to express
our appreciation to the staff of the GAO for their courtesy and for the
effort they expended in acquiring a high level of understanding of this
complex project over a very short period of time.
The comments below apply to the report as a whole. In addition,
enclosed are the specific comments listed in sequential order as they
relate to items in the report, as requested by your team.
General Comments:
1. This document should emphasize that NAOMS was a NASA research and
development (R&D) project to develop a methodology, and not a formally-
adopted operational survey "product." That is, its purpose was to
evaluate the feasibility of developing a methodology for assessing
safety-related trends or changes over time on a system-wide basis.
Although a great deal of initial effort was expended to get as much
correct as possible, an objective of the project was to see not only
what worked but what did not work, what biases evolved, and if and/or
how they might be addressed. Towards the end of the report, it does
acknowledge that the purpose of NAOMS was to develop a methodology.
However, it would be helpful if this were also stated earlier in the
report and in the Executive Summary.
2. The report makes numerous assertions that the goals of the project
changed (from trending or changes over time to 20-percent changes from
one year to the next and from event counts to trends). The primary goal
of the project from the beginning was to develop a methodology to
assess trends or changes over time (e.g., the Aviation System
Monitoring and Modeling project plan of 2000 and the early
presentations in 1999 and 2000).
3. The report compares the NAOMS data with the Aviation Safety
Reporting System (ASRS) data and states that there is less risk of
respondent confidentiality being compromised within NAOMS. Even though
the authors of the report recognized improved data mining techniques
that are available today, they did not see this as a great risk to
NAOMS. The NASA work on the "Assessment of Probability of Disclosure"
definitely showed that there were risks to pilot disclosure depending
on the available information of an event or the motivation to identify
a particular pilot. NASA contends that it is impossible to predict
factors that may motivate someone to try to identify a pilot so that it
is always necessary to protect a pilot's identity.
4. The last recommendation of the report emphasizes the need for
consensus among team members before exposing controversial data.
However, as pointed out with reference to the Federal Aviation
Administration and the working group, such consensus--for many reasons--
is difficult to achieve. This recommendation does, however, serve to
point out the complex and difficult decisions that had to be made
throughout the project.
In closing, NASA would again like to thank you for the opportunity to
provide comments on this draft report.
Sincerely,
Signed by:
Jaiwon Shin:
Associate Administrator for Aeronautics Research Mission Directorate:
Enclosure:
[End of section]
Appendix III: GAO Contacts and Staff Acknowledgments:
GAO Contacts:
Nancy R. Kingsbury, Ph.D., (202) 512-2700, or kingsburyn@gao.gov:
Gerald L. Dillingham, Ph.D., (202) 512-2834, or dillinghamg@gao.gov:
Staff Acknowledgments:
In addition to the persons named above, H. Brandon Haller, Assistant
Director; Teresa Spisak, Assistant Director; Carl Barden; Ron
LaDueLake; Maureen Luna-Long; Grant Mallie; Erica Miles; Charlotte
Moore; Anna Maria Ortiz; Dae Park; Penny Pickett; Mark Ramage; Carl
Ramirez; Mark Ryan; and Richard Scott made key contributions to this
report.
[End of section]
Bibliography:
Many publicly available documents on the National Aviation Operations
Monitoring Service (NAOMS) are at the National Aeronautics and Space
Administration's (NASA) Web site dedicated to the NAOMS project
[hyperlink, http://www.nasa.gov/news/reports/NAOMS.html], last accessed
Mar. 1, 2009, or at other NASA Web sites where materials on NAOMS and
the Aviation Safety and Security Program are archived and searchable.
The Committee on Science and Technology of the House of Representatives
maintains additional information related to its October 31, 2007,
hearing on NAOMS through its Web site at [hyperlink,
http://science.house.gov/publications/] (last accessed Mar. 1, 2009).
Battelle Memorial Institute. NAOMS Reference Report: Concepts, Methods,
and Development Roadmap. Prepared for the National Aeronautics and
Space Administration Ames Research Center. November 30, 2007.
Connell, Linda. NAOMS Workshop: National Aviation Operations Monitoring
Service (NAOMS). Washington, D.C.: National Aeronautics and Space
Administration, March 1, 2000.
Connell, Linda. Workshop on the Concept of the National Aviation
Operational Monitoring Service (NAOMS). Alexandria, Va.: National
Aeronautics and Space Administration, May 11, 1999.
Connors, Mary, and Linda Connell. "The National Aviation Operations
Monitoring Service: A Project Overview of Background, Approach,
Development and Current Status." Presentation to the NAOMS Working
Group 1. Seattle, Wash.: National Aeronautics and Space Administration,
December 18, 2003.
Dodd, Robert S. Statement on the National Aviation Operations
Monitoring Service, October 28, 2007. Statement before the Committee
on Science and Technology, House of Representatives, U.S. Congress.
Washington, D.C.: October 31, 2007.
Griffin, Michael D., Administrator, National Aeronautics and Space
Administration. Letter to National Aeronautics and Space Administration
employees on NAOMS. Washington, D.C.: January 14, 2008.
Griffin, Michael D., Administrator, National Aeronautics and Space
Administration. Statement on the National Aviation Operations
Monitoring Service. Statement before the Committee on Science and
Technology, House of Representatives, U.S. Congress. Washington, D.C.:
October 31, 2007.
Griffin, Michael, Administrator, and Bryan D. O'Connor, Chief, Safety
and Mission Assurance, National Aeronautics and Space Administration.
"Release of Aviation Safety Data." Media briefing moderated by J. D.
Harrington, National Aeronautics and Space Administration Office of
Public Affairs. Washington, D.C.: December 31, 2007.
Krosnick, Jon A. Statement on the National Aviation Operations
Monitoring Service, October 30, 2007. Statement before the Committee on
Science and Technology, House of Representatives, U.S. Congress.
Washington, D.C.: October 31, 2007.
McVenes, Terry, Executive Air Safety Chairman, ALPA International.
Statement on the National Aviation Operations Monitoring Service.
Statement before the Committee on Science and Technology, House of
Representatives, U.S. Congress. Washington, D.C.: October 31, 2007.
Miller, Brad, Chairman, Subcommittee on Investigations and Oversight,
Committee on Science and Technology, House of Representatives, U.S.
Congress. Letter to Robert Sturgell, Acting Administrator, Federal
Aviation Administration. Washington, D.C.: July 23, 2008.
National Aeronautics and Space Administration. National Aviation
Operations Monitoring Service Application for OMB Clearance. Moffett
Field, Calif.: Ames Research Center, June 12, 2000.
National Aeronautics and Space Administration. "National Aviation
Operational Monitoring Service (NAOMS): Development and Proof of
Concept." Presentation to the Aviation Safety Reporting System Advisory
Subcommittee. Washington, D.C.: November 13, 1998.
National Aeronautics and Space Administration. "Creation of a National
Aviation Operational Monitoring Service (NAOMS): Proposed Phase One
Effort." Presentation to the Flight Safety Foundation Icarus Committee
Working Group on Flight Operational Risk Assessment. Washington, D.C.:
March 5, 1998.
National Aeronautics and Space Administration, Office of Safety and
Mission Assurance. "Final Report of the National Aeronautics and Space
Administration (NASA) National Aviation Operations Monitoring Service
(NAOMS) Information Release Advisory Panel (2008)." Memorandum to the
Associate Administrator, National Aeronautics and Space Administration.
Washington, D.C.: May 12, 2008.
National Aeronautics and Space Administration, Office of Inspector
General, Assistant Inspector General for Auditing. "Final Memorandum on the
Review of the National Aviation Operations Monitoring Service (Report
No. IG-08-014; Assignment No. S-08-004-00)," to the Associate
Administrator for Aeronautics Research, National Aeronautics and Space
Administration. Washington, D.C.: March 31, 2008.
Statler, Irving C. Aviation Safety and Security Program (AvSSP): 2.1
Aviation System Monitoring and Modeling (ASMM) Sub-Project Plan,
Version 4.0. Washington, D.C.: National Aeronautics and Space
Administration, February 2004.
Statler, Irving C., ed. The Aviation System Monitoring and Modeling
(ASMM) Project: A Documentation of Its History and Accomplishments 1999-
2005. Washington, D.C.: National Aeronautics and Space Administration,
June 2007.
White House Commission on Aviation Safety and Security. Final Report to
President Clinton. Washington, D.C.: The White House, February 12,
1997.
[End of section]
Footnotes:
[1] The NAS, also known as the national aviation system, comprises the
people, procedures, facilities, equipment, and infrastructure that
enable air travel in the United States. This includes, but is not
limited to, air traffic controllers, safety inspectors and technicians,
mechanics, pilots, radar systems, airports, and aircraft.
[2] Executive Order 13,015; 61 Federal Register 43937 (Aug. 27, 1996).
[3] By "project staff"--and, alternatively, the "NAOMS team" or "NAOMS
researchers"--we mean in this report the two researchers experienced in
aviation safety that NASA appointed to lead NAOMS, and the contractor
staff from the Battelle Memorial Institute (Battelle) who administered
the project and worked with experts (Battelle subcontractors) in survey
methodology and aviation safety to help with questionnaire construction
and project management.
[4] GAO expects to report on its assessment of the Federal Aviation
Administration's existing data sources later in 2009.
[5] In this report, we use the term "NAOMS project" to refer to the
original project as it was initially conceived, as a monitoring system
with multiple surveys of a variety of aviation personnel. However, we
primarily use the short form "NAOMS," and, alternatively, the "NAOMS
survey," to refer to the most extensively developed part of the
project, the air carrier pilot survey.
[6] OMB, Statistical Programs and Standards, Standards and Guidelines
for Statistical Surveys (Washington, D.C.: September 2006). See
[hyperlink, http://www.whitehouse.gov/omb] (last accessed Mar. 1,
2009).
[7] Robert M. Groves, Survey Errors and Survey Costs (New York, N.Y.:
John Wiley and Sons, April 1989), 6.
[8] OMB, Federal Committee on Statistical Methodology, "Measuring and
Reporting Sources of Error in Surveys, Statistical Policy [Working
Paper 31]" (Washington, D.C.: July 2001).
[9] The general aviation survey adapted the questionnaire and expanded
the sample used in NAOMS to survey nonmilitary pilots who were not
commercial air carrier pilots.
[10] "Face validity," a qualitative measure, refers to whether data
look like they measure what is intended, rather than to whether they
can be quantified with statistical methods.
[11] White House Commission on Aviation Safety and Security, Final
Report to President Clinton, recommendation 1.1 (Washington, D.C.: The
White House, Feb. 12, 1997), 8.
[12] A precursor is "the symptom of a systemic problem that is a
confluence of causal factors conducive to undesired system behavior
(e.g., human fatigue, organizational culture, equipment failure, or
procedural discrepancy) that, if left unresolved, has the potential to
result in increased probability of an accident. A precursor is a
measurable deviation from expectations or the norm, and it is important
that it not be viewed as being synonymous with causality." See Irving
C. Statler, The Aviation System Monitoring and Modeling (ASMM) Project:
A Documentation of Its History and Accomplishments 1999-2005
(Washington, D.C.: NASA, June 2007), 5.
[13] The eight data sources are listed in a Battelle document entitled
NAOMS Reference Report: Concepts, Methods, and Development Roadmap,
prepared for the NASA Ames Research Center (Nov. 30, 2007), table 2.1.
[14] FAA instituted its voluntary ASRS program in 1975. To enhance the
program by increasing the anonymity of reporters and others, FAA
delegated reporting, processing, and analysis of raw data from Aviation
Safety Reports to NASA as a third party. Under the terms of a
memorandum of understanding originally signed in 1975, NASA designed
ASRS to receive Aviation Safety Reports, and administers the program
independent of FAA. (See U.S. Department of Transportation, FAA,
Advisory Circular 00-46D (Feb. 26, 1997).)
[15] To encourage operational personnel to report incidents or
situations that they believe compromise aviation safety, FAA provides
ASRS reporters with limited legal immunity from regulatory enforcement
action. The Administrator of FAA is prohibited from using reports
submitted to NASA under ASRS (or information derived from them) in any
enforcement action, except that it may use information concerning
criminal offenses or accidents, which are not covered under the program
(14 C.F.R. § 91.25 (2008); see http://asrs.arc.nasa.gov/overview/
immunity.html, last accessed Mar. 1, 2009).
[16] Irving C. Statler, Aviation Safety and Security Program (AvSSP):
2.1 Aviation System Monitoring and Modeling (ASMM) Sub-Project Plan,
Version 4.0 (Washington, D.C.: NASA, February 2004), 40.
[17] Statler, Aviation Safety and Security Program (AvSSP), 42.
[18] CAST, a government-industry group, identifies top safety areas by
analyzing accident and incident data and identifies and implements
safety enhancements aimed at reducing fatalities.
[19] NASA, "Creation of a National Aviation Operational Monitoring
Service (NAOMS): Proposed Phase One Effort" (Washington, D.C.: Mar. 5,
1998), 21.
[20] NASA, "Creation of a National Aviation Operational Monitoring
Service (NAOMS)," 4.
[21] Michael D. Griffin, Administrator, and Bryan D. O'Connor, Chief,
Safety and Mission Assurance, NASA, "Release of Aviation Safety Data,"
media briefing (Washington, D.C.: Dec. 31, 2007), 17 (Michael D.
Griffin statement).
[22] Jon A. Krosnick, statement on the National Aviation Operations
Monitoring Service before the Committee on Science and Technology,
House of Representatives, U.S. Congress (Washington, D.C.: Oct. 31,
2007), 2-3.
[23] Battelle, NAOMS Reference Report, 6.
[24] Linda Connell, Workshop on the Concept of the National Aviation
Operational Monitoring Service (NAOMS) (Alexandria, Va.: May 11, 1999),
24. See also Robert S. Dodd, "NAOMS Development and Application,"
presentation to the Aeronautics and Space Engineering Board, National
Academies (Washington, D.C.: June 9, 2008), 5.
[25] Krosnick, statement before the Committee on Science and
Technology, 7-8.
[26] Battelle, NAOMS Reference Report, 14-15, and Connell, Workshop on
the Concept of the National Aviation Operational Monitoring Service, 51-
58. Flight hours are used to calculate risk exposure for events that
can occur any time during flight; flight legs are used for events that
occur mainly during terminal operations. See Mary Connors and Linda
Connell, "The National Aviation Operations Monitoring Service: A
Project Overview of Background, Approach, Development, and Current
Status," presentation to the NAOMS Working Group 1 (Seattle, Wash.:
Dec. 18, 2003), 16.
[27] ATO employs approximately 35,000 air traffic controllers,
technicians, engineers, and support personnel who provide air traffic
services to the nation to facilitate the safe and efficient movement of
aircraft throughout the NAS. See [hyperlink,
http://www.faa.gov/about/office_org/headquarters_offices/ato] (last
accessed Mar. 1, 2009).
[28] The Joint Implementation Measurement Data Analysis Team is a CAST
working group that assesses proposed safety enhancements and prepares
safety plans to track progress in implementing them.
[29] Griffin and O'Connor, "Release of Aviation Safety Data," 17
(Michael D. Griffin statement).
[30] Terry McVenes, Executive Air Safety Chairman, ALPA International,
statement on the National Aviation Operations Monitoring Service before
the Committee on Science and Technology, House of Representatives, U.S.
Congress (Washington, D.C.: Oct. 31, 2007).
[31] NASA, Assistant Inspector General for Auditing, Office of
Inspector General, "Final Memorandum on the Review of the National
Aviation Operations Monitoring Service (Report No. IG-08-014;
Assignment No. S-08-004-00)," to the Associate Administrator for
Aeronautics Research, NASA (Washington, D.C.: Mar. 31, 2008), 9.
[32] NASA, "Final Memorandum," 10.
[33] NASA, "Final Memorandum," 10. To respond to the Inspector
General's recommendation that NASA lead aviation stakeholder efforts to
assess the utility of NAOMS data, NASA contracted with the National
Research Council of the National Academies to provide an independent
assessment of NAOMS's methodology. NASA estimated that the council
would complete such an assessment in June 2009.
[34] According to NASA officials, issues of content and order were
addressed before the full air carrier pilot survey was implemented.
[35] Battelle, NAOMS Reference Report, appendix 2.
[36] Connell, Workshop on the Concept of the National Aviation
Operational Monitoring Service, especially 6, 14, 32-34, 62, and 64.
[37] NASA, "FAA NAOMS Workshop FAA Attendees Interviews, Summary"
(Washington, D.C.: September 1999).
[38] NASA, "NAOMS Response to FAA Questions and Concerns" (Washington,
D.C.: August 2003), especially 1.
[39] Statler, The Aviation System Monitoring and Modeling (ASMM)
Project, 10.
[40] NASA, "NAOMS Response to FAA," 5.
[41] More details about these experiments are in appendix I of this
report as well as in Battelle, NAOMS Reference Report, appendix 4-1.
[42] NASA and its contractors attempted to validate flight hour and leg
reports in the full air carrier pilot survey by comparing it with
existing BTS data.
[43] Battelle's final reference report on NAOMS suggests that the
experiments on data collection method and recall period persisted
throughout the first year of the full air carrier pilot survey's operation.
(See Battelle, NAOMS Reference Report, 26.) NASA staff have clarified
that the NAOMS team decided to discard panel-based data collection in
favor of the cross-sectional approach approximately 9 months into the
survey's operation.
[44] Final cost numbers presented to the National Academies in 2008
differ from the estimates of fully operational costs presented in
Battelle, NAOMS Reference Report, 31.
[45] Cognitive interviews are individual pretests of the survey in
which the survey developers solicit feedback on the language and
comprehensibility of specific questions.
[46] Preliminary analysis suggests that relative to BTS data and data
from air carrier pilots in the general aviation study, NAOMS air
carrier pilot survey data overrepresent pilots flying widebody aircraft
with long flight times and pilots flying as captains, rather than as
first officers or in some other capacity.
[47] That a survey (or safety monitoring system) relies on individuals'
reports is not a flaw, but a design feature that must be accounted for
when analyzing data.
[48] To generate a replicate, survey researchers take smaller samples
from the full sample, using the same sampling design. By releasing
small replicates on a regular basis, instead of the entire sample at
once, researchers can begin generating estimates for the entire sample
as each replicate is released and help ensure that systematic
differences between those who respond to the survey rapidly and those
who take longer to interview do not compound over the time that the
survey is administered.
[49] See Battelle, NAOMS Reference Report, appendix 2.
[50] Robert Dodd, "Airline Pilot Self Selection Bias in the FAA Airmen
Certification Database and Methods to Evaluate Its Effect: Questions on
Airline Size," memorandum to Mary Conners and Linda Connell, NASA (May
31, 2002), 1.
[51] FAA's Airmen Registration Database is a searchable set of files
that is updated monthly: see [hyperlink,
http://www.faa.gov/licenses_certificates/airmen_certification/releasable
_airmen_download], last accessed Mar. 1, 2009, and maintained by FAA's
Airmen Certification Branch in Oklahoma City. Since 2000, airmen may
opt to restrict public access to information in the database, including
their name, address, and ratings, in accordance with the Wendell H.
Ford Aviation Investment and Reform Act for the 21st Century, Public
Law 106-181 (Apr. 5, 2000).
[52] The NAOMS sample was drawn from the version of the Airmen
Directory Releasable File that was posted at [hyperlink,
http://www.landings.com] (last accessed Mar. 1, 2009).
[53] The NAOMS team compared the pilots in its field trial air carrier
sampling frame (based on the full FAA directory) with the publicly
available opt-out directory for the next year, and found that 39
percent of its sample pilots were not available in the new directory.
This contrasted with the team's understanding that roughly 8 to 10
percent of all pilots had opted out of the public list by 2002. The
team reported that an unknown portion of the greater attrition was
likely to have resulted from air carrier pilot retirements or
withdrawals for medical reasons.
[54] Preliminary analysis of the question was eventually used to
illustrate that NAOMS data overrepresented large air carriers and
underrepresented small carriers.
[55] The NAOMS data appeared to be biased in comparison with BTS
benchmark data and air carrier pilots in the general aviation sample.
[56] A stratified sample involves dividing the sampling frame into
mutually exclusive subgroups thought to be similar on the basis of
available information on each case; simple random or systematic random
samples are then selected from within each subgroup. Researchers often
use stratification to help ensure that they obtain reasonably precise
estimates for each subgroup of interest in a population.
[57] The team categorized aircraft in its preliminary analyses into
four operational size categories (see figure 5): small transport = less
than 100,000 pounds gross takeoff weight (GTOW); medium transport =
100,000 to 200,000 pounds GTOW; large transport = more than 200,000
pounds GTOW with a single aisle; and widebody = more than 300,000
pounds GTOW with two aisles.
[58] NASA, National Aviation Operations Monitoring Service Application
for OMB Clearance (Moffett Field, Calif.: Ames Research Center, June
12, 2000), 15. Under the Paperwork Reduction Act of 1995, OMB requires
federal agencies seeking to conduct new surveys to submit an
application that establishes the necessity of new data collection in
light of other data systems; estimates cost and respondent burden; and
provides specific details about the survey and its sampling,
implementation, and likely use. See John D. Graham, Administrator,
Office of Management and Budget, Memorandum for the President's
Management Council: Guidance on Agency Survey and Statistical
Information Collections (Washington, D.C.: Executive Office of the
President, Jan. 20, 2006).
[59] NASA, National Aviation Operations Monitoring Service Application,
15.
[60] NASA officials and project staff have used the phrase "double-
counting" as shorthand to denote the potential for an event to be
reported by more than one pilot.
[61] The first-year data included 30-day, 60-day, and 90-day recall
periods.
[62] These baseline measures include flight hours and legs flown by
commercial aircraft.
[63] Telematch, in Springfield, Virginia, is a national database that
consists of some 170 million directory assistance consumer and business
listing records sourced directly from telephone companies and updated
daily. The NAOMS team used Telematch to find telephone numbers based on
each pilot's address in the Airmen Directory. See Telematch at
[hyperlink, http://www.telematch.com] (last accessed Mar. 1, 2009).
[64] NASA, Interviewer Training Manual from Year 1 (2001), I-10.
[65] Battelle, NAOMS Reference Report, 8. See also NASA, Interviewer
Training Manual from Year 1, I-10.
[66] NASA, National Aviation Operations Monitoring Service Application,
9, sec. I-J.
[67] Connell, Workshop on the Concept of the National Aviation
Operational Monitoring Service, 66.
[68] NASA's concerns about pilot confidentiality underlie the agency's
recent efforts to develop a redacted version of the NAOMS data for
public release.
[69] A structured prompt is a scripted instruction available for an
interviewer to clarify a respondent's question or response. When an
interviewer enters a respondent's answer into the computer system and
the entry does not meet certain criteria that have been established by
the survey's designers, the system may be programmed to provide
interviewers with a structured prompt that is to be read verbatim to
clarify a respondent's answer. For example, if the data system recorded
a value of 1,000 for the number of hours a pilot flew in a week, the
interviewer would be instructed to ask the respondent whether that
value was correct. Structured prompts ensure that the interaction
between interviewers and respondents is consistent and help to mitigate
the effects of misunderstandings and data entry problems, in that an
unusually high or low value that persists after a structured prompt can
be treated as an outlier, rather than as a typing error or
miscommunication.
[70] The raw data suggest that the CATI programming appropriately
prevented interviewers from entering responses when the sum of the
answers to the aircraft-flown questions would exceed 100 percent, but
not when the sum was less than 100 percent. Similarly, another set of
questions included cases in which the sum of events from individual
question subparts erroneously exceeded the top-level question value.
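A minimal Python sketch of the two consistency checks at issue follows;
the function and variable names are invented for illustration, and, per
the raw data, the NAOMS programming enforced only the first percentage
condition.

    # Hypothetical sketch of the CATI consistency checks described in
    # this footnote: percentages of time flown by aircraft type should
    # sum to exactly 100, and event counts from question subparts
    # should not exceed the top-level total.
    def percentage_checks(shares):
        """Return (enforced, unenforced): sum <= 100 and sum == 100."""
        total = sum(shares)
        return total <= 100, total == 100

    def subparts_consistent(top_level_count, subpart_counts):
        """Subpart event counts should not exceed the reported total."""
        return sum(subpart_counts) <= top_level_count

    print(percentage_checks([60, 30]))     # (True, False): passes only the enforced check
    print(subparts_consistent(2, [2, 1]))  # False: subparts exceed the total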
[71] Chester Bowie, National Opinion Research Center, "Review and
Evaluation of the Survey Management Component of the Air Carrier Survey
in NASA's National Aviation Operations Monitoring Service (NAOMS),"
paper prepared for GAO (Bethesda, Md.: Aug. 29, 2008), 3-4.
[72] Additionally, the events of September 11, 2001, affected the
airline industry and may have had an impact on the nature of
subsequently collected data, according to NASA officials.
[73] A November 2000 team document showed 2 arguments in favor of a
panel approach and 10 against.
[74] See, for example, GAO, Designing Evaluations, [hyperlink,
http://www.gao.gov/products/GAO/PEMD-10.1.4] (Washington, D.C.: March
1991).
[75] Battelle, NAOMS Reference Report, 35, appendix 9-6.
[76] NASA, Interviewer Training Manual from Year 1, I-10.
[77] A contractor document discussing potential bounding effects
(whereby the time from one panel survey interview to the next provides
a mental benchmark for the respondent) and a 7-day recall period
demonstrates how the collection method and recall period can affect the
nature of the data collected. To the extent that different approaches
resulted in substantive differences in these data, results from
interviews collected under different methodologies should not be
combined.
[78] Robert F. Belli, "NAOMS Survey Review," paper prepared for GAO
(Lincoln, Neb.: Aug. 27, 2008), 7. Belli has noted that there were no
individual-level validation data on which to make a determination of
the efficacy of these different recall periods.
[79] The survey methodologist also suggested that many federal surveys
do not conduct independent data validation.
[80] Battelle, NAOMS Reference Report, 29.
[81] Battelle's 2007 report describes the interviewers' training and
certification for the field trial. See Battelle, NAOMS Reference
Report, 7-8 and 29.
[82] NASA, Interviewer Training Manual from Year 1, sec. I.
[83] Ailerons and spoilers are control surfaces on aircraft wings that
increase or decrease lift and help to control the airplane's stability
in flight.
[84] See, for example, American Association for Public Opinion
Research, Standard Definitions: Final Dispositions of Case Codes and
Outcome Rates for Surveys, 5th ed. (Lenexa, Kans.: 2008), 34.
[85] The first calculation would be the American Association for Public
Opinion Research's response rate 1; the second would be response rate
3. In a calculation of rates, the numerator was the data NAOMS
collected on events, and the denominator was the data it collected on
exposures per flight hour or flight leg. See Connors and Connell, "The
National Aviation
Operations Monitoring Service," 16.
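For readers unfamiliar with these measures, the following Python sketch
shows the structure of the two response rates and of an event rate per
flight hour; all counts are invented, and the factor e (the estimated
share of unknown-eligibility cases that are eligible) is an analyst's
assumption, as in the American Association for Public Opinion
Research's standard definitions.

    # Illustrative response rates and event rate. I = complete
    # interviews, P = partials, R = refusals, NC = noncontacts,
    # O = other noninterviews, U = cases of unknown eligibility.
    def response_rate_1(I, P, R, NC, O, U):
        return I / (I + P + R + NC + O + U)

    def response_rate_3(I, P, R, NC, O, U, e):
        # Counts only the estimated eligible share of unknown cases.
        return I / (I + P + R + NC + O + e * U)

    def event_rate(events, flight_hours, per=100_000):
        """Events per `per` flight hours: events over exposure."""
        return events / flight_hours * per

    print(round(response_rate_1(7000, 100, 800, 400, 50, 650), 3))       # 0.778
    print(round(response_rate_3(7000, 100, 800, 400, 50, 650, 0.5), 3))  # 0.807
    print(event_rate(events=42, flight_hours=3_500_000))                 # 1.2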
[86] Sandra E. Wright and Richard A. Dolbeer, The National Wildlife
Strike Database for the U.S.A.: 1990 to 2002 and Beyond, Bird Strike
Committee Proceedings 2003, Bird Strike Committee U.S.A. and Canada,
5th Joint Annual Meeting, Toronto (Lincoln, Neb.: University of
Nebraska, 2003), 1.
[87] The survey methodologist for NAOMS reported that the pattern of
bird strikes followed the expected seasonal pattern, which helped to
give the researchers additional confidence in the validity of the data.
[88] In 2002, FAA completed a national runway safety plan with 39
safety objectives, such as enhancing runway markings and lighting. For
more details on FAA policy on runway incursions, see GAO, Aviation
Runway and Ramp Safety: Sustained Efforts to Address Leadership,
Technology, and Other Challenges Needed to Reduce Accidents and
Incidents, GAO-08-29 (Washington, D.C.: Nov. 20, 2007).
[89] NAOMS staff provided us with a list of potential benchmarking
questions and corresponding data sources to facilitate future analysis.
[90] NASA, "Final Memorandum," 13.
[91] See, for example, NASA, "Final Memorandum," 15-16.
[92] The workshop agendas, participants, and feedback discussions are
detailed in Battelle, NAOMS Reference Report, apps. 8 and 10. The 1999
workshop agenda (appendix 8-1) included work group summaries and
discussions, but these work groups are not to be confused with the
working groups established later, during NAOMS's implementation. For
the two workshops, see also Connell, Workshop on the Concept of the
National Aviation Operational Monitoring Service, and Linda Connell,
NAOMS Workshop: National Aviation Operations Monitoring Service (NAOMS)
(Washington, D.C.: Mar. 1, 2000). An earlier workshop, held in 1998,
also during development, is documented in NASA, "Creation of a National
Aviation Operational Monitoring Service (NAOMS)." All three development
workshops, as well as the implementation's working groups, are found at
NASA, National Aviation Operational Monitoring Service (NAOMS)
Information Release, NAOMS Project Presentations and Associated
Documents, [hyperlink,
http://www.nasa.gov/news/reports/NAOMS_pres.html] (last accessed Mar.
1, 2009). See also table 1 of this report.
[93] Connors and Connell, "The National Aviation Operations Monitoring
Service," agenda item 8 ("Future Directions"), 2.
[94] NASA, "Final Memorandum," 10.
[95] Senior FAA officials recently told us that the existing advisory
group they proposed was CAST's Joint Implementation Measurement Data
Analysis Team.
[96] The December 18, 2003, and May 5, 2004, working groups' agendas
and presentations are available at [hyperlink,
http://www.nasa.gov/news/reports/NAOMS_pres.html] (last accessed Mar.
1, 2009).
[97] Irving Statler, "Comments on the OIG Review of the National
Aviation Operations Monitoring Service," NASA (Mountain View, Calif.:
Mar. 5, 2008), 3-4.
[98] Statler, "Comments on the OIG Review of the National Aviation
Operations Monitoring Service," 3.
[99] In addition, FAA maintained that it had long believed that the
survey would be "overtaken by events," such as the collection of
digital flight data, also known as flight operational quality assurance
data. Such data could provide precise rates of occurrence on multiple
parameters and, thus, in FAA's view, could obviate NAOMS's potential
benefits. As of October 2008, 21 air carriers had FAA- and airline-
approved digital flight data programs. When NAOMS began in 1997, only 3
air carriers participated in such programs.
[100] NASA officials noted that they have not found sufficient publicly
released technical documentation for the NAOMS project as a whole. Many of
the documents we reviewed for our report were internal team memorandums
and analyses that we obtained directly from NASA contractors and
subcontractors for the project, and they have not been publicly posted.
However, we believe the NAOMS Reference Report and the project's OMB
paperwork contain sufficient information on the nature of the memory
experiments to inform future research.
[101] NASA, National Aviation Operations Monitoring Service
Application, 3.
[102] See, for example, GAO, GAO Cost Estimating and Assessment Guide:
Best Practices for Developing and Managing Capital Program Costs,
[hyperlink, http://www.gao.gov/products/GAO-09-3SP] (Washington, D.C.:
Mar. 2, 2009).
[103] A sample frame based on the full version of the FAA Airmen
Registration Database would ensure that all potentially eligible pilots
were available on the sampling frame, including those who opted out of
the publicly available directory. However, even the full database lacks
information on where pilots work and, thus, precludes direct
identification of air carrier pilots.
[104] Mike Baseshore, Office of Aviation Safety Analytical Services,
FAA, "Discussion on the NASA National Aviation Operational Monitoring
Service (NAOMS) Project," presentation to Aeronautics and Space
Engineering Board, National Academies (Washington, D.C.: June 10,
2008).
[105] Battelle, NAOMS Reference Report: Concepts, Methods, and
Development Roadmap, prepared for the NASA Ames Research Center (Nov.
30, 2007), appendix 7.
[106] L. J. Rosenthal, "An Overview of NAOMS Decisions Relating to
Sampling Approach," memorandum (Battelle, June 18, 2008), 4.
[107] An alternative modeling approach, such as zero-inflated Poisson
regression or negative binomial regression, would be useful in
assessing the effect of explanatory factors, including risk exposure,
on the likelihood of having experienced one or more safety events.
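A minimal sketch of such a model, using Python's statsmodels on
simulated data, follows; the covariates, sample size, and coefficients
are hypothetical, and the sketch illustrates the general technique
rather than any analysis the NAOMS team performed.

    # Hypothetical sketch: modeling safety-event counts with a negative
    # binomial regression and flight hours as the exposure term. A
    # zero-inflated Poisson model (statsmodels' ZeroInflatedPoisson)
    # could be substituted in the same way.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 500
    df = pd.DataFrame({
        "legs_flown": rng.poisson(40, n),         # invented covariate
        "night_share": rng.uniform(0, 1, n),      # invented covariate
        "flight_hours": rng.uniform(50, 250, n),  # exposure
    })
    # Simulate overdispersed counts whose mean scales with exposure.
    mu = np.exp(-7 + 0.01 * df["legs_flown"]
                + 0.5 * df["night_share"]) * df["flight_hours"]
    df["events"] = rng.poisson(mu * rng.gamma(1.0, 1.0, n))

    X = sm.add_constant(df[["legs_flown", "night_share"]])
    model = sm.NegativeBinomial(df["events"], X,
                                exposure=df["flight_hours"])
    print(model.fit(disp=0).summary())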
[108] The ability of statistical modeling to mitigate the effect of
bias as a result of coverage or noncoverage is limited and depends
heavily on how cases enter or fail to enter the sample. Other analyses
to determine whether missing cases relate to the dependent or
independent variables of interest (regardless of their availability in
the NAOMS data), including assessments of potential bias resulting from
the choice of sampling frame and the filter, are essential in
establishing the utility of statistical modeling.
[109] Battelle considered both rates and counts in its research on
outlier detection and resolution strategies for NAOMS data. See Thomas
Ferryman and others, Refined Outlier Detection and Resolution Process
(Richland, Wash.: Battelle, December 2002).
[110] This method, called the "Chebyshev multiple outlier detection
method," was considered appropriate because adequate information for
generating distributionally driven cut-off values was lacking, and
objective because it did not require judgment about the appropriateness
of any given answer. The method was nonparametric and based on the
Chebyshev inequality. Drafts of analysis plans show that the NAOMS team
initially planned to use distributionally based outlier cleaning.
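To illustrate the inequality at work (this is a generic sketch, not
Battelle's exact procedure), the Python fragment below flags values
more than k standard deviations from the mean, choosing k so that the
Chebyshev inequality bounds the flagged share by a target p for any
distribution.

    # Generic Chebyshev-style outlier flag. For any distribution, at
    # most 1/k**2 of values lie k or more standard deviations from the
    # mean, so k = 1/sqrt(p) bounds the expected flagged share by p
    # without assuming a distributional form.
    import math
    import random
    import statistics

    def chebyshev_outliers(values, p=0.04):
        k = 1 / math.sqrt(p)          # p = 0.04 gives k = 5
        mu = statistics.fmean(values)
        sigma = statistics.pstdev(values)
        return [v for v in values if abs(v - mu) > k * sigma]

    random.seed(1)
    data = [random.gauss(5, 1) for _ in range(99)] + [400.0]
    print(chebyshev_outliers(data))   # flags only the gross 400.0 entry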
[111] As we have previously discussed, not linking a reported safety
event to a specific aircraft was one of the NAOMS team's strategies for
maintaining pilot confidentiality.
[End of section]
GAO's Mission:
The Government Accountability Office, the audit, evaluation and
investigative arm of Congress, exists to support Congress in meeting
its constitutional responsibilities and to help improve the performance
and accountability of the federal government for the American people.
GAO examines the use of public funds; evaluates federal programs and
policies; and provides analyses, recommendations, and other assistance
to help Congress make informed oversight, policy, and funding
decisions. GAO's commitment to good government is reflected in its core
values of accountability, integrity, and reliability.
Obtaining Copies of GAO Reports and Testimony:
The fastest and easiest way to obtain copies of GAO documents at no
cost is through GAO's Web site [hyperlink, http://www.gao.gov]. Each
weekday, GAO posts newly released reports, testimony, and
correspondence on its Web site. To have GAO e-mail you a list of newly
posted products every afternoon, go to [hyperlink, http://www.gao.gov]
and select "E-mail Updates."
Order by Phone:
The price of each GAO publication reflects GAO's actual cost of
production and distribution and depends on the number of pages in the
publication and whether the publication is printed in color or black and
white. Pricing and ordering information is posted on GAO's Web site,
[hyperlink, http://www.gao.gov/ordering.htm].
Place orders by calling (202) 512-6000, toll free (866) 801-7077, or
TDD (202) 512-2537.
Orders may be paid for using American Express, Discover Card,
MasterCard, Visa, check, or money order. Call for additional
information.
To Report Fraud, Waste, and Abuse in Federal Programs:
Contact:
Web site: [hyperlink, http://www.gao.gov/fraudnet/fraudnet.htm]
E-mail: fraudnet@gao.gov
Automated answering system: (800) 424-5454 or (202) 512-7470
Congressional Relations:
Ralph Dawn, Managing Director, dawnr@gao.gov
(202) 512-4400
U.S. Government Accountability Office
441 G Street NW, Room 7125
Washington, D.C. 20548
Public Affairs:
Chuck Young, Managing Director, youngc1@gao.gov
(202) 512-4800
U.S. Government Accountability Office
441 G Street NW, Room 7149
Washington, D.C. 20548