Data Mining
Early Attention to Privacy in Developing a Key DHS Program Could Reduce Risks
GAO ID: GAO-07-293, February 28, 2007
The government's interest in using technology to detect terrorism and other threats has led to increased use of data mining. A technique for extracting useful information from large volumes of data, data mining offers potential benefits but also raises privacy concerns when the data include personal information. GAO was asked to review the development by the Department of Homeland Security (DHS) of a data mining tool known as ADVISE (Analysis, Dissemination, Visualization, Insight, and Semantic Enhancement). Specifically, GAO was asked to determine (1) the tool's planned capabilities, uses, and associated benefits and (2) whether potential privacy issues could arise from using it to process personal information and how DHS has addressed any such issues. GAO reviewed program documentation and discussed these issues with DHS officials.
ADVISE is a data mining tool under development intended to help DHS analyze large amounts of information. It is designed to allow an analyst to search for patterns in data--such as relationships among people, organizations, and events--and to produce visual representations of these patterns, referred to as semantic graphs. None of the three planned DHS implementations of ADVISE that GAO reviewed are fully operational. (GAO did not review uses of the tool by the DHS Office of Intelligence and Analysis.) The intended benefit of the ADVISE tool is to help detect threatening activities by facilitating the analysis of large amounts of data. DHS is currently in the process of testing the tool's effectiveness. Use of the ADVISE tool raises a number of privacy concerns. DHS has added security controls to the tool; however, it has not assessed privacy risks. Privacy risks that could apply to ADVISE include the potential for erroneous association of individuals with crime or terrorism and the misidentification of individuals with similar names. A privacy impact assessment would identify specific privacy risks and help officials determine what controls are needed to mitigate those risks. ADVISE has not undergone such an assessment because DHS officials believe it is not needed given that the tool itself does not contain personal data. However, the tool's intended uses include applications involving personal data, and the E-Government Act and related guidance emphasize the need to assess privacy risks early in systems development. Further, if an assessment were conducted and privacy risks identified, a number of controls could be built into the tool to mitigate those risks. For example, controls could be implemented to ensure that personal information is used only for a specified purpose or compatible purposes, and they could provide the capability to distinguish among individuals that have similar names to address the risk of misidentification. Because privacy has not been assessed and mitigating controls have not been implemented, DHS faces the risk that ADVISE-based system implementations containing personal information may require costly and potentially duplicative retrofitting at a later date to add the needed controls.
This is the accessible text file for GAO report number GAO-07-293
entitled 'Data Mining: Early Attention to Privacy in Developing a Key
DHS Program Could Reduce Risks' which was released on March 21, 2007.
This text file was formatted by the U.S. Government Accountability
Office (GAO) to be accessible to users with visual impairments, as part
of a longer term project to improve GAO products' accessibility. Every
attempt has been made to maintain the structural and data integrity of
the original printed product. Accessibility features, such as text
descriptions of tables, consecutively numbered footnotes placed at the
end of the file, and the text of agency comment letters, are provided
but may not exactly duplicate the presentation or format of the printed
version. The portable document format (PDF) file is an exact electronic
replica of the printed version. We welcome your feedback. Please E-mail
your comments regarding the contents or accessibility features of this
document to Webmaster@gao.gov.
This is a work of the U.S. government and is not subject to copyright
protection in the United States. It may be reproduced and distributed
in its entirety without further permission from GAO. Because this work
may contain copyrighted images or other material, permission from the
copyright holder may be necessary if you wish to reproduce this
material separately.
Report to the Chairman, Committee on Appropriations, House of
Representatives:
United States Government Accountability Office:
GAO:
February 2007:
Data Mining:
Early Attention to Privacy in Developing a Key DHS Program Could Reduce
Risks:
GAO-07-293:
GAO Highlights:
Highlights of GAO-07-293, a report to the Chairman, Committee on
Appropriations, House of Representatives
Why GAO Did This Study:
The government's interest in using technology to detect terrorism and
other threats has led to increased use of data mining. A technique for
extracting useful information from large volumes of data, data mining
offers potential benefits but also raises privacy concerns when the
data include personal information.
GAO was asked to review the development by the Department of Homeland
Security (DHS) of a data mining tool known as ADVISE (Analysis,
Dissemination, Visualization, Insight, and Semantic Enhancement).
Specifically, GAO was asked to determine (1) the tool's planned
capabilities, uses, and associated benefits and (2) whether potential
privacy issues could arise from using it to process personal
information and how DHS has addressed any such issues. GAO reviewed
program documentation and discussed these issues with DHS officials.
What GAO Found:
ADVISE is a data mining tool under development intended to help DHS
analyze large amounts of information. It is designed to allow an
analyst to search for patterns in data--such as relationships among
people, organizations, and events--and to produce visual representations
of these patterns, referred to as semantic graphs (see fig.). None of
the three planned DHS implementations of ADVISE that GAO reviewed are
fully operational. (GAO did not review uses of the tool by the DHS
Office of Intelligence and Analysis.) The intended benefit of the
ADVISE tool is to help detect threatening activities by facilitating
the analysis of large amounts of data. DHS is currently in the process
of testing the tool's effectiveness.
Use of the ADVISE tool raises a number of privacy concerns. DHS has
added security controls to the tool; however, it has not assessed
privacy risks. Privacy risks that could apply to ADVISE include the
potential for erroneous association of individuals with crime or
terrorism and the misidentification of individuals with similar names.
A privacy impact assessment would identify specific privacy risks and
help officials determine what controls are needed to mitigate those
risks. ADVISE has not undergone such an assessment because DHS
officials believe it is not needed given that the tool itself does not
contain personal data. However, the tool's intended uses include
applications involving personal data, and the E-Government Act and
related guidance emphasize the need to assess privacy risks early in
systems development. Further, if an assessment were conducted and
privacy risks identified, a number of controls could be built into the
tool to mitigate those risks. For example, controls could be
implemented to ensure that personal information is used only for a
specified purpose or compatible purposes, and they could provide the
capability to distinguish among individuals that have similar names to
address the risk of misidentification. Because privacy has not been
assessed and mitigating controls have not been implemented, DHS faces
the risk that ADVISE-based system implementations containing personal
information may require costly and potentially duplicative retrofitting
at a later date to add the needed controls.
What GAO Recommends:
To ensure that privacy protections are in place, GAO is recommending
that the Secretary of Homeland Security immediately conduct a privacy
impact assessment of the ADVISE tool and implement privacy controls, as
needed, to mitigate any identified risks.
DHS generally agreed with the content of this report and described
actions initiated to address GAO's recommendations.
[Hyperlink, http://www.gao.gov/cgi-bin/getrpt?GAO-07-293].
To view the full product, including the scope and methodology, click on
the link above. For more information, contact Linda Koontz at (202) 512-
6240 or koontzl@gao.gov.
[End of section]
Contents:
Letter:
Results in Brief:
Background:
ADVISE Is Intended to Help Identify Patterns of Interest to Homeland
Security Analysts:
DHS Has Not Yet Addressed Key Privacy Risks Associated with Expected
Uses of the ADVISE Tool:
Conclusions:
Recommendations for Executive Action:
Agency Comments and Our Evaluation:
Appendix I: Objectives, Scope, and Methodology:
Appendix II: Comments from the Department of Homeland Security:
Appendix III: GAO Contact and Staff Acknowledgments:
Table:
Table 1: Fair Information Practices:
Figures:
Figure 1: An Overview of the Data Mining Process:
Figure 2: Major Elements and Functions of ADVISE:
Figure 3: Typical Semantic Graph:
Abbreviations:
ADVISE: Analysis, Dissemination, Visualization, Insight, and Semantic
Enhancement:
DHS: Department of Homeland Security:
ICAHST: Interagency Center for Applied Homeland Security Technology:
OECD: Organization for Economic Cooperation and Development:
OMB: Office of Management and Budget:
PIA: privacy impact assessment:
[End of section]
United States Government Accountability Office:
Washington, DC 20548:
February 28, 2007:
The Honorable David R. Obey:
Chairman, Committee on Appropriations:
House of Representatives:
Dear Mr. Chairman:
Since the terrorist attacks of September 11, 2001, there has been an
increasing focus on the need to prevent and detect terrorist threats
through technological means. Data mining--a technique for extracting
useful information from large volumes of data--is one type of analysis
that has been used increasingly by the government to help detect
terrorist threats. While data mining offers a number of promising
benefits, its use also raises privacy concerns when the data include
personal information.[Footnote 1]
Federal agency use of personal information is governed primarily by the
Privacy Act of 1974 and the E-Government Act of 2002, which prescribe
specific activities that agencies must perform to protect privacy, such
as (1) ensuring that personal information is used only for a specified
purpose, or related purposes, and that it is accurate for those
purposes and (2) conducting assessments of privacy risks associated
with information technology used to process personal information, known
as privacy impact assessments.[Footnote 2] Agencies that wish to reap
the potential benefits of data mining are faced with the challenge of
implementing adequate privacy controls for the systems that they use to
perform these analyses.
You asked us to review the Department of Homeland Security's (DHS)
development of an analytical tool known as Analysis, Dissemination,
Visualization, Insight, and Semantic Enhancement (ADVISE).
Specifically, we agreed with your staff that our objectives were to
determine (1) the planned capabilities, uses, and associated benefits
of the ADVISE tool and (2) whether potential privacy issues could arise
from using ADVISE to process personal information and how DHS has
addressed any such issues. Our review did not include intelligence
applications, such as uses of the tool by the DHS Office of
Intelligence and Analysis.
To address our first objective, we identified and analyzed the ADVISE
tool's planned capabilities, uses, and associated benefits. We reviewed
program documentation, including annual program execution plans, and
interviewed agency officials responsible for managing and implementing
the program. We also interviewed officials at DHS components that have
begun to implement the tool[Footnote 3] in order to identify their
current or planned uses, the progress of their implementation, and the
benefits they hope to gain.
To address our second objective, we searched for potential privacy
concerns by reviewing relevant reports, including prior GAO reports and
the DHS Privacy Office 2006 report on data mining.[Footnote 4] We
identified and analyzed actions to comply with the Privacy Act of 1974
and the E-Government Act of 2002. We also interviewed technical experts
within the DHS Science and Technology Directorate and personnel
responsible for implementing ADVISE at DHS components to assess privacy
controls included in the ADVISE tool, as well as the quality assurance
processes for data analyzed using ADVISE. We performed our work from
June 2006 to December 2006 in the Washington, D.C., metropolitan area
and Laurel, Maryland. Our work was performed in accordance with
generally accepted government auditing standards. Our objectives,
scope, and methodology are discussed in more detail in appendix I.
Results in Brief:
ADVISE is a data mining tool under development that is intended to
facilitate the analysis of large amounts of data. It is designed to
accommodate both structured data (such as information in a database)
and unstructured data (such as e-mail texts, reports, and news
articles) and to allow an analyst to search for patterns in data,
including relationships among entities (such as people, organizations,
and events), and to produce visual representations of these patterns,
referred to as semantic graphs. Although none are fully operational,
DHS's planned uses of this tool include implementations at four
departmental components (including Immigration and Customs Enforcement
and other components).[Footnote 5] DHS is also considering further
deployments of ADVISE. The intended benefit of the ADVISE tool is to
help detect activities that threaten the United States by facilitating
the analysis of large amounts of data that otherwise would be very
difficult to review. DHS is currently in the process of testing the
tool's effectiveness.
Use of the ADVISE tool raises a number of privacy concerns. DHS has
added security controls to the ADVISE tool, including access
restrictions, authentication procedures, and security auditing
capability. However, it has not assessed privacy risks. Privacy risks
that could apply to ADVISE include the potential for erroneous
association of individuals with crime or terrorism, the
misidentification of individuals with similar names, and the use of
data that were collected for other purposes. A privacy impact
assessment would determine the specific privacy risks associated with
ADVISE and help officials determine what controls are needed to
mitigate those risks. Although DHS officials are considering conducting
a modified version of such an assessment, the ADVISE tool has not yet
been assessed because department officials believe it is not needed
given that the ADVISE tool itself does not contain personal data.
However, the tool's intended uses include applications involving
personal information, and the E-Government Act, as well as related
Office of Management and Budget and DHS guidance, emphasize the need to
assess privacy risks early in systems development. Further, if a
privacy impact assessment were conducted now and privacy risks
identified, a number of controls exist that could be built into the
tool to mitigate those risks. For example, controls could be
implemented to ensure that personal information is used only for a
specified purpose or compatible purposes, or they could provide the
capability to distinguish among individuals that have similar names (a
process known as disambiguation) to address the risk of
misidentification. Because privacy risks such as these have not been
assessed and decisions about mitigating controls have not been made,
DHS faces the likelihood that ADVISE-based system implementations
containing personal information may require costly and potentially
duplicative retrofitting at a later date to add the needed privacy
controls.
To ensure that privacy protections are in place before DHS proceeds
with implementations of systems based on ADVISE, we are recommending
that the Secretary of Homeland Security immediately conduct a privacy
impact assessment of the ADVISE tool to identify privacy risks and
implement privacy controls to mitigate those risks.
We obtained oral and written comments on a draft of this report from
DHS. In its comments DHS generally agreed with the content of this
report and described actions initiated to address our recommendations.
Background:
As defined in a report that we issued in May 2004,[Footnote 6] data
mining is the application of database technology and techniques--such
as statistical analysis and modeling--to uncover hidden patterns and
subtle relationships in data and to infer rules that allow for the
prediction of future results. This definition is based on the most
commonly used terms found in a survey of the technical literature.
Data mining has been used successfully for a number of years in the
private and public sectors in a broad range of applications. In the
private sector, these applications include customer relationship
management, market research, retail and supply chain analysis, medical
analysis and diagnostics, financial analysis, and fraud detection. In
the government, data mining has been used to detect financial fraud and
abuse. For example, we used data mining to identify fraud and abuse in
expedited assistance and other disbursements to Hurricane Katrina
victims.[Footnote 7]
Although the characteristics of data mining efforts can vary greatly,
data mining generally incorporates three processes: data input, data
analysis, and results output. In data input, data are collected in a
central data "warehouse," validated, and formatted for use in data
mining. In the data analysis phase, data are typically queried to find
records that match topics of interest. The two most common types of
queries are pattern-based queries and subject-based queries:
* Pattern-based queries search for data elements that match or depart
from a predetermined pattern (e.g., unusual claim patterns in an
insurance program).
* Subject-based queries search for any available information on a
predetermined subject using a specific identifier. This could be
personal information such as an individual identifier (e.g., an
individual's name or Social Security number) or an identifier for a
specific object or location. For example, the Navy uses subject-based
data mining to identify trends in the failure rate of parts used in its
ships.
The data analysis phase can be iterative, with the results of one query
being used to refine criteria for a subsequent query. The output phase
can produce results in printed or electronic format. These reports can
be accessed by agency personnel and can also be shared with personnel
from other agencies. Figure 1 depicts a generic data mining process.
Figure 1: An Overview of the Data Mining Process:
[See PDF for image]
Source: GAO, adapted from Vipin Kumar and Mohammed J. Zaki.
[End of figure]
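To illustrate the distinction between the two query types described
above, the following minimal sketch (written in Python) shows one way
pattern-based and subject-based searches could be expressed against a
generic set of records. The record fields, threshold, and function
names are illustrative assumptions, not a description of any particular
agency system.

    # Minimal sketch: pattern-based vs. subject-based queries over generic
    # records. Field names and the threshold are illustrative assumptions.
    claims = [
        {"claimant": "A. Smith", "ssn": "123-45-6789", "claims_this_year": 2},
        {"claimant": "B. Jones", "ssn": "987-65-4321", "claims_this_year": 14},
    ]

    def pattern_based_query(records, max_claims_per_year=10):
        """Return records that depart from a predetermined pattern,
        such as an unusually high number of claims in one year."""
        return [r for r in records if r["claims_this_year"] > max_claims_per_year]

    def subject_based_query(records, identifier):
        """Return any available records tied to a specific identifier,
        such as a name or Social Security number."""
        return [r for r in records if identifier in (r["claimant"], r["ssn"])]

    flagged = pattern_based_query(claims)                      # unusual claim activity
    about_jones = subject_based_query(claims, "987-65-4321")   # one known subject

In the iterative analysis phase described above, results such as the
flagged records could then be used to refine the criteria for a
subsequent query.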
In recent years, data mining has emerged as a prevalent government
mechanism for processing and analyzing large amounts of data. In our
May 2004 report, we noted that 52 agencies were using or were planning
to use data mining in 199 cases, of which 68 were planned and 131 were
operational. Additionally, following the terrorist attacks of September
11, 2001, data mining has been used increasingly as a tool to help
detect terrorist threats through the collection and analysis of public
and private sector data. This may include tracking terrorist
activities, including money transfers and communications, and tracking
terrorists themselves through travel and immigration records. According
to an August 2006 DHS Office of Inspector General survey of
departmental data mining initiatives,[Footnote 8] DHS is using or
developing 12 data mining programs, 9 of which are fully operational
and 3 of which are still under development.
One such effort is the ADVISE technology program. Managed by the DHS
Science and Technology Directorate,[Footnote 9] the ADVISE program is
primarily responsible for (1) continuing to develop the ADVISE data
mining tool and (2) promoting and supporting its implementation
throughout DHS. According to program officials, it has spent
approximately $40 million to develop the tool since 2003.
To promote the possible implementation of the tool within DHS component
organizations, program officials have made demonstrations (using
unclassified data) to interested officials, highlighting the tool's
planned capabilities and expected benefits. Program officials have
established working relationships with component organizations that are
considering adopting the tool, including detailing staff (typically
contractor-provided) to them to assist in the setup and customization
of their ADVISE implementations and providing training for
the analysts who are to use it.
Program officials project that implementation of the tool at a
component organization should generally consist of six main phases and
take approximately 12 to 18 months to complete. The six phases are as
follows:
* preparing infrastructure and installing hardware and software;
* modeling information sources and loading data;
* verifying and validating that loaded data are accurate and
accessible;
* training and familiarizing analysts and assisting in the development
of initial research activities using visualization tools;
* supporting analysts in identifying the best ways to use ADVISE for
their problems, obtaining data, and developing ideas for further
improvements; and:
* turning over deployment to the component organizations to maintain
the system and its associated data feeds.
The program has also provided initial funding for the setup,
customization, and pilot testing of implementations within components,
under the assumption that when an implementation achieves operational
status, the respective component will take over operations and
maintenance costs. Program officials estimate that the tool's
operations and maintenance costs will be approximately $100,000 per
year, per analyst. The program has also offered additional support to
components implementing the tool, such as helping them develop privacy
compliance documentation. According to DHS officials, the program has
spent $12.15 million of its $40 million in support of several pilot
projects and test implementations throughout the department.
Currently, the department's Interagency Center for Applied Homeland
Security Technology (ICAHST) group within the Science and Technology
Directorate is testing the tool's effectiveness, adequacy, and cost-
effectiveness as a data mining technology. ICAHST has completed
preliminary testing of basic functionality and is currently in the
process of testing the system's effectiveness, using mock data to test
how well ADVISE identifies specified patterns of interest.
Privacy Concerns Have Been Raised Regarding Data Mining:
The impact of computer systems on the ability of organizations to
protect personal information was recognized as early as 1973, when a
federal advisory committee on automated personal data systems observed
that "The computer enables organizations to enlarge their data
processing capacity substantially, while greatly facilitating access to
recorded data, both within organizations and across boundaries that
separate them." In addition, the committee concluded that "The net
effect of computerization is that it is becoming much easier for record-
keeping systems to affect people than for people to affect record-
keeping systems."[Footnote 10]
In May 2004, we reported that mining government and private databases
containing personal information creates a range of privacy
concerns.[Footnote 11] Through data mining, agencies can quickly and
efficiently obtain information on individuals or groups by searching
large databases containing personal information aggregated from public
and private records. Information can be developed about a specific
individual or a group of individuals whose behavior or characteristics
fit a specific pattern. The ease with which organizations can use
automated systems to gather and analyze large amounts of previously
isolated information raises concerns about the impact on personal
privacy.
Further, we reported in August 2005[Footnote 12] that although agencies
responsible for certain data mining efforts took many of the key steps
required by federal law and executive branch guidance for the
protection of personal information, none followed all key procedures.
Specifically, while three of the four agencies we reviewed had prepared
privacy impact assessments (PIA)--assessments of privacy risks
associated with information technology used to process personal
information--for their data mining systems, none of them had completed
a PIA that adequately addressed all applicable statutory requirements.
We recommended that four agencies complete or revise PIAs for their
systems to fully comply with applicable guidance. As of December 2006,
three of the four agencies reported that they had taken action to
complete or revise their PIAs.
Federal Laws and Guidance Define Steps to Protect Privacy of Personal
Information:
Federal law includes a number of separate statutes that provide privacy
protections for information used for specific purposes or maintained by
specific types of entities. The major requirements for the protection
of personal privacy by federal agencies come from two laws, the Privacy
Act of 1974 and the privacy provisions of the E-Government Act of
2002. The Office of Management and Budget (OMB) is tasked with
providing guidance to agencies on how to implement the provisions of
both laws and has done so, beginning with guidance on the Privacy Act,
issued in 1975.
The Privacy Act places limitations on agencies' collection, disclosure,
and use of personal information maintained in systems of records. The
act describes a "record" as any item, collection, or grouping of
information about an individual that is maintained by an agency and
contains his or her name or another personal identifier. It also
defines "system of records" as a group of records under the control of
any agency from which information is retrieved by the name of the
individual or by an individual identifier. The Privacy Act requires
that when agencies establish or make changes to a system of records,
they must notify the public through a "system of records notice"--that
is, a notice in the Federal Register identifying, among other things,
the type of data collected, the types of individuals about whom
information is collected, the intended "routine" uses of data, and
procedures that individuals can use to review and correct personal
information.[Footnote 13] In addition, the act requires agencies to
publish in the Federal Register notice of any new or intended use of
the information in the system, and provide an opportunity for
interested persons to submit written data, views, or arguments to the
agency.
Several provisions of the act require agencies to define and limit
themselves to specific predefined purposes. For example, the act
requires that to the greatest extent practicable, personal information
should be collected directly from the subject individual when it may
affect an individual's rights or benefits under a federal program. The
act also requires that an agency inform individuals whom it asks to
supply information of (1) the authority for soliciting the information
and whether disclosure of such information is mandatory or voluntary;
(2) the principal purposes for which the information is intended to be
used; (3) the routine uses that may be made of the information; and (4)
the effects on the individual, if any, of not providing the
information. In addition, the act requires that each agency that
maintains a system of records store only such information about an
individual as is relevant and necessary to accomplish a purpose of the
agency.
Agencies are allowed to claim exemptions from some of the provisions of
the Privacy Act if the records are used for certain purposes. For
example, records compiled for criminal law enforcement purposes can be
exempt from a number of provisions, including (1) the requirement to
notify individuals of the purposes and uses of the information at the
time of collection and (2) the requirement to ensure the accuracy,
relevance, timeliness, and completeness of records. In general, the
exemptions for law enforcement purposes are intended to prevent the
disclosure of information collected as part of an ongoing investigation
that could impair the investigation or allow those under investigation
to change their behavior or take other actions to escape prosecution.
The E-Government Act of 2002 strives to enhance protection for personal
information in government information systems or information
collections by requiring that agencies conduct PIAs. As described
earlier, a PIA is an analysis of how personal information is collected,
stored, shared, and managed in a federal system. More specifically,
according to OMB guidance,[Footnote 14] a PIA is an analysis of how:
...information is handled: (i) to ensure handling conforms to
applicable legal, regulatory, and policy requirements regarding
privacy; (ii) to determine the risks and effects of collecting,
maintaining, and disseminating information in identifiable form in an
electronic information system; and (iii) to examine and evaluate
protections and alternative processes for handling information to
mitigate potential privacy risks.
Agencies must conduct PIAs before (1) developing or procuring
information technology that collects, maintains, or disseminates
information that is in a personally identifiable form or (2) initiating
any new data collections involving personal information that will be
collected, maintained, or disseminated using information technology if
the same questions are asked of 10 or more people. OMB guidance also
requires agencies to conduct PIAs in two specific types of situations:
(1) when, as a result of the adoption or alteration of business
processes, government databases holding information in personally
identifiable form are merged, centralized, matched with other
databases, or otherwise significantly manipulated and (2) when agencies
work together on shared functions involving significant new uses or
exchanges of information in personally identifiable form.[Footnote 15]
DHS has also developed its own guidance[Footnote 16] requiring PIAs to
be performed when one of its offices is developing or procuring any new
technologies or systems, including classified systems, that handle or
collect personally identifiable information. It also requires that PIAs
be performed before pilot tests are begun for these systems or when
significant modifications are made to them. Furthermore, DHS has
prescribed detailed requirements for PIAs. For example, PIAs must
describe all uses of the information, and whether the system analyzes
data in order to identify previously unknown patterns or areas of note
or concern.
Fair Information Practices:
The Privacy Act of 1974 is largely based on a set of internationally
recognized principles for protecting the privacy and security of
personal information known as the Fair Information Practices. A U.S.
government advisory committee first proposed the practices in 1973 to
address what it termed a poor level of protection afforded to privacy
under contemporary law.[Footnote 17] The Organization for Economic
Cooperation and Development (OECD)[Footnote 18] developed a revised
version of the Fair Information Practices in 1980 that has, with some
variation, formed the basis of privacy laws and related policies in
many countries, including the United States, Germany, Sweden,
Australia, New Zealand, and the European Union.[Footnote 19] The eight
principles of the OECD Fair Information Practices are shown in table 1.
Table 1: Fair Information Practices:
Principle: Collection limitation;
Description: The collection of personal information should be limited,
should be obtained by lawful and fair means, and, where appropriate,
with the knowledge or consent of the individual.
Principle: Data quality;
Description: Personal information should be relevant to the purpose for
which it is collected, and should be accurate, complete, and current as
needed for that purpose.
Principle: Purpose specification;
Description: The purposes for the collection of personal information
should be disclosed before collection and upon any change to that
purpose, and its use should be limited to those purposes and compatible
purposes.
Principle: Use limitation;
Description: Personal information should not be disclosed or otherwise
used for other than a specified purpose without consent of the
individual or legal authority.
Principle: Security safeguards;
Description: Personal information should be protected with reasonable
security safeguards against risks such as loss or unauthorized access,
destruction, use, modification, or disclosure.
Principle: Openness;
Description: The public should be informed about privacy policies and
practices, and individuals should have ready means of learning about
the use of personal information.
Principle: Individual participation;
Description: Individuals should have the following rights: to know
about the collection of personal information, to access that
information, to request correction, and to challenge the denial of
those rights.
Principle: Accountability;
Description: Individuals controlling the collection or use of personal
information should be accountable for taking steps to ensure the
implementation of these principles.
Source: OECD.
[End of table]
The Fair Information Practices are not precise legal requirements.
Rather, they provide a framework of principles for balancing the need
for privacy with other public policy interests, such as national
security, law enforcement, and administrative efficiency. Ways to
strike that balance vary among countries and according to the type of
information under consideration.
ADVISE Is Intended to Help Identify Patterns of Interest to Homeland
Security Analysts:
ADVISE is a data mining tool under development that is intended to
facilitate the analysis of large amounts of data. It is designed to
accommodate both structured data (such as information in a database)
and unstructured data (such as e-mail texts, reports, and news
articles) and to allow an analyst to search for patterns in data,
including relationships among entities (such as people, organizations,
and events) and to produce visual representations of these patterns,
referred to as semantic graphs. Although none are fully operational,
DHS's planned uses of this tool include implementations at several
departmental components, including Immigration and Customs Enforcement
and other components. DHS is also considering further deployments of
ADVISE. The intended benefit of the ADVISE tool is to help detect
activities that threaten the United States by facilitating the analysis
of large amounts of data that otherwise would be prohibitively
difficult to review. DHS is currently in the process of testing the
tool's effectiveness.
The ADVISE Tool Provides Analytical Capabilities Intended to Identify
Patterns of Interest to DHS Analysts:
ADVISE provides several capabilities that help to find and track
relationships in data. These include graphically displaying the results
of searches and providing automated alerts when predefined patterns of
interest emerge in the data. The tool consists of three main elements-
-the Information Layer, Knowledge Layer, and Application Layer
(depicted in fig. 2).
Figure 2: Major Elements and Functions of ADVISE:
[See PDF for image]
Source: DHS.
[End of figure]
Information Layer:
At the Information Layer, disparate data are brought into the tool from
various sources. These data sources can be both structured (such as
computerized databases and watch lists) and unstructured (such as news
feeds and text reports). For structured data, ADVISE contains software
applications that load the data into the Information Layer and format
it to conform to a specific predefined data structure, known as an
ontology. Generally speaking, ontologies define entities (such as a
person or place), attributes (such as name and address), and the
relationships among them.
For unstructured data, ADVISE includes several tools that extract
information about entities and attributes. As with structured data, the
output of these analyses is formatted and structured according to an
ontology. Tagging information as specific entities and attributes is
more difficult with unstructured data, and ADVISE includes tools that
allow analysts to manually identify entities, attributes, and
relationships among them. According to DHS officials, research is
continuing on developing efficient and effective mechanisms for
inputting different forms of unstructured data.
ADVISE can also include information about the data--known as
"metadata"--such as the time period to which the data pertain and
whether the data refer to a U.S. person. ADVISE metadata also include
confidence attributes, ranging from 1 to -1, which represent subjective
assessments of the accuracy of the data. Each data source has a
predefined confidence attribute. Analysts can change the confidence
attribute of specific data, but changes to confidence levels are
tracked and linked to the analysts making the changes.
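One way to picture data that have been formatted to an ontology,
together with the metadata and confidence attributes described above,
is the following minimal sketch in Python. The field names and the
example change-tracking function are assumptions made for illustration;
they are not the ADVISE data structures themselves.

    # Illustrative sketch of an ontology-formatted record with metadata.
    from dataclasses import dataclass, field

    @dataclass
    class Entity:
        entity_type: str      # e.g., "person", "organization", "event"
        attributes: dict      # e.g., {"name": "...", "address": "..."}
        source: str           # originating data source
        time_period: str      # period to which the data pertain
        us_person: bool       # whether the data refer to a U.S. person
        confidence: float = 1.0                         # ranges from 1 to -1
        changed_by: list = field(default_factory=list)  # analysts who changed it

    def set_confidence(entity, new_value, analyst_id):
        """Change a confidence attribute while recording which analyst
        made the change, mirroring the tracking described above."""
        entity.confidence = new_value
        entity.changed_by.append(analyst_id)
        return entity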
Knowledge Layer:
At the Knowledge Layer, facts and relationships from the Information
Layer are consolidated into a large-scale semantic graph and various
subgraphs. Semantic graphing is a data modeling technique that uses a
combination of "nodes," representing specific entities, and connecting
lines, representing the relationships among them. Because they are well-
suited to representing data relationships and linkages, semantic graphs
have emerged as a key technology for consolidating and organizing
disparate data. Figure 3 represents the format that a typical semantic
graph could take. The Knowledge Layer contains the semantic graph of
all facts reported through the Information Layer interface and
organized according to the ontology.
Figure 3: Typical Semantic Graph:
[See PDF for image]
Source: GAO.
[End of figure]
The Knowledge Layer also includes the capability to provide automatic
alerts to analysts when patterns of interest (or partial patterns) are
matched by new incoming information.
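A semantic graph of the kind described above can be pictured, in highly
simplified form, as a set of nodes and labeled edges, with an alert
raised when incoming facts complete a pattern of interest. The following
Python sketch uses hypothetical entities and relationship labels; it
illustrates the concept, not the ADVISE implementation.

    # Simplified sketch of a semantic graph: nodes are entities, edges are
    # labeled relationships. Entities and labels are hypothetical.
    semantic_graph = {
        "nodes": {"person:J. Doe": {"type": "person"},
                  "org:Acme Export Co.": {"type": "organization"},
                  "event:Flight 123": {"type": "event"}},
        "edges": [("person:J. Doe", "employed_by", "org:Acme Export Co."),
                  ("person:J. Doe", "traveled_on", "event:Flight 123")],
    }

    def matches_pattern(graph, pattern_edges):
        """Return True when every (entity, relationship, entity) triple in a
        predefined pattern of interest is present in the graph."""
        return all(edge in graph["edges"] for edge in pattern_edges)

    pattern_of_interest = [("person:J. Doe", "traveled_on", "event:Flight 123")]
    if matches_pattern(semantic_graph, pattern_of_interest):
        print("Alert: new information matches a pattern of interest")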
Application Layer:
At the Application Layer, analysts are able to interact with the data
that reside in the Knowledge Layer. The Application Layer contains
tools that allow analysts to perform both pattern-based and subject-
based queries and to search for data that match a specific pattern, as
well as data that are connected with a specific entity. For example,
analysts could search for all of the individuals who have traveled to a
certain destination within a given period of time, or they could search
for all information connected with a particular person, place, or
organization. The resulting output of these searches is then
graphically displayed via semantic graphs.
ADVISE's Application Layer also provides several other capabilities
that allow for the further examination and adjustment of its output. An
analyst can pinpoint nodes on a semantic graph to view and examine
additional information related to them, including the source from which
the information and relationships are derived, the data source's
confidence level, and whether the data pertain to U.S. persons.
The ADVISE Application Layer also provides analysts the ability to
monitor patterns of interest in the data. Science and Technology
Directorate staff work with component staff to define patterns of
interest and build an inventory of automated searches. These patterns
are continuously being monitored in the data, and an alert is provided
whenever there is a match. For example, an analyst could define a
pattern of interest as "all individuals traveling from the United
States to the Middle East in the next 6 months" and have the ADVISE
tool provide an alert whenever this pattern emerges in the data.
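The continuous monitoring described above can be sketched as a simple
check that runs whenever new data arrive. The travel pattern below
mirrors the example in the text; the record fields and the six-month
window are illustrative assumptions.

    # Illustrative sketch of an automated alert on a defined pattern of interest.
    from datetime import date, timedelta

    pattern = {"origin": "United States", "region": "Middle East",
               "window_end": date.today() + timedelta(days=183)}   # about 6 months

    def check_alert(travel_record, pattern):
        """Return True when an incoming record matches the monitored pattern."""
        return (travel_record["origin"] == pattern["origin"]
                and travel_record["region"] == pattern["region"]
                and travel_record["travel_date"] <= pattern["window_end"])

    incoming = {"traveler": "J. Doe", "origin": "United States",
                "region": "Middle East",
                "travel_date": date.today() + timedelta(days=30)}
    if check_alert(incoming, pattern):
        print("Alert for analyst: pattern of interest matched")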
ADVISE Is Expected to Benefit DHS by Helping to Detect Potentially
Threatening Activities:
The current planned uses of the ADVISE tool include implementations at
several DHS components that are planning to use it in a variety of
homeland security applications to further their respective
organizational missions. Currently none of these implementations is
fully operational or widely accessible to DHS analysts. Rather, they
are all still in various phases of systems development. These
applications are expected to use the tool primarily to help analysts
detect threats to the United States, such as identifying activities
and/or individuals that could be associated with terrorism.
The intended benefit of the ADVISE tool is to consolidate large amounts
of structured and unstructured data and permit their analysis and
visualization. The tool could thus assist analysts to identify and
monitor patterns of interest that could be further investigated and
might otherwise have been missed.
None of the DHS components have fully implemented the tool in
operational systems and, as discussed earlier, testing of the tool is
still under way. Until such testing is complete and component
implementations are fully operational, the intended benefit remains
largely potential.
DHS Has Not Yet Addressed Key Privacy Risks Associated with Expected
Uses of the ADVISE Tool:
Use of the ADVISE tool raises a number of privacy concerns. DHS has
added security controls to the ADVISE tool, including access
restrictions, authentication procedures, and security auditing
capability. However, it has not assessed privacy risks. Privacy risks
that could apply to ADVISE include the potential for erroneous
association of individuals with crime or terrorism through data that
are not accurate for that purpose, the misidentification of individuals
with similar names, and the use of data that were collected for other
purposes. A PIA would determine the privacy risks associated with
ADVISE and help officials determine what specific controls are needed
to mitigate those risks. Although department officials believe a PIA is
not needed given that the ADVISE tool itself does not contain personal
data, the E-Government Act of 2002 and related federal guidance require
the completion of PIAs from the early stages of development. Further,
if a PIA were conducted and privacy risks identified, a number of
controls exist that could be built into the tool to mitigate those
risks. For example, controls could be implemented to ensure that
personal information is used only for a specified purpose or compatible
purposes, or they could provide the capability to distinguish among
individuals that have similar names (a process known as disambiguation)
to address the risk of misidentification. Because privacy risks such as
these have not been assessed and decisions about mitigating controls
have not been made, DHS faces the likelihood that system
implementations based on the tool may require costly and potentially
duplicative retrofitting at a later date to add the needed controls.
Potential Privacy Concerns Arise with the Use of the ADVISE Tool to
Process Personal Information:
Like other data mining applications, the use of the ADVISE tool in
conjunction with personal information raises concerns about a number of
privacy risks that could potentially have an adverse impact on
individuals. As the DHS Privacy Office's July 2006 report on data
mining activities notes, "privacy and civil liberties issues
potentially arise in every phase of the data mining process."[Footnote
20]
Potential privacy risks can be categorized in relation to the Fair
Information Practices, which, as discussed earlier, form the basis for
privacy laws such as the Privacy Act. For example, the potential for
personal information to be improperly accessed or disclosed relates to
the security safeguards principle, which states that personal
information should be protected against risks such as loss or
unauthorized access, destruction, use, modification, or disclosure.
Further, the potential for individuals to be misidentified or
erroneously associated with inappropriate activities is inconsistent
with the data quality principle that personal data should be accurate,
complete, and current, as needed for a given purpose. Similarly, the
risk that information could be used beyond the scope originally
specified is based on the purpose specification and use limitation
principles, which state that, among other things, personal information
should only be collected and used for a specific purpose and that such
use should be limited to the specified purpose and compatible purposes.
Like other data mining applications, the ADVISE tool could misidentify
or erroneously associate an individual with undesirable activity such
as fraud, crime, or terrorism--a result known as a false positive.
False positives may be the result of poor data quality, or they could
result from the inability of the system to distinguish among
individuals with similar names. Data quality, the principle that data
should be accurate, current, and complete as needed for a given
purpose, could be particularly difficult to ensure with regard to
ADVISE because the tool brings together multiple, disparate data
sources, some of which may be more accurate for the analytical purpose
at hand than others. If data being analyzed by the tool were never
intended for such a purpose or are not accurate for that purpose, then
conclusions drawn from such an analysis would also be erroneous.
Another privacy risk is the potential for use of the tool to extend
beyond the scope of what it was originally designed to address, a
phenomenon commonly referred to as function or mission "creep." Because
it can facilitate a broad range of potential queries and analyses and
aggregate large quantities of previously isolated pieces of
information, ADVISE could produce aggregated, organized information
that organizations could be tempted to use for purposes beyond that
which was originally specified when the information was collected. The
risks associated with mission creep are relevant to the purpose
specification and use limitation principles.
DHS Has Implemented Security Controls but Has Not Yet Assessed Privacy
Risks:
To address security, DHS has included several types of controls in
ADVISE. These include authentication procedures, access controls, and
security auditing capability. For example, an analyst must provide a
valid user name and password in order to gain access to the tool.
Further, upon gaining access, only users with appropriate security
clearances may view sensitive data sets. Each service requested by a
user--such as issuing a query or retrieving a document--is checked
against the user's credentials and access authorization before it is
provided. In addition, these user requests and the tool's responses to
them are all recorded in an audit log.
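The credential and clearance checks and the audit logging described
above can be illustrated with the following short Python sketch. The
user store, clearance labels, and log format are hypothetical; they show
the kind of control involved, not the controls actually built into
ADVISE.

    # Illustrative sketch of an access check with audit logging.
    import logging

    logging.basicConfig(filename="audit.log", level=logging.INFO,
                        format="%(asctime)s %(message)s")

    USERS = {"analyst01": {"password": "example-only", "clearance": "secret"}}
    DATASET_CLEARANCE = {"watchlist_extract": "secret", "news_feed": "unclassified"}
    LEVELS = ["unclassified", "secret"]   # lowest to highest

    def authorized(user_id, password, dataset):
        """Check credentials and clearance before servicing a request, and
        record the request and its outcome in an audit log."""
        user = USERS.get(user_id)
        granted = (user is not None
                   and user["password"] == password
                   and LEVELS.index(user["clearance"])
                       >= LEVELS.index(DATASET_CLEARANCE[dataset]))
        logging.info("user=%s dataset=%s granted=%s", user_id, dataset, granted)
        return granted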
While inclusion of controls such as these is a key step in guarding
against unauthorized access, use, disclosure, or modification, such
controls alone do not address the full range of potential privacy
risks. The need to evaluate such risks early in the development of
information technology is consistently reflected in both law (the E-
Government Act of 2002) and related federal guidance. The E-Government
Act requires that a PIA be performed before an agency develops or
procures information technology that collects, maintains, or
disseminates information in a personally identifiable form. Further,
both OMB and DHS PIA guidance emphasize the need to assess privacy
risks from the early stages of development.[Footnote 21]
However, although DHS officials are considering performing a PIA, no
PIA or other privacy risk assessment has yet been conducted. The DHS
Privacy Office[Footnote 22] instructed the Science and Technology
Directorate that a PIA was not required because the tool alone did not
contain personal data.[Footnote 23] According to the Privacy Office
rationale, only specific system implementations based on ADVISE that
contained personal data would likely require PIAs, and only at the time
they first began to use such data. However, guidance on conducting PIAs
makes it clear that they should be performed at the early stages of
development. OMB's PIA guidance requires PIAs at the IT development
stage, stating that they "should address the impact the system will
have on an individual's privacy, specifically identifying and
evaluating potential threats relating to elements identified [such as
the nature, source, and intended uses of the information] to the extent
these elements are known at the initial stages of development."
Regarding ADVISE, the tool's intended uses include applications
containing personal information. Thus the requirement to conduct a PIA
from the early stages of development applies.
As of November 2006, the ADVISE program office and DHS Privacy Office
were in discussions regarding the possibility of conducting a privacy
assessment similar to a PIA but modified to address the development of
a technological tool. No final decision has yet been made on whether or
how to proceed with a PIA. However, until such an assessment is
performed, DHS cannot be assured that privacy risks have been
identified or will be mitigated for system implementations based on the
tool.
Privacy Protection Controls to Mitigate Identified Risks Exist and
Could Be Built into ADVISE:
A variety of privacy controls can be built into data mining software
applications, including the ADVISE tool, to help mitigate risks
identified in PIAs and protect the privacy of individuals whose
information may be processed. DHS has recognized the importance of
implementing such privacy protections when data mining applications are
being developed. Specifically, in its July 2006 report, the DHS Privacy
Office recommended instituting controls for data mining activities that
go beyond conducting PIAs and implementing standard security controls.
Such measures could be applied to the development of the ADVISE
tool.[Footnote 24] Among other things, the DHS Privacy Office
recommended that DHS components use data mining tools principally as
investigative tools and not as a means of making automated decisions
regarding individuals.[Footnote 25] The report also emphasizes that
data mining should produce accurate results and recommends that DHS
adopt data quality standards for data used in data mining. Further, the
report recommends that data mining projects give explicit consideration
to using anonymized data when personally identifiable information is
involved. Although some of the report's recommendations may apply only
to operational data mining activities, many reflect system
functionalities that can be addressed during technology development.
Based on privacy risks identified in a PIA, controls exist that could
be implemented in ADVISE to mitigate those risks. For example, controls
could be implemented to enforce use limitations associated with the
purpose specified when the data were originally collected.
Specifically, software controls could be implemented that require an
analyst to specify an allowable purpose and check that purpose against
the specified purposes of the databases being accessed.
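A purpose-limitation control of the kind described above could take
roughly the following form, shown here as a Python sketch with
hypothetical database names and purpose labels.

    # Illustrative sketch of a purpose-specification check: an analyst's stated
    # purpose is compared with the purposes specified for each database accessed.
    DATABASE_PURPOSES = {
        "immigration_records": {"immigration enforcement"},
        "cargo_manifests": {"customs screening", "immigration enforcement"},
    }

    def purpose_allowed(stated_purpose, databases):
        """Permit a query only if every database being accessed lists the
        stated purpose among its specified (or compatible) purposes."""
        return all(stated_purpose in DATABASE_PURPOSES[db] for db in databases)

    # e.g., a customs-screening query against immigration records is refused
    purpose_allowed("customs screening", ["immigration_records"])   # returns False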
Regarding data quality risks, the ADVISE tool currently does not have
the capability to distinguish among individuals with similar
identifying information, nor does it have a mechanism to assess the
accuracy of the relationships it uncovers. To address the risk of
misidentification, software could be added to the tool to distinguish
among individuals that have similar names, a process known as
disambiguation. Disambiguation tools have been developed for other
applications. Additionally, although the ADVISE tool includes a feature
that allows analysts to designate confidence levels for individual
pieces of data, no mechanism has been developed to assess the
confidence of relationships identified by the tool. While software
specifically to determine data quality would be difficult to develop,
other controls exist that could be readily used as part of a strategy
for mitigating this risk. For example, anonymization could be used to
minimize the exposure of personal data, and operational procedures
could be developed to restrict the use of analytical results containing
personal information that could have data quality concerns. To
implement anonymization, the tool would need the software capability to
handle anonymized data or have a built-in data anonymizer. DHS
currently does not have plans to build anonymization into the ADVISE
tool.[Footnote 26]
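The two mitigations named above, disambiguation and anonymization, can
be sketched very simply in Python. The matching rule and the hashed
fields below are assumptions chosen for illustration; operational tools
would be considerably more sophisticated.

    # Illustrative sketches of disambiguation and anonymization.
    import hashlib

    def same_person(record_a, record_b):
        """Treat two records as one individual only when, besides a matching
        name, an independent attribute such as date of birth also agrees."""
        return (record_a["name"].lower() == record_b["name"].lower()
                and record_a["date_of_birth"] == record_b["date_of_birth"])

    def anonymize(record, fields=("name", "ssn")):
        """Replace direct identifiers with a salted one-way hash so that
        analysis can proceed on anonymized data."""
        out = dict(record)
        for f in fields:
            out[f] = hashlib.sha256(
                ("fixed-salt:" + str(record[f])).encode()).hexdigest()[:16]
        return out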
Until a PIA that identifies the privacy risks of ADVISE is conducted
and privacy controls to mitigate those risks are implemented, DHS faces
the risk that privacy concerns will arise during implementation of
systems based on ADVISE that may be more difficult to address at that
stage and possibly require costly retrofitting.
Conclusions:
The ADVISE tool is intended to provide the capability to ingest large
amounts of data from multiple sources and to display relationships that
can be discerned within the data. Although the ADVISE tool has not yet
been fully implemented and its effectiveness is still being evaluated,
the chief intended benefit is to help detect activities threatening to
the United States by facilitating the analysis of large amounts of
data.
The ADVISE tool incorporates security controls intended to protect the
information it processes from unauthorized access. However, because
ADVISE is intended to be used in ways that are likely to involve
personal data, a range of potential privacy risks could be involved in
its operational use. Thus, it is important that those risks be
assessed--through a PIA--so that additional controls can be established
to mitigate them. However, DHS has not yet conducted a PIA, despite the
fact that the E-Government Act and related OMB and DHS guidance
emphasize the need to assess privacy risks early in systems
development. Although DHS officials stated that they believe a PIA is
not required because the tool alone does not contain personal data,
they also told us they are considering conducting a modified PIA for
the tool. Until a PIA is conducted, little assurance exists that
privacy risks have been rigorously considered and mitigating controls
established. If controls are not addressed now, they may be more
difficult and costly to retrofit at a later stage.
Recommendations for Executive Action:
To ensure that privacy protections are in place before DHS proceeds
with implementations of systems based on ADVISE, we recommend that the
Secretary of Homeland Security take the following two actions:
* immediately conduct a privacy impact assessment of the ADVISE tool to
identify privacy risks, such as those described in this report, and:
* implement privacy controls to mitigate potential privacy risks
identified in the PIA.
Agency Comments and Our Evaluation:
We received oral and written comments on a draft of this report from
the DHS Departmental GAO/Office of Inspector General Liaison Office.
(Written comments are reproduced in appendix II.) DHS officials
generally agreed with the content of this report and described actions
initiated to address our recommendations. DHS also provided technical
comments, which have been incorporated in the final report as
appropriate.
In its comments DHS emphasized the fact that the ADVISE tool itself
does not contain personal data and that each deployment of the tool
will be reviewed through the department's privacy compliance process,
including, as applicable, development of a PIA and a system of records
notice. DHS further stated that it is currently developing a "Privacy
Technology Implementation Guide" to be used to conduct a PIA for
ADVISE. Although we have not reviewed the guide, it appears to be a
positive step toward developing a PIA process to address technology
tools such as ADVISE.
It is not clear from the department's response whether the privacy
controls identified based on applying the Privacy Technology
Implementation Guide to ADVISE are to be incorporated into the tool
itself. We believe that any controls identified by a PIA to mitigate
privacy risks should be implemented, to the extent possible, in the
tool itself. Specific development efforts that use the tool will then
have these integrated controls readily available, thus reducing the
potential for added costs and technical risks. The department also
requested that we change the wording of our recommendation; however, we
have retained the wording in our draft report because it clearly
emphasizes the need to incorporate privacy controls into the ADVISE
tool itself.
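As an illustration of an integrated, tool-level control (a sketch under assumed interfaces and purpose labels, not a description of the department's design), a query component could check that the stated purpose of an analysis is compatible with the declared collection purpose of each data source before a query is allowed to run, so that every system built on the tool inherits the check:

    # Illustrative sketch only: a tool-level purpose-compatibility check.
    # The source registry, purpose labels, and query interface are
    # hypothetical and are not drawn from the ADVISE design.

    # Declared collection purposes for each ingested source (hypothetical).
    SOURCE_PURPOSES = {
        "cargo_manifests": {"cargo screening"},
        "investigative_case_files": {"law enforcement investigation"},
    }

    class PurposeViolation(Exception):
        """Raised when a query's stated purpose is not compatible with a source."""

    def check_purpose(source, stated_purpose):
        allowed = SOURCE_PURPOSES.get(source, set())
        if stated_purpose not in allowed:
            raise PurposeViolation(
                f"source '{source}' may not be used for '{stated_purpose}'; "
                f"allowed purposes: {sorted(allowed)}"
            )

    def run_query(source, query, stated_purpose):
        # The control is enforced inside the tool, so specific development
        # efforts do not need to re-implement it.
        check_purpose(source, stated_purpose)
        print(f"running {query!r} against {source} for purpose: {stated_purpose}")

    if __name__ == "__main__":
        run_query("cargo_manifests", "shipments to port X", "cargo screening")
        try:
            run_query("cargo_manifests", "shipments to port X", "personnel vetting")
        except PurposeViolation as err:
            print(f"blocked: {err}")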
As agreed with your office, unless you publicly announce the contents
of this report earlier, we plan no further distribution until 30 days
from the report date. At that time, we will send copies of this report
to the Secretary of Homeland Security and other interested
congressional committees. Copies will be made available to others on
request. In addition, this report will be available at no charge on our
Web site at www.gao.gov.
If you have any questions concerning this report, please call me at
(202) 512-6240 or send e-mail to koontzl@gao.gov. Contact points for
our Offices of Congressional Relations and Public Affairs may be found
on the last page of this report. Key contributors to this report are
listed in appendix III.
Sincerely yours,
Signed by:
Linda D. Koontz:
Director, Information Management Issues:
[End of section]
Appendix I: Objectives, Scope, and Methodology:
Our objectives were to determine the following:
* the planned capabilities, uses, and associated benefits of the
Analysis, Dissemination, Visualization, Insight, and Semantic
Enhancement (ADVISE) tool and
* whether potential privacy issues could arise from using the ADVISE
tool to process personal information and how the Department of Homeland
Security (DHS) has addressed any such issues.
To address our first objective, we identified and analyzed the tool's
capabilities, planned uses, and associated benefits. We reviewed
program documentation, including annual program execution plans, and
interviewed agency officials responsible for managing and implementing
the program, including officials from the DHS Science and Technology
Directorate and the Lawrence Livermore and Pacific Northwest National
Laboratories. We also viewed a demonstration of the tool's semantic
graphing capability. In addition, we interviewed officials at DHS
components to identify their current or planned uses of ADVISE, the
progress of their implementations, and the benefits they hope to gain
from using the tool. These components included Immigration and Customs
Enforcement, among others. We also interviewed officials from the
Interagency Center of Applied Homeland Security Technology (ICAHST),
who are responsible for testing the tool's capabilities, and we visited
ICAHST at the Johns Hopkins Applied Physics Laboratory in Laurel,
Maryland, to view a demonstration of its
testing activities. We did not conduct work or review implementations
of ADVISE at the DHS Office of Intelligence and Analysis.
To address our second objective, we identified potential privacy
concerns that could arise from using the ADVISE tool by reviewing
relevant reports, including prior GAO reports and the DHS Privacy
Office 2006 report on data mining. We identified and analyzed DHS
actions to comply with the Privacy Act of 1974 and the E-Government Act
of 2002. We interviewed technical experts within the DHS Science and
Technology Directorate and personnel responsible for implementing
ADVISE at DHS components to assess privacy controls included in the
ADVISE tool. We also interviewed officials from the DHS Privacy Office.
We performed our work from June 2006 to December 2006 in the
Washington, D.C., metropolitan area, in accordance with generally
accepted government auditing standards.
[End of section]
Appendix II: Comments from the Department of Homeland Security:
U.S. Department of Homeland Security:
Washington, DC 20528:
February 2, 2007:
Ms. Linda D. Koontz:
Director, Information Management Issues:
U.S. Government Accountability Office:
Washington, D. C. 20548:
Dear Ms. Koontz:
Thank you for the opportunity to comment on the draft report GAO-07-293
"Datamining: Early Attention to Privacy in Developing a Key DHS Program
Could Reduce Risks". In this draft report, GAO recommends that "the
Secretary of Homeland Security immediately conduct a privacy impact
assessment (PIA) of the ADVISE tool and implement privacy controls as
needed to mitigate any identified risks" to ensure that privacy
protections are in place.
The ADVISE toolset is a set of generic IT tools and does not in itself
collect or use any data. The individual ADVISE tools could be combined
to create specific systems which would be designed to support specific
operational needs. Each of these deployments of the ADVISE toolset
would be reviewed and reported through the DHS privacy compliance
process and documented in the Privacy Threshold Analysis (PTA) and, as
applicable, the Privacy Impact Assessment (PIA) and System of Records
Notice (SORN).
Within DHS, the term "Privacy Impact Assessment" refers to two separate
and related functions. The first is the PIA activity of assessing a
technology, program, etc. for potential privacy impacts. The second is
the PIA report that documents the results of that assessment activity.
The distinctions between these two meanings of the term may help to
clarify DHS's approach to assessing the potential privacy impacts of
the ADVISE toolset.
The DHS Privacy Impact Assessment form (the report) is designed for
operational systems and is not well suited to open-ended toolsets like
ADVISE. In conducting the privacy impact assessment (the activity) DHS
decided that a different type of document would better fit the nature
of the ADVISE toolset. Rather than using a descriptive reporting
document, DHS is using a prescriptive guidance document that is
tailored to the specific nature of the ADVISE toolset. This guidance
document is called a "Privacy Technology Implementation Guide" and is
currently being developed by the DHS Privacy Office.
The Privacy Technology Implementation Guide is more adaptable to
technology frameworks like ADVISE and provides guidance as to how the
individual ADVISE tools could be implemented in privacy protective
ways. System Developers building specific implementations of ADVISE can
use the Guide to build privacy protections into the systems as part of
the development process. The flexibility of the Privacy Technology
Implementation Guide allows for integrated privacy protection in all
uses of the ADVISE tools and will be complemented by Privacy Impact
Assessment documents for operational deployments (individual systems).
The current draft of the Privacy Technology Implementation Guide for
ADVISE is organized into two sections. The first section identifies
privacy protections related to the general nature of ADVISE as a set of
tools and recommends that the same privacy protections related to
datamining be applied to ADVISE. This first section further recommends
that the value and limitations of ADVISE tools be specifically
identified and matched to the purpose and success measures of the
specific DHS mission the tool would be implemented to support. The
second section is organized by the technology architecture (information
layer, knowledge layer, security layer, application layer) and offers
specific guidance for integrating privacy protection into the use of
the tools from each of these architectural components.
The privacy impact assessment activity led to the determination that a
new type of document would further assist in building privacy
protections into technology. The Privacy Technology Implementation
Guide is that new type of documentation and is being developed to
accompany the toolset itself. The expected result is that as technology
developers decide that the ADVISE toolset could be used to meet a
particular DHS need, they receive privacy technology guidance along
with the toolset to assist in building privacy protections into the
system from the beginning. DHS will continue to use the suite of
privacy compliance documents (PTA, PIA, SORN) to report on the
potential privacy impacts and integrated privacy protections for each
individual system.
Requested change to GAO recommendation:
Based on the above, DHS requests that GAO revise its recommendations to
read:
"To ensure that privacy protections are integrated into the development
process, the Secretary of Homeland Security should create privacy
controls for the ADVISE toolset to guide specific development efforts
and conduct a privacy impact assessment for each ADVISE deployment to
ensure those controls are implemented and effective."
Two additional edits:
* Page 21, Table 2: Please remove the ICE "implementation." DHS is only
engaged in early discussion and no implementation is currently planned.
* Page 26: The report seems to suggest that ADVISE provides an
automated means for making decisions regarding individuals. DHS would
like to clarify that ADVISE is an aid for analysts in identifying
relationships and patterns of interest. ADVISE is an analysis tool and
not a decision-making tool. The tool itself does not make decisions.
DHS appreciates GAO's work in planning, conducting and issuing this
report and for the opportunity to review the draft.
Sincerely,
Signed by:
Steven J. Pecinovsky:
Director, Departmental GAO/OIG Liaison Office:
[End of section]
Appendix III: GAO Contact and Staff Acknowledgments:
GAO Contact:
Linda D. Koontz, (202) 512-6240 or koontzl@gao.gov:
Staff Acknowledgments:
In addition to the individual named above, John de Ferrari, Assistant
Director; Idris Adjerid; Nabajyoti Barkakati; Barbara Collier; David
Plocher; and Jamie Pressman made key contributions to this report.
FOOTNOTES
[1] For purposes of this report, the term personal information
encompasses all information associated with an individual, including
both identifying and nonidentifying information. Personally identifying
information, which can be used to locate or identify an individual,
includes things such as names, aliases, and agency-assigned case
numbers.
[2] A privacy impact assessment is an analysis of how personal
information is collected, stored, shared, and managed in a federal
system to ensure that privacy requirements are addressed.
[3] These DHS components include Immigration and Customs Enforcement,
among others. We also interviewed officials from the
Interagency Center of Applied Homeland Security Technology, who are
responsible for testing the tool's capabilities. ADVISE is also being
used by the Office of Intelligence and Analysis. We did not review that
application.
[4] DHS, Data Mining Report: DHS Privacy Office Response to House
Report 108-774 (July 6, 2006).
[5] ADVISE is also being used by the Office of Intelligence and
Analysis. We did not review that application.
[6] GAO, Data Mining: Federal Efforts Cover a Wide Range of Uses, GAO-
04-548 (Washington, D.C.: May 4, 2004).
[7] GAO, Expedited Assistance for Victims of Hurricane Katrina and
Rita: FEMA's Control Weaknesses Exposed the Government to Significant
Fraud and Abuse, GAO-06-403T (Washington, D.C.: Feb. 13, 2006).
[8] DHS Office of Inspector General, Survey of DHS Data Mining
Activities (August 2006).
[9] The mission of the Science and Technology Directorate is to act as
the primary research and development arm of DHS, providing federal,
state, and local officials with the technology and capabilities to
protect the United States homeland.
[10] U.S. Department of Health, Education, and Welfare, Records,
Computers and the Rights of Citizens: Report of the Secretary's
Advisory Committee on Automated Personal Data Systems (Washington,
D.C.: July 1973).
[11] GAO-04-548.
[12] GAO, Data Mining: Agencies Have Taken Key Steps to Protect Privacy
in Selected Efforts, but Significant Compliance Issues Remain, GAO-05-
866 (Washington, D.C.: Aug. 15, 2005).
[13] Under the Privacy Act of 1974, the term "routine use" means (with
respect to the disclosure of a record) the use of such a record for a
purpose that is compatible with the purpose for which it was collected.
5 U.S.C. § 552a(a)(7).
[14] OMB, Guidance for Implementing the Privacy Provisions of the E-
Government Act of 2002, Memorandum M-03-22 (Washington, D.C.: Sept. 26,
2003).
[15] A PIA may not be required for all systems. For example, no
assessment is required when the information collected relates to
internal government operations, when the information has been
previously assessed under an evaluation similar to a PIA, or when
privacy issues are unchanged.
[16] DHS Privacy Office, PIA Official Guidance (March 2006).
[17] U.S. Department of Health, Education, and Welfare, Records,
Computers and the Rights of Citizens: Report of the Secretary's
Advisory Committee on Automated Personal Data Systems (Washington,
D.C.: July 1973).
[18] OECD, Guidelines on the Protection of Privacy and Transborder
Flows of Personal Data (Sept. 23, 1980). The OECD plays a prominent role in
fostering good governance in the public service and in corporate
activity among its 30 member countries. It produces internationally
agreed-upon instruments, decisions, and recommendations to promote
rules in areas where multilateral agreement is necessary for individual
countries to make progress in the global economy.
[19] European Union Data Protection Directive ("Directive 95/46/EC of
the European Parliament and of the Council of 24 October 1995 on the
Protection of Individuals with Regard to the Processing of Personal
Data and the Free Movement of Such Data") (1995).
[20] DHS, Data Mining Report: DHS Privacy Office Response to House
Report 108-774 (July 6, 2006), p. 12.
[21] DHS PIA guidance states that "[t]he purpose of a PIA is to
demonstrate that system owners and developers have consciously
incorporated privacy protections throughout the entire life cycle of a
system. This involves making certain that privacy protections are built
into the system from the start, not after the fact when they can be far
more costly or could affect the viability of the project." In addition,
OMB guidance states that "[a]gencies should commence a PIA when they
begin to develop a new or significantly modified IT system."
[22] The DHS Privacy Office was created in response to the Homeland
Security Act of 2002, Pub. L. No. 107-296, § 222, 116 Stat. 2155 (Nov.
25, 2002). The Privacy Officer is responsible for, among other things,
"assuring that the use of technologies sustain[s], and do[es] not erode
privacy protections relating to the use, collection, and disclosure of
personal information."
[23] It is important to note the distinction between the PIA
requirement, based on the E-Government Act, and the requirements of the
Privacy Act. Because the ADVISE tool itself does not contain any data,
it is not considered a system of records for purposes of the Privacy
Act and thus is not subject to the requirements of that law. As ADVISE
implementations move from development to operations, they may lead to
the creation or modification of systems of records, which would require
the development of appropriate privacy notices to be published in the
Federal Register and other actions to protect privacy.
[24] The Privacy Office's report states that ADVISE is a "technology"
and not a data mining program. Accordingly, the report's
recommendations ostensibly would not apply to ADVISE. However, the
report acknowledges that uses of ADVISE may constitute data mining, in
which case the recommendations would apply.
[25] ADVISE does not provide an automated means for making decisions
about individuals. Rather, it is an analysis tool to aid analysts in
identifying relationships and patterns of interest.
[26] In addition, a feature was to be implemented in January 2007 that
would enforce an internal DHS rule regarding how long information about
U.S. persons can be maintained in intelligence databases. However,
because this control is designed to respond only to the DHS rule--and
not to identified privacy risks--it leaves potential concerns
unaddressed about how personal information is used when it is
maintained and processed by ADVISE.
GAO's Mission:
The Government Accountability Office, the audit, evaluation and
investigative arm of Congress, exists to support Congress in meeting
its constitutional responsibilities and to help improve the performance
and accountability of the federal government for the American people.
GAO examines the use of public funds; evaluates federal programs and
policies; and provides analyses, recommendations, and other assistance
to help Congress make informed oversight, policy, and funding
decisions. GAO's commitment to good government is reflected in its core
values of accountability, integrity, and reliability.
Obtaining Copies of GAO Reports and Testimony:
The fastest and easiest way to obtain copies of GAO documents at no
cost is through GAO's Web site (www.gao.gov). Each weekday, GAO posts
newly released reports, testimony, and correspondence on its Web site.
To have GAO e-mail you a list of newly posted products every afternoon,
go to www.gao.gov and select "Subscribe to Updates."
Order by Mail or Phone:
The first copy of each printed report is free. Additional copies are $2
each. A check or money order should be made out to the Superintendent
of Documents. GAO also accepts VISA and Mastercard. Orders for 100 or
more copies mailed to a single address are discounted 25 percent.
Orders should be sent to:
U.S. Government Accountability Office 441 G Street NW, Room LM
Washington, D.C. 20548:
To order by Phone: Voice: (202) 512-6000 TDD: (202) 512-2537 Fax: (202)
512-6061:
To Report Fraud, Waste, and Abuse in Federal Programs:
Contact:
Web site: www.gao.gov/fraudnet/fraudnet.htm E-mail: fraudnet@gao.gov
Automated answering system: (800) 424-5454 or (202) 512-7470:
Congressional Relations:
Gloria Jarmon, Managing Director, JarmonG@gao.gov (202) 512-4400 U.S.
Government Accountability Office, 441 G Street NW, Room 7125
Washington, D.C. 20548:
Public Affairs:
Paul Anderson, Managing Director, AndersonP1@gao.gov (202) 512-4800
U.S. Government Accountability Office, 441 G Street NW, Room 7149
Washington, D.C. 20548: