Decennial Census

Methods for Collecting and Reporting Hispanic Subgroup Data Need Refinement GAO ID: GAO-03-228 January 17, 2003

To help boost response rates of both the general and Hispanic populations, the U.S. Census Bureau (Bureau) redesigned the 2000 questionnaire, in part by deleting a list of examples of Hispanic subgroups from the question on Hispanic origin. While more Hispanics were counted in 2000 compared to 1990, the counts for Dominicans and other Hispanic subgroups were lower than expected. Concerned that this was caused by the deletion of Hispanic subgroup examples, congressional requesters asked us to investigate the research and management activities behind the changes.

In both the 1990 and 2000 censuses, Hispanics could identify themselves as Mexican, Puerto Rican, Cuban, or other Hispanic. Respondents checking off this latter category could write in a specific subgroup such as "Salvadoran." The "other" category in the 1990 Census included examples of subgroups to clarify the question. For the 2000 Census, the Bureau removed the subgroup examples as part of a broader effort to simplify the questionnaire and help improve response rates. The Bureau removed unnecessary words and added blank space to shorten the questionnaire and make it more readable. Although the Bureau conducted a number of tests on the sequencing and wording of the race and ethnicity questions, and sought input from several expert panels, no Bureau tests were designed specifically to measure the impact of the questionnaire changes on the quality of Hispanic subgroup data. According to Bureau officials, because federal laws and guidelines require data on Hispanics but not Hispanic subgroups, the Bureau targeted its resources on research aimed at improving the overall count of Hispanics. Bureau evaluations conducted after the census indicated that deleting the subgroup examples might have confused some respondents and produced less-than-accurate subgroup data. A key factor behind the Bureau's release of the questionable subgroup data was its lack of adequate guidelines governing the quality needed before making data publicly available. As part of its planning for the 2010 Census, the Bureau intends to conduct further research on the Hispanic origin question, including a field test in parts of New York City. However, until research on a new version of the question is finalized, Bureau officials said that other census surveys will continue to use the 2000 Census format of the Hispanic origin question.



GAO-03-228, Decennial Census: Methods for Collecting and Reporting Hispanic Subgroup Data Need Refinement

This is the accessible text file for GAO report number GAO-03-228 entitled 'Decennial Census: Methods for Collecting and Reporting Hispanic Subgroup Data Need Refinement' which was released on February 19, 2003.

Report to Congressional Requesters:

January 2003:

Decennial Census: Methods for Collecting and Reporting Hispanic Subgroup Data Need Refinement:

GAO-03-228:

GAO Highlights:

Highlights of GAO-03-228, a report to Congressional Requesters.
What GAO Recommends:

GAO recommends that the Bureau

* implement its plans to conduct further research on the Hispanic question, taking steps to properly test the impact of any changes on the quality of data on Hispanic subgroups and Hispanics overall, and
* develop agencywide protocols that provide guidelines for Bureau decisions on the level of quality needed to release data to the public, how to characterize any limitations in the data, and when it is acceptable to delay or suppress the data.

The Bureau agreed with our recommendations, but took exception to our findings concerning the adequacy of its data quality guidelines.

GAO Highlights Figure: [See PDF for image] [End of figure]

Contents:

Letter:
Results in Brief:
Background:
Objectives, Scope, and Methodology:
Efforts to Simplify Questionnaire Led Bureau to Delete List of Example Hispanic Subgroups:
The Bureau Plans to Conduct Targeted Research on Hispanic Subgroups in the Future:
Conclusions:
Recommendations for Executive Action:
Agency Comments and Our Evaluation:

Appendix:

Appendix I: Comments from the Department of Commerce:
Related GAO Products:

Figures:

Figure 1: Evolution of the Hispanic Question from the 1970 Census to the 2000 Census:
Figure 2: The Bureau Simplified the 2000 Census Questionnaire:
Figure 3: The 2000-Style Questionnaire Produced Lower Subgroup Counts than Those from a Test Using the 1990-Style Questionnaire:

Letter

January 17, 2003:

The Honorable Danny K. Davis
Ranking Minority Member
Subcommittee on Civil Service, Census and Agency Organization
Committee on Government Reform
House of Representatives:

The Honorable Wm. Lacy Clay
The Honorable Charles A. Gonzalez
The Honorable Carolyn B. Maloney
House of Representatives:

Collecting data on race and ethnicity is among the federal government's most complex and controversial data collection efforts. The decennial census has collected these data in various forms beginning with the very first national headcount in 1790.
Since the 1960s, race and ethnicity data have been used to monitor and enforce compliance with a number of civil rights laws, including those governing equality in employment, voting, housing, mortgage lending, health care services, and education. Over time, in response to changing federal mandates, demographics, and its own operational requirements, the U.S. Census Bureau (Bureau) has changed the format and sequence of the race and ethnicity questions. The Bureau made one such change for the 2000 Census when, in an effort to improve the count of Hispanics and simplify the questionnaire, it redesigned the question on Hispanic origin and dropped a list of examples of Hispanic subgroups. As soon as the Hispanic and Hispanic subgroup data from the 2000 Census were released in May 2001, questions were raised about the counts for specific Hispanic subgroups. For example, the reported count of Dominican Hispanics was significantly lower than the counts reported in other Bureau surveys. Concerned that the lower-than-expected Hispanic subgroup counts were the result of dropping the list of example write-in Hispanic subgroups from the 2000 questionnaire, you asked us to investigate the research and management activities behind this change. As agreed with your offices, we reviewed (1) the decision-making process behind the Bureau's removal of the example subgroups, (2) the research the Bureau conducted to aid in that decision, and (3) the Bureau's future plans for collecting Hispanic subgroup data. This report parallels our recent study addressing congressional concerns about how the Bureau reported data on people counted at emergency and transitional shelters, a segment of the population that includes, among others, the homeless.[Footnote 1] Both reports are part of our ongoing series on lessons learned from the 2000 Census that can help inform the planning effort for 2010. (See the Related GAO Products section for the reports issued to date.)
Results in Brief:

The Bureau removed examples of Hispanic subgroups from the census question on Hispanic origin as part of an effort to make the questionnaire more "respondent-friendly." The Bureau's evaluations of the 1990 Census indicated that deleting unnecessary words and adding more white space, among other changes, could help improve response rates. The Bureau also modified the wording and format of the Hispanic question in order to improve Hispanic participation in the census. Throughout the 1990s, the Bureau conducted a number of tests to determine the impact that these and other changes had on the overall count of Hispanics. However, because Office of Management and Budget standards governing the collection of race and ethnic data do not require data on Hispanic subgroups, the Bureau did not specifically design any tests to determine the likely effect of the changes on the quality of Hispanic subgroup data. Although the Bureau did not test the likely impact of questionnaire changes on the Hispanic subgroup data, it released subgroup counts along with the overall Hispanic data in May 2001. Immediately following the release of these data, local government officials and representatives of Hispanic subgroups raised questions about the accuracy of specific subgroup counts. Bureau evaluations conducted following the census suggest that dropping the examples of Hispanic subgroups confused some respondents and produced less-than-accurate subgroup data. For example, in one experiment, the Bureau mailed a 1990-style questionnaire (which included subgroup examples) to a sample of individuals as part of the 2000 Census. The Bureau found that 93 percent of Hispanics given the 1990-style form reported a specific subgroup, compared to 81 percent of Hispanics given the 2000-style form. Thus, while the Bureau reported what respondents marked on their questionnaires, because of respondents' confusion over the wording of the question, the subgroup data could be misleading.
The Bureau has made improving the quality of the Hispanic question a focus for the 2010 Census and intends to test questionnaire changes aimed at improving the quality of its overall count of Hispanics and its counts of Hispanic subgroups. In 2003, the Bureau is to begin testing the Hispanic question, and as part of a field test in 2004, the Bureau plans to administer the questionnaire in parts of the New York City borough of Queens. Any changes to the census questionnaire will also affect other Bureau surveys, such as the proposed American Community Survey (ACS), which the Bureau designed in part to replace the census long-form questionnaire. Bureau officials said that the ACS will continue to use the 2000 Census Hispanic question until research and testing on a new version is complete. A key factor behind the Bureau's release of apparently less-than-accurate Hispanic subgroup data appears to be a lack of adequate guidelines governing decisions on quality considerations that should be addressed before making data publicly available. Had such guidelines been in place prior to releasing the Hispanic subgroup data, they could have prompted the Bureau to apply more rigorous quality checks on the accuracy of the Hispanic subgroup data; provided a basis for either releasing, delaying, or suppressing the data; and informed decisions on how to describe any of their limitations. The lack of data quality guidelines resulted in similar difficulties when the Bureau initially decided not to release data on the homeless and others without conventional housing. In our companion report, we recommended that the Secretary of Commerce ensure that the Bureau develop agencywide guidelines governing the level of quality needed to release data to the public, when and how to characterize any limitations, and when it is acceptable to suppress data.
Because these incidents, if repeated, could erode public confidence in the data, it will be important for the Bureau to implement these recommendations. Additionally, with respect to the Hispanic subgroup data, we are recommending that the Bureau take steps to properly test the impact that any changes to the Hispanic origin question have on the quality of Hispanic data, and the quality of Hispanic subgroup data in particular. The Secretary of Commerce forwarded written comments from the U.S. Census Bureau on a draft of this report (see app. I). The Bureau agreed with our conclusions and recommendations and is taking steps to implement them, but took exception to our findings concerning the adequacy of its data quality guidelines.

Background:

While the decennial census has long collected data on race and ethnicity,[Footnote 2] a specific question on Hispanic origin was first added to the 1970 Census in response to the 1965 Voting Rights Act, which required the data to ensure equality in voting.[Footnote 3] Today, antidiscrimination provisions in a number of statutes require census data on race and Hispanic origin in order to monitor and enforce equal access to housing, education, employment, and other areas. The Office of Management and Budget (OMB), through its Federal Statistical Policy Directive No. 15, sets the standards governing federal agencies' collection and reporting of race and ethnicity data. At least seven cabinet-level government departments, the Federal Reserve, every state government, and a number of public and private organizations use Hispanic data. Although not required by federal legislation or OMB standards, Hispanic subgroup data are also used for many of these same purposes. In addition, subgroup data are especially important to communities with rapidly growing and diverse Hispanic populations. Collecting data on race and ethnicity has been a persistent challenge for the Bureau.
Race and ethnicity are subjective characteristics, which makes measurement difficult. Moreover, the Bureau has found that some Hispanics equate their ethnicity--Hispanic--with race, and thus find it difficult to classify themselves by the standard race categories that include, for example, white, black, and Asian. The Bureau's preparations for the 2000 Census included an extensive research and testing program to improve the Hispanic count. In 1990, the Bureau estimated that it did not enumerate 5 percent of the Hispanic population. Further, the ethnicity question, which was posed to all respondents, appeared to confuse both Hispanics and non-Hispanics. For example, many non-Hispanics, thinking the question only pertained to Hispanics, did not answer the question. Overall, 10 percent of respondents failed to answer the 1990 Hispanic question--the highest of any short form item in 1990. As a result, the Bureau made improving the Hispanic count a major priority for the 2000 Census.

Objectives, Scope, and Methodology:

Our objectives were to review (1) the Bureau's decision-making process that led to its dropping the list of subgroup examples from the Hispanic question on the 2000 Census form, (2) the research conducted by the Bureau to aid in this decision, and (3) the Bureau's future plans for collecting Hispanic subgroup data. To address each of these objectives, we interviewed key Bureau officials and examined Bureau, OMB, and other documents, including planning materials and internal memos. To obtain a local perspective of how municipal governments and community leaders use Hispanic subgroup data, we met with data users in New York City, including representatives of the New York Department of Planning and the Dominican and Puerto Rican communities. We also attended a meeting of the Dominican American National Round Table, a Dominican American advocacy group that discussed issues relating to the 2000 Census count of Dominican Hispanics.
We also attended meetings of the Census Advisory Committee on Race and Ethnicity that addressed the issue of the quality of the Hispanic subgroup data. Finally, to examine the research behind the Bureau's decision to remove the example subgroups from the 2000 questionnaire, we reviewed the results of the Bureau's National Content Survey, Targeted Race and Ethnicity Test, and other research conducted throughout the 1990s in preparation for the 2000 Census. Additionally, we reviewed information from the Bureau's meetings with its Advisory Committee on the Decennial Census and its Advisory Committee on Race and Ethnicity. We also examined relevant materials from OMB's Interagency Committee for the Review of the Racial and Ethnic Standards. To review the Bureau's future plans for collecting Hispanic subgroup data, we attended meetings of the National Academy of Sciences Panel on Future Census Methods, the Decennial Census Advisory Committee, and the Census Advisory Committee on Race and Ethnicity. We also discussed these plans with Bureau officials. Our audit work was conducted in New York City and Washington, D.C., and at the Bureau's headquarters in Suitland, Maryland, from January through September 2002. Our work was done in accordance with generally accepted government auditing standards. We requested comments on a draft of this report from the Secretary of Commerce. On November 27, 2002, the Secretary forwarded the U.S. Census Bureau's written comments on the draft. The comments are reprinted in appendix I. We address these comments at the end of this report.

Efforts to Simplify Questionnaire Led Bureau to Delete List of Example Hispanic Subgroups:

Collecting accurate ethnic data has challenged the Bureau for over 30 years. Since the 1970 Census, when the Bureau first included a question on Hispanic origin, every census has had comparatively high Hispanic undercounts that reduced the quality of the data.
As a result, the Bureau has modified the Hispanic question on every census since then as part of a continuing effort to improve the Hispanic count. (See fig. 1.) In addition, a Spanish language version of the census form has been available upon request since 1980.

Figure 1: Evolution of the Hispanic Question from the 1970 Census to the 2000 Census: [See PDF for image] [End of figure]

For the 2000 Census, Hispanics could identify themselves as Mexican, Puerto Rican, Cuban, or "other Spanish/Hispanic/Latino." Respondents who checked off this last category could write in a specific subgroup such as "Salvadoran." Although this approach was similar to that used for the 1990 Census, as shown in figure 1, the "other" category in the 1990 Census included examples of other Hispanic subgroups. The Bureau deleted these examples as one of several changes to the Hispanic question for the 2000 Census. Other changes included (1) adding the word "Latino" to the designation Spanish/Hispanic, (2) dropping the word "origin" from the question, and (3) moving the location of instructions on writing in an unlisted subgroup. According to Bureau officials, these latter three changes were made to improve the Hispanic count. The Bureau removed the subgroup examples as part of a broader effort to simplify the questionnaire and thus help reverse the downward trend in mail response rates that had been occurring since 1970. Indeed, evaluations of the 1990 Census indicated that the overall design of the form was confusing to many and contributed to lower response rates, particularly among some hard-to-enumerate groups such as Hispanics. In redesigning the questionnaire, the Bureau added as much white space as possible, and removed unnecessary words to make the questionnaire shorter and more readable. As shown in figure 2, the 2000 questionnaire appears more "respondent-friendly" compared to the 1990 questionnaire.
Figure 2: The Bureau Simplified the 2000 Census Questionnaire: [See PDF for image] [End of figure]

The Bureau initially proposed removing the example write-in subgroups during 1990 through 1992. A first version of the questionnaire without the example subgroups was used in the 1992 National Census Test. However, as discussed in the next section, testing continued from 1992 to 1996 to ensure that removing the write-in example groups did not harm the overall count of Hispanics. From 1995 to 1997, after testing showed that removal of the write-in example groups would not harm the overall Hispanic count, the Bureau finalized its decision to remove the example subgroups. Although federal law and OMB standards[Footnote 4] only require information on whether an individual is Hispanic, Bureau officials told us they collect subgroup data to help improve the overall Hispanic count. According to the Bureau, many Hispanics do not view themselves as Hispanic, but identify instead with their country of origin or with a particular Hispanic subgroup. State and local governments, academic institutions, community organizations, and marketing firms, among other organizations, also use Hispanic subgroup data for a variety of purposes. For example, officials in the New York City Department of Planning told us that they need accurate information on the number and distribution of Hispanic subgroups in planning the delivery of numerous city services. According to a Bureau official, no data are available on the precise impact the questionnaire redesign had on overall response rates in part because it was made in conjunction with other efforts to improve the response rate, such as a more aggressive outreach and promotion campaign. However, the initial mail response rate was 64 percent, 3 percentage points higher than the Bureau's expectations, and comparable to the similar 1990 mail response rate.
Moreover, evaluations conducted since the 2000 Census by the Bureau indicate that the Bureau obtained a more complete count of Hispanics in the 2000 Census than it did in 1990. For example, Bureau data show that the 2000 Census missed an estimated 2.85 percent of the Hispanic population compared to an estimated 4.99 percent in 1990--a 43 percent reduction of the undercount.[Footnote 5] The Bureau credits the improvement in part to the changes it made to the questionnaire. However, as discussed in the next section, removing the examples of Hispanic subgroups may have reduced the completeness of data on individual segments of the Hispanic population.

No Bureau Tests Were Designed Specifically to Measure the Impact of Questionnaire Changes on Hispanic Subgroup Data:

Bureau guidance requires that any changes to the census form must first be thoroughly tested. For example, according to Bureau officials, before changing a question, the Bureau must first conduct research studies, cognitive tests, and field tests to determine how best to sequence and word the question, and to see if the proposed changes are likely to achieve the desired results. Additionally, the census questionnaire is to be reviewed by a variety of census advisory groups, OMB, and Congress before it is finalized. Nevertheless, while the Bureau conducted a number of tests of the sequencing and wording of the race and ethnicity questions, according to Bureau officials, it did not specifically design any tests to determine the impact of the changes on the quality of Hispanic subgroup data.[Footnote 6] Because OMB standards do not require data on Hispanic subgroups, Bureau officials said that the Bureau targeted its resources on testing and research aimed at improving the overall count of Hispanics.
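The 43 percent reduction in the Hispanic undercount cited above can be checked from the two estimates given in the text (4.99 percent in 1990 and 2.85 percent in 2000). The following is a minimal sketch of that arithmetic only; the function name relative_reduction is our own illustrative choice and does not come from the Bureau's methodology.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Estimated net undercount of the Hispanic population, as cited in the report.
undercount_1990 = 4.99  # percent missed in the 1990 Census
undercount_2000 = 2.85  # percent missed in the 2000 Census

# Absolute change, in percentage points.
point_drop = undercount_1990 - undercount_2000

# Relative change, as a share of the 1990 undercount.
reduction = relative_reduction(undercount_1990, undercount_2000)

print(f"Drop: {point_drop:.2f} percentage points; relative reduction: {reduction:.0f} percent")
```

Note the distinction between the two quantities: the undercount fell by 2.14 percentage points, which is about 43 percent of the 1990 level.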
Throughout the 1990s, in revising the race and ethnicity questions, the Bureau sought input from several expert panels, including the Interagency Committee formed by OMB[Footnote 7] and the Census Advisory Committee on Racial and Ethnic Populations, one of several panels with which the Bureau consulted to help it plan the 2000 Census. In addition, the Bureau conducted several tests of the questionnaire to assess respondents' understanding of the questions and their ability to complete them properly. They included the:

* 1992 National Census Test, which field tested potential questions for the 2000 Census questionnaire;
* 1996 National Content Survey, which examined a number of issues to improve race and ethnic reporting; and
* 1996 Race and Ethnic Targeted Test, which tested alternative formats for asking race and ethnic questions.

In addition, the Bureau analyzed the results of Hispanic data from the 1990 Census (which led to its conclusions about the undercount), but did not conduct any specific evaluations of the quality of the 1990 Hispanic subgroup data. The consultation, research, and testing played a key role in the Bureau's decisions to place the ethnicity question before the race question and make several other changes discussed earlier in this report. The test results also indicated that the example subgroups could produce conflicting results. On the one hand, the Bureau found that providing the example subgroups could help prevent respondents' confusion over how to describe their ethnicity. On the other hand, the Bureau found that removing the example subgroups could help reduce the bias caused by the example effect, which occurs when a respondent erroneously selects a response because it is provided in the questionnaire. Although the Bureau conducted a dress rehearsal for the 2000 Census in 1998 in order to test its overall design, the dress rehearsal did not identify any problems with the Hispanic subgroup question.
According to Bureau officials, this could have been because none of the three test sites--the city of Sacramento, California; Menominee County, Wisconsin, including the Menominee American Indian Reservation; and the city of Columbia, South Carolina, and its 11 surrounding counties--had a large and diverse enough Hispanic population for the problems to become evident.

Questions Raised about the Quality of Reported Hispanic Subgroup Data:

In May 2001, the Bureau released data on Hispanics and Hispanic subgroups as part of its first release summarizing the results of the 2000 Census, called the SF-1 file. The Bureau also published The Hispanic Population, a 2000 Census brief that provided an overview of the size and distribution of the Hispanic population in 2000 and highlighted changes in the population since the 1990 census. For the first time, the Bureau released data on Hispanic subgroups as a part of its release of the full count SF-1 data even though it had not fully tested the impact of questionnaire changes on the subgroup data and provided little discussion of the potential limitations of the data. Following the initial release of the Hispanic data, local government officials and Hispanic advocacy groups raised questions about the accuracy of the counts of Hispanic subgroups listed as examples on the 1990 census form, but not the 2000 form. The 2000 Census showed lower counts of several Hispanic subgroups than analysts had expected based on their own estimates using a variety of information sources such as vital statistics, immigration statistics, population surveys, and other data. In New York City, local government officials and representatives of Hispanic subgroups who partnered with the Bureau to improve the enumeration of Hispanics told us that they were particularly concerned about low subgroup counts in their communities in part because they needed accurate numbers to plan and deliver specialized services to particular subgroups.
Moreover, they said that because "official census numbers" are often considered definitive, problems with the released Hispanic subgroup numbers could lead to faulty decision making by data users.

Questionnaire Modifications May Have Led to Problems with Hispanic Subgroup Data:

Since the release of the 2000 Census Hispanic data, the Bureau has conducted evaluations of the data that provided more information on how removing the subgroup examples may have affected the quality of Hispanic subgroup data. One key evaluation was the Alternative Questionnaire Experiment, in which the Bureau sent out 1990-style census forms to a sample of individuals as part of the 2000 Census. As shown in figure 3, the Bureau's research indicates that the 1990-style form elicited more reports of specific Hispanic subgroups than the 2000-style questionnaire.[Footnote 8] Indeed, 93 percent of Hispanics given the 1990-style form reported a specific subgroup, compared to 81 percent of Hispanics given the 2000-style form. Moreover, virtually every subgroup reported in the 2000-style form composed a smaller percentage of the overall Hispanic count than the 1990-style form. Thus, while the Bureau reported what respondents checked off on their questionnaires, because of respondents' confusion over the wording of the question, the 2000 subgroup data could be misleading. Figure 3 also suggests that one possible reason for this might be that many respondents did not understand what they were supposed to write in, as many more people on the 2000-style form wrote in "Hispanic," "Spanish," or "Latino" (as opposed to a specific subgroup) compared to the 1990-style questionnaire. Additionally, a higher percentage of the respondents did not provide codeable (useable) responses.
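The comparison of the two questionnaire styles can be restated as a gap in percentage points. The sketch below uses only the 93 percent and 81 percent figures cited above; the variable names are illustrative, and the complement figures are simple arithmetic (everyone who did not report a specific subgroup, for whatever reason), not numbers taken from the Bureau's experiment.

```python
# Shares of Hispanic respondents reporting a specific subgroup, as cited in the report.
reported_1990_style = 93  # percent, 1990-style form (with subgroup examples)
reported_2000_style = 81  # percent, 2000-style form (without subgroup examples)

# Gap between the two forms, in percentage points.
gap_points = reported_1990_style - reported_2000_style

# Complement: respondents who did not report a specific subgroup on each form.
not_reported_1990_style = 100 - reported_1990_style
not_reported_2000_style = 100 - reported_2000_style

print(f"Gap in subgroup reporting: {gap_points} percentage points")
print(
    "No specific subgroup reported: "
    f"{not_reported_1990_style} percent vs. {not_reported_2000_style} percent"
)
```

This restatement says nothing about sampling error; it only makes explicit that the share of respondents without a specific subgroup was larger on the 2000-style form.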
Moreover, based on its analysis of the Census 2000 Supplementary Survey--an operational test for collecting long-form-type data based on a nationwide sample of 700,000 households--the Bureau estimated that there were about 150,000 more Dominican Hispanics than were counted in the 2000 Census. Some attribute the discrepancy to the fact that many respondents to the supplementary survey provided their answers by telephone, where enumerators were able to help them better understand the question on Hispanic subgroups.

Figure 3: The 2000-Style Questionnaire Produced Lower Subgroup Counts than Those from a Test Using the 1990-Style Questionnaire: [See PDF for image] [End of figure]

The Bureau Plans to Conduct Targeted Research on Hispanic Subgroups in the Future:

Because of concerns relating to the 2000 Census counts of Hispanic subgroups, Bureau officials said that they plan to focus testing and research on these questions in preparation for the 2010 Census. In particular, they stated that the Bureau would examine the likely impact of including Hispanic subgroup examples in the question again, as well as other aspects of the question that caused problems for some respondents. Before deciding on a new version of the Hispanic question, the Bureau must finish evaluating the results of the 2000 Census, conduct a number of cognitive tests, and field-test proposed changes to the question. The Bureau plans to begin testing the Hispanic question in 2003 and, as part of a field test in 2004, to administer the questionnaire in parts of Queens, New York, which the Bureau selected for its racial and ethnic diversity. The Bureau intends to complete its testing and decide on changes to the Hispanic question from 2006 through 2008.
Any changes to the Hispanic question are relevant not only for the 2010 Census, but also for other Bureau questionnaires, such as the proposed ACS.[Footnote 9] Bureau officials told us that they expect that the ACS will continue to use the 2000 Census Hispanic question until research and testing on a new version is complete.

The Bureau Lacks Clearly Written, Transparent Guidelines for Releasing Data:

While continued research could help the Bureau collect better-quality Hispanic subgroup data, it will also be important for the Bureau to address what led it to release data that could mislead users. A key factor in this regard is that the Bureau lacks adequate guidelines for making decisions about how data quality considerations affect the release of data to the public. Had such guidelines been in place prior to releasing the Hispanic subgroup data, they could have (1) prompted the Bureau to apply more rigorous quality checks on the Hispanic subgroup data, (2) provided a basis for either releasing, delaying, or suppressing the data, and (3) informed decisions on how to describe any limitations to data released.

This is not the first time that the lack of Bureau-wide guidelines on the level of quality needed for census results to be released to the public has created difficulties for the Bureau and data users. As we noted in our companion report[Footnote 10] on the Bureau's methods for collecting and reporting data on the homeless and others without conventional housing, one cause of the Bureau's shifting position on reporting those data and the resulting public confusion appears to be its lack of documented, clear, transparent, and consistently applied guidelines on the level of quality needed to release data to the public. With the Hispanic subgroup data, the Bureau released the information as planned before it could properly assess its quality, identify problems, and report its limitations.
More rigorous guidelines could help ensure that decisions about the quality of all census data the Bureau releases are more consistent and better understood by the public. In 2000, the Bureau initiated a program aimed at documenting Bureau-wide protocols designed to ensure the quality of data it collected and released. Because this effort is still in its early stages, we could not assess it. However, Bureau officials believe that the program is a significant first step in addressing the Bureau's lack of data quality guidelines. As the Bureau develops its protocols further, it will be important that they be well documented, transparent, clearly defined, consistently applied, and properly communicated to the public.

Conclusions:

Throughout the 1990s, the Bureau went to great lengths to improve response rates to the 2000 Census in general, and participation of Hispanics in particular. Although the unique contributions of the individual components of the Bureau's efforts cannot be determined, the mail response rate was similar to the 1990 level, and the Bureau's preliminary data suggest that the 2000 Census count of Hispanics was an improvement over the 1990 count. However, the counts of Hispanic subgroups do not appear to have been improved and, in fact, there is concern that some of these subgroup counts may be less accurate than the 1990 counts. Moreover, the Bureau's experience in simplifying the questionnaire in part by removing the examples of the Hispanic subgroups shows the challenge the Bureau faces in trying to improve one component of the census count without adversely and unintentionally affecting other aspects of the census count. In light of these findings, it will be important for the Bureau to continue with its planned research on how best to enumerate Hispanic subgroups.
The Bureau's release of Hispanic subgroup numbers raised questions about the quality of the reported data and the Bureau's decision to report these data as a part of its release of the SF-1 data. Although the specific questions about the Hispanic subgroup data differed from those identified in our review of the Bureau's efforts to collect and report data on the homeless and others without conventional housing, a common cause of both sets of problems was the Bureau's lack of agencywide guidelines for its decisions on the level of quality needed to release data to the public. As we recommended in our report on homeless counts, the Bureau needs to develop well-documented guidelines that spell out how to characterize any limitations in the data, and when it is acceptable to suppress these data. The Bureau should also ensure that these guidelines are documented, transparent, clearly defined, consistently applied, and properly communicated to the public.

Recommendations for Executive Action:

To ensure that the 2010 Census will provide public data users with more accurate information on specific Hispanic subgroups, we recommend that the Secretary of Commerce ensure that the Director of the U.S. Census Bureau implements Bureau plans to research the Hispanic question, taking steps to properly test the impact of the wording, format, and sequencing on the completeness and accuracy of the data on Hispanic subgroups and Hispanics overall. In addition, as we also recommended in our companion report on the homeless and others without conventional housing, we recommend that the Bureau develop agencywide guidelines governing the level of quality needed to release data to the public, when and how to characterize any limitations, and when it is acceptable to delay or suppress data.

Agency Comments and Our Evaluation:

The Secretary of Commerce forwarded written comments from the U.S. Census Bureau on a draft of this report (see app. I).
The Bureau agreed with our conclusions and recommendations and, as indicated in the letter, is taking steps to implement them. However, it expressed several general concerns about our findings. The Bureau's principal concerns and our response are presented below. The Bureau also suggested minor wording changes to provide additional context and clarification. We accepted the Bureau's suggestions and made changes to the text as appropriate.

The Bureau took exception to our findings concerning the adequacy of its data quality guidelines, noting that it "conducted the review of the data on the Hispanic origin population using standard review techniques for reasonableness and quality." We do not question the Bureau's commitment to presenting quality data. Rather, our point is that the Bureau needs to translate its commitment to quality into well documented, transparent, clearly defined guidelines to provide a basis for consistent decision making on the level of quality needed to release data to the public, and on when and how to characterize any limitations. During our review, Bureau officials, including the Associate Director for Methodology and Standards, told us that the Bureau had few written guidelines, standards, or procedures related to the quality of data released to the public.

A second general concern expressed by the Bureau dealt with our characterization of problems with the Hispanic subgroup counts. The Bureau said that the data met an acceptable level of quality because they accurately reflect what people reported and therefore cannot be characterized as erroneous. We agree with the Bureau on this specific point. However, we take a broader view of data quality. Specifically, we believe that questions about the accuracy of the Hispanic subgroup data must also take into account problems that the respondents had in understanding the meaning of the question.
The Bureau challenged our assertion that the wording of the question "confused" some respondents, preferring to say that some respondents may have "interpreted" the question wording, instructions, and examples differently than expected. We agree with the Bureau that additional research will be required to understand the extent of this problem. Nevertheless, we believe there is sufficient evidence from the Bureau's subsequent research and from analysis of trends in the data to support our concerns about the accuracy of Hispanic example subgroup counts in the 2000 Census.

As agreed with your office, unless you publicly announce its contents earlier, we plan no further distribution of this report until 30 days from its issue date. At that time, we will send copies of this report to the Chairman of the House Committee on Government Reform, the Secretary of Commerce, and the Director of the U.S. Census Bureau. Copies will be made available to others on request. This report will also be available at no charge on GAO's home page at http://www.gao.gov.

Please contact me at (202) 512-6806 or by e-mail at daltonp@gao.gov if you have any questions. Other key contributors to this report were Robert Goldenkoff, Christopher Miller, Elizabeth Powell, Timothy Wexler, Ty Mitchell, Benjamin Crawford, James Whitcomb, Robert Parker, and Michael Volpe.

Signed by Patricia A. Dalton:

Patricia A. Dalton
Director
Strategic Issues:

[End of section]

Appendixes:

Appendix I: Comments from the Department of Commerce:

THE SECRETARY OF COMMERCE
Washington, D.C. 20230:

Ms. Patricia A. Dalton
Director, Strategic Issues
General Accounting Office
Washington, DC 20548:

Dear Ms. Dalton:

The Department of Commerce appreciates the opportunity to comment on the General Accounting Office draft report entitled Decennial Census: Methods for Collecting and Reporting Hispanic Subgroup Data Need Refinement. The Department's comments on this report are enclosed.

Donald L. Evans:

Enclosure:

Comments from the U.S. Department of Commerce U.S. Census Bureau:

U.S. General Accounting Office draft report entitled Decennial Census: Methods for Collecting and Reporting Hispanic Subgroup Data Need Refinement:

General Comments on the Report:

While the U.S. Census Bureau agrees with the General Accounting Office's (GAO) recommendations in this report, we take exception to the GAO's suggestion that decisions regarding the release and characterization of data on detailed Hispanic origin groups were based on anything other than our consistent commitment to clearly presenting data that conform with established guidelines for data quality.

The Census Bureau conducted the review of the data on the Hispanic-origin population using standard review techniques for reasonableness and quality. These quality decisions are based upon comparisons to independent work and findings from experts outside the Census Bureau, other surveys, analysis of trends, literature reviews, and consultations with experts (both public and private) throughout the decade. When data do not meet an acceptable level of quality, the Census Bureau will consider various options for modifying publication plans and determine the most appropriate way to disseminate these data.

With regard to the data on detailed Hispanic-origin groups, we determined that it was entirely appropriate to present these data in our data products. Those products accurately reflect what people reported on their forms or to a census enumerator. Also, it should be noted that data obtained from the census question on ethnicity are the result of self-identification and, therefore, should not be characterized as "erroneous" (as compared with results from the 1990 census), nor should they be subject to suppression, except under highly unusual circumstances that are clearly not present here.
Additional research will be required to understand the extent to which the question wording and format influenced some people to report a more general response rather than a specific Hispanic ethnicity. But it is important to acknowledge that, in Census 2000, more people of Hispanic ethnicity may have preferred to identify generally as Hispanic, Spanish, or Latino than in previous censuses. Furthermore, to understand the reasons for differences in totals for detailed Hispanic groups between the 1990 and 2000 censuses, results from both censuses must be analyzed. For example, the use of examples in 1990 may have influenced more people to report in the groups that were listed and fewer to report in other detailed groups. Alternatively, those whose groups were not listed may have reported more generally as Hispanic. The appropriate conclusion is that the results of the two censuses are different, not that one is more accurate than the other.

The Census Bureau is undertaking a review of its data quality guidelines, independent of the GAO's findings in this report.

Comments on the Text of the Report:

1. Section: Highlights page: "A key factor behind the Bureau's release of the questionable subgroup data was its lack of adequate guidelines governing the quality needed before making data publicly available."

Comment: As noted above, the Census Bureau conducts its data reviews using standard review techniques for reasonableness and quality. When data do not meet an acceptable level of quality, the Census Bureau will consider various options for modifying publication plans and determine the most appropriate way to disseminate the data. When we publish the data, we note any deficiencies and cautions in a section of the product documentation called "User Updates" and/or on our Web site.
2. Section: Page 3, second paragraph, third and sixth sentences: "Bureau evaluations conducted following the census show that dropping the examples of Hispanic subgroups confused some respondents and produced less-than-accurate subgroup data." ". . . because of respondents' confusion over the wording of the question, the subgroup data could be misleading."

Comment: In some cases, respondents may have interpreted the question wording, instructions, and examples differently than we might have expected. This does not mean the respondents were confused, but would indicate that additional research and testing will be required to more fully understand these interactions.

3. Section: Page 6, first paragraph, first sentence: "Although not required by OMB standards, Hispanic subgroup data are also used for many of these same purposes."

Comment: The sentence should be revised as follows: "Although not required by OMB standards or federal legislation, Hispanic subgroup data . . ."

4. Section: Page 9, heading: "Efforts to Simplify Questionnaire Led Bureau to Delete List of Hispanic Subgroups."

Comment: Heading should read "Efforts to Simplify Questionnaire Led Bureau to Delete Examples of Hispanic Subgroups," because we use three specific subgroups (Mexican, Puerto Rican, and Cuban) as response categories.

5. Section: Page 15, first paragraph, last part of the first sentence: ". . . it did not specifically design any tests to determine the impact of the changes on the quality of Hispanic subgroup data."

Comment: The Census Bureau did look at the impact of changes on Hispanic subgroups. However, the sample size in the test was not large enough to detect statistically significant differences for the Hispanic subgroups that comprise the "Other Spanish/Hispanic/Latino" population. Additionally, the test was not designed to detect the impact of each change to the question separately.
6. Section: Page 15, first bullet: "1992 National Census Test, which was a field test of the 2000 Census questionnaire;"

Comment: This test was not a test of the actual questionnaire(s) used in Census 2000. The bullet item should be revised to indicate that this was a test of potential Census 2000 questionnaires.

7. Section: Page 15, last part of the last sentence: ". . . but did not conduct any specific evaluations of the quality of the 1990 Hispanic subgroup data."

Comment: The Census Bureau did examine the data for those Hispanic subgroups that were response categories on the 1990 census questionnaire.

8. Section: Page 17, first paragraph, third sentence: "For the first time, the Bureau released data on Hispanic subgroups as a part of its release of SF-1 data even though it had not fully tested the impact of questionnaire changes on the subgroup data and provided little discussion of the potential limitations of the data."

Comment: This sentence appears to be erroneous and should be deleted. The Census Bureau released data on detailed Hispanic subgroups in the sample 1990 summary files. (The data for detailed subgroups were coded only from the sample forms in 1990.) We conducted extensive testing of the wording for this question, including the instructions and examples, prior to Census 2000. Further, our review of these data from Census 2000 did not indicate any evidence of an "error" (for example, a data processing or data collection error) that would have precluded their dissemination. Subsequent evaluations have shown that additional research is needed to study how individuals choose the responses they write in.
9. Section: Page 18, first paragraph, last sentence: "Thus, while the Bureau reported what respondents checked off on their questionnaires, because of respondents' confusion over the wording of the question, the 2000 subgroup data could be misleading."

Comment: Same comment as in Item 2 above: In some cases, respondents may have interpreted the question wording, instructions, and examples differently than we might have expected. This does not mean the respondents were confused, but would indicate that additional research and testing will be required to more fully understand these interactions.

10. Section: Page 22, entire page.

Comment: Regarding the issues addressed on this page, we refer the reader to our general comments on the report and also to our response to Recommendation 2.

Responses to GAO Recommendations:

Recommendation 1: The Census Bureau should implement its plans to conduct further research on the Hispanic question, taking steps to properly test the impact of any changes on the quality of data on Hispanic subgroups and Hispanics overall.

Census Bureau Response: The Census Bureau concurs with this recommendation. This work is underway as part of the research and testing program for the 2010 census.

Recommendation 2: The Census Bureau should develop agency-wide protocols that provide guidelines for bureau decisions on the level of quality needed to release data to the public, how to characterize any limitations in the data, and when it is acceptable to delay or suppress the data.

Census Bureau Response: The Census Bureau concurs with this recommendation. In order to continue to maintain its long tradition of producing high-quality data, the Census Bureau has asked the Methodology and Standards Council to review our statistical and quality guidelines for surveys and censuses and codify them in one place.
[End of section]

Related GAO Products:

Decennial Census: Methods for Collecting and Reporting Data on the Homeless and Others without Conventional Housing Need Refinement. GAO-03-227. Washington, D.C.: January 17, 2003.

2000 Census: Refinements to Full Count Review Program Could Improve Future Data Quality. GAO-02-562. Washington, D.C.: July 3, 2002.

2000 Census: Coverage Evaluation Matching Implemented As Planned, but Census Bureau Should Evaluate Lessons Learned. GAO-02-297. Washington, D.C.: March 14, 2002.

2000 Census: Best Practices and Lessons Learned for a More Cost-Effective Nonresponse Follow-Up. GAO-02-196. Washington, D.C.: February 11, 2002.

2000 Census: Coverage Evaluation Interviewing Overcame Challenges, but Further Research Needed. GAO-02-26. Washington, D.C.: December 31, 2001.

2000 Census: Analysis of Fiscal Year 2000 Budget and Internal Control Weaknesses at the U.S. Census Bureau. GAO-02-30. Washington, D.C.: December 28, 2001.

2000 Census: Significant Increase in Cost Per Housing Unit Compared to 1990 Census. GAO-02-31. Washington, D.C.: December 11, 2001.

2000 Census: Better Productivity Data Needed for Future Planning and Budgeting. GAO-02-4. Washington, D.C.: October 4, 2001.

2000 Census: Review of Partnership Program Highlights Best Practices for Future Operations. GAO-01-579. Washington, D.C.: August 20, 2001.

Decennial Censuses: Historical Data on Enumerator Productivity Are Limited. GAO-01-208R. Washington, D.C.: January 5, 2001.

2000 Census: Information on Short-and Long-Form Response Rates. GAO/GGD-00-127R. Washington, D.C.: June 7, 2000.

FOOTNOTES

[1] U.S. General Accounting Office, Decennial Census: Methods for Collecting and Reporting Data on the Homeless and Others without Conventional Housing Need Refinement, GAO-03-227 (Washington, D.C.: Jan. 17, 2003).
[2] The Bureau, in accordance with Office of Management and Budget Federal Statistical Policy Directive 15, Race and Ethnic Standards for Federal Statistics and Administrative Reporting, collects data on two ethnicities: Hispanic origin and not of Hispanic origin. We use the same definition in this report. Additionally, the standards call for self-reporting of race and ethnicity rather than identification based on scientific or anthropological standards. The standards also cover reporting on race and ethnicity in administrative reports and for civil rights monitoring. They also specify that the data are not to be used for determining program eligibility.

[3] 42 U.S.C. 1973aa-1a.

[4] Public Law 94-311 requires the collection of data on "Americans of Spanish origin or descent." OMB Federal Statistical Policy Directive 15 states that collection of data on Hispanic subgroups is optional, as long as the collection of these data does not harm efforts to collect accurate data on the number of Hispanics.

[5] These figures represent the net Hispanic undercount, which is the difference between the estimated Hispanic population per the Bureau's Accuracy and Coverage Evaluation Survey and the census count.

[6] The Census Bureau did look at the impact of changes on Hispanic subgroups. However, the sample size in the test was not large enough to detect statistically significant differences for the Hispanic subgroups that constitute the "Other Spanish/Hispanic/Latino" population. Additionally, the test was not designed to detect the impact of each change to the question separately.

[7] A group of more than 30 agencies that represent the many and diverse federal needs for data on race and ethnicity, including statutory requirements for such data.

[8] This study was conducted in English only. Because a sizable number of Hispanics only speak Spanish, the results of this study cannot be generalized to the Hispanic population at large.
[9] The ACS is designed to provide annual data for areas with populations of 65,000 or more and multiyear averages for smaller geographic areas. The ACS is also intended to replace the long-form Census questionnaire.

[10] GAO-03-227.

GAO's Mission:

The General Accounting Office, the investigative arm of Congress, exists to support Congress in meeting its constitutional responsibilities and to help improve the performance and accountability of the federal government for the American people. GAO examines the use of public funds; evaluates federal programs and policies; and provides analyses, recommendations, and other assistance to help Congress make informed oversight, policy, and funding decisions. GAO's commitment to good government is reflected in its core values of accountability, integrity, and reliability.

Obtaining Copies of GAO Reports and Testimony:

The fastest and easiest way to obtain copies of GAO documents at no cost is through the Internet. GAO's Web site (www.gao.gov) contains abstracts and full-text files of current reports and testimony and an expanding archive of older products. The Web site features a search engine to help you locate documents using key words and phrases. You can print these documents in their entirety, including charts and other graphics.

Each day, GAO issues a list of newly released reports, testimony, and correspondence. GAO posts this list, known as "Today's Reports," on its Web site daily. The list contains links to the full-text document files. To have GAO e-mail this list to you every afternoon, go to www.gao.gov and select "Subscribe to daily E-mail alert for newly released products" under the GAO Reports heading.

Order by Mail or Phone:

The first copy of each printed report is free. Additional copies are $2 each. A check or money order should be made out to the Superintendent of Documents. GAO also accepts VISA and Mastercard. Orders for 100 or more copies mailed to a single address are discounted 25 percent.
Orders should be sent to: U.S. General Accounting Office 441 G Street NW, Room LM Washington, D.C. 20548: To order by Phone: Voice: (202) 512-6000: TDD: (202) 512-2537: Fax: (202) 512-6061: To Report Fraud, Waste, and Abuse in Federal Programs: Contact: Web site: www.gao.gov/fraudnet/fraudnet.htm E-mail: fraudnet@gao.gov Automated answering system: (800) 424-5454 or (202) 512-7470: Public Affairs: Jeff Nelligan, managing director, NelliganJ@gao.gov (202) 512-4800 U.S. General Accounting Office, 441 G Street NW, Room 7149 Washington, D.C. 20548:
