Statscan's National Household Survey Is Not a 'Small' Data Problem

07/04/2013 09:09 EDT | Updated 09/03/2013 05:12 EDT

Statistics Canada should pride itself on releasing first-class data to the public and decision makers.

Instead, the national statistical agency is minimizing the impact of the federal government's 2010 decision to abandon the mandatory long-form census and replace it with the voluntary National Household Survey (NHS), held for the first time in 2011.

In the view of Statistics Canada's chief statistician, Wayne Smith, the increased volatility and the loss of data caused by this decision are largely not a cause for concern. As he explained to the Globe and Mail, the problem is confined to "small populations and small areas, particularly small populations in small areas."

Unfortunately, the small standards recently adopted by Statscan carry anything but small consequences. In fact, they should raise much more than small concerns.

Although not of the same calibre as data from a mandatory census, the 2011 NHS data is, of course, not completely useless. Yes, it is spotty, but it is very unlikely to be flawed or erroneous. It is nonetheless incomplete, and there is no reason to be satisfied with data marked by various caveats when we previously had a better tool at our disposal.

There are two reasons why policy makers and academics are right to worry about the 2011 NHS data.

The first is the main reason researchers generally interpret with great caution any conclusion drawn from a voluntary data-gathering process. When the people who refuse to respond share common characteristics (ethnicity, income level, religion, etc.), a selection problem may arise. This in turn is likely to introduce biases into the analysis of the data collected. Mr. Smith claims that such biases can be corrected by combining the NHS results with administrative records and previous studies (ironically, those anchoring studies include the old mandatory census). Even if this operation were to neutralize all potential problems, it would remain a costly procedure. In fact, it would account for a large portion of the additional $22 million the NHS will cost Canadians compared with the long-form census, Statistics Canada's former chief statistician suggested in an interview with Maclean's earlier this year.

The second problem is directly linked to what Mr. Smith has described as the "small populations in small areas" problem.

Those who differ from the majority in their education level, income, nationality or some other characteristic are often the target of policy initiatives aimed at raising their living standards or addressing their particular needs. To design such policies, those needs and characteristics must first be identified through academic and public research, often conducted with census data. No wonder a whole community of philanthropists, advocacy groups, academics, scientists and statisticians stood up against the census reform.

Researchers should "get over" their nostalgia for the mandatory long-form census, suggests Statistics Canada. There is, however, no reason to get over the fact that Canada is purposely giving up on a tool that helps design better public policies, even when those policies are meant to address the needs of a small fraction of the population. Good data produces good policies, and by closing the door to better data we are turning our backs on better policies.

That is small vision.

