CEM platform to allow open access to demographic census microdata from 1960 onward

DataCEM is a single platform where it is possible to find the microdata for all demographic censuses

Janaína Simões

The Center for Metropolitan Studies (CEM-Cepid/Fapesp) is launching a new platform that provides access to microdata from the demographic censuses carried out by the Brazilian Institute of Geography and Statistics (IBGE) between 1960 and 2010. The platform aims to make this knowledge available to society and to reduce the cost of processing census data. By making microdata available, DataCEM allows researchers to produce their own tables.

Microdata are databases that contain as much detail as possible about the information collected. In the case of the census, these are the records of individuals and households interviewed for the census. Microdata can be aggregated to obtain information at different scales, as desired by the user: districts, municipalities, microregions, mesoregions, states, etc. The platform contains not only original variables, but also harmonized and standardized versions that allow users to maximize the comparability of the historical series.
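As a rough illustration of what aggregating microdata to different scales means, the sketch below sums a sampling weight over person records at two geographic levels. The field names (`state`, `municipality`, `weight`) and all values are invented for the example, and the use of a per-record weight is an assumption about how sample microdata are typically expanded; none of this reflects DataCEM's actual variable names.

```python
from collections import defaultdict

# Hypothetical person records; field names and values are invented for illustration.
records = [
    {"state": "SP", "municipality": "São Paulo", "weight": 10.0},
    {"state": "SP", "municipality": "São Paulo", "weight": 12.5},
    {"state": "SP", "municipality": "Campinas", "weight": 8.0},
    {"state": "RJ", "municipality": "Niterói", "weight": 9.5},
]


def aggregate_population(rows, level):
    """Sum the sampling weights of all records within each unit of `level`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[level]] += row["weight"]
    return dict(totals)


# The same records yield estimates at whichever scale the user chooses.
by_municipality = aggregate_population(records, "municipality")
by_state = aggregate_population(records, "state")
```

The same function serves any level that is stored as a field on the record, which is the practical advantage of working from individual records rather than from pre-built tables.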

The development of DataCEM was coordinated by CEM researcher Rogério Barbosa. “We have created a platform on which it is possible to consult and extract microdata from the census. We have databases with individual information, but with guaranteed anonymity and confidentiality, without any kind of identifier”, he emphasizes.

(Photo: Agência IBGE)

DataCEM's target audience is researchers, especially those who seek to study and understand socioeconomic, political and demographic processes in Brazil. “These are data of a very technical nature, and anyone who uses them needs to know a little statistics and be familiar with some data analysis software”, he says. The platform is the product of the work and knowledge accumulated by CEM’s highly trained researchers.

The first novel feature offered by DataCEM is the dissemination of older editions of the censuses, from 1960 to 1991 – the IBGE’s own portal only provides microdata from the two most recent censuses. The 1960 Census sample, the rarest, underwent extensive review and measures to ensure consistency. It contains information for all administrative units; it thus differs from other public databases (such as that of IPUMS, the data consortium of the University of Minnesota).

“DataCEM is more complete; we can say it is more representative of what happens in municipalities and in specific areas within municipalities. It ends up being a single platform where you can find all the IBGE demographic censuses for which there are microdata during the period from 1960 to 2010”, he says.

Furthermore, IBGE distributes census information in an extremely technical format, which can be difficult to understand and access for research, especially for those who are not very experienced with quantitative data. “At DataCEM, we have already pre-processed the census data and we only make available the variables that are needed”, he says. DataCEM also helps the user in computational terms, since the file containing a census can be up to 20 gigabytes in size. Statistical software usually opens an entire census file at once and holds it in the memory of the user's computer.
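To make the memory point concrete, here is a minimal sketch of reading a large file one record at a time and keeping only the columns an analysis needs, so the whole file is never held in memory. It assumes a CSV layout with a header row, which the article does not state; the column names and the `stream_columns` helper are hypothetical.

```python
import csv
import io


def stream_columns(lines, wanted):
    """Yield one dict per record, keeping only the columns listed in `wanted`."""
    reader = csv.DictReader(lines)
    for row in reader:
        yield {name: row[name] for name in wanted}


# In real use `lines` would be an open file handle; a small in-memory
# sample with invented columns stands in for it here.
sample = io.StringIO(
    "age,sex,municipality,income\n"
    "34,F,A,1200\n"
    "51,M,B,800\n"
    "19,F,A,0\n"
)

total_age = 0
count = 0
for record in stream_columns(sample, ["age", "municipality"]):
    # Only one record is in memory at a time.
    total_age += int(record["age"])
    count += 1

mean_age = total_age / count
```

Selecting variables before download, as the platform does, achieves the same effect earlier in the process: the user never has to handle the columns that are not needed.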

Another advantage of using the platform is the offer of more variables than are used by the IBGE. These are called harmonized variables. They involve the standardization of the codes and categories used over time, as well as conceptual harmonization.

The platform also has WikiDados, a repository of reference information and explanatory texts. WikiDados is an offshoot of some of the technical reports produced by the CEM research team. It explains the process of standardizing variables, provides an overview of what is included in each census, and offers a guide for using the data. "It's like a little Wikipedia, an encyclopedia about DataCEM", says Barbosa. There is a glossary of key concepts used by the IBGE in the census, articles and summaries on various areas, explanations of the possibilities for harmonization, and even guidelines for researchers on how to adapt the data for specific purposes.

Knowledge transfer

DataCEM was born through the efforts of a team of six people, coordinated by Barbosa, which was responsible for collecting and processing data used by the researchers who contributed chapters to the book “Trajectories of Inequalities: How Brazil Has Changed in the Last 50 Years”, published by CEM and edited by political scientist Marta Arretche, Director of the Cepid-Fapesp. At this stage of producing the book, the work of this group was called the Census Project.

To organize these data, statistical models and tables were produced which needed to have the same profiles and definitions and to always be comparable. “We set up this taskforce that sought to standardize and harmonize data from the various editions of the demographic census, a necessary task because census interviewers do not always collect information in the same way, or they might address the same topic, but with variations”, he explains.

One example of this concerns the theme of work. In the 1960 and 1970 Censuses, someone who did volunteer work for fewer than 15 hours a week was not considered “employed”, but was counted as economically active. The 1980 Census, on the other hand, includes these people as employed. “Small variations of this kind can make a data series inconsistent. What our team did was to identify these cases and ensure standardization and harmonization”, he tells us. “Initially we did this work to help with the book ‘Trajectories’, with information for the chapters of the authors, who are specialists in their respective areas. This made our work consistent”, he adds.
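The employment example can be sketched as a small recoding function. This is an illustrative rule only, not CEM's actual harmonization: it arbitrarily adopts the 1960/1970 definition as the common one, and the function name, parameters and category labels are all hypothetical. Only the 1980 Census is listed as counting these volunteers as employed, because that is the only year the text mentions.

```python
# Census years that, according to the text, count short-hours volunteers as
# employed. Other years are not stated in the article, so they are not listed.
YEARS_COUNTING_SHORT_VOLUNTEERS = {1980}

# Threshold mentioned in the text: fewer than 15 hours of work per week.
SHORT_HOURS_LIMIT = 15


def harmonize_employment(year, raw_status, is_volunteer, weekly_hours):
    """Return a status comparable across years (hypothetical rule).

    Records from years that count short-hours volunteers as employed are
    recoded to the 1960/1970 treatment; every other record keeps the
    status originally recorded.
    """
    if (
        year in YEARS_COUNTING_SHORT_VOLUNTEERS
        and raw_status == "employed"
        and is_volunteer
        and weekly_hours < SHORT_HOURS_LIMIT
    ):
        return "active_not_employed"
    return raw_status
```

For instance, `harmonize_employment(1980, "employed", True, 10)` returns `"active_not_employed"`, while the same person working 20 hours a week stays `"employed"`. Whichever definition is chosen as the common one, the point is that it is applied identically to every census year.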

After the publication of the book, CEM already had a large database that, up to then, had only been used by the researchers themselves. “Then came the idea of making this work available to the wider public, so that researchers outside CEM would not need to do this work again. It was a way of transferring the knowledge produced by CEM to society”, he points out.

“With DataCEM, we are reducing entry costs for using demographic censuses. If every researcher needs to harmonize the data, it can take a lot of time and energy. Furthermore, without standardization, your analyses can reach different conclusions due more to variations in operational decisions than to empirical changes in the social phenomenon that have actually occurred”, he concludes. To access DataCEM: http://200.144.244.241:3004/.


Press contact
Janaína Simões
E-mail: imprensa.cem@usp.br
Phone: 55 (11) 3091-2097 / 55 (11) 99903-6604