This guide provides a starting point for researchers who are new to using numeric data. It includes basic resources for locating data and using statistical software.
Tips for finding data
- Define your goals - this will help you decide where to look. Are you looking for summary statistics or raw/micro data to do your own analysis? What geographic region and level of geography do you need? What time period are you interested in? Do you need a single point in time or a time series?
- Think about who might have collected the data you're looking for, such as governmental bodies, organizations, business/trade groups, or commercial entities, and then check what data they make available.
- Performing a literature search is a useful technique for identifying datasets and research methodologies in your discipline. You can see what data other researchers have used or collected.
Quick Data Definitions
Summary-Level Data: Summary-level data are published data points in either print or electronic format. You would use summary-level data if you were looking for a quick statistic such as the unemployment rate for the current month or if you wanted to see a table of statistics, such as GDP for various countries during a specific time period.
Micro-Level Data: Micro-level data files are the numerically coded results of individual responses to instruments such as census questionnaires and public opinion surveys. They give you much more flexibility to work with the data and run your own statistical analyses on extracted subsets. The data are in an unanalyzed, raw format of columns and rows, usually (but not always) in ASCII format. Some raw data files are accompanied by files in SPSS, SAS, or other statistical software formats for easier use in those packages. If you are working with only the raw data, you must consult the data documentation (codebook) and write a small program, or use an extraction program, to have the computer "read" the data into a usable format.
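As a rough illustration of what such a "small program" might look like, the Python sketch below reads a fixed-width ASCII file using column positions of the kind a codebook lists. The variable names, column positions, value labels, and sample records are all invented for this example; a real data file requires the layout given in its own codebook.

```python
import io

# Hypothetical codebook layout: variable name -> (start column, end column).
# Columns are 1-indexed and inclusive, which is how codebooks commonly list them.
LAYOUT = {
    "respondent_id": (1, 4),
    "age": (5, 6),
    "sex": (7, 7),
    "income": (8, 13),
}

# Hypothetical value labels, as a codebook might define them for a coded variable.
VALUE_LABELS = {"sex": {"1": "male", "2": "female"}}


def parse_line(line, layout=LAYOUT):
    """Slice one fixed-width record into named fields (kept as strings)."""
    record = {}
    for name, (start, end) in layout.items():
        # Convert the codebook's 1-indexed inclusive columns to a Python slice.
        record[name] = line[start - 1:end].strip()
    return record


def read_fixed_width(handle, layout=LAYOUT):
    """Read every non-blank line from an open text file or file-like object."""
    return [parse_line(line.rstrip("\n"), layout) for line in handle if line.strip()]


# Two made-up records standing in for a raw ASCII data file.
SAMPLE = "0001341052000\n0002272 38500\n"

records = read_fixed_width(io.StringIO(SAMPLE))
for record in records:
    # Translate the numeric code into its label; fall back to the raw code.
    record["sex_label"] = VALUE_LABELS["sex"].get(record["sex"], record["sex"])
    print(record)
```

To use this on a real file, replace `io.StringIO(SAMPLE)` with `open("yourfile.dat")` and rewrite `LAYOUT` to match the codebook. Statistical packages such as SPSS, SAS, Stata, and R offer their own ways to read fixed-width data, so a hand-written reader like this is only one option.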
Data Documentation/Codebooks: Codebooks provide information on the structure, content, and layout of a data file and the questionnaire, if any, used for the survey or study. Many codebooks are available electronically with the data file.
LinkedIn Learning @ Harvard
Provides Harvard students and employees with over 15,000 on-demand courses on computer software, business skills, and creative skills from industry experts.
Inter-University Consortium for Political and Social Research (ICPSR)
FAQ page provides new users with explanations of data terminology, data formats and ways to search, access and download datasets. It also links to video tutorials.