Using Existing Data

I.                    Introduction to secondary data analysis.

A.     Secondary data analysis involves obtaining and analyzing data from existing data sets or from previously published records and reports.  It is not simply a summary of other records or reports, but a fresh analysis of the previously collected data.  Secondary data analysis often focuses on the characteristics of larger social collectives such as organizations, institutions, communities, cities, states, or countries.  However, secondary data about individuals can also be analyzed.

B.     Types of secondary data.

                                                1.      Individual vs. aggregate data.

                                                2.      Textual reports and records.

                                                3.      Quantitative data sets.

                                                4.      Paper vs. electronic data.

C.     The popularity of secondary data analysis.

                                                1.      The wealth of available data.

                                                2.      Using existing data is often easier and less expensive than collecting new data.


II.                 Doing secondary data analysis.

A.     Locating suitable data.

                                                1.      Sources of existing quantitative data.

a.       Official (public) reports, records and statistics.

b.      Census data.

c.       Vital statistics.

d.      Police records.

e.       Educational reports and records.

f.        Organizational reports and records.

i.                     Service and not-for-profit organizations.

ii.                   Businesses and corporations.

g.       Polls and general surveys.

i.                     Gallup, Harris, media and political polls.

ii.                   NORC and the General Social Survey.

h.       Data collected by other researchers.

i.         Data archives.

                                                2.      Evaluating the data.

a.       Consider the reliability of the report or data set.

i.                     Consider the source.

ii.                   Consider the procedures used to collect the data and produce the data set.

iii.                  Consider the purpose for which the data were collected.

b.      Consider whether the report or data set has valid indicators of the concepts in which you are interested.

i.                     Have the variables been measured directly?

ii.                   Are there approximate indicators?

iii.                  Do the indicators have face and content validity?

B.     Requesting a data set.

                                                1.      Requesting reports and data from public sources.

                                                2.      Requesting reports and data from private sources.

                                                3.      Requesting reports and data from data archives.

                                                4.      Online reports and data.

C.     Obtaining reports and data sets.

                                                1.      Entering data from reports into an electronic data set.

                                                2.      Obtaining existing electronic data sets.

a.       Storage media.

i.                     Hardcopy or paper data.

ii.                   USB flash drives.

iii.                  Tapes.


        Blocks, tracks, and record length.

iv.                 CDs.

v.                   Online data sets.

b.      Data set formats.

i.                     Character sets and formats.

ii.                   ASCII data.

iii.                  Free- vs. fixed-field ASCII files.

iv.                 Cases, records, columns and variables.

v.                   Codebooks and data dictionaries.
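
The difference between fixed- and free-field ASCII files, and the role of the codebook, can be illustrated with a minimal Python sketch. The codebook, the variable names (id, age, sex), the column positions, and the sample records are all hypothetical and chosen only for illustration. In a fixed-field file each variable occupies the columns listed in the codebook; in a free-field file the values are separated by spaces and identified by their order.

```python
# Hypothetical codebook: variable name -> (start column, end column),
# numbered from 1 and inclusive, as codebooks usually list them.
CODEBOOK = {"id": (1, 3), "age": (4, 5), "sex": (6, 6)}


def parse_fixed(record, codebook):
    """Slice one fixed-field record into variables by column position."""
    return {
        name: record[start - 1:end].strip()
        for name, (start, end) in codebook.items()
    }


def parse_free(record, names):
    """Split one free-field record on whitespace; values are matched by order."""
    return dict(zip(names, record.split()))


# The same hypothetical case stored in each format.
fixed_case = parse_fixed("001342", CODEBOOK)
free_case = parse_free("1 34 2", list(CODEBOOK))

print(fixed_case)
print(free_case)
```

Without the codebook, the fixed-field record "001342" cannot be interpreted at all, which is why a data set should always be obtained together with its codebook or data dictionary.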

D.     Reading and cleaning the data.

                                                1.      Using statistical software to read the data set.

                                                2.      Cleaning the data set.

a.       Looking for out-of-bound or improbable values.

b.      Doing consistency checks.

c.       Correcting problems.
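
The cleaning steps above can be sketched in Python as follows. The cases, the variable names, and the allowed age range of 0 to 120 are hypothetical assumptions made for illustration; real bounds and consistency rules should come from the codebook of the data set being cleaned.

```python
# Hypothetical cases; in practice these would be read from the data set.
cases = [
    {"id": 1, "age": 34, "marital": "married", "years_married": 10},
    {"id": 2, "age": 199, "marital": "never married", "years_married": 0},
    {"id": 3, "age": 25, "marital": "never married", "years_married": 4},
]


def out_of_bounds(case, low=0, high=120):
    """Range check: flag an age outside the assumed plausible limits."""
    return not (low <= case["age"] <= high)


def inconsistent(case):
    """Consistency check: a never-married case should not report years married."""
    return case["marital"] == "never married" and case["years_married"] > 0


flagged_range = [c["id"] for c in cases if out_of_bounds(c)]
flagged_logic = [c["id"] for c in cases if inconsistent(c)]

# One possible correction when the original record cannot be consulted:
# treat the out-of-bound value as missing rather than guessing at it.
cleaned = [dict(c, age=None) if out_of_bounds(c) else c for c in cases]

print(flagged_range, flagged_logic)
```

Setting a bad value to missing is only one option. When the original report or record is available, checking it and entering the correct value is preferable.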

E.      Manipulating and analyzing the data.

                                                1.      Recoding data.

                                                2.      Creating indices and scales.

                                                3.      Analyzing the data.
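
Recoding and index construction can be sketched in Python as follows. The age categories, the three trust items, and the respondent's scores are hypothetical, and a simple additive index is only one of several ways to combine items into an index or scale.

```python
def recode_age(age):
    """Recode age in years into three hypothetical categories."""
    if age < 30:
        return "under 30"
    if age < 60:
        return "30-59"
    return "60 or older"


def additive_index(case, items):
    """Build a simple additive index by summing the scores on several items."""
    return sum(case[item] for item in items)


# Hypothetical respondent with three trust items, each scored 1 (low) to 5 (high).
respondent = {"age": 42, "trust_police": 3, "trust_courts": 4, "trust_congress": 1}

age_group = recode_age(respondent["age"])
trust_index = additive_index(
    respondent, ["trust_police", "trust_courts", "trust_congress"]
)

print(age_group, trust_index)
```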