Analysis project A

Develop your own research hypothesis to use in a one-sided two-sample z-test for population proportions. One way to start is to imagine the potential social impacts of a major historical event, such as the Civil War, industrialization, or the Great Migration. Then think about how such an impact might be reflected in quantitative form in the Census. What can be measured depends on which questions were asked in the Census at that time. Search for “index of census questions” to find a breakdown of which questions were asked and when.

Finding a good topic requires working back and forth between history and data availability. For example, consider the Pike’s Peak Gold Rush, which occurred from 1858 to 1861 in parts of what became Kansas and Nebraska. In principle, data from the 1850 and 1860 Censuses could be compared to study the impact of that event. However, Kansas and Nebraska would not appear in either the 1850 or the 1860 Census because they had not yet achieved statehood. So some needle-threading is required.

Explain to a novice how your population proportions would in principle be computed. 

Gather data from the original handwritten records, which are readily accessed through FamilySearch. For each of the two samples, I recommend working from all persons shown on a single sheet of the census, which is usually between 30 and 40 people per sheet. If you are restricting yourself to a smaller group, say, school-age children, then your working sample sizes will of course be smaller.

Describe what you are doing as you present and process the data. Show the reader where and how you got the data using cropped screenshots. Those images will not be self-explanatory; they must be accompanied by text. Write as if you are interested in the subject and the people, and are addressing someone else who is interested, too.

Walk the reader through the steps of the hypothesis test in the context of your data.  As you go, explain how the test progressively answers the question, “How far is far?”
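The steps of the test can be sketched in a few lines of Python. This is only an illustrative sketch, not a required method: the function name `two_proportion_z_test` and the counts passed to it are placeholders invented for the example, not real census data. It assumes the usual pooled standard error under the null hypothesis and the alternative that the first proportion is larger.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """One-sided test of H0: p1 = p2 against Ha: p1 > p2.

    x1, x2 are the counts with the trait in each sample;
    n1, n2 are the sample sizes.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 the two populations share one proportion, so pool the samples.
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error of the difference in sample proportions, assuming H0.
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    # "How far is far?": the observed difference measured in standard errors.
    z = (p1_hat - p2_hat) / se
    # Right-tail area beyond z under the standard normal curve.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Placeholder counts (NOT real census data): 30 of 40 vs. 20 of 40.
z, p_value = two_proportion_z_test(x1=30, n1=40, x2=20, n2=40)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
```

Replace the placeholder counts with the tallies from your own two census sheets, and swap the order of the samples if your alternative hypothesis points the other way.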

Graph the test, labeling all relevant portions.

Explain how the computation of your particular p-value is connected to the null hypothesis.
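One way to see the connection is to write the computation out. The notation here is an assumption made for illustration: $x_1, x_2$ are the counts with the trait in each sample, $n_1, n_2$ are the sample sizes, $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$ are the sample proportions, and $\Phi$ is the standard normal cumulative distribution function. The null hypothesis enters twice: it justifies pooling the two samples into a single estimate $\hat{p}$, and it is the condition under which the tail probability is computed.

```latex
\begin{align*}
H_0 &: p_1 = p_2, \qquad H_a : p_1 > p_2 \\
\hat{p} &= \frac{x_1 + x_2}{n_1 + n_2} \\
z &= \frac{\hat{p}_1 - \hat{p}_2}
          {\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \\
\text{p-value} &= P(Z \ge z \mid H_0) = 1 - \Phi(z)
\end{align*}
```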

As we have stressed, taking a single page from the U.S. Census does not give a “random” sample.  What specific problems do you see in the use of a single page?   How would a random sample of the same size help overcome those issues?  Is it guaranteed to be better? 
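A small simulation can help frame these questions. It is a sketch under invented assumptions, not a model of any real census: the population is made up, each hypothetical page is given its own chance that a person is under 30 (standing in for neighbors and families who resemble one another), and the names `PAGE_SIZE`, `N_PAGES`, and `page_rate` are illustrative. It compares how much the sample proportion varies across whole pages with how much it varies across random samples of the same size.

```python
import random
from statistics import pvariance

random.seed(0)  # fixed seed so the run is repeatable

PAGE_SIZE = 40   # people per hypothetical census page
N_PAGES = 200    # number of hypothetical pages

# Invented population: people on the same page tend to be alike,
# because each page has its own rate of "under 30".
pages = []
for _ in range(N_PAGES):
    page_rate = random.uniform(0.4, 0.9)
    pages.append([random.random() < page_rate for _ in range(PAGE_SIZE)])

everyone = [person for page in pages for person in page]


def proportion(sample):
    """Fraction of the sample that has the trait (True values)."""
    return sum(sample) / len(sample)


# Sample proportion from each whole page vs. from random samples of equal size.
page_props = [proportion(page) for page in pages]
random_props = [
    proportion(random.sample(everyone, PAGE_SIZE)) for _ in range(N_PAGES)
]

var_page = pvariance(page_props)
var_random = pvariance(random_props)
print(f"population proportion:           {proportion(everyone):.3f}")
print(f"variance across single pages:    {var_page:.4f}")
print(f"variance across random samples:  {var_random:.4f}")
```

In this made-up setting the single-page proportions scatter more widely than the random-sample proportions. Any one random sample can still land farther from the population proportion than a particular page does, which bears on the question of whether a random sample is guaranteed to be better.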

Sample hypothesis from the professor: In 1850, the proportion of the census population under age 30 was larger in Ohio than in Virginia.
