POLITICS OF DATA – POLS 322/422 – FALL 2020
Final Project Guidelines
In this class, we have worked to understand how humans collect, curate, analyze, and interpret data. At the same time, we have studied how data-driven politics and policy-making shape us and our behavior. For your final project, you will focus on the politics of collecting data and on your role as a “data actor.” While this project has some similarities to the midterm, in this case you will be collecting your own data. You should answer the questions in the guidelines as completely as possible, although, depending on your topic, you may spend more time on some of them than on others. You should also think of this project as a final exam on the material you have studied this semester: class texts must be used in your paper to frame your responses.
Due Date: Tuesday, December 8, 2020, by 4:00 pm (Eastern Time). Please email me your paper; do not post it on Course Site. Length: 8-10 pages, double-spaced, in a standard font. Include a bibliography and your data set, the latter on a separate page or as an appendix; neither counts toward the page limit. For references, use parenthetical citations, footnotes, or a combination of these as appropriate.
Guidelines:
In this project, you’ll produce an original data set that can help answer a question of interest to you, and you’ll write a paper that documents the assumptions and decisions that inform your process of creating the data set. The subject of the data set must be clearly related to politics or political science. I am asking you to take unstructured data (data that has not previously been collected into a data set or organized into rows, columns, or another easily searchable structure) and turn it into structured data in the form of a table or spreadsheet that could help to answer your question.
To build your data set, you might consider a wide range of unstructured data types: text files and documents from which you extract information of interest; social media data from a variety of platforms, including blogs; images and video files; audio files; or other sources. When you turn the project in, the data set should be presented in tabular or spreadsheet form and must have at least one hundred cells. It could be ten rows by ten columns, twenty rows by five columns, or any other configuration that totals at least one hundred cells. The data set you submit can include both quantitative data (numbers) and categorical data (descriptive names or labels).
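If you keep your data set as a CSV file, the size requirement can be checked with a few lines of Python. The following is only a minimal sketch: the function names and the sample data are hypothetical, and it assumes that the header row is not counted toward the total, which the guidelines above do not specify.

```python
import csv
import io

MIN_CELLS = 100  # minimum size stated in the guidelines above


def count_cells(csv_text, has_header=True):
    """Return (rows, columns, cells) for CSV text.

    Assumption: the header row, if present, is excluded from the count.
    """
    # Parse the text and drop completely empty lines.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if has_header and rows:
        rows = rows[1:]
    n_rows = len(rows)
    n_cols = max((len(row) for row in rows), default=0)
    n_cells = sum(len(row) for row in rows)
    return n_rows, n_cols, n_cells


def meets_minimum(csv_text, has_header=True):
    """True if the data rows contain at least MIN_CELLS cells."""
    return count_cells(csv_text, has_header)[2] >= MIN_CELLS


# Hypothetical example: a header plus twenty rows of five columns.
sample = "id,a,b,c,d\n" + "\n".join(f"{i},1,2,3,4" for i in range(20))
size = count_cells(sample)      # (20, 5, 100)
ok = meets_minimum(sample)      # True
```

To check a real file, read its contents into a string first and pass that string to `count_cells`.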
Before starting to collect data, ask: what question am I seeking to answer with a new data set? Without a clear question, you will find it difficult to settle on what data you want to collect or how to collect it. In addition, you should perform what Khanjan Mehta, Lehigh’s Vice Provost for Creative Inquiry, calls the “Five-Minute Google Test”: spend five minutes googling to see whether a source of structured data already exists that could answer your question. If you find one, choose a different question. I want you to use your creativity and insight to build a new data set, not simply to aggregate or duplicate sources that are already “out there.”
General Format for the Accompanying Paper
You do not need to follow this format rigidly, but you should attempt to incorporate these points into your paper as relevant. Be sure to ground your analysis in class texts and cite sources as appropriate.
Introduction and first section(s):
Describe the subject you are interested in, set out your question, and explain why answering your question is important. What will be gained by creating a data set to answer this question, as opposed to using other types of sources? Describe any existing sources that are relevant to your question but insufficient to answer it, as well as the sources you have located and will use to build your data set. Why do you think people have not yet collected the data that could help to answer your question? Are you facing a “data deluge” or a lack of useful sources?
Subsequent section(s):
Explain how you are extracting useful data from your sources. How do you create meaningful context to collect and ultimately understand your data? Explain how you select the categories that will shape your collection and presentation of data, and how you are assigning your data to these categories. This may involve explaining how you turn qualitative sources into quantitative data (numbers) or how you create scales or rankings for your observations. Are the indicators for which you are collecting data hard to measure or easy? Why? If they are hard, what could you – or did you – do to make measurement easier? What are the implications of omitting indicators that are difficult to measure? What did you do with ambiguous observations?
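One way to make coding decisions like these explicit is to write them down as a small codebook. The sketch below is purely illustrative: the labels, the three-point scale, and the function names are hypothetical and are not a required scheme. It shows qualitative labels being turned into numbers, with ambiguous observations flagged rather than forced into a category.

```python
# Hypothetical codebook: maps a qualitative label to a point on an ordinal scale.
CODEBOOK = {"oppose": -1, "neutral": 0, "support": 1}


def code_observation(label):
    """Return (score, status) for one qualitative label.

    Labels not found in the codebook get a score of None and the
    status "ambiguous" so the decision stays visible in the data set.
    """
    key = label.strip().lower()
    if key in CODEBOOK:
        return CODEBOOK[key], "coded"
    return None, "ambiguous"


def code_all(labels):
    """Apply the codebook to a list of labels, preserving order."""
    return [code_observation(label) for label in labels]


# Hypothetical observations drawn from an unstructured source.
observations = ["Support", " oppose ", "it depends"]
coded = code_all(observations)
# coded == [(1, "coded"), (-1, "coded"), (None, "ambiguous")]
```

Keeping a separate "ambiguous" status, instead of guessing a score, leaves a record of exactly which observations did not fit your categories, which you can then discuss in your paper.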
Analysis and conclusions:
Evaluate the quality of knowledge that you have produced with your data set, taking into account how your data represent a simplification of the world, your construction of the boundaries of categories and concepts, and your ability to locate or produce relevant data. How or where did you struggle most in collecting data? What would an “ideal” data set have looked like, and what obstacles stood in the way of creating it? Is your data set ultimately a good representation of the “reality” that you want to address with your question?