Week 8 Notes

Created by Prof. Christopher Kinson


Things Covered in this Week’s Notes


Data workers and their responsibilities

Data Engineer

  • Creates and maintains data storage and access systems

  • Has a strong background in data processing, computer programming, and tool-making

  • May build internal data products, such as data warehouses and automated processes

  • Works in collaboration with data scientists and analysts

  • Someone you may know in this role: Ben Galewsky at the National Center for Supercomputing Applications (NCSA)

Data Architect

  • Designs structures for data access and usage

  • Has a strong background in data structures, computer programming, and database management

  • May design pipelines for data to be used within a company

  • Works in collaboration with company administrators

  • Someone you may know in this role: Ben Galewsky at the National Center for Supercomputing Applications (NCSA)

Data Analyst

  • Produces statistical results and interprets them

  • Has a strong background in data analysis, statistical theory, and communication

  • May write reports and give presentations for clients inside or outside of the company

  • Works in collaboration with data scientists and engineers or clients

  • Someone you may know in this role: Gus Theofanis at State Farm

Data Scientist

  • Searches for, uncovers, and explains underlying features in data

  • Has a strong background in statistics, machine learning, and computer science

  • May build supervised or unsupervised procedures for feature detection and prediction

  • Works in collaboration with data engineers and analysts

  • Someone you may know in this role: Prof. Victoria Stodden at the iSchool


My thoughts on the work of data workers

Thought 1

As data engineers who might work with data to be distributed to the public, we must be very careful to omit information that might identify individuals or their whereabouts.

  • Research areas include: privacy-preserving data analysis, statistical disclosure control, inference control, privacy-preserving data mining, and private data analysis

  • Fields of study or disciplines include: databases, cryptography, computer science, and statistics

  • Articles or research papers: “Differential Privacy” by Cynthia Dwork; “De-anonymizing Social Networks” by A. Narayanan and V. Shmatikov
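To make Thought 1 concrete, here is a minimal sketch in Python of two ideas from the research areas above: removing or coarsening identifying fields before release, and adding Laplace noise to a count, which is one common mechanism from the differential privacy literature. The records, field names, and helper functions (deidentify, laplace_noise, noisy_count) are hypothetical and invented for illustration. Dropping names alone is generally not enough to guarantee anonymity, so treat this as a starting point rather than a complete method.

```python
import random

# Hypothetical records; every name and field here is invented for illustration.
records = [
    {"name": "A. Lee", "zip": "61820", "age": 34, "visits": 3},
    {"name": "B. Cruz", "zip": "61801", "age": 58, "visits": 1},
    {"name": "C. Shah", "zip": "61821", "age": 41, "visits": 5},
]

# Assumption: "name" is the only direct identifier in this toy data set.
DIRECT_IDENTIFIERS = {"name"}


def deidentify(record):
    """Drop direct identifiers and coarsen quasi-identifiers (zip, age)."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["zip"] = cleaned["zip"][:3] + "**"      # keep only the 3-digit prefix
    decade = (cleaned["age"] // 10) * 10
    cleaned["age"] = f"{decade}-{decade + 9}"       # report an age band, not an age
    return cleaned


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(rows, predicate, epsilon, rng):
    """Count matching rows, then add Laplace noise with scale 1/epsilon.

    A count changes by at most 1 when one person is added or removed,
    so the noise scale is 1/epsilon. Smaller epsilon means more noise.
    """
    true_count = sum(1 for r in rows if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)


released = [deidentify(r) for r in records]
rng = random.Random()
answer = noisy_count(records, lambda r: r["visits"] >= 2, epsilon=1.0, rng=rng)
```

The released rows no longer contain names, exact ages, or full ZIP codes, and the reported count varies from run to run so that no single person's presence can be read off exactly.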

Thought 2

When scraping a website, we must confirm that we have permission to use the information and must respect intellectual property rules, such as the site's terms of use and copyright.
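One small, automatable part of this check is reading a site's robots.txt file, which states which paths the site asks crawlers to avoid. The sketch below uses Python's standard urllib.robotparser module. The helper name is_path_allowed, the example rules, and the example.com URLs are assumptions made for illustration. A robots.txt file is not a license: passing this check does not by itself settle terms of use or copyright questions.

```python
from urllib.robotparser import RobotFileParser


def is_path_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

public_ok = is_path_allowed(
    EXAMPLE_ROBOTS, "my-course-scraper", "https://example.com/public/index.html"
)
private_ok = is_path_allowed(
    EXAMPLE_ROBOTS, "my-course-scraper", "https://example.com/private/data.html"
)

# For a live site, the parser can download the file itself:
#   parser = RobotFileParser()
#   parser.set_url("https://example.com/robots.txt")
#   parser.read()
```

Here public_ok is True and private_ok is False, so a polite scraper would skip the /private/ pages.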


Issues in data usage


A data management set of morals

We wield lots of power in our skill set, and we must practice data wrangling with a sense of morality to do no harm and to tell the whole truth.

What will be your set of morals and behaviors when you work with data in the future? What stories and messages will you convey with data? Will you remove observations in order to make a result look better? Will you re-scale a variable to exaggerate the effect of a negative statistic?
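One way to practice "telling the whole truth" when observations are removed is to report the statistic both before and after the removal, along with how many observations were dropped. The Python sketch below illustrates that habit. The function name summarize_with_exclusions, the scores, and the exclusion rule are hypothetical and chosen only for illustration.

```python
from statistics import mean


def summarize_with_exclusions(values, keep):
    """Report a mean with and without excluded observations, plus the counts."""
    kept = [v for v in values if keep(v)]
    return {
        "n_all": len(values),
        "n_kept": len(kept),
        "n_removed": len(values) - len(kept),
        "mean_all": mean(values),
        "mean_kept": mean(kept),
    }


# Hypothetical satisfaction scores on a 1-10 scale.
scores = [8, 9, 7, 2, 9, 1, 8]

# Hypothetical exclusion rule: drop scores below 5.
report = summarize_with_exclusions(scores, keep=lambda s: s >= 5)
```

Dropping the two low scores raises the mean from about 6.3 to 8.2. Showing both numbers, and the count removed, lets the reader judge whether the exclusion was justified.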

For a more fully developed set of morals, read the “Feminist Data Manifest-No,” which is inclusive of multiple communities of people and cognizant of the power data workers wield. The authors of this Manifest-No are: Cifor, M., Garcia, P., Cowan, T.L., Rault, J., Sutherland, T., Chan, A., Rode, J., Hoffmann, A.L., Salehi, N., and Nakamura, L.