Reminder: Final Project is a Reflective Essay (UG/G) and Exit Interview (G only)
Data workers and their responsibilities
My thoughts on the work of data workers
Issues in data usage
A data management set of morals
Create and maintain data storage and access systems
Have a strong background in data processing, computer programming, and tool-making
May build internal products specifically for data such as data warehouses and automation processes
Work in collaboration with data scientists and analysts
Someone you may know in this role: Ben Galewsky at the National Center for Supercomputing Applications (NCSA)
Design structures for data access and usage
Have a strong background in data structures, computer programming, and database management
May design pipelines for data to be used within a company
Work in collaboration with company administrators
Someone you may know in this role: Ben Galewsky at the National Center for Supercomputing Applications (NCSA)
Produce statistical results and interpret them
Have a strong background in data analysis, statistical theory, and communication
May write reports and give presentations for clients inside or outside of the company
Work in collaboration with data scientists and engineers or clients
Someone you may know in this role: Gus Theofanis at State Farm
Search for, uncover, and explain underlying features in data
Have a strong background in statistics, machine learning, and computer science
May build supervised or unsupervised procedures for feature detection and prediction
Work in collaboration with data engineers and analysts
Someone you may know in this role: Prof. Victoria Stodden at the iSchool
As data engineers who might work with data to be distributed to the public, we must be very careful to omit information that might identify individuals or their whereabouts.
Research areas include: privacy-preserving data analysis, statistical disclosure control, inference control, privacy-preserving data-mining, & private data analysis
Fields of study or disciplines include: databases, cryptography, computer science, & statistics
Articles or research papers: “Differential Privacy” by Cynthia Dwork; “De-anonymizing Social Networks” by A. Narayanan and V. Shmatikov
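As a rough illustration of omitting identifying information before a public release, here is a minimal Python sketch. The field names (`name`, `email`, `phone`, `street_address`, `zip_code`, `age`) and the function `deidentify` are hypothetical and chosen only for this example. Dropping and coarsening fields like this is a first step, not a guarantee of anonymity: the de-anonymization paper listed above shows that individuals can sometimes be re-identified from data that has had names removed.

```python
# Hypothetical field names; adjust these to the dataset at hand.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "street_address"}


def deidentify(record):
    """Return a copy of `record` that is safer to share publicly.

    Direct identifiers are dropped entirely. Two quasi-identifiers are
    coarsened: the ZIP code keeps only its first three digits, and the
    exact age is replaced by a ten-year range.
    """
    # Drop fields that name or locate a person directly.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Coarsen the ZIP code so it points to a region, not a neighborhood.
    if "zip_code" in cleaned:
        cleaned["zip_code"] = str(cleaned["zip_code"])[:3] + "**"

    # Replace the exact age with a ten-year bracket.
    if "age" in cleaned:
        decade = (int(cleaned["age"]) // 10) * 10
        cleaned["age"] = f"{decade}-{decade + 9}"

    return cleaned


example = {
    "name": "Pat Example",
    "email": "pat@example.com",
    "zip_code": "61801",
    "age": 34,
    "visits": 5,
}
print(deidentify(example))
```

Which fields count as identifying depends on the dataset and on what other data an outsider could combine it with, so the set above should be treated as a starting point for discussion.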
When scraping a website, we must be sure of certain permissions to use information and be aware of intellectual property rules.
Research areas include: big data, markup languages, machine learning, internet research ethics
Fields of study or disciplines include: information science, data science, ethics, intelligence law
Articles or research papers: “Ethics in Web Scraping” by James Densmore; “A primer on theory-driven web scraping: Automatic extraction of big data from the Internet for use in psychological research” by Richard Landers, et al.
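One small, concrete check before scraping is to read the site's `robots.txt` file, which states which paths the site asks automated clients to avoid. The sketch below uses Python's standard `urllib.robotparser` module. The helper name `allowed_to_fetch`, the user agent `course-bot`, and the example rules and URLs are assumptions made for illustration. Passing this check is not the same as having legal permission: a site's terms of service and intellectual property rules still apply.

```python
from urllib.robotparser import RobotFileParser


def allowed_to_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(allowed_to_fetch(EXAMPLE_ROBOTS, "course-bot", "https://example.com/public/index.html"))
print(allowed_to_fetch(EXAMPLE_ROBOTS, "course-bot", "https://example.com/private/data.html"))
```

For a live site, `RobotFileParser` can also download the file itself through its `set_url()` and `read()` methods; the text-based version here keeps the example self-contained.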
Data usage issues are complications that we should think about seriously when using, sharing, or accessing data
Certain datasets may contain information that will be damaging if in the wrong hands
Public privacy, user anonymity, data ownership, and cyberbullying are all topics that revolve around security
Companies set limits for ill-behaving users of their products, but those limits are not all-encompassing
We wield lots of power in our skill set, and we must practice data wrangling with a sense of morality to do no harm and to tell the whole truth.
What will be your set of morals and behaviors when you work with data in the future? What stories and messages will you convey with data? Will you remove observations in order to make a result look better? Will you re-scale a variable to exaggerate the effect of a negative statistic?
For a more fully developed set of morals, read the “Feminist Data Manifest-No”, a statement about data that is inclusive of multiple communities of people and cognizant of the power data workers wield. The authors of the Manifest-No are: Cifor, M., Garcia, P., Cowan, T.L., Rault, J., Sutherland, T., Chan, A., Rode, J., Hoffmann, A.L., Salehi, N., Nakamura, L.