Chapter 3 Data Transformation

3.1 Beliefs 1 and 2

3.1.1 Raw data for analyzing death counts

  1. The "AH_Monthly_provisional_counts_of_deaths_by_age_group__sex__and_race_ethnicity_for_select_causes_of_death.csv" dataset contains many columns that are irrelevant to our analysis. We extract the “AllCause”, “Month”, and “Year” columns for plotting and the “sex”, “AgeGroup”, and “Race” columns for data cleaning. We apply the same kind of extraction to the "Monthly_Counts_of_Deaths_by_Select_Causes__2014-2019.csv" dataset.

  2. Month and Year are given in separate columns. To produce a plot with the date on the x-axis, we merge these two columns and convert the result to date format.

  3. Time series models return their predictions as time series objects. We convert these into ordinary data frames for plotting.

  4. To plot the predicted data and the actual data as one continuous graph, we merge the two data frames.

  5. We have two datasets: one covers 2014 to 2019 and the other covers 2019 to 2020. To make a continuous graph, we merge these two data frames.
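
Steps 2 and 5 can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the analysis: the rows and their values are made up, the variable names are hypothetical, and it assumes that Month is stored as a number and that each month maps to the first day of that month.

```python
from datetime import date

# Hypothetical rows standing in for the two death-count files (values are made up).
older_counts = [
    {"Year": 2019, "Month": 11, "AllCause": 100},
    {"Year": 2019, "Month": 12, "AllCause": 110},
]
newer_counts = [
    {"Year": 2020, "Month": 2, "AllCause": 115},
    {"Year": 2020, "Month": 1, "AllCause": 120},
]


def add_date(row):
    """Merge Year and Month into a single date (assumed: first day of the month)."""
    return {"Date": date(row["Year"], row["Month"], 1), "AllCause": row["AllCause"]}


# Step 2: build the date column. Step 5: stack both sources and sort by date
# so that the result can be drawn as one continuous line.
combined = sorted(
    (add_date(row) for row in older_counts + newer_counts),
    key=lambda row: row["Date"],
)
```

Sorting after stacking keeps the x-axis in chronological order even when the input files are not.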

3.2 Belief 3

3.2.1 Raw data for the state-level analysis

  1. Aggregate data: we concatenated the daily report files to build the US daily data. The resulting dataset covers 2020-04-12 to 2021-04-03.
  2. Edit column variables: we kept only the columns ‘Province_State’, ‘Country_Region’, ‘Lat’, ‘Long_’, and ‘date’, since these are the only columns of interest, and renamed them ‘State’, ‘Country’, ‘Latitude’, ‘Longitude’, and ‘Date’. We dropped the other 12 columns.
  3. Format data: we formatted the dates as “%m-%d-%Y”.
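
The three steps above can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the code used for the analysis: each daily report is represented as a list of rows, the raw date is assumed to be an ISO string such as "2020-04-12", the coordinate values are illustrative, and the "Other" column is a hypothetical stand-in for the dropped columns.

```python
from datetime import date

# Source column name -> new column name, as listed in step 2.
RENAME = {
    "Province_State": "State",
    "Country_Region": "Country",
    "Lat": "Latitude",
    "Long_": "Longitude",
    "date": "Date",
}


def clean_row(row):
    """Keep only the five columns of interest, rename them, and reformat the date."""
    kept = {new: row[old] for old, new in RENAME.items()}
    # Assumption: the raw date is an ISO string; convert it to "%m-%d-%Y".
    kept["Date"] = date.fromisoformat(kept["Date"]).strftime("%m-%d-%Y")
    return kept


# Two hypothetical daily report files, one row each (values are illustrative).
report_a = [{"Province_State": "Alabama", "Country_Region": "US",
             "Lat": 32.3, "Long_": -86.9, "date": "2020-04-12", "Other": 1}]
report_b = [{"Province_State": "Alabama", "Country_Region": "US",
             "Lat": 32.3, "Long_": -86.9, "date": "2020-04-13", "Other": 2}]

# Step 1: concatenate the reports; steps 2 and 3 are applied to each row.
us_daily = [clean_row(row) for report in (report_a, report_b) for row in report]
```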

3.2.2 Raw data for analyzing race

  1. Extract data: the raw population dataset contains population estimates for each state for every year from 2010 to 2020. We need only the most recent estimates, so we extract those.
  2. Merge dataset: the race data identifies states by abbreviation, so we added a column with the full state names. This lets the race dataset and the population dataset be merged into one dataset by state name.
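
The two steps above can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the analysis: all rows and values are made up, and the column names "StateAbbr", "Count", "Year", and "Population", as well as the partial abbreviation lookup, are hypothetical.

```python
# Hypothetical population rows: one estimate per state per year (values made up).
population_rows = [
    {"State": "New York", "Year": 2019, "Population": 1000},
    {"State": "New York", "Year": 2020, "Population": 1010},
    {"State": "California", "Year": 2019, "Population": 2000},
    {"State": "California", "Year": 2020, "Population": 2020},
]

# Hypothetical race rows keyed by state abbreviation (values made up).
race_rows = [
    {"StateAbbr": "NY", "Race": "Asian", "Count": 5},
    {"StateAbbr": "CA", "Race": "Asian", "Count": 7},
]

# Partial lookup from abbreviation to full state name.
STATE_NAMES = {"NY": "New York", "CA": "California"}

# Step 1: keep only the most recent population estimate for each state.
latest_year = max(row["Year"] for row in population_rows)
latest_population = {
    row["State"]: row["Population"]
    for row in population_rows
    if row["Year"] == latest_year
}

# Step 2: add the full state name to each race row, then merge by state name.
merged = []
for row in race_rows:
    state = STATE_NAMES[row["StateAbbr"]]
    merged.append({**row, "State": state, "Population": latest_population[state]})
```

Merging on the full state name only works once both datasets use the same spelling for each state, which is why the name column is added first.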