This is a review and my answers to the DataCamp Project Visualizing COVID-19. You can find the project here if you want to do it for yourself. It’s been a while since I’ve done a DataCamp project. My last one was on visualizing music data.
DataCamp Project Review:
It’s a fairly straightforward project that you can get through in about 45 minutes if you’re familiar with ggplot2. The results are several line graphs made using ggplot2. I have mixed feelings about DataCamp Projects. On the one hand, I think they’re great for practicing things like working in Jupyter Notebooks and several R concepts. On the other hand, the algorithm that checks the projects is very strict and infuriating.
Possibly the most annoying part of the DataCamp Project Visualizing COVID-19 was the lack of flexibility. For example, the project’s algorithm cares a great deal about where you put the aes() call in your ggplot chain, so you’ll find yourself re-running code that is written differently but produces the same outcome. That’s frustrating. By far the most frustrating moment was when the project tells you to “Place the labels at 100000 on the y-axis. Use the who_events data again.” Straightforward enough. HOWEVER, in your code, you need to input 100000 as 1e5. If you just put in 100000, you will get it wrong. You will get frustrated. And you will take the hint.
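To show why this feels so arbitrary, here is a minimal sketch, assuming a made-up toy data frame rather than the project's data. It checks that R treats 1e5 and 100000 as the same value, and that moving aes() from ggplot() into the geom gives the same computed layer data. It does not reproduce the checker itself, only the equivalence of the code it rejects.

```r
library(ggplot2)

# Hypothetical toy data (made-up values, not the project's dataset)
toy <- data.frame(
  date = as.Date("2020-01-01") + 0:4,
  cum_cases = c(10, 20, 40, 80, 160)
)

# 1e5 is scientific notation for 100000; both are the same double
same_number <- identical(1e5, 100000)

# aes() supplied once in ggplot() ...
p_global <- ggplot(toy, aes(x = date, y = cum_cases)) + geom_line()
# ... versus aes() supplied inside the geom
p_layer <- ggplot(toy) + geom_line(aes(x = date, y = cum_cases))

# Compare the data each plot computes for its first layer
same_layer_data <- isTRUE(
  all.equal(layer_data(p_global), layer_data(p_layer))
)

print(same_number)
print(same_layer_data)
```

So when the checker rejects one of these forms, it is objecting to how the code is written and not to what it draws.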
So use my answers below so you can avoid taking hints and losing out on those XP points.
Visualizing COVID-19 Answers:
I’m going to input the code and a screenshot of the output where applicable.
Part 1 – Loading
This one is straightforward: loading libraries and printing a table. The output will be a long table with two columns: date and cum_cases.
#Load the readr, ggplot2, and dplyr packages
library(readr)
library(ggplot2)
library(dplyr)

#Read datasets/confirmed_cases_worldwide.csv into confirmed_cases_worldwide
confirmed_cases_worldwide <- read_csv("datasets/confirmed_cases_worldwide.csv")

#See the result
confirmed_cases_worldwide
Part 2 – Initial Time Series
Making a time series chart looking at cumulative cases over time. A straightforward ggplot2 line chart.
#Draw a line plot of cumulative cases vs. date
#Label the y-axis
ggplot(confirmed_cases_worldwide, aes(x = date, y = cum_cases)) +
  geom_line() +
  ylab("Cumulative confirmed cases")

Part 3 – China
Creating another time series, this time adding the color aesthetic to show China vs. the rest of the world. The project makes you pass the “group” aesthetic, which I’ve never used before, and I’m not sure what it does because the graph looked the same whether I had it in there or not.
#Read in datasets/confirmed_cases_china_vs_world.csv
confirmed_cases_china_vs_world <- read_csv("datasets/confirmed_cases_china_vs_world.csv")

#See the result
glimpse(confirmed_cases_china_vs_world)

#Draw a line plot of cumulative cases vs. date, grouped and colored by is_china
#Define aesthetics within the line geom
plt_cum_confirmed_cases_china_vs_world <- ggplot(confirmed_cases_china_vs_world) +
  geom_line(aes(x = date, y = cum_cases, group = is_china, color = is_china)) +
  ylab("Cumulative confirmed cases")

#See the plot
plt_cum_confirmed_cases_china_vs_world
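On the question of what group does: as far as I understand ggplot2, mapping a discrete variable such as is_china to color already splits the data into one line per value, so adding group on top of it changes nothing. The sketch below uses a made-up toy table (the values are assumptions, and only the column names match the project) and counts how many separate lines each variant draws. The count_lines() helper is hypothetical, written just for this illustration.

```r
library(ggplot2)

# Hypothetical toy data (made-up values; column names mirror the project)
toy <- data.frame(
  date = rep(as.Date("2020-01-01") + 0:2, times = 2),
  cum_cases = c(10, 20, 30, 1, 2, 3),
  is_china = rep(c("China", "Not China"), each = 3)
)

# Hypothetical helper: number of distinct groups (lines) in the first layer
count_lines <- function(p) length(unique(layer_data(p)$group))

p_none  <- ggplot(toy) +
  geom_line(aes(x = date, y = cum_cases))
p_color <- ggplot(toy) +
  geom_line(aes(x = date, y = cum_cases, color = is_china))
p_group <- ggplot(toy) +
  geom_line(aes(x = date, y = cum_cases, group = is_china))
p_both  <- ggplot(toy) +
  geom_line(aes(x = date, y = cum_cases, group = is_china, color = is_china))

n_lines <- c(
  none  = count_lines(p_none),
  color = count_lines(p_color),
  group = count_lines(p_group),
  both  = count_lines(p_both)
)
print(n_lines)
```

In this sketch only the variant with neither aesthetic collapses everything into a single line. group earns its keep when you want separate lines that all share one color.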




Part 4 – Annotations
Adding annotations to the chart. This was by far the most annoying part of the project. There are just so many settings to get exactly right. NOTE!!!! Use 1e5 instead of 100000. Let me save your sanity.
who_events <- tribble(
  ~ date, ~ event,
  "2020-01-30", "Global health\nemergency declared",
  "2020-03-11", "Pandemic\ndeclared",
  "2020-02-13", "China reporting\nchange"
) %>%
  mutate(date = as.Date(date))

#Using who_events, add vertical dashed lines with an xintercept at date
#and text at date, labeled by event, and at 100000 on the y-axis
plt_cum_confirmed_cases_china_vs_world +
  geom_vline(data = who_events, aes(xintercept = date), linetype = 'dashed') +
  geom_text(data = who_events, aes(x = date, label = event), y = 1e5)




Part 5 – Trend Line
Some of the errors you get with geom_smooth() can be confusing in the project. It’ll ask for an x or y. Ignore them and do what you’ve been taught.
#Filter for China, from Feb 15
china_after_feb15 <- confirmed_cases_china_vs_world %>%
  filter(is_china == "China", date >= "2020-02-15")

#Using china_after_feb15, draw a line plot cum_cases vs. date
#Add a smooth trend line using linear regression, no error bars
ggplot(china_after_feb15, aes(x = date, y = cum_cases)) +
  geom_line() +
  geom_smooth(method = "lm", se = FALSE) +
  ylab("Cumulative confirmed cases")
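If you are wondering what geom_smooth(method = "lm", se = FALSE) actually draws, the sketch below compares it with an ordinary lm() fit. It assumes a made-up toy data frame with a numeric day column instead of the project's date column, purely to keep the comparison simple.

```r
library(ggplot2)

# Hypothetical toy data (made-up values, numeric day instead of a Date)
toy <- data.frame(
  day = 1:10,
  cum_cases = c(12, 19, 33, 41, 48, 62, 70, 77, 91, 98)
)

p <- ggplot(toy, aes(x = day, y = cum_cases)) +
  geom_line() +
  geom_smooth(method = "lm", se = FALSE)

# Points that the smooth layer (layer 2) computes for the trend line
smooth_pts <- layer_data(p, 2)

# The same straight line fitted directly with lm()
fit <- lm(cum_cases ~ day, data = toy)
lm_pred <- predict(fit, newdata = data.frame(day = smooth_pts$x))

# Largest difference between the two lines
max_gap <- max(abs(smooth_pts$y - unname(lm_pred)))
print(max_gap < 1e-8)
```

In other words, the trend line is a plain least-squares fit of y on x, and se = FALSE only hides the confidence ribbon.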




Part 6 – Another Trend Line
#Filter confirmed_cases_china_vs_world for not China
not_china <- confirmed_cases_china_vs_world %>%
  filter(is_china != "China")

#Using not_china, draw a line plot cum_cases vs. date
#Add a smooth trend line using linear regression, no error bars
plt_not_china_trend_lin <- ggplot(data = not_china, aes(x = date, y = cum_cases)) +
  geom_line() +
  geom_smooth(method = "lm", se = F) +
  ylab("Cumulative confirmed cases")

#See the result
plt_not_china_trend_lin




Part 7 – Log Scale
I actually didn’t know about the scale_y_log10() function. That was nice to learn about.
#Modify the plot to use a logarithmic scale on the y-axis
plt_not_china_trend_lin +
  scale_y_log10()
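Here is a small sketch of why the log scale helps, assuming a made-up series that doubles every day (not the project's data). A straight line fits such a series badly on the raw scale but exactly on the log10 scale, and that is the scale geom_smooth() fits on once scale_y_log10() is added, because ggplot2 applies scale transformations before computing statistics.

```r
library(ggplot2)

# Hypothetical series that doubles every day (made-up values)
toy <- data.frame(day = 0:9)
toy$cum_cases <- 100 * 2^toy$day

# Straight-line fit on the raw scale vs. on the log10 scale
fit_raw <- lm(cum_cases ~ day, data = toy)
fit_log <- lm(log10(cum_cases) ~ day, data = toy)

worst_raw <- max(abs(residuals(fit_raw)))  # large: the line misses the curve
worst_log <- max(abs(residuals(fit_log)))  # essentially zero: exact fit

# The same idea as a plot: the trend line is fitted on the log10 scale
p_log <- ggplot(toy, aes(x = day, y = cum_cases)) +
  geom_line() +
  geom_smooth(method = "lm", se = FALSE) +
  scale_y_log10()

print(c(raw = worst_raw, log = worst_log))
```

The slope of the log-scale fit here is log10(2), that is, one doubling per day.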




Part 8 – Other Countries
#Run this to get the data for each country
confirmed_cases_by_country <- read_csv("datasets/confirmed_cases_by_country.csv")
glimpse(confirmed_cases_by_country)

#Group by country, summarize to calculate total cases, find the top 7
top_countries_by_total_cases <- confirmed_cases_by_country %>%
  group_by(country) %>%
  summarize(total_cases = max(cum_cases)) %>%
  top_n(7)

#See the result
top_countries_by_total_cases
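A side note on top_n(): in dplyr 1.0.0 and later it is marked as superseded in favor of slice_max(). Given how strict the checker is, it may well still insist on top_n(), so treat the sketch below as something for your own code and not for the project. It assumes a made-up toy table and checks that both functions pick the same seven countries.

```r
library(dplyr)

# Hypothetical toy data (made-up country names and totals)
toy <- tibble(
  country = paste("Country", LETTERS[1:10]),
  total_cases = seq(100, 1000, by = 100)
)

# Superseded form used in the project (keeps the original row order)
top7_old <- toy %>% top_n(7, total_cases)

# Newer form (returns rows sorted from largest to smallest)
top7_new <- toy %>% slice_max(total_cases, n = 7)

same_countries <- setequal(top7_old$country, top7_new$country)
print(same_countries)
```

The one visible difference in this sketch is row order: slice_max() sorts by total_cases, while top_n() leaves rows where they were.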




Part 9 – Wrapping It Up
#Run this to get the data for the top 7 countries
confirmed_cases_top7_outside_china <- read_csv("datasets/confirmed_cases_top7_outside_china.csv")
glimpse(confirmed_cases_top7_outside_china)

#Using confirmed_cases_top7_outside_china, draw a line plot of
#cum_cases vs. date, grouped and colored by country
ggplot(data = confirmed_cases_top7_outside_china) +
  geom_line(aes(x = date, y = cum_cases, group = country, color = country)) +
  ylab("Cumulative confirmed cases")



