In this project, I explored and visualized the complete COVID-19 dataset of confirmed cases and deaths from Johns Hopkins University (JHU).
Visualizations are available on my Tableau Public profile; you can check them out here.
The Johns Hopkins University dataset is maintained by a team at its Center for Systems Science and Engineering (CSSE), which has been publishing updates on confirmed cases and deaths for all countries since January 22, 2020. A feature on the JHU dashboard and dataset was published in The Lancet in early May 2020. The dashboard has allowed millions of people across the world to track the course and evolution of the pandemic.
JHU updates its data multiple times each day.
The data is sourced from governments and from national and subnational agencies across the world; a full list of sources for each country is published on the Johns Hopkins GitHub site, where the data itself is also made publicly available. Note that these figures cover only confirmed deaths; based on reports and estimates of excess deaths, they understate the pandemic's total impact on global mortality.
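If you want to explore the numbers yourself, here is a minimal sketch of pulling the data into pandas. The raw-file path below is the one the CSSE repository used for its global confirmed-cases time series at the time of writing, so treat it as an assumption and verify it against the repository before running.

```python
import pandas as pd

# Assumed raw-file path for the JHU CSSE global confirmed-cases
# time series; verify against the repository before running.
URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_time_series/"
    "time_series_covid19_confirmed_global.csv"
)

confirmed = pd.read_csv(URL)

# Each row is a country/region (sometimes split by province); the
# date columns hold cumulative confirmed counts. Aggregate to the
# country level by dropping the location columns and summing.
by_country = (
    confirmed.drop(columns=["Province/State", "Lat", "Long"])
    .groupby("Country/Region")
    .sum()
)
print(by_country.loc["US"].tail())  # most recent cumulative counts
```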
Is the movie industry dying? Is Netflix the new entertainment king? Those were the first questions that led me to create a clean dataset focused on movie revenue and to analyze it over the last few decades. But why stop there? Many other factors come into play, such as actors, genres, and user ratings. With those in hand, we can ask specific questions about the movie industry and get answers!
There are 7,668 movies in the dataset (220 movies per year, 1980-2020), scraped from IMDb.
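To give a flavor of the questions we can now ask, here is a small pandas sketch. The file name and the year/genre/gross columns are hypothetical placeholders, so adjust them to the actual schema of the dataset.

```python
import pandas as pd

# Hypothetical file and column names; adjust to the real dataset.
movies = pd.read_csv("movies.csv")

# One concrete question: how has the median gross per genre
# shifted from decade to decade?
movies["decade"] = (movies["year"] // 10) * 10
median_gross = (
    movies.groupby(["decade", "genre"])["gross"]
    .median()
    .unstack("genre")
)
print(median_gross)
```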
Check out my Tableau Public profile for some of my visualizations of various datasets, including the global COVID-19 dataset, Seattle's Airbnb data, global video game sales trends, and customer segmentation for a major UK bank.
Good-quality data is valid (meets the constraints defined for it), accurate (close to the true value), complete (has no missing values), consistent (agrees across data sets), and uniform (uses the same units of measure).
Cleaning data is important because it ensures we are working with data of the highest quality. This not only prevents errors; it also reduces customer and employee frustration, increases productivity, and improves data analysis and decision-making.
In this project, I apply data cleaning techniques to Nashville's real estate data to make it more usable for analysis.
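As a taste of what that involves, here is a rough pandas sketch of a few cleaning steps, each mapped to one of the quality dimensions above. The file and column names are assumptions about the Nashville data rather than its exact schema.

```python
import pandas as pd

# Assumed file and column names; the real Nashville data may differ.
housing = pd.read_csv("nashville_housing.csv")

# Uniformity: parse sale dates into a single datetime format.
housing["SaleDate"] = pd.to_datetime(housing["SaleDate"])

# Completeness: fill missing property addresses from other rows
# that share the same parcel ID.
housing["PropertyAddress"] = housing.groupby("ParcelID")[
    "PropertyAddress"
].transform(lambda s: s.ffill().bfill())

# Consistency: normalize Y/N flags to a single Yes/No convention.
housing["SoldAsVacant"] = housing["SoldAsVacant"].replace(
    {"Y": "Yes", "N": "No"}
)

# Validity: drop exact duplicate records.
housing = housing.drop_duplicates()
```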
I think it's fair to say that at some point, we have all bookmarked product pages on Amazon and refreshed them frantically, hoping for the prices to go down. Ok, maybe not frantically, but definitely several times a day. I wrote a simple Python script that uses Beautiful Soup to scrape Amazon product pages from any of their stores and check the price, among other things. Helloooo Black Friday!
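Here is a stripped-down sketch of the idea. The product URL is a placeholder, and the element selectors are assumptions about Amazon's markup (which changes often), so expect to tweak them against the live page.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL; swap in a real product page.
URL = "https://www.amazon.com/dp/EXAMPLE_ASIN"
HEADERS = {
    # A browser-like User-Agent; Amazon tends to refuse
    # obviously scripted requests without one.
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-US,en;q=0.9",
}

def check_price(url: str) -> None:
    page = requests.get(url, headers=HEADERS, timeout=10)
    page.raise_for_status()
    soup = BeautifulSoup(page.content, "html.parser")

    # Assumed selectors: the "productTitle" id and "a-offscreen"
    # price span are conventions Amazon has used, not guarantees.
    title = soup.find(id="productTitle")
    price = soup.select_one("span.a-offscreen")
    if title and price:
        print(title.get_text(strip=True), "->", price.get_text(strip=True))
    else:
        print("Could not find title/price; selectors may be stale.")

check_price(URL)
```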