Project Name: TMDb_Movies_data_analysis
Introduction
This is an analysis of the TMDb movies dataset, covering movies released from 1960 to 2015.
The dataset contains information about 10,000 movies collected from the TMDb database, including user ratings, budget and revenue for each movie.
Below is the list of column names in our dataset and their significance:
id
- This is a unique identifier for each movie
popularity
- A numeric quantity specifying the movie's popularity
budget
- The cost of making the movie
revenue
- The worldwide revenue generated from the movie
original_title
- The title of the movie before translation or adaptation
cast
- The names of the lead and supporting actors
homepage
- A link to the homepage of the movie
director
- The director(s) of the movie
tagline
- The movie’s tagline
overview
- A brief description of the movie
runtime
- The running time of the movie in minutes
genres
- The genres of the movie: Action, Drama, Adventure, etc.
production_companies
- The production companies behind the movie
release_date
- The date the movie was released
vote_count
- The count of votes the movie received
vote_average
- The average rating the movie received
release_year
- The year the movie was released
Research Question(s) for Analysis
This dataset will be analysed to answer the following questions:
- In what year did TMDb movies make the highest profit?
- Which TMDb movie has the highest profit expressed as a percentage of its budget?
- What is the correlation between the attributes of our TMDb movies dataset?
Objective of Analysis
The objectives of our analysis are:
- To identify the year TMDb movies made the highest profit
- To determine the movie with the highest profit expressed as a percentage of its budget.
- To ascertain the correlation between different attributes of TMDb dataset.
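The first two objectives above can be sketched with pandas as follows. This is a minimal illustration, not the analysis itself: the rows are made-up toy values (not taken from the TMDb dataset), and the profit and profit_pct columns are assumed helper columns, with profit taken to mean revenue minus budget.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT values from the TMDb dataset.
movies = pd.DataFrame({
    "original_title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "release_year": [2014, 2014, 2015, 2015],
    "budget": [100, 50, 200, 10],
    "revenue": [150, 40, 500, 90],
})

# Assumption: profit is revenue minus budget.
movies["profit"] = movies["revenue"] - movies["budget"]

# Profit expressed as a percentage of the budget.
movies["profit_pct"] = movies["profit"] / movies["budget"] * 100

# Objective 1: the year with the highest total profit.
profit_by_year = movies.groupby("release_year")["profit"].sum()
best_year = profit_by_year.idxmax()

# Objective 2: the movie with the highest profit relative to its budget.
best_movie = movies.loc[movies["profit_pct"].idxmax(), "original_title"]

print(best_year, best_movie)
```

Dividing by budget only makes sense when the budget is non-zero, which is one reason rows with a zero budget need handling before this step.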
Exploratory Analysis
Conclusion and Findings
- From our analysis, we were able to ascertain that TMDb movies generated the highest profit in 2015.
- We also discovered that the movie From Prada to Nada has the highest profit relative to its budget.
- We found that:
  - revenue and popularity are highly correlated with vote_count.
  - revenue is highly correlated with budget.
  - vote_count is highly correlated with profit.
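Correlations like the ones listed above can be computed with pandas' DataFrame.corr. The sketch below is only an illustration under assumptions: the numbers are toy values rather than TMDb data, Pearson correlation is assumed, and profit is an assumed derived column (revenue minus budget).

```python
import pandas as pd

# Toy rows for illustration only; these are NOT values from the TMDb dataset.
sample = pd.DataFrame({
    "popularity": [1.0, 2.5, 3.0, 6.0, 8.5],
    "budget": [10, 20, 35, 60, 90],
    "revenue": [15, 30, 80, 150, 260],
    "vote_count": [100, 250, 400, 900, 1500],
})

# Assumption: profit is revenue minus budget.
sample["profit"] = sample["revenue"] - sample["budget"]

# Pairwise Pearson correlation between all numeric columns.
corr = sample.corr(method="pearson")

# Correlation of every other attribute with vote_count, strongest first.
vote_corr = corr["vote_count"].drop("vote_count").sort_values(ascending=False)
print(vote_corr)
```

A high correlation coefficient only indicates that two attributes move together; it does not show that one causes the other.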
Limitations
The dataset contained a vast number of null values. Revenue values were dropped where the budget was zero, and budget values were dropped where the revenue was zero. I also dropped many columns that were not needed for my analysis.
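The kind of cleaning described here might look like the sketch below. It is one possible approach, not necessarily the exact steps used: the rows are toy values, and the choice of homepage and tagline as the dropped columns is an assumption made for illustration.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT values from the TMDb dataset.
raw = pd.DataFrame({
    "original_title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "budget": [100, 0, 200, 50],
    "revenue": [150, 40, 0, 90],
    "homepage": [None, "placeholder", None, None],
    "tagline": ["placeholder", None, "placeholder", None],
})

# Assumption: these columns are not needed for the analysis.
cleaned = raw.drop(columns=["homepage", "tagline"])

# Keep only rows where both budget and revenue are greater than zero.
# Missing values compare as False here, so they are dropped as well.
keep = (cleaned["budget"] > 0) & (cleaned["revenue"] > 0)
cleaned = cleaned[keep].reset_index(drop=True)

print(cleaned)
```

Filtering this way shrinks the dataset, so any conclusions apply only to movies with both a recorded budget and a recorded revenue.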