Movie Data Analysis
For our first project at the Flatiron School, we were tasked with giving Microsoft three recommendations on how to enter the movie industry. We used the Python package pandas and SQL to load and clean the data, then analyzed it and created visualizations. We were to present our findings and our three best recommendations to the company's C-suite. Before starting, we had to understand the business of the movie industry and the best way to enter the market. The market is saturated with big competitors such as Warner Bros., Paramount Pictures, and Disney. So how does Microsoft best navigate this field of giants and succeed? Return on investment and profit are good measures of success in any industry, so we started with a broad look at movies and their profitability. We then got more granular, analyzing profitability by movie genre, movie run time, and actors/actresses.
Data Understanding
In our data analysis, we used two data sets. The first, IMDb, contained categorical information on movie start years, run times, and the professions of the people associated with each movie (actors, actresses, writers, directors, producers, etc.). The second, The Numbers, contained numerical information on movie production budgets and gross earnings, from which we calculated profit. We cleaned both data sets of movies with null values and then merged them on the movie title column. That is where we ran into our first big problem as data scientists: we lost a lot of valuable data in the merge. We tried stripping all spaces and special characters from both title columns so they would match up better, but that didn't seem to help, so we had to continue our analysis with what we had. The second problem was that almost all movies were associated with multiple genres. When we wanted to organize our data by genre and profitability, we found thousands of different genre combinations rather than one genre per movie. We pondered for a while how to solve this, and that is when we came across the pandas .explode() method. The method takes a column of list-like values and gives each element its own row, repeating the values in the other columns. With that challenge behind us, we were able to organize our data into categories and focus on our first recommendation, which then flowed into our second and third.
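The title clean-up, merge, and .explode() steps described above can be sketched roughly as follows. This is a minimal illustration, not our project code: the data frames, the column names (`title`, `genres`, `production_budget`, `worldwide_gross`), and the `normalize_title` helper are all hypothetical stand-ins, and the choice of worldwide gross for the profit calculation is an assumption.

```python
import pandas as pd


def normalize_title(title: str) -> str:
    """Lower-case a title and drop everything except letters and digits."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


# Tiny made-up stand-ins for the two cleaned data sets.
imdb = pd.DataFrame({
    "title": ["Space Quest", "Toy Town!", "Lost Reel"],
    "genres": ["Action,Adventure,Sci-Fi", "Animation,Adventure", "Drama"],
})
budgets = pd.DataFrame({
    "title": ["space quest", "Toy Town", "Other Film"],
    "production_budget": [100, 50, 10],
    "worldwide_gross": [400, 250, 5],
})

# Profit as gross earnings minus production budget (assumed definition).
budgets["profit"] = budgets["worldwide_gross"] - budgets["production_budget"]

# Build a cleaned-up join key on both sides, then merge on it.
imdb["title_key"] = imdb["title"].map(normalize_title)
budgets["title_key"] = budgets["title"].map(normalize_title)
merged = imdb.merge(budgets.drop(columns="title"), on="title_key", how="inner")

# Split the comma-separated genre string into a list, then give each
# genre its own row so every movie counts once per genre.
merged["genres"] = merged["genres"].str.split(",")
by_genre = merged.explode("genres").reset_index(drop=True)

print(by_genre[["title", "genres", "profit"]])
```

An inner merge keeps only the titles found in both data sets, which is why unmatched titles (here "Lost Reel" and "Other Film") drop out of the result.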
Data Analysis
Based on our data, we found the most profitable genres by median profit.
From the 13 genres with the highest median profit, we decided to focus on the top 4. We then narrowed our focus...
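Ranking genres by median profit can be sketched as below. The data frame and its numbers are invented purely for illustration, and the column names `genres` and `profit` are assumptions. As a general point, the median is less sensitive than the mean to a handful of blockbuster outliers.

```python
import pandas as pd

# Made-up rows in "one genre per row" form (for example, after .explode()).
by_genre = pd.DataFrame({
    "genres": ["Action", "Action", "Action", "Drama", "Drama", "Animation"],
    "profit": [300, 100, 20, 10, 30, 200],
})

# Median profit per genre, highest first.
median_profit = (
    by_genre.groupby("genres")["profit"]
    .median()
    .sort_values(ascending=False)
)

# Keep the top N genres (N = 2 here only because the toy data is tiny).
top_genres = median_profit.head(2).index.tolist()
print(median_profit)
print(top_genres)
```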
For these top 4 genres, we analyzed profit with respect to run time.
We found that for Animation movies, the most profitable run times fall in the range of 80-110 minutes. For the other three genres, the most profitable run times fall in the range of 140-155 minutes, and the 125-140 and 155-170 minute ranges are quite profitable as well. We then focused on the Action genre and its most profitable actors and actresses.
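One way to produce run-time ranges like the ones above is to bin run times with pd.cut and compare profit per bin. This is a sketch under assumptions: the 15-minute bin width is inferred from the ranges quoted, the use of the median is carried over from the genre analysis, and the column names and numbers are invented.

```python
import pandas as pd

# Made-up movies for a single genre.
movies = pd.DataFrame({
    "runtime_minutes": [95, 128, 142, 150, 160, 145],
    "profit": [200, 80, 300, 500, 120, 400],
})

# 15-minute bins: [80, 95), [95, 110), ..., [155, 170).
bin_edges = list(range(80, 171, 15))
movies["runtime_bin"] = pd.cut(
    movies["runtime_minutes"], bins=bin_edges, right=False
)

# Median profit per run-time bin; observed=True skips empty bins.
profit_by_runtime = (
    movies.groupby("runtime_bin", observed=True)["profit"].median()
)

best_bin = profit_by_runtime.idxmax()
print(profit_by_runtime)
print(best_bin)
```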
Profitability of top 10 actors for the action genre
We chose the Action genre because it had more actors and actresses in the data set than the other top 4 genres. We filtered down to actors who had appeared in 4 or more different Action movies, which we took as a sign of their popularity and established success in the genre.
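The "4 or more Action movies" filter can be sketched as below. The names, movies, and profits are hypothetical, and ranking the remaining actors by median profit is an assumption made for the example.

```python
import pandas as pd

# Made-up cast rows for Action movies: one row per (actor, movie).
action_cast = pd.DataFrame({
    "actor": ["Actor A"] * 4 + ["Actor B"] * 2 + ["Actor C"] * 5,
    "movie": ["m1", "m2", "m3", "m4", "m1", "m2", "m1", "m2", "m3", "m4", "m5"],
    "profit": [100, 200, 300, 400, 900, 800, 50, 60, 70, 80, 90],
})

# Count distinct Action movies per actor and keep those with 4 or more.
movie_counts = action_cast.groupby("actor")["movie"].nunique()
established = movie_counts[movie_counts >= 4].index

# Rank the remaining actors by median profit and take the top 10.
top_actors = (
    action_cast[action_cast["actor"].isin(established)]
    .groupby("actor")["profit"]
    .median()
    .sort_values(ascending=False)
    .head(10)
)
print(top_actors)
```

Note that "Actor B" has the highest profits in the toy data but is excluded, because two movies do not meet the threshold.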
Conclusion/Recommendations
If Microsoft were to enter the movie industry, we would give 3 recommendations:
1) Pursue 1 of the top 4 most profitable genres (Animation, Adventure, Sci-Fi, and Action).
2) For Adventure, Sci-Fi, and Action, focus on run times between 140 and 155 minutes. For Animation, focus on run times between 80 and 110 minutes.
3) For Action movies, recruit 1 of the 10 actors previously shown.
Click here to see my GitHub and look further into the project.


