EDA

Investigating Fandango Movie Ratings

In October 2015, a data journalist named Walt Hickey analyzed movie ratings data and found strong evidence to suggest that Fandango’s rating system was biased and dishonest. He published his analysis in this article — a great piece of data journalism that’s totally worth reading. Fandango displays a 5-star rating system on their website, where the minimum rating is 0 stars and the maximum is 5 stars. Hickey found that there’s a significant discrepancy between the number of stars displayed to users and the actual rating, which he was able to find in the HTML of the page.

Analyzing NYC High School Data

One of the most controversial issues in the U.S. educational system is the efficacy of standardized tests, and whether they’re unfair to certain groups. The SAT, or Scholastic Aptitude Test, is an exam that U.S. high school students take before applying to college. Investigating the correlation between SAT scores and demographics might be an interesting angle to take: we could correlate SAT scores with factors like race, gender, income, and more.

Exploratory Analysis of Hacker News Posts

Hacker News is a social news website focusing on computer science and entrepreneurship. It was started by the startup incubator Y Combinator, and posts on it are voted and commented on, similar to Reddit. Posts that make it to the top of the Hacker News listings attract more visitors as a result. In this project, we are interested in the posts whose titles begin with either Ask HN or Show HN. Users submit Ask HN posts to ask the Hacker News community specific questions, and these posts start with the “Ask HN” prefix.
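As a rough illustration of the first step, here is a minimal sketch of how post titles could be sorted into Ask HN, Show HN, and everything else. The function name `classify_post`, the case-insensitive matching, and the sample titles are assumptions made for this example and are not taken from the project itself.

```python
def classify_post(title: str) -> str:
    """Label a post by its title prefix (hypothetical helper).

    Matching is case-insensitive, which is an assumption: the real
    dataset may or may not mix capitalizations.
    """
    lowered = title.strip().lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


# Made-up titles, used only to demonstrate the helper.
sample_titles = [
    "Ask HN: How do you learn statistics?",
    "Show HN: A tiny plotting library",
    "A history of the Unix shell",
]

counts = {"ask": 0, "show": 0, "other": 0}
for title in sample_titles:
    counts[classify_post(title)] += 1
```

Once the posts are split this way, each group can be compared on measures such as comment counts or posting time.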

Mobile App for Lottery Addiction

Many people start playing the lottery for fun, but for some this activity turns into a habit which eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending from their savings and loans, start to accumulate debts, and eventually engage in desperate behaviors like theft. In this project, we are going to contribute to the development of a mobile app by writing a couple of functions that are mostly focused on calculating probabilities.
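To give a sense of what such probability functions might look like, here is a minimal sketch. It assumes a lottery in which 6 numbers are drawn without replacement from a pool of 49. Those figures, along with the function names `one_ticket_probability` and `match_probability`, are illustrative assumptions and are not specified in the project description.

```python
from math import comb


def one_ticket_probability(numbers_drawn: int = 6, pool_size: int = 49) -> float:
    """Chance that a single ticket matches every drawn number.

    The 6-from-49 defaults are an assumption made for illustration.
    """
    return 1 / comb(pool_size, numbers_drawn)


def match_probability(matches: int, numbers_drawn: int = 6, pool_size: int = 49) -> float:
    """Chance that a ticket matches exactly `matches` of the drawn numbers.

    Counts the ways to pick `matches` winning numbers and fill the rest
    of the ticket with non-winning numbers, divided by all possible tickets.
    """
    winning_ways = comb(numbers_drawn, matches)
    losing_ways = comb(pool_size - numbers_drawn, numbers_drawn - matches)
    return winning_ways * losing_ways / comb(pool_size, numbers_drawn)


jackpot = one_ticket_probability()  # 1 in 13,983,816 under the 6-from-49 assumption
```

Presenting the result as "1 in N" rather than as a tiny decimal is likely to be easier for app users to grasp.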

Star Wars: A data exploration

Before the release of “Star Wars: The Force Awakens”, the team at FiveThirtyEight wanted to answer some questions about the Star Wars franchise. In particular, they were interested in answering the question: which movie is the best in the franchise? The team needed to collect data addressing this question. To do this, they surveyed Star Wars fans using the online tool SurveyMonkey. They received 835 total responses, which you can download from their GitHub repository.

Wrangling and Visualizing Music Data

How do musicians choose the chords they use in their songs? Do guitarists, pianists, and singers gravitate towards different kinds of harmony? We can uncover trends in the kinds of chord progressions used by popular artists by analyzing the harmonic data provided in the McGill Billboard Dataset. This dataset includes professionally tagged chords for several hundred pop/rock songs representative of singles that made the Billboard Hot 100 list between 1958 and 1991.

Analyze Employee Exit Survey

In this project, we will work with exit surveys from employees of the Department of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institute in Queensland, Australia. The objective of this project is to be able to answer the following questions: Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?

Finding the best markets to advertise an e-learning product

In this project, we’ll aim to find the two best markets to advertise our product in. We’re working for an e-learning company that offers courses on programming. Most of our courses are on web and mobile development, but we also cover many other domains, like data science and game development. To avoid spending money on organizing a survey, we’ll first try to make use of existing data to determine whether we can reach any reliable result.