Python

Investigating Fandango Movie Ratings

In October 2015, a data journalist named Walt Hickey analyzed movie ratings data and found strong evidence to suggest that Fandango’s rating system was biased and dishonest. He published his analysis in an article that is a great piece of data journalism and well worth reading. Fandango displays ratings on its website using a 5-star system, where the minimum rating is 0 stars and the maximum is 5 stars. Hickey found a significant discrepancy between the number of stars displayed to users and the actual rating, which he found in the HTML of the page.
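
A minimal sketch of how that discrepancy could be measured, assuming we already have, for each movie, the stars Fandango displayed and the rating value found in the page HTML. The `films` list and its numbers are hypothetical placeholders, not Hickey's data.

```python
from statistics import mean

# Hypothetical (displayed_stars, actual_rating) pairs; not real Fandango data.
films = [
    (4.5, 4.1),
    (4.0, 4.0),
    (5.0, 4.6),
    (3.5, 3.2),
]

# Discrepancy = stars shown to users minus the rating found in the HTML.
diffs = [displayed - actual for displayed, actual in films]

# Share of films whose displayed stars exceed the actual rating.
share_inflated = sum(d > 0 for d in diffs) / len(diffs)

print(f"mean discrepancy: {mean(diffs):.3f}")
print(f"share inflated: {share_inflated:.2f}")
```

A positive mean discrepancy combined with a high share of inflated films is the kind of pattern that would suggest ratings are being rounded or pushed upward.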

Analyzing NYC High School Data

One of the most controversial issues in the U.S. educational system is the efficacy of standardized tests, and whether they’re unfair to certain groups. The SAT, or Scholastic Aptitude Test, is an exam that U.S. high school students take before applying to college. Investigating the correlation between SAT scores and demographics might be an interesting angle to take: we could correlate SAT scores with factors like race, gender, income, and more.
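
Correlating SAT scores with a demographic factor usually means computing the Pearson correlation coefficient per school. The sketch below implements it with the standard library; the `sat_score` and `pct_low_income` values are hypothetical stand-ins for columns from the NYC datasets.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance term and the two standard-deviation terms (unnormalized).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-school values; real data would come from the NYC datasets.
sat_score = [1100, 1250, 1400, 1180, 1320]
pct_low_income = [80, 60, 30, 70, 45]

r = pearson_r(pct_low_income, sat_score)
print(f"r = {r:.2f}")
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient. A value of `r` near -1 would indicate that schools with more low-income students tend to have lower average SAT scores in this sample.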

Demand Forecasting of Perishable Products

The objective of this project is to minimize wastage of meal kits in retail stores. Currently, this is done by tracking each individual item from the source to the point of sale, which is a cumbersome and labor-intensive process. To achieve the objective with machine learning, the first step is an accurate demand forecast. This project focuses on generating accurate forecasts for each individual item (46 unique items) in each store (47 unique stores).
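
Forecasting every item in every store means handling up to 46 × 47 = 2,162 separate (store, item) series. A naive moving-average baseline, sketched below, is a common yardstick that a machine learning model should beat; it is not the project's model. The record layout and the `moving_average_forecast` helper are hypothetical.

```python
from collections import defaultdict

def moving_average_forecast(sales, window=7):
    """Forecast next-period demand for every (store, item) pair.

    `sales` is an iterable of (store_id, item_id, units) records ordered
    oldest first; each forecast is the mean of the last `window` values.
    """
    history = defaultdict(list)
    for store_id, item_id, units in sales:
        history[(store_id, item_id)].append(units)
    return {
        key: sum(units[-window:]) / len(units[-window:])
        for key, units in history.items()
    }

# Hypothetical daily sales records, oldest first.
sales = [
    (1, "kit_a", 10), (1, "kit_a", 12), (1, "kit_a", 14),
    (1, "kit_b", 5), (1, "kit_b", 7),
    (2, "kit_a", 20), (2, "kit_a", 22),
]
print(moving_average_forecast(sales, window=2))
```

Keying the history by the `(store_id, item_id)` tuple keeps each series independent, which mirrors the per-item, per-store forecasting goal.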

Exploratory Analysis of Hacker News Posts

Hacker News is a social news website focusing on computer science and entrepreneurship. It was started by the startup incubator Y Combinator, and, as on Reddit, users vote and comment on posts. Posts that make it to the top of the Hacker News listings attract more visitors as a result. In this project, we are interested in posts whose titles begin with either “Ask HN” or “Show HN”. Users submit “Ask HN” posts to ask the Hacker News community a specific question, and “Show HN” posts to share something they have made.
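
Separating the two post types comes down to a case-insensitive prefix check on the title. The sketch below compares average comment counts; the `posts` rows and the `average_comments` helper are hypothetical, since the real project reads titles and comment counts from a dataset.

```python
# Hypothetical (title, num_comments) rows.
posts = [
    ("Ask HN: How do you learn a new codebase?", 42),
    ("Show HN: A tiny static site generator", 10),
    ("ask hn: Best books on compilers?", 18),
    ("Why SQLite is so fast", 95),
]

def average_comments(rows, prefix):
    """Mean comment count for posts whose title starts with `prefix` (case-insensitive)."""
    counts = [n for title, n in rows if title.lower().startswith(prefix.lower())]
    return sum(counts) / len(counts) if counts else 0.0

ask_avg = average_comments(posts, "Ask HN")
show_avg = average_comments(posts, "Show HN")
print(ask_avg, show_avg)
```

Lowercasing both sides catches titles like "ask hn:" that users type inconsistently.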

Infectious Disease Modeling

We want to model infectious diseases, which can spread from one member of a population to another, to gain insights into how quickly they spread, what proportion of a population they infect, what proportion die, and so on. One of the easiest ways to model them (and the one we focus on here) is a compartmental model, which separates the population into several compartments, for example susceptible (S), infected (I), and recovered (R).
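
The simplest compartmental model is SIR, where dS/dt = -βSI/N, dI/dt = βSI/N - γI, and dR/dt = γI. The sketch below integrates it with a forward-Euler step; the parameter values and the `simulate_sir` name are hypothetical, and a real analysis would more likely use an ODE solver.

```python
def simulate_sir(n, i0, beta, gamma, days, dt=0.1):
    """Forward-Euler integration of the basic SIR model.

    dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I
    Returns one (S, I, R) tuple per whole day, starting at day 0.
    """
    s, i, r = float(n - i0), float(i0), 0.0
    steps_per_day = round(1 / dt)
    out = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * s * i / n * dt  # susceptible -> infected
            new_rec = gamma * i * dt         # infected -> recovered
            s -= new_inf
            i += new_inf - new_rec
            r += new_rec
        out.append((s, i, r))
    return out

# Hypothetical parameters: R0 = beta / gamma = 2.5, 1,000 people, 1 initial case.
traj = simulate_sir(n=1000, i0=1, beta=0.25, gamma=0.1, days=160)
peak_day = max(range(len(traj)), key=lambda d: traj[d][1])
print("peak infections on day", peak_day)
print("final share ever infected:", round(1 - traj[-1][0] / 1000, 2))
```

Every flow leaves one compartment and enters another, so S + I + R stays equal to N throughout, which is a handy sanity check.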

Mobile App for Lottery Addiction

Many people start playing the lottery for fun, but for some this activity turns into a habit that eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin dipping into their savings and taking out loans, start to accumulate debts, and eventually engage in desperate behaviors like theft. In this project, we are going to contribute to the development of a mobile app by writing a few functions, mostly focused on calculating probabilities.
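
As an example of the kind of probability function such an app might need, the sketch below computes the chance that one ticket wins the big prize. It assumes a 6/49 format (six numbers drawn from 1 to 49, order irrelevant); the format and the `one_ticket_probability` name are assumptions for illustration.

```python
from math import comb

def one_ticket_probability(numbers_drawn=6, pool_size=49):
    """Probability that a single ticket matches all drawn numbers.

    Order does not matter, so there are C(pool_size, numbers_drawn)
    equally likely outcomes and exactly one of them wins.
    """
    return 1 / comb(pool_size, numbers_drawn)

p = one_ticket_probability()
print(f"{p:.10f} (1 in {comb(49, 6):,})")
```

Using combinations rather than permutations reflects that a ticket wins regardless of the order in which the numbers are drawn.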

Modeling Coronavirus

This project is focused on fitting an extended SIR model with time-dependent $R_0$ values and resource-dependent death rates to real coronavirus data, in order to come as close as possible to the real numbers and make informed predictions about possible future developments. But before we jump right into fitting our model to the data, let’s do something that is often overlooked: let’s take a short look at what our model cannot do.
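
One common way to make $R_0$ time-dependent is a logistic curve that falls from an initial value to a lower one, for example as interventions take effect. In an SIR model $R_0 = \beta/\gamma$, so the transmission rate follows as $\beta(t) = R_0(t)\,\gamma$. The logistic form and all parameter values below are assumptions, not necessarily the project's exact choice.

```python
from math import exp

def logistic_r0(t, r0_start, r0_end, k, x0):
    """R0 that falls smoothly from r0_start to r0_end around day x0.

    k controls how abrupt the drop is (larger k = sharper change).
    """
    return (r0_start - r0_end) / (1 + exp(-k * (x0 - t))) + r0_end

def beta(t, gamma, **r0_params):
    """Transmission rate implied by R0(t) for a model with recovery rate gamma."""
    return logistic_r0(t, **r0_params) * gamma

# Hypothetical parameters: R0 drops from 3.0 to 0.9, centered on day 50.
params = dict(r0_start=3.0, r0_end=0.9, k=0.5, x0=50)
for day in (0, 50, 100):
    print(day, round(logistic_r0(day, **params), 3))
```

At `t = x0` the curve sits exactly halfway between the two values; fitting the model to data would then mean searching for `r0_start`, `r0_end`, `k`, and `x0` that best reproduce the observed counts.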

AirBnB: Nearest Neighbors

AirBnB is a marketplace for short-term rentals that allows you to list part or all of your living space for others to rent. You can rent everything from a room in an apartment to an entire house on AirBnB. Because most of the listings are short-term, AirBnB has become a popular alternative to hotels. The company has grown from its founding in 2008 to a $30 billion valuation in 2016 and is currently worth more than any hotel chain in the world.
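
A nearest-neighbors price suggestion can be sketched with a single feature: find the k listings whose guest capacity is closest to the new listing's and average their prices. The `listings` data, the choice of `accommodates` as the only feature, and the `knn_price` helper are hypothetical.

```python
def knn_price(listings, accommodates, k=3):
    """Predict a nightly price as the mean price of the k most similar listings.

    Similarity uses one feature: the absolute difference in how many
    guests a listing accommodates.
    """
    nearest = sorted(listings, key=lambda row: abs(row[0] - accommodates))[:k]
    return sum(price for _, price in nearest) / len(nearest)

# Hypothetical (accommodates, nightly_price) pairs.
listings = [(1, 60.0), (2, 85.0), (2, 95.0), (3, 120.0), (4, 150.0), (6, 240.0)]
print(knn_price(listings, accommodates=3, k=3))
```

Because `sorted` is stable, listings tied on distance are kept in their input order, so shuffling the data first avoids a systematic bias in which ties get picked. With more features, the absolute difference would be replaced by a distance such as Euclidean distance on normalized columns.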

Star Wars: A Data Exploration

Before the release of “Star Wars: The Force Awakens”, the team at FiveThirtyEight wanted to answer some questions about the Star Wars franchise. In particular, they wanted to answer the question “Which movie is the best in the franchise?” To collect data addressing this question, they surveyed Star Wars fans using the online tool SurveyMonkey. They received 835 total responses, which you can download from their GitHub repository.
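
One way to turn survey answers into a "best movie" is to average each film's rank and pick the lowest mean. The sketch assumes each respondent ranks the six films from 1 (favorite) to 6 (least favorite); the `responses` rows are hypothetical, not the FiveThirtyEight data.

```python
from statistics import mean

films = ["I", "II", "III", "IV", "V", "VI"]

# Hypothetical rankings, one list per respondent, aligned with `films`;
# 1 = favorite, 6 = least favorite. None marks an unanswered question.
responses = [
    [6, 5, 4, 2, 1, 3],
    [5, 6, 4, 3, 1, 2],
    [4, 6, 5, 1, 2, 3],
    [None, None, None, None, None, None],
]

def mean_rank(responses, films):
    """Average rank per film, ignoring unanswered entries."""
    result = {}
    for col, film in enumerate(films):
        ranks = [row[col] for row in responses if row[col] is not None]
        result[film] = mean(ranks)
    return result

ranks = mean_rank(responses, films)
best = min(ranks, key=ranks.get)
print(ranks)
print("best-ranked episode:", best)
```

Skipping `None` entries matters because not every respondent has seen every film; counting blanks as zeros would distort the averages.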

Analyze Employee Exit Survey

In this project, we will work with exit surveys from employees of the Department of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institute in Queensland, Australia. The goal is to answer the following questions: Are employees who worked for the institutes for only a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?
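
To compare short- and long-serving employees, one approach is to bin years of service into career stages and compute the share of resignations flagged as dissatisfaction-related in each bin. The stage thresholds (under 3, 3 to 6, 7 to 10, and 11 or more years), the records, and the helper names are assumptions for illustration.

```python
from collections import defaultdict

def career_stage(years):
    """Map years of service to a career-stage label (thresholds are assumptions)."""
    if years < 3:
        return "New"
    if years <= 6:
        return "Experienced"
    if years <= 10:
        return "Established"
    return "Veteran"

# Hypothetical (years_of_service, dissatisfied) records for resigning employees.
records = [(1, True), (2, False), (4, True), (5, False),
           (8, True), (9, True), (15, True), (20, False)]

def dissatisfied_share(records):
    """Fraction of resignations flagged as dissatisfied, per career stage."""
    totals, dissatisfied = defaultdict(int), defaultdict(int)
    for years, flag in records:
        stage = career_stage(years)
        totals[stage] += 1
        dissatisfied[stage] += flag  # True counts as 1, False as 0
    return {stage: dissatisfied[stage] / totals[stage] for stage in totals}

print(dissatisfied_share(records))
```

Comparing the "New" share against the longer-serving stages directly addresses the two questions above.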