🚖 Automatidata

Identify data types and relevant variables using Python

To get clear insights, I analyze New York City TLC’s data, identify the key variables, and make sure the dataset is ready for analysis.

Exploratory Data Analysis

I use Python to structure and clean the data, then plot matplotlib/seaborn visualizations to help understand it, including a box plot of the ride durations and time series plots such as breakdowns by quarter or month.
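As a rough illustration of what a box plot of ride durations summarizes, the sketch below computes the quartiles and the 1.5 × IQR fences using only the standard library. The timestamps are made-up sample values, and the names `trips`, `duration_minutes`, and the timestamp format are assumptions for this example, not the project's actual columns or code.

```python
from datetime import datetime
from statistics import quantiles

# Made-up (pickup, dropoff) pairs for illustration only; not real TLC records.
trips = [
    ("2017-03-25 08:55:00", "2017-03-25 09:05:00"),
    ("2017-04-11 14:53:00", "2017-04-11 15:05:00"),
    ("2017-12-15 07:26:00", "2017-12-15 07:40:00"),
    ("2017-05-07 13:17:00", "2017-05-07 13:33:00"),
    ("2017-04-15 23:32:00", "2017-04-15 23:50:00"),
    ("2017-03-25 20:34:00", "2017-03-25 20:54:00"),
    ("2017-02-04 16:17:00", "2017-02-04 18:17:00"),
]

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def duration_minutes(pickup: str, dropoff: str) -> float:
    """Return the ride duration in minutes."""
    delta = datetime.strptime(dropoff, FMT) - datetime.strptime(pickup, FMT)
    return delta.total_seconds() / 60


durations = [duration_minutes(p, d) for p, d in trips]

# Quartiles are the box edges and the middle line of a box plot.
q1, median, q3 = quantiles(durations, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are conventionally drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [d for d in durations if d < lower_fence or d > upper_fence]

print(f"Q1={q1}, median={median}, Q3={q3}, outliers={outliers}")
```

In a real notebook the same durations would typically go straight into a seaborn or matplotlib box plot; computing the fences by hand is mainly useful for deciding how to treat extreme rides during cleaning.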

Conduct an A/B test

The project is reaching its midpoint. Next, I get a specific assignment: to compute descriptive statistics and conduct a hypothesis test.
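A minimal sketch of the kind of two-sample comparison an A/B test involves is shown below. It assumes a Welch's t-test on two groups of fare amounts; the choice of test, the group names, and the numbers are hypothetical and only illustrate the calculation, not the project's actual analysis or data.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical fare amounts for two groups (illustrative values only).
group_a = [13.0, 16.0, 9.5, 12.5, 14.0, 11.0]
group_b = [12.0, 10.5, 9.0, 11.5, 8.5, 10.0]


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t_stat = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t_stat, df


# Descriptive statistics first, then the test statistic.
print(f"mean A = {mean(group_a):.2f}, mean B = {mean(group_b):.2f}")
t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The t statistic and degrees of freedom would then be compared against a t distribution to get a p-value, for example with `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)`, which performs the same Welch test.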

Build a multiple linear regression model

It’s time to work on predicting taxi fare amounts. We are ready to build the regression model and update our client, New York City TLC, on our progress.

Machine Learning

Our client, the New York City Taxi & Limousine Commission (New York City TLC), has requested that I build a machine learning model to predict whether a customer will not leave a tip. They want to use the model in an app that alerts taxi drivers to customers who are unlikely to tip, since drivers depend on tips.

👔 Salifort Motors

In this project, my goals are to analyze the data collected by the HR department and to build a model that predicts whether or not an employee will leave the company.

🚀 Space Mission

This incredibly rich dataset from nextspaceflight.com includes all the space missions since the beginning of the Space Race between the USA and the Soviet Union in 1957. It has data on the mission status (success/failure), the cost of the mission, the number of launches per country, and much more.

💰 Earning Predict

The National Longitudinal Survey of Youth 1997-2011 dataset is one of the most important databases available to social scientists working with US data. It allows scientists to look at the determinants of earnings as well as educational attainment, and it is highly relevant to government policy. With a better understanding of how these variables affect education and earnings, we can also formulate more suitable government policies.

🦄 Unicorn Companies

Discovery

The data I will use for this task provides information on over 1,000 unicorn companies, including their industry, country, year founded, and select investors. I will use this information to gain insights into how and when companies reach this prestigious milestone and to make recommendations for next steps to the investing firm.

Structure

We work with the unicorn companies dataset, discovering characteristics of the data, structuring the data in ways that will help us draw meaningful insights, and using visualizations to analyze the data. Ultimately, we will draw conclusions about what significant trends or patterns we find in the dataset.