
Airport Delays

Think about the last time you made it to the airport on time, only to find out your flight had been delayed. For some, that was probably a couple of years ago; for others, it may have been as recent as yesterday. However long it has been for you, it's probably not great to find out that the rate of flight delays has not decreased in the past decade. For this reason, I want to analyze the operations of major airports around the country to determine the factors that lead to these delays.

Data

I collected data from three datasets relating to airport operations: one detailing arrival and departure delays/diversions by airport, one providing metrics related to arrivals and departures for each airport, and one detailing the names and characteristics behind each airport code. I combined these datasets into one and retained only the features I found valuable. The combined dataset consists of airport information collected from 2004 to 2014.
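As a rough illustration of the combining step, here is a minimal pandas sketch. The three small tables, their column names, and every value in them are made up for the example; the real datasets and join keys may differ.

```python
import pandas as pd

# Hypothetical stand-ins for the three datasets (all names and values are invented).
cancellations = pd.DataFrame({
    "airport": ["ATL", "ATL", "ORD"],
    "year": [2004, 2005, 2004],
    "departure_cancellations": [120, 95, 210],
})
operations = pd.DataFrame({
    "airport": ["ATL", "ATL", "ORD"],
    "year": [2004, 2005, 2004],
    "average_taxi_out_time": [18.2, 17.9, 21.4],
})
airports = pd.DataFrame({
    "airport": ["ATL", "ORD"],
    "city": ["Atlanta", "Chicago"],
})

# Join the two yearly tables on airport code and year, then attach the
# airport details, which are keyed on airport code alone.
combined = (
    cancellations
    .merge(operations, on=["airport", "year"], how="inner")
    .merge(airports, on="airport", how="left")
)

# Keep only the features of interest.
keep = ["airport", "city", "year", "departure_cancellations", "average_taxi_out_time"]
combined = combined[keep]
print(combined)
```

An inner join keeps only airport-year pairs present in both yearly tables, while the left join leaves a row in place even if its airport code has no match in the lookup table.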

Exploratory Data Analysis

The graphs below show relationships between airport cancellations and various operations (including diversions) within the airport. For my subplots, I normalized the axes so all the graphs are visible on the same scale. In the end, we'll be looking at the trends to determine the relationship between cancellations and operations. The results from the subplot are quite intuitive: gate and airport delays are, on average, positively related to departure cancellations, while on-time departures are, on average, negatively related to departure cancellations. It is important to note, however, that in the first three graphs there are a few instances where the respective operations appear to have no impact on cancellations. We can also see that departure diversions are positively related to departure cancellations.
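To make the idea of normalizing and then reading off the direction of a relationship concrete, here is a small sketch. It assumes min-max scaling, which is only one possible way to put the axes on a common scale, and it uses invented numbers and column names rather than the real data.

```python
import pandas as pd

def min_max(series):
    """Rescale a column to the [0, 1] range (one possible normalization)."""
    span = series.max() - series.min()
    return (series - series.min()) / span if span else series * 0.0

# Toy values chosen only to illustrate the direction of each relationship.
toy = pd.DataFrame({
    "departure_cancellations": [10, 20, 30, 40, 50],
    "average_gate_departure_delay": [5.0, 7.0, 8.0, 12.0, 15.0],
    "percent_on_time_departures": [0.90, 0.85, 0.80, 0.72, 0.65],
})

# Put every column on the same scale so the plots are comparable.
scaled = toy.apply(min_max)

# Pearson correlation of each operation with departure cancellations.
correlations = (
    scaled.corr()["departure_cancellations"]
    .drop("departure_cancellations")
)
print(correlations)
```

Min-max scaling changes how the plots look but not the Pearson correlation, so the sign of each relationship is the same before and after normalizing.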

The second subplot looks at relationships between arrival cancellations and airport operations. Similar to the previous subplot, we see that arrival cancellations are, on average, positively related to arrival delays and negatively related to on-time arrivals. They are also, on average, positively related to arrival diversions. However, the relationship between on-time gate arrivals and arrival cancellations is not very strong, and the same goes for arrival delays and arrival cancellations. This means that on-time gate arrivals and arrival delays do not strongly impact flight cancellations. This makes sense intuitively: an arriving flight is highly unlikely to get cancelled because of an arrival delay.

Clustering

For my analysis, I used KMeans clustering to group the airports based on select features they share. One of the most important aspects of clustering is determining how many clusters to create. I can determine the best number of clusters by comparing the silhouette scores for different numbers of clusters. The features used in clustering are: percent on-time airport departures, percent on-time gate arrivals, average gate departure delay, average taxi out time, average taxi out delay, average airport departure delay, average airborne delay, average taxi in delay, average block delay, and average gate arrival delay. I experimented with two to six clusters. From my results, I found that two clusters was the most appropriate choice since it had the highest silhouette score. The results are shown below.

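Number of clusters: 2, Average Silhouette Score: 0.409466528634

Number of clusters: 3, Average Silhouette Score: 0.243833756646

Number of clusters: 4, Average Silhouette Score: 0.249974046988

Number of clusters: 5, Average Silhouette Score: 0.221558582283

Number of clusters: 6, Average Silhouette Score: 0.223243682632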
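The comparison of silhouette scores across cluster counts can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the exact code behind the numbers above: the data is synthetic, and standardizing the features first is my own choice for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the ten airport features: two well-separated groups.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 10)),
    rng.normal(loc=6.0, scale=1.0, size=(100, 10)),
])

# Assumption: features are standardized so no single one dominates the distances.
scaled = StandardScaler().fit_transform(features)

# Fit KMeans for two to six clusters and record the average silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scaled)
    scores[k] = silhouette_score(scaled, labels)

for k, score in scores.items():
    print(f"Number of clusters: {k}, Average Silhouette Score: {score:.4f}")

# The number of clusters with the highest average silhouette score.
best_k = max(scores, key=scores.get)
print("Best number of clusters:", best_k)
```

The silhouette score ranges from -1 to 1, with higher values meaning points sit closer to their own cluster than to neighboring ones, which is why the highest score is used to pick the number of clusters.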

Principal Component Analysis

I went ahead and used the first three components since they carry 87% of the data's total explained variance. Then, I ran the PCA on the dataset and analyzed the relationship between the original variables and the first three PCs. The results are shown in the table. From the table, we can observe which variables are most correlated with the PCs, and then select those variables. For this analysis, I find that the core components of operations related to delays are Airport Departure Delays, Taxi Out Delays, and Taxi In Delays. We see from the table that these are the variables most correlated with the three most relevant PCs. I also included a 3-D graph to give a visual representation of the three PCs. You should definitely play around with it; it's fun!
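For readers who want to try the same kind of analysis, here is a minimal scikit-learn sketch of the two steps described: checking how much variance the first three components carry, and correlating the original variables with those components. The data is randomly generated, the column names are borrowed from the feature list for illustration, and standardizing before PCA is an assumption on my part.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random stand-in data; column names are illustrative only.
rng = np.random.default_rng(0)
columns = [
    "average_airport_departure_delay",
    "average_taxi_out_delay",
    "average_taxi_in_delay",
    "average_airborne_delay",
    "average_block_delay",
]
data = pd.DataFrame(rng.normal(size=(200, 5)), columns=columns)
# Make two of the variables correlated so the components are not trivial.
data["average_taxi_out_delay"] += data["average_airport_departure_delay"]

# Assumption: variables are standardized before PCA.
scaled = StandardScaler().fit_transform(data)
pca = PCA(n_components=3).fit(scaled)

# Cumulative share of total variance carried by the first one, two, three PCs.
cumulative = pca.explained_variance_ratio_.cumsum()
print("Cumulative explained variance:", cumulative)

# Correlation between each original variable and each principal component.
pc_scores = pca.transform(scaled)
corr = pd.DataFrame(
    {
        f"PC{j + 1}": [
            np.corrcoef(scaled[:, i], pc_scores[:, j])[0, 1]
            for i in range(len(columns))
        ]
        for j in range(3)
    },
    index=columns,
)
print(corr)

# The variable most strongly correlated (in absolute value) with each PC.
print(corr.abs().idxmax())
```

Reading the correlation table column by column shows which original variables drive each component, which is the basis for selecting variables from the table.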

© 2018 by Dami Lasisi
