In this part, we will draw more advanced insights using data frame transformation techniques and window functions from the pandas library.

We first compute airline size and delay statistics. Airlines are then ranked with rank(method='first', ascending=False); it is good to note that the ranking is done across all airlines. The top 10 airlines with the highest volume of flights are kept with a filtering function. The names of the airlines associated with their IATA codes are then gathered using the merge() method with the airlines_df data frame.

    >>> result.head()
      airline  nb_flights  perc_delayed iata_code            airline_name
    0      AA     85747.0        0.1499        AA  American Airlines Inc.
    1      AS     16196.0        0.1208        AS    Alaska Airlines Inc.
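The ranking, top-N filter and merge described above can be sketched as follows. This is a minimal illustration, not the article's original code: the toy data, the 15-minute threshold for counting a flight as delayed, the rank_size column name, and the use of .loc as the filtering step are all assumptions made for the example.

```python
import pandas as pd

# Toy stand-ins for the article's data frames (values are made up).
flights_df = pd.DataFrame({
    "airline": ["AA", "AA", "AA", "AS", "AS", "DL"],
    "departure_delay": [30, 0, 5, 20, -3, 0],
})
airlines_df = pd.DataFrame({
    "iata_code": ["AA", "AS", "DL"],
    "airline_name": [
        "American Airlines Inc.",
        "Alaska Airlines Inc.",
        "Delta Air Lines Inc.",
    ],
})

TOP_N = 2             # the article keeps 10; 2 keeps the toy example small
DELAY_THRESHOLD = 15  # assumed definition of "delayed", in minutes

result = (
    flights_df
    # Flag delayed flights (assumed threshold).
    .assign(is_delayed=lambda df: df["departure_delay"] > DELAY_THRESHOLD)
    # Compute airline size and delay statistics.
    .groupby("airline", as_index=False)
    .agg(
        nb_flights=("departure_delay", "size"),
        perc_delayed=("is_delayed", "mean"),
    )
    # Rank across all airlines, largest first; ties broken by order of appearance.
    .assign(rank_size=lambda df: df["nb_flights"].rank(method="first", ascending=False))
    # Keep the TOP_N largest airlines.
    .loc[lambda df: df["rank_size"] <= TOP_N]
    # Attach airline names through their IATA code.
    .merge(airlines_df, how="left", left_on="airline", right_on="iata_code")
)
print(result)
```

Using method='first' gives every airline a distinct rank even when two airlines have the same number of flights, so the filter keeps exactly TOP_N rows.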
We first compute summary statistics of the columns of flights_df_raw. We then clean the data so that no row has a null value and only flights from a set of airports with a reasonable delay amount remain:

- We remove rows with missing values with the dropna() method.
- We keep only flights departing from the airports that we want to look at, using a filtering function.
- Looking at the mean and median of departure_delay, we see that the values are heavily right-skewed, with a maximum delay of 1988 minutes (~33 hours). We choose to filter out flights that have a delay of more than 1 day.

In this part, we fix existing columns and add new ones that will be useful later on:

- We convert flight_number from integers to character values with the astype() method, since these are IDs and have no ordered meaning.
- We add the following columns using the assign() method: the time of flight in datetime format, obtained by combining existing columns.
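The cleaning and column-fixing steps above can be sketched as one method chain. This is a hedged example under stated assumptions: the toy data, the column names year, month, day and origin_airport, the AIRPORTS list, the new date column, and the use of isin() as the airport filter are all hypothetical, since the source does not show them.

```python
import pandas as pd

# Toy stand-in for flights_df_raw (column names and values are assumptions).
flights_df_raw = pd.DataFrame({
    "year": [2015, 2015, 2015, 2015],
    "month": [1, 1, 2, 2],
    "day": [1, 2, 3, 4],
    "flight_number": [98, 2336, 840, 258],
    "origin_airport": ["JFK", "LAX", "JFK", "SEA"],
    "departure_delay": [-11.0, 1988.0, None, 25.0],
})

AIRPORTS = ["JFK", "LAX"]  # hypothetical set of airports of interest
MAX_DELAY_MIN = 24 * 60    # 1 day, expressed in minutes

# Compute statistics of the delay column to inspect its skew.
print(flights_df_raw["departure_delay"].describe())

flights_df = (
    flights_df_raw
    # All rows should not have any null value.
    .dropna()
    # Only flights from the set of airports and with a reasonable delay amount.
    .loc[lambda df: df["origin_airport"].isin(AIRPORTS)]
    .loc[lambda df: df["departure_delay"] <= MAX_DELAY_MIN]
    .assign(
        # Flight numbers are IDs, so store them as strings.
        flight_number=lambda df: df["flight_number"].astype(str),
        # Build a datetime column by combining existing columns.
        date=lambda df: pd.to_datetime(df[["year", "month", "day"]]),
    )
)
print(flights_df)
```

In this toy data, the row with a missing delay, the SEA flight, and the 1988-minute delay are all dropped, leaving a single flight.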