R for Data Science ⑤: Homework
flights {nycflights13} R Documentation
Flights data
Description
On-time data for all flights that departed NYC (i.e. JFK, LGA or EWR) in 2013.
Usage
flights
Format
Data frame with columns
year,month,day
Date of departure
dep_time,arr_time
Actual departure and arrival times (format HHMM or HMM), local tz.
sched_dep_time,sched_arr_time
Scheduled departure and arrival times (format HHMM or HMM), local tz.
dep_delay,arr_delay
Departure and arrival delays, in minutes. Negative times represent early departures/arrivals.
hour,minute
Time of scheduled departure broken into hour and minutes.
carrier
Two letter carrier abbreviation. See airlines() to get name
tailnum
Plane tail number
flight
Flight number
origin,dest
Origin and destination. See airports() for additional metadata.
air_time
Amount of time spent in the air, in minutes
distance
Distance between airports, in miles
time_hour
Scheduled date and hour of the flight as a POSIXct date. Along with origin, can be used to join flights data to weather data.
Source
RITA, Bureau of transportation statistics, https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236
Load the nycflights13 data, plus dplyr for filter()
library(nycflights13)
library(dplyr)
- Find all flights that
- Had an arrival delay of two or more hours
filter(flights, arr_delay >= 120)
- Flew to Houston (IAH or HOU)
filter(flights, dest %in% c("IAH", "HOU"))
- Were operated by United, American, or Delta
> airlines
# A tibble: 16 x 2
carrier name
<chr> <chr>
1 9E Endeavor Air Inc.
2 AA American Airlines Inc.
3 AS Alaska Airlines Inc.
4 B6 JetBlue Airways
5 DL Delta Air Lines Inc.
6 EV ExpressJet Airlines Inc.
7 F9 Frontier Airlines Inc.
8 FL AirTran Airways Corporation
9 HA Hawaiian Airlines Inc.
10 MQ Envoy Air
11 OO SkyWest Airlines Inc.
12 UA United Air Lines Inc.
13 US US Airways Inc.
14 VX Virgin America
15 WN Southwest Airlines Co.
16 YV Mesa Airlines Inc.
filter(flights, carrier %in% c("AA", "UA", "DL"))
- Departed in summer (July, August, and September)
filter(flights, month %in% c(7, 8, 9))
- Arrived more than two hours late, but didn’t leave late
filter(flights, arr_delay > 120, dep_delay <= 0)
- Were delayed by at least an hour, but made up over 30 minutes in flight
filter(flights, dep_delay >= 60, dep_delay - arr_delay > 30)
- Departed between midnight and 6am (inclusive)
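filter(flights, dep_time <= 600 | dep_time == 2400)
Note: in this data dep_time runs from 1 to 2400 and midnight is recorded as 2400, so a condition like dep_time >= 0 does not catch it.
- Another useful dplyr filtering helper is between(). What does it do? Can you use it to simplify the code needed to answer the previous challenges?
between(x, left, right) is shorthand for x >= left & x <= right, so the summer filter can be written as
filter(flights, between(month, 7, 9))
- How many flights have a missing dep_time? What other variables are missing? What might these rows represent?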
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
<int> <int> <int> <int> <int> <dbl> <int> <int> <dbl>
1 2013 1 1 NA 1630 NA NA 1815 NA
2 2013 1 1 NA 1935 NA NA 2240 NA
3 2013 1 1 NA 1500 NA NA 1825 NA
4 2013 1 1 NA 600 NA NA 901 NA
5 2013 1 2 NA 1540 NA NA 1747 NA
6 2013 1 2 NA 1620 NA NA 1746 NA
7 2013 1 2 NA 1355 NA NA 1459 NA
8 2013 1 2 NA 1420 NA NA 1644 NA
9 2013 1 2 NA 1321 NA NA 1536 NA
10 2013 1 2 NA 1545 NA NA 1910 NA
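The rows above look like the result of keeping only the flights where is.na(dep_time) is TRUE. As a rough sketch of how the counting part of the question could be answered, the snippet below runs the same check on a small hand-made data frame. The object `toy` is hypothetical: its four NA rows are copied from the output above, and the two complete rows are added only for contrast. It uses base R only, so it runs without nycflights13.

```r
# Toy stand-in for flights (hypothetical object, not the real data).
toy <- data.frame(
  dep_time       = c(NA, NA, NA, NA, 517L, 533L),
  sched_dep_time = c(1630L, 1935L, 1500L, 600L, 515L, 529L),
  dep_delay      = c(NA, NA, NA, NA, 2, 4),
  arr_time       = c(NA, NA, NA, NA, 830L, 850L),
  sched_arr_time = c(1815L, 2240L, 1825L, 901L, 819L, 830L),
  arr_delay      = c(NA, NA, NA, NA, 11, 20)
)

# How many rows have a missing dep_time?
n_missing <- sum(is.na(toy$dep_time))

# Among those rows, count the NAs in every column to see
# which other variables are missing together with dep_time.
missing_rows  <- toy[is.na(toy$dep_time), ]
na_per_column <- colSums(is.na(missing_rows))

print(n_missing)
print(na_per_column)
```

On the real data, replacing toy with flights (for example sum(is.na(flights$dep_time)), or filter(flights, is.na(dep_time)) in dplyr) should give the actual counts.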
The arrival time is also missing, so these flights were most likely cancelled.
- Why is NA ^ 0 not missing? Why is NA | TRUE not missing? Why is FALSE & NA not missing? Can you figure out the general rule? (NA * 0 is a tricky counterexample!)
> NA^0
[1] 1
> NA*0
[1] NA
> NA|TRUE
[1] TRUE
> FALSE & NA
[1] FALSE
→ Yikes, I have absolutely no idea.
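One possible way to read these results, offered as a sketch rather than a definitive answer: NA stands for "some value we don't know", and an expression can return a non-missing result when every possible value would give the same answer. The base R snippet below checks each case by substituting concrete values. The vectors `candidates` and `nums` are just illustrative choices.

```r
# NA means "a value we don't know". If the result is the same no matter
# what that value is, R can return it instead of NA.

# Logical cases: try both possible values, TRUE and FALSE.
candidates <- c(TRUE, FALSE)
print(all(candidates | TRUE))     # x | TRUE is TRUE for either x
print(!any(FALSE & candidates))   # FALSE & x is FALSE for either x

# Numeric case: x^0 is 1 for every number tried here, including Inf.
nums <- c(-2, 0, 3.5, Inf, -Inf)
print(all(nums^0 == 1))

# Counterexample: x * 0 is not always 0, because Inf * 0 is NaN.
print(Inf * 0)

# The four expressions from the exercise:
r_pow <- NA^0        # every x gives 1, so the result is known
r_or  <- NA | TRUE   # either x gives TRUE
r_and <- FALSE & NA  # either x gives FALSE
r_mul <- NA * 0      # depends on x (finite vs. infinite), so it stays NA
```

Under this reading, NA * 0 stays NA because the answer would differ between a finite x (0) and an infinite x (NaN).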