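I am trying to scrape a publicly available table from the New York power grid operator at: http://icap.nyiso.com/ucap/public/auc_view_spot_detail.do
I can do this for Summer seasons but not for Winter seasons. It is not clear to me what I am missing, so I am hoping someone smarter can help me out.
Below is my process, starting with a screenshot of the page.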
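A combination of Season & Month must be selected, and then Display clicked, to generate the table. I copied the request header information, including the URL-encoded payload, which I send as the body of the POST request.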
# libraries
library(jsonlite)
library(lubridate)
library(data.table)
library(httr)
library(rvest)
# get session and cookies
initial_url <- "http://icap.nyiso.com/ucap/public/auc_view_spot_detail.do"
initial_response <- GET(initial_url)
cookie_data <- cookies(initial_response)
cookie_string <- paste0(cookie_data$name, "=", cookie_data$value, collapse = "; ")
# Define the POST request headers, including cookies
headers <- c(
"Accept" = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"Accept-Encoding" = "gzip, deflate",
"Accept-Language" = "en-US,en;q=0.9",
"Cache-Control" = "max-age=0",
"Connection" = "keep-alive",
"Content-Length" = "85",
"Content-Type" = "application/x-www-form-urlencoded",
"Cookie" = cookie_string,
"Host" = "icap.nyiso.com",
"Origin" = "null",
"Upgrade-Insecure-Requests" = "1",
"User-Agent" = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
)
# Define the URL for the POST request
post_url <- "http://icap.nyiso.com/ucap/public/auc_view_spot_detail.do"
# Below is working code for a "Summer" season:
response <- POST(post_url, add_headers(.headers = headers), encode = "form",
body = "seasonId=702793&seasonId=Summer+2024&month=05%2F2024&month=May%2F2024&display=Display")
html_content <- content(response, as = "text")
html <- read_html(html_content)
tables <- html %>% html_nodes("table")
html_table(tables[4]) # print
#[[1]]
## A tibble: 45 × 2
# X1 X2
# <chr> <chr>
# 1 "05/2024" "05/2024"
# 2 "G-J Locality" "G-J Locality"
# 3 "Awarded Deficiency (MW)" "1,888.1"
# 4 "Awarded Excess (MW)" "1,694.400"
# 5 "% Excess Above Requirement" "14.78"
# 6 "Price ($/kW-M)" "$4.27"
# 7 "" ""
# 8 "LI" "LI"
# 9 "Awarded Deficiency (MW)" "176.9"
#10 "Awarded Excess (MW)" "519.200"
## ℹ 35 more rows
## ℹ Use `print(n = ...)` to see more rows
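Oddly, if I change the body to a Winter season, the process no longer works, even though inspecting the network traffic suggests it should. Any idea what I might be missing?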
# does not work to generate the data
response <- POST(post_url, add_headers(.headers = headers), encode = "form",
body = "seasonId=702409&seasonId=Winter+2023-2024&month=02%2F2024&month=Feb%2F2024&display=Display")
html_content <- content(response, as = "text")
html <- read_html(html_content)
tables <- html %>% html_nodes("table")
html_table(tables[4]) # there is no such table
Some odd behavior I noticed:
You can change the seasonId number (seasonId=702793) that accompanies seasonId=Summer+2024. (Other IDs are listed here: http://icap.nyiso.com/ucap/rest/seasons/public)
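I also could not find a specific public REST API for the actual data in the table.
Thanks for your time and thoughts.
Here are the body strings I used to determine that this is a Winter-only problem: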
body_strings <- c("seasonId=700085&seasonId=Winter+2021-2022&month=01%2F2022&month=Jan%2F2022&display=Display",
"seasonId=700085&seasonId=Winter+2021-2022&month=02%2F2022&month=Feb%2F2022&display=Display",
"seasonId=700085&seasonId=Winter+2021-2022&month=03%2F2022&month=Mar%2F2022&display=Display",
"seasonId=700085&seasonId=Winter+2021-2022&month=04%2F2022&month=Apr%2F2022&display=Display",
"seasonId=700490&seasonId=Summer+2022&month=05%2F2022&month=May%2F2022&display=Display",
"seasonId=700490&seasonId=Summer+2022&month=06%2F2022&month=Jun%2F2022&display=Display",
"seasonId=700490&seasonId=Summer+2022&month=07%2F2022&month=Jul%2F2022&display=Display",
"seasonId=700490&seasonId=Summer+2022&month=08%2F2022&month=Aug%2F2022&display=Display",
"seasonId=700490&seasonId=Summer+2022&month=09%2F2022&month=Sep%2F2022&display=Display",
"seasonId=700490&seasonId=Summer+2022&month=10%2F2022&month=Oct%2F2022&display=Display",
"seasonId=700882&seasonId=Winter+2022-2023&month=11%2F2022&month=Nov%2F2022&display=Display",
"seasonId=700882&seasonId=Winter+2022-2023&month=12%2F2022&month=Dec%2F2022&display=Display",
"seasonId=700882&seasonId=Winter+2022-2023&month=01%2F2023&month=Jan%2F2023&display=Display",
"seasonId=700882&seasonId=Winter+2022-2023&month=02%2F2023&month=Feb%2F2023&display=Display",
"seasonId=700882&seasonId=Winter+2022-2023&month=03%2F2023&month=Mar%2F2023&display=Display",
"seasonId=700882&seasonId=Winter+2022-2023&month=04%2F2023&month=Apr%2F2023&display=Display",
"seasonId=701280&seasonId=Summer+2023&month=05%2F2023&month=May%2F2023&display=Display",
"seasonId=701280&seasonId=Summer+2023&month=06%2F2023&month=Jun%2F2023&display=Display",
"seasonId=701280&seasonId=Summer+2023&month=07%2F2023&month=Jul%2F2023&display=Display",
"seasonId=701280&seasonId=Summer+2023&month=08%2F2023&month=Aug%2F2023&display=Display",
"seasonId=701280&seasonId=Summer+2023&month=09%2F2023&month=Sep%2F2023&display=Display",
"seasonId=701280&seasonId=Summer+2023&month=10%2F2023&month=Oct%2F2023&display=Display",
"seasonId=702409&seasonId=Winter+2023-2024&month=11%2F2023&month=Nov%2F2023&display=Display",
"seasonId=702409&seasonId=Winter+2023-2024&month=12%2F2023&month=Dec%2F2023&display=Display",
"seasonId=702409&seasonId=Winter+2023-2024&month=01%2F2024&month=Jan%2F2024&display=Display",
"seasonId=702409&seasonId=Winter+2023-2024&month=02%2F2024&month=Feb%2F2024&display=Display",
"seasonId=702409&seasonId=Winter+2023-2024&month=03%2F2024&month=Mar%2F2024&display=Display",
"seasonId=702409&seasonId=Winter+2023-2024&month=04%2F2024&month=Apr%2F2024&display=Display",
"seasonId=702793&seasonId=Summer+2024&month=05%2F2024&month=May%2F2024&display=Display",
"seasonId=702793&seasonId=Summer+2024&month=06%2F2024&month=Jun%2F2024&display=Display",
"seasonId=702793&seasonId=Summer+2024&month=07%2F2024&month=Jul%2F2024&display=Display",
"seasonId=702793&seasonId=Summer+2024&month=08%2F2024&month=Aug%2F2024&display=Display"
)
Your problem is that you specify a Content-Length in the headers but do not honor that length in the body string ('Winter 2023-2024' is longer than 'Summer 2023').
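The mismatch can be checked by measuring the two body strings from the question. This is a small sketch in base R; the variable names are illustrative only.

```r
# Body strings copied from the question
summer_body <- "seasonId=702793&seasonId=Summer+2024&month=05%2F2024&month=May%2F2024&display=Display"
winter_body <- "seasonId=702409&seasonId=Winter+2023-2024&month=02%2F2024&month=Feb%2F2024&display=Display"

# Content-Length counts bytes, so measure bytes rather than characters
summer_len <- nchar(summer_body, type = "bytes")
winter_len <- nchar(winter_body, type = "bytes")

summer_len  # 85, which matches the hard-coded "Content-Length" = "85"
winter_len  # 90, which no longer matches the hard-coded header
```

Leaving Content-Length out of the headers avoids the issue, since the length is then derived from the body that is actually sent.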
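Part of the problem here is that you have over-specified the request, which makes it harder to debug. You do not need the initial GET request, the cookies, the user agent, or most of the other headers.
The following is fully reproducible in a clean session: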
library(httr)
library(rvest)
headers <- c(`Connection` = "keep-alive",
             `Content-Type` = "application/x-www-form-urlencoded",
             `Upgrade-Insecure-Requests` = "1")
POST("http://icap.nyiso.com/ucap/public/auc_view_spot_detail.do",
     body = paste0("seasonId=702409",
                   "&seasonId=Winter+2023-2024",
                   "&month=02%2F2024",
                   "&month=Feb%2F2024",
                   "&display=Display"),
     add_headers(.headers = headers)) %>%
  content(as = "text") %>%
  read_html() %>%
  html_nodes("table") %>%
  getElement(4) %>%
  html_table()
#> # A tibble: 45 x 2
#> X1 X2
#> <chr> <chr>
#> 1 "02/2024" "02/2024"
#> 2 "G-J Locality" "G-J Locality"
#> 3 "Awarded Deficiency (MW)" "2,620.8"
#> 4 "Awarded Excess (MW)" "1,748.600"
#> 5 "% Excess Above Requirement" "14.16"
#> 6 "Price ($/kW-M)" "$4.56"
#> 7 "" ""
#> 8 "LI" "LI"
#> 9 "Awarded Deficiency (MW)" "42.8"
#> 10 "Awarded Excess (MW)" "859.700"
#> # i 35 more rows
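To cover several months without typing each body string by hand, the body can be built from its parts. The following is only a sketch: `build_body` and `fetch_spot_table` are hypothetical helper names, not part of httr or rvest, and the sketch assumes the server keeps accepting the same form fields shown above. The network call is defined but not run.

```r
# Hypothetical helper: build the form body for one or more season/month pairs.
# All inputs are vectorised, so a vector of months yields a vector of bodies.
build_body <- function(season_id, season_label, year, month) {
  year  <- as.integer(year)
  month <- as.integer(month)
  paste0(
    "seasonId=", season_id,
    "&seasonId=", gsub(" ", "+", season_label, fixed = TRUE),
    "&month=", sprintf("%02d", month), "%2F", year,
    "&month=", month.abb[month], "%2F", year,
    "&display=Display"
  )
}

# Hypothetical helper: POST one body and return the fourth table, as in the code above.
# No Content-Length header is set, so it is derived from the body actually sent.
fetch_spot_table <- function(body) {
  response <- httr::POST(
    "http://icap.nyiso.com/ucap/public/auc_view_spot_detail.do",
    body = body,
    httr::add_headers(`Content-Type` = "application/x-www-form-urlencoded")
  )
  html <- rvest::read_html(httr::content(response, as = "text"))
  rvest::html_table(rvest::html_nodes(html, "table")[[4]])
}

# Winter 2023-2024 months as listed in the question: Nov 2023 through Apr 2024
winter_bodies <- build_body(
  season_id    = "702409",
  season_label = "Winter 2023-2024",
  year         = c(2023, 2023, 2024, 2024, 2024, 2024),
  month        = c(11, 12, 1, 2, 3, 4)
)

# Not run here because it needs network access:
# winter_tables <- lapply(winter_bodies, fetch_spot_table)
```

Because the body length varies with the season label, building the string and letting the HTTP client compute the length keeps Summer and Winter requests on the same code path.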