10 Retrieving Web Data: Web API

(Slides / Code / Video)

10.1 HTTP

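  • HTTP is the set of rules that computers on the web follow when one computer communicates with another (a server). When you enter a URL in your browser, the browser sends an HTTP request on your behalf to the server located at that URL. The server receives the HTTP request, interprets it, and sends an HTTP response back to your computer based on that interpretation. If everything works as expected, the HTTP response contains the content you asked for (e.g. a web page). After receiving the HTTP response, your browser interprets it and renders its content (e.g. HTML), after some processing, on the screen.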

Figure 10.1: Hypertext Transfer Protocol (HTTP)

Figure 10.2: JSON shown on Browser

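  • A server returns JSON not for people to read (directly), but so that programs and computers can process it easily. In other words, the server at this URL expects users to interact with it through a program. Instead of a polished interface (i.e. a web page), it provides a Web API that lets users obtain clean, structured data directly from a programming language.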

10.2 URL Structure

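  • A Web API is often designed so that users append some information to the URL (the query string), which tells the server which data they want.
    • For example, the anime quotes API uses the title=<anime> that the user appends to the URL to decide which anime's quotes to return: https://animechan.vercel.app/api/quotes/anime?title=naruto
  • The query string in a URL has a specific structure. It lets the user supply one or more key-value pairs (which keys are accepted depends on the API documentation). In the example above, title is the key of a key-value pair, and the key and the value are separated by =. When there are several pairs, they are separated by &. See the URL structure in the figure below.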

Figure 10.3: URL structure
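
As a small illustration of the structure described above, the sketch below splits the example URL into its components with httr::parse_url(). It only parses a string and sends no request; the variable name parts is chosen here for illustration.

```r
library(httr)

# Split the example URL into its components (no request is sent)
parts <- parse_url("https://animechan.vercel.app/api/quotes/anime?title=naruto")

parts$scheme    # protocol
parts$hostname  # address of the server
parts$path      # location of the resource on the server
parts$query     # named list holding the key-value pairs of the query string
```

Each key in the query string becomes a name in parts$query, so the value for title can be read with parts$query$title.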

10.3 httr

Figure 10.4: HTTP with R

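  • We usually interact with other computers (servers) on the web through a browser (i.e. the browser sends HTTP requests and receives HTTP responses). We can do the same from R: the httr package provides functions for working with HTTP requests and responses in R.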

  • Below, we use httr::GET() to send an HTTP request to https://animechan.vercel.app/api/quotes/anime?title=naruto&page=2 to retrieve quotes from the anime Naruto:

    library(httr)
    library(magrittr)
    
    # https://animechan.vercel.app/api/quotes/anime?title=naruto&page=2
    resp <- GET('https://animechan.vercel.app/',
                path = 'api/quotes/anime',
                query = list(title = "naruto", page = "2"))
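    • Although we could pass the whole URL as a string in the first argument of httr::GET(), we usually split the URL into three parts, the base URL, the path, and the query string (see the URL structure above), and let the arguments of httr::GET() assemble the complete URL:

      • url (first arg.): the base URL, i.e. the green part in the figure above; in this example, https://animechan.vercel.app/
      • path: the path of the URL, i.e. the purple part in the figure above; in this example, api/quotes/anime
      • query: the query string of the URL. In httr::GET(), query is supplied as a list. The URL above has two key-value pairs, so the query argument also consists of two (named) elements.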
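
To see how the three parts combine without sending a request, one option is httr::modify_url(), which accepts the same path and query arguments. This is only a sketch for inspecting the assembled URL; the variable name full_url is chosen here for illustration.

```r
library(httr)

# Assemble the full URL from the base URL, path, and query (no request is sent)
full_url <- modify_url(
  "https://animechan.vercel.app/",
  path  = "api/quotes/anime",
  query = list(title = "naruto", page = "2")
)

full_url
```

Checking the assembled URL this way can help catch a mistyped path or query key before the request is made.
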
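  • httr::GET() converts the HTTP request and the HTTP response it receives into an R object. For example, resp$url gives the destination URL of the HTTP request, and resp$status_code shows whether the request succeeded (2xx means the request succeeded; 4xx and 5xx mean it failed):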

    resp$url
    #> [1] "https://animechan.vercel.app/api/quotes/anime?title=naruto&page=2"
    resp$status_code
    #> [1] 200
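
The first digit of a status code determines its class. The base R sketch below makes that grouping explicit; status_class() is a hypothetical helper written for this illustration, not a function from httr.

```r
# Hypothetical helper: classify an HTTP status code by its first digit
status_class <- function(code) {
  switch(as.character(code %/% 100),
    "1" = "informational",
    "2" = "success",
    "3" = "redirection",
    "4" = "client error",
    "5" = "server error",
    "other"  # fallback for anything outside 100-599
  )
}

status_class(200)
status_class(404)
```

In practice the same check could be applied to resp$status_code before trying to read the response content.
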
  • To get the content of the HTTP response, use httr::content(). If the content is JSON, XML, or HTML, httr::content() automatically converts it into an R object. To skip this automatic conversion, pass the argument as = "text":

    content(resp)
    #> [[1]]
    #> [[1]]$anime
    #> [1] "Naruto"
    #> 
    #> [[1]]$character
    #> [1] "Yashamaru"
    #> 
    #> [[1]]$quote
    #> [1] "Physical wounds will definitely bleed and may look painful \nbut over time they heal by themselves and if you apply medicine, \nthey will heal faster. What's troublesome are wounds of the heart. Nothing is harder to heal. They're a bit different from physical injuries. You can't apply medicine for one thing and sometimes, they never heal. There's only one cure for a wound of the heart. \nIt's a bit bothersome and you can only receive it from someone else. What is it? Love."
    #> 
    #> 
    #> [[2]]
    #> [[2]]$anime
    #> [1] "Naruto"
    #> 
    #> [[2]]$character
    #> [1] "Shino Aburame"
    #> 
    #> [[2]]$quote
    #> [1] "Trying to improve by learning from others that is what calls friendship."
    #> 
    #> 
    #> [[3]]
    #> [[3]]$anime
    #> [1] "Naruto"
    #> 
    #> [[3]]$character
    #> [1] "Itachi Uchiha"
    #> 
    #> [[3]]$quote
    #> [1] "Those who forgive themselves, and are able to accept their true nature... They are the strong ones!"
    #> 
    #> 
    #> [[4]]
    #> [[4]]$anime
    #> [1] "Naruto"
    #> 
    #> [[4]]$character
    #> [1] "Rock Lee"
    #> 
    #> [[4]]$quote
    #> [1] "Stop it! How dare you disrespect an opponent that fought you with all he had!?"
    #> 
    #> 
    #> [[5]]
    #> [[5]]$anime
    #> [1] "Naruto"
    #> 
    #> [[5]]$character
    #> [1] "Genma Shiranui"
    #> 
    #> [[5]]$quote
    #> [1] "When captured birds grow wiser, they try to open the cage with their beaks. They don't give up, because they want to fly again."
    #> 
    #> 
    #> [[6]]
    #> [[6]]$anime
    #> [1] "Naruto"
    #> 
    #> [[6]]$character
    #> [1] "Gaara"
    #> 
    #> [[6]]$quote
    #> [1] "Just because someone is important to you, it doesn't necessarily mean that, that person is good. Even if you knew that person was evil... People cannot win against their loneliness."
    #> 
    #> 
    #> [[7]]
    #> [[7]]$anime
    #> [1] "Naruto"
    #> 
    #> [[7]]$character
    #> [1] "Pain"
    #> 
    #> [[7]]$quote
    #> [1] "Even the most ignorant, innocent child will eventually grow up as they learn what true pain is. It affects what they say, what they think… and they become real people."
    #> 
    #> 
    #> [[8]]
    #> [[8]]$anime
    #> [1] "Naruto"
    #> 
    #> [[8]]$character
    #> [1] "Neji Hyuuga"
    #> 
    #> [[8]]$quote
    #> [1] "Fear. That is what we live with. And we live it everyday. Only in death are we free of it."
    #> 
    #> 
    #> [[9]]
    #> [[9]]$anime
    #> [1] "Naruto"
    #> 
    #> [[9]]$character
    #> [1] "Itachi Uchiha"
    #> 
    #> [[9]]$quote
    #> [1] "Even the strongest of opponents always has a weakness."
    #> 
    #> 
    #> [[10]]
    #> [[10]]$anime
    #> [1] "Naruto"
    #> 
    #> [[10]]$character
    #> [1] "Gaara"
    #> 
    #> [[10]]$quote
    #> [1] "We have walked through the darkness of this world, that’s why we are able to see even a sliver of light."

    content(resp, as = "text") %>% cat()
    #> [{"anime":"Naruto","character":"Yashamaru","quote":"Physical wounds will definitely bleed and may look painful \nbut over time they heal by themselves and if you apply medicine, \nthey will heal faster. What's troublesome are wounds of the heart. Nothing is harder to heal. They're a bit different from physical injuries. You can't apply medicine for one thing and sometimes, they never heal. There's only one cure for a wound of the heart. \nIt's a bit bothersome and you can only receive it from someone else. What is it? Love."},{"anime":"Naruto","character":"Shino Aburame","quote":"Trying to improve by learning from others that is what calls friendship."},{"anime":"Naruto","character":"Itachi Uchiha","quote":"Those who forgive themselves, and are able to accept their true nature... They are the strong ones!"},{"anime":"Naruto","character":"Rock Lee","quote":"Stop it! How dare you disrespect an opponent that fought you with all he had!?"},{"anime":"Naruto","character":"Genma Shiranui","quote":"When captured birds grow wiser, they try to open the cage with their beaks. They don't give up, because they want to fly again."},{"anime":"Naruto","character":"Gaara","quote":"Just because someone is important to you, it doesn't necessarily mean that, that person is good. Even if you knew that person was evil... People cannot win against their loneliness."},{"anime":"Naruto","character":"Pain","quote":"Even the most ignorant, innocent child will eventually grow up as they learn what true pain is. It affects what they say, what they think… and they become real people."},{"anime":"Naruto","character":"Neji Hyuuga","quote":"Fear. That is what we live with. And we live it everyday. Only in death are we free of it."},{"anime":"Naruto","character":"Itachi Uchiha","quote":"Even the strongest of opponents always has a weakness."},{"anime":"Naruto","character":"Gaara","quote":"We have walked through the darkness of this world, that’s why we are able to see even a sliver of light."}]
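
When the response is a JSON array of objects that share the same keys, as above, a table is often easier to work with than a nested list. The sketch below assumes the jsonlite package and uses a shortened copy of the response text (two quotes taken from the output above) so that it runs without a network connection; with a live response, the text returned by content(resp, as = "text") could be passed in the same way.

```r
library(jsonlite)

# A shortened JSON array with the same keys as the response above
json_txt <- '[
  {"anime": "Naruto", "character": "Shino Aburame",
   "quote": "Trying to improve by learning from others that is what calls friendship."},
  {"anime": "Naruto", "character": "Itachi Uchiha",
   "quote": "Even the strongest of opponents always has a weakness."}
]'

# An array of objects with identical keys is simplified to a data frame
quotes <- fromJSON(json_txt)

quotes$character
```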

10.3.1 HTTP request methods

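HTTP requests come in several "kinds", called request methods. The most common is the GET method: entering a URL in a browser, or calling httr::GET() as we just did, sends a GET request to the server. The purpose of a GET request is to retrieve data from the server. Another common request method is POST. The purpose of a POST request is to submit data to the server; typical examples are logging in to an account and filling in a form. For more on HTTP request methods, see https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods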
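
For comparison with the GET example earlier, here is a minimal sketch of a POST request using httr::POST(). The helper submit_form(), the endpoint, and the field names are all hypothetical; they do not belong to any real API, so the example call is left commented out.

```r
library(httr)

# Hypothetical helper: submit form fields to a URL with a POST request.
# Neither the endpoint nor the field names come from a real API.
submit_form <- function(url, fields) {
  # encode = "form" sends the fields as a URL-encoded form body
  POST(url, body = fields, encode = "form")
}

# Example call (not run; the URL is a placeholder):
# resp <- submit_form("https://example.com/login",
#                     list(user = "me", password = "secret"))
```

Unlike GET, where the key-value pairs travel in the URL's query string, here the data is carried in the body of the request.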

10.4 JSON

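  • JSON is a plain-text format that, like .csv, is used to record structured information. JSON can record data with a much more complex structure, though, because it supports hierarchical (nested) structure.

  • Data recorded in JSON is structurally very close to an R list. In the figure below, the list created by the R code on the left represents a data structure equivalent to the JSON on the right:

    • In R, the jsonlite package helps with handling JSON. For example, jsonlite::fromJSON() converts a string in valid JSON format into an R list (for the reverse direction, use jsonlite::toJSON()):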

      json_str1 = '{
        "name": "美髮",
        "subscriptionCount": 1838,
        "subscribed": false,
        "topics": ["剪髮","染髮","洗髮"],
        "postThumbnail": {
          "size": null
        }
      }'
      
      jsonlite::fromJSON(json_str1, simplifyVector = FALSE)
      #> $name
      #> [1] "美髮"
      #> 
      #> $subscriptionCount
      #> [1] 1838
      #> 
      #> $subscribed
      #> [1] FALSE
      #> 
      #> $topics
      #> $topics[[1]]
      #> [1] "剪髮"
      #> 
      #> $topics[[2]]
      #> [1] "染髮"
      #> 
      #> $topics[[3]]
      #> [1] "洗髮"
      #> 
      #> 
      #> $postThumbnail
      #> $postThumbnail$size
      #> NULL
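
The reverse direction mentioned above can be sketched in the same way. The list below is made up for illustration; auto_unbox = TRUE is one reasonable setting here, since without it jsonlite::toJSON() writes length-one vectors as one-element arrays.

```r
library(jsonlite)

# An R list with a nested element (values made up for illustration)
lst <- list(
  name = "rlads",
  subscribed = FALSE,
  topics = list("a", "b"),
  thumbnail = list(width = 100)
)

# auto_unbox = TRUE writes length-one vectors as scalars rather than arrays
json_out <- toJSON(lst, auto_unbox = TRUE)
json_out

# Converting back gives a list with the same structure
back <- fromJSON(json_out, simplifyVector = FALSE)
str(back)
```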

10.4.1 Format

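  • A JSON document usually starts with { and ends with } (sometimes with [ and ]). Between { and } is a series of key:value pairs, separated from one another by ,. A key is always a string, while a value can be:

    • a string, a number, a boolean, or null
    • or a set of key-value pairs wrapped in {} (i.e. nesting is allowed)
  • In JSON, the key-value pairs inside {} are unordered. To represent an order, use an array: []. An array can hold any number of values of any type (separated by ,), for example: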

    {
      "id": "rlads2019",
      "array1": [1.1, "a string", false],
      "array2": [2, {"id": 1234}, null]
    }
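
To check how such arrays look on the R side, the JSON above can be parsed with jsonlite::fromJSON(). This is a small sketch; the variable names json_str2 and x are chosen here for illustration.

```r
library(jsonlite)

json_str2 <- '{
  "id": "rlads2019",
  "array1": [1.1, "a string", false],
  "array2": [2, {"id": 1234}, null]
}'

# simplifyVector = FALSE keeps each array as a list, so mixed types survive
x <- fromJSON(json_str2, simplifyVector = FALSE)

x$array1[[2]]      # second element of array1 (array order is preserved)
x$array2[[2]]$id   # value nested inside an object inside an array
```

Because the elements of an array are ordered, they are accessed by position with [[ ]], whereas the values of an object are accessed by key with $.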