Crawling the 기상청_단기예보 ((구)_동네예보) 조회서비스 API (KMA short-term forecast, formerly neighborhood forecast) with Python

https://www.data.go.kr/tcs/dss/selectApiDataDetailView.do?publicDataPk=15057210

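기상청_지상(종관, ASOS) 시간자료 조회서비스 (KMA surface synoptic (ASOS) hourly data inquiry service)

A service for retrieving hourly weather data observed with synoptic weather observation equipment

www.data.go.kr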

Let's crawl this API with Python.

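The API is queried by nx, ny grid coordinates, so the script reads the grid points from a CSV file prepared in advance, passes each nx, ny pair as request parameters, and loads the temperature (T1H) and humidity (REH) into a DB.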
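As a rough sketch of the first two steps, the snippet below reads grid pairs from CSV text and builds the query string for one grid point. The helper names `read_grid_points` and `build_query`, the sample grid values, and the date and time are made up for illustration; the parameter names and values mirror the ones used in the script in this post, and `'apikey'` is a placeholder.

```python
import csv
import io
from urllib.parse import urlencode


def read_grid_points(csv_text):
    """Return a list of (nx, ny) string pairs, one per CSV row (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    # Skip rows that do not contain at least two columns
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def build_query(service_key, base_date, base_time, nx, ny):
    """Build the GET query string for one grid point (hypothetical helper)."""
    return urlencode({
        'serviceKey': service_key,
        'pageNo': '1',
        'numOfRows': '10000000',
        'dataType': 'xml',
        'base_date': base_date,
        'base_time': base_time,
        'nx': nx,
        'ny': ny,
    })


# Sample data, assumed to have the same "nx, ny" layout as the CSV file in this post
sample_csv = "60, 127\n55, 124\n"
points = read_grid_points(sample_csv)
print(points)
print(build_query('apikey', '20240101', '0900', *points[0]))
```

Keeping the query building in its own function makes it easy to check the parameters for one grid point before looping over the whole file.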

The base time is split at the 40-minute mark, but since the script will be run from cron anyway, this doesn't really matter.
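The 40-minute rule can be sketched as a small function. This is a minimal illustration, assuming the data for each hour becomes available at minute 40; the name `latest_base` and the `release_minute` parameter are hypothetical. Subtracting the hour before formatting means the date also rolls back to the previous day just after midnight.

```python
from datetime import datetime, timedelta


def latest_base(now, release_minute=40):
    """Return (base_date, base_time) for the most recent hour assumed to be available."""
    # Before the release minute, fall back to the previous hour
    base = now if now.minute >= release_minute else now - timedelta(hours=1)
    return base.strftime('%Y%m%d'), base.strftime('%H00')


print(latest_base(datetime(2024, 1, 1, 9, 50)))  # ('20240101', '0900')
print(latest_base(datetime(2024, 1, 1, 9, 10)))  # ('20240101', '0800')
print(latest_base(datetime(2024, 1, 1, 0, 10)))  # ('20231231', '2300')
```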

The details are written in the code comments.

import csv
from datetime import datetime, timedelta
from urllib.parse import urlencode, quote_plus, unquote

import bs4
import pandas as pd
import psycopg2
import requests

# Connection settings (replace the placeholders with real values)
host = "host number"
port = "port number"
dbname = "test"
user = "postgres"
password = "postgres"

try:
    conn = psycopg2.connect(host=host, port=port, dbname=dbname, user=user, password=password)
except psycopg2.Error as error:
    # Nothing below can run without a connection, so report the error and stop
    print(error)
    raise
cur = conn.cursor()

now = datetime.now()

# The API data is released on a 40-minute basis, so before minute 40 request the previous hour.
# Subtracting the hour first also moves base_date back to yesterday just after midnight.
if now.minute < 40:
    base = now - timedelta(hours=1)
else:
    base = now
base_date = base.strftime('%Y%m%d')
base_time = base.strftime('%H00')

print(base_time)
print(base_date)

# Path of the CSV file to read (each row holds one nx, ny pair)
csvfile = open('C:/Users/Administrator/Desktop/test.csv')
spamreader = csv.reader(csvfile, skipinitialspace=True, quotechar=None)

# At midnight, move the data from the main table to the past table
if base_time == '0000':
    sqlCopy = "insert into tmp_reh_api_past select * from tmp_reh_api"
    cur.execute(sqlCopy)

    sqlDel = "delete from tmp_reh_api"
    cur.execute(sqlDel)

# The service URL and API key are the same for every grid point, so set them once
xmlUrl = 'http://apis.data.go.kr/1360000/VilageFcstInfoService_2.0/getUltraSrtNcst'
# unquote() decodes the key so that urlencode() below does not encode it twice
My_API_Key = unquote('apikey')

for admCd in spamreader:
    print(admCd[0], admCd[1])

    # 1. Build the URL parameters.
    # The leading '?' separates the query string from the URL in a GET request
    queryParams = '?' + urlencode(
        {
            'serviceKey': My_API_Key,
            'pageNo': '1',
            'numOfRows': '10000000',
            'dataType': 'xml',
            'base_date': str(base_date),
            'base_time': str(base_time),
            'nx': str(admCd[0]),
            'ny': str(admCd[1]),
        }
    )

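    response = requests.get(xmlUrl + queryParams, timeout=30).text.encode('utf-8')
    xmlobj = bs4.BeautifulSoup(response, 'lxml-xml')

    rows = xmlobj.find_all('item')
    if not rows:
        # No items came back (for example, an error response), so skip this grid point
        print('no data for', admCd[0], admCd[1])
        continue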

    # Collect the values of every row and column into a matrix
    rowList = []
    nameList = []
    hmdt = None     # humidity (REH)
    tmprt = None    # temperature (T1H)

    for i, row in enumerate(rows):
        columns = row.find_all()
        if i == 0:
            # Take the column names from the first row only
            nameList = [column.name for column in columns]
        # Build a fresh list for each row so values from different rows never mix (very important!)
        columnList = [column.text for column in columns]
        print(columnList)

        # Index 2 is the category code and index 5 is the observed value
        if columnList[2] == 'REH':
            hmdt = columnList[5]
        elif columnList[2] == 'T1H':
            tmprt = columnList[5]

        rowList.append(columnList)
    
    # Pass the values as query parameters instead of concatenating them into the SQL string
    sql = "insert into tmp_reh_api values (%s, %s, %s, %s, %s, %s)"
    cur.execute(sql, (admCd[0], admCd[1], hmdt, tmprt, base_date, base_time))
    conn.commit()

    result = pd.DataFrame(rowList, columns=nameList)
    print(result.head())

Run it and you can confirm that the data is inserted correctly.
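To check the parsing step without calling the API or touching the DB, the extraction of T1H and REH can be tried against a sample response. This is only a sketch: the XML below is hand-written and assumes each item carries `category` and `obsrValue` elements in the positions the script indexes (2 and 5), the function name `extract_observations` is hypothetical, and it uses the standard library's `xml.etree.ElementTree` in place of BeautifulSoup so that it runs without extra packages.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, assumed to resemble the item layout returned by the API
SAMPLE_XML = """<response><body><items>
<item><baseDate>20240101</baseDate><baseTime>0900</baseTime><category>REH</category><nx>60</nx><ny>127</ny><obsrValue>45</obsrValue></item>
<item><baseDate>20240101</baseDate><baseTime>0900</baseTime><category>T1H</category><nx>60</nx><ny>127</ny><obsrValue>-1.5</obsrValue></item>
<item><baseDate>20240101</baseDate><baseTime>0900</baseTime><category>RN1</category><nx>60</nx><ny>127</ny><obsrValue>0</obsrValue></item>
</items></body></response>"""


def extract_observations(xml_text, wanted=('T1H', 'REH')):
    """Return {category: observed value} for the wanted categories only."""
    root = ET.fromstring(xml_text)
    values = {}
    for item in root.iter('item'):
        category = item.findtext('category')
        if category in wanted:
            values[category] = item.findtext('obsrValue')
    return values


print(extract_observations(SAMPLE_XML))  # {'REH': '45', 'T1H': '-1.5'}
```

Looking the values up by element name rather than by position keeps the extraction working even if the order of the fields inside an item changes.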

License: Attribution, NonCommercial, NoDerivatives
