Crawling to Check Whether a Business Registration Number Is Valid

Python/Base 2023. 6. 5. 09:59 Posted by 퓨어레드

Because some people enter their business registration number incorrectly, I wrote a simple Hometax crawler.

 

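import json
import xml.etree.ElementTree as elemTree

import requests
from django.http import HttpResponse

# CompanyNoSearch is the project's own Django model (its import is not shown in this excerpt).


def company_no_search(request, company_no: str):
    company_no = company_no.replace("-", "")

    # A business registration number is 10 digits once the hyphens are removed.
    if len(company_no) == 10 and company_no.isdigit():

        headers = {'Content-Type': 'text/xml'}

        xml = "<map id='ATTABZAA001R08'><pubcUserNo/><mobYn>N</mobYn><inqrTrgtClCd>1</inqrTrgtClCd><txprDscmNo>" + company_no + "</txprDscmNo><dongCode>05</dongCode><psbSearch>Y</psbSearch><map id='userReqInfoVO'/></map>"

        # Use a timeout so a slow Hometax response cannot hang the view.
        resp = requests.post(
            'https://teht.hometax.go.kr/wqAction.do?actionId=ATTABZAA001R08&screenId=UTEABAAA13&popupYn=false&realScreenId=',
            data=xml, headers=headers, timeout=10)

        if resp.status_code == 200:

            tree = elemTree.fromstring(resp.text)

            # findtext returns '' instead of raising when an element is missing or empty.
            trtEndCd = tree.findtext('./trtEndCd', default='')
            state = tree.findtext('./smpcBmanTrtCntn', default='')
            description = tree.findtext('./trtCntn', default='')

            # The number counts as valid when the state text contains this sentence
            # ("This is a registered business registration number").
            success = "등록되어 있는 사업자등록번호 입니다" in state

            result = {"success": success, "company_no": company_no, "end_cd": trtEndCd, "state": state, "message": description}

            # Store the lookup result.
            new_company = CompanyNoSearch()

            new_company.company_no = company_no
            new_company.trt_end_cd = trtEndCd
            new_company.state = state
            new_company.description = description
            new_company.success = success

            new_company.save()

        else:
            # Message: "The number could not be looked up"
            result = {"success": False, "message": "조회 할 수 없습니다"}

    else:
        # Message: "Invalid business registration number"
        result = {"success": False, "message": "잘못된 사업자 번호"}

    return HttpResponse(json.dumps(result), content_type="application/json")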

 

 

I've cut out just the relevant logic.
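
The view above checks little more than the length of the number before calling Hometax. A local check-digit test could reject obvious typos without sending any request. The sketch below is not part of my original code: it assumes the commonly described check-digit rule for Korean business registration numbers (weights 1, 3, 7, 1, 3, 7, 1, 3, 5), and the function name has_valid_check_digit is made up for illustration. Passing this test only means the number is well formed; whether it is actually registered still has to be checked with Hometax.

```python
# Weights for the first nine digits, per the commonly described check-digit rule
# (an assumption; this rule does not appear in the original code).
WEIGHTS = (1, 3, 7, 1, 3, 7, 1, 3, 5)


def has_valid_check_digit(company_no: str) -> bool:
    """Return True if the 10-digit number passes the local check-digit test."""
    digits = company_no.replace("-", "")
    if len(digits) != 10 or not digits.isdigit():
        return False

    nums = [int(ch) for ch in digits]
    total = sum(n * w for n, w in zip(nums[:9], WEIGHTS))
    # The ninth digit's product (digit * 5) also contributes its tens place.
    total += (nums[8] * 5) // 10
    expected = (10 - total % 10) % 10
    return nums[9] == expected
```

Calling this before the requests.post call would let the view return the "invalid number" response straight away for numbers that fail the test.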

 

The only extra dependency is requests.

 

It's Hometax, though, so let's not hammer it with requests.
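
One simple way to go easy on Hometax is to enforce a minimum gap between lookups. This is only a sketch and not something from my original code: the class name MinIntervalThrottle and the interval you pick are assumptions, and the clock and sleep arguments exist only so the behavior can be tested without actually waiting.

```python
import time


class MinIntervalThrottle:
    """Block so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last = now
        return slept
```

Calling wait() on a shared instance right before each requests.post keeps lookups spaced out within a single process. It does nothing across multiple processes or servers.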

 

Using the time check class I posted earlier together with apscheduler, I wrote some code that runs periodically.

 

2019/11/07 - [Python/Base] - Checking whether the current time falls within a list of start and end times

 

To use it, you need to install apscheduler.

 

pip install apscheduler

 

The code below is based on code by 어쩐지 오늘은, adapted to use the time check class.

 

Original post

https://zzsza.github.io/development/2018/07/07/python-scheduler/

 

Making a Python Scheduler (APScheduler)

Every so often you need to build a scheduler. There are many ways to do it, including Celery (a distributed task queue), crontab, Airflow, and APScheduler.

zzsza.github.io

 

Add your own logic at the point where the code prints Execute!!!.

from apscheduler.jobstores.base import JobLookupError
from apscheduler.schedulers.background import BackgroundScheduler
import time

from TimeIncludeChk import TimeIncludeChk


class Scheduler:
    def __init__(self, time_chk: TimeIncludeChk):
        self.schedule = BackgroundScheduler()
        self.schedule.start()
        self.job_id = '1MinuteWork'
        self.timeChk = time_chk

    def __del__(self):
        self.shutdown()

    def shutdown(self):
        self.schedule.shutdown()

    def kill_scheduler(self):
        try:
            self.schedule.remove_job(self.job_id)
        except JobLookupError as err:
            print("fail to stop Scheduler: {err}".format(err=err))
            return

    def execute_work(self):
        # Evaluate the window check once so the log line and the branch always agree.
        in_window = self.timeChk.is_include_time_now()

        print("Scheduler process_id[Job] : %d" % (time.localtime().tm_sec,))
        print("is_include_time_now : ", in_window)

        if in_window:
            # Add your own logic here.
            print("Execute!!!")

    def scheduler(self):
        print("Scheduler Start")
        self.schedule.add_job(self.execute_work, 'interval', minutes=1, id=self.job_id)


if __name__ == '__main__':

    datas = [
        {'no': 276, 'fountainNo': 'DE0000000007', 'dayWeek': 'MON', 'dayWeekIdx': 0, 'startHour': 9, 'startMin': 0,
         'startTime': '09:00', 'endHour': 9, 'endMin': 50, 'endTime': '09:50', 'inputNo': 18, 'regDate': 1572910002000,
         'offLight': 'Y'},
        {'no': 283, 'fountainNo': 'DE0000000007', 'dayWeek': 'MON', 'dayWeekIdx': 0, 'startHour': 13, 'startMin': 0,
         'startTime': '13:00', 'endHour': 13, 'endMin': 50, 'endTime': '13:50', 'inputNo': 19, 'regDate': 1572910025000,
         'offLight': 'N'},
        {'no': 277, 'fountainNo': 'DE0000000007', 'dayWeek': 'TUE', 'dayWeekIdx': 1, 'startHour': 9, 'startMin': 0,
         'startTime': '09:00', 'endHour': 9, 'endMin': 50, 'endTime': '09:50', 'inputNo': 18, 'regDate': 1572910002000,
         'offLight': 'Y'},
        {'no': 284, 'fountainNo': 'DE0000000007', 'dayWeek': 'TUE', 'dayWeekIdx': 1, 'startHour': 13, 'startMin': 0,
         'startTime': '13:00', 'endHour': 13, 'endMin': 50, 'endTime': '13:50', 'inputNo': 19, 'regDate': 1572910025000,
         'offLight': 'N'},
        {'no': 278, 'fountainNo': 'DE0000000007', 'dayWeek': 'WED', 'dayWeekIdx': 2, 'startHour': 9, 'startMin': 0,
         'startTime': '09:00', 'endHour': 9, 'endMin': 50, 'endTime': '09:50', 'inputNo': 18, 'regDate': 1572910002000,
         'offLight': 'Y'},
        {'no': 285, 'fountainNo': 'DE0000000007', 'dayWeek': 'WED', 'dayWeekIdx': 2, 'startHour': 13, 'startMin': 0,
         'startTime': '13:00', 'endHour': 13, 'endMin': 50, 'endTime': '13:50', 'inputNo': 19, 'regDate': 1572910025000,
         'offLight': 'N'},
        {'no': 286, 'fountainNo': 'DE0000000007', 'dayWeek': 'THU', 'dayWeekIdx': 3, 'startHour': 13, 'startMin': 0,
         'startTime': '13:00', 'endHour': 13, 'endMin': 50, 'endTime': '13:50', 'inputNo': 19, 'regDate': 1572910025000,
         'offLight': 'N'},
        {'no': 279, 'fountainNo': 'DE0000000007', 'dayWeek': 'THU', 'dayWeekIdx': 3, 'startHour': 9, 'startMin': 0,
         'startTime': '09:00', 'endHour': 9, 'endMin': 50, 'endTime': '09:50', 'inputNo': 18, 'regDate': 1572910002000,
         'offLight': 'Y'},
        {'no': 280, 'fountainNo': 'DE0000000007', 'dayWeek': 'FRI', 'dayWeekIdx': 4, 'startHour': 9, 'startMin': 0,
         'startTime': '09:00', 'endHour': 9, 'endMin': 50, 'endTime': '09:50', 'inputNo': 18, 'regDate': 1572910002000,
         'offLight': 'Y'},
        {'no': 287, 'fountainNo': 'DE0000000007', 'dayWeek': 'FRI', 'dayWeekIdx': 4, 'startHour': 13, 'startMin': 0,
         'startTime': '13:00', 'endHour': 13, 'endMin': 50, 'endTime': '13:50', 'inputNo': 19, 'regDate': 1572910025000,
         'offLight': 'N'},
        {'no': 281, 'fountainNo': 'DE0000000007', 'dayWeek': 'SAT', 'dayWeekIdx': 5, 'startHour': 9, 'startMin': 0,
         'startTime': '09:00', 'endHour': 9, 'endMin': 50, 'endTime': '09:50', 'inputNo': 18, 'regDate': 1572910002000,
         'offLight': 'Y'},
        {'no': 288, 'fountainNo': 'DE0000000007', 'dayWeek': 'SAT', 'dayWeekIdx': 5, 'startHour': 13, 'startMin': 0,
         'startTime': '13:00', 'endHour': 13, 'endMin': 50, 'endTime': '13:50', 'inputNo': 19, 'regDate': 1572910025000,
         'offLight': 'N'},
        {'no': 282, 'fountainNo': 'DE0000000007', 'dayWeek': 'SUN', 'dayWeekIdx': 6, 'startHour': 9, 'startMin': 0,
         'startTime': '09:00', 'endHour': 9, 'endMin': 50, 'endTime': '09:50', 'inputNo': 18, 'regDate': 1572910002000,
         'offLight': 'Y'},
        {'no': 289, 'fountainNo': 'DE0000000007', 'dayWeek': 'SUN', 'dayWeekIdx': 6, 'startHour': 13, 'startMin': 0,
         'startTime': '13:00', 'endHour': 13, 'endMin': 50, 'endTime': '13:50', 'inputNo': 19, 'regDate': 1572910025000,
         'offLight': 'N'}
    ]

    chk = TimeIncludeChk()

    chk.set_time_list(datas)

    scheduler = Scheduler(chk)
    scheduler.scheduler()

    count = 0
    while True:
        time.sleep(60)
        count += 1
        if count == 150:
            # After 150 minutes, remove the job and leave the loop.
            scheduler.kill_scheduler()
            print("Kill cron Scheduler")
            break
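
Instead of editing execute_work every time, the "run only inside the time window" part could be separated from the actual work. The sketch below is one possible shape, not what the class above does: make_gated_job and its parameter names are made up for illustration, and it uses no scheduler at all so it can be tried on its own.

```python
from typing import Callable


def make_gated_job(is_in_window: Callable[[], bool], work: Callable[[], None]) -> Callable[[], bool]:
    """Return a zero-argument job that runs `work` only while `is_in_window()` is true."""

    def job() -> bool:
        # Report whether the work actually ran, which is handy for logging.
        if is_in_window():
            work()
            return True
        return False

    return job
```

The returned job is a plain callable, so it could be passed to add_job in place of self.execute_work, with chk.is_include_time_now as the first argument.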

While working on a project, I needed to set up schedules that run every week on a given day, from one hour and minute to another.

 

It should work well for schedules that repeat weekly.

 

Pass a list of time entries to set_time_list. Each entry has the following shape.

 

{
  "dayWeek": "MON",
  "startHour": 9,
  "startMin": 30,
  "endHour": 10,
  "endMin": 20
}
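
The check class does not validate its input, so a typo in dayWeek or an out-of-range hour only shows up later. A small validator along these lines could catch that early. It is a sketch under my own assumptions: the name validate_time_entry and the error messages are made up, and it rejects entries whose start is after their end because the class compares start <= now <= end within a single day.

```python
VALID_DAYS = ("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")


def validate_time_entry(entry: dict) -> list:
    """Return a list of problems found in one time entry (empty if it looks fine)."""
    problems = []
    if entry.get("dayWeek") not in VALID_DAYS:
        problems.append("dayWeek must be one of " + ", ".join(VALID_DAYS))
    for key, upper in (("startHour", 23), ("startMin", 59), ("endHour", 23), ("endMin", 59)):
        value = entry.get(key)
        if not isinstance(value, int) or not 0 <= value <= upper:
            problems.append("%s must be an integer between 0 and %d" % (key, upper))
    if not problems:
        # Only compare start and end once every field is known to be valid.
        start = (entry["startHour"], entry["startMin"])
        end = (entry["endHour"], entry["endMin"])
        if start > end:
            problems.append("start time must not be after end time")
    return problems
```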

 

Check class

import datetime


class TimeIncludeChk:
    def __init__(self):
        self.week_dict = {}

    def set_time_list(self, time_list):
        for week in range(0, 6 + 1):
            self.week_dict[self.week_idx_to_str(week)] = []

        for t in time_list:
            self.week_dict[t.get("dayWeek")].append(
                {'startHour': t.get("startHour"), 'startMin': t.get("startMin"), 'endHour': t.get("endHour"),
                 'endMin': t.get("endMin")})

    def is_include_time_now(self):
        now = datetime.datetime.now()

        week_idx = self.week_idx_to_str(now.weekday())

        hour = now.hour
        minute = now.minute

        nv = self.eval_hour_min_val(hour, minute)

        # Fall back to an empty list so this also works before set_time_list is called.
        for w in self.week_dict.get(week_idx, []):
            st = self.eval_hour_min_val(w['startHour'], w['startMin'])
            ed = self.eval_hour_min_val(w['endHour'], w['endMin'])

            if st <= nv <= ed:
                return True

        return False

    @staticmethod
    def eval_hour_min_val(hour, minute):
        # Encode HH:MM as the integer HHMM so times can be compared directly.
        return (hour * 100) + minute

    @staticmethod
    def week_idx_to_str(idx: int):
        if idx == 0:
            return "MON"
        elif idx == 1:
            return "TUE"
        elif idx == 2:
            return "WED"
        elif idx == 3:
            return "THU"
        elif idx == 4:
            return "FRI"
        elif idx == 5:
            return "SAT"
        else:
            return "SUN"

 

Usage sample

from TimeIncludeChk import TimeIncludeChk

datas = [
{'no': 276, 'fountainNo': 'DE0000000007', 'dayWeek': 'MON', 'dayWeekIdx': 0, 'startHour': 9, 'startMin': 0, 'startTime': '09:00', 'endHour': 9, 'endMin': 50, 'endTime': '09:50', 'inputNo': 18, 'regDate': 1572910002000, 'offLight': 'Y'},
{'no': 283, 'fountainNo': 'DE0000000007', 'dayWeek': 'MON', 'dayWeekIdx': 0, 'startHour': 13, 'startMin': 0, 'startTime': '13:00', 'endHour': 13, 'endMin': 50, 'endTime': '13:50', 'inputNo': 19, 'regDate': 1572910025000, 'offLight': 'N'},
{'no': 277, 'fountainNo': 'DE0000000007', 'dayWeek': 'TUE', 'dayWeekIdx': 1, 'startHour': 9, 'startMin': 0, 'startTime': '09:00', 'endHour': 9, 'endMin': 50, 'endTime': '09:50', 'inputNo': 18, 'regDate': 1572910002000, 'offLight': 'Y'},
{'no': 284, 'fountainNo': 'DE0000000007', 'dayWeek': 'TUE', 'dayWeekIdx': 1, 'startHour': 13, 'startMin': 0, 'startTime': '13:00', 'endHour': 13, 'endMin': 50, 'endTime': '13:50', 'inputNo': 19, 'regDate': 1572910025000, 'offLight': 'N'},
{'no': 278, 'fountainNo': 'DE0000000007', 'dayWeek': 'WED', 'dayWeekIdx': 2, 'startHour': 9, 'startMin': 0, 'startTime': '09:00', 'endHour': 9, 'endMin': 50, 'endTime': '09:50', 'inputNo': 18, 'regDate': 1572910002000, 'offLight': 'Y'},
{'no': 285, 'fountainNo': 'DE0000000007', 'dayWeek': 'WED', 'dayWeekIdx': 2, 'startHour': 13, 'startMin': 0, 'startTime': '13:00', 'endHour': 13, 'endMin': 50, 'endTime': '13:50', 'inputNo': 19, 'regDate': 1572910025000, 'offLight': 'N'},
{'no': 286, 'fountainNo': 'DE0000000007', 'dayWeek': 'THU', 'dayWeekIdx': 3, 'startHour': 13, 'startMin': 0, 'startTime': '13:00', 'endHour': 13, 'endMin': 50, 'endTime': '13:50', 'inputNo': 19, 'regDate': 1572910025000, 'offLight': 'N'},
{'no': 279, 'fountainNo': 'DE0000000007', 'dayWeek': 'THU', 'dayWeekIdx': 3, 'startHour': 9, 'startMin': 0, 'startTime': '09:00', 'endHour': 9, 'endMin': 50, 'endTime': '09:50', 'inputNo': 18, 'regDate': 1572910002000, 'offLight': 'Y'},
{'no': 280, 'fountainNo': 'DE0000000007', 'dayWeek': 'FRI', 'dayWeekIdx': 4, 'startHour': 9, 'startMin': 0, 'startTime': '09:00', 'endHour': 9, 'endMin': 50, 'endTime': '09:50', 'inputNo': 18, 'regDate': 1572910002000, 'offLight': 'Y'},
{'no': 287, 'fountainNo': 'DE0000000007', 'dayWeek': 'FRI', 'dayWeekIdx': 4, 'startHour': 13, 'startMin': 0, 'startTime': '13:00', 'endHour': 13, 'endMin': 50, 'endTime': '13:50', 'inputNo': 19, 'regDate': 1572910025000, 'offLight': 'N'},
{'no': 281, 'fountainNo': 'DE0000000007', 'dayWeek': 'SAT', 'dayWeekIdx': 5, 'startHour': 9, 'startMin': 0, 'startTime': '09:00', 'endHour': 9, 'endMin': 50, 'endTime': '09:50', 'inputNo': 18, 'regDate': 1572910002000, 'offLight': 'Y'},
{'no': 288, 'fountainNo': 'DE0000000007', 'dayWeek': 'SAT', 'dayWeekIdx': 5, 'startHour': 13, 'startMin': 0, 'startTime': '13:00', 'endHour': 13, 'endMin': 50, 'endTime': '13:50', 'inputNo': 19, 'regDate': 1572910025000, 'offLight': 'N'},
{'no': 282, 'fountainNo': 'DE0000000007', 'dayWeek': 'SUN', 'dayWeekIdx': 6, 'startHour': 9, 'startMin': 0, 'startTime': '09:00', 'endHour': 9, 'endMin': 50, 'endTime': '09:50', 'inputNo': 18, 'regDate': 1572910002000, 'offLight': 'Y'},
{'no': 289, 'fountainNo': 'DE0000000007', 'dayWeek': 'SUN', 'dayWeekIdx': 6, 'startHour': 13, 'startMin': 0, 'startTime': '13:00', 'endHour': 13, 'endMin': 50, 'endTime': '13:50', 'inputNo': 19, 'regDate': 1572910025000, 'offLight': 'N'}
]

chk = TimeIncludeChk()

chk.set_time_list(datas)

print(chk.is_include_time_now())
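
The sample above prints a different result depending on when you run it, because the class reads the current time itself. To try the same comparison at a fixed moment, a standalone function that takes the datetime as an argument can help. This is a sketch that mirrors the class logic, not a replacement for it, and the name is_included_at is made up for illustration.

```python
import datetime

DAY_NAMES = ("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")


def is_included_at(moment: datetime.datetime, time_list: list) -> bool:
    """Return True if `moment` falls inside any window listed for its weekday."""
    day = DAY_NAMES[moment.weekday()]  # weekday() is 0 for Monday
    now_val = moment.hour * 100 + moment.minute
    for t in time_list:
        if t["dayWeek"] != day:
            continue
        start = t["startHour"] * 100 + t["startMin"]
        end = t["endHour"] * 100 + t["endMin"]
        # Both ends are inclusive, as in the class.
        if start <= now_val <= end:
            return True
    return False
```

For example, 2019-11-04 was a Monday, so is_included_at(datetime.datetime(2019, 11, 4, 9, 30), datas) checks the Monday entries of the sample data at 09:30.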