[목차]

 

 

1. 사용처

2. requests

3. BeautifulSoup

 

 

 


 

 

[사용처]

 

 

파이썬의 기본 문법 ---> 파이썬 모듈 -----> 파이썬 공격 프로그램 개발(on DVWA)

- requests

- urllib/urllib2

- BeautifulSoup

 

 


 

 

[requests]

 

 

 

■ 모듈 임포트 하기

import requests

 

 

■ HTTP Request Method 종류

url = 'https://api.github.com/events'

data = {'key':'values'}

 

r = requests.get('https://api.github.com/events')

r = requests.post(url, data)

r = requests.put(url, data)

r = requests.delete(url)

r = requests.head(url)

r = requests.options(url)

 

 

 

[EX] Method 사용해 보기

■ requests.get

 

>>> import requests

>>> r = requests.get('https://api.github.com/events')

>>> r

<Response [200]>

>>> r.headers

{'Date': 'Mon, 15 Jul 2019 06:43:56 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Server': 'GitHub.com',

..... (중략) ....

>>> r.text

'[{"id":"10010021980","type":"PushEvent","actor":{"id":8013154,"login":"KexyBiscuit","display_login":"KexyBiscuit","gravatar_id":"","url":"https://api.github.com/users/KexyBiscuit","avatar_url":"https://avatars.githubusercont

..... (중략) .....

 

 

■ requests.post

 

>>> data = {'key': 'value'}

>>> r = requests.post('https://httpbin.org/post', data=data)

>>> r.text

'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<title>405 Method

..... (중략) .....

 

 

■ requests.put

 

>>> r = requests.put('https://httpbin.org/put', data=data)

>>> r.text

'{\n "args": {}, \n "data": "", \n "files": {}, \n "form": {\n "key":

..... (중략) .....

 

 

■ requests.delete

 

>>> r = requests.delete('https://httpbin.org/delete')

>>> r.text

'{\n "args": {}, \n "data": "", \n "files": {}, \n "form": {\n

..... (중략) .....

 

 

■ requests.head

 

>>> r = requests.head('https://httpbin.org/get')

>>> r.headers

{'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Origin':

..... (중략) .....

 

 

■ requests.options

 

>>> r = requests.options('https://httpbin.org/get')

>>> r.headers

{'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Methods':

..... (중략) .....

 

[1] HTTP Request

 

* GET Method

* POST Method

 

 

1. GET Method

 

url = 'https://httpbin.org/get'

params = {'key1': 'value1'}

params = {'key1': 'value1', 'key2': 'value2', 'key3': ['value1', 'value2']}

headers = {'user-agent': 'my-app/0.0.1'}

proxies = {'http': 'http://localhost:8080', 'https': 'https://localhost:8080'}

 

r = requests.get(url)

r = requests.get(url, params=params)

r = requests.get(url, headers=headers)

r = requests.get(url, proxies=proxies)

 

 

 

2. POST Method

 

url = 'https://api.github.com/events'

data = {'some': 'data'}

data = {'key1': 'value1', 'key2': 'value2', 'key3': ['value1', 'value2']}

headers = {'user-agent': 'my-app/0.0.1'}

proxies = {'http': 'http://localhost:8080', 'https': 'https://localhost:8080'}

files = {'file': 'file contents'}

files = {'file': open('report.xls', 'rb')}

files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

 

r = requests.post(url, data=data)

r = requests.post(url, json=data)

r = requests.post(url, files=files)

 

 

[참고]

r = requests.get(url)

 

 

[EX] HTTP Request - GET/POST Method 실습

■ URL 매개 변수 전달

 

>>> import requests

>>> payload = {'key1': 'value1', 'key2': 'value2'}

>>> r = requests.get('https://httpbin.org/get', params=payload)

>>> r.url

'https://httpbin.org/get?key1=value1&key2=value2'

 

>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}

>>> r = requests.get('https://httpbin.org/get', params=payload)

>>> r.url

'https://httpbin.org/get?key1=value1&key2=value2&key2=value3'

 

 

■ 사용자 정의 헤더

 

>>> url = 'https://api.github.com/some/endpoint'

>>> headers = {'user-agent': 'my-app/0.0.1'}

>>> r = requests.get(url, headers=headers)

>>>

 

-> 정상적으로 잘 보내지는지만 테스트 한다.

 

■ 복잡한 POST 요청

 

>>> payload = {'key1': 'value1', 'key2': 'value2'}

>>> r = requests.post("https://httpbin.org/post", data=payload)

>>> r.text

{\n "args": {}, \n "data": "", \n "files": {}, \n "form": {\n "key1": "value1", \n "key2": "value2"\n },

..... (중략) .....

 

 

 

>>> payload_tuples = [('key1', 'value1'), ('key1', 'value2')]

>>> r1 = requests.post('https://httpbin.org/post', data=payload_tuples)

>>> payload_dict = {'key1': ['value1', 'value2']}

>>> r2 = requests.post('https://httpbin.org/post', data=payload_dict)

>>> r1.text

'{\n "args": {}, \n "data": "", \n "files": {}, \n "form": {\n "key1": [\n "value1", \n "value2"\n ]\n }, \n

>>> r1.text == r2.text

True

 

 

 

>>> import json

>>> url = 'https://api.github.com/some/endpoint'

>>> payload = {'some': 'data'}

>>> r1 = requests.post(url, data=json.dumps(payload))

>>> r2 = requests.post(url, json=payload)

>>> r1.text == r2.text

True

 

 

■ POST 멀티 파트 인코딩 된 파일

 

>>> import os

>>> os.system('echo 1111 > report.xls')

0

>>> os.system('cat report.xls')

1111

0

>>> url = 'https://httpbin.org/post'

>>> files = {'file': open('report.xls', 'rb')}

>>> r = requests.post(url, files=files)

>>> r.text

'{\n "args": {}, \n "data": "", \n "files": {\n "file": "1111\\n"\n },

..... (중략) .....

 

-> '0'은 운영체제 쉘의 return value 이다. 파일의 내용이 아니다.

 

 

>>> url = 'https://httpbin.org/post'

>>> files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}

>>> r = requests.post(url, files=files)

>>> r.text

'{\n "args": {}, \n "data": "", \n "files": {\n "file": "some,data,to,send\\nanother,row,to,send\\n"\n },

..... (중략) .....

 

 

 

 

[2] HTTP Response

 

r.headers

r.headers['content-type'] == r.headers.get('content-type')

r.text

r.content

r.json()

r.raw

r.url

r.encoding

r.status_code

 

 

[EX] HTTP Response 실습

 

■ 일반적인 응답 내용

 

>>> import requests

>>> r = requests.get('https://api.github.com/events')

>>> r.text

[{"id":"10010171733","type":"DeleteEvent","actor":{"id":27856297,"login":"

..... (중략) .....

>>> r.encoding

'utf-8'

 

 

■ JSON 응답 내용

 

>>> r = requests.get('https://api.github.com/events')

>>> r.json()

[{'id': '10010192817', 'type': 'PushEvent', 'actor': {'id': 158862, 'login':

..... (중략) .....

 

 

■ 응답 상태 코드

 

>>> r = requests.get('https://httpbin.org/get')

>>> r.status_code

200

>>> r.status_code == requests.codes.ok

True

 

 

■ 응답헤더

 

>>> r.headers

{'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Content-Type': 'application/json',

..... (중략) .....

>>> r.headers['Content-Type']

'application/json'

>>> r.headers.get('content-type')

'application/json'

 

 

■ 쿠키

 

(주의) python console 재실행한다.

>>> import requests

>>> url = 'http://192.168.10.134/dvwa/login.php'

>>> proxies = {'http': 'http://localhost:9000', 'https': 'https://localhost:9000'}

>>> s = requests.Session()

>>> r = s.get(url, proxies=proxies)

>>> r.cookies

<RequestsCookieJar[Cookie(version=0, name='PHPSESSID', value='1444a34491d1e683cca6caedba13f5ea', port=None, port_specified=False, domain='192.168.10.134', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False), Cookie(version=0, name='security', value='high', port=None, port_specified=False, domain='192.168.10.134', domain_specified=False, domain_initial_dot=False, path='/dvwa', path_specified=False, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False)]>

>>> r.cookies['PHPSESSID']

'1444a34491d1e683cca6caedba13f5ea'

 

 

■ 리디렉션 및 기록

 

>>> r = requests.get('http://github.com/')

>>> r.url

'https://github.com/'

>>> r.status_code

200

>>> r.history

[<Response [301]>]

 

 

>>> r = requests.get('http://github.com/', allow_redirects=False)

>>> r.status_code

301

>>> r.history

[]

 

 

>>> r = requests.head('http://github.com/', allow_redirects=True)

>>> r.url

'https://github.com/'

>>> r.history

[<Response [301]>]

 

 

 

 

 

■ 실습에서 사용하려는 포맷 - GET/POST Method & Response

 

url = 'https://api.github.com/events'

data = {'username':'admin', 'password':'password', 'Login':'Login'}

proxies = {'http':'http://localhost:8080', 'https':'http://localhost:8080'}

 

s = requests.Session()

req = requests.Request('POST', url, data=data)

prepared = s.prepare_request(req)

resp = s.send(prepared, proxies=proxies)

 

or

 

url = 'https://api.github.com/events'

data = {'username':'admin', 'password':'password', 'Login':'Login'}

proxies = {'http':'http://localhost:8080', 'https':'http://localhost:8080'}

 

s = requests.Session()

resp = s.post(url, data=data, proxies=proxies)

 

 

 

[실습] Attacker(192.168.10.60) -- burpsuite --> DVWA

• 설정: burpsuite(proxy:9000), firefox(proxy:9000)

• 대상: http://192.168.10.134/dvwa/login.php

 

import requests

 

login_url = 'http://192.168.10.134/dvwa/login.php'

login_data = {'username':'admin', 'password':'password', 'Login':'Login'}

proxies = {'http':'http://localhost:9000', 'https':'http://localhost:9000'}

 

s = requests.Session()

resp = s.post(login_url, data=login_data, proxies=proxies)

 

print(resp.text)

 

 

 


 

 

 

[beautifulsoup]

 

 

 

1. 관련 문서

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

 

 

 

2. 패키지 추가 :

PyCharm > File > Settings > Project:프로젝트이름

> Project Interpreter > +

패키지 이름: beautifulsoup4, bs4, lxml, html5lib

 

 

3. 모듈 임포트 하기

from bs4 import BeautifulSoup

 

 

4. 파서 선택하기(Parser selection)

soup = BeautifulSoup(resp, 'html.parser') /* Python's HTML Parser */

soup = BeautifulSoup(resp, 'lxml') /* lxml's HTML Parser */

soup = BeautifulSoup(resp, 'lxml-xml') /* lxml's XML Parser */

soup = BeautifulSoup(resp, 'html5lib') /* html5lib */

 

 

5. <Tag> : 첫 번째 만나는 것만 찾아줌

soup.title

soup.a

soup.p

soup.body.b

 

 

6. <Name, String>

soup.title.name

soup.title.string

 

 

7. <Attribute>

soup.a.attrs

soup.a['class']

soup.a['id']

soup.a.get('class')

 

속성 지우기

del soup.a['id']

 

 

8. .find() / .find_all() : 리스트 안에 들어가게 된다.

(ㄱ) find() / find_all() 형식

find_all()

• soup.find_all(name, attr, recursive, string, limit, **kwargs)

find()

• soup.find(name, attrs, recursive, string, **kwargs)

 

(ㄴ) tag/attributes/text/keyword 매개변수

<Tag> 매개변수

soup.find_all('title')

soup.find_all({'h1', 'h2', 'h3', 'h4', 'h5', 'h6'})

soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])

 

<attributes> 매개변수

soup.find_all("a", {"class": {"red", "green"}})

 

<text> 매개변수

soup.find_all(id="the prince")

== soup.find_all("", {"id": "the prince"})

 

(주의)

(X) soup.find_all(class="green") /* class : 파이썬 예약어, class 지정할 때 사용 */

(0) soup.find_all(class_="green")

(0) soup.find_all("", {"class": "green"})

 

<keyword> 매개변수

soup.find_all(id="text")[0].get_text()

soup.find_all(id="text")[0].string

 

(ㄷ) .find_all() 여러가지 예제

soup.find_all("p", "title")

soup.find_all(id="link3")

soup.find_all(id=True)

soup.find_all(["a", "b"])

 

soup.find_all("a", class_="message") # 예전 방식

soup.find_all("a", {"class":"message"}) # 새로운 방식

 

soup.find_all("a", text="Elsie") # 예전 버전

soup.find_all("a", string="Elsie") # 새로운 버전

 

soup.find_all(string="Elsie")

soup.find_all(string=["Title", "Elsie", "Lacie"])

 

soup.find_all(attrs={"id": "link1"})

 

soup.find_all("a", {"class":re.compile("sister")})

 

 

9. .prettify() : 보기 좋게 출력

soup.prettify()

soup.b.prettify()

 

 

10. 반복구문을 이용한 다루기

from bs4 import BeautifulSoup

....

soup = BeautifulSoup(resp, 'lxml')

for link in soup.find_all('a'):

print(link.get('href'))

 

 

 

 

[EX1]

print(soup.html.head.title)

print(soup.title)

print(soup.find_all("title")[0])

 

[EX2]

print(soup.title.string)

print(soup.title.get_text())

print(soup.find_all("title")[0].get_text())

 

[EX3]

print(soup.find_all("a"))

print(soup.find_all("a", class_="sister"))

 

(list 아니라면)

for i in soup.find_all("a", class_="sister"):

print(i)

 

[EX4]

print(soup.find_all("a", class_="sister")[1])

 

(list 아니라면)

for i in soup.find_all('a'):

if re.search('Elsie2', str(i)):

print(i)

 

[EX5]

print(soup.find_all("a", {'id': 'link1'})[1].get_text())

 

[EX6]

for ptag in soup.find_all("p", class_="story"):

if re.search('ID', str(ptag)):

print(ptag.string)

 

for ptag in soup.find_all("p", class_="story"):

if re.search('ID', str(ptag)):

print(ptag.get_text())

 

[EX7]

for ptag in soup.find_all("p", class_="story"):

if re.search('ID', str(ptag)):

print(ptag.string.split(',')[1])

 

[EX8]

print(soup.input['name'])

 

[EX9]

print(soup.get_text())

 

[EX10]

p=re.compile("Dormouse's")

print(p.findall(soup.get_text()))

 

728x90
반응형

+ Recent posts