[Contents]
1. Use cases
2. requests
3. BeautifulSoup
[Use cases]
Basic Python syntax ---> Python modules -----> Developing Python attack programs (on DVWA)
- requests
- urllib/urllib2
- BeautifulSoup
[requests]
■ Importing the module
import requests
■ HTTP request methods
url = 'https://api.github.com/events'
data = {'key':'values'}
r = requests.get('https://api.github.com/events')
r = requests.post(url, data)
r = requests.put(url, data)
r = requests.delete(url)
r = requests.head(url)
r = requests.options(url)
[EX] Trying out the methods
■ requests.get
>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r
<Response [200]>
>>> r.headers
{'Date': 'Mon, 15 Jul 2019 06:43:56 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Server': 'GitHub.com',
..... (omitted) .....
>>> r.text
'[{"id":"10010021980","type":"PushEvent","actor":{"id":8013154,"login":"KexyBiscuit","display_login":"KexyBiscuit","gravatar_id":"","url":"https://api.github.com/users/KexyBiscuit","avatar_url":"https://avatars.githubusercont
..... (omitted) .....
■ requests.post
>>> data = {'key': 'value'}
>>> r = requests.post('https://httpbin.org/post', data=data)
>>> r.text
'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<title>405 Method
..... (omitted) .....
■ requests.put
>>> r = requests.put('https://httpbin.org/put', data=data)
>>> r.text
'{\n "args": {}, \n "data": "", \n "files": {}, \n "form": {\n "key":
..... (omitted) .....
■ requests.delete
>>> r = requests.delete('https://httpbin.org/delete')
>>> r.text
'{\n "args": {}, \n "data": "", \n "files": {}, \n "form": {\n
..... (omitted) .....
■ requests.head
>>> r = requests.head('https://httpbin.org/get')
>>> r.headers
{'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Origin':
..... (omitted) .....
■ requests.options
>>> r = requests.options('https://httpbin.org/get')
>>> r.headers
{'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Methods':
..... (omitted) .....
[1] HTTP Request
* GET Method
* POST Method
1. GET Method
url = 'https://httpbin.org/get'
params = {'key1': 'value1'}
params = {'key1': 'value1', 'key2': 'value2', 'key3': ['value1', 'value2']}
headers = {'user-agent': 'my-app/0.0.1'}
proxies = {'http': 'http://localhost:8080', 'https': 'https://localhost:8080'}
r = requests.get(url)
r = requests.get(url, params=params)
r = requests.get(url, headers=headers)
r = requests.get(url, proxies=proxies)
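How params and headers end up in the outgoing GET request can be checked without sending anything. The sketch below is a minimal illustration, assuming it is acceptable to build a requests.Request and call prepare() on it instead of calling requests.get(); the shortened params dict is chosen for illustration only.

```python
import requests

url = "https://httpbin.org/get"
params = {"key1": "value1", "key3": ["value1", "value2"]}
headers = {"user-agent": "my-app/0.0.1"}

# Build and prepare the request; nothing is sent over the network.
prepared = requests.Request("GET", url, params=params, headers=headers).prepare()

print(prepared.method)                 # HTTP method of the prepared request
print(prepared.url)                    # params are encoded into the query string
print(prepared.headers["User-Agent"])  # header lookup is case-insensitive
```

A list value in params is encoded as a repeated key (key3=value1&key3=value2), which matches the r.url output shown in the practice section.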
2. POST Method
url = 'https://api.github.com/events'
data = {'some': 'data'}
data = {'key1': 'value1', 'key2': 'value2', 'key3': ['value1', 'value2']}
headers = {'user-agent': 'my-app/0.0.1'}
proxies = {'http': 'http://localhost:8080', 'https': 'https://localhost:8080'}
files = {'file': 'file contents'}
files = {'file': open('report.xls', 'rb')}
files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
r = requests.post(url, data=data)
r = requests.post(url, json=data)
r = requests.post(url, files=files)
[Note]
r = requests.get(url)
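The data, json, and files keyword arguments listed above differ mainly in how the request body and the Content-Type header are built. The following is a rough offline sketch of that difference, assuming prepared requests are used so that no server is contacted; the CSV content is made up for illustration.

```python
import json
import requests

url = "https://httpbin.org/post"
payload = {"key1": "value1", "key2": "value2"}

# Each request is only prepared, never sent.
form_req = requests.Request("POST", url, data=payload).prepare()
json_req = requests.Request("POST", url, json=payload).prepare()
raw_req = requests.Request("POST", url, data=json.dumps(payload)).prepare()
file_req = requests.Request(
    "POST", url, files={"file": ("report.csv", "some,data\n")}
).prepare()

for label, req in [("data=dict", form_req), ("json=dict", json_req),
                   ("data=str", raw_req), ("files=", file_req)]:
    # Show which Content-Type requests chose for each style of body.
    print(label, "->", req.headers.get("Content-Type"))
```

Passing data=json.dumps(payload) sends the same JSON text as json=payload, but requests does not set a Content-Type header for a plain string body, so a server that checks the header may treat the two differently.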
[EX] HTTP Request - GET/POST method practice
■ Passing URL parameters
>>> import requests
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.get('https://httpbin.org/get', params=payload)
>>> r.url
'https://httpbin.org/get?key1=value1&key2=value2'
>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
>>> r = requests.get('https://httpbin.org/get', params=payload)
>>> r.url
'https://httpbin.org/get?key1=value1&key2=value2&key2=value3'
■ Custom headers
>>> url = 'https://api.github.com/some/endpoint'
>>> headers = {'user-agent': 'my-app/0.0.1'}
>>> r = requests.get(url, headers=headers)
>>>
-> This only tests that the request is sent without errors.
■ More complicated POST requests
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post("https://httpbin.org/post", data=payload)
>>> r.text
'{\n "args": {}, \n "data": "", \n "files": {}, \n "form": {\n "key1": "value1", \n "key2": "value2"\n },
..... (omitted) .....
>>> payload_tuples = [('key1', 'value1'), ('key1', 'value2')]
>>> r1 = requests.post('https://httpbin.org/post', data=payload_tuples)
>>> payload_dict = {'key1': ['value1', 'value2']}
>>> r2 = requests.post('https://httpbin.org/post', data=payload_dict)
>>> r1.text
'{\n "args": {}, \n "data": "", \n "files": {}, \n "form": {\n "key1": [\n "value1", \n "value2"\n ]\n }, \n
>>> r1.text == r2.text
True
>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r1 = requests.post(url, data=json.dumps(payload))
>>> r2 = requests.post(url, json=payload)
>>> r1.text == r2.text
True
■ POSTing a multipart-encoded file
>>> import os
>>> os.system('echo 1111 > report.xls')
0
>>> os.system('cat report.xls')
1111
0
>>> url = 'https://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}
>>> r = requests.post(url, files=files)
>>> r.text
'{\n "args": {}, \n "data": "", \n "files": {\n "file": "1111\\n"\n },
..... (omitted) .....
-> The '0' is the return value of the OS shell, not the contents of the file.
>>> url = 'https://httpbin.org/post'
>>> files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
>>> r = requests.post(url, files=files)
>>> r.text
'{\n "args": {}, \n "data": "", \n "files": {\n "file": "some,data,to,send\\nanother,row,to,send\\n"\n },
..... (omitted) .....
[2] HTTP Response
r.headers
r.headers['content-type'] == r.headers.get('content-type')
r.text
r.content
r.json()
r.raw
r.url
r.encoding
r.status_code
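The response attributes above can be tried without an external site. The sketch below is only an illustration: it assumes a throwaway local server (the hypothetical DemoHandler class, built on the standard library) standing in for a real API, and it disables environment proxy settings so the request goes straight to 127.0.0.1.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server that always answers with a small JSON body."""

    def do_GET(self):
        body = json.dumps({"type": "PushEvent"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

s = requests.Session()
s.trust_env = False  # ignore proxy settings from the environment
r = s.get("http://127.0.0.1:%d/events" % server.server_address[1])

status = r.status_code                        # numeric status code
content_type = r.headers.get("content-type")  # case-insensitive lookup
encoding = r.encoding                         # taken from the charset in the header
as_text = r.text                              # body decoded to str
as_bytes = r.content                          # body as raw bytes
as_json = r.json()                            # body parsed as JSON

s.close()
server.shutdown()
server.server_close()
print(status, content_type, encoding, as_json)
```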
[EX] HTTP Response practice
■ Response content
>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r.text
'[{"id":"10010171733","type":"DeleteEvent","actor":{"id":27856297,"login":"
..... (omitted) .....
>>> r.encoding
'utf-8'
■ JSON response content
>>> r = requests.get('https://api.github.com/events')
>>> r.json()
[{'id': '10010192817', 'type': 'PushEvent', 'actor': {'id': 158862, 'login':
..... (omitted) .....
■ Response status codes
>>> r = requests.get('https://httpbin.org/get')
>>> r.status_code
200
>>> r.status_code == requests.codes.ok
True
■ Response headers
>>> r.headers
{'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Content-Type': 'application/json',
..... (omitted) .....
>>> r.headers['Content-Type']
'application/json'
>>> r.headers.get('content-type')
'application/json'
■ Cookies
(Note) Restart the Python console first.
>>> import requests
>>> url = 'http://192.168.10.134/dvwa/login.php'
>>> proxies = {'http': 'http://localhost:9000', 'https': 'https://localhost:9000'}
>>> s = requests.Session()
>>> r = s.get(url, proxies=proxies)
>>> r.cookies
<RequestsCookieJar[Cookie(version=0, name='PHPSESSID', value='1444a34491d1e683cca6caedba13f5ea', port=None, port_specified=False, domain='192.168.10.134', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False), Cookie(version=0, name='security', value='high', port=None, port_specified=False, domain='192.168.10.134', domain_specified=False, domain_initial_dot=False, path='/dvwa', path_specified=False, secure=False, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False)]>
>>> r.cookies['PHPSESSID']
'1444a34491d1e683cca6caedba13f5ea'
■ Redirection and history
>>> r = requests.get('http://github.com/')
>>> r.url
>>> r.status_code
200
>>> r.history
[<Response [301]>]
>>> r = requests.get('http://github.com/', allow_redirects=False)
>>> r.status_code
301
>>> r.history
[]
>>> r = requests.head('http://github.com/', allow_redirects=True)
>>> r.url
>>> r.history
[<Response [301]>]
■ Format used in the practice - GET/POST method & Response
url = 'https://api.github.com/events'
data = {'username':'admin', 'password':'password', 'Login':'Login'}
proxies = {'http':'http://localhost:8080', 'https':'http://localhost:8080'}
s = requests.Session()
req = requests.Request('POST', url, data=data)
prepared = s.prepare_request(req)
resp = s.send(prepared, proxies=proxies)
or
url = 'https://api.github.com/events'
data = {'username':'admin', 'password':'password', 'Login':'Login'}
proxies = {'http':'http://localhost:8080', 'https':'http://localhost:8080'}
s = requests.Session()
resp = s.post(url, data=data, proxies=proxies)
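One reason to go through Session.prepare_request() rather than preparing a bare Request is that the session's headers and cookies are merged into the prepared request. The sketch below tries to show this offline; the User-Agent value and the security=low cookie are assumptions made for illustration, and the final send() call is left commented out because it needs a reachable target.

```python
import requests

url = "http://192.168.10.134/dvwa/login.php"
data = {"username": "admin", "password": "password", "Login": "Login"}

s = requests.Session()
s.headers.update({"User-Agent": "my-app/0.0.1"})  # illustrative header value
s.cookies.set("security", "low")                  # hypothetical cookie for illustration

req = requests.Request("POST", url, data=data)
prepared = s.prepare_request(req)  # merges session headers and cookies

print(prepared.headers["User-Agent"])
print(prepared.headers.get("Cookie"))
print(prepared.body)

# To actually send it through a proxy:
# resp = s.send(prepared, proxies={"http": "http://localhost:8080"})
```

Inspecting prepared.body and prepared.headers before sending is a convenient way to confirm what the proxy should see.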
[Practice] Attacker(192.168.10.60) -- burpsuite --> DVWA
• Setup: burpsuite(proxy:9000), firefox(proxy:9000)
• Target: http://192.168.10.134/dvwa/login.php
import requests
login_url = 'http://192.168.10.134/dvwa/login.php'
login_data = {'username':'admin', 'password':'password', 'Login':'Login'}
proxies = {'http':'http://localhost:9000', 'https':'http://localhost:9000'}
s = requests.Session()
resp = s.post(login_url, data=login_data, proxies=proxies)
print(resp.text)
[BeautifulSoup]
1. Documentation
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
2. Adding packages:
PyCharm > File > Settings > Project: <project name>
> Project Interpreter > +
Package names: beautifulsoup4, bs4, lxml, html5lib
3. Importing the module
from bs4 import BeautifulSoup
4. Parser selection
soup = BeautifulSoup(resp, 'html.parser')  # Python's HTML parser
soup = BeautifulSoup(resp, 'lxml')  # lxml's HTML parser
soup = BeautifulSoup(resp, 'lxml-xml')  # lxml's XML parser
soup = BeautifulSoup(resp, 'html5lib')  # html5lib
5. <Tag>: returns only the first matching tag
soup.title
soup.a
soup.p
soup.body.b
6. <Name, String>
soup.title.name
soup.title.string
7. <Attribute>
soup.a.attrs
soup.a['class']
soup.a['id']
soup.a.get('class')
Deleting an attribute
del soup.a['id']
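The tag, name/string, and attribute accessors in items 5 to 7 can be tried on a small document. This is a minimal sketch, assuming a hypothetical sample page modeled on the "three sisters" example from the Beautiful Soup documentation and the built-in html.parser so no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page used only for illustration.
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a></p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

first_link = soup.a                 # only the first <a> in the document
tag_name = soup.title.name          # name of the tag
title_text = soup.title.string      # text inside <title>
bold_text = soup.body.b.string      # first <b> under <body>
link_classes = first_link["class"]  # class is multi-valued, so this is a list
link_id = first_link.get("id")      # .get() returns None if the attribute is missing

del first_link["id"]                # remove the attribute from the parsed tree
id_after_delete = first_link.get("id")

print(tag_name, title_text, link_classes, link_id, id_after_delete)
```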
8. .find() / .find_all(): .find_all() returns its matches in a list; .find() returns only the first match
(a) find() / find_all() signatures
find_all()
• soup.find_all(name, attrs, recursive, string, limit, **kwargs)
find()
• soup.find(name, attrs, recursive, string, **kwargs)
(b) tag/attributes/text/keyword parameters
<Tag> parameter
soup.find_all('title')
soup.find_all({'h1', 'h2', 'h3', 'h4', 'h5', 'h6'})
soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
<attributes> parameter
soup.find_all("a", {"class": {"red", "green"}})
<text> parameter
soup.find_all(id="the prince")
== soup.find_all("", {"id": "the prince"})
(Caution)
(X) soup.find_all(class="green")  # class is a reserved word in Python (used to define classes)
(O) soup.find_all(class_="green")
(O) soup.find_all("", {"class": "green"})
<keyword> parameter
soup.find_all(id="text")[0].get_text()
soup.find_all(id="text")[0].string
(c) Various .find_all() examples
soup.find_all("p", "title")
soup.find_all(id="link3")
soup.find_all(id=True)
soup.find_all(["a", "b"])
soup.find_all("a", class_="message") # class_ keyword argument
soup.find_all("a", {"class":"message"}) # attrs dictionary
soup.find_all("a", text="Elsie") # older versions
soup.find_all("a", string="Elsie") # newer versions
soup.find_all(string="Elsie")
soup.find_all(string=["Title", "Elsie", "Lacie"])
soup.find_all(attrs={"id": "link1"})
soup.find_all("a", {"class":re.compile("sister")})
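Several of the find_all() forms listed above can be compared side by side on one document. The sketch below assumes a hypothetical sample page (again modeled on the "three sisters" example) and the built-in html.parser; the variable names are chosen for illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample page used only for illustration.
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

all_links = soup.find_all("a")                      # list of every <a>
first_link = soup.find("a")                         # first match only
by_class = soup.find_all("a", class_="sister")      # keyword form
by_attrs = soup.find_all("a", {"class": "sister"})  # attrs-dictionary form
by_id = soup.find_all(id="link3")                   # any tag with this id
by_string = soup.find_all("a", string="Elsie")      # match on the tag's text
by_regex = soup.find_all("a", {"href": re.compile("lacie")})
limited = soup.find_all("a", limit=2)               # stop after two matches
title_p = soup.find_all("p", "title")               # second positional arg is a CSS class

print(len(all_links), first_link["id"], by_id[0].get_text(), len(limited))
```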
9. .prettify(): pretty-printed output
soup.prettify()
soup.b.prettify()
10. Working with loops
from bs4 import BeautifulSoup
....
soup = BeautifulSoup(resp, 'lxml')
for link in soup.find_all('a'):
    print(link.get('href'))
[EX1]
print(soup.html.head.title)
print(soup.title)
print(soup.find_all("title")[0])
[EX2]
print(soup.title.string)
print(soup.title.get_text())
print(soup.find_all("title")[0].get_text())
[EX3]
print(soup.find_all("a"))
print(soup.find_all("a", class_="sister"))
(to print each item instead of the list)
for i in soup.find_all("a", class_="sister"):
    print(i)
[EX4]
print(soup.find_all("a", class_="sister")[1])
(to print each item instead of the list)
for i in soup.find_all('a'):
    if re.search('Elsie2', str(i)):
        print(i)
[EX5]
print(soup.find_all("a", {'id': 'link1'})[1].get_text())
[EX6]
for ptag in soup.find_all("p", class_="story"):
    if re.search('ID', str(ptag)):
        print(ptag.string)
for ptag in soup.find_all("p", class_="story"):
    if re.search('ID', str(ptag)):
        print(ptag.get_text())
[EX7]
for ptag in soup.find_all("p", class_="story"):
    if re.search('ID', str(ptag)):
        print(ptag.string.split(',')[1])
[EX8]
print(soup.input['name'])
[EX9]
print(soup.get_text())
[EX10]
p=re.compile("Dormouse's")
print(p.findall(soup.get_text()))
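The loop pattern and the soup.input['name'] lookup from the examples above can be combined to collect the fields of a login form into a dictionary suitable for the data= argument of a POST. This is only a sketch: the form below is hypothetical, its field names mirror the login data used earlier in these notes, and the hidden user_token field is an assumption added for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical login form; the hidden "user_token" field is an assumption.
html = """
<form action="login.php" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="user_token" value="abc123">
  <input type="submit" name="Login" value="Login">
  <a href="/dvwa/instructions.php">Help</a>
  <a name="top">no href here</a>
</form>
"""
soup = BeautifulSoup(html, "html.parser")

# .get() returns None instead of raising KeyError when an attribute is absent.
hrefs = [link.get("href") for link in soup.find_all("a")]

# Collect every named <input> into a dict; missing values become empty strings.
form_data = {}
for field in soup.find_all("input"):
    name = field.get("name")
    if name:
        form_data[name] = field.get("value", "")

print(hrefs)
print(form_data)
```

The empty username and password entries would then be filled in before passing form_data to a session's post() call.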