python

Finding the encoding of garbled Korean text

https://stackoverflow.com/questions/1728376/get-a-list-of-all-the-encodings-python-can-encode-to


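import requests
import lxml.html

# Garbled filename: Korean text whose bytes were decoded with the wrong codec.
text = "±¸·ç±¸·ç+01(v7).zip"

# Scrape the codec names from the "Standard Encodings" table in the Python docs.
url = 'https://docs.python.org/3.6/library/codecs.html#standard-encodings'
response = requests.get(url, timeout=10)
response.raise_for_status()
doc = lxml.html.fromstring(response.text)
standard_encodings_table = doc.xpath(
    '//table[preceding::h2[.//text()[contains(., "Standard Encodings")]]]'
    '[.//th[normalize-space()="Codec"]]'
)[0]
# The first cell of each row holds the codec name.
textencodings = [
    td.text_content().strip()
    for td in standard_encodings_table.xpath('.//td[1]')
]

# Try every (encode, decode) pair and print the combinations that do not fail.
for x in textencodings:
    for y in textencodings:
        try:
            print(text.encode(x).decode(y), x, y)
        except (UnicodeError, LookupError):
            # x cannot encode the text, y cannot decode the bytes,
            # or the codec is unavailable on this platform.
            pass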

Just brute-force it: run every encode/decode combination and pick out the one that reads correctly.
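The same brute-force idea can be sketched without the network request and with less output to read through. This is only a rough variant, not the method used above: it assumes the standard library's `encodings.aliases` table is a good enough codec list (it may miss codecs that have no alias), and it assumes a repair is only interesting if the result contains Hangul syllables. The names `stdlib_codec_names`, `has_hangul` and `find_repairs` are made up for this illustration.

```python
from encodings.aliases import aliases


def stdlib_codec_names():
    """Codec names from the stdlib alias table (assumption: a sufficient list)."""
    return sorted(set(aliases.values()))


def has_hangul(s):
    """True if s contains a precomposed Hangul syllable (U+AC00..U+D7A3)."""
    return any("\uac00" <= ch <= "\ud7a3" for ch in s)


def find_repairs(garbled, codec_names=None, accept=has_hangul):
    """Return (repaired, wrong_codec, right_codec) for each pair that works.

    wrong_codec is the codec assumed to have been used for the mistaken
    decode; right_codec is the one tried for decoding the recovered bytes.
    """
    names = list(codec_names) if codec_names is not None else stdlib_codec_names()
    repairs = []
    for wrong in names:
        try:
            raw = garbled.encode(wrong)  # undo the mistaken decode
        except (UnicodeError, LookupError):
            continue  # cannot encode the text, or not a usable text codec
        for right in names:
            try:
                repaired = raw.decode(right)
            except (UnicodeError, LookupError):
                continue  # bytes are invalid for this codec, or codec unusable
            # Hypothetical filter: keep only results that changed and pass accept().
            if repaired != garbled and accept(repaired):
                repairs.append((repaired, wrong, right))
    return repairs
```

Calling `find_repairs("±¸·ç±¸·ç+01(v7).zip")` returns a list of candidate tuples to inspect by eye; passing a short `codec_names` list such as `["latin_1", "cp1252", "cp949", "euc_kr", "utf_8"]` narrows the search when the likely codecs are already known. The Hangul filter is a heuristic, so a wrong pair can still slip through if it happens to produce Hangul.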


Success!
