Learn to embrace Unicode...the world isn't ASCII anymore.
Assuming you are on Windows and viewing the .CSV with Excel or Notepad, use the following line on Python 3. With only this change (and fixing indentation of your post), You will even be able to view the non-ASCII characters correctly. Notepad and Excel like a UTF-8 BOM signature at the start of the file, which utf-8-sig
provides.
with open('usnwr_schools.csv', 'w', newline='', encoding='utf-8-sig') as f:
If reading the file in another Python script, make sure to read the file with the following. Your example of what you read b'University of Michigan\xe2\x80\x94\xe2\x80\x8bAnn Arbor'
was read in binary mode 'rb'
.
with open('usnwr_schools.csv', encoding='utf-8-sig') as f:
If on Linux, you can use utf8
instead of utf-8-sig
.
As an aside, you can replace your loops with:
with open('usnwr_schools.csv', 'w', newline='', encoding='utf-8-sig') as f:
writer = csv.writer(f)
for school in reqSoup:
x = reqSoup.find_all("a", {"class" : "school-name"})
for item in x:
y = item.get_text()
writer.writerow([y])
Reading it back:
with open('usnwr_schools.csv',encoding='utf-8-sig') as f:
print(f.read())
Output:
Massachusetts Institute of Technology
Stanford University
University of California—Berkeley
California Institute of Technology
Carnegie Mellon University
University of Michigan—Ann Arbor
Georgia Institute of Technology
University of Illinois—Urbana-Champaign
Purdue University—West Lafayette
University of Texas—Austin (Cockrell)
Texas A&M; University—College Station (Look)
Cornell University
University of Southern California (Viterbi)
Columbia University (Fu Foundation)
University of California—Los Angeles (Samueli)
University of California—San Diego (Jacobs)
Princeton University
Northwestern University (McCormick)
University of Pennsylvania
Johns Hopkins University (Whiting)
Virginia Tech
University of California—Santa Barbara
Harvard University
University of Maryland—College Park (Clark)
University of Washington
Massachusetts Institute of Technology
Stanford University
University of California—Berkeley
California Institute of Technology
Carnegie Mellon University
University of Michigan—Ann Arbor
Georgia Institute of Technology
University of Illinois—Urbana-Champaign
Purdue University—West Lafayette
University of Texas—Austin (Cockrell)
Texas A&M; University—College Station (Look)
Cornell University
University of Southern California (Viterbi)
Columbia University (Fu Foundation)
University of California—Los Angeles (Samueli)
University of California—San Diego (Jacobs)
Princeton University
Northwestern University (McCormick)
University of Pennsylvania
Johns Hopkins University (Whiting)
Virginia Tech
University of California—Santa Barbara
Harvard University
University of Maryland—College Park (Clark)
University of Washington
Massachusetts Institute of Technology
Stanford University
University of California—Berkeley
California Institute of Technology
Carnegie Mellon University
University of Michigan—Ann Arbor
Georgia Institute of Technology
University of Illinois—Urbana-Champaign
Purdue University—West Lafayette
University of Texas—Austin (Cockrell)
Texas A&M; University—College Station (Look)
Cornell University
University of Southern California (Viterbi)
Columbia University (Fu Foundation)
University of California—Los Angeles (Samueli)
University of California—San Diego (Jacobs)
Princeton University
Northwestern University (McCormick)
University of Pennsylvania
Johns Hopkins University (Whiting)
Virginia Tech
University of California—Santa Barbara
Harvard University
University of Maryland—College Park (Clark)
University of Washington
Massachusetts Institute of Technology
Stanford University
University of California—Berkeley
California Institute of Technology
Carnegie Mellon University
University of Michigan—Ann Arbor
Georgia Institute of Technology
University of Illinois—Urbana-Champaign
Purdue University—West Lafayette
University of Texas—Austin (Cockrell)
Texas A&M; University—College Station (Look)
Cornell University
University of Southern California (Viterbi)
Columbia University (Fu Foundation)
University of California—Los Angeles (Samueli)
University of California—San Diego (Jacobs)
Princeton University
Northwestern University (McCormick)
University of Pennsylvania
Johns Hopkins University (Whiting)
Virginia Tech
University of California—Santa Barbara
Harvard University
University of Maryland—College Park (Clark)
University of Washington
Massachusetts Institute of Technology
Stanford University
University of California—Berkeley
California Institute of Technology
Carnegie Mellon University
University of Michigan—Ann Arbor
Georgia Institute of Technology
University of Illinois—Urbana-Champaign
Purdue University—West Lafayette
University of Texas—Austin (Cockrell)
Texas A&M; University—College Station (Look)
Cornell University
University of Southern California (Viterbi)
Columbia University (Fu Foundation)
University of California—Los Angeles (Samueli)
University of California—San Diego (Jacobs)
Princeton University
Northwestern University (McCormick)
University of Pennsylvania
Johns Hopkins University (Whiting)
Virginia Tech
University of California—Santa Barbara
Harvard University
University of Maryland—College Park (Clark)
University of Washington
If you still want to be ASCII only, this will do it:
import requests
import bs4
import csv
results = requests.get('http://grad-schools.usnews.rankingsandreviews.com/best-graduate-schools/top-engineering-schools/eng-rankings?int=a74509')
replacements = {ord('\N{EN DASH}'):'-',
ord('\N{EM DASH}'):'-',
ord('\N{ZERO WIDTH SPACE}'):None}
reqSoup = bs4.BeautifulSoup(results.text, "html.parser")
with open('usnwr_schools.csv', 'w', newline='', encoding='ascii') as f:
writer = csv.writer(f)
for school in reqSoup:
x = reqSoup.find_all("a", {"class" : "school-name"})
for item in x:
y = item.get_text()
writer.writerow([y.translate(replacements)])
with open('usnwr_schools.csv',encoding='ascii') as f:
print(f.read())