kath
Posts: 7
Joined: Thu Sep 07, 2017 11:13 pm

Get a table from a website to a CSV file

Tue Jul 03, 2018 4:19 pm

Hello,

I was given the challenge of turning the tables on a website into a CSV file using Python. (website: http://www.tbs-sct.gc.ca/pses-saff/2017 ... ng.aspx#s1)

So far, I've only succeeded in extracting the text from that website. Any help would be much appreciated.

Thanks!

Heater
Posts: 9482
Joined: Tue Jul 17, 2012 3:02 pm

Re: Get a table from a website to a CSV file

Tue Jul 03, 2018 4:34 pm

Sounds like you need html.parser to pull the relevant data out of the table elements of the page you have downloaded.
https://docs.python.org/3/library/html.parser.html
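
A rough sketch of the idea (untested; the html_text string here is just a stand-in for the page you would download):

Code: Select all

from html.parser import HTMLParser

# Stand-in for the page source you downloaded - swap in the real HTML.
html_text = "<table><tr><th>Question</th><td>71%</td><td>12%</td></tr></table>"

class TableTextParser(HTMLParser):
    """Collects the text found inside <th> and <td> cells."""
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag in ('th', 'td'):
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ('th', 'td'):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.cells.append(data.strip())

parser = TableTextParser()
parser.feed(html_text)
print(parser.cells)   # -> ['Question', '71%', '12%']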


kath
Posts: 7
Joined: Thu Sep 07, 2017 11:13 pm

Re: Get a table from a website to a CSV file

Tue Jul 10, 2018 12:56 pm

Hello,

I tried using BeautifulSoup to parse tables from websites to a CSV file.
I am trying to run this code:

Code: Select all

import csv
from bs4 import BeautifulSoup
from urllib.request import *

soup = BeautifulSoup(urlopen('http://www.fsa.gov.uk/about/media/facts/fines/2002http://www.tbs-sct.gc.ca/pses-saff/2017-2/results-resultats/bq-pq/12/org-eng.aspx#s1'))
table = soup.find('table', attrs={ "class" : "table-horizontal-line"})
headers = [header.text for header in table.find_all('th')]
rows = []
for row in table.find_all('tr'):
    rows.append([val.text.encode('utf8') for val in row.find_all('td')])

with open('test.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(row for row in rows if row)
    
Every time I try to run the code above, I get this error:
Could anybody help me with this? Thanks!
Traceback (most recent call last):
File "/home/pi/Documents/pandas.py", line 35, in <module>
soup = BeautifulSoup(urlopen('http://www.fsa.gov.uk/about/media/facts ... ng.aspx#s1'))
File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.4/urllib/request.py", line 461, in open
response = meth(req, response)
File "/usr/lib/python3.4/urllib/request.py", line 571, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.4/urllib/request.py", line 499, in error
return self._call_chain(*args)
File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
result = func(*args)
File "/usr/lib/python3.4/urllib/request.py", line 579, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

scotty101
Posts: 2987
Joined: Fri Jun 08, 2012 6:03 pm

Re: Get a table from a website to a CSV file

Tue Jul 10, 2018 1:34 pm

You appear to have two URLs in the urlopen function argument. Which one did you want: fsa.gov.uk or tbs-sct.gc.ca?
Electronic and Computer Engineer
Pi Interests: Home Automation, IOT, Python and Tkinter

DirkS
Posts: 8495
Joined: Tue Jun 19, 2012 9:46 pm
Location: Essex, UK

Re: Get a table from a website to a CSV file

Tue Jul 10, 2018 1:36 pm

kath wrote:
Tue Jul 10, 2018 12:56 pm
Every time I try to run the code above, I get this error:
Could anybody help me with this? Thanks!
Traceback (most recent call last):
File "/home/pi/Documents/pandas.py", line 35, in <module>
soup = BeautifulSoup(urlopen('http://www.fsa.gov.uk/about/media/facts ... ng.aspx#s1'))
File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.4/urllib/request.py", line 461, in open
response = meth(req, response)
File "/usr/lib/python3.4/urllib/request.py", line 571, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.4/urllib/request.py", line 499, in error
return self._call_chain(*args)
File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
result = func(*args)
File "/usr/lib/python3.4/urllib/request.py", line 579, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
Well, you cannot get a clearer error message. The URL is wrong.
I get the same error if I paste that URL in my browser...
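
For what it's worth, wrapping the request in a try/except makes it easier to see which address is being rejected. A quick sketch, using the tbs-sct.gc.ca URL from the first post (adjust if the other one was intended):

Code: Select all

from urllib.error import HTTPError
from urllib.request import urlopen

# The single URL from the first post.
url = 'http://www.tbs-sct.gc.ca/pses-saff/2017-2/results-resultats/bq-pq/12/org-eng.aspx#s1'

try:
    html = urlopen(url).read()
except HTTPError as err:
    # A 404 here means the server does not recognise the address at all,
    # which is what happens when two URLs get pasted together.
    print('Request failed:', err.code, err.reason)
else:
    print('Fetched', len(html), 'bytes')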

kath
Posts: 7
Joined: Thu Sep 07, 2017 11:13 pm

Re: Get a table from a website to a CSV file

Wed Jul 11, 2018 1:27 pm

Hi,

Thanks a lot for the help.

Now that I fixed the URL (http://www.tbs-sct.gc.ca/pses-saff/2017 ... ng.aspx#s1), I get another error:
Traceback (most recent call last):
File "/home/pi/Documents/pandas.py", line 38, in <module>
headers = [header.text for header in table.fin_all('th')]
AttributeError: 'NoneType' object has no attribute 'fin_all'
I tried changing 'find_all' to 'findAll' but I get the same error. How could I fix this?

Thanks!
Oceanne

DirkS
Posts: 8495
Joined: Tue Jun 19, 2012 9:46 pm
Location: Essex, UK

Re: Get a table from a website to a CSV file

Wed Jul 11, 2018 1:35 pm

kath wrote:
Wed Jul 11, 2018 1:27 pm
Now that I fixed the URL (http://www.tbs-sct.gc.ca/pses-saff/2017 ... ng.aspx#s1), I get another error:
Traceback (most recent call last):
File "/home/pi/Documents/pandas.py", line 38, in <module>
headers = [header.text for header in table.fin_all('th')]
AttributeError: 'NoneType' object has no attribute 'fin_all'
I tried changing 'find_all' to 'findAll' but I get the same error. How could I fix this?
pandas is a Python module. A file named pandas.py in the ~/Documents directory could mean that Python cannot find the real pandas module.
Did you create it yourself? If so, you should rename (or move) it.
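
If you want to check for that kind of shadowing, you can ask Python which file an 'import pandas' would actually pick up (a small sketch; run it from the same directory as your script). If it prints a path under /home/pi/Documents, your own file is hiding the library:

Code: Select all

import importlib.util

# Shows which file "import pandas" would load. If this is your own pandas.py,
# it is shadowing the installed library (if pandas is installed at all).
spec = importlib.util.find_spec('pandas')
print(spec.origin if spec else 'no module named pandas found')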

kath
Posts: 7
Joined: Thu Sep 07, 2017 11:13 pm

Re: Get a table from a website to a CSV file

Wed Jul 11, 2018 3:12 pm

Hi,
DirkS wrote:
Wed Jul 11, 2018 1:35 pm
pandas is a Python module. A file named pandas.py in the ~/Documents directory could mean that Python cannot find the real pandas module.
Did you create it yourself? If so, you should rename (or move) it.
I renamed my python document but I still get the error:
Traceback (most recent call last):
File "/home/pi/Documents/webScraping.py", line 7, in <module>
headers = [header.text for header in table.find_all('th')]
AttributeError: 'NoneType' object has no attribute 'find_all'
Is there something else I could try?
Thanks,

DirkS
Posts: 8495
Joined: Tue Jun 19, 2012 9:46 pm
Location: Essex, UK

Re: Get a table from a website to a CSV file

Wed Jul 11, 2018 3:47 pm

We'll need to see your code...

kath
Posts: 7
Joined: Thu Sep 07, 2017 11:13 pm

Re: Get a table from a website to a CSV file

Wed Jul 11, 2018 8:45 pm

Hello,

I am trying to parse tables from websites to a CSV file with BeautifulSoup.

Here is my code:

Code: Select all

import csv
from bs4 import BeautifulSoup
from urllib.request import *

soup = BeautifulSoup(urlopen('http://www.tbs-sct.gc.ca/pses-saff/2017-2/results-resultats/bq-pq/12/org-eng.aspx#s1'))
table = soup.find('table', attrs={ "class" : "table-horizontal-line"})
headers = [header.text for header in table.find_all('th')]
rows = []
for row in table.find_all('tr'):
    rows.append([val.text.encode('utf8') for val in row.find_all('td')])

with open('test.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(row for row in rows if row)
Every time I try to run it, I get the following error:
Traceback (most recent call last):
File "/home/pi/Documents/webScraping.py", line 7, in <module>
headers = [header.text for header in table.find_all('th')]
AttributeError: 'NoneType' object has no attribute 'find_all'
I tried changing 'find_all' to 'findAll' but it didn't work.
How could I fix this error?

Thanks,
Oceanne

pfletch101
Posts: 107
Joined: Sat Feb 24, 2018 4:09 am

Re: Get a table from a website to a CSV file

Wed Jul 11, 2018 9:14 pm

The error message is telling you that the variable 'table' is None - probably meaning that the soup.find call failed to find anything matching the specified search criteria. A fairly quick glance at the source of the target page seems to confirm that there is no such table in the raw HTML - it is a dynamically generated page, and these can be insanely difficult to scrape.
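
One way to confirm that from the Pi - a debugging sketch using the same libraries as the code above - is to fetch the page and list the classes of every <table> BeautifulSoup can actually see in the raw HTML:

Code: Select all

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = 'http://www.tbs-sct.gc.ca/pses-saff/2017-2/results-resultats/bq-pq/12/org-eng.aspx#s1'
soup = BeautifulSoup(urlopen(url), 'html.parser')

# Print the class attribute of every table in the downloaded HTML.
for i, t in enumerate(soup.find_all('table')):
    print(i, t.get('class'))

# If "table-horizontal-line" never shows up, soup.find() returns None and the
# later table.find_all('th') fails exactly as in the traceback above.
table = soup.find('table', attrs={'class': 'table-horizontal-line'})
if table is None:
    print('No matching table in the fetched HTML - it is probably added by JavaScript.')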
