webscraping_challenge.md: 67 lines changed (67 additions, 0 deletions)
@@ -11,3 +11,70 @@ Choose between one of the two websites below:
- [The Epic List of 250 Legendary Swords](https://hobbylark.com/fandoms/The-Epic-List-of-250-Legendary-Swords)

On both sites, you're presented with a long list of data (names of flowers or swords) as well as a description of each item. Using Beautiful Soup, return ALL the flowers/swords AND THEIR DESCRIPTIONS from the page and place them in a dictionary. The format of the dictionary is up to you: what do you think will be the best design?
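
To make the design question concrete, here is a minimal sketch of two possible dictionary layouts. It assumes the scraping step has already produced `(name, description)` pairs; the sample pairs, `to_lookup`, and `to_records` are hypothetical names used only for illustration, not data from either site or part of the challenge.

```python
import json

# Hypothetical (name, description) pairs standing in for what a
# Beautiful Soup pass over the page might yield; not real page data.
scraped_pairs = [
    ("Example Sword A", "Placeholder description A."),
    ("Example Sword B", "Placeholder description B."),
]


def to_lookup(pairs):
    """Design A: one dict keyed by item name; the value is the description.
    A repeated name overwrites the earlier entry."""
    return {name.strip(): description.strip() for name, description in pairs}


def to_records(pairs):
    """Design B: a dict wrapping a list of records. Repeated names are kept,
    and each record has room for extra fields later."""
    return {
        "items": [
            {"name": name.strip(), "description": description.strip()}
            for name, description in pairs
        ]
    }


lookup = to_lookup(scraped_pairs)
records = to_records(scraped_pairs)

# Design A gives direct lookup by name.
print(lookup["Example Sword A"])
# Design B serializes to a JSON structure that is easy to extend.
print(json.dumps(records, indent=2))
```

Design A is probably the simpler choice if every name on the page is unique and you mainly want lookups by name; Design B may suit you better if names can repeat or you expect to add more fields per item.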

<!--
#!/usr/bin/python3
"""
Learning to scrape web data with BeautifulSoup
"""

from requests import get
from requests.exceptions import RequestException
from contextlib import closing
from bs4 import BeautifulSoup
from bs4 import element
import json

def simple_get(url):
    """
    Attempts to get the content at `url` by making an HTTP GET request.
    If the content-type of the response is some kind of HTML/XML, return the
    raw content (bytes), otherwise return None.
    """
    try:
        with closing(get(url, stream=True)) as resp:
            # With stream=True, Requests cannot release the connection until
            # the response is closed or fully read.
            # closing() will close "resp" at the end of this block.
            if is_good_response(resp):
                # .content is a property (not a method) holding the response body
                return resp.content
            else:
                return None

    except RequestException as e:
        log_error('Error during requests to {0} : {1}'.format(url, str(e)))
        return None


def is_good_response(resp):
    """
    Returns True if the response seems to be HTML, False otherwise.