This repository was archived by the owner on May 25, 2022. It is now read-only.

Commit 91e4239

Changes to the Python script and README, and deletion of obsolete files
Named the text files after the article title. All articles are now saved to a folder named scraped_articles in the same directory. Updated readme.md accordingly. Deleted the scraped_article.txt file.
1 parent 7c23be8 commit 91e4239

5 files changed

+139
-10
lines changed

+3-2
@@ -1,13 +1,14 @@
11
# Scraping Medium Articles
22
[Medium](https://medium.com/) is a website that hosts great articles and is used by many programmers.
3-
<br>This script asks the user for the URL of a Medium article, scrapes its text and saves it to a text file in the same directory.
3+
<br>This script asks the user for the URL of a Medium article, scrapes its text and saves it to a text file in a folder named scraped_articles in the same directory.
4+
<br>The scraped_articles folder contains 3 text files as examples of how an article is scraped.
45

56
### Prerequisites
67
Use `pip` to install the modules listed in requirements.txt
78
<br>Have a working network connection on the device
89

910
### How to run the script
10-
Steps on how to run the script along with suitable examples.
11+
Run it like any other Python file
1112

1213
## *Author Name*
1314
[Naman Shah](https://github.com/namanshah01)
@@ -0,0 +1,58 @@
1+
url: https://medium.com/code-for-cause/one-month-into-the-mlh-fellowship-448249f61590
2+
3+
Title: ONE MONTH INTO THE MLH FELLOWSHIP
4+
by Kunal Kushwaha
5+
6+
INTRODUCTION
7+
8+
One month into the MLH FellowshipKunal KushwahaFollowJul 5 · 8 min read
9+
10+
“In real open source, you have the right to control your own destiny.”
11+
— Linus Torvalds
12+
What is the MLH Fellowship?
13+
The MLH Fellowship is an internship alternative for software engineers, with a focus on Open Source projects. Instead of working on a project for just one company, students contribute to Open Source projects that are used by companies around the world. At the beginning of the program, fellows are placed into small groups called “pods” that collectively contribute to the assigned projects as a team under the educational mentorship of a professional software engineer.
14+
Open source is a great way to get real-world software development experience from the comfort of your home. The open source community is very helpful and encourages new developers to take part in their organizations. One gains exposure, can test their skills, gain knowledge and bond with the community in order to produce quality code that helps people around the world.
15+
The Process
16+
I found out about the program via the MLH mailing list. Being an Open Source enthusiast, I was impressed by the structure of the program. Having attended past MLH events, I knew I had to sign up for this. The initial phase was the shortlisting of applications followed by a technical interview. Apart from work, the fellowship program also provides opportunities to build a network and have fun while doing so!
17+
Result of my application
18+
Students get to work on the latest Open Source technologies and are matched with projects according to their skills and interest, providing students with a learning opportunity while contributing to real-world projects. But, it’s not just about coding. Soft-skills and team-building exercises are conducted by MLH regularly, in addition to technical hands-on workshops! It’s a remote opprtunity but provides a global platform for students to showcase their skills.
19+
Students are also provided with a monthly stipend to help cover the basic living expenses during their participation in the program.
20+
Source: https://github.blog/2020-06-24-welcome-to-the-inaugural-class-of-mlh-fellows/
21+
22+
WEEK 1
23+
Alright, so the first week. This week was spent getting acquainted with the Fellowship system as well as getting to know the team members. I got introduced to some amazing community members during this time. Being an open-source enthusiast, I believe that diversity in the workplace and participation from people hailing from different cultures is necessary as well as instrumental for the growth of the IT sector. It exposes one to the multitude of values and principles that people from varying ethnicities hold. Meeting people from around the world teaches people to respect opposing perspectives and opinions, and ingrains in them respect for their peers.
24+
We followed an exercise in which each fellow had to have a 1-on-1 get to know meeting with each of their Pod members which I believe this was a great way to get to know each other. We also got introduced to our mentor Jani, who has been a great motivation throughout the program and is helping each and every one of us achieve more, both in terms of technical as well as soft skills. During one of our first stand-ups, we decided on the name of our Pod together as a Team. I remember Jessie (my Podmate) suggested Reactive Sharks and I suggested Hackathon Sea-Son (as the theme was marine), and that’s how we ended up with Reactive Sea-Son (the best pod).
25+
Reactive Sea-Son Logo
26+
The first week ended with an Orientation Hackathon where we were divided into groups of 3–4. I got to see so many amazing projects presented by my fellow fellows. Our team Quarantime (pun intended) built a social media platform using MERNG stack, for students to use during the quarantine.
27+
28+
These are the projects that I really thought went out of the box!
29+
MLH-Fellowship/0.4.2-cssifyTired of using Bootstrap/Bulma, but don't want to scaffold a whole bunch of CSS on your own? CSSify to the rescue …github.com
30+
MLH-Fellowship/0.4.1-Execute.ly-serverServer: Edit and execute handwritten or any code in an image right in your browser. …github.com
31+
32+
WEEK 2
33+
This week started with the announcement of hackathon results and team Execute.ly from our Pod bagged the first prize! We all were really proud of our team, also because everyone in the winning team’s Pod was going to get prizes xD. I also spent some time this week to design our Pod’s logo.
34+
We were excited for week 2 as this was the week during which our mentor was going to assign us projects. I found out that I will be contributing to Jest this summer. Jest is a JavaScript testing framework maintained by Facebook. It felt amazing that the code that I am going to write is going to be used by people around the world. Plus getting involved in the community of experienced developers is itself a huge learning opportunity.
35+
After having a much project kickoff call with the Jest maintainers, the rest of the week was spent into learning more about the projects. I believe that writing blogs is a great way to show what you have learned to the community and help other newcomers in the projects as well. Keeping that in mind I wrote a blog on the architecture of Jest, provided below.
36+
Jest ArchitectureWhy is Testing important?medium.com
37+
38+
WEEK 3
39+
Week 3 started on a Monday with our daily standup. This was the week of coding and exploring more about the projects assigned to us. We also got introduced to weekly retrospectives and show and tells. Weekly retrospectives are a way to communicate with your team and let them know about your progress, shoutouts, and any blockers they might be facing. It’s divided into sub-points like:
40+
Shoutouts (Optional Thank You’s / Recognition) — If anyone went above and beyond, let them know!Red (Stop / need help) — List out areas that have been challenging. This could include projects, tasks, workload, or challenges with Podmates. What didn’t work well this week? What can be done differently next week?Yellow (Use caution) — Provide context on areas of improvement. This could include projects, tasks, workload, or challenges with Podmates. What can be improved upon for next week? What resources and tools could you use to reach success?Green (All Good!) — Highlight What some of your successes were. What has gone well this week? Give examples of your weekly wins! This could include projects, tasks, or successes in teamwork.
41+
Pod Retrospective
42+
This is the week we started conducting show and tells. I had never been a part of such activity before where a person publicly presents what they have learned to a group of people and then they all have discussions over it. It seemed like a great learning opportunity for everyone and I highly recommend it. I volunteered for our first show and tell to give a demo on Docker, Kubernetes, and Red Hat’s Java K8s client. And I must say, it went amazing! Everyone, including me, learned a lot. I started with an introduction to the topics following a hands-on demo. Whatever discussion we have as a team, one of the best parts is the guidance and perspective we receive from our mentor, Jani, on the topics of discussion to relate it to the real-world. Shout to everyone on our team for being an amazing audience and for their active participation ☀️
43+
Kubernetes Made EasyWhat is Kubernetes?medium.com
44+
This was also the week when we got some PRs flowing to Jest. Shout out to Saurav, who is an amazing teammate and it has been an amazing experience contributing to Jest with him. I also got to attend various workshops this week conducted by MLH. My favourite one this week was an Introduction to Network Security by Kyle 👨‍💻
45+
46+
WEEK 4
47+
E-Liang’s show and tell
48+
Week 4, better known as the week of PRs. The highlight of this week was the show and tell by E-Liang and the launch of Foam by Jani. For our second show and tell this week, we had E-Liang as a volunteer. This was hands-down my favorite personal project by a person. We got to know about Gent, which is a lightweight, reusable business logic layer that makes it easy to build GraphQL servers in Node.js and TypeScript, which is heavily inspired by Ent, a Facebook Open Source project.
49+
taneliang/gentGent is a lightweight, reusable business logic layer that makes it easy to build GraphQL servers in Node.js and…github.com
50+
I also got to attend a lot of sessions this week such as the React-Native session by our mentor Jani, webinar on working remotely by Joe Nash, and a discussion about “Designing Your Life” by John Britton who shared his inspiring journey with the Fellows and other community members. I also had a one-on-one mentorship session with Jani which was really educational. I got several life lessons and pointers on how to be a better developer and get the most out of my learning experience.
51+
By far the most impressive thing this week was the release of Foam, a personal knowledge management and sharing system, by Jani. The project blew up in a matter of days and now has more than 4.4k stars on GitHub!! That’s a big number. Check it out:
52+
foambubble/foam👋 Hello friend! Looks like you're reading this page on GitHub. Please go to the 👉 rendered Foam Workspace for an…github.com
53+
So far the journey has been amazing, unlike any other program that I’ve been a part of. It’s a perfect balance between education and contributions + having fun while doing so.
54+
The end of the week was followed by a delightful session of pictionary with the MLH fellows. 🖼
55+
56+
CONTACT:
57+
Twitter: https://twitter.com/kush_kunal
58+
Thanks for reading!Gonna clap this one out like we do in standups 👏
@@ -0,0 +1,65 @@
1+
url: https://medium.com/coding-blocks/one-stop-guide-to-google-summer-of-code-a9e803beeda7
2+
3+
Title: ONE STOP GUIDE TO GOOGLE SUMMER OF CODE
4+
by Harshit Dwivedi
5+
6+
INTRODUCTION
7+
8+
One stop guide to Google Summer of CodeHarshit DwivediFollowJul 18, 2018 · 9 min read
9+
Getting bombarded with tons of messages and requests on the same topic over and over again, I was about to post a “If I had a penny for …” joke on my social media handles.
10+
But instead, why not write a blog instead containing all the ifs, buts, whys and hows on Google Summer of Code.
11+
So if you are a student who is wondering about getting into Google Summer of Code or someone who has been pestered with questions regarding GSoC, hang on tight, while this is going to be a long one, I can assure you that this is going to helpful.
12+
Let’s first get the basics out of the way :
13+
14+
WHAT IS GOOGLE SUMMER OF CODE?
15+
Simply put, it’s a 16 week long program by Google aimed at promoting Open Source Software development among college and university students.
16+
You work with one of the many Open Source Organizations on a language/framework of your own choice.
17+
In return you get :
18+
1. An excellent experience of working on a real world project
19+
2. A chance to get mentored by some of the best software developers from tech giants like Facebook and Google
20+
3. The Google Summer of Code Tag, that will benefit you immensely with all your job hunts and not to mention a Golden referral to apply for any role at Google!
21+
4. Of course the money and the bragging rights! 😎 💰
22+
23+
WHAT GOOGLE SUMMER OF CODE IS NOT?
24+
An Internship!
25+
Google Summer of Code isn’t an internship and it definitely isn’t you interning at Google.
26+
It’s merely Google providing you and the Open Source Organizations a platform to work together.A direct entry into Google
27+
While it’s true that you get a referral to apply at any opening in Google, GSoC does not give you a direct pass into Google.
28+
You still have to go through all the interview rounds, it just gives you an extra edge over the competition.A trend that you **have to** be a part of
29+
Please, don’t treat GSoC as an IIT JEE entrance exam that you have to crack in order to be successful.
30+
I’ve seen folks achieve wonderful things without even doing GSoC and vice versa, so preparing for and applying into GSoC just because every other Tom, Dick and Harry is doing so it ridiculous.
31+
Students, especially Indian students need to understand the essence and the deeper meaning behind the program and only go for it if it’s something you truly resonate with and are willing to continue long after the program ends and you are not getting paid for it.
32+
33+
WHEN DOES GOOGLE SUMMER OF CODE HAPPEN?
34+
The application process officially starts sometime around March, but the selected organizations are announced sometime in February first week.
35+
So students can start looking into the selected organizations and shortlist projects which interest them.
36+
37+
HOW DO I GET INTO GOOGLE SUMMER OF CODE?
38+
Getting into GSoC is not a one step process; rather it’s a multi-step process ranging from February - April and you need to perfect each and every step to maximize your chances.
39+
I’ll outline the major steps below :
40+
1. Start early!
41+
Since GSoC isn’t a one step process as mentioned beforehand, you need to get started as early as possible which means shortlist the organization(s) and project(s) which interest you and start contributing to them as soon as they are announced by google.
42+
However, some students also start way early on in November/December. Instead of waiting for new organizations to be announced, they shortlist few organizations which have been selected continuously for the past few years.
43+
While this is risky, if done properly and carefully, it does give you an edge over others, since the number of contributions and interactions you’ve had with the organization factors in a lot while applying for GSoC under that organization.
44+
P.S. While you have to work with one, to maximize your chances, you can apply for 3 organizations/projects, so select them carefully.
45+
2. Contribute
46+
This is probably the most important phase of GSoC.
47+
Once you’ve shortlisted, you have to focus on contributing as much as possible to the organization(s) you’ve selected.
48+
Pro Tip : Don’t select more than 3 organizations, it’ll only diminish your chances since you won’t be able to focus properly on any one of those.
49+
What does a contribution mean?
50+
Anything from fixing/reporting an issue in the project or implementing a new feature to writing documentation for setting up and using the project counts as a contribution.
51+
Granted each of them has a different weight attached with them, for example fixing an issue/adding a new feature is generally contains more weightage than reporting an issue or writing documentation.
52+
But as someone newly exploring to a project, starting off with filing issues and writing documentation is a good idea.
53+
3. A good proposal will help you hit it home
54+
A proposal is a document which you submit to the organization(s) you’ve selected in the above step which outlines a detailed breakdown on how you plan on enhancing/building the project in the 16 week coding period of Google Summer of Code.
55+
Your proposal is going to be the secret key towards ensuring your selection so ensure that you are putting in extra efforts towards making it as detailed and informative as proposal.
56+
P.S. Please do not float the same proposal across multiple projects/organizations, each project should have a separate proposal of its own.
57+
I won’t be outlining the best practices on writing a good proposal as I believe the following blog does a fantastic job at it, so I encourage you to go through it before starting off with your proposal.
58+
Also, here’s my proposal, in case you want to refer to it and get a general sense of how a proposal should be made. ;)
59+
https://drive.google.com/file/d/0B6OtIpAL6oa6U3JURDA2cjVZVlZ5UUVqcXRBTGlrY0hmUkVV/view?usp=sharing
60+
4. Repeat
61+
After you’re done with submitting your proposal, don’t sit idle.
62+
You get a window of 1 month from the day when you submit the proposal to the day when the selected students are announced.
63+
Make the best of this opportunity to contribute even more to maximize the chances.
64+
P.S. Interacting with the organization members publicly and giving them your feedback on upcoming features and releases is also a potential contribution that can be done.
65+
You can generally find the contact link for an Organization at it’s page in the GSoC website.

projects/Scraping Medium Articles/scraped_article.txt renamed to projects/Scraping Medium Articles/scraped_articles/The_Pros_and_Cons_of_Open_Source_Software.txt

+1-1
@@ -1,6 +1,6 @@
11
url: https://medium.com/4thought-studios/the-pros-and-cons-of-open-source-software-d498304f2a95
22

3-
Title: THE PROS AND CONS OF OPEN SOURCE SOFTWARE
3+
Title: THE PROS AND CONS OF OPEN SOURCE SOFTWARE
44
by Khalil Khalaf
55

66
INTRODUCTION

projects/Scraping Medium Articles/scraping_medium.py

+12-7
@@ -9,6 +9,7 @@

 # function to get the html of the page
 def get_page():
+    global url
     url = input('Enter url of a medium article: ')
     # handling possible error
     if not re.match(r'https?://medium.com/',url):
@@ -17,7 +18,7 @@ def get_page():
     res = requests.get(url)
     res.raise_for_status()
     soup = BeautifulSoup(res.text, 'html.parser')
-    return url, soup
+    return soup

 # function to remove all the html tags and replace some with specific strings
 def purify(text):
@@ -29,10 +30,12 @@ def purify(text):
     return text

 # function to compile all of the scraped text in one string
-def collect_text(url, soup):
+def collect_text(soup):
     fin = f'url: {url}\n\n'
     main = (soup.head.title.text).split('|')
-    fin += f'Title: {main[0].strip().upper()}\n{main[1].strip()}'
+    global title
+    title = main[0].strip()
+    fin += f'Title: {title.upper()}\n{main[1].strip()}'

     header = soup.find_all('h1')
     j = 1
@@ -55,12 +58,14 @@ def collect_text(url, soup):

 # function to save file in the current directory
 def save_file(fin):
-    with open('scraped_article.txt', 'w', encoding='utf8') as outfile:
+    if not os.path.exists('./scraped_articles'):
+        os.mkdir('./scraped_articles')
+    fname = './scraped_articles/' + '_'.join(title.split()) + '.txt'
+    with open(fname, 'w', encoding='utf8') as outfile:
         outfile.write(fin)
-    print('File saved in current directory as scraped_article.txt')
+    print(f'File saved in directory {fname}')

 # driver code
 if __name__ == '__main__':
-    url, soup = get_page()
-    fin = collect_text(url, soup)
+    fin = collect_text(get_page())
     save_file(fin)
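
Review note: this change shares `url` and `title` between functions through `global` statements, builds the file name straight from the article title, and matches the URL with an unescaped dot. The following is a minimal sketch of one possible alternative, not the author's implementation. The helper names `is_medium_url`, `safe_filename` and `save_article` are hypothetical and do not exist in scraping_medium.py, and the set of characters stripped from the title is an assumption about what is unsafe in file names.

```python
import os
import re

# Hypothetical helpers for illustration; none of these names exist in scraping_medium.py.


def is_medium_url(url):
    """Return True for http(s) URLs on medium.com or one of its subdomains."""
    # The dots are escaped, so a host such as 'mediumXcom' is rejected.
    return re.match(r'https?://([\w-]+\.)*medium\.com/', url) is not None


def safe_filename(title):
    """Turn an article title into a file name such as 'Some_Title.txt'."""
    # Drop characters that are commonly invalid in file names (assumed set).
    cleaned = re.sub(r'[\\/:*?"<>|]', '', title)
    # Join the whitespace-separated words with underscores, as the script does.
    return '_'.join(cleaned.split()) + '.txt'


def save_article(text, title, folder='scraped_articles'):
    """Write the scraped text into the folder and return the path of the file."""
    # exist_ok=True replaces the separate os.path.exists check.
    os.makedirs(folder, exist_ok=True)
    fname = os.path.join(folder, safe_filename(title))
    with open(fname, 'w', encoding='utf8') as outfile:
        outfile.write(text)
    return fname
```

With helpers like these, `collect_text` could return both the text and the title, and the driver code could call `save_article(fin, title)` and print the returned path, so no `global` statement would be needed.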
