Commit d2c0766

implemented the flask api module and fixed some errors
1 parent d27a40e commit d2c0766

5 files changed: +119 -8 lines changed

README.md

Lines changed: 7 additions & 8 deletions
@@ -20,8 +20,6 @@ The results will be saved within a MongoDB collection. Another script will be tr
 
 The second step results in new collections where each keyword gets one new collection and all found pastebin entries will be copied there.
 
-The third step might be the accumulator. It identifies specific words (similiar to step 2) but also specials like eMail addresses, Bitcoin addresses, URLs, IP Adresses etc.
-
 ### 1. pastebin_scrape.py
 
 For this one to work good you need an API key. I bought a lifetime access to the pastebin API a while ago for 29,99 USD. It doesn't make you poor.
@@ -32,7 +30,7 @@ You will also need to update your Scraping IP, in order to make it work: [Change
     -db 1 \ # save to DB (without this, nothing will be saved)
     -api <YOUR_PASTE_BIN_API_KEY> \
     -mongodbhost <mongo_db_hostname> \ # default: localhost
-    -mongodbport <mongo_db_port> \ # default: 27017
+    -mongodbport <mongo_db_port> # default: 27017
 
 ### 2. pastebin_analyze.py
 
@@ -42,19 +40,20 @@ When you have finished, start the analze module:
 
 python pastebin_analyze.py -f <path_to_keyword_file> \
     -mongodbhost <mongo_db_hostname> \ # default: localhost
-    -mongodbport <mongo_db_port< # default: 27017
+    -mongodbport <mongo_db_port> # default: 27017
 
 Finally it will create collections for all of the keywords it found and copy the pastebin into that collection. There might also be empty collections. Sometimes you
-just can't find anything you are searching for :-()
-### 3. pastebin_accumulate.py
+just can't find anything you are searching for.
 
 ### Access Data via Flask API
 
 Finally you can either write yourself a clean data retriever or you can use this Flask API implementation here:
 
 ```
-# start it in debug and verbose mode first!
-python pastebin_api.py -d -v
+# start it in debug mode first!
+python pastebin_api.py -d \
+    -mongodbhost <mongo_db_hostname> \
+    -mongodbport <mongo_db_port>
 ```
 
 Well there is only one API method. Grab yourself a browser or use curl:
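
Note on the README change: the README suggests a browser or curl for the single endpoint. The same request can be sketched in Python with the standard library alone. The base URL `http://localhost:5000` is an assumption (Flask's default development address), and `build_url` and `fetch_pastebins` are hypothetical helper names that are not part of this commit.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: Flask's default development address; adjust to your setup.
BASE_URL = "http://localhost:5000"


def build_url(keyword, base_url=BASE_URL):
    """Return the URL of the getpastebins route for one keyword."""
    # quote() escapes characters that are not safe inside a URL path segment.
    return f"{base_url}/api/getpastebins/{quote(keyword, safe='')}"


def fetch_pastebins(keyword, base_url=BASE_URL, timeout=10):
    """Request all documents stored for `keyword` and decode the JSON list."""
    with urlopen(build_url(keyword, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With `pastebin_api.py` running and the analyze step completed, `fetch_pastebins("malware")` should return the documents of the `malware` collection as a list of dictionaries.
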

keywords_example.txt

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+ip
+malware
+glock
+android
+ios
+lenovo
+

pastebin_analyze.py

Lines changed: 7 additions & 0 deletions
@@ -13,6 +13,13 @@
 
 
 def main():
+    """
+    starts the entire process of analyzation by creating new collections and appending new documents into existing
+    collections based on keywords.
+
+    :return:
+    """
+
     client = MongoClient(str(args['mongodbhost']), int(args['mongodbport']))
     db = client.scrape
     logger.info("MongoDB Connection created")
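
Note on the new docstring: the matching logic itself is outside this hunk, so the following is only a sketch of the "one collection per keyword" idea in plain Python. It assumes case-insensitive substring matching on a hypothetical `body` field; `load_keywords` and `group_by_keyword` are illustrative names, and the real script may work differently.

```python
from collections import defaultdict


def load_keywords(lines):
    """Strip whitespace and drop empty lines, such as a trailing blank line."""
    return [line.strip() for line in lines if line.strip()]


def group_by_keyword(pastes, keywords, text_field="body"):
    """Map each keyword to the pastes whose text contains it.

    Assumption: case-insensitive substring matching on a hypothetical
    `body` field; the real script may match differently.
    """
    groups = defaultdict(list)
    for paste in pastes:
        text = str(paste.get(text_field, "")).lower()
        for keyword in keywords:
            if keyword.lower() in text:
                groups[keyword].append(paste)
    return dict(groups)
```

In a MongoDB-backed version, each key would correspond to a collection `db[keyword]` and each list to the documents copied into it. Under plain substring matching, short keywords such as `ip` also match inside longer words like "script", so word-boundary matching may be worth considering.
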

pastebin_api.py

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
+from flask import Flask, jsonify, make_response
+from pymongo import MongoClient
+from bson import json_util
+
+import argparse
+import json
+
+app = Flask(__name__)
+
+api_version = "1.0"
+
+
+@app.errorhandler(404)
+def not_found(error):
+    """
+    some standard error handling for unknown pages.
+
+    :param error:
+    :return:
+    """
+    return make_response(jsonify({'error': 'Notfound'}), 404)
+
+
+@app.route('/')
+def get_index():
+    """
+    standard output when nothing is set
+    :return:
+    """
+
+    basic_info = [
+        {
+            'api': '1.0',
+            'name': 'PastebinPython Flask Accessing API',
+            'author': 'Andre Fritsche / ihgalis'
+        }
+    ]
+
+    return jsonify({'basic_info': basic_info})
+
+
+@app.route('/api/getpastebins/<string:keyword>', methods=['GET'])
+def get_pastebins(keyword):
+    """
+    method gets all documents related to the specified keyword. It accesses the corresponding collections so you will
+    always get only the documents that have been identified by the pastebin_analyze.py script.
+
+    :param keyword: string
+    :return: JSON based dictionary
+    """
+
+    client = MongoClient(str(args['mongodbhost']), int(args['mongodbport']))
+    db = client.scrape
+
+    tlist = list()
+
+    dbcursor = db[keyword].find({})
+    for document in dbcursor:
+        sanitized = json.loads(json_util.dumps(document))
+        tlist.append(sanitized)
+
+    return jsonify(tlist)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="PastebinPython Flask Accessing API")
+
+    parser.add_argument('-mongodbhost',
+                        help="A string with the URL to your MongoDB Server.",
+                        default="localhost",
+                        required=True)
+
+    parser.add_argument('-mongodbport',
+                        help="THe port to which your MongoDB listens.",
+                        default=27017,
+                        required=True)
+
+    parser.add_argument('-d',
+                        action="store_true",
+                        help="Debug in Flask active or not.",
+                        default=0)
+
+    args = vars(parser.parse_args())
+
+    if args['d']:
+        app.run(debug=True)
+    else:
+        app.run(debug=False)
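
Note on the argument parsing: both MongoDB options combine a `default` with `required=True`. argparse rejects any call that omits a required option, so the defaults are never used. A minimal sketch of a parser where the defaults take effect could look like this; `build_parser` is a hypothetical name, and the reworded help texts are suggestions rather than part of the commit.

```python
import argparse


def build_parser():
    """Build a parser whose MongoDB options fall back to their defaults."""
    parser = argparse.ArgumentParser(description="PastebinPython Flask Accessing API")
    parser.add_argument('-mongodbhost',
                        help="Hostname of your MongoDB server.",
                        default="localhost")
    parser.add_argument('-mongodbport',
                        help="The port on which your MongoDB listens.",
                        type=int,
                        default=27017)
    # store_true already defaults to False, so no explicit default is needed.
    parser.add_argument('-d',
                        action="store_true",
                        help="Run Flask in debug mode.")
    return parser
```

With this shape, `python pastebin_api.py -d` alone would connect to `localhost:27017`. The `type=int` conversion also makes the later `int(args['mongodbport'])` call redundant.
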

requirements.txt

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+pymongo
+argparse
+logging
+re
+time
+sys
+pprint
+flask
+bson
+requests
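
Note on requirements.txt: several entries (`argparse`, `logging`, `re`, `time`, `sys`, `pprint`) are standard-library modules in Python 3 and normally do not need to be listed for pip. The sketch below shows one way to filter them out. `third_party` is a hypothetical helper, and the fallback set is a hand-written assumption for interpreters older than Python 3.10.

```python
import sys

# Names copied from the requirements.txt added in this commit.
REQUIREMENTS = ["pymongo", "argparse", "logging", "re", "time", "sys",
                "pprint", "flask", "bson", "requests"]

# sys.stdlib_module_names exists on Python 3.10+; the fallback set below is
# a hand-written assumption covering only the names used here.
STDLIB = getattr(sys, "stdlib_module_names",
                 frozenset({"argparse", "logging", "re", "time", "sys", "pprint"}))


def third_party(names, stdlib=STDLIB):
    """Return the names that are not standard-library modules."""
    return [name for name in names if name not in stdlib]
```

Of the remaining names, `bson` deserves a second look: PyMongo ships its own `bson` package, and the PyMongo documentation advises against installing the separate `bson` distribution from PyPI alongside it.
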

0 commit comments
