Web Crawler


sQuo



Author: Dragunov D.A.

 

[LENGUAJE=python]

 

#!/usr/bin/env python
# -*- coding: utf-8 -*-
########################################################################
# author: Dragunov D.A.
# site:
# group:
# paste:
# twitter:
# IRC: irc.freenode.net:6667 #KeyViewer
# FB:
# ICQ: 648064213
# E-mail: [email protected]
########################################################################
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
########################################################################
# Python 2 script: fetches one page, extracts every href it finds,
# prints the links and writes them (sorted, deduplicated) to a log file.
import re
import urllib
import urlparse

links = []

myurl = raw_input("\nWeb Crawler v1.0\nType your URL including 'http://' or 'ftp://'\nURL: ")
mylog = raw_input("Type your log file name: ")
log = open(mylog + '.txt', 'w+')
log.write(myurl + '\n')

def crawler(url):
    print ''
    links.append(url)
    # Grab every href value on the page, skipping javascript: pseudo-links.
    for i in re.findall('''href=["'](?!javascript:)(.[^"']+)["']''',
                        urllib.urlopen(url).read(), re.I):
        # No http/ftp scheme means a relative link: resolve it against
        # the start URL (urljoin copes with paths and trailing slashes).
        if re.match('(?!http|ftp)', i):
            i = urlparse.urljoin(myurl, i)
        print i
        links.append(i)

try:
    crawler(myurl)
    links = list(set(links))  # deduplicate
    links.sort()
    for n in links:
        log.write(n + '\n')
    log.close()
    # links already contains the start URL, so len(links) is the full count.
    print "\n%i crawled links\nURL: %s" % (len(links), myurl)
except IOError:
    print "Error crawling: " + myurl

[/LENGUAJE]
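
To make the two regexes easier to follow, here is a minimal sketch of the extraction and relative-link logic on a made-up HTML snippet (the sample page and the example.com URLs are illustrative, not part of the original script):

[LENGUAJE=python]
import re
import urlparse

# Made-up page for illustration only.
html = '''<a href="/about.html">About</a>
<a href="page2.html">Next</a>
<a href='http://example.com/'>External</a>
<a href="javascript:void(0)">Ignored</a>'''

base = 'http://example.com/dir/'

# Same pattern as the script: capture href values, skip javascript: links.
for link in re.findall('''href=["'](?!javascript:)(.[^"']+)["']''', html, re.I):
    # Same check as the script: the lookahead matches only when the link
    # has no http/ftp scheme, i.e. it is relative and needs the base URL.
    if re.match('(?!http|ftp)', link):
        link = urlparse.urljoin(base, link)
    print link

# Output:
# http://example.com/about.html
# http://example.com/dir/page2.html
# http://example.com/
[/LENGUAJE]

Note that the scheme check treats anything not starting with http or ftp as relative, so mailto: and similar links also get joined onto the base URL; the original script behaves the same way.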
