Win Web Crawler - Proven uses of Win Web Crawler



I want to extract the URLs of travel-related companies.

I want to extract the URLs of travel-related companies in Australia.

I want to extract all pages from a website, e.g. http://www.mydomain.com

I want to collect the website URLs and data of all photographers listed in the Yahoo directory "Photographers" to build a photographer directory.

I have a list of URLs in a file and I want to extract data from those URLs.

I want to compile a list of offshore, banking, tax and accounting related websites that do link exchange with other sites.

I want to build a domain list of health/medicine related websites.

I have a URL list in a SQL database. I want to extract the URL, title, description, keywords and plain page text (from the HTML <BODY> to </BODY>) and merge them into the database.


(1) I want to extract url, meta tag of travel related companies.

Go to New Session Dialog

Select "Source = Search Engines"

Enter travel in Keyword Box

Select the "Save Data" folder, i.e. where the program will save the data.

Select the Save Format: CSV or line by line.

Click OK button


(2) I want to extract url, meta tag data of travel related companies of Australia.

Repeat (1), but select "Engine = Australia" in the Engine Listing dialog. You can launch this dialog by clicking the "Engines" button on the General tab of the New Session dialog.

By default, the US/International engines are selected.


(3) I want to extract all url, meta tag data from a web site.

Go to New Session Dialog

Select "Source = WebSite"

Enter website URL in Starting Address box: like http://www.mydomain.com

Select depth = 0 to spider the entire website.

Select the "Save Data" folder, i.e. where the program will save the data.

Select the Save Format: CSV or line by line.

Click OK button


(4) I want to collect all photographers web site url, data from yahoo dir Photographers to build a photographer directory.

Go to New Session Dialog

Select "Source = WebSite"

Enter website URL in Starting Address box: like http://dir.yahoo.com/Arts/Visual_Arts/Photography/Photographers/

Select depth = 0 and check "Stay within Full URL".
Together, these two settings tell the program to process the entire Photographers directory but no other part of the Yahoo directory.

Select the "Save Data" folder, i.e. where the program will save the data.

Select the Save Format: CSV or line by line.

Now go to the External Site tab, select "Follow External URLs", then select "Process 1 Page Only".

Now go back to the General tab and click the OK button.
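
The effect of "Stay within Full URL" in the steps above can be pictured as a prefix check on each discovered link. The sketch below is only an illustration of that idea, not the program's actual logic; the function name and the exact matching rule (same host, path starting with the starting path) are assumptions.

```python
from urllib.parse import urlsplit

# Starting address taken from the steps above.
START = "http://dir.yahoo.com/Arts/Visual_Arts/Photography/Photographers/"


def stays_within_full_url(start_url: str, candidate_url: str) -> bool:
    """Return True if candidate_url lies under the full starting URL.

    Assumption: "within" means same host and a path that begins with
    the path of the starting address.
    """
    start = urlsplit(start_url)
    candidate = urlsplit(candidate_url)
    # A different host is never inside the starting directory.
    if start.netloc.lower() != candidate.netloc.lower():
        return False
    # The candidate path must begin with the starting path.
    return candidate.path.startswith(start.path)
```

Under this assumption, a link such as START + "Portrait/" would be followed, while a link to another part of dir.yahoo.com would be skipped.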


(5) I have a list of urls in a file and I want to extract data from those urls.

Go to New Session Dialog

Select "Source = URLs from File"

Enter the path of the URL file in the File name box. This file must be a plain text file with one URL per line, and each line must start with http://

Select Depth = 0 to extract the entire website for each URL in the text file, or select "Process 1 Page Only" to spider only the specified URLs.

Select the "Save Data" folder, i.e. where the program will save the data.

Select the Save Format: CSV or line by line.

Click OK button
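
Before loading a URL file, it may help to check that it matches the format described above (plain text, one URL per line, each starting with http://). The following is a minimal sketch of such a clean-up step. The function name is hypothetical, and the choices to prepend http:// to bare host names, drop other schemes and remove duplicates are assumptions, not requirements stated by the program.

```python
def normalize_url_lines(lines):
    """Return a cleaned list of URLs, one per entry, each starting with http://.

    Assumptions: blank lines are dropped, lines without any scheme get
    "http://" prepended, lines with another scheme are skipped, and
    duplicates are removed while keeping the original order.
    """
    seen = set()
    result = []
    for line in lines:
        url = line.strip()
        if not url:
            continue
        if "://" not in url:
            url = "http://" + url
        if not url.lower().startswith("http://"):
            continue  # e.g. ftp:// entries do not fit the required format
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

The function accepts any iterable of lines, so an open text file can be passed directly and the result written back with one URL per line.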

 


(6) I want to compile a list of offshore, banking, tax related websites that do link exchange with other sites.

Go to New Session Dialog

Select "Source = Search Engines"

Generate Keywords using the following 2 lists:

List 1:
offshore
banking
tax
accounting

List 2:
link exchange
trade links
swap link
add url
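
To see what kind of keyword set two such lists can produce, here is a small sketch that pairs every topic word with every link-exchange phrase. It assumes the first four entries above form one list and the last four the other, and that the keyword generator combines each entry of the first list with each entry of the second; how the program itself combines them is not specified here.

```python
from itertools import product

# Assumed split of the entries above into two lists.
topics = ["offshore", "banking", "tax", "accounting"]
link_terms = ["link exchange", "trade links", "swap link", "add url"]


def combine_keywords(first, second):
    """Pair every entry of `first` with every entry of `second`."""
    return [f"{a} {b}" for a, b in product(first, second)]


keywords = combine_keywords(topics, link_terms)
```

With four entries in each list this yields 16 search phrases, such as "offshore link exchange" or "tax add url".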

 

Select the "Save Data" folder, i.e. where the program will save the data.

Select the Save Format: CSV.

Now go to the External Site tab. Select "Follow External URL", "Process 1 Page Only" and "Spider Base URL only".

Now go to the Filters - Text Filters tab. Check "page must contain following text" and enter the following strings in the box:

links.htm
link.htm
resource.htm
add url
submit url
add your site
submit your site

This way, the program extracts data only from websites that do link exchange or accept URL submissions to their directories.
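
The text filter above can be thought of as a simple "contains any of these strings" test on each page. The sketch below illustrates that idea only; the function name is hypothetical, and both the any-of rule and the case-insensitive comparison are assumptions about how the filter behaves.

```python
# Filter strings copied from the list above.
FILTER_TERMS = [
    "links.htm",
    "link.htm",
    "resource.htm",
    "add url",
    "submit url",
    "add your site",
    "submit your site",
]


def page_passes_filter(page_text, terms=FILTER_TERMS):
    """Return True if the page contains at least one filter string.

    Assumption: matching is case-insensitive and one hit is enough.
    """
    lowered = page_text.lower()
    return any(term.lower() in lowered for term in terms)
```

A page that links to links.htm or shows an "Add Your Site" button would pass, while a page with none of these strings would be skipped.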

 

Now go back to the General tab and click the OK button.

After the extraction is complete, go to the Data tab - Meta Tag list. These are the related sites that do link exchange with other sites.


(7) I want to build a domain list of health/medicine related websites.

Go to New Session Dialog

Select "Source = Search Engines"

Enter the following keywords:
health
medicine
and so on...

Select Extract URL (select Base URL).

Select the "Save Data" folder, i.e. where the program will save the data.

Select the Save Format: line by line.

Click OK button
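
If the saved line-by-line list still needs to be reduced to one entry per domain, a small post-processing step can do it. The sketch below assumes that "Base URL" means the scheme plus host name of a page URL; that reading, and both function names, are assumptions rather than documented behavior.

```python
from urllib.parse import urlsplit


def base_url(url):
    """Reduce a page URL to scheme://host/ (assumed meaning of "Base URL")."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def unique_base_urls(urls):
    """Return each base URL once, in order of first appearance."""
    seen = set()
    result = []
    for url in urls:
        base = base_url(url.strip())
        if base not in seen:
            seen.add(base)
            result.append(base)
    return result
```

Applied to the saved URL list, this gives a domain list with duplicates removed.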


(8) I have a URL list in a SQL database. I want to extract the URL, title, description, keywords and plain page text (from the HTML <BODY> to </BODY>) and merge them into the database.

"Win Web Crawler" cannot access a SQL database directly. You need to export the URL list from the SQL database to a plain text file on disk and use this file in "Win Web Crawler".

Go to New Session Dialog

Select "Source = URLs from File"

Enter the path of the URL file in the File name box. This file must be a plain text file with one URL per line, and each line must start with http://
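
The export from the database into a file of this format can be scripted. The following is a minimal sketch that uses SQLite as a stand-in for whatever SQL database holds the list; the table name sites and its columns id and url are hypothetical, and the query would need to be adapted to the real schema and database driver.

```python
import sqlite3


def url_file_text(conn):
    """Build the URL file content: one URL per line, each starting with http://.

    Assumption: URLs live in a hypothetical table sites(id, url).
    """
    rows = conn.execute("SELECT url FROM sites ORDER BY id")
    urls = [row[0].strip() for row in rows if row[0]]
    # Keep only entries that match the required file format.
    return "".join(u + "\n" for u in urls if u.lower().startswith("http://"))


def export_urls(conn, out_path):
    """Write the URL list to a plain text file for use as the URL file."""
    with open(out_path, "w", encoding="utf-8", newline="\n") as out:
        out.write(url_file_text(conn))
```

Calling export_urls(conn, "urls.txt") on an open connection would produce a file that can then be entered in the File name box.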

Select "Process 1 Page Only" to extract the meta tags of the specified root domain only. If you need to extract the meta tags of ALL pages of each website, select depth = 0.

Select Extract Meta tag and Extract Body (you can set a text size limit by clicking the ... button).

Select the "Save Data" folder, i.e. where the program will save the data.

Select the Save Format: CSV.

For very large URL meta tag extractions, uncheck "View - Display data in data tab" so that "Win Web Crawler" does not display the data within the program but writes it directly to the disk file. This should improve program performance.

Click OK button

After the extraction is complete, you can import this CSV file (metatag.txt) into the SQL database for further processing, queries, etc.
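
The import step can also be scripted. The sketch below again uses SQLite as a stand-in for the real database. The column order of metatag.txt (url, title, description, keywords, body, with no header row) and the table name pages are assumptions made for illustration; check the actual file layout before adapting it.

```python
import csv
import sqlite3

# Assumed column order of the exported CSV file (not confirmed by the program).
COLUMNS = ("url", "title", "description", "keywords", "body")


def import_rows(conn, lines):
    """Merge CSV rows into a hypothetical pages table, keyed by URL.

    `lines` is any iterable of text lines, e.g. an open metatag.txt file.
    Returns the number of rows written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, title TEXT, description TEXT, "
        "keywords TEXT, body TEXT)"
    )
    count = 0
    for row in csv.reader(lines):
        if len(row) != len(COLUMNS):
            continue  # skip rows that do not match the assumed layout
        # INSERT OR REPLACE updates an existing URL instead of duplicating it.
        conn.execute("INSERT OR REPLACE INTO pages VALUES (?, ?, ?, ?, ?)", row)
        count += 1
    conn.commit()
    return count
```

When reading from disk, open the file with newline="" as the csv module recommends, and pass the file object as `lines`.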