I want to extract the URLs of travel-related companies.
I want to extract the URLs of travel-related companies in Australia.
I want to extract all pages from a website, e.g. http://www.mydomain.com
I have a list of URLs in a file and I want to extract data from those URLs.
I want to build a domain list of health/medicine-related websites.
(1) I want to extract the URLs and meta tags of travel-related companies.
Go to the New Session dialog
Select "Source = Search Engines"
Enter "travel" in the Keyword box
Select the "Save Data" folder, i.e. where the program will save the data
Select Save Format - CSV or line by line
Click the OK button
(2) I want to extract the URLs and meta tag data of travel-related companies in Australia.
Repeat (1), but select "Engine = Australia" in the Engine Listing dialog. You can launch this dialog by clicking the "Engines" button on the General tab of the New Session dialog.
By default, the US/International engines are selected.
(3) I want to extract all URLs and meta tag data from a website.
Go to the New Session dialog
Select "Source = WebSite"
Enter the website URL in the Starting Address box, e.g. http://www.mydomain.com
Select depth = 0 (to spider the entire website)
Select the "Save Data" folder, i.e. where the program will save the data
Select Save Format - CSV or line by line
Click the OK button
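This page does not describe how the crawler counts depth internally. As a rough mental model only, the sketch below assumes depth is the number of link hops from the starting address and that 0 means "no limit", as the step above states. `crawl_order` and the in-memory `site` link map are hypothetical stand-ins for fetched pages, not part of Win Web Crawler.

```python
from collections import deque


def crawl_order(links, start, depth):
    """Breadth-first traversal of a site's link graph.

    links: dict mapping a page URL to the URLs it links to (a hypothetical,
           in-memory stand-in for pages the crawler would fetch).
    depth: maximum number of link hops from `start`; 0 means no limit,
           mirroring "depth = 0 spiders the entire website" (assumption).
    """
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, hops = queue.popleft()
        order.append(url)
        # Stop expanding this page once the hop limit is reached (0 = unlimited).
        if depth and hops >= depth:
            continue
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return order


site = {
    "http://www.mydomain.com/": ["http://www.mydomain.com/a", "http://www.mydomain.com/b"],
    "http://www.mydomain.com/a": ["http://www.mydomain.com/a/deep"],
}
print(crawl_order(site, "http://www.mydomain.com/", 0))  # whole site
print(crawl_order(site, "http://www.mydomain.com/", 1))  # start page + direct links
```

With depth 0 every reachable page is visited; with depth 1 the page at `/a/deep` is never reached because it is two hops from the start.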
(4) I want to collect the website URLs and data of all photographers listed in the Yahoo! Directory "Photographers" category, to build a photographer directory.
Go to the New Session dialog
Select "Source = WebSite"
Enter the directory URL in the Starting Address box, e.g. http://dir.yahoo.com/Arts/Visual_Arts/Photography/Photographers/
Select depth = 0 and check "Stay within Full URL"
Together, these two settings tell the program to process the entire Photographers directory but no other part of the Yahoo! Directory.
Select the "Save Data" folder, i.e. where the program will save the data
Select Save Format - CSV or line by line
Go to the External Site tab, select "Follow External URLs" and select "Process 1 Page Only"
Return to the General tab and click the OK button
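One way to picture how these settings combine is as a per-link decision: links under the starting URL are spidered, other links on the same host are skipped, and links to other sites are fetched as a single page. This is an assumed model of the settings described above, not the program's documented logic; `classify_link` is a hypothetical helper and http://www.example.com/ stands in for a photographer's site.

```python
from urllib.parse import urlsplit


def classify_link(start_url, link):
    """Hypothetical model of "Stay within Full URL" combined with
    "Follow External URLs" + "Process 1 Page Only"."""
    start_host = urlsplit(start_url).netloc.lower()
    link_host = urlsplit(link).netloc.lower()
    if link_host != start_host:
        # External site (e.g. a photographer's own website): fetch that one page only.
        return "process_one_page"
    if link.startswith(start_url):
        # Inside the starting path: keep spidering.
        return "spider"
    # Same host but outside the starting path (other directory categories).
    return "skip"


START = "http://dir.yahoo.com/Arts/Visual_Arts/Photography/Photographers/"
for link in [
    START + "Portraits/",
    "http://dir.yahoo.com/Business_and_Economy/",
    "http://www.example.com/",
]:
    print(link, "->", classify_link(START, link))
```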
(5) I have a list of URLs in a file and I want to extract data from those URLs.
Go to the New Session dialog
Select "Source = URLs from File"
Enter the URL file path in the File name box. This must be a plain text file with one URL per line, each line starting with "http://".
Select depth = 0 to extract each entire website listed in the text file, or select "Process 1 Page Only" to spider only the specified URLs
Select the "Save Data" folder, i.e. where the program will save the data
Select Save Format - CSV or line by line
Click the OK button
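Before loading a URL file, it can help to check it against the format stated above (plain text, one URL per line, each starting with "http://"). The sketch below is an illustrative pre-check, not part of Win Web Crawler; `read_url_file` is a hypothetical helper. It skips blank lines and reports any other line that does not start with "http://" together with its line number.

```python
import os
import tempfile


def read_url_file(path):
    """Return (valid_urls, bad_lines) for a one-URL-per-line text file.

    bad_lines holds (line_number, text) for lines not starting with http://.
    """
    urls, bad = [], []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            url = line.strip()
            if not url:
                continue  # ignore blank lines
            if url.startswith("http://"):
                urls.append(url)
            else:
                bad.append((lineno, url))
    return urls, bad


# Usage with a throwaway sample file:
sample = "http://www.mydomain.com\n\nwww.no-scheme.com\nhttp://www.example.com/page.html\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
urls, bad = read_url_file(tmp.name)
os.remove(tmp.name)
print(urls)
print(bad)
```

Lines listed in `bad` (here the entry without a scheme) are the ones to fix before using the file as a "URLs from File" source.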
(6) I want to compile a list of offshore, banking and tax-related websites that exchange links with other sites.
Go to the New Session dialog
Select "Source = Search Engines"
Generate keywords using the following two lists:
List 1: offshore, banking, tax, accounting
List 2: link exchange, trade links, swap link, add url
Select the "Save Data" folder, i.e. where the program will save the data
Select Save Format - CSV
Go to the External Site tab. Select "Follow External URL", "Process 1 Page Only" and "Spider Base URL only"
Go to the Filters - Text Filters tab, check "page must contain following text" and enter the following strings in the box:
links.htm
link.htm
resource.htm
add url
submit url
add your site
submit your site
This way the program extracts data only from websites that exchange links or add URLs to their directories.
Return to the General tab and click the OK button.
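This page does not say exactly how the keyword generator combines the two lists. Assuming it pairs every term of List 1 with every term of List 2 (a Cartesian product), the generated keywords can be previewed like this; the "topic + link term" ordering is an assumption.

```python
from itertools import product

topics = ["offshore", "banking", "tax", "accounting"]
link_terms = ["link exchange", "trade links", "swap link", "add url"]

# Assumption: each generated keyword is "<topic> <link term>".
keywords = [f"{topic} {term}" for topic, term in product(topics, link_terms)]
print(len(keywords))   # 4 x 4 combinations
print(keywords[:3])
```

Four terms in each list give 16 search keywords, which is why generating them is more practical than typing each one.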
After the extraction has completed, go to the Data tab - Meta Tag list. These are the related sites that exchange links with other sites.
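The text filter keeps a page only if it contains one of the listed strings. The page does not state whether matching is case-sensitive or whether every string must be present; the sketch below assumes that any single string is enough and that case is ignored. `page_passes_filter` and the sample pages are hypothetical.

```python
FILTER_STRINGS = (
    "links.htm", "link.htm", "resource.htm",
    "add url", "submit url", "add your site", "submit your site",
)


def page_passes_filter(html, needles=FILTER_STRINGS):
    """True if the page contains at least one filter string.

    Assumption: case-insensitive substring match, any one string suffices.
    """
    text = html.lower()
    return any(needle.lower() in text for needle in needles)


pages = {
    "http://www.example.com/": '<a href="links.htm">Our partners</a>',
    "http://www.example.org/": "<p>Offshore banking services</p>",
    "http://www.example.net/": "<a href='/submit'>Submit URL</a>",
}
matches = [url for url, html in pages.items() if page_passes_filter(html)]
print(matches)
```

Only the first and third pages pass: one links to `links.htm`, the other contains "Submit URL".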
(7) I want to build a domain list of health/medicine-related websites.
Go to the New Session dialog
Select "Source = Search Engines"
Enter keywords such as:
health
medicine
and so on
Select "Extract URL" (with "Base URL" selected)
Select the "Save Data" folder, i.e. where the program will save the data
Select Save Format - line by line
Click the OK button
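Selecting "Base URL" turns a list of found pages into a list of sites. The sketch below reproduces that idea on an existing URL list, reducing each URL to `scheme://host/` and removing duplicates. What exactly the program treats as the base URL (for instance, whether "www." is kept) is not documented here, so this reduction is an assumption; `base_url` and `domain_list` are hypothetical helpers.

```python
from urllib.parse import urlsplit


def base_url(url):
    """Reduce a URL to scheme://host/ (assumed meaning of "Base URL")."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def domain_list(urls):
    """Unique base URLs, in first-seen order."""
    return list(dict.fromkeys(base_url(u) for u in urls))


found = [
    "http://www.example.com/health/diet.html",
    "http://www.example.com/medicine/",
    "http://Health.Example.org/index.htm",
]
print(domain_list(found))
```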
(8) I have a URL list in a SQL database. I want to extract the URL, title, description, keywords and the plain page text between the HTML <BODY> and </BODY> tags, and merge them into the database.
"Win Web Crawler" cannot access a SQL database directly. Export the URL list from the SQL database to a plain text file and use that file in "Win Web Crawler".
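The export step depends on your database. As one illustration, the sketch below uses Python's built-in `sqlite3` with an in-memory database; the table name `sites`, the column `url` and the output file name `urls.txt` are hypothetical. It writes one URL per line and drops entries that do not start with "http://", since the URL file format requires that prefix.

```python
import sqlite3


def export_urls(conn, out_path, query="SELECT url FROM sites"):
    """Write one URL per line to out_path; return how many were written.

    Table/column names in `query` are hypothetical; adjust to your schema.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8", newline="\n") as out:
        for (url,) in conn.execute(query):
            url = (url or "").strip()
            if url.startswith("http://"):  # URL file lines must start with http://
                out.write(url + "\n")
                count += 1
    return count


# Usage with a small in-memory example table:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (url TEXT)")
conn.executemany(
    "INSERT INTO sites VALUES (?)",
    [("http://www.example.com",), ("http://www.example.org/about",), ("ftp://files.example.net",)],
)
n = export_urls(conn, "urls.txt")
print(n)
```

The resulting `urls.txt` can then be entered in the File name box with "Source = URLs from File".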
Go to the New Session dialog
Select "Source = URLs from File"
Enter the URL file path in the File name box. This must be a plain text file with one URL per line, each line starting with "http://".
Select "Process 1 Page Only" to extract only the meta tags of each specified root domain. If you need the meta tags of ALL pages of each website, select depth = 0 instead.
Select "Extract Meta tag" and "Extract Body" (you can set a text size limit by clicking the ... button)
Select the "Save Data" folder, i.e. where the program will save the data
Select Save Format - CSV
For very large meta tag extractions, uncheck "View - Display data in data tab". "Win Web Crawler" will then write the data directly to the disk file instead of displaying it in the program, which improves performance.
Click the OK button
After the extraction has completed, you can import the CSV file (metatag.txt) into the SQL database for further processing, querying, etc.
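The column layout of metatag.txt is not described on this page, so check the actual file before importing. The sketch below hypothetically assumes five columns (url, title, description, keywords, body) with no header row, and merges rows into an SQLite table keyed by URL with `INSERT OR REPLACE`. An in-memory sample stands in for the file; for the real file, open it with `open("metatag.txt", newline="")` so quoted fields containing line breaks are read correctly.

```python
import csv
import io
import sqlite3

COLUMNS = ["url", "title", "description", "keywords", "body"]  # hypothetical order


def import_metatags(conn, csv_file):
    """Merge CSV rows into a `pages` table keyed by URL; return rows imported."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, title TEXT, description TEXT, keywords TEXT, body TEXT)"
    )
    rows = 0
    for record in csv.reader(csv_file):
        if len(record) != len(COLUMNS):
            continue  # skip malformed lines rather than guessing their layout
        # Replaces any existing row with the same URL (the "merge" step).
        conn.execute("INSERT OR REPLACE INTO pages VALUES (?, ?, ?, ?, ?)", record)
        rows += 1
    conn.commit()
    return rows


sample = io.StringIO(
    'http://www.example.com/,Example,"A sample, with comma",example,Hello\n'
    "http://www.example.org/,Org,Desc,org,Body text\n"
)
conn = sqlite3.connect(":memory:")
print(import_metatags(conn, sample))
print(conn.execute("SELECT title, description FROM pages ORDER BY url").fetchall())
```

Keying the table on URL means re-running an extraction updates existing rows instead of creating duplicates.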