Proven uses of Win Web Crawler
1. I want to extract the URLs and meta tags of travel-related companies
- Go to the New Session Dialog
- Select "Source = Search Engines"
- Enter "travel" in the Keyword box
- Select the "Save Data" folder, i.e. where the program will save the data
- Select the Save Format: CSV or line by line
- Click the OK button
2. I want to extract the URLs and meta tag data of travel-related companies in Australia
- Repeat (1), but select "Engine = Australia" in the Engine Listing Dialog. You can launch this dialog by clicking the "Engines" button on the General tab of the New Session Dialog.
- By default, US/International engines are selected.
3. I want to extract all URLs and meta tag data from a website
- Go to the New Session Dialog
- Select "Source = WebSite"
- Enter the website URL in the Starting Address box, e.g. http://www.mydomain.com
- Select depth = 0 (to spider the entire website; see the documentation on depth for more details)
- Select the "Save Data" folder, i.e. where the program will save the data
- Select the Save Format: CSV or line by line
- Click the OK button
4. I want to collect the website URLs and data of all photographers listed in the Yahoo directory's Photographers category, to build a photographer directory
- Go to the New Session Dialog
- Select "Source = WebSite"
- Enter the website URL in the Starting Address box, e.g. http://dir.yahoo.com/Arts/Visual_Arts/Photography/Photographers/
- Select depth = 0 and check "Stay within Full URL"
- Together, these two settings tell the program to process the entire Photographers directory but no other part of the Yahoo directory.
- Select the "Save Data" folder, i.e. where the program will save the data
- Select the Save Format: CSV or line by line
- Go to the External Site tab, select "Follow External URLs", then select "Process 1 Page Only"
- Go back to the General tab and click the OK button
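To picture how the settings in (4) combine, here is a rough Python model that sorts each discovered link into one of three outcomes. It is a sketch under assumptions, not the program's code: it assumes "Stay within Full URL" behaves like a plain prefix match on the starting address, and that any link to a different host counts as external. `classify_link` and `START_URL` are hypothetical names.

```python
from urllib.parse import urlsplit

# Starting address from step 4.
START_URL = "http://dir.yahoo.com/Arts/Visual_Arts/Photography/Photographers/"

def classify_link(url, start_url=START_URL):
    """Rough model of depth = 0 + "Stay within Full URL" + "Process 1 Page Only".

    Assumption: "Stay within Full URL" is treated as a simple prefix check.
    """
    if url.startswith(start_url):
        return "crawl"            # inside the Photographers directory: keep spidering
    if urlsplit(url).netloc.lower() != urlsplit(start_url).netloc.lower():
        return "external-1-page"  # a photographer's own site: fetch its first page only
    return "skip"                 # elsewhere on dir.yahoo.com: outside the full URL
```

In this model, depth = 0 means there is no depth cutoff for "crawl" links, while the external-site settings cap each photographer's own site at a single page.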
5. I have a list of URLs in a file and I want to extract data from those URLs
- Go to the New Session Dialog
- Select "Source = URLs from File"
- Enter the path of the URL file in the File name box. This must be a plain text file with one URL per line, each line starting with http://.
- Select Depth = 0 to extract every page of each website listed in the file, or select "Process 1 Page Only" to spider only the specified URLs.
- Select the "Save Data" folder, i.e. where the program will save the data
- Select the Save Format: CSV or line by line
- Click the OK button
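If your URL list comes from another tool, it may contain blank lines, duplicates or bare host names. The minimal Python sketch below prepares a file in the format described in (5): one URL per line, each starting with http://. It is an illustration only. The choice to prefix bare hosts with http:// and to skip lines with another scheme (such as https://) is our assumption, as are the names `clean_url_lines` and `write_url_file`.

```python
from pathlib import Path

def clean_url_lines(lines):
    """Normalize raw lines into the one-URL-per-line, http:// format from step 5."""
    urls = []
    seen = set()
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        if not url.startswith("http://"):
            if "://" in url:
                continue  # another scheme (e.g. https://): not the stated format, so skip it
            url = "http://" + url  # bare host such as www.mydomain.com
        if url not in seen:  # drop duplicates, keep first-seen order
            seen.add(url)
            urls.append(url)
    return urls

def write_url_file(lines, path):
    """Write the cleaned URLs to a plain text file, one per line."""
    urls = clean_url_lines(lines)
    Path(path).write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return urls
```

For example, `write_url_file(["www.mydomain.com", "", "http://www.mydomain.com"], "urls.txt")` writes a single line, `http://www.mydomain.com`.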
6. I want to compile a list of offshore, banking and tax related websites that exchange links with other sites
- Go to the New Session Dialog
- Select "Source = Search Engines"
- Generate keywords using the following 2 lists:
- offshore, banking, tax, accounting
- link exchange, trade links, swap link, add url
- Select the "Save Data" folder, i.e. where the program will save the data
- Select the Save Format: CSV
- Go to the External Site tab. Select "Follow External URLs", then select "Process 1 Page Only".
- Select "Spider Base URL only"
- Go to the Filters - Text Filters tab and check "page must contain following text".
- Enter the following strings in the box:
- links.htm
- link.htm
- resource.htm
- add url
- submit url
- add your site
- submit your site
- This way, the program extracts data only from websites that exchange links or accept URL submissions to their directories.
- Go back to the General tab and click the OK button.
- After the extraction has completed, go to the Data tab - Meta Tag list. The sites listed there are the related sites that exchange links with other sites.
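The Python sketch below illustrates the two ideas in (6) outside the program. First, it shows one plausible way to "generate keywords" from the two lists: pair every term in the first list with every phrase in the second (a Cartesian product). Second, it shows a text filter that keeps a page if it contains any of the filter strings. Both behaviors are assumptions about what the program does, not documented behavior: the pairing scheme, the any-of rule (suggested by the "only those websites" note) and the case-insensitive comparison are our choices. All names are hypothetical.

```python
from itertools import product

# The two keyword lists from step 6.
TOPICS = ["offshore", "banking", "tax", "accounting"]
LINK_PHRASES = ["link exchange", "trade links", "swap link", "add url"]

# Strings entered under "page must contain following text".
FILTER_STRINGS = ["links.htm", "link.htm", "resource.htm", "add url",
                  "submit url", "add your site", "submit your site"]

def combine_keywords(first, second):
    """Pair every term of the first list with every phrase of the second (assumed scheme)."""
    return [f"{a} {b}" for a, b in product(first, second)]

def page_passes_filter(html, needles=FILTER_STRINGS):
    """Keep a page if it contains ANY of the strings; case-insensitivity is an assumption."""
    text = html.lower()
    return any(n.lower() in text for n in needles)

keywords = combine_keywords(TOPICS, LINK_PHRASES)
print(len(keywords))  # 16
print(keywords[0])    # offshore link exchange
```

With 4 terms and 4 phrases, the product yields 16 search keywords, from "offshore link exchange" to "accounting add url".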
7. I want to build a domain list of health/medicine related websites
- Go to the New Session Dialog
- Select "Source = Search Engines"
- Enter the following keywords:
- Select Extract URL (select Base URL)
- Select the "Save Data" folder, i.e. where the program will save the data
- Select the Save Format: line by line
- Click the OK button
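If you already have a list of full URLs and want to reduce it to a domain list yourself, for example to merge the output of several sessions, the Python sketch below collapses each URL to scheme://host/ and removes duplicates. It assumes that "Base URL" means the scheme plus host, and that the line-by-line file holds one URL per line. Both points are assumptions to check against your own output. `base_url`, `domain_list` and `read_domain_list` are hypothetical names.

```python
from urllib.parse import urlsplit

def base_url(url):
    """Return scheme://host/ for a URL, or None if it has no scheme or host."""
    parts = urlsplit(url.strip())
    if not parts.scheme or not parts.netloc:
        return None
    return f"{parts.scheme}://{parts.netloc.lower()}/"

def domain_list(lines):
    """Collapse URLs into a de-duplicated list of base URLs, in first-seen order."""
    seen = {}
    for line in lines:
        base = base_url(line)
        if base is not None:
            seen.setdefault(base, None)  # dict keys keep insertion order
    return list(seen)

def read_domain_list(path):
    """Read a line-by-line URL file (assumed: one URL per line) and return its domains."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return domain_list(f)
```

A dict is used instead of a set so that the result keeps the order in which domains were first found.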
8. I have a URL list in a SQL database. I want to extract the URL, title, description, keywords and plain page text (HTML <BODY> to </BODY>) of each URL and merge them into the database.
- "Win Web Crawler" cannot access SQL databases. You need to export the URL list from the SQL database to a plain text file and use this file in "Win Web Crawler".
- Go to the New Session Dialog
- Select "Source = URLs from File"
- Enter the path of the URL file in the File name box. This must be a plain text file with one URL per line, each line starting with http://.
- Select "Process 1 Page Only" to extract the meta tags of the specified root domain only.
- To extract the meta tags of ALL pages of each website, select depth = 0 instead.
- Select Extract Meta Tag and Extract Body (you can set a text size limit by clicking the "..." button)
- Select the "Save Data" folder, i.e. where the program will save the data
- Select the Save Format: CSV
- For very large meta tag extractions, uncheck "View - Display data in data tab" so that "Win Web Crawler" writes the data directly to the disk file instead of displaying it in the program; this improves performance.
- Click the OK button
- After the extraction has completed, you can import the CSV file (metatag.txt) into your SQL database for further processing and querying.
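The export and import steps around the crawl in (8) can be sketched in Python, using `sqlite3` as a stand-in for "a SQL database". This is a minimal sketch under loud assumptions. The `sites` table and its columns are hypothetical. The column order of metatag.txt (`CSV_COLUMNS` below) is also a guess, so check it against a real export first. Rows are merged by updating the existing record that has the same URL.

```python
import csv
import sqlite3

# Hypothetical schema: sites(url, title, description, keywords, body).
# ASSUMED column order of metatag.txt; verify against a real export before use.
CSV_COLUMNS = ["url", "title", "description", "keywords", "body"]

def export_url_file(conn, path):
    """Before the crawl: write every URL in `sites` to a plain text file, one per line."""
    rows = conn.execute("SELECT url FROM sites ORDER BY url").fetchall()
    with open(path, "w", encoding="utf-8") as f:
        for (url,) in rows:
            f.write(url + "\n")
    return len(rows)

def import_metatag_csv(conn, path):
    """After the crawl: merge metatag.txt back into `sites`, matching rows on url."""
    updated = 0
    with open(path, encoding="utf-8", errors="replace", newline="") as f:
        for row in csv.reader(f):
            if len(row) < len(CSV_COLUMNS):
                continue  # skip short or malformed rows
            rec = dict(zip(CSV_COLUMNS, row))
            cur = conn.execute(
                "UPDATE sites SET title = ?, description = ?, keywords = ?, body = ? WHERE url = ?",
                (rec["title"], rec["description"], rec["keywords"], rec["body"], rec["url"]),
            )
            updated += cur.rowcount  # 0 when the url is not in the table (e.g. a header row)
    conn.commit()
    return updated
```

Matching on the URL with `UPDATE ... WHERE url = ?` leaves rows the crawler did not return untouched. If your URLs are not stored exactly as written to the file, normalize them the same way on both sides.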