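Robots that scrape web pages indiscriminately (usually from mainland China)
Typical traits of this kind of robot:
- It does not identify itself with a real User-Agent name and often poses as IE, e.g. Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
- It fetches several pages within the same second, or fetches pages at 1 to 3 second intervals (which does not match how a real person browses; the robot run by Trend Micro behaves exactly like this, which is quite annoying)
- <not thought of yet>
- <not thought of yet>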
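The two traits listed above (a fake IE User-Agent and very short gaps between requests) can be turned into a rough filter. The following is only a minimal sketch: the function name `find_suspects`, the `(ip, timestamp, user_agent)` tuple layout, and the `max_gap` / `min_hits` thresholds are assumptions made for illustration, not something taken from a real log format.

```python
from collections import defaultdict

# User-Agent string quoted in the list above; treating it as suspicious is a heuristic.
SUSPECT_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def find_suspects(requests, max_gap=3.0, min_hits=3):
    """Return the set of IPs that look like a fast-crawling robot.

    `requests` is an iterable of (ip, unix_timestamp, user_agent) tuples
    (a hypothetical layout). An IP is flagged when it sent at least
    `min_hits` requests with the suspect User-Agent and every gap between
    consecutive requests is at most `max_gap` seconds.
    """
    times_by_ip = defaultdict(list)
    for ip, ts, ua in requests:
        if ua == SUSPECT_UA:
            times_by_ip[ip].append(ts)

    suspects = set()
    for ip, times in times_by_ip.items():
        if len(times) < min_hits:
            continue
        times.sort()
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        if all(gap <= max_gap for gap in gaps):
            suspects.add(ip)
    return suspects
```

A real visitor on an old IE would also match the User-Agent check, so the timing condition is what carries most of the weight here; the thresholds would need tuning against actual traffic.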
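Detection method:
A simple approach is to place a hidden link on the page (one that ordinary visitors will not see and will not click),
but that a robot will still crawl.
For example: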
<a href="detect.php">.</a>
Any client that ends up requesting detect.php is a robot 80 to 90 percent of the time.
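The post does not show what detect.php contains. As a stand-in, here is a minimal Python sketch of the same idea: a small WSGI handler that records who requested the hidden link. The names `trap_app` and `trap_hits` are hypothetical, and keeping hits in memory is an assumption made to keep the example short.

```python
import time

# In-memory record of trap hits: {ip: [(timestamp, user_agent), ...]}.
# A real deployment would more likely write to a log file or a database.
trap_hits = {}


def trap_app(environ, start_response):
    """WSGI app meant to be served at the hidden link's URL."""
    ip = environ.get("REMOTE_ADDR", "unknown")
    ua = environ.get("HTTP_USER_AGENT", "")
    trap_hits.setdefault(ip, []).append((time.time(), ua))

    # Answer with an ordinary-looking empty page so the robot gets no hint.
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body></body></html>"]
```

For a quick local trial, it could be served with the standard library, for example `wsgiref.simple_server.make_server("", 8000, trap_app).serve_forever()`. If the site sits behind a proxy, `REMOTE_ADDR` will be the proxy's address, so the client IP would have to come from whatever header that proxy sets.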
Robot sources detected recently
150.70.64.199 wtp-gb-vvs3.sjdc
150.70.75.33 wtp-g3-maya7.sjdc
150.70.97.36 wtp-gs-maya1.sjdc
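150.70.172.107 iad1-wtp-gd-maya4.sdi.trendnet.org = whois data for this one shows Trend Micro, Inc / [email protected]
All of them pose as Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
They fetch 2 or 3 pages in the same second, or fetch pages continuously at 1 to 3 second intervals
Reverse DNS lookup for the IP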
ccc@localbox:/tmp$ nslookup 150.70.75.33
Server: 168.95.1.1
Address: 168.95.1.1#53
Non-authoritative answer:
33.75.70.150.in-addr.arpa name = wtp-g3-maya7.sjdc.
Authoritative answers can be found from:
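The same reverse lookup can be scripted when there are many trapped IPs to check. This is a rough sketch using only the Python standard library; the helper names `ptr_name` and `reverse_lookup` are made up for illustration, and the result of the live lookup depends on the resolver in use.

```python
import ipaddress
import socket


def ptr_name(ip):
    """Return the in-addr.arpa name that a reverse lookup queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Return the PTR hostname for `ip`, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
```

For 150.70.75.33, `ptr_name` yields `33.75.70.150.in-addr.arpa`, which is the name shown in the nslookup answer above.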
whois data
ccc@localbox:/tmp$ whois 150.70.75.33
% [whois.apnic.net node-5]
% Whois data copyright terms http://www.apnic.net/db/dbcopyright.html
inetnum: 150.26.0.0 - 150.100.255.255
netname: JAPAN150
country: JP
descr: Japan Network Information Center
admin-c: JNIC1-AP
tech-c: JNIC1-AP
status: ALLOCATED PORTABLE
notify: [email protected]
mnt-by: MAINT-JPNIC
changed: [email protected] 20070824
source: APNIC
role: Japan Network Information Center
address: Urbannet-Kanda Bldg 4F
address: 3-6-2 Uchi-Kanda
address: Chiyoda-ku, Tokyo 101-0047,Japan
country: JP
phone: +81-3-5297-2311
fax-no: +81-3-5297-2312
e-mail: [email protected]
admin-c: JI13-AP
tech-c: JE53-AP
nic-hdl: JNIC1-AP
mnt-by: MAINT-JPNIC
changed: [email protected] 20041222
changed: [email protected] 20050324
changed: [email protected] 20051027
changed: [email protected] 20120828
source: APNIC
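The whois answer only gives a broad `inetnum` range, so one small thing that can be automated is checking whether a trapped IP falls inside a range copied from such output. A minimal sketch, assuming the range is written as `first - last` as in the listing above; the function name `in_inetnum` is hypothetical.

```python
import ipaddress


def in_inetnum(ip, inetnum):
    """Check whether `ip` lies inside a whois range such as
    '150.26.0.0 - 150.100.255.255'."""
    first_text, last_text = (part.strip() for part in inetnum.split("-"))
    first = ipaddress.ip_address(first_text)
    last = ipaddress.ip_address(last_text)
    return first <= ipaddress.ip_address(ip) <= last
```

With the range from this whois output, all four addresses listed earlier fall inside it, while the resolver address 168.95.1.1 does not. Note that this range belongs to the regional registry allocation (JAPAN150), so matching it says little about who actually operates a given address.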
ccc@localbox:/tmp$ traceroute 150.70.75.33
traceroute to 150.70.75.33 (150.70.75.33), 30 hops max, 38 byte packets
1 192.168.0.2 (192.168.0.2) 1.866 ms 0.645 ms 0.602 ms
2 h254.s98.ts.hinet.net (168.95.98.254) 7.660 ms 7.384 ms 7.412 ms
3 TPE4-3302.hinet.net (168.95.101.202) 6.836 ms 7.107 ms 7.233 ms
4 TPE4-3201.hinet.net (220.128.5.170) 16.367 ms TPE4-3202.hinet.net (220.128.5.26) 7.739 ms 7.522 ms
5 TPDT-3012.hinet.net (220.128.2.170) 8.101 ms TPDT-3012.hinet.net (220.128.2.110) 7.977 ms 8.100 ms
6 TPDT-4101.hinet.net (220.128.7.201) 24.226 ms 7.017 ms 7.466 ms
7 r4001-s2.tp.hinet.net (220.128.7.205) 7.264 ms r4001-s2.tp.hinet.net (220.128.7.213) 7.628 ms r4001-s2.tp.hinet.net (220.128.7.217) 7.668 ms
8 r01-pa.us.hinet.net (211.72.108.237) 142.177 ms 141.997 ms r01-pa.us.hinet.net (211.72.108.197) 139.601 ms
9 12.118.116.13 (12.118.116.13) 147.940 ms 145.333 ms 12.118.116.73 (12.118.116.73) 144.026 ms
10 cr83.sffca.ip.att.net (12.122.137.78) 155.294 ms 152.283 ms 148.160 ms
11 cr1.sffca.ip.att.net (12.123.15.109) 142.225 ms 142.207 ms 146.168 ms
12 ggr4.sffca.ip.att.net (12.122.86.197) 197.720 ms 144.424 ms 143.970 ms
13 ae-8.r06.snjsca04.us.bb.gin.ntt.net (129.250.8.241) 144.458 ms 147.389 ms 145.427 ms
14 ae-6.r20.snjsca04.us.bb.gin.ntt.net (129.250.5.12) 144.301 ms 141.917 ms 144.441 ms
15 ae-2.r20.mlpsca01.us.bb.gin.ntt.net (129.250.5.6) 145.980 ms 143.420 ms 145.698 ms
16 * * *
17 d1-1-3-0-8.a02.mlpsca01.us.ce.verio.net (131.103.120.46) 145.146 ms 148.968 ms 157.842 ms
18 216.99.143.93 (216.99.143.93) 145.407 ms 143.632 ms 144.931 ms
19 216.99.143.117 (216.99.143.117) 149.505 ms 142.684 ms 152.196 ms
20 150.70.75.33 (150.70.75.33) 142.761 ms 146.625 ms 150.916 ms