nidBox

note: MooseFS

2011-03-01 10:20


Messages in /var/log/syslog

ref: http://www.moosefs.org/moosefs-faq.html#writeworker

What does 'file: NNN, index: NNN, chunk: NNN, version: NNN - writeworker: connection with (XXXXXXXX:PPPP) was timed out (unfinished writes: Y; try counter: Z)' message mean?

 

This means that the Zth attempt to write the chunk was unsuccessful and that Y blocks sent to the chunkserver were not confirmed. After reconnecting, these blocks are sent again to be saved. By default the limit is 30 attempts.

This message is for informational purposes and doesn't mean data loss.


Actual messages, for example:

Mar  1 05:31:11 zoo1 mfsmount[12752]: file: 1620078, index: 0, chunk: 2058807, version: 1 - writeworker: connection with (C0A801C8:9422) was timed out (unfinished writes: 2; try counter: 1)
Mar  1 09:54:19 zoo1 mfsmount[12752]: file: 1620999, index: 0, chunk: 2059631, version: 1 - writeworker: connection with (C0A801C8:9422) was timed out (unfinished writes: 2; try counter: 1)
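
To pull the fields out of such messages, a minimal Python sketch follows. It assumes each message sits on a single line in syslog; the pattern is written from the message shape quoted above, and the names WRITEWORKER_RE and parse_writeworker are made up for this example.

```python
import re

# Pattern for the mfsmount "writeworker" timeout message quoted above.
WRITEWORKER_RE = re.compile(
    r"file: (?P<file>\d+), index: (?P<index>\d+), chunk: (?P<chunk>\d+), "
    r"version: (?P<version>\d+) - writeworker: connection with "
    r"\((?P<cs_hex>[0-9A-Fa-f]{8}):(?P<port>\d+)\) was timed out "
    r"\(unfinished writes: (?P<unfinished>\d+); try counter: (?P<tries>\d+)\)"
)


def parse_writeworker(line):
    """Return the message fields as a dict, or None if the line does not match."""
    m = WRITEWORKER_RE.search(line)
    if m is None:
        return None
    # Every field except the hex chunkserver address is a decimal integer.
    return {k: (v if k == "cs_hex" else int(v)) for k, v in m.groupdict().items()}


sample = (
    "Mar  1 05:31:11 zoo1 mfsmount[12752]: file: 1620078, index: 0, "
    "chunk: 2058807, version: 1 - writeworker: connection with "
    "(C0A801C8:9422) was timed out (unfinished writes: 2; try counter: 1)"
)
print(parse_writeworker(sample))
```

A rising "tries" value for the same chunk is the thing to watch, since the FAQ says the limit is 30.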


What does 'file: NNN, index: NNN, chunk: NNN, version: NNN, cs: XXXXXXXX:PPPP - readblock error (try counter: Z)' message mean?

 

This means that the Zth attempt to read the chunk was unsuccessful and the system will try to read the block again. If Z equals 1, it is a transitory problem and you should not worry about it. By default the limit is 30 attempts.


XXXXXXXX is a chunkserver IP written in hexadecimal format. This can be helpful for determining if there is an issue with a chunkserver. When messages appear more often from one chunkserver than from the others it may mean there are issues with this chunkserver – check its charts, hard disk operation times, etc. in the CGI monitor.
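
As a small illustration of the hexadecimal address, the Python sketch below decodes the XXXXXXXX:PPPP pair and counts messages per chunkserver. It assumes the eight hex digits are the IPv4 address with the first octet first; the function names are made up for this example.

```python
import ipaddress
from collections import Counter


def decode_chunkserver(addr):
    """Convert 'XXXXXXXX:PPPP' (hex IPv4, decimal port) to ('a.b.c.d', port)."""
    hex_ip, port = addr.split(":")
    ip = ipaddress.IPv4Address(int(hex_ip, 16))
    return str(ip), int(port)


def count_by_chunkserver(addrs):
    """Count how many messages refer to each chunkserver IP."""
    return Counter(decode_chunkserver(a)[0] for a in addrs)


print(decode_chunkserver("C0A801C8:9422"))  # ('192.168.1.200', 9422)
print(count_by_chunkserver(["C0A801C8:9422", "C0A801C8:9422"]))
```

So the chunkserver in the messages above is 192.168.1.200, port 9422.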

 
 



fuse is clearly installed, and the version is recent,
but the mfs configure step still prints
..
..
checking for FUSE... no
******************************** mfsmount disabled ********************************
* fuse library is too old or not installed - mfsmount needs version 2.6 or higher *
***********************************************************************************
..
 

Solution

Reinstall fuse with

./configure --prefix=/usr

(Do not use ./configure --prefix=/usr/local/fuse)
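
A possible explanation, offered as an assumption and not something these notes confirm: the MooseFS configure script may look up FUSE through pkg-config, and a FUSE built with a non-standard prefix installs its fuse.pc where pkg-config does not search by default. A minimal Python sketch to see what pkg-config reports (the helper names are made up for this example):

```python
import subprocess


def fuse_version_from_pkg_config():
    """Return the FUSE version reported by pkg-config, or None if not found."""
    try:
        out = subprocess.run(
            ["pkg-config", "--modversion", "fuse"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return None  # pkg-config itself is not installed
    return out.stdout.strip() if out.returncode == 0 else None


def version_at_least(version, minimum):
    """Compare dotted numeric versions, e.g. '2.8.4' against '2.6'."""
    def as_tuple(v):
        return tuple(int(p) for p in v.split(".") if p.isdigit())
    return as_tuple(version) >= as_tuple(minimum)


found = fuse_version_from_pkg_config()
if found is None:
    print("pkg-config cannot find fuse; check PKG_CONFIG_PATH")
else:
    print(found, "ok" if version_at_least(found, "2.6") else "too old")
```

If this prints the "cannot find" message while fuse is installed, the installation prefix is the likely culprit.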





Size of chunk data
The minimum is 70656 bytes (plus an additional header of 4+ KB)
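
One way to read this figure, assuming (this is not stated in the notes) that chunk data is written in 64 KiB blocks: subtracting one block from 70656 leaves the size of the extra portion. A small sketch of the arithmetic:

```python
BLOCK_SIZE = 64 * 1024   # assumed data block size (64 KiB), not from the notes
MIN_CHUNK_FILE = 70656   # minimum size recorded in the notes

remainder = MIN_CHUNK_FILE - BLOCK_SIZE
print(remainder, remainder / 1024)  # 5120 5.0
```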


Chunk replication is very slow:
only 60~65 chunks per minute.
When several chunk servers are attached at the same time, each one still replicates at 60~65 chunks per minute.
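
To get a feel for what 60~65 chunks per minute means, here is a rough Python estimate. The total chunk count is a hypothetical number chosen for illustration, and the rate is assumed to stay constant.

```python
def replication_days(chunks, chunks_per_minute):
    """Days needed to replicate `chunks` at a constant per-minute rate."""
    return chunks / chunks_per_minute / (60 * 24)


TOTAL_CHUNKS = 1_000_000  # hypothetical chunk count, for illustration only
for rate in (60, 65):
    print(rate, round(replication_days(TOTAL_CHUNKS, rate), 1))
```

At these rates a million chunks take well over a week to replicate.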



2012-01-16
This is mentioned here:
http://www.moosefs.org/moosefs-faq.html#rebalancing-speed

Our experience from working in a production environment has shown that aggressive replication is not desirable, as it can substantially slow down the whole system. The overall performance of the system is more important than equal utilization of hard drives over all of the chunk servers. By default replication is configured to be a non-aggressive operation. In our environment it normally takes 3-4 weeks for a new chunkserver to reach a standard HDD utilization. Aggressive replication would make the whole system considerably slower for several days.

Replication speeds can be adjusted on master server startup by setting these two options:
CHUNKS_WRITE_REP_LIMIT
how many chunks may be saved in parallel on one chunkserver while replicating (by default 1).
CHUNKS_READ_REP_LIMIT
how many chunks may be read in parallel from one chunkserver while replicating (by default 5).

Tuning these in your environment will require experimentation. When adding a new chunkserver, try setting the first option to 5 and the second to 15 and restart the master server. After replication finishes you should restore these settings back to their default values and again restart the master server.

mfsmaster.cfg:
CHUNKS_WRITE_REP_LIMIT = 5   (default 1)
CHUNKS_READ_REP_LIMIT = 15   (default 5)
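
Since the FAQ recommends switching these values and later restoring the defaults, a small helper can apply either set. The Python sketch below assumes mfsmaster.cfg consists of NAME = VALUE lines with # marking comments; set_options is a made-up name, and restarting the master afterwards is still a manual step.

```python
import re


def set_options(cfg_text, options):
    """Return cfg_text with each 'NAME = VALUE' line set; append missing ones."""
    lines = cfg_text.splitlines()
    remaining = dict(options)
    for i, line in enumerate(lines):
        # Match active or commented-out assignments such as '# NAME = 1'.
        m = re.match(r"^\s*#?\s*([A-Z_]+)\s*=", line)
        if m and m.group(1) in remaining:
            name = m.group(1)
            lines[i] = f"{name} = {remaining.pop(name)}"
    # Options not present in the file are appended at the end.
    lines.extend(f"{n} = {v}" for n, v in remaining.items())
    return "\n".join(lines) + "\n"


FAST = {"CHUNKS_WRITE_REP_LIMIT": 5, "CHUNKS_READ_REP_LIMIT": 15}
DEFAULTS = {"CHUNKS_WRITE_REP_LIMIT": 1, "CHUNKS_READ_REP_LIMIT": 5}

sample_cfg = "# CHUNKS_WRITE_REP_LIMIT = 1\n"
print(set_options(sample_cfg, FAST))
```

Calling set_options again with DEFAULTS puts the original values back once replication has finished.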