MooseFS 3.0 notes

2018-08-10 13:08
Environment:
MooseFS 3.0


The chunkserver's two existing disks were full.
After installing a new disk,
data is automatically redistributed evenly across all three disks (this is called an internal rebalance),
so the three disks end up with very similar space utilization.
  ==> Note: it does not equalize the chunk count per disk, nor the used space (in GB) per disk
  ==> it equalizes each disk's usage PERCENTAGE

For example, starting with two disks:
/dev/sdb2 720G  old disk
/dev/sda2 720G  old disk
/dev/sdf1 1.8T  newly added

After the rebalance completes:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
:  :  :
/dev/sdb2       720G  332G  352G   49% /mnt/sdc2    <-- old disk
/dev/sda2       720G  332G  353G   49% /mnt/sda2    <-- old disk
/dev/sdf1       1.8T  833G  909G   48% /mnt/disk31  <-- newly added disk
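
Sanity check on the numbers: total used (332G + 332G + 833G ≈ 1.5 TB) over total capacity (720G + 720G + 1.8T ≈ 3.2 TB) is about 46%, so every disk settles near the same percentage; the 48-49% shown comes from df rounding and reserved blocks. To watch the percentages converge during a rebalance (GNU df; the mount points are the ones from this box, adjust as needed):

$ watch -n 60 df --output=target,pcent /mnt/sda2 /mnt/sdc2 /mnt/disk31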


So if a newly added disk is much larger than the old ones,
the amount of chunk data to move can be substantial, and the resulting capacity allocation can look odd.




During the rebalance, /var/log/messages fills up with move records like these:
Aug 10 11:15:54 box204 mfschunkserver[2824]: move chunk /mnt/sda2/mfschunks/ -> /mnt/disk31/mfschunks/ (chunk: 00000000000003F2_00000001)
Aug 10 11:15:54 box204 mfschunkserver[2824]: move chunk /mnt/sdc2/mfschunks/ -> /mnt/disk31/mfschunks/ (chunk: 0000000000009003_00000001)
Aug 10 11:15:54 box204 mfschunkserver[2824]: move chunk /mnt/sda2/mfschunks/ -> /mnt/disk31/mfschunks/ (chunk: 0000000000001E2A_00000001)
Aug 10 11:15:57 box204 mfschunkserver[2824]: move chunk /mnt/sdc2/mfschunks/ -> /mnt/disk31/mfschunks/ (chunk: 0000000000021C9D_00000001)
Aug 10 11:15:57 box204 mfschunkserver[2824]: move chunk /mnt/sda2/mfschunks/ -> /mnt/disk31/mfschunks/ (chunk: 0000000000003EF6_00000001)
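
A one-liner to count how many chunks have been moved between each disk pair so far (the awk field positions match the log lines above):

$ grep 'move chunk' /var/log/messages | awk '{print $8, "->", $10}' | sort | uniq -c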



 

The documentation says:
4.2.8 Chunkserver states
Chunkserver can work in 3 states:
normal, overloaded and (since MooseFS 3.0.62) internal rebalance

• Normal
state is a standard state. In "Servers" CGI tab you can see load as a normal number, e.g.: 7

• Internal rebalance
state is a special Chunkserver state. It is activated when e.g. you add a new, empty HDD to a Chunkserver. Then Chunkserver enters this special mode and rebalances chunks between all HDDs to make all HDDs utilization as close to equal as possible. In "Servers" CGI tab you can see load as a number in round brackets, e.g.: (7).

• Overloaded
is a special, temporary Chunkserver state. It is activated when Chunkserver load is high and Chunkserver is not able to perform more operations at the moment. In such case, Chunkserver sends an information to Master Server that it is overloaded. If the load lowers to the normal level, Chunkserver sends an information to Master Server that it is not overloaded any more. In "Servers" CGI tab you can see load as a number in square brackets, e.g.: [7].



Master RAM requirements:

The most important factor in sizing requirements for the Master Server machine is RAM, as
the full file system structure is cached in RAM for speed. The Master Server should have
approximately 300-350 MiB of RAM allocated to handle 1 million objects (files, directories,
pipes, sockets, ...)
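
Worked example: at roughly 325 MiB per million objects, a 10-million-object filesystem needs about 3.0-3.5 GiB of RAM on the Master, plus headroom for the OS itself.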



Chunkserver RAM requirements:

MooseFS Chunkserver uses approximately 250 MiB of RAM allocated to handle 1 million chunks.
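
Worked example: a chunkserver holding 4 million chunks needs about 1 GiB of RAM for chunk bookkeeping. 4 million chunks is at most ~256 TiB of data (64 MiB per chunk), much less in practice since most chunks are not full.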
 

Metalogger RAM requirements:

The Metalogger just takes periodic metadata backups, so its hardware demands are modest:

MooseFS metalogger simply gathers metadata backups from the MooseFS Master Server – so
the hardware requirements are not higher than for the Master Server itself; it needs about the
same disk space. Similarly to the Master Server – the OS has to be POSIX compliant (Linux,
FreeBSD, Mac OS X, OpenSolaris, etc.).
MooseFS Metalogger should have at least the same amount of HDD space
(especially the free space in /var/lib/mfs ) as the main Master Server.
If you would like to use the Metalogger as a Master Server in case of the main Master’s failure,
the Metalogger machine should have at least the same amount of RAM as the main Master
Server.
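
A rough sketch of promoting the Metalogger machine to Master after a failure. This is an assumption-heavy outline, not a verified procedure: the file names (metadata_ml.mfs.back, changelog_ml.*.mfs) and the /var/lib/mfs path are MooseFS 3.0 defaults, so check your installation first:

$ cd /var/lib/mfs
$ cp metadata_ml.mfs.back metadata.mfs.back                    # metalogger's metadata backup
$ for f in changelog_ml.*.mfs; do cp "$f" "${f/_ml/}"; done    # changelog_ml.N.mfs -> changelog.N.mfs
$ mfsmaster -a                                                 # -a: auto-recover metadata from changelogs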




Deleting files



MooseFS does not immediately erase files on deletion, to allow you to revert the delete operation. Deleted files are kept in the trash bin for the configured amount of time (default: 24h / 86400 seconds) before they are deleted.
You can configure for how long files are kept in trash and empty the trash manually (to release the space).


The trash time is set with the command-line tools:

$ mfsgettrashtime  /home/mfsmount
.: 86400

$ mfssettrashtime -r 3600 deldolder/
deltest/:
 inodes with trashtime changed:              0
 inodes with trashtime not changed:      41576
 inodes with permission denied:           3744

Important! Although the value is specified in seconds, MooseFS still rounds it UP to whole hours.
In other words, with mfssettrashtime -r 60 deldolder/
the system still sets 1 hour,
and setting 3601 amounts to 2 hours (always rounded up).

You can also write it as:
$ mfssettrashtime -r 3h deldolder/
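
You can then verify what was actually stored; per the rounding rule above it should report 3600 (output format as in the mfsgettrashtime example earlier):

$ mfsgettrashtime deldolder/
deldolder/: 3600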


 

 

Recovering deleted files


** Recovery is only possible before the MooseFS server purges the file from the trash, i.e. normally within 86400 seconds of deletion.


$ mfsmount -m /mnt/mfstrash -H mfsmaster
or
$ mfsmount -o mfsmeta -H mfsmaster /mnt/mfstrash

$ ls -l /mnt/mfstrash
total 0
dr-x------.    2 root root 0 Aug 13 15:47 sustained
drwx------. 4099 root root 0 Aug 13 15:47 trash

$ ls -l /mnt/mfstrash/trash/
total 0
drwx------.    3 root root 0 Aug 13 15:49 000
drwx------.    3 root root 0 Aug 13 15:49 001
drwx------.    3 root root 0 Aug 13 15:49 002
drwx------.    3 root root 0 Aug 13 15:49 003
drwx------.    3 root root 0 Aug 13 15:49 004
drwx------.    3 root root 0 Aug 13 15:49 005
drwx------.    3 root root 0 Aug 13 15:49 006
drwx------.    3 root root 0 Aug 13 15:49 007

$ ls -l /mnt/mfstrash/trash/000
total 297
-rw-rw-r--.    1 root  root   15265 Aug  8 13:54 00013000|htdocs|m|home_top_19.jpg
-rw-rw-r--.    1 root  root  213581 Aug  8 14:05 00014000|htdocs|a|2020|p7e02o_6e8bdbsc.jpg
-rw-rw-r--.    1 root  root    6144 Aug  8 14:15 00016000|htdocs|img|calendar|Thumbs.db
-rw-rw-r--.    1 root  root       0 Aug  8 14:19 00017000|htdocs|img|object|Thumbs.db
-rw-rw-r--.    1 root  root     993 Aug  8 14:20 00018000|htdocs|js|ui|themes|ui.resizable.css
d-w-------. 4098 root root      0 Aug 13 15:49 undel


To recover a file,
move its trash entry (one of the 000... files above) into the undel directory.

For example:
$ mv /mnt/mfstrash/trash/000/000* /mnt/mfstrash/trash/000/undel

The annoying part: you don't know which of the 000 ~ FFF folders a deleted file landed in!



Search for it with find. Note the trash entry name carries the inode prefix and |-separated path, so match with a leading wildcard:
$ find /mnt/mfstrash/trash -name '*home_top_19.jpg'
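
Combining the two steps, a small loop recovers every match in one go (a sketch, assuming the meta mount at /mnt/mfstrash; -mindepth/-maxdepth keep find at the trash/XXX/entry level and out of the undel dirs):

$ find /mnt/mfstrash/trash -mindepth 2 -maxdepth 2 -name '*home_top_19.jpg' \
    | while read -r f; do mv "$f" "$(dirname "$f")/undel/"; done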


Related commands


Environment: 4 chunkservers:
192.168.0.203
192.168.0.204
192.168.0.206
192.168.0.239


$ mfsgetgoal  abc.txt
abc.txt: 3
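
The goal (number of copies) is changed with mfssetgoal, e.g. to keep only 2 copies of everything under a directory (somedir is a placeholder name):

$ mfssetgoal -r 2 somedir/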

Check which chunkservers hold the copies of a file (with goal 3, each chunk gets three copies):
$ /usr/bin/mfsfileinfo abc.txt   (file size: 25 MB)
abc.txt:
        chunk 0: 000000000000A118_00000001 / (id:41240 ver:1)
                copy 1: 192.168.0.203:9422 (status:VALID)
                copy 2: 192.168.0.204:9422 (status:VALID)
                copy 3: 192.168.0.206:9422 (status:VALID)


Same check on a larger file. A chunk holds at most 64 MiB, so a 130 MB file is split into three chunks:
$ /usr/bin/mfsfileinfo a20.img.gz   (file size: 130 MB)
a20.img.gz:
        chunk 0: 0000000000038FB6_00000001 / (id:233398 ver:1)
                copy 1: 192.168.0.204:9422 (status:VALID)
                copy 2: 192.168.0.206:9422 (status:VALID)
                copy 3: 192.168.0.239:9422 (status:VALID)
        chunk 1: 0000000000038FB7_00000001 / (id:233399 ver:1)
                copy 1: 192.168.0.203:9422 (status:VALID)
                copy 2: 192.168.0.204:9422 (status:VALID)
                copy 3: 192.168.0.206:9422 (status:VALID)
        chunk 2: 0000000000038FB8_00000001 / (id:233400 ver:1)
                copy 1: 192.168.0.203:9422 (status:VALID)
                copy 2: 192.168.0.204:9422 (status:VALID)
                copy 3: 192.168.0.239:9422 (status:VALID)




Snapshot

makes a "real" snapshot (lazy copy, like in case of mfsappendchunks) of some object(s) or subtree (similarly to the cp -r command). It's atomic with respect to each SOURCE argument separately. If DESTINATION points to an already existing file, an error will be reported unless the -o (overwrite) option is given. Note: if SOURCE is a directory, it's copied as a whole; but if it's followed by a trailing slash, only the directory content is copied.


With the filesystem mounted under /mnt/mfs
( mfsmount /mnt/mfs -H mfsmaster ):


$ pwd
/mnt/mfs

$ ls -l
drwxr-xr-x.  4 root  root  3000140 Aug  6 13:10 App
drwxr-xr-x. 12 root  root   3000151 Aug  7 12:14 download
drwxrwxr-x.  5 root  root   3003474 Aug  7 10:40 ISO

$ mfsmakesnapshot App App2

$ ls -l
drwxr-xr-x.  4 root  root  3000140 Aug  6 13:10 App
drwxr-xr-x.  4 root  root  3000140 Aug 13 16:14 App2
drwxr-xr-x. 12 root  root   3000151 Aug  7 12:14 download
drwxrwxr-x.  5 root  root   3003474 Aug  7 10:40 ISO

Note that this does NOT work, because both paths must be inside the same MooseFS mount:
$ mfsmakesnapshot /mnt/mfs /mnt/mfs_snapshop
(/mnt/mfs_snapshop,/mnt/mfs): both elements must be on the same device
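
A destination inside the same mount works fine, including the -o flag to overwrite an existing target (snapdir is an arbitrary example name):

$ mkdir /mnt/mfs/snapdir
$ mfsmakesnapshot /mnt/mfs/App /mnt/mfs/snapdir/
$ mfsmakesnapshot -o App App2    # overwrite the App2 created earlier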



Miscellaneous

2019-03:
With two existing chunkservers,
after installing a third, new chunkserver,
on a 1 Gb network:
191 GB of chunks were replicated in 60 minutes,
958 GB in 4.5 hours.
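
That works out to roughly 53-59 MB/s sustained (191 GB / 3600 s ≈ 53 MB/s; 958 GB / 16200 s ≈ 59 MB/s), i.e. about half the ~125 MB/s theoretical ceiling of a 1 Gb link.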