MooseFS 3.0 notes

2018-08-10 13:08
Environment:
MooseFS 3.0


The chunkserver's two existing disks were full.
After installing a new disk,
data is automatically redistributed evenly across all three disks (this is called an internal rebalance),
so the three disks end up with very similar space utilization.
  ==> Note: it does not equalize the chunk count per disk, nor the used space (in GB) per disk
  ==> it equalizes each disk's usage PERCENTAGE

For example, starting with two disks:
/dev/sdb2 720G  old disk
/dev/sda2 720G  old disk
/dev/sdf1 1.8T  newly added

After the rebalance completes:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
:  :  :
/dev/sdb2       720G  332G  352G   49% /mnt/sdc2    <-- old disk
/dev/sda2       720G  332G  353G   49% /mnt/sda2    <-- old disk
/dev/sdf1       1.8T  833G  909G   48% /mnt/disk31  <-- newly added disk
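
Sanity check on the numbers: total used (332G + 332G + 833G ≈ 1.5 TB) over total capacity (720G + 720G + 1.8T ≈ 3.2 TB) is about 46%, so every disk settles near the same percentage; the 48-49% shown comes from df rounding and reserved blocks. To watch the percentages converge during a rebalance (GNU df; the mount points are the ones from this box, adjust as needed):

$ watch -n 60 df --output=target,pcent /mnt/sda2 /mnt/sdc2 /mnt/disk31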


So if a newly added disk is much larger than the old ones,
the amount of chunk data to move can be substantial, and the resulting capacity allocation can look odd.




During the rebalance, /var/log/messages fills up with move records like these:
Aug 10 11:15:54 box204 mfschunkserver[2824]: move chunk /mnt/sda2/mfschunks/ -> /mnt/disk31/mfschunks/ (chunk: 00000000000003F2_00000001)
Aug 10 11:15:54 box204 mfschunkserver[2824]: move chunk /mnt/sdc2/mfschunks/ -> /mnt/disk31/mfschunks/ (chunk: 0000000000009003_00000001)
Aug 10 11:15:54 box204 mfschunkserver[2824]: move chunk /mnt/sda2/mfschunks/ -> /mnt/disk31/mfschunks/ (chunk: 0000000000001E2A_00000001)
Aug 10 11:15:57 box204 mfschunkserver[2824]: move chunk /mnt/sdc2/mfschunks/ -> /mnt/disk31/mfschunks/ (chunk: 0000000000021C9D_00000001)
Aug 10 11:15:57 box204 mfschunkserver[2824]: move chunk /mnt/sda2/mfschunks/ -> /mnt/disk31/mfschunks/ (chunk: 0000000000003EF6_00000001)
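
A one-liner to count how many chunks have been moved between each disk pair so far (the awk field positions match the log lines above):

$ grep 'move chunk' /var/log/messages | awk '{print $8, "->", $10}' | sort | uniq -c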



 

The documentation says:
4.2.8 Chunkserver states
Chunkserver can work in 3 states:
normal, overloaded and (since MooseFS 3.0.62) internal rebalance

• Normal
state is a standard state. In "Servers" CGI tab you can see load as a normal number, e.g.: 7

• Internal rebalance
state is a special Chunkserver state. It is activated when e.g. you add a new, empty HDD to a Chunkserver. Then Chunkserver enters this special mode and rebalances chunks between all HDDs to make all HDDs utilization as close to equal as possible. In "Servers" CGI tab you can see load as a number in round brackets, e.g.: (7).

• Overloaded
is a special, temporary Chunkserver state. It is activated when Chunkserver load is high and Chunkserver is not able to perform more operations at the moment. In such case, Chunkserver sends an information to Master Server that it is overloaded. If the load lowers to the normal level, Chunkserver sends an information to Master Server that it is not overloaded any more. In "Servers" CGI tab you can see load as a number in square brackets, e.g.: [7].



Master RAM requirements:

The most important factor in sizing requirements for the Master Server machine is RAM, as
the full file system structure is cached in RAM for speed. The Master Server should have
approximately 300-350 MiB of RAM allocated to handle 1 million objects (files, directories,
pipes, sockets, ...)
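
Worked example: at roughly 325 MiB per million objects, a 10-million-object filesystem needs about 3.0-3.5 GiB of RAM on the Master, plus headroom for the OS itself.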



Chunkserver RAM requirements:

MooseFS Chunkserver uses approximately 250 MiB of RAM allocated to handle 1 million chunks.
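
Worked example: a chunkserver holding 4 million chunks needs about 1 GiB of RAM for chunk bookkeeping. 4 million chunks is at most ~256 TiB of data (64 MiB per chunk), much less in practice since most chunks are not full.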
 

Metalogger RAM requirements:

The Metalogger just takes periodic metadata backups, so its hardware demands are modest:

MooseFS metalogger simply gathers metadata backups from the MooseFS Master Server – so
the hardware requirements are not higher than for the Master Server itself; it needs about the
same disk space. Similarly to the Master Server – the OS has to be POSIX compliant (Linux,
FreeBSD, Mac OS X, OpenSolaris, etc.).
MooseFS Metalogger should have at least the same amount of HDD space
(especially the free space in /var/lib/mfs ) as the main Master Server.
If you would like to use the Metalogger as a Master Server in case of the main Master’s failure,
the Metalogger machine should have at least the same amount of RAM as the main Master
Server.
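
A rough sketch of promoting the Metalogger machine to Master after a failure. This is an assumption-heavy outline, not a verified procedure: the file names (metadata_ml.mfs.back, changelog_ml.*.mfs) and the /var/lib/mfs path are MooseFS 3.0 defaults, so check your installation first:

$ cd /var/lib/mfs
$ cp metadata_ml.mfs.back metadata.mfs.back                    # metalogger's metadata backup
$ for f in changelog_ml.*.mfs; do cp "$f" "${f/_ml/}"; done    # changelog_ml.N.mfs -> changelog.N.mfs
$ mfsmaster -a                                                 # -a: auto-recover metadata from changelogs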




Deleting files



MooseFS does not immediately erase files on deletion, to allow you to revert the delete operation. Deleted files are kept in the trash bin for the configured amount of time (default: 24h / 86400 seconds) before they are deleted.
You can configure for how long files are kept in trash and empty the trash manually (to release the space).


The trash time is set with the command-line tools:

$ mfsgettrashtime  /home/mfsmount
.: 86400

$ mfssettrashtime -r 3600 deldolder/
deltest/:
 inodes with trashtime changed:              0
 inodes with trashtime not changed:      41576
 inodes with permission denied:           3744

Important! Although the value is specified in seconds, MooseFS still rounds it UP to whole hours.
In other words, with mfssettrashtime -r 60 deldolder/
the system still sets 1 hour,
and setting 3601 amounts to 2 hours (always rounded up).

You can also write it as:
$ mfssettrashtime -r 3h deldolder/
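
You can then verify what was actually stored; per the rounding rule above it should report 3600 (output format as in the mfsgettrashtime example earlier):

$ mfsgettrashtime deldolder/
deldolder/: 3600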


 

 

Recovering deleted files


** Recovery is only possible before the MooseFS server purges the file from the trash, i.e. normally within 86400 seconds of deletion.


$ mfsmount -m /mnt/mfstrash -H mfsmaster
or
$ mfsmount -o mfsmeta -H mfsmaster /mnt/mfstrash

$ ls -l /mnt/mfstrash
total 0
dr-x------.    2 root root 0 Aug 13 15:47 sustained
drwx------. 4099 root root 0 Aug 13 15:47 trash

$ ls -l /mnt/mfstrash/trash/
total 0
drwx------.    3 root root 0 Aug 13 15:49 000
drwx------.    3 root root 0 Aug 13 15:49 001
drwx------.    3 root root 0 Aug 13 15:49 002
drwx------.    3 root root 0 Aug 13 15:49 003
drwx------.    3 root root 0 Aug 13 15:49 004
drwx------.    3 root root 0 Aug 13 15:49 005
drwx------.    3 root root 0 Aug 13 15:49 006
drwx------.    3 root root 0 Aug 13 15:49 007

$ ls -l /mnt/mfstrash/trash/000
total 297
-rw-rw-r--.    1 root  root   15265 Aug  8 13:54 00013000|htdocs|m|home_top_19.jpg
-rw-rw-r--.    1 root  root  213581 Aug  8 14:05 00014000|htdocs|a|2020|p7e02o_6e8bdbsc.jpg
-rw-rw-r--.    1 root  root    6144 Aug  8 14:15 00016000|htdocs|img|calendar|Thumbs.db
-rw-rw-r--.    1 root  root       0 Aug  8 14:19 00017000|htdocs|img|object|Thumbs.db
-rw-rw-r--.    1 root  root     993 Aug  8 14:20 00018000|htdocs|js|ui|themes|ui.resizable.css
d-w-------. 4098 root root      0 Aug 13 15:49 undel


To recover a file,
move its trash entry (one of the 000... files above) into the undel directory.

For example:
$ mv /mnt/mfstrash/trash/000/000* /mnt/mfstrash/trash/000/undel

The annoying part: you don't know which of the 000 ~ FFF folders a deleted file landed in!



Search for it with find. Note the trash entry name carries the inode prefix and |-separated path, so match with a leading wildcard:
$ find /mnt/mfstrash/trash -name '*home_top_19.jpg'
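
Combining the two steps, a small loop recovers every match in one go (a sketch, assuming the meta mount at /mnt/mfstrash; -mindepth/-maxdepth keep find at the trash/XXX/entry level and out of the undel dirs):

$ find /mnt/mfstrash/trash -mindepth 2 -maxdepth 2 -name '*home_top_19.jpg' \
    | while read -r f; do mv "$f" "$(dirname "$f")/undel/"; done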


Related commands


Environment: 4 chunkservers:
192.168.0.203
192.168.0.204
192.168.0.206
192.168.0.239


$ mfsgetgoal  abc.txt
abc.txt: 3
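
The goal (number of copies) is changed with mfssetgoal, e.g. to keep only 2 copies of everything under a directory (somedir is a placeholder name):

$ mfssetgoal -r 2 somedir/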

Check which chunkservers hold the copies of a file (with goal 3, each chunk gets three copies):
$ /usr/bin/mfsfileinfo abc.txt   (file size: 25 MB)
abc.txt:
        chunk 0: 000000000000A118_00000001 / (id:41240 ver:1)
                copy 1: 192.168.0.203:9422 (status:VALID)
                copy 2: 192.168.0.204:9422 (status:VALID)
                copy 3: 192.168.0.206:9422 (status:VALID)


Same check on a larger file. A chunk holds at most 64 MiB, so a 130 MB file is split into three chunks:
$ /usr/bin/mfsfileinfo a20.img.gz   (file size: 130 MB)
a20.img.gz:
        chunk 0: 0000000000038FB6_00000001 / (id:233398 ver:1)
                copy 1: 192.168.0.204:9422 (status:VALID)
                copy 2: 192.168.0.206:9422 (status:VALID)
                copy 3: 192.168.0.239:9422 (status:VALID)
        chunk 1: 0000000000038FB7_00000001 / (id:233399 ver:1)
                copy 1: 192.168.0.203:9422 (status:VALID)
                copy 2: 192.168.0.204:9422 (status:VALID)
                copy 3: 192.168.0.206:9422 (status:VALID)
        chunk 2: 0000000000038FB8_00000001 / (id:233400 ver:1)
                copy 1: 192.168.0.203:9422 (status:VALID)
                copy 2: 192.168.0.204:9422 (status:VALID)
                copy 3: 192.168.0.239:9422 (status:VALID)




Snapshot

makes a "real" snapshot (lazy copy, like in case of mfsappendchunks) of some object(s) or subtree (similarly to the cp -r command). It's atomic with respect to each SOURCE argument separately. If DESTINATION points to an already existing file, an error will be reported unless the -o (overwrite) option is given. Note: if SOURCE is a directory, it's copied as a whole; but if it's followed by a trailing slash, only the directory content is copied.


With the filesystem mounted under /mnt/mfs
( mfsmount /mnt/mfs -H mfsmaster ):


$ pwd
/mnt/mfs

$ ls -l
drwxr-xr-x.  4 root  root  3000140 Aug  6 13:10 App
drwxr-xr-x. 12 root  root   3000151 Aug  7 12:14 download
drwxrwxr-x.  5 root  root   3003474 Aug  7 10:40 ISO

$ mfsmakesnapshot App App2

$ ls -l
drwxr-xr-x.  4 root  root  3000140 Aug  6 13:10 App
drwxr-xr-x.  4 root  root  3000140 Aug 13 16:14 App2
drwxr-xr-x. 12 root  root   3000151 Aug  7 12:14 download
drwxrwxr-x.  5 root  root   3003474 Aug  7 10:40 ISO

Note that this does NOT work, because both paths must be inside the same MooseFS mount:
$ mfsmakesnapshot /mnt/mfs /mnt/mfs_snapshop
(/mnt/mfs_snapshop,/mnt/mfs): both elements must be on the same device
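
A destination inside the same mount works fine, including the -o flag to overwrite an existing target (snapdir is an arbitrary example name):

$ mkdir /mnt/mfs/snapdir
$ mfsmakesnapshot /mnt/mfs/App /mnt/mfs/snapdir/
$ mfsmakesnapshot -o App App2    # overwrite the App2 created earlier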



Miscellaneous

2019-03:
With two existing chunkservers,
after installing a third, new chunkserver,
on a 1 Gb network:
191 GB of chunks were replicated in 60 minutes,
958 GB in 4.5 hours.
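
That works out to roughly 53-59 MB/s sustained (191 GB / 3600 s ≈ 53 MB/s; 958 GB / 16200 s ≈ 59 MB/s), i.e. about half the ~125 MB/s theoretical ceiling of a 1 Gb link.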