
I have been setting up a RAID1 array and have created an encrypted volume on it using cryptsetup with default options. The RAID arrays are supposed to use 2 drives each, but for the moment each RAID1 array contains only 1 drive, so that I can compare their performance.
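
For reference, the encrypted array is accessed as /dev/mapper/galerkin_storage below; the mapping was presumably created with something like the following (the luksOpen step is my assumption based on the mapper name; the actual luksFormat and mdadm commands are listed under Background info):

cryptsetup luksOpen /dev/md3 galerkin_storage --key-file=/root/key-file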

Performance

The unencrypted array

Write performance

dd if=/dev/zero of=/media/storage/Temp/test.img bs=100M count=10
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 7.35153 s, 143 MB/s

Top output:

top - 10:30:02 up 2 days, 19:18,  2 users,  load average: 0.00, 0.16, 0.72
Tasks: 147 total,   3 running, 144 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us, 21.4 sy,  0.0 ni, 75.0 id,  0.9 wa,  0.0 hi,  2.7 si,  0.0 st
KiB Mem:   4044256 total,  1135880 used,  2908376 free,   224624 buffers
KiB Swap:  7812496 total,   123488 used,  7689008 free,   470796 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
11591 root      20   0  109m 100m  572 R  98.5  2.5   0:03.12 dd
11592 root      20   0     0    0    0 R  98.5  0.0   0:00.24 flush-9:1
  203 root      20   0     0    0    0 S  52.1  0.0   0:15.59 md1_raid1

Everything here looks as expected.
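
Note that a 1 GB write fits comfortably in 4 GB of RAM, so part of it may only reach the page cache. A variant that forces the data to disk before reporting the rate would look something like this (a sketch, not run for the numbers above):

dd if=/dev/zero of=/media/storage/Temp/test.img bs=100M count=10 conv=fdatasync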

Read performance

hdparm -t /dev/md1

/dev/md1:
 Timing buffered disk reads: 574 MB in  3.01 seconds = 190.95 MB/sec
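
A dd-based cross-check of the read speed that bypasses the page cache could look like this (a sketch, not run here; iflag=direct assumes GNU dd):

dd if=/dev/md1 of=/dev/null bs=100M count=10 iflag=direct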

The encrypted array

Write performance

dd if=/dev/zero of=/dev/mapper/galerkin_storage bs=100M count=100
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 209.058 s, 50.2 MB/s

Top output:

top - 10:12:20 up 2 days, 19:00,  2 users,  load average: 5.65, 2.92, 1.60
Tasks: 149 total,   6 running, 143 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us, 21.4 sy,  0.0 ni, 74.9 id,  0.9 wa,  0.0 hi,  2.7 si,  0.0 st
KiB Mem:   4044256 total,  3749816 used,   294440 free,  3155712 buffers
KiB Swap:  7812496 total,   132464 used,  7680032 free,    40892 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
10940 root      20   0     0    0    0 R  99.0  0.0   1:49.99 kworker/2:1
11538 root      20   0     0    0    0 R  94.5  0.0   1:28.32 kworker/3:1
11486 root      20   0     0    0    0 R  63.0  0.0   2:13.37 kworker/1:2
11489 root      20   0     0    0    0 R  27.0  0.0   0:52.80 flush-253:0
10910 root      20   0     0    0    0 R  22.5  0.0   2:06.59 kworker/0:2
 1305 root      20   0     0    0    0 S  18.0  0.0 338:40.46 md3_raid1
11490 root      20   0     0    0    0 S  13.5  0.0   1:31.37 kworker/0:1
11539 root      20   0  109m 100m  572 D  13.5  2.5   0:23.25 dd

Read performance

hdparm -t /dev/mapper/galerkin_storage 

/dev/mapper/galerkin_storage:
 Timing buffered disk reads:  84 MB in  3.03 seconds =  27.73 MB/sec

Using dd

dd if=/dev/mapper/galerkin_storage of=/dev/null bs=100M count=100
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 369.272 s, 28.4 MB/s

Top output:

top - 10:29:49 up 3 days, 19:18,  2 users,  load average: 2.14, 2.69, 1.69
Tasks: 148 total,   2 running, 146 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us, 15.8 sy,  0.0 ni, 81.4 id,  0.8 wa,  0.0 hi,  2.0 si,  0.0 st
KiB Mem:   4044256 total,  1586852 used,  2457404 free,  1070080 buffers
KiB Swap:  7812496 total,   115916 used,  7696580 free,    67056 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
13963 root      20   0     0    0    0 R  84.9  0.0   3:55.93 kworker/2:0
13773 root      20   0     0    0    0 S  30.3  0.0   2:38.38 kworker/3:2
14158 root      20   0  109m 100m  572 D  18.2  2.5   0:08.50 dd
14170 robert    20   0 23168 1448 1076 R   6.1  0.0   0:00.02 top
    1 root      20   0 10648  708  704 S   0.0  0.0   0:05.26 init
    2 root      20   0     0    0    0 S   0.0  0.0   0:00.17 kthreadd
    3 root      20   0     0    0    0 S   0.0  0.0   1:05.31 ksoftirqd/0
    5 root      20   0     0    0    0 S   0.0  0.0   0:00.00 kworker/u:0
    6 root      rt   0     0    0    0 S   0.0  0.0   0:00.14 migration/0
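
One thing that could also be checked (a diagnostic idea only, no results captured here) is the readahead setting of the dm-crypt device compared to the underlying array, since a small readahead limits how much sequential read work the crypto threads get at once:

blockdev --getra /dev/mapper/galerkin_storage
blockdev --getra /dev/md3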

My conclusion

The write performance seems to be limited by my CPU, since top reports the kworker threads using 60-98% CPU. I can accept that my Intel Atom dual core is not built for performance. What surprises me is that the read performance is (1) less than the write performance and (2) does not seem to be limited by the CPU.
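
To pin down whether the read path is actually CPU-bound on a single core, something like this could be run alongside the dd to watch each core separately (assumes the sysstat package is installed; I have not captured its output here):

mpstat -P ALL 2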

Is my notion correct that the read performance should be roughly equal to the write performance? Should I simply update to the latest version of Debian, and not do archaeology? Is the cryptsetup version I'm using (1.4.3) less multithreaded for reading than for writing? Writing seems to use 4 different kworker threads, while reading only seems to use 1 or 2.

I have looked at the question Very poor performance on LUKS/LVM/RAID combination under Debian Squeeze, but I don't seem to have the same issue, since my top output displays 4 kworker processes handling the encryption, suggesting my dm-crypt setup really is multithreaded.

Background info

The RAID1 arrays only include 1 drive each at the moment, because I wanted to compare them to each other.

luksDump of my encrypted medium:

LUKS header information for /dev/md3

Version:        1
Cipher name:    aes
Cipher mode:    cbc-essiv:sha256
Hash spec:      sha1
Payload offset: 4096
MK bits:        256
MK digest:      
MK salt:       

MK iterations:  12250
UUID:           022e94a0-9dce-45c1-806b-9fb54cfabf9b

Key Slot 0: ENABLED
    Iterations:             49360
    Salt: 

    Key material offset:    8
    AF stripes:             4000
Key Slot 1: DISABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED

kernel

uname -ra
Linux galerkin 3.2.0-4-amd64 #1 SMP Debian 3.2.73-2+deb7u2 x86_64 GNU/Linux

Debian version

cat /etc/debian_version 
7.9

Cryptsetup version

cryptsetup --version
cryptsetup 1.4.3

The encrypted array was set up with

cryptsetup -v luksFormat /dev/md3 --key-file=/root/key-file

The raid array was set up with

mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sda4 missing
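
To confirm that the (deliberately degraded) array is not resyncing or otherwise busy during the tests, its state can be checked with (output not captured here):

cat /proc/mdstat
mdadm --detail /dev/md3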

Cpuinfo

cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 28
model name  : Intel(R) Atom(TM) CPU D525   @ 1.80GHz
stepping    : 10
microcode   : 0x107
cpu MHz     : 1800.136
cache size  : 512 KB

The above entry is reported 4 times, i.e. 4 logical CPUs (the D525 is a dual core with Hyper-Threading).
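
As far as I know the Atom D525 has no AES-NI instructions, so all AES work is done in software; this can be verified by checking for the aes CPU flag (no output means no AES-NI):

grep aes /proc/cpuinfo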

Edit: Wrong version given in the title; the correct one is 7.9 (Wheezy).

Edit: Updated to cryptsetup 1.6.6

cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       204800 iterations per second
PBKDF2-sha256     151703 iterations per second
PBKDF2-sha512      79824 iterations per second
PBKDF2-ripemd160  169562 iterations per second
PBKDF2-whirlpool   30913 iterations per second
#  Algorithm | Key |  Encryption |  Decryption
     aes-cbc   128b    39.5 MiB/s    43.5 MiB/s
 serpent-cbc   128b    29.3 MiB/s    32.0 MiB/s
 twofish-cbc   128b    34.0 MiB/s    46.4 MiB/s
     aes-cbc   256b    30.6 MiB/s    32.8 MiB/s
 serpent-cbc   256b    29.8 MiB/s    32.0 MiB/s
 twofish-cbc   256b    34.4 MiB/s    46.5 MiB/s
     aes-xts   256b    43.0 MiB/s    44.2 MiB/s
 serpent-xts   256b    31.5 MiB/s    32.3 MiB/s
 twofish-xts   256b    33.1 MiB/s    34.2 MiB/s
     aes-xts   512b    32.7 MiB/s    33.2 MiB/s
 serpent-xts   512b    31.8 MiB/s    32.3 MiB/s
 twofish-xts   512b    33.4 MiB/s    34.1 MiB/s
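
For what it's worth, aes-xts with a 256-bit key scores best in this benchmark. If recreating the volume were an option, the cipher could be chosen explicitly with standard cryptsetup options, roughly like this (a sketch only, not tested on this box; note that luksFormat destroys the existing header and data):

cryptsetup -v luksFormat /dev/md3 --cipher aes-xts-plain64 --key-size 256 --key-file=/root/key-file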

New performance measurements for the encrypted array with cryptsetup 1.6.6

Write Performance

dd if=/dev/zero of=/dev/mapper/galerkin_storage bs=100M count=100
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 207.493 s, 50.5 MB/s

top record during the write

top - 21:42:48 up 22 min,  2 users,  load average: 2.96, 1.07, 0.69
Tasks: 142 total,   7 running, 135 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us, 12.6 sy,  0.0 ni, 82.6 id,  4.2 wa,  0.0 hi,  0.4 si,  0.0 st
KiB Mem:   4044256 total,  3252544 used,   791712 free,  2721776 buffers
KiB Swap:  7812496 total,       44 used,  7812452 free,    65520 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 4379 root      20   0     0    0    0 R  93.5  0.0   0:24.72 kworker/1:2
 4377 root      20   0     0    0    0 R  82.5  0.0   0:03.55 kworker/2:0
 4378 root      20   0     0    0    0 R  82.5  0.0   0:31.93 kworker/3:1
 4336 root      20   0     0    0    0 R  55.0  0.0   0:33.53 kworker/0:0
  189 root      20   0     0    0    0 S  44.0  0.0   0:13.94 md3_raid1
 4380 root      20   0  105m 100m  540 R  11.0  2.5   0:09.26 dd
 4396 robert    20   0 23348 1396 1032 R  11.0  0.0   0:00.03 top
    1 root      20   0 15468  900  740 S   0.0  0.0   0:01.15 init

Read Performance

dd if=/dev/mapper/galerkin_storage of=/dev/null bs=100M count=100
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 368.387 s, 28.5 MB/s

top record during the reading:

top - 21:25:17 up 4 min,  2 users,  load average: 0.57, 0.20, 0.09
Tasks: 141 total,   2 running, 139 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.8 us,  3.7 sy,  0.0 ni, 91.9 id,  3.6 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem:   4044256 total,  1055628 used,  2988628 free,   611612 buffers
KiB Swap:  7812496 total,        0 used,  7812496 free,   130004 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
   11 root      20   0     0    0    0 R  54.5  0.0   0:07.55 kworker/0:1
   30 root      20   0     0    0    0 S  30.3  0.0   0:10.40 kworker/2:1
    9 root      20   0     0    0    0 S  24.2  0.0   0:02.59 kworker/1:0
 4287 root      20   0  105m 100m  540 D  24.2  2.5   0:04.63 dd
 4288 root      20   0     0    0    0 S  12.1  0.0   0:04.24 kworker/3:2
 4306 robert    20   0 23348 1404 1032 R   6.1  0.0   0:00.02 top
    1 root      20   0 15468  900  740 S   0.0  0.0   0:01.13 init

With hdparm

hdparm -t /dev/mapper/galerkin_storage 

/dev/mapper/galerkin_storage:
 Timing buffered disk reads:  84 MB in  3.06 seconds =  27.44 MB/sec

So the read performance is still considerably lower than the write performance. If I interpret the luksDump correctly, I am using 256-bit aes-cbc. The benchmark suggests a read (decryption) speed in the region of my dd result, but the write performance is unexpectedly high. One thing that just struck me: I have previously filled the encrypted partition from /dev/zero, so could it be that the writes don't actually need to be performed, since the data on disk is already zero?
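
A way to rule that out might be to write data that is definitely not already on the disk, e.g. by first preparing a chunk of pseudorandom data and then writing that to the mapped device (a sketch only; /tmp/random.img is a scratch file of my choosing, since /dev/urandom itself would be too slow to feed dd directly on this CPU):

dd if=/dev/urandom of=/tmp/random.img bs=1M count=1024
dd if=/tmp/random.img of=/dev/mapper/galerkin_storage bs=1M count=1024 conv=fdatasync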

Hestben
  • Could you run `cryptsetup benchmark` and see if there is a better performing cipher? – Aaron Copley Jan 15 '16 at 18:01
  • @AaronCopley: That version of cryptsetup does not include the benchmark option. I will upgrade to Debian 8 sometime in the near future an will get back on that then. – Hestben Jan 16 '16 at 08:36
  • Ok, I see your update with the CPU now, too. I would say that your problem is 50/50 Intel Atom and old version of LUKS. (v1.4.3 is 4 years old) – Aaron Copley Jan 16 '16 at 18:09
  • @AaronCopley: I have tested with cryptsetup 1.6.6, and that shows the same discrepancy between write and read performance. Write is 75% faster than reading. – Hestben Jan 18 '16 at 14:35
  • Fair enough. I missed that. But, if you have 1.6.6 now, you should have `cryptsetup benchmark`, right? – Aaron Copley Jan 19 '16 at 00:26
  • @AaronCopley Yes. I added output from `cryptsetup benchmark` just above "New performance measurements for the encrypted array with cryptsetup 1.6.6". – Hestben Jan 20 '16 at 06:44
