9

We are using a yum repository to distribute our software to our production instances. Unfortunately, createrepo is becoming a bottleneck, and we only have 469 packages in the repository.

$ time createrepo /opt/tm-yum-repo
Spawning worker 0 with 469 pkgs
Workers Finished
Gathering worker results

Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Sqlite DBs complete

real    0m43.188s
user    0m37.798s
sys     0m1.296s

What can I do to make it faster?

jsd
  • Why does the createrepo time matter? – ewwhite Oct 29 '13 at 22:52
  • 1
    Developers are waiting for their code to go live. We went from an "rsync to live boxes" model to an rpm model and they are complaining that it now takes a few minutes where it used to take a few seconds. I'm somewhat sympathetic to their plight. But only somewhat :) – jsd Oct 29 '13 at 22:54
  • Thanks for the explanation. I couldn't tell if this was a one-time delay or not. – ewwhite Oct 29 '13 at 22:57
  • Please post your "after optimization" results so we can see how much time the optimizations saved :) – Joshua Miller Oct 30 '13 at 03:24
  • Using the options "--cachedir=cache --update --workers 4" reduced the time from 50 seconds to 15 seconds, so huge win! Thanks for the very helpful suggestions, guys! – jsd Oct 30 '13 at 16:20

4 Answers

9

The --cachedir option given by dmourati in his answer will help you, but you should also use --update, especially if you are not replacing all 469 packages at once.

       --update
              If metadata already exists  in  the  outputdir  and  an  rpm  is
              unchanged  (based on file size and mtime) since the metadata was
              generated, reuse the existing metadata rather than recalculating
              it.  In  the  case  of a large repository with only a few new or
              modified rpms this can significantly reduce I/O  and  processing
              time.

In addition, consider making a separate repo for this package if deploying it this way is truly time-sensitive and --update doesn't help enough.
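
As a rough sketch, an incremental rebuild of the repository from the question might look like the script below. The `update_repo` function and the `DRY_RUN` switch are hypothetical helpers added for illustration; only the `--update` option and the `/opt/tm-yum-repo` path come from this thread.

```shell
#!/bin/sh
# Sketch: regenerate metadata incrementally with --update.
# REPO_DIR defaults to the path from the question.
# DRY_RUN is a hypothetical switch: when set to 1 (the default here),
# the command is printed instead of executed.
REPO_DIR="${REPO_DIR:-/opt/tm-yum-repo}"
DRY_RUN="${DRY_RUN:-1}"

update_repo() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "createrepo --update $REPO_DIR"
    else
        # Reuses existing metadata for rpms whose size and mtime are unchanged.
        createrepo --update "$REPO_DIR"
    fi
}

update_repo
```

Running it with `DRY_RUN=0` would perform the actual rebuild. Note that `--update` only pays off from the second run onward, since it needs existing metadata to reuse.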

Michael Hampton
6

From the createrepo manpage, you'll see an option for a cachedir.

-c --cachedir <path>
              Specify a directory to use as a cachedir. This allows createrepo
              to create a cache of checksums of packages in the repository. In
              consecutive runs of createrepo over the same repository of files
              that  do  not  have  a  complete change out of all packages this
              decreases the processing time dramatically.

I'd start there.

If that doesn't speed createrepo up sufficiently, I'd look at an SSD or tmpfs.
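
A minimal sketch of how the cache directory could be wired in, assuming a cache location of `/var/cache/createrepo-cache` (a hypothetical path; any directory that persists between runs and is writable should do). The `cached_cmd` function is also just an illustrative name. The script only prints the command so it can be inspected before being run for real.

```shell
#!/bin/sh
# Sketch: keep package checksums in a persistent cache between runs.
# CACHE_DIR is a hypothetical location, not something from the thread.
REPO_DIR="${REPO_DIR:-/opt/tm-yum-repo}"
CACHE_DIR="${CACHE_DIR:-/var/cache/createrepo-cache}"

cached_cmd() {
    # --cachedir is the long form of the -c option quoted from the manpage.
    printf 'createrepo --cachedir=%s %s\n' "$CACHE_DIR" "$REPO_DIR"
}

cached_cmd
```

The cache directory has to survive between runs for this to help, so a location that is wiped on every deploy would defeat the purpose.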

dmourati
5

Have you tried using --workers on a multi-core CPU? I normally use --workers 4 to spawn 4 createrepo workers.
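
Rather than hard-coding 4, the worker count could be derived from the number of CPUs on the build host. The sketch below assumes `nproc` (GNU coreutils) or `getconf` is available and falls back to a single worker otherwise; `worker_count` is a hypothetical helper name, and the script only prints the resulting command.

```shell
#!/bin/sh
# Sketch: derive a --workers value from the CPU count.
worker_count() {
    # Try nproc first, then getconf, then give up and use 1.
    n=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    case "$n" in
        ''|*[!0-9]*) n=1 ;;   # not a plain number: be conservative
    esac
    echo "$n"
}

# Path taken from the question; adjust as needed.
echo "createrepo --workers $(worker_count) /opt/tm-yum-repo"
```

Whether one worker per core is the best setting probably depends on how disk-bound the run is, so it is worth timing a couple of values.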

Shâu Shắc
2

Use createrepo_c, a C implementation of createrepo.
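
One way to try this without breaking hosts that only have the original tool is to pick whichever binary is installed. This is a sketch under the assumption that createrepo_c accepts the same `--update` and `--workers` options discussed elsewhere in this thread (worth confirming with `createrepo_c --help`); `pick_createrepo` is a hypothetical helper name, and the script only prints the command.

```shell
#!/bin/sh
# Sketch: prefer createrepo_c when it is installed, else fall back to createrepo.
pick_createrepo() {
    if command -v createrepo_c >/dev/null 2>&1; then
        echo createrepo_c
    else
        echo createrepo
    fi
}

TOOL=$(pick_createrepo)
# Path taken from the question; options as used elsewhere in the thread.
echo "$TOOL --update --workers 4 /opt/tm-yum-repo"
```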

sj26
user799109
    Welcome to SE. Please add some information or links to sources to make your answer more helpful. – rubo77 Jul 10 '16 at 18:20