In short: of all the installed (RPM) packages, I would like to identify the ones that have gone unused (for example, for the last 6 months).

In long: I have a number of machines with a respectable service record. Every time I upgrade from one release to another I'm surprised how well the upgrade procedure goes.

However, over the years many packages were installed (via yum), a number of which I know are no longer used. I want to get rid of these as they have a negative impact on resource usage and the overall security of the system.

I'm looking for the best method to find unused packages.

One way would be to sift through the installed packages manually. That works, and I learn a lot, but it's extremely time-consuming.

So I'm looking for an automated way to identify unused packages so I can clean them manually.

I guess one way forward would be to monitor all files used on a server, link them to packages and see what's left over. Is there anything available for this purpose?

Are there more inventive ways to accomplish this?

Zabuzzman
  • What problem would you solve by doing this? – ewwhite Dec 27 '14 at 10:17
  • As mentioned: reduce resource usage and increase security – Zabuzzman Dec 27 '14 at 10:18
  • What resources usage would you be reducing? Security doesn't hinge on what's installed, but more likely what's running (*services, open ports, etc.*) - Do you have a resource or security problem? – ewwhite Dec 27 '14 at 10:19
  • I like to keep resource usage as low as possible. And for security, go look at the CVE database to understand why every installed package poses a potential security problem. – Zabuzzman Dec 27 '14 at 10:24
  • Then just do a minimum install, and install your service afterwards. As for the security concern, if no user has bash access, how would anyone exploit it? As ewwhite said, you need to secure what is open. – yagmoth555 Dec 27 '14 at 12:17
  • Your comment is correct when it concerns, for example, an internet webserver. But, alas, the question is not limited to that: what about the server where I have untrusted users logging in, or even, a desktop system? Are you saying security is a lost cause for them? – Zabuzzman Dec 27 '14 at 12:26
  • No, @yagmoth555's advice is perfectly applicable to desktops as well, only "minimum" in that case is a "Desktop" installation and "service" is the applications that the users need. In both cases, you should have clear documentation somewhere of what packages any given type of system needs. The Puppet roles/profiles that you have defined for them are a great place for it. – Michael Hampton Dec 30 '14 at 13:09
  • I think for anyone to be able to answer this appropriately you will need to come up with a concrete definition of what you consider 'used' and 'unused'. – Matthew Ife Dec 30 '14 at 15:40
  • I would propose the same approach as Aaron. If you do this for security reasons I would focus on rpms containing suid root binaries. – Nils Jan 04 '15 at 19:47

2 Answers


Given the nature of RPMs and the shared libraries common to multiple packages, I would build a list of the packages that I actually use and diff that against a list of installed packages. There are benefits to removing unused packages, such as freeing up disk space, reducing the number of packages that could facilitate privilege escalation, and shrinking the checksum database of tools such as OSSEC, AIDE, or Tripwire.

Assumption:

  • atime is enabled. If you are using a mount option of noatime, then the access times of files will not be updated and could not be used to determine what files are accessed. It is common for noatime to be set on a filesystem to avoid the write penalty.
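
Before relying on any timestamps, it may be worth checking this assumption per filesystem. The following is a minimal sketch, not part of the original method: the function names atime_mode and report_atime are made up for illustration, and the only input is the standard /proc/mounts table. With relatime, the kernel still refreshes an access time that is more than a day old when the file is read, which is probably precise enough when looking back over months.

```shell
# Classify a comma-separated mount option string by its atime behaviour.
# (Hypothetical helper; "nodiratime" alone is deliberately not matched.)
atime_mode() {
    case ",$1," in
        *,noatime,*)  echo "noatime" ;;
        *,relatime,*) echo "relatime" ;;
        *)            echo "atime" ;;
    esac
}

# Print "<mount point> <TAB> <atime behaviour>" for every mounted filesystem.
# Fields in /proc/mounts: device, mount point, fs type, options, dump, pass.
report_atime() {
    local mounts_file="${1:-/proc/mounts}"
    local dev mnt fstype opts rest
    while read -r dev mnt fstype opts rest; do
        printf '%s\t%s\n' "$mnt" "$(atime_mode "$opts")"
    done < "$mounts_file"
}

if [ -r /proc/mounts ]; then
    report_atime
fi
```

Any filesystem reported as noatime cannot contribute meaningful access times to the steps that follow.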

Disclaimer: This method carries risks you will need to consider. For example, if your server has been up for a couple of years, there could be daemons running that use old files which have not been accessed since the server or daemon started. There are plenty of other risks to factor in, but you asked, so here is one method I might start with. It still requires a human to determine what can safely be removed. You should not automate removal of packages using this method. This is for educational use only.

Build a list of all installed RPMs.

rpm -qa | sort -n > /dev/shm/all.txt

Build a list of recently accessed files and save a count. We are approaching the new year, so you might want to look at last year.

YEAR=$(date -d "one year ago" '+%Y')
# YEAR=2014
# %y is the modification time; substitute %x to filter on the access time instead.
# -exec ... {} + batches the stat calls and avoids "Argument list too long".
find /bin /boot /etc /lib /lib64 /sbin /usr /var -type f ! -name "*~" ! -name "*.gz" ! -name "*.tar" -exec stat --printf="%y %n\n" {} + | grep "^${YEAR}" | cut -d' ' -f4- > /dev/shm/recent.txt
FILECOUNT=$(grep -c . /dev/shm/recent.txt)

Copy our RPM database to the RAM disk so we don't abuse the server. Ensure you have at least 100 MB or so free there (check with df -Ph /dev/shm).

mkdir --mode=0700 /dev/shm/rpmdb
rsync -a /var/lib/rpm/. /dev/shm/rpmdb/.

Find the RPMs associated with our recent.txt list. This will take a while. I bet someone could find a more efficient, faster, and cleverer way to do this step. I would do this in a screen session.

renice 19 -p $$ > /dev/null 2>&1
printf '%s files to iterate through.\n' "${FILECOUNT}"
> /dev/shm/recent_packages.txt
# read line by line so that paths containing spaces stay intact
while IFS= read -r file
do
    rpm --dbpath /dev/shm/rpmdb -q --whatprovides "${file}" >> /dev/shm/recent_packages.txt 2>/dev/null
    # optional status indicator.
    printf "."
done < /dev/shm/recent.txt
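
One possibly faster variant of this step is to hand rpm many paths per invocation instead of one. The sketch below assumes GNU xargs (for -d and -r) and uses rpm -qf, which accepts several file names at once. The function name owning_packages and the RPM_BIN override are hypothetical and exist only so the function can be tried against a stub; the "not owned by" filter is the same one applied in the next step.

```shell
# Print the unique packages owning the files listed (one path per line)
# in the file given as $1, querying the RPM database directory given as $2.
# RPM_BIN is a hypothetical override for the rpm binary (defaults to "rpm").
owning_packages() {
    local list="$1" dbpath="${2:-/var/lib/rpm}"
    local rpm_bin="${RPM_BIN:-rpm}"
    # -d '\n' keeps paths with spaces intact; -n 500 queries 500 files per call;
    # -r skips the call entirely when the list is empty.
    xargs -r -d '\n' -n 500 "$rpm_bin" --dbpath "$dbpath" -qf < "$list" 2>/dev/null \
        | grep -v 'not owned by' \
        | sort -u
}

# Example call, using the paths from the steps above:
# owning_packages /dev/shm/recent.txt /dev/shm/rpmdb > /dev/shm/recent_sorted.txt
```

The output is sorted with plain sort -u rather than sort -n, so compare it only against a package list sorted the same way.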

Drop the entries for files that are not owned by any RPM package, then sort and de-duplicate what remains.

grep -v "not owned by" /dev/shm/recent_packages.txt | sort -n | uniq > /dev/shm/recent_sorted.txt

Diff the output. Again, this is not completely useful by itself. You will need to determine why the files from these packages have not been accessed.

diff -u /dev/shm/recent_sorted.txt /dev/shm/all.txt | grep '^+'
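
If the diff header and the leading + signs get in the way, a set difference gives the same candidates as a plain list. This is a small sketch under the assumption that both files hold one package name per line; the function name unused_candidates is invented for illustration. It uses grep with fixed-string, whole-line matching, so it does not depend on how the two lists were sorted.

```shell
# Print every package in the "all installed" list ($2) that does not
# appear in the "recently used" list ($1).
unused_candidates() {
    local recent="$1" all="$2"
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file
    grep -Fxv -f "$recent" "$all" | sort -u
}

# Example call, using the paths from the steps above:
# unused_candidates /dev/shm/recent_sorted.txt /dev/shm/all.txt
```

As with the diff, the result is only a list of candidates to review by hand.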

You can list the contents of an RPM with rpm -ql package. Here is the output on one of my VMs. As you can see, this is not entirely useful in my case.

+++ /dev/shm/all.txt    2014-12-31 20:50:06.521227281 +0000
+basesystem-10.0-4.el6.noarch
+dhcp-common-4.1.1-43.P1.el6.centos.x86_64
+filesystem-2.4.30-3.el6.x86_64
+rootfiles-8.1-6.1.el6.noarch

I need to keep filesystem and basesystem around, despite the fact that those files have not been accessed in a while. Note: At some point I enabled noatime.

I removed dhcp-common and its associated dhclient package, since I will never need DHCP in my specific use case. I realize this method is not entirely efficient, but it should give you a starting point on each unique role of your servers. Happy new year!

Aaron
  • Note this is probably going to miss libraries that are read/executed by other dependent programs. I thought about 'fixing' this using access times instead but this is generally updated by prelink. – Matthew Ife Dec 31 '14 at 22:06
  • Prelinking is a factor as well. I suppose that would have to be factored case-by-case if prelink is installed. I don't see a good way to further address all of Zabuzzman's needs using historical data. Any other solution would likely require putting something in place and waiting to get a baseline. – Aaron Jan 03 '15 at 02:25
  • @Aaron is there any way to contact you please? – seoppc Jan 17 '16 at 04:30

I don't know that there's a proper answer to this...

It's important to note that extraneous packages are often installed in many general-purpose Linux deployments.

Most engineers don't hand-select the individual software packages that get installed, but rather choose by logical groups of applications (web server, mail server, NFS server) or system roles (server, workstation, minimal).

There are also dependencies that get installed as part of the above selections. The notion of determining which packages are "unused" is fraught because of this.

Security doesn't hinge solely on what's installed... It's more of a function of what's actually running on the system; namely daemons, network services, exposed ports, processes, etc.
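
To get a rough picture of what is actually running, one option is to list the executables behind the current processes and then look up their packages. The sketch below is only an illustration: the function name running_executables is made up, it reads the exe links under /proc (root is needed to see other users' processes), and it strips the " (deleted)" suffix the kernel appends when a binary has been replaced on disk.

```shell
# Print the unique executable paths of all running processes.
# $1 is the proc filesystem root (defaults to /proc).
running_executables() {
    local proc_root="${1:-/proc}" link
    for link in "$proc_root"/[0-9]*/exe; do
        # kernel threads and inaccessible processes have no readable link
        readlink "$link" 2>/dev/null
    done | sed 's/ (deleted)$//' | sort -u
}

# Example: map the running executables to their owning packages.
# running_executables | xargs -r -d '\n' rpm -qf | sort -u
```

Listening sockets and enabled services would need to be reviewed separately; this only covers processes alive at the moment it runs.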

In terms of resource utilization, you're only losing disk space by having unused software installed. A process won't consume CPU or RAM resources until it's executed. So, the consequences are low. If I were an employer/manager, I'd suggest that other things receive your attention. Not this.

If you want to improve the system builds, the right way is to start with a base set of packages and add whatever's necessary to provide the requisite system functionality. Document the additional package lists and add them to a kickstart (example below). Don't go the wrong direction by removing software from a running system.

Snippet from one of my kickstart package lists, with package groups and several extra packages:

%packages

@ base
@ core
@ cifs-file-server
@ compat-libraries
@ console-internet
@ development
@ mail-server
@ nfs-file-server
@ network-server
@ network-tools
@ system-management
@ system-admin-tools
@ web-server

yum-fastestmirror
rpm-devel
e2fsprogs
grub
kernel-devel
net-snmp-utils
screen

ewwhite