4

I have an SQL query to get the content size from a content URL, but when I execute the SQL and compare the result to the directory size of contentstore it doesn't match. For example first I get the content size from database (PostgreSQL):

SELECT count(*),
       SUM(round(u.content_size/1024/1024,2)) AS "Size (MB)",
       to_char(n.audit_created:: date,'yyyy') AS yyyy_
  FROM alf_node            AS n,
       alf_node_properties AS p,
       alf_node_properties AS p1,
       alf_namespace       AS ns,
       alf_qname           AS q,
       alf_content_data    AS d,
       alf_content_url     AS u
WHERE n.id=p.node_id
  AND ns.id=q.ns_id
  AND p.qname_id=q.id
  AND p.long_value=d.id
  AND d.content_url_id=u.id
  AND p1.node_id=n.id
  AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
GROUP BY yyyy_
ORDER BY yyyy_ DESC;

It gives me a total size of 600 GB. But when I see the size property for folder content store in alf_data it shows 2.4 TB. How is that possible?

diziaq
  • 6,881
  • 16
  • 54
  • 96
GaneshK
  • 39
  • 4

2 Answers2

6

It is not so easy to know the space of the contentstore based only on what you currently have in your repository.

When a document is deleted or updated with a new version, Alfresco does not immediately delete the binary from the disk.

The process to actually free the disk space is a bit more "complex". How to configure the first step of this process is described in the following article: Cleaning up orphaned content (purge)

The "purge" by default is configured to be executed every day at 4AM and will purge any orphaned file that has been removed from the repository at least 14 days earlier.

As described in the article, by default, Alfresco does not delete the purged files but it only moves them in a different folder (contentstore.deleted).

It is safe to delete the files in the contentstore.deleted after a backup.

As usual, in alfresco-global-properties, you can modify the default configuration (that you can find in alfresco/repository.properties)

You can end up with a lot of orphaned files in many ways. Just to give some examples:

  1. the purge is not working for some reason
  2. someone recently has imported and then deleted many and/or big files
  3. someone has used the alfresco export, that creates the zip file in the repository
  4. the system is used to store short-living documents that are often deleted or updated (like big videos), etc...
Marco Altieri
  • 3,726
  • 2
  • 33
  • 47
  • I agree with your statement Marco, but I do not guess he has around 75% of his repo deleted recently and so orphenated waiting to get reaped – Younes Regaieg Feb 03 '16 at 23:23
  • 1
    I do not know if this is the case or not. I only wrote that in general you cannot know what binaries are in your content store looking the database. It can easily happen that: 1) the purge is not working for some reason 2) someone recently has imported and then deleted many and/or big files 3) someone has used the alfresco export, that creates the zip file in the repository 4) the system is used to store short-living document that are often deleted or updated (like big videos), etc... – Marco Altieri Feb 03 '16 at 23:33
  • yes, nice catch for those 4 cases that may cause something like this, would you mind posting an update to your question with it? – Younes Regaieg Feb 03 '16 at 23:37
5
  • First, if you are on windows, you should be aware of the difference between size and size on disk
  • You should be aware of the difference between MagaByte (MB), MebiByte (MiB) and MegaBit (Mbit)
  • And last, but not least, you're rounding up file sized in MB, for each file before summing that up, try summing in bytes then rounding the results like this:

    SELECT count(*),round(SUM(u.content_size)/1024/1024, 3) AS "Size (MB)", to_char(n.audit_created:: date,'yyyy') as yyyy_
    FROM alf_node AS n,
    alf_node_properties AS p,
    alf_node_properties AS p1,
    alf_namespace AS ns,
    alf_qname AS q,
    alf_content_data AS d,
    alf_content_url AS u
    WHERE n.id=p.node_id
    AND ns.id=q.ns_id
    AND p.qname_id=q.id
    AND p.long_value=d.id
    AND d.content_url_id=u.id
    AND p1.node_id=n.id
    AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
    group by  yyyy_
    ORDER BY yyyy_ DESC;
    
Younes Regaieg
  • 4,156
  • 2
  • 21
  • 37