
Is there a way to do a sparse checkout of an SVN repo with a deeply nested directory structure?

I'm doing this using a listing of all the files in the repo and filtering for just *.xml:

svn list --recursive "http://myRepo.com/trunk" > allFiles.txt

I'm trying to do the following:

svn checkout "http://myRepo.com/trunk" --depth empty "myRepo"
svn update --set-depth empty project1/dirs/moreDirs/evenMore/file.xml

I tried this but got an error saying it was skipping updating that file.

If I manually update each parent directory in turn, as below, the file does appear in my checkout. What I want is a single --set-depth empty call that also fetches the parent directories of a nested SVN path.

svn update --set-depth empty project1
svn update --set-depth empty project1/dirs/moreDirs
svn update --set-depth empty project1/dirs/moreDirs/evenMore
svn update --set-depth empty project1/dirs/moreDirs/evenMore/file.xml

svn status -v project1/dirs/moreDirs/evenMore/file.xml
# prints svn file information

EDIT

I have two workarounds right now, neither of them ideal.

1. Piecemeal svn update --set-depth empty

I wrote a bash function that takes the path of the file I'm looking for and executes svn update --set-depth empty on it and on each of its parent directories. For example, for project1/dirs/moreDirs/evenMore/file.xml it would call:

svn update --set-depth empty project1 project1/dirs project1/dirs/moreDirs project1/dirs/moreDirs/evenMore project1/dirs/moreDirs/evenMore/file.xml

It works, but it is pretty slow (maybe I can batch the calls for multiple files into one svn update call). I can't run svn update for separate files in parallel because svn locks the working copy.

Here's the full script:

function getContentFile()
{
    CONTENT_FILE="$1"
    LOCAL_CONTENT_FILE="${SVN_CHECKOUT}/${CONTENT_FILE}"

    # Targets for svn update, shallowest parent first; an array keeps paths with spaces intact
    SVN_UPDATE_ARGS=("${CONTENT_FILE}")
    PARENT_DIR="$(dirname "${CONTENT_FILE}")"
    if [ ! -e "${LOCAL_CONTENT_FILE}" ]; then
        while [ "$PARENT_DIR" != "." ] && [ "$PARENT_DIR" != "/" ]; do
            if [ -e "${SVN_CHECKOUT}/${PARENT_DIR}" ]; then
                # Stop if we detect a directory already controlled by SVN
                break
            fi
            SVN_UPDATE_ARGS=("${PARENT_DIR}" "${SVN_UPDATE_ARGS[@]}")
            PARENT_DIR="$(dirname "${PARENT_DIR}")"
        done
        # Run from the checkout root in a subshell so the caller's directory is unchanged
        (cd "${SVN_CHECKOUT}" && svn update --set-depth empty "${SVN_UPDATE_ARGS[@]}")
    fi
}
# export function to use in xargs
export -f getContentFile

grep -E '\.xml$' "$SVN_FILE_LISTING_CACHE" | xargs -P 1 -I{} bash -e -c 'getContentFile "$@"' _ {}
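
The batching idea mentioned above could be sketched roughly as follows. This is only an illustration, not a tested replacement for the script: `sparse_targets` is a hypothetical helper name, and it assumes a fresh `--depth empty` checkout whose root matches the listing's root, since running `--set-depth empty` on a directory that was already checked out at a greater depth would shrink it. The helper reads file paths on stdin and prints every parent directory followed by the file itself, shallowest first, each entry once.

```shell
#!/usr/bin/env bash

# Hypothetical helper: expand each input path into its chain of parent
# directories plus the path itself, shallowest first, without duplicates.
sparse_targets() {
    local path prefix part
    local -a parts
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        # Split the path on "/" and rebuild it one component at a time
        IFS='/' read -r -a parts <<< "$path"
        prefix=""
        for part in "${parts[@]}"; do
            [ -n "$part" ] || continue
            prefix="${prefix:+$prefix/}$part"
            printf '%s\n' "$prefix"
        done
    done | awk '!seen[$0]++'   # keep only the first occurrence of each entry
}
```

Run from the checkout root, something like `grep -E '\.xml$' "$SVN_FILE_LISTING_CACHE" | sparse_targets | tr '\n' '\0' | xargs -0 svn update --set-depth empty` would then hand the whole list to as few `svn update` invocations as the argument-length limit allows. Whether that is actually faster depends on how the client handles multiple targets.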

2. svn cat to get files

I can also just create the folder structure myself and svn cat each file, and I can do that for multiple files at the same time. The drawback is that the result is not connected to svn: it's not a real svn checkout, so I can't easily commit changes back or update a file from svn without walking and matching the paths.

function getAllContentFiles() 
{
    FILE_REGEX="$1"
    #NUM_PROCESSORS=`sysctl hw.ncpu | awk '{print $2}'`
    # Do this in parallel (doesn't have to match number of actual processors)
    NUM_PROCESSORS=50
    # Note: the svn update variant has to run 1 at a time because of the SVN working-copy lock
    grep -E "$FILE_REGEX" "$SVN_FILE_LISTING_CACHE" | xargs -P "${NUM_PROCESSORS}" -I{} bash -ex -c 'getContentFile "$@"' _ {}
}

function getContentFile() 
{
    CONTENT_FILE="$1"
    SVN_FILE="${SVN_REMOTE}${CONTENT_FILE}"
    LOCAL_CONTENT_FILE="${SVN_CHECKOUT}/${CONTENT_FILE}"
    LOCAL_CONTENT_FILE_DIR="$(dirname "${LOCAL_CONTENT_FILE}")"

    # Create the parent directories under the checkout root, not the current directory
    mkdir -p "${LOCAL_CONTENT_FILE_DIR}"

    if [ ! -e "${LOCAL_CONTENT_FILE}" ]; then
        svn cat "${SVN_FILE}" > "${LOCAL_CONTENT_FILE}"
    fi
}
# export function to use in xargs
export -f getContentFile
Dougnukem
  • And if you get individual files using `svn cat`? For example `svn cat "http://myRepo.com/trunk/path/to/file.xml" > "path/to/file.xml"` – janos Nov 20 '13 at 21:49
  • I could get individual files and use mkdir -p $(dirname ${FILE_PATH}) but if I use the svn update --set-depth I can edit and track it in SVN (e.g. I could commit back to those files or update to the latest). – Dougnukem Nov 21 '13 at 01:46

1 Answer


Since Subversion 1.7.0, svn update has accepted a --parents option that does what you want: it brings in the missing parent directories of the target for you.

So you can do the following for example:

$ svn co --depth empty https://svn.apache.org/repos/asf/subversion svn-sparse
 U   svn-sparse
Checked out revision 1544721.

$ svn up --parents trunk/subversion/libsvn_ra_svn/protocol
A    trunk
A    trunk/subversion
A    trunk/subversion/libsvn_ra_svn
Updating 'trunk/subversion/libsvn_ra_svn/protocol':
A    trunk/subversion/libsvn_ra_svn/protocol
Updated to revision 1544721.
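
Applied to the question's file listing, this could look roughly like the sketch below. It is a hedged illustration rather than a tested recipe: `update_with_parents` and the `SVN_BIN` override are hypothetical names introduced here (the override only exists so the command can be dry-run with `echo`), and the function assumes it is run from the root of the sparse working copy with listing paths relative to that root.

```shell
#!/usr/bin/env bash

# Hypothetical helper: pass every listing entry matching a regex to
# "svn update --parents" so missing parent directories are created too.
# SVN_BIN is an assumed override for dry runs (e.g. SVN_BIN=echo).
update_with_parents() {
    local listing="$1" pattern="$2"
    # NUL-delimit the paths so names containing spaces reach svn intact
    grep -E "$pattern" "$listing" \
        | tr '\n' '\0' \
        | xargs -0 "${SVN_BIN:-svn}" update --parents
}
```

For example, `update_with_parents allFiles.txt '\.xml$'` would request all XML files in one go, while `SVN_BIN=echo update_with_parents allFiles.txt '\.xml$'` only prints the command that would run.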
Ben Reser
  • That works, but it's still super slow to check out (I'm guessing each path in the svn update makes its own remote calls). The `svn cat` version run with xargs -P 50 is able to open a lot of parallel network connections. – Dougnukem Nov 23 '13 at 06:47
  • Yes, it's walking the tree, so it's really no different from what you were doing by hand. Sadly, not only is it making multiple requests, it's doing so in separate sessions. That means it has to open a fresh TCP connection, negotiate SSL (if used), handle authentication (if required), do the OPTIONS request(s), and then finally do the REPORT. We ought to be filling in the REPORT with an entry along the entire path, the same way we do if you run an update after creating a situation like this. – Ben Reser Nov 23 '13 at 07:29
  • @Dougnukem [I did some work tonight toward improving this](http://svn.apache.org/r1565920). If you'd like to give it a spin (via a source build of Subversion trunk), I'd be very interested in how much it helps. I can probably improve it more, but that's a more intrusive change, and this halved the time an `svn update --parents` took for me over DAV. – Ben Reser Feb 08 '14 at 05:22
  • It seems like this should be the accepted answer @Dougnukem – solvingJ Aug 09 '18 at 17:53