filter - Delete files from list with rsync

2014-07
  • deajan

    I'm struggling with this a lot and cannot find any solution:

    I would like to use rsync to delete the files named in a list on a remote host after syncing two directories. Here is what I tried:

    rsync -ri --include="*/" --include-from="my_file_list" --exclude="*" --delete /source /dest
    

    This gives me: cannot delete non-empty directory: filename

    I've tried protect filters, removing the first include, and many other variations, all without success.

    Why am I using rsync instead of rm to delete files? Because rm is slow, and I have more than 50,000 files to delete on a remote server. This is giving me a real headache, so any alternative solution is also welcome.


    Related Question

    filter - rsync only a specific subset of directories
  • Martin Scharrer

    I need to use rsync to synchronize several directories from an rsync server. The whole rsync module is quite large, and I would like to avoid copying the other parts that I do not need.

    I have the wanted directories listed in a text file but am having trouble creating a proper filter rule file. My requirements are as follows:

    • Include only directories in my list with all files and subdirectories in them.
    • Files in the included directories should be deleted if they are deleted on the server.
    • However all .hg directories (Mercurial repository) located on my site but not on the server, and all files and subdirectories in them, should not be deleted.
    • Excluded directories should not be deleted.

    So far I have created a filter file that looks like this:

    include sub/dir/I/want/***
    include other/sub/dir/I/want/***
    ...
    protect .hg/***
    exclude **
    

    But this apparently excludes everything. Without the exclude line, all other files are included as well.


  • Related Answers
  • Martin Scharrer

    I found the issue. My troubles were caused by the way rsync processes file names. Absolute (i.e. relative to the transfer root) include paths do not work on their own, because the parent directories must also be included. Otherwise the parent directory is already excluded and the wanted file or subdirectory is never processed. The manual actually says so (somewhere), but it is highly counter-intuitive.

    To include only certain subdirectories, all of their parent directories must be included, and all other subdirectories of those parents must then be excluded again:

    include sub/
    include sub/dir/
    include sub/dir/I/
    include sub/dir/I/want/***
    exclude sub/*
    exclude sub/dir/*
    exclude sub/dir/I/*
    
    include other/
    include other/sub/
    include other/sub/dir/
    include other/sub/dir/I/
    include other/sub/dir/I/want/***
    exclude other/*
    exclude other/sub/*
    exclude other/sub/dir/*
    exclude other/sub/dir/I/*
    
    ...
    
    protect .hg*
    exclude /*
    

    The second-to-last line protects all .hg* directories and files, such as .hg/ and .hgtags. The last line excludes all other directories in the transfer root.

    I wrote a Perl script to produce the above filter file from the list of wanted subdirectories. It is available at http://www.perlmonks.org/?node_id=928357.
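
    For readers who prefer not to follow the link, here is a rough Python sketch of the same idea. It is not the Perl script mentioned above; the function name build_filter_rules is made up for illustration. One deliberate difference from the hand-written file: all include rules are emitted before the exclude rules. rsync acts on the first rule that matches, so this should keep two wanted directories that share a parent (say sub/a and sub/b) from shadowing each other.

```python
def build_filter_rules(wanted_dirs):
    """Return rsync filter rules that keep only the given sub-directories.

    wanted_dirs: paths relative to the transfer root, e.g. "sub/dir/I/want".
    The trailing "protect .hg*" and "exclude /*" rules are copied from the
    filter file shown in the answer above.
    """
    includes, excludes = [], []

    def add(rules, rule):
        # Keep first-seen order and skip duplicates from shared parents.
        if rule not in rules:
            rules.append(rule)

    for wanted in wanted_dirs:
        parts = [p for p in wanted.strip("/").split("/") if p]
        if not parts:
            continue  # ignore blank lines in the list
        # Every parent directory must be included, and its other
        # children excluded again.
        for depth in range(1, len(parts)):
            parent = "/".join(parts[:depth])
            add(includes, f"include {parent}/")
            add(excludes, f"exclude {parent}/*")
        # The wanted directory itself, with everything below it.
        add(includes, "include " + "/".join(parts) + "/***")

    return includes + excludes + ["protect .hg*", "exclude /*"]


if __name__ == "__main__":
    for rule in build_filter_rules(["sub/dir/I/want", "other/sub/dir/I/want"]):
        print(rule)
```

    The printed rules can be written to a file and passed to rsync in the same way as the hand-written filter file. It is worth checking the result with rsync's --dry-run option before letting --delete loose on real data.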

  • grawity

    Run rsync twice, once for each wanted directory:

    rsync hostname::sub/dir/I/want/        ./sub/dir/I/want/
    rsync hostname::other/sub/dir/I/want/  ./other/sub/dir/I/want/
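
    With a longer list of directories, the same one-run-per-directory approach can be scripted. The following is only a sketch: build_rsync_commands is a hypothetical helper, and the default options (-a, --delete, and a protect rule for .hg*) are assumptions taken from the requirements in the question, not part of the commands above. It only builds and prints the commands; nothing is executed.

```python
import shlex


def build_rsync_commands(host, wanted_dirs,
                         options=("-a", "--delete", "--filter=protect .hg*")):
    """Build one rsync argument list per wanted directory.

    host and wanted_dirs mirror the hostname::path form used above.
    The default options are assumptions based on the question.
    """
    commands = []
    for wanted in wanted_dirs:
        path = wanted.strip("/")
        if not path:
            continue  # ignore blank entries
        commands.append(
            ["rsync", *options, f"{host}::{path}/", f"./{path}/"]
        )
    return commands


if __name__ == "__main__":
    wanted = ["sub/dir/I/want", "other/sub/dir/I/want"]
    for cmd in build_rsync_commands("hostname", wanted):
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(cmd))
```

    Each argument list could also be handed to subprocess.run once the printed commands look right. The local parent directories (for example ./sub/dir/I/) probably need to exist before rsync is run.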