Copy 10 million small files between remote servers (over web), linux systems, no SSH on source

03
2014-05
  • BookMaster

    Here is the situation:

    I need to copy approx. 10 million small files (1k - 50k each) from a single directory between two remote servers over the web. I tried FTP and SCP, but both failed: all the files are in a single directory, and that somehow freezes the transfer.

    The problem is that I cannot use tar, because SSH is not available on the source server, only on the destination server, where I have full control.

    The number of files grows by 10-40k every day, so making a final copy keeps getting harder. Any suggestions will be much appreciated.

    Thanks, R.

    Edit: To clarify the situation: the source server (where all the files are located) is a normal shared hosting server with access to PHP/MySQL and so on (PHP can execute common Linux commands, though). The destination server, where I want to transfer the files, is a VPS instance with full root access (SSH etc.).

    Now, I can tar/zip the files, but I wonder how long it will take to archive all 10-20 million small files that I have. If I do it via PHP, a timeout will hit at some point. Or can I send a shell exec that runs in the background, or something like that?

    The other option is to pull the files from the destination server somehow, in small batches. Any suggestions will be appreciated, as I am getting frustrated already. Thanks so much for the replies already made.

  • Answers
  • Sachin Shekhar

    You said the number of files is increasing every day. If you can stop that, recursively pull all the files from the FTP server using wget:

    wget -m ftp://username:password@host
    

    If you can't stop new files from being added until you have fully moved to the new server, use curlftpfs (on the new server) to mount the FTP host as a local directory. Then use cp with the -u and -r flags. You can rerun this in multiple sessions after an interrupted operation (-u takes care of skipping files that were already copied).
    After mounting the FTP host, you can also use rsync.
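
    A minimal sketch of the mount-then-copy approach, assuming GNU cp (for -u) on the new server. The function name sync_new_files, the mount point /mnt/ftp and the target /srv/files are made up for illustration, and the curlftpfs line is shown only as a comment because it needs the real FTP credentials:

```shell
# Hypothetical one-time mount on the new server (not executed here):
#   curlftpfs ftp://username:password@host /mnt/ftp

# Copy everything under src into dst. With -u, cp skips a file when the
# copy in dst already exists and is not older than the source file, so the
# function can be rerun after an interruption or to pick up new files.
sync_new_files() {
    src=$1
    dst=$2
    mkdir -p "$dst"
    # "src/." copies the directory contents without a shell glob, so the
    # command line stays short even with millions of files.
    cp -ru "$src"/. "$dst"/
}
```

    It would be called as sync_new_files /mnt/ftp /srv/files and simply rerun until nothing new is left to copy.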

  • Steve N

    Depending on how the files are named, you could tar/zip them in chunks and then try SCP or FTP. You didn't specify the type of file, but if they are logs or other text, you should get reasonable compression. Use wildcards to archive all the files beginning with a, foo, bar123, etc.
    For example:
    tar -czvf chunk01.tar.gz a*
    tar -czvf chunk02.tar.gz b*
    tar -czvf chunk03.tar.gz c*
    tar -czvf chunk04.tar.gz d*
    tar -czvf chunk05.tar.gz e*
    ...
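
    With millions of files in one directory, a wildcard such as a* can expand past the kernel's argument-length limit ("Argument list too long"). A possible variant, assuming GNU find and GNU tar, lets find produce the names and feeds them to tar on stdin instead. The function name archive_prefix is hypothetical, and the output path should be an absolute path outside the source directory:

```shell
# Archive every regular file in src_dir whose name starts with prefix.
# Names are passed NUL-separated through a pipe, so no shell glob is
# expanded and unusual filenames are handled safely.
archive_prefix() {
    src_dir=$1
    prefix=$2
    out=$3      # absolute path of the .tar.gz to create
    (
        cd "$src_dir" &&
        find . -maxdepth 1 -type f -name "${prefix}*" -print0 |
            tar --null -T - -czf "$out"
    )
}
```

    For example, archive_prefix /path/to/files a /tmp/chunk01.tar.gz would do the job of the first tar line above.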


  • Related Question

    linux - How to connect to remote X-Server (logged in via ssh)
  • IanH

    When I'm logged on to another host (e.g. via ssh), how do I connect to the X server of that machine (the same user is logged in there and is running a GNOME desktop)?

    You may ask why I wish to do that: there are commands that don't open an X window, e.g. xinput, xhost, etc., and there are situations where you want to run them remotely.


  • Related Answers
  • IanH

    I found the problem. Setting DISPLAY manually to localhost:0 is not working, because the XServer does not listen to TCP connections (default Ubuntu 10.04 configuration).

    However, setting

    export DISPLAY=:0
    

    does the trick.
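
    To illustrate why the two values behave differently: a DISPLAY with an empty host part (":0") is reached through a local Unix socket, while "localhost:0" means a TCP connection to port 6000 plus the display number. The helper below is only a rough sketch of that mapping; the name describe_display is made up, and it ignores special cases such as IPv6 addresses:

```shell
# Print the transport a DISPLAY value refers to:
#   ":N"      -> Unix socket /tmp/.X11-unix/XN
#   "host:N"  -> TCP connection to host, port 6000+N
describe_display() {
    display=$1
    host=${display%%:*}   # text before the first colon (empty for ":0")
    num=${display#*:}     # text after the first colon
    num=${num%%.*}        # drop an optional ".screen" suffix
    if [ -z "$host" ]; then
        echo "unix:/tmp/.X11-unix/X${num}"
    else
        echo "tcp:${host}:$((6000 + $num))"
    fi
}

describe_display ":0"
describe_display "localhost:0"
```

    If the X server was started without TCP listening, only the first form can connect.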

  • erichui

    You will need to set your display environment variable in the ssh session. Most likely, the X server is running on display 0. So in the ssh session (assuming a Bourne-like shell), type:

    export DISPLAY=localhost:0
    xclock
    

    You should see the clock on the remote X server display.

    Note: this should "just work" if your ssh session is logged in as the same user that started the desktop session on the X server. If you are logged in as a different user, you may need to obtain the xauth cookie from the desktop session's user account.

  • Nathan Adams

    If you are using the command line ssh, and assuming you are using Linux:

    ssh -X host
    

    Then try something like:

    xclock
    

    You should see a clock, but it is being run on the remote computer.

    Note: This will only work if X11 forwarding is turned on in the sshd config file.
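
    A quick way to check that setting on the remote host might look like this. It is a simplified sketch: the function name x11_forwarding_enabled is made up, the default path /etc/ssh/sshd_config is the usual location but may differ, and Match blocks or included files are not considered:

```shell
# Succeed if the given sshd config file has an uncommented
# "X11Forwarding yes" line (keyword matched case-insensitively).
x11_forwarding_enabled() {
    config=${1:-/etc/ssh/sshd_config}
    grep -Eiq '^[[:space:]]*X11Forwarding[[:space:]]+yes' "$config"
}
```

    For example: x11_forwarding_enabled && echo "X11 forwarding is on". After changing the file, sshd has to be reloaded before ssh -X picks it up.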

    Of course this is just a quick overview - can you post more info like what OS you have and what SSH client you are using?