Data arrival by network transfer

Looking for data on Thelma

  1. switch to the arsf user on a machine that will stay switched on for the whole transfer
    ssh arsf@gridmaster2 # (or su - arsf on your own machine, but don't turn it off!)
    
  2. find the exact data path on Ops server
    ssh arsfdan@thelma.nerc-arsf.ac.uk
     # go find the path to the data to download
     # these will vary..
    cd /data/Data_2009/
    cd UK
    cd 281-09_GB08-02_Delamere
     # check path
    pwd
     # returns /data/Data_2009/UK/281-09_GB08-02_Delamere
     #
     # check the size is manageable
     # if it is over 150GB, consider a disk transfer instead
    du -hsc .
     # if you get any permission denied errors, fix up permissions then redo the du.
     # fix permissions with:
     # /data/permissions_fixer/fix_perms_for_arsfdan.sh DIRECTORY_NEEDING_FIX
     #  **be very careful** with this as you could easily screw up the whole thelma server if you get the directory wrong!
    logout
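
The size check above can be sketched as a small helper that compares a directory's size against the 150GB guideline. The names `LIMIT_GB`, `transfer_advice` and `check_size` are illustrative assumptions and are not part of any existing ARSF script. Sizes are counted in units of 1024, as `du -h` does.

```shell
# Sketch only: LIMIT_GB, transfer_advice and check_size are hypothetical names.
LIMIT_GB=150

# Given a size in kilobytes (as reported by du -sk), suggest a transfer method.
transfer_advice() {
    local size_kb=$1
    local limit_kb=$((LIMIT_GB * 1024 * 1024))
    if [ "$size_kb" -gt "$limit_kb" ]; then
        echo "disk"
    else
        echo "network"
    fi
}

# Measure a directory (default: current directory) and print the suggestion.
# Permission errors from du are left visible so they can be fixed and the check redone.
check_size() {
    local dir=${1:-.}
    local size_kb
    size_kb=$(du -sk "$dir" | cut -f1)
    echo "$dir: $((size_kb / 1024 / 1024)) GB, suggested transfer: $(transfer_advice "$size_kb")"
}
```

Run on the server, this would be called as `check_size /data/Data_2009/UK/281-09_GB08-02_Delamere`.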
    

Looking for data on the disk station

  1. switch to the arsf user on a machine that will stay switched on for the whole transfer
    ssh arsf@gridmaster2 # (or su - arsf on your own machine, but don't turn it off!)
    
  2. find the exact data path on Ops server
    ssh arsfdan@thelma.nerc-arsf.ac.uk
     # go find the path to the data to download
     # these will vary..
    cd /mnt/synology1/   # or /mnt/synology2/
    cd data/Data\ 2014/  # or Data/Data\ 2014/ or similar
    cd UK
    cd 281-09_GB08-02_Delamere
     # check path
    pwd
     # returns /mnt/synology1/data/Data 2014/UK/281-09_GB08-02_Delamere
     #
     # check the size is manageable
     # if it is over 150GB, consider a disk transfer instead
    du -hsc .
     # if you get any permission denied errors, fix up permissions then redo the du.
     # Not sure how for the disk station
    logout
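
Because the directory layout on the disk stations varies, searching for the project directory by name may be quicker than guessing each `cd`. The following is a minimal sketch: `find_project` is a hypothetical helper, and the depth limit of 4 is an assumption based on the example path above.

```shell
# Sketch only: find_project is a hypothetical helper, not an existing ARSF tool.
# Usage: find_project PROJECT_DIR_NAME ROOT [ROOT...]
find_project() {
    local name=$1
    shift
    # Look at most 4 levels below each root; errors from missing or
    # unreadable roots are hidden so only matches are printed.
    find "$@" -maxdepth 4 -type d -name "$name" 2>/dev/null
}
```

For example, `find_project '281-09_GB08-02_Delamere' /mnt/synology1 /mnt/synology2` prints the full path of any matching directory, spaces included.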
    

Getting the data

  1. create a temporary directory in ~/arsf_data/YYYY/flight_data/unpacking/ to download to, e.g.
    DOWNDIR=~/arsf_data/$(date +%Y)/flight_data/unpacking/download-$(date +%Y%m%d%H%M%S)
    mkdir "$DOWNDIR"
    cd "$DOWNDIR"
    
  2. do the download with rsync, using compression (-z flag). Note there is no trailing / on the PROJECTNAME part of the path below - this is important!
    • example rsync command follows, with a --dry-run option to prevent any damage occurring on the trial attempt. If it lists the files you'd expect it to transfer, remove the --dry-run option and repeat. If not, check your slashes!
    • rsync's --progress flag is also a useful one to include if you'd like to see the progress of your download.
      date
      rsync --dry-run -avz arsfdan@thelma.nerc-arsf.ac.uk:/PATH/YOU/FOUND/EARLIER/PROJECTNAME . ; date
       # date commands are optional, they just let you see when it started and stopped
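
The trailing slash matters because rsync treats `PROJECTNAME/` as "the contents of PROJECTNAME" and would copy the files straight into the download directory, whereas `PROJECTNAME` creates a `PROJECTNAME` directory there. As a guard against pasting a path with a trailing slash, the source argument could be built by a helper like the sketch below. `rsync_source` is a hypothetical name introduced for illustration.

```shell
# Sketch only: rsync_source is a hypothetical helper.
# Usage: rsync_source USER@HOST REMOTE_PATH
# Prints USER@HOST:REMOTE_PATH with any trailing slashes removed.
rsync_source() {
    local host=$1 path=$2
    # Strip trailing slashes one at a time until none remain.
    while [ "${path%/}" != "$path" ]; do
        path=${path%/}
    done
    printf '%s:%s\n' "$host" "$path"
}
```

It would be used as `rsync --dry-run -avz "$(rsync_source arsfdan@thelma.nerc-arsf.ac.uk /PATH/YOU/FOUND/EARLIER/PROJECTNAME/)" .`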
      

When running rsync, consider wrapping it in a loop in case of failures, especially when transferring a large volume of data. rsync will then restart automatically after a set delay if it fails, and the loop ends once a run completes successfully. For example:

## Note: don't set the sleep interval too low, as this repeats the command too often and puts unnecessary load on thelma
until rsync -avz --progress arsfdan@thelma.nerc-arsf.ac.uk:/PATH/YOU/FOUND/EARLIER/PROJECTNAME . ; do date ; sleep 1800 ; done
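
A bounded variant of the retry loop above can be sketched as a small function that gives up after a fixed number of attempts, so a persistent problem (wrong path, expired login) does not keep hitting thelma indefinitely. `retry` and its argument order are assumptions made for this example.

```shell
# Sketch only: retry is a hypothetical helper.
# Usage: retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, waiting DELAY_SECONDS between attempts.
# Returns 1 if COMMAND has failed MAX_TRIES times.
retry() {
    local max_tries=$1 delay=$2
    shift 2
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt failed at $(date), retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For example, `retry 10 1800 rsync -avz --progress arsfdan@thelma.nerc-arsf.ac.uk:/PATH/YOU/FOUND/EARLIER/PROJECTNAME .` tries up to ten times, half an hour apart.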

If something goes wrong after an hour or two, don't delete what you've already downloaded. Instead, repeat the rsync command and it'll pick up where it left off.

Spaces

Spaces in the remote path name need escaping twice in rsync (once for the local shell and once for the remote shell), e.g.

arsfdan@thelma.nerc-arsf.ac.uk:/mnt/synology1/Data/Data\\\ 2014/216-14_GB14-00_Riss_Cal_Hyperspectral

(This is not a typo or a formatting problem: there really are three backslashes.)
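
The local shell turns `\\\ ` into a backslash followed by a space, and the remote shell then turns that into a plain space. In a script it can be easier to keep the path in a quoted variable and add the remaining backslash programmatically. The sketch below assumes spaces are the only special characters in the path; `escape_remote_path` is a hypothetical helper. How rsync treats remote arguments may differ between versions, so it is worth confirming with `--dry-run` first.

```shell
# Sketch only: escape_remote_path is a hypothetical helper.
# Prefixes every space in the given path with a backslash, producing the
# form the remote shell needs. Other special characters are not handled.
escape_remote_path() {
    printf '%s\n' "$1" | sed 's/ /\\ /g'
}
```

Used inside double quotes, no further escaping is needed: `rsync --dry-run -avz "arsfdan@thelma.nerc-arsf.ac.uk:$(escape_remote_path '/mnt/synology1/Data/Data 2014/216-14_GB14-00_Riss_Cal_Hyperspectral')" .`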

Unpacking data

Return to Procedures/NewDataArrival

Last modified on Apr 17, 2015, 9:03:54 AM