Data arrival by network transfer
Looking for data on Thelma
- change to arsf user on a safe machine
ssh arsf@gridmaster2 # (or su - arsf and don't turn your machine off!)
- find the exact data path on Ops server
ssh arsfdan@thelma.nerc-arsf.ac.uk
# go find the path to the data to download
# these will vary..
cd /data/Data_2009/
cd UK
cd 281-09_GB08-02_Delamere
# check path
pwd
# returns /data/Data_2009/UK/281-09_GB08-02_Delamere
#
# check size isn't crazy big
# > 150GB and think about a disk transfer
du -hsc .
# if you get any permission denied errors, fix up permissions then redo the du.
# fix permissions with:
# /data/permissions_fixer/fix_perms_for_arsfdan.sh DIRECTORY_NEEDING_FIX
# **be very careful** with this as you could easily screw up the whole thelma server if you get the directory wrong!
logout
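The size check above could be scripted along the following lines. This is only a sketch: the function names advise_for_size and transfer_advice are hypothetical, the 150 GB figure is the rule of thumb from the comments above (treated here as GiB), and it uses du -sk rather than du -hsc so that the size comes back as a plain number. It would need to run on the machine that holds the data.

```shell
#!/bin/bash
# Hypothetical helpers, not part of any existing ARSF script.

# Decide on a transfer method from a size in KiB and a limit in GiB.
advise_for_size() {
    local size_kb=$1
    local limit_gb=${2:-150}   # assumed threshold, from the note above
    if [ "$size_kb" -gt $((limit_gb * 1024 * 1024)) ]; then
        echo "disk"
    else
        echo "network"
    fi
}

# Measure a directory and print the suggested transfer method.
transfer_advice() {
    local dir=$1
    local size_kb
    # -s: one total for the whole tree, -k: report in 1 KiB units
    size_kb=$(du -sk "$dir" | cut -f1)
    # du prints nothing on stdout if the directory cannot be read at all
    [ -n "$size_kb" ] || return 1
    advise_for_size "$size_kb" "${2:-150}"
}
```

For example, running transfer_advice . from inside the project directory prints either "network" or "disk". Permission denied errors from du still need fixing first, as described above, because unreadable subdirectories are left out of the total.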
Looking for data on the disk station
- change to arsf user on a safe machine
ssh arsf@gridmaster2 # (or su - arsf and don't turn your machine off!)
- find the exact data path on Ops server
ssh arsfdan@thelma.nerc-arsf.ac.uk
# go find the path to the data to download
# these will vary..
cd /mnt/synology1/    # or /mnt/synology2
cd data/Data\ 2014/   # or maybe Data/Data\ 2014/ or similar
cd UK
cd 281-09_GB08-02_Delamere
# check path
pwd
# returns /mnt/synology1/data/Data 2014/UK/281-09_GB08-02_Delamere
#
# check size isn't crazy big
# > 150GB and think about a disk transfer
du -hsc .
# if you get any permission denied errors, fix up permissions then redo the du.
# Not sure how for the disk station
logout
Getting the data
- create a temporary directory in ~/arsf_data/YYYY/flight_data/unpacking/ to download to, e.g.
DOWNDIR=~/arsf_data/`date +%Y`/flight_data/unpacking/download-`date +%Y%m%d%H%M%S`
mkdir "$DOWNDIR"
cd "$DOWNDIR"
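- do the download with rsync, using compression (-z flag). Note there is no trailing / on the PROJECTNAME part of the path below. This is important: without the slash rsync creates the PROJECTNAME directory inside the download directory, whereas with it the contents of PROJECTNAME are copied straight into the download directory.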
- example rsync command follows, with a --dry-run option to prevent any damage occurring on the trial attempt. If it lists the files you'd expect it to transfer, remove the --dry-run option and repeat. If not, check your slashes!
- rsync's --progress flag is also a useful one to include if you'd like to see the progress of your download.
date
rsync --dry-run -avz arsfdan@thelma.nerc-arsf.ac.uk:/PATH/YOU/FOUND/EARLIER/PROJECTNAME . ; date
# date commands are optional, they just let you see when it started and stopped
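To see the effect of the trailing slash without touching thelma, the rule can be tried locally. The sketch below assumes rsync is installed on the local machine and uses throwaway directories created with mktemp; the function name demo_trailing_slash and the directory names are made up for illustration.

```shell
#!/bin/bash
# Local demonstration of rsync's trailing-slash rule (no network access).
# PROJECTNAME is just a placeholder directory name here.
demo_trailing_slash() {
    local work
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src/PROJECTNAME" "$work/without_slash" "$work/with_slash"
    touch "$work/src/PROJECTNAME/file.txt"

    # No trailing slash on the source: the PROJECTNAME directory itself is copied.
    rsync -a "$work/src/PROJECTNAME" "$work/without_slash/"
    # Trailing slash on the source: only the contents of PROJECTNAME are copied.
    rsync -a "$work/src/PROJECTNAME/" "$work/with_slash/"

    # List what ended up in each destination, then tidy up.
    (cd "$work" && find without_slash -type f && find with_slash -type f)
    rm -rf "$work"
}
```

Calling demo_trailing_slash lists without_slash/PROJECTNAME/file.txt followed by with_slash/file.txt, which is the difference the dry run is meant to catch.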
When running rsync, consider putting it inside a while loop so that it restarts automatically after a failure. This is especially useful when there is a high volume of data. Note that the loop reruns rsync after every pause whether or not the previous run succeeded (a rerun after a complete transfer simply finds nothing left to copy), so stop it with Ctrl-C once the download has finished. An example is as follows:
## Note: don't use too low a number in sleep, as this will repeat the command too often and won't make thelma very happy
while true ; do rsync -avz --progress arsfdan@thelma.nerc-arsf.ac.uk:/PATH/YOU/FOUND/EARLIER/PROJECTNAME . ; date ; sleep 1800 ; done
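A variation on the loop above stops by itself once rsync exits successfully, and gives up after a fixed number of attempts. This is a sketch rather than an established ARSF script: the function name retry_until_ok and the attempt limit of 48 in the example are assumptions, and the rsync path is the same placeholder as above.

```shell
#!/bin/bash
# Hypothetical helper: rerun a command until it succeeds, pausing between tries.
# Usage: retry_until_ok MAX_TRIES PAUSE_SECONDS COMMAND [ARGS...]
retry_until_ok() {
    local max_tries=$1
    local pause=$2
    shift 2
    local try=1
    # "until" repeats the body for as long as the command exits non-zero
    until "$@"; do
        if [ "$try" -ge "$max_tries" ]; then
            echo "giving up after $try attempts" >&2
            return 1
        fi
        date
        echo "attempt $try failed, retrying in ${pause}s" >&2
        try=$((try + 1))
        sleep "$pause"
    done
}

# Example (not run here; same placeholder path as above):
# retry_until_ok 48 1800 rsync -avz --progress \
#     arsfdan@thelma.nerc-arsf.ac.uk:/PATH/YOU/FOUND/EARLIER/PROJECTNAME .
```

As with the plain while loop, keep the pause long (1800 seconds in the example) so that repeated failures do not hammer thelma.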
If something goes wrong after an hour or two, don't delete what you've already downloaded. Instead, repeat the rsync command and it'll pick up where it left off.
Spaces
Spaces in the path name need double escaping in rsync, once for the local shell and once for the shell on the remote machine, e.g.
arsfdan@thelma.nerc-arsf.ac.uk:/mnt/synology1/Data/Data\\\ 2014/216-14_GB14-00_Riss_Cal_Hyperspectral
(This is not a typo or a formatting problem: there really are three backslashes.)
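To see why three backslashes are needed, it can help to look at what the local shell actually hands to rsync. In the sketch below printf stands in for rsync, so nothing is transferred; the variable names are illustrative only.

```shell
#!/bin/bash
# The local shell turns \\ into a single backslash and "\ " into a literal
# space, so the program receives one argument containing "Data\ 2014".
# The remaining backslash is what protects the space on the remote side.
as_received=$(printf '%s' /mnt/synology1/Data/Data\\\ 2014/216-14_GB14-00_Riss_Cal_Hyperspectral)
echo "$as_received"

# Single quotes around a single backslash produce the same argument.
single_quoted='/mnt/synology1/Data/Data\ 2014/216-14_GB14-00_Riss_Cal_Hyperspectral'
echo "$single_quoted"
```

Both echo commands print the path with exactly one backslash before the space. Depending on the installed rsync version, the -s (--protect-args) option may offer an alternative to the double escaping; treat that as something to check against the local man page rather than as part of this procedure.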
Unpacking data
Return to Procedures/NewDataArrival
Last modified on Apr 17, 2015, 9:03:54 AM