Version 71 (modified by knpa, 12 years ago)

--

NEODC Data Delivery

This page documents the procedure to follow when delivering data to NEODC.

First, we need some data to send: these should be datasets that are 'completed'.
All sensors must have been delivered and, for projects from 2010 or earlier, rsynced from the workspace. Check with whoever processed each sensor if need be.

  1. If there is a workspace version, move it into workspace/being_archived
  2. Prepare the repository version for archiving:
    1. Make sure everything is present and where it should be! (see Processing/FilenameConventions for the required layout and name formats)
      Things to look out for: Delivery folders, applanix/rinex data, las files, DEMs
      Use proj_tidy.sh to highlight any problems:
      proj_tidy.sh -p <project directory> -c
      
    2. Add a copy of the relevant trac ticket(s); run:
      mkdir -p admin/trac_ticket
      pushd admin/trac_ticket
      wget --recursive --level 1 --convert-links --html-extension http://arsf-dan.nerc.ac.uk/trac/ticket/TICKETNUMBER
      popd
      
    3. Scan the filesystem for any 'bad' things and fix them:
      1. Delete any unnecessary files: unused DEM backups, temp files created by gedit (~ at end of filename), hidden files, duplicates in the lev1 dir, etc.
      2. Find all files/dirs with unusual characters (space, brackets, etc), ignoring the admin/trac_ticket folder:
        find . -path './admin/trac_ticket' -prune -o -regex '.*[^-0-9a-zA-Z/._].*' -print | ~arsf/usr/bin/fix_naughty_chars.py
        
        This will print suggested commands; check them before running any.
    4. Set permissions:
      1. Remove executable bit on all files (except the point cloud filter and the run[aceh] scripts):
        find -type f -not -wholename '*pt_cloud_filter*' -and -not -regex '.*/run[aceh]/.*sh' -and -perm /a=x -exec chmod a-x {} \;
        
      2. Give everyone read permissions (plus execute on directories and on files that already have an execute bit) for the current directory and below:
        chmod -R a+rX .
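
The scan and permission steps above can be double-checked before any tarballs are created. The following is a minimal read-only sketch, not part of the ARSF tooling: the function names are hypothetical, it assumes GNU find (as the commands above already do), and it only lists paths without changing anything.

```shell
#!/bin/sh
# Read-only audit helpers for a prepared project directory.
# All function names are hypothetical; nothing here modifies the filesystem.
# Assumes GNU find (-mindepth and -regex are not POSIX).

# List editor backups (name ends in ~) and hidden files or directories.
list_cleanup_candidates() {
    find "$1" -mindepth 1 \( -name '*~' -o -name '.*' \) -print
}

# List paths containing characters outside [-0-9a-zA-Z/._],
# skipping the admin/trac_ticket folder and everything below it.
list_unusual_names() {
    ( cd "$1" && find . -path ./admin/trac_ticket -prune -o \
        -regex '.*[^-0-9a-zA-Z/._].*' -print )
}

# List anything that is not readable by user, group and other.
list_not_world_readable() {
    find "$1" ! -perm -444 -print
}

# List directories that not everyone can enter.
list_dirs_not_world_searchable() {
    find "$1" -type d ! -perm -111 -print
}
```

For example, running `list_not_world_readable .` from the project directory after the chmod step should print nothing; any path it does print still needs its permissions fixed.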
        
  3. Create the tarballs for NEODC to download:

    If AIMMS or GRIMM data is present, first separate it out and put it into its own tarball(s).
    Use: tar czf <TARBALL NAME> <DIRECTORY TO TARBALL>
    The tarball name should be in the format: GB09_05-2009_278b_Leighton_moss-AIMMS.tar.gz
    Also create an md5sum file for each AIMMS/GRIMM tarball.
    Use: md5sum <TARBALL NAME> > <TARBALL NAME>-MD5SUM.txt
    Then, as the arsf user, run the archiver on the project:
    su - arsf
    ~/arsf_data/archived/qsub_archiver.sh <path to project in repository> <optional additional projects>
    (e.g. ~arsf/arsf_data/2011/flight_data/spain_portugal/EU11_03-2011_142_Jimena/)
    # To run the archiving locally rather than via the grid engine, use:
    ~arsf/usr/bin/archiving_tarballer.sh <path to project>
    
    When complete, the data will be in ~arsf/arsf_data/archived/neodc_transfer_area/staging/. Check it looks OK, then move it up one level (out of staging/) so NEODC can rsync it. Logs are written to ~arsf/arsf_data/archived/archiver_logs/.
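
For the separate AIMMS/GRIMM tarballs described in this step, the tar and md5sum commands can be combined and the checksum verified straight away. This is a minimal sketch assuming GNU md5sum; the function name make_tarball_with_md5 is hypothetical and is not an existing ARSF script.

```shell
#!/bin/sh
# Hypothetical helper: create a gzipped tarball of a directory, write a
# checksum file next to it, then verify the tarball against that checksum.
#   $1: tarball name (e.g. <PROJECT>-AIMMS.tar.gz)
#   $2: directory to put in the tarball
make_tarball_with_md5() {
    tarball=$1
    src_dir=$2

    tar czf "$tarball" "$src_dir" || return 1

    # Same naming as above: <TARBALL NAME>-MD5SUM.txt
    md5sum "$tarball" > "$tarball-MD5SUM.txt" || return 1

    # Re-read the tarball and compare it with the recorded checksum.
    # Must be run from the same directory the checksum was created in.
    md5sum -c "$tarball-MD5SUM.txt"
}
```

Because the checksum file records the tarball path as it was given, run the verification from the same directory, or pass only a bare file name.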

  4. Notify NEODC they can download the data (Current contact is: wendy.garland@…) and record the date in the ticket.
  5. When NEODC confirm they have backed up the data:
    1. Remove the tarball from the transfer area
    2. Move the repository project to the archive disk at: ~arsf/arsf_data/archived/<original path from ~arsf/arsf_data/>
      e.g. mv ~arsf/arsf_data/2008/flight_data/uk/CEH08_01/ ~arsf/arsf_data/archived/2008/flight_data/uk/CEH08_01
      You may need to create parent directories if they don't yet exist.
    3. Create a symlink to the project in its original location. Point the symlink through ~arsf/arsf_data/archived rather than directly to a specific disk.
      e.g. ln -s ~arsf/arsf_data/archived/2008/flight_data/uk/CEH08_01 ~arsf/arsf_data/2008/flight_data/uk/CEH08_01
    4. Note in the ticket that it has been backed up by NEODC and moved to the archive disk.
  6. Final steps (consider waiting about a month first):
    1. If there is a workspace version, delete it from workspace/being_archived.
    2. Close the ticket
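
The move and symlink in step 5 (parts 2 and 3) could be wrapped in one function so that the parent directories are created and the link always points through the archived/ path. This is a sketch under assumptions, not an existing ARSF script: the function name archive_and_link is hypothetical, and it assumes the archive tree lives at <data root>/archived as in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: move a project into the archive tree and leave a
# symlink at its original location.
#   $1: data root (e.g. ~arsf/arsf_data)
#   $2: project path relative to the data root
archive_and_link() {
    data_root=$1
    rel_path=${2%/}                 # drop a single trailing slash, if any
    src="$data_root/$rel_path"
    dest="$data_root/archived/$rel_path"

    # Only act on a real directory that has not already been archived.
    [ -d "$src" ] && [ ! -L "$src" ] || {
        echo "not a real directory: $src" >&2
        return 1
    }
    [ ! -e "$dest" ] || {
        echo "already exists: $dest" >&2
        return 1
    }

    # Create parent directories in the archive if they do not exist yet.
    mkdir -p "$(dirname "$dest")" || return 1

    mv "$src" "$dest" || return 1

    # Link through <data root>/archived, not directly to a specific disk.
    ln -s "$dest" "$src"
}
```

With the example above this would be called as `archive_and_link ~arsf/arsf_data 2008/flight_data/uk/CEH08_01`. The checks make a second run on the same project fail rather than nest a directory inside the archive copy.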