Version 74 (modified by knpa, 12 years ago)


Archiving projects with NEODC

This page documents the procedure to follow when sending data to NEODC.

  1. Choose a project.
    • A project is ready to be archived when all sensors have been delivered.
    • If the project is from 2010 or earlier, it will need to be fully rsynced from the workspace.
    • Check the ticket and make sure there is nothing on there which suggests something still needs to be done with the dataset (ask the processor if need be).
  2. If there is a workspace version, move it into workspace/being_archived
  3. Prepare the repository version for archiving:
    1. Run to highlight any problems: -p <project directory> -c
      Check the output. Delete hidden/temporary/broken files. Fix incorrect file name formats and any other obvious errors. Everything in the delivery should be close to perfect, but don't worry too much about things in the main project.
    2. Make sure that both the main delivery data and the raw data are present. Cast a quick eye over the rest of the file tree.
    3. Remove unwanted large files. The project should have been cleaned up by the processor, but large files that are not needed often remain. In particular, processing/<sensor> sometimes contains duplicates of data that is already included in the delivery. Free up as much space as possible by deleting unwanted large files.
    4. Add a copy of the relevant trac ticket(s); run:
      mkdir -p admin/trac_ticket
      pushd admin/trac_ticket
      wget --recursive --level 1 --convert-links --html-extension
    5. Set permissions:
      1. Remove executable bit on all files (except the point cloud filter and the run[aceh] scripts):
        find -type f -not -wholename '*pt_cloud_filter*' -and -not -regex '.*/run[aceh]/.*sh' -and -perm /a=x -exec chmod a-x {} \;
      2. Give everyone read permissions (and execute if it has user execute) for the current directory and below:
        chmod -R a+rX .
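The manual checks in step 3 (hidden/temporary/broken files and leftover large files) can be partly automated. The following is only a rough sketch, not the project's own checking script: the function name `list_cleanup_candidates`, the temporary-file patterns and the 500M default threshold are all assumptions made for illustration, and it relies on GNU find (`-mindepth`, `-xtype`). It only lists candidates; nothing is deleted.

```shell
#!/bin/sh
# Hypothetical helper (not part of the ARSF tools): list files that usually
# need a look before archiving. Requires GNU find.
# Usage: list_cleanup_candidates <project directory> [size threshold]
list_cleanup_candidates() {
    project_dir=$1
    min_size=${2:-500M}   # assumed default; pick whatever suits the project

    echo "== Hidden or temporary files =="
    # The name patterns are assumptions; extend them as needed.
    find "$project_dir" -mindepth 1 \( -name '.*' -o -name '*~' -o -name '*.tmp' \) -print

    echo "== Broken symlinks =="
    # -xtype l matches symlinks whose target does not exist.
    find "$project_dir" -xtype l -print

    echo "== Files larger than $min_size =="
    find "$project_dir" -type f -size +"$min_size" -print
}
```

For example, `list_cleanup_candidates . 1G` run from the project directory would list hidden files, dangling links and anything over 1 GiB for manual review.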
  4. Create the tarballs for NEODC to download:

    If AIMMS or GRIMM data is present, you will first need to separate it out and put it into its own tarballs.
    The tarball name should follow this format: GB09_05-2009_278b_Leighton_moss-AIMMS.tar.gz
    Also create an md5sum file for each AIMMS/GRIMM tarball.
    Use: md5sum <TARBALL NAME> > <TARBALL NAME>-MD5SUM.txt
    su - arsf
    ~/arsf_data/archived/ <path to project in repository> <optional additional projects>
    (e.g. ~arsf/arsf_data/2011/flight_data/spain_portugal/EU11_03-2011_142_Jimena/)
    # To run the archiving locally rather than via the grid engine, use:
    ~arsf/usr/bin/ <path to project>
    When complete, this will have dumped the data into ~arsf/arsf_data/archived/neodc_transfer_area/staging/. Check it looks OK then move it up one level so NEODC can rsync it. Logs will be in ~arsf/arsf_data/archived/archiver_logs/.
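For the separate AIMMS/GRIMM tarballs described above, the tar and md5sum steps can be wrapped together. This is a minimal sketch under stated assumptions: the function name `make_tarball_with_md5` is hypothetical, the location of the AIMMS/GRIMM data within a project is not specified here, and GNU tar and coreutils md5sum are assumed. The checksum file name follows the `<TARBALL NAME>-MD5SUM.txt` pattern given above.

```shell
#!/bin/sh
# Hypothetical helper: tar up one directory and write a matching md5sum file.
# Usage: make_tarball_with_md5 <source directory> <tarball name>
make_tarball_with_md5() {
    src_dir=${1%/}   # strip a trailing slash so basename/dirname behave
    tarball=$2

    # Store paths relative to the parent so the archive unpacks into a
    # single top-level directory.
    tar -czf "$tarball" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1

    # Same naming pattern as the manual command: <TARBALL NAME>-MD5SUM.txt
    md5sum "$tarball" > "$tarball-MD5SUM.txt" || return 1

    # Read the checksum back and verify it against the tarball.
    md5sum -c "$tarball-MD5SUM.txt"
}
```

Because the checksum file records the tarball path exactly as it was given, `md5sum -c` should later be run from the same directory (or the function called with a bare file name from the directory holding the tarball).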

  5. Notify NEODC that they can download the data (current contact: wendy.garland@…) and record the date in the ticket.
  6. When NEODC confirm they have backed up the data:
    1. Remove the tarball from the transfer area.
    2. Move the repository project to the archive disk at: ~arsf/arsf_data/archived/<original path from ~arsf/arsf_data/>
      e.g. mv ~arsf/arsf_data/2008/flight_data/uk/CEH08_01/ ~arsf/arsf_data/archived/2008/flight_data/uk/CEH08_01
      You may need to create parent directories if they don't yet exist.
    3. Create a symlink to the project in its original location. Point the symlink through ~arsf/arsf_data/archived rather than directly to a specific disk.
      e.g. ln -s ~arsf/arsf_data/archived/2008/flight_data/uk/CEH08_01 ~arsf/arsf_data/2008/flight_data/uk/CEH08_01
    4. Note in the ticket that it has been backed up by NEODC and moved to the archive disk.
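The move-and-symlink steps above can be sketched as a single function. This is an illustration only, not an existing ARSF script: the name `move_to_archive` and its two-argument interface are assumptions. It creates any missing parent directories, moves the project under `archived/`, and leaves a symlink at the original location that points through `archived/` rather than at a specific disk.

```shell
#!/bin/sh
# Hypothetical helper: move a project under <data root>/archived/ and
# replace it with a symlink at its original location.
# Usage: move_to_archive <data root> <project path relative to data root>
#   e.g. move_to_archive ~arsf/arsf_data 2008/flight_data/uk/CEH08_01
move_to_archive() {
    data_root=${1%/}
    rel_path=${2%/}   # a trailing slash would break the symlink step
    src="$data_root/$rel_path"
    dest="$data_root/archived/$rel_path"

    # Refuse to act on something already moved (a symlink) or missing.
    [ -d "$src" ] && [ ! -L "$src" ] || { echo "not a real directory: $src" >&2; return 1; }
    # Refuse to overwrite an existing archived copy.
    [ ! -e "$dest" ] || { echo "already exists: $dest" >&2; return 1; }

    mkdir -p "$(dirname "$dest")" || return 1   # create parents if needed
    mv "$src" "$dest" || return 1
    ln -s "$dest" "$src"
}
```

The two guard checks mean that running it twice on the same project fails cleanly instead of nesting a directory inside the archived copy.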
  7. Final steps - maybe wait a month:
    1. If a workspace version is present, delete it from being_archived.
    2. Close the ticket.