Version 120 (modified by wja, 6 years ago)

--

Archiving projects with CEDA

Creating JSON files

First, create the JSON files for the delivered data; these files must be archived later as well. Make sure all projects have been finalised, then go to the main project directory as the airborne user and run:

create_ceda_json.py -p .

The script will identify the different deliveries under the delivery directory, and the JSON files will be placed under processing/json_files/. The delivery type is taken from the delivery name if it contains one of the keywords: owl, hyperspectral, lidar or camera. Once the files have been created, have a look at the overview image and compare it to the mosaic in the delivery to make sure everything is correct. Also open one JSON file in a text editor to quickly check that all fields are correct. Especially check that the ceda_path looks OK. If the project has already been archived, you can get the link by opening http://data.ceda.ac.uk/ followed by the ceda_path. If the project has not been archived, check that the path looks similar to others from the same period.
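The ceda_path spot-check above can be scripted. The sketch below (an illustration, not part of the real tooling) assumes each JSON file is an object with a top-level "ceda_path" key, as described above; adjust if the real schema nests the field differently.

```python
import json
from pathlib import Path

def list_ceda_paths(json_dir):
    """Return (filename, ceda_path, browse URL) for each JSON file in json_dir.

    Assumes a top-level "ceda_path" key in each JSON object (an
    assumption based on the description above, not the real schema).
    """
    results = []
    for json_file in sorted(Path(json_dir).glob("*.json")):
        data = json.loads(json_file.read_text())
        ceda_path = str(data.get("ceda_path", "<missing>"))
        # Build the link you would open to check an already-archived project.
        url = "http://data.ceda.ac.uk/" + ceda_path.lstrip("/")
        results.append((json_file.name, ceda_path, url))
    return results
```

For example, `list_ceda_paths("processing/json_files")` prints one line per delivery JSON, so you can eyeball all the paths at once instead of opening each file in an editor.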

Note that if you are creating JSONs for a LiDAR delivery that used the old azgcor processing, the projection will need to be input manually using the --projection flag. Look in the ReadMe for the projection used.

If you are running without a GUI you might hit a problem with the matplotlib backend. You can work around this by changing the backend via an environment variable:

export MPLBACKEND=agg

You can also specify the delivery directory with -d (you can pass two separate -d arguments) and choose a different output folder with -o. If you want to create a JSON file for each photograph, use --camera_per_frame. If the overview image does not display correctly (e.g. it is not centred and therefore does not show all the data or labels), re-run the script with the -i option. That creates an interactive plot which lets you check that all data are shown before exiting the script.

You should also make sure that all polygons created are valid. To this end, you can run:

check_jsons_geom.py  -p .

You can also plot any invalid polygons using --plot, or check just a single JSON directory with -j json_dir_path. If any polygons are not valid, you will have to recreate them using the previous script, create_ceda_json.py. Try varying the navigation interval with --nav_interval to get better results, then check again with check_jsons_geom.py to make sure the problem was solved. The bigger the value of nav_interval, the fewer nodes the polygon will have, which can correct errors caused by rapid changes. Also try --keep_full_geom if needed. Create these test JSON files in a different location, and only overwrite the incorrect JSON files with the corrected ones.
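To make "valid polygon" concrete: the sketch below checks two simple aspects of ring validity (closure and minimum vertex count). It is only an illustration; check_jsons_geom.py presumably performs a full geometric validity test (e.g. self-intersection) via a proper geometry library.

```python
def polygon_ring_problems(ring):
    """Flag simple validity problems in a polygon ring.

    ring: list of (x, y) tuples. This covers only ring closure,
    vertex count and repeated consecutive vertices -- a rough
    standalone check, not what check_jsons_geom.py actually runs.
    """
    problems = []
    if len(ring) < 4:
        # A closed ring needs at least 3 distinct points plus the closing point.
        problems.append("fewer than 4 points (3 distinct + closing point)")
    if ring and ring[0] != ring[-1]:
        problems.append("ring is not closed (first point != last point)")
    for a, b in zip(ring, ring[1:]):
        if a == b:
            problems.append(f"repeated consecutive vertex at {a}")
    return problems
```

A well-formed square ring such as `[(0, 0), (1, 0), (1, 1), (0, 0)]` yields no problems; an unclosed or degenerate ring yields a list of human-readable complaints.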

Committing the JSON to the PostGIS database

If all JSON files under processing/json_files look OK, we need to get that data into our database. If this is a new project you can upload the JSON files for all the deliveries by running:

json_into_postgis.py -p .

If you only want to commit a specific type of JSON (camera, owl, lidar or hyperspectral) because the other deliveries have already been committed, you can do so by using the --json_dir argument instead of --project. For example, to add only the camera JSON files run:

json_into_postgis.py -j ./processing/camera_jsons/

If the project has been reprocessed and new JSON files have been created, we do not want duplicates in the database. Running the script with the --reprocessed flag will delete the existing data in the sensor table (if data for that sensor are already present) and commit the new data. For example, if the camera data have been redelivered and new JSON files created, you can run:

json_into_postgis.py -j ./processing/camera_jsons/  --reprocessed
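The --reprocessed behaviour amounts to a delete-then-insert per sensor. The sketch below illustrates that pattern using sqlite and a hypothetical "flightlines" table; the real database is PostGIS and the actual table and column names used by json_into_postgis.py may differ.

```python
import sqlite3

def commit_sensor_rows(conn, sensor, rows, reprocessed=False):
    """Sketch of the --reprocessed delete-then-insert pattern.

    rows: list of (flightline, geom_wkt) tuples for this sensor.
    With reprocessed=True, existing rows for the sensor are deleted
    first, so a redelivery does not create duplicates. Table and
    column names here are illustrative only.
    """
    cur = conn.cursor()
    if reprocessed:
        cur.execute("DELETE FROM flightlines WHERE sensor = ?", (sensor,))
    cur.executemany(
        "INSERT INTO flightlines (sensor, flightline, geom) VALUES (?, ?, ?)",
        [(sensor, fl, geom) for fl, geom in rows])
    conn.commit()
```

Committing the same delivery twice with reprocessed=True leaves exactly one copy of each row, which is the point of the flag.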

You can visualise the PostGIS database using QGIS by following this simple tutorial: Visualise PostGIS (internal-only page)

Archiving

This page documents the procedure to follow when sending data to CEDA.

  1. Choose a project:
    • A project is ready to be archived when all sensors have been delivered.
    • Check the ticket and make sure there is nothing on there which suggests something still needs to be done with the dataset (ask the processor if need be).
  2. Record in ticket that you are beginning to archive this flight.
  3. If there is a workspace version, move it into workspace/being_archived (pre-2011 only)
  4. Make sure deliveries have been finalised using 'finalise_delivered_data.sh' and use the current naming convention of ProjectCode-Day-Sensor-Data.
  5. Prepare the repository version for archiving
    1. Make sure that the delivered sensor data and the raw data (incl. navigation) are present. Cast a quick eye over the rest of the deliveries.
    2. Remove unwanted large files. The project should have been cleaned up by the processor, but often large files remain which are not needed. In particular, there are sometimes duplicates in processing/<sensor> of data which is included in the delivery. Free up as much space as possible by deleting unwanted large files. Don't delete anything in processing/kml_overview.
    3. Add a copy of the relevant trac ticket(s) and zip; run as arsf in the project directory:
      TICKETNUMBER=<ticket number of flight to be archived>
      
      chmod u+w -R admin
      mkdir -p admin/trac_ticket
      pushd admin/trac_ticket
      wget --recursive --level 1 --convert-links --html-extension http://nerc-arf-dan.pml.ac.uk/trac/ticket/$TICKETNUMBER
      zip -r nerc-arf-dan.pml.ac.uk.zip nerc-arf-dan.pml.ac.uk
      rm -fr nerc-arf-dan.pml.ac.uk
      popd
      chmod u-w -R admin
      
    4. Set permissions:
      1. Remove executable bit on all files (except the point cloud filter and the run[aceh] scripts):
        Note - if you are processing 2011 or later, you will need to run the below commands as both arsf and airborne
        find -type f -user `whoami` -not -wholename '*pt_cloud_filter*' -and -not -regex '.*/run[aceh]/.*sh' -and -perm /a=x -exec chmod a-x {} \;
        
      2. Give everyone read permissions (and execute if it has user execute) for the current directory and below:
        find -user `whoami` -exec chmod a+rX {} \;
        
    5. If there are multiple deliveries for a sensor (apart from the APL reprocessing), then put all but the newest version in a subdirectory called previous_deliveries. Make sure the newest version is a complete delivery; fill in missing data with hardlinks using cp -nl if necessary. Ask for help if not sure.
    6. Make sure the delivery directory is in the top level (otherwise use finalise_delivered_data.sh)
      1. If you move the delivery directory the KML overview links need to be updated to the new delivery folder location
      2. Delete the broken symlinks and run make_kmloverview.py
      3. Check the links in the processing/kml_overview folders, or use the link on the wiki to open the Google Earth KML file, and check that the download links work and that there is data under each of the tabs
  6. Upload the data to CEDA using ceda-archive.py. This will upload the data using rsync (see http://www.ceda.ac.uk/help/archiving-with-ceda/sending-data-to-ceda/ for more details on archiving data at CEDA using rsync). If you are not running as airborne and haven't set up a password file you will need to enter your password (listed on the Passwords page).
    ceda-archive.py --year 2012 --jday 324 --sortie c --rsync
    
    • Note: If you don't need to upload all the data (e.g., processed data only), run the command without the --rsync flag; this will create the folder structure and print the rsync commands to be run manually.
  7. If for any reason rsync fails you can also upload the data via FTP (note the general password, not the rsync one, is used) using:
    lftp arsfdan@arrivals.ceda.ac.uk
    > mirror -R -L /tmp/ceda_fEYiiE/ET12_14-2012_324_Boset_raw
    
  8. Notify CEDA that the data has been uploaded (current contact is: wendy.garland@…. cc arsfinternal).
  9. Record in the ticket that it has been uploaded to CEDA and include the date.
  10. When CEDA confirm they have backed up the data:
    1. Note in ticket that it has been backed up by CEDA.
    2. Change status page entry to "archived" for relevant sensors (if pre-2011 HS then only do this if both original and reprocessed HS have been archived).
    3. If workspace version present, delete from being_archived.
    4. Add the flight in the appropriate format and in the appropriate position in ~arsf/archived_flights.txt. This is version controlled in ~arsf/live_git_repos/config_files so needs to be committed after being changed (there is no remote copy so no need to push).
    5. If all sensors have been archived (including reprocessing) then close the ticket. Otherwise note why the ticket is being left open.
  11. Note: if you don't get a confirmation email from CEDA, you can check projects which are out of embargo by seeing if the raw and processed data are available from http://browse.ceda.ac.uk/browse/neodc/arsf. Use the same username and password as for the CEDA FTP.