wiki:Procedures/NewDataArrival

Version 142 (modified by anch, 15 years ago)
--

Arrival of new flight data

This procedure should be followed on receipt of new flight data from ARSF.

If in any doubt about something (e.g. a dataset has two project codes), contact Gary.

Copy the data onto the system

After successfully copying the data onto the system, email arsf-processing to confirm that the data transferred OK (but has not yet been checked).

Create entry on flight data status spreadsheet for each project

Note time of disk arrival
Currently located at ~arsf/docs/flight_data_status/YYYY/flight_data_status_YYYY.xls

Unpacking projects

The next stage can be manual or semi-automated. Even if the script works, go through and check it has correctly done each of the manual steps.

Semi-scripted method

In the directory above the new project directories (e.g. '.../flight_data/unpacking/'), run 'unpack_folder_structure.py'.
NOTE: To run this on just one project (we usually only unpack one at a time), you will need to move that project into a temporary containing folder before running the script. (The script will be changed at some point so that a single project can be specified.)
- By default it runs in a safe dry-run mode and prints the commands it would run to the terminal. Check that these look OK.
- If happy, re-run with --execute (and optionally --verbose).
- Each project directory should then be re-formatted to the current standard.

Non-scripting method

Prune empty directories (check these aren't stubs for later data first) (use rmdir * in the project directory)
Rename any capitalised subdirectories
Move DCALM* files to applanix/raw
Remove any spaces, brackets or other Unix-upsetting characters in filenames
- use find -regex '.*[^-0-9a-zA-Z/._].*' | ~arsf/usr/bin/fix_naughty_chars.py
- gives suggested commands, but check before pasting commands!
Remove executable bit on all files
- use find -type f -exec chmod a-x {} \;
Remove group & other write permission
- chmod go-w . -R
Convert the .doc logsheet to .pdf
- ooffice -invisible "macro:///Standard.Module1.SaveAsPDF(FULL_PATH_TO_FILE)"
Rename 'rinex' to 'basestation' in top directory (if present). Move all basestation data to here, then create a symlink to it from inside both applanix and leica/ipas.
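
The filename and permission steps above can be sketched in Python as follows. This is only an illustration, not the contents of fix_naughty_chars.py: the replacement character '_' and all function names are assumptions. The character set is the one from the find -regex above, minus '/', because it is applied to individual names rather than whole paths.

```python
import os
import re
import shlex
import stat

# Anything outside letters, digits, '-', '.', '_' counts as unsafe in a name.
UNSAFE = re.compile(r"[^-0-9a-zA-Z._]")


def suggest_name(name):
    """Return a suggested safe name (assumption: unsafe characters become '_')."""
    return UNSAFE.sub("_", name)


def print_rename_suggestions(project_dir):
    """Dry run: print one suggested mv command per unsafe name, change nothing."""
    for root, dirs, files in os.walk(project_dir):
        for name in dirs + files:
            if UNSAFE.search(name):
                old = os.path.join(root, name)
                new = os.path.join(root, suggest_name(name))
                print(f"mv -- {shlex.quote(old)} {shlex.quote(new)}")


def tidy_mode(mode, is_dir):
    """Drop group/other write; drop every execute bit on regular files only."""
    mode &= ~(stat.S_IWGRP | stat.S_IWOTH)
    if not is_dir:
        mode &= ~(stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return mode
```

As with the suggested commands from the real script, check the output before pasting: once a directory has been renamed, the suggested paths for anything beneath it are stale.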

Verification

Unpack file check

In each project directory run 'unpack_file_check.py -l <admin/logsheet.doc(.txt)>'

This converts the .doc logsheet to .txt, or uses the .txt if one is available. NOTE: converting the .doc requires the ooffice macro.
It then checks the data against the logsheet as listed below. Information is output to the terminal; important (error) messages are printed again at the end.
- Check file sizes against a 'suitable' size and also against header file (Eagle + Hawk)
- Check number of files against logsheet
- Check number of logsheets
- Check GPS start/stop times in header file (Eagle + Hawk)
- Check .raw, .nav, .log, .hdr for each Eagle + Hawk line
- Check for nav-sync issues (NOTE: this check is based on a false assumption and is probably useless)
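
As an illustration of the per-line file check, a minimal sketch follows. It is not how unpack_file_check.py is implemented. It assumes that all files for one flight line share a base name and differ only in extension; the file names in the usage example are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# The four file types expected for each Eagle/Hawk line.
REQUIRED = {".raw", ".nav", ".log", ".hdr"}


def missing_by_line(filenames):
    """Map each flight-line base name to the required extensions it lacks."""
    seen = defaultdict(set)
    for name in filenames:
        path = Path(name)
        suffix = path.suffix.lower()
        if suffix in REQUIRED:
            seen[path.stem].add(suffix)
    # Keep only the lines that are missing at least one required file.
    return {
        stem: sorted(REQUIRED - exts)
        for stem, exts in seen.items()
        if REQUIRED - exts
    }
```

For example, missing_by_line(["line1.raw", "line1.hdr"]) reports that line1 lacks its .log and .nav files.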

proj_tidy.sh

In the base of the project directory run proj_tidy:
proj_tidy.sh -p ./ -e
This will check that the directory structures look sensible and tell you where they don't. Correct anything significant that gets flagged up.

Final manual checks

Look at the logsheet and verify that we have copies of all relevant data mentioned there.

In some cases, the flight crew may fly two projects back-to-back but enter all the data onto a single logsheet. If so, you may need to split the project directory into two, particularly if there is a large time gap (navigation needs separate processing) or the PIs are different (different delivery addresses/tracking). If you do need to split a project, ensure that both parts have copies of the common files (logsheets, rinex, etc), but that non-common files are not duplicated (i.e. don't include Hawk data for part 1 in part 2). Also note in the ticket what was done, for tracking purposes.

Verify the details on the logsheet (esp. PI) by calling/emailing ARSF-Ops (probably Gary) - the application form and logsheet are not reliable enough, nor do they track any changes in PI over the lifetime of the application.

Check the file sizes of all data files (Eagle, Hawk, ATM, CASI) to make sure none are zero bytes (or obviously broken in some way).
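
The zero-byte part of this check is easy to script. A minimal sketch (the function name is an assumption; it only finds empty files, not files that are broken in other ways):

```python
import os


def zero_byte_files(top):
    """Return sorted paths of regular files under top whose size is 0 bytes."""
    empty = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            # isfile() is False for broken symlinks, so those are skipped.
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```

Run it on the project directory, e.g. zero_byte_files("."), and investigate anything it returns.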

Move to permanent PML data storage

Move to appropriate location in the repository (~arsf/arsf_data/2009/flight_data/...)

Ensure the project directory names conform to the standard, PROJECTCODE-YYYY_JJJxx_SITENAME, e.g. GB07_07-2007_102a_Inverclyde, boresight-2007_198, etc.
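
A directory name can be checked against this pattern with a regular expression. The sketch below is inferred from the two examples only, so its details are assumptions: the sortie letters and the site name are treated as optional (as in boresight-2007_198), and the allowed characters are guessed.

```python
import re

# PROJECTCODE-YYYY_JJJxx_SITENAME, with xx and _SITENAME optional (assumption).
NAME_RE = re.compile(
    r"^(?P<code>[A-Za-z0-9_]+)-"
    r"(?P<year>\d{4})_"
    r"(?P<jday>\d{3})(?P<sortie>[a-z]*)"
    r"(?:_(?P<site>[A-Za-z0-9_]+))?$"
)


def parse_project_dir(name):
    """Return the name's fields as a dict, or None if it does not conform."""
    match = NAME_RE.match(name)
    if not match:
        return None
    fields = match.groupdict()
    # JJJ is a day of year, so it must lie in 001..366.
    if not 1 <= int(fields["jday"]) <= 366:
        return None
    return fields
```

Both example names above parse successfully; a name with no hyphen between project code and year, or with a day number above 366, is rejected.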

Create subsidiary files

If the flight is in the UK, run the DEM-generating script, nextmapdem.sh.

If the flight is outside the UK you'll need to process the LiDAR data first or use the ASTER/SRTM DEM (see http://arsf-dan.nerc.ac.uk/trac/wiki/Processing/SRTMDEMs)

Create the calibration symlink. This is done automatically if the unpacking scripts have been run.

Tickets and tracking

Ticket

Raise a new trac ticket (type 'flight processing') for the new data.

Ticket summary should be of the form BGS07/02, flight day 172/2007, Keyworth
Add short version of scientific purpose to guide processing (check ARSF application in ~arsf/arsf_data/2009/ARSF_Applications)
Note arrival time of data
Set priority of ticket from project grading (try the grades subpages on Projects or hassle ARSF-Ops)
Note any specific comments that might help with processing
Owner should be blank
Verify the sensors that were requested in the application (primary) and those that weren't (secondary). Note in the ticket which these were.
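
The summary line can be built from the flight date, which avoids working out the day of year by hand. A minimal sketch; the function name is hypothetical, and zero-padding the day to three digits (to match the JJJ in directory names) is an assumption.

```python
from datetime import date


def ticket_summary(project_code, flight_date, site):
    """Build a summary such as 'BGS07/02, flight day 172/2007, Keyworth'."""
    jday = flight_date.timetuple().tm_yday  # day of year, 1..366
    return f"{project_code}, flight day {jday:03d}/{flight_date.year}, {site}"
```

For instance, 21 June 2007 is day 172 of that year, so ticket_summary("BGS07/02", date(2007, 6, 21), "Keyworth") gives the example summary above.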

Ticket body should contain:

Data location: ~arsf/arsf_data/2011/flight_data/..... FILL IN

Data arrived from ARSF via SATA disk LETTER OR network transfer on DATE.

Scientific objective: FILL IN FROM APPLICATION (just enough to guide processing choices)

Priority: FILL IN FROM APPLICATION/WIKI PAGE (e.g. alpha-5 low), set ticket priority appropriately

PI: A. N. Other
EUFAR Project ID:

Any other notes..

= Sensors: =
 * Eagle (requested)
 * Hawk  (requested)
 * Leica LIDAR (not requested but flown anyway)
 * RCD (requested)

Status Page

Add details to the processing status page

Update spreadsheet

Make any necessary updates to the spreadsheet (PI, sensors, primary/secondary sensors, etc).

Processing Order

Add to the processing order page (wiki:Status/Order). Put in the correct primary/secondary list.

Vectors

If the site is in the UK, check the requested OS vectors wiki page to see if we have vectors for the site in question. If not, email ARSF-Ops (Gary) and ask for them, specifying the 4 corner points of the area covered in OS BNG grid coordinates (use the generated nextmap DEM to get the range of tiles, then do the conversion). It can take a couple of weeks before we get the vectors.
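
The tile-to-coordinate conversion can be sketched as below. This rests on an assumption that is not stated on this page: that the tiles are named by 10 km OS grid reference (two letters plus two digits, e.g. 'sk53'). If the tile names follow another scheme, this does not apply. The function names are hypothetical.

```python
LETTERS = "ABCDEFGHJKLMNOPQRSTUVWXYZ"  # the OS grid alphabet omits I


def tile_origin(tile):
    """South-west corner (easting, northing) in metres of a 10 km tile name."""
    tile = tile.upper()
    if len(tile) != 4 or not tile[2:].isdigit():
        raise ValueError(f"not a 10 km tile name: {tile!r}")
    l1 = LETTERS.index(tile[0])  # 500 km square letter
    l2 = LETTERS.index(tile[1])  # 100 km square letter
    # Standard letter-pair to 100 km index conversion for the national grid.
    e100 = ((l1 - 2) % 5) * 5 + l2 % 5
    n100 = 19 - (l1 // 5) * 5 - l2 // 5
    easting = e100 * 100_000 + int(tile[2]) * 10_000
    northing = n100 * 100_000 + int(tile[3]) * 10_000
    return easting, northing


def bounding_corners(tiles, tile_size=10_000):
    """Four corners (SW, SE, NE, NW) of the box enclosing all the tiles."""
    origins = [tile_origin(t) for t in tiles]
    min_e = min(e for e, _ in origins)
    min_n = min(n for _, n in origins)
    max_e = max(e for e, _ in origins) + tile_size
    max_n = max(n for _, n in origins) + tile_size
    return [(min_e, min_n), (max_e, min_n), (max_e, max_n), (min_e, max_n)]
```

Pass in the tile names covered by the DEM and send the four returned corner points with the request.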

E-mail PI

Email the PI to inform them that their data has arrived for processing. Sample text:

Fill in the 5 fields: <PI_NAME>, <PROJECT>, <EUFAR ID>, <TICKET_NO>, <DATE_OF_ARRIVAL>
- the date of arrival should be when the disks arrived or when the download began
Cc arsf-processing
Also cc neodc@…
Set reply-to to arsf-processing

subject: ARSF data arrival notification (<PROJECT> [<EUFAR ID>])

Dear <PI_NAME>,

This is a notification that your ARSF data for <PROJECT> [<EUFAR ID>] is at the
ARSF Data Analysis Node for processing (data received from ARSF 
Operations on <DATE_OF_ARRIVAL>).

We aim to deliver as quickly as possible - our current processing 
priority order can be found at http://arsf-dan.nerc.ac.uk/trac/wiki/Status/Order, 
though please note that this queue is subject to change.

You can follow progress at the following webpages:

 http://arsf-dan.nerc.ac.uk/status/status.php
  - general status page

 http://arsf-dan.nerc.ac.uk/trac/ticket/<TICKET_NO>
  - our notes during processing (may be technical)

If you would like any more information, please feel free to contact us at arsf-processing@pml.ac.uk

Regards,
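
Filling in the sample text by hand makes it easy to leave a field unreplaced. A minimal sketch of a safer substitution; the function name is hypothetical, and the field list is the five fields named above. It refuses to return text that still contains any <FIELD>-style marker, including one spelled differently from the list.

```python
import re

FIELDS = ("PI_NAME", "PROJECT", "EUFAR ID", "TICKET_NO", "DATE_OF_ARRIVAL")


def fill_template(text, values):
    """Replace each <FIELD> marker; raise if any marker is left unfilled."""
    for field in FIELDS:
        text = text.replace(f"<{field}>", values[field])
    # Anything still in angle brackets and upper case is an unfilled marker.
    leftover = re.findall(r"<[A-Z_ ]+>", text)
    if leftover:
        raise ValueError(f"unfilled fields: {sorted(set(leftover))}")
    return text
```

Pass the sample text as a string together with a dict holding all five values, and send the result only if no error is raised.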
