Version 5 (modified by knpa, 10 years ago) (diff) |
---|
proj_tidy.sh
Intro
proj_tidy (project tidy) is a tool to keep the arsf flight directories in a neat and standard data structure with correct layout and filenames. It also runs a multitude of checks to flag any problems with the data.
It also serves as our "unpacking script", it prints a list of commands that you should use to convert it from the format we get it from ARSF-OPS to our desired standard.
proj_tidy should be in harmony with the project structure page
If they disagree, go on what the majority of flights have done for that year.
Location
It lives here:
~arsf/usr/bin/proj_tidy.sh
It is modular, and it's modules live here:
~arsf/usr/bin/proj_tidy/
Basic guide to functions
Changes between years
Also present in the module directory, are regex/templates. These are designed to hold filename/folder structure information for each year. This is because we use proj_tidy to archive flights in previous years. We want to make sure these are standard for a given year but we don't care if they change in-between years (which things do). We therefore need a record of past conventions. The regex files are used by proj_tidy, the templates are the same thing but in a form where things like flightDay can be readily inserted to build correct paths/names. I think this was so proj_tidy can give suggested changes so you can copy and paste without having to correct it manually, but this has not yet been implemented. For pre-2011 an older version of proj_tidy is ran (proj_tidy_old.sh), this is because the project structure/filename conventions were originally hardcoded into the script. When we changed to APL in 2011, I decided to rewrite proj_tidy in the new, flexible year format, and kept the old one around for 2010 stuff. If you run the regular proj_tidy on something pre-2011, it will call proj_tidy_old instead.
Improvements/fixes
- SERIOUS BUG - although it's set up to check pre-2011 delivies (as outlined above), it currently can't do this if that flight has been converted to the layout. Since ALL flights have now been converted to the new layout, this needs fixing before we can archive older flights. (dap, ask someone what about the change in 2011, reprocessing, and conversion of structures).
- Currently it determines if raw data for a sensor exists by looking if there is e.g. hyperspectral/fenix/ dir. This should be changed so it actually looks for data files e.g. FENIX*.raw
- Check if we have .hdr file and a .log file for every .raw file, and do something similar for other sensors. So basically check, if we do have a sensor as determined above, that all expected files are present
- Check the database to get number of lines for a sensor, and check there are indeed this number
- If can't find raw data, don't search for files pertaining to those sensors
- Check that the bandset in the hyperspectral header files is what we expect
- Some updates are needed to the regex files so it recognises some newer files.