obtasks/DIR-3098/commit.md
2019-04-29 14:49:56 +01:00

DIR-3098 Extend OBDFCASCRAPE to be able to upload German NCA data

Summary

  • Added archiving: the contents of the artefact folder are compressed

  • Extended archiving to France.

  • Added uploading to S3

  • Flattened the folder structure and removed timestamping of folders within the archive, to make it easier for the ingestion process to handle

  • Added an index of the pages scraped to aid ingestion

  • Changed filenames to ensure there are no collisions; items are prefixed with either ps_, em or ci_.

  • Implemented Amazon SQS message service so that an announcement can be made when a new file is available for ingestion
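
Illustrative sketch

The flow summarised above (flat archive, page index, S3 upload, SQS announcement) could look roughly like the sketch below. This is not the code from this commit: the function names build_archive and publish, the index file name index.json, the message body layout and the way nested paths are joined into flat names are all assumptions made for illustration. The two clients are passed in by the caller and are assumed to behave like the boto3 S3 and SQS clients.

```python
import io
import json
import tarfile
from pathlib import Path


def build_archive(artefact_dir: Path, archive_path: Path, prefix: str) -> list[str]:
    """Compress every file under artefact_dir into one flat tar.gz.

    Hypothetical helper. archive_path is assumed to live outside
    artefact_dir so the archive does not try to include itself.
    """
    names = []
    with tarfile.open(archive_path, "w:gz") as tar:
        files = sorted(p for p in artefact_dir.rglob("*") if p.is_file())
        for path in files:
            # Flatten: join the relative path parts into a single name so
            # files with the same name in different folders stay distinct.
            flat_name = prefix + "_".join(path.relative_to(artefact_dir).parts)
            tar.add(path, arcname=flat_name)
            names.append(flat_name)
        # Store an index of the archived pages to aid ingestion
        # (the JSON format is an assumption).
        index_bytes = json.dumps(names).encode("utf-8")
        info = tarfile.TarInfo("index.json")
        info.size = len(index_bytes)
        tar.addfile(info, io.BytesIO(index_bytes))
    return names


def publish(archive_path: Path, bucket: str, queue_url: str, s3_client, sqs_client) -> str:
    """Upload the archive to S3, then announce it on an SQS queue."""
    key = archive_path.name
    s3_client.upload_file(str(archive_path), bucket, key)
    # Message layout is an assumption; the real consumer may expect other fields.
    body = json.dumps({"bucket": bucket, "key": key})
    sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)
    return key
```

With boto3 installed, the clients would typically be created with boto3.client("s3") and boto3.client("sqs"); the bucket name and queue URL are deployment-specific and are not stated in this summary.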