Note 2019-05-29T11.34.48
As discussed, I've added all the defects linked to the Netherlands scraper to this ticket:
Defect 001 - JSON file data and main page screenshot are missing for CI entity: ABN AMRO Groenbank BV. Attached screenshots NL_Defect_001 and NL_Defect_001a are linked to the above issue.
Defect 002 - Not all 'Category' data is captured in the JSON file for some CI entities. Bank is captured but the rest are ignored. Attached screenshot NL_Defect_002 is linked to the above issue.
Defect 003 - JSON file data and main page screenshots are missing for these PI entities:
detail.jsp?id=4366080d9645e911811b005056b60a9d&locale=en_GB
detail.jsp?id=bf85dc049745e911811b005056b60a9d&locale=en_GB
detail.jsp?id=bf85dc049745e911811b005056b60a9d&locale=en_GB
Attached screenshots NL_Defect_003 and NL_Defect_003a are linked to the above issue.
The entity we currently have on the NCA register website was also not scraped: detail.jsp?id=8d41e2ab5948e311b55a005056b672cf Attached screenshot NL_Defect_003b is linked to the above issue.
Defect 1
The 'CS' ABN Amro Groenbank B V is indexed but not processed
Defect 2
NIBC Bank N.V. has a category with 2 items, but only one item is logged.
Defect 3
Colliding filenames
Solution
The ID query string value is taken from the href link, and the first 8 characters are extracted as a short hash.
This short hash is added to the filename when it is created.
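The approach described above could look roughly like the following. This is a minimal sketch, not the scraper's actual code: the function names short_hash_from_href and build_filename, the underscore-joined naming scheme, and the "Example Entity" name are all assumptions made for illustration. Only the "id" query parameter and the 8-character cut come from this note.

```python
from urllib.parse import urlparse, parse_qs


def short_hash_from_href(href: str, length: int = 8) -> str:
    """Return the first `length` characters of the `id` query parameter.

    Hypothetical helper: the real scraper may read the href differently.
    """
    query = parse_qs(urlparse(href).query)
    ids = query.get("id")
    if not ids:
        raise ValueError(f"no 'id' query parameter in {href!r}")
    return ids[0][:length]


def build_filename(entity_name: str, href: str, extension: str = "json") -> str:
    """Build a filename that includes the short hash.

    The naming scheme (name, underscore, short hash) is an assumption.
    """
    safe_name = "_".join(entity_name.split())
    return f"{safe_name}_{short_hash_from_href(href)}.{extension}"


if __name__ == "__main__":
    href = "detail.jsp?id=4366080d9645e911811b005056b60a9d&locale=en_GB"
    print(build_filename("Example Entity", href))
```

Two IDs that share the same first 8 characters would still produce the same short hash, so a longer prefix or the full ID may be worth considering if collisions reappear.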