DIN-329

As discussed, I've added all the defects linked to the Netherlands scraper to this ticket:

Defect 001 - JSON file data and main page screenshot are missing for CI entity: ABN AMRO Groenbank BV. Attached screenshots NL_Defect_001 and NL_Defect_001a are linked to the above issue.

Defect 002 - Not all 'Category' data is captured in the JSON file for some CIs. 'Bank' is captured but the rest are ignored. Attached screenshot NL_Defect_002 is linked to the above issue.

Defect 003 - JSON file data and main page screenshots are missing for these PI entities:

detail.jsp?id=4366080d9645e911811b005056b60a9d&locale=en_GB

detail.jsp?id=bf85dc049745e911811b005056b60a9d&locale=en_GB

detail.jsp?id=bf85dc049745e911811b005056b60a9d&locale=en_GB

Attached screenshots NL_Defect_003 and NL_Defect_003a are linked to the above issue.

The entity we currently have on the NCA register website was also not scraped: detail.jsp?id=8d41e2ab5948e311b55a005056b672cf

Attached screenshot NL_Defect_003b is linked to the above issue.

Defect 1

The 'CS' ABN AMRO Groenbank B.V. is indexed but not processed.

[
    [
      "Statutory name",
      "ABN AMRO Groenbank B.V."
    ],
    [
      "Trade name",
      "ABN AMRO Groenbank B.V."
    ],

Defect 2

NIBC Bank N.V. has a category with 2 items, but only one item is logged.

[ "Category", "Emittent effecten CSDB, Bank" ]

Defect 3

Colliding filenames

Solution

The ID query string value is taken from the href link, and the first 8 characters are extracted as a short hash.

This short hash is added to the filename when it is created.
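The short-hash step could look roughly like the sketch below. It is an illustration only, not the scraper's actual code: the function names `short_hash_from_href` and `build_filename`, the underscore separator, and the `.json` extension are all assumptions made for the example.

```python
from urllib.parse import urlparse, parse_qs


def short_hash_from_href(href: str, length: int = 8) -> str:
    """Return the first `length` characters of the `id` query string value."""
    query = urlparse(href).query
    values = parse_qs(query).get("id")
    if not values:
        raise ValueError(f"no 'id' query parameter in href: {href!r}")
    return values[0][:length]


def build_filename(base_name: str, href: str, extension: str = "json") -> str:
    """Append the short hash to the base name (hypothetical filename layout)."""
    return f"{base_name}_{short_hash_from_href(href)}.{extension}"


href = "detail.jsp?id=4366080d9645e911811b005056b60a9d&locale=en_GB"
print(short_hash_from_href(href))
print(build_filename("example", href))
```

Because the hash comes from the register's own ID, two entities that share a name still get different filenames.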

Statutory name checking is now implemented. If the statutory name has been used before, the filename uses both the statutory name and the trade name; otherwise it uses just the statutory name.
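A minimal sketch of that check follows, assuming the names already used are tracked in an in-memory set. The function name `choose_base_name`, the `seen_names` set, the space used to join the two names, and the sample entity names are all hypothetical and not taken from the scraper.

```python
def choose_base_name(statutory_name: str, trade_name: str, seen_names: set) -> str:
    """Pick the name used for an entity's output file.

    If the statutory name was already used, combine it with the trade name;
    otherwise use the statutory name alone and remember it.
    """
    if statutory_name in seen_names:
        return f"{statutory_name} {trade_name}"
    seen_names.add(statutory_name)
    return statutory_name


seen = set()
print(choose_base_name("Example Bank N.V.", "Example Bank", seen))
print(choose_base_name("Example Bank N.V.", "Example Direct", seen))
```

In this sketch, two entities that share both the statutory name and the trade name would still collide, so the name check alone does not guarantee unique filenames.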
