
fix: article pages, refactor, and better test data generation

Nathan Tien You requested to merge crawler_pages into master

crawlers:

  • Remove constructor when it is not necessary
  • Move get_or_create_source and get_or_create_periode into static methods
  • Use article.fpage and article.lpage when possible instead of article.page_range
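As a rough illustration of the last point, here is a minimal sketch of preferring `fpage`/`lpage` over `page_range`. The `Article` dataclass and the `resolve_pages` helper are hypothetical stand-ins, not the crawler's actual code; only the attribute names `fpage`, `lpage`, and `page_range` come from this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    """Hypothetical stand-in for the parsed article object."""
    fpage: Optional[str] = None
    lpage: Optional[str] = None
    page_range: Optional[str] = None


def resolve_pages(article: Article) -> Optional[str]:
    """Return a page string, preferring fpage/lpage over page_range.

    Assumption: page_range is only used as a fallback when either
    fpage or lpage is missing.
    """
    if article.fpage and article.lpage:
        return f"{article.fpage}-{article.lpage}"
    return article.page_range
```

For example, `resolve_pages(Article(fpage="12", lpage="34", page_range="12--34"))` uses the explicit first and last pages rather than the free-form range string.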

cleanup_str:

  • Normalize strings using NFKC (Unicode)
  • Remove some characters (\xf7, \r) in strings
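The two steps above can be sketched roughly as follows. This is an assumption-laden illustration, not the function from this merge request: the order of the steps and the exact set of removed characters (here only the two listed above) are guesses.

```python
import unicodedata

# Characters named in the description; the real list may differ.
REMOVED_CHARS = ("\xf7", "\r")


def cleanup_str(text: str) -> str:
    """Normalize to NFKC, then drop the unwanted characters."""
    # NFKC folds compatibility forms, e.g. ligatures and non-breaking spaces.
    normalized = unicodedata.normalize("NFKC", text)
    for char in REMOVED_CHARS:
        normalized = normalized.replace(char, "")
    return normalized
```

With this sketch, a ligature such as `ﬁ` (U+FB01) becomes the two letters `fi`, and a non-breaking space becomes a regular space.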

tests:

  • Update test data
  • Sort test data JSON files by key
  • Allow regenerating the same dataset: use --keep
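Sorting by key keeps regenerated test data stable across runs, so diffs only show real changes. A minimal sketch of such a dump, assuming the standard `json` module is used (the `dump_sorted` helper name and the indentation settings are hypothetical):

```python
import json
from typing import Any


def dump_sorted(data: Any) -> str:
    """Serialize data with keys sorted at every nesting level."""
    # sort_keys applies recursively to nested dictionaries.
    return json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False)
```

Two dictionaries with the same content but different insertion order then serialize to identical text.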
Edited by Nathan Tien You
