fix: article pages, refactor and better test data generation
crawlers:
- Remove constructor when it is not necessary
- Move `get_or_create_source` and `get_or_create_periode` into static methods
- Use `article.fpage` and `article.lpage` instead of `article.page_range` when possible
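A minimal sketch of the page-handling preference. It assumes `fpage` and `lpage` hold the first and last page and that `page_range` is a string such as `"12-34"`. The `Article` dataclass and the `page_bounds` helper are hypothetical illustrations, not code from the crawlers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Article:
    # Hypothetical stand-in for the crawlers' article object; only the three
    # attribute names come from the change description.
    fpage: Optional[str] = None
    lpage: Optional[str] = None
    page_range: Optional[str] = None


def page_bounds(article: Article) -> Tuple[Optional[str], Optional[str]]:
    """Return (first, last) pages, preferring fpage/lpage over page_range."""
    if article.fpage:
        # Assumption: a missing lpage means a single-page article.
        return article.fpage, article.lpage or article.fpage
    if article.page_range:
        # Fallback: assumed "first-last" format, e.g. "12-34" or just "7".
        first, _, last = article.page_range.partition("-")
        first, last = first.strip(), last.strip()
        return first or None, last or first or None
    return None, None
```

With both fields set, `page_bounds(Article(fpage="1", lpage="2", page_range="9-10"))` returns `("1", "2")`, so `page_range` is only parsed when `fpage` is absent.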
`cleanup_str`:
- Normalize strings using Unicode `NFKC` normalization
- Remove some characters (`\xf7`, `\r`) from strings
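The two `cleanup_str` changes could look roughly like this. It is a sketch covering only the steps listed above, assuming normalization runs before character removal; the project's actual function may do more, and `REMOVED_CHARS` is a hypothetical name.

```python
import unicodedata

# Characters dropped after normalization (hypothetical constant name).
REMOVED_CHARS = ("\xf7", "\r")


def cleanup_str(value: str) -> str:
    """Sketch: NFKC-normalize a string, then strip selected characters."""
    # NFKC folds compatibility forms, e.g. the "ﬁ" ligature becomes "fi",
    # full-width letters become ASCII, and a no-break space becomes " ".
    value = unicodedata.normalize("NFKC", value)
    for char in REMOVED_CHARS:
        value = value.replace(char, "")
    return value
```

For example, `cleanup_str("\ufb01le\r\n")` gives `"file\n"`: the ligature is expanded by NFKC and the carriage return is removed.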
tests:
- Update test data
- Sort test data JSON files by key
- Allow regenerating the same dataset: use `--keep`
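Sorting keys at serialization time keeps the key order of the test data JSON files deterministic, so regenerated files differ only where the data changed. A sketch with the standard `json` module; `serialize_test_data` and `write_test_data` are hypothetical names, and the two-space indent and UTF-8 output are assumptions.

```python
import json
from pathlib import Path
from typing import Any


def serialize_test_data(data: Any) -> str:
    """Serialize with keys sorted at every nesting level."""
    return json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


def write_test_data(data: Any, path: Path) -> None:
    """Write the sorted JSON to disk (hypothetical helper)."""
    path.write_text(serialize_test_data(data), encoding="utf-8")
```

Two dicts with the same content but different insertion order serialize to identical text.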
Edited by Nathan Tien You