cmoncrawl.processor.extraction.utils
Functions
|
Applies fc to every value in a dict and returns a dict with the same keys and the transformed values. |
|
Chains transforms together. |
|
Checks if required fields are present in the extracted dict. |
|
Combines a list of dictionaries into one. |
|
Extracts data from tag using extract_dict defining what to extract and how to name it, and extract_transform_dict defining how to transform the extracted data. |
|
Returns a function that takes a bs4 tag and returns the value of the attribute attr_name or None if the attribute doesn't exist. |
|
Returns a function that takes a bs4 tag and returns the first tag that matches the tag_desc. |
|
Returns a function that takes a bs4 tag and returns a list of tags that match the tag_desc. |
|
Returns a function that takes a list of bs4 tags and returns a string with all the text from the tags joined with sep. |
|
Returns text from tag. |
|
Transforms dict using transforms dict. |
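
The dict-oriented helpers described above can be illustrated with a minimal, self-contained sketch. It does not import cmoncrawl, and every function name and signature below (apply_to_values, chain, has_required, merge_dicts, transform_fields) is hypothetical, chosen only for illustration; the library's actual names and behaviour may differ. Two behaviours are assumptions: later dictionaries win on key collisions when merging, and a field counts as present only if its value is not None.

```python
from typing import Any, Callable, Dict, Iterable, List

Transform = Callable[[Any], Any]


def apply_to_values(fc: Transform, data: Dict[str, Any]) -> Dict[str, Any]:
    """Apply fc to every value; keys are left unchanged."""
    return {key: fc(value) for key, value in data.items()}


def chain(*transforms: Transform) -> Transform:
    """Return a function that runs the transforms left to right."""
    def chained(value: Any) -> Any:
        for transform in transforms:
            value = transform(value)
        return value
    return chained


def has_required(extracted: Dict[str, Any], required: Iterable[str]) -> bool:
    """Assumption: a field is present if its key exists with a non-None value."""
    return all(extracted.get(name) is not None for name in required)


def merge_dicts(dicts: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assumption: on a key collision the later dictionary wins."""
    merged: Dict[str, Any] = {}
    for item in dicts:
        merged.update(item)
    return merged


def transform_fields(
    data: Dict[str, Any], transforms: Dict[str, Transform]
) -> Dict[str, Any]:
    """Transform only the keys listed in transforms; pass the rest through."""
    return {
        key: transforms[key](value) if key in transforms else value
        for key, value in data.items()
    }


raw = {"title": "  Hello  ", "views": " 42 "}
cleaned = apply_to_values(str.strip, raw)
typed = transform_fields(raw, {"title": str.strip, "views": chain(str.strip, int)})
merged = merge_dicts([{"a": 1}, {"b": 2}, {"a": 3}])
```

Returning closures from chain keeps each step a plain one-argument callable, so steps can be composed freely and stored in a per-field dictionary such as the one passed to transform_fields.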