Extraction utils
================

The utilies for extraction are defined :py:mod:`cmoncrawl.processor.extraction`.
It provides helper function for both filtering and extraction.


Filtering
---------

- `must_exist_filter``: filter out the ulrs that don't contain css selector

- `must_not_exist_filter`: filter out the ulrs that contain css selector


Extraction
----------

-- `check_required`: Creates a function that checks if all the required fileds are present in the extracted data

-- `chain_transform`: Creates a function that chains multiple transformation function, if any return None, the chain is broken and None is returned.  Especially usefull with soup select etc...

-- `extract_transform`: Creates a function that extracts the data from the soup tag using the css selector and transforms it using your transformation functions.