API Reference¶
Broker¶
-
class
proxybroker.api.
Broker
(queue=None, timeout=8, max_conn=200, max_tries=3, judges=None, providers=None, verify_ssl=False, loop=None, **kwargs)[source]¶ The Broker.
One broker to rule them all, one broker to find them,One broker to bring them all and in the darkness bind them.Parameters: - queue (asyncio.Queue) – (optional) Queue of found/checked proxies
- timeout (int) – (optional) Timeout of a request in seconds
- max_conn (int) – (optional) The maximum number of concurrent checks of proxies
- max_tries (int) – (optional) The maximum number of attempts to check a proxy
- judges (list) – (optional) Urls of pages that show HTTP headers and IP address.
Or
Judge
objects - providers (list) – (optional) Urls of pages where to find proxies.
Or
Provider
objects - verify_ssl (bool) – (optional) Flag indicating whether to check the SSL certificates. Set to True to check ssl certifications
- loop – (optional) asyncio compatible event loop
Deprecated since version 0.2.0: Use
max_conn
andmax_tries
instead ofmax_concurrent_conn
andattempts_conn
.-
find
(*, types=None, data=None, countries=None, post=False, strict=False, dnsbl=None, limit=0, **kwargs)[source]¶ Gather and check proxies from providers or from a passed data.
Parameters: - types (list) – Types (protocols) that need to be check on support by proxy. Supported: HTTP, HTTPS, SOCKS4, SOCKS5, CONNECT:80, CONNECT:25 And levels of anonymity (HTTP only): Transparent, Anonymous, High
- data – (optional) String or list with proxies. Also can be a file-like object supports read() method. Used instead of providers
- countries (list) – (optional) List of ISO country codes where should be located proxies
- post (bool) – (optional) Flag indicating use POST instead of GET for requests when checking proxies
- strict (bool) – (optional) Flag indicating that anonymity levels of types (protocols) supported by a proxy must be equal to the requested types and levels of anonymity. By default, strict mode is off and for a successful check is enough to satisfy any one of the requested types
- dnsbl (list) – (optional) Spam databases for proxy checking. Wiki
- limit (int) – (optional) The maximum number of proxies
Raises: ValueError – If
types
not given.Changed in version 0.2.0: Added:
post
,strict
,dnsbl
. Changed:types
is required.
-
grab
(*, countries=None, limit=0)[source]¶ Gather proxies from the providers without checking.
Parameters: - countries (list) – (optional) List of ISO country codes where should be located proxies
- limit (int) – (optional) The maximum number of proxies
-
serve
(host='127.0.0.1', port=8888, limit=100, **kwargs)[source]¶ Start a local proxy server.
The server distributes incoming requests to a pool of found proxies.
When the server receives an incoming request, it chooses the optimal proxy (based on the percentage of errors and average response time) and passes to it the incoming request.
In addition to the parameters listed below are also accept all the parameters of the
find()
method and passed it to gather proxies to a pool.Parameters: - host (str) – (optional) Host of local proxy server
- port (int) – (optional) Port of local proxy server
- limit (int) – (optional) When will be found a requested number of working
proxies, checking of new proxies will be lazily paused.
Checking will be resumed if all the found proxies will be discarded
in the process of working with them (see
max_error_rate
,max_resp_time
). And will continue until it finds one working proxy and paused again. The default value is 100 - max_tries (int) – (optional) The maximum number of attempts to handle an incoming
request. If not specified, it will use the value specified during
the creation of the
Broker
object. Attempts can be made with different proxies. The default value is 3 - min_req_proxy (int) – (optional) The minimum number of processed requests to estimate the
quality of proxy (in accordance with
max_error_rate
andmax_resp_time
). The default value is 5 - max_error_rate (int) – (optional) The maximum percentage of requests that ended with an error. For example: 0.5 = 50%. If proxy.error_rate exceeds this value, proxy will be removed from the pool. The default value is 0.5
- max_resp_time (int) – (optional) The maximum response time in seconds. If proxy.avg_resp_time exceeds this value, proxy will be removed from the pool. The default value is 8
- prefer_connect (bool) – (optional) Flag that indicates whether to use the CONNECT method if possible. For example: If is set to True and a proxy supports HTTP proto (GET or POST requests) and CONNECT method, the server will try to use CONNECT method and only after that send the original request. The default value is False
- http_allowed_codes (list) – (optional) Acceptable HTTP codes returned by proxy on requests.
If a proxy return code, not included in this list, it will be
considered as a proxy error, not a wrong/unavailable address.
For example, if a proxy will return a
404 Not Found
response - this will be considered as an error of a proxy. Checks only for HTTP protocol, HTTPS not supported at the moment. By default the list is empty and the response code is not verified - backlog (int) – (optional) The maximum number of queued connections passed to listen. The default value is 100
Raises: ValueError – If
limit
is less than or equal to zero. Because a parsing of providers will be endlessNew in version 0.2.0.
Proxy¶
-
class
proxybroker.proxy.
Proxy
(host=None, port=None, types=(), timeout=8, verify_ssl=False)[source]¶ Proxy.
Parameters: - host (str) – IP address of the proxy
- port (int) – Port of the proxy
- types (tuple) – (optional) List of types (protocols) which may be supported by the proxy and which can be checked to work with the proxy
- timeout (int) – (optional) Timeout of a connection and receive a response in seconds
- verify_ssl (bool) – (optional) Flag indicating whether to check the SSL certificates. Set to True to check ssl certifications
Raises: ValueError – If the host not is IP address, or if the port > 65535
-
classmethod
create
(host, *args, **kwargs)[source]¶ Asynchronously create a
Proxy
object.Parameters: - host (str) – A passed host can be a domain or IP address. If the host is a domain, try to resolve it
- *args (str) – (optional) Positional arguments that
Proxy
takes - **kwargs (str) – (optional) Keyword arguments that
Proxy
takes
Returns: Proxy
objectReturn type: proxybroker.Proxy
Raises: - ResolveError – If could not resolve the host
- ValueError – If the port > 65535
-
get_log
()[source]¶ Proxy log.
Returns: The proxy log in format: (negotaitor, msg, runtime) Return type: tuple New in version 0.2.0.
-
avg_resp_time
¶ The average connection/response time.
Return type: float
-
error_rate
¶ Error rate: from 0 to 1.
For example: 0.7 = 70% requests ends with error.
Return type: float New in version 0.2.0.
-
geo
¶ Geo information about IP address of the proxy.
Returns: - Named tuple with fields:
code
- ISO country codename
- Full name of country
Return type: collections.namedtuple Changed in version 0.2.0: In previous versions return a dictionary, now named tuple.
-
is_working
¶ True if the proxy is working, False otherwise.
Return type: bool
-
types
¶ Types (protocols) supported by the proxy.
Where key is type, value is level of anonymity (only for HTTP, for other types level always is None).Available types: HTTP, HTTPS, SOCKS4, SOCKS5, CONNECT:80, CONNECT:25Available levels: Transparent, Anonymous, High.Return type: dict
Provider¶
-
class
proxybroker.providers.
Provider
(url=None, proto=(), max_conn=4, max_tries=3, timeout=20, loop=None)[source]¶ Proxy provider.
Provider - a website that publish free public proxy lists.
Parameters: - url (str) – Url of page where to find proxies
- proto (tuple) – (optional) List of the types (protocols) that may be supported
by proxies returned by the provider. Then used as
Proxy.types
- max_conn (int) – (optional) The maximum number of concurrent connections on the provider
- max_tries (int) – (optional) The maximum number of attempts to receive response
- timeout (int) – (optional) Timeout of a request in seconds
-
proxies
¶ Return all found proxies.
Returns: Set of tuples with proxy hosts, ports and types (protocols) that may be supported (from proto
).- For example:
- {(‘192.168.0.1’, ‘80’, (‘HTTP’, ‘HTTPS’), …)}
Return type: set