# Web

# URL

# Encoding

+ means a space only in application/x-www-form-urlencoded content, such as the query part of a URL

http://www.example.com/path/foo+bar/path?query+name=query+value

In this URL, the parameter name is "query name" (with a space) and the value is "query value" (with a space), but the folder name in the path is literally foo+bar, not foo bar.

%20 is a valid way to encode a space in either of these contexts. So if you need to URL-encode a string for inclusion in part of a URL, it is always safe to replace spaces with %20 and pluses with %2B.

from urllib.parse import quote, quote_plus
quote(' ')  # '%20'; similar to encodeURIComponent in JS, except that '/' is left unescaped by default
quote_plus(' ')  # '+'; prefer quote, since %20 is valid in every part of a URL

from requests.utils import requote_uri
requote_uri("http://www.sample+d.com/?id=123 abc")  # 'http://www.sample+d.com/?id=123%20abc'; similar to encodeURI in JS
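The path/query distinction can be checked with the standard library alone. A minimal sketch, assuming nothing beyond urllib.parse: it splits the example URL, decodes the path with unquote (which leaves + alone) and the query with parse_qs (which applies form decoding).

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit

url = "http://www.example.com/path/foo+bar/path?query+name=query+value"
parts = urlsplit(url)

# In the path, "+" is an ordinary character, so unquote() leaves it alone.
path_segments = [unquote(segment) for segment in parts.path.split("/") if segment]

# In the query, parse_qs() applies form decoding, where "+" means a space.
query = parse_qs(parts.query)

# safe="" escapes every reserved character, including "/" and "+".
encoded = quote("a b+c/d", safe="")

print(path_segments)
print(query)
print(encoded)
```

Encoding spaces as %20 and pluses as %2B round-trips through unquote regardless of which part of the URL the string ends up in.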

# Selenium

  1. XPath usage

    firefox:

    function getElementsByXPath(xpath, parent)
    {
        let results = [];
        let query = document.evaluate(xpath, parent || document,
            null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
        for (let i = 0, length = query.snapshotLength; i < length; ++i) {
            results.push(query.snapshotItem(i));
        }
        return results;
    }
    

    chrome: $x(xpath) in the DevTools console

    chromedriver:

    driver.switch_to.frame(0)  # enter the first iframe
    # search xpath in iframe
    driver.switch_to.parent_frame()  # return to the parent document
    
  2. XPath syntax

    /following-sibling::*: following siblings (add [1] for just the next one)

    /..: parent

    /*: children

    //*: all descendants
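The parent, child and descendant steps can be tried without a browser. A small sketch using the standard library's xml.etree.ElementTree as a stand-in for Selenium: ElementTree implements only a subset of XPath (paths are relative, and there is no following-sibling axis), and the document below is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for illustration.
root = ET.fromstring(
    "<ul>"
    "<li id='a'><b>bold</b></li>"
    "<li id='b'/>"
    "<li id='c'/>"
    "</ul>"
)

# ./* selects the direct children of the context node.
children = [li.get("id") for li in root.findall("./*")]

# .//* selects every descendant, in document order.
descendants = [element.tag for element in root.findall(".//*")]

# /.. steps from a matched node up to its parent.
parents_of_b = [element.get("id") for element in root.findall(".//b/..")]

print(children, descendants, parents_of_b)
```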

# requests

  • auth: user and password either embedded in the URL (http://user:password@hostname) or passed as auth=('user', 'password')
  • data: default content type is application/x-www-form-urlencoded; a custom content type can be set in headers, but multipart/form-data is not supported this way
  • files: content type is multipart/form-data; each value is either the file data alone or a tuple
    • {name: (filename[can be None], fileobj[open() or text], content_type[optional], custom_headers[optional])}
  • json: content type is application/json
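The content types listed above can be observed without sending anything, by preparing a request and reading its headers. A minimal sketch; the URL, field names and the content_type helper are placeholders introduced here.

```python
import requests

URL = "http://example.com/upload"  # placeholder; no request is actually sent


def content_type(**kwargs):
    """Prepare a POST without sending it and return its Content-Type header."""
    prepared = requests.Request("POST", URL, **kwargs).prepare()
    return prepared.headers["Content-Type"]


form_type = content_type(data={"key": "value"})
json_type = content_type(json={"key": "value"})
# Tuple form: (filename, content, content_type).
files_type = content_type(files={"document": ("report.txt", b"hello", "text/plain")})

print(form_type)
print(json_type)
print(files_type)
```

The multipart header also carries a generated boundary, which is one reason to let requests set it rather than writing the header by hand.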

WARNING

To post a file as multipart/form-data with Flask's test_client:

client.post(url, data={'document': open('file_path', 'rb')}, headers={'content-type': 'multipart/form-data'})
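A self-contained variant of the same call, as a sketch: the /upload route and the in-memory file are assumptions made for the example. The file is passed as a (file object, filename) tuple, and content_type is given as a keyword argument instead of a header.

```python
import io

from flask import Flask, request

app = Flask(__name__)


@app.route("/upload", methods=["POST"])  # hypothetical route for the example
def upload():
    uploaded = request.files["document"]
    return f"{uploaded.filename}:{len(uploaded.read())}"


client = app.test_client()
response = client.post(
    "/upload",
    data={"document": (io.BytesIO(b"hello"), "hello.txt")},
    content_type="multipart/form-data",
)
print(response.get_data(as_text=True))
```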

# Api server

WSGI stands for Web Server Gateway Interface, and ASGI stands for Asynchronous Server Gateway Interface. Both specify the interface between a web server and a Python web application or framework; ASGI additionally supports asynchronous applications.
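To make the two interfaces concrete, here is a minimal sketch of one application in each style, driven by hand instead of by a real server. The call_wsgi and call_asgi helpers are hypothetical stand-ins for what a server such as Gunicorn or an ASGI server would do.

```python
import asyncio
from wsgiref.util import setup_testing_defaults


def wsgi_app(environ, start_response):
    # WSGI: a synchronous callable that returns an iterable of bytes.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hi there"]


async def asgi_app(scope, receive, send):
    # ASGI: a coroutine that exchanges event dicts with the server.
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"Hi there"})


def call_wsgi(app):
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid environ
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


def call_asgi(app):
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/", "headers": []}
    asyncio.run(app(scope, receive, send))
    return sent[0]["status"], sent[1]["body"]


print(call_wsgi(wsgi_app))
print(call_asgi(asgi_app))
```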

# Flask

from flask import Flask, Blueprint, request, jsonify

app = Flask(__name__)
# blueprint is used for common prefix
api = Blueprint('serverless_handler', __name__)

@api.route('/', methods=['GET'])  # the rule may omit the leading slash (the blueprint's url_prefix is prepended)
def home():
    return 'Hi there'  # plain text is acceptable

app.register_blueprint(api, url_prefix='/api/webhook')

to start: FLASK_APP=app.py flask run
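The effect of url_prefix can be checked with the test client, without starting a server. A small sketch that repeats the blueprint above and requests the route with and without the prefix.

```python
from flask import Blueprint, Flask

app = Flask(__name__)
api = Blueprint("serverless_handler", __name__)


@api.route("/", methods=["GET"])
def home():
    return "Hi there"


app.register_blueprint(api, url_prefix="/api/webhook")

client = app.test_client()
prefixed = client.get("/api/webhook/")  # served by the blueprint
unprefixed = client.get("/")            # no route registered here
print(prefixed.status_code, unprefixed.status_code)
```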

# Flask + Gunicorn

Run a Gunicorn app in PyCharm

  1. script path => /path/to/env/bin/gunicorn
  2. script parameters => -c python:gunicorn_conf package_path.app:app
  3. initialize logconfig_dict, workers, and timeout in gunicorn_conf.py
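A possible gunicorn_conf.py for step 3, as a sketch: every value is an assumption chosen for illustration. The logging dict defines its own formatters, handlers and loggers so that it is self-consistent and does not depend on Gunicorn's default logging entries.

```python
# gunicorn_conf.py: all values below are assumptions for illustration
import logging.config

workers = 2    # number of worker processes
timeout = 60   # seconds before a silent worker is killed and restarted

logconfig_dict = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s [%(levelname)s] %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "plain"},
    },
    "root": {"level": "INFO", "handlers": ["console"]},
    "loggers": {
        "gunicorn.error": {"level": "INFO", "handlers": ["console"], "propagate": False},
        "gunicorn.access": {"level": "INFO", "handlers": ["console"], "propagate": False},
    },
}

if __name__ == "__main__":
    # Sanity check outside Gunicorn: the dict must be accepted by dictConfig.
    logging.config.dictConfig(logconfig_dict)
```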

# connexion.FlaskApp

Wrapper around flask.Flask; the wrapped Flask instance is exposed as the app attribute.

# Start an app

inside package_path/app.py

if __name__ == '__main__':
    app.run(port=8000)

# request

# app is a connexion.FlaskApp instance; app.app is the underlying flask.Flask
test_client = app.app.test_client()

# query_string appears in url
# json appears in request body
# data accepts a tuple of (file object, filename) as a field value, if uploading files
# pass either json or data, not both
test_client.get('route_only', query_string={}, json={})

# OpenAPI

An API description format for REST APIs. By leveraging JSON Schema, OpenAPI can also validate request and response formats.

# Instability in OpenAPI

OpenAPI 3.0 uses an extended subset of JSON Schema Specification Wright Draft 00 (aka Draft 5) to describe the data formats.

Info on OpenAPI development:

OpenAPI tools: editor

# Instability in JSON Schema

Info on JSON Schema development

JSON Schema 7

Caveats:

  • list-of-dicts check: an empty list and a list of dicts that have the required properties are both valid (the item schema is never applied to an empty list)
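The caveat can be reproduced with the jsonschema package. A minimal sketch: the schema is hypothetical, and minItems is shown as one way to reject the empty list if that is what is wanted.

```python
from jsonschema import Draft7Validator

# Hypothetical schema: a list of objects that must each carry a "name".
item = {"type": "object", "required": ["name"]}
loose = Draft7Validator({"type": "array", "items": item})
strict = Draft7Validator({"type": "array", "items": item, "minItems": 1})

empty_ok_loose = loose.is_valid([])       # no items, so nothing to violate
empty_ok_strict = strict.is_valid([])     # rejected by minItems
missing_name_ok = loose.is_valid([{}])    # item lacks the required property
complete_ok = loose.is_valid([{"name": "x"}])

print(empty_ok_loose, empty_ok_strict, missing_name_ok, complete_ok)
```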

# Parameter Handling

In the OpenAPI 3.x.x spec, the requestBody does not have a name. By default it will be passed in as body. You can optionally provide the x-body-name parameter in your operation (or legacy position within the requestBody schema) to override the name of the parameter that will be passed to your handler function.

/path:
  post:
    requestBody:
      x-body-name: body
      content:
        application/json:
          schema:

# Some examples

# File as in request body

openapi:

requestBody:
    content:
        multipart/form-data:
            schema:
                type: object  # required for swagger to add widget
                properties:
                    document:
                        type: string
                        format: binary

# Json as in multipart/form-data:

openapi:

/test:
    post:
        summary: test
        operationId: api.test
        tags:
            - document
        requestBody:
            content:
                multipart/form-data:
                    encoding:
                        body:
                            contentType: application/json
                    schema:
                        type: object
                        required: [body]
                        properties:
                            body:
                                type: object
                                required:
                                    - fields
                                properties:
                                    fields:
                                        type: array

accepts:

--610267985175094a52c9b65216d3e15c
Content-Disposition: form-data; name="body"

{"fields": []}
--610267985175094a52c9b65216d3e15c--

access:

def test(body):  # see Parameter Handling
    body['body']
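A request body of this shape can be produced on the client side with requests, using the tuple form of files with a filename of None. A sketch, assuming a placeholder host; the request is only prepared, not sent.

```python
import json

import requests

payload = {"fields": []}
prepared = requests.Request(
    "POST",
    "http://example.com/test",  # placeholder host; nothing is sent
    # (filename, content, content_type): a None filename yields a plain part.
    files={"body": (None, json.dumps(payload), "application/json")},
).prepare()

body_text = prepared.body.decode()
print(prepared.headers["Content-Type"])
print(body_text)
```

The part carries its own Content-Type: application/json header, matching the encoding entry in the spec above.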

# sanic

from sanic import Sanic
from sanic.response import text

app = Sanic('MyHelloWorldApp')

@app.get('/')
async def hello_world(request):
    return text('Hello, world.')

to start: sanic path.file:app

# aiohttp

from aiohttp import web

routes = web.RouteTableDef()

@routes.get('/')  # path must start with /
async def hello(request):
    return web.Response(text="Hello, world")  # plain text must be wrapped in web.Response

app = web.Application()
app.add_routes(routes)

if __name__ == '__main__':
    web.run_app(app)

Regarding handler cancellation, see https://github.com/aio-libs/aiohttp/pull/6727/commits

# request

import os
import pathlib

from connexion import AioHttpApp

# assumption: CLIENT_MAX_SIZE (maximum request body size in bytes) is defined elsewhere in the project

def create_app():
    api_path = pathlib.Path(__file__).parent / 'openapi'
    app = AioHttpApp(__name__, arguments={
            "API_VERSION": os.getenv('API_VERSION', 'devel'),
            "GIT_COMMIT_SHA": os.getenv('GIT_COMMIT_SHA', 'devel'),
        },
        specification_dir=api_path.as_posix(),
        server_args={'client_max_size': int(CLIENT_MAX_SIZE)}
    )

    app.add_api('api.yaml', validate_responses=False, base_path='/ai')

    return app

import pytest

# pip install pytest-aiohttp==0.3.0
@pytest.fixture(scope='function')
def test_client(aiohttp_client, loop):
    app = create_app().app
    return loop.run_until_complete(aiohttp_client(app))

# https://docs.aiohttp.org/en/stable/client_reference.html
async def test_x(test_client):
    # data: The data to send in the body of the request. This can be a FormData object or anything that can be passed into FormData, e.g. a dictionary, bytes, or file-like object. (optional)
    # json: Any json compatible python object (optional). json and data parameters could not be used at the same time.
    # params: Mapping, iterable of tuple of key/value pairs or string to be sent as parameters in the query string of the new request. Ignored for subsequent redirected requests
    response = await test_client.get('route_only', params={}, json={})

# Remarks

  • data={'key': 'json_string', 'key2': b'bytes'} => if any value is bytes, the request is automatically sent as multipart/form-data and json_string is converted to a JSON type
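The multipart switch can be observed on aiohttp's FormData directly, which is what a dict passed as data is turned into. A sketch based on the behaviour described in the remark; it may differ between aiohttp versions, and the field names are placeholders.

```python
import warnings

from aiohttp import FormData

text_only = FormData({"key": '{"a": 1}'})

with warnings.catch_warnings():
    # Some aiohttp releases may warn about bytes values; silence that here.
    warnings.simplefilter("ignore")
    with_bytes = FormData({"key": '{"a": 1}', "key2": b"bytes"})

print(text_only.is_multipart)
print(with_bytes.is_multipart)
```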

# Celery

Distributed task queue

To start a worker:

celery -A package.module_with_celery_instance worker --concurrency=1 --loglevel=DEBUG

Remark:

  • celery.autodiscover_tasks(package_list, related_name, force=False) discovers tasks lazily
    • this is enough when the file where the tasks are defined has few imports and its dependencies are easy to resolve
    • otherwise, the package must be imported explicitly in the start file
    • or set force=True
Last Updated: 2/1/2024, 4:22:58 PM