PlaceOS/search-ingest

PostgreSQL Elasticsearch Ingest Service

A small (one might even say 'micro') service that hooks into pg-orm models and generates Elasticsearch indices. search-ingest also exposes a REST API to reindex or backfill the managed models.

Usage

  • Choose the tables to be mirrored in ES by setting SearchIngest::MANAGED_TABLES to an array of (T < PgORM::Base).class
  • Configure the Elastic client through the ELASTIC_HOST and ELASTIC_PORT env vars, or through switches on the command line
  • Configure the PostgreSQL connection through the PG_DATABASE_URL env var

POST /api/v1/reindex[?backfill=true]

Deletes the indices and recreates their mappings. Backfills the indices by default; pass backfill=false to skip the backfill.

POST /api/v1/backfill

Backfills all indices with data from PostgreSQL.

GET /api/v1/healthz

Healthcheck.
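The three endpoints above can be called from any HTTP client. The following is a minimal Python sketch that only builds the requests. The base URL http://localhost:3000 and the helper names are assumptions for illustration, so substitute whatever PLACE_SEARCH_INGEST_HOST and PLACE_SEARCH_INGEST_PORT resolve to in your deployment.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL for illustration; use your own host and port.
BASE_URL = "http://localhost:3000"


def reindex_request(backfill: bool = True, base_url: str = BASE_URL) -> Request:
    """Build POST /api/v1/reindex, passing the backfill flag explicitly."""
    query = urlencode({"backfill": str(backfill).lower()})
    return Request(f"{base_url}/api/v1/reindex?{query}", method="POST")


def backfill_request(base_url: str = BASE_URL) -> Request:
    """Build POST /api/v1/backfill."""
    return Request(f"{base_url}/api/v1/backfill", method="POST")


def healthz_request(base_url: str = BASE_URL) -> Request:
    """Build GET /api/v1/healthz."""
    return Request(f"{base_url}/api/v1/healthz", method="GET")
```

Passing any of these to urllib.request.urlopen sends the request. Nothing is sent until then, which keeps the sketch safe to run without a live service.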

Index Schema

  • Each PostgreSQL table receives an ES index, with a mapping generated from the attributes of a PgORM model.
  • PgORM attributes can accept a tag es_type to specify the correct field datatype for the index schema.
  • belongs_to associations are modeled with the ES join datatype, and associated documents are replicated in their parent's index. This replication is necessary for has_parent and has_child queries, which only match documents within a single index.
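To make the join modeling above concrete, here is a hedged Python sketch of what a parent index mapping, a replicated child document, and a has_parent query can look like in Elasticsearch. The zone/system pair and the field name "join" are hypothetical choices for this example and are not taken from the service's generated schema.

```python
# Hypothetical parent/child pair; real names come from your PgORM models.
PARENT = "zone"
CHILD = "system"


def parent_index_mapping(properties: dict) -> dict:
    """Return an index body whose mapping includes an ES join field."""
    return {
        "mappings": {
            "properties": {
                **properties,
                # The field name "join" is an assumption for this sketch.
                "join": {"type": "join", "relations": {PARENT: [CHILD]}},
            }
        }
    }


def child_document(doc: dict, parent_id: str) -> dict:
    """Attach the join metadata that marks doc as a child of parent_id."""
    return {**doc, "join": {"name": CHILD, "parent": parent_id}}


def has_parent_query(parent_filter: dict) -> dict:
    """Build a query for children whose parent matches parent_filter."""
    return {
        "query": {
            "has_parent": {"parent_type": PARENT, "query": parent_filter}
        }
    }
```

Elasticsearch requires a child document to be indexed on the same shard as its parent, so child documents are written with a routing value equal to the parent's id.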

PostgreSQL Mirroring

SearchIngest::TableManager hooks into the changefeed of a table, resolves associations of the model and creates/updates documents in the appropriate ES indices.
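The flow described above (receive a change, resolve the association, write to each affected index) can be sketched as follows. This is an illustrative Python model, not the service's implementation: the Change type, its field names, and the event names are all hypothetical, and the output uses the Elasticsearch bulk API's newline-delimited JSON format.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """Hypothetical changefeed event; not the service's actual type."""
    event: str                          # "created", "updated" or "deleted"
    table: str                          # source table, used as the index name
    id: str
    document: dict
    parent_table: Optional[str] = None  # set when the model has a belongs_to
    parent_id: Optional[str] = None


def bulk_lines(change: Change) -> list:
    """Translate one change into Elasticsearch bulk API lines."""
    # The document always goes to its own index. A child is also
    # replicated into its parent's index, routed by the parent id.
    targets = [(change.table, None)]
    if change.parent_table and change.parent_id:
        targets.append((change.parent_table, change.parent_id))

    lines = []
    for index, routing in targets:
        meta = {"_index": index, "_id": change.id}
        if routing is not None:
            meta["routing"] = routing
        if change.event == "deleted":
            lines.append(json.dumps({"delete": meta}))
        else:
            # "index" creates the document or overwrites an existing one.
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps(change.document))
    return lines
```

Joining the returned lines with newlines (plus a trailing newline) yields a body suitable for a bulk request. With ES_DISABLE_BULK set, the equivalent would be one request per action.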

Configuration

  • ENV: A value of production lowers log verbosity
  • ES_HOST: Elasticsearch host
  • ES_PORT: Elasticsearch port
  • ES_TLS: Use HTTPS when connecting to Elasticsearch, defaults to false
  • ES_URI: Elasticsearch URI; whether to use TLS is detected from the URI scheme
  • ES_DISABLE_BULK: Use single requests to Elasticsearch instead of the bulk API. Defaults to false
  • ES_CONN_POOL_TIMEOUT: Timeout when checking a connection out of the Elasticsearch connection pool
  • ES_CONN_POOL: Size of the Elasticsearch connection pool
  • ES_IDLE_POOL: Maximum number of idle connections in the Elasticsearch connection pool
  • UDP_LOG_HOST: Host to send JSON-formatted logs to
  • UDP_LOG_PORT: Port that the UDP log input service is listening on
  • PG_DATABASE: DB to mirror to Elasticsearch, defaults to "test"
  • PG_HOST: Host of PostgreSQL, defaults to localhost
  • PG_PORT: Port of PostgreSQL, defaults to 5432
  • PG_USER: PostgreSQL database user, defaults to postgres
  • PG_PWD: PostgreSQL database password, defaults to ""
  • PLACE_SEARCH_INGEST_HOST: Host to bind server to
  • PLACE_SEARCH_INGEST_PORT: Port for server to listen on
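As an illustration of how these variables could combine, the Python sketch below resolves Elasticsearch settings from ES_URI (detecting TLS from the scheme) or from ES_HOST, ES_PORT and ES_TLS, and assembles a PostgreSQL URL from the PG_* variables with the defaults listed above. The precedence of ES_URI over the individual variables, the Elasticsearch fallbacks (localhost, 9200, 443), and the function names are assumptions, not documented behavior.

```python
from urllib.parse import quote, urlsplit


def elastic_settings(env: dict) -> dict:
    """Resolve Elasticsearch host, port and TLS from ES_* variables."""
    uri = env.get("ES_URI")
    if uri:
        parts = urlsplit(uri)
        tls = parts.scheme == "https"
        # Fallback ports are assumptions: 443 for https, 9200 otherwise.
        port = parts.port or (443 if tls else 9200)
        return {"host": parts.hostname, "port": port, "tls": tls}
    return {
        # "localhost" and 9200 are assumed defaults, not documented ones.
        "host": env.get("ES_HOST", "localhost"),
        "port": int(env.get("ES_PORT", "9200")),
        "tls": env.get("ES_TLS", "false").lower() == "true",
    }


def postgres_url(env: dict) -> str:
    """Assemble a connection URL from PG_* variables and their defaults."""
    user = quote(env.get("PG_USER", "postgres"), safe="")
    password = quote(env.get("PG_PWD", ""), safe="")
    host = env.get("PG_HOST", "localhost")
    port = env.get("PG_PORT", "5432")
    database = env.get("PG_DATABASE", "test")
    credentials = f"{user}:{password}" if password else user
    return f"postgresql://{credentials}@{host}:{port}/{database}"
```

Both functions take a plain mapping, so they can be called with os.environ in practice or with a literal dict when experimenting.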

Contributing

See CONTRIBUTING.md.
