A Rust implementation of skywatch-phash

Cache Module

Purpose

This module provides a Redis-backed cache for the perceptual hashes (phashes) of image blobs. It prevents redundant work: an image that has been seen before does not need to be downloaded and processed again. Because CIDs are content-addressed, identical image bytes always produce the same CID, which makes the CID an ideal cache key.

Key Components

mod.rs

  • PhashCache: A cloneable struct that wraps a multiplexed Redis connection for concurrent access.
    • It stores phash strings, using the image blob's CID as the key.
    • The cache can be globally enabled or disabled via the CACHE_ENABLED environment variable.
    • Cached entries are automatically evicted after a configurable Time-to-Live (TTL).
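
The behaviour described above (CID keys, a global on/off switch, TTL-based expiry) can be illustrated with a small in-memory stand-in. This is a sketch only: the real PhashCache talks to Redis, which can expire keys server-side, and the name InMemoryPhashCache, its fields, and the choice to treat a disabled cache as a no-op are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory stand-in for the Redis-backed cache.
pub struct InMemoryPhashCache {
    enabled: bool, // mirrors CACHE_ENABLED
    ttl: Duration, // mirrors CACHE_TTL_SECONDS
    entries: HashMap<String, (String, Instant)>, // cid -> (phash, expiry time)
}

impl InMemoryPhashCache {
    pub fn new(enabled: bool, ttl: Duration) -> Self {
        Self { enabled, ttl, entries: HashMap::new() }
    }

    /// Store a phash under its blob CID.
    /// Assumption: when the cache is disabled, this does nothing.
    pub fn set(&mut self, cid: &str, phash: &str) {
        if !self.enabled {
            return;
        }
        let expires_at = Instant::now() + self.ttl;
        self.entries
            .insert(cid.to_string(), (phash.to_string(), expires_at));
    }

    /// Return the phash if it is present and has not yet expired.
    /// Assumption: when the cache is disabled, every lookup is a miss.
    pub fn get(&mut self, cid: &str) -> Option<String> {
        if !self.enabled {
            return None;
        }
        let now = Instant::now();
        let fresh = match self.entries.get(cid) {
            Some((phash, expires_at)) if now < *expires_at => Some(phash.clone()),
            _ => None,
        };
        if fresh.is_none() {
            // Drop an expired entry; this is a no-op if the key was absent.
            self.entries.remove(cid);
        }
        fresh
    }
}
```

With a TTL of 86400 seconds, a value stored by set is returned by get until it expires; with the cache disabled, both calls fall through and the caller always recomputes.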

Key Methods

  • new(config): Establishes a connection to the Redis server specified in the configuration.
  • get(cid): Retrieves the cached phash for a given blob CID.
  • set(cid, phash): Stores a phash in the cache with the configured TTL.
  • get_or_compute(cid, compute_fn): A helper that implements the cache-aside pattern. It first looks up the phash in the cache. On a miss, it awaits the provided async compute_fn, stores the result in the cache, and returns it.

Cache Flow

The get_or_compute method simplifies the logic in the processing worker:

  1. A worker needs the phash for a blob CID.
  2. It calls cache.get_or_compute(cid, async { ... }).
  3. If Cache Hit: The cached phash is returned immediately (~1ms).
  4. If Cache Miss: The async block is executed. This block downloads the image, computes the phash, and returns it. The get_or_compute method then automatically calls set to store the new phash before returning it to the worker.

This pattern ensures that downloading and processing happen only on a cache miss.
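
The flow above can be sketched with the standard library alone. Everything here is illustrative rather than the module's actual code: SketchCache is a hypothetical in-memory substitute for the Redis connection, the signature of get_or_compute is a guess based on the description, and the decision not to cache a failed computation is an assumption. The small block_on helper exists only so the sketch runs without an async runtime; the real module runs on Tokio.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical in-memory stand-in for the Redis-backed PhashCache.
#[derive(Clone, Default)]
pub struct SketchCache {
    entries: Arc<Mutex<HashMap<String, String>>>,
}

impl SketchCache {
    pub async fn get(&self, cid: &str) -> Option<String> {
        self.entries.lock().unwrap().get(cid).cloned()
    }

    pub async fn set(&self, cid: &str, phash: &str) {
        self.entries
            .lock()
            .unwrap()
            .insert(cid.to_string(), phash.to_string());
    }

    /// Cache-aside: return the cached phash, or await `compute`,
    /// store its result, and return it.
    pub async fn get_or_compute<Fut, E>(&self, cid: &str, compute: Fut) -> Result<String, E>
    where
        Fut: Future<Output = Result<String, E>>,
    {
        if let Some(hit) = self.get(cid).await {
            // Hit: `compute` is dropped without ever being polled.
            return Ok(hit);
        }
        // Miss: the download and hashing happen here.
        // Assumption: an error is returned as-is and nothing is cached.
        let phash = compute.await?;
        self.set(cid, &phash).await;
        Ok(phash)
    }
}

/// Waker that does nothing; enough for futures that never wait on I/O.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor that polls one future in a loop until it completes.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Because Rust futures are lazy, passing an async block as the second argument costs nothing on a hit: the block is only executed when get_or_compute awaits it on the miss path.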

Configuration

  • CACHE_ENABLED: (Default: true) A boolean to enable or disable the cache entirely.
  • CACHE_TTL_SECONDS: (Default: 86400) The expiration time for cache entries, in seconds.
  • REDIS_URL: The connection string for the Redis server.
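
As a rough sketch of how these variables and their defaults might be read, the example below parses them from an arbitrary key lookup so it can be exercised without touching the process environment. CacheSettings and its methods are hypothetical names, not the crate's actual Config type. Falling back to the default on an unparsable value, accepting only "true" and "false" for the boolean, and treating REDIS_URL as optional are all assumptions.

```rust
use std::time::Duration;

/// Hypothetical holder for the three cache-related settings.
#[derive(Debug, Clone, PartialEq)]
pub struct CacheSettings {
    pub enabled: bool,
    pub ttl: Duration,
    /// No default is documented for REDIS_URL, so it stays optional here.
    pub redis_url: Option<String>,
}

impl CacheSettings {
    /// Build settings from any key -> value lookup.
    pub fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Self {
        // Assumption: missing or unparsable values fall back to the defaults.
        let enabled = lookup("CACHE_ENABLED")
            .and_then(|v| v.trim().parse::<bool>().ok())
            .unwrap_or(true);
        let ttl_secs = lookup("CACHE_TTL_SECONDS")
            .and_then(|v| v.trim().parse::<u64>().ok())
            .unwrap_or(86_400);
        Self {
            enabled,
            ttl: Duration::from_secs(ttl_secs),
            redis_url: lookup("REDIS_URL"),
        }
    }

    /// Read the same keys from the process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```

Calling CacheSettings::from_env() with none of the variables set yields an enabled cache with a TTL of 86400 seconds (24 hours) and no Redis URL.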

Dependencies

  • redis: The Redis client crate, used here through its asynchronous (Tokio) API.
  • crate::config::Config: Provides the cache and Redis settings.