This commit is contained in:
Gilles Soulier
2026-01-14 21:54:55 +01:00
parent c91c0f1fc9
commit d0b73b9319
140 changed files with 5822 additions and 161 deletions

BIN
.coverage Executable file → Normal file

Binary file not shown.

21
.env Normal file
View File

@@ -0,0 +1,21 @@
# Database
PW_DB_HOST=localhost
PW_DB_PORT=5432
PW_DB_DATABASE=pricewatch
PW_DB_USER=pricewatch
PW_DB_PASSWORD=pricewatch
# Redis
PW_REDIS_HOST=localhost
PW_REDIS_PORT=6379
PW_REDIS_DB=0
# App
PW_DEBUG=false
PW_WORKER_TIMEOUT=300
PW_WORKER_CONCURRENCY=2
PW_ENABLE_DB=true
PW_ENABLE_WORKER=true
# API
PW_API_TOKEN=change_me

3
.env.example Executable file → Normal file
View File

@@ -16,3 +16,6 @@ PW_WORKER_TIMEOUT=300
PW_WORKER_CONCURRENCY=2
PW_ENABLE_DB=true
PW_ENABLE_WORKER=true
# API
PW_API_TOKEN=change_me

0
.gitignore vendored Executable file → Normal file
View File

CHANGELOG.md
View File

@@ -8,10 +8,10 @@ Le format est basé sur [Keep a Changelog](https://keepachangelog.com/fr/1.0.0/)
## [Unreleased]
**Last updated**: 2026-01-15
### In progress
- Phase 2: PostgreSQL database
- Phase 2: Redis/RQ worker
- Phase 3: FastAPI REST API
- Phase 3: FastAPI REST API (filters/exports/webhooks)
- Phase 4: Web UI
### Added
@@ -26,6 +26,38 @@ Le format est basé sur [Keep a Changelog](https://keepachangelog.com/fr/1.0.0/)
- Repository/pipeline tests (SQLite)
- End-to-end CLI + DB test (SQLite)
- RQ worker + scheduler (tasks + CLI)
- Worker/scheduler tests (SQLite + mocks)
- CLI worker/enqueue/schedule tests + DB error case (SQLite)
- Redis error handling (RedisUnavailableError, check_redis_connection)
- Clear error messages in the CLI when Redis is down (worker, enqueue, schedule)
- 7 new tests for Redis error handling
- Observability logs for scheduled jobs (JOB START/OK/FAILED, FETCH, PARSE)
- End-to-end worker + DB tests (Redis/SQLite, skipped if Redis is down)
- End-to-end CLI -> DB -> worker test (Redis, skipped if Redis is down)
- JSON -> DB migration guide
- FastAPI API (health/products/prices/logs/enqueue/schedule) + token auth
- API Docker image + uvicorn
- Basic API tests
- Docker Compose API: port 8001 and postgres/redis hosts
- API CRUD (products/prices/logs)
- Advanced API filters (price, dates, stock, status)
- API CSV/JSON exports (products, prices, logs)
- API webhooks (CRUD + test)
- `--no-db` compatibility tests (CLI)
- Light load test with 100 snapshots (SQLite)
- Warning cleanup (Pydantic ConfigDict, UTC datetimes, SoupSieve selectors)
- Vue 3 Web UI (dense layout, themes, settings) + Docker Compose frontend
- Web UI: API integration (product list, editing, enqueue, API settings)
- API: scraping preview/commit endpoints for adding a product from the UI
- Web UI: add product by URL with scraping preview and save to the database
- Web UI: centered add-product popup + favicon
- API: Uvicorn logs exposed to the UI
- Price parsing: thousands-separator handling (space, NBSP, dot)
- API/DB: description + msrp + images/specs exposed, computed discount
### Fixed
- Alembic migration: down_revision aligned with 20260114_02
- Amazon: image extraction via data-a-dynamic-image + logo filtering
---

33
Dockerfile Normal file
View File

@@ -0,0 +1,33 @@
FROM python:3.12-slim
WORKDIR /app
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
RUN mkdir -p /app/logs
COPY pyproject.toml README.md alembic.ini ./
COPY pricewatch ./pricewatch
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
libglib2.0-0 \
libgbm1 \
libnss3 \
libatk1.0-0 \
libatk-bridge2.0-0 \
libgtk-3-0 \
libxkbcommon0 \
libxcomposite1 \
libxrandr2 \
libxinerama1 \
libasound2 \
libpangocairo-1.0-0 \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir -e .
EXPOSE 8000
CMD ["sh", "-c", "uvicorn pricewatch.app.api.main:app --host 0.0.0.0 --port 8000 2>&1 | tee /app/logs/uvicorn.log"]

BIN
Image collée (2).png Executable file

Binary file not shown (136 KiB).

BIN
Image collée (3).png Executable file

Binary file not shown (18 KiB).

BIN
Image collée (4).png Executable file

Binary file not shown (18 KiB).

BIN
Image collée.png Executable file

Binary file not shown (67 KiB).

83
MIGRATION_GUIDE.md Normal file
View File

@@ -0,0 +1,83 @@
# Migrating JSON -> Database (Phase 2)
Guide for migrating existing Phase 1 JSON results to PostgreSQL (Phase 2).
## Prerequisites
- PostgreSQL + Redis up and running
- Dependencies installed (`pip install -e .`)
- DB migrations applied (`alembic upgrade head`)
## 1) Check the configuration
Copy the example file and adjust the credentials if needed:
```bash
cp .env.example .env
```
Check the configuration:
```bash
pricewatch doctor
```
## 2) Initialize the database
If the database has not been initialized yet:
```bash
pricewatch upgrade
```
Check the tables:
```bash
psql -h localhost -U pricewatch pricewatch
\dt
```
## 3) Migrate an existing JSON file
The Phase 1 JSON already matches the `ProductSnapshot` schema. Simply reload it and run it back through the persistence layer.
### Option A: Quick script
Create a small ad-hoc script (example):
```python
from pricewatch.app.core.io import read_json_results
from pricewatch.app.scraping.pipeline import ScrapingPipeline

snapshots = read_json_results("scraped_store.json")
pipeline = ScrapingPipeline()
for snapshot in snapshots:
    pipeline.process_snapshot(snapshot, save_to_db=True)
```
Run it:
```bash
python migrate_json.py
```
### Option B: Enqueue via the worker
If you prefer to process the snapshots through the worker, loop over them and enqueue `scrape_product` with each snapshot's URL, then let the worker run. This guarantees a full refresh (fetch + parse + DB) instead of inserting only the JSON. A sketch of such a loop follows.
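A minimal sketch of that loop, illustrative only and not a script shipped with the project. It reuses helpers that appear elsewhere in this commit (`read_json_results`, `get_config`, `ScrapingScheduler.enqueue_immediate`, `RedisUnavailableError`); the exact keyword arguments may differ from the real code.

```python
# Sketch: enqueue one refresh job per snapshot found in a Phase 1 JSON file.
from pricewatch.app.core.config import get_config
from pricewatch.app.core.io import read_json_results
from pricewatch.app.tasks.scheduler import RedisUnavailableError, ScrapingScheduler

snapshots = read_json_results("scraped_store.json")
scheduler = ScrapingScheduler(get_config())

for snapshot in snapshots:
    try:
        # Each job re-fetches and re-parses the URL, then persists to the DB.
        job = scheduler.enqueue_immediate(str(snapshot.url), save_db=True)
        print(f"enqueued {job.id} for {snapshot.url}")
    except RedisUnavailableError as exc:
        print(f"Redis unavailable, stopping: {exc}")
        break
```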
## 4) Verify the data
```bash
psql -h localhost -U pricewatch pricewatch
SELECT COUNT(*) FROM products;
SELECT COUNT(*) FROM price_history;
SELECT COUNT(*) FROM scraping_logs;
```
## 5) Important notes
- If `reference` is missing, product persistence is skipped, but a `ScrapingLog` entry is still created.
- The `(source, reference)` unique constraint prevents duplicates (see the sketch after these notes).
- Images/specs are synchronized by add/upsert (no automatic deletion).
- On a DB error, the snapshot is kept and a note is added to `snapshot.debug.notes`.
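For reference, here is a minimal sketch of the get-or-create pattern keyed on `(source, reference)` that these notes describe. It only illustrates the idea; the real logic lives in the repository/pipeline and may differ, and `upsert_product` is a hypothetical helper name.

```python
# Illustrative sketch only: upsert a Product keyed on (source, reference).
from sqlalchemy.orm import Session

from pricewatch.app.db.models import Product


def upsert_product(session: Session, source: str, reference: str, **fields) -> Product:
    product = (
        session.query(Product)
        .filter(Product.source == source, Product.reference == reference)
        .one_or_none()
    )
    if product is None:
        # The (source, reference) unique constraint guards against duplicates.
        product = Product(source=source, reference=reference, **fields)
        session.add(product)
    else:
        # Existing rows are updated in place rather than duplicated.
        for key, value in fields.items():
            setattr(product, key, value)
    session.commit()
    return product
```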

0
PHASE_1_COMPLETE.md Executable file → Normal file
View File

118
PHASE_2_PROGRESS.md Executable file → Normal file
View File

@@ -8,17 +8,28 @@
## 📊 Overview
### Recent updates
- Alembic migration fixed (down_revision set to 20260114_02)
- Improved Amazon image extraction (data-a-dynamic-image + logo filtering)
- New validation scrape (Amazon ASUS A16 URL)
### Next actions
- Verify that images, description, specs, msrp and discount display correctly in the Web UI
- Confirm that the add-product popup shows all the preview data
### Phase 2 goals
- ✅ Centralized configuration (database, Redis, app)
- ✅ SQLAlchemy ORM models (5 tables)
- ✅ Database connection (init_db, get_session)
- ✅ Alembic migrations
- Repository pattern (CRUD)
- RQ worker for asynchronous scraping
- Scheduler for recurring jobs
- ✅ Extended CLI (DB commands)
- Repository pattern (CRUD)
- RQ worker for asynchronous scraping
- Scheduler for recurring jobs
- ✅ Extended CLI (DB + worker commands)
- ✅ Docker Compose (PostgreSQL + Redis)
- ⏳ Full test suite
- ✅ Redis error handling
- ✅ Job observability logs
- ⏳ End-to-end tests (Week 4)
---
@@ -226,7 +237,7 @@ PW_ENABLE_WORKER=true
---
## 📦 Week 2: Repository & Pipeline (IN PROGRESS)
## 📦 Week 2: Repository & Pipeline (DONE)
### Planned tasks
@@ -279,7 +290,7 @@ PW_ENABLE_WORKER=true
---
## 📦 Week 3: Worker Infrastructure (IN PROGRESS)
## 📦 Week 3: Worker Infrastructure (DONE)
### Planned tasks
@@ -313,22 +324,73 @@ pricewatch schedule <url> --interval 24 # Scrape quotidien
**Status**: ✅ Done
#### Worker + scheduler tests ✅
**Files**:
- `tests/tasks/test_scrape_task.py`
- `tests/tasks/test_scheduler.py`
**Status**: ✅ Done
#### Redis error handling ✅
**Modified files**:
- `pricewatch/app/tasks/scheduler.py`:
  - Added the `RedisUnavailableError` exception
  - Added the `check_redis_connection()` helper
  - Lazy connection with a verification ping
- `pricewatch/app/cli/main.py`:
  - The `worker`, `enqueue` and `schedule` commands handle Redis being down
  - Clear error messages with instructions
**Tests added** (7 tests):
- `test_scheduler_redis_connection_error`
- `test_scheduler_lazy_connection`
- `test_check_redis_connection_success`
- `test_check_redis_connection_failure`
- `test_scheduler_schedule_redis_error`
**Status**: ✅ Done
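As a rough illustration of this behaviour, a connection check of that shape could look as follows, assuming redis-py. The real `RedisUnavailableError` and `check_redis_connection` live in `pricewatch/app/tasks/scheduler.py` and may differ in detail.

```python
# Sketch only: ping-based Redis availability check, assuming redis-py.
import redis


class RedisUnavailableError(Exception):
    """Raised when the Redis backend cannot be reached."""

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.message = message


def check_redis_connection(redis_url: str) -> bool:
    """Return True if a PING succeeds against the given Redis URL."""
    try:
        connection = redis.from_url(redis_url)
        connection.ping()
        return True
    except redis.exceptions.RedisError:
        return False
```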
#### Job observability logs ✅
**Modified file**: `pricewatch/app/tasks/scrape.py`
**Logs added**:
- `[JOB START]` - job start with the URL
- `[STORE]` - detected store
- `[FETCH]` - HTTP/Playwright fetch result (duration, size)
- `[PARSE]` - parsing result (title, price)
- `[JOB OK]` / `[JOB FAILED]` - final result with total duration
**Note**: these logs are also persisted in the DB via `ScrapingLog` (already implemented).
**Status**: ✅ Done
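As a rough illustration of these markers (not the actual `scrape.py` code), a job wrapper might emit them like this; `run_job` is a hypothetical name and only the `[JOB ...]` markers are shown.

```python
# Sketch only: emit [JOB START]/[JOB OK]/[JOB FAILED] markers around a job.
import time

from pricewatch.app.core.logging import get_logger

logger = get_logger("tasks.scrape")


def run_job(url: str) -> None:
    start = time.monotonic()
    logger.info(f"[JOB START] url={url}")
    try:
        # fetch + parse + persistence would happen here
        ...
        duration_ms = int((time.monotonic() - start) * 1000)
        logger.info(f"[JOB OK] url={url} duration_ms={duration_ms}")
    except Exception as exc:
        duration_ms = int((time.monotonic() - start) * 1000)
        logger.error(f"[JOB FAILED] url={url} duration_ms={duration_ms} error={exc}")
        raise
```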
---
## 📦 Week 4: Tests & Documentation (NOT STARTED)
## 📦 Week 4: Tests & Documentation (IN PROGRESS)
### Planned tasks
#### Tests
- End-to-end tests (CLI → DB → Worker)
- Error tests (DB down, Redis down)
- Backward-compatibility tests (`--no-db`)
- Performance tests (100+ products)
**Test files added**:
- `tests/cli/test_worker_cli.py`
- `tests/cli/test_enqueue_schedule_cli.py`
- `tests/scraping/test_pipeline.py` (DB errors)
- `tests/tasks/test_redis_errors.py`
- `tests/cli/test_run_no_db.py`
- `tests/db/test_bulk_persistence.py`
- `tests/tasks/test_worker_end_to_end.py`
- `tests/cli/test_cli_worker_end_to_end.py`
- **Result**: OK with Redis running
#### Documentation
- Update README.md (Phase 2 setup)
- Update CHANGELOG.md
- Migration guide (JSON → DB)
---
@@ -338,20 +400,22 @@ pricewatch schedule <url> --interval 24 # Scrape quotidien
|-----------|------------|---------|---|
| **Week 1** | 10 | 10 | 100% |
| **Week 2** | 5 | 5 | 100% |
| **Week 3** | 3 | 6 | 50% |
| **Week 4** | 0 | 7 | 0% |
| **TOTAL Phase 2** | 18 | 28 | **64%** |
| **Week 3** | 6 | 6 | 100% |
| **Week 4** | 7 | 7 | 100% |
| **TOTAL Phase 2** | 28 | 28 | **100%** |
---
## 🎯 Immediate Next Step
**Immediate next step**
- End-to-end worker + DB tests
- Handling Redis-down errors (CLI + worker)
- Phase 2 complete, moving on to Phase 3 (REST API)
- Advanced API v1: filters, CSV/JSON export, webhooks + associated tests
**After (planned)**
- Observability logs for scheduled jobs
**After (planned)**
- Phase 2 documentation (final summary)
- Retry policy (optional)
- Phase 4 Web UI (dashboard + charts)
---
@@ -423,7 +487,13 @@ SELECT * FROM scraping_logs ORDER BY fetched_at DESC LIMIT 5;
---
**Last updated**: 2026-01-14
**Last updated**: 2026-01-15
### Recent progress recap (Phase 3 API)
- Advanced filters + CSV/JSON exports + webhooks (CRUD + test)
- Advanced API tests added
- Pydantic/datetime/selector warning cleanup
- Full pytest suite: 339 passed, 4 skipped
### Local validation (Week 1)
```bash
@@ -434,4 +504,4 @@ psql -h localhost -U pricewatch pricewatch
```
**Result**: 6 tables visible (products, price_history, product_images, product_specs, scraping_logs, alembic_version).
**Status**: ✅ Week 1 in progress (30% complete)
**Status**: ✅ Week 1 done (100%).

README.md
View File

@@ -146,6 +146,70 @@ docker-compose up -d
cp .env.example .env
```
JSON -> DB migration guide: `MIGRATION_GUIDE.md`
## REST API (Phase 3)
The API is protected by a simple token.
```bash
export PW_API_TOKEN=change_me
docker compose up -d api
```
Examples:
```bash
curl -H "Authorization: Bearer $PW_API_TOKEN" http://localhost:8001/products
curl http://localhost:8001/health
```
Filters (quick examples):
```bash
curl -H "Authorization: Bearer $PW_API_TOKEN" \
  "http://localhost:8001/products?price_min=100&stock_status=in_stock"
curl -H "Authorization: Bearer $PW_API_TOKEN" \
  "http://localhost:8001/products/1/prices?fetch_status=success&fetched_after=2026-01-14T00:00:00"
curl -H "Authorization: Bearer $PW_API_TOKEN" \
  "http://localhost:8001/logs?fetch_status=failed&fetched_before=2026-01-15T00:00:00"
```
Exports (CSV/JSON):
```bash
curl -H "Authorization: Bearer $PW_API_TOKEN" \
  "http://localhost:8001/products/export?format=csv"
curl -H "Authorization: Bearer $PW_API_TOKEN" \
  "http://localhost:8001/logs/export?format=json"
```
CRUD (quick examples):
```bash
curl -H "Authorization: Bearer $PW_API_TOKEN" -X POST http://localhost:8001/products \
  -H "Content-Type: application/json" \
  -d '{"source":"amazon","reference":"REF1","url":"https://example.com"}'
```
Webhooks (quick examples):
```bash
curl -H "Authorization: Bearer $PW_API_TOKEN" -X POST http://localhost:8001/webhooks \
  -H "Content-Type: application/json" \
  -d '{"event":"price_changed","url":"https://example.com/webhook","enabled":true}'
curl -H "Authorization: Bearer $PW_API_TOKEN" -X POST http://localhost:8001/webhooks/1/test
```
## Web UI (Phase 4)
Dense Vue 3 interface with Gruvbox/Monokai themes, fixed header, filter sidebar, and split compare view.
```bash
docker compose up -d frontend
# Access: http://localhost:3000
```
## Configuration (scrap_url.yaml)
```yaml

42
TODO.md
View File

@@ -154,7 +154,7 @@ Liste des tâches priorisées pour le développement de PriceWatch.
---
## Phase 2: Database (In progress)
## Phase 2: Database (Done)
### Persistence
- [x] PostgreSQL schema
@@ -166,8 +166,13 @@ Liste des tâches priorisées pour le développement de PriceWatch.
- [x] ScrapingPipeline (optional persistence)
- [x] CLI `--save-db/--no-db`
- [x] End-to-end CLI + DB tests
- [ ] Product CRUD
- [ ] Price history
- [x] Backward-compatibility tests (`--no-db`)
- [x] Performance tests (100+ products)
- [x] Product CRUD
- [x] Price history
### Documentation
- [x] Migration guide (JSON -> DB)
### Configuration
- [x] Config file (DB credentials)
@@ -182,26 +187,43 @@ Liste des tâches priorisées pour le développement de PriceWatch.
- [x] Redis setup
- [x] RQ worker
- [x] Scraping queue
- [x] Worker + scheduler tests
- [x] Redis error handling (RedisUnavailableError)
- [ ] Retry policy
### Scheduling
- [x] Cron or built-in scheduler
- [x] Automatic daily scraping
- [ ] Run logs
- [x] Run logs (JOB START/OK/FAILED)
- [x] End-to-end worker + DB tests
- [x] End-to-end CLI -> DB -> worker tests
## Phase 3: REST API (In progress)
### FastAPI API
- [x] Read-only endpoints (products, prices, logs, health)
- [x] Simple token auth (Bearer)
- [x] Enqueue/schedule endpoints
- [x] CRUD products + prices + logs
- [x] Docker + uvicorn + env config
- [x] Basic API tests
- [x] Advanced filters (price, dates, stock, status)
- [x] CSV/JSON exports (products, prices, logs)
- [x] Webhooks (CRUD + test)
---
## Phase 4: Web UI (Future)
### Backend API
- [ ] FastAPI endpoints
- [ ] Authentication
- [x] FastAPI endpoints
- [x] Authentication
- [ ] CORS
### Frontend
- [ ] Framework (React/Vue?)
- [ ] Responsive design
- [ ] Gruvbox dark theme
- [x] Framework (Vue 3 + Vite)
- [x] Responsive design (dense + compact layout)
- [x] Gruvbox dark theme (default) + Monokai
- [ ] Price history charts
- [ ] Alert management
@@ -236,4 +258,4 @@ Liste des tâches priorisées pour le développement de PriceWatch.
---
**Last updated**: 2026-01-14
**Last updated**: 2026-01-15

0
alembic.ini Executable file → Normal file
View File

26
docker-compose.yml Executable file → Normal file
View File

@@ -5,6 +5,7 @@ services:
POSTGRES_DB: pricewatch
POSTGRES_USER: pricewatch
POSTGRES_PASSWORD: pricewatch
TZ: Europe/Paris
ports:
- "5432:5432"
volumes:
@@ -12,11 +13,36 @@ services:
redis:
image: redis:7
environment:
TZ: Europe/Paris
ports:
- "6379:6379"
volumes:
- pricewatch_redisdata:/data
api:
build: .
ports:
- "8001:8000"
env_file:
- .env
environment:
PW_DB_HOST: postgres
PW_REDIS_HOST: redis
TZ: Europe/Paris
depends_on:
- postgres
- redis
frontend:
build: ./webui
ports:
- "3000:80"
environment:
TZ: Europe/Paris
depends_on:
- api
volumes:
pricewatch_pgdata:
pricewatch_redisdata:

73
pricewatch.egg-info/PKG-INFO Executable file → Normal file
View File

@@ -28,6 +28,8 @@ Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: redis>=5.0.0
Requires-Dist: rq>=1.15.0
Requires-Dist: rq-scheduler>=0.13.0
Requires-Dist: fastapi>=0.110.0
Requires-Dist: uvicorn>=0.27.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
@@ -100,6 +102,13 @@ pricewatch/
│ │ ├── store.py
│ │ ├── selectors.yml
│ │ └── fixtures/
│ ├── db/ # Persistence SQLAlchemy (Phase 2)
│ │ ├── models.py
│ │ ├── connection.py
│ │ └── migrations/
│ ├── tasks/ # Jobs RQ (Phase 3)
│ │ ├── scrape.py
│ │ └── scheduler.py
│ └── cli/
│ └── main.py # CLI Typer
├── tests/ # Tests pytest
@@ -118,6 +127,9 @@ pricewatch run --yaml scrap_url.yaml --out scraped_store.json
# With debug
pricewatch run --yaml scrap_url.yaml --out scraped_store.json --debug
# With DB persistence
pricewatch run --yaml scrap_url.yaml --out scraped_store.json --save-db
```
### Utility commands
@@ -139,6 +151,63 @@ pricewatch parse amazon --in scraped/page.html
pricewatch doctor
```
### Database commands
```bash
# Initialize the tables
pricewatch init-db
# Generate a migration
pricewatch migrate "Initial schema"
# Apply the migrations
pricewatch upgrade
# Roll back
pricewatch downgrade -1
```
### Worker commands
```bash
# Start an RQ worker
pricewatch worker
# Enqueue an immediate job
pricewatch enqueue "https://example.com/product"
# Schedule a recurring job
pricewatch schedule "https://example.com/product" --interval 24
```
## Database (Phase 2)
```bash
# Run PostgreSQL + Redis locally
docker-compose up -d
# Example configuration
cp .env.example .env
```
JSON -> DB migration guide: `MIGRATION_GUIDE.md`
## REST API (Phase 3)
The API is protected by a simple token.
```bash
export PW_API_TOKEN=change_me
docker compose up -d api
```
Examples:
```bash
curl -H "Authorization: Bearer $PW_API_TOKEN" http://localhost:8000/products
curl http://localhost:8000/health
```
## Configuration (scrap_url.yaml)
```yaml
@@ -238,8 +307,8 @@ Aucune erreur ne doit crasher silencieusement : toutes sont loggées et tracées
- ✅ pytest tests
### Phase 2: Persistence
- [ ] PostgreSQL database
- [ ] Alembic migrations
- [x] PostgreSQL database
- [x] Alembic migrations
- [ ] Price history
### Phase 3: Automation

pricewatch.egg-info/SOURCES.txt
View File

@@ -18,6 +18,7 @@ pricewatch/app/core/registry.py
pricewatch/app/core/schema.py
pricewatch/app/scraping/__init__.py
pricewatch/app/scraping/http_fetch.py
pricewatch/app/scraping/pipeline.py
pricewatch/app/scraping/pw_fetch.py
pricewatch/app/stores/__init__.py
pricewatch/app/stores/base.py

pricewatch.egg-info/requires.txt
View File

@@ -16,6 +16,8 @@ python-dotenv>=1.0.0
redis>=5.0.0
rq>=1.15.0
rq-scheduler>=0.13.0
fastapi>=0.110.0
uvicorn>=0.27.0
[dev]
pytest>=8.0.0

pricewatch/app/api/__init__.py
View File

@@ -0,0 +1,5 @@
"""Module API FastAPI."""
from pricewatch.app.api.main import app
__all__ = ["app"]


876
pricewatch/app/api/main.py Normal file
View File

@@ -0,0 +1,876 @@
"""
API REST FastAPI pour PriceWatch (Phase 3).
"""
from __future__ import annotations
import csv
from collections import deque
from datetime import datetime, timezone
import os
from pathlib import Path
from io import StringIO
from typing import Generator, Optional
import httpx
from fastapi import Depends, FastAPI, Header, HTTPException, Response
from fastapi.encoders import jsonable_encoder
from fastapi.responses import JSONResponse
from sqlalchemy.exc import IntegrityError, SQLAlchemyError
from sqlalchemy import and_, desc, func
from sqlalchemy.orm import Session
from pricewatch.app.api.schemas import (
EnqueueRequest,
EnqueueResponse,
HealthStatus,
PriceHistoryOut,
PriceHistoryCreate,
PriceHistoryUpdate,
ProductOut,
ProductCreate,
ProductUpdate,
ScheduleRequest,
ScheduleResponse,
ScrapingLogOut,
ScrapingLogCreate,
ScrapingLogUpdate,
ScrapePreviewRequest,
ScrapePreviewResponse,
ScrapeCommitRequest,
ScrapeCommitResponse,
VersionResponse,
BackendLogEntry,
UvicornLogEntry,
WebhookOut,
WebhookCreate,
WebhookUpdate,
WebhookTestResponse,
)
from pricewatch.app.core.config import get_config
from pricewatch.app.core.logging import get_logger
from pricewatch.app.core.schema import ProductSnapshot
from pricewatch.app.db.connection import check_db_connection, get_session
from pricewatch.app.db.models import PriceHistory, Product, ScrapingLog, Webhook
from pricewatch.app.scraping.pipeline import ScrapingPipeline
from pricewatch.app.tasks.scrape import scrape_product
from pricewatch.app.tasks.scheduler import RedisUnavailableError, check_redis_connection, ScrapingScheduler
logger = get_logger("api")
app = FastAPI(title="PriceWatch API", version="0.4.0")
# Buffer de logs backend en memoire pour debug UI.
BACKEND_LOGS = deque(maxlen=200)
UVICORN_LOG_PATH = Path(
os.environ.get("PW_UVICORN_LOG_PATH", "/app/logs/uvicorn.log")
)
def get_db_session() -> Generator[Session, None, None]:
"""Dependency: session SQLAlchemy."""
with get_session(get_config()) as session:
yield session
def require_token(authorization: Optional[str] = Header(default=None)) -> None:
"""Auth simple via token Bearer."""
config = get_config()
token = config.api_token
if not token:
raise HTTPException(status_code=500, detail="API token non configure")
if not authorization or not authorization.startswith("Bearer "):
raise HTTPException(status_code=401, detail="Token manquant")
provided = authorization.split("Bearer ")[-1].strip()
if provided != token:
raise HTTPException(status_code=403, detail="Token invalide")
@app.get("/health", response_model=HealthStatus)
def health_check() -> HealthStatus:
"""Health check DB + Redis."""
config = get_config()
return HealthStatus(
db=check_db_connection(config),
redis=check_redis_connection(config.redis.url),
)
@app.get("/version", response_model=VersionResponse)
def version_info() -> VersionResponse:
"""Expose la version API."""
return VersionResponse(api_version=app.version)
@app.get("/logs/backend", response_model=list[BackendLogEntry], dependencies=[Depends(require_token)])
def list_backend_logs() -> list[BackendLogEntry]:
"""Expose un buffer de logs backend."""
return list(BACKEND_LOGS)
@app.get("/logs/uvicorn", response_model=list[UvicornLogEntry], dependencies=[Depends(require_token)])
def list_uvicorn_logs(limit: int = 200) -> list[UvicornLogEntry]:
"""Expose les dernieres lignes du log Uvicorn."""
lines = _read_uvicorn_lines(limit=limit)
return [UvicornLogEntry(line=line) for line in lines]
@app.get("/products", response_model=list[ProductOut], dependencies=[Depends(require_token)])
def list_products(
source: Optional[str] = None,
reference: Optional[str] = None,
updated_after: Optional[datetime] = None,
price_min: Optional[float] = None,
price_max: Optional[float] = None,
fetched_after: Optional[datetime] = None,
fetched_before: Optional[datetime] = None,
stock_status: Optional[str] = None,
limit: int = 50,
offset: int = 0,
session: Session = Depends(get_db_session),
) -> list[ProductOut]:
"""Liste des produits avec filtres optionnels."""
latest_price_subquery = (
session.query(
PriceHistory.product_id.label("product_id"),
func.max(PriceHistory.fetched_at).label("latest_fetched_at"),
)
.group_by(PriceHistory.product_id)
.subquery()
)
latest_price = (
session.query(PriceHistory)
.join(
latest_price_subquery,
and_(
PriceHistory.product_id == latest_price_subquery.c.product_id,
PriceHistory.fetched_at == latest_price_subquery.c.latest_fetched_at,
),
)
.subquery()
)
query = session.query(Product).outerjoin(latest_price, Product.id == latest_price.c.product_id)
if source:
query = query.filter(Product.source == source)
if reference:
query = query.filter(Product.reference == reference)
if updated_after:
query = query.filter(Product.last_updated_at >= updated_after)
if price_min is not None:
query = query.filter(latest_price.c.price >= price_min)
if price_max is not None:
query = query.filter(latest_price.c.price <= price_max)
if fetched_after:
query = query.filter(latest_price.c.fetched_at >= fetched_after)
if fetched_before:
query = query.filter(latest_price.c.fetched_at <= fetched_before)
if stock_status:
query = query.filter(latest_price.c.stock_status == stock_status)
products = query.order_by(desc(Product.last_updated_at)).offset(offset).limit(limit).all()
return [_product_to_out(session, product) for product in products]
@app.post("/products", response_model=ProductOut, dependencies=[Depends(require_token)])
def create_product(
payload: ProductCreate,
session: Session = Depends(get_db_session),
) -> ProductOut:
"""Cree un produit."""
product = Product(
source=payload.source,
reference=payload.reference,
url=payload.url,
title=payload.title,
category=payload.category,
description=payload.description,
currency=payload.currency,
msrp=payload.msrp,
)
session.add(product)
try:
session.commit()
session.refresh(product)
except IntegrityError as exc:
session.rollback()
raise HTTPException(status_code=409, detail="Produit deja existant") from exc
except SQLAlchemyError as exc:
session.rollback()
raise HTTPException(status_code=500, detail="Erreur DB") from exc
return _product_to_out(session, product)
@app.get("/products/{product_id}", response_model=ProductOut, dependencies=[Depends(require_token)])
def get_product(
product_id: int,
session: Session = Depends(get_db_session),
) -> ProductOut:
"""Detail produit + dernier prix."""
product = session.query(Product).filter(Product.id == product_id).one_or_none()
if not product:
raise HTTPException(status_code=404, detail="Produit non trouve")
return _product_to_out(session, product)
@app.patch("/products/{product_id}", response_model=ProductOut, dependencies=[Depends(require_token)])
def update_product(
product_id: int,
payload: ProductUpdate,
session: Session = Depends(get_db_session),
) -> ProductOut:
"""Met a jour un produit (partial)."""
product = session.query(Product).filter(Product.id == product_id).one_or_none()
if not product:
raise HTTPException(status_code=404, detail="Produit non trouve")
updates = payload.model_dump(exclude_unset=True)
for key, value in updates.items():
setattr(product, key, value)
try:
session.commit()
session.refresh(product)
except SQLAlchemyError as exc:
session.rollback()
raise HTTPException(status_code=500, detail="Erreur DB") from exc
return _product_to_out(session, product)
@app.delete("/products/{product_id}", dependencies=[Depends(require_token)])
def delete_product(
product_id: int,
session: Session = Depends(get_db_session),
) -> dict[str, str]:
"""Supprime un produit (cascade)."""
product = session.query(Product).filter(Product.id == product_id).one_or_none()
if not product:
raise HTTPException(status_code=404, detail="Produit non trouve")
session.delete(product)
try:
session.commit()
except SQLAlchemyError as exc:
session.rollback()
raise HTTPException(status_code=500, detail="Erreur DB") from exc
return {"status": "deleted"}
@app.get(
"/products/{product_id}/prices",
response_model=list[PriceHistoryOut],
dependencies=[Depends(require_token)],
)
def list_prices(
product_id: int,
price_min: Optional[float] = None,
price_max: Optional[float] = None,
fetched_after: Optional[datetime] = None,
fetched_before: Optional[datetime] = None,
fetch_status: Optional[str] = None,
limit: int = 50,
offset: int = 0,
session: Session = Depends(get_db_session),
) -> list[PriceHistoryOut]:
"""Historique de prix pour un produit."""
query = session.query(PriceHistory).filter(PriceHistory.product_id == product_id)
if price_min is not None:
query = query.filter(PriceHistory.price >= price_min)
if price_max is not None:
query = query.filter(PriceHistory.price <= price_max)
if fetched_after:
query = query.filter(PriceHistory.fetched_at >= fetched_after)
if fetched_before:
query = query.filter(PriceHistory.fetched_at <= fetched_before)
if fetch_status:
query = query.filter(PriceHistory.fetch_status == fetch_status)
prices = query.order_by(desc(PriceHistory.fetched_at)).offset(offset).limit(limit).all()
return [_price_to_out(price) for price in prices]
@app.post("/prices", response_model=PriceHistoryOut, dependencies=[Depends(require_token)])
def create_price(
payload: PriceHistoryCreate,
session: Session = Depends(get_db_session),
) -> PriceHistoryOut:
"""Ajoute une entree d'historique de prix."""
price = PriceHistory(
product_id=payload.product_id,
price=payload.price,
shipping_cost=payload.shipping_cost,
stock_status=payload.stock_status,
fetch_method=payload.fetch_method,
fetch_status=payload.fetch_status,
fetched_at=payload.fetched_at,
)
session.add(price)
try:
session.commit()
session.refresh(price)
except IntegrityError as exc:
session.rollback()
raise HTTPException(status_code=409, detail="Entree prix deja existante") from exc
except SQLAlchemyError as exc:
session.rollback()
raise HTTPException(status_code=500, detail="Erreur DB") from exc
return _price_to_out(price)
@app.patch("/prices/{price_id}", response_model=PriceHistoryOut, dependencies=[Depends(require_token)])
def update_price(
price_id: int,
payload: PriceHistoryUpdate,
session: Session = Depends(get_db_session),
) -> PriceHistoryOut:
"""Met a jour une entree de prix."""
price = session.query(PriceHistory).filter(PriceHistory.id == price_id).one_or_none()
if not price:
raise HTTPException(status_code=404, detail="Entree prix non trouvee")
updates = payload.model_dump(exclude_unset=True)
for key, value in updates.items():
setattr(price, key, value)
try:
session.commit()
session.refresh(price)
except SQLAlchemyError as exc:
session.rollback()
raise HTTPException(status_code=500, detail="Erreur DB") from exc
return _price_to_out(price)
@app.delete("/prices/{price_id}", dependencies=[Depends(require_token)])
def delete_price(
price_id: int,
session: Session = Depends(get_db_session),
) -> dict[str, str]:
"""Supprime une entree de prix."""
price = session.query(PriceHistory).filter(PriceHistory.id == price_id).one_or_none()
if not price:
raise HTTPException(status_code=404, detail="Entree prix non trouvee")
session.delete(price)
try:
session.commit()
except SQLAlchemyError as exc:
session.rollback()
raise HTTPException(status_code=500, detail="Erreur DB") from exc
return {"status": "deleted"}
@app.get("/logs", response_model=list[ScrapingLogOut], dependencies=[Depends(require_token)])
def list_logs(
source: Optional[str] = None,
fetch_status: Optional[str] = None,
fetched_after: Optional[datetime] = None,
fetched_before: Optional[datetime] = None,
limit: int = 50,
offset: int = 0,
session: Session = Depends(get_db_session),
) -> list[ScrapingLogOut]:
"""Liste des logs de scraping."""
query = session.query(ScrapingLog)
if source:
query = query.filter(ScrapingLog.source == source)
if fetch_status:
query = query.filter(ScrapingLog.fetch_status == fetch_status)
if fetched_after:
query = query.filter(ScrapingLog.fetched_at >= fetched_after)
if fetched_before:
query = query.filter(ScrapingLog.fetched_at <= fetched_before)
logs = query.order_by(desc(ScrapingLog.fetched_at)).offset(offset).limit(limit).all()
return [_log_to_out(log) for log in logs]
@app.post("/logs", response_model=ScrapingLogOut, dependencies=[Depends(require_token)])
def create_log(
payload: ScrapingLogCreate,
session: Session = Depends(get_db_session),
) -> ScrapingLogOut:
"""Cree un log de scraping."""
log_entry = ScrapingLog(
product_id=payload.product_id,
url=payload.url,
source=payload.source,
reference=payload.reference,
fetch_method=payload.fetch_method,
fetch_status=payload.fetch_status,
fetched_at=payload.fetched_at,
duration_ms=payload.duration_ms,
html_size_bytes=payload.html_size_bytes,
errors=payload.errors,
notes=payload.notes,
)
session.add(log_entry)
try:
session.commit()
session.refresh(log_entry)
except SQLAlchemyError as exc:
session.rollback()
raise HTTPException(status_code=500, detail="Erreur DB") from exc
return _log_to_out(log_entry)
@app.patch("/logs/{log_id}", response_model=ScrapingLogOut, dependencies=[Depends(require_token)])
def update_log(
log_id: int,
payload: ScrapingLogUpdate,
session: Session = Depends(get_db_session),
) -> ScrapingLogOut:
"""Met a jour un log."""
log_entry = session.query(ScrapingLog).filter(ScrapingLog.id == log_id).one_or_none()
if not log_entry:
raise HTTPException(status_code=404, detail="Log non trouve")
updates = payload.model_dump(exclude_unset=True)
for key, value in updates.items():
setattr(log_entry, key, value)
try:
session.commit()
session.refresh(log_entry)
except SQLAlchemyError as exc:
session.rollback()
raise HTTPException(status_code=500, detail="Erreur DB") from exc
return _log_to_out(log_entry)
@app.delete("/logs/{log_id}", dependencies=[Depends(require_token)])
def delete_log(
log_id: int,
session: Session = Depends(get_db_session),
) -> dict[str, str]:
"""Supprime un log."""
log_entry = session.query(ScrapingLog).filter(ScrapingLog.id == log_id).one_or_none()
if not log_entry:
raise HTTPException(status_code=404, detail="Log non trouve")
session.delete(log_entry)
try:
session.commit()
except SQLAlchemyError as exc:
session.rollback()
raise HTTPException(status_code=500, detail="Erreur DB") from exc
return {"status": "deleted"}
@app.get("/products/export", dependencies=[Depends(require_token)])
def export_products(
source: Optional[str] = None,
reference: Optional[str] = None,
updated_after: Optional[datetime] = None,
price_min: Optional[float] = None,
price_max: Optional[float] = None,
fetched_after: Optional[datetime] = None,
fetched_before: Optional[datetime] = None,
stock_status: Optional[str] = None,
format: str = "csv",
limit: int = 500,
offset: int = 0,
session: Session = Depends(get_db_session),
) -> Response:
"""Export produits en CSV/JSON."""
products = list_products(
source=source,
reference=reference,
updated_after=updated_after,
price_min=price_min,
price_max=price_max,
fetched_after=fetched_after,
fetched_before=fetched_before,
stock_status=stock_status,
limit=limit,
offset=offset,
session=session,
)
rows = [product.model_dump() for product in products]
fieldnames = list(ProductOut.model_fields.keys())
return _export_response(rows, fieldnames, "products", format)
@app.get("/prices/export", dependencies=[Depends(require_token)])
def export_prices(
product_id: Optional[int] = None,
price_min: Optional[float] = None,
price_max: Optional[float] = None,
fetched_after: Optional[datetime] = None,
fetched_before: Optional[datetime] = None,
fetch_status: Optional[str] = None,
format: str = "csv",
limit: int = 500,
offset: int = 0,
session: Session = Depends(get_db_session),
) -> Response:
"""Export historique de prix en CSV/JSON."""
query = session.query(PriceHistory)
if product_id is not None:
query = query.filter(PriceHistory.product_id == product_id)
if price_min is not None:
query = query.filter(PriceHistory.price >= price_min)
if price_max is not None:
query = query.filter(PriceHistory.price <= price_max)
if fetched_after:
query = query.filter(PriceHistory.fetched_at >= fetched_after)
if fetched_before:
query = query.filter(PriceHistory.fetched_at <= fetched_before)
if fetch_status:
query = query.filter(PriceHistory.fetch_status == fetch_status)
prices = query.order_by(desc(PriceHistory.fetched_at)).offset(offset).limit(limit).all()
rows = [_price_to_out(price).model_dump() for price in prices]
fieldnames = list(PriceHistoryOut.model_fields.keys())
return _export_response(rows, fieldnames, "prices", format)
@app.get("/logs/export", dependencies=[Depends(require_token)])
def export_logs(
source: Optional[str] = None,
fetch_status: Optional[str] = None,
fetched_after: Optional[datetime] = None,
fetched_before: Optional[datetime] = None,
format: str = "csv",
limit: int = 500,
offset: int = 0,
session: Session = Depends(get_db_session),
) -> Response:
"""Export logs de scraping en CSV/JSON."""
logs = list_logs(
source=source,
fetch_status=fetch_status,
fetched_after=fetched_after,
fetched_before=fetched_before,
limit=limit,
offset=offset,
session=session,
)
rows = [log.model_dump() for log in logs]
fieldnames = list(ScrapingLogOut.model_fields.keys())
return _export_response(rows, fieldnames, "logs", format)
@app.get("/webhooks", response_model=list[WebhookOut], dependencies=[Depends(require_token)])
def list_webhooks(
event: Optional[str] = None,
enabled: Optional[bool] = None,
limit: int = 50,
offset: int = 0,
session: Session = Depends(get_db_session),
) -> list[WebhookOut]:
"""Liste des webhooks."""
query = session.query(Webhook)
if event:
query = query.filter(Webhook.event == event)
if enabled is not None:
query = query.filter(Webhook.enabled == enabled)
webhooks = query.order_by(desc(Webhook.created_at)).offset(offset).limit(limit).all()
return [_webhook_to_out(webhook) for webhook in webhooks]
@app.post("/webhooks", response_model=WebhookOut, dependencies=[Depends(require_token)])
def create_webhook(
payload: WebhookCreate,
session: Session = Depends(get_db_session),
) -> WebhookOut:
"""Cree un webhook."""
webhook = Webhook(
event=payload.event,
url=payload.url,
enabled=payload.enabled,
secret=payload.secret,
)
session.add(webhook)
try:
session.commit()
session.refresh(webhook)
except SQLAlchemyError as exc:
session.rollback()
raise HTTPException(status_code=500, detail="Erreur DB") from exc
return _webhook_to_out(webhook)
@app.patch("/webhooks/{webhook_id}", response_model=WebhookOut, dependencies=[Depends(require_token)])
def update_webhook(
webhook_id: int,
payload: WebhookUpdate,
session: Session = Depends(get_db_session),
) -> WebhookOut:
"""Met a jour un webhook."""
webhook = session.query(Webhook).filter(Webhook.id == webhook_id).one_or_none()
if not webhook:
raise HTTPException(status_code=404, detail="Webhook non trouve")
updates = payload.model_dump(exclude_unset=True)
for key, value in updates.items():
setattr(webhook, key, value)
try:
session.commit()
session.refresh(webhook)
except SQLAlchemyError as exc:
session.rollback()
raise HTTPException(status_code=500, detail="Erreur DB") from exc
return _webhook_to_out(webhook)
@app.delete("/webhooks/{webhook_id}", dependencies=[Depends(require_token)])
def delete_webhook(
webhook_id: int,
session: Session = Depends(get_db_session),
) -> dict[str, str]:
"""Supprime un webhook."""
webhook = session.query(Webhook).filter(Webhook.id == webhook_id).one_or_none()
if not webhook:
raise HTTPException(status_code=404, detail="Webhook non trouve")
session.delete(webhook)
try:
session.commit()
except SQLAlchemyError as exc:
session.rollback()
raise HTTPException(status_code=500, detail="Erreur DB") from exc
return {"status": "deleted"}
@app.post(
"/webhooks/{webhook_id}/test",
response_model=WebhookTestResponse,
dependencies=[Depends(require_token)],
)
def send_webhook_test(
webhook_id: int,
session: Session = Depends(get_db_session),
) -> WebhookTestResponse:
"""Envoie un evenement de test."""
webhook = session.query(Webhook).filter(Webhook.id == webhook_id).one_or_none()
if not webhook:
raise HTTPException(status_code=404, detail="Webhook non trouve")
if not webhook.enabled:
raise HTTPException(status_code=409, detail="Webhook desactive")
payload = {"message": "test webhook", "webhook_id": webhook.id}
_send_webhook(webhook, "test", payload)
return WebhookTestResponse(status="sent")
@app.post("/enqueue", response_model=EnqueueResponse, dependencies=[Depends(require_token)])
def enqueue_job(payload: EnqueueRequest) -> EnqueueResponse:
"""Enqueue un job immediat."""
try:
scheduler = ScrapingScheduler(get_config())
job = scheduler.enqueue_immediate(
payload.url,
use_playwright=payload.use_playwright,
save_db=payload.save_db,
)
return EnqueueResponse(job_id=job.id)
except RedisUnavailableError as exc:
raise HTTPException(status_code=503, detail=str(exc)) from exc
@app.post("/schedule", response_model=ScheduleResponse, dependencies=[Depends(require_token)])
def schedule_job(payload: ScheduleRequest) -> ScheduleResponse:
"""Planifie un job recurrent."""
try:
scheduler = ScrapingScheduler(get_config())
job_info = scheduler.schedule_product(
payload.url,
interval_hours=payload.interval_hours,
use_playwright=payload.use_playwright,
save_db=payload.save_db,
)
return ScheduleResponse(job_id=job_info.job_id, next_run=job_info.next_run)
except RedisUnavailableError as exc:
raise HTTPException(status_code=503, detail=str(exc)) from exc
@app.post("/scrape/preview", response_model=ScrapePreviewResponse, dependencies=[Depends(require_token)])
def preview_scrape(payload: ScrapePreviewRequest) -> ScrapePreviewResponse:
"""Scrape un produit sans persistence pour previsualisation."""
_add_backend_log("INFO", f"Preview scraping: {payload.url}")
result = scrape_product(
payload.url,
use_playwright=payload.use_playwright,
save_db=False,
)
snapshot = result.get("snapshot")
if snapshot is None:
_add_backend_log("ERROR", f"Preview scraping KO: {payload.url}")
return ScrapePreviewResponse(success=False, snapshot=None, error=result.get("error"))
return ScrapePreviewResponse(
success=bool(result.get("success")),
snapshot=snapshot.model_dump(mode="json"),
error=result.get("error"),
)
@app.post("/scrape/commit", response_model=ScrapeCommitResponse, dependencies=[Depends(require_token)])
def commit_scrape(payload: ScrapeCommitRequest) -> ScrapeCommitResponse:
"""Persiste un snapshot previsualise."""
try:
snapshot = ProductSnapshot.model_validate(payload.snapshot)
except Exception as exc:
_add_backend_log("ERROR", "Commit scraping KO: snapshot invalide")
raise HTTPException(status_code=400, detail="Snapshot invalide") from exc
product_id = ScrapingPipeline(config=get_config()).process_snapshot(snapshot, save_to_db=True)
_add_backend_log("INFO", f"Commit scraping OK: product_id={product_id}")
return ScrapeCommitResponse(success=True, product_id=product_id)
def _export_response(
rows: list[dict[str, object]],
fieldnames: list[str],
filename_prefix: str,
format: str,
) -> Response:
"""Expose une reponse CSV/JSON avec un nom de fichier stable."""
if format not in {"csv", "json"}:
raise HTTPException(status_code=400, detail="Format invalide (csv ou json)")
headers = {"Content-Disposition": f'attachment; filename="{filename_prefix}.{format}"'}
if format == "json":
return JSONResponse(content=jsonable_encoder(rows), headers=headers)
return _to_csv_response(rows, fieldnames, headers)
def _to_csv_response(
rows: list[dict[str, object]],
fieldnames: list[str],
headers: dict[str, str],
) -> Response:
buffer = StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
return Response(content=buffer.getvalue(), media_type="text/csv", headers=headers)
def _send_webhook(webhook: Webhook, event: str, payload: dict[str, object]) -> None:
"""Envoie un webhook avec gestion d'erreur explicite."""
headers = {"Content-Type": "application/json"}
if webhook.secret:
headers["X-Webhook-Secret"] = webhook.secret
try:
response = httpx.post(
webhook.url,
json={"event": event, "payload": payload},
headers=headers,
timeout=5.0,
)
response.raise_for_status()
except httpx.HTTPError as exc:
logger.error("Erreur webhook", extra={"url": webhook.url, "event": event, "error": str(exc)})
raise HTTPException(status_code=502, detail="Echec webhook") from exc
def _add_backend_log(level: str, message: str) -> None:
BACKEND_LOGS.append(
BackendLogEntry(
time=datetime.now(timezone.utc),
level=level,
message=message,
)
)
def _read_uvicorn_lines(limit: int = 200) -> list[str]:
"""Lit les dernieres lignes du log Uvicorn si disponible."""
if limit <= 0:
return []
try:
if not UVICORN_LOG_PATH.exists():
return []
with UVICORN_LOG_PATH.open("r", encoding="utf-8", errors="ignore") as handle:
lines = handle.readlines()
return [line.rstrip("\n") for line in lines[-limit:]]
except Exception:
return []
def _product_to_out(session: Session, product: Product) -> ProductOut:
"""Helper pour mapper Product + dernier prix."""
latest = (
session.query(PriceHistory)
.filter(PriceHistory.product_id == product.id)
.order_by(desc(PriceHistory.fetched_at))
.first()
)
images = [image.image_url for image in product.images]
specs = {spec.spec_key: spec.spec_value for spec in product.specs}
discount_amount = None
discount_percent = None
if latest and latest.price is not None and product.msrp:
discount_amount = float(product.msrp) - float(latest.price)
if product.msrp > 0:
discount_percent = (discount_amount / float(product.msrp)) * 100
return ProductOut(
id=product.id,
source=product.source,
reference=product.reference,
url=product.url,
title=product.title,
category=product.category,
description=product.description,
currency=product.currency,
msrp=float(product.msrp) if product.msrp is not None else None,
first_seen_at=product.first_seen_at,
last_updated_at=product.last_updated_at,
latest_price=float(latest.price) if latest and latest.price is not None else None,
latest_shipping_cost=(
float(latest.shipping_cost) if latest and latest.shipping_cost is not None else None
),
latest_stock_status=latest.stock_status if latest else None,
latest_fetched_at=latest.fetched_at if latest else None,
images=images,
specs=specs,
discount_amount=discount_amount,
discount_percent=discount_percent,
)
def _price_to_out(price: PriceHistory) -> PriceHistoryOut:
return PriceHistoryOut(
id=price.id,
product_id=price.product_id,
price=float(price.price) if price.price is not None else None,
shipping_cost=float(price.shipping_cost) if price.shipping_cost is not None else None,
stock_status=price.stock_status,
fetch_method=price.fetch_method,
fetch_status=price.fetch_status,
fetched_at=price.fetched_at,
)
def _log_to_out(log: ScrapingLog) -> ScrapingLogOut:
return ScrapingLogOut(
id=log.id,
product_id=log.product_id,
url=log.url,
source=log.source,
reference=log.reference,
fetch_method=log.fetch_method,
fetch_status=log.fetch_status,
fetched_at=log.fetched_at,
duration_ms=log.duration_ms,
html_size_bytes=log.html_size_bytes,
errors=log.errors,
notes=log.notes,
)
def _webhook_to_out(webhook: Webhook) -> WebhookOut:
return WebhookOut(
id=webhook.id,
event=webhook.event,
url=webhook.url,
enabled=webhook.enabled,
secret=webhook.secret,
created_at=webhook.created_at,
)

pricewatch/app/api/schemas.py
View File

@@ -0,0 +1,212 @@
"""
Schemas API FastAPI pour Phase 3.
"""
from datetime import datetime
from typing import Optional
from pydantic import BaseModel, Field
class HealthStatus(BaseModel):
db: bool
redis: bool
class ProductOut(BaseModel):
id: int
source: str
reference: str
url: str
title: Optional[str] = None
category: Optional[str] = None
description: Optional[str] = None
currency: Optional[str] = None
msrp: Optional[float] = None
first_seen_at: datetime
last_updated_at: datetime
latest_price: Optional[float] = None
latest_shipping_cost: Optional[float] = None
latest_stock_status: Optional[str] = None
latest_fetched_at: Optional[datetime] = None
images: list[str] = []
specs: dict[str, str] = {}
discount_amount: Optional[float] = None
discount_percent: Optional[float] = None
class ProductCreate(BaseModel):
source: str
reference: str
url: str
title: Optional[str] = None
category: Optional[str] = None
description: Optional[str] = None
currency: Optional[str] = None
msrp: Optional[float] = None
class ProductUpdate(BaseModel):
url: Optional[str] = None
title: Optional[str] = None
category: Optional[str] = None
description: Optional[str] = None
currency: Optional[str] = None
msrp: Optional[float] = None
class PriceHistoryOut(BaseModel):
id: int
product_id: int
price: Optional[float] = None
shipping_cost: Optional[float] = None
stock_status: Optional[str] = None
fetch_method: str
fetch_status: str
fetched_at: datetime
class PriceHistoryCreate(BaseModel):
product_id: int
price: Optional[float] = None
shipping_cost: Optional[float] = None
stock_status: Optional[str] = None
fetch_method: str
fetch_status: str
fetched_at: datetime
class PriceHistoryUpdate(BaseModel):
price: Optional[float] = None
shipping_cost: Optional[float] = None
stock_status: Optional[str] = None
fetch_method: Optional[str] = None
fetch_status: Optional[str] = None
fetched_at: Optional[datetime] = None
class ScrapingLogOut(BaseModel):
id: int
product_id: Optional[int] = None
url: str
source: str
reference: Optional[str] = None
fetch_method: str
fetch_status: str
fetched_at: datetime
duration_ms: Optional[int] = None
html_size_bytes: Optional[int] = None
errors: Optional[list[str]] = None
notes: Optional[list[str]] = None
class WebhookOut(BaseModel):
id: int
event: str
url: str
enabled: bool
secret: Optional[str] = None
created_at: datetime
class WebhookCreate(BaseModel):
event: str
url: str
enabled: bool = True
secret: Optional[str] = None
class WebhookUpdate(BaseModel):
event: Optional[str] = None
url: Optional[str] = None
enabled: Optional[bool] = None
secret: Optional[str] = None
class WebhookTestResponse(BaseModel):
status: str
class ScrapingLogCreate(BaseModel):
product_id: Optional[int] = None
url: str
source: str
reference: Optional[str] = None
fetch_method: str
fetch_status: str
fetched_at: datetime
duration_ms: Optional[int] = None
html_size_bytes: Optional[int] = None
errors: Optional[list[str]] = None
notes: Optional[list[str]] = None
class ScrapingLogUpdate(BaseModel):
product_id: Optional[int] = None
url: Optional[str] = None
source: Optional[str] = None
reference: Optional[str] = None
fetch_method: Optional[str] = None
fetch_status: Optional[str] = None
fetched_at: Optional[datetime] = None
duration_ms: Optional[int] = None
html_size_bytes: Optional[int] = None
errors: Optional[list[str]] = None
notes: Optional[list[str]] = None
class EnqueueRequest(BaseModel):
url: str = Field(..., description="URL du produit")
use_playwright: Optional[bool] = None
save_db: bool = True
class EnqueueResponse(BaseModel):
job_id: str
class ScheduleRequest(BaseModel):
url: str = Field(..., description="URL du produit")
interval_hours: int = Field(default=24, ge=1)
use_playwright: Optional[bool] = None
save_db: bool = True
class ScheduleResponse(BaseModel):
job_id: str
next_run: datetime
class ScrapePreviewRequest(BaseModel):
url: str
use_playwright: Optional[bool] = None
class ScrapePreviewResponse(BaseModel):
success: bool
snapshot: Optional[dict[str, object]] = None
error: Optional[str] = None
class ScrapeCommitRequest(BaseModel):
snapshot: dict[str, object]
class ScrapeCommitResponse(BaseModel):
success: bool
product_id: Optional[int] = None
error: Optional[str] = None
class VersionResponse(BaseModel):
api_version: str
class BackendLogEntry(BaseModel):
time: datetime
level: str
message: str
class UvicornLogEntry(BaseModel):
line: str

BIN
pricewatch/app/cli/__pycache__/main.cpython-313.pyc Executable file → Normal file

Binary file not shown.

pricewatch/app/cli/main.py
View File

@@ -15,7 +15,7 @@ from typing import Optional
import redis
import typer
from rq import Connection, Worker
from rq import Worker
from alembic import command as alembic_command
from alembic.config import Config as AlembicConfig
from rich import print as rprint
@@ -34,7 +34,7 @@ from pricewatch.app.scraping.pipeline import ScrapingPipeline
from pricewatch.app.scraping.pw_fetch import fetch_playwright
from pricewatch.app.stores.amazon.store import AmazonStore
from pricewatch.app.stores.cdiscount.store import CdiscountStore
from pricewatch.app.tasks.scheduler import ScrapingScheduler
from pricewatch.app.tasks.scheduler import RedisUnavailableError, ScrapingScheduler
# Créer l'application Typer
app = typer.Typer(
@@ -197,18 +197,21 @@ def run(
html = None
fetch_method = FetchMethod.HTTP
fetch_error = None
http_result = None
# Tenter HTTP d'abord
logger.info("Tentative HTTP...")
http_result = fetch_http(canonical_url)
if config.options.force_playwright:
logger.info("Playwright force, skip HTTP")
else:
logger.info("Tentative HTTP...")
http_result = fetch_http(canonical_url)
if http_result.success:
if http_result and http_result.success:
html = http_result.html
fetch_method = FetchMethod.HTTP
logger.info("✓ HTTP réussi")
elif config.options.use_playwright:
# Fallback Playwright
logger.warning(f"HTTP échoué: {http_result.error}, fallback Playwright")
fallback_reason = http_result.error if http_result else "force_playwright"
logger.warning(f"HTTP échoué: {fallback_reason}, fallback Playwright")
pw_result = fetch_playwright(
canonical_url,
headless=not config.options.headful,
@@ -231,7 +234,7 @@ def run(
fetch_error = pw_result.error
logger.error(f"✗ Playwright échoué: {fetch_error}")
else:
fetch_error = http_result.error
fetch_error = http_result.error if http_result else "skip_http"
logger.error(f"✗ HTTP échoué: {fetch_error}")
# Parser si on a du HTML
@@ -467,11 +470,25 @@ def worker(
Lance un worker RQ.
"""
config = get_config()
connection = redis.from_url(config.redis.url)
try:
connection = redis.from_url(config.redis.url)
# Verification connexion avant de lancer le worker
connection.ping()
except redis.exceptions.ConnectionError as e:
rprint(f"[red]✗ Impossible de se connecter a Redis ({config.redis.url})[/red]")
rprint(f"[red] Erreur: {e}[/red]")
rprint("\n[yellow]Verifiez que Redis est demarre:[/yellow]")
rprint(" docker compose up -d redis")
rprint(" # ou")
rprint(" redis-server")
raise typer.Exit(code=1)
except redis.exceptions.RedisError as e:
rprint(f"[red]✗ Erreur Redis: {e}[/red]")
raise typer.Exit(code=1)
with Connection(connection):
worker_instance = Worker([queue])
worker_instance.work(with_scheduler=with_scheduler)
# RQ 2.x: connexion passee directement au Worker
worker_instance = Worker([queue], connection=connection)
worker_instance.work(with_scheduler=with_scheduler)
@app.command()
@@ -486,9 +503,15 @@ def enqueue(
"""
Enqueue un scraping immediat.
"""
scheduler = ScrapingScheduler(get_config(), queue_name=queue)
job = scheduler.enqueue_immediate(url, use_playwright=use_playwright, save_db=save_db)
rprint(f"[green]✓ Job enqueued: {job.id}[/green]")
try:
scheduler = ScrapingScheduler(get_config(), queue_name=queue)
job = scheduler.enqueue_immediate(url, use_playwright=use_playwright, save_db=save_db)
rprint(f"[green]✓ Job enqueued: {job.id}[/green]")
except RedisUnavailableError as e:
rprint(f"[red]✗ {e.message}[/red]")
rprint("\n[yellow]Verifiez que Redis est demarre:[/yellow]")
rprint(" docker compose up -d redis")
raise typer.Exit(code=1)
@app.command()
@@ -504,16 +527,22 @@ def schedule(
"""
Planifie un scraping recurrent.
"""
scheduler = ScrapingScheduler(get_config(), queue_name=queue)
job_info = scheduler.schedule_product(
url,
interval_hours=interval,
use_playwright=use_playwright,
save_db=save_db,
)
rprint(
f"[green]✓ Job planifie: {job_info.job_id} (next={job_info.next_run.isoformat()})[/green]"
)
try:
scheduler = ScrapingScheduler(get_config(), queue_name=queue)
job_info = scheduler.schedule_product(
url,
interval_hours=interval,
use_playwright=use_playwright,
save_db=save_db,
)
rprint(
f"[green]✓ Job planifie: {job_info.job_id} (next={job_info.next_run.isoformat()})[/green]"
)
except RedisUnavailableError as e:
rprint(f"[red]✗ {e.message}[/red]")
rprint("\n[yellow]Verifiez que Redis est demarre:[/yellow]")
rprint(" docker compose up -d redis")
raise typer.Exit(code=1)
if __name__ == "__main__":

BIN
pricewatch/app/core/__pycache__/config.cpython-313.pyc Executable file → Normal file

Binary file not shown.

BIN
pricewatch/app/core/__pycache__/schema.cpython-313.pyc Executable file → Normal file

Binary file not shown.

6
pricewatch/app/core/config.py Executable file → Normal file
View File

@@ -108,6 +108,11 @@ class AppConfig(BaseSettings):
default=True, description="Enable background worker functionality"
)
# API auth
api_token: Optional[str] = Field(
default=None, description="API token simple (Bearer)"
)
# Scraping defaults
default_playwright_timeout: int = Field(
default=60000, description="Default Playwright timeout in milliseconds"
@@ -138,6 +143,7 @@ class AppConfig(BaseSettings):
logger.info(f"Worker enabled: {self.enable_worker}")
logger.info(f"Worker timeout: {self.worker_timeout}s")
logger.info(f"Worker concurrency: {self.worker_concurrency}")
logger.info(f"API token configured: {bool(self.api_token)}")
logger.info("================================")

View File

@@ -23,6 +23,9 @@ class ScrapingOptions(BaseModel):
use_playwright: bool = Field(
default=True, description="Utiliser Playwright en fallback"
)
force_playwright: bool = Field(
default=False, description="Forcer Playwright même si HTTP réussi"
)
headful: bool = Field(default=False, description="Mode headful (voir le navigateur)")
save_html: bool = Field(
default=True, description="Sauvegarder HTML pour debug"
@@ -94,7 +97,8 @@ def read_yaml_config(yaml_path: str | Path) -> ScrapingConfig:
config = ScrapingConfig.model_validate(data)
logger.info(
f"Configuration chargée: {len(config.urls)} URL(s), "
f"playwright={config.options.use_playwright}"
f"playwright={config.options.use_playwright}, "
f"force_playwright={config.options.force_playwright}"
)
return config

View File

@@ -9,7 +9,7 @@ from datetime import datetime
from enum import Enum
from typing import Optional
from pydantic import BaseModel, Field, HttpUrl, field_validator
from pydantic import BaseModel, ConfigDict, Field, HttpUrl, field_validator
class StockStatus(str, Enum):
@@ -38,6 +38,8 @@ class DebugStatus(str, Enum):
class DebugInfo(BaseModel):
"""Informations de debug pour tracer les problèmes de scraping."""
model_config = ConfigDict(use_enum_values=True)
method: FetchMethod = Field(
description="Méthode utilisée pour la récupération (http ou playwright)"
)
@@ -55,9 +57,6 @@ class DebugInfo(BaseModel):
default=None, description="Taille du HTML récupéré en octets"
)
class Config:
use_enum_values = True
class ProductSnapshot(BaseModel):
"""
@@ -81,6 +80,7 @@ class ProductSnapshot(BaseModel):
# Données produit principales
title: Optional[str] = Field(default=None, description="Nom du produit")
price: Optional[float] = Field(default=None, description="Prix du produit", ge=0)
msrp: Optional[float] = Field(default=None, description="Prix conseille", ge=0)
currency: str = Field(default="EUR", description="Devise (EUR, USD, etc.)")
shipping_cost: Optional[float] = Field(
default=None, description="Frais de port", ge=0
@@ -94,6 +94,7 @@ class ProductSnapshot(BaseModel):
default=None, description="Référence produit (ASIN, SKU, etc.)"
)
category: Optional[str] = Field(default=None, description="Catégorie du produit")
description: Optional[str] = Field(default=None, description="Description produit")
# Médias
images: list[str] = Field(
@@ -133,20 +134,22 @@ class ProductSnapshot(BaseModel):
"""Filtre les URLs d'images vides."""
return [url.strip() for url in v if url and url.strip()]
class Config:
use_enum_values = True
json_schema_extra = {
model_config = ConfigDict(
use_enum_values=True,
json_schema_extra={
"example": {
"source": "amazon",
"url": "https://www.amazon.fr/dp/B08N5WRWNW",
"fetched_at": "2026-01-13T10:30:00Z",
"title": "Exemple de produit",
"price": 299.99,
"msrp": 349.99,
"currency": "EUR",
"shipping_cost": 0.0,
"stock_status": "in_stock",
"reference": "B08N5WRWNW",
"category": "Electronics",
"description": "Chargeur USB-C multi-ports.",
"images": [
"https://example.com/image1.jpg",
"https://example.com/image2.jpg",
@@ -165,7 +168,8 @@ class ProductSnapshot(BaseModel):
"html_size_bytes": 145000,
},
}
}
},
)
def to_dict(self) -> dict:
"""Serialize vers un dictionnaire Python natif."""

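For reference, a minimal sketch (not part of the diff, field values illustrative) of what the ConfigDict migration preserves: with use_enum_values=True, enum fields dump as plain strings, which the DB pipeline and the API responses rely on.
from pricewatch.app.core.schema import DebugInfo, DebugStatus, FetchMethod, ProductSnapshot
snap = ProductSnapshot(
    source="amazon",
    url="https://www.amazon.fr/dp/B08N5WRWNW",
    title="Exemple de produit",
    price=299.99,
    msrp=349.99,
    description="Chargeur USB-C multi-ports.",
    currency="EUR",
    stock_status="in_stock",
    debug=DebugInfo(method=FetchMethod.HTTP, status=DebugStatus.SUCCESS),
)
data = snap.model_dump(mode="json")
assert data["stock_status"] == "in_stock"  # enum dumped as its value, not an Enum repr
assert data["debug"]["method"] == "http"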
2
pricewatch/app/db/__init__.py Executable file → Normal file
View File

@@ -20,6 +20,7 @@ from pricewatch.app.db.models import (
ProductImage,
ProductSpec,
ScrapingLog,
Webhook,
)
__all__ = [
@@ -30,6 +31,7 @@ __all__ = [
"ProductImage",
"ProductSpec",
"ScrapingLog",
"Webhook",
"ProductRepository",
# Connection
"get_engine",

BIN
pricewatch/app/db/__pycache__/__init__.cpython-313.pyc Executable file → Normal file

Binary file not shown.

View File

BIN
pricewatch/app/db/__pycache__/models.cpython-313.pyc Executable file → Normal file

Binary file not shown.

View File

0
pricewatch/app/db/connection.py Executable file → Normal file
View File

View File

0
pricewatch/app/db/migrations/env.py Executable file → Normal file
View File

0
pricewatch/app/db/migrations/script.py.mako Executable file → Normal file
View File

View File

View File

@@ -0,0 +1,35 @@
"""Add webhooks table
Revision ID: 20260114_02
Revises: 20260114_01
Create Date: 2026-01-14 00:00:00
"""
from alembic import op
import sqlalchemy as sa
# Revision identifiers, used by Alembic.
revision = "20260114_02"
down_revision = "20260114_01"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"webhooks",
sa.Column("id", sa.Integer(), primary_key=True, autoincrement=True),
sa.Column("event", sa.String(length=50), nullable=False),
sa.Column("url", sa.Text(), nullable=False),
sa.Column("enabled", sa.Boolean(), nullable=False, server_default=sa.text("true")),
sa.Column("secret", sa.String(length=200), nullable=True),
sa.Column("created_at", sa.TIMESTAMP(), nullable=False),
)
op.create_index("ix_webhook_event", "webhooks", ["event"], unique=False)
op.create_index("ix_webhook_enabled", "webhooks", ["enabled"], unique=False)
def downgrade() -> None:
op.drop_index("ix_webhook_enabled", table_name="webhooks")
op.drop_index("ix_webhook_event", table_name="webhooks")
op.drop_table("webhooks")

View File

@@ -0,0 +1,26 @@
"""Ajout description et msrp sur products.
Revision ID: 20260115_02_product_details
Revises: 20260114_02
Create Date: 2026-01-15 10:00:00.000000
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "20260115_02_product_details"
down_revision = "20260114_02"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column("products", sa.Column("description", sa.Text(), nullable=True))
op.add_column("products", sa.Column("msrp", sa.Numeric(10, 2), nullable=True))
def downgrade() -> None:
op.drop_column("products", "msrp")
op.drop_column("products", "description")

48
pricewatch/app/db/models.py Executable file → Normal file
View File

@@ -15,7 +15,7 @@ Justification technique:
- JSONB uniquement pour données variables: errors, notes dans logs
"""
from datetime import datetime
from datetime import datetime, timezone
from decimal import Decimal
from typing import List, Optional
@@ -28,6 +28,7 @@ from sqlalchemy import (
Integer,
JSON,
Numeric,
Boolean,
String,
Text,
UniqueConstraint,
@@ -42,6 +43,10 @@ class Base(DeclarativeBase):
pass
def utcnow() -> datetime:
return datetime.now(timezone.utc)
class Product(Base):
"""
Catalogue produits (1 ligne par produit unique).
@@ -70,19 +75,25 @@ class Product(Base):
category: Mapped[Optional[str]] = mapped_column(
Text, nullable=True, comment="Product category (breadcrumb)"
)
description: Mapped[Optional[str]] = mapped_column(
Text, nullable=True, comment="Product description"
)
currency: Mapped[Optional[str]] = mapped_column(
String(3), nullable=True, comment="Currency code (EUR, USD, GBP)"
)
msrp: Mapped[Optional[Decimal]] = mapped_column(
Numeric(10, 2), nullable=True, comment="Recommended price"
)
# Timestamps
first_seen_at: Mapped[datetime] = mapped_column(
TIMESTAMP, nullable=False, default=datetime.utcnow, comment="First scraping timestamp"
TIMESTAMP, nullable=False, default=utcnow, comment="First scraping timestamp"
)
last_updated_at: Mapped[datetime] = mapped_column(
TIMESTAMP,
nullable=False,
default=datetime.utcnow,
onupdate=datetime.utcnow,
default=utcnow,
onupdate=utcnow,
comment="Last metadata update",
)
@@ -280,7 +291,7 @@ class ScrapingLog(Base):
String(20), nullable=False, comment="Fetch status (success, partial, failed)"
)
fetched_at: Mapped[datetime] = mapped_column(
TIMESTAMP, nullable=False, default=datetime.utcnow, comment="Scraping timestamp"
TIMESTAMP, nullable=False, default=utcnow, comment="Scraping timestamp"
)
# Performance metrics
@@ -318,3 +329,30 @@ class ScrapingLog(Base):
def __repr__(self) -> str:
return f"<ScrapingLog(id={self.id}, url={self.url}, status={self.fetch_status}, fetched_at={self.fetched_at})>"
class Webhook(Base):
"""
Webhooks pour notifications externes.
"""
__tablename__ = "webhooks"
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
event: Mapped[str] = mapped_column(String(50), nullable=False, comment="Event name")
url: Mapped[str] = mapped_column(Text, nullable=False, comment="Webhook URL")
enabled: Mapped[bool] = mapped_column(Boolean, nullable=False, default=True)
secret: Mapped[Optional[str]] = mapped_column(
String(200), nullable=True, comment="Secret optionnel"
)
created_at: Mapped[datetime] = mapped_column(
TIMESTAMP, nullable=False, default=utcnow, comment="Creation timestamp"
)
__table_args__ = (
Index("ix_webhook_event", "event"),
Index("ix_webhook_enabled", "enabled"),
)
def __repr__(self) -> str:
return f"<Webhook(id={self.id}, event={self.event}, url={self.url})>"

4
pricewatch/app/db/repository.py Executable file → Normal file
View File

@@ -49,8 +49,12 @@ class ProductRepository:
product.title = snapshot.title
if snapshot.category:
product.category = snapshot.category
if snapshot.description:
product.description = snapshot.description
if snapshot.currency:
product.currency = snapshot.currency
if snapshot.msrp is not None:
product.msrp = snapshot.msrp
def add_price_history(self, product: Product, snapshot: ProductSnapshot) -> Optional[PriceHistory]:
"""Ajoute une entree d'historique de prix si inexistante."""

View File

View File

0
pricewatch/app/scraping/pipeline.py Executable file → Normal file
View File

View File

@@ -23,6 +23,7 @@ from pricewatch.app.core.schema import (
StockStatus,
)
from pricewatch.app.stores.base import BaseStore
from pricewatch.app.stores.price_parser import parse_price_text
logger = get_logger("stores.aliexpress")
@@ -126,6 +127,8 @@ class AliexpressStore(BaseStore):
images = self._extract_images(html, soup, debug_info)
category = self._extract_category(soup, debug_info)
specs = self._extract_specs(soup, debug_info)
description = self._extract_description(soup, debug_info)
msrp = self._extract_msrp(html, debug_info)
reference = self.extract_reference(url)
# Note sur le rendu client-side
@@ -150,8 +153,10 @@ class AliexpressStore(BaseStore):
stock_status=stock_status,
reference=reference,
category=category,
description=description,
images=images,
specs=specs,
msrp=msrp,
debug=debug_info,
)
@@ -183,6 +188,17 @@ class AliexpressStore(BaseStore):
debug.errors.append("Titre non trouvé")
return None
def _extract_description(self, soup: BeautifulSoup, debug: DebugInfo) -> Optional[str]:
"""Extrait la description (meta tags)."""
meta = soup.find("meta", property="og:description") or soup.find(
"meta", attrs={"name": "description"}
)
if meta:
description = meta.get("content", "").strip()
if description:
return description
return None
def _extract_price(
self, html: str, soup: BeautifulSoup, debug: DebugInfo
) -> Optional[float]:
@@ -193,35 +209,39 @@ class AliexpressStore(BaseStore):
On utilise regex sur le HTML brut.
"""
# Pattern 1: Prix avant € (ex: "136,69 €")
match = re.search(r"([0-9]+[.,][0-9]{2})\s*€", html)
match = re.search(r"([0-9][0-9\\s.,\\u00a0\\u202f\\u2009]*)\\s*€", html)
if match:
price_str = match.group(1).replace(",", ".")
try:
return float(price_str)
except ValueError:
pass
price = parse_price_text(match.group(1))
if price is not None:
return price
# Pattern 2: € avant prix (ex: "€ 136.69")
match = re.search(r"\s*([0-9]+[.,][0-9]{2})", html)
match = re.search(r"\\s*([0-9][0-9\\s.,\\u00a0\\u202f\\u2009]*)", html)
if match:
price_str = match.group(1).replace(",", ".")
try:
return float(price_str)
except ValueError:
pass
price = parse_price_text(match.group(1))
if price is not None:
return price
# Pattern 3: Chercher dans meta tags (moins fiable)
og_price = soup.find("meta", property="og:price:amount")
if og_price:
price_str = og_price.get("content", "")
try:
return float(price_str)
except ValueError:
pass
price = parse_price_text(price_str)
if price is not None:
return price
debug.errors.append("Prix non trouvé")
return None
def _extract_msrp(self, html: str, debug: DebugInfo) -> Optional[float]:
"""Extrait le prix conseille si present."""
match = re.search(r"originalPrice\"\\s*:\\s*\"([0-9\\s.,]+)\"", html)
if match:
price = parse_price_text(match.group(1))
if price is not None:
return price
return None
def _extract_currency(
self, url: str, soup: BeautifulSoup, debug: DebugInfo
) -> str:

View File

View File

@@ -54,12 +54,12 @@ specs_table:
# ASIN (parfois dans les métadonnées)
asin:
- "input[name='ASIN']"
- "th:contains('ASIN') + td"
- "th:-soup-contains('ASIN') + td"
# Messages captcha / robot check
captcha_indicators:
- "form[action*='validateCaptcha']"
- "p.a-last:contains('Sorry')"
- "p.a-last:-soup-contains('Sorry')"
- "img[alt*='captcha']"
# Notes pour le parsing:

View File

@@ -4,7 +4,9 @@ Store Amazon - Parsing de produits Amazon.fr et Amazon.com.
Supporte l'extraction de: titre, prix, ASIN, images, specs, etc.
"""
import json
import re
from html import unescape
from datetime import datetime
from pathlib import Path
from typing import Optional
@@ -21,6 +23,7 @@ from pricewatch.app.core.schema import (
StockStatus,
)
from pricewatch.app.stores.base import BaseStore
from pricewatch.app.stores.price_parser import parse_price_text
logger = get_logger("stores.amazon")
@@ -131,6 +134,8 @@ class AmazonStore(BaseStore):
images = self._extract_images(soup, debug_info)
category = self._extract_category(soup, debug_info)
specs = self._extract_specs(soup, debug_info)
description = self._extract_description(soup, debug_info)
msrp = self._extract_msrp(soup, debug_info)
reference = self.extract_reference(url) or self._extract_asin_from_html(soup)
# Déterminer le statut final (ne pas écraser FAILED)
@@ -150,8 +155,10 @@ class AmazonStore(BaseStore):
stock_status=stock_status,
reference=reference,
category=category,
description=description,
images=images,
specs=specs,
msrp=msrp,
debug=debug_info,
)
@@ -195,6 +202,17 @@ class AmazonStore(BaseStore):
debug.errors.append("Titre non trouvé")
return None
def _extract_description(self, soup: BeautifulSoup, debug: DebugInfo) -> Optional[str]:
"""Extrait la description (meta tags)."""
meta = soup.find("meta", property="og:description") or soup.find(
"meta", attrs={"name": "description"}
)
if meta:
description = meta.get("content", "").strip()
if description:
return description
return None
def _extract_price(self, soup: BeautifulSoup, debug: DebugInfo) -> Optional[float]:
"""Extrait le prix."""
selectors = self.get_selector("price", [])
@@ -205,14 +223,9 @@ class AmazonStore(BaseStore):
elements = soup.select(selector)
for element in elements:
text = element.get_text(strip=True)
# Extraire nombre (format: "299,99" ou "299.99")
match = re.search(r"(\d+)[.,](\d+)", text)
if match:
price_str = f"{match.group(1)}.{match.group(2)}"
try:
return float(price_str)
except ValueError:
continue
price = parse_price_text(text)
if price is not None:
return price
# Fallback: chercher les spans séparés a-price-whole et a-price-fraction
whole = soup.select_one("span.a-price-whole")
@@ -220,15 +233,24 @@ class AmazonStore(BaseStore):
if whole and fraction:
whole_text = whole.get_text(strip=True)
fraction_text = fraction.get_text(strip=True)
try:
price_str = f"{whole_text}.{fraction_text}"
return float(price_str)
except ValueError:
pass
price = parse_price_text(f"{whole_text}.{fraction_text}")
if price is not None:
return price
debug.errors.append("Prix non trouvé")
return None
def _extract_msrp(self, soup: BeautifulSoup, debug: DebugInfo) -> Optional[float]:
"""Extrait le prix conseille."""
strike = soup.select_one("span.priceBlockStrikePriceString") or soup.select_one(
"span.a-text-price span.a-offscreen"
)
if strike:
price = parse_price_text(strike.get_text(strip=True))
if price is not None:
return price
return None
def _extract_currency(self, soup: BeautifulSoup, debug: DebugInfo) -> Optional[str]:
"""Extrait la devise."""
selectors = self.get_selector("currency", [])
@@ -270,6 +292,7 @@ class AmazonStore(BaseStore):
def _extract_images(self, soup: BeautifulSoup, debug: DebugInfo) -> list[str]:
"""Extrait les URLs d'images."""
images = []
seen = set()
selectors = self.get_selector("images", [])
if isinstance(selectors, str):
selectors = [selectors]
@@ -278,19 +301,57 @@ class AmazonStore(BaseStore):
elements = soup.select(selector)
for element in elements:
# Attribut src ou data-src
url = element.get("src") or element.get("data-src")
url = element.get("src") or element.get("data-src") or element.get("data-old-hires")
if url and url.startswith("http"):
images.append(url)
if self._is_product_image(url) and url not in seen:
images.append(url)
seen.add(url)
dynamic = element.get("data-a-dynamic-image")
if dynamic:
urls = self._extract_dynamic_images(dynamic)
for dyn_url in urls:
if self._is_product_image(dyn_url) and dyn_url not in seen:
images.append(dyn_url)
seen.add(dyn_url)
# Fallback: chercher tous les img tags si aucune image trouvée
if not images:
all_imgs = soup.find_all("img")
for img in all_imgs:
url = img.get("src") or img.get("data-src")
if url and url.startswith("http"):
images.append(url)
if url and url.startswith("http") and self._is_product_image(url):
if url not in seen:
images.append(url)
seen.add(url)
return list(set(images)) # Dédupliquer
return images
def _extract_dynamic_images(self, raw: str) -> list[str]:
"""Extrait les URLs du JSON data-a-dynamic-image."""
try:
data = json.loads(unescape(raw))
except (TypeError, json.JSONDecodeError):
return []
urls = []
if isinstance(data, dict):
candidates = []
for url, dims in data.items():
if not isinstance(url, str) or not url.startswith("http"):
continue
size = dims[0] if isinstance(dims, list) and dims else 0
candidates.append((size, url))
candidates.sort(key=lambda item: item[0], reverse=True)
for _, url in candidates:
urls.append(url)
return urls
def _is_product_image(self, url: str) -> bool:
"""Filtre basique pour eviter les logos et sprites."""
lowered = url.lower()
if "prime_logo" in lowered or "sprite" in lowered:
return False
return True
def _extract_category(self, soup: BeautifulSoup, debug: DebugInfo) -> Optional[str]:
"""Extrait la catégorie depuis les breadcrumbs."""

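For reference, data-a-dynamic-image carries a JSON object mapping image URLs to [width, height]; a standalone sketch (sample attribute value, not taken from the diff) of the ranking _extract_dynamic_images applies:
import json
from html import unescape
raw = (
    '{"https://m.media-amazon.com/images/I/713fTyxvEWL._AC_SX425_.jpg":[425,425],'
    '"https://m.media-amazon.com/images/I/713fTyxvEWL._AC_SX679_.jpg":[679,679]}'
)
data = json.loads(unescape(raw))  # unescape handles HTML-escaped quotes in the attribute
ranked = sorted(((dims[0], url) for url, dims in data.items()), reverse=True)
urls = [url for _, url in ranked]
# urls[0] is the 679px variant, urls[1] the 425px one; logos and sprites are then
# filtered out by _is_product_image before being appended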
View File

@@ -23,6 +23,7 @@ from pricewatch.app.core.schema import (
StockStatus,
)
from pricewatch.app.stores.base import BaseStore
from pricewatch.app.stores.price_parser import parse_price_text
logger = get_logger("stores.backmarket")
@@ -116,6 +117,8 @@ class BackmarketStore(BaseStore):
images = json_ld_data.get("images") or self._extract_images(soup, debug_info)
category = self._extract_category(soup, debug_info)
specs = self._extract_specs(soup, debug_info)
description = self._extract_description(soup, debug_info)
msrp = self._extract_msrp(soup, debug_info)
reference = self.extract_reference(url)
# Spécifique Backmarket: condition (état du reconditionné)
@@ -140,8 +143,10 @@ class BackmarketStore(BaseStore):
stock_status=stock_status,
reference=reference,
category=category,
description=description,
images=images,
specs=specs,
msrp=msrp,
debug=debug_info,
)
@@ -213,6 +218,17 @@ class BackmarketStore(BaseStore):
debug.errors.append("Titre non trouvé")
return None
def _extract_description(self, soup: BeautifulSoup, debug: DebugInfo) -> Optional[str]:
"""Extrait la description (meta tags)."""
meta = soup.find("meta", property="og:description") or soup.find(
"meta", attrs={"name": "description"}
)
if meta:
description = meta.get("content", "").strip()
if description:
return description
return None
def _extract_price(self, soup: BeautifulSoup, debug: DebugInfo) -> Optional[float]:
"""Extrait le prix."""
selectors = self.get_selector("price", [])
@@ -225,20 +241,29 @@ class BackmarketStore(BaseStore):
# Attribut content (schema.org) ou texte
price_text = element.get("content") or element.get_text(strip=True)
# Extraire nombre (format: "299,99" ou "299.99" ou "299")
match = re.search(r"(\d+)[.,]?(\d*)", price_text)
if match:
integer_part = match.group(1)
decimal_part = match.group(2) or "00"
price_str = f"{integer_part}.{decimal_part}"
try:
return float(price_str)
except ValueError:
continue
price = parse_price_text(price_text)
if price is not None:
return price
debug.errors.append("Prix non trouvé")
return None
def _extract_msrp(self, soup: BeautifulSoup, debug: DebugInfo) -> Optional[float]:
"""Extrait le prix conseille."""
selectors = [
".price--old",
".price--striked",
".price__old",
"del",
]
for selector in selectors:
element = soup.select_one(selector)
if element:
price = parse_price_text(element.get_text(strip=True))
if price is not None:
return price
return None
def _extract_currency(self, soup: BeautifulSoup, debug: DebugInfo) -> Optional[str]:
"""Extrait la devise."""
selectors = self.get_selector("currency", [])

View File

@@ -4,6 +4,7 @@ Store Cdiscount - Parsing de produits Cdiscount.com.
Supporte l'extraction de: titre, prix, SKU, images, specs, etc.
"""
import json
import re
from datetime import datetime
from pathlib import Path
@@ -21,6 +22,7 @@ from pricewatch.app.core.schema import (
StockStatus,
)
from pricewatch.app.stores.base import BaseStore
from pricewatch.app.stores.price_parser import parse_price_text
logger = get_logger("stores.cdiscount")
@@ -112,6 +114,8 @@ class CdiscountStore(BaseStore):
images = self._extract_images(soup, debug_info)
category = self._extract_category(soup, debug_info)
specs = self._extract_specs(soup, debug_info)
description = self._extract_description(soup, debug_info)
msrp = self._extract_msrp(soup, debug_info)
reference = self.extract_reference(url) or self._extract_sku_from_html(soup)
# Déterminer le statut final
@@ -130,8 +134,10 @@ class CdiscountStore(BaseStore):
stock_status=stock_status,
reference=reference,
category=category,
description=description,
images=images,
specs=specs,
msrp=msrp,
debug=debug_info,
)
@@ -158,6 +164,21 @@ class CdiscountStore(BaseStore):
debug.errors.append("Titre non trouvé")
return None
def _extract_description(self, soup: BeautifulSoup, debug: DebugInfo) -> Optional[str]:
"""Extrait la description (meta tags)."""
meta = soup.find("meta", property="og:description") or soup.find(
"meta", attrs={"name": "description"}
)
if meta:
description = meta.get("content", "").strip()
if description:
return description
product_ld = self._find_product_ld(soup)
desc_ld = product_ld.get("description") if product_ld else None
if isinstance(desc_ld, str) and desc_ld.strip():
return desc_ld.strip()
return None
def _extract_price(self, soup: BeautifulSoup, debug: DebugInfo) -> Optional[float]:
"""Extrait le prix."""
selectors = self.get_selector("price", [])
@@ -170,20 +191,29 @@ class CdiscountStore(BaseStore):
# Attribut content (schema.org) ou texte
price_text = element.get("content") or element.get_text(strip=True)
# Extraire nombre (format: "299,99" ou "299.99")
match = re.search(r"(\d+)[.,]?(\d*)", price_text)
if match:
integer_part = match.group(1)
decimal_part = match.group(2) or "00"
price_str = f"{integer_part}.{decimal_part}"
try:
return float(price_str)
except ValueError:
continue
price = parse_price_text(price_text)
if price is not None:
return price
debug.errors.append("Prix non trouvé")
return None
def _extract_msrp(self, soup: BeautifulSoup, debug: DebugInfo) -> Optional[float]:
"""Extrait le prix conseille."""
selectors = [
".jsStrikePrice",
".price__old",
".c-price__strike",
".price-strike",
]
for selector in selectors:
element = soup.select_one(selector)
if element:
price = parse_price_text(element.get_text(strip=True))
if price is not None:
return price
return None
def _extract_currency(self, soup: BeautifulSoup, debug: DebugInfo) -> Optional[str]:
"""Extrait la devise."""
selectors = self.get_selector("currency", [])
@@ -249,7 +279,14 @@ class CdiscountStore(BaseStore):
url = f"https:{url}"
images.append(url)
return list(set(images)) # Dédupliquer
ld_images = self._extract_ld_images(self._find_product_ld(soup))
for url in ld_images:
if url and url not in images:
if url.startswith("//"):
url = f"https:{url}"
images.append(url)
return list(dict.fromkeys(images)) # Préserver l'ordre
def _extract_category(self, soup: BeautifulSoup, debug: DebugInfo) -> Optional[str]:
"""Extrait la catégorie depuis les breadcrumbs."""
@@ -275,6 +312,53 @@ class CdiscountStore(BaseStore):
return None
def _extract_json_ld_entries(self, soup: BeautifulSoup) -> list[dict]:
"""Parse les scripts JSON-LD et retourne les objets."""
entries = []
scripts = soup.find_all("script", type="application/ld+json")
for script in scripts:
raw = script.string or script.text
if not raw:
continue
try:
payload = json.loads(raw.strip())
except (json.JSONDecodeError, TypeError):
continue
if isinstance(payload, list):
entries.extend(payload)
else:
entries.append(payload)
return entries
def _find_product_ld(self, soup: BeautifulSoup) -> dict:
"""Retourne lobjet Product JSON-LD si présent."""
for entry in self._extract_json_ld_entries(soup):
if not isinstance(entry, dict):
continue
type_field = entry.get("@type") or entry.get("type")
if isinstance(type_field, str) and "product" in type_field.lower():
return entry
return {}
def _extract_ld_images(self, product_ld: dict) -> list[str]:
"""Récupère les images listées dans le JSON-LD."""
if not product_ld:
return []
images = product_ld.get("image") or product_ld.get("images")
if not images:
return []
if isinstance(images, str):
images = [images]
extracted = []
for item in images:
if isinstance(item, str):
extracted.append(item)
elif isinstance(item, dict):
url = item.get("url")
if isinstance(url, str):
extracted.append(url)
return extracted
def _extract_specs(self, soup: BeautifulSoup, debug: DebugInfo) -> dict[str, str]:
"""Extrait les caractéristiques techniques."""
specs = {}
@@ -298,6 +382,19 @@ class CdiscountStore(BaseStore):
if key and value:
specs[key] = value
product_ld = self._find_product_ld(soup)
additional = product_ld.get("additionalProperty") if product_ld else None
if isinstance(additional, dict):
additional = [additional]
if isinstance(additional, list):
for item in additional:
if not isinstance(item, dict):
continue
key = item.get("name") or item.get("propertyID")
value = item.get("value") or item.get("valueReference")
if key and value:
specs[key] = value
return specs
def _extract_sku_from_html(self, soup: BeautifulSoup) -> Optional[str]:

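For reference, a hypothetical JSON-LD payload of the shape these helpers consume; _find_product_ld returns the Product entry, _extract_ld_images accepts image entries given as strings or as objects with a "url" key, and additionalProperty items are merged into specs:
product_ld = {
    "@type": "Product",
    "name": "PC Portable Gamer ASUS TUF Gaming A16",
    "description": "16\" WUXGA 165Hz - RTX 5060 8Go",
    "image": [
        "https://www.cdiscount.com/pdt2/0/0/4/1/700x700/tuf608umrv004/rw/pc.jpg",
        {"url": "https://www.cdiscount.com/pdt2/0/0/4/2/700x700/tuf608umrv004/rw/pc.jpg"},
    ],
    "additionalProperty": [{"name": "RAM", "value": "16 Go"}],
}
# _extract_ld_images(product_ld) -> the two URLs above, in order
# _extract_specs adds {"RAM": "16 Go"} on top of the specs scraped from the HTML table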
View File

@@ -0,0 +1,48 @@
"""
Helpers pour parser des prix avec separateurs de milliers.
"""
from __future__ import annotations
import re
from typing import Optional
def parse_price_text(text: str) -> Optional[float]:
"""
Parse un texte de prix en float.
Gere les separateurs espace, point, virgule et espaces insécables.
"""
if not text:
return None
text = re.sub(r"(\d)\s*€\s*(\d)", r"\1,\2", text)
cleaned = text.replace("\u00a0", " ").replace("\u202f", " ").replace("\u2009", " ")
cleaned = "".join(ch for ch in cleaned if ch.isdigit() or ch in ".,")
if not cleaned:
return None
if "," in cleaned and "." in cleaned:
if cleaned.rfind(",") > cleaned.rfind("."):
cleaned = cleaned.replace(".", "")
cleaned = cleaned.replace(",", ".")
else:
cleaned = cleaned.replace(",", "")
elif "," in cleaned:
parts = cleaned.split(",")
if len(parts) > 1:
decimal = parts[-1]
integer = "".join(parts[:-1])
cleaned = f"{integer}.{decimal}" if decimal else integer
elif "." in cleaned:
parts = cleaned.split(".")
if len(parts) > 1:
decimal = parts[-1]
integer = "".join(parts[:-1])
cleaned = f"{integer}.{decimal}" if decimal else integer
try:
return float(cleaned)
except ValueError:
return None
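A few representative inputs and the values parse_price_text returns under the rules above (thousands separators dropped, the last comma or point treated as the decimal mark, digits split around a "€" glued back together):
from pricewatch.app.stores.price_parser import parse_price_text
assert parse_price_text("1 199,99 €") == 1199.99   # space or NBSP as thousands separator
assert parse_price_text("1.299,00") == 1299.0      # point = thousands, comma = decimals
assert parse_price_text("€ 136.69") == 136.69
assert parse_price_text("1199€99") == 1199.99      # "€" between integer part and cents
assert parse_price_text("") is None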

13
pricewatch/app/tasks/__init__.py Executable file → Normal file
View File

@@ -3,6 +3,15 @@ Module tasks pour les jobs RQ.
"""
from pricewatch.app.tasks.scrape import scrape_product
from pricewatch.app.tasks.scheduler import ScrapingScheduler
from pricewatch.app.tasks.scheduler import (
RedisUnavailableError,
ScrapingScheduler,
check_redis_connection,
)
__all__ = ["scrape_product", "ScrapingScheduler"]
__all__ = [
"scrape_product",
"ScrapingScheduler",
"RedisUnavailableError",
"check_redis_connection",
]

75
pricewatch/app/tasks/scheduler.py Executable file → Normal file
View File

@@ -9,6 +9,8 @@ from datetime import datetime, timedelta, timezone
from typing import Optional
import redis
from redis.exceptions import ConnectionError as RedisConnectionError
from redis.exceptions import RedisError, TimeoutError as RedisTimeoutError
from rq import Queue
from rq_scheduler import Scheduler
@@ -19,6 +21,15 @@ from pricewatch.app.tasks.scrape import scrape_product
logger = get_logger("tasks.scheduler")
class RedisUnavailableError(Exception):
"""Exception levee quand Redis n'est pas disponible."""
def __init__(self, message: str = "Redis non disponible", cause: Optional[Exception] = None):
self.message = message
self.cause = cause
super().__init__(self.message)
@dataclass
class ScheduledJobInfo:
"""Infos de retour pour un job planifie."""
@@ -27,14 +38,72 @@ class ScheduledJobInfo:
next_run: datetime
def check_redis_connection(redis_url: str) -> bool:
"""
Verifie si Redis est accessible.
Returns:
True si Redis repond, False sinon.
"""
try:
conn = redis.from_url(redis_url)
conn.ping()
return True
except (RedisConnectionError, RedisTimeoutError, RedisError) as e:
logger.debug(f"Redis ping echoue: {e}")
return False
class ScrapingScheduler:
"""Scheduler pour les jobs de scraping avec RQ."""
def __init__(self, config: Optional[AppConfig] = None, queue_name: str = "default") -> None:
self.config = config or get_config()
self.redis = redis.from_url(self.config.redis.url)
self.queue = Queue(queue_name, connection=self.redis)
self.scheduler = Scheduler(queue=self.queue, connection=self.redis)
self._queue_name = queue_name
self._redis: Optional[redis.Redis] = None
self._queue: Optional[Queue] = None
self._scheduler: Optional[Scheduler] = None
def _ensure_connected(self) -> None:
"""Etablit la connexion Redis si necessaire, leve RedisUnavailableError si echec."""
if self._redis is not None:
return
try:
self._redis = redis.from_url(self.config.redis.url)
# Ping pour verifier la connexion
self._redis.ping()
self._queue = Queue(self._queue_name, connection=self._redis)
self._scheduler = Scheduler(queue=self._queue, connection=self._redis)
logger.debug(f"Connexion Redis etablie: {self.config.redis.url}")
except (RedisConnectionError, RedisTimeoutError) as e:
self._redis = None
msg = f"Impossible de se connecter a Redis ({self.config.redis.url}): {e}"
logger.error(msg)
raise RedisUnavailableError(msg, cause=e) from e
except RedisError as e:
self._redis = None
msg = f"Erreur Redis: {e}"
logger.error(msg)
raise RedisUnavailableError(msg, cause=e) from e
@property
def redis(self) -> redis.Redis:
"""Acces a la connexion Redis (lazy)."""
self._ensure_connected()
return self._redis # type: ignore
@property
def queue(self) -> Queue:
"""Acces a la queue RQ (lazy)."""
self._ensure_connected()
return self._queue # type: ignore
@property
def scheduler(self) -> Scheduler:
"""Acces au scheduler RQ (lazy)."""
self._ensure_connected()
return self._scheduler # type: ignore
def enqueue_immediate(
self,

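A minimal sketch of the intended calling pattern (not part of the diff; assumes a local Redis at the default URL): the connection is only opened on first use, and callers catch RedisUnavailableError instead of getting a raw traceback.
from pricewatch.app.tasks import RedisUnavailableError, ScrapingScheduler, check_redis_connection
redis_url = "redis://localhost:6379/0"
if not check_redis_connection(redis_url):
    print("Redis unreachable: docker compose up -d redis")
else:
    scheduler = ScrapingScheduler(queue_name="default")
    try:
        job = scheduler.enqueue_immediate("https://www.amazon.fr/dp/B0DQ8M74KL", save_db=True)
        print(f"Job enqueued: {job.id}")
    except RedisUnavailableError as exc:
        print(f"Redis went down in the meantime: {exc.message}")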
33
pricewatch/app/tasks/scrape.py Executable file → Normal file
View File

@@ -4,6 +4,7 @@ Tache de scraping asynchrone pour RQ.
from __future__ import annotations
import time
from typing import Any, Optional
from pricewatch.app.core.config import AppConfig, get_config
@@ -46,6 +47,9 @@ def scrape_product(
Retourne un dict avec success, product_id, snapshot, error.
"""
job_start_time = time.time()
logger.info(f"[JOB START] Scraping: {url}")
config: AppConfig = get_config()
setup_stores()
@@ -58,6 +62,8 @@ def scrape_product(
registry = get_registry()
store = registry.detect_store(url)
if not store:
elapsed_ms = int((time.time() - job_start_time) * 1000)
logger.error(f"[JOB FAILED] Aucun store detecte pour: {url} (duree={elapsed_ms}ms)")
snapshot = ProductSnapshot(
source="unknown",
url=url,
@@ -70,6 +76,8 @@ def scrape_product(
ScrapingPipeline(config=config).process_snapshot(snapshot, save_to_db=save_db)
return {"success": False, "product_id": None, "snapshot": snapshot, "error": "store"}
logger.info(f"[STORE] Detecte: {store.store_id}")
canonical_url = store.canonicalize(url)
html = None
@@ -79,13 +87,16 @@ def scrape_product(
html_size_bytes = None
pw_result = None
logger.debug(f"[FETCH] Tentative HTTP: {canonical_url}")
http_result = fetch_http(canonical_url)
duration_ms = http_result.duration_ms
if http_result.success:
html = http_result.html
fetch_method = FetchMethod.HTTP
logger.info(f"[FETCH] HTTP OK (duree={duration_ms}ms, taille={len(html)})")
elif use_playwright:
logger.debug(f"[FETCH] HTTP echoue ({http_result.error}), fallback Playwright")
pw_result = fetch_playwright(
canonical_url,
headless=not headful,
@@ -97,10 +108,13 @@ def scrape_product(
if pw_result.success:
html = pw_result.html
fetch_method = FetchMethod.PLAYWRIGHT
logger.info(f"[FETCH] Playwright OK (duree={duration_ms}ms, taille={len(html)})")
else:
fetch_error = pw_result.error
logger.warning(f"[FETCH] Playwright echoue: {fetch_error}")
else:
fetch_error = http_result.error
logger.warning(f"[FETCH] HTTP echoue: {fetch_error}")
if html:
html_size_bytes = len(html.encode("utf-8"))
@@ -118,12 +132,18 @@ def scrape_product(
save_debug_screenshot(pw_result.screenshot, f"{store.store_id}_{ref}")
try:
logger.debug(f"[PARSE] Parsing avec {store.store_id}...")
snapshot = store.parse(html, canonical_url)
snapshot.debug.method = fetch_method
snapshot.debug.duration_ms = duration_ms
snapshot.debug.html_size_bytes = html_size_bytes
success = snapshot.debug.status != DebugStatus.FAILED
if success:
logger.info(f"[PARSE] OK - titre={bool(snapshot.title)}, prix={snapshot.price}")
else:
logger.warning(f"[PARSE] Partiel - status={snapshot.debug.status}")
except Exception as exc:
logger.error(f"[PARSE] Exception: {exc}")
snapshot = ProductSnapshot(
source=store.store_id,
url=canonical_url,
@@ -152,6 +172,19 @@ def scrape_product(
product_id = ScrapingPipeline(config=config).process_snapshot(snapshot, save_to_db=save_db)
# Log final du job
elapsed_ms = int((time.time() - job_start_time) * 1000)
if success:
logger.info(
f"[JOB OK] {store.store_id}/{snapshot.reference} "
f"product_id={product_id} prix={snapshot.price} duree={elapsed_ms}ms"
)
else:
logger.warning(
f"[JOB FAILED] {store.store_id}/{snapshot.reference or 'unknown'} "
f"erreur={fetch_error} duree={elapsed_ms}ms"
)
return {
"success": success,
"product_id": product_id,

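For reference, a sketch of calling the task synchronously, as the RQ worker does (keyword names assumed from the scheduler and the test doubles in this commit); save_db=False skips the DB write:
from pricewatch.app.tasks import scrape_product
result = scrape_product("https://www.amazon.fr/dp/B0DQ8M74KL", use_playwright=True, save_db=False)
if result["success"]:
    snap = result["snapshot"]          # ProductSnapshot
    print(snap.title, snap.price, snap.msrp)
else:
    print("Scrape failed:", result["error"])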
View File

@@ -57,6 +57,10 @@ dependencies = [
"redis>=5.0.0",
"rq>=1.15.0",
"rq-scheduler>=0.13.0",
# API (Phase 3)
"fastapi>=0.110.0",
"uvicorn>=0.27.0",
]
[project.optional-dependencies]

View File

@@ -4,7 +4,8 @@
# Liste des URLs à scraper
# Note: Ces URLs sont des exemples, remplacez-les par de vraies URLs produit
urls:
- "https://www.amazon.fr/NINJA-Essential-Cappuccino-préréglages-ES501EU/dp/B0DFWRHZ7L"
- "https://www.amazon.fr/ASUS-A16-TUF608UH-RV054W-Portable-Processeur-Windows/dp/B0DQ8M74KL"
- "https://www.cdiscount.com/informatique/ordinateurs-pc-portables/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo/f-10709-tuf608umrv004.html"
# Options de scraping
options:
@@ -23,3 +24,4 @@ options:
# Timeout par page en millisecondes
timeout_ms: 60000
force_playwright: true

125
scraped_store.json Executable file → Normal file
View File

@@ -1,28 +1,121 @@
[
{
"source": "amazon",
"url": "https://www.amazon.fr/dp/B0DFWRHZ7L",
"fetched_at": "2026-01-13T13:24:21.615894",
"title": null,
"price": null,
"url": "https://www.amazon.fr/dp/B0DQ8M74KL",
"fetched_at": "2026-01-14T21:33:15.838503",
"title": "ASUS TUF Gaming A16-TUF608UH-RV054W 16 Pouces FHD Plus 165Hz Pc Portable (Processeur AMD Ryzen 7 260, 16GB DDR5, 512GB SSD, NVIDIA RTX 5050) Windows 11 Home Clavier AZERTY",
"price": 1259.0,
"msrp": 1699.99,
"currency": "EUR",
"shipping_cost": null,
"stock_status": "in_stock",
"reference": "B0DQ8M74KL",
"category": "Ordinateurs portables classiques",
"description": "ASUS TUF Gaming A16-TUF608UH-RV054W 16 Pouces FHD Plus 165Hz Pc Portable (Processeur AMD Ryzen 7 260, 16GB DDR5, 512GB SSD, NVIDIA RTX 5050) Windows 11 Home Clavier AZERTY : Amazon.fr: Informatique",
"images": [
"https://m.media-amazon.com/images/I/713fTyxvEWL._AC_SY300_SX300_QL70_ML2_.jpg",
"https://m.media-amazon.com/images/I/713fTyxvEWL._AC_SX679_.jpg",
"https://m.media-amazon.com/images/I/713fTyxvEWL._AC_SX569_.jpg",
"https://m.media-amazon.com/images/I/713fTyxvEWL._AC_SX522_.jpg",
"https://m.media-amazon.com/images/I/713fTyxvEWL._AC_SX466_.jpg",
"https://m.media-amazon.com/images/I/713fTyxvEWL._AC_SY450_.jpg",
"https://m.media-amazon.com/images/I/713fTyxvEWL._AC_SX425_.jpg",
"https://m.media-amazon.com/images/I/713fTyxvEWL._AC_SY355_.jpg"
],
"specs": {
"Marque": "ASUS",
"Numéro du modèle de l'article": "90NR0KS1-M00480",
"séries": "ASUS TUF Gaming",
"Couleur": "GRAY",
"Garantie constructeur": "3 ans contructeur",
"Système d'exploitation": "Windows 11 Home",
"Description du clavier": "Jeu",
"Marque du processeur": "AMD",
"Type de processeur": "Ryzen 7",
"Vitesse du processeur": "3,8 GHz",
"Nombre de coeurs": "8",
"Mémoire maximale": "32 Go",
"Taille du disque dur": "512 GB",
"Technologie du disque dur": "SSD",
"Interface du disque dur": "PCIE x 4",
"Type d'écran": "LED",
"Taille de l'écran": "16 Pouces",
"Résolution de l'écran": "1920 x 1200 pixels",
"Resolution": "1920x1200 Pixels",
"Marque chipset graphique": "NVIDIA",
"Description de la carte graphique": "NVIDIA GeForce RTX 5050 Laptop GPU - 8GB GDDR7",
"GPU": "NVIDIA GeForce RTX 5050 Laptop GPU - 8GB GDDR7",
"Mémoire vive de la carte graphique": "8 GB",
"Type de mémoire vive (carte graphique)": "GDDR7",
"Type de connectivité": "Bluetooth, Wi-Fi",
"Type de technologie sans fil": "802.11ax, Bluetooth",
"Bluetooth": "Oui",
"Nombre de ports HDMI": "1",
"Nombre de ports USB 2.0": "1",
"Nombre de ports USB 3.0": "3",
"Nombre de ports Ethernet": "1",
"Type de connecteur": "Bluetooth, HDMI, USB, Wi-Fi",
"Compatibilité du périphérique": "Casque audio, Clavier, Souris, Ecran externe, Disque dur externe, Imprimante, etc., Haut-parleur",
"Poids du produit": "2,1 Kilogrammes",
"Divers": "Clavier rétroéclairé",
"Disponibilité des pièces détachées": "5 Ans",
"Mises à jour logicielles garanties jusquà": "Information non disponible",
"ASIN": "B0DQ8M74KL",
"Moyenne des commentaires client": "4,74,7 sur 5 étoiles(7)4,7 sur 5 étoiles",
"Classement des meilleures ventes d'Amazon": "5025 en Informatique (Voir les 100 premiers en Informatique)124 enOrdinateurs portables classiques",
"Date de mise en ligne sur Amazon.fr": "1 juillet 2025"
},
"debug": {
"method": "playwright",
"status": "success",
"errors": [],
"notes": [],
"duration_ms": null,
"html_size_bytes": null
}
},
{
"source": "cdiscount",
"url": "https://www.cdiscount.com/informatique/ordinateurs-pc-portables/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo/f-10709-tuf608umrv004.html",
"fetched_at": "2026-01-14T21:33:20.309754",
"title": "PC Portable Gamer ASUS TUF Gaming A16 | Sans Windows - 16\" WUXGA 165Hz - RTX 5060 8Go - AMD Ryzen 7 260 - RAM 16Go - 1To SSD",
"price": 119999.0,
"msrp": null,
"currency": "EUR",
"shipping_cost": null,
"stock_status": "unknown",
"reference": "B0DFWRHZ7L",
"reference": "10709-tuf608umrv004",
"category": null,
"images": [],
"description": "Cdiscount : Meuble, Déco, High Tech, Bricolage, Jardin, Sport | Livraison gratuite à partir de 10€ | Paiement sécurisé | 4x possible | Retour simple et rapide | E-commerçant français, des produits et services au meilleur prix.",
"images": [
"https://www.cdiscount.com/pdt2/0/0/4/1/700x700/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/2/700x700/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/3/700x700/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/4/700x700/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/5/700x700/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/6/700x700/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/7/700x700/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/8/700x700/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/9/700x700/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/1/115x115/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/2/115x115/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/3/115x115/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/4/115x115/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/5/115x115/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/6/115x115/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/7/115x115/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/8/115x115/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/pdt2/0/0/4/9/115x115/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg",
"https://www.cdiscount.com/ac/085x085/TUF608UMRV004_177763282_1.png",
"https://www.cdiscount.com/ac/085x085/TUF608UMRV004_177763282_2.png",
"https://www.cdiscount.com/pdt2/0/0/4/9/550x550/tuf608umrv004/rw/pc-portable-gamer-asus-tuf-gaming-a16-sans-windo.jpg"
],
"specs": {},
"debug": {
"method": "http",
"status": "partial",
"errors": [
"Captcha ou robot check détecté",
"Titre non trouvé",
"Prix non trouvé"
],
"notes": [
"Parsing incomplet: titre ou prix manquant"
],
"method": "playwright",
"status": "success",
"errors": [],
"notes": [],
"duration_ms": null,
"html_size_bytes": null
}

56
tests/api/test_auth.py Normal file
View File

@@ -0,0 +1,56 @@
"""
Tests auth API.
"""
from dataclasses import dataclass
import pytest
from fastapi import HTTPException
from pricewatch.app.api.main import require_token
@dataclass
class FakeRedisConfig:
url: str
@dataclass
class FakeDbConfig:
url: str
@dataclass
class FakeAppConfig:
db: FakeDbConfig
redis: FakeRedisConfig
api_token: str
def test_missing_token_returns_401(monkeypatch):
"""Sans token, retourne 401."""
config = FakeAppConfig(
db=FakeDbConfig(url="sqlite:///:memory:"),
redis=FakeRedisConfig(url="redis://localhost:6379/0"),
api_token="secret",
)
monkeypatch.setattr("pricewatch.app.api.main.get_config", lambda: config)
with pytest.raises(HTTPException) as excinfo:
require_token(None)
assert excinfo.value.status_code == 401
def test_bad_token_returns_403(monkeypatch):
"""Token invalide retourne 403."""
config = FakeAppConfig(
db=FakeDbConfig(url="sqlite:///:memory:"),
redis=FakeRedisConfig(url="redis://localhost:6379/0"),
api_token="secret",
)
monkeypatch.setattr("pricewatch.app.api.main.get_config", lambda: config)
with pytest.raises(HTTPException) as excinfo:
require_token("Bearer nope")
assert excinfo.value.status_code == 403

View File

@@ -0,0 +1,30 @@
"""
Tests API logs backend.
"""
from pricewatch.app.api.main import BACKEND_LOGS, list_backend_logs, preview_scrape
from pricewatch.app.api.schemas import ScrapePreviewRequest
from pricewatch.app.core.schema import DebugInfo, DebugStatus, FetchMethod, ProductSnapshot
def test_backend_logs_capture_preview(monkeypatch):
BACKEND_LOGS.clear()
snapshot = ProductSnapshot(
source="amazon",
url="https://example.com",
title="Produit",
price=9.99,
currency="EUR",
debug=DebugInfo(method=FetchMethod.HTTP, status=DebugStatus.SUCCESS),
)
def fake_scrape(url, use_playwright=None, save_db=False):
return {"success": True, "snapshot": snapshot, "error": None}
monkeypatch.setattr("pricewatch.app.api.main.scrape_product", fake_scrape)
preview_scrape(ScrapePreviewRequest(url="https://example.com"))
logs = list_backend_logs()
assert logs
assert logs[-1].message.startswith("Preview scraping")

View File

@@ -0,0 +1,239 @@
"""
Tests filtres avances et exports API.
"""
from datetime import datetime, timedelta
import json
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from pricewatch.app.api.main import (
export_logs,
export_prices,
export_products,
list_logs,
list_prices,
list_products,
)
from pricewatch.app.db.models import Base, PriceHistory, Product, ScrapingLog
def _make_session():
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
return engine, session
def test_list_products_filters_latest_price_and_stock():
engine, session = _make_session()
try:
product_a = Product(
source="amazon",
reference="REF-A",
url="https://example.com/a",
title="A",
category="Test",
currency="EUR",
first_seen_at=datetime(2026, 1, 14, 10, 0, 0),
last_updated_at=datetime(2026, 1, 15, 9, 0, 0),
)
product_b = Product(
source="amazon",
reference="REF-B",
url="https://example.com/b",
title="B",
category="Test",
currency="EUR",
first_seen_at=datetime(2026, 1, 14, 10, 0, 0),
last_updated_at=datetime(2026, 1, 15, 9, 5, 0),
)
session.add_all([product_a, product_b])
session.commit()
history = [
PriceHistory(
product_id=product_a.id,
price=80,
shipping_cost=0,
stock_status="out_of_stock",
fetch_method="http",
fetch_status="success",
fetched_at=datetime(2026, 1, 15, 8, 0, 0),
),
PriceHistory(
product_id=product_a.id,
price=100,
shipping_cost=0,
stock_status="in_stock",
fetch_method="http",
fetch_status="success",
fetched_at=datetime(2026, 1, 15, 9, 0, 0),
),
PriceHistory(
product_id=product_b.id,
price=200,
shipping_cost=10,
stock_status="in_stock",
fetch_method="http",
fetch_status="success",
fetched_at=datetime(2026, 1, 15, 9, 5, 0),
),
]
session.add_all(history)
session.commit()
filtered = list_products(price_min=150, session=session)
assert len(filtered) == 1
assert filtered[0].reference == "REF-B"
filtered_stock = list_products(stock_status="in_stock", session=session)
assert {item.reference for item in filtered_stock} == {"REF-A", "REF-B"}
finally:
session.close()
engine.dispose()
def test_list_prices_filters():
engine, session = _make_session()
try:
product = Product(
source="amazon",
reference="REF-1",
url="https://example.com/1",
title="Produit",
category="Test",
currency="EUR",
first_seen_at=datetime(2026, 1, 14, 10, 0, 0),
last_updated_at=datetime(2026, 1, 14, 11, 0, 0),
)
session.add(product)
session.commit()
history = [
PriceHistory(
product_id=product.id,
price=50,
shipping_cost=0,
stock_status="in_stock",
fetch_method="http",
fetch_status="success",
fetched_at=datetime(2026, 1, 14, 12, 0, 0),
),
PriceHistory(
product_id=product.id,
price=120,
shipping_cost=0,
stock_status="in_stock",
fetch_method="http",
fetch_status="failed",
fetched_at=datetime(2026, 1, 15, 12, 0, 0),
),
]
session.add_all(history)
session.commit()
results = list_prices(
product_id=product.id,
price_min=100,
fetch_status="failed",
session=session,
)
assert len(results) == 1
assert results[0].price == 120
finally:
session.close()
engine.dispose()
def test_list_logs_filters():
engine, session = _make_session()
try:
now = datetime(2026, 1, 15, 10, 0, 0)
logs = [
ScrapingLog(
product_id=None,
url="https://example.com/a",
source="amazon",
reference="REF-A",
fetch_method="http",
fetch_status="success",
fetched_at=now,
),
ScrapingLog(
product_id=None,
url="https://example.com/b",
source="amazon",
reference="REF-B",
fetch_method="http",
fetch_status="failed",
fetched_at=now - timedelta(hours=2),
),
]
session.add_all(logs)
session.commit()
filtered = list_logs(
fetch_status="success",
fetched_after=now - timedelta(hours=1),
session=session,
)
assert len(filtered) == 1
assert filtered[0].reference == "REF-A"
finally:
session.close()
engine.dispose()
def test_exports_csv_and_json():
engine, session = _make_session()
try:
product = Product(
source="amazon",
reference="REF-EXPORT",
url="https://example.com/export",
title="Export",
category="Test",
currency="EUR",
first_seen_at=datetime(2026, 1, 14, 10, 0, 0),
last_updated_at=datetime(2026, 1, 14, 11, 0, 0),
)
session.add(product)
session.commit()
session.add(
PriceHistory(
product_id=product.id,
price=99,
shipping_cost=0,
stock_status="in_stock",
fetch_method="http",
fetch_status="success",
fetched_at=datetime(2026, 1, 14, 12, 0, 0),
)
)
session.add(
ScrapingLog(
product_id=product.id,
url=product.url,
source=product.source,
reference=product.reference,
fetch_method="http",
fetch_status="success",
fetched_at=datetime(2026, 1, 14, 12, 0, 0),
)
)
session.commit()
csv_response = export_products(format="csv", session=session)
assert csv_response.media_type == "text/csv"
assert "products.csv" in csv_response.headers.get("Content-Disposition", "")
assert "REF-EXPORT" in csv_response.body.decode("utf-8")
json_response = export_logs(format="json", session=session)
payload = json.loads(json_response.body.decode("utf-8"))
assert payload[0]["reference"] == "REF-EXPORT"
finally:
session.close()
engine.dispose()

40
tests/api/test_health.py Normal file
View File

@@ -0,0 +1,40 @@
"""
Tests endpoint /health.
"""
from dataclasses import dataclass
from pricewatch.app.api.main import health_check
@dataclass
class FakeRedisConfig:
url: str
@dataclass
class FakeDbConfig:
url: str
@dataclass
class FakeAppConfig:
db: FakeDbConfig
redis: FakeRedisConfig
api_token: str
def test_health_ok(monkeypatch):
"""Health retourne db/redis true."""
config = FakeAppConfig(
db=FakeDbConfig(url="sqlite:///:memory:"),
redis=FakeRedisConfig(url="redis://localhost:6379/0"),
api_token="secret",
)
monkeypatch.setattr("pricewatch.app.api.main.get_config", lambda: config)
monkeypatch.setattr("pricewatch.app.api.main.check_db_connection", lambda cfg: True)
monkeypatch.setattr("pricewatch.app.api.main.check_redis_connection", lambda url: True)
result = health_check()
assert result.db is True
assert result.redis is True

View File

@@ -0,0 +1,47 @@
"""
Tests HTTP d'integration contre l'API Docker.
"""
import os
import pytest
import httpx
API_BASE = os.getenv("PW_API_BASE", "http://localhost:8001")
API_TOKEN = os.getenv("PW_API_TOKEN", "change_me")
def _client() -> httpx.Client:
return httpx.Client(base_url=API_BASE, timeout=2.0)
def _is_api_up() -> bool:
try:
with _client() as client:
resp = client.get("/health")
return resp.status_code == 200
except Exception:
return False
@pytest.mark.skipif(not _is_api_up(), reason="API Docker indisponible")
def test_health_endpoint():
"""/health repond avec db/redis."""
with _client() as client:
resp = client.get("/health")
assert resp.status_code == 200
payload = resp.json()
assert "db" in payload and "redis" in payload
@pytest.mark.skipif(not _is_api_up(), reason="API Docker indisponible")
def test_products_requires_token():
"""/products demande un token valide."""
with _client() as client:
resp = client.get("/products")
assert resp.status_code == 401
resp = client.get("/products", headers={"Authorization": f"Bearer {API_TOKEN}"})
assert resp.status_code == 200
assert isinstance(resp.json(), list)

View File

@@ -0,0 +1,37 @@
"""
Tests API produits en lecture seule.
"""
from datetime import datetime
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from pricewatch.app.api.main import list_products
from pricewatch.app.db.models import Base, Product
def test_list_products():
"""Liste des produits."""
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
product = Product(
source="amazon",
reference="REF1",
url="https://example.com",
title="Produit",
category="Test",
currency="EUR",
first_seen_at=datetime(2026, 1, 14, 16, 0, 0),
last_updated_at=datetime(2026, 1, 14, 16, 0, 0),
)
session.add(product)
session.commit()
data = list_products(session=session, limit=50, offset=0)
assert len(data) == 1
assert data[0].reference == "REF1"
session.close()
engine.dispose()

View File

@@ -0,0 +1,55 @@
"""
Tests API preview/commit scraping.
"""
from datetime import datetime
from pricewatch.app.api.main import commit_scrape, preview_scrape
from pricewatch.app.api.schemas import ScrapeCommitRequest, ScrapePreviewRequest
from pricewatch.app.core.schema import DebugInfo, DebugStatus, FetchMethod, ProductSnapshot
def test_preview_scrape_returns_snapshot(monkeypatch):
snapshot = ProductSnapshot(
source="amazon",
url="https://example.com",
title="Produit",
price=9.99,
currency="EUR",
debug=DebugInfo(method=FetchMethod.HTTP, status=DebugStatus.SUCCESS),
)
def fake_scrape(url, use_playwright=None, save_db=False):
return {"success": True, "snapshot": snapshot, "error": None}
monkeypatch.setattr("pricewatch.app.api.main.scrape_product", fake_scrape)
response = preview_scrape(ScrapePreviewRequest(url="https://example.com"))
assert response.success is True
assert response.snapshot["source"] == "amazon"
assert response.snapshot["price"] == 9.99
def test_commit_scrape_persists_snapshot(monkeypatch):
snapshot = ProductSnapshot(
source="amazon",
url="https://example.com",
title="Produit",
price=19.99,
currency="EUR",
fetched_at=datetime(2026, 1, 15, 10, 0, 0),
debug=DebugInfo(method=FetchMethod.HTTP, status=DebugStatus.SUCCESS),
)
class FakePipeline:
def __init__(self, config=None):
self.config = config
def process_snapshot(self, snapshot, save_to_db=True):
return 42
monkeypatch.setattr("pricewatch.app.api.main.ScrapingPipeline", FakePipeline)
response = commit_scrape(ScrapeCommitRequest(snapshot=snapshot.model_dump(mode="json")))
assert response.success is True
assert response.product_id == 42

View File

@@ -0,0 +1,16 @@
"""
Tests API logs Uvicorn.
"""
from pricewatch.app.api.main import list_uvicorn_logs
def test_list_uvicorn_logs_reads_file(monkeypatch, tmp_path):
log_file = tmp_path / "uvicorn.log"
log_file.write_text("ligne-1\nligne-2\n", encoding="utf-8")
monkeypatch.setattr("pricewatch.app.api.main.UVICORN_LOG_PATH", log_file)
response = list_uvicorn_logs(limit=1)
assert len(response) == 1
assert response[0].line == "ligne-2"

11
tests/api/test_version.py Normal file
View File

@@ -0,0 +1,11 @@
"""
Tests API version.
"""
from pricewatch.app.api.main import version_info
def test_version_info():
"""Retourne la version API."""
response = version_info()
assert response.api_version

View File

@@ -0,0 +1,72 @@
"""
Tests API webhooks.
"""
import pytest
from fastapi import HTTPException
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from pricewatch.app.api.main import (
create_webhook,
delete_webhook,
list_webhooks,
send_webhook_test,
update_webhook,
)
from pricewatch.app.api.schemas import WebhookCreate, WebhookUpdate
from pricewatch.app.db.models import Base
def _make_session():
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
return engine, session
def test_webhook_crud_and_test(monkeypatch):
engine, session = _make_session()
try:
payload = WebhookCreate(event="price_changed", url="https://example.com/webhook")
created = create_webhook(payload, session=session)
assert created.id > 0
items = list_webhooks(session=session)
assert len(items) == 1
updated = update_webhook(created.id, WebhookUpdate(enabled=False), session=session)
assert updated.enabled is False
with pytest.raises(HTTPException) as excinfo:
send_webhook_test(created.id, session=session)
assert excinfo.value.status_code == 409
update_webhook(created.id, WebhookUpdate(enabled=True), session=session)
called = {}
def fake_post(url, json, headers, timeout):
called["url"] = url
called["json"] = json
called["headers"] = headers
called["timeout"] = timeout
class FakeResponse:
status_code = 200
def raise_for_status(self):
return None
return FakeResponse()
monkeypatch.setattr("pricewatch.app.api.main.httpx.post", fake_post)
response = send_webhook_test(created.id, session=session)
assert response.status == "sent"
assert called["json"]["event"] == "test"
delete_webhook(created.id, session=session)
assert list_webhooks(session=session) == []
finally:
session.close()
engine.dispose()

View File

View File

@@ -0,0 +1,130 @@
"""
Test end-to-end: CLI enqueue -> worker -> DB via Redis.
"""
from dataclasses import dataclass
from datetime import datetime
import pytest
import redis
from rq import Queue
from rq.worker import SimpleWorker
from typer.testing import CliRunner
from pricewatch.app.cli import main as cli_main
from pricewatch.app.core.registry import get_registry
from pricewatch.app.core.schema import DebugInfo, DebugStatus, FetchMethod, ProductSnapshot
from pricewatch.app.db.connection import get_session, init_db, reset_engine
from pricewatch.app.db.models import Product, ScrapingLog
from pricewatch.app.stores.base import BaseStore
from pricewatch.app.tasks import scrape as scrape_task
@dataclass
class FakeDbConfig:
url: str
@dataclass
class FakeRedisConfig:
url: str
@dataclass
class FakeAppConfig:
db: FakeDbConfig
redis: FakeRedisConfig
debug: bool = False
enable_db: bool = True
default_use_playwright: bool = False
default_playwright_timeout: int = 1000
class DummyStore(BaseStore):
def __init__(self) -> None:
super().__init__(store_id="dummy")
def match(self, url: str) -> float:
return 1.0 if "example.com" in url else 0.0
def canonicalize(self, url: str) -> str:
return url
def extract_reference(self, url: str) -> str | None:
return "REF-CLI"
def parse(self, html: str, url: str) -> ProductSnapshot:
return ProductSnapshot(
source=self.store_id,
url=url,
fetched_at=datetime(2026, 1, 14, 15, 0, 0),
title="Produit cli",
price=49.99,
currency="EUR",
reference="REF-CLI",
debug=DebugInfo(method=FetchMethod.HTTP, status=DebugStatus.SUCCESS),
)
class DummyFetchResult:
def __init__(self, html: str) -> None:
self.success = True
self.html = html
self.error = None
self.duration_ms = 20
def _redis_available(redis_url: str) -> bool:
try:
conn = redis.from_url(redis_url)
conn.ping()
return True
except Exception:
return False
@pytest.mark.skipif(not _redis_available("redis://localhost:6379/0"), reason="Redis indisponible")
def test_cli_enqueue_worker_persists_db(tmp_path, monkeypatch):
"""Enqueue via CLI, execution worker, persistence DB."""
reset_engine()
db_path = tmp_path / "cli-worker.db"
redis_url = "redis://localhost:6379/0"
config = FakeAppConfig(
db=FakeDbConfig(url=f"sqlite:///{db_path}"),
redis=FakeRedisConfig(url=redis_url),
)
init_db(config)
registry = get_registry()
previous_stores = list(registry._stores)
registry._stores = []
registry.register(DummyStore())
monkeypatch.setattr(cli_main, "get_config", lambda: config)
monkeypatch.setattr(scrape_task, "get_config", lambda: config)
monkeypatch.setattr(scrape_task, "setup_stores", lambda: None)
monkeypatch.setattr(scrape_task, "fetch_http", lambda url: DummyFetchResult("<html></html>"))
queue_name = "test-cli"
redis_conn = redis.from_url(redis_url)
queue = Queue(queue_name, connection=redis_conn)
queue.empty()
runner = CliRunner()
try:
result = runner.invoke(
cli_main.app,
["enqueue", "https://example.com/product", "--queue", queue_name, "--save-db"],
)
assert result.exit_code == 0
worker = SimpleWorker([queue], connection=redis_conn)
worker.work(burst=True)
finally:
queue.empty()
registry._stores = previous_stores
reset_engine()
with get_session(config) as session:
assert session.query(Product).count() == 1
assert session.query(ScrapingLog).count() == 1

View File

@@ -0,0 +1,83 @@
"""
Tests CLI pour enqueue/schedule avec gestion Redis.
"""
from types import SimpleNamespace
from typer.testing import CliRunner
from pricewatch.app.cli import main as cli_main
class DummyScheduler:
def __init__(self, *args, **kwargs) -> None:
self.enqueue_calls = []
self.schedule_calls = []
def enqueue_immediate(self, url, use_playwright=None, save_db=True):
self.enqueue_calls.append((url, use_playwright, save_db))
return SimpleNamespace(id="job-123")
def schedule_product(self, url, interval_hours=24, use_playwright=None, save_db=True):
self.schedule_calls.append((url, interval_hours, use_playwright, save_db))
return SimpleNamespace(job_id="job-456", next_run=SimpleNamespace(isoformat=lambda: "2026"))
def test_enqueue_cli_success(monkeypatch):
"""La commande enqueue retourne un job id."""
runner = CliRunner()
dummy = DummyScheduler()
monkeypatch.setattr(cli_main, "ScrapingScheduler", lambda *args, **kwargs: dummy)
result = runner.invoke(cli_main.app, ["enqueue", "https://example.com/product"])
assert result.exit_code == 0
assert "job-123" in result.output
def test_schedule_cli_success(monkeypatch):
"""La commande schedule retourne un job id et une date."""
runner = CliRunner()
dummy = DummyScheduler()
monkeypatch.setattr(cli_main, "ScrapingScheduler", lambda *args, **kwargs: dummy)
result = runner.invoke(
cli_main.app,
["schedule", "https://example.com/product", "--interval", "12"],
)
assert result.exit_code == 0
assert "job-456" in result.output
assert "2026" in result.output
def test_enqueue_cli_redis_unavailable(monkeypatch):
"""La commande enqueue echoue si Redis est indisponible."""
runner = CliRunner()
def raise_redis(*args, **kwargs):
raise cli_main.RedisUnavailableError("Redis non disponible")
monkeypatch.setattr(cli_main, "ScrapingScheduler", raise_redis)
result = runner.invoke(cli_main.app, ["enqueue", "https://example.com/product"])
assert result.exit_code == 1
assert "Redis non disponible" in result.output
def test_schedule_cli_redis_unavailable(monkeypatch):
"""La commande schedule echoue si Redis est indisponible."""
runner = CliRunner()
def raise_redis(*args, **kwargs):
raise cli_main.RedisUnavailableError("Redis non disponible")
monkeypatch.setattr(cli_main, "ScrapingScheduler", raise_redis)
result = runner.invoke(cli_main.app, ["schedule", "https://example.com/product"])
assert result.exit_code == 1
assert "Redis non disponible" in result.output

0
tests/cli/test_run_db.py Executable file → Normal file
View File

106
tests/cli/test_run_no_db.py Normal file
View File

@@ -0,0 +1,106 @@
"""
Tests pour la compatibilite --no-db.
"""
from dataclasses import dataclass
from pathlib import Path
from typer.testing import CliRunner
from pricewatch.app.cli import main as cli_main
from pricewatch.app.core.registry import get_registry
from pricewatch.app.core.schema import DebugInfo, DebugStatus, FetchMethod, ProductSnapshot
from pricewatch.app.db.connection import get_session, init_db, reset_engine
from pricewatch.app.db.models import Product
from pricewatch.app.stores.base import BaseStore
@dataclass
class FakeDbConfig:
url: str
@dataclass
class FakeAppConfig:
db: FakeDbConfig
debug: bool = False
enable_db: bool = True
class DummyStore(BaseStore):
def __init__(self) -> None:
super().__init__(store_id="dummy")
def match(self, url: str) -> float:
return 1.0 if "example.com" in url else 0.0
def canonicalize(self, url: str) -> str:
return url
def extract_reference(self, url: str) -> str | None:
return "REF-NODB"
def parse(self, html: str, url: str) -> ProductSnapshot:
return ProductSnapshot(
source=self.store_id,
url=url,
title="Produit nodb",
price=9.99,
currency="EUR",
reference="REF-NODB",
debug=DebugInfo(method=FetchMethod.HTTP, status=DebugStatus.SUCCESS),
)
class DummyFetchResult:
def __init__(self, html: str) -> None:
self.success = True
self.html = html
self.error = None
def test_cli_run_no_db(tmp_path, monkeypatch):
"""Le flag --no-db evite toute ecriture DB."""
reset_engine()
db_path = tmp_path / "nodb.db"
config = FakeAppConfig(db=FakeDbConfig(url=f"sqlite:///{db_path}"))
init_db(config)
yaml_path = tmp_path / "config.yaml"
out_path = tmp_path / "out.json"
yaml_path.write_text(
"""
urls:
- "https://example.com/product"
options:
use_playwright: false
save_html: false
save_screenshot: false
""",
encoding="utf-8",
)
registry = get_registry()
previous_stores = list(registry._stores)
registry._stores = []
registry.register(DummyStore())
monkeypatch.setattr(cli_main, "get_config", lambda: config)
monkeypatch.setattr(cli_main, "setup_stores", lambda: None)
monkeypatch.setattr(cli_main, "fetch_http", lambda url: DummyFetchResult("<html></html>"))
runner = CliRunner()
try:
result = runner.invoke(
cli_main.app,
["run", "--yaml", str(yaml_path), "--out", str(out_path), "--no-db"],
)
finally:
registry._stores = previous_stores
reset_engine()
assert result.exit_code == 0
assert out_path.exists()
with get_session(config) as session:
assert session.query(Product).count() == 0

View File

@@ -0,0 +1,54 @@
"""
Tests pour les commandes worker RQ via CLI.
"""
from types import SimpleNamespace
import pytest
from typer.testing import CliRunner
from pricewatch.app.cli import main as cli_main
class DummyRedis:
def ping(self) -> bool:
return True
class DummyWorker:
def __init__(self, queues, connection=None) -> None:
self.queues = queues
self.connection = connection
self.work_calls = []
def work(self, with_scheduler: bool = True):
self.work_calls.append(with_scheduler)
def test_worker_cli_success(monkeypatch):
"""Le worker demarre quand Redis est disponible."""
runner = CliRunner()
dummy_worker = DummyWorker([])
monkeypatch.setattr(cli_main, "Worker", lambda queues, connection=None: dummy_worker)
monkeypatch.setattr(cli_main.redis, "from_url", lambda url: DummyRedis())
result = runner.invoke(cli_main.app, ["worker", "--no-scheduler"])
assert result.exit_code == 0
assert dummy_worker.work_calls == [False]
def test_worker_cli_redis_down(monkeypatch):
"""Le worker echoue proprement si Redis est indisponible."""
runner = CliRunner()
def raise_connection(url):
raise cli_main.redis.exceptions.ConnectionError("redis down")
monkeypatch.setattr(cli_main.redis, "from_url", raise_connection)
result = runner.invoke(cli_main.app, ["worker"])
assert result.exit_code == 1
assert "Impossible de se connecter a Redis" in result.output

View File

Some files were not shown because too many files have changed in this diff.