chore: sync project files

2026-01-13 19:49:04 +01:00
parent 53f8227941
commit ecda149a4b
149 changed files with 65272 additions and 1 deletions
--- a/TODO.md
+++ b/TODO.md
@@ -0,0 +1,219 @@
+# TODO - PriceWatch
+
+Prioritized task list for PriceWatch development.
+
+## Legend
+- [ ] To do
+- [x] Done
+- [~] In progress
+
+---
+
+## Phase 1: CLI foundations
+
+### Step 1: Documentation and structure
+- [x] Create a complete README.md
+- [x] Create TODO.md (this file)
+- [x] Create CHANGELOG.md
+- [x] Create the project directory structure
+- [x] Create pyproject.toml with dependencies
+
+### Step 2: Data model
+- [x] Implement ProductSnapshot (Pydantic) in core/schema.py
+  - [x] Metadata fields (source, url, fetched_at)
+  - [x] Product fields (title, price, currency, shipping_cost, stock_status, reference, images, category, specs)
+  - [x] Debug fields (method, errors, notes, status)
+  - [x] Validation and JSON serialization
+
+### Step 3: Core utilities
+- [x] Implement core/logging.py
+  - [x] Logger configuration with levels (DEBUG, INFO, ERROR)
+  - [x] Log formatting
+- [x] Implement core/io.py
+  - [x] YAML reading (scrap_url.yaml)
+  - [x] JSON writing (scraped_store.json)
+  - [x] File validation
+
+### Step 4: Store architecture
+- [x] Implement abstract BaseStore (stores/base.py)
+  - [x] Method match(url) -> float
+  - [x] Method canonicalize(url) -> str
+  - [x] Method extract_reference(url) -> str
+  - [x] Method fetch(url, method, options) -> str
+  - [x] Method parse(html, url) -> ProductSnapshot
+- [x] Implement Registry (core/registry.py)
+  - [x] Dynamic store registration
+  - [x] Automatic store detection from the URL
+  - [x] Method get_best_store(url)
+
+### Step 5: Scraping
+- [x] Implement scraping/http_fetch.py
+  - [x] Function fetch_http(url, timeout, headers)
+  - [x] Error handling (403, timeout, connection)
+  - [x] User-Agent rotation
+  - [x] Detailed logging
+- [x] Implement scraping/pw_fetch.py
+  - [x] Function fetch_playwright(url, options)
+  - [x] Headless/headful support
+  - [x] Optional HTML saving
+  - [x] Optional screenshot
+  - [x] Configurable timeout
+  - [x] Detailed logging
+
+### Step 6: Amazon store
+- [x] Create the stores/amazon/ structure
+- [x] Implement stores/amazon/store.py (AmazonStore)
+  - [x] match(): detect amazon.fr/amazon.com
+  - [x] canonicalize(): clean the URL down to /dp/{ASIN}
+  - [x] extract_reference(): extract the ASIN
+  - [x] parse(): parse HTML into a ProductSnapshot
+- [x] Create stores/amazon/selectors.yml
+  - [x] Selectors for title
+  - [x] Selectors for price
+  - [x] Selectors for currency
+  - [x] Selectors for shipping_cost
+  - [x] Selectors for stock_status
+  - [x] Selectors for images
+  - [x] Selectors for category
+  - [x] Selectors for specs
+- [ ] Add HTML fixtures in stores/amazon/fixtures/
+
+### Step 7: Cdiscount store
+- [x] Create the stores/cdiscount/ structure
+- [x] Implement stores/cdiscount/store.py (CdiscountStore)
+  - [x] match(): detect cdiscount.com
+  - [x] canonicalize(): clean the URL
+  - [x] extract_reference(): extract the SKU
+  - [x] parse(): parse HTML into a ProductSnapshot
+- [x] Create stores/cdiscount/selectors.yml
+  - [x] Selectors for all ProductSnapshot fields
+- [ ] Add HTML fixtures in stores/cdiscount/fixtures/
+
+### Step 8: CLI
+- [x] Implement cli/main.py with Typer
+  - [x] `pricewatch run` command
+  - [x] `pricewatch detect` command
+  - [x] `pricewatch fetch` command
+  - [x] `pricewatch parse` command
+  - [x] `pricewatch doctor` command
+  - [x] Global --debug flag
+  - [x] Console logging
+
+### Step 9: Tests
+- [x] Configure pytest in pyproject.toml
+- [x] Tests for core/schema.py
+  - [x] ProductSnapshot validation
+  - [x] JSON serialization
+- [x] Tests for core/registry.py
+  - [x] Store registration
+  - [x] Automatic detection
+- [x] Tests for stores/amazon/
+  - [x] match() with various URLs
+  - [x] canonicalize()
+  - [x] extract_reference()
+  - [~] parse() on HTML fixtures (6 tests require real fixtures)
+- [ ] Tests for stores/cdiscount/
+  - [ ] Same as Amazon
+- [ ] Tests for scraping/
+  - [ ] http_fetch with mocks
+  - [ ] pw_fetch with mocks
+
+### Step 10: Integration and validation
+- [x] Create an example scrap_url.yaml
+- [x] Test the full YAML → JSON pipeline
+- [x] Test with real Amazon URLs
+- [ ] Test with real Cdiscount URLs
+- [x] Verify all debug modes
+- [x] Validate HTML/screenshot saving
+- [x] Final documentation
+
+### Step 9 summary (pytest tests)
+**Status**: 80 of 86 tests pass (93%)
+- ✓ core/schema.py: 29/29 tests
+- ✓ core/registry.py: 24/24 tests
+- ✓ stores/amazon/: 27/33 tests (6 tests require realistic HTML fixtures)
+
+**Remaining test work**:
+- Amazon/Cdiscount HTML fixtures
+- Cdiscount store tests
+- Scraping tests with mocks
+
+---
+
+## Phase 2: Database (Future)
+
+### Persistence
+- [ ] PostgreSQL schema
+- [ ] Alembic migrations
+- [ ] SQLAlchemy models
+- [ ] Product CRUD
+- [ ] Price history
+
+### Configuration
+- [ ] Config file (DB credentials)
+- [ ] Environment variables
+- [ ] PostgreSQL Dockerfile
+
+---
+
+## Phase 3: Worker and automation (Future)
+
+### Worker
+- [ ] Set up Redis
+- [ ] RQ or Celery worker
+- [ ] Scraping queue
+- [ ] Retry policy
+
+### Scheduling
+- [ ] Cron or built-in scheduler
+- [ ] Automatic daily scraping
+- [ ] Run logs
+
+---
+
+## Phase 4: Web UI (Future)
+
+### Backend API
+- [ ] FastAPI endpoints
+- [ ] Authentication
+- [ ] CORS
+
+### Frontend
+- [ ] Framework (React/Vue?)
+- [ ] Responsive design
+- [ ] Gruvbox dark theme
+- [ ] Price history charts
+- [ ] Alert management
+
+---
+
+## Phase 5: Alerts (Future)
+
+### Notifications
+- [ ] Alert system (price drop, back in stock)
+- [ ] Email
+- [ ] Webhooks
+- [ ] Push notifications
+
+---
+
+## Technical improvements
+
+### Performance
+- [ ] Redis cache for results
+- [ ] Per-store rate limiting
+- [ ] Parallel scraping
+
+### Robustness
+- [ ] Automatic retry on failure
+- [ ] Circuit breaker
+- [ ] Monitoring (Prometheus?)
+
+### Extensibility
+- [ ] Plugin system for new stores
+- [ ] External store configuration
+- [ ] Public API
+
+---
+
+**Last updated**: 2026-01-13