# EBoek.info Scraper

A modern PyQt5 GUI application for scraping EBoek.info, with two scraping modes, real-time progress monitoring, and secure storage of login credentials.

## ✨ Features

- **Two scraping modes**: All Comics and Latest Comics
- **User-friendly GUI** with real-time progress
- **Secure credential storage** in a JSON config file
- **Cross-platform** support (Windows/macOS)
- **Background threading**: the GUI stays responsive
- **Graceful cancellation** during operations
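
The background-threading and graceful-cancellation features can be illustrated with a small standard-library sketch. The application itself presumably uses Qt threads and signals; `ScrapeWorker`, `on_progress`, and `demo` are hypothetical names chosen for this example, and the `time.sleep` call only stands in for the real page scrape.

```python
import threading
import time


class ScrapeWorker(threading.Thread):
    """Hypothetical worker that processes pages off the main (GUI) thread."""

    def __init__(self, pages, on_progress):
        super().__init__(daemon=True)
        self.pages = list(pages)
        self.on_progress = on_progress              # callback(done, total)
        self._cancel_requested = threading.Event()  # set() asks for a clean stop
        self.completed = []

    def cancel(self):
        """Request cancellation; the worker stops before the next page."""
        self._cancel_requested.set()

    def run(self):
        total = len(self.pages)
        for page in self.pages:
            # The flag is checked between pages, so a page is never cut off halfway.
            if self._cancel_requested.is_set():
                break
            time.sleep(0.01)                        # placeholder for the real scrape
            self.completed.append(page)
            self.on_progress(len(self.completed), total)


def demo():
    """Run a worker over five pages and return the progress updates it reported."""
    updates = []
    worker = ScrapeWorker(range(1, 6), lambda done, total: updates.append((done, total)))
    worker.start()
    worker.join()
    return updates
```

Because the main thread only starts the worker and receives progress callbacks, it stays free to handle user input, including a click on a cancel button that calls `cancel()`.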

## 📋 Requirements

- **Python 3.8+**
- **Google Chrome** browser
- **EBoek.info** account

## 🚀 Installation

### Option 1: Standalone Executable (Recommended)
**No Python installation is needed to run the built executable.** The build step itself runs with Python.

**Windows:**
```cmd
REM Build the Windows .exe (run in Command Prompt)
scripts\build_exe.bat

REM Or use the cross-platform builder:
python scripts\build_executable.py
```

**macOS/Linux:**
```bash
# Build the executable (one-time)
python3 scripts/build_executable.py

# Or use the platform-specific script:
./scripts/build_exe.sh
```

**Result:** `dist/EBoek_Scraper.exe` (Windows) or `dist/EBoek_Scraper` (macOS/Linux)

### Option 2: Python Installation
**Windows:**
```cmd
scripts\install_and_run.bat
```

**macOS/Linux:**
```bash
chmod +x scripts/install_and_run.sh
./scripts/install_and_run.sh
```

**Manual:**
```bash
pip install selenium urllib3 PyQt5
python3 gui_main.py
```

## 🎯 Usage

1. **Start the application**: `python3 gui_main.py`
2. **Enter your credentials**: click "Change Credentials"
3. **Choose a scraping mode**: All Comics or Latest Comics
4. **Set the page range**: start and end page
5. **Start scraping**: click "Start Scraping"
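
The credentials entered in step 2 are kept in a JSON config file. This README does not describe that file's location or layout, so the following is only a minimal sketch under assumptions: the `username` and `password` keys and both function names are hypothetical. Note that JSON stores the password as plain text; the sketch limits exposure by creating the file with owner-only permissions.

```python
import json
import os
from pathlib import Path


def save_credentials(path, username, password):
    """Write credentials to a JSON file (hypothetical layout)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode 0o600 applies when the file is created, so other local users cannot
    # read it. On Windows these mode bits are largely ignored.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"username": username, "password": password}, fh)


def load_credentials(path):
    """Return (username, password), or None if the file is missing or invalid."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        return data["username"], data["password"]
    except (OSError, ValueError, KeyError, TypeError):
        return None
```

Returning `None` for a missing or malformed file lets the caller fall back to prompting for credentials instead of crashing at startup.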

## 📊 Scraping Modes

### Mode 0: All Comics
- **URL pattern**: `stripverhalen-alle/page/X/`
- **Structure**: traditional blog layout
- **Selector**: `h2.post-title a`
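
To show what the `h2.post-title a` selector matches, here is a small standard-library sketch that collects the same links from an HTML string. The application itself presumably applies the selector through Selenium; the `PostTitleLinks` class and `extract_post_links` function are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class PostTitleLinks(HTMLParser):
    """Collect (title, href) for <a> elements inside <h2 class="post-title">."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_h2 = False   # True while inside a matching <h2>
        self._href = None     # href of the <a> currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h2" and "post-title" in (attrs.get("class") or "").split():
            self._in_h2 = True
        elif tag == "a" and self._in_h2 and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None
        elif tag == "h2":
            self._in_h2 = False


def extract_post_links(html):
    """Return the (title, href) pairs that `h2.post-title a` would select."""
    parser = PostTitleLinks()
    parser.feed(html)
    return parser.links
```

Links outside a `post-title` heading, such as navigation or sidebar links, are ignored, which is the point of scoping the selector to the heading.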

### Mode 1: Latest Comics
- **URL pattern**: `laatste?_page=X&ref=dw`
- **Structure**: grid layout with containers
- **Selector**: `.pt-cv-wrapper .pt-cv-ifield h5.pt-cv-title a`
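
The two URL patterns can be combined with a page range as in the following sketch. Several details are assumptions rather than facts from this README: the `BASE_URL` value, the inclusive end page, and the names `URL_PATTERNS` and `page_urls` are all hypothetical.

```python
BASE_URL = "https://eboek.info/"  # assumption: site root, not stated in this README

MODE_ALL = 0     # Mode 0: All Comics
MODE_LATEST = 1  # Mode 1: Latest Comics

# URL patterns from the mode descriptions, with X written as {page}.
URL_PATTERNS = {
    MODE_ALL: "stripverhalen-alle/page/{page}/",
    MODE_LATEST: "laatste?_page={page}&ref=dw",
}


def page_urls(mode, start_page, end_page):
    """Return the listing URLs for pages start_page..end_page (inclusive)."""
    if mode not in URL_PATTERNS:
        raise ValueError(f"unknown scraping mode: {mode!r}")
    if start_page < 1 or end_page < start_page:
        raise ValueError("page range must satisfy 1 <= start_page <= end_page")
    pattern = URL_PATTERNS[mode]
    return [BASE_URL + pattern.format(page=p) for p in range(start_page, end_page + 1)]
```

Validating the range up front means a mistyped start or end page is reported before any browser session is opened.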

## 🗂️ Project Structure

```
├── README.md                   # Project documentation
├── requirements.txt            # Dependencies
├── gui_main.py                 # GUI application entry point
├── scripts/                    # Build & install scripts
│   ├── build_executable.py     # Executable builder (recommended)
│   ├── build_exe.bat           # Windows build script
│   ├── build_exe.sh            # macOS/Linux build script
│   ├── eboek_scraper.spec      # PyInstaller configuration
│   ├── install_and_run.bat     # Windows installer
│   └── install_and_run.sh      # macOS/Linux installer
├── docs/                       # Documentation
│   └── BUILD_GUIDE.md          # Executable build guide
├── core/                       # Scraping logic
│   ├── scraper.py              # Dual-mode scraper
│   ├── scraper_thread.py       # Threading wrapper
│   └── credentials.py          # Config management
├── gui/                        # GUI components
│   ├── main_window.py          # Main interface
│   ├── login_dialog.py         # Credential input
│   └── progress_dialog.py      # Progress monitoring
├── tests/                      # Test scripts
├── utils/                      # Helper functions
└── dist/                       # Built executables (after a build)
    └── EBoek_Scraper           # Standalone executable
```

## 🔧 Troubleshooting

- **GUI does not start**: check that PyQt5 is installed
- **Login problems**: test your credentials via the GUI
- **Download issues**: check the `~/Downloads` folder
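
The first and last checks above can be automated with a short diagnostic script. This is a sketch, not part of the project: `check_environment` is a hypothetical helper that only tests whether the packages named in the manual install step can be found and whether the `~/Downloads` folder exists.

```python
import importlib.util
from pathlib import Path


def check_environment(modules=("PyQt5", "selenium", "urllib3"), downloads=None):
    """Report which top-level modules are importable and whether the downloads folder exists."""
    downloads = Path(downloads) if downloads else Path.home() / "Downloads"
    # find_spec returns None for a top-level module that is not installed.
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    report["downloads_dir"] = downloads.is_dir()
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{'OK' if ok else 'MISSING':8}{item}")
```

A `MISSING` entry for `PyQt5` points at the most likely reason the GUI fails to start, and can be resolved with the manual `pip install` command from the installation section.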

## 💡 Tips

- **Distribution**: use the executable for easy sharing (no Python setup needed)
- **Testing**: start with 1-2 pages to verify that everything works
- **Performance**: use headless mode for the best speed
- **Monitoring**: follow progress in the progress dialog, which shows real-time statistics
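
This README does not say how the application enables headless mode, so the following is only a sketch of how headless Chrome is commonly configured with Selenium 4. The `chrome_arguments` and `make_driver` functions are hypothetical; `--headless=new` is the headless flag for recent Chrome versions (older versions use plain `--headless`), and the window size is an arbitrary assumption.

```python
def chrome_arguments(headless=True, window_size=(1280, 1024)):
    """Return the Chrome command-line arguments for a scraping session."""
    args = [f"--window-size={window_size[0]},{window_size[1]}"]
    if headless:
        # No visible browser window, which avoids rendering overhead.
        args.insert(0, "--headless=new")
    return args


def make_driver(headless=True):
    """Create a Chrome WebDriver; Selenium is imported only when this is called."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```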

---

**Happy scraping! 🚀**