ALTRepo Uploader
ALTRepo Uploader (a.k.a. ALTRepoDB) is a set of tools used to upload data about ALT Linux distributions to a ClickHouse database.
The database contents are used to support ALT Linux development and analytics through ALTRepo API.
License
Dependencies
ALTRepo Uploader requires Python version 3.9 or higher.
ALTRepo Uploader requires the following packages to be installed for the tools to be fully functional.
Note: some package names are ALT Linux specific.
System packages
- xz
- git
- fuseiso
- gostsum
- squashfuse
- cdrkit-utils
- libvirt
- qemu-img
- qemu-kvm
- libguestfs
- guestfs-data
- rabbitmq-c
- librpm7
- libclickhouse-cpp
Python packages
- python3-module-rpm
- python3-module-requests
- python3-module-zstandard
- python3-module-libarchive-c
- python3-module-setproctitle
- python3-module-beautifulsoup4
- python3-module-clickhouse-driver
Database structure
Project overview
Project summary and purpose
ALTRepo Uploader collects repository snapshots, build tasks, distribution images, and QA and security feeds from the ALT Linux ecosystem and stores them in ClickHouse. The data powers ALTRepo API and analytics for package history, build state, and vulnerability tracking.
Example: a daily branch snapshot is loaded into a package set so the API can answer "what packages were in p10 on a given date."
Architecture overview
- CLI loaders parse local artifacts (repo snapshots, task trees, images) and write normalized data to ClickHouse.
- The altrepod daemon runs service instances that consume AMQP messages and invoke the same loaders.
- Scheduler tasks periodically pull external feeds and refresh enrichment tables.
- Errata Server integration performs CVE matching and errata creation during repo and task processing.
- ALTRepo API reads from ClickHouse for analytics and reporting.
Example: an AMQP message for a repo update triggers repo_loader through altrepod.
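The message-driven flow above can be pictured as a small dispatch table. This is only an illustrative sketch: the routing keys, the message fields, the example path, and the handler functions are hypothetical and are not taken from altrepod's code. The handlers return the command line they would run instead of running it.

```python
import json
from typing import Callable, Dict, List


# Hypothetical handlers standing in for the real loaders.
def handle_repo(payload: dict) -> List[str]:
    return ["repo_loader", payload["branch"], payload["path"], "--date", payload["date"]]


def handle_task(payload: dict) -> List[str]:
    return ["task_loader", payload["path"]]


# Hypothetical mapping from a message routing key to a handler.
HANDLERS: Dict[str, Callable[[dict], List[str]]] = {
    "repo.load": handle_repo,
    "task.load": handle_task,
}


def dispatch(routing_key: str, body: bytes) -> List[str]:
    """Decode a JSON message body and pass it to the matching handler."""
    handler = HANDLERS.get(routing_key)
    if handler is None:
        raise KeyError(f"no service registered for {routing_key!r}")
    return handler(json.loads(body))


message = json.dumps(
    {"branch": "p10", "path": "/archive/repo/p10/date/2022/06/22", "date": "2022-06-22"}
).encode()
print(dispatch("repo.load", message))
```

Keeping the loaders behind plain functions is what lets the CLI tools and the daemon share the same code path.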
Business logic description
- Repository ingestion: parse a branch snapshot, compute package and file metadata, and create a package set record.
- Task ingestion: parse build tasks, store task state, subtasks, logs, approvals, and build iterations for traceability.
- Image ingestion: extract package lists from ISO/IMG/QCOW or container images and store image package sets with edition and variant metadata.
- Enrichment: load ACL ownership, Bugzilla issues, Beehive build status, Repocop checks, Watch updates, SPDX licenses, and Repology versions.
- Vulnerability and errata processing: update CVE and CPE data, map packages to CPEs by branch, match version ranges, compute vulnerability status, and generate errata entries from changelogs.
Example: when a package version moves outside a vulnerable range, it is marked fixed and linked to an errata record.
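The range check in the example can be sketched as follows. This is a simplified illustration, not the project's matching code: it assumes purely numeric dotted versions and a half-open range, whereas real RPM version comparison (epoch, release, non-numeric segments) is considerably more involved. All function names are hypothetical.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn '1.2.10' into (1, 2, 10). Assumes numeric dotted versions only."""
    return tuple(int(part) for part in text.split("."))


def is_vulnerable(version: str, start_incl: Optional[str], end_excl: Optional[str]) -> bool:
    """Return True if version lies in the half-open range [start_incl, end_excl)."""
    v = parse_version(version)
    if start_incl is not None and v < parse_version(start_incl):
        return False
    if end_excl is not None and v >= parse_version(end_excl):
        return False
    return True


def became_fixed(old: str, new: str, start_incl: Optional[str], end_excl: Optional[str]) -> bool:
    """True when an update moves a package from inside the range to outside it."""
    return is_vulnerable(old, start_incl, end_excl) and not is_vulnerable(new, start_incl, end_excl)


print(became_fixed("1.2.10", "1.3.0", "1.0", "1.3"))
```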
Database structure description
- Package inventory: Packages, PackageHash, Files, FileNames, and Changelog.
- Package sets and repository state: PackageSetName, PackageSet, and RepositoryStatus.
- Build and task history: Tasks, TaskStates, TaskIterations, TaskLogs, TaskProgress, and TaskApprovals.
- Image metadata: ImagePackageSetName, ImageStatus, and ImageTagStatus.
- QA and external feeds: Bugzilla, BeehiveStatus, PackagesRepocop, PackagesWatch, SPDXLicenses, and RepologyLatestVersions.
- Vulnerability and errata: Vulnerabilities, CpeDictionary, CpeMatch, PackagesCveMatch, PackagesVulnerabilityStatus, ErrataHistory, ErrataChangeHistory, and ErrataID.
- Ingestion helpers: buffer tables and materialized views that batch loads and keep "latest" datasets up to date.
Example: PackageSetName links a branch snapshot to the package hashes it contains.
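The link between a snapshot name and its package hashes can be modelled in memory as below. This is a conceptual sketch only: the class, the field names, and the sample hashes are hypothetical and do not reflect the actual table columns.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class SnapshotName:
    """One branch snapshot (illustrative stand-in for a PackageSetName row)."""
    uuid: str
    branch: str
    day: date


def package_hashes_on(
    branch: str,
    day: date,
    names: Iterable[SnapshotName],
    members: Dict[str, Set[int]],
) -> Set[int]:
    """Return the package hashes recorded for a branch on a given date."""
    for name in names:
        if name.branch == branch and name.day == day:
            return members.get(name.uuid, set())
    return set()


names = [
    SnapshotName("a1", "p10", date(2022, 6, 22)),
    SnapshotName("b2", "sisyphus", date(2022, 6, 22)),
]
# Snapshot uuid -> package hashes contained in that snapshot.
members = {"a1": {101, 102}, "b2": {102, 103}}
print(sorted(package_hashes_on("p10", date(2022, 6, 22), names, members)))  # [101, 102]
```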
ALTRepo Uploader uses ClickHouse as its DBMS because of its high performance and convenience for analytics.
Database structure initialization
The initial database structure is stored in the sql/0000-initial.sql file and can be deployed to a ClickHouse server with the following command:
[user@host]$ cat sql/0000_initial.sql | clickhouse-client -h %SERVER_IP_OR_DNS_NAME% -d %DATABASE_NAME% -n
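If the deployment needs to be scripted, the same call can be made from Python. The sketch below is an assumption about how one might wrap it, not part of the project: the function names are hypothetical, and only the clickhouse-client options already shown in the command above are used.

```python
import subprocess
from typing import List


def clickhouse_client_command(host: str, database: str) -> List[str]:
    """Build the same argument list as the shell command above."""
    return ["clickhouse-client", "-h", host, "-d", database, "-n"]


def deploy_schema(sql_path: str, host: str, database: str) -> None:
    """Feed the SQL file to clickhouse-client on stdin; raise on a non-zero exit."""
    with open(sql_path, "rb") as sql_file:
        subprocess.run(clickhouse_client_command(host, database), stdin=sql_file, check=True)


# Only prints the command; call deploy_schema() to actually run it.
print(clickhouse_client_command("clickhouse.example.org", "repodb"))
```

Passing the file as stdin replaces the `cat ... |` pipe, and `check=True` makes a failed deployment visible to the calling script.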
Database contents initialization
Some additional initialization data is included as well. For example, license name aliases can be uploaded with:
[user@host]$ cat sql/license_aliases.json | clickhouse-client -h %SERVER_IP_OR_DNS_NAME% -d %DATABASE_NAME% --query="INSERT INTO LicenseAliases FORMAT JSONEachRow"
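The JSONEachRow format expects one JSON object per line. A minimal sketch of producing such input is shown below; the column names `alias` and `spdx_id` and the sample rows are hypothetical, so check the LicenseAliases table definition for the real columns.

```python
import json
from typing import Iterable, Mapping


def to_json_each_row(rows: Iterable[Mapping[str, str]]) -> str:
    """Serialize rows as newline-delimited JSON objects, one object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


# Hypothetical rows; the real column names may differ.
rows = [
    {"alias": "GPLv2+", "spdx_id": "GPL-2.0-or-later"},
    {"alias": "MIT/X11", "spdx_id": "MIT"},
]
print(to_json_each_row(rows), end="")
```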
Database permissions
The database user that the utilities connect as must have proper permissions. At a minimum, grant read and write permissions on all created tables and full permissions on temporary tables.
ALTRepo Uploader service
ALTRepo Uploader provides altrepod, a systemd-managed daemon that uploads data in response to AMQP messages received from a RabbitMQ broker.
altrepod uses service instances, each with its own configuration, to handle particular AMQP messages.
Configuration files
When installed from the RPM package, the systemd unit file can be enabled in the usual way once the configuration files are in place: /etc/altrepod/config.json for altrepod itself and /etc/altrepod/services.d/%service_name%.json for each enabled service instance.
Configuration templates can be found in the /etc/altrepod directory.
Each service configuration file consists of 3 sections:
- Service behaviour configuration
- Database connection configuration
- RabbitMQ connection configuration
Secure connection to RabbitMQ
When connecting to RabbitMQ over SSL/TLS, a certificate file must be present on the host and its path must be set in the configuration files accordingly.
The amqpfire utility
The repodb_amqpfire utility was added as a tool to 'fire' a specific altrepod service.
The utility sends AMQP messages with the appropriate payload using its own configuration file.
The list of supported services and options can be obtained by running the utility with the -h argument.
[user@host]$ repodb_amqpfire -h
[user@host]$ repodb_amqpfire -c amqpfire_config.json -s repo -p p10 2022-06-22
A configuration example can be found in the /usr/share/doc/altrepodb-%version%/ directory.
ALTRepo Uploader utilities
Most of the provided CLI tools share a common set of arguments. All of them have at least the -h option, which displays the usage message.
Configuration file
All CLI tools support configuration from a file passed with the -c, --config option. An example configuration file is config.ini.example.
[DEFAULT]
workers=10 # number of threads (if used by utility)
[LOGGING]
log_to_file=no # controls logging to file
log_to_syslog=no # controls logging to syslog
log_to_console=yes # controls logging to console [stderr]
syslog_ident=altrepodb # controls syslog identity
[DATABASE]
dbname=repodb # database name
host=localhost # Clickhouse server IP address
port=9000 # Clickhouse server port
user=default # database user name
password= # database user password
Note: only the logging level can be set with CLI options. Logging handlers are controlled only by the configuration file.
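To inspect such a file from a script, the standard configparser module is enough. The sketch below shows one way to read the example; it is not necessarily how the tools themselves parse it. One detail worth knowing: configparser does not strip inline `#` comments unless `inline_comment_prefixes` is set.

```python
import configparser

EXAMPLE = """\
[DEFAULT]
workers=10 # number of threads (if used by utility)
[LOGGING]
log_to_file=no # controls logging to file
log_to_syslog=no # controls logging to syslog
log_to_console=yes # controls logging to console [stderr]
syslog_ident=altrepodb # controls syslog identity
[DATABASE]
dbname=repodb # database name
host=localhost # Clickhouse server IP address
port=9000 # Clickhouse server port
user=default # database user name
password= # database user password
"""

# Without inline_comment_prefixes, the trailing "# ..." text would be kept
# as part of each value.
parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
parser.read_string(EXAMPLE)

workers = parser["DEFAULT"].getint("workers")
log_to_console = parser["LOGGING"].getboolean("log_to_console")
db = parser["DATABASE"]
print(workers, log_to_console, db["host"], db.getint("port"), repr(db["password"]))
```

Values from [DEFAULT] are visible in every other section, so `parser["DATABASE"].getint("workers")` also returns 10.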
Command line tools
repo_loader
The utility uploads a branch's repository state from the file system to the database. Check the usage message with the command:
[user@host]$ repo_loader -h
Usage example:
[user@host]$ repo_loader sisyphus /archive/repo/sisyphus/date/2021/08/18 --date 2021-08-18 -c config.ini --tag test_load -v
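When loading many snapshots, the command line can be generated from a branch name and a date. The sketch below assumes the archive layout seen in the usage example (`<root>/<branch>/date/YYYY/MM/DD`); the helper names and the default archive root are hypothetical, and only options that appear in the example are used.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import List, Optional


def snapshot_path(archive_root: str, branch: str, day: date) -> str:
    """Build <root>/<branch>/date/YYYY/MM/DD, the layout seen in the usage example."""
    path = PurePosixPath(archive_root) / branch / "date" / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"
    return str(path)


def repo_loader_command(
    branch: str,
    day: date,
    config: str = "config.ini",
    tag: Optional[str] = None,
    archive_root: str = "/archive/repo",
) -> List[str]:
    """Assemble a repo_loader argument list for one branch snapshot."""
    cmd = [
        "repo_loader", branch, snapshot_path(archive_root, branch, day),
        "--date", day.isoformat(), "-c", config,
    ]
    if tag:
        cmd += ["--tag", tag]
    return cmd


print(" ".join(repo_loader_command("sisyphus", date(2021, 8, 18), tag="test_load")))
```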
task_loader
The utility uploads the state of a build task from the file system to the database. Check the usage message with the command:
[user@host]$ task_loader -h
Usage example:
[user@host]$ task_loader /archive/tasks/done/_276/283337 -c config.ini -f -D
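The example path is consistent with task directories being grouped by the task ID divided by 1024 (283337 // 1024 = 276). The sketch below encodes that reading; it is an inference from this single example, not a documented rule, and the function name and default root are hypothetical.

```python
from pathlib import PurePosixPath


def task_path(task_id: int, root: str = "/archive/tasks/done") -> str:
    """Guess the archive directory of a task, assuming groups of 1024 task IDs."""
    if task_id < 0:
        raise ValueError("task_id must be non-negative")
    group = f"_{task_id // 1024}"
    return str(PurePosixPath(root) / group / str(task_id))


print(task_path(283337))
```

Verify the result against the actual archive before relying on it in scripts.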
image_loader
The utility uploads the contents of an ALT Linux distribution image in ISO, TAR, IMG, or QCOW2 format to the database. Check the usage message with the command:
[user@host]$ image_loader -h
Usage example:
[user@host]$ image_loader alt-p10-opennebula-x86_64.qcow2 --branch p10 --edition cloud --version 10.0.0 --release release --platform "" --variant install --flavor opennebula --arch x86_64 --date 2022-02-10 --url "http://ftp.altlinux.org/%PATH_TO_IMAGE%" --type qcow -c config.ini --debug
acl_loader
The utility uploads ALT Linux maintainers' ACLs to the database. Check the usage message with the command:
[user@host]$ acl_loader -h
beehive_loader
The utility uploads Beehive package build results to the database. Check the usage message with the command:
[user@host]$ beehive_loader -h
bugzilla
The utility uploads Bugzilla issues to the database. Check the usage message with the command:
[user@host]$ bugzilla -h
repocop_loader
The utility uploads Repocop package inspection results to the database. Check the usage message with the command:
[user@host]$ repocop_loader -h
watch_loader
The utility uploads package version updates from Watch to the database. Check the usage message with the command:
[user@host]$ watch_loader -h
spdx_loader
The utility uploads license information from the SPDX Git repository to the database. Check the usage message with the command:
[user@host]$ spdx_loader -h
Code style
The project currently uses black for code formatting and flake8 for linting, with configuration defined in the setup.cfg file.
Afternote
ALTRepo Uploader is under continuous development.
Functionality, database structure, and code structure change rapidly.
Check the changelog and Git history for details.