database: treat database as read-only after creation and use shared locks #315
fischeti wants to merge 3 commits into
Actually, I am not sure anymore whether the locks are needed. I have the impression that git should be able to handle concurrent access to a shared git database by itself: objects are immutable and do not need to be locked, while other things like refs seem to have fine-grained lock files: https://stackoverflow.com/questions/19962024/locking-strategy-of-git-to-achieve-concurrency Edit: I guess the
I think we should preserve locks, and I like your solution to speed things up (still need to review the details). The main issue I'm attempting to solve is with multiple repos fetching the same bare ref. Concurrent fetches racing on the same ref would produce
Yes, you are right 🤓 I just realized that git locks are not blocking; they just error out.
In #307 we introduced database locks and separated the database from the checkout in order to share the database as a cache across projects. This is also relevant in the context of CI, since the git databases are only fetched once and can then be reused. However, the exclusive locks on the databases essentially prevent any kind of parallelization, and every invocation of bender becomes serialized across CI jobs.
This PR aims to relax the locks to use exclusive locks only when writing to the database, and use shared locks when reading from it.
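The difference between the two lock modes can be sketched with advisory `flock` locks. This is only an illustration in Python, not bender's code: the helper names `database_lock` and `can_acquire` and the lock-file path are hypothetical, and it assumes a Unix system where `fcntl.flock` maps to the `flock` system call.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def database_lock(lock_path, exclusive, blocking=True):
    """Hold an advisory lock on lock_path for the duration of the with-block."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    if not blocking:
        mode |= fcntl.LOCK_NB  # fail immediately instead of waiting
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, mode)
        yield fd
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def can_acquire(lock_path, exclusive):
    """Probe whether a lock of the given kind could be taken right now."""
    try:
        with database_lock(lock_path, exclusive, blocking=False):
            return True
    except BlockingIOError:
        return False
```

Any number of readers can hold the shared lock at the same time, while a writer has to wait until all of them are gone; this is what would allow parallel CI jobs to read from the same database concurrently.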
The flow of fetching and checking out a git repository was slightly adapted to reduce the number of write operations on a shared database:

- `git init` and `git fetch [<rev>]` are write operations and still acquire an exclusive lock
- […] `tmp-<hash>` in the database and the later checkout with `git clone --branch tmp-<hash>` has been replaced with `git clone --shared --no-checkout && git checkout <rev>`
- […] `git gc` since revisions might not be tracked anymore by refs, i.e. the previous `tmp-<hash>` tags. This is maybe overly cautious.

Furthermore, file locking is done in a more fine-grained way. Instead of locking the whole `git_database` call exclusively, only the part that requires writes (`git init` and `git fetch`) acquires an exclusive lock (if the database is not ready yet).
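A rough sketch of this flow, written in Python for brevity rather than taken from the actual implementation, could look as follows. The function names, the `<db>.lock` lock-file location, the refspec passed to `git fetch`, and the re-check after taking the exclusive lock are all assumptions made for illustration; the parent directory of the database is assumed to exist.

```python
import fcntl
import os
import subprocess
from contextlib import contextmanager


@contextmanager
def flock(path, exclusive):
    """Hold a blocking advisory lock on path (hypothetical lock-file location)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def git(*args, cwd=None, check=True):
    """Run a git command and capture its output."""
    return subprocess.run(["git", *args], cwd=cwd, check=check,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          text=True)


def has_revision(db, rev):
    """Read-only probe: is rev already present as a commit in the database?"""
    if not os.path.isdir(os.path.join(db, "objects")):
        return False
    probe = git("cat-file", "-e", rev + "^{commit}", cwd=db, check=False)
    return probe.returncode == 0


def ensure_database(db, url, rev):
    """Return True if the database had to be written, False if it was ready."""
    lock = db + ".lock"
    with flock(lock, exclusive=False):   # readers do not block each other
        if has_revision(db, rev):
            return False
    with flock(lock, exclusive=True):    # only the write path is serialized
        if has_revision(db, rev):        # another process may have fetched meanwhile
            return False
        git("init", "--bare", db)
        git("fetch", "--tags", url, "+refs/heads/*:refs/heads/*", cwd=db)
        return True


def checkout(db, rev, dest):
    """Create a working copy that borrows objects from the shared database."""
    with flock(db + ".lock", exclusive=False):
        git("clone", "--shared", "--no-checkout", db, dest)
        git("checkout", rev, cwd=dest)
```

Because `git clone --shared` only records the database as an alternate object store instead of copying objects, the checkout depends on the database keeping those objects around, which is why pruning in the database matters here.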