llvm.org GIT mirror llvm / 697e537
docs/GithubMove.rst: Remove obsolete information

Summary:
Remove references to the multirepo and update the document to reflect the
current state of the github repository.

Reviewers: mehdi_amini, jyknight
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58420

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365645 91177308-0d34-0410-b5e6-96231b3b80d8

Tom Stellard a month ago
1 changed file(s) with 41 addition(s) and 362 deletion(s).
1212 hosted Subversion to GitHub. Below are the financial and technical arguments as
1313 to why we are proposing such a move and how people (and validation
1414 infrastructure) will continue to work with a Git-based LLVM.
16 There will be a survey pointing at this document which we'll use to gauge the
17 community's reaction and, if we collectively decide to move, the time-frame. Be
18 sure to make your view count.
20 Additionally, we will discuss this during a BoF at the next US LLVM Developer
21 meeting (http://llvm.org/devmtg/2016-11/).
2316 What This Proposal is *Not* About
2417 =================================
201194 14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to
202195 point to GitHub instead.
204 One or Multiple Repositories?
197 Github Repository Description
205198 =============================
207 There are two major variants for how to structure our Git repository: The
208 "multirepo" and the "monorepo".
210 Multirepo Variant
211 -----------------
213 This variant recommends moving each LLVM sub-project to a separate Git
214 repository. This mimics the existing official read-only Git repositories
215 (e.g., http://llvm.org/git/compiler-rt.git), and creates new canonical
216 repositories for each sub-project.
218 This will allow the individual sub-projects to remain distinct: a
219 developer interested only in compiler-rt can checkout only this repository,
220 build it, and work in isolation of the other sub-projects.
222 A key need is to be able to check out multiple projects (i.e. lldb+clang+llvm or
223 clang+llvm+libcxx for example) at a specific revision.
225 A tuple of revisions (one entry per repository) accurately describes the state
226 across the sub-projects.
227 For example, a given version of clang would be
228 **.
230 Umbrella Repository
231 ^^^^^^^^^^^^^^^^^^^
233 To make this more convenient, a separate *umbrella* repository will be
234 provided. This repository will be used for the sole purpose of understanding
235 the sequence in which commits were pushed to the different repositories and to
236 provide a single revision number.
238 This umbrella repository will be read-only and continuously updated
239 to record the above tuple. The proposed form to record this is to use Git
240 [submodules]_, possibly along with a set of scripts to help check out a
241 specific revision of the LLVM distribution.
243 A regular LLVM developer does not need to interact with the umbrella repository
244 -- the individual repositories can be checked out independently -- but you would
245 need to use the umbrella repository to bisect multiple sub-projects at the same
246 time, or to check-out old revisions of LLVM with another sub-project at a
247 consistent state.
249 This umbrella repository will be updated automatically by a bot (running on
250 notice from a webhook on every push, and periodically) on a per commit basis: a
251 single commit in the umbrella repository would match a single commit in a
252 sub-project.
254 Living Downstream
255 ^^^^^^^^^^^^^^^^^
257 Downstream SVN users can use the read/write SVN bridges with the following
258 caveats:
260 * Be prepared for a one-time change to the upstream revision numbers.
261 * The upstream sub-project revision numbers will no longer be in sync.
263 Downstream Git users can continue without any major changes, with the minor
264 change of upstreaming using `git push` instead of `git svn dcommit`.
266 Git users also have the option of adopting an umbrella repository downstream.
267 The tooling for the upstream umbrella can easily be reused for downstream needs,
268 incorporating extra sub-projects and branching in parallel with sub-project
269 branches.
271 Multirepo Preview
272 ^^^^^^^^^^^^^^^^^
274 As a preview (disclaimer: this is a rough prototype, not polished and not
275 representative of the final solution), you can look at the following:
277 * Repository: https://github.com/llvm-beanz/llvm-submodules
278 * Update bot: http://beanz-bot.com:8180/jenkins/job/submodule-update/
280 Concerns
281 ^^^^^^^^
283 * Because GitHub does not allow server-side hooks, and because there is no
284 "push timestamp" in Git, the umbrella repository sequence isn't totally
285 exact: commits from different repositories pushed around the same time can
286 appear in different orders. However, we don't expect it to be the common case
287 or to cause serious issues in practice.
288 * You can't have a single cross-projects commit that would update both LLVM and
289 other sub-projects (something that can be achieved now). It would be possible
290 to establish a protocol whereby users add a special token to their commit
291 messages that causes the umbrella repo's updater bot to group all of them
292 into a single revision.
293 * Another option is to group commits that were pushed closely enough together
294 in the umbrella repository. This has the advantage of allowing cross-project
295 commits, and is less sensitive to mis-ordering commits. However, this has the
296 potential to group unrelated commits together, especially if the bot goes
297 down and needs to catch up.
298 * This variant relies on heavier tooling. But the current prototype shows that
299 it is not out-of-reach.
300 * Submodules don't have a good reputation / are complicating the command line.
301 However, in the proposed setup, a regular developer will seldom interact with
302 submodules directly, and certainly never update them.
303 * Refactoring across projects is not friendly: taking some functions from clang
304 to make it part of a utility in libSupport wouldn't carry the history of the
305 code in the llvm repo, preventing recursively applying `git blame` for
306 instance. However, this is not very different from how most people are
307 interacting with the repository today, by splitting such change in multiple
308 commits.
310 Workflows
311 ^^^^^^^^^
313 * :ref:`Checkout/Clone a Single Project, without Commit Access `.
314 * :ref:`Checkout/Clone a Single Project, with Commit Access `.
315 * :ref:`Checkout/Clone Multiple Projects, with Commit Access `.
316 * :ref:`Commit an API Change in LLVM and Update the Sub-projects `.
317 * :ref:`Branching/Stashing/Updating for Local Development or Experiments `.
318 * :ref:`Bisecting `.
320 Monorepo Variant
200 Monorepo
321201 ----------------
323 This variant recommends moving all LLVM sub-projects to a single Git repository,
324 similar to https://github.com/llvm-project/llvm-project.
325 This would mimic an export of the current SVN repository, with each sub-project
326 having its own top-level directory.
327 Not all sub-projects are used for building toolchains. In practice, www/
328 and test-suite/ will probably stay out of the monorepo.
203 The LLVM git repository hosted at https://github.com/llvm/llvm-project contains all
204 sub-projects in a single source tree. It is often referred to as a monorepo and
205 mimics an export of the current SVN repository, with each sub-project having its
206 own top-level directory. Not all sub-projects are used for building toolchains.
207 For example, www/ and test-suite/ are not part of the monorepo.
330209 Putting all sub-projects in a single checkout makes cross-project refactoring
331210 naturally simple:
352231 Building a single sub-project
353232 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
355 Nobody will be forced to build unnecessary projects. The exact structure
356 is TBD, but making it trivial to configure builds for a single sub-project
357 (or a subset of sub-projects) is a hard requirement.
359 As an example, it could look like the following::
234 Even though there is a single source tree, you are not required to build
235 all sub-projects together. It is trivial to configure builds for a single
236 sub-project.
238 For example::
361240 mkdir build && cd build
362241 # Configure only LLVM (default)
369248 .. _git-svn-mirror:
371 Read/write sub-project mirrors
250 Outstanding Questions
251 ---------------------
253 Read-only sub-project mirrors
372254 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
374 With the Monorepo, the existing single-subproject mirrors (e.g.
375 http://llvm.org/git/compiler-rt.git) with git-svn read-write access would
376 continue to be maintained: developers would continue to be able to use the
377 existing single-subproject git repositories as they do today, with *no changes
378 to workflow*. Everything (git fetch, git svn dcommit, etc.) could continue to
379 work identically to how it works today. The monorepo can be set-up such that the
380 SVN revision number matches the SVN revision in the GitHub SVN-bridge.
382 Living Downstream
383 ^^^^^^^^^^^^^^^^^
385 Downstream SVN users can use the read/write SVN bridge. The SVN revision
386 number can be preserved in the monorepo, minimizing the impact.
388 Downstream Git users can continue without any major changes, by using the
389 git-svn mirrors on top of the SVN bridge.
391 Git users can also work upstream with monorepo even if their downstream
392 fork has split repositories. They can apply patches in the appropriate
393 subdirectories of the monorepo using, e.g., `git am --directory=...`, or
394 plain `diff` and `patch`.
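As a rough illustration of the `git am --directory=...` approach mentioned above, the sketch below uses two throwaway repositories. The names `split` and `mono`, the `clang/` directory, and the file `a.c` are placeholders chosen for this example, not real LLVM paths.

```shell
#!/bin/sh
# Sketch only: both repositories are created in a temporary directory.
set -e
work=$(mktemp -d)

# Hypothetical downstream fork that holds one sub-project at its top level.
git init -q "$work/split"
cd "$work/split"
git config user.email "dev@example.com"
git config user.name "Example Dev"
echo "int x;" > a.c
git add a.c
git commit -q -m "Base"
echo "int y;" >> a.c
git commit -q -a -m "Add y"
# Export the newest commit as a patch file.
git format-patch -q -1 -o "$work/patches"

# Hypothetical monorepo where the same file lives under clang/.
git init -q "$work/mono"
cd "$work/mono"
git config user.email "dev@example.com"
git config user.name "Example Dev"
mkdir clang
echo "int x;" > clang/a.c
git add clang
git commit -q -m "Import clang"

# Apply the patch with every path prefixed by clang/.
git am -q --directory=clang "$work"/patches/*.patch
cat clang/a.c
```

The patch was written against `a.c` at the top level of the fork, and `--directory=clang` is what relocates it to `clang/a.c` in the combined tree.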
396 Alternatively, Git users can migrate their own fork to the monorepo. As a
397 demonstration, we've migrated the "CHERI" fork to the monorepo in two ways:
399 * Using a script that rewrites history (including merges) so that it looks
400 like the fork always lived in the monorepo [LebarCHERI]_. The upside of
401 this is when you check out an old revision, you get a copy of all llvm
402 sub-projects at a consistent revision. (For instance, if it's a clang
403 fork, when you check out an old revision you'll get a consistent version
404 of llvm proper.) The downside is that this changes the fork's commit
405 hashes.
407 * Merging the fork into the monorepo [AminiCHERI]_. This preserves the
408 fork's commit hashes, but when you check out an old commit you only get
409 the one sub-project.
411 Monorepo Preview
412 ^^^^^^^^^^^^^^^^^
414 As a preview (disclaimer: this is a rough prototype, not polished and not
415 representative of the final solution), you can look at the following:
417 * Full Repository: https://github.com/joker-eph/llvm-project
418 * Single sub-project view with *SVN write access* to the full repo:
419 https://github.com/joker-eph/compiler-rt
421 Concerns
422 ^^^^^^^^
256 With the Monorepo, it is undecided whether the existing single-subproject
257 mirrors (e.g. https://git.llvm.org/git/compiler-rt.git) will continue to
258 be maintained.
260 Read/write SVN bridge
261 ^^^^^^^^^^^^^^^^^^^^^
263 GitHub supports a read/write SVN bridge for its repositories. However,
264 there have been issues with this bridge working correctly in the past,
265 so it's not clear if this is something that will be supported going forward.
267 Monorepo Drawbacks
268 ------------------
424270 * Using the monolithic repository may add overhead for those contributing to a
425271 standalone sub-project, particularly on runtimes like libcxx and compiler-rt
426272 that don't rely on LLVM; currently, a fresh clone of libcxx is only 15MB (vs.
427273 1GB for the monorepo), and the commit rate of LLVM may cause more frequent
428 `git push` collisions when upstreaming. Affected contributors can continue to
429 use the SVN bridge or the single-subproject Git mirrors with git-svn for
430 read-write.
274 `git push` collisions when upstreaming. Affected contributors may be able to
275 use the SVN bridge or the single-subproject Git mirrors. However, it's
276 undecided if these projects will continue to be maintained.
431277 * Using the monolithic repository may add overhead for those *integrating* a
432278 standalone sub-project, even if they aren't contributing to it, due to the
433279 same disk space concern as the point above. The availability of the
434 sub-project Git mirror addresses this, even without SVN access.
280 sub-project Git mirrors would address this.
435281 * Preservation of the existing read/write SVN-based workflows relies on the
436 GitHub SVN bridge, which is an extra dependency. Maintaining this locks us
282 GitHub SVN bridge, which is an extra dependency. Maintaining this locks us
437283 into GitHub and could restrict future workflow changes.
439285 Workflows
440286 ^^^^^^^^^
442288 * :ref:`Checkout/Clone a Single Project, without Commit Access `.
443 * :ref:`Checkout/Clone a Single Project, with Commit Access `.
444289 * :ref:`Checkout/Clone Multiple Projects, with Commit Access `.
445290 * :ref:`Commit an API Change in LLVM and Update the Sub-projects `.
446291 * :ref:`Branching/Stashing/Updating for Local Development or Experiments `.
447292 * :ref:`Bisecting `.
449 Multi/Mono Hybrid Variant
450 -------------------------
452 This variant recommends moving only the LLVM sub-projects that are *rev-locked*
453 to LLVM into a monorepo (clang, lld, lldb, ...), following the multirepo
454 proposal for the rest. While neither variant recommends combining sub-projects
455 like www/ and test-suite/ (which are completely standalone), this goes further
456 and keeps sub-projects like libcxx and compiler-rt in their own distinct
457 repositories.
459 Concerns
460 ^^^^^^^^
462 * This has most of the disadvantages of multirepo and monorepo, without bringing many
463 of the advantages.
464 * Downstream users have to upgrade to the monorepo structure, but only partially. So
465 they will keep the infrastructure to integrate the other separate
466 sub-projects.
467 * All projects that use LIT for testing are effectively rev-locked to LLVM.
468 Furthermore, some runtimes (like compiler-rt) are rev-locked with Clang.
469 It's not clear where to draw the lines.
472294 Workflow Before/After
473295 =====================
477299 various use-cases.
479301 .. _workflow-checkout-commit:
481 Checkout/Clone a Single Project, without Commit Access
482 ------------------------------------------------------
484 Except the URL, nothing changes. The possibilities today are::
486 svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
487 # or with Git
488 git clone http://llvm.org/git/llvm.git
490 After the move to GitHub, you would do either::
492 git clone https://github.com/llvm-project/llvm.git
493 # or using the GitHub svn native bridge
494 svn co https://github.com/llvm-project/llvm/trunk
496 The above works for both the monorepo and the multirepo, as we'll maintain the
497 existing read-only views of the individual sub-projects.
499303 Checkout/Clone a Single Project, with Commit Access
500304 ---------------------------------------------------
519323 .. _workflow-multicheckout-nocommit:
521 Multirepo Variant
522 ^^^^^^^^^^^^^^^^^
524 With the multirepo variant, nothing changes but the URL, and commits can be
525 performed using `svn commit` or `git commit` and `git push`::
527 git clone https://github.com/llvm/llvm.git llvm
528 # or using the GitHub svn native bridge
529 svn co https://github.com/llvm/llvm/trunk/ llvm
531 .. _workflow-monocheckout-nocommit:
533325 Monorepo Variant
534326 ^^^^^^^^^^^^^^^^
536328 With the monorepo variant, there are a few options, depending on your
537 constraints. First, you could just clone the full repository::
539 git clone https://github.com/llvm/llvm-projects.git llvm
540 # or using the GitHub svn native bridge
541 svn co https://github.com/llvm/llvm-projects/trunk/ llvm
329 constraints. First, you could just clone the full repository::
331 git clone https://github.com/llvm/llvm-project.git
543333 At this point you have every sub-project (llvm, clang, lld, lldb, ...), which
544334 :ref:`doesn't imply you have to build all of them `. You
545335 can still build only compiler-rt for instance. In this way it's not different
546336 from someone who would check out all the projects with SVN today.
548 You can commit as normal using `git commit` and `git push` or `svn commit`, and
549 read the history for a single project (`git log libcxx` for example).
551 Secondly, there are a few options to avoid checking out all the sources.
553 **Using the GitHub SVN bridge**
555 The GitHub SVN native bridge allows checking out a subdirectory directly::
557 svn co https://github.com/llvm/llvm-projects/trunk/compiler-rt compiler-rt --username=...
559 This checks out only compiler-rt and provides commit access using "svn commit",
560 in the same way as it would do today.
562 **Using a Subproject Git Mirror**
564 You can use *git-svn* and one of the sub-project mirrors::
566 # Clone from the single read-only Git repo
567 git clone http://llvm.org/git/llvm.git
568 cd llvm
569 # Configure the SVN remote and initialize the svn metadata
570 git svn init https://github.com/joker-eph/llvm-project/trunk/llvm --username=...
571 git config svn-remote.svn.fetch :refs/remotes/origin/master
572 git svn rebase -l
574 In this case the repository contains only a single sub-project, and commits can
575 be made using `git svn dcommit`, again exactly as we do today.
577 **Using a Sparse Checkout**
579 You can hide the other directories using a Git sparse checkout::
338 If you want to avoid checking out all the sources, you can hide the other
339 directories using a Git sparse checkout::
581341 git config core.sparseCheckout true
582342 echo /compiler-rt > .git/info/sparse-checkout
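The sparse checkout steps above can be tried without cloning the real repository. The sketch below builds a tiny stand-in repository (the `mono` name and the two `README` files are placeholders for this example) and assumes that `git read-tree -mu HEAD` is the step used to apply the pattern to the working tree.

```shell
#!/bin/sh
# Sketch only: a throwaway repository stands in for the monorepo.
set -e
work=$(mktemp -d)

# Build a stand-in "monorepo" with two hypothetical sub-project directories.
git init -q "$work/mono"
cd "$work/mono"
git config user.email "dev@example.com"
git config user.name "Example Dev"
mkdir llvm compiler-rt
echo "llvm sources" > llvm/README
echo "compiler-rt sources" > compiler-rt/README
git add .
git commit -q -m "Import sub-projects"

# Restrict the working tree to compiler-rt.
git config core.sparseCheckout true
mkdir -p .git/info
echo /compiler-rt > .git/info/sparse-checkout
# Re-read the index so the sparse pattern takes effect on disk.
git read-tree -mu HEAD

# Only compiler-rt should remain in the working tree.
ls
```

The history and the index still cover every directory; only the files written to disk are limited by the pattern.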
645405 Note that the list would be longer with more sub-projects.
647 .. _workflow-multicheckout-multicommit:
649 Multirepo Variant
650 ^^^^^^^^^^^^^^^^^
652 With the multirepo variant, the umbrella repository will be used. This is
653 where the mapping from a single revision number to the individual repositories
654 revisions is stored.::
656 git clone https://github.com/llvm-beanz/llvm-submodules
657 cd llvm-submodules
658 git checkout $REVISION
659 git submodule init
660 git submodule update clang llvm libcxx
661 # the list of sub-projects is optional, `git submodule update` would get them all.
663 At this point the clang, llvm, and libcxx individual repositories are cloned
664 and stored alongside each other. There are CMake flags to describe the directory
665 structure; alternatively, you can just symlink `clang` to `llvm/tools/clang`,
666 etc.
668 Another option is to checkout repositories based on the commit timestamp::
670 git checkout `git rev-list -n 1 --before="2009-07-27 13:37" master`
672407 .. _workflow-monocheckout-multicommit:
674409 Monorepo Variant
677412 The repository natively contains the source for every sub-project at the right
678413 revision, which makes this straightforward::
680 git clone https://github.com/llvm/llvm-projects.git llvm-projects
415 git clone https://github.com/llvm/llvm-project.git
681416 cd llvm-project
682417 git checkout $REVISION
734469 cd ../../projects/libcxx
735470 git checkout AnotherBranch
737 .. _workflow-multi-branching:
739 Multirepo Variant
740 ^^^^^^^^^^^^^^^^^
742 The multirepo works the same as the current Git workflow: every command needs
743 to be applied to each of the individual repositories.
744 However, the umbrella repository makes this easy using `git submodule foreach`
745 to replicate a command on all the individual repositories (or submodules
746 in this case):
748 To create a new branch::
750 git submodule foreach git checkout -b MyBranch
752 To switch branches::
754 git submodule foreach git checkout AnotherBranch
756472 .. _workflow-mono-branching:
758474 Monorepo Variant
788504 the native Git bisection script over the llvm repository, and use some scripting
789505 to synchronize the clang repository to match the llvm revision.
791 .. _workflow-multi-bisecting:
793 Multirepo Variant
794 ^^^^^^^^^^^^^^^^^
796 With the multi-repositories variant, the cross-repository synchronization is
797 achieved using the umbrella repository. This repository contains only
798 submodules for the other sub-projects. The native Git bisection can be used on
799 the umbrella repository directly. A subtlety is that the bisect script itself
800 needs to make sure the submodules are updated accordingly.
802 For example, to find which commit introduces a regression where clang-3.9
803 crashes but clang-3.8 passes, one should be able to simply do::
805 git bisect start release_39 release_38
806 git bisect run ./bisect_script.sh
808 With the `bisect_script.sh` script being::
810 #!/bin/sh
812 git submodule update llvm clang libcxx #....
813 cd $BUILD_DIR
815 ninja clang || exit 125 # an exit code of 125 asks "git bisect"
816 # to "skip" the current commit
818 ./bin/clang some_crash_test.cpp
820 When the `git bisect run` command returns, the umbrella repository is set to
821 the state where the regression is introduced. The commit diff in the umbrella
822 indicates which submodule was updated, and the last commit in this sub-project
823 is the one that the bisect found.
825507 .. _workflow-mono-bisecting:
827509 Monorepo Variant
834516 The same example, finding which commit introduces a regression where clang-3.9
835517 crashes but clang-3.8 passes, will look like::
837 git bisect start release_39 release_38
519 git bisect start releases/3.9.x releases/3.8.x
838520 git bisect run ./bisect_script.sh
840522 With the `bisect_script.sh` script being::
13931075 .. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html
13941076 .. [JSonnRevNum] Joerg Sonnenberg, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html
13951077 .. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html
1396 .. [submodules] Git submodules, https://git-scm.com/book/en/v2/Git-Tools-Submodules
13971078 .. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/
1398 .. [LebarCHERI] Port *CHERI* to a single repository rewriting history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102787.html
1399 .. [AminiCHERI] Port *CHERI* to a single repository preserving history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102804.html