llvm.org GIT mirror llvm / d6afe38
Moving to GitHub - Unified Proposal This document describes the proposal to move to GitHub, and compares the two proposals through various workflow examples, presenting the current set of commands followed by the ones involved in each of the two proposals. It is intended to supersede the previous "submodule proposal" document entirely, and drive the discussion at the BoF during the next Dev Meeting. Differential Revision: https://reviews.llvm.org/D24167 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284077 91177308-0d34-0410-b5e6-96231b3b80d8 Mehdi Amini 3 years ago
3 changed file(s) with 870 addition(s) and 275 deletion(s).
0 ==============================
1 Moving LLVM Projects to GitHub
2 ==============================
3
4 .. contents:: Table of Contents
5 :depth: 4
6 :local:
7
8 Introduction
9 ============
10
11 This is a proposal to move our current revision control system from our own
12 hosted Subversion to GitHub. Below are the financial and technical arguments as
13 to why we are proposing such a move and how people (and validation
14 infrastructure) will continue to work with a Git-based LLVM.
15
16 There will be a survey pointing at this document which we'll use to gauge the
17 community's reaction and, if we collectively decide to move, the time-frame. Be
18 sure to make your view count.
19
20 Additionally, we will discuss this during a BoF at the next US LLVM Developer
21 meeting (http://llvm.org/devmtg/2016-11/).
22
23 What This Proposal is *Not* About
24 =================================
25
26 Changing the development policy.
27
28 This proposal relates only to moving the hosting of our source-code repository
29 from SVN hosted on our own servers to Git hosted on GitHub. We are not proposing
30 using GitHub's issue tracker, pull-requests, or code-review.
31
32 Contributors will continue to earn commit access on demand under the Developer
33 Policy, except that a GitHub account will be required instead of an SVN
34 username/password-hash.
35
36 Why Git, and Why GitHub?
37 ========================
38
39 Why Move At All?
40 ----------------
41
42 This discussion began because we currently host our own Subversion server
43 and Git mirror on a voluntary basis. The LLVM Foundation sponsors the server and
44 provides limited support, but there is only so much it can do.
45
46 Volunteers are not sysadmins themselves, but compiler engineers that happen
47 to know a thing or two about hosting servers. We also don't have 24/7 support,
48 and we sometimes wake up to see that continuous integration is broken because
49 the SVN server is either down or unresponsive.
50
51 We should take advantage of one of the services out there (GitHub, GitLab,
52 and BitBucket, among others) that offer better service (24/7 stability, disk
53 space, Git server, code browsing, forking facilities, etc) for free.
54
55 Why Git?
56 --------
57
58 Many new coders nowadays start with Git, and a lot of people have never used
59 SVN, CVS, or anything else. Websites like GitHub have changed the landscape
60 of open source contributions, reducing the cost of first contribution and
61 fostering collaboration.
62
63 Git is also the version control many LLVM developers use. Despite the
64 sources being stored in a SVN server, these developers are already using Git
65 through the Git-SVN integration.
66
67 Git allows you to:
68
69 * Commit, squash, merge, and fork locally without touching the remote server.
70 * Maintain local branches, enabling multiple threads of development.
71 * Collaborate on these branches (e.g. through your own fork of llvm on GitHub).
72 * Inspect the repository history (blame, log, bisect) without Internet access.
73 * Maintain remote forks and branches on Git hosting services and
74 integrate back to the main repository.
75
76 In addition, because Git seems to be replacing many OSS projects' version
77 control systems, many tools are built on top of Git.
78 Future tooling may support Git first (if not only).
79
80 Why GitHub?
81 -----------
82
83 GitHub, like GitLab and BitBucket, provides free code hosting for open source
84 projects. Any of these could replace the code-hosting infrastructure that we
85 have today.
86
87 These services also have a dedicated team to monitor, migrate, improve and
88 distribute the contents of the repositories depending on region and load.
89
90 GitHub has one important advantage over GitLab and
91 BitBucket: it offers read-write **SVN** access to the repository
92 (https://github.com/blog/626-announcing-svn-support).
93 This would enable people to continue working post-migration as though our code
94 were still canonically in an SVN repository.
95
96 In addition, there are already multiple LLVM mirrors on GitHub, indicating that
97 part of our community has already settled there.
98
99 On Managing Revision Numbers with Git
100 -------------------------------------
101
102 The current SVN repository hosts all the LLVM sub-projects alongside each other.
103 A single revision number (e.g. r123456) thus identifies a consistent version of
104 all LLVM sub-projects.
105
106 Git does not use sequential integer revision numbers but instead uses a hash to
107 identify each commit. (Linus Torvalds mentioned that the lack of such revision numbers
108 is "the only real design mistake" in Git [TorvaldRevNum]_.)
109
110 The loss of a sequential integer revision number has been a sticking point in
111 past discussions about Git:
112
113 - "The 'branch' I most care about is mainline, and losing the ability to say
114 'fixed in r1234' (with some sort of monotonically increasing number) would
115 be a tragic loss." [LattnerRevNum]_
116 - "I like those results sorted by time and the chronology should be obvious, but
117 timestamps are incredibly cumbersome and make it difficult to verify that a
118 given checkout matches a given set of results." [TrickRevNum]_
119 - "There is still the major regression with unreadable version numbers.
120 Given the amount of Bugzilla traffic with 'Fixed in...', that's a
121 non-trivial issue." [JSonnRevNum]_
122 - "Sequential IDs are important for LNT and llvmlab bisection tool." [MatthewsRevNum]_.
123
124 However, Git can emulate this increasing revision number:
125 `git rev-list --count <commit>`. This identifier is unique only within a
126 single branch, but this means the tuple `(num, branch-name)` uniquely identifies
127 a commit.
128
129 We can thus use this revision number to ensure that e.g. `clang -v` reports a
130 user-friendly revision number (e.g. `master-12345` or `4.0-5321`), addressing
131 the objections raised above with respect to this aspect of Git.
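
As a minimal sketch of how such an identifier could be derived (the function name `llvm_revision_id` and the `branch-count` output format are illustrative assumptions, and the count is only meaningful on a linear, merge-free branch):

```shell
# Print "<branch>-<count>", where <count> is the number of commits
# reachable from <branch>. Hypothetical helper, not part of the proposal.
llvm_revision_id() {
    branch=${1:-master}
    # git rev-list --count prints how many commits are reachable from the ref
    count=$(git rev-list --count "$branch") || return 1
    printf '%s-%s\n' "$branch" "$count"
}
```

Running `llvm_revision_id master` inside a clone prints the branch name followed by its commit count; a build system could embed that string in the version banner.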
132
133 What About Branches and Merges?
134 -------------------------------
135
136 In contrast to SVN, Git makes branching easy. Git's commit history is
137 represented as a DAG, a departure from SVN's linear history. However, we propose
138 to forbid merge commits in our canonical Git repository.
139
140 Unfortunately, GitHub does not support server-side hooks to enforce such a
141 policy. We must rely on the community to avoid pushing merge commits.
142
143 GitHub offers a feature called `Status Checks`: a branch protected by
144 `status checks` requires commits to be whitelisted before the push can happen.
145 We could supply a pre-push hook on the client side that would run and check the
146 history, before whitelisting the commit being pushed [statuschecks]_.
147 However, this solution would be somewhat fragile (how do you update a script
148 installed on every developer machine?) and would prevent SVN access to the
149 repository.
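
For illustration only, the history check that such a client-side hook would run could look like the sketch below. The function names are hypothetical, the all-zero placeholder assumes SHA-1 object names, and the status-check whitelisting step is not shown.

```shell
# Succeed (exit 0) if the given revision range contains a merge commit.
range_has_merges() {
    merges=$(git rev-list --merges --count "$1") || return 2
    [ "$merges" -gt 0 ]
}

# Meant to be called from .git/hooks/pre-push, which receives on stdin one
# line per pushed ref: <local ref> <local sha> <remote ref> <remote sha>
reject_merge_pushes() {
    zero=0000000000000000000000000000000000000000
    while read -r local_ref local_sha remote_ref remote_sha; do
        # Deleting a remote ref: there is no new history to inspect.
        [ "$local_sha" = "$zero" ] && continue
        if [ "$remote_sha" = "$zero" ]; then
            range=$local_sha                    # new ref: check all of it
        else
            range="$remote_sha..$local_sha"     # existing ref: new commits only
        fi
        if range_has_merges "$range"; then
            echo "error: $local_ref contains merge commits; rebase first" >&2
            return 1
        fi
    done
    return 0
}
```

A `pre-push` hook that ends with a call to `reject_merge_pushes` would abort the push whenever the function returns non-zero.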
150
151 What About Commit Emails?
152 -------------------------
153
154 We will need a new bot to send emails for each commit. This proposal leaves the
155 email format unchanged besides the commit URL.
156
157 Straw Man Migration Plan
158 ========================
159
160 Step #1 : Before The Move
161 -------------------------
162
163 1. Update docs to mention the move, so people are aware of what is going on.
164 2. Set up a read-only version of the GitHub project, mirroring our current SVN
165 repository.
166 3. Add the required bots to implement the commit emails, as well as the
167 umbrella repository update (if the multirepo is selected) or the read-only
168 Git views for the sub-projects (if the monorepo is selected).
169
170 Step #2 : Git Move
171 ------------------
172
173 4. Update the buildbots to pick up updates and commits from the GitHub
174 repository. Not all bots have to migrate at this point, but it'll help
175 provide infrastructure testing.
176 5. Update Phabricator to pick up commits from the GitHub repository.
177 6. LNT and llvmlab have to be updated: they rely on a unique, monotonically
178 increasing integer across branches [MatthewsRevNum]_.
179 7. Instruct downstream integrators to pick up commits from the GitHub
180 repository.
181 8. Review and prepare an update for the LLVM documentation.
182
183 Until this point nothing has changed for developers; it will just
184 boil down to a lot of work for buildbot and other infrastructure
185 owners.
186
187 The migration will pause here until all dependencies have cleared, and all
188 problems have been solved.
189
190 Step #3: Write Access Move
191 --------------------------
192
193 9. Collect developers' GitHub account information, and add them to the project.
194 10. Switch the SVN repository to read-only and allow pushes to the GitHub repository.
195 11. Update the documentation.
196 12. Mirror Git to SVN.
197
198 Step #4 : Post Move
199 -------------------
200
201 13. Archive the SVN repository.
202 14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to
203 point to GitHub instead.
204
205 One or Multiple Repositories?
206 =============================
207
208 There are two major variants for how to structure our Git repository: The
209 "multirepo" and the "monorepo".
210
211 Multirepo Variant
212 -----------------
213
214 This variant recommends moving each LLVM sub-project to a separate Git
215 repository. This mimics the existing official read-only Git repositories
216 (e.g., http://llvm.org/git/compiler-rt.git), and creates new canonical
217 repositories for each sub-project.
218
219 This will allow the individual sub-projects to remain distinct: a
220 developer interested only in compiler-rt can check out only this repository,
221 build it, and work in isolation from the other sub-projects.
222
223 A key need is to be able to check out multiple projects (e.g. lldb+clang+llvm or
224 clang+llvm+libcxx) at a specific revision.
225
226 A tuple of revisions (one entry per repository) accurately describes the state
227 across the sub-projects.
228 For example, a given version of clang would be
229 **.
230
231 Umbrella Repository
232 ^^^^^^^^^^^^^^^^^^^
233
234 To make this more convenient, a separate *umbrella* repository will be
235 provided. This repository will be used for the sole purpose of understanding
236 the sequence in which commits were pushed to the different repositories and to
237 provide a single revision number.
238
239 This umbrella repository will be read-only and continuously updated
240 to record the above tuple. The proposed form to record this is to use Git
241 [submodules]_, possibly along with a set of scripts to help check out a
242 specific revision of the LLVM distribution.
243
244 A regular LLVM developer does not need to interact with the umbrella repository
245 -- the individual repositories can be checked out independently -- but you would
246 need to use the umbrella repository to bisect multiple sub-projects at the same
247 time, or to check-out old revisions of LLVM with another sub-project at a
248 consistent state.
249
250 This umbrella repository will be updated automatically by a bot (running on
251 notice from a webhook on every push, and periodically) on a per commit basis: a
252 single commit in the umbrella repository would match a single commit in a
253 sub-project.
254
255 Living Downstream
256 ^^^^^^^^^^^^^^^^^
257
258 Downstream SVN users can use the read/write SVN bridges with the following
259 caveats:
260
261 * Be prepared for a one-time change to the upstream revision numbers.
262 * The upstream sub-project revision numbers will no longer be in sync.
263
264 Downstream Git users can continue without any major changes, with the minor
265 change of upstreaming using `git push` instead of `git svn dcommit`.
266
267 Git users also have the option of adopting an umbrella repository downstream.
268 The tooling for the upstream umbrella can easily be reused for downstream needs,
269 incorporating extra sub-projects and branching in parallel with sub-project
270 branches.
271
272 Multirepo Preview
273 ^^^^^^^^^^^^^^^^^
274
275 As a preview (disclaimer: this is a rough prototype, not polished and not
276 representative of the final solution), you can look at the following:
277
278 * Repository: https://github.com/llvm-beanz/llvm-submodules
279 * Update bot: http://beanz-bot.com:8180/jenkins/job/submodule-update/
280
281 Concerns
282 ^^^^^^^^
283
284 * Because GitHub does not allow server-side hooks, and because there is no
285 "push timestamp" in Git, the umbrella repository sequence isn't totally
286 exact: commits from different repositories pushed around the same time can
287 appear in different orders. However, we don't expect it to be the common case
288 or to cause serious issues in practice.
289 * You can't have a single cross-projects commit that would update both LLVM and
290 other sub-projects (something that can be achieved now). It would be possible
291 to establish a protocol whereby users add a special token to their commit
292 messages that causes the umbrella repo's updater bot to group all of them
293 into a single revision.
294 * Another option is to group commits that were pushed closely enough together
295 in the umbrella repository. This has the advantage of allowing cross-project
296 commits, and is less sensitive to mis-ordering commits. However, this has the
297 potential to group unrelated commits together, especially if the bot goes
298 down and needs to catch up.
299 * This variant relies on heavier tooling. But the current prototype shows that
300 it is not out-of-reach.
301 * Submodules don't have a good reputation and complicate the command line.
302 However, in the proposed setup, a regular developer will seldom interact with
303 submodules directly, and certainly never update them.
304 * Refactoring across projects is not friendly: taking some functions from clang
305 to make it part of a utility in libSupport wouldn't carry the history of the
306 code in the llvm repo, preventing recursively applying `git blame` for
307 instance. However, this is not very different from how most people are
308 interacting with the repository today, by splitting such a change into multiple
309 commits.
310
311 Workflows
312 ^^^^^^^^^
313
314 * :ref:`Checkout/Clone a Single Project, without Commit Access <workflow-checkout-commit>`.
315 * :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-multicheckout-nocommit>`.
316 * :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-multicheckout-multicommit>`.
317 * :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
318 * :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-multi-branching>`.
319 * :ref:`Bisecting <workflow-multi-bisecting>`.
320
321 Monorepo Variant
322 ----------------
323
324 This variant recommends moving all LLVM sub-projects to a single Git repository,
325 similar to https://github.com/llvm-project/llvm-project.
326 This would mimic an export of the current SVN repository, with each sub-project
327 having its own top-level directory.
328 Not all sub-projects are used for building toolchains. In practice, www/
329 and test-suite/ will probably stay out of the monorepo.
330
331 Putting all sub-projects in a single checkout makes cross-project refactoring
332 naturally simple:
333
334 * New sub-projects can be trivially split out for better reuse and/or layering
335 (e.g., to allow libSupport and/or LIT to be used by runtimes without adding a
336 dependency on LLVM).
337 * Changing an API in LLVM and upgrading the sub-projects will always be done in
338 a single commit, designing away a common source of temporary build breakage.
339 * Moving code across sub-projects (during refactoring for instance) in a single
340 commit enables accurate `git blame` when tracking code change history.
341 * Tooling based on `git grep` works natively across sub-projects, making it
342 easier to find refactoring opportunities across projects (for example reusing a
343 data structure initially in LLDB by moving it into libSupport).
344 * Having all the sources present encourages maintaining the other sub-projects
345 when changing an API.
346
347 Finally, the monorepo maintains the property of the existing SVN repository that
348 the sub-projects move synchronously, and a single revision number (or commit
349 hash) identifies the state of the development across all projects.
350
351 .. _build_single_project:
352
353 Building a single sub-project
354 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
355
356 Nobody will be forced to build unnecessary projects. The exact structure
357 is TBD, but making it trivial to configure builds for a single sub-project
358 (or a subset of sub-projects) is a hard requirement.
359
360 As an example, it could look like the following::
361
362 mkdir build && cd build
363 # Configure only LLVM (default)
364 cmake path/to/monorepo
365 # Configure LLVM and lld
366 cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=lld
367 # Configure LLVM and clang
368 cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=clang
369
370 .. _git-svn-mirror:
371
372 Read/write sub-project mirrors
373 ------------------------------
374
375 With the Monorepo, the existing single-subproject mirrors (e.g.
376 http://llvm.org/git/compiler-rt.git) with git-svn read-write access would
377 continue to be maintained: developers would continue to be able to use the
378 existing single-subproject git repositories as they do today, with *no changes
379 to workflow*. Everything (git fetch, git svn dcommit, etc.) could continue to
380 work identically to how it works today. The monorepo can be set up such that the
381 SVN revision number matches the SVN revision in the GitHub SVN-bridge.
382
383 Living Downstream
384 ^^^^^^^^^^^^^^^^^
385
386 Downstream SVN users can use the read/write SVN bridge. The SVN revision
387 number can be preserved in the monorepo, minimizing the impact.
388
389 Downstream Git users can continue without any major changes, by using the
390 git-svn mirrors on top of the SVN bridge.
391
392 Git users can also work upstream with monorepo even if their downstream
393 fork has split repositories. They can apply patches in the appropriate
394 subdirectories of the monorepo using, e.g., `git am --directory=...`, or
395 plain `diff` and `patch`.
396
397 Alternatively, Git users can migrate their own fork to the monorepo. As a
398 demonstration, we've migrated the "CHERI" fork to the monorepo in two ways:
399
400 * Using a script that rewrites history (including merges) so that it looks
401 like the fork always lived in the monorepo [LebarCHERI]_. The upside of
402 this is when you check out an old revision, you get a copy of all llvm
403 sub-projects at a consistent revision. (For instance, if it's a clang
404 fork, when you check out an old revision you'll get a consistent version
405 of llvm proper.) The downside is that this changes the fork's commit
406 hashes.
407
408 * Merging the fork into the monorepo [AminiCHERI]_. This preserves the
409 fork's commit hashes, but when you check out an old commit you only get
410 the one sub-project.
411
412 Monorepo Preview
413 ^^^^^^^^^^^^^^^^^
414
415 As a preview (disclaimer: this is a rough prototype, not polished and not
416 representative of the final solution), you can look at the following:
417
418 * Full Repository: https://github.com/joker-eph/llvm-project
419 * Single sub-project view with *SVN write access* to the full repo:
420 https://github.com/joker-eph/compiler-rt
421
422 Concerns
423 ^^^^^^^^
424
425 * Using the monolithic repository may add overhead for those contributing to a
426 standalone sub-project, particularly on runtimes like libcxx and compiler-rt
427 that don't rely on LLVM; currently, a fresh clone of libcxx is only 15MB (vs.
428 1GB for the monorepo), and the commit rate of LLVM may cause more frequent
429 `git push` collisions when upstreaming. Affected contributors can continue to
430 use the SVN bridge or the single-subproject Git mirrors with git-svn for
431 read-write.
432 * Using the monolithic repository may add overhead for those *integrating* a
433 standalone sub-project, even if they aren't contributing to it, due to the
434 same disk space concern as the point above. The availability of the
435 sub-project Git mirror addresses this, even without SVN access.
436 * Preservation of the existing read/write SVN-based workflows relies on the
437 GitHub SVN bridge, which is an extra dependency. Maintaining this locks us
438 into GitHub and could restrict future workflow changes.
439
440 Workflows
441 ^^^^^^^^^
442
443 * :ref:`Checkout/Clone a Single Project, without Commit Access <workflow-checkout-commit>`.
444 * :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-monocheckout-nocommit>`.
445 * :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-monocheckout-multicommit>`.
446 * :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
447 * :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-mono-branching>`.
448 * :ref:`Bisecting `.
449
450 Multi/Mono Hybrid Variant
451 -------------------------
452
453 This variant recommends moving only the LLVM sub-projects that are *rev-locked*
454 to LLVM into a monorepo (clang, lld, lldb, ...), following the multirepo
455 proposal for the rest. While neither variant recommends combining sub-projects
456 like www/ and test-suite/ (which are completely standalone), this goes further
457 and keeps sub-projects like libcxx and compiler-rt in their own distinct
458 repositories.
459
460 Concerns
461 ^^^^^^^^
462
463 * This has most disadvantages of multirepo and monorepo, without bringing many
464 of the advantages.
465 * Downstream users have to upgrade to the monorepo structure, but only partially. So
466 they will keep the infrastructure to integrate the other separate
467 sub-projects.
468 * All projects that use LIT for testing are effectively rev-locked to LLVM.
469 Furthermore, some runtimes (like compiler-rt) are rev-locked with Clang.
470 It's not clear where to draw the lines.
471
472
473 Workflow Before/After
474 =====================
475
476 This section goes through a few examples of workflows, intended to illustrate
477 how end-users or developers would interact with the repository for
478 various use-cases.
479
480 .. _workflow-checkout-commit:
481
482 Checkout/Clone a Single Project, without Commit Access
483 ------------------------------------------------------
484
485 Except for the URL, nothing changes. The possibilities today are::
486
487 svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
488 # or with Git
489 git clone http://llvm.org/git/llvm.git
490
491 After the move to GitHub, you would do either::
492
493 git clone https://github.com/llvm-project/llvm.git
494 # or using the GitHub svn native bridge
495 svn co https://github.com/llvm-project/llvm/trunk
496
497 The above works for both the monorepo and the multirepo, as we'll maintain the
498 existing read-only views of the individual sub-projects.
499
500 Checkout/Clone a Single Project, with Commit Access
501 ---------------------------------------------------
502
503 Currently
504 ^^^^^^^^^
505
506 ::
507
508 # direct SVN checkout
509 svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
510 # or using the read-only Git view, with git-svn
511 git clone http://llvm.org/git/llvm.git
512 cd llvm
513 git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
514 git config svn-remote.svn.fetch :refs/remotes/origin/master
515 git svn rebase -l # -l avoids fetching ahead of the git mirror.
516
517 Commits are performed using `svn commit` or with the sequence `git commit` and
518 `git svn dcommit`.
519
520 .. _workflow-multicheckout-nocommit:
521
522 Multirepo Variant
523 ^^^^^^^^^^^^^^^^^
524
525 With the multirepo variant, nothing changes but the URL, and commits can be
526 performed using `svn commit` or `git commit` and `git push`::
527
528 git clone https://github.com/llvm/llvm.git llvm
529 # or using the GitHub svn native bridge
530 svn co https://github.com/llvm/llvm/trunk/ llvm
531
532 .. _workflow-monocheckout-nocommit:
533
534 Monorepo Variant
535 ^^^^^^^^^^^^^^^^
536
537 With the monorepo variant, there are a few options, depending on your
538 constraints. First, you could just clone the full repository::
539
540 git clone https://github.com/llvm/llvm-projects.git llvm
541 # or using the GitHub svn native bridge
542 svn co https://github.com/llvm/llvm-projects/trunk/ llvm
543
544 At this point you have every sub-project (llvm, clang, lld, lldb, ...), which
545 :ref:`doesn't imply you have to build all of them <build_single_project>`. You
546 can still build only compiler-rt for instance. In this way it's not different
547 from someone who would check out all the projects with SVN today.
548
549 You can commit as normal using `git commit` and `git push` or `svn commit`, and
550 read the history for a single project (`git log libcxx` for example).
551
552 Secondly, there are a few options to avoid checking out all the sources.
553
554 **Using the GitHub SVN bridge**
555
556 The GitHub SVN native bridge lets you check out a subdirectory directly::
557
558 svn co https://github.com/llvm/llvm-projects/trunk/compiler-rt compiler-rt --username=...
559
560 This checks out only compiler-rt and provides commit access using "svn commit",
561 in the same way as it would do today.
562
563 **Using a Subproject Git Mirror**
564
565 You can use *git-svn* and one of the sub-project mirrors::
566
567 # Clone from the single read-only Git repo
568 git clone http://llvm.org/git/llvm.git
569 cd llvm
570 # Configure the SVN remote and initialize the svn metadata
571 git svn init https://github.com/joker-eph/llvm-project/trunk/llvm --username=...
572 git config svn-remote.svn.fetch :refs/remotes/origin/master
573 git svn rebase -l
574
575 In this case the repository contains only a single sub-project, and commits can
576 be made using `git svn dcommit`, again exactly as we do today.
577
578 **Using a Sparse Checkout**
579
580 You can hide the other directories using a Git sparse checkout::
581
582 git config core.sparseCheckout true
583 echo /compiler-rt > .git/info/sparse-checkout
584 git read-tree -mu HEAD
585
586 The data for all sub-projects is still in your `.git` directory, but in your
587 checkout, you only see `compiler-rt`.
588 Before you push, you'll need to fetch and rebase (`git pull --rebase`) as
589 usual.
590
591 Note that when you fetch you'll likely pull in changes to sub-projects you don't
592 care about. If you are using sparse checkout, the files from other projects
593 won't appear on your disk. The only effect is that your commit hash changes.
594
595 You can check whether the changes in the last fetch are relevant to your commit
596 by running::
597
598 git log origin/master@{1}..origin/master -- libcxx
599
600 This command can be hidden in a script so that `git llvmpush` would perform all
601 these steps, fail only if such a dependent change exists, and show immediately
602 the change that prevented the push. An immediate repeat of the command would
603 (almost) certainly result in a successful push.
604 Note that today with SVN or git-svn, this step is not possible since the
605 "rebase" implicitly happens while committing (unless a conflict occurs).
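
A hypothetical `git llvmpush` helper along these lines could be sketched as follows. The function names, the `origin` remote, and the `master` branch are assumptions; as a variation on the command above, the sketch records the remote tip before fetching instead of relying on `origin/master@{1}`.

```shell
# Print the commits in <old>..<new> that touch <path>; prints nothing if none.
changes_touching() {
    git log --oneline "$1..$2" -- "$3"
}

# Usage: llvmpush <sub-project directory>, e.g. llvmpush libcxx
llvmpush() {
    dir=$1
    # Remember where the remote branch was before fetching.
    old=$(git rev-parse origin/master) || return 1
    git fetch origin || return 1
    incoming=$(changes_touching "$old" origin/master "$dir")
    if [ -n "$incoming" ]; then
        # Someone else changed the same sub-project: show it and stop.
        echo "Upstream changed $dir since your last fetch:" >&2
        echo "$incoming" >&2
        return 1
    fi
    # Only unrelated sub-projects changed: rebase and push.
    git rebase origin/master && git push origin HEAD:master
}
```

Saved as an executable named `git-llvmpush` on the `PATH` (with a final line calling `llvmpush "$@"`), Git would expose it as `git llvmpush`. Running it a second time after a failure starts from the already-fetched tip, so it normally proceeds to the rebase and push.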
606
607 Checkout/Clone Multiple Projects, with Commit Access
608 ----------------------------------------------------
609
610 Let's look at how to assemble llvm+clang+libcxx at a given revision.
611
612 Currently
613 ^^^^^^^^^
614
615 ::
616
617 svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION
618 cd llvm/tools
619 svn co http://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION
620 cd ../projects
621 svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION
622
623 Or using git-svn::
624
625 git clone http://llvm.org/git/llvm.git
626 cd llvm/
627 git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
628 git config svn-remote.svn.fetch :refs/remotes/origin/master
629 git svn rebase -l
630 git checkout `git svn find-rev -B r258109`
631 cd tools
632 git clone http://llvm.org/git/clang.git
633 cd clang/
634 git svn init https://llvm.org/svn/llvm-project/clang/trunk --username=<username>
635 git config svn-remote.svn.fetch :refs/remotes/origin/master
636 git svn rebase -l
637 git checkout `git svn find-rev -B r258109`
638 cd ../../projects/
639 git clone http://llvm.org/git/libcxx.git
640 cd libcxx
641 git svn init https://llvm.org/svn/llvm-project/libcxx/trunk --username=<username>
642 git config svn-remote.svn.fetch :refs/remotes/origin/master
643 git svn rebase -l
644 git checkout `git svn find-rev -B r258109`
645
646 Note that the list would be longer with more sub-projects.
647
648 .. _workflow-multicheckout-multicommit:
649
650 Multirepo Variant
651 ^^^^^^^^^^^^^^^^^
652
653 With the multirepo variant, the umbrella repository will be used. This is
654 where the mapping from a single revision number to the individual repositories'
655 revisions is stored::
656
657 git clone https://github.com/llvm-beanz/llvm-submodules
658 cd llvm-submodules
659 git checkout $REVISION
660 git submodule init
661 git submodule update clang llvm libcxx
662 # the list of sub-projects is optional, `git submodule update` would get them all.
663
664 At this point the clang, llvm, and libcxx individual repositories are cloned
665 and stored alongside each other. There are CMake flags to describe the directory
666 structure; alternatively, you can just symlink `clang` to `llvm/tools/clang`,
667 etc.
668
669 Another option is to check out repositories based on the commit timestamp::
670
671 git checkout `git rev-list -n 1 --before="2009-07-27 13:37" master`
672
673 .. _workflow-monocheckout-multicommit:
674
675 Monorepo Variant
676 ^^^^^^^^^^^^^^^^
677
678 The repository natively contains the source for every sub-project at the right
679 revision, which makes this straightforward::
680
681 git clone https://github.com/llvm/llvm-projects.git llvm-projects
682 cd llvm-projects
683 git checkout $REVISION
684
685 As before, at this point clang, llvm, and libcxx are stored in directories
686 alongside each other.
687
688 .. _workflow-cross-repo-commit:
689
690 Commit an API Change in LLVM and Update the Sub-projects
691 --------------------------------------------------------
692
693 Today this is possible, though not common (at least not documented), for
694 Subversion users and for git-svn users. For example, few Git users try to update
695 LLD or Clang in the same commit as they change an LLVM API.
696
697 The multirepo variant does not address this: one would have to commit and push
698 separately in every individual repository. It would be possible to establish a
699 protocol whereby users add a special token to their commit messages that causes
700 the umbrella repo's updater bot to group all of them into a single revision.
701
702 The monorepo variant handles this natively.
703
704 Branching/Stashing/Updating for Local Development or Experiments
705 ----------------------------------------------------------------
706
707 Currently
708 ^^^^^^^^^
709
710 SVN does not allow this use case, but developers that are currently using
711 git-svn can do it. Let's look at what this means in practice when dealing with
712 multiple sub-projects.
713
714 To update the repository to tip of trunk::
715
716 git pull
717 cd tools/clang
718 git pull
719 cd ../../projects/libcxx
720 git pull
721
722 To create a new branch::
723
724 git checkout -b MyBranch
725 cd tools/clang
726 git checkout -b MyBranch
727 cd ../../projects/libcxx
728 git checkout -b MyBranch
729
730 To switch branches::
731
732 git checkout AnotherBranch
733 cd tools/clang
734 git checkout AnotherBranch
735 cd ../../projects/libcxx
736 git checkout AnotherBranch
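The repetition above can be wrapped in a small shell function. This is only a sketch: the name `each_repo` is hypothetical, and it assumes the layout used in the commands above (run from the llvm checkout, with clang in `tools/clang` and libcxx in `projects/libcxx`):

```shell
# Hypothetical helper: run the same git command in each checkout.
# Assumes the current directory is the llvm checkout and that clang and
# libcxx are checked out at tools/clang and projects/libcxx.
each_repo() {
    for dir in . tools/clang projects/libcxx; do
        # "git -C <dir>" runs the command as if git had been started in <dir>.
        git -C "$dir" "$@" || return 1
    done
}
```

With this in place, `each_repo pull`, `each_repo checkout -b MyBranch` and `each_repo checkout AnotherBranch` correspond to the three command sequences above. The function stops at the first repository where the command fails.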
737
738 .. _workflow-multi-branching:
739
740 Multirepo Variant
741 ^^^^^^^^^^^^^^^^^
742
743 The multirepo variant works the same as the current Git workflow: every command
744 needs to be applied to each of the individual repositories.
745 However, the umbrella repository makes this easy using `git submodule foreach`
746 to replicate a command on all the individual repositories (or submodules
747 in this case):
748
749 To create a new branch::
750
751 git submodule foreach git checkout -b MyBranch
752
753 To switch branches::
754
755 git submodule foreach git checkout AnotherBranch
756
757 .. _workflow-mono-branching:
758
759 Monorepo Variant
760 ^^^^^^^^^^^^^^^^
761
762 Regular Git commands are sufficient, because everything is in a single
763 repository:
764
765 To update the repository to tip of trunk::
766
767 git pull
768
769 To create a new branch::
770
771 git checkout -b MyBranch
772
773 To switch branches::
774
775 git checkout AnotherBranch
776
777 Bisecting
778 ---------
779
780 Assume a developer is looking for a bug in clang (or lld, or lldb, ...).
781
782 Currently
783 ^^^^^^^^^
784
785 SVN does not have built-in bisection support, but the single revision across
786 sub-projects makes it possible to work around this with a script.
787
788 Using the existing Git read-only view of the repositories, it is possible to use
789 the native Git bisection script over the llvm repository, and use some scripting
790 to synchronize the clang repository to match the llvm revision.
791
792 .. _workflow-multi-bisecting:
793
794 Multirepo Variant
795 ^^^^^^^^^^^^^^^^^
796
797 With the multi-repositories variant, the cross-repository synchronization is
798 achieved using the umbrella repository. This repository contains only
799 submodules for the other sub-projects. The native Git bisection can be used on
800 the umbrella repository directly. A subtlety is that the bisect script itself
801 needs to make sure the submodules are updated accordingly.
802
803 For example, to find which commit introduces a regression where clang-3.9
804 crashes but clang-3.8 passes, one should be able to simply do::
805
806 git bisect start release_39 release_38
807 git bisect run ./bisect_script.sh
808
809 With the `bisect_script.sh` script being::
810
811 #!/bin/sh
812 cd $UMBRELLA_DIRECTORY
813 git submodule update llvm clang libcxx #....
814 cd $BUILD_DIR
815
816 ninja clang || exit 125 # an exit code of 125 asks "git bisect"
817 # to "skip" the current commit
818
819 ./bin/clang some_crash_test.cpp
820
821 When the `git bisect run` command returns, the umbrella repository is set to
822 the state where the regression is introduced. The commit diff in the umbrella
823 indicates which submodule was updated, and the last commit in this sub-project
824 is the one that the bisect found.
825
826 .. _workflow-mono-bisecting:
827
828 Monorepo Variant
829 ^^^^^^^^^^^^^^^^
830
831 Bisecting on the monorepo is straightforward, and very similar to the above,
832 except that the bisection script does not need to include the
833 `git submodule update` step.
834
835 The same example, finding which commit introduces a regression where clang-3.9
836 crashes but clang-3.8 passes, will look like::
837
838 git bisect start release_39 release_38
839 git bisect run ./bisect_script.sh
840
841 With the `bisect_script.sh` script being::
842
843 #!/bin/sh
844 cd $BUILD_DIR
845
846 ninja clang || exit 125 # an exit code of 125 asks "git bisect"
847 # to "skip" the current commit
848
849 ./bin/clang some_crash_test.cpp
850
851 Also, since the monorepo handles commits that update multiple projects, you're
852 less likely to encounter a build failure where one commit changes an API in LLVM
853 and a later one "fixes" the build in clang.
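Both `bisect_script.sh` examples above end with the clang invocation, so its exit status becomes the script's verdict. `git bisect run` treats 0 as good, 125 as skip, and any other status from 1 to 127 as bad; any status outside that range aborts the bisection. A program killed by a signal is usually reported by the shell with a status above 127, so a crashing test may stop the bisection instead of marking the commit as bad. A minimal sketch of a wrapper that avoids this; the name `bisect_status` is hypothetical:

```shell
# Hypothetical wrapper: run a test command and translate its exit status into
# the values "git bisect run" understands.
bisect_status() {
    "$@"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        return 0    # test passed: good commit
    fi
    return 1        # any failure, including a crash: bad commit
}
```

In the scripts above, the last line would then become `bisect_status ./bin/clang some_crash_test.cpp`. Mapping every failure to "bad" is a design choice of this sketch; a script that needs to tell a crash apart from an ordinary compile error would have to inspect the status more carefully.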
854
855
856 References
857 ==========
858
859 .. [LattnerRevNum] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html
860 .. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html
861 .. [JSonnRevNum] Joerg Sonnenberger, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html
862 .. [TorvaldRevNum] Linus Torvalds, http://git.661346.n2.nabble.com/Git-commit-generation-numbers-td6584414.html
863 .. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html
864 .. [submodules] Git submodules, https://git-scm.com/book/en/v2/Git-Tools-Submodules
865 .. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/
866 .. [LebarCHERI] Port *CHERI* to a single repository rewriting history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102787.html
867 .. [AminiCHERI] Port *CHERI* to a single repository preserving history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102804.html
+0
-273
docs/Proposals/GitHubSubMod.rst
0 ===============================================
1 Moving LLVM Projects to GitHub with Sub-Modules
2 ===============================================
3
4 Introduction
5 ============
6
7 This is a proposal to move our current revision control system from our own
8 hosted Subversion to GitHub. Below are the financial and technical arguments as
9 to why we need such a move and how people (and validation infrastructure) will
10 continue to work with a Git-based LLVM.
11
12 There will be a survey pointing at this document, through which we'll learn the community's
13 reaction and, if we collectively decide to move, the time-frames. Be sure to make
14 your views count.
15
16 Essentially, the proposal is divided in the following parts:
17
18 * Outline of the reasons to move to Git and GitHub
19 * Description on what the work flow will look like (compared to SVN)
20 * Remaining issues and potential problems
21 * The proposed migration plan
22
23 Why Git, and Why GitHub?
24 ========================
25
26 Why move at all?
27 ----------------
28
29 The strongest reason for the move, and why this discussion started in the first
30 place, is that we currently host our own Subversion server and Git mirror on a
31 voluntary basis. The LLVM Foundation sponsors the server and provides limited
32 support, but there is only so much it can do.
33
34 The volunteers are not Sysadmins themselves, but compiler engineers that happen
35 to know a thing or two about hosting servers. We also don't have 24/7 support,
36 and we sometimes wake up to see that continuous integration is broken because
37 the SVN server is either down or unresponsive.
38
39 With time and money, the foundation and volunteers could improve our services,
40 implement more functionality and provide around the clock support, so that we
41 can have a first class infrastructure with which to work. But the cost is not
42 small, both in money and time invested.
43
44 On the other hand, there are multiple services out there (GitHub, GitLab,
45 BitBucket among others) that offer that same service (24/7 stability, disk space,
46 Git server, code browsing, forking facilities, etc) for the very affordable price
47 of *free*.
48
49 Why Git?
50 --------
51
52 Most new coders nowadays start with Git. A lot of them have never used SVN, CVS
53 or anything else. Websites like GitHub have changed the landscape of open source
54 contributions, reducing the cost of first contribution and fostering
55 collaboration.
56
57 Git is also the version control most LLVM developers use. Despite the sources
58 being stored in an SVN server, most people develop using the Git-SVN integration,
59 and that shows that Git is not only more powerful than SVN, but people have
60 resorted to using a bridge because its features are now indispensable to their
61 internal and external workflows.
62
63 In essence, Git allows you to:
64
65 * Commit, squash, merge, fork locally without any penalty to the server
66 * Add as many branches as necessary to allow for multiple threads of development
67 * Collaborate with peers directly, even without access to the Internet
68 * Have multiple trees without multiplying disk space.
69
70 In addition, because Git seems to be replacing every project's version control
71 system, there are many more tools that can use Git's enhanced feature set, so
72 new tooling is much more likely to support Git first (if not only), than any
73 other version control system.
74
75 Why GitHub?
76 -----------
77
78 GitHub, like GitLab and BitBucket, provides free code hosting for open source
79 projects. Essentially, they will completely replace *all* the infrastructure that
80 we have today that serves code repository, mirroring, user control, etc.
81
82 They also have a dedicated team to monitor, migrate, improve and distribute the
83 contents of the repositories depending on region and load. This is a level of quality
84 that we'd never have without spending money that would be better spent elsewhere,
85 for example on development meetings and on sponsoring disadvantaged people to work
86 on compilers, fostering diversity and equality in our community.
87
88 GitHub has the added benefit that we already have a presence there. Many
89 developers use it already, and the mirror from our current repository is already
90 set up.
91
92 Furthermore, GitHub has an *SVN view* (https://github.com/blog/626-announcing-svn-support)
93 where people that still have/want to use SVN infrastructure and tooling can
94 slowly migrate or even stay working as if it was an SVN repository (including
95 read-write access).
96
97 So, any of the three solutions solves the cost and maintenance problem, but GitHub
98 has two additional features that would be beneficial to the migration plan as
99 well as the community already settled there.
100
101
102 What will the new workflow look like
103 ====================================
104
105 In order to move version control, we need to make sure that we get all the
106 benefits with the least amount of problems. That's why the migration plan will
107 be slow, one step at a time, and we'll try to make it look as close as possible
108 to the current style without impacting the new features we want.
109
110 Each LLVM project will continue to be hosted as a separate GitHub repository
111 under a single GitHub organisation. Users can continue to choose to use either
112 SVN or Git to access the repositories to suit their current workflow.
113
114 In addition, we'll create a repository that will mimic our current *linear
115 history* repository. The most accepted proposal, then, was to have an umbrella
116 project that will contain *sub-modules* (https://git-scm.com/book/en/v2/Git-Tools-Submodules)
117 of all the LLVM projects and nothing else.
118
119 This repository can be checked out on its own, in order to have *all* LLVM
120 projects in a single check-out, as many people have suggested, but it can also
121 only hold the references to the other projects, and be used for the sole purpose
122 of understanding the *sequence* in which commits were added by using the
123 ``git rev-list --count hash`` or ``git describe hash`` commands.
124
125 One example of such a repository is Takumi's llvm-project-submodule
126 (https://github.com/chapuni/llvm-project-submodule), which when checked out,
127 will have the references to all sub-modules but not check them out, so one will
128 need to *init* the module manually. This will allow the *exact* same behaviour
129 as checking out individual SVN repositories, as it will keep the correct linear
130 history.
131
132 There is no need for additional tags, flags and properties, or external
133 services controlling the history, since both SVN and *git rev-list* can already
134 do that on their own.
135
136 We will need additional server hooks to avoid non-fast-forward commits (e.g.
137 merges, forced pushes, etc.) in order to keep the linearity of the history.
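The check such a hook has to perform can be expressed with two plain Git commands. The sketch below shows only that logic, under the assumption that the hook knows the old and new tips of the branch being updated. The function name `is_linear_update` is hypothetical, and how the check would be wired into GitHub is not specified here:

```shell
# Hypothetical check: succeed only if moving a branch from $1 (old tip) to
# $2 (new tip) keeps the history linear.
is_linear_update() {
    old=$1
    new=$2
    # Fast-forward only: the old tip must be an ancestor of the new tip,
    # which rules out forced pushes that rewrite history.
    git merge-base --is-ancestor "$old" "$new" || return 1
    # No merge commits among the newly added commits.
    [ "$(git rev-list --merges --count "$old..$new")" -eq 0 ]
}
```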
138
139 The three types of hooks to be implemented are:
140
141 * Status Checks: By placing status checks on a protected branch, we can guarantee
142 that the history is kept linear and sane at all times, on all repositories.
143 See: https://help.github.com/articles/about-required-status-checks/
144 * Umbrella updates: By using GitHub web hooks, we can update a small web-service
145 inside LLVM's own infrastructure to update the umbrella project remotely. The
146 maintenance of this service will be lower than the current SVN maintenance and
147 the scope of its failures will be less severe.
148 See: https://developer.github.com/webhooks/
149 * Commits email update: By adding an email web hook, we can make every push show
150 in the lists, allowing us to retain history and do post-commit reviews.
151 See: https://help.github.com/articles/managing-notifications-for-pushes-to-a-repository/
152
153 Access will be transferred one-to-one to GitHub accounts for everyone that already
154 has commit access to our current repository. Those who don't have accounts will
155 have to create one in order to continue contributing to the project. In the
156 future, people only need to provide their GitHub accounts to be granted access.
157
158 In a nutshell:
159
160 * The projects' repositories will remain identical, with a new address (GitHub).
161 * They'll continue to have SVN access (Read-Write), but will also gain Git RW access.
162 * The linear history can still be accessed in the (RO) submodule meta project.
163 * Individual projects' history will be local (i.e. not interlaced with the other
164 projects, as the current SVN repos are), and we need the umbrella project
165 (using submodules) to have the same view as we had in SVN.
166
167 Additionally, each repository will have the following server hooks:
168
169 * Pre-commit hooks to stop people from applying non-fast-forward merges
170 * Webhook to update the umbrella project (via buildbot or web services)
171 * Email hook to each commits list (llvm-commit, cfe-commit, etc)
172
173 Essentially, we're adding Git RW access in addition to the already existing
174 structure, with all the additional benefits of it being in GitHub.
175
176 Example of a working version:
177
178 * Repository: https://github.com/llvm-beanz/llvm-submodules
179 * Update bot: http://beanz-bot.com:8180/jenkins/job/submodule-update/
180
181 What will *not* be changed
182 --------------------------
183
184 This is a change of version control system, not the whole infrastructure. There
185 are plans to replace our current tools (review, bugs, documents), but they're
186 all orthogonal to this proposal.
187
188 We'll also be keeping the buildbots (and migrating them to use Git) as well as
189 LNT, and any other system that currently provides value upstream.
190
191 Any discussion regarding those tools is out of scope for this proposal.
192
193 Remaining questions and problems
194 ================================
195
196 1. How much does the SVN view emulate, and how much will it break tools/CI?
197
198 For this one, we'll need people that will have problems in that area to tell
199 us what's wrong and how to help them fix it.
200
201 We also recommend that people and companies migrate to Git, for its many other
202 additional benefits.
203
204 2. Which tools will need changing?
205
206 LNT may break, since it relies on SVN's history. We can continue to
207 use LNT with the SVN-View, but it would be best to move it to Git once and for
208 all.
209
210 The LLVMLab bisect tool will also be affected and will need adjusting. As with
211 LNT, it should be fine to use GitHub's SVN view, but changing it to work on Git
212 will be required in the long term.
213
214 Phabricator will also need to change its configuration to point at the GitHub
215 repositories, but since it already works with Git, this will be a trivial change.
216
217 Migration Plan
218 ==============
219
220 If we decide to move, we'll have to set a date for the process to begin.
221
222 As usual, we should be announcing big changes in one release to happen in the
223 next one. But since this won't impact external users (if they rely on our source
224 release tarballs), we don't necessarily have to.
225
226 We will have to make sure all the *problems* reported are solved before the
227 final push. But we can start all non-binding processes (like mirroring to GitHub
228 and testing the SVN interface in it) before any hard decision.
229
230 Here's a proposed plan:
231
232 STEP #1 : Pre Move
233
234 0. Update docs to mention the move, so people are aware that it's going on.
235 1. Register an official GitHub project with the LLVM foundation.
236 2. Set up another (read-only) mirror of llvm.org/git at this GitHub project,
237 adding all necessary hooks to avoid broken history (merge, dates, pushes), as
238 well as a webhook to update the umbrella project (see below).
239 3. Make sure we have an llvm-project (with submodules) set up in the official
240 account, with all necessary hooks (history, update, merges).
241 4. Make sure bisecting with llvm-project works.
242 5. Make sure no one has any other blocker.
243
244 STEP #2 : Git Move
245
246 6. Update the buildbots to pick up updates and commits from the official git
247 repository.
248 7. Update Phabricator to pick up commits from the official git repository.
249 8. Tell people living downstream to pick up commits from the official git
250 repository.
251 9. Give things time to settle. We could play some games like disabling the SVN
252 repository for a few hours on purpose so that people can test that their
253 infrastructure has really become independent of the SVN repository.
254
255 Until this point nothing has changed for developers; it will just
256 boil down to a lot of work for buildbot and other infrastructure
257 owners.
258
259 Once all dependencies are cleared, and all problems have been solved:
260
261 STEP #3: Write Access Move
262
263 10. Collect people's GitHub account information, adding them to the project.
264 11. Switch SVN repository to read-only and allow pushes to the GitHub repository.
265 12. Mirror Git to SVN.
266
267 STEP #4 : Post Move
268
269 13. Archive the SVN repository, if GitHub's SVN is good enough.
270 14. Review and update *all* LLVM documentation.
271 15. Review website links pointing to viewvc/klaus/phab etc. to point to GitHub
272 instead.
509509 :hidden:
510510
511511 CodeOfConduct
512 Proposals/GitHubSubMod
512 Proposals/GitHubMove
513513
514514 :doc:`CodeOfConduct`
515515 Proposal to adopt a code of conduct on the LLVM social spaces (lists, events,
516516 IRC, etc).
517517
518 :doc:`Proposals/GitHubSubMod`
518 :doc:`Proposals/GitHubMove`
519519 Proposal to move from SVN/Git to GitHub.
520520
521521