llvm.org GIT mirror llvm / 2edc14d
Add recipes for migrating downstream branches of git mirrors

Add some common recipes for downstream users developing on top of the
existing git mirrors. These instructions show how to migrate local
branches to the monorepo.

Differential Revision: https://reviews.llvm.org/D56550

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353713 91177308-0d34-0410-b5e6-96231b3b80d8

David Greene 9 months ago
1 changed file(s) with 534 addition(s) and 0 deletion(s).
851851 less likely to encounter a build failure where a commit changes an API in LLVM and
852852 another later one "fixes" the build in clang.
853853
854 Moving Local Branches to the Monorepo
855 =====================================
856
857 Suppose you have been developing against the existing LLVM git
858 mirrors. You have one or more git branches that you want to migrate
859 to the "final monorepo".
860
861 The simplest way to migrate such branches is with the
862 ``migrate-downstream-fork.py`` tool at
863 https://github.com/jyknight/llvm-git-migration.
864
865 Basic migration
866 ---------------
867
868 Basic instructions for ``migrate-downstream-fork.py`` are in the
869 Python script and are expanded on below to a more general recipe::
870
871 # Make a repository which will become your final local mirror of the
872 # monorepo.
873 mkdir my-monorepo
874 git -C my-monorepo init
875
876 # Add a remote to the monorepo.
877 git -C my-monorepo remote add upstream/monorepo https://github.com/llvm/llvm-project.git
878
879 # Add remotes for each git mirror you use, from upstream as well as
880 # your local mirror. All projects are listed here but you need only
881 # import those for which you have local branches.
882 my_projects=( clang
883 clang-tools-extra
884 compiler-rt
885 debuginfo-tests
886 libcxx
887 libcxxabi
888 libunwind
889 lld
890 lldb
891 llvm
892 openmp
893 polly )
894 for p in ${my_projects[@]}; do
895 git -C my-monorepo remote add upstream/split/${p} https://github.com/llvm-mirror/${p}.git
896 git -C my-monorepo remote add local/split/${p} https://my.local.mirror.org/${p}.git
897 done
898
899 # Pull in all the commits.
900 git -C my-monorepo fetch --all
901
902 # Run migrate-downstream-fork to rewrite local branches on top of
903 # the upstream monorepo.
904 (
905 cd my-monorepo
906 migrate-downstream-fork.py \
907 refs/remotes/local \
908 refs/tags \
909 --new-repo-prefix=refs/remotes/upstream/monorepo \
910 --old-repo-prefix=refs/remotes/upstream/split \
911 --source-kind=split \
912 --revmap-out=monorepo-map.txt
913 )
914
915 # Octopus-merge the resulting local split histories to unify them.
916
917 # Assumes local work on local split mirrors is on master (and
918 # upstream is presumably represented by some other branch like
919 # upstream/master).
920 my_local_branch="master"
921
922 git -C my-monorepo branch --no-track local/octopus/${my_local_branch} \
923 $(git -C my-monorepo merge-base refs/remotes/upstream/monorepo/master \
924 refs/remotes/local/split/llvm/${my_local_branch})
925 git -C my-monorepo checkout local/octopus/${my_local_branch}
926
927 subproject_branches=()
928 for p in ${my_projects[@]}; do
929 subproject_branch=${p}/local/monorepo/${my_local_branch}
930 git -C my-monorepo branch ${subproject_branch} \
931 refs/remotes/local/split/${p}/${my_local_branch}
932 if [[ "${p}" != "llvm" ]]; then
933 subproject_branches+=( ${subproject_branch} )
934 fi
935 done
936
937 git -C my-monorepo merge ${subproject_branches[@]}
938
939 for p in ${my_projects[@]}; do
940 subproject_branch=${p}/local/monorepo/${my_local_branch}
941 git -C my-monorepo branch -d ${subproject_branch}
942 done
943
944 # Create local branches for upstream monorepo branches.
945 for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
946 refs/remotes/upstream/monorepo); do
947 upstream_branch=${ref#refs/remotes/upstream/monorepo/}
948 git -C my-monorepo branch upstream/${upstream_branch} ${ref}
949 done
950
951 The above gets you to a state like the following::
952
953   U1 - U2 - U3 <- upstream/master
954     \   \    \
955      \   \    - Llld1 - Llld2 -
956       \   \                    \
957        \   - Lclang1 - Lclang2-- Lmerge <- local/octopus/master
958         \                      /
959          - Lllvm1 - Lllvm2-----
960
961 Each branched component has its branch rewritten on top of the
962 monorepo and all components are unified by a giant octopus merge.
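
One quick way to confirm that the unifying commit really is an octopus
merge is to count its parents. The following is a minimal sketch, not part
of any migration tool; ``count_parents`` is a hypothetical helper name.

```shell
# Hypothetical helper: print the number of parents of a commit.
# "git rev-list --parents -n 1" prints the commit hash followed by the
# hashes of its parents on a single line.
count_parents() (
  repo=$1
  commit=$2
  git -C "${repo}" rev-list --parents -n 1 "${commit}" |
    awk '{ print NF - 1 }'
)

# Usage (branch name taken from the recipe above):
#   count_parents my-monorepo local/octopus/master
```

An octopus merge has more than two parents: the branch that was checked
out plus each branch that was merged into it.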
963
964 If additional active local branches need to be preserved, the above
965 operations following the assignment to ``my_local_branch`` should be
966 done for each branch. Ref paths will need to be updated to map the
967 local branch to the corresponding upstream branch. If local branches
968 have no corresponding upstream branch, then the creation of
969 ``local/octopus/`` need not use ``git-merge-base`` to
970 pinpoint its root commit; it may simply be branched from the
971 appropriate component branch (say, ``llvm/local_release_X``).
972
973 Zipping local history
974 ---------------------
975
976 The octopus merge is suboptimal for many cases, because walking back
977 through the history of one component leaves the other components fixed
978 at a history that likely makes things unbuildable.
979
980 Some downstream users track the order commits were made to subprojects
981 with some kind of "umbrella" project that imports the project git
982 mirrors as submodules, similar to the multirepo umbrella proposed
983 above. Such an umbrella repository looks something like this::
984
985   UM1 ---- UM2 -- UM3 -- UM4 ---- UM5 ---- UM6 ---- UM7 ---- UM8 <- master
986    |        |             |        |        |        |        |
987  Lllvm1   Llld1        Lclang1  Lclang2   Lllvm2   Llld2   Lmyproj1
988
989 The vertical bars represent submodule updates to a particular local
990 commit in the project mirror. ``UM3`` in this case is a commit of
991 some local umbrella repository state that is not a submodule update,
992 perhaps a ``README`` or project build script update. Commit ``UM8``
993 updates a submodule of local project ``myproj``.
994
995 The tool ``zip-downstream-fork.py`` at
996 https://github.com/greened/llvm-git-migration/tree/zip can be used to
997 convert the umbrella history into a monorepo-based history with
998 commits in the order implied by submodule updates::
999
1000   U1 - U2 - U3 <- upstream/master
1001    \    \    \
1002     \    -----\---------------                                      local/zip--.
1003      \         \              \                                                |
1004       - Lllvm1 - Llld1 - UM3 - Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 <-'
1005
1006
1007 The ``U*`` commits represent upstream commits to the monorepo master
1008 branch. Each submodule update in the local ``UM*`` commits brought in
1009 a subproject tree at some local commit. The trees in the ``L*1``
1010 commits represent merges from upstream. These result in edges from
1011 the ``U*`` commits to their corresponding rewritten ``L*1`` commits.
1012 The ``L*2`` commits did not do any merges from upstream.
1013
1014 Note that the merge from ``U2`` to ``Lclang1`` appears redundant, but
1015 if, say, ``U3`` changed some files in upstream clang, the ``Lclang1``
1016 commit appearing after the ``Llld1`` commit would actually represent a
1017 clang tree *earlier* in the upstream clang history. We want the
1018 ``local/zip`` branch to accurately represent the state of our umbrella
1019 history and so the edge ``U2 -> Lclang1`` is a visual reminder of what
1020 clang's tree actually looks like in ``Lclang1``.
1021
1022 Even so, the edge ``U3 -> Llld1`` could be problematic for future
1023 merges from upstream. git will think that we've already merged from
1024 ``U3``, and we have, except for the state of the clang tree. One
1025 possible mitigation strategy is to manually diff clang between ``U2``
1026 and ``U3`` and apply those updates to ``local/zip``. Another,
1027 possibly simpler strategy is to freeze local work on downstream
1028 branches and merge all submodules from the latest upstream before
1029 running ``zip-downstream-fork.py``. If downstream merged each project
1030 from upstream in lockstep without any intervening local commits, then
1031 things should be fine without any special action. We anticipate this
1032 to be the common case.
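
The first mitigation strategy, diffing one subproject between two upstream
commits and applying the result, could be sketched as follows. This is an
illustration under stated assumptions, not a step performed by
``zip-downstream-fork.py``: ``apply_subproject_delta`` is a hypothetical
helper, ``U2`` and ``U3`` stand for the real commit hashes, and the target
branch is assumed to be checked out with a clean working tree.

```shell
# Hypothetical helper: apply the changes made under one subproject
# directory between two commits onto the currently checked-out branch.
# The result is staged (--index) but not committed.
apply_subproject_delta() (
  repo=$1
  old=$2
  new=$3
  subdir=$4
  git -C "${repo}" diff --binary "${old}" "${new}" -- "${subdir}" |
    git -C "${repo}" apply --index
)

# Usage (U2 and U3 are placeholders for real commit hashes):
#   git -C my-monorepo checkout local/zip/master
#   apply_subproject_delta my-monorepo U2 U3 clang
#   git -C my-monorepo commit -m "Bring clang up to U3"
```

``git apply`` fails if the patch is empty or does not apply cleanly, for
example when local commits touched the same lines. In that case the
conflicts have to be resolved by hand.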
1033
1034 The tree for ``Lclang1`` outside of clang will represent the state of
1035 things at ``U3`` since all of the upstream projects not participating
1036 in the umbrella history should be in a state respecting the commit
1037 ``U3``. The trees for llvm and lld should correctly represent commits
1038 ``Lllvm1`` and ``Llld1``, respectively.
1039
1040 Commit ``UM3`` changed files not related to submodules and we need
1041 somewhere to put them. It is not safe in general to put them in the
1042 monorepo root directory because they may conflict with files in the
1043 monorepo. Let's assume we want them in a directory ``local`` in the
1044 monorepo.
1045
1046 **Example 1: Umbrella looks like the monorepo**
1047
1048 For this example, we'll assume that each subproject appears in its own
1049 top-level directory in the umbrella, just as they do in the monorepo.
1050 Let's also assume that we want the files in directory ``myproj`` to
1051 appear in ``local/myproj``.
1052
1053 Given the above run of ``migrate-downstream-fork.py``, a recipe to
1054 create the zipped history is below::
1055
1056 # Import any non-LLVM repositories the umbrella references.
1057 git -C my-monorepo remote add localrepo \
1058 https://my.local.mirror.org/localrepo.git
1059 git -C my-monorepo fetch localrepo
1060
1061 subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
1062 libcxx libcxxabi libunwind lld lldb llgo llvm openmp
1063 parallel-libs polly pstl )
1064
1065 # Import histories for upstream split projects (this was probably
1066 # already done for the ``migrate-downstream-fork.py`` run).
1067 for project in ${subprojects[@]}; do
1068 git -C my-monorepo remote add upstream/split/${project} \
1069 https://github.com/llvm-mirror/${project}.git
1070 git -C my-monorepo fetch upstream/split/${project}
1071 done
1072
1073 # Import histories for downstream split projects (this was probably
1074 # already done for the ``migrate-downstream-fork.py`` run).
1075 for project in ${subprojects[@]}; do
1076 git -C my-monorepo remote add local/split/${project} \
1077 https://my.local.mirror.org/${project}.git
1078 git -C my-monorepo fetch local/split/${project}
1079 done
1080
1081 # Import umbrella history.
1082 git -C my-monorepo remote add umbrella \
1083 https://my.local.mirror.org/umbrella.git
1084 git -C my-monorepo fetch umbrella
1085
1086 # Put myproj in local/myproj
1087 echo "myproj local/myproj" > my-monorepo/submodule-map.txt
1088
1089 # Rewrite history
1090 (
1091 cd my-monorepo
1092 zip-downstream-fork.py \
1093 refs/remotes/umbrella \
1094 --new-repo-prefix=refs/remotes/upstream/monorepo \
1095 --old-repo-prefix=refs/remotes/upstream/split \
1096 --revmap-in=monorepo-map.txt \
1097 --revmap-out=zip-map.txt \
1098 --subdir=local \
1099 --submodule-map=submodule-map.txt \
1100 --update-tags
1101 )
1102
1103 # Create the zip branch (assuming umbrella master is wanted).
1104 git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master
1105
1106 Note that if the umbrella has submodules to non-LLVM repositories,
1107 ``zip-downstream-fork.py`` needs to know about them to be able to
1108 rewrite commits. That is why the first step above is to fetch commits
1109 from such repositories.
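
To find out which repositories an umbrella references, one option is to
read its ``.gitmodules`` file. The sketch below is only an illustration:
``list_submodule_urls`` is a hypothetical helper, and it inspects a single
copy of ``.gitmodules``, so submodules that existed only in older umbrella
commits are not reported.

```shell
# Hypothetical helper: print "<submodule path> <url>" for each submodule
# declared in the given .gitmodules file.
# Assumes submodule names, paths and URLs contain no whitespace.
list_submodule_urls() (
  gitmodules=$1
  git config --file "${gitmodules}" --get-regexp '^submodule\..*\.url$' |
    while read -r key url; do
      name=${key#submodule.}
      name=${name%.url}
      path=$(git config --file "${gitmodules}" --get "submodule.${name}.path")
      printf '%s %s\n' "${path}" "${url}"
    done
)

# Usage, once the umbrella remote has been fetched:
#   git -C my-monorepo show refs/remotes/umbrella/master:.gitmodules > gitmodules.tmp
#   list_submodule_urls gitmodules.tmp
```

Any URL in the output that is not one of the LLVM mirrors is a candidate
for an extra ``git remote add`` and ``git fetch`` before zipping.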
1110
1111 With ``--update-tags`` the tool will migrate annotated tags pointing
1112 to submodule commits that were inlined into the zipped history. If
1113 the umbrella pulled in an upstream commit that happened to have a tag
1114 pointing to it, that tag will be migrated, which is almost certainly
1115 not what is wanted. The tag can always be moved back to its original
1116 commit after rewriting, or the ``--update-tags`` option may be
1117 discarded and any local tags would then be migrated manually.
1118
1119 **Example 2: Nested sources layout**
1120
1121 The tool handles nested submodules (e.g. llvm is a submodule in
1122 umbrella and clang is a submodule in llvm). The file
1123 ``submodule-map.txt`` is a list of pairs, one per line. The first
1124 pair item describes the path to a submodule in the umbrella
1125 repository. The second pair item describes the path where trees for
1126 that submodule should be written in the zipped history.
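
Because the map is a plain two-column text file, it is easy to sanity-check
before a long rewrite. The following is a minimal sketch, assuming that
paths contain no whitespace; ``check_submodule_map`` is a hypothetical
helper and not part of ``zip-downstream-fork.py``.

```shell
# Hypothetical helper: verify that every non-empty line of a submodule map
# has exactly two whitespace-separated fields:
#   <path in umbrella> <path in zipped history>
# Prints a message for each malformed line and returns non-zero if any
# malformed line was found.
check_submodule_map() (
  map_file=$1
  awk 'BEGIN { bad = 0 }
       NF != 0 && NF != 2 {
         printf "%s:%d: expected 2 fields, found %d\n", FILENAME, NR, NF
         bad = 1
       }
       END { exit bad }' "${map_file}"
)

# Usage:
#   check_submodule_map my-monorepo/submodule-map.txt && echo "map looks OK"
```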
1127
1128 Let's say your umbrella repository is actually the llvm repository and
1129 it has submodules in the "nested sources" layout (clang in
1130 tools/clang, etc.). Let's also say ``projects/myproj`` is a submodule
1131 pointing to some downstream repository. The submodule map file should
1132 look like this (we still want myproj mapped the same way as
1133 previously)::
1134
1135 tools/clang clang
1136 tools/clang/tools/extra clang-tools-extra
1137 projects/compiler-rt compiler-rt
1138 projects/debuginfo-tests debuginfo-tests
1139 projects/libclc libclc
1140 projects/libcxx libcxx
1141 projects/libcxxabi libcxxabi
1142 projects/libunwind libunwind
1143 tools/lld lld
1144 tools/lldb lldb
1145 projects/openmp openmp
1146 tools/polly polly
1147 projects/myproj local/myproj
1148
1149 If a submodule path does not appear in the map, the tool assumes it
1150 should be placed in the same place in the monorepo. That means if you
1151 use the "nested sources" layout in your umbrella, you *must* provide
1152 map entries for all of the projects in your umbrella (except llvm).
1153 Otherwise trees from submodule updates will appear underneath llvm in
1154 the zipped history.
1155
1156 Because llvm is itself the umbrella, we use ``--subdir`` to write its
1157 content into ``llvm`` in the zipped history::
1158
1159 # Import any non-LLVM repositories the umbrella references.
1160 git -C my-monorepo remote add localrepo \
1161 https://my.local.mirror.org/localrepo.git
1162 git -C my-monorepo fetch localrepo
1163
1164 subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
1165 libcxx libcxxabi libunwind lld lldb llgo llvm openmp
1166 parallel-libs polly pstl )
1167
1168 # Import histories for upstream split projects (this was probably
1169 # already done for the ``migrate-downstream-fork.py`` run).
1170 for project in ${subprojects[@]}; do
1171 git -C my-monorepo remote add upstream/split/${project} \
1172 https://github.com/llvm-mirror/${project}.git
1173 git -C my-monorepo fetch upstream/split/${project}
1174 done
1175
1176 # Import histories for downstream split projects (this was probably
1177 # already done for the ``migrate-downstream-fork.py`` run).
1178 for project in ${subprojects[@]}; do
1179 git -C my-monorepo remote add local/split/${project} \
1180 https://my.local.mirror.org/${project}.git
1181 git -C my-monorepo fetch local/split/${project}
1182 done
1183
1184 # Import umbrella history. We want this under a different refspec
1185 # so zip-downstream-fork.py knows what it is.
1186 git -C my-monorepo remote add umbrella \
1187 https://my.local.mirror.org/llvm.git
1188 git -C my-monorepo fetch umbrella
1189
1190 # Create the submodule map.
1191 echo "tools/clang clang" > my-monorepo/submodule-map.txt
1192 echo "tools/clang/tools/extra clang-tools-extra" >> my-monorepo/submodule-map.txt
1193 echo "projects/compiler-rt compiler-rt" >> my-monorepo/submodule-map.txt
1194 echo "projects/debuginfo-tests debuginfo-tests" >> my-monorepo/submodule-map.txt
1195 echo "projects/libclc libclc" >> my-monorepo/submodule-map.txt
1196 echo "projects/libcxx libcxx" >> my-monorepo/submodule-map.txt
1197 echo "projects/libcxxabi libcxxabi" >> my-monorepo/submodule-map.txt
1198 echo "projects/libunwind libunwind" >> my-monorepo/submodule-map.txt
1199 echo "tools/lld lld" >> my-monorepo/submodule-map.txt
1200 echo "tools/lldb lldb" >> my-monorepo/submodule-map.txt
1201 echo "projects/openmp openmp" >> my-monorepo/submodule-map.txt
1202 echo "tools/polly polly" >> my-monorepo/submodule-map.txt
1203 echo "projects/myproj local/myproj" >> my-monorepo/submodule-map.txt
1204
1205 # Rewrite history
1206 (
1207 cd my-monorepo
1208 zip-downstream-fork.py \
1209 refs/remotes/umbrella \
1210 --new-repo-prefix=refs/remotes/upstream/monorepo \
1211 --old-repo-prefix=refs/remotes/upstream/split \
1212 --revmap-in=monorepo-map.txt \
1213 --revmap-out=zip-map.txt \
1214 --subdir=llvm \
1215 --submodule-map=submodule-map.txt \
1216 --update-tags
1217 )
1218
1219 # Create the zip branch (assuming umbrella master is wanted).
1220 git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master
1221
1222
1223 Comments at the top of ``zip-downstream-fork.py`` describe in more
1224 detail how the tool works and various implications of its operation.
1225
1226 Importing local repositories
1227 ----------------------------
1228
1229 You may have additional repositories that integrate with the LLVM
1230 ecosystem, essentially extending it with new tools. If such
1231 repositories are tightly coupled with LLVM, it may make sense to
1232 import them into your local mirror of the monorepo.
1233
1234 If such repositories participated in the umbrella repository used
1235 during the zipping process above, they will automatically be added to
1236 the monorepo. For downstream repositories that don't participate in
1237 an umbrella setup, the ``import-downstream-repo.py`` tool at
1238 https://github.com/greened/llvm-git-migration/tree/import can help with
1239 getting them into the monorepo. A recipe follows::
1240
1241 # Import downstream repo history into the monorepo.
1242 git -C my-monorepo remote add myrepo https://my.local.mirror.org/myrepo.git
1243 git -C my-monorepo fetch myrepo
1244
1245 my_local_tags=( refs/tags/release
1246 refs/tags/hotfix )
1247
1248 (
1249 cd my-monorepo
1250 import-downstream-repo.py \
1251 refs/remotes/myrepo \
1252 ${my_local_tags[@]} \
1253 --new-repo-prefix=refs/remotes/upstream/monorepo \
1254 --subdir=myrepo \
1255 --tag-prefix="myrepo-"
1256 )
1257
1258 # Preserve release branches.
1259 for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
1260 refs/remotes/myrepo/release); do
1261 branch=${ref#refs/remotes/myrepo/}
1262 git -C my-monorepo branch --no-track myrepo/${branch} ${ref}
1263 done
1264
1265 # Preserve master.
1266 git -C my-monorepo branch --no-track myrepo/master refs/remotes/myrepo/master
1267
1268 # Merge master.
1269 git -C my-monorepo checkout local/zip/master # Or local/octopus/master
1270 git -C my-monorepo merge myrepo/master
1271
1272 You may want to merge other corresponding branches, for example
1273 ``myrepo`` release branches if they were in lockstep with LLVM project
1274 releases.
1275
1276 ``--tag-prefix`` tells ``import-downstream-repo.py`` to rename
1277 annotated tags with the given prefix. Due to limitations with
1278 ``fast_filter_branch.py``, unannotated tags cannot be renamed
1279 (``fast_filter_branch.py`` considers them branches, not tags). Since
1280 the upstream monorepo had its tags rewritten with an "llvmorg-"
1281 prefix, name conflicts should not be an issue. ``--tag-prefix`` can
1282 be used to more clearly indicate which tags correspond to various
1283 imported repositories.
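
Since only annotated tags can be renamed, it may help to know in advance
which tags in the downstream repository are unannotated. The sketch below
is one way to list them; ``list_lightweight_tags`` is a hypothetical helper
and not part of ``import-downstream-repo.py``.

```shell
# Hypothetical helper: print the names of lightweight (unannotated) tags.
# An annotated tag ref points at a "tag" object; a lightweight tag ref
# points directly at a commit (or another non-tag object).
list_lightweight_tags() (
  repo=$1
  git -C "${repo}" for-each-ref \
      --format='%(objecttype) %(refname:short)' refs/tags |
    awk '$1 != "tag" { print $2 }'
)

# Usage (run against a clone of the downstream repository):
#   list_lightweight_tags /path/to/myrepo
```

Tags reported by the helper would keep their original names after the
import, so they are the ones to check for conflicts or to re-create as
annotated tags beforehand.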
1284
1285 Given this repository history::
1286
1287   R1 - R2 - R3 <- master
1288        ^
1289        |
1290    release/1
1291
1292 The above recipe results in a history like this::
1293
1294   U1 - U2 - U3 <- upstream/master
1295    \    \    \
1296     \    -----\---------------                                           local/zip--.
1297      \         \              \                                                     |
1298       - Lllvm1 - Llld1 - UM3 - Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 - M1 <-'
1299                                                                               /
1300                                                                   R1 - R2 - R3 <-.
1301                                                                        ^         |
1302                                                                        |         |
1303                                                                myrepo-release/1  |
1304                                                                                  |
1305                                                                   myrepo/master--'
1306
1307 Commits ``R1``, ``R2`` and ``R3`` have trees that *only* contain blobs
1308 from ``myrepo``. If you require commits from ``myrepo`` to be
1309 interleaved with commits on local project branches (for example,
1310 interleaved with ``Lllvm1``, ``Lllvm2``, etc. above) and myrepo doesn't
1311 appear in an umbrella repository, a new tool will need to be
1312 developed. Creating such a tool would involve:
1313
1314 1. Modifying ``fast_filter_branch.py`` to optionally take a
1315 revlist directly rather than generating it itself
1316
1317 2. Creating a tool to generate an interleaved ordering of local
1318 commits based on some criteria (``zip-downstream-fork.py`` uses the
1319 umbrella history as its criterion)
1320
1321 3. Generating such an ordering and feeding it to
1322 ``fast_filter_branch.py`` as a revlist
1323
1324 Some care will also likely need to be taken to handle merge commits,
1325 to ensure the parents of such commits migrate correctly.
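
As an illustration of step 2 only, the sketch below orders the commits of
several branches by commit timestamp. Commit time is a hypothetical
criterion chosen for this example; it is not what
``zip-downstream-fork.py`` does, and ``interleave_by_date`` is an invented
name. Feeding the result to ``fast_filter_branch.py`` would still require
the modification described in step 1.

```shell
# Hypothetical helper: print the commits reachable from the given refs,
# oldest first. --date-order sorts by commit timestamp while never
# listing a parent after its child (before --reverse is applied, no
# parent is shown before all of its children).
interleave_by_date() (
  repo=$1
  shift
  git -C "${repo}" rev-list --reverse --date-order "$@"
)

# Usage (ref names taken from the recipes above):
#   interleave_by_date my-monorepo local/zip/master myrepo/master > revlist.txt
```

Timestamps can be misleading after rebases or cherry-picks, so an ordering
produced this way should be reviewed before it is used.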
1326
1327 Scrubbing the Local Monorepo
1328 ----------------------------
1329
1330 Once all of the migrating, zipping and importing is done, it's time to
1331 clean up. The Python tools use ``git-fast-import`` which leaves a lot
1332 of cruft around and we want to shrink our new monorepo mirror as much
1333 as possible. Here is one way to do it::
1334
1335 git -C my-monorepo checkout master
1336
1337 # Delete branches we no longer need. Do this for any other branches
1338 # you merged above.
1339 git -C my-monorepo branch -D local/zip/master || true
1340 git -C my-monorepo branch -D local/octopus/master || true
1341
1342 # Remove remotes.
1343 git -C my-monorepo remote remove upstream/monorepo
1344
1345 for p in ${my_projects[@]}; do
1346 git -C my-monorepo remote remove upstream/split/${p}
1347 git -C my-monorepo remote remove local/split/${p}
1348 done
1349
1350 git -C my-monorepo remote remove localrepo
1351 git -C my-monorepo remote remove umbrella
1352 git -C my-monorepo remote remove myrepo
1353
1354 # Add anything else here you don't need. refs/tags/release is
1355 # listed below assuming tags have been rewritten with a local prefix.
1356 # If not, remove it from this list.
1357 refs_to_clean=(
1358 refs/original
1359 refs/remotes
1360 refs/tags/backups
1361 refs/tags/release
1362 )
1363
1364 git -C my-monorepo for-each-ref --format="%(refname)" ${refs_to_clean[@]} |
1365 xargs -n1 --no-run-if-empty git -C my-monorepo update-ref -d
1366
1367 git -C my-monorepo reflog expire --all --expire=now
1368
1369 # fast_filter_branch.py might have gc running in the background.
1370 while ! git -C my-monorepo \
1371 -c gc.reflogExpire=0 \
1372 -c gc.reflogExpireUnreachable=0 \
1373 -c gc.rerereresolved=0 \
1374 -c gc.rerereunresolved=0 \
1375 -c gc.pruneExpire=now \
1376 gc --prune=now; do
1377 continue
1378 done
1379
1380 # Takes a LOOOONG time!
1381 git -C my-monorepo repack -A -d -f --depth=250 --window=250
1382
1383 git -C my-monorepo prune-packed
1384 git -C my-monorepo prune
1385
1386 You should now have a trim monorepo. Upload it to your git server and
1387 happy hacking!
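
Before uploading, it can be worth checking that the scrubbed repository is
intact and seeing how large it ended up. The following is a minimal sketch;
``check_scrubbed_repo`` is a hypothetical helper, and treating "no remotes
left" as a success criterion is an assumption based on the recipe above.

```shell
# Hypothetical helper: verify object integrity, report repository size,
# and succeed only if no remotes are left configured.
check_scrubbed_repo() (
  repo=$1
  # Verify the connectivity and validity of the object database.
  git -C "${repo}" fsck --no-dangling || exit 1
  # Print object counts and sizes; "size-pack" is the packed size.
  git -C "${repo}" count-objects -v -H
  # The scrubbing recipe removes every remote, so none should be listed.
  remotes=$(git -C "${repo}" remote)
  [ -z "${remotes}" ]
)

# Usage:
#   check_scrubbed_repo my-monorepo && echo "ready to upload"
```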
8541388
8551389 References
8561390 ==========