Submodules
Subrepo Inside a Parent Repo
This section is not a how-to for cloning a repo inside another repo. It describes how I initially tried to do it before going with submodules instead.
To see a detailed explanation, scroll down to Submodules are pointers.
To use a submodule (cloning a repo inside another repo), scroll down to Creating Submodules.
The goal was to consolidate all repositories into a single central monorepo. Inside the parent repo, I tried to clone a remote repository:
git clone git@github.com:username/submodule-name.git
In the subrepo, I ran:
submodule-name$ git remote -v
origin git@github.com:username/submodule-name.git (fetch)
origin git@github.com:username/submodule-name.git (push)
Checking the git status:
submodule-name$ git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
Next, I moved up to the root of the parent repo:
parent-repo$ git remote -v
origin git@github.com:username/parent-repo.git (fetch)
origin git@github.com:username/parent-repo.git (push)
The git status of the parent repo shows:
parent-repo$ git status
On branch master
Your branch is up to date with 'origin/master'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
submodule-name/
Since I haven't made any changes in the subrepo, there's nothing to commit there. However, the parent repo sees the new subrepo as an untracked change that needs to be committed. When I tried to commit, I got this message:
parent-repo$ git add .; git commit -m "Added subrepo"; git push
warning: adding embedded git repository: submodule-name
hint: You've added another git repository inside your current repository.
hint: Clones of the outer repository will not contain the contents of
hint: the embedded repository and will not know how to obtain it.
hint: If you meant to add a submodule, use:
hint:
hint: git submodule add <url> submodule-name
hint:
hint: If you added this path by mistake, you can remove it from the
hint: index with:
hint:
hint: git rm --cached submodule-name
hint:
hint: See "git help submodule" for more information.
[master 47bb7e6] Added subrepo
2 files changed, 5 insertions(+), 4 deletions(-)
It looks like Git recorded submodule-name as a pointer to a commit, the same way it records a submodule, and suggests that I should use git submodule add instead. When I check GitHub, I can see the subrepo has been pushed, but I can’t open it after clicking on it. The folder icon shows an "arrow pointing to the right," which means GitHub is indeed displaying it as a submodule.
Submodules are pointers
For submodules, the remote repository doesn’t display the subrepo inside it but rather points to another remote repository. So when I click the embedded subrepo with the "arrow pointing to the right," it should redirect me to the remote repository of the submodule.
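To make the pointer idea concrete, here is a minimal sketch that reproduces the embedded-repo situation in a throwaway directory and inspects how the parent repo records it. The temporary paths and the demo identity are assumptions for illustration only. A mode of 160000 in the index marks a pointer to a commit (a gitlink) rather than a regular file or folder.

```shell
set -eu
work=$(mktemp -d)   # throwaway location, not part of the original walkthrough

git init -q "$work/parent-repo"
git init -q "$work/parent-repo/submodule-name"

# Give the inner repo one commit so the parent has something to point to.
cd "$work/parent-repo/submodule-name"
git -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "Initial commit"

# Stage the inner repo from the parent; the "embedded git repository"
# warning is written to stderr and hidden here.
cd "$work/parent-repo"
git add submodule-name 2>/dev/null

# The first column of ls-files --stage is the entry mode.
entry_mode=$(git ls-files --stage submodule-name | cut -d' ' -f1)
echo "mode: $entry_mode"   # prints "mode: 160000"
```

Because no .gitmodules file is created in this case, a clone of the parent repo has no URL to fetch the contents from, which matches the warning Git printed earlier.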
Creating Submodules
You don't need to clone the remote repo first. Running the git submodule add command also clones the remote repo.
To create a submodule, run the command below. The name of the folder in my case is submodule-name, but you can set it to any name you want.
git submodule add <http-url-of-the-remote-repo> submodule-name
If you are using SSH keys to authenticate to GitHub, use this command:
git submodule add git@github.com:username/submodule-name.git submodule-name
Note that after you add a submodule inside another Git repository and make changes in it, you need to commit and push the changes from inside the submodule directory first. Then you also need to commit and push from the root of the parent repo, so that the parent repo records the submodule's new commit.
Go inside the submodule directory:
cd parent-repo/submodule-name
git add .
git commit -m "Added submodule inside parent repo"
git push
Then go up to the root of the parent repo:
cd ..
git add .
git commit -m "Update changes on the submodule-name"
git push
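The following sketch shows what git submodule add records, using a local throwaway repo in place of the GitHub remote. The temporary paths, the demo identity, and the helper function g are assumptions for illustration; the protocol.file.allow setting is only needed because the sketch uses a local path as the URL.

```shell
set -eu
work=$(mktemp -d)   # throwaway location

# Hypothetical helper: git with a demo identity and local-path submodules allowed.
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# Stand-in for the remote repository of the submodule.
git init -q "$work/submodule-name"
g -C "$work/submodule-name" commit -q --allow-empty -m "Initial commit"

# Parent repo: add the submodule, then commit the result.
git init -q "$work/parent-repo"
cd "$work/parent-repo"
g submodule --quiet add "$work/submodule-name" submodule-name
g commit -q -m "Add submodule-name as a submodule"

# .gitmodules maps the folder to the URL it was cloned from.
recorded_url=$(git config --file .gitmodules submodule.submodule-name.url)
echo "url: $recorded_url"
cat .gitmodules
```

The parent commit contains two things: the .gitmodules file and a pointer to one specific commit of the submodule.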
Cloning Parent Repo with Submodules
The assumption here is you want to clone the parent repo from another machine.
When you clone a parent repository from a remote source like GitHub, you can successfully pull down the main repository. However, the contents of the submodules are not automatically included. If you check the GitHub repository and click on a submodule, you will be directed to a separate remote repository.
Essentially, submodules in the remote repository act as pointers. Thus, when you clone the parent repository, the submodules will be empty.
$ tree test-repos/
test-repos/
├── go-webapp-sample
├── test-jenkins-project
└── test-static-site
4 directories, 0 files
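To get the contents of the submodules, run the command below from the root of the cloned parent repo. It initializes every submodule listed in .gitmodules and checks out the commit that the parent repo points to. Running git pull inside an empty submodule folder does not help, because the folder is not a Git repository yet and the command acts on the parent repo instead.
git submodule update --init --recursive
Alternatively, clone the parent repo together with its submodules in one step:
git clone --recurse-submodules git@github.com:username/parent-repo.git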
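The following sketch shows the behaviour end to end with throwaway local repos: a fresh clone has an empty submodule folder until git submodule update --init is run in the parent repo. The temporary paths, the demo identity, the index.html file, and the helper function g are assumptions for illustration.

```shell
set -eu
work=$(mktemp -d)   # throwaway location

# Hypothetical helper: git with a demo identity and local-path submodules allowed.
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# Stand-in for the submodule's remote, with one file in it.
git init -q "$work/test-static-site"
echo "hello" > "$work/test-static-site/index.html"
g -C "$work/test-static-site" add index.html
g -C "$work/test-static-site" commit -q -m "Add index.html"

# Stand-in for the parent remote, with the submodule registered.
git init -q "$work/test-repos"
g -C "$work/test-repos" submodule --quiet add "$work/test-static-site" test-static-site
g -C "$work/test-repos" commit -q -m "Add test-static-site submodule"

# A plain clone creates the submodule folder but leaves it empty.
g clone -q "$work/test-repos" "$work/fresh-clone"
before=$(ls -A "$work/fresh-clone/test-static-site" | wc -l)

# Initialize and fetch the submodule contents from the parent repo.
g -C "$work/fresh-clone" submodule --quiet update --init --recursive
after=$(ls "$work/fresh-clone/test-static-site")

echo "entries before: $before"
echo "contents after: $after"
```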
Cloning Specific Directory (With Trailing Directories)
The assumption here is you want to clone the parent repo from another machine.
Let's say you have the following in your remote GitHub repository:
$ tree main-repo
main-repo
├── .git
├── directory-a
│   ├── file-x.txt
│   └── file-y.txt
└── directory-b
    └── directory-c
        └── nested-directory
            ├── README.md
            ├── test1.txt
            ├── test2.txt
            └── test3.txt
If you want other users to pull down just the nested-directory without pulling down the entire parent repo, you can use the commands below.
Before doing this, make sure you have the path of the nested directory relative to the root of the parent repo. You can check this in the remote GitHub repository. In this example, the path is:
directory-b/directory-c/nested-directory
First, clone the repo without checking out any files:
git clone -n --depth=1 --filter=tree:0 https://github.com/username/main-repo.git
$ ls -al main-repo/
total 0
drwxrwxrwx 1 username username 512 Oct 31 18:38 .
drwxrwx--- 1 username username 512 Oct 31 18:38 ..
drwxrwxrwx 1 username username 512 Oct 31 18:40 .git
Go inside the parent repo and run the sparse-checkout commands:
cd main-repo
git sparse-checkout set --no-cone directory-b/directory-c/nested-directory
git checkout
Note that this keeps the parent directories in the path, so the files end up under directory-b/directory-c/nested-directory.
$ tree main-repo
main-repo
├── .git
└── directory-b
    └── directory-c
        └── nested-directory
            ├── README.md
            ├── test1.txt
            ├── test2.txt
            └── test3.txt
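The sparse-checkout steps can be tried without a network connection. This sketch builds a small stand-in for main-repo in a temporary directory and then checks out only the nested directory. The temporary paths, the file contents, and the demo identity are assumptions; the --depth and --filter options are left out because the sketch clones from a local path.

```shell
set -eu
work=$(mktemp -d)   # throwaway location

# Hypothetical helper: git with a demo identity.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Stand-in for the remote main-repo.
git init -q "$work/main-repo"
mkdir -p "$work/main-repo/directory-a" \
         "$work/main-repo/directory-b/directory-c/nested-directory"
echo "x" > "$work/main-repo/directory-a/file-x.txt"
echo "readme" > "$work/main-repo/directory-b/directory-c/nested-directory/README.md"
g -C "$work/main-repo" add .
g -C "$work/main-repo" commit -q -m "Initial commit"

# Clone without checking out any files (-n), then limit the checkout.
g clone -q -n "$work/main-repo" "$work/clone"
cd "$work/clone"
git sparse-checkout set --no-cone directory-b/directory-c/nested-directory
git checkout -q

# Only the selected path is present in the working tree.
find . -path ./.git -prune -o -type f -print
```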
Cloning Specific Directory (Without Trailing Directories)
The steps here actually pull the entire parent repo (its latest snapshot, because of --depth 1) and then remove the unnecessary files, leaving only the specific directory. I wouldn't recommend this for a large code base, since downloading all of it takes time and network bandwidth.
This approach rewrites the cloned repo so that only the contents of the specific directory remain at the top level, even if that directory is nested deep inside layers of directories.
git clone --depth 1 https://github.com/username/main-repo.git nested-directory
cd nested-directory
Then use filter-branch to delete all other files and directories except the desired subdirectory.
git filter-branch --prune-empty --subdirectory-filter directory-b/directory-c/nested-directory HEAD
$ tree nested-directory
nested-directory
├── README.md
├── test1.txt
├── test2.txt
└── test3.txt
Reference: Rewriting the repo
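The same rewrite can be rehearsed on a disposable repo before running it on a real one. In this sketch the temporary paths, file contents, and demo identity are assumptions. Git prints a deprecation notice for filter-branch and pauses before running it; the FILTER_BRANCH_SQUELCH_WARNING variable skips that notice.

```shell
set -eu
work=$(mktemp -d)   # throwaway location

# Hypothetical helper: git with a demo identity.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Stand-in for the remote main-repo.
git init -q "$work/main-repo"
mkdir -p "$work/main-repo/directory-a" \
         "$work/main-repo/directory-b/directory-c/nested-directory"
echo "x" > "$work/main-repo/directory-a/file-x.txt"
echo "readme" > "$work/main-repo/directory-b/directory-c/nested-directory/README.md"
g -C "$work/main-repo" add .
g -C "$work/main-repo" commit -q -m "Initial commit"

# Clone into a folder named after the directory we want to keep.
g clone -q "$work/main-repo" "$work/nested-directory"
cd "$work/nested-directory"

# Rewrite history so the nested directory becomes the repo root.
export FILTER_BRANCH_SQUELCH_WARNING=1
g filter-branch -f --prune-empty \
  --subdirectory-filter directory-b/directory-c/nested-directory HEAD >/dev/null 2>&1

ls
```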
Not Intended for Submodule
However, I didn’t intend to set up the repository as a submodule on my local machine. I just wanted it to be a child repo inside the parent repo. Locally, it isn’t registered as a submodule, and the parent repo has no .gitmodules entry linking it to a remote repository. Yet, when I commit and push the parent repo, the embedded subrepo gets recorded as a pointer, just like a submodule. That’s why I can see the subrepo on GitHub, but I can’t open it.
Convert Directory to a Submodule
To convert the embedded subrepo to a submodule, follow these steps:
-
Add the directory as a submodule using the command below. It will clone the remote repository into this directory.
git submodule add <http-url-of-the-remote-repo> submodule-name
If you are using SSH keys to authenticate to GitHub, use this command:
git submodule add git@github.com:username/submodule-name.git submodule-name
-
In some instances, you may need to delete the folder of the subrepo.
rm -rf submodule-name
-
If you encounter an error, remove the deleted subrepo from the Git index and then run the git submodule add command again.
git rm --cached submodule-name
-
To verify if the submodule was created, run the command below. It should return something like this:
$ git submodule
721fbdf8a89ec49f0c494ad4261d31b4335dcbd5 submodule-name (heads/main)
-
The parent repo and submodule should now be pointing to different remote repositories. From your parent repo:
$ git remote -v
origin git@github.com:username/parent-repo.git (fetch)
origin git@github.com:username/parent-repo.git (push)
From the submodule:
$ cd submodule-name
$ git remote -v
origin git@github.com:username/submodule-name (fetch)
origin git@github.com:username/submodule-name (push)
-
From the root of the parent repo, push the changes to GitHub.
git add .; git commit -m "Added submodule"; git push
-
In GitHub, you can see the submodule inside the parent repo. In my case, the submodule name is test-jenkins-project, but the entry appears with a different label (the name together with the commit it is pinned to) because it's actually a pointer.
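The conversion can be rehearsed locally before touching a real repo. This sketch first reproduces the embedded-subrepo starting point and then applies the steps in the order that avoids errors: remove the index entry, delete the folder, and add the submodule. The temporary paths, the demo identity, and the helper function g are assumptions, and a local folder stands in for the GitHub remote.

```shell
set -eu
work=$(mktemp -d)   # throwaway location

# Hypothetical helper: git with a demo identity and local-path submodules allowed.
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# Stand-in for the submodule's remote repository.
mkdir -p "$work/remote"
git init -q "$work/remote/submodule-name"
g -C "$work/remote/submodule-name" commit -q --allow-empty -m "Initial commit"

# Starting point: a plain clone inside the parent, committed as an embedded repo.
git init -q "$work/parent-repo"
cd "$work/parent-repo"
g clone -q "$work/remote/submodule-name" submodule-name
git add submodule-name 2>/dev/null
g commit -q -m "Added subrepo"

# Conversion: drop the index entry, remove the folder, add it back as a submodule.
git rm -q --cached submodule-name
rm -rf submodule-name
g submodule --quiet add "$work/remote/submodule-name" submodule-name
g commit -q -m "Added submodule"

# The submodule is now listed and has a .gitmodules entry.
git submodule status
registered_path=$(git config --file .gitmodules submodule.submodule-name.path)
```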
Convert Submodule to a Normal Directory
After an entire day of playing around with nested repos, I figured it's way easier to use submodules than subrepos inside parent repos. There are workarounds to make subrepos work, especially in Jenkins pipelines and when cloning code, but they still require additional steps to ensure that the Git histories of the parent repo and the nested repos don't interfere with each other.
Having said that, I chose to use submodules moving forward for the following reasons:
-
Separate Git History: Each submodule retains its own history independently, which prevents conflicts or complexity in the parent repo's Git history.
-
Easier CI/CD Integration: In Jenkins pipelines, you can reference the submodule’s remote repository directly if you only need to work with that specific project.
-
Selective Updates: With submodules, you can control when to update or pull changes for each project. This makes it easy to keep some submodules stable while actively developing others.
I may still use nested repos in some cases where I don't need to use pipelines on the individual project repositories. To convert a submodule back into a normal directory, follow these steps:
-
Go to your submodule directory and delete its .git entry (a file or a folder, depending on how the submodule was created).
cd parent-repo/submodule-name
rm -rf .git
-
Go one level up and remove the submodule from the Git index.
cd ..
git rm --cached submodule-name
-
From the root of the parent repo, clear the .gitmodules file. Note that if you have other submodules inside the parent repo, don't run the cat command below, as it deletes the entire contents of the .gitmodules file. Instead, edit the file and delete only the entry of the specific submodule.
cat > .gitmodules # then press Ctrl-D
-
Still in the root of the parent repo, look for the .git/modules folder and delete the folder of the specific submodule inside it.
rm -rf .git/modules/path/to/submodule-name
-
Note that the parent repo's own .git/config file may also be referencing the deleted submodule. Make sure to delete the reference.
cat .git/config
If you find these lines, remove them.
[submodule "parent-repo/submodule-name"]
url = git@github.com:username/submodule-name.git
active = true
-
At this point, the submodule directory has been converted into a normal directory. There is now only one repo, which is the parent repo. Commit and push the changes to the parent repo's remote repo.
git add .
git commit -m "Converted submodule to a normal directory inside the parent repo"
git push
-
Verify in GitHub that the pointer is now a directory. The folder icon should not have the "arrow pointing right" and you should be able to open it after clicking.
-
Back in your terminal, go inside the converted directory and initialize it as its own repo. Commit the changes.
cd parent-repo/submodule-name ## submodule-name is not a submodule anymore
git init
git add .
git commit -m "Initialize project directory to its own git repo inside a parent repo."
Verify the status:
$ git status
On branch master
nothing to commit, working tree clean
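The conversion steps can be sketched as a script against a disposable repo. Two things here differ from the manual steps and are assumptions of this sketch: it removes the submodule's entries with git config --remove-section instead of editing .gitmodules and .git/config by hand, and it uses temporary paths, a demo identity, and a local folder in place of the GitHub remote.

```shell
set -eu
work=$(mktemp -d)   # throwaway location

# Hypothetical helper: git with a demo identity and local-path submodules allowed.
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# Stand-in remote with one file, registered as a submodule of the parent.
git init -q "$work/submodule-name"
echo "hello" > "$work/submodule-name/README.md"
g -C "$work/submodule-name" add README.md
g -C "$work/submodule-name" commit -q -m "Initial commit"
git init -q "$work/parent-repo"
cd "$work/parent-repo"
g submodule --quiet add "$work/submodule-name" submodule-name
g commit -q -m "Add submodule"

# Detach the folder from its own repo and drop the pointer from the index.
rm -rf submodule-name/.git
git rm -q --cached submodule-name

# Remove only this submodule's entries; other submodules would be left intact.
git config --file .gitmodules --remove-section submodule.submodule-name
git config --remove-section submodule.submodule-name 2>/dev/null || true
rm -rf .git/modules/submodule-name

# The files are now tracked directly by the parent repo.
git add -A
g commit -q -m "Converted submodule to a normal directory"
file_mode=$(git ls-files --stage submodule-name/README.md | cut -d' ' -f1)
echo "mode: $file_mode"   # prints "mode: 100644"
```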
Deleting a Submodule
To delete a submodule inside a parent repo, follow these steps:
-
Clear the submodule from the Git index. Use the path relative to the root of the parent repo.
git rm --cached path/to/submodule-name
-
Delete the directory. If you would rather keep its contents, skip this step and follow the next one instead.
rm -rf submodule-name
-
If you want to keep the directory, delete the .git folder inside the submodule directory and move the directory to a different location outside of the parent repo.
cd submodule-name
rm -rf .git
cd ..
mv submodule-name /another-directory/outside/repo
-
There may still be remnants of the submodule, so go back to the parent repo and look for a .git/modules directory and a .gitmodules file.
cd parent-repo
ls -la .git/modules
ls -la .gitmodules
If there's only one submodule, you can delete the entire modules directory.
rm -rf .git/modules/
If you have other submodules inside the modules directory, only delete the specific submodule.
rm -rf .git/modules/specific-submodule
-
If you have a .gitmodules file at the root of the parent directory, check first if you have other submodules inside it.
If there are other submodules inside the .gitmodules file, edit the file using vi or nano and delete the specific submodule/s.
$ cat .gitmodules
[submodule "path/to/submodule-name-a"]
path = path/to/submodule-name-a
url = git@github.com:username/remote-repository-submodule-name-a
[submodule "path/to/submodule-name-b"]
path = path/to/submodule-name-b
url = git@github.com:username/remote-repository-submodule-name-b
[submodule "path/to/specific-submodule"]
path = path/to/specific-submodule
url = git@github.com:username/remote-repository-specific-submodule
vi .gitmodules
If it only contains the specific submodule that you want to delete, you can delete the .gitmodules file.
$ cat .gitmodules
[submodule "path/to/specific-submodule"]
path = path/to/specific-submodule
url = git@github.com:username/remote-repository-specific-submodule
rm -rf .gitmodules
-
To verify, run the command below:
git submodule
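As an alternative to the manual cleanup, Git's built-in commands can do most of the work. This sketch is not the procedure described above: it uses git submodule deinit and git rm, which unregister the submodule, remove its folder, and drop its .gitmodules entry, so only the .git/modules leftover has to be deleted by hand. The temporary paths, the demo identity, and the helper function g are assumptions for illustration.

```shell
set -eu
work=$(mktemp -d)   # throwaway location

# Hypothetical helper: git with a demo identity and local-path submodules allowed.
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# Set up a parent repo with one submodule to delete.
git init -q "$work/submodule-name"
g -C "$work/submodule-name" commit -q --allow-empty -m "Initial commit"
git init -q "$work/parent-repo"
cd "$work/parent-repo"
g submodule --quiet add "$work/submodule-name" submodule-name
g commit -q -m "Add submodule"

# Unregister the submodule and empty its working tree.
g submodule --quiet deinit -f -- submodule-name
# Remove the index entry, the folder, and the .gitmodules section.
g rm -q -f submodule-name
# Delete the cached copy of the submodule's repository.
rm -rf .git/modules/submodule-name
g commit -q -m "Remove submodule-name"

# An empty result means no submodules are registered anymore.
remaining=$(git submodule status)
echo "remaining submodules: ${remaining:-none}"
```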