
How to change git history with git-filter-repo

  • Krzysztof Marczewski
  • 26 February 2025
  • 7 minute read

Not long ago, my colleague Krzysztof Babis (a co-author of this article) and I needed to modify the history of a Git repository. After some brief research, we settled on the git-filter-repo tool.
In this step-by-step guide, you’ll learn how to use git-filter-repo to efficiently rewrite the history of your Git repository.

Before we go any further: back up your repository before using git-filter-repo, as it makes permanent changes to your history.

You may face this type of task for a variety of reasons, such as migration to another repository vendor, changes in code ownership, staff turnover, or common errors such as pushing files with sensitive data.

As a preview of what git-filter-repo can do, the last example (a pushed file with sensitive data) can be fixed with a single command, which removes the file from the entire history:

git filter-repo --path sensitive-file.txt --invert-paths

Our goal was a bit different: because of brand and domain name changes, we needed to change the email addresses of the commit authors, the corresponding strings in the commit messages, and of course the contents of the files. This is just the tip of the iceberg of what the tool can do, but here we focus on these two main issues: author data and text replacements.

Let’s look at two basic operations, taken directly from the documentation:

1. Simple modifications of commit messages with --replace-message

If you want to modify commit or tag messages, you can do so with the same syntax as --replace-text. For example, with a file named expressions.txt containing

foo==>bar

then running

git filter-repo --replace-message expressions.txt

will replace foo in commit or tag messages with bar.
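To make the rule format concrete, here is a minimal Python sketch of how old==>new lines could be applied to a message. It only illustrates literal rules; parse_rules and apply_rules are hypothetical helpers written for this article, not part of git-filter-repo, whose real implementation supports more rule types.

```python
def parse_rules(text):
    """Parse 'old==>new' lines into (old, new) byte pairs."""
    rules = []
    for line in text.splitlines():
        # This sketch skips blank lines and lines without the '==>' separator
        if "==>" not in line:
            continue
        old, new = line.rsplit("==>", 1)
        rules.append((old.encode("utf-8"), new.encode("utf-8")))
    return rules


def apply_rules(message, rules):
    """Apply every literal replacement to a commit message given as bytes."""
    for old, new in rules:
        message = message.replace(old, new)
    return message


rules = parse_rules("foo==>bar")
print(apply_rules(b"Add foo handler", rules))
```

Messages are handled as bytes here because that is how git filter-repo exposes them to callbacks later in this article.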

2. Changing user names and emails with --mailmap

To modify the user names and emails of commits, you can create a mailmap file in the format accepted by git-shortlog. For example, if you have a file named my-mailmap you can run

git filter-repo --mailmap my-mailmap

and if the current contents of that file are as follows (if the specified mailmap file is version controlled, historical versions of the file are ignored):

Name For User <email@addre.ss>
<new@ema.il> <old1@ema.il>
New Name And <new@ema.il> <old2@ema.il>
New Name And <new@ema.il> Old Name And <old3@ema.il>

then the user names and/or emails in the history are updated according to that mapping.
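When there are many authors, the mailmap file can be generated instead of typed by hand. The sketch below builds lines in the "New Name <new email> <old email>" form shown above; the mailmap_entry helper and the authors mapping are hypothetical and only serve as an illustration.

```python
def mailmap_entry(new_name, new_email, old_email):
    """Build one mailmap line: 'New Name <new@email> <old@email>'."""
    return f"{new_name} <{new_email}> <{old_email}>"


# Hypothetical mapping of old commit emails to (new name, new email)
authors = {
    "adam.jones@outdated.com": ("Adam Jones", "adam.jones@example.com"),
    "bart.clear@outdated.com": ("Bart Clear", "bart.clear@example.com"),
}

# One line per author, with a trailing newline at the end of the file content
mailmap_text = "\n".join(
    mailmap_entry(name, email, old) for old, (name, email) in authors.items()
) + "\n"
print(mailmap_text)
```

The resulting text can be saved as my-mailmap and passed to git filter-repo --mailmap as in the command above.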

Prerequisites

git-filter-repo is a command-line tool, so if you’ve pushed a file that contains, say, sensitive or leaked data, you can run it directly for simple changes. For more complicated cases, scripts are the way to go. In the second part of this article, we’ll go step by step through building such a script.

To use git-filter-repo as part of a script, you need the following:

  • python3 (git-filter-repo is a script written in Python, so pip is used to install it)
  • terminal
  • IDE to write the script that will be executed against our Git repository

Making custom script and how it works

Problem #1: Unstructured commit author data

The first step was to analyze the current state of affairs, which fortunately could be done using git log: git log --format='%an <%ae>'

The repository had grown over the years, so we expected a long list of authors presented in a not entirely consistent way, and the results did not surprise us:

Adam Jones <adam.jones@outdated.com>
Adam Jones <adamjones@MacBook-Pro-Adam.local>
Bart Clear <bart.clear@outdated.com>
bart.clear@outdated.com <Bart Clear>
Robot - Build Service (outdated) <Robot - Build Service (outdated)>
Outdated Build Service <Outdated - Build Service (outdated)@outdated.com>
Josh <>
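On a long history the raw list contains many repeated lines, so it helps to group it. The following sketch counts how often each identity appears; count_identities is a hypothetical helper and the sample text merely stands in for real git log output.

```python
from collections import Counter


def count_identities(log_output):
    """Count how often each 'Name <email>' line appears in git log output."""
    lines = (line.strip() for line in log_output.splitlines())
    return Counter(line for line in lines if line)


# Sample text standing in for the output of: git log --format='%an <%ae>'
sample = """Adam Jones <adam.jones@outdated.com>
Adam Jones <adam.jones@outdated.com>
Josh <>
"""

# Print identities from most to least frequent
for identity, commits in count_identities(sample).most_common():
    print(f"{commits:4d}  {identity}")
```

In a real repository the input could come from subprocess.run(["git", "log", "--format=%an <%ae>"], capture_output=True, text=True, check=True).stdout.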

We have identified the following issues:

  • Some users have different email addresses
  • Some commits do not have their author data configured correctly
  • There are also some special cases that stand out, such as Robot - Build Service

Based on them, we defined the following goals to achieve:

  1. Unification of the e-mail domain → Each user should have an address in @example.com
  2. Assigning missing email addresses → If the email is empty, we generate it based on the first and last name.
  3. Replacing specific names → Build Services should have specific names and addresses
  4. Fixing cases where the name and surname are saved as the email address and vice versa

Following the KISS principle, we kept it simple and iterated based on a scaled-down copy of the repository.

Step 1: Email Unification

To start with, we created a simple function that replaces the email domain with @example.com.

import subprocess

callback_code = '''
def map_email(email):
    email_prefix = email.split("@")[0]
    return f"{email_prefix}@example.com"

commit.author_email = map_email(commit.author_email.decode("utf-8")).encode("utf-8")
commit.committer_email = map_email(commit.committer_email.decode("utf-8")).encode("utf-8")
'''

subprocess.run([
    'git', 'filter-repo', '--force', '--commit-callback', callback_code
], check=True)

Step 2: Handling empty emails

Next, we added support for cases where commit authors didn’t include their email addresses. We generated them according to the company’s mailing address policy:

def map_email(name, email):
    if email == "" or email == "<>":
        name_surname = name.lower().replace(" ", ".").replace("-", ".")
        return f"{name_surname}@example.com"
    email_prefix = email.split("@")[0]
    return f"{email_prefix}@example.com"
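A quick way to check this function is to run it on a few inputs outside of git filter-repo. The snippet below repeats map_email as defined above and feeds it sample identities; "Mary-Ann Lee" is a made-up name used only to show the hyphen handling.

```python
def map_email(name, email):
    # Same logic as in the step above
    if email == "" or email == "<>":
        name_surname = name.lower().replace(" ", ".").replace("-", ".")
        return f"{name_surname}@example.com"
    email_prefix = email.split("@")[0]
    return f"{email_prefix}@example.com"


samples = [
    ("Adam Jones", "adamjones@MacBook-Pro-Adam.local"),  # machine-generated address
    ("Josh", ""),                                        # missing email
    ("Mary-Ann Lee", ""),                                # hypothetical hyphenated name
]

for name, email in samples:
    print(f"{name} <{email}> -> {map_email(name, email)}")
```

Testing the pure function first keeps the feedback loop short, since no history has to be rewritten to see the result.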

Step 3: Special Cases

It’s time to handle special cases. We started by adding a map_name function:

def map_name(name):
    if "Robot - Build Service (outdated)" in name:
        return "Robot Build Service"
    if "Outdated Build Service" in name:
        return "Media Build Service"
    return name

We left the best part for last: restoring the name and surname when the name field contains an email address, using a regular expression (regex):

import re

def map_name(name):
    if re.match(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$', name):
        name_part = name.split("@")[0]
        first_name, last_name = name_part.split(".")
        return f"{first_name.capitalize()} {last_name.capitalize()}"
    return name

We used this in the script fragment responsible for transforming the author and committer information for each history entry:

# Rewrite author information with debug output
author_name = commit.author_name.decode("utf-8")
author_email = commit.author_email.decode("utf-8")
new_author_name = map_name(author_name)
new_author_email = map_email(author_name, author_email)
print(f"Rewriting Author: {author_name} <{author_email}> to {new_author_name} <{new_author_email}>")
commit.author_name = new_author_name.encode("utf-8")
commit.author_email = new_author_email.encode("utf-8")

# Rewrite committer information with debug output
committer_name = commit.committer_name.decode("utf-8")
committer_email = commit.committer_email.decode("utf-8")
new_committer_name = map_name(committer_name)
new_committer_email = map_email(committer_name, committer_email)
print(f"Rewriting Committer: {committer_name} <{committer_email}> to {new_committer_name} <{new_committer_email}>")
commit.committer_name = new_committer_name.encode("utf-8")
commit.committer_email = new_committer_email.encode("utf-8")

This way, an entry such as bart.clear@outdated.com <Bart Clear> gets its author name restored to Bart Clear.
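Two details of the swapped case are worth handling explicitly: the two-value unpacking in map_name assumes exactly one dot before the @, and map_email derives the address from the email field, which in a swapped entry holds the name. The sketch below is one possible way to treat both fields together; fix_swapped_identity and its domain parameter are hypothetical and not part of the script described in this article.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def fix_swapped_identity(name, email, domain="example.com"):
    """Return (name, email), repairing entries whose name field holds an email."""
    if not EMAIL_RE.match(name):
        return name, email  # the name looks normal, nothing to repair
    prefix = name.split("@")[0]
    # Build the display name from every dot-separated part of the prefix,
    # so prefixes with zero or several dots do not raise an error
    display_name = " ".join(part.capitalize() for part in prefix.split(".") if part)
    return display_name, f"{prefix}@{domain}"


print(fix_swapped_identity("bart.clear@outdated.com", "Bart Clear"))
print(fix_swapped_identity("Adam Jones", "adam.jones@outdated.com"))
```

Entries that are not swapped are returned unchanged, so they can still go through map_name and map_email as before.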

Problem #2: Unacceptable phrases in commit titles

We needed to remove sensitive information and standardize naming conventions, so we improved our script to automatically update commit messages and replace specific text throughout a repository’s history.

Step 1: Setting the Goal

We wanted to replace certain words in commit messages and file contents across all Git history. The replacements are:

  • "to_be_replaced" → "example"
  • "replace_me" → "example"
  • "replaceable" → "example"
  • "renameable" → "example"

Step 2: Preparing a List of Replacements with or without Regular Expressions

First, we needed a structured way to store our text replacements. We created a mapping list that followed the format expected by git-filter-repo’s --replace-text option:

replace_text_code = '''
to_be_replaced==>example
replaceable==>example
replace_me==>example
renameable==>example
'''

Each line specified a replacement in the form:

old_text==>new_text

This was used to replace occurrences of these words in file contents.

If we need a more robust solution, such as case-insensitive matching or other regular expressions, we have to use the regex: prefix:

regex:(?i)old_text==>new_text

In our case, this replaces the text regardless of case, so OLD_text, Old_Text, etc. are matched as well.

replace_text_code = '''
regex:(?i)to_be_replaced==>example
regex:(?i)replaceable==>example
regex:(?i)replace_me==>example
regex:(?i)renameable==>example
'''
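The effect of the (?i) inline flag can be tried in plain Python before touching the repository. This is only an illustration of the regular expression itself, assuming the rule regex:(?i)to_be_replaced==>example; replace_case_insensitive is a hypothetical helper.

```python
import re

# Python counterpart of the rule: regex:(?i)to_be_replaced==>example
pattern = re.compile(rb"(?i)to_be_replaced")


def replace_case_insensitive(content):
    """Replace every match in file content (bytes), ignoring letter case."""
    return pattern.sub(b"example", content)


text = b"TO_BE_REPLACED and To_Be_Replaced"
print(replace_case_insensitive(text))

# A literal, case-sensitive replacement leaves the same text untouched
print(text.replace(b"to_be_replaced", b"example"))
```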

Step 3: Modifying Commit Messages with Regular Expressions

Next, we needed to make sure commit messages also followed these rules. We used regular expressions to search and replace text while keeping the message formatting intact.

commit_callback_code = '''
message = re.sub(b"to_be_replaced", b"example", message, flags=re.MULTILINE | re.IGNORECASE)
message = re.sub(b"replaceable", b"example", message, flags=re.MULTILINE | re.IGNORECASE)
message = re.sub(b"replace_me", b"example", message, flags=re.MULTILINE | re.IGNORECASE)
message = re.sub(b"renameable", b"example", message, flags=re.MULTILINE | re.IGNORECASE)
return message'''

Breaking It Down:

  • re.sub(pattern, replacement, message, flags=...) → Searches for a pattern and replaces it.
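  • b"to_be_replaced" → The b prefix creates a bytes literal, which is needed because git filter-repo passes the message to the callback as bytes.
  • flags=re.MULTILINE | re.IGNORECASE → re.IGNORECASE makes the replacement case-insensitive. re.MULTILINE only changes how ^ and $ match, so it has no effect on these patterns, which use no anchors; matches on any line of the message are replaced either way.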
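The four nearly identical re.sub calls can also be expressed as a single compiled pattern. The sketch below is an alternative formulation rather than the code we ran; WORDS, PATTERN, and rewrite_message are names introduced here for illustration.

```python
import re

WORDS = [b"to_be_replaced", b"replaceable", b"replace_me", b"renameable"]

# One alternation pattern; re.escape guards against regex metacharacters in the words
PATTERN = re.compile(b"|".join(re.escape(word) for word in WORDS), re.IGNORECASE)


def rewrite_message(message):
    """Replace every listed word in a commit message (bytes) with b'example'."""
    return PATTERN.sub(b"example", message)


print(rewrite_message(b"Fix Replace_Me\nand RENAMEABLE flag"))
```

The example message spans two lines and both words are replaced without re.MULTILINE, because that flag only affects the ^ and $ anchors.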

Step 4: Writing the Replacement Rules to a Temporary File

The --replace-text option of git filter-repo takes a file containing replacement rules. Instead of manually creating a file, we generated a temporary one:

import tempfile

with tempfile.NamedTemporaryFile(mode='w', delete=False) as replace_file:
    replace_file.write(replace_text_code)
    replace_file_path = replace_file.name # Store the file path

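Why Use a Temporary File?

  • Prevents cluttering your project with extra files.
  • Because of delete=False, the file is not removed automatically; it stays on disk so git filter-repo can read it after the with block, and it should be deleted once the rewrite is finished.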
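Since delete=False leaves the file on disk, cleanup has to be done by hand. A minimal sketch of that pattern, assuming a hypothetical write_rules_file helper and with a plain file read standing in for the git filter-repo call:

```python
import os
import tempfile


def write_rules_file(rules_text):
    """Write the rules to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(
        mode="w", delete=False, encoding="utf-8", suffix=".txt"
    ) as rules_file:
        rules_file.write(rules_text)
        return rules_file.name  # the file is closed when the with block exits


rules_path = write_rules_file("to_be_replaced==>example\n")
try:
    # Stand-in for the git filter-repo call that would read the file
    with open(rules_path, encoding="utf-8") as rules_file:
        print(rules_file.read())
finally:
    # Remove the file ourselves, because delete=False disables automatic cleanup
    os.remove(rules_path)
```

The try/finally block makes sure the file is removed even if the command that uses it fails.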

Step 5: Running git filter-repo to Rewrite History

We finally applied the changes using subprocess.run(), which calls a system command from Python:

import subprocess

try:
    subprocess.run([
        'git', 'filter-repo', '--force',
        '--message-callback', commit_callback_code,  # the callback body defined in Step 3
        '--replace-text', replace_file_path
    ], check=True)

    print("Git history rewritten successfully.")
except subprocess.CalledProcessError as e:
    print(f"Error while running git-filter-repo: {e}")

Breaking It Down:

  • git filter-repo --force → Runs the rewrite even if the repository does not look like a fresh clone, a case in which filter-repo otherwise refuses to proceed.
  • --message-callback commit_callback_code → Passes the body of a Python function that rewrites each commit message; the option expects code, not a function name.
  • --replace-text replace_file_path → Uses the temporary file to replace text in file contents throughout the history.

Step 6: Handling Errors

If something goes wrong (e.g., git filter-repo is not installed or a commit message contains unexpected content), the command exits with a non-zero status and we catch the resulting error:

except subprocess.CalledProcessError as e:
    print(f"Error while running git-filter-repo: {e}")
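CalledProcessError covers commands that run and then fail, while a missing executable (for example, git itself not being on PATH) raises FileNotFoundError instead. The sketch below handles both; run_command is a hypothetical wrapper, and the Python interpreter stands in for git so the example can run anywhere.

```python
import subprocess
import sys


def run_command(command):
    """Run a command and report success as a boolean instead of raising."""
    try:
        subprocess.run(command, check=True)
        return True
    except FileNotFoundError:
        # The executable itself is missing from PATH
        print(f"Command not found: {command[0]}")
    except subprocess.CalledProcessError as e:
        # The command ran but exited with a non-zero status
        print(f"Command failed with exit code {e.returncode}")
    return False


# Demonstration with the Python interpreter standing in for git
print(run_command([sys.executable, "-c", "raise SystemExit(0)"]))
print(run_command([sys.executable, "-c", "raise SystemExit(3)"]))
```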

Combined final solution

After the above steps, the script took the following form:

import subprocess
import tempfile

callback_code = '''
import re

def map_email(name, email):
    if email == "" or email == "<>":
        name_surname = name.lower().replace(" ", ".").replace("-", ".")
        return f"{name_surname}@example.com"
    email_prefix = email.split("@")[0]
    return f"{email_prefix}@example.com"

def map_name(name):
    if "Robot - Build Service (outdated)" in name:
        return "Robot Build Service"
    if "Outdated Build Service" in name:
        return "Media Build Service"

    if re.match(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$', name):
        name_part = name.split("@")[0]
        first_name, last_name = name_part.split(".")
        return f"{first_name.capitalize()} {last_name.capitalize()}"
    return name

# Rewrite author information with debug output
author_name = commit.author_name.decode("utf-8")
author_email = commit.author_email.decode("utf-8")
new_author_name = map_name(author_name)
new_author_email = map_email(author_name, author_email)
print(f"Rewriting Author: {author_name} <{author_email}> to {new_author_name} <{new_author_email}>")
commit.author_name = new_author_name.encode("utf-8")
commit.author_email = new_author_email.encode("utf-8")

# Rewrite committer information with debug output
committer_name = commit.committer_name.decode("utf-8")
committer_email = commit.committer_email.decode("utf-8")
new_committer_name = map_name(committer_name)
new_committer_email = map_email(committer_name, committer_email)
print(f"Rewriting Committer: {committer_name} <{committer_email}> to {new_committer_name} <{new_committer_email}>")
commit.committer_name = new_committer_name.encode("utf-8")
commit.committer_email = new_committer_email.encode("utf-8")

'''

commit_callback_code = '''
message = re.sub(b"to_be_replaced", b"example", message, flags=re.MULTILINE | re.IGNORECASE)
message = re.sub(b"replaceable", b"example", message, flags=re.MULTILINE | re.IGNORECASE)
message = re.sub(b"replace_me", b"example", message, flags=re.MULTILINE | re.IGNORECASE)
message = re.sub(b"renameable", b"example", message, flags=re.MULTILINE | re.IGNORECASE)
return message'''

replace_text_code = '''
regex:(?i)to_be_replaced==>example
regex:(?i)replaceable==>example
regex:(?i)replace_me==>example
regex:(?i)renameable==>example
'''

try:
    # delete=False keeps the rules file on disk after the with block closes it
    with tempfile.NamedTemporaryFile(mode='w', delete=False, encoding='utf-8') as replace_file:
        replace_file.write(replace_text_code)

    print(f"Temporary file created at: {replace_file.name}")

    # Apply all three rewrites in a single git filter-repo run
    subprocess.run([
        'git', 'filter-repo', '--force',
        '--commit-callback', callback_code,
        '--message-callback', commit_callback_code,
        '--replace-text', replace_file.name
    ], check=True)

    print("Git history rewritten successfully.")
except subprocess.CalledProcessError as e:
    print(f"Error while running git-filter-repo: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

The time has come for the grand finale, in which:

  1. Being aware of the irreversibility of the operation, we made a backup copy of the repository before running the script
  2. We reminded the team members about the need to clone the repository again
  3. We ran the script via python3 rewrite-authors.py
  4. We checked the result using git log --format='%an <%ae>'
  5. We pushed the changes using git push --force --all to overwrite the history
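The check in step 4 can be automated so that leftovers are not missed in a long list. A minimal sketch, assuming the target domain @example.com from our goals; find_unconverted is a hypothetical helper and the sample text stands in for real git log output.

```python
def find_unconverted(log_output, domain="example.com"):
    """Return the 'Name <email>' lines whose email is not in the expected domain."""
    leftovers = []
    for line in log_output.splitlines():
        line = line.strip()
        if line and not line.endswith(f"@{domain}>"):
            leftovers.append(line)
    return leftovers


# Sample text standing in for the output of: git log --format='%an <%ae>'
sample = "Adam Jones <adam.jones@example.com>\nJosh <>\n"
print(find_unconverted(sample))
```

An empty list means every commit author already uses the new domain, which is a reasonable signal that the history is ready to be pushed.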

Final words

That’s it! As you can see, filter-repo is a very powerful tool. It allows us to rewrite Git history, including author names, emails, commit messages, and file contents.

Whether you need to remove sensitive data, standardize author details, or reorganize your commit messages, filter-repo provides a robust and flexible solution.

I’d also like to give a special thanks to Krzysztof Babis, who you can find on LinkedIn. Krzysztof is an excellent iOS developer and created the underlying script (the main part) and the step-by-step explanation.
