(bug): Single file ingestion (blob) ignores branch/commit validation #557

@samarthsaxena2004

Which interface did you use?

Web UI

Repository URL (if public)

https://github.com/coderamp-labs/gitingest

Git host

GitHub (github.com)

Other Git host

No response

Repository visibility

public

Commit, branch, or tag

branch

Did you ingest the full repository or a subdirectory?

full repository

Operating system

Not relevant (Web UI)

Browser (Web UI only)

Not relevant (CLI / PyPI)

Other browser

No response

Gitingest version

No response

Python version

No response

Bug description

The current implementation for processing single files (blobs) in src/gitingest/ingestion.py does not validate whether the requested file actually exists at the specific Git reference (branch, tag, or commit) provided in the query.

As the TODO in the codebase notes ("# TODO: We do this wrong! We should still check the branch and commit!"), the system currently reads whatever is present on the local filesystem after a shallow clone. If a specific commit hash is requested but never verified, this can serve the wrong version of a file.

Steps to reproduce

  1. Provide a URL for a specific file blob pointing to a branch or commit where that file does not yet exist.
  2. Observe that if the file exists on the default branch that was shallow-cloned, the system may ingest it without verifying it belongs to the requested reference.

Expected behavior

The system should use Git to verify that the specific subpath exists within the requested commit or branch before attempting to read it from the disk.

Actual behavior

The code only performs the check "if not path.is_file():" against the local filesystem and ignores the query.commit and query.branch values.

Steps to reproduce

1. Open Gitingest Web UI.
2. Input a URL pointing to a single file blob on a specific branch or commit where that file does not exist (e.g., https://github.com/user/repo/blob/feature-branch/new-file.py).
3. If the file exists on the default branch, the system may ingest it without verifying that it belongs to the requested feature-branch reference.

Expected behavior

The system should strictly validate that the requested file path exists within the specific Git reference (branch, tag, or commit) provided in the query using Git verification (like git rev-parse) before processing.
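
One way the check described above might look, as a minimal sketch rather than the project's actual fix. It asks Git for the object type at `<ref>:<path>` with `git cat-file -t` (swapped in here for the `git rev-parse` mentioned above, because it also distinguishes files from directories). The helper names `object_name` and `is_blob_at_ref` are hypothetical and not part of gitingest, and the sketch assumes the requested reference can be resolved inside the local clone, which in a shallow clone may only be true for HEAD or an explicitly fetched commit.

```python
import subprocess
from pathlib import Path


def object_name(ref: str, subpath: str) -> str:
    """Build Git's `<ref>:<path>` syntax for a path inside a given revision."""
    # Hypothetical helper: the path is relative to the repo root, without a leading slash.
    return f"{ref}:{subpath.strip('/')}"


def is_blob_at_ref(repo_dir: Path, ref: str, subpath: str) -> bool:
    """Return True only if `subpath` is a file (blob) in the tree of `ref`."""
    result = subprocess.run(
        ["git", "-C", str(repo_dir), "cat-file", "-t", object_name(ref, subpath)],
        capture_output=True,
        text=True,
        check=False,
    )
    # A non-zero exit means Git could not resolve the ref or the path within it.
    return result.returncode == 0 and result.stdout.strip() == "blob"
```

A call such as `is_blob_at_ref(Path("/tmp/clone"), "HEAD", "src/new-file.py")` could run before the existing filesystem check, so that a file left on disk from another reference is rejected instead of ingested.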

Actual behavior

The system currently checks for file existence on the local filesystem of the shallow clone. If the file exists locally (perhaps from a different branch), it is processed without confirming it belongs to the requested Git state.

Additional context, logs, or screenshots

This issue addresses the existing TODO in src/gitingest/ingestion.py: "# TODO: We do this wrong! We should still check the branch and commit!"

Metadata

Labels

bug (Something isn't working)
