
Conversation

@andrewbranch
Member

@andrewbranch andrewbranch commented Dec 16, 2025

This PR introduces a completely redesigned auto-import collection infrastructure. As we ported Strada's auto-import code to Corsa, two things became clear:

  1. The cache infrastructure would need to be redesigned to work with the snapshot-based LSP.
  2. We were leaving significant parallelization opportunities on the table.

We wanted to address these issues before (or while) adding package.json-based dependency discovery, the main feature missing from the initial port of Strada's auto-imports code.

While the data collection backend has been completely redesigned, the fix generation code is mostly unchanged, although the files have been moved from internal/ls into internal/ls/autoimport.

I’m closing all existing bugs/crashes filed against this repo’s auto-imports implementation with this PR. Please comment if you have a bug that still repros the exact same way after this PR.

Strada's collection and cache infrastructure

The earliest version of auto-imports in TypeScript had no caching or separate collection phase. During every global completions request, we used the type checker to gather every export from every source file in the program. This was actually fairly fast, in part because we deferred computing the actual import module specifier until a completion item was highlighted. Eventually, we wanted to address two common complaints: (1) users wanted to see the import module specifier for every item in the completion list without having to highlight it first, and (2) we would often suggest imports from transitive dependencies, bloating the list with entries that should really never be imported. Computing module specifiers for every item in the list slowed things down enough that we implemented a caching layer to compensate. Since the cache included module specifiers and filtering based on what dependencies were accessible to the file requesting completions, much of the cache was specific to the requesting file, and needed to be rebuilt whenever the user started editing a different file.

Later, we added a very commonly requested feature: auto-imports from package.json dependencies. Since the data source for auto-imports was source files already in the program, a common complaint was that immediately after npm installing a dependency, auto-imports from that package wouldn't show up. Users would have to manually write an import from the new package once, then auto-imports would start appearing, even in other files. To address this without completely overhauling all the existing infrastructure, which was designed to use a type checker to extract exports from source files in a program, we added a separate program and type checker “in the background” (scare quotes because we couldn’t actually do anything in a separate process; we just added it onto project loading) containing entrypoints from package.json dependencies that were absent from the main program. Then, the language service could basically just run its existing auto-import collection code twice: once for the main program, and once for the package.json dependencies program. Since this had the potential to add significant amounts of work both to project loading and completions requests, we added a conservative bail-out if more than ten entrypoints were going to be loaded into the background program. This bail-out strategy was a big and imprecise hammer, since those ten entrypoints could trigger an arbitrarily large or small amount of work through transitive imports that would eventually be filtered out.

In short, auto-imports was due for a rewrite even before typescript-go.

New infrastructure

The primary innovations of the new system are:

Granular buckets

The new infrastructure stores exports in separate buckets that are independent of each other and independent of the requesting file. The autoimport.Registry contains one bucket for each project owning the editor's open files, plus one bucket for each node_modules directory visible to the open files. Project buckets contain exports from the project's local, non-node_modules files. This means that when switching between open files, any buckets that are already populated can be reused. When a project updates, all node_modules buckets are unaffected; when an npm install happens, only the node_modules directories that were touched need to be updated.
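A rough sketch of this bucket layout, in Go; all type and field names here are illustrative, not the actual autoimport.Registry API:

```go
package main

import "fmt"

// exportInfo is a stand-in for the data stored per export.
type exportInfo struct {
	Name string
}

// bucket holds exports from one source: a project's local files, or a
// single node_modules directory.
type bucket struct {
	exports []exportInfo
}

// registry keys buckets independently of the requesting file, so
// invalidating one source never touches the others.
type registry struct {
	projectBuckets     map[string]*bucket // keyed by project
	nodeModulesBuckets map[string]*bucket // keyed by node_modules directory
}

// invalidateNodeModules drops only the bucket for the touched
// node_modules directory; project buckets and every other node_modules
// bucket are reused as-is.
func (r *registry) invalidateNodeModules(dir string) {
	delete(r.nodeModulesBuckets, dir)
}

func main() {
	r := &registry{
		projectBuckets:     map[string]*bucket{"/app/tsconfig.json": {}},
		nodeModulesBuckets: map[string]*bucket{"/app/node_modules": {}},
	}
	r.invalidateNodeModules("/app/node_modules") // e.g. after an npm install
	fmt.Println(len(r.projectBuckets), len(r.nodeModulesBuckets)) // prints "1 0"
}
```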

Highly parallelized collection

Previously, collection of exports for package.json dependencies worked by creating a program containing all the entrypoints we care about, then using the type checker to extract exports from every source file in that program. The disadvantage of creating a program for this purpose is that we would resolve and parse transitive dependencies that have absolutely no impact on the exports we actually care about. In the new infrastructure, we instead create a mock program (autoimport.aliasResolver) for each dependency package, seeded with just the entrypoint files for that package. We then walk the exports of those entrypoints, and only use a checker when one of those exports is a re-export from another module. The result is that we only process exactly what we need, we limit our reliance on type checkers, and when we do need a type checker, the data backing it is much smaller, so we can create many checkers very cheaply.
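The per-package fan-out can be pictured like this (a toy sketch; collectPackage stands in for walking a package's entrypoint exports, and the real code only pulls in a type checker when it hits a re-export):

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collectPackage stands in for walking one package's entrypoint
// exports. In the real system this only consults a checker for
// re-exports from other modules.
func collectPackage(name string) []string {
	return []string{name + "/default"}
}

// collectAll processes every dependency package independently, so the
// work fans out across goroutines with no shared program.
func collectAll(packages []string) []string {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		all []string
	)
	for _, p := range packages {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			exports := collectPackage(p)
			mu.Lock()
			all = append(all, exports...)
			mu.Unlock()
		}(p)
	}
	wg.Wait()
	sort.Strings(all) // deterministic order for display
	return all
}

func main() {
	fmt.Println(collectAll([]string{"left-pad", "lodash"}))
}
```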

Pre-computed module specifiers for node_modules

Since the old infrastructure originally started from a program's source files, module specifier generation always went through the full process of “importing file name + destination file name => module specifier.” That process could be complicated for node_modules files, since the destination file could be the realpath of a symlinked package, or could have been a transitive import that’s not actually reachable from the importing file, or could be blocked by the package’s package.json "exports". Without knowing how that file got into the program, there were a dozen edge cases that needed to be considered to reverse engineer the correct module specifier, or even to determine whether a valid one exists at all. With the new directory-based buckets and extraction only from dependency entrypoints, we actually know the module specifier for every entrypoint as we find it, so we just store it alongside the export information.
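In data terms, the change amounts to storing the specifier next to the export at discovery time instead of deriving it per request. A hypothetical shape, not the actual struct:

```go
package main

import "fmt"

// entrypointExport pairs an export with the module specifier that was
// already known when the entrypoint was discovered, so completions
// never have to reverse engineer it from file paths.
type entrypointExport struct {
	Name            string
	ModuleSpecifier string // recorded while walking package.json "exports"
}

func main() {
	e := entrypointExport{Name: "flow", ModuleSpecifier: "lodash/fp"}
	fmt.Printf("import { %s } from %q\n", e.Name, e.ModuleSpecifier) // prints: import { flow } from "lodash/fp"
}
```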

Limitations / observable differences compared to Strada

  • Since we use sourceFile.Symbol.Exports instead of checker.GetExportsAndPropertiesOfModule, properties of export = assignments are not collected. Many of these properties were filtered out in the old logic because they were highly suspicious anyway (e.g. class instance methods, properties of arrays). There is a special syntactic carve-out for JavaScript module.exports = { foo, bar, ... } patterns, which are still collected. This limitation did not cause any new test failures.
  • For node_modules packages, since we only initialize the type checker with entrypoints from that package, re-exports of globals declared in other packages or non-entrypoint files of that package will not resolve, and so may lack proper categorization and deduplication with other exports. (This limitation does not apply to re-exports from ambient modules declared elsewhere, which get special handling.) Exporting a global is rare; I would be surprised if this limitation were ever noticed.
  • Since we only use a type checker when we encounter a re-export, some categorization features that eventually determine the icon for a completion entry may differ. In particular, export declare const foo: SomeType would appear as a function if SomeType declared call signatures, but now appears as a variable.
  • Previously, an unresolvable module augmentation would not contribute anything to auto-imports. Now it does, even though importing it will cause an error. This seems possibly desirable, since the user may fix the underlying module resolution issue.
  • For now, untyped dependency packages don't show up. Previously, we would parse JavaScript file entrypoints as a last resort. There are currently bugs in JavaScript reparsing that cause crashes, performance problems, and phantom exports when loading bundled JavaScript files, so I've disabled JS loading until those are addressed.
  • Previously, completions would usually avoid offering multiple ways of importing the same symbol by applying several ranking criteria, then falling back to an arbitrary symbol-based order. The same ranking criteria are still applied, but instead of falling back to a totally arbitrary order, same-ranked entries generate multiple completion items.
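The tie rule in the last bullet can be sketched like this (illustrative names; the real ranking uses several criteria, and "some-transitive-dep" is a made-up package):

```go
package main

import "fmt"

// candidate is one way of importing a symbol; lower Rank is better.
type candidate struct {
	Specifier string
	Rank      int
}

// pickImports keeps every candidate tied for the best rank instead of
// breaking the tie with an arbitrary order.
func pickImports(cands []candidate) []string {
	if len(cands) == 0 {
		return nil
	}
	best := cands[0].Rank
	for _, c := range cands[1:] {
		if c.Rank < best {
			best = c.Rank
		}
	}
	var out []string
	for _, c := range cands {
		if c.Rank == best {
			out = append(out, c.Specifier)
		}
	}
	return out
}

func main() {
	// Both tied entries become completion items; the worse-ranked
	// deep path is still dropped.
	fmt.Println(pickImports([]candidate{
		{"@typescript-eslint/utils", 1},
		{"@typescript-eslint/types", 1},
		{"some-transitive-dep/deep/path", 3},
	}))
}
```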

Performance

It's extremely hard to compare this PR to main and to Strada, because each has its own limitations that result in very different behavior, not just different timing characteristics:

  • Strada: volatile cache, package.json dependencies parsed on project load, more info cached on completions request
  • Corsa main: no cache, no package.json dependencies, everything done during completions
  • Corsa new: stable cache, package.json dependencies parsed on first edit, local module specifiers cached on completions request

That said, here are some numbers on two example projects.

Signal-Desktop

| | Project load | First edit | First completion | Subsequent completion | Number of entries |
| --- | --- | --- | --- | --- | --- |
| Strada | 301 ms | | 348 ms | 101 ms | 13,688 |
| Corsa main | | | 387 ms | 352 ms | 14,576 |
| Corsa PR | 0.25 ms | 129 ms | 99 ms | 73 ms | 13,589 |

VS Code

| | Project load | First edit | First completion | Subsequent completion | Number of entries |
| --- | --- | --- | --- | --- | --- |
| Strada | 318 ms | | | 95 ms | 15,864 |
| Corsa main | | | 434 ms | 398 ms | 15,851 |
| Corsa PR | 0.09 ms | 100 ms | 125 ms | 86 ms | 16,310 |

Follow-up work

  • Investigate including untyped packages after aforementioned JS reparsing bugs are fixed. Even after the fix, we may want some extra heuristics for avoiding minified or bundled files.
  • There’s currently no specific integration with automatic type acquisition. I think Strada had some special logic for this that’s not ported.
  • Import path string completions are not yet implemented (LSP: Path suggestion missing (import/export) #2383), and the node_modules registry buckets implemented here will be very helpful for simplifying the complex logic that exists in Strada.
  • completionItem/resolve is surprisingly slow (not just for auto-imports), and may be missing info like JSDoc that was previously(?) there for auto-imports.

Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a completely redesigned auto-import collection infrastructure with improved parallelization and caching. The main changes include:

  • Complete rewrite of auto-import data collection and caching infrastructure
  • New granular bucket-based system for storing exports (separate for projects and node_modules)
  • Pre-computed module specifiers for node_modules packages
  • Highly parallelized collection using mock programs instead of full type checking
  • Movement of auto-import code from internal/ls to internal/ls/autoimport

Reviewed changes

Copilot reviewed 186 out of 186 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
internal/tspath/extension.go Exports previously private extension lists and improves ChangeAnyExtension function
internal/testutil/projecttestutil/projecttestutil.go Adds SetupWithRealFS and ToPath utilities for testing
internal/testutil/autoimporttestutil/fixtures.go New comprehensive test fixtures for auto-import lifecycle testing
internal/project/snapshotfs.go Adds sourceFS type for tracking file access during compilation
internal/project/snapshot.go Integrates auto-import registry with snapshot lifecycle
internal/project/session.go Adds auto-import preparation and warming logic
internal/project/autoimport.go New host implementation for auto-import registry
internal/project/dirty/mapbuilder.go New utility for copy-on-write map updates
internal/project/dirty/map.go Adds Replace and TryDelete methods
internal/project/compilerhost.go Refactors to use sourceFS for file tracking
internal/parser/references.go Fixes ambient module name collection
internal/packagejson/*.go Adds utility methods for iterating dependencies
internal/modulespecifiers/*.go Extracts and exports utility functions, adds entrypoint processing
internal/module/*.go Moves shared utilities and adds entrypoint resolution
internal/lsp/server.go Adds auto-import request handler and logging
internal/lsp/lsproto/lsp_generated.go Updates auto-import protocol structures
internal/ls/*.go Major refactoring to use new auto-import infrastructure
internal/ls/autoimport/*.go New auto-import package with registry, view, and fix generation

@jakebailey
Member

> Removing devDependencies from the search should improve parsing/loading time a good bit!

IIRC this isn't desirable because then you won't get auto imports in things like build scripts or tests.

(I have not read the PR quite yet.)

@andrewbranch
Member Author

I think it’s desirable, because you edit build scripts a lot less than you edit your app code! This is how it always worked in Strada. Almost every single one of our users has typescript as a dependency, and polluting all of their projects with our hundreds of API exports doesn’t make any sense.

@jakebailey
Member

Ah, sure, I guess I'm not sure how I've been getting fairly good auto imports in say, Herebyfile

```go
	entrypointsMu.Unlock()

	aliasResolver := createAliasResolver(packageName, packageEntrypoints.Entrypoints)
	checker, _ := checker.NewChecker(aliasResolver)
```
Member


How often does this happen? Creating checkers is quite expensive, technically, so I assume not often

Member Author


The only thing expensive about creating checkers is global symbol merges, but part of the point of feeding it with the alias resolver is seeding the checker with only the package’s entrypoint files, so very very few globals if any (typically just ambient modules declared by the entrypoints). This is very fast. On my machine, creating fewer checkers than processors was measurably slower.

```go
}

func createCheckerPool(program checker.Program) (getChecker func() (*checker.Checker, func()), closePool func(), getCreatedCount func() int32) {
	maxSize := int32(runtime.GOMAXPROCS(0))
```
Member


Can we limit this? Or do these actually do so little as to not matter? On my machines that's something like 16 or 20 checkers, which would normally scare me

Member Author


Again, limiting this was measurably slower on my machine on projects I tried (including VS Code, which is a single large program)
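For context, a minimal sketch of a lazily grown, GOMAXPROCS-bounded pool in the style of createCheckerPool above; this is a hypothetical illustration (using a buffered channel as the free list), not the actual implementation:

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// newPool hands out resources (standing in for checkers) from a free
// list, creating new ones lazily until maxSize exist; further requests
// block until a resource is returned.
func newPool[T any](maxSize int32, create func() T) (get func() T, put func(T), created func() int32) {
	free := make(chan T, maxSize)
	var n atomic.Int32
	get = func() T {
		select {
		case v := <-free:
			return v // reuse an idle resource
		default:
		}
		if n.Add(1) <= maxSize {
			return create()
		}
		n.Add(-1)
		return <-free // at capacity; wait for one to be returned
	}
	put = func(v T) { free <- v }
	created = n.Load
	return
}

func main() {
	get, put, created := newPool(int32(runtime.GOMAXPROCS(0)), func() int { return 42 })
	v := get()
	put(v)
	fmt.Println(created()) // prints "1": only one resource was ever created
}
```

Creating at most GOMAXPROCS resources means a fully parallel walk never blocks on the pool, while sequential use only ever creates one.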

```go
	defer stop()

	if err := s.Run(ctx); err != nil {
		fmt.Fprintln(os.Stderr, err)
```
Member Author


I haven’t looked at #2336 yet, but this could be what’s missing to show a stack trace.

```go
}

func (c *Checker) errorNoModuleMemberSymbol(moduleSymbol *ast.Symbol, targetSymbol *ast.Symbol, node *ast.Node, name *ast.Node) {
	if c.compilerOptions.NoCheck.IsTrue() {
```
Member Author


Auto-imports uses checkers for minimal alias resolution, and these code paths go way down a rabbit hole hitting the file system and using the node builder to generate errors. Added some targeted short circuits.

Member Author


Most of the added failures differ in a way that I think is unobjectionable or even preferable. I mostly focused on not regressing, so I haven’t looked at all of the tests that were previously failing and still are. I think some follow-up to categorize/address all the remaining ones would be good.

Comment on lines +2161 to +2184
```go
	for condition, export := range exports.AsObject().Entries() {
		if excludeConditions != nil && excludeConditions.Has(condition) {
			continue
		}

		conditionAlwaysMatches := condition == "default" || condition == "types" || IsApplicableVersionedTypesKey(condition)
		newIncludeConditions := includeConditions
		if !(conditionAlwaysMatches) {
			newIncludeConditions = includeConditions.Clone()
			excludeConditions = excludeConditions.Clone()
			if newIncludeConditions == nil {
				newIncludeConditions = &collections.Set[string]{}
			}
			newIncludeConditions.Add(condition)
			for _, prevCondition := range prevConditions {
				if excludeConditions == nil {
					excludeConditions = &collections.Set[string]{}
				}
				excludeConditions.Add(prevCondition)
			}
		}

		prevConditions = append(prevConditions, condition)
		loadEntrypointsFromTargetExports(subpath, newIncludeConditions, excludeConditions, export)
```
Member Author


I think this is neat: in order to build a node_modules cache that works for any file in any project, regardless of customConditions or the file’s resolution mode, we walk through every conditional export of a package, and alongside the file each entrypoint corresponds to, we record the conditions you would need to have to get there (as we descend into a conditions object) and the conditions you would need not to have (as we iterate past a previous condition in the same object). Then, when actually computing completions, we can iterate through these candidate entrypoints and compare the include/exclude sets against the reality for the project and file being queried (ls/autoimport/specifiers.go). Example:

```json
{
  "exports": {
    ".": {
      "node": {
        "custom": "./custom.d.ts",
        "import": "./esm.d.mts",
        "default": "./cjs.d.cts"
      }
    }
  }
}
```
  • To offer auto-imports from custom.d.ts, the importing file must have node and custom
  • To offer auto-imports from esm.d.mts, the importing file must have node and import and must not have custom
  • To offer auto-imports from cjs.d.cts, the importing file must have node and must not have custom or import.

This is one of the strategy changes from Strada that lets us build a cache that can be used by more than one file.
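A self-contained sketch of that walk, simplified to the semantics of the three bullets above (no subpaths or versioned-types keys; all names are illustrative, not the real implementation):

```go
package main

import "fmt"

// condEntry is one condition -> target pair. Order matters during
// resolution, so we keep a slice rather than a map.
type condEntry struct {
	Cond   string
	Target any // a file path (string) or nested conditions ([]condEntry)
}

// entrypoint records a target file plus the conditions a resolver must
// have (Include) and must not have (Exclude) to reach it.
type entrypoint struct {
	Target           string
	Include, Exclude []string
}

func alwaysMatches(c string) bool { return c == "default" || c == "types" }

// walk descends a conditions object: entering a condition adds it to
// the include set, and earlier siblings that would have matched first
// go into the exclude set.
func walk(entries []condEntry, include, exclude []string, out *[]entrypoint) {
	var prev []string // earlier non-always siblings at this level
	for _, e := range entries {
		inc := include
		if !alwaysMatches(e.Cond) {
			inc = append(append([]string(nil), include...), e.Cond)
		}
		exc := append(append([]string(nil), exclude...), prev...)
		switch t := e.Target.(type) {
		case string:
			*out = append(*out, entrypoint{t, inc, exc})
		case []condEntry:
			walk(t, inc, exc, out)
		}
		if !alwaysMatches(e.Cond) {
			prev = append(prev, e.Cond)
		}
	}
}

func main() {
	// Mirrors the package.json example above.
	exports := []condEntry{{"node", []condEntry{
		{"custom", "./custom.d.ts"},
		{"import", "./esm.d.mts"},
		{"default", "./cjs.d.cts"},
	}}}
	var out []entrypoint
	walk(exports, nil, nil, &out)
	for _, ep := range out {
		fmt.Printf("%s include=%v exclude=%v\n", ep.Target, ep.Include, ep.Exclude)
	}
}
```

Running this reproduces the three bullets: custom.d.ts requires node and custom; esm.d.mts requires node and import but not custom; cjs.d.cts requires node but neither custom nor import.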

@andrewbranch
Member Author

andrewbranch commented Dec 17, 2025

> Ah, sure, I guess I'm not sure how I've been getting fairly good auto imports in say, Herebyfile

So, the functionality where you get imports from packages you've already imported even if they're not in your package.json dependencies is also preserved from Strada, so you'll get auto-imports for any of these things that are already present:

*(screenshot)*

Strada also seems to give you imports from the ATA cache, which at least for me, in Herebyfile, includes some pretty random stuff (like React) that I think shouldn’t be there. I think it may be including everything in the global typings cache rather than just what Herebyfile is actually pulling in? This was the case for both Strada and Corsa, so I actually just filtered those out in Corsa for now.

Including devDependencies also sounds like something that would be reasonable to expose as an option, especially considering you could combine that with targeted exclusions.

@jakebailey
Member

Tried this out, it's very fast!

In DefinitelyTyped-tools's packages/eslint-plugin/test/eslint.test.ts, I commented out the globSync import; when I then tried to auto import it again, globSync was not presented to me unless I had glob imported. I guess this is the devDep filter at work? The same happens to jest-file-snapshot. But, main also has this problem.

Commenting out normalizeSlashes nets me some extra import options that look undesirable:

*(screenshot)*

main only has the first one.

In packages/eslint-plugin/src/rules/no-import-of-dev-dependencies.ts, I commented out the import { TSESTree } from "@typescript-eslint/utils"; line. The auto import suggested both @typescript-eslint/types and @typescript-eslint/utils, which I think could be correct. Interestingly, main only suggests /types, which is worse.

@andrewbranch
Member Author

andrewbranch commented Dec 18, 2025

I’ll check out the example in the screenshot. The good one is probably from discovering the symlink in node_modules, while the bad ones probably originated from the realpath file in the program. (I’m not sure why the one with src is duplicated though; that’s kinda weird.) Interestingly, I think the solution is to disable the newly lit up symlink cache in module specifier generation for program-sourced files 🙃

> Interestingly, main only suggests /types, which is worse.

I forgot to mention this behavior difference I intentionally made, because it’s pretty obscure and in the weeds. In completions, we typically try to deduplicate re-exports of the same symbol and just pick the best one, mostly based on module specifier. When two module specifiers tie in ranking, like @typescript-eslint/utils and @typescript-eslint/types, we would fall back to program order, which IIRC ended up preferring the least deep file, or in other words the most-re-exported symbol. Since we no longer have program order as an input for node_modules files, I noticed some arbitrary swaps of these kinds of ties in test failures in the new system, so I changed the rules slightly:

  • re-exports from different package names are never deduped, always treated as separate
  • for re-exports from the same package (or no package, as local project files), we return multiple completions in the case where there's a tie for the winner, rather than falling back to depth or another totally arbitrary measure

@andrewbranch
Member Author

Hm, ok, yeah, symlinked monorepo packages are going to take more thought than I previously thought. I guess the good news is that the current failure mode is that you might get too many suggestions. The bad news is that the ones that come from node_modules might get stale as you edit your local packages, since we won't figure out from the watch event on the realpath that something stored in the node_modules bucket changed. Getting my thoughts out in writing since I may not have time to finish this out before I’m gone for the holidays:

Currently, node_modules packages that would normally be filtered out by your package.json dependencies, but are directly imported by your program files, are included like this: as we iterate over program files during project bucket building, if a file is in node_modules, we include it only when the node_modules bucket corresponding to that node_modules directory is not going to include it (we've already computed the set of dependency names for any node_modules buckets being built concurrently). If the node_modules bucket doesn't have that package in its dependencies, we just process the file as part of the project bucket. This has pros and cons:

  • Pro: the dependency that should arguably be filtered out is only offered in the project that's already importing it; it doesn't "leak" into any other projects you may have open.
  • Con: only the files that are already in the program become available for auto-import. If the package has a bunch of different entrypoints, e.g. like an icon library with a file per icon, importing "cool-icons/react/chevron-left" isn't going to make "cool-icons/react/chevron-right" appear, which is pretty annoying.
  • Con: since we didn't discover the file by entrypoint searching its package.json, it doesn't get the new fast pre-computed module specifier. It goes through the old module specifier generation code. Not a very big deal, but it makes me slightly nervous that the code paths are pretty different.

I think it might be a better idea to invert the current logic, and instead of precomputing the node_modules bucket dependencies so we can add extra node_modules files to project buckets, we should precompute the node_modules packages referenced by active programs and add those to the dependency list for those node_modules buckets. I think that will make everything else more consistent, and node_modules files will truly always be in node_modules buckets. The dependency computation is slightly more complicated, but there's not this weird exception to the rule of where exports are indexed.

Either way, when we go through the program files to find resolved node_modules packages, we need to check to see if non-node_modules file names are actually the realpath of a node_modules package so we can know to let the node_modules buckets handle them. Additionally, when we handle a file change event for one of those files, we need to mark the node_modules bucket dirty, not the project bucket dirty.
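The inverted rule could look roughly like this; these are hypothetical helper names, not code from the PR (classify assigns every node_modules file to a node_modules bucket and feeds program-discovered package names into that bucket's dependency list):

```go
package main

import (
	"fmt"
	"strings"
)

// packageNameFromPath extracts "pkg" or "@scope/pkg" from a path under
// node_modules, or returns "" for a local project file.
func packageNameFromPath(path string) string {
	i := strings.LastIndex(path, "/node_modules/")
	if i < 0 {
		return ""
	}
	parts := strings.SplitN(path[i+len("/node_modules/"):], "/", 3)
	if strings.HasPrefix(parts[0], "@") && len(parts) > 1 {
		return parts[0] + "/" + parts[1] // scoped package
	}
	return parts[0]
}

// classify implements the inverted rule: node_modules files always
// belong to a node_modules bucket, and the packages the program
// actually imports become extra dependency names for that bucket, even
// if package.json doesn't declare them.
func classify(programFiles []string) (projectFiles []string, extraDeps map[string]bool) {
	extraDeps = map[string]bool{}
	for _, f := range programFiles {
		if pkg := packageNameFromPath(f); pkg != "" {
			extraDeps[pkg] = true
		} else {
			projectFiles = append(projectFiles, f)
		}
	}
	return
}

func main() {
	project, deps := classify([]string{
		"/app/src/index.ts",
		"/app/node_modules/lodash/index.d.ts",
		"/app/node_modules/@types/node/fs.d.ts",
	})
	fmt.Println(project, deps)
}
```

A real version would also have to realpath-check non-node_modules file names, per the paragraph above, so symlinked packages route to the right bucket.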

@alexandr2110pro

Hey there. Reading the code and discussions here is a pleasure.

While you're working on this, maybe you can also check this little repo too.

It is a standard Nx ts-project-reference-based monorepo (this is their current default mode; previously it was based on compilerOptions.paths) with Next.js and a bunch of example libs.

I've created it to explore how Zed's ESLint and TS LSP integrations work, and it has trouble with autocompletions (and auto-import actions) of members exported from local packages. The vtsls logs show that the completions aren't returned, so it is probably not solely Zed's fault.
In my experience, VS Code struggles with this too, although it seems to work a bit better.

A little insight: if you have monorepo package A with files fooA.ts and barA.ts, and package B with files fooB.ts and barB.ts, then files in package A will not receive any suggestions for members of package B until you manually add an import from B into any file of A. E.g. import fooB into fooA; then both fooB.ts and barB.ts are present in completions in both fooA.ts and barA.ts.

If this problem has roots in the tsserver implementation, you could fix it in the new implementation and drastically improve DX for large Nx monorepos.
