Support typed Data.define members via RBS comment #865

Draft
julianojulio wants to merge 1 commit into master from ae-task-22-typed-data-define-members-via-rbs-commen

julianojulio commented Mar 23, 2026

Summary

Add support for typing Data.define members by propagating types from a sig on initialize to the member reader methods. This works with both RBS #: comments and traditional sig { } blocks.

Based on the typed Data.define approach originally designed by @cbothner in sorbet/sorbet#8079. This PR adds the RBS virtual initialize (zero runtime cost) and incorporates the conservative bare-super check from @jez's review feedback on that PR.

Addresses sorbet#10055

Two ways to type Data.define members

RBS comment (zero runtime cost — recommended)

The #: comment annotates a "virtual initialize" — no method is defined at runtime:

TypedPoint = Data.define(:x, :y) do
  #: (x: Integer, y: String) -> void
end

TypedPoint.new(x: 1, y: "hi").x  # revealed type: Integer (was T.untyped)

Traditional Sorbet sig (requires explicit initialize)

An explicit def initialize(x:, y:) = super is needed, which adds a small runtime cost (one extra method dispatch per construction):

TypedPoint = Data.define(:x, :y) do
  extend T::Sig
  sig { params(x: Integer, y: String).void }
  def initialize(x:, y:) = super
end

Both forms produce identical type-checking results. The RBS form is preferred because it has zero runtime overhead.
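The runtime difference between the two forms can be observed from plain Ruby. The sketch below assumes Ruby 3.2 or later (where Data.define exists). The names VirtualPoint, ExplicitPoint, and defines_own_initialize? are made up for illustration, and the explicit form carries an RBS comment rather than a sig so that the snippet does not depend on sorbet-runtime:

```ruby
# Assumes Ruby >= 3.2. All names below are hypothetical.

# RBS form: the block holds only a comment, so Ruby defines no extra method.
VirtualPoint = Data.define(:x, :y) do
  #: (x: Integer, y: String) -> void
end

# Explicit form: initialize becomes a real method on the new class.
ExplicitPoint = Data.define(:x, :y) do
  #: (x: Integer, y: String) -> void
  def initialize(x:, y:) = super
end

# initialize is always private, so check the class's own private methods
# (passing false excludes methods inherited from Data).
def defines_own_initialize?(klass)
  klass.private_instance_methods(false).include?(:initialize)
end

puts defines_own_initialize?(VirtualPoint)
puts defines_own_initialize?(ExplicitPoint)
```

This should print false for the virtual form and true for the explicit one, which is the extra dispatch the explicit form pays on every construction.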

Ambiguity with methods in the block

When the block also contains methods, a #: comment could be ambiguous — is it a virtual initialize signature or a signature for the method below it?

Point = Data.define(:x, :y) do
  #: (x: Integer, y: String) -> void  # virtual init? or sig for advance?

  def advance(x, y); end
end

The implementation resolves this using a gap heuristic: a #: comment is treated as a virtual initialize only if there is at least one blank line between it and the first method definition. If the comment is immediately above a def, it's treated as that method's signature.
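As a rough model of the gap heuristic, the rule can be written over the raw lines of a block body. This is only an illustration: the actual logic lives in C++ and works on Prism nodes and comment locations, and the helper name virtual_init_comment? is hypothetical. It also assumes that a body with no def at all counts as a virtual initialize:

```ruby
# Illustrative model only; not the C++ implementation.
# body_lines: the source lines inside a Data.define block.
def virtual_init_comment?(body_lines)
  lines = body_lines.map(&:strip)

  # Leading run of RBS comment lines (#: plus #| continuations).
  sig_len = lines.take_while { |l| l.start_with?("#:", "#|") }.length
  return false if sig_len.zero?

  rest = lines.drop(sig_len)
  first_def = rest.index { |l| l.start_with?("def ") }
  # Assumption: with no method after the comment, it can only be a virtual init.
  return true if first_def.nil?

  # Orphan only if a blank line separates the comment from the first def.
  rest.first(first_def).any?(&:empty?)
end

with_gap    = ["#: (x: Integer, y: String) -> void", "", "def advance(x, y); end"]
without_gap = ["#: (Integer, Integer) -> void", "def advance(x, y); end"]

puts virtual_init_comment?(with_gap)
puts virtual_init_comment?(without_gap)
```

The first call reports true (the comment is a virtual initialize signature) and the second reports false (the comment belongs to advance).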

However, for clarity, when a block contains both typed members and additional methods, we recommend using an explicit def initialize to remove all ambiguity:

Point = Data.define(:x, :y) do
  #: (x: Integer, y: String) -> void
  def initialize(x:, y:) = super

  #: (Integer, Integer) -> void
  def advance(x, y); end
end

This way each #: unambiguously attaches to the def below it. The virtual initialize (no explicit def) is best suited for simple Data classes with no additional methods.

Design decisions (informed by #8079 review)

Following @jez's principle that "the default assumption should be 'Sorbet doesn't do anything' and only if a certain set of very precise constraints are met should Sorbet do something":

  • Bare super required: Typed readers are only created when the initialize body is exactly bare super (i.e., def initialize(x:, y:) = super). When the user transforms values (e.g., super(x: x.to_i)), readers conservatively fall back to T.untyped. This keeps our options open for smarter analysis in the future.

  • self.[] left untyped: Sorbet cannot overload a method to accept both positional and keyword arguments, so only new/initialize is typed.

  • No partial typing: If the sig's param names don't match the Data.define members, standard RBS/sig parameter mismatch errors fire and readers fall back to T.untyped.
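A small runtime illustration of why the first two decisions are conservative, assuming Ruby 3.2 or later. The class names Passthrough and Coercing are hypothetical, and the RBS comments are inert at runtime:

```ruby
# Assumes Ruby >= 3.2. Class names are hypothetical.

# Bare super: the stored value is exactly what the caller passed,
# so the parameter type is also a sound reader type.
Passthrough = Data.define(:x) do
  #: (x: Integer) -> void
  def initialize(x:) = super
end

# Transforming super: the parameter is a String, but the stored value is an
# Integer, so reusing the parameter type for the reader would be wrong.
Coercing = Data.define(:x) do
  #: (x: String) -> void
  def initialize(x:)
    super(x: x.to_i)
  end
end

p Passthrough.new(x: 1).x.class
p Coercing.new(x: "42").x.class

# self.[] accepts both positional and keyword arguments for the same member,
# which is the call shape a single Sorbet signature cannot describe.
p Passthrough[1] == Passthrough[x: 1]
```

Both readers return an Integer at runtime even though Coercing declares its parameter as String, and the two bracket calls build equal values.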

Approach

  • rbs/prism/CommentsAssociatorPrism.cc: maybeExtractDataDefineOrphanComments — for Data.define blocks without an explicit def initialize, extracts leading orphan #: comments and associates them with the Prism block node. Uses a gap heuristic: a comment is "orphan" (for virtual init) only if there's at least one blank line before the first def in the body, or the body is empty.

  • rbs/prism/SigsRewriterPrism.cc: maybeSynthesizeDataDefineVirtualInit — for Data.define blocks with an associated orphan signature, synthesizes a Prism pm_def_node_t for initialize (with keyword args + bare super body) and translates the RBS comment into a sig node.

  • rewriter/Data.cc: isBareSuper + findSigBeforeInitialize + extractTypesFromParams + extractInitializeTypes — when the block contains a sig preceding a bare-super def initialize, extracts the parameter types and creates typed reader stubs with sig { returns(Type) }.
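The decision made by the last step can be summarized with a toy model. Everything here is hypothetical (the reader_types helper, the UNTYPED constant, and the use of strings for types); the real rewriter operates on Sorbet's AST in C++. The sketch also assumes that "matching" parameter names means the same set of names as the Data.define members:

```ruby
# Toy model of the reader-typing decision; not the C++ implementation.
UNTYPED = "T.untyped"

# members:    member names passed to Data.define, as symbols
# sig_params: parameter name => type (as a string) taken from the sig
# bare_super: whether the initialize body is exactly bare super
def reader_types(members:, sig_params: {}, bare_super: false)
  # Assumption: names "match" when both sides contain the same set of names.
  typed = bare_super && sig_params.keys.sort == members.sort
  members.to_h { |m| [m, typed ? sig_params.fetch(m) : UNTYPED] }
end

sig = { x: "Integer", y: "String" }

p reader_types(members: [:x, :y], sig_params: sig, bare_super: true)
p reader_types(members: [:x, :y], sig_params: sig, bare_super: false)
p reader_types(members: [:x, :z], sig_params: sig, bare_super: true)
```

Only the first call yields typed readers. The other two fall back to T.untyped for every member, mirroring the "no partial typing" rule.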

Test coverage

RBS tests (test/testdata/rbs/signatures_data_define.rb):

  • Virtual initialize (zero-cost), explicit initialize with defaults
  • Additional methods alongside virtual init, non-method stmts (include)
  • ::Data.define, untyped fallback, mismatched param names
  • Complex types: T.nilable, T.any, T::Array, T::Hash, T::Boolean
  • Many members (5+), nested in modules
  • Wrong type / missing kwarg error validation, T.assert_type!

Sorbet sig tests (test/testdata/rewriter/data.rb):

  • Basic typed members, complex types (T::Array, T.nilable)
  • Additional methods, non-bare-super → untyped fallback

Limitations

  • Multiline #| continuation: Supported in code (the orphan extraction collects both #: and #| prefixed comments) but not tested, because Sorbet's test assertion parser interprets #| lines containing colons as test annotations (e.g., #| name: String is parsed as an assertion of type name).
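For reference, a multiline virtual initialize would presumably look like the following. This is an untested sketch based on the #: and #| prefixes described above; the Person class and its members are hypothetical, and Ruby 3.2 or later is assumed:

```ruby
# Assumes Ruby >= 3.2. Hypothetical example of a multiline RBS signature
# acting as a virtual initialize; the comments have no runtime effect.
Person = Data.define(:name, :age) do
  #: (
  #|   name: String,
  #|   age: Integer
  #| ) -> void
end

person = Person.new(name: "Ada", age: 36)
puts person.name
puts person.age
```

At runtime this behaves like any other Data class; whether the readers are typed depends on the orphan extraction handling the #| lines as described.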

Co-authored-by: Cameron Bothner <cbothner@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>

julianojulio force-pushed the ae-task-22-typed-data-define-members-via-rbs-commen branch 7 times, most recently from 0cb51f6 to 905e0e9 (March 24, 2026 01:59)
julianojulio force-pushed the ae-task-22-typed-data-define-members-via-rbs-commen branch from 905e0e9 to 69efa7e (May 21, 2026 16:28)
julianojulio force-pushed the ae-task-22-typed-data-define-members-via-rbs-commen branch from 69efa7e to bd746d6 (May 29, 2026 22:53)
Add support for typing Data.define members by propagating types from
a sig on initialize to the member reader methods. This works with
both RBS #: comments and traditional sig { } blocks.

For the RBS case, a virtual initialize is synthesized from the #:
comment — zero runtime cost, no method defined:

  TypedPoint = Data.define(:x, :y) do
    #: (x: Integer, y: String) -> void
  end

For the Sorbet sig case, an explicit def initialize is required:

  TypedPoint = Data.define(:x, :y) do
    extend T::Sig
    sig { params(x: Integer, y: String).void }
    def initialize(x:, y:) = super
  end

Typed readers are only created when the initialize body is exactly
bare super, ensuring the sig types reliably match the stored values.
When the user transforms values (e.g. super(x: x.to_i)), readers
conservatively fall back to T.untyped.

Three pipeline components are changed:

CommentsAssociator: extracts orphan #: comments in Data.define
blocks and associates them with the Block node.

SigsRewriter: synthesizes a virtual def initialize with keyword
args matching the define members from the orphan RBS signature.

Data rewriter: when the block contains a sig preceding a bare-super
def initialize, extracts parameter types and creates typed reader
stubs with sig { returns(Type) }.

Based on the typed Data.define approach originally designed by
Cameron Bothner in sorbet#8079.

Co-authored-by: Cameron Bothner <cbothner@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
julianojulio force-pushed the ae-task-22-typed-data-define-members-via-rbs-commen branch from bd746d6 to d5a324a (May 29, 2026 23:07)