feat: gdrive export and encryption service integration #5250

Open
Sentiaus wants to merge 13 commits into apache:main from Sentiaus:gdrive/backend
Conversation

@Sentiaus
Contributor

What changes were proposed in this PR?

Adds the backend required for Google Drive OAuth integration.

Schema changes: Adds a new user_oauth_token table (sql/updates/23.sql) to store encrypted OAuth tokens per provider. The provider column (google_drive, etc.) is intentionally generic so future integrations (AWS, Microsoft) can reuse the same table without a schema change. The auth blob is stored as a JWE-encrypted JSON string rather than a raw token.

Token encryption: Adds TokenEncryptionService using jose4j AES-256-GCM (DIRECT key management) to encrypt auth blobs at rest. The encryption key is read from auth.encryption.256-bit-secret in auth.conf, with AUTH_ENCRYPTION_SECRET as the env-var override. This follows the same pattern as the existing JWT secret key.
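
As a rough illustration of what "AES-256-GCM with a directly supplied key" means for the auth blob, the sketch below encrypts and decrypts a string with the JDK's `javax.crypto` classes. It is not the PR's `TokenEncryptionService`: that class uses jose4j and produces a JWE, whereas this sketch uses a simpler `Base64(iv || ciphertext || tag)` layout. The object name `AuthBlobCryptoSketch` and the output format are assumptions made for the example.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.security.SecureRandom
import java.util.Base64
import javax.crypto.Cipher
import javax.crypto.spec.{GCMParameterSpec, SecretKeySpec}

// Illustrative only: plain JDK AES-256-GCM, not the jose4j-based service in this PR.
object AuthBlobCryptoSketch {
  private val IvBytes = 12  // 96-bit nonce, the usual size for GCM
  private val TagBits = 128 // authentication tag length in bits
  private val random = new SecureRandom()

  /** Encrypts `plaintext` with a 32-byte key; returns Base64(iv || ciphertext || tag). */
  def encrypt(plaintext: String, key: Array[Byte]): String = {
    require(key.length == 32, "AES-256 needs a 32-byte key")
    val iv = new Array[Byte](IvBytes)
    random.nextBytes(iv) // a fresh nonce per message; never reuse one with the same key
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(TagBits, iv))
    val ciphertextAndTag = cipher.doFinal(plaintext.getBytes(UTF_8))
    Base64.getEncoder.encodeToString(iv ++ ciphertextAndTag)
  }

  /** Reverses `encrypt`; throws if the key is wrong or the data was tampered with. */
  def decrypt(encoded: String, key: Array[Byte]): String = {
    require(key.length == 32, "AES-256 needs a 32-byte key")
    val bytes = Base64.getDecoder.decode(encoded)
    require(bytes.length > IvBytes, "ciphertext too short")
    val (iv, ciphertextAndTag) = bytes.splitAt(IvBytes)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(TagBits, iv))
    new String(cipher.doFinal(ciphertextAndTag), UTF_8)
  }
}
```

Because GCM is authenticated, decrypting with the wrong key or a modified ciphertext fails with an exception instead of returning garbage, which is the same property the invalid-input unit test in this PR appears to exercise.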

New endpoints — GoogleDriveAuthResource:

GET /api/auth/google/drive/connect — Returns a Google OAuth authorization URL for the frontend to open in a popup. Accepts a reauth query param; when true, sets prompt=consent to force Google to re-issue a refresh token (used when a previous token has returned invalid_grant). Requires REGULAR or ADMIN role.

GET /api/auth/google/drive/callback — Called by Google's OAuth redirect. Not role-gated (no Authorization header is present on a browser redirect). Authenticates the user via a short-lived JWT in the state query parameter, exchanges the code for tokens, encrypts the auth blob, and upserts into user_oauth_token.

GET /api/auth/google/drive/token — Decrypts the stored auth blob, uses the refresh token to fetch a short-lived access token from Google, and returns it to the frontend. Returns no_refresh_token if no record exists, or invalid_grant if Google rejects the refresh token. Requires REGULAR or ADMIN role.

GET /api/auth/google/config — Exposes clientId and redirectUri to the frontend so the Drive service doesn't need to hardcode them.
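
To make the `/connect` behaviour described above more concrete, here is a minimal sketch of how an authorization URL with the optional `prompt=consent` parameter could be assembled. The endpoint and parameter names are Google's standard OAuth 2.0 web-server flow; the object name `DriveAuthUrlSketch`, the `build` signature, and the `drive.file` scope are assumptions, since the PR description does not say which scope is requested.

```scala
import java.net.URLEncoder

// Hypothetical helper, not code from this PR.
object DriveAuthUrlSketch {
  private val AuthEndpoint = "https://accounts.google.com/o/oauth2/v2/auth"
  // Assumed scope; the PR description does not state which Drive scope it uses.
  private val Scope = "https://www.googleapis.com/auth/drive.file"

  def build(clientId: String, redirectUri: String, state: String, reauth: Boolean): String = {
    val base = Seq(
      "client_id" -> clientId,
      "redirect_uri" -> redirectUri,
      "response_type" -> "code",
      "scope" -> Scope,
      "access_type" -> "offline", // needed for Google to issue a refresh token
      "state" -> state
    )
    // prompt=consent forces the consent screen so a new refresh token is issued.
    val params = if (reauth) base :+ ("prompt" -> "consent") else base
    val query = params
      .map { case (k, v) => s"$k=${URLEncoder.encode(v, "UTF-8")}" }
      .mkString("&")
    s"$AuthEndpoint?$query"
  }
}
```

A call such as `DriveAuthUrlSketch.build(clientId, redirectUri, state, reauth = true)` yields the URL the frontend would open in a popup after a previous token returned `invalid_grant`.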

Config: Adds google.client-id, google.client-secret, and app-domain to UserSystemConfig and user-system.conf. These must be configured on the Texera GCP project before Drive integration will work.

Any related issues, documentation, discussions?

Closes #4240 (partial — frontend in follow-up PRs)

Google Documentation to enable Google Picker: https://developers.google.com/workspace/drive/picker/guides/overview

How was this PR tested?

  • sbt "Auth/testOnly org.apache.texera.auth.TokenEncryptionServiceSpec" — 2 unit tests covering encrypt/decrypt round-trip and invalid-input error case
  • Backend compiles cleanly: sbt amber/compile
  • The /callback endpoint was tested manually via the full OAuth flow in a local dev environment

Was this PR authored or co-authored using generative AI tooling?

Commit messages and some implementation co-authored with Claude Sonnet 4.6

Sentiaus and others added 3 commits May 26, 2026 23:59
… DB schema

- Add user_oauth_token table to store encrypted OAuth refresh tokens per provider
- Add TokenEncryptionService using jose4j AES-256-GCM for encrypting auth blobs
- Add AuthConfig.encryptionSecretKey reading from auth.encryption.256-bit-secret
- Add GoogleDriveAuthResource with /connect, /callback, and /token endpoints
- Add GoogleAuthResource config endpoint exposing client ID and redirect URI
- Add DriveTokenIssueResponse and GoogleAuthConfigResponse HTTP models
- Wire GoogleDriveAuthResource into TexeraWebApplication and GuestAuthFilter
- Add google.client-id, client-secret, and app-domain to UserSystemConfig
- Update k8s values with new config keys

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nd error case

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov-commenter

codecov-commenter commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.40%. Comparing base (34be37d) to head (6a4aa12).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5250      +/-   ##
============================================
- Coverage     49.62%   49.40%   -0.23%     
+ Complexity     2384     2381       -3     
============================================
  Files          1051     1051              
  Lines         40399    40399              
  Branches       4292     4292              
============================================
- Hits          20050    19960      -90     
- Misses        19165    19260      +95     
+ Partials       1184     1179       -5     
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| access-control-service | 41.89% <ø> (ø) | |
| agent-service | 33.76% <ø> (ø) | Carriedforward from e650919 |
| amber | 51.63% <ø> (-0.03%) ⬇️ | Carriedforward from e650919 |
| computing-unit-managing-service | 0.00% <ø> (ø) | |
| config-service | 0.00% <ø> (ø) | |
| file-service | 38.42% <ø> (ø) | |
| frontend | 41.71% <ø> (-0.52%) ⬇️ | Carriedforward from e650919 |
| python | 90.80% <ø> (ø) | Carriedforward from e650919 |
| workflow-compiling-service | 56.81% <ø> (ø) | |

*This pull request uses carry forward flags.

@Sentiaus changed the title from "feat: GDrive Backend integration" to "feat: gdrive export and encryption service integration" on May 27, 2026
@Sentiaus mentioned this pull request on May 27, 2026
5 tasks
@chenlica requested a review from xuang7 on May 28, 2026 00:12
Contributor

@xuang7 xuang7 left a comment

Thanks for the PR! Left a few comments. Please follow the formatting instructions in the
contributing guide and fix the formatting issues.

Comment thread amber/src/main/scala/org/apache/texera/web/auth/GuestAuthFilter.scala Outdated
@QueryParam("reauth") @DefaultValue("false") reauth: Boolean
): Response = {
val user = sessionUser.getUser
val state = JwtAuth.jwtToken(jwtClaims(user, TOKEN_EXPIRE_TIME_IN_MINUTES))
Contributor

We should avoid using the normal session JWT as the OAuth state. Since it is still a valid login token before expiration, it may be safer to use a dedicated short-lived OAuth state token instead.

Comment thread bin/k8s/values-development.yaml Outdated
Comment thread bin/k8s/values.yaml Outdated
Comment thread common/config/src/main/resources/auth.conf Outdated
Comment thread common/config/src/main/scala/org/apache/texera/config/AuthConfig.scala Outdated
Comment thread sql/updates/23.sql

try {
val blob = mapper.readTree(TokenEncryptionService.decrypt(record.getAuthBlob))
val refreshToken = blob.get("refreshToken").asText()
Contributor

Suggest using path("refreshToken").asText("") here to avoid a possible NPE when the field is missing.

Contributor Author

@xuang7 I can add this, but since this is wrapped in a try-catch, I think the resulting error is fine and better defined than getting "", sending a request to Google, and getting an error back from there.

Contributor

That makes sense. I agree we should not silently send an empty refresh token to Google. Maybe we can use path("refreshToken").asText("") to avoid the possible NPE, then explicitly check whether the token is empty and return no_refresh_token locally before making the Google request.
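
The behaviour agreed on in this thread (treat a missing or empty refresh token as `no_refresh_token` before any request goes to Google) could look roughly like the sketch below. To stay self-contained it models the decrypted blob as a plain `Map[String, String]` rather than the Jackson `JsonNode` the PR uses; the object and method names are hypothetical.

```scala
// Hypothetical helper; the PR itself reads the field from a Jackson JsonNode.
object RefreshTokenCheckSketch {
  /** Right(token) when a non-blank refresh token is present, Left("no_refresh_token") otherwise. */
  def refreshTokenOrError(blob: Map[String, String]): Either[String, String] =
    blob
      .get("refreshToken")   // None when the field is missing
      .map(_.trim)
      .filter(_.nonEmpty)    // treat "" the same as a missing field
      .toRight("no_refresh_token")
}
```

With Jackson, `blob.path("refreshToken").asText("")` followed by an `isEmpty` check would play the same role: the lookup cannot throw a `NullPointerException`, and the empty case is handled locally instead of being forwarded to Google.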

Sentiaus and others added 2 commits May 28, 2026 00:31
…ogleDriveAuthResource

OAuth state is now a UUID stored in a ConcurrentHashMap with a 10-minute TTL,
consumed exactly once on callback. Removes JwtParser/JwtAuth dependency from
the Drive resource and avoids encoding user info in the callback URL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
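
The commit message above describes a UUID state kept in a `ConcurrentHashMap` with a 10-minute TTL and consumed exactly once. A minimal sketch of that idea follows; it is not the code from the commit, and the names `OAuthStateStoreSketch`, `issue`, `consume`, and the `Int` user id are assumptions made for illustration.

```scala
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

// Hypothetical one-time OAuth state store, not the PR's implementation.
object OAuthStateStoreSketch {
  private final case class Pending(userId: Int, expiresAtMillis: Long)

  private val TtlMillis = 10 * 60 * 1000L // 10-minute lifetime
  private val pending = new ConcurrentHashMap[String, Pending]()

  /** Creates an opaque state value bound to `userId`; nothing about the user is encoded in it. */
  def issue(userId: Int, nowMillis: Long = System.currentTimeMillis()): String = {
    val state = UUID.randomUUID().toString
    pending.put(state, Pending(userId, nowMillis + TtlMillis))
    state
  }

  /** Atomically removes the state, so a second callback with the same value gets None. */
  def consume(state: String, nowMillis: Long = System.currentTimeMillis()): Option[Int] =
    Option(pending.remove(state))
      .filter(_.expiresAtMillis > nowMillis) // expired entries are rejected
      .map(_.userId)
}
```

`ConcurrentHashMap.remove` returns the previous value atomically, which is what gives the single-use guarantee under concurrent callbacks. A sketch like this never evicts states that are issued but never consumed, so a real service would also need a periodic sweep, and an in-memory map only works when the callback reaches the same instance that issued the state.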
@github-actions github-actions Bot removed the dev label May 28, 2026
xuang7 and others added 5 commits May 28, 2026 13:32
Removed random secret key for eSecretKey
Added default asText("") to avoid NPE
…_token

- Add DELETE /api/auth/google/drive/disconnect to remove stored OAuth token
- Add created_at and updated_at columns to user_oauth_token table
- Set updated_at on token refresh in callback

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

@xuang7 xuang7 left a comment

LGTM overall! Left one comment. Could you add a short reply to each resolved previous comment so we can better track the update history? Also, please add the new table to sql/texera_ddl.sql. That should unblock the CI failure. @Sentiaus

Could you help review this PR as well? @aicam

.filter(r => r.getProvider == PROVIDER_GOOGLE_DRIVE)
.findFirst()

if (existing.isPresent) {
Contributor

Disconnect currently only removes the local DB row, but it does not revoke the grant on Google’s side. Would it make sense to also revoke the token with Google before deleting the local row? That would better match the expected “Disconnect” behavior.
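
For reference, revoking the grant as suggested here amounts to one form-encoded POST to Google's OAuth 2.0 revocation endpoint. The sketch below only builds that request with the JDK 11+ `java.net.http` API and does not send it; the object and method names are hypothetical, and how the PR would wire this into the disconnect handler is not specified.

```scala
import java.net.{URI, URLEncoder}
import java.net.http.HttpRequest

// Hypothetical helper; builds the revocation request without sending it.
object DriveRevokeSketch {
  private val RevokeEndpoint = "https://oauth2.googleapis.com/revoke"

  def revokeRequest(refreshToken: String): HttpRequest = {
    // The token travels in the form body, URL-encoded.
    val form = "token=" + URLEncoder.encode(refreshToken, "UTF-8")
    HttpRequest
      .newBuilder(URI.create(RevokeEndpoint))
      .header("Content-Type", "application/x-www-form-urlencoded")
      .POST(HttpRequest.BodyPublishers.ofString(form))
      .build()
  }
}
```

The request could then be sent with `java.net.http.HttpClient` before the local row is deleted. One design question left open is whether a failed revocation (for example, a token Google already considers invalid) should block the local delete or merely be logged.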

xuang7 and others added 2 commits May 30, 2026 21:56
@chenlica
Contributor

chenlica commented Jun 1, 2026

@xuang7 Before the merge, I would like another member to review this PR.

@carloea2 Can you do it?

@Sentiaus In the description, can you also provide an overview of the design?

Labels

common ddl-change Changes to the TexeraDB DDL engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Export to external storage

4 participants