Skip to content

Feature/hubsot handler updates#12046

Closed
tino097 wants to merge 5 commits into
mindsdb:releases/25.14.0from
tino097:feature/hubsot-handler-updates
Closed

Feature/hubsot handler updates#12046
tino097 wants to merge 5 commits into
mindsdb:releases/25.14.0from
tino097:feature/hubsot-handler-updates

Conversation

@tino097

@tino097 tino097 commented Dec 30, 2025

Copy link
Copy Markdown
Contributor

Description

HubSpot integration updates

(Please delete options that are not relevant)

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ⚡ New feature (non-breaking change which adds functionality)
  • 📢 Breaking change (fix or feature that would cause existing functionality not to work as expected)
  • 📄 This change requires a documentation update

Verification Process

To ensure the changes are working as expected:

  • Test Location: Specify the URL or path for testing.
  • Verification Steps: Outline the steps or queries needed to validate the change. Include any data, configurations, or actions required to reproduce or see the new functionality.

Additional Media:

  • I have attached a brief loom video or screenshots showcasing the new functionality or change.

Checklist:

  • My code follows the style guidelines(PEP 8) of MindsDB.
  • I have appropriately commented on my code, especially in complex areas.
  • Necessary documentation updates are either made or tracked in issues.
  • Relevant unit and integration tests are updated or added.

Comment on lines 34 to 40
("lastname", "VARCHAR", "Last name"),
("phone", "VARCHAR", "Phone number"),
("company", "VARCHAR", "Associated company"),
("city", "VARCHAR", "City"),
("website", "VARCHAR", "Website URL"),
("createdate", "TIMESTAMP", "Creation date"),
("lastmodifieddate", "TIMESTAMP", "Last modification date"),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: Add "city" to the searchable_columns set in _search_contacts_by_conditions call (line ~680). Without it, _build_hubspot_search_filters returns None for city-based WHERE conditions, forcing fallback to the slower get_all API instead of the optimized search API.

🤖 AI Agent Prompt for Cursor/Windsurf

📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue

File: mindsdb/integrations/handlers/hubspot_handler/hubspot_tables.py
Lines: ~580-585 (in the `get_contacts` method)

Problem: The `city` column was added to the contacts schema (line 37), but it's missing from the searchable columns set used by `_build_hubspot_search_filters`. This causes city-based WHERE conditions to bypass the optimized search API and fall back to the slower `get_all` method.

Fix: Update the searchable columns set in the `get_contacts` method to include `"city"` (and verify if `"website"` should also be searchable). Locate the call to `_build_hubspot_search_filters` around line 580-585 and add `"city"` to the set: `{"email", "id", "firstname", "lastname", "phone", "company", "city", "website"}`.
✨ Committable Code Suggestion

💡 This is a one-click fix! Click "Commit suggestion" to apply this change directly to your branch.

Suggested change
("lastname", "VARCHAR", "Last name"),
("phone", "VARCHAR", "Phone number"),
("company", "VARCHAR", "Associated company"),
("city", "VARCHAR", "City"),
("website", "VARCHAR", "Website URL"),
("createdate", "TIMESTAMP", "Creation date"),
("lastmodifieddate", "TIMESTAMP", "Last modification date"),
("firstname", "VARCHAR", "First name"),
("lastname", "VARCHAR", "Last name"),
("phone", "VARCHAR", "Phone number"),
("company", "VARCHAR", "Associated company"),
("city", "VARCHAR", "City"),
("website", "VARCHAR", "Website URL"),
("createdate", "TIMESTAMP", "Creation date"),
("lastmodifieddate", "TIMESTAMP", "Last modification date"),

Comment on lines 220 to 233
self.create_companies(company_data)

def modify(self, conditions: List[FilterCondition], values: Dict) -> None:
companies_df = pd.json_normalize(self.get_companies(limit=1000))
normalized_conditions = _normalize_filter_conditions(conditions)
companies_df = pd.json_normalize(self.get_companies(limit=200, where_conditions=normalized_conditions))

if companies_df.empty:
raise ValueError(
"No companies retrieved from HubSpot to evaluate update conditions. Verify your connection and permissions."
)

normalized_conditions = _normalize_filter_conditions(conditions)
update_query_executor = UPDATEQueryExecutor(companies_df, normalized_conditions)
filtered_df = update_query_executor.execute_query()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: modify() passes limit=200 to get_companies(), truncating results before UPDATEQueryExecutor filters. Companies beyond 200 matching where_conditions are silently excluded from update. Remove the hardcoded limit or set limit=None to retrieve all matching records via pagination.

🤖 AI Agent Prompt for Cursor/Windsurf

📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue

File: mindsdb/integrations/handlers/hubspot_handler/hubspot_tables.py
Lines: 223

Problem: The `modify()` method calls `get_companies(limit=200, where_conditions=normalized_conditions)` which artificially caps results at 200 records. When more than 200 companies match the filter conditions, only the first 200 will be retrieved and updated — the rest are silently ignored, causing data inconsistency.

Fix Instructions:
1. Remove the `limit=200` parameter from the `get_companies()` call on line 223
2. Let `get_companies()` retrieve all matching records (it already handles pagination internally via `_execute_hubspot_search` when conditions are provided)
3. The `UPDATEQueryExecutor` will then correctly filter the complete dataset
4. Verify the same issue doesn't exist in `remove()` method (line 246) and `ContactsTable.modify()` (line 434) and `DealsTable.modify()` (line 634) — apply the same fix if present

Expected result: All companies matching the WHERE conditions will be updated, regardless of count.
✨ Committable Code Suggestion

💡 This is a one-click fix! Click "Commit suggestion" to apply this change directly to your branch.

Suggested change
self.create_companies(company_data)
def modify(self, conditions: List[FilterCondition], values: Dict) -> None:
companies_df = pd.json_normalize(self.get_companies(limit=1000))
normalized_conditions = _normalize_filter_conditions(conditions)
companies_df = pd.json_normalize(self.get_companies(limit=200, where_conditions=normalized_conditions))
if companies_df.empty:
raise ValueError(
"No companies retrieved from HubSpot to evaluate update conditions. Verify your connection and permissions."
)
normalized_conditions = _normalize_filter_conditions(conditions)
update_query_executor = UPDATEQueryExecutor(companies_df, normalized_conditions)
filtered_df = update_query_executor.execute_query()
def add(self, company_data: List[dict]):
self.create_companies(company_data)
def modify(self, conditions: List[FilterCondition], values: Dict) -> None:
normalized_conditions = _normalize_filter_conditions(conditions)
companies_df = pd.json_normalize(self.get_companies(where_conditions=normalized_conditions))
if companies_df.empty:
raise ValueError(
"No companies retrieved from HubSpot to evaluate update conditions. Verify your connection and permissions."
)
update_query_executor = UPDATEQueryExecutor(companies_df, normalized_conditions)
filtered_df = update_query_executor.execute_query()

Comment on lines 474 to 480

def remove(self, conditions: List[FilterCondition]) -> None:
where_conditions = _normalize_filter_conditions(conditions)
contacts_df = pd.json_normalize(self.get_contacts(limit=1000, where_conditions=where_conditions))
contacts_df = pd.json_normalize(self.get_contacts(limit=200, where_conditions=where_conditions))

if contacts_df.empty:
raise ValueError(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: [hubspot_tables.py:467] remove() fetches only 200 contacts with limit=200. If >200 contacts match where_conditions, only the first 200 are deleted, silently leaving the rest. Remove the limit or fetch all matching records before filtering.

🤖 AI Agent Prompt for Cursor/Windsurf

📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue

File: mindsdb/integrations/handlers/hubspot_handler/hubspot_tables.py
Lines: 474-476

Problem: The `remove()` method in `ContactsTable` class uses `limit=200` when fetching contacts to evaluate delete conditions. This causes incomplete deletions when more than 200 contacts match the WHERE clause - only the first 200 are deleted while the rest are silently ignored.

Fix Required:
1. Change `limit=200` to `limit=None` in the `get_contacts()` call on line 475
2. Apply the same fix to the `modify()` method on line 461 (same issue)
3. Review and fix identical patterns in `CompaniesTable.remove()` (line ~304) and `CompaniesTable.modify()` (line ~283)
4. Review and fix identical patterns in `DealsTable.remove()` and `DealsTable.modify()` methods

Rationale: When evaluating conditions for DELETE or UPDATE operations, we must retrieve ALL matching records, not just a subset. The limit should only apply to read operations (SELECT), not to condition evaluation for write operations. Using `limit=None` ensures complete data retrieval for accurate condition matching.
✨ Committable Code Suggestion

💡 This is a one-click fix! Click "Commit suggestion" to apply this change directly to your branch.

Suggested change
def remove(self, conditions: List[FilterCondition]) -> None:
where_conditions = _normalize_filter_conditions(conditions)
contacts_df = pd.json_normalize(self.get_contacts(limit=1000, where_conditions=where_conditions))
contacts_df = pd.json_normalize(self.get_contacts(limit=200, where_conditions=where_conditions))
if contacts_df.empty:
raise ValueError(
self.update_contacts(contact_ids, values)
def remove(self, conditions: List[FilterCondition]) -> None:
where_conditions = _normalize_filter_conditions(conditions)
contacts_df = pd.json_normalize(self.get_contacts(limit=None, where_conditions=where_conditions))
if contacts_df.empty:
raise ValueError(

Comment on lines 589 to 595
"lastname": properties.get("lastname"),
"phone": properties.get("phone"),
"company": properties.get("company"),
"city": properties.get("city"),
"website": properties.get("website"),
"createdate": properties.get("createdate"),
"lastmodifieddate": properties.get("lastmodifieddate"),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: [hubspot_tables.py:543] Add "city" to default_properties in get_contacts(). Without it, get_all() returns None for city while search API returns actual values, breaking data consistency.

🤖 AI Agent Prompt for Cursor/Windsurf

📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue

File: mindsdb/integrations/handlers/hubspot_handler/hubspot_tables.py
Lines: 543-551

Problem: The `city` field was added to `_contact_to_dict()` method (line 589) but is missing from the `default_properties` list in `get_contacts()` method. This causes inconsistent behavior where contacts fetched via search API include city data, but contacts fetched via `get_all()` will have `city` as `None`.

Fix: Add `"city"` to the `default_properties` list in the `get_contacts()` method (around line 547) to ensure the city field is consistently fetched from HubSpot API regardless of the fetch method used.
✨ Committable Code Suggestion

💡 This is a one-click fix! Click "Commit suggestion" to apply this change directly to your branch.

Suggested change
"lastname": properties.get("lastname"),
"phone": properties.get("phone"),
"company": properties.get("company"),
"city": properties.get("city"),
"website": properties.get("website"),
"createdate": properties.get("createdate"),
"lastmodifieddate": properties.get("lastmodifieddate"),
"firstname": properties.get("firstname"),
"lastname": properties.get("lastname"),
"phone": properties.get("phone"),
"company": properties.get("company"),
"city": properties.get("city"),
"website": properties.get("website"),
"createdate": properties.get("createdate"),
"lastmodifieddate": properties.get("lastmodifieddate"),

@MinuraPunchihewa MinuraPunchihewa left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @tino097,
Overall, this looks fine to me, but I am wondering if the three tables that we've added support for here are sufficient for answering the questions our customers are interested in. If not, I think we will need to add these tables in.

I also think it might be a good idea for you to rebase this branch against the release branch because there seem to be some rogue commits in there.

You will also need to check why the unit tests are failing.

@MinuraPunchihewa

Copy link
Copy Markdown
Contributor

@hamishfagg Do you have any idea why the tests are failing? Is it because this PR originates from a fork?

@tino097 Do you have access to create a PR against MindsDB directly? If not, @hamishfagg can we grant him this access? I think we will need to deploy this PR to one of our environments and I am not sure this can be done via the fork?

@tino097 tino097 force-pushed the feature/hubsot-handler-updates branch from e38b7da to 3ded00e Compare January 7, 2026 13:20
@tino097 tino097 changed the base branch from releases/26.0.0 to releases/25.13.1 January 7, 2026 13:23
@tino097 tino097 changed the base branch from releases/25.13.1 to releases/25.14.0 January 7, 2026 16:04
@tino097 tino097 closed this Jan 7, 2026
@github-actions github-actions Bot locked and limited conversation to collaborators Jan 7, 2026
@tino097

tino097 commented Jan 7, 2026

Copy link
Copy Markdown
Contributor Author

Closed in favor of #12058

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants