Feature/hubsot handler updates#12046
Conversation
| ("lastname", "VARCHAR", "Last name"), | ||
| ("phone", "VARCHAR", "Phone number"), | ||
| ("company", "VARCHAR", "Associated company"), | ||
| ("city", "VARCHAR", "City"), | ||
| ("website", "VARCHAR", "Website URL"), | ||
| ("createdate", "TIMESTAMP", "Creation date"), | ||
| ("lastmodifieddate", "TIMESTAMP", "Last modification date"), |
There was a problem hiding this comment.
Correctness: Add "city" to the searchable_columns set in _search_contacts_by_conditions call (line ~680). Without it, _build_hubspot_search_filters returns None for city-based WHERE conditions, forcing fallback to the slower get_all API instead of the optimized search API.
🤖 AI Agent Prompt for Cursor/Windsurf
📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue
File: mindsdb/integrations/handlers/hubspot_handler/hubspot_tables.py
Lines: ~580-585 (in the `get_contacts` method)
Problem: The `city` column was added to the contacts schema (line 37), but it's missing from the searchable columns set used by `_build_hubspot_search_filters`. This causes city-based WHERE conditions to bypass the optimized search API and fall back to the slower `get_all` method.
Fix: Update the searchable columns set in the `get_contacts` method to include `"city"` (and verify if `"website"` should also be searchable). Locate the call to `_build_hubspot_search_filters` around line 580-585 and add `"city"` to the set: `{"email", "id", "firstname", "lastname", "phone", "company", "city", "website"}`.
✨ Committable Code Suggestion
💡 This is a one-click fix! Click "Commit suggestion" to apply this change directly to your branch.
| ("lastname", "VARCHAR", "Last name"), | |
| ("phone", "VARCHAR", "Phone number"), | |
| ("company", "VARCHAR", "Associated company"), | |
| ("city", "VARCHAR", "City"), | |
| ("website", "VARCHAR", "Website URL"), | |
| ("createdate", "TIMESTAMP", "Creation date"), | |
| ("lastmodifieddate", "TIMESTAMP", "Last modification date"), | |
| ("firstname", "VARCHAR", "First name"), | |
| ("lastname", "VARCHAR", "Last name"), | |
| ("phone", "VARCHAR", "Phone number"), | |
| ("company", "VARCHAR", "Associated company"), | |
| ("city", "VARCHAR", "City"), | |
| ("website", "VARCHAR", "Website URL"), | |
| ("createdate", "TIMESTAMP", "Creation date"), | |
| ("lastmodifieddate", "TIMESTAMP", "Last modification date"), |
| self.create_companies(company_data) | ||
|
|
||
| def modify(self, conditions: List[FilterCondition], values: Dict) -> None: | ||
| companies_df = pd.json_normalize(self.get_companies(limit=1000)) | ||
| normalized_conditions = _normalize_filter_conditions(conditions) | ||
| companies_df = pd.json_normalize(self.get_companies(limit=200, where_conditions=normalized_conditions)) | ||
|
|
||
| if companies_df.empty: | ||
| raise ValueError( | ||
| "No companies retrieved from HubSpot to evaluate update conditions. Verify your connection and permissions." | ||
| ) | ||
|
|
||
| normalized_conditions = _normalize_filter_conditions(conditions) | ||
| update_query_executor = UPDATEQueryExecutor(companies_df, normalized_conditions) | ||
| filtered_df = update_query_executor.execute_query() | ||
|
|
There was a problem hiding this comment.
Correctness: modify() passes limit=200 to get_companies(), truncating results before UPDATEQueryExecutor filters. Companies beyond 200 matching where_conditions are silently excluded from update. Remove the hardcoded limit or set limit=None to retrieve all matching records via pagination.
🤖 AI Agent Prompt for Cursor/Windsurf
📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue
File: mindsdb/integrations/handlers/hubspot_handler/hubspot_tables.py
Lines: 223
Problem: The `modify()` method calls `get_companies(limit=200, where_conditions=normalized_conditions)` which artificially caps results at 200 records. When more than 200 companies match the filter conditions, only the first 200 will be retrieved and updated — the rest are silently ignored, causing data inconsistency.
Fix Instructions:
1. Remove the `limit=200` parameter from the `get_companies()` call on line 223
2. Let `get_companies()` retrieve all matching records (it already handles pagination internally via `_execute_hubspot_search` when conditions are provided)
3. The `UPDATEQueryExecutor` will then correctly filter the complete dataset
4. Verify the same issue doesn't exist in `remove()` method (line 246) and `ContactsTable.modify()` (line 434) and `DealsTable.modify()` (line 634) — apply the same fix if present
Expected result: All companies matching the WHERE conditions will be updated, regardless of count.
✨ Committable Code Suggestion
💡 This is a one-click fix! Click "Commit suggestion" to apply this change directly to your branch.
| self.create_companies(company_data) | |
| def modify(self, conditions: List[FilterCondition], values: Dict) -> None: | |
| companies_df = pd.json_normalize(self.get_companies(limit=1000)) | |
| normalized_conditions = _normalize_filter_conditions(conditions) | |
| companies_df = pd.json_normalize(self.get_companies(limit=200, where_conditions=normalized_conditions)) | |
| if companies_df.empty: | |
| raise ValueError( | |
| "No companies retrieved from HubSpot to evaluate update conditions. Verify your connection and permissions." | |
| ) | |
| normalized_conditions = _normalize_filter_conditions(conditions) | |
| update_query_executor = UPDATEQueryExecutor(companies_df, normalized_conditions) | |
| filtered_df = update_query_executor.execute_query() | |
| def add(self, company_data: List[dict]): | |
| self.create_companies(company_data) | |
| def modify(self, conditions: List[FilterCondition], values: Dict) -> None: | |
| normalized_conditions = _normalize_filter_conditions(conditions) | |
| companies_df = pd.json_normalize(self.get_companies(where_conditions=normalized_conditions)) | |
| if companies_df.empty: | |
| raise ValueError( | |
| "No companies retrieved from HubSpot to evaluate update conditions. Verify your connection and permissions." | |
| ) | |
| update_query_executor = UPDATEQueryExecutor(companies_df, normalized_conditions) | |
| filtered_df = update_query_executor.execute_query() | |
|
|
||
| def remove(self, conditions: List[FilterCondition]) -> None: | ||
| where_conditions = _normalize_filter_conditions(conditions) | ||
| contacts_df = pd.json_normalize(self.get_contacts(limit=1000, where_conditions=where_conditions)) | ||
| contacts_df = pd.json_normalize(self.get_contacts(limit=200, where_conditions=where_conditions)) | ||
|
|
||
| if contacts_df.empty: | ||
| raise ValueError( |
There was a problem hiding this comment.
Correctness: [hubspot_tables.py:467] remove() fetches only 200 contacts with limit=200. If >200 contacts match where_conditions, only the first 200 are deleted, silently leaving the rest. Remove the limit or fetch all matching records before filtering.
🤖 AI Agent Prompt for Cursor/Windsurf
📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue
File: mindsdb/integrations/handlers/hubspot_handler/hubspot_tables.py
Lines: 474-476
Problem: The `remove()` method in `ContactsTable` class uses `limit=200` when fetching contacts to evaluate delete conditions. This causes incomplete deletions when more than 200 contacts match the WHERE clause - only the first 200 are deleted while the rest are silently ignored.
Fix Required:
1. Change `limit=200` to `limit=None` in the `get_contacts()` call on line 475
2. Apply the same fix to the `modify()` method on line 461 (same issue)
3. Review and fix identical patterns in `CompaniesTable.remove()` (line ~304) and `CompaniesTable.modify()` (line ~283)
4. Review and fix identical patterns in `DealsTable.remove()` and `DealsTable.modify()` methods
Rationale: When evaluating conditions for DELETE or UPDATE operations, we must retrieve ALL matching records, not just a subset. The limit should only apply to read operations (SELECT), not to condition evaluation for write operations. Using `limit=None` ensures complete data retrieval for accurate condition matching.
✨ Committable Code Suggestion
💡 This is a one-click fix! Click "Commit suggestion" to apply this change directly to your branch.
| def remove(self, conditions: List[FilterCondition]) -> None: | |
| where_conditions = _normalize_filter_conditions(conditions) | |
| contacts_df = pd.json_normalize(self.get_contacts(limit=1000, where_conditions=where_conditions)) | |
| contacts_df = pd.json_normalize(self.get_contacts(limit=200, where_conditions=where_conditions)) | |
| if contacts_df.empty: | |
| raise ValueError( | |
| self.update_contacts(contact_ids, values) | |
| def remove(self, conditions: List[FilterCondition]) -> None: | |
| where_conditions = _normalize_filter_conditions(conditions) | |
| contacts_df = pd.json_normalize(self.get_contacts(limit=None, where_conditions=where_conditions)) | |
| if contacts_df.empty: | |
| raise ValueError( |
| "lastname": properties.get("lastname"), | ||
| "phone": properties.get("phone"), | ||
| "company": properties.get("company"), | ||
| "city": properties.get("city"), | ||
| "website": properties.get("website"), | ||
| "createdate": properties.get("createdate"), | ||
| "lastmodifieddate": properties.get("lastmodifieddate"), |
There was a problem hiding this comment.
Correctness: [hubspot_tables.py:543] Add "city" to default_properties in get_contacts(). Without it, get_all() returns None for city while search API returns actual values, breaking data consistency.
🤖 AI Agent Prompt for Cursor/Windsurf
📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue
File: mindsdb/integrations/handlers/hubspot_handler/hubspot_tables.py
Lines: 543-551
Problem: The `city` field was added to `_contact_to_dict()` method (line 589) but is missing from the `default_properties` list in `get_contacts()` method. This causes inconsistent behavior where contacts fetched via search API include city data, but contacts fetched via `get_all()` will have `city` as `None`.
Fix: Add `"city"` to the `default_properties` list in the `get_contacts()` method (around line 547) to ensure the city field is consistently fetched from HubSpot API regardless of the fetch method used.
✨ Committable Code Suggestion
💡 This is a one-click fix! Click "Commit suggestion" to apply this change directly to your branch.
| "lastname": properties.get("lastname"), | |
| "phone": properties.get("phone"), | |
| "company": properties.get("company"), | |
| "city": properties.get("city"), | |
| "website": properties.get("website"), | |
| "createdate": properties.get("createdate"), | |
| "lastmodifieddate": properties.get("lastmodifieddate"), | |
| "firstname": properties.get("firstname"), | |
| "lastname": properties.get("lastname"), | |
| "phone": properties.get("phone"), | |
| "company": properties.get("company"), | |
| "city": properties.get("city"), | |
| "website": properties.get("website"), | |
| "createdate": properties.get("createdate"), | |
| "lastmodifieddate": properties.get("lastmodifieddate"), |
MinuraPunchihewa
left a comment
There was a problem hiding this comment.
Hey @tino097,
Overall, this looks fine to me, but I am wondering if the three tables that we've added support for here are sufficient for answering the questions our customers are interested in. If not, I think we will need to add these tables in.
I also think it might be a good idea for you to rebase this branch against the release branch because there seem to be some rogue commits in there.
You will also need to check why the unit tests are failing.
|
@hamishfagg Do you have any idea why the tests are failing? Is it because this PR originates from a fork? @tino097 Do you have access to create a PR against MindsDB directly? If not, @hamishfagg can we grant him this access? I think we will need to deploy this PR to one of our environments and I am not sure this can be done via the fork? |
Co-authored-by: tino097 <konstantin.sivakov@gmail.com>
e38b7da to
3ded00e
Compare
|
Closed in favor of #12058 |
Description
HubSpot integration updates
(Please delete options that are not relevant)
Verification Process
To ensure the changes are working as expected:
Additional Media:
Checklist: