This repo shows how to leverage Elastic search capabilities (both text and vector search) together with Google's Vertex AI generative models.
## Configuration steps
!!! NEW !!! A detailed step-by-step walkthrough to implement this repo is now available [here](https://github.com/valerioarvizzigno/homecraft_vertex_lab) (also usable for external workshops).
1. Set up your Elastic cluster with ML nodes.
2. Install Python on your local machine. If using Homebrew on macOS, you can simply use the command below.
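A minimal sketch, assuming Homebrew's standard `python` formula:

```bash
brew install python
```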
4. (Optional) If step 3 is followed, activate your virtual env. See [here](https://docs.python.org/3/tutorial/venv.html) for the commands for your OS. For Unix or macOS use the command below.
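A minimal sketch, assuming the virtual env was created in a local `.venv` directory (adjust the path to your env's actual name):

```bash
source .venv/bin/activate
```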
9. Authenticate the VertexAI SDK (it was installed via requirements.txt). More info [here](https://googleapis.dev/python/google-api-core/latest/auth.html):
```bash
gcloud auth application-default login
```
10. Load the [all-distilroberta-v1](https://huggingface.co/sentence-transformers/all-distilroberta-v1) ML model into your Elastic cluster via the Eland client and start it. To run the Eland client you need Docker installed. An easy way to accomplish this step without a local Python/Docker installation is via Google's Cloud Shell.
```bash
git clone https://github.com/elastic/eland.git
cd eland/
docker build -t elastic/eland .
docker run -it --rm elastic/eland eland_import_hub_model
```
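The `eland_import_hub_model` arguments are truncated above; a hedged sketch of a typical invocation (cloud ID and credentials are placeholders you must supply, and the flags shown are Eland's standard ones for a text-embedding import):

```bash
docker run -it --rm elastic/eland eland_import_hub_model \
  --cloud-id "<your-elastic-cloud-id>" \
  -u "<elastic-username>" -p "<elastic-password>" \
  --hub-model-id sentence-transformers/all-distilroberta-v1 \
  --task-type text_embedding \
  --start
```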
11. Index general data from a retailer website (I used https://www.ikea.com/gb/en/) with Elastic Enterprise Search's webcrawler and name the index "search-homecraft-ikea" (for immediate compatibility with this repo's code; otherwise change the index references in all homecraft_*.py files). For better crawling performance, look up the sitemap.xml path in the target webserver's robots.txt file and add it to the Site Maps tab. Set a custom ingest pipeline named "ml-inference-title-vector", working directly at crawl time, to enrich crawled documents with dense vectors. Use the previously loaded ML model for inference on the "title" field as source, and set "title-vector" as the target field for the dense vectors (see the sketch below).
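A hedged sketch of what such a pipeline could look like, assuming the Eland-generated model ID `sentence-transformers__all-distilroberta-v1` (the Kibana UI can create this pipeline for you; the processor layout below is one common pattern, not necessarily the exact pipeline this repo used):

```
PUT _ingest/pipeline/ml-inference-title-vector
{
  "processors": [
    {
      "inference": {
        "model_id": "sentence-transformers__all-distilroberta-v1",
        "target_field": "ml.inference",
        "field_map": { "title": "text_field" }
      }
    },
    {
      "set": {
        "field": "title-vector",
        "copy_from": "ml.inference.predicted_value"
      }
    }
  ]
}
```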
12. Update the mapping of the crawled index so the "title-vector" target field is indexed as a dense vector (see the sketch below).
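A hedged sketch of the mapping update, assuming 768-dimension vectors (the output size of all-distilroberta-v1) and cosine similarity:

```
POST search-homecraft-ikea/_mapping
{
  "properties": {
    "title-vector": {
      "type": "dense_vector",
      "dims": 768,
      "index": true,
      "similarity": "cosine"
    }
  }
}
```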
13. Start crawling.
14. Index the Home Depot [products dataset](https://www.kaggle.com/datasets/thedevastator/the-home-depot-products-dataset) into Elastic.
15. Create a new empty index named "home-depot-product-catalog-vector" that will host the dense vectors (for immediate compatibility with this repo's code; otherwise change the index references in all homecraft_*.py files) and specify its mappings (see the sketch below).
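A hedged sketch of the index creation; the non-vector field name here is an illustrative assumption, and the dense-vector mapping is the essential part:

```
PUT home-depot-product-catalog-vector
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "title-vector": {
        "type": "dense_vector",
        "dims": 768,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}
```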
16. Reindex the Home Depot product data into the new vector index through the inference pipeline, so every product document gets enriched with a dense vector (see the sketch below).
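A hedged sketch of the reindex call; the source index name is an assumption (use whatever name you gave the raw products index in step 14), and the pipeline is the one created earlier:

```
POST _reindex
{
  "source": {
    "index": "home-depot-product-catalog"
  },
  "dest": {
    "index": "home-depot-product-catalog-vector",
    "pipeline": "ml-inference-title-vector"
  }
}
```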
17. Leverage the BigQuery-to-Elasticsearch Dataflow [native integration](https://www.elastic.co/blog/ingest-data-directly-from-google-bigquery-into-elastic-using-google-dataflow) to move a [sample e-commerce dataset](https://console.cloud.google.com/marketplace/product/bigquery-public-data/thelook-ecommerce?project=elastic-sa) into Elastic. Take a look at the tables available in this dataset within the BigQuery explorer UI. Copy the ID of the "Order_items" table and create a new Dataflow job to move data from this BQ table to an index named "bigquery-thelook-order-items". You need to create an API key on the Elastic cluster and pass it, along with the Elastic cluster's cloud_id, user and pass, to the job config. This new index will be used for retrieving user orders.
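A minimal sketch of creating the API key with the Elasticsearch security API (the key name is an arbitrary example):

```
POST /_security/api_key
{
  "name": "dataflow-bq-ingest"
}
```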
18. Set up the environment variables cloud_id (the Elastic Cloud ID; find it in the Elastic admin console), cloud_pass and cloud_user (the Elastic deployment's user credentials) and gcp_project_id (the GCP project you're working in). These variables are used inside the app code to reference the correct systems to communicate with (the Elastic cluster and the Vertex AI API in your GCP project).
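A minimal bash sketch, assuming the app reads these as shell environment variables (all values are placeholders):

```bash
export cloud_id="<your-elastic-cloud-id>"
export cloud_user="<your-elastic-username>"
export cloud_pass="<your-elastic-password>"
export gcp_project_id="<your-gcp-project-id>"
```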
19. Fine-tune text-bison@001 via the Vertex AI fine-tuning feature, using the fine-tuning/fine_tuning_dataset.jsonl file. This will instruct the model to advertise the partner network when specific questions are asked. For more information about fine-tuning, look at [these docs](https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-models#generative-ai-tune-model-python).