Commit 477d39b

Update homecraft-vertex README.md - fixes (elastic#109)
1 parent 3a04d8a commit 477d39b

File tree

supporting-blog-content/homecraft-vertex/README.md

1 file changed: 16 additions & 16 deletions
@@ -12,6 +12,8 @@ This repo shows how to leverage Elastic search capabilities (both text and vector search)

## Configuration steps

+!!! NEW !!! Now available: a detailed step-by-step walkthrough to implement this repo [here](https://github.com/valerioarvizzigno/homecraft_vertex_lab) (also usable for external workshops)
+
1. Setup your Elastic cluster with ML nodes

2. Install python on your local machine. If using Homebrew on macOS simply use
@@ -26,7 +28,7 @@ brew install [email protected]
python -m venv homecraftenv
```

-4. (Optional) If step 3 is followed, activate your virtual env. Check here https://docs.python.org/3/tutorial/venv.html commands depending on your OS. For Unix or macOS use
+4. (Optional) If step 3 is followed, activate your virtual env. Check [here](https://docs.python.org/3/tutorial/venv.html) for the commands for your OS. For Unix or macOS use

```bash
source homecraftenv/bin/activate
@@ -44,7 +46,7 @@ git clone https://github.com/valerioarvizzigno/homecraft_vertex.git
pip install -r requirements.txt
```

-7. Install gcloud SDK. It is needed to connect to VertexAI APIs. (https://cloud.google.com/sdk/docs/install-sdk)
+7. Install the gcloud SDK. It is needed to connect to the Vertex AI APIs ([docs here](https://cloud.google.com/sdk/docs/install-sdk)).
Follow the instructions at the link depending on your OS. If using Homebrew on macOS you can simply install it with

```bash
@@ -57,13 +59,13 @@ brew install --cask google-cloud-sdk
gcloud init
```

-9. Authenticate the VertexAI SDK (it has been installed with requirements.txt). More info here https://googleapis.dev/python/google-api-core/latest/auth.html
+9. Authenticate the Vertex AI SDK (installed via requirements.txt). More info [here](https://googleapis.dev/python/google-api-core/latest/auth.html)

```bash
gcloud auth application-default login
```

-10. Load the all-distillroberta-v1 (https://huggingface.co/sentence-transformers/all-distilroberta-v1) ML model in you Elastic cluster via Eland client and start it. To run Eland client you need docker installed. An easy way to accomplish this step without python/docker installation is via Google's Cloud Shell.
+10. Load the [all-distilroberta-v1](https://huggingface.co/sentence-transformers/all-distilroberta-v1) ML model into your Elastic cluster via the Eland client and start it. To run the Eland client you need Docker installed. An easy way to accomplish this step without a local python/docker installation is via Google's Cloud Shell.

```bash
git clone https://github.com/elastic/eland.git
@@ -72,10 +74,7 @@ cd eland/

docker build -t elastic/eland .

-docker run -it --rm elastic/eland eland_import_hub_model
---url https://<elastic_user>:<elastic_password>@<your_elastic_endpoint>:9243/
---hub-model-id sentence-transformers/all-distilroberta-v1
---start
+docker run -it --rm elastic/eland eland_import_hub_model --url https://<elastic_user>:<elastic_password>@<your_elastic_endpoint>:9243/ --hub-model-id sentence-transformers/all-distilroberta-v1 --start
```

11. Index general data from a retailer website (I used https://www.ikea.com/gb/en/) with Elastic Enterprise Search's webcrawler and name the index "search-homecraft-ikea" (for immediate compatibility with this repo's code; otherwise change the index references in all homecraft_*.py files). For better crawling performance, look for the sitemap.xml path inside the target webserver's robots.txt file and add it to the Site Maps tab. Set a custom ingest pipeline named "ml-inference-title-vector", working directly at crawl time, to enrich crawled documents with dense vectors: use the previously loaded ML model for inference on the "title" field as source, and set "title-vector" as the target field for the dense vectors.
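
For reference, a minimal sketch of such a pipeline created via the Elasticsearch API. The model_id assumes Eland's default naming convention (sentence-transformers__all-distilroberta-v1), and the processor layout and intermediate field names are illustrative assumptions rather than the repo's exact definition:

```bash
# Hypothetical sketch: create the "ml-inference-title-vector" ingest pipeline.
# Replace the <placeholders> with your deployment details.
curl -u "<elastic_user>:<elastic_password>" -X PUT \
  "https://<your_elastic_endpoint>:9243/_ingest/pipeline/ml-inference-title-vector" \
  -H "Content-Type: application/json" -d '
{
  "processors": [
    {
      "inference": {
        "model_id": "sentence-transformers__all-distilroberta-v1",
        "target_field": "ml.inference.title_vector",
        "field_map": { "title": "text_field" }
      }
    },
    {
      "set": {
        "field": "title-vector",
        "copy_from": "ml.inference.title_vector.predicted_value"
      }
    }
  ]
}'
```
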
@@ -98,7 +97,7 @@ POST search-homecraft-ikea/_mapping

13. Start crawling.

-14. Index the Home Depot products dataset (https://www.kaggle.com/datasets/thedevastator/the-home-depot-products-dataset) into elastic.
+14. Index the Home Depot [products dataset](https://www.kaggle.com/datasets/thedevastator/the-home-depot-products-dataset) into Elastic.

15. Create a new empty index called "home-depot-product-catalog-vector" that will host the dense vectors (for immediate compatibility with this repo's code; otherwise change the index references in all homecraft_*.py files) and specify its mappings.
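
A minimal sketch of that index creation; the 768 dims value matches all-distilroberta-v1 embeddings, while the text fields shown are assumptions to adapt to the dataset's columns:

```bash
# Hypothetical sketch: create the empty vector index with its mapping.
# 768 dims matches the all-distilroberta-v1 embedding size.
curl -u "<elastic_user>:<elastic_password>" -X PUT \
  "https://<your_elastic_endpoint>:9243/home-depot-product-catalog-vector" \
  -H "Content-Type: application/json" -d '
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "title-vector": {
        "type": "dense_vector",
        "dims": 768,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}'
```
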
@@ -135,23 +134,24 @@ POST _reindex

17. Leverage the BigQuery to Elasticsearch Dataflow [native integration](https://www.elastic.co/blog/ingest-data-directly-from-google-bigquery-into-elastic-using-google-dataflow) to move a [sample e-commerce dataset](https://console.cloud.google.com/marketplace/product/bigquery-public-data/thelook-ecommerce?project=elastic-sa) into Elastic. Take a look at the tables available in this dataset within the BigQuery explorer UI. Copy the ID of the "Order_items" table and create a new Dataflow job to move data from this BQ table to an index named "bigquery-thelook-order-items". You need to create an API key on the Elastic cluster and pass it, along with the Elastic cluster's cloud_id, user and pass, to the job config. This new index will be used for retrieving user orders.
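The job can be configured from the Dataflow UI as shown in the linked blog post. As a rough CLI alternative, a sketch using Google's BigQuery_to_Elasticsearch flex template could look like the following; the template path, region, and parameter names are assumptions to verify against the current template documentation:

```bash
# Hypothetical sketch: run the BigQuery-to-Elasticsearch Dataflow flex template.
# Verify the template location and parameter names against current Google docs.
gcloud dataflow flex-template run "bq-to-elastic-order-items" \
  --project="<your_gcp_project_id>" \
  --region="us-central1" \
  --template-file-gcs-location="gs://dataflow-templates/latest/flex/BigQuery_to_Elasticsearch" \
  --parameters inputTableSpec="bigquery-public-data:thelook_ecommerce.order_items",connectionUrl="<your_elastic_url_or_cloud_id>",apiKey="<your_elastic_api_key>",index="bigquery-thelook-order-items"
```
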
-18. Clone this repo in your project folder.
+18. Set up the environment variables cloud_id (the Elastic Cloud ID - find it in the Elastic admin console), cloud_user and cloud_pass (the Elastic deployment's user credentials), and gcp_project_id (the GCP project you're working in). These variables are used inside the app code to reference the systems to communicate with: the Elastic cluster and the Vertex AI APIs in your GCP project.

```bash
-git clone https://github.com/valerioarvizzigno/homecraft_vertex.git
+export cloud_id='<replaceHereYourElasticCloudID>'
+export cloud_user='elastic'
+export cloud_pass='<replaceHereYourElasticDeploymentPassword>'
+export gcp_project_id='<replaceHereTheGCPProjectID>'
```

-19. Set up the environment variables cloud_id, cloud_pass, cloud_user (Elastic deployment) and gcp_project_id (the GCP project you're working in)
+19. Fine-tune text-bison@001 via the Vertex AI fine-tuning feature, using the fine-tuning/fine_tuning_dataset.jsonl file. This will instruct the model to advertise the partner network when specific questions are asked. For more information about fine-tuning see [these docs](https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-models#generative-ai-tune-model-python)

-20. Fine-tune text-bison@001 via VertexAI fine-tuning feature, using the fine-tuning/fine_tuning_dataset.jsonl file. This will instruct the model in advertizing partner network when specific questions are asked. For more information about fine-tuning look at https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-models#generative-ai-tune-model-python
-
-21. Run streamlit app
+20. Run the streamlit app

```bash
streamlit run homecraft_home.py
```

+## Sample questions

---USE THE HOME PAGE FOR BASE DEMO---
