Commit dc164d2

Author: Kraig Brockschmidt
Merge pull request NuGet#555 from NuGet/master
Push to live
2 parents 1218860 + 1d3ee7e commit dc164d2

3 files changed: +213, -2 lines changed

docs/API/catalog-resource.md

Lines changed: 3 additions & 0 deletions
@@ -32,6 +32,9 @@ The **catalog** is a resource that records all package operations on a package s
3232
> [!Note]
3333
> Because the catalog is not used by the official NuGet client, not all package sources implement the catalog.
3434
35+
> [!Note]
36+
> Currently, the nuget.org catalog is not available in China. For more details, see [NuGet/NuGetGallery#4949](https://github.com/NuGet/NuGetGallery/issues/4949).
37+
3538
## Versioning
3639

3740
The following `@type` value is used:
Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
1+
---
2+
# required metadata 
3+
4+
title: Query for all packages published to nuget.org | Microsoft Docs
5+
author:
6+
- joelverhagen
7+
- kraigb
8+
ms.author:
9+
- joelverhagen
10+
- kraigb
11+
manager: skofman
12+
ms.date: 11/2/2017
13+
ms.topic: get-started-article
14+
ms.prod: nuget
15+
ms.technology: null
16+
ms.assetid: 5d017cd4-3d75-4341-ba90-3c57be093b7d
17+
18+
# optional metadata
19+
20+
description: Using the NuGet API, you can query for all packages published to nuget.org and stay up-to-date over time.
21+
keywords: NuGet API enumerate all packages, NuGet API replicate packages, latest packages published to nuget.org
22+
ms.reviewer:
23+
- karann
24+
- unniravindranathan
25+
26+
---
27+
28+
# Query for all packages published to nuget.org
29+
30+
One common query pattern on the legacy OData V2 API was enumerating all packages published to nuget.org, ordered by when
31+
the package was published. Scenarios requiring this kind of query against nuget.org vary widely:
32+
33+
- Replicating nuget.org entirely
34+
- Detecting when packages have new versions released
35+
- Finding packages that depend on your package
36+
37+
The legacy way of doing this typically depended on sorting the OData package entity by a timestamp and paging across
38+
the massive result set using `skip` and `top` (page size) parameters. Unfortunately, this approach has some drawbacks:
39+
40+
- Possibility of missing packages, since the queries are being made on data that is often changing order
41+
- Slow query response time, since the queries are not optimized (the most optimized queries are ones that support a
42+
mainline scenario for the official NuGet client)
43+
- Use of deprecated and undocumented API, meaning the support of such queries in the future is not guaranteed
44+
- Inability to replay history in the exact order that it transpired
45+
46+
For these reasons, this guide shows how to address the aforementioned scenarios in a more reliable and
47+
future-proof way.
48+
49+
## Overview
50+
51+
At the center of this guide is a resource in the [NuGet API](../../api/overview.md) called the **catalog**. The catalog
52+
is an append-only API that allows the caller to see a full history of packages added to, modified, and deleted from
53+
nuget.org. If you are interested in all or even a subset of packages published to nuget.org, the catalog is a great way
54+
to stay up-to-date with the set of currently available packages as time goes on.
55+
56+
This guide is intended to be a high-level walk-through, but if you are interested in the fine-grained details of the
57+
catalog, see its [API reference document](../../api/catalog-resource.md).
58+
59+
The following steps can be implemented in any programming language of your choice. If you want a full running sample,
60+
take a look at the [C# sample](#c-sample-code) mentioned below.
61+
62+
Otherwise, follow the guide below to build a reliable catalog reader.
63+
64+
## Initialize a cursor
65+
66+
The first step in building a reliable catalog reader is implementing a cursor. For full details about the design of a
67+
catalog cursor, see the [catalog reference document](../../api/catalog-resource.md#cursor). In short, a cursor is a
68+
point in time up to which you have processed events in the catalog. Events in the catalog represent package publishes
69+
and other package changes. If you care about all packages ever published to NuGet (since the beginning of time), you
70+
would initialize your cursor to a "minimum value" timestamp (e.g. `DateTime.MinValue` in .NET). If you care only about
71+
packages published starting now, you would use the current timestamp as your initial cursor value.
72+
73+
For this guide, we'll initialize our cursor to a timestamp one hour ago. For now, just save that timestamp in memory.
74+
75+
```cs
76+
DateTime cursor = DateTime.UtcNow.AddHours(-1);
77+
```
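
The three starting points described above can also be sketched in Python, since the steps in this guide are language-agnostic. The helper name `initial_cursor` and its `mode` values are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def initial_cursor(mode: str, now: datetime) -> datetime:
    """Pick a starting cursor (hypothetical helper; the mode names are illustrative)."""
    if mode == "everything":
        # Earliest representable UTC timestamp: process the whole catalog.
        return datetime.min.replace(tzinfo=timezone.utc)
    if mode == "from_now":
        # Only events committed after this moment will be processed.
        return now
    if mode == "last_hour":
        # The choice used in this guide.
        return now - timedelta(hours=1)
    raise ValueError(f"unknown mode: {mode}")


now = datetime(2017, 11, 2, 22, 41, 28, tzinfo=timezone.utc)
cursor = initial_cursor("last_hour", now)
print(cursor.isoformat())  # 2017-11-02T21:41:28+00:00
```

Passing `now` in explicitly, rather than reading the clock inside the helper, keeps the function easy to test.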
78+
79+
## Determine catalog index URL
80+
81+
The location of every resource (endpoint) in the NuGet API should be discovered using the
82+
[service index](../../api/service-index.md). Since this guide focuses on nuget.org, we'll be using nuget.org's service
83+
index.
84+
85+
```
86+
GET https://api.nuget.org/v3/index.json
87+
```
88+
89+
The service index is a JSON document containing all of the resources on nuget.org. Look for the resource having the
90+
`@type` property value of `Catalog/3.0.0`. The associated `@id` property value is the URL to the catalog index itself.
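
As a minimal sketch of this lookup, the function below scans an already-deserialized service index for a given `@type`. The name `find_resource_url` is hypothetical, and `sample_service_index` is a trimmed-down illustration rather than the full document returned by nuget.org.

```python
from typing import Optional


def find_resource_url(service_index: dict, resource_type: str) -> Optional[str]:
    """Return the @id of the first resource whose @type matches, or None."""
    for resource in service_index.get("resources", []):
        if resource.get("@type") == resource_type:
            return resource.get("@id")
    return None


# Trimmed-down illustration of a service index (not the full nuget.org document).
sample_service_index = {
    "version": "3.0.0",
    "resources": [
        {
            "@id": "https://api.nuget.org/v3-flatcontainer/",
            "@type": "PackageBaseAddress/3.0.0",
        },
        {
            "@id": "https://api.nuget.org/v3/catalog0/index.json",
            "@type": "Catalog/3.0.0",
        },
    ],
}

print(find_resource_url(sample_service_index, "Catalog/3.0.0"))
# https://api.nuget.org/v3/catalog0/index.json
```

Returning `None` when nothing matches lets the caller detect a package source that does not implement the catalog.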
91+
92+
## Find new catalog leaves
93+
94+
Using the `@id` property value found in the previous step, download the catalog index:
95+
96+
```
97+
GET https://api.nuget.org/v3/catalog0/index.json
98+
```
99+
100+
Deserialize the [catalog index](../../api/catalog-resource.md#catalog-index). Filter out all
101+
[catalog page objects](../../api/catalog-resource.md#catalog-page-object-in-the-index) with `commitTimeStamp` less than
102+
or equal to your current cursor value.
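
The page filter can be sketched as follows. This assumes that `commitTimeStamp` values are UTC timestamps ending in `Z` with an optional fractional part. The helper names and the sample timestamps are invented for illustration.

```python
import re
from datetime import datetime, timezone

_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z$")


def parse_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as 2017-11-02T21:32:35.8112143Z."""
    match = _TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"unexpected timestamp format: {value!r}")
    whole_seconds = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    # Python datetimes hold microseconds, so digits past the sixth are truncated.
    microseconds = int((match.group(2) or "").ljust(6, "0")[:6])
    return whole_seconds.replace(microsecond=microseconds, tzinfo=timezone.utc)


def pages_after(catalog_index: dict, cursor: datetime) -> list:
    """Keep only the page objects committed strictly after the cursor."""
    return [
        page
        for page in catalog_index.get("items", [])
        if parse_timestamp(page["commitTimeStamp"]) > cursor
    ]


# Invented sample data shaped like a catalog index.
sample_catalog_index = {
    "items": [
        {"@id": "https://api.nuget.org/v3/catalog0/page2934.json",
         "commitTimeStamp": "2017-11-02T20:50:11.1234567Z"},
        {"@id": "https://api.nuget.org/v3/catalog0/page2935.json",
         "commitTimeStamp": "2017-11-02T22:26:36.6543210Z"},
    ]
}

cursor = datetime(2017, 11, 2, 21, 41, 28, tzinfo=timezone.utc)
for page in pages_after(sample_catalog_index, cursor):
    print(page["@id"])  # https://api.nuget.org/v3/catalog0/page2935.json
```

Truncating to microseconds means two timestamps that differ only in the seventh fractional digit compare as equal. A reader that needs exact ordering at that resolution would need a finer-grained comparison.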
103+
104+
For each remaining catalog page, download the full document using the `@id` property.
105+
106+
```
107+
GET https://api.nuget.org/v3/catalog0/page2926.json
108+
```
109+
110+
Deserialize the [catalog page](../../api/catalog-resource.md#catalog-page). Filter out all
111+
[catalog leaf objects](../../api/catalog-resource.md#catalog-item-object-in-a-page) with `commitTimeStamp` less than
112+
or equal to your current cursor value.
113+
114+
After you have downloaded all of the catalog pages not filtered out, you will have a set of catalog leaf objects
115+
representing packages that have been published, unlisted, listed, or deleted in the time between your cursor timestamp
116+
and now.
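
A sketch of collecting the leaf objects from already-downloaded pages is shown below. It applies the same cursor filter and sorts the result chronologically so that events can be replayed in order. The function names are hypothetical, the timestamp helper assumes UTC values ending in `Z`, and the sample timestamps are invented around package names taken from the sample output.

```python
import re
from datetime import datetime, timezone

_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z$")


def parse_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp, truncating the fraction to microseconds."""
    match = _TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"unexpected timestamp format: {value!r}")
    whole_seconds = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    microseconds = int((match.group(2) or "").ljust(6, "0")[:6])
    return whole_seconds.replace(microsecond=microseconds, tzinfo=timezone.utc)


def leaves_after(pages: list, cursor: datetime) -> list:
    """Gather leaf objects newer than the cursor, oldest first."""
    leaves = [
        leaf
        for page in pages
        for leaf in page.get("items", [])
        if parse_timestamp(leaf["commitTimeStamp"]) > cursor
    ]
    leaves.sort(key=lambda leaf: parse_timestamp(leaf["commitTimeStamp"]))
    return leaves


# Invented sample page; only the fields used here are included.
sample_page = {
    "items": [
        {"@type": "nuget:PackageDetails", "commitTimeStamp": "2017-11-02T22:26:36.0000000Z",
         "nuget:id": "InstaClient", "nuget:version": "1.0.2"},
        {"@type": "nuget:PackageDetails", "commitTimeStamp": "2017-11-02T21:32:35.0000000Z",
         "nuget:id": "DotVVM.Compiler.Light", "nuget:version": "1.1.7"},
        {"@type": "nuget:PackageDetails", "commitTimeStamp": "2017-11-02T22:23:54.0000000Z",
         "nuget:id": "Cake.GitPackager", "nuget:version": "0.1.2"},
    ]
}

cursor = datetime(2017, 11, 2, 21, 41, 28, tzinfo=timezone.utc)
for leaf in leaves_after([sample_page], cursor):
    print(leaf["nuget:id"], leaf["nuget:version"])
# Cake.GitPackager 0.1.2
# InstaClient 1.0.2
```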
117+
118+
## Process catalog leaves
119+
120+
At this point, you can perform any custom processing you'd like on the catalog items. If all you need is the ID and
121+
version of the package, you can inspect the `nuget:id` and `nuget:version` properties on the catalog item objects found
122+
in the pages. Make sure to look at the `@type` property to know if the catalog item concerns an existing package or a
123+
deleted package.
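
For example, a minimal sketch of branching on `@type` might look like the following. The function name `describe_leaf` is hypothetical, and the deleted package in the sample is invented.

```python
def describe_leaf(leaf: dict) -> str:
    """Summarize a catalog item from a page using its ID, version, and @type."""
    package = f'{leaf["nuget:id"]} {leaf["nuget:version"]}'
    leaf_type = leaf["@type"]
    if leaf_type == "nuget:PackageDetails":
        # The package exists: it was published or its metadata changed.
        return f"{package}: exists (published or updated)"
    if leaf_type == "nuget:PackageDelete":
        # The package was removed from the package source.
        return f"{package}: deleted"
    # Be tolerant of types this reader does not know about.
    return f"{package}: unrecognized type {leaf_type}"


sample_leaves = [
    {"@type": "nuget:PackageDetails", "nuget:id": "DotVVM.Compiler.Light", "nuget:version": "1.1.7"},
    # Invented entry to illustrate the delete case.
    {"@type": "nuget:PackageDelete", "nuget:id": "Example.Removed.Package", "nuget:version": "1.0.0"},
]

for leaf in sample_leaves:
    print(describe_leaf(leaf))
# DotVVM.Compiler.Light 1.1.7: exists (published or updated)
# Example.Removed.Package 1.0.0: deleted
```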
124+
125+
If you are interested in the metadata about the package (such as the description, dependencies, .nupkg size, etc.), you
126+
can fetch the [catalog leaf document](../../api/catalog-resource.md#catalog-leaf) using the `@id` property.
127+
128+
```
129+
GET https://api.nuget.org/v3/catalog0/data/2015.02.01.11.18.40/windowsazure.storage.1.0.0.json
130+
```
131+
132+
This document has all of the metadata included in the
133+
[package metadata resource](../../api/registration-base-url-resource.md), and more!
134+
135+
This step is where you implement your custom logic. The other steps in this guide are implemented in pretty much the
136+
same way no matter what you are doing with the catalog leaves.
137+
138+
### Downloading the .nupkg
139+
140+
If you are interested in downloading the .nupkg files for packages found in the catalog, you can use the
141+
[package content resource](../../api/package-base-address-resource.md). However, note that there is a short delay
142+
between when a package is found in the catalog and when it is available in the package content resource. Therefore, if
143+
you encounter `404 Not Found` when attempting to download a .nupkg for a package that you found in the catalog, simply
144+
retry a short time later. Fixing this delay is tracked by GitHub issue
145+
[NuGet/NuGetGallery#3455](https://github.com/NuGet/NuGetGallery/issues/3455).
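
The retry advice above could be sketched as follows. The names `nupkg_url`, `http_fetch`, and `download_with_retry`, along with the attempt count and delay, are assumptions made for illustration. The URL helper assumes the version is already normalized and carries no build metadata. In a real reader, the base address would come from the service index rather than being hard-coded.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def nupkg_url(base_address: str, package_id: str, version: str) -> str:
    """Build a package content URL; the ID and version are lowercased."""
    lower_id = package_id.lower()
    lower_version = version.lower()
    return f"{base_address.rstrip('/')}/{lower_id}/{lower_version}/{lower_id}.{lower_version}.nupkg"


def http_fetch(url: str) -> Optional[bytes]:
    """Download a URL, returning None on 404 so the caller can retry."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None
        raise


def download_with_retry(
    fetch: Callable[[str], Optional[bytes]],
    url: str,
    attempts: int = 5,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch until it returns content or the attempts run out."""
    for attempt in range(attempts):
        content = fetch(url)
        if content is not None:
            return content
        if attempt < attempts - 1:
            # Not available yet: wait before asking again.
            sleep(delay_seconds)
    return None


url = nupkg_url("https://api.nuget.org/v3-flatcontainer/", "WindowsAzure.Storage", "1.0.0")
print(url)
# https://api.nuget.org/v3-flatcontainer/windowsazure.storage/1.0.0/windowsazure.storage.1.0.0.nupkg
```

A call such as `download_with_retry(http_fetch, url)` would perform the actual download. Passing `fetch` and `sleep` as parameters allows the retry loop to be exercised without network access or real delays.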
146+
147+
## Move the cursor forward
148+
149+
Once you have successfully processed the catalog items, you need to determine the new cursor value to save. To do this,
150+
find the maximum (latest chronologically) `commitTimeStamp` of all catalog items that you processed. This is your new
151+
cursor value. Save it to some persistent store, like a database, file system, or blob storage. When you want to get more
152+
catalog items, simply start from the [first step](#initialize-a-cursor) by initializing your cursor value from this
153+
persistent store.
154+
155+
If your application throws an exception or faults, don't move the cursor forward. Moving the cursor forward means
156+
that you never again need to process catalog items before your cursor.
157+
158+
If, for some reason, you have a bug in how you process catalog leaves, you can simply move your cursor backward in time
159+
and allow your code to reprocess the old catalog items.
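
The cursor handling described in this section can be sketched as below, using a local file as the persistent store. All function names here are hypothetical. The sketch assumes each processed item is paired with its already-parsed commit timestamp.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def load_cursor(path: Path, default: datetime) -> datetime:
    """Read the saved cursor, or fall back to the default on the first run."""
    if not path.exists():
        return default
    return datetime.fromisoformat(path.read_text(encoding="utf-8").strip())


def save_cursor(path: Path, cursor: datetime) -> None:
    """Write to a temporary file first so a crash cannot leave a partial cursor."""
    temporary = path.with_name(path.name + ".tmp")
    temporary.write_text(cursor.isoformat(), encoding="utf-8")
    temporary.replace(path)


def process_and_advance(path: Path, cursor: datetime, leaves: list, process) -> datetime:
    """Process (commit_time, leaf) pairs, then persist the latest commit time.

    If process raises, the exception propagates before the cursor is saved,
    so the same items are picked up again on the next run.
    """
    for _, leaf in leaves:
        process(leaf)
    if leaves:
        cursor = max(commit_time for commit_time, _ in leaves)
        save_cursor(path, cursor)
    return cursor


with tempfile.TemporaryDirectory() as directory:
    cursor_path = Path(directory) / "cursor.txt"
    start = datetime(2017, 11, 2, 21, 41, 28, tzinfo=timezone.utc)
    leaves = [(datetime(2017, 11, 2, 22, 26, 36, tzinfo=timezone.utc), {"nuget:id": "InstaClient"})]
    process_and_advance(cursor_path, load_cursor(cursor_path, start), leaves, print)
    print(load_cursor(cursor_path, start).isoformat())  # 2017-11-02T22:26:36+00:00
```

Moving the cursor backward to reprocess old items then amounts to overwriting the stored value with an earlier timestamp.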
160+
161+
## C# sample code
162+
163+
Since the catalog is a set of JSON documents, it can be interacted with using any programming language that has
164+
an HTTP client and JSON deserializer.
165+
166+
For improved understanding and convenience, we have made available a small sample written in C# to demonstrate how to
167+
read from the catalog. The solution file requires Visual Studio 2017. The project targets `netcoreapp2.0` and depends on
168+
[NuGet.Protocol 4.4.0](https://www.nuget.org/packages/NuGet.Protocol/4.4.0) (for resolving the service index) and
169+
[Newtonsoft.Json 9.0.1](https://www.nuget.org/packages/Newtonsoft.Json/9.0.1) (for JSON deserialization).
170+
171+
Simply clone the [NuGet/Samples](https://github.com/NuGet/Samples) repository on GitHub, restore, build, and run the
172+
project file under the `CatalogReaderExample` directory.
173+
174+
```
175+
git clone https://github.com/NuGet/Samples.git
176+
```
177+
178+
The main logic of the code is visible in the
179+
[`Program.cs`](https://github.com/NuGet/Samples/blob/master/CatalogReaderExample/CatalogReaderExample/Program.cs)
180+
file.
181+
182+
### Sample output
183+
184+
```
185+
No cursor found. Defaulting to 11/2/2017 9:41:28 PM.
186+
Fetched catalog index https://api.nuget.org/v3/catalog0/index.json.
187+
Fetched catalog page https://api.nuget.org/v3/catalog0/page2935.json.
188+
Processing 69 catalog leaves.
189+
11/2/2017 9:32:35 PM: DotVVM.Compiler.Light 1.1.7 (type is nuget:PackageDetails)
190+
11/2/2017 9:32:35 PM: Momentum.Pm.Api 5.12.181-beta (type is nuget:PackageDetails)
191+
11/2/2017 9:32:44 PM: Momentum.Pm.PortalApi 5.12.181-beta (type is nuget:PackageDetails)
192+
11/2/2017 9:35:14 PM: Genesys.Extensions.Standard 3.17.11.40 (type is nuget:PackageDetails)
193+
11/2/2017 9:35:14 PM: Genesys.Extensions.Core 3.17.11.40 (type is nuget:PackageDetails)
194+
11/2/2017 9:35:14 PM: Halforbit.DataStores.FileStores.Serialization.Bond 1.0.4 (type is nuget:PackageDetails)
195+
11/2/2017 9:35:14 PM: Halforbit.DataStores.FileStores.AmazonS3 1.0.4 (type is nuget:PackageDetails)
196+
11/2/2017 9:35:14 PM: Halforbit.DataStores.DocumentStores.DocumentDb 1.0.6 (type is nuget:PackageDetails)
197+
11/2/2017 9:35:14 PM: Halforbit.DataStores.FileStores.BlobStorage 1.0.5 (type is nuget:PackageDetails)
198+
...
199+
11/2/2017 10:23:54 PM: Cake.GitPackager 0.1.2 (type is nuget:PackageDetails)
200+
11/2/2017 10:23:54 PM: UtilPack.NuGet 2.0.0 (type is nuget:PackageDetails)
201+
11/2/2017 10:23:54 PM: UtilPack.NuGet.AssemblyLoading 2.0.0 (type is nuget:PackageDetails)
202+
11/2/2017 10:26:26 PM: UtilPack.NuGet.Deployment 2.0.0 (type is nuget:PackageDetails)
203+
11/2/2017 10:26:26 PM: UtilPack.NuGet.Common.MSBuild 2.0.0 (type is nuget:PackageDetails)
204+
11/2/2017 10:26:36 PM: InstaClient 1.0.2 (type is nuget:PackageDetails)
205+
11/2/2017 10:26:36 PM: SecureStrConvertor.VARUN_RUSIYA 1.0.0.5 (type is nuget:PackageDetails)
206+
Writing cursor value: 11/2/2017 10:26:36 PM.
207+
```

docs/TOC.md

Lines changed: 3 additions & 2 deletions
@@ -4,11 +4,12 @@
44
## [Use a Package](Quickstart/Use-a-Package.md)
55
# Guides
66
## [Install NuGet client tools](Guides/Install-NuGet.md)
7+
## [Create NET Standard Packages (Visual Studio 2017)](Guides/Create-NET-Standard-Packages-VS2017.md)
8+
## [Create NET Standard Packages (Visual Studio 2015)](Guides/Create-NET-Standard-Packages-VS2015.md)
79
## [Create UWP Packages](Guides/Create-UWP-Packages.md)
810
## [Creating UWP Controls as NuGet Packages](Guides/Create-UWP-Controls.md)
911
## [Create Cross-Platform Packages](Guides/Create-Cross-Platform-Packages.md)
10-
## [Create NET Standard Packages (Visual Studio 2017)](Guides/Create-NET-Standard-Packages-VS2017.md)
11-
## [Create NET Standard Packages (Visual Studio 2015)](Guides/Create-NET-Standard-Packages-VS2015.md)
12+
## [Query for all packages using the API](Guides/api/query-for-all-published-packages.md)
1213
# Create Packages
1314
## [Overview and Workflow](Create-Packages/Overview-and-Workflow.md)
1415
## [Creating a Package](Create-Packages/Creating-a-Package.md)
