---
# required metadata

title: Query for all packages published to nuget.org | Microsoft Docs
author:
- joelverhagen
- kraigb
ms.author:
- joelverhagen
- kraigb
manager: skofman
ms.date: 11/2/2017
ms.topic: get-started-article
ms.prod: nuget
ms.technology: null
ms.assetid: 5d017cd4-3d75-4341-ba90-3c57be093b7d

# optional metadata

description: Using the NuGet API, you can query for all packages published to nuget.org and stay up-to-date over time.
keywords: NuGet API enumerate all packages, NuGet API replicate packages, latest packages published to nuget.org
ms.reviewer:
- karann
- unniravindranathan

---

# Query for all packages published to nuget.org

One common query pattern on the legacy OData V2 API was enumerating all packages published to nuget.org, ordered by when
the package was published. Scenarios requiring this kind of query against nuget.org vary widely:

- Replicating nuget.org entirely
- Detecting when new versions of packages are released
- Finding packages that depend on your package

The legacy way of doing this typically depended on sorting the OData package entity by a timestamp and paging across
the massive result set using the `$skip` and `$top` (page size) query options. Unfortunately, this approach has some drawbacks:

- Possibility of missing packages, since the queries run against data whose order often changes
- Slow query response time, since the queries are not optimized (the most optimized queries are ones that support a
  mainline scenario for the official NuGet client)
- Use of a deprecated and undocumented API, meaning that future support for such queries is not guaranteed
- Inability to replay history in the exact order in which it happened

The following guide addresses these scenarios in a more reliable and future-proof way.

## Overview

At the center of this guide is a resource in the [NuGet API](../../api/overview.md) called the **catalog**. The catalog
is an append-only API that lets the caller see the full history of packages added to, modified in, and deleted from
nuget.org. If you are interested in all, or even a subset, of the packages published to nuget.org, the catalog is a great way
to stay up to date with the set of currently available packages as time goes on.

This guide is a high-level walk-through. If you are interested in the fine-grained details of the
catalog, see its [API reference document](../../api/catalog-resource.md).

You can implement the following steps in any programming language. If you want a full running sample,
take a look at the [C# sample](#c-sample-code) described below.

Otherwise, follow the guide below to build a reliable catalog reader.

## Initialize a cursor

The first step in building a reliable catalog reader is implementing a cursor. For full details about the design of a
catalog cursor, see the [catalog reference document](../../api/catalog-resource.md#cursor). In short, a cursor is the
point in time up to which you have processed events in the catalog. Events in the catalog represent package publishes
and other package changes. If you care about all packages ever published to NuGet (since the beginning of time), you
would initialize your cursor to a "minimum value" timestamp (e.g. `DateTime.MinValue` in .NET). If you care only about
packages published from now on, you would use the current timestamp as your initial cursor value.

For this guide, we'll initialize our cursor to a timestamp one hour ago. For now, just keep that timestamp in memory.

```cs
// Start one hour in the past. Catalog commit timestamps are in UTC, so the cursor uses UTC too.
DateTime cursor = DateTime.UtcNow.AddHours(-1);
```

## Determine catalog index URL

The location of every resource (endpoint) in the NuGet API should be discovered using the
[service index](../../api/service-index.md). Since this guide focuses on nuget.org, we'll use nuget.org's service
index.

```
GET https://api.nuget.org/v3/index.json
```

The service index is a JSON document listing all of the resources on nuget.org. Look for the resource whose
`@type` property value is `Catalog/3.0.0`. The associated `@id` property value is the URL of the catalog index itself.
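As a minimal sketch of this lookup, the following C# helper scans the service index JSON for the `Catalog/3.0.0` resource and returns its `@id`. It assumes .NET Core 3.0 or later, where `System.Text.Json` is built in. The full C# sample described below uses NuGet.Protocol and Newtonsoft.Json instead. The `ServiceIndex` class and `FindCatalogIndexUrl` method are hypothetical names used only for illustration.

```cs
using System;
using System.Text.Json;

// Hypothetical helper: finds the catalog index URL in a NuGet V3 service index document.
static class ServiceIndex
{
    public static string FindCatalogIndexUrl(string serviceIndexJson)
    {
        using JsonDocument document = JsonDocument.Parse(serviceIndexJson);

        // Each entry in "resources" has an "@id" (URL) and an "@type" (resource name and version).
        foreach (JsonElement resource in document.RootElement.GetProperty("resources").EnumerateArray())
        {
            if (resource.TryGetProperty("@type", out JsonElement type) &&
                type.ValueKind == JsonValueKind.String &&
                type.GetString() == "Catalog/3.0.0")
            {
                return resource.GetProperty("@id").GetString();
            }
        }

        throw new InvalidOperationException("The service index does not contain a Catalog/3.0.0 resource.");
    }
}
```

In practice, you would pass in the string returned by `HttpClient.GetStringAsync("https://api.nuget.org/v3/index.json")`.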

## Find new catalog leaves

Using the `@id` property value found in the previous step, download the catalog index:

```
GET https://api.nuget.org/v3/catalog0/index.json
```

Deserialize the [catalog index](../../api/catalog-resource.md#catalog-index). Filter out all
[catalog page objects](../../api/catalog-resource.md#catalog-page-object-in-the-index) with a `commitTimeStamp` less than
or equal to your current cursor value.

For each remaining catalog page, download the full document using its `@id` property:

```
GET https://api.nuget.org/v3/catalog0/page2926.json
```

Deserialize the [catalog page](../../api/catalog-resource.md#catalog-page). Filter out all
[catalog leaf objects](../../api/catalog-resource.md#catalog-item-object-in-a-page) with a `commitTimeStamp` less than
or equal to your current cursor value.

After you have downloaded all of the catalog pages that were not filtered out, you will have a set of catalog leaf objects
representing packages that have been published, unlisted, listed, or deleted between your cursor timestamp
and now.
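The filtering described above can be sketched in C# as follows. This is a minimal illustration under stated assumptions, not the official sample's code. It assumes `System.Text.Json` (.NET Core 3.0 or later) and C# 9 records. `CatalogItem`, `CatalogReader`, and the `getJson` parameter are hypothetical names. The catalog index and each catalog page both have an `items` array whose entries carry `@id` and `commitTimeStamp`, so the same filter works at both levels. Downloading is passed in as a function so that the logic can be exercised without network access.

```cs
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.Json;
using System.Threading.Tasks;

// One entry from an "items" array: a page (in the catalog index) or a leaf (in a catalog page).
// PackageId and PackageVersion are null for pages, which have no nuget:id or nuget:version.
sealed record CatalogItem(string Url, string Type, DateTimeOffset CommitTimeStamp, string PackageId, string PackageVersion);

static class CatalogReader
{
    // Parses the "items" array of a catalog index or catalog page and keeps only the items
    // whose commitTimeStamp is strictly later than the cursor.
    public static List<CatalogItem> GetItemsAfter(string json, DateTimeOffset cursor)
    {
        using JsonDocument document = JsonDocument.Parse(json);
        var newItems = new List<CatalogItem>();
        foreach (JsonElement item in document.RootElement.GetProperty("items").EnumerateArray())
        {
            DateTimeOffset commitTimeStamp = DateTimeOffset.Parse(
                item.GetProperty("commitTimeStamp").GetString(),
                CultureInfo.InvariantCulture,
                DateTimeStyles.AssumeUniversal);
            if (commitTimeStamp <= cursor)
            {
                continue; // Already processed in an earlier run.
            }

            newItems.Add(new CatalogItem(
                item.GetProperty("@id").GetString(),
                item.GetProperty("@type").GetString(),
                commitTimeStamp,
                item.TryGetProperty("nuget:id", out JsonElement id) ? id.GetString() : null,
                item.TryGetProperty("nuget:version", out JsonElement version) ? version.GetString() : null));
        }
        return newItems;
    }

    // Walks index -> pages -> leaves, downloading only the pages that can contain new leaves.
    public static async Task<List<CatalogItem>> GetNewLeavesAsync(
        Func<string, Task<string>> getJson, string catalogIndexUrl, DateTimeOffset cursor)
    {
        var leaves = new List<CatalogItem>();
        string indexJson = await getJson(catalogIndexUrl);
        foreach (CatalogItem page in GetItemsAfter(indexJson, cursor))
        {
            string pageJson = await getJson(page.Url);
            leaves.AddRange(GetItemsAfter(pageJson, cursor));
        }

        // Sort so that events are replayed in the order they were committed.
        return leaves.OrderBy(leaf => leaf.CommitTimeStamp).ToList();
    }
}
```

In a real reader, `getJson` could be an existing `HttpClient` instance's `GetStringAsync` method. `cursor` could be the value from the first step, because a UTC `DateTime` converts implicitly to `DateTimeOffset`.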

## Process catalog leaves

At this point, you can perform any custom processing you'd like on the catalog items. If all you need is the ID and
version of the package, you can inspect the `nuget:id` and `nuget:version` properties on the catalog item objects found
in the pages. Make sure to check the `@type` property to determine whether the catalog item concerns an existing package or a
deleted package.
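For example, a replication scenario needs to treat the two item types differently. The following sketch uses a hypothetical `PackageSetTracker` helper to keep an in-memory set of `id/version` keys up to date. A `nuget:PackageDetails` item adds the package, and a `nuget:PackageDelete` item removes it. `nuget:PackageDetails` items also cover changes to existing packages, such as listing or unlisting. The keys are lower-cased on the assumption that package IDs and versions should be compared case-insensitively, as they are on nuget.org.

```cs
using System;
using System.Collections.Generic;

// Hypothetical helper: mirrors which package versions currently exist, based on catalog items.
static class PackageSetTracker
{
    // Applies one catalog item (its @type, nuget:id, and nuget:version) to the set of known packages.
    public static void Apply(ISet<string> packages, string type, string id, string version)
    {
        string key = $"{id}/{version}".ToLowerInvariant();
        switch (type)
        {
            case "nuget:PackageDetails":
                packages.Add(key);    // New package, or a change (e.g. listing/unlisting) to an existing one.
                break;
            case "nuget:PackageDelete":
                packages.Remove(key); // The package no longer exists.
                break;
            default:
                throw new ArgumentException($"Unexpected catalog item type '{type}'.", nameof(type));
        }
    }
}
```

Apply the items in ascending `commitTimeStamp` order, so that a delete that follows a publish leaves the package absent.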

If you are interested in the metadata about the package (such as the description, dependencies, .nupkg size, etc.), you
can fetch the [catalog leaf document](../../api/catalog-resource.md#catalog-leaf) using the `@id` property:

```
GET https://api.nuget.org/v3/catalog0/data/2015.02.01.11.18.40/windowsazure.storage.1.0.0.json
```

This document has all of the metadata included in the
[package metadata resource](../../api/registration-base-url-resource.md), and more!
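As a sketch, reading a few of these fields from a `PackageDetails` leaf document might look like the following. The property names `id`, `version`, `description`, and `packageSize` reflect our understanding of the catalog leaf schema. Check them against the [catalog leaf reference](../../api/catalog-resource.md#catalog-leaf) before relying on them. `CatalogLeaf.ReadDetails` is a hypothetical helper, and it assumes `System.Text.Json` (.NET Core 3.0 or later).

```cs
using System.Text.Json;

// Hypothetical helper: reads a few fields from a PackageDetails catalog leaf document.
static class CatalogLeaf
{
    public static (string Id, string Version, string Description, long PackageSize) ReadDetails(string leafJson)
    {
        using JsonDocument document = JsonDocument.Parse(leafJson);
        JsonElement root = document.RootElement;
        return (
            root.GetProperty("id").GetString(),
            root.GetProperty("version").GetString(),
            // Treat a missing description as null rather than failing.
            root.TryGetProperty("description", out JsonElement description) ? description.GetString() : null,
            root.GetProperty("packageSize").GetInt64());
    }
}
```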

This step is where you implement your custom logic. The other steps in this guide are implemented in much the
same way no matter what you are doing with the catalog leaves.

### Downloading the .nupkg

If you are interested in downloading the .nupkg files for packages found in the catalog, you can use the
[package content resource](../../api/package-base-address-resource.md). However, note that there is a short delay
between when a package appears in the catalog and when it is available in the package content resource. Therefore, if
you encounter `404 Not Found` when attempting to download a .nupkg for a package that you found in the catalog, simply
retry a short time later. Fixing this delay is tracked by GitHub issue
[NuGet/NuGetGallery#3455](https://github.com/NuGet/NuGetGallery/issues/3455).
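A minimal sketch of the download-and-retry logic follows. It assumes the package content resource's URL template of `{base}/{lower id}/{lower version}/{lower id}.{lower version}.nupkg`. The base URL is the `@id` of the `PackageBaseAddress/3.0.0` resource in the service index, which on nuget.org is `https://api.nuget.org/v3-flatcontainer/`. It also assumes that the version taken from the catalog is already in normalized form. `NupkgDownloader`, its methods, and any attempt count or delay you pass in are hypothetical choices. nuget.org does not recommend these values.

```cs
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for downloading .nupkg files from the package content resource.
static class NupkgDownloader
{
    // Builds {base}/{lower id}/{lower version}/{lower id}.{lower version}.nupkg
    public static string GetNupkgUrl(string packageBaseAddress, string id, string version)
    {
        string lowerId = id.ToLowerInvariant();
        string lowerVersion = version.ToLowerInvariant();
        return $"{packageBaseAddress.TrimEnd('/')}/{lowerId}/{lowerVersion}/{lowerId}.{lowerVersion}.nupkg";
    }

    // Retries on 404 Not Found, because a package can show up in the catalog shortly before
    // its .nupkg is available. Any other failure, or a 404 on the last attempt, throws.
    public static async Task<byte[]> DownloadWithRetryAsync(
        HttpClient httpClient, string url, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; ; attempt++)
        {
            using HttpResponseMessage response = await httpClient.GetAsync(url);
            if (response.StatusCode == HttpStatusCode.NotFound && attempt < maxAttempts)
            {
                await Task.Delay(delay);
                continue;
            }

            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsByteArrayAsync();
        }
    }
}
```

For example, `DownloadWithRetryAsync(httpClient, url, maxAttempts: 5, delay: TimeSpan.FromSeconds(30))` throws an `HttpRequestException` if the fifth attempt still returns `404`.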

## Move the cursor forward

Once you have successfully processed the catalog items, you need to determine the new cursor value to save. To do this,
find the maximum (chronologically latest) `commitTimeStamp` of all catalog items that you processed. This is your new
cursor value. Save it to a persistent store, such as a database, file system, or blob storage. When you want to get more
catalog items, simply start from the [first step](#initialize-a-cursor) by initializing your cursor value from this
persistent store.
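One way to sketch this step is a file-backed cursor. The hypothetical `FileCursor` helper below stores the cursor as a round-trip (`"O"` format) timestamp in a local file. When no file exists yet, it returns a default value, which matches the first step. It advances the cursor to the latest processed `commitTimeStamp`. The file location is an assumption of the sketch, and a database or blob storage would follow the same pattern.

```cs
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical file-backed cursor store.
static class FileCursor
{
    // Returns the saved cursor, or defaultValue when no cursor has been saved yet.
    public static DateTimeOffset Load(string path, DateTimeOffset defaultValue)
    {
        if (!File.Exists(path))
        {
            return defaultValue;
        }
        string text = File.ReadAllText(path).Trim();
        return DateTimeOffset.ParseExact(text, "O", CultureInfo.InvariantCulture, DateTimeStyles.None);
    }

    // Writes the cursor in round-trip ("O") format so that no precision is lost.
    public static void Save(string path, DateTimeOffset cursor) =>
        File.WriteAllText(path, cursor.ToString("O", CultureInfo.InvariantCulture));

    // The new cursor is the latest commitTimeStamp among the processed items.
    // If nothing was processed, the cursor stays where it was.
    public static DateTimeOffset Advance(DateTimeOffset current, IEnumerable<DateTimeOffset> processedCommitTimeStamps) =>
        processedCommitTimeStamps.Aggregate(current, (latest, next) => next > latest ? next : latest);
}
```

Call `Save` only after every item in the batch has been processed successfully. If processing fails, the previously saved cursor stays in place.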

If your application throws an exception or faults, don't move the cursor forward. Moving the cursor forward means
that you will never need to process catalog items before the cursor again.

If, for some reason, you have a bug in how you process catalog leaves, you can simply move your cursor backward in time
and allow your code to reprocess the old catalog items.

## C# sample code

Since the catalog is a set of JSON documents, it can be consumed from any programming language that has
an HTTP client and a JSON deserializer.

For improved understanding and convenience, we have made available a small sample written in C# that demonstrates how to
read from the catalog. The solution file requires Visual Studio 2017. The project targets `netcoreapp2.0` and depends on
[NuGet.Protocol 4.4.0](https://www.nuget.org/packages/NuGet.Protocol/4.4.0) (for resolving the service index) and
[Newtonsoft.Json 9.0.1](https://www.nuget.org/packages/Newtonsoft.Json/9.0.1) (for JSON deserialization).

Simply clone the [NuGet/Samples](https://github.com/NuGet/Samples) repository on GitHub, then restore, build, and run the
project under the `CatalogReaderExample` directory.

```
git clone https://github.com/NuGet/Samples.git
```

The main logic of the code is in the
[`Program.cs`](https://github.com/NuGet/Samples/blob/master/CatalogReaderExample/CatalogReaderExample/Program.cs)
file.

### Sample output

```
No cursor found. Defaulting to 11/2/2017 9:41:28 PM.
Fetched catalog index https://api.nuget.org/v3/catalog0/index.json.
Fetched catalog page https://api.nuget.org/v3/catalog0/page2935.json.
Processing 69 catalog leaves.
11/2/2017 9:32:35 PM: DotVVM.Compiler.Light 1.1.7 (type is nuget:PackageDetails)
11/2/2017 9:32:35 PM: Momentum.Pm.Api 5.12.181-beta (type is nuget:PackageDetails)
11/2/2017 9:32:44 PM: Momentum.Pm.PortalApi 5.12.181-beta (type is nuget:PackageDetails)
11/2/2017 9:35:14 PM: Genesys.Extensions.Standard 3.17.11.40 (type is nuget:PackageDetails)
11/2/2017 9:35:14 PM: Genesys.Extensions.Core 3.17.11.40 (type is nuget:PackageDetails)
11/2/2017 9:35:14 PM: Halforbit.DataStores.FileStores.Serialization.Bond 1.0.4 (type is nuget:PackageDetails)
11/2/2017 9:35:14 PM: Halforbit.DataStores.FileStores.AmazonS3 1.0.4 (type is nuget:PackageDetails)
11/2/2017 9:35:14 PM: Halforbit.DataStores.DocumentStores.DocumentDb 1.0.6 (type is nuget:PackageDetails)
11/2/2017 9:35:14 PM: Halforbit.DataStores.FileStores.BlobStorage 1.0.5 (type is nuget:PackageDetails)
...
11/2/2017 10:23:54 PM: Cake.GitPackager 0.1.2 (type is nuget:PackageDetails)
11/2/2017 10:23:54 PM: UtilPack.NuGet 2.0.0 (type is nuget:PackageDetails)
11/2/2017 10:23:54 PM: UtilPack.NuGet.AssemblyLoading 2.0.0 (type is nuget:PackageDetails)
11/2/2017 10:26:26 PM: UtilPack.NuGet.Deployment 2.0.0 (type is nuget:PackageDetails)
11/2/2017 10:26:26 PM: UtilPack.NuGet.Common.MSBuild 2.0.0 (type is nuget:PackageDetails)
11/2/2017 10:26:36 PM: InstaClient 1.0.2 (type is nuget:PackageDetails)
11/2/2017 10:26:36 PM: SecureStrConvertor.VARUN_RUSIYA 1.0.0.5 (type is nuget:PackageDetails)
Writing cursor value: 11/2/2017 10:26:36 PM.
```