Detailed sizing

Content Intelligence Installation Guide

Version: 2.1.x, 2.0.x
Part Number: MK-HCI002-15
If you are installing HCI to run Hitachi Content Search, you should size your system based on the number of documents you need to index and the rate at which you need documents to be processed and indexed.
Important: This sizing guide details the resources required for a system with an Index Protection Level (IPL) of 1. To size a system for a higher IPL, multiply the recommended values by the IPL: double them for IPL 2, triple them for IPL 3, and so on.
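The IPL rule is plain multiplication. The following Python sketch applies it to the total RAM across the instances running the Index service, using the 4-instance values from the tables in step 2. It assumes (the guide does not say so explicitly) that "the recommended values" means total Index service resources; the function name `total_index_ram_gb` is hypothetical and not part of HCI.

```python
def total_index_ram_gb(index_instances: int, ram_per_instance_gb: int, ipl: int = 1) -> int:
    """Total RAM across the instances running the Index service, scaled by IPL.

    Assumption: the IPL multiplier applies to the total Index service RAM.
    """
    if ipl < 1:
        raise ValueError("IPL must be 1 or greater")
    return index_instances * ram_per_instance_gb * ipl


# 4-instance system: 3 instances run the Index service, 64 GB RAM each.
for ipl in (1, 2, 3):
    print(ipl, total_index_ram_gb(3, 64, ipl))
# 1 192
# 2 384
# 3 576
```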

To determine the system size that you need:

  1. Determine how many documents you need to index.
  2. Based on the number of documents you want to index, use the following tables to determine:
    • How many instances you need
    • How much RAM each instance needs
    • The Index service configuration needed to support indexing the number of documents you want
    Total documents to be indexed:       15 million   25 million   50 million (a)
    Instance RAM needed (for each
    instance running the Index service): 16 GB        32 GB        64 GB

    System configuration:

    Total instances required: 1 (b)

    Instances running the Index service: 1

    Index service configuration required:

    • Shards per index: 1
    • Index Protection Level per index: 1
    • Container memory: 200 MB greater than the heap setting
    • Heap setting: depends on instance RAM
      Instance RAM   Heap setting
      16 GB          1800m
      32 GB          9800m
      64 GB          25800m

    (a) Contact Hitachi Vantara for guidance before trying to index this many documents on this number of instances. At this scale, your documents and required configuration settings can greatly affect the number of documents you can index.

    (b) Single-instance systems are suitable for testing and development, but not for production use.

    Total documents to be indexed:       45 million   75 million   150 million (a)
    Instance RAM needed (for each
    instance running the Index service): 16 GB        32 GB        64 GB

    System configuration:

    Total instances required: 4

    Instances running the Index service: 3

    Index service configuration required:

    • Shards per index: 3
    • Index Protection Level per index: 1
    • Container memory: 200 MB greater than the heap setting
    • Heap setting: depends on instance RAM
      Instance RAM   Heap setting
      16 GB          1800m
      32 GB          9800m
      64 GB          25800m

    (a) Contact Hitachi Vantara for guidance before trying to index this many documents on this number of instances. At this scale, your documents and required configuration settings can greatly affect the number of documents you can index.

    Total documents to be indexed:       75 million   125 million   250 million (a)
    Instance RAM needed (for each
    instance running the Index service): 16 GB        32 GB         64 GB

    System configuration:

    Total instances required: 8

    Instances running the Index service: 5

    Index service configuration required:

    • Shards per index: 5
    • Index Protection Level per index: 1
    • Container memory: 200 MB greater than the heap setting
    • Heap setting (b): depends on instance RAM
      Instance RAM   Heap setting
      16 GB          7800m
      32 GB          15800m
      64 GB          31000m

    (a) Contact Hitachi Vantara for guidance before trying to index this many documents on this number of instances. At this scale, your documents and required configuration settings can greatly affect the number of documents you can index.

    (b) On an 8-instance system, the Index service should be the only service running on each of its 5 instances. Isolating the Index service this way lets you allocate more heap space to it than you can on a single-instance or 4-instance system.

    Total documents to be indexed:       195 million   325 million   650 million (a)
    Instance RAM needed (for each
    instance running the Index service): 16 GB         32 GB         64 GB

    System configuration:

    Total instances required: 16

    Instances running the Index service: 13

    Index service configuration required:

    • Shards per index: 13
    • Index Protection Level per index: 1
    • Container memory: 200 MB greater than the heap setting
    • Heap setting (b): depends on instance RAM
      Instance RAM   Heap setting
      16 GB          7800m
      32 GB          15800m
      64 GB          31000m

    (a) Contact Hitachi Vantara for guidance before trying to index this many documents on this number of instances. At this scale, your documents and required configuration settings can greatly affect the number of documents you can index.

    (b) On a 16-instance system, the Index service should be the only service running on each of its 13 instances. Isolating the Index service this way lets you allocate more heap space to it than you can on a single-instance or 4-instance system.

    For example, if you need to index up to 150 million documents, you need at minimum a 4-instance system with 64 GB RAM per instance.
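The four sizing tables can be read as a lookup from document count to the smallest configuration that holds it. The Python sketch below encodes the table values; the names `IndexConfig`, `smallest_config`, and `TABLES` are hypothetical. It tries fewer instances before less RAM (an assumed tie-break; the guide does not state one), flags the 64 GB column that footnote (a) applies to, and treats the heap suffix "m" as megabytes when computing container memory.

```python
from dataclasses import dataclass
from typing import Optional

M = 1_000_000

# Heap settings per instance RAM (GB), copied from the sizing tables.
HEAP_SHARED = {16: "1800m", 32: "9800m", 64: "25800m"}      # 1- and 4-instance systems
HEAP_DEDICATED = {16: "7800m", 32: "15800m", 64: "31000m"}  # 8- and 16-instance systems

# (total instances, instances running the Index service, {RAM GB: max documents}, heaps)
TABLES = [
    (1, 1, {16: 15 * M, 32: 25 * M, 64: 50 * M}, HEAP_SHARED),
    (4, 3, {16: 45 * M, 32: 75 * M, 64: 150 * M}, HEAP_SHARED),
    (8, 5, {16: 75 * M, 32: 125 * M, 64: 250 * M}, HEAP_DEDICATED),
    (16, 13, {16: 195 * M, 32: 325 * M, 64: 650 * M}, HEAP_DEDICATED),
]


@dataclass(frozen=True)
class IndexConfig:
    total_instances: int
    index_instances: int  # equals the shards per index in every table
    ram_gb: int           # RAM for each instance running the Index service
    heap: str             # heap setting, e.g. "9800m"
    contact_vendor: bool  # footnote (a): contact Hitachi Vantara first

    @property
    def container_memory_mb(self) -> int:
        # Container memory is 200 MB greater than the heap setting
        # (assumes the "m" suffix means megabytes, as in JVM heap flags).
        return int(self.heap.rstrip("m")) + 200


def smallest_config(documents: int) -> Optional[IndexConfig]:
    """Return the first table entry that can index `documents`, trying fewer
    instances first and then less RAM; None if no table covers the count."""
    for total, index_instances, capacity, heaps in TABLES:
        for ram_gb in sorted(capacity):
            if documents <= capacity[ram_gb]:
                return IndexConfig(total, index_instances, ram_gb, heaps[ram_gb],
                                   contact_vendor=(ram_gb == 64))
    return None


cfg = smallest_config(150 * M)
print(cfg.total_instances, cfg.ram_gb, cfg.heap, cfg.container_memory_mb, cfg.contact_vendor)
# 4 64 25800m 26000 True
```

This reproduces the guide's example: 150 million documents map to a 4-instance system with 64 GB RAM per instance.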

  3. Determine how fast you need to index documents, in documents per second.

    For example:

    • To index 100 million documents in 2 days, you need an indexing rate of about 579 documents per second (100,000,000 ÷ 172,800 seconds, rounded up).
    • To continuously index 1 million documents every day, you need an indexing rate of 12 documents per second.
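The rate calculation divides the document count by the time window in seconds. This Python sketch (the function name `required_rate` is hypothetical) rounds up so that the target is actually met; for the first example, 100,000,000 ÷ 172,800 seconds ≈ 578.7, which rounds up to 579.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def required_rate(documents: int, days: float) -> int:
    """Documents per second needed to index `documents` within `days`,
    rounded up so the deadline is met."""
    return math.ceil(documents / (days * SECONDS_PER_DAY))


print(required_rate(100_000_000, 2))  # 579
print(required_rate(1_000_000, 1))    # 12
```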
  4. Determine the base indexing rate for your particular dataset and processing pipelines:
    1. Install a single-instance HCI system that has the minimum required hardware resources.
    2. Run a workflow that uses the pipelines you want on a representative subset of your data.
    3. Use the workflow task details to determine the rate of documents processed per second.
  5. To determine the number of cores you need per instance, replace Base rate in this table with the rate you determined in step 4.
    Number of instances    Cores per instance
    you need               4 (minimum required)   8 (recommended)
    1                      Base rate              70% Base rate
    4                      300% Base rate         500% Base rate
    8                      600% Base rate         900% Base rate
    More than 8            Contact Hitachi Vantara for guidance

    For example, if you had previously determined that:

    • You need a 4-instance system.
    • You need to process 500 documents per second.
    • The base processing rate for your data and pipelines is 100 documents per second.

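    You need 8 cores per instance. With 4 cores per instance, a 4-instance system processes about 300% × 100 = 300 documents per second, which is not enough; with 8 cores per instance, it processes about 500% × 100 = 500 documents per second.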

  6. Multiply the number of instances you need by the number of cores per instance to determine the total number of cores for your system.
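Steps 5 and 6 can be sketched together in Python. The rate multipliers are copied from the cores table; the function names are hypothetical. Any instance count the table does not list (including more than 8 instances) returns None, which corresponds to the table's "Contact Hitachi Vantara for guidance" row.

```python
from typing import Optional

# Estimated indexing rate as a multiple of the base rate, keyed by the
# number of instances and then by cores per instance (from the cores table).
RATE_MULTIPLIER = {
    1: {4: 1.0, 8: 0.7},
    4: {4: 3.0, 8: 5.0},
    8: {4: 6.0, 8: 9.0},
}


def cores_per_instance(instances: int, base_rate: float,
                       target_rate: float) -> Optional[int]:
    """Step 5: smallest listed core count whose estimated rate meets the target.

    Returns None when neither 4 nor 8 cores is enough, or when the instance
    count is not in the table (e.g. more than 8 instances).
    """
    options = RATE_MULTIPLIER.get(instances, {})
    for cores in sorted(options):
        if base_rate * options[cores] >= target_rate:
            return cores
    return None


def total_cores(instances: int, cores: int) -> int:
    """Step 6: total cores for the system."""
    return instances * cores


# Worked example from step 5: 4 instances, base rate 100 docs/s, target 500 docs/s.
cores = cores_per_instance(4, base_rate=100, target_rate=500)
print(cores, total_cores(4, cores))  # 8 32
```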
  7. After your system is installed, configure it with the index settings you determined in step 2.

    For information on index shards, Index Protection Level, and moving the Index service, see the Administrator Help, which is available from the Admin App.