LLAMA3 on Replicate

Using LLAMA3 on Replicate with Foyle

What You’ll Learn

How to configure Foyle to use LLAMA3 hosted on Replicate

Prerequisites

  1. You need a Replicate account

Set Up Foyle To Use LLAMA3 on Replicate

  1. Get an API Token from Replicate and save it to a file

  2. Configure Foyle to use this API key

foyle config set replicate.apiKeyFile=/path/to/your/key/file
  3. Configure Foyle to use LLAMA3 hosted on Replicate by setting the model and model provider

foyle config set agent.model=meta/meta-llama-3-8b-instruct
foyle config set agent.modelProvider=replicate
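Putting the steps above together, a complete setup might look like the following sketch; the key-file path is hypothetical, so substitute your own:

```shell
# Save your Replicate API token to a file (path is hypothetical).
echo -n "$REPLICATE_API_TOKEN" > "$HOME/.config/foyle/replicate-api-key"
chmod 600 "$HOME/.config/foyle/replicate-api-key"

# Point Foyle at the key file and the Replicate-hosted LLAMA3 model.
foyle config set replicate.apiKeyFile="$HOME/.config/foyle/replicate-api-key"
foyle config set agent.model=meta/meta-llama-3-8b-instruct
foyle config set agent.modelProvider=replicate
```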

How It Works

Foyle uses two models:

  • A Chat model to generate completions
  • An embedding model to compute embeddings for RAG

Replicate provides hosted versions of meta/meta-llama-3-8b-instruct and meta/meta-llama-3-70b-instruct, which are chat models. Notably, these models are kept warm, so when you send predictions Replicate doesn’t need to boot up new instances. Replicate also provides an OpenAI proxy so you can use the OpenAI APIs to generate responses.
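As a sketch of what happens under the hood, you can call the proxy directly with the OpenAI chat completions API. The base URL below is an assumption; verify it against Replicate’s current OpenAI-compatibility documentation, and set REPLICATE_API_TOKEN to your token first:

```shell
# Sketch: an OpenAI-style chat completion against Replicate's proxy.
# Assumptions: the proxy base URL, and that $REPLICATE_API_TOKEN holds your token.
curl https://openai-proxy.replicate.com/v1/chat/completions \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/meta-llama-3-8b-instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```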

Unfortunately, Replicate doesn’t provide hosted, always-warm versions of embedding models. So Foyle continues to use OpenAI for the embedding models.
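Because embeddings still go through OpenAI, you also need an OpenAI API key configured. Assuming Foyle’s usual openai.apiKeyFile setting, that looks like:

```shell
# Assumption: openai.apiKeyFile is Foyle's standard setting for the OpenAI key.
foyle config set openai.apiKeyFile=/path/to/your/openai/key/file
```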


Last modified July 8, 2024: Support using Replicate (#163) (aa23b80)