
LLAMA3 on Replicate

Using LLAMA3 on Replicate with Foyle

    What You’ll Learn

    How to configure Foyle to use LLAMA3 hosted on Replicate

    Prerequisites

    1. You need a Replicate account

    Set Up Foyle To Use LLAMA3 on Replicate

    1. Get an API Token from Replicate and save it to a file
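
    For example, assuming you copied the token from your Replicate account page (the file path below is only an illustration; any path works as long as the next step points at it):

    echo -n "<your-replicate-api-token>" > $HOME/secrets/replicate.key
    chmod 600 $HOME/secrets/replicate.key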

    2. Configure Foyle to use this API key

    foyle config set replicate.apiKeyFile=/path/to/your/key/file
    
    3. Configure Foyle to use LLAMA3 hosted on Replicate by setting the appropriate model deployment and provider

    foyle config set agent.model=meta/meta-llama-3-8b-instruct
    foyle config set agent.modelProvider=replicate
    

    How It Works

    Foyle uses two models:

    • A chat model to generate completions
    • An embedding model to compute embeddings for retrieval augmented generation (RAG)

    Replicate provides hosted versions of meta/meta-llama-3-8b-instruct and meta/meta-llama-3-70b-instruct, which are chat models. Notably, these models are kept warm, so Replicate doesn’t need to boot up new instances when you send predictions. Replicate also provides an OpenAI-compatible proxy, so you can use the OpenAI APIs to generate responses.
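
    As a rough sketch, a chat completion request through that proxy could look like the following. The base URL is an assumption here, so check Replicate’s documentation for the current OpenAI-compatible endpoint; the model name matches the deployment configured above.

    # Sketch only: the proxy base URL is an assumption, not confirmed by this guide.
    curl https://openai-proxy.replicate.com/v1/chat/completions \
      -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
            "model": "meta/meta-llama-3-8b-instruct",
            "messages": [{"role": "user", "content": "Write a bash command to list files"}]
          }'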

    Unfortunately, Replicate doesn’t provide hosted, always-warm versions of embedding models, so Foyle continues to use OpenAI for the embedding model.
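
    Because embeddings still go through OpenAI, you also need an OpenAI API key configured alongside your Replicate token, following the same apiKeyFile pattern shown above:

    foyle config set openai.apiKeyFile=/path/to/your/openai/key/file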