3.5. File Storage Considerations

File storage requirements vary considerably depending on the type of user activity on the sites, the number of courses and course copies hosted in the database, the number and size of files uploaded to the courses, and other factors.

Static files, such as user-generated files, generated reports, and user profile photos, are best hosted on a third-party storage service (for example, Amazon S3 buckets). If this is not possible, they can be managed within the local infrastructure on a disk shared by the application machines, taking into account what that entails: mounting an additional disk on the NFS instance, constant monitoring, and disk snapshots, among other tasks.

Keep in mind that video files are considerably larger than most other files and require special treatment. In most cases, video files are left out of file storage planning because they are handled by a specialized video delivery platform or service such as YouTube.

In general, the storage requirements can be split by server role, as follows.

Worker K8s nodes: Kubernetes worker nodes must download the Docker images that contain the code of the workloads they host. These images tend to be around 4 GB in size, and Docker keeps a cache of recently downloaded images. For this reason, worker nodes should have at least 40 GB of storage in the root (/) partition, although a more generous 80 GB allocation is recommended.
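As a rough illustration of why 80 GB leaves more room than 40 GB, the following Python sketch estimates how many ~4 GB images fit in the root partition after reserving space for the operating system. The 10 GB OS overhead, the helper names, and the assumption that every image is the same size are illustrative only; in practice Docker images share layers, so the real number of cached images will differ.

```python
MIN_ROOT_GB = 40          # minimum from this guide
RECOMMENDED_ROOT_GB = 80  # recommended allocation from this guide


def cached_images_that_fit(root_gb: float, image_gb: float = 4.0,
                           os_overhead_gb: float = 10.0) -> int:
    """How many images of `image_gb` fit once OS space is reserved.

    `os_overhead_gb` is an assumed figure, not one given in this guide.
    """
    usable_gb = root_gb - os_overhead_gb
    if usable_gb <= 0:
        return 0
    return int(usable_gb // image_gb)


def classify_root(root_gb: float) -> str:
    """Label a planned root partition size against this guide's thresholds."""
    if root_gb < MIN_ROOT_GB:
        return "below minimum"
    if root_gb < RECOMMENDED_ROOT_GB:
        return "minimum"
    return "recommended"
```

Under these assumptions, a 40 GB root partition holds about 7 cached images and an 80 GB partition about 17.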

MongoDB instances: course content is stored here, and this database tends to grow larger than the user data. This is because courses often rely on a large number of assets, and the content datastore also keeps a record of both the current and previous versions of each course. We recommend 40 GB at the root (/) partition and 100 GB to 200 GB at the /edx/var/mongo/mongodb mount point. The exact amount depends on how aggressively older versions of the content are pruned; pruning is normally done once a week.
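Because growth at this mount point is driven largely by retained course versions, a simple usage alert can signal when pruning has fallen behind. The Python sketch below flags the mount once it passes a fill threshold; the 80% threshold and the function names are assumptions for illustration, not values from this guide.

```python
import shutil

MONGO_DATA_PATH = "/edx/var/mongo/mongodb"  # mount point named in this guide
WARN_USAGE_RATIO = 0.80                     # assumed alert threshold


def needs_pruning_review(total_bytes: int, used_bytes: int,
                         warn_ratio: float = WARN_USAGE_RATIO) -> bool:
    """True when the data mount is at or above `warn_ratio` full."""
    if total_bytes <= 0:
        return False
    return used_bytes / total_bytes >= warn_ratio


def check_mongo_mount(path: str = MONGO_DATA_PATH) -> bool:
    """Apply the check to a real mount (intended to run on a MongoDB host)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return needs_pruning_review(usage.total, usage.used)
```

A scheduled job on each MongoDB host could call `check_mongo_mount()` and raise an alert when it returns `True`.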

At least one instance in the MongoDB cluster must have enough free space for creating backups. The standard recommendation is 200 GB mounted at /mnt.
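Before each backup run it helps to confirm that /mnt still has room for the dump. This Python sketch assumes the dump can be larger than the data on disk (for example, mongodump writes uncompressed BSON by default, while the WiredTiger storage engine compresses data on disk), so it applies an assumed safety factor of 1.5. That factor, the use of the data mount's used space as a stand-in for database size, and the helper names are illustrative, not part of this guide.

```python
import shutil

BACKUP_PATH = "/mnt"                        # backup mount point named in this guide
MONGO_DATA_PATH = "/edx/var/mongo/mongodb"  # data mount point named in this guide
DUMP_FACTOR = 1.5                           # assumed dump size / on-disk data size


def backup_has_room(free_bytes: int, data_bytes: int,
                    dump_factor: float = DUMP_FACTOR) -> bool:
    """True if free space covers the data size scaled by `dump_factor`."""
    return free_bytes >= data_bytes * dump_factor


def check_backup_room() -> bool:
    """Compare free space at /mnt with the space used at the data mount.

    The data mount's used bytes approximate the database size (assumption).
    """
    free = shutil.disk_usage(BACKUP_PATH).free
    used = shutil.disk_usage(MONGO_DATA_PATH).used
    return backup_has_room(free, used)
```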

MySQL instances: the user records are comparatively smaller. The recommended allocation is 40 GB to 80 GB mounted at /var/lib/mysql, plus the 40 GB at the root (/) partition that is standard for all servers. At this layer it is best to provision generous capacity from the beginning to avoid costly upgrades with downtime once traffic starts to scale up.
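To make the case for provisioning generously from the start, the following Python sketch estimates how many months a data volume lasts before it reaches a fill limit. The monthly growth rate and the 80% fill limit are inputs you would have to estimate for your own installation; this guide does not supply them, and the function name is hypothetical.

```python
import math


def months_of_headroom(volume_gb: float, used_gb: float,
                       growth_gb_per_month: float,
                       fill_limit: float = 0.8) -> float:
    """Months until used space reaches `fill_limit` of the volume.

    `growth_gb_per_month` and `fill_limit` are estimates you supply.
    """
    ceiling_gb = volume_gb * fill_limit
    if used_gb >= ceiling_gb:
        return 0.0
    if growth_gb_per_month <= 0:
        return math.inf  # no growth: the limit is never reached
    return (ceiling_gb - used_gb) / growth_gb_per_month
```

For example, with 20 GB already used and an assumed growth of 2 GB per month, a 40 GB volume reaches 80% in about 6 months, whereas an 80 GB volume lasts about 22 months.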

Auxiliary instances for Redis and Elasticsearch: dedicated instances are optional at small and medium scale. At large scale, Redis and Elasticsearch should be moved to dedicated VMs, and these servers should have the standard 40 GB at the root (/) partition for operation plus an additional 20 GB at the /mnt mount point.
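The minimum allocations in this section can be collected into a single table and checked against a planned server layout. The Python sketch below does that; the role names, the split of MongoDB into a regular and a backup-capable role, and the function name are illustrative assumptions, while the sizes come from the recommendations above.

```python
# Minimum sizes in GB per server role and mount point, from this section.
# Upper ends of the recommended ranges appear in comments only.
MINIMUMS_GB = {
    "k8s-worker": {"/": 40},                               # 80 GB recommended
    "mongodb": {"/": 40, "/edx/var/mongo/mongodb": 100},   # data: 100-200 GB
    "mongodb-backup": {"/": 40, "/edx/var/mongo/mongodb": 100, "/mnt": 200},
    "mysql": {"/": 40, "/var/lib/mysql": 40},              # data: 40-80 GB
    "redis": {"/": 40, "/mnt": 20},
    "elasticsearch": {"/": 40, "/mnt": 20},
}


def plan_shortfalls(role: str, planned_gb: dict) -> dict:
    """Return {mount: missing GB} for every mount below the role's minimum.

    Mounts missing from `planned_gb` are treated as 0 GB.
    """
    shortfalls = {}
    for mount, minimum in MINIMUMS_GB[role].items():
        planned = planned_gb.get(mount, 0)
        if planned < minimum:
            shortfalls[mount] = minimum - planned
    return shortfalls
```

An empty result means the planned layout meets every minimum for that role; it does not check the recommended upper ends.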