The Ultimate Manual To Box Skills

Because artificial intelligence and machine learning feed on copious amounts of data, it was only a matter of time before Box adopted AI and ML algorithms to keep pace with ever-changing market needs.

With a vision to bring intelligence to all your content, the leading cloud-based content management service launched Box Skills. This unique framework applies best-in-class AI technologies from renowned providers to all data stored in your Box account.

In turn, this allows you to structure your data and gather in-depth insights from it at scale, making it easier for teams to find and work with content, accelerate and automate business processes, and reduce security and compliance risks. Or, as Box likes to put it: “With Box Skills, you can unleash the full potential of your content.”

In this Nira guide, we’ll discuss Box Skills in more detail and how you can use the innovative offering to ensure better outcomes.

What Are Box Skills, Anyway?

Box Skills are small bits of code that operate on files in folders on the Box platform. Once you apply a Skill to a particular folder, it analyzes every file placed into that folder and writes the output of its analysis as metadata on the Box file.

Let’s understand this by discussing a practical example.

Suppose you decide to apply an audio skill to a folder. After placing an audio file in that Box folder, the Box Skill will add a transcript to that audio file automatically. Once that’s done, you can review the file in the Box Web App to use the transcript, while the metadata drives other Box functionalities (e.g., search).

In other words, Box Skills are functions that pass a file in Box to a machine learning provider for processing and then structure the provider’s output so it can be stored as metadata on that file in Box. Whenever a Skill is triggered, it uses short-lived API access tokens to carry out all these steps.

Developers can write Box Skill applications using the Box Skills Kit. However, the primary Box admin will have to enable the Box Skill and configure the folders upon which the Box Skill can operate before you can use it.

Although you can deploy and execute Box Skills using serverless platforms like Google Cloud Functions, AWS Lambda, or Microsoft Azure Functions, you can use any cloud infrastructure you like. It’s your call.

How Box Skills Work

The Box Skills framework has four crucial components:

  • Trigger: This component causes the Box Skill to execute. Actions such as uploading, copying, or moving files into a folder configured for that Skill fall under this category.
  • Event Pump: This component is responsible for sending notifications about activity in Box. When triggered, the event pump will notify the Box Skill application about the action.
  • Box Skills Application: This component receives the incoming event from the Box platform and retrieves the file from Box. It then processes the file using a third-party service and writes the results back to Box as Box Skills metadata.
  • Box Skills Metadata: This component consists of pre-defined, global templates that present all extracted information about the file directly in the Box Web application.

Whenever a trigger occurs, the Box Skill is executed automatically. Its code uses the event payload from Box to fetch and process the file, and then applies the resulting metadata to that file.
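
To make the flow between the four components concrete, here is a minimal in-memory sketch. Nothing in it talks to Box: FakeBox, SkillEvent, upload, run_skill, and toy_analyzer are all hypothetical stand-ins invented for illustration, and the "analysis" is just a toy word filter in place of a real ML provider.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SkillEvent:
    """Hypothetical, simplified stand-in for the event Box sends to a Skill."""
    file_id: str
    file_name: str


@dataclass
class FakeBox:
    """In-memory stand-in for Box: file content plus per-file metadata."""
    files: dict[str, bytes] = field(default_factory=dict)
    metadata: dict[str, dict] = field(default_factory=dict)


def run_skill(event: SkillEvent, box: FakeBox,
              analyze: Callable[[bytes], list[str]]) -> None:
    """Skill application: fetch the file, analyze it, write metadata back."""
    content = box.files[event.file_id]                     # retrieve the file
    keywords = analyze(content)                            # hand off for analysis
    box.metadata[event.file_id] = {"keywords": keywords}   # store the result


def upload(box: FakeBox, file_id: str, name: str, content: bytes,
           on_event: Callable[[SkillEvent], None]) -> None:
    """Trigger plus event pump: storing a file notifies the Skill application."""
    box.files[file_id] = content
    on_event(SkillEvent(file_id=file_id, file_name=name))


def toy_analyzer(content: bytes) -> list[str]:
    """Placeholder for a third-party service: distinct words over 5 letters."""
    words = content.decode("utf-8").lower().split()
    return sorted({word for word in words if len(word) > 5})


box = FakeBox()
upload(box, "123", "notes.txt", b"Quarterly revenue report for vendors",
       on_event=lambda event: run_skill(event, box, toy_analyzer))
print(box.metadata["123"])
# prints {'keywords': ['quarterly', 'report', 'revenue', 'vendors']}
```

In a real Skill, the upload function is Box itself, and run_skill is the code you host behind a public URL.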

Understanding Box Skills Metadata

The metadata is displayed directly in the Box Web application in the right-hand sidebar whenever you preview a file in Box that a Skill has processed. There is also a series of predesigned cards that visualize the metadata.

These cards present common outputs from third-party AI or ML services. They’re dynamically generated by the Box Skill application, which can write to a pre-built, globally available metadata template. Here are the four pre-designed cards:

  • Topic Card: This card displays a list of keywords (dance, labels) on a media file. The timestamps are optional.
  • Faces Card: This card displays a list of images (faces) on a media file. The timestamps are optional.
  • Transcript Card: This card presents a transcript with corresponding timestamps on a media file, but can also store text, sans the timestamps.
  • Status Card: This card presents the status of the Box Skill application, along with any errors.

Any Box Skill application can use a combination of the above cards, including presenting multiple instances of the same card. The idea is to write the metadata as accurately as possible in Box.

Example #1: eBrevia for Contracts Analysis

Box is a favorite with legal departments that use its services to store and manage contracts securely and in an organized manner.

You can analyze contracts, like customer contracts, employment agreements, patent agreements, and vendor agreements using eBrevia, which offers powerful pre-trained natural language processing algorithms. These algorithms can help you extract critical information from documents, including contract terms, renewal dates, and pricing.

With the Box Skills Kit, you can create a custom skill that analyzes all your contracts to extract relevant information to help your legal team efficiently manage critical paperwork.

Example #2: Google Cloud AutoML for Custom Image Recognition

The Box Skills Kit is incredibly useful for labeling objects in image files that you store in Box. This can be anything from stock photos to screenshots you plan on using on your company website. Plus, images are one of the more popular content types as everybody appreciates some eye candy.

Keeping this in mind, Cloud AutoML lets you build and train custom ML models to create a computer vision model that recognizes particular objects and entities unique to your business. Think company logo, sub-brands, or products. You can connect this model to Box using the Box Skills Kit, which will then automatically label your images stored in Box with particular objects and entities.

Example #3: IBM Watson NLU for Document Insights

AI and ML can be particularly useful for understanding unstructured data. With natural language understanding, you can extract and analyze text at an entirely different level, including identifying concepts, emotions, entities, keywords, and so much more.

IBM’s Watson Natural Language Understanding comes with powerful algorithms that enable you to identify and label the above features, sometimes even correlating multiple documents to determine any similarities. With the Box Skills Kit in hand, you can use Watson NLU algorithms to gain a better understanding of text-based files.

Example #4: VoiceBase for Speech Analytics

Audio analysis through AI/ML has endless possibilities. Converting speech to text is just scratching the surface.

VoiceBase is an excellent example of this. In addition to speech-to-text transcription, the service can identify specific keywords and topic patterns that may appear in a transcript, and even label PII, curse words, PCI, and SSN. VoiceBase’s innovative algorithm can also add “predictions” to deduce the outcome of an audio recording, such as whether the caller scheduled an appointment for a meeting, and so on.

What’s more, you can develop a custom vocabulary to improve results, allowing the service to identify business-specific phrases or words. Box Skills Kit lets you connect with VoiceBase to process all your audio content stored in Box to extract valuable insights.

Example #5: Acuant for Identification Verification

You’ll need your new employees’ government-issued identification documents like a driver’s license or passport for employee onboarding. As these documents have a sensitive nature and are stored in image format, many businesses use Box to store them securely.

The real challenge, however, is verifying these IDs and pulling the information off the images.

Acuant uses machine learning-based algorithms to analyze images of government-issued identification documents and return data you can use to structure that content. You can set up a Custom Skill to retrieve images of IDs whenever an employee or customer uploads them to Box, verify the document, and extract information from the image to store as metadata.

You can then use the metadata to organize these images based on specific fields (e.g., gender, demographics) or pass it to another system, like a CRM or HCM system, through the Box API.

How to Get Started With Box Skills

Let’s dive into how you, as a Box admin, can enable, manage, and maintain Box Skills in your enterprise. Here’s what the end-to-end process for a Skills application looks like:

Step 1: Set Up a Custom Skill

You have to create a Custom Skill Box application that keeps an eye on all files that are uploaded within the enterprise, or in one or more folders.

To do this, you first have to log in to the Developer Console, create and name a Custom Skill, and then get it approved and configured. It’s an elaborate process, so be prepared for it.

Step 2: Configure the invocation_url

Box will call a remote URL every time you upload, copy, or move a file into the selected folder. This specific URL is what’s known as the invocation URL.

Your invocation URL can be any HTTP endpoint, whether it runs on a development machine, a server, or a serverless function. The one hard requirement is that the URL must be publicly available and accessible by Box servers.

So, after creating the Custom Skill app, you have to configure its invocation_url. This is essential since Box calls the URL every time a new file gets uploaded.
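
As a rough idea of what can sit behind an invocation_url, here is a minimal sketch built on Python’s standard http.server module. The names handle_invocation, InvocationHandler, and serve, the port number, and the response body are all illustrative assumptions, and a production endpoint would also need HTTPS and request validation.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_invocation(body: bytes) -> tuple[int, dict]:
    """Check the posted body and return (HTTP status, JSON response)."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, {"error": "body is not valid JSON"}
    if not isinstance(payload, dict):
        return 400, {"error": "expected a JSON object"}
    # A real Skill would hand the payload off for processing here.
    return 200, {"received": True}


class InvocationHandler(BaseHTTPRequestHandler):
    """Accepts POST requests and replies with a small JSON document."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_invocation(self.rfile.read(length))
        data = json.dumps(response).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Serve until interrupted. The port is an arbitrary choice."""
    HTTPServer(("", port), InvocationHandler).serve_forever()


# Call serve() to start listening; it is left uncalled so the module can be imported.
```

Keeping the request logic in a plain function like handle_invocation makes it easy to reuse the same code in a serverless function later.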

Step 3: Analyze the Event Payload

Whenever a Box file is uploaded, copied, or moved into a folder for which a Box Skill is enabled, an event payload is sent to the invocation_url. This payload will contain two Access Tokens that can be used to access the uploaded file in Box and store metadata back onto the file.

In other words, the event payload contains all data required to read in the content of the uploaded file before sending it to a processing system (third-party machine learning system), and to, of course, write the metadata back to the file once processing is done.
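
Here is a small sketch of pulling those pieces out of a payload. The field names used below (source.id, source.name, token.read.access_token, token.write.access_token) reflect our understanding of the payload’s shape and should be checked against Box’s current reference; SkillInvocation and parse_event are hypothetical helper names, and the example payload is trimmed and uses placeholder values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillInvocation:
    """The parts of the event a Skill needs to read and annotate a file."""
    file_id: str
    file_name: str
    read_token: str
    write_token: str


def parse_event(payload: dict) -> SkillInvocation:
    """Extract the file reference and both tokens from an event payload."""
    try:
        source = payload["source"]
        token = payload["token"]
        return SkillInvocation(
            file_id=source["id"],
            file_name=source.get("name", ""),
            read_token=token["read"]["access_token"],
            write_token=token["write"]["access_token"],
        )
    except (KeyError, TypeError) as exc:
        raise ValueError(f"unexpected event payload shape: {exc!r}") from exc


# Trimmed example payload with placeholder values (assumed shape).
example = {
    "type": "skill_invocation",
    "source": {"type": "file", "id": "12345", "name": "call.mp3"},
    "token": {
        "read": {"access_token": "READ_TOKEN_PLACEHOLDER"},
        "write": {"access_token": "WRITE_TOKEN_PLACEHOLDER"},
    },
}
invocation = parse_event(example)
print(invocation.file_id, invocation.file_name)
```

Failing loudly on an unexpected shape is a deliberate choice here: it is easier to debug than a Skill that silently skips files.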

Step 4: Forward the Selected File for Processing

The same service that handles the Box Skill payload is responsible for sending the file URL or file content to an external service for processing. This can be an in-house service or a third-party machine learning system.
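
A sketch of this hand-off might look like the following. It assumes the file content is fetched with a GET request to the Box files content endpoint using the read token from the payload; build_download_request and process_file are hypothetical helper names, and analyze stands in for whatever processing service you choose. The demonstration at the end uses a fake opener, so no network call is made.

```python
import io
import urllib.request

BOX_API = "https://api.box.com/2.0"


def build_download_request(file_id: str, read_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the file's content."""
    return urllib.request.Request(
        url=f"{BOX_API}/files/{file_id}/content",
        headers={"Authorization": f"Bearer {read_token}"},
        method="GET",
    )


def process_file(file_id, read_token, analyze, opener=urllib.request.urlopen):
    """Download the file and pass its bytes to `analyze`.

    `analyze` is a placeholder for an in-house or third-party service call.
    `opener` is injectable so the function can be exercised offline.
    """
    with opener(build_download_request(file_id, read_token)) as response:
        return analyze(response.read())


def fake_opener(request):
    """Offline stand-in for urlopen: returns canned bytes."""
    return io.BytesIO(b"fake file bytes")


size = process_file("12345", "READ_TOKEN_PLACEHOLDER", analyze=len,
                    opener=fake_opener)
print(size)
```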

This is also where you should be aware of the four main categories of community sample skills: Document Skills, Image Skills, Video Skills, and Audio Skills, which are used for processing document files, image files, video files, and audio files on Box, respectively.
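
If a single service handles more than one kind of file, you might route each file to the right category by its extension. The sketch below is only an illustration: the extension lists and the skill_category helper are our own assumptions, not something Box defines.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical extension lists; adjust them to the content you store.
CATEGORY_BY_EXTENSION = {
    "document": {".pdf", ".docx", ".txt"},
    "image": {".png", ".jpg", ".jpeg"},
    "video": {".mp4", ".mov"},
    "audio": {".mp3", ".wav"},
}


def skill_category(file_name: str) -> Optional[str]:
    """Return the category for a file name, or None if it is unsupported."""
    suffix = PurePosixPath(file_name).suffix.lower()
    for category, extensions in CATEGORY_BY_EXTENSION.items():
        if suffix in extensions:
            return category
    return None


print(skill_category("call.MP3"))
print(skill_category("archive.zip"))
```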

Step 5: Store the Metadata Securely on the File

Once the third-party machine learning system (or any other external processing service) extracts the metadata for the file, the collected insights should be stored back on the uploaded file as custom metadata.

The whole step involves preparing the Skill Cards metadata and then writing it to the file.

To prepare the Skills metadata, you have to familiarize yourself with boxSkillsCards. It’s a globally available metadata template that follows a predetermined format for the JSON structure. You can store it on the associated files.

Box supports four kinds of cards at the moment:

  • Keyword: This Skill card contains a list of keywords next to the file.
  • Timeline: This Skill card contains a set of text or images. When you click on an entry, a timeline shows where that text or image appears in the file.
  • Transcript: This Skill card presents a transcript, along with its corresponding timestamps.
  • Status: This Skill card shows a status to the user. You can use it to communicate the status of the Skill while it processes the file.
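
To give a feel for the JSON involved, here is a sketch that builds a Keyword card as a Python dictionary. The field names follow our reading of Box’s published card format and should be verified against the current reference; keyword_card is a hypothetical helper, and the skill and invocation identifiers are placeholders.

```python
import json


def keyword_card(keywords: list[str], title: str,
                 skill_id: str, invocation_id: str) -> dict:
    """Build a Keyword card. Field names are assumptions to verify."""
    return {
        "type": "skill_card",
        "skill_card_type": "keyword",
        "skill_card_title": {"message": title},
        # Identifies the service that produced the card (placeholder id).
        "skill": {"type": "service", "id": skill_id},
        # Ties the card to one run of the Skill (placeholder id).
        "invocation": {"type": "skill_invocation", "id": invocation_id},
        "entries": [{"text": keyword} for keyword in keywords],
    }


card = keyword_card(["invoice", "renewal"], "Keywords",
                    skill_id="demo-skill", invocation_id="demo-run-1")
print(json.dumps({"cards": [card]}, indent=2))
```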

To write the cards to a file, call the POST /files/:id/metadata/global/boxSkillsCards API and pass it the list of Box Skill cards.
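
The request might be assembled like this. It is a sketch that assumes the cards are sent as a JSON body under a "cards" key and authorized with the write token from the event payload; build_create_cards_request is a hypothetical helper, and the request is only built, not sent.

```python
import json
import urllib.request

BOX_API = "https://api.box.com/2.0"


def build_create_cards_request(file_id: str, write_token: str,
                               cards: list[dict]) -> urllib.request.Request:
    """Build (but do not send) the POST request that attaches cards to a file."""
    body = json.dumps({"cards": cards}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BOX_API}/files/{file_id}/metadata/global/boxSkillsCards",
        data=body,
        headers={
            "Authorization": f"Bearer {write_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder card; a real one carries the full card structure.
cards = [{"type": "skill_card", "skill_card_type": "keyword",
          "entries": [{"text": "invoice"}]}]
request = build_create_cards_request("12345", "WRITE_TOKEN_PLACEHOLDER", cards)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```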

Note: If Box Skill cards have already been applied to a file, you’ll need to update them instead. Use the PUT /files/:id/metadata/global/boxSkillsCards API to do this. It accepts a list of operations, each of which can replace a card at a particular position (path).
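
An update request could be sketched as follows. It assumes the update is a PUT carrying JSON Patch-style operations with the application/json-patch+json content type, as Box’s metadata update API expects to our knowledge; build_update_cards_request is a hypothetical helper, and the request is only built, not sent.

```python
import json
import urllib.request

BOX_API = "https://api.box.com/2.0"


def build_update_cards_request(file_id: str, write_token: str,
                               replacements: dict[int, dict]) -> urllib.request.Request:
    """Build (but do not send) a PUT that replaces cards at given positions.

    `replacements` maps a card's position in the list to its new content.
    """
    operations = [
        {"op": "replace", "path": f"/cards/{position}", "value": card}
        for position, card in sorted(replacements.items())
    ]
    return urllib.request.Request(
        url=f"{BOX_API}/files/{file_id}/metadata/global/boxSkillsCards",
        data=json.dumps(operations).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {write_token}",
            "Content-Type": "application/json-patch+json",
        },
        method="PUT",
    )


# Placeholder card; a real one carries the full card structure.
new_card = {"type": "skill_card", "skill_card_type": "keyword",
            "entries": [{"text": "renewal"}]}
request = build_update_cards_request("12345", "WRITE_TOKEN_PLACEHOLDER",
                                     {0: new_card})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```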