Scaling Audio Files
Learn how to scale all audio channels in the time domain
Discussion on how to use the Accelerate framework to speed up or slow down the rate of play of an audio file.
ScaleAudio
The associated Xcode project implements a SwiftUI app for macOS and iOS that presents a list of audio files included in the bundle resources subdirectory ‘Audio Files’.
Add your own audio files or use the sample set provided.
Each file in the list has an adjacent button to either play or scale the audio.
Select the scale factor from a slider.
Classes
The project consists of:
- ScaleAudioApp: The App that displays a list of audio files in the project.
- ScaleAudioObservable: An ObservableObject that manages the user interaction to scale and play audio files in the list.
- ScaleAudio: The AVFoundation code that reads, scales and writes audio files.
1. ScaleAudioApp
The app displays a list of audio files in the reference folder ‘Audio Files’.
You can add your own files to the list.
Files are represented by a File object which stores its URL location:
struct File: Identifiable {
var url:URL
var id = UUID()
}
The files are presented in a FileTableView using List:
List(scaleAudioObservable.files) {
FileTableViewRowView(file: $0, scaleAudioObservable: scaleAudioObservable)
}
Each FileTableViewRowView of the table displays the audio file name, a Button to play it and a Button to scale it:
HStack {
Text("\(file.url.lastPathComponent) [\(file.duration)]")
Button("Play", action: {
scaleAudioObservable.playAudioURL(file.url)
})
.buttonStyle(BorderlessButtonStyle()) // need this or tapping one invokes both actions
Button("Scale", action: {
scaleAudioObservable.scaleAudioURL(url: file.url)
})
.buttonStyle(BorderlessButtonStyle())
}
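Note that the row above displays file.duration, which is not part of the File struct shown earlier. A minimal sketch of how such a property might be computed, assuming AVFoundation is imported (illustrative only, the project's implementation may differ):
import AVFoundation

// Illustrative only: derive a displayable duration string from the file's URL.
extension File {
    var duration: String {
        let seconds = CMTimeGetSeconds(AVAsset(url: url).duration)
        return String(format: "%.2f s", seconds)
    }
}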
The scale factor is chosen from a Slider to scale an audio duration from 0.1 to 2 times the original duration:
Slider(
value: $scaleAudioObservable.factor,
in: 0.1...2
)
If audio has multiple channels an option to process a single merged channel rather than multiple interleaved channels is available with a Toggle:
Toggle(isOn: $scaleAudioObservable.singleChannel) {
Text("Single Channel")
}
Both iOS and macOS store the generated file in the Documents folder.
On the Mac the Documents folder can be accessed using the provided Go to Documents button.
For iOS the app’s Info.plist includes an entry for Application supports iTunes file sharing so the Documents folder can be accessed in the Finder of your connected device, as well as an entry for Supports Document Browser so the Documents folder can be accessed in the ‘On My iPhone’ section of the Files app.
2. ScaleAudioObservable
This ObservableObject has a published property that stores the list of files included in the project:
@Published var files:[File]
A published property for the scale factor:
@Published var factor:Double = 1.5
And a published property for a single channel option:
@Published var singleChannel:Bool = false
An AVAudioPlayer is used to play the audio files:
var audioPlayer: AVAudioPlayer?
...
audioPlayer.play()
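A minimal sketch of how playAudioURL might be implemented with AVAudioPlayer (an assumed implementation; the project's version may also track player state):
// Illustrative sketch: create a player for the selected file and start playback.
func playAudioURL(_ url: URL) {
    audioPlayer = try? AVAudioPlayer(contentsOf: url)
    audioPlayer?.prepareToPlay()
    audioPlayer?.play()
}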
URLs of the included audio files are loaded in the init() method:
let kAudioFilesSubdirectory = "Audio Files"
...
init() {
let fm = FileManager.default
documentsURL = try! fm.url(for:.documentDirectory, in: .userDomainMask, appropriateFor: nil, create: true)
self.files = []
for audioExtension in kAudioExtensions {
if let urls = Bundle.main.urls(forResourcesWithExtension: audioExtension, subdirectory: kAudioFilesSubdirectory) {
for url in urls {
self.files.append(File(url: url))
}
}
}
self.files.sort(by: { $0.url.lastPathComponent > $1.url.lastPathComponent })
}
The following input file extensions for audio are supported by this app, but you can add your own:
let kAudioExtensions: [String] = ["aac", "m4a", "aiff", "aif", "wav", "mp3", "caf", "m4r", "flac","mp4"]
The output file extension and associated AVFileType are needed later for creating an AVAssetWriter to save the scaled audio file.
It is not always possible for the scaled output file to have the same file type and extension as the input file, for reasons described below.
The scaled audio file will have a file type defined by the following mapping from the input file extension to AVFileType, with an output file extension compatible with the selected AVFileType:
let kAudioFileTypes: [AVFileType] = [AVFileType.m4a, AVFileType.m4a, AVFileType.aiff, AVFileType.aiff, AVFileType.wav, AVFileType.m4a, AVFileType.caf, AVFileType.m4a, AVFileType.m4a, AVFileType.mp4]
func AVFileTypeForExtension(ext:String) -> AVFileType {
if let index = kAudioExtensions.firstIndex(of: ext) {
return kAudioFileTypes[index]
}
return AVFileType.m4a
}
func ExtensionForAVFileType(_ type:AVFileType) -> String {
if let ext = UTType(type.rawValue)?.preferredFilenameExtension {
return ext
}
return "m4a"
}
AVFileTypeForExtension maps an input file extension to an AVFileType. The best output file extension for a given AVFileType is chosen by ExtensionForAVFileType using the preferredFilenameExtension of UTType.
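For example, with the mapping above these illustrative values are produced (flac has no corresponding AVFileType case, so it falls back to m4a):
// Illustrative values based on the mapping above:
let flacType = AVFileTypeForExtension(ext: "flac") // AVFileType.m4a
let flacExt = ExtensionForAVFileType(flacType)     // "m4a"
let wavType = AVFileTypeForExtension(ext: "wav")   // AVFileType.wav
let wavExt = ExtensionForAVFileType(wavType)       // "wav"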
The action for the scale button is implemented by:
func scaleAudioURL(url:URL) {
// output extension should match AVFileType
let avFileType = AVFileTypeForExtension(ext: url.pathExtension)
let scaledExtension = ExtensionForAVFileType(avFileType)
scale(url: url, avFileType:avFileType, saveTo: "SCALED.\(scaledExtension)") { (success, scaledURL, failureReason) in
...
}
}
Note the file extension is not necessarily preserved since AVFileType does not support some file extensions such as aac or flac:
var type = AVFileType.m4a // OK
type = AVFileType.aifc // OK
type = AVFileType.flac // Error: Type 'AVFileType' has no member 'flac'
type = AVFileType.aac // Error: Type 'AVFileType' has no member 'aac'
Unsupported types flac or aac are saved to type AVFileType.m4a.
scaleAudioURL invokes scaleAudio of the ScaleAudio class:
let scaleAudio = ScaleAudio()
...
func scale(url:URL, avFileType:AVFileType, saveTo:String, completion: @escaping (Bool, URL, String?) -> ()) {
let scaledURL = documentsURL.appendingPathComponent(saveTo)
let asset = AVAsset(url: url)
let scaleQueue = DispatchQueue(label: "com.limit-point.scaleQueue")
scaleQueue.async {
self.scaleAudio.scaleAudio(asset: asset, factor: self.factor, singleChannel: self.singleChannel, destinationURL: scaledURL, avFileType: avFileType, progress: { value, title in
DispatchQueue.main.async {
self.progress = value
self.progressTitle = title
}
}) { (success, failureReason) in
completion(success, scaledURL, failureReason)
}
}
}
Processing is performed on a background thread, which enables the progress views to update.
The result is written to a file in Documents named SCALED.extension-matching-avfiletype.
The result URL is stored in a published property:
@Published var scaledAudioURL:URL?
3. ScaleAudio
Scaling audio is performed in 3 steps using AVFoundation:
- Read the audio samples of all channels of an audio file, scale all and interleave into an Array of [Int16]
- Create an array of sample buffers [CMSampleBuffer?] for the array of interleaved scaled audio samples
- Write the scaled sample buffers in [CMSampleBuffer?] to a file
The top level method that implements all of this, and is employed by the ScaleAudioObservable, is:
func scaleAudio(asset:AVAsset, factor:Double, singleChannel:Bool, destinationURL:URL, avFileType:AVFileType, progress: @escaping (Double, String) -> (), completion: @escaping (Bool, String?) -> ())
Arguments:
- asset:AVAsset - The AVAsset for the audio file to be scaled.
- factor:Double - A scale factor < 1 speeds up the audio, a factor > 1 slows it down. For example if the audio is originally 10 seconds long and the scale factor is 2 then the scaled audio will be 20 seconds long. If the factor is 0.5 then the scaled audio will be 5 seconds long.
- singleChannel:Bool - The AVAssetReader that reads the file can deliver the audio data interleaved with alternating samples from each channel (singleChannel = false) or as a single merged channel (singleChannel = true).
- destinationURL:URL - A URL that specifies the location for the output file. The extension chosen for this URL should be compatible with the next argument for file type.
- avFileType:AVFileType - An AVFileType for the desired file type that should be compatible with the previous argument for file extension.
- progress - An optional handler that is periodically executed to send progress messages and values.
- completion - A handler that is executed when the operation has completed to send a message of success or not.
1. Read the audio samples of all channels of an audio file, scale all and interleave into an Array of [Int16]
We implement a method to read the file and scale it by a factor:
func readAndScaleAudioSamples(asset:AVAsset, factor:Double, singleChannel:Bool) -> (Int, Int, CMAudioFormatDescription?, [Int16]?)?
This method returns a 4-tuple consisting of:
- The size of the first sample buffer, which will be used as the size of the sample buffers we write
- The channel count for the audio, as the output file will have the same channel count
- A format description (CMAudioFormatDescription) that is used for creating and writing CMSampleBuffers
- All the interleaved scaled audio samples as Int16 data
Audio samples are read using an AVAssetReader created with this method:
func audioReader(asset:AVAsset, outputSettings: [String : Any]?) -> (audioTrack:AVAssetTrack?, audioReader:AVAssetReader?, audioReaderOutput:AVAssetReaderTrackOutput?) {
if let audioTrack = asset.tracks(withMediaType: .audio).first {
if let audioReader = try? AVAssetReader(asset: asset) {
let audioReaderOutput = AVAssetReaderTrackOutput(track: audioTrack, outputSettings: outputSettings)
return (audioTrack, audioReader, audioReaderOutput)
}
}
return (nil, nil, nil)
}
Create an audioReader and audioReaderOutput and connect them:
let (_, reader, readerOutput) = self.audioReader(asset:asset, outputSettings: outputSettings)
guard let audioReader = reader,
let audioReaderOutput = readerOutput
else {
return nil
}
if audioReader.canAdd(audioReaderOutput) {
audioReader.add(audioReaderOutput)
}
else {
return nil
}
The audio reader output settings outputSettings are specified as:
let kAudioReaderSettings = [
AVFormatIDKey: Int(kAudioFormatLinearPCM) as AnyObject,
AVLinearPCMBitDepthKey: 16 as AnyObject,
AVLinearPCMIsBigEndianKey: false as AnyObject,
AVLinearPCMIsFloatKey: false as AnyObject,
//AVNumberOfChannelsKey: 1 as AnyObject, // Set to 1 to read all channels merged into one
AVLinearPCMIsNonInterleaved: false as AnyObject]
The audio reader settings keys are asking for samples to be returned with the following noteworthy specifications:
- Format as ‘Linear PCM’, i.e. uncompressed samples (AVFormatIDKey)
- 16 bit integers, Int16 (AVLinearPCMBitDepthKey)
- Interleaved when multiple channels (AVLinearPCMIsNonInterleaved)
In particular multiple channel audio will be received interleaved. If we include the additional key AVNumberOfChannelsKey set to 1, as it is for the singleChannel option, the audio reader reads all channels merged into one:
if singleChannel {
outputSettings[AVNumberOfChannelsKey] = 1 as AnyObject
}
Read sample buffers and extract audio samples:
if let sampleBuffer = audioReaderOutput.copyNextSampleBuffer(), let bufferSamples = self.extractSamples(sampleBuffer) {
...
}
Audio samples are stored in an array of arrays audioSamples of Int16, one array for each channel:
var audioSamples:[[Int16]] = [[]] // one for each channel
Each time we call copyNextSampleBuffer we are returned a CMSampleBuffer that contains the audio data as well as information about the data.
First obtain the audio format description:
formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer)
The formatDescription is used later for saving the scaled audio file.
In particular, using the same audio format description preserves the AudioChannelLayout, if it exists. Some of the sample audio files in the app bundle have multiple distinct channels for experimenting with this.
Example.
The sample audio named Channels - 7.1 (L R C LFE Rls Rrs Ls Rs) has 8 channels, as seen by the ACL tag in its format description using print(formatDescription):
{
mediaType:'soun'
mediaSubType:'lpcm'
mediaSpecific: {
ASBD: {
mSampleRate: 48000.000000
mFormatID: 'lpcm'
mFormatFlags: 0xc
mBytesPerPacket: 16
mFramesPerPacket: 1
mBytesPerFrame: 16
mChannelsPerFrame: 8
mBitsPerChannel: 16 }
cookie: {(null)}
ACL: {7.1 (L R C LFE Rls Rrs Ls Rs)}
FormatList Array: {
Index: 0
ChannelLayoutTag: 0xbd0008
ASBD: {
mSampleRate: 48000.000000
mFormatID: 'lpcm'
mFormatFlags: 0xc
mBytesPerPacket: 16
mFramesPerPacket: 1
mBytesPerFrame: 16
mChannelsPerFrame: 8
mBitsPerChannel: 16 }}
}
extensions: {(null)}
}
The first sample buffer’s AudioStreamBasicDescription, tagged ASBD in the example format description above, and the audio data obtained with extractSamples provide the channelCount and bufferSize. The channelCount is used to extract channels from the interleaved samples via the method extract_array_channels, and when the scaled audio samples are packaged into sample buffers of size bufferSize for output to a file:
var bufferSize:Int = 0
var channelCount:Int = 0
var formatDescription:CMAudioFormatDescription?
var audioSamples:[[Int16]] = [[]] // one for each channel
if let sampleBuffer = audioReaderOutput.copyNextSampleBuffer(), let bufferSamples = self.extractSamples(sampleBuffer) {
formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer)
if let audioStreamBasicDescription = CMSampleBufferGetFormatDescription(sampleBuffer)?.audioStreamBasicDescription {
if bufferSize == 0 {
channelCount = Int(audioStreamBasicDescription.mChannelsPerFrame)
bufferSize = bufferSamples.count
audioSamples = [[Int16]](repeating: [], count: channelCount)
}
...
}
}
The bufferSize is a count of all audio samples in all channels.
A set of time-coincident samples, one from each channel, forms an audio frame, and the numSamples property of a sample buffer is a count of audio frames, so that:
bufferSize = sampleBuffer.numSamples * channelCount
Example.
In stereo audio with 2 channels for left (L) and right (R) the audio frames form a logical sequence:
[L1,R2], [L3,R4], [L5,R6], … ,[LN-1,RN]
for the array of samples L1,R2,L3,R4,L5,R6,…,LN-1,RN, where bufferSize is N and numSamples is N/2.
Extract the audio samples in each CMSampleBuffer into an array of Int16 named bufferSamples with the method:
func extractSamples(_ sampleBuffer:CMSampleBuffer) -> [Int16]? {
if let dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) {
let sizeofInt16 = MemoryLayout<Int16>.size
let bufferLength = CMBlockBufferGetDataLength(dataBuffer)
var data = [Int16](repeating: 0, count: bufferLength / sizeofInt16)
CMBlockBufferCopyDataBytes(dataBuffer, atOffset: 0, dataLength: bufferLength, destination: &data)
return data
}
return nil
}
The method extractSamples pulls the Int16 values we requested out of each CMSampleBuffer using CMSampleBufferGetDataBuffer into an array [Int16].
First use CMSampleBufferGetDataBuffer to access the data buffer in the sampleBuffer:
if let dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) {
...
}
And then move the data into an array [Int16] and return it:
let sizeofInt16 = MemoryLayout<Int16>.size
let bufferLength = CMBlockBufferGetDataLength(dataBuffer)
var data = [Int16](repeating: 0, count: bufferLength / sizeofInt16)
CMBlockBufferCopyDataBytes(dataBuffer, atOffset: 0, dataLength: bufferLength, destination: &data)
return data
However bufferSamples now contains interleaved samples for all channels, as we requested.
Since we need to scale each channel separately we extract all channels from bufferSamples using extract_array_channels, accumulating them in an array of all channels, audioSamples:
// extract channels
let channels = bufferSamples.extract_array_channels(channelCount: channelCount)
for (index, channel) in channels.enumerated() {
audioSamples[index].append(contentsOf: channel)
}
The method extract_array_channels is an extension on [Int16] arrays which employs another extension on [Int16], extract_array_channel, that extracts channel channelIndex from the interleaved samples:
func extract_array_channel(channelIndex:Int, channelCount:Int) -> [Int16]? {
guard channelIndex >= 0, channelIndex < channelCount, self.count > 0 else { return nil }
let channel_array_length = self.count / channelCount
guard channel_array_length > 0 else { return nil }
var channel_array = [Int16](repeating: 0, count: channel_array_length)
for index in 0...channel_array_length-1 {
let array_index = channelIndex + index * channelCount
channel_array[index] = self[array_index]
}
return channel_array
}
Use this method for each channel to create an array of samples for all channels:
func extract_array_channels(channelCount:Int) -> [[Int16]] {
var channels:[[Int16]] = []
guard channelCount > 0 else { return channels }
for channel_index in 0...channelCount-1 {
if let channel = self.extract_array_channel(channelIndex: channel_index, channelCount: channelCount) {
channels.append(channel)
}
}
return channels
}
Scale all the channels with scaleAudioSamples and return them interleaved in an array [Int16]:
let scaledAudioSamples = scaleAudioSamples(audioSamples, factor: factor)
Now return the 4-tuple consisting of:
- The size of the first sample buffer, which will be used as the size of the sample buffers we write
- The channel count for the audio, as the output file will have the same channel count
- A format description that is used for creating and writing CMSampleBuffers
- All the interleaved scaled audio samples as Int16 data
scaleAudioSamples performs two operations on samples:
- Scaling
- Interleaving
1. Scaling - Linear Interpolation to Scale Samples
The method scaleAudioSamples is implemented using the Accelerate framework to scale an array audioSamplesChannel of audio samples by a factor > 0 to length:
let length = Int(Double(audioSamplesChannel.count) * factor)
We define an extension on [Int16] named scaleToD that shrinks or expands it to any length > 0:
extension Array where Element == Int16 {
func scaleToD(length:Int, smoothly:Bool) -> [Element] {
guard length > 0 else {
return []
}
let stride = vDSP_Stride(1)
var control:[Double]
if smoothly, length > self.count {
let denominator = Double(length - 1) / Double(self.count - 1)
control = (0...length - 1).map {
let x = Double($0) / denominator
return floor(x) + simd_smoothstep(0, 1, simd_fract(x))
}
}
else {
var base: Double = 0
var end = Double(self.count - 1)
control = [Double](repeating: 0, count: length)
vDSP_vgenD(&base, &end, &control, stride, vDSP_Length(length))
}
// Ensure last control point, if it is not also the first, is indeed `count-1` with no fractional part. The calculations above can produce endpoints like `6.9999999999999991` when that is desired to be `7`
if control.count > 1 {
control[control.count-1] = Double(count - 1)
}
var result = [Double](repeating: 0, count: length)
var double_array = vDSP.integerToFloatingPoint(self, floatingPointType: Double.self)
// Since the control points form an increasing sequence (ramp) from `0` to `self.count - 1`, to preserve endpoints, the array needs to be padded at the end, as explained in documentation for vDSP_vlint.
double_array.append(0)
vDSP_vlintD(double_array,
control, stride,
&result, stride,
vDSP_Length(length),
vDSP_Length(double_array.count))
return vDSP.floatingPointToInteger(result, integerType: Int16.self, rounding: .towardNearestInteger)
}
}
The “D” in scaleToD is for Double: we use the double precision version vDSP_vlintD of vDSP_vlint in vDSP.
The control points control form an increasing sequence (ramp) from 0 to self.count - 1, to preserve endpoints, so the array needs to be padded at the end, as explained in the documentation for vDSP_vlint, namely:
“However, the integer parts of the values in B must be greater than or equal to zero and less than or equal to M - 2.”
double_array.append(0)
This does not affect the result since there is no fractional part for the last control point. Ensure the last control point, if it is not also the first, is indeed count-1 with no fractional part:
if control.count > 1 {
control[control.count-1] = Double(count - 1)
}
Otherwise the calculations that generate the control array can produce endpoints like 6.9999999999999991 when it is desired to be 7.
Since we want the results as Int16 the output of vDSP_vlintD must be converted from Double to Int16:
return vDSP.floatingPointToInteger(result, integerType: Int16.self, rounding: .towardNearestInteger)
The rounding method is towardNearestInteger. In the case of a tie, towardNearestInteger rounds to the nearest even integer.
Example.
let double:[Double] = [0.3, 0.5, 0.7, 1.3, 1.5, 1.7, 2.3, 2.5, 2.7]
let double_rounded = vDSP.floatingPointToInteger(double, integerType: Int16.self, rounding: vDSP.RoundingMode.towardNearestInteger)
print(double_rounded)
Result:
[0, 0, 1, 1, 2, 2, 2, 2, 3]
We use the Array extension scaleToD in the method scaleAudioSamples:
func scaleAudioSamples(_ audioSamples:[[Int16]], factor:Double) -> [Int16]? {
var scaledAudioSamplesChannels:[[Int16]] = []
for (index, audioSamplesChannel) in audioSamples.enumerated() {
let length = Int(Double(audioSamplesChannel.count) * factor)
scaledAudioSamplesChannels.append(audioSamplesChannel.scaleToD(length: length, smoothly: true))
}
return interleave_arrays(scaledAudioSamplesChannels)
}
We scale arrays using two techniques: decimation to scale down (factor < 1) and interpolation to scale up (factor > 1).
But rather than use the decimation method vDSP_desampD we can use the single double precision linear interpolation method vDSP_vlintD for scaling both down and up.
Consider the pseudo-code for vDSP_vlintD in the header file.
func vDSP_vlintD(A, B, IB, C, IC, N, M)
A = samples - input vector to scale up or down
B = control - integer parts are indices into A and fractional parts are interpolation constants.
C = scaled result of length N
The computation is:
for (n = 0; n < N; ++n)
{
b = trunc(B[n]);
a = B[n] - b;
C[n] = A[b] + a * (A[b+1] - A[b]); // or A[b] * (1 - a) + A[b+1] * a
}
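As a small self-contained illustration of this computation (example values, not from the project), vDSP_vlintD can be called directly with explicit control points:
import Accelerate

// Illustrative only: interpolate a small padded array at explicit control points.
let samples: [Double] = [3, 5, 1, 8, 4, 0] // trailing 0 is the padding vDSP_vlint expects
let control: [Double] = [0.0, 1.875, 3.0]  // integer part = index into samples, fraction = mix
var result = [Double](repeating: 0, count: control.count)
vDSP_vlintD(samples, control, 1, &result, 1, vDSP_Length(control.count), vDSP_Length(samples.count))
// result[1] = A[1] + 0.875 * (A[2] - A[1]) = 5 + 0.875 * (1 - 5) = 1.5
print(result) // [3.0, 1.5, 8.0]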
To scale down we choose control points B to subsample and linearly interpolate adjacent samples as the weighted averages A[b] * (1 - a) + A[b+1] * a of elements of A.
To subsample, control points are created as a ramp of Double values with vDSP_vgenD, from the first index 0 to the last index count-1 of the array to be scaled down (and up, if the scaleToD option smoothly is off):
let stride = vDSP_Stride(1)
var base: Double = 0
var end = Double(self.count - 1)
control = [Double](repeating: 0, count: length)
vDSP_vgenD(&base, &end, &control, stride, vDSP_Length(length))
Example.
This example produces a 5 element ramp for the range 0…9 to scale down by half, from 10 to 5:
let stride = vDSP_Stride(1)
let length:Int = 5
var base: Double = 0
var end = Double(9)
var control = [Double](repeating: 0, count: length)
vDSP_vgenD(&base, &end, &control, stride, vDSP_Length(length))
print(control)
Output:
[0.0, 2.25, 4.5, 6.75, 9.0]
Control points are used to determine the values in the scaled array. The integer parts are indices, and the fractional parts are for mixing with the next index: element n of control determines element n of the scaled array.
Example.
If control element n is 6.75 then element n of the scaled array C is a mixture of elements 6 and 7 of the array A being scaled, using the fractional part .75:
C[n] = A[6] * (1 - .75) + A[7] * .75
The integer part 6 is the index into array A and .75 is used to compute the average with the next sample.
Example.
The init method of ScaleAudioApp runs code that generates all decimated and some interpolated arrays of a small [Int16] array of 16 elements.
The output also displays the control points generated by vDSP_vgenD, printed from within scaleToD:
init() {
let x:[Int16] = [3,5,1,8,4,56,33,4,77,42,84,25,12,6,13,15]
for i in 1...x.count+4 {
print(i)
print(x)
let scaled = x.scaleToD(length: i, smoothly: true)
print(scaled)
print("scaled.count = \(scaled.count)")
print("----")
}
}
Run the code and consider the 10th output row in the log:
10
[3, 5, 1, 8, 4, 56, 33, 4, 77, 42, 84, 25, 12, 6, 13, 15]
control = [0.0, 1.6666666666666667, 3.3333333333333335, 5.0, 6.666666666666667, 8.333333333333334, 10.0, 11.666666666666668, 13.333333333333334, 15.0]
[3, 2, 7, 56, 14, 65, 84, 16, 8, 15]
The 2nd element of the result, 2, is computed as follows using the pseudo-code:
n = 1
b = trunc(B[1]) = trunc(1.6666666) = 1
a = B[1] - b = (1.6666666 - 1) = 0.6666666
c[1] = A[1] + a * (A[2] - A[1])
= 5 + 0.6666666 * (1 - 5)
= 5 - 2.6666664
= 2.3333336
towardNearestInteger(2.3333336) = 2
The rounding method is towardNearestInteger so the final result is 2.
Consider the 9th output row to see how this works in the case of a tie:
9
[3, 5, 1, 8, 4, 56, 33, 4, 77, 42, 84, 25, 12, 6, 13, 15]
control = [0.0, 1.875, 3.75, 5.625, 7.5, 9.375, 11.25, 13.125, 15.0]
[3, 2, 5, 42, 40, 58, 22, 7, 15]
The 2nd element of the result, 2, is computed as follows using the pseudo-code:
n = 1
b = trunc(B[1]) = trunc(1.875) = 1
a = B[1] - b = (1.875 - 1) = 0.875
c[1] = A[1] + a * (A[2] - A[1])
= 5 + 0.875 * (1 - 5)
= 5 - 3.5
= 1.5
towardNearestInteger(1.5) = 2
The rounding method rounds to the nearest even integer for the case of the tie 1.5, so the final result is 2 again.
The extension scaleToD has a ‘smoothly’ option that uses simd_smoothstep of the simd library, but this only makes sense for scaling up, so the code is conditional on that case.
The control points in general are generated as:
if smoothly, length > self.count {
let denominator = Double(length - 1) / Double(self.count - 1)
control = (0...length - 1).map {
let x = Double($0) / denominator
return floor(x) + simd_smoothstep(0, 1, simd_fract(x))
}
}
else {
var base: Double = 0
var end = Double(self.count - 1)
control = [Double](repeating: 0, count: length)
vDSP_vgenD(&base, &end, &control, stride, vDSP_Length(length))
}
Finally recall our input is [Int16] and we want [Int16] output. Since vDSP_vlintD input must be [Double] and returns [Double] we need to do conversions.
On input convert Int16 to Double:
let double_array = vDSP.integerToFloatingPoint(self, floatingPointType: Double.self)
And then convert the result from Double to Int16:
return vDSP.floatingPointToInteger(result, integerType: Int16.self, rounding: .towardNearestInteger)
2. Interleaving - Combine Multiple Arrays Into One
After the audio sample arrays for each channel have been scaled we need to put them together into a single interleaved array for saving.
Interleaving is performed simply by combining all arrays into one by alternating values sequentially.
So if you have arrays A and B:
A = [A1, A2, A3]
B = [B1, B2, B3]
Then A and B interleaved to C is:
C = [A1, B1, A2, B2, A3, B3]
The general method for an arbitrary number of arrays is implemented as:
func interleave_arrays(_ arrays:[[Int16]]) -> [Int16]? {
guard arrays.count > 0 else { return nil }
if arrays.count == 1 {
return arrays[0]
}
var size = Int.max
for m in 0...arrays.count-1 {
size = min(size, arrays[m].count)
}
guard size > 0 else { return nil }
let interleaved_length = size * arrays.count
var interleaved:[Int16] = [Int16](repeating: 0, count: interleaved_length)
var count:Int = 0
for j in 0...size-1 {
for i in 0...arrays.count-1 {
interleaved[count] = arrays[i][j]
count += 1
}
}
return interleaved
}
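As an illustrative counterpart to the channel extraction example earlier (example values, not from the project), two channels interleave back together like this:
// Illustrative: interleave left and right channel arrays into one array.
let left: [Int16]  = [1, 3, 5]
let right: [Int16] = [2, 4, 6]
let combined = interleave_arrays([left, right])
// combined == [1, 2, 3, 4, 5, 6]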
2. Create an array of sample buffers [CMSampleBuffer?] for the array of interleaved scaled audio samples
This will be implemented by the method sampleBuffersForSamples, passing the values previously retrieved for bufferSize, channelCount and formatDescription as well as the interleaved audio samples:
func sampleBuffersForSamples(bufferSize:Int, audioSamples:[Int16], channelCount:Int, formatDescription:CMAudioFormatDescription) -> [CMSampleBuffer?]
Just as we read the data in as CMSampleBuffer it will be written out as CMSampleBuffer, where each sample buffer contains a subarray, or block, of the interleaved scaled audio samples.
To facilitate that we have an extension on Array that creates an array of blocks of size bufferSize from the array returned by readAndScaleAudioSamples:
extension Array {
func blocks(size: Int) -> [[Element]] {
return stride(from: 0, to: count, by: size).map {
Array(self[$0 ..< Swift.min($0 + size, count)])
}
}
}
Example.
let x = [4, 7, 9, 3, 5, 2]
let x_blocks_2 = x.blocks(size: 2)
let x_blocks_4 = x.blocks(size: 4)
print("x_blocks_2 = \(x_blocks_2)")
print("x_blocks_4 = \(x_blocks_4)")
Output:
x_blocks_2 = [[4, 7], [9, 3], [5, 2]]
x_blocks_4 = [[4, 7, 9, 3], [5, 2]]
Employ the Array extension blocks to create an array consisting of blocks of interleaved audio data:
let blockedAudioSamples = audioSamples.blocks(size: bufferSize)
Then for each such block of Int16 samples we create a CMSampleBuffer using the method:
func sampleBufferForSamples(audioSamples:[Int16], channelCount:Int, formatDescription:CMAudioFormatDescription) -> CMSampleBuffer?
This method creates a CMSampleBuffer using CMSampleBufferCreate that contains one block of the interleaved scaled audio data.
For CMSampleBufferCreate we need to prepare samplesBlock for the argument dataBuffer: CMBlockBuffer?.
First create a CMBlockBuffer named samplesBlock for the dataBuffer argument from the audioSamples with CMBlockBufferCreateWithMemoryBlock.
CMBlockBufferCreateWithMemoryBlock requires an UnsafeMutableRawPointer named memoryBlock containing the audioSamples.
Allocate and initialize the memoryBlock with the audioSamples:
let bytesInt16 = MemoryLayout<Int16>.stride
let dataSize = audioSamples.count * bytesInt16
var samplesBlock:CMBlockBuffer?
let memoryBlock:UnsafeMutableRawPointer = UnsafeMutableRawPointer.allocate(
byteCount: dataSize,
alignment: MemoryLayout<Int16>.alignment)
let _ = audioSamples.withUnsafeBufferPointer { buffer in
memoryBlock.initializeMemory(as: Int16.self, from: buffer.baseAddress!, count: buffer.count)
}
Pass the memoryBlock to CMBlockBufferCreateWithMemoryBlock to create the samplesBlock, passing nil as the blockAllocator so the default allocator will release it:
CMBlockBufferCreateWithMemoryBlock(
allocator: kCFAllocatorDefault,
memoryBlock: memoryBlock,
blockLength: dataSize,
blockAllocator: nil,
customBlockSource: nil,
offsetToData: 0,
dataLength: dataSize,
flags: 0,
blockBufferOut:&samplesBlock
)
This is the samplesBlock for the argument dataBuffer: CMBlockBuffer? of CMSampleBufferCreate.
For the argument formatDescription: CMFormatDescription? of CMSampleBufferCreate we use the formatDescription we retrieved in readAndScaleAudioSamples.
Call CMSampleBufferCreate to create the sampleBuffer that contains the block of the interleaved scaled audio data in audioSamples:
let sampleCount = audioSamples.count / channelCount
if CMSampleBufferCreate(allocator: kCFAllocatorDefault, dataBuffer: samplesBlock, dataReady: true, makeDataReadyCallback: nil, refcon: nil, formatDescription: formatDesc, sampleCount: sampleCount, sampleTimingEntryCount: 0, sampleTimingArray: nil, sampleSizeEntryCount: 0, sampleSizeArray: nil, sampleBufferOut: &sampleBuffer) == noErr, let sampleBuffer = sampleBuffer {
guard sampleBuffer.isValid, sampleBuffer.numSamples == sampleCount else {
return nil
}
}
It is important that we correctly specify the sampleCount, taking the channelCount into consideration, as it is the number of audio frames. As noted earlier a set of time-coincident samples, one from each channel, forms an audio frame.
Example.
In stereo audio with 2 channels for left (L) and right (R) the audio frames form a logical sequence:
[L1,R2], [L3,R4], [L5,R6], … ,[LN-1,RN]
for the array of samples L1,R2,L3,R4,L5,R6,…,LN-1,RN, and the sampleCount is N/2, not N.
Each CMSampleBuffer created in this way is collected into an array [CMSampleBuffer?] with sampleBuffersForSamples:
func sampleBuffersForSamples(bufferSize:Int, audioSamples:[Int16], channelCount:Int, formatDescription:CMAudioFormatDescription) -> [CMSampleBuffer?] {
...
let blockedAudioSamples = audioSamples.blocks(size: bufferSize)
var sampleBuffers:[CMSampleBuffer?] = []
for (index, audioSamples) in blockedAudioSamples.enumerated() {
...
let sampleBuffer = sampleBufferForSamples(audioSamples: audioSamples, channelCount:channelCount, formatDescription: formatDescription)
sampleBuffers.append(sampleBuffer)
}
return sampleBuffers
}
In the next section the [CMSampleBuffer?] will be written to the output file sequentially.
3. Write the scaled sample buffers in [CMSampleBuffer?] to a file
Finally implement this method to create the scaled audio file, passing the array [CMSampleBuffer?]:
func saveSampleBuffersToFile(_ sampleBuffers:[CMSampleBuffer?], formatDescription:CMAudioFormatDescription, destinationURL:URL, completion: @escaping (Bool, String?) -> ())
This method uses an asset writer to write the samples.
First create the AVAssetWriter, checking that the AVFileType property has been set, and use the destinationURL with the compatible file extension as the output file location:
guard let avFileType = avFileType, let assetWriter = try? AVAssetWriter(outputURL: destinationURL, fileType: avFileType) else {
completion(false, "Can't create asset writer.")
return
}
Create an AVAssetWriterInput and attach it to the asset writer.
The source format hint is set to the formatDescription we retrieved in readAndScaleAudioSamples and the output settings are set to kAudioFormatLinearPCM for Linear PCM:
// Header: "When a source format hint is provided, the outputSettings dictionary is not required to be fully specified."
let audioFormatSettings = [AVFormatIDKey: kAudioFormatLinearPCM] as [String : Any]
let audioWriterInput = AVAssetWriterInput(mediaType: AVMediaType.audio, outputSettings:audioFormatSettings, sourceFormatHint: formatDescription)
...
assetWriter.add(audioWriterInput)
Then write each CMSampleBuffer as the asset writer input is ready to receive and append them:
let serialQueue: DispatchQueue = DispatchQueue(label: kScaleAudioQueue)
...
audioWriterInput.requestMediaDataWhenReady(on: serialQueue) {
while audioWriterInput.isReadyForMoreMediaData, index < nbrSamples {
if let currentSampleBuffer = sampleBuffers[index] {
audioWriterInput.append(currentSampleBuffer)
}
...
}
}
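The elisions above drive the loop to completion. Here is a minimal sketch of that flow, assuming names like index and nbrSamples and not the project's exact code, showing how the writer session might be started and finished:
// Illustrative sketch, assuming sampleBuffers, assetWriter, audioWriterInput
// and serialQueue are set up as above.
var index = 0
let nbrSamples = sampleBuffers.count

assetWriter.startWriting()
assetWriter.startSession(atSourceTime: .zero)

audioWriterInput.requestMediaDataWhenReady(on: serialQueue) {
    while audioWriterInput.isReadyForMoreMediaData, index < nbrSamples {
        if let currentSampleBuffer = sampleBuffers[index] {
            audioWriterInput.append(currentSampleBuffer)
        }
        index += 1
    }
    if index == nbrSamples {
        audioWriterInput.markAsFinished()
        assetWriter.finishWriting {
            completion(assetWriter.status == .completed, assetWriter.error?.localizedDescription)
        }
    }
}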
Conclusion
To conclude, assemble the above pieces readAndScaleAudioSamples, sampleBuffersForSamples and saveSampleBuffersToFile into the final method scaleAudio to carry out the 3 steps outlined at the start, namely:
- Read the audio samples of all channels of an audio file, scale all and interleave into an Array of [Int16]
- Create an array of sample buffers [CMSampleBuffer?] for the array of interleaved scaled audio samples
- Write the scaled sample buffers in [CMSampleBuffer?] to a file
func scaleAudio(asset:AVAsset, factor:Double, singleChannel:Bool, destinationURL:URL, avFileType:AVFileType, progress:((Double, String) -> ())? = nil, completion: @escaping (Bool, String?) -> ()) {
self.avFileType = avFileType
if let progress = progress {
self.progress = progress
}
guard let (bufferSize, channelCount, formatDescription, audioSamples) = readAndScaleAudioSamples(asset: asset, factor: factor, singleChannel: singleChannel) else {
completion(false, "Can't read audio samples")
return
}
guard let formatDescription = formatDescription else {
completion(false, "No audio format description")
return
}
guard let audioSamples = audioSamples else {
completion(false, "Can't scale audio samples")
return
}
let sampleBuffers = sampleBuffersForSamples(bufferSize: bufferSize, audioSamples: audioSamples, channelCount:channelCount, formatDescription: formatDescription)
saveSampleBuffersToFile(sampleBuffers, formatDescription: formatDescription, destinationURL: destinationURL, completion: completion)
}
Arguments:
- asset:AVAsset - The AVAsset for the audio file to be scaled.
- factor:Double - A scale factor < 1 speeds up the audio, a factor > 1 slows it down. For example if the audio is originally 10 seconds long and the scale factor is 2 then the scaled audio will be 20 seconds long. If the factor is 0.5 then the scaled audio will be 5 seconds long.
- singleChannel:Bool - The AVAssetReader that reads the file can deliver the audio data interleaved with alternating samples from each channel (singleChannel = false) or as a single merged channel (singleChannel = true).
- destinationURL:URL - A URL that specifies the location for the output file. The extension chosen for this URL should be compatible with the next argument for file type.
- avFileType:AVFileType - An AVFileType for the desired file type that should be compatible with the previous argument for file extension.
- progress - An optional handler that is periodically executed to send progress messages and values.
- completion - A handler that is executed when the operation has completed to send a message of success or not.