Scaling Audio Files
Learn how to scale all audio channels in the time domain
Discussion on how to use the Accelerate framework to speed up or slow down the rate of play of an audio file.
ScaleAudio
The associated Xcode project implements a SwiftUI app for macOS and iOS that presents a list of audio files included in the bundle resources subdirectory ‘Audio Files’.
Add your own audio files or use the sample set provided.
Each file in the list has an adjacent button to either play or scale the audio.
Select the scale factor from a slider.
Classes
The project consists of:
- ScaleAudioApp: The App that displays a list of audio files in the project.
- ScaleAudioObservable: An ObservableObject that manages the user interaction to scale and play audio files in the list.
- ScaleAudio: The AVFoundation code that reads, scales and writes audio files.
1. ScaleAudioApp
The app displays a list of audio files in the reference folder ‘Audio Files’.
You can add your own files to the list.
Files are represented by a File object which stores its URL location:
struct File: Identifiable {
var url:URL
var id = UUID()
}
The files are presented in a FileTableView using List:
List(scaleAudioObservable.files) {
FileTableViewRowView(file: $0, scaleAudioObservable: scaleAudioObservable)
}
Each FileTableViewRowView of the table displays the audio file name, a Button to play it and a Button to scale it:
HStack {
Text("\(file.url.lastPathComponent) [\(file.duration)]")
Button("Play", action: {
scaleAudioObservable.playAudioURL(file.url)
})
.buttonStyle(BorderlessButtonStyle()) // need this or tapping one invokes both actions
Button("Scale", action: {
scaleAudioObservable.scaleAudioURL(url: file.url)
})
.buttonStyle(BorderlessButtonStyle())
}
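Note that the row above displays file.duration, which is not part of the File struct shown earlier. A minimal sketch of how such a property might be computed, assuming AVFoundation is imported (illustrative only, the project's implementation may differ):
import AVFoundation

// Illustrative only: derive a displayable duration string from the file's URL.
extension File {
    var duration: String {
        let seconds = CMTimeGetSeconds(AVAsset(url: url).duration)
        return String(format: "%.2f s", seconds)
    }
}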
The scale factor is chosen from a Slider to scale an audio duration from 0.1 to 2 times the original duration:
Slider(
value: $scaleAudioObservable.factor,
in: 0.1...2
)
If audio has multiple channels an option to process a single merged channel rather than multiple interleaved channels is available with a Toggle:
Toggle(isOn: $scaleAudioObservable.singleChannel) {
Text("Single Channel")
}
Both iOS and macOS store the generated file in the Documents folder.
On the Mac the Documents folder can be accessed using the provided Go to Documents button.
For iOS the app’s Info.plist includes an entry for Application supports iTunes file sharing so the Documents folder can be accessed in the Finder of your connected device, as well as an entry for Supports Document Browser so the Documents folder can be accessed in the ‘On My iPhone’ section of the Files app.
2. ScaleAudioObservable
This ObservableObject has a published property that stores the list of files included in the project:
@Published var files:[File]
A published property for the scale factor:
@Published var factor:Double = 1.5
And a published property for a single channel option:
@Published var singleChannel:Bool = false
An AVAudioPlayer is used to play the audio files:
var audioPlayer: AVAudioPlayer?
...
audioPlayer.play()
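A minimal sketch of how playAudioURL might be implemented with AVAudioPlayer (an assumed implementation; the project's version may also track player state):
// Illustrative sketch: create a player for the selected file and start playback.
func playAudioURL(_ url: URL) {
    audioPlayer = try? AVAudioPlayer(contentsOf: url)
    audioPlayer?.prepareToPlay()
    audioPlayer?.play()
}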
URLs of the included audio files are loaded in the init() method:
let kAudioFilesSubdirectory = "Audio Files"
...
init() {
let fm = FileManager.default
documentsURL = try! fm.url(for:.documentDirectory, in: .userDomainMask, appropriateFor: nil, create: true)
self.files = []
for audioExtension in kAudioExtensions {
if let urls = Bundle.main.urls(forResourcesWithExtension: audioExtension, subdirectory: kAudioFilesSubdirectory) {
for url in urls {
self.files.append(File(url: url))
}
}
}
self.files.sort(by: { $0.url.lastPathComponent > $1.url.lastPathComponent })
}
The following input file extensions for audio are supported by this app, but you can add your own:
let kAudioExtensions: [String] = ["aac", "m4a", "aiff", "aif", "wav", "mp3", "caf", "m4r", "flac","mp4"]
The output file extension and associated AVFileType are needed later for creating an AVAssetWriter to save the scaled audio file.
It is not always possible for the scaled output file to have the same file type and extension as the input file, for reasons described below.
The scaled audio file will have a file type defined by the following mapping from the input file extension to AVFileType, with an output file extension compatible with the selected AVFileType:
let kAudioFileTypes: [AVFileType] = [AVFileType.m4a, AVFileType.m4a, AVFileType.aiff, AVFileType.aiff, AVFileType.wav, AVFileType.m4a, AVFileType.caf, AVFileType.m4a, AVFileType.m4a, AVFileType.mp4]
func AVFileTypeForExtension(ext:String) -> AVFileType {
if let index = kAudioExtensions.firstIndex(of: ext) {
return kAudioFileTypes[index]
}
return AVFileType.m4a
}
func ExtensionForAVFileType(_ type:AVFileType) -> String {
if let ext = UTType(type.rawValue)?.preferredFilenameExtension {
return ext
}
return "m4a"
}
AVFileTypeForExtension maps an input file extension to an AVFileType. The best output file extension for a given AVFileType is chosen by ExtensionForAVFileType using the preferredFilenameExtension of UTType.
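For example, with the mapping above these illustrative values are produced (flac has no corresponding AVFileType case, so it falls back to m4a):
// Illustrative values based on the mapping above:
let flacType = AVFileTypeForExtension(ext: "flac") // AVFileType.m4a
let flacExt = ExtensionForAVFileType(flacType)     // "m4a"
let wavType = AVFileTypeForExtension(ext: "wav")   // AVFileType.wav
let wavExt = ExtensionForAVFileType(wavType)       // "wav"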
The action for the scale button is implemented by:
func scaleAudioURL(url:URL) {
// output extension should match AVFileType
let avFileType = AVFileTypeForExtension(ext: url.pathExtension)
let scaledExtension = ExtensionForAVFileType(avFileType)
scale(url: url, avFileType:avFileType, saveTo: "SCALED.\(scaledExtension)") { (success, scaledURL, failureReason) in
...
}
}
Note the file extension is not necessarily preserved since AVFileType does not support some file extensions such as aac or flac:
var type = AVFileType.m4a // OK
type = AVFileType.aifc // OK
type = AVFileType.flac // Error: Type 'AVFileType' has no member 'flac'
type = AVFileType.aac // Error: Type 'AVFileType' has no member 'aac'
Unsupported types flac or aac are saved to type AVFileType.m4a.
scaleAudioURL invokes scaleAudio of the ScaleAudio class:
let scaleAudio = ScaleAudio()
...
func scale(url:URL, avFileType:AVFileType, saveTo:String, completion: @escaping (Bool, URL, String?) -> ()) {
let scaledURL = documentsURL.appendingPathComponent(saveTo)
let asset = AVAsset(url: url)
let scaleQueue = DispatchQueue(label: "com.limit-point.scaleQueue")
scaleQueue.async {
self.scaleAudio.scaleAudio(asset: asset, factor: self.factor, singleChannel: self.singleChannel, destinationURL: scaledURL, avFileType: avFileType, progress: { value, title in
DispatchQueue.main.async {
self.progress = value
self.progressTitle = title
}
}) { (success, failureReason) in
completion(success, scaledURL, failureReason)
}
}
}
Processing is performed on a background thread, which enables the progress views to update.
The result is written to a file in Documents named SCALED.extension-matching-avfiletype.
The result URL is stored in a published property:
@Published var scaledAudioURL:URL?
3. ScaleAudio
Scaling audio is performed in 3 steps using AVFoundation:
- Read the audio samples of all channels of an audio file, scale all and interleave into an Array of [Int16]
- Create an array of sample buffers [CMSampleBuffer?] for the array of interleaved scaled audio samples
- Write the scaled sample buffers in [CMSampleBuffer?] to a file
The top level method that implements all of this, and is employed by the ScaleAudioObservable, is:
func scaleAudio(asset:AVAsset, factor:Double, singleChannel:Bool, destinationURL:URL, avFileType:AVFileType, progress: @escaping (Double, String) -> (), completion: @escaping (Bool, String?) -> ())
Arguments:
- asset:AVAsset - The AVAsset for the audio file to be scaled.
- factor:Double - A scale factor < 1 speeds up the audio, a factor > 1 slows it down. For example if the audio is originally 10 seconds long and the scale factor is 2 then the scaled audio will be 20 seconds long. If the factor is 0.5 then the scaled audio will be 5 seconds long.
- singleChannel:Bool - The AVAssetReader that reads the file can deliver the audio data interleaved with alternating samples from each channel (singleChannel = false) or as a single merged channel (singleChannel = true).
- destinationURL:URL - A URL that specifies the location for the output file. The extension chosen for this URL should be compatible with the next argument for file type.
- avFileType:AVFileType - An AVFileType for the desired file type that should be compatible with the previous argument for file extension.
- progress - An optional handler that is periodically executed to send progress messages and values.
- completion - A handler that is executed when the operation has completed to send a message of success or not.
1. Read the audio samples of all channels of an audio file, scale all and interleave into an Array of [Int16]
We implement a method to read the file and scale it by a factor:
func readAndScaleAudioSamples(asset:AVAsset, factor:Double, singleChannel:Bool) -> (Int, Int, CMAudioFormatDescription?, [Int16]?)?
This method returns a 4-tuple consisting of:
- The size of the first sample buffer, which will be used as the size of the sample buffers we write
- The channel count for the audio, as the output file will have the same channel count
- A format description (CMAudioFormatDescription) that is used for creating and writing CMSampleBuffers
- All the interleaved scaled audio samples as Int16 data
Audio samples are read using an AVAssetReader created with this method:
func audioReader(asset:AVAsset, outputSettings: [String : Any]?) -> (audioTrack:AVAssetTrack?, audioReader:AVAssetReader?, audioReaderOutput:AVAssetReaderTrackOutput?) {
if let audioTrack = asset.tracks(withMediaType: .audio).first {
if let audioReader = try? AVAssetReader(asset: asset) {
let audioReaderOutput = AVAssetReaderTrackOutput(track: audioTrack, outputSettings: outputSettings)
return (audioTrack, audioReader, audioReaderOutput)
}
}
return (nil, nil, nil)
}
Create an audioReader and audioReaderOutput and connect them:
let (_, reader, readerOutput) = self.audioReader(asset:asset, outputSettings: outputSettings)
guard let audioReader = reader,
let audioReaderOutput = readerOutput
else {
return nil
}
if audioReader.canAdd(audioReaderOutput) {
audioReader.add(audioReaderOutput)
}
else {
return nil
}
The audio reader output settings outputSettings are specified as:
let kAudioReaderSettings = [
AVFormatIDKey: Int(kAudioFormatLinearPCM) as AnyObject,
AVLinearPCMBitDepthKey: 16 as AnyObject,
AVLinearPCMIsBigEndianKey: false as AnyObject,
AVLinearPCMIsFloatKey: false as AnyObject,
//AVNumberOfChannelsKey: 1 as AnyObject, // Set to 1 to read all channels merged into one
AVLinearPCMIsNonInterleaved: false as AnyObject]
The audio reader settings keys are asking for samples to be returned with the following noteworthy specifications:
- Format as ‘Linear PCM’, i.e. uncompressed samples (AVFormatIDKey)
- 16 bit integers, Int16 (AVLinearPCMBitDepthKey)
- Interleaved when multiple channels (AVLinearPCMIsNonInterleaved)
In particular multiple channel audio will be received interleaved. If we include the additional key AVNumberOfChannelsKey set to 1, as it is for the singleChannel option, the audio reader reads all channels merged into one:
if singleChannel {
outputSettings[AVNumberOfChannelsKey] = 1 as AnyObject
}
Read sample buffers and extract audio samples:
if let sampleBuffer = audioReaderOutput.copyNextSampleBuffer(), let bufferSamples = self.extractSamples(sampleBuffer) {
...
}
Audio samples are stored in an array of arrays audioSamples of Int16, one array for each channel:
var audioSamples:[[Int16]] = [[]] // one for each channel
Each time we call copyNextSampleBuffer we are returned a CMSampleBuffer that contains the audio data as well as information about the data.
First obtain the audio format description:
formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer)
The formatDescription is used later for saving the scaled audio file.
In particular, using the same audio format description preserves the AudioChannelLayout, if it exists. Some of the sample audio files in the app bundle have multiple distinct channels for experimenting with this.
Example.
The sample audio named Channels - 7.1 (L R C LFE Rls Rrs Ls Rs) has 8 channels, as seen by the ACL tag in its format description using print(formatDescription):
{
mediaType:'soun'
mediaSubType:'lpcm'
mediaSpecific: {
ASBD: {
mSampleRate: 48000.000000
mFormatID: 'lpcm'
mFormatFlags: 0xc
mBytesPerPacket: 16
mFramesPerPacket: 1
mBytesPerFrame: 16
mChannelsPerFrame: 8
mBitsPerChannel: 16 }
cookie: {(null)}
ACL: {7.1 (L R C LFE Rls Rrs Ls Rs)}
FormatList Array: {
Index: 0
ChannelLayoutTag: 0xbd0008
ASBD: {
mSampleRate: 48000.000000
mFormatID: 'lpcm'
mFormatFlags: 0xc
mBytesPerPacket: 16
mFramesPerPacket: 1
mBytesPerFrame: 16
mChannelsPerFrame: 8
mBitsPerChannel: 16 }}
}
extensions: {(null)}
}
The first sample buffer’s AudioStreamBasicDescription, tagged ASBD in the example format description above, and the audio data obtained with extractSamples provide the channelCount and bufferSize. The channelCount is used to extract channels from the interleaved samples via the method extract_array_channels, and when the scaled audio samples are packaged into sample buffers of size bufferSize for output to a file:
var bufferSize:Int = 0
var channelCount:Int = 0
var formatDescription:CMAudioFormatDescription?
var audioSamples:[[Int16]] = [[]] // one for each channel
if let sampleBuffer = audioReaderOutput.copyNextSampleBuffer(), let bufferSamples = self.extractSamples(sampleBuffer) {
formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer)
if let audioStreamBasicDescription = CMSampleBufferGetFormatDescription(sampleBuffer)?.audioStreamBasicDescription {
if bufferSize == 0 {
channelCount = Int(audioStreamBasicDescription.mChannelsPerFrame)
bufferSize = bufferSamples.count
audioSamples = [[Int16]](repeating: [], count: channelCount)
}
...
}
}
The bufferSize is a count of all audio samples in all channels.
A set of time-coincident samples, one from each channel, forms an audio frame, and the numSamples property of a sample buffer is a count of audio frames, so that:
bufferSize = sampleBuffer.numSamples * channelCount
Example.
In stereo audio with 2 channels for left (L) and right (R) the audio frames form a logical sequence:
[L1,R2], [L3,R4], [L5,R6], … ,[LN-1,RN]
for the array of samples L1,R2,L3,R4,L5,R6,…,LN-1,RN, where bufferSize is N and numSamples is N/2.
Extract the audio samples in each CMSampleBuffer into an array of Int16 named bufferSamples with the method:
func extractSamples(_ sampleBuffer:CMSampleBuffer) -> [Int16]? {
if let dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) {
let sizeofInt16 = MemoryLayout<Int16>.size
let bufferLength = CMBlockBufferGetDataLength(dataBuffer)
var data = [Int16](repeating: 0, count: bufferLength / sizeofInt16)
CMBlockBufferCopyDataBytes(dataBuffer, atOffset: 0, dataLength: bufferLength, destination: &data)
return data
}
return nil
}
The method extractSamples pulls the Int16 values we requested out of each CMSampleBuffer using CMSampleBufferGetDataBuffer into an array [Int16].
First use CMSampleBufferGetDataBuffer to access the data buffer in the sampleBuffer:
if let dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) {
...
}
And then move the data into an array [Int16] and return it:
let sizeofInt16 = MemoryLayout<Int16>.size
let bufferLength = CMBlockBufferGetDataLength(dataBuffer)
var data = [Int16](repeating: 0, count: bufferLength / sizeofInt16)
CMBlockBufferCopyDataBytes(dataBuffer, atOffset: 0, dataLength: bufferLength, destination: &data)
return data
However bufferSamples now contains interleaved samples for all channels, as we requested.
Since we need to scale each channel separately we extract all channels from bufferSamples using extract_array_channels, accumulating them in an array of all channels, audioSamples:
// extract channels
let channels = bufferSamples.extract_array_channels(channelCount: channelCount)
for (index, channel) in channels.enumerated() {
audioSamples[index].append(contentsOf: channel)
}
The method extract_array_channels is an extension on [Int16] arrays which employs another extension on [Int16], extract_array_channel, that extracts channel channelIndex from the interleaved samples:
func extract_array_channel(channelIndex:Int, channelCount:Int) -> [Int16]? {
guard channelIndex >= 0, channelIndex < channelCount, self.count > 0 else { return nil }
let channel_array_length = self.count / channelCount
guard channel_array_length > 0 else { return nil }
var channel_array = [Int16](repeating: 0, count: channel_array_length)
for index in 0...channel_array_length-1 {
let array_index = channelIndex + index * channelCount
channel_array[index] = self[array_index]
}
return channel_array
}
Use this method for each channel to create an array of samples for all channels:
func extract_array_channels(channelCount:Int) -> [[Int16]] {
var channels:[[Int16]] = []
guard channelCount > 0 else { return channels }
for channel_index in 0...channelCount-1 {
if let channel = self.extract_array_channel(channelIndex: channel_index, channelCount: channelCount) {
channels.append(channel)
}
}
return channels
}
Scale all the channels with scaleAudioSamples and return them interleaved in an array [Int16]:
let scaledAudioSamples = scaleAudioSamples(audioSamples, factor: factor)
Now return the 4-tuple consisting of:
- The size of the first sample buffer, which will be used as the size of the sample buffers we write
- The channel count for the audio, as the output file will have the same channel count
- A format description that is used for creating and writing CMSampleBuffers
- All the interleaved scaled audio samples as Int16 data
scaleAudioSamples performs two operations on samples:
- Scaling
- Interleaving
1. Scaling - Linear Interpolation to Scale Samples
The method scaleAudioSamples is implemented using the Accelerate framework to scale an array audioSamplesChannel of audio samples by a factor > 0 to length:
let length = Int(Double(audioSamplesChannel.count) * factor)
We define an extension on [Int16] named scaleToD that shrinks or expands it to any length > 0:
extension Array where Element == Int16 {
func scaleToD(length:Int, smoothly:Bool) -> [Element] {
guard length > 0 else {
return []
}
let stride = vDSP_Stride(1)
var control:[Double]
if smoothly, length > self.count {
let denominator = Double(length - 1) / Double(self.count - 1)
control = (0...length - 1).map {
let x = Double($0) / denominator
return floor(x) + simd_smoothstep(0, 1, simd_fract(x))
}
}
else {
var base: Double = 0
var end = Double(self.count - 1)
control = [Double](repeating: 0, count: length)
vDSP_vgenD(&base, &end, &control, stride, vDSP_Length(length))
}
// Ensure last control point, if it is not also the first, is indeed `count-1` with no fractional part. The calculations above can produce endpoints like `6.9999999999999991` when that is desired to be `7`
if control.count > 1 {
control[control.count-1] = Double(count - 1)
}
var result = [Double](repeating: 0, count: length)
var double_array = vDSP.integerToFloatingPoint(self, floatingPointType: Double.self)
// Since the control points form an increasing sequence (ramp) from `0` to `self.count - 1`, to preserve endpoints, the array needs to be padded at the end, as explained in documentation for vDSP_vlint.
double_array.append(0)
vDSP_vlintD(double_array,
control, stride,
&result, stride,
vDSP_Length(length),
vDSP_Length(double_array.count))
return vDSP.floatingPointToInteger(result, integerType: Int16.self, rounding: .towardNearestInteger)
}
}
The “D” in scaleToD is for Double: we use the double precision version vDSP_vlintD of vDSP_vlint in vDSP.
The control points control form an increasing sequence (ramp) from 0 to self.count - 1, to preserve endpoints, so the array needs to be padded at the end, as explained in the documentation for vDSP_vlint, namely:
“However, the integer parts of the values in B must be greater than or equal to zero and less than or equal to M - 2.”
double_array.append(0)
This does not affect the result since there is no fractional part for the last control point. Ensure the last control point, if it is not also the first, is indeed count-1 with no fractional part:
if control.count > 1 {
control[control.count-1] = Double(count - 1)
}
Otherwise the calculations that generate the control array can produce endpoints like 6.9999999999999991 when it is desired to be 7.
Since we want the results as Int16 the output of vDSP_vlintD must be converted from Double to Int16:
return vDSP.floatingPointToInteger(result, integerType: Int16.self, rounding: .towardNearestInteger)
The rounding method is towardNearestInteger. In the case of a tie, towardNearestInteger rounds to the nearest even integer.
Example.
let double:[Double] = [0.3, 0.5, 0.7, 1.3, 1.5, 1.7, 2.3, 2.5, 2.7]
let double_rounded = vDSP.floatingPointToInteger(double, integerType: Int16.self, rounding: vDSP.RoundingMode.towardNearestInteger)
print(double_rounded)
Result:
[0, 0, 1, 1, 2, 2, 2, 2, 3]
We use the Array extension scaleToD in the method scaleAudioSamples:
func scaleAudioSamples(_ audioSamples:[[Int16]], factor:Double) -> [Int16]? {
var scaledAudioSamplesChannels:[[Int16]] = []
for (index, audioSamplesChannel) in audioSamples.enumerated() {
let length = Int(Double(audioSamplesChannel.count) * factor)
scaledAudioSamplesChannels.append(audioSamplesChannel.scaleToD(length: length, smoothly: true))
}
return interleave_arrays(scaledAudioSamplesChannels)
}
We scale arrays using two techniques: decimation to scale down (factor < 1) and interpolation to scale up (factor > 1).
But rather than use the decimation method vDSP_desampD we can use the single double precision linear interpolation method vDSP_vlintD for scaling both down and up.
Consider the pseudo-code for vDSP_vlintD in the header file.
func vDSP_vlintD(A, B, IB, C, IC, N, M)
A = samples - input vector to scale up or down
B = control - integer parts are indices into A and fractional parts are interpolation constants.
C = scaled result of length N
The computation is:
for (n = 0; n < N; ++n)
{
b = trunc(B[n]);
a = B[n] - b;
C[n] = A[b] + a * (A[b+1] - A[b]); // or A[b] * (1 - a) + A[b+1] * a
}
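As a small self-contained illustration of this computation (example values, not from the project), vDSP_vlintD can be called directly with explicit control points:
import Accelerate

// Illustrative only: interpolate a small padded array at explicit control points.
let samples: [Double] = [3, 5, 1, 8, 4, 0] // trailing 0 is the padding vDSP_vlint expects
let control: [Double] = [0.0, 1.875, 3.0]  // integer part = index into samples, fraction = mix
var result = [Double](repeating: 0, count: control.count)
vDSP_vlintD(samples, control, 1, &result, 1, vDSP_Length(control.count), vDSP_Length(samples.count))
// result[1] = A[1] + 0.875 * (A[2] - A[1]) = 5 + 0.875 * (1 - 5) = 1.5
print(result) // [3.0, 1.5, 8.0]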
To scale down we choose control points B to subsample and linearly interpolate adjacent samples as the weighted averages A[b] * (1 - a) + A[b+1] * a of elements of A.
To subsample, control points are created as a ramp of Double values with vDSP_vgenD, from the first index 0 to the last index count-1 of the array to be scaled down (and up, if the scaleToD option smoothly is off):
let stride = vDSP_Stride(1)
var base: Double = 0
var end = Double(self.count - 1)
control = [Double](repeating: 0, count: length)
vDSP_vgenD(&base, &end, &control, stride, vDSP_Length(length))
Example.
This example produces a 5 element ramp for the range 0…9 to scale down by half, from 10 to 5:
let stride = vDSP_Stride(1)
let length:Int = 5
var base: Double = 0
var end = Double(9)
var control = [Double](repeating: 0, count: length)
vDSP_vgenD(&base, &end, &control, stride, vDSP_Length(length))
print(control)
Output:
[0.0, 2.25, 4.5, 6.75, 9.0]
Control points are used to determine the values in the scaled array. The integer parts are indices, and the fractional parts are for mixing with the next index: element n of control determines element n of the scaled array.
Example.
If control element n is 6.75 then element n of the scaled array C is a mixture of elements 6 and 7 of the array A being scaled, using the fractional part .75:
C[n] = A[6] * (1 - .75) + A[7] * .75
The integer part 6 is the index into array A and .75 is used to compute the average with the next sample.
Example.
The init method of ScaleAudioApp runs code that generates all decimated and some interpolated arrays of a small [Int16] array of 16 elements.
The output also displays the control points generated by vDSP_vgenD, printed from within scaleToD:
init() {
let x:[Int16] = [3,5,1,8,4,56,33,4,77,42,84,25,12,6,13,15]
for i in 1...x.count+4 {
print(i)
print(x)
let scaled = x.scaleToD(length: i, smoothly: true)
print(scaled)
print("scaled.count = \(scaled.count)")
print("----")
}
}
Run the code and consider the 10th output row in the log:
10
[3, 5, 1, 8, 4, 56, 33, 4, 77, 42, 84, 25, 12, 6, 13, 15]
control = [0.0, 1.6666666666666667, 3.3333333333333335, 5.0, 6.666666666666667, 8.333333333333334, 10.0, 11.666666666666668, 13.333333333333334, 15.0]
[3, 2, 7, 56, 14, 65, 84, 16, 8, 15]
The 2nd element of the result, 2, is computed as follows using the pseudo-code:
n = 1
b = trunc(B[1]) = trunc(1.6666666) = 1
a = B[1] - b = (1.6666666 - 1) = 0.6666666
c[1] = A[1] + a * (A[2] - A[1])
= 5 + 0.6666666 * (1 - 5)
= 5 - 2.6666664
= 2.3333336
towardNearestInteger(2.3333336) = 2
The rounding method is towardNearestInteger so the final result is 2.
Consider the 9th output row to see how this works in the case of a tie:
9
[3, 5, 1, 8, 4, 56, 33, 4, 77, 42, 84, 25, 12, 6, 13, 15]
control = [0.0, 1.875, 3.75, 5.625, 7.5, 9.375, 11.25, 13.125, 15.0]
[3, 2, 5, 42, 40, 58, 22, 7, 15]
The 2nd element of the result, 2, is computed as follows using the pseudo-code:
n = 1
b = trunc(B[1]) = trunc(1.875) = 1
a = B[1] - b = (1.875 - 1) = 0.875
c[1] = A[1] + a * (A[2] - A[1])
= 5 + 0.875 * (1 - 5)
= 5 - 3.5
= 1.5
towardNearestInteger(1.5) = 2
The rounding method rounds to the nearest even integer for the case of the tie 1.5, so the final result is 2 again.
The extension scaleToD has a ‘smoothly’ option that uses simd_smoothstep of the simd library, but this only makes sense for scaling up, so the code is conditional on that case.
The control points in general are generated as:
if smoothly, length > self.count {
let denominator = Double(length - 1) / Double(self.count - 1)
control = (0...length - 1).map {
let x = Double($0) / denominator
return floor(x) + simd_smoothstep(0, 1, simd_fract(x))
}
}
else {
var base: Double = 0
var end = Double(self.count - 1)
control = [Double](repeating: 0, count: length)
vDSP_vgenD(&base, &end, &control, stride, vDSP_Length(length))
}
Finally recall our input is [Int16] and we want [Int16] output. Since vDSP_vlintD input must be [Double] and returns [Double] we need to do conversions.
On input convert Int16 to Double:
let double_array = vDSP.integerToFloatingPoint(self, floatingPointType: Double.self)
And then convert the result from Double to Int16:
return vDSP.floatingPointToInteger(result, integerType: Int16.self, rounding: .towardNearestInteger)
2. Interleaving - Combine Multiple Arrays Into One
After the audio sample arrays for each channel have been scaled we need to put them together into a single interleaved array for saving.
Interleaving is performed simply by combining all arrays into one by alternating values sequentially.
So if you have arrays A and B:
A = [A1, A2, A3]
B = [B1, B2, B3]
Then A and B interleaved to C is:
C = [A1, B1, A2, B2, A3, B3]
The general method for an arbitrary number of arrays is implemented as:
func interleave_arrays(_ arrays:[[Int16]]) -> [Int16]? {
guard arrays.count > 0 else { return nil }
if arrays.count == 1 {
return arrays[0]
}
var size = Int.max
for m in 0...arrays.count-1 {
size = min(size, arrays[m].count)
}
guard size > 0 else { return nil }
let interleaved_length = size * arrays.count
var interleaved:[Int16] = [Int16](repeating: 0, count: interleaved_length)
var count:Int = 0
for j in 0...size-1 {
for i in 0...arrays.count-1 {
interleaved[count] = arrays[i][j]
count += 1
}
}
return interleaved
}
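As an illustrative counterpart to the channel extraction example earlier (example values, not from the project), two channels interleave back together like this:
// Illustrative: interleave left and right channel arrays into one array.
let left: [Int16]  = [1, 3, 5]
let right: [Int16] = [2, 4, 6]
let combined = interleave_arrays([left, right])
// combined == [1, 2, 3, 4, 5, 6]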
2. Create an array of sample buffers [CMSampleBuffer?] for the array of interleaved scaled audio samples
This will be implemented by the method sampleBuffersForSamples, passing the values previously retrieved for bufferSize, channelCount and formatDescription as well as the interleaved audio samples:
func sampleBuffersForSamples(bufferSize:Int, audioSamples:[Int16], channelCount:Int, formatDescription:CMAudioFormatDescription) -> [CMSampleBuffer?]
Just as we read the data in as CMSampleBuffer it will be written out as CMSampleBuffer, where each sample buffer contains a subarray, or block, of the interleaved scaled audio samples.
To facilitate that we have an extension on Array that creates an array of blocks of size bufferSize from the array returned by readAndScaleAudioSamples:
extension Array {
func blocks(size: Int) -> [[Element]] {
return stride(from: 0, to: count, by: size).map {
Array(self[$0 ..< Swift.min($0 + size, count)])
}
}
}
Example.
let x = [4, 7, 9, 3, 5, 2]
let x_blocks_2 = x.blocks(size: 2)
let x_blocks_4 = x.blocks(size: 4)
print("x_blocks_2 = \(x_blocks_2)")
print("x_blocks_4 = \(x_blocks_4)")
Output:
x_blocks_2 = [[4, 7], [9, 3], [5, 2]]
x_blocks_4 = [[4, 7, 9, 3], [5, 2]]
Employ the Array extension blocks to create an array consisting of blocks of interleaved audio data:
let blockedAudioSamples = audioSamples.blocks(size: bufferSize)
Then for each such block of Int16 samples we create a CMSampleBuffer using the method:
func sampleBufferForSamples(audioSamples:[Int16], channelCount:Int, formatDescription:CMAudioFormatDescription) -> CMSampleBuffer?
This method creates a CMSampleBuffer using CMSampleBufferCreate that contains one block of the interleaved scaled audio data.
For CMSampleBufferCreate we need to prepare samplesBlock for the argument dataBuffer: CMBlockBuffer?.
First create a CMBlockBuffer named samplesBlock for the dataBuffer argument from the audioSamples with CMBlockBufferCreateWithMemoryBlock.
CMBlockBufferCreateWithMemoryBlock requires an UnsafeMutableRawPointer named memoryBlock containing the audioSamples.
Allocate and initialize the memoryBlock with the audioSamples:
let bytesInt16 = MemoryLayout<Int16>.stride
let dataSize = audioSamples.count * bytesInt16
var samplesBlock:CMBlockBuffer?
let memoryBlock:UnsafeMutableRawPointer = UnsafeMutableRawPointer.allocate(
byteCount: dataSize,
alignment: MemoryLayout<Int16>.alignment)
let _ = audioSamples.withUnsafeBufferPointer { buffer in
memoryBlock.initializeMemory(as: Int16.self, from: buffer.baseAddress!, count: buffer.count)
}
Pass the memoryBlock to CMBlockBufferCreateWithMemoryBlock to create the samplesBlock, passing nil as the blockAllocator so the default allocator will release it:
CMBlockBufferCreateWithMemoryBlock(
allocator: kCFAllocatorDefault,
memoryBlock: memoryBlock,
blockLength: dataSize,
blockAllocator: nil,
customBlockSource: nil,
offsetToData: 0,
dataLength: dataSize,
flags: 0,
blockBufferOut:&samplesBlock
)
This is the samplesBlock for the argument dataBuffer: CMBlockBuffer? of CMSampleBufferCreate.
For the argument formatDescription: CMFormatDescription? of CMSampleBufferCreate we use the formatDescription we retrieved in readAndScaleAudioSamples.
Call CMSampleBufferCreate to create the sampleBuffer that contains the block of the interleaved scaled audio data in audioSamples:
let sampleCount = audioSamples.count / channelCount
if CMSampleBufferCreate(allocator: kCFAllocatorDefault, dataBuffer: samplesBlock, dataReady: true, makeDataReadyCallback: nil, refcon: nil, formatDescription: formatDesc, sampleCount: sampleCount, sampleTimingEntryCount: 0, sampleTimingArray: nil, sampleSizeEntryCount: 0, sampleSizeArray: nil, sampleBufferOut: &sampleBuffer) == noErr, let sampleBuffer = sampleBuffer {
guard sampleBuffer.isValid, sampleBuffer.numSamples == sampleCount else {
return nil
}
}
It is important that we correctly specify the sampleCount, taking the channelCount into consideration, as it is the number of audio frames. As noted earlier a set of time-coincident samples, one from each channel, forms an audio frame.
Example.
In stereo audio with 2 channels for left (L) and right (R) the audio frames form a logical sequence:
[L1,R2], [L3,R4], [L5,R6], … ,[LN-1,RN]
for the array of samples L1,R2,L3,R4,L5,R6,…,LN-1,RN, and the sampleCount is N/2, not N.
Each CMSampleBuffer created in this way is collected into an array [CMSampleBuffer?] with sampleBuffersForSamples:
func sampleBuffersForSamples(bufferSize:Int, audioSamples:[Int16], channelCount:Int, formatDescription:CMAudioFormatDescription) -> [CMSampleBuffer?] {
...
let blockedAudioSamples = audioSamples.blocks(size: bufferSize)
var sampleBuffers:[CMSampleBuffer?] = []
for (index, audioSamples) in blockedAudioSamples.enumerated() {
...
let sampleBuffer = sampleBufferForSamples(audioSamples: audioSamples, channelCount:channelCount, formatDescription: formatDescription)
sampleBuffers.append(sampleBuffer)
}
return sampleBuffers
}
In the next section the [CMSampleBuffer?] will be written to the output file sequentially.
3. Write the scaled sample buffers in [CMSampleBuffer?] to a file
Finally implement this method to create the scaled audio file, passing the array [CMSampleBuffer?]:
func saveSampleBuffersToFile(_ sampleBuffers:[CMSampleBuffer?], formatDescription:CMAudioFormatDescription, destinationURL:URL, completion: @escaping (Bool, String?) -> ())
This method uses an asset writer to write the samples.
First create the AVAssetWriter, checking that the AVFileType property has been set, and use the destinationURL with the compatible file extension as the output file location:
guard let avFileType = avFileType, let assetWriter = try? AVAssetWriter(outputURL: destinationURL, fileType: avFileType) else {
completion(false, "Can't create asset writer.")
return
}
Create an AVAssetWriterInput and attach it to the asset writer.
The source format hint is set to the formatDescription we retrieved in readAndScaleAudioSamples and the output settings are set to kAudioFormatLinearPCM for Linear PCM:
// Header: "When a source format hint is provided, the outputSettings dictionary is not required to be fully specified."
let audioFormatSettings = [AVFormatIDKey: kAudioFormatLinearPCM] as [String : Any]
let audioWriterInput = AVAssetWriterInput(mediaType: AVMediaType.audio, outputSettings:audioFormatSettings, sourceFormatHint: formatDescription)
...
assetWriter.add(audioWriterInput)
Then write each CMSampleBuffer as the asset writer input is ready to receive and append them:
let serialQueue: DispatchQueue = DispatchQueue(label: kScaleAudioQueue)
...
audioWriterInput.requestMediaDataWhenReady(on: serialQueue) {
while audioWriterInput.isReadyForMoreMediaData, index < nbrSamples {
if let currentSampleBuffer = sampleBuffers[index] {
audioWriterInput.append(currentSampleBuffer)
}
...
}
}
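The elisions above drive the loop to completion. Here is a minimal sketch of that flow, assuming names like index and nbrSamples and not the project's exact code, showing how the writer session might be started and finished:
// Illustrative sketch, assuming sampleBuffers, assetWriter, audioWriterInput
// and serialQueue are set up as above.
var index = 0
let nbrSamples = sampleBuffers.count

assetWriter.startWriting()
assetWriter.startSession(atSourceTime: .zero)

audioWriterInput.requestMediaDataWhenReady(on: serialQueue) {
    while audioWriterInput.isReadyForMoreMediaData, index < nbrSamples {
        if let currentSampleBuffer = sampleBuffers[index] {
            audioWriterInput.append(currentSampleBuffer)
        }
        index += 1
    }
    if index == nbrSamples {
        audioWriterInput.markAsFinished()
        assetWriter.finishWriting {
            completion(assetWriter.status == .completed, assetWriter.error?.localizedDescription)
        }
    }
}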
Conclusion
To conclude, assemble the above pieces readAndScaleAudioSamples, sampleBuffersForSamples and saveSampleBuffersToFile into the final method scaleAudio to carry out the 3 steps outlined at the start, namely:
- Read the audio samples of all channels of an audio file, scale all and interleave into an Array of [Int16]
- Create an array of sample buffers [CMSampleBuffer?] for the array of interleaved scaled audio samples
- Write the scaled sample buffers in [CMSampleBuffer?] to a file
func scaleAudio(asset:AVAsset, factor:Double, singleChannel:Bool, destinationURL:URL, avFileType:AVFileType, progress:((Double, String) -> ())? = nil, completion: @escaping (Bool, String?) -> ()) {
self.avFileType = avFileType
if let progress = progress {
self.progress = progress
}
guard let (bufferSize, channelCount, formatDescription, audioSamples) = readAndScaleAudioSamples(asset: asset, factor: factor, singleChannel: singleChannel) else {
completion(false, "Can't read audio samples")
return
}
guard let formatDescription = formatDescription else {
completion(false, "No audio format description")
return
}
guard let audioSamples = audioSamples else {
completion(false, "Can't scale audio samples")
return
}
let sampleBuffers = sampleBuffersForSamples(bufferSize: bufferSize, audioSamples: audioSamples, channelCount:channelCount, formatDescription: formatDescription)
saveSampleBuffersToFile(sampleBuffers, formatDescription: formatDescription, destinationURL: destinationURL, completion: completion)
}
Arguments:
- asset:AVAsset - The AVAsset for the audio file to be scaled.
- factor:Double - A scale factor < 1 speeds up the audio, a factor > 1 slows it down. For example if the audio is originally 10 seconds long and the scale factor is 2 then the scaled audio will be 20 seconds long. If the factor is 0.5 then the scaled audio will be 5 seconds long.
- singleChannel:Bool - The AVAssetReader that reads the file can deliver the audio data interleaved with alternating samples from each channel (singleChannel = false) or as a single merged channel (singleChannel = true).
- destinationURL:URL - A URL that specifies the location for the output file. The extension chosen for this URL should be compatible with the next argument for file type.
- avFileType:AVFileType - An AVFileType for the desired file type that should be compatible with the previous argument for file extension.
- progress - An optional handler that is periodically executed to send progress messages and values.
- completion - A handler that is executed when the operation has completed to send a message of success or not.