@dgruss
Contributor

@dgruss dgruss commented Jun 5, 2025

These are first steps in the direction of automatically determining the mic delay.

in the optionsrecord screen (where you select the microphones):

press W to play the wave file and measure the latency (shown on the top right). try this multiple times to figure out what the actual latency is.

@basisbit
Member

In UltraStar Play, we added a similar feature, however we ended up having to add audio playback of 3 different frequencies to make it more resilient to false detection caused by noise and issues caused by input signal filters.
The audio files in UltraStar Play are in https://github.com/UltraStar-Deluxe/Play/tree/master/UltraStar%20Play/Assets/Common/Audio/SineWaveTones and the code for it can be found here: https://github.com/UltraStar-Deluxe/Play/blob/master/UltraStar%20Play/Assets/Scenes/Options/RecordingOptions/CalibrateMicDelayControl.cs
It is MIT licensed, and thus can also be used in USDX without any issues.

@dgruss
Contributor Author

dgruss commented Jun 15, 2025

that sounds more robust, but i wouldn't really know how to integrate that in the usdx code base without rewriting much more...

i went for an option with only minimal code changes (not even 100 LOC)

@dgruss
Contributor Author

dgruss commented Jun 28, 2025

i tried this on multiple x86 windows and arm linux systems now, works pretty well with the wave file. midi has delays... i might just remove that, and then the macos build will also work.
i'm not sure why the thing only works after entering the sing menu once, so this is still a bug. other than that i think it's pretty convenient.

alternative locations: one of the option menus, e.g., the microphone menu, where the microphone is selected. then we could even do a microphone-specific measurement. any thoughts?

@dgruss
Contributor Author

dgruss commented Jul 20, 2025

moved this to the options menu.

@barbeque-squared
Member

UI-wise this is a really good place for it!

It's possible this is specific to my system, but I get extremely varied results. Both mic boost and threshold also seem to affect the detected delay.

Mic boost seems pretty binary: if threshold remains the same, different values of mic boost will either make it detect it, or not detect anything at all. Makes sense, seems like expected behaviour.

I can't quite figure out what the Threshold values I'm observing are doing, though. I suspect it's picking up keyboard noise, which gets detected as C6 for some reason. Tapping on my desk is also C6. Dragging the microphone around is C6. I think this might accidentally be the root cause of a "bug" I've been observing for quite some time now, where if I ctrl-right through a song that (probably) has a lot of C (or B / C / C# if playing on Normal), you get an insane amount of points.

But there's a second thing going on (which I can't really tell is a bug in this PR, a bug/feature elsewhere in USDX, or something I just can't properly test without a more involved setup where the C6 thing is less pronounced): when keeping Mic Boost the same, the higher the Threshold, the higher the (averaged) reported delay appears to be. In my particular case I can get it to fairly reliably report:

  • 10% Threshold: -5 to 80 ms (I suspect C6 issue)
  • 15% Threshold: 60 to 150 ms
  • 20% Threshold: 110 to 200 ms
  • any higher Threshold: 150 to 250 ms

In this particular case, the ingame delay is around 140ms so the numbers are still useful, but I'd need my other setup to tell more on this.

Very offtopic C6 theory but otherwise I'll forget about it
Chances are that as soon as the signal gets above a certain threshold/volume/amplitude, the pitch detection stuff must find something. But there's no confidence check whatsoever. What I'm saying is that when a human is singing (or rapping?), there's probably one or a few very closely related frequencies/pitches that are clearly it, and it's probably some kind of normal distribution? If I hang a microphone above a traffic junction, or record a passing jet, I highly doubt all of that just "happens" to be C6.
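A minimal sketch of the kind of confidence check described above. This is not USDX's pitch detector; the function names, the normalized-autocorrelation approach, and the 0.8 threshold are all assumptions chosen for illustration. It reports a pitch only when the signal is clearly periodic, so a pure tone is accepted while white noise is rejected instead of being mapped to some arbitrary note.

```python
import math
import random


def normalized_autocorrelation(samples, lag):
    """Correlation between the signal and itself shifted by `lag`, in [-1, 1]."""
    n = len(samples) - lag
    head, tail = samples[:n], samples[lag:lag + n]
    num = sum(a * b for a, b in zip(head, tail))
    den = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    return num / den if den > 0.0 else 0.0


def detect_pitch(samples, sample_rate, min_hz=80.0, max_hz=1100.0,
                 min_confidence=0.8):
    """Return (frequency_hz, confidence), or (None, best_score) if the
    signal is not periodic enough. All parameters are illustrative."""
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(samples) - 1)
    scores = [normalized_autocorrelation(samples, lag)
              for lag in range(min_lag, max_lag + 1)]
    # Take the shortest lag that is a local peak above the threshold;
    # this avoids locking onto a multiple of the true period.
    for k in range(1, len(scores) - 1):
        if (scores[k] >= min_confidence
                and scores[k] >= scores[k - 1]
                and scores[k] >= scores[k + 1]):
            return sample_rate / (min_lag + k), scores[k]
    return None, max(scores, default=0.0)


rate = 8000
tone = [math.sin(2 * math.pi * 400 * i / rate) for i in range(512)]
random.seed(0)
noise = [random.uniform(-1.0, 1.0) for _ in range(512)]

print(detect_pitch(tone, rate))   # a frequency close to 400 Hz
print(detect_pitch(noise, rate))  # no pitch reported
```

With a check like this, tapping on a desk or keyboard noise would ideally yield "no pitch" rather than a fixed note, though real signals are messier than these two test inputs.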

@dgruss
Contributor Author

dgruss commented Jul 23, 2025

the variation is fairly high... i would not be surprised by ±10ms but this is much more.

is this with a fixed cpu frequency? the cpu frequency can jump a lot and could cause delay changes like this. --> possibly we want to take this into account in future versions

is this a laptop? some laptops (and desktops) have audio cards and drivers that do echo cancellation already at the level of the audio driver. on linux this is less likely the case, on windows you can actually configure post-processing options for each audio device

regarding C6: that's just a random note... i might also improve the audio by playing a flatter sound without ramp up. the first version i had was using midi, but midi has more delays on my systems / is not even supported on macos builds and only partially on Linux. And I've seen very inconsistent delays there from just playing the midi note. so a wave file appears to be more robust and portable. modulating to different notes is more tricky then, though. We could use an audio signal that plays two notes and try to detect the transition. That would be robust: even if your background noise is one of the two notes, it will still pick up when the transition between the two notes happens. it would also be more robust to any ramp up effects
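A minimal sketch of the two-note transition idea, assuming the pitch detector delivers a stream of `(time_ms, note)` frames; the frame format, the note numbers, and the function name are hypothetical and not taken from the PR. The delay is the measured transition time minus the time at which the transition was played.

```python
def find_transition_ms(frames, first_note, second_note):
    """Return the timestamp of the first frame that reports `second_note`
    after `first_note` was the last detected note, or None if not found.
    Frames with note None (nothing detected) are skipped."""
    previous = None
    for time_ms, note in frames:
        if note is None:
            continue
        if previous == first_note and note == second_note:
            return time_ms
        previous = note
    return None


# Hypothetical frames every 20 ms. Background noise happens to be detected
# as note 0 (the same as the first played note) before playback is heard.
FIRST, SECOND = 0, 4
frames = [(t, FIRST) for t in range(0, 1020, 20)]
frames += [(t, SECOND) for t in range(1020, 1900, 20)]

played_transition_ms = 900  # assumed position of the note change in the wave file
measured = find_transition_ms(frames, FIRST, SECOND)
print(measured - played_transition_ms)  # 120
```

Because only the change from one note to the other is used, it does not matter that the background noise already looks like the first note before the sound arrives.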

@barbeque-squared
Member

This is a laptop. It does not do any hardware echo cancellation. I'm not sure if/how CPU frequency should influence this, but yes, it does ramp its frequency up and down automatically, so we can't rule it out. I'll figure out a way to get a build with this PR on my other PC and do some retests there.

Using a wave file is fine, midi doesn't work for me at all, and I'd like to avoid platform-specific bits of code.

For C6 I'll have to try some different microphones and also the other PC. I'll try to do it this weekend.

@dgruss
Contributor Author

dgruss commented Jul 24, 2025

CPU frequency has a massive influence as the latency is to a significant degree determined by software processing of the audio signal. Some kernel code, then some library code, then the usdx code, all do some buffering and passing on; usdx then processes and interprets the data. I would suspect that at least 60% of the latency is in the library + usdx code. This part of the latency scales inversely with the CPU frequency.
Depending on how recent your laptop is it will have a wider range. 10 year old laptops were in the range 0.8GHz to maybe 3-4GHz, more recent laptops might go up to around 5GHz and 0.4GHz on the lower end of the range.
Now if the latency is 80ms at first, 60% of it is in the library and usdx, and the CPU is running at 4GHz, then those 48ms at 4GHz are about 192 million cycles. At 0.8GHz (the power budget regulates the CPU frequency, so this can fluctuate pretty quickly and unpredictably), running through the same instructions (the code to interpret the audio signal didn't change) might now take 240ms.
So, yes, this could be CPU-related delays. It definitely is an issue we should look into. Possibly the game should fix the CPU frequency (which would require administrator permissions on Windows / root on Linux / not sure about macos).
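The arithmetic above as a small sketch. The 60% software share and the 80ms starting latency are the guesses from the comment, not measurements, and the model (software time scales inversely with clock frequency, everything else stays constant) is a simplifying assumption; the function name is made up for illustration.

```python
def scaled_latency_ms(total_ms, software_share, f_ref_ghz, f_new_ghz):
    """Estimate latency after a clock change.

    Assumes the software part needs a fixed number of cycles and the
    remaining (hardware/driver) part does not depend on the CPU clock.
    Returns (cycles, new_software_ms, new_total_ms).
    """
    software_ms = total_ms * software_share
    fixed_ms = total_ms - software_ms
    cycles = software_ms * 1e-3 * f_ref_ghz * 1e9
    new_software_ms = cycles / (f_new_ghz * 1e9) * 1e3
    return cycles, new_software_ms, fixed_ms + new_software_ms


cycles, software_ms, total_ms = scaled_latency_ms(80, 0.6, 4.0, 0.8)
print(round(cycles / 1e6))  # 192 (million cycles)
print(round(software_ms))   # 240 (ms for the software part at 0.8GHz)
print(round(total_ms))      # 272 (ms total, if the other 32ms stay fixed)
```

Under these assumptions, a drop from 4GHz to 0.8GHz alone would move the measured delay by roughly 190ms, which is in the same range as the spread reported earlier in the thread.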

@dgruss
Contributor Author

dgruss commented Jul 24, 2025

this might also explain part of the delay differences with more microphones: more code executed (this alone will also add latency of course) --> more power consumed --> less power budget for higher clock frequencies left --> lower clock frequencies --> additional higher delays due to lower execution speed

@dgruss
Contributor Author

dgruss commented Jul 30, 2025

i made some changes to work with 3 notes, each 900ms, very sharply cut off to avoid any transition delays.

this is output from my system:

INFO:   Ping Notes Measured vs. Ideal: [TScreenOptionsRecord.DrawDelay]
INFO:   1. [199 - 1019] ms instead of [0 - 900] ms [TScreenOptionsRecord.DrawDelay]
INFO:   2. [1079 - 1899] ms instead of [900 - 1800] ms [TScreenOptionsRecord.DrawDelay]
INFO:   3. [2119 - 2878] ms instead of [1800 - 2700] ms [TScreenOptionsRecord.DrawDelay]
INFO:   Ping Notes Measured vs. Ideal: [TScreenOptionsRecord.DrawDelay]
INFO:   1. [241 - 1001] ms instead of [0 - 900] ms [TScreenOptionsRecord.DrawDelay]
INFO:   2. [1061 - 1882] ms instead of [900 - 1800] ms [TScreenOptionsRecord.DrawDelay]
INFO:   3. [2102 - 2860] ms instead of [1800 - 2700] ms [TScreenOptionsRecord.DrawDelay]
INFO:   Ping Notes Measured vs. Ideal: [TScreenOptionsRecord.DrawDelay]
INFO:   1. [241 - 1001] ms instead of [0 - 900] ms [TScreenOptionsRecord.DrawDelay]
INFO:   2. [1062 - 1882] ms instead of [900 - 1800] ms [TScreenOptionsRecord.DrawDelay]
INFO:   3. [2102 - 2860] ms instead of [1800 - 2700] ms [TScreenOptionsRecord.DrawDelay]
INFO:   Ping Notes Measured vs. Ideal: [TScreenOptionsRecord.DrawDelay]
INFO:   1. [242 - 1001] ms instead of [0 - 900] ms [TScreenOptionsRecord.DrawDelay]
INFO:   2. [1061 - 1881] ms instead of [900 - 1800] ms [TScreenOptionsRecord.DrawDelay]
INFO:   3. [2102 - 2860] ms instead of [1800 - 2700] ms [TScreenOptionsRecord.DrawDelay]
INFO:   Ping Notes Measured vs. Ideal: [TScreenOptionsRecord.DrawDelay]
INFO:   1. [262 - 1001] ms instead of [0 - 900] ms [TScreenOptionsRecord.DrawDelay]
INFO:   2. [1061 - 1881] ms instead of [900 - 1800] ms [TScreenOptionsRecord.DrawDelay]
INFO:   3. [2121 - 2880] ms instead of [1800 - 2700] ms [TScreenOptionsRecord.DrawDelay]
INFO:   Ping Notes Measured vs. Ideal: [TScreenOptionsRecord.DrawDelay]
INFO:   1. [241 - 981] ms instead of [0 - 900] ms [TScreenOptionsRecord.DrawDelay]
INFO:   2. [1062 - 1882] ms instead of [900 - 1800] ms [TScreenOptionsRecord.DrawDelay]
INFO:   3. [2082 - 2860] ms instead of [1800 - 2700] ms [TScreenOptionsRecord.DrawDelay]
INFO:   Ping Notes Measured vs. Ideal: [TScreenOptionsRecord.DrawDelay]
INFO:   1. [242 - 1002] ms instead of [0 - 900] ms [TScreenOptionsRecord.DrawDelay]
INFO:   2. [1062 - 1882] ms instead of [900 - 1800] ms [TScreenOptionsRecord.DrawDelay]
INFO:   3. [2082 - 2860] ms instead of [1800 - 2700] ms [TScreenOptionsRecord.DrawDelay]
INFO:   Ping Notes Measured vs. Ideal: [TScreenOptionsRecord.DrawDelay]
INFO:   1. [262 - 1002] ms instead of [0 - 900] ms [TScreenOptionsRecord.DrawDelay]
INFO:   2. [1062 - 1902] ms instead of [900 - 1800] ms [TScreenOptionsRecord.DrawDelay]
INFO:   3. [2102 - 2880] ms instead of [1800 - 2700] ms [TScreenOptionsRecord.DrawDelay]
INFO:   Ping Notes Measured vs. Ideal: [TScreenOptionsRecord.DrawDelay]
INFO:   1. [260 - 1000] ms instead of [0 - 900] ms [TScreenOptionsRecord.DrawDelay]
INFO:   2. [1060 - 1880] ms instead of [900 - 1800] ms [TScreenOptionsRecord.DrawDelay]
INFO:   3. [2100 - 2879] ms instead of [1800 - 2700] ms [TScreenOptionsRecord.DrawDelay]

In all cases the onset is more difficult, as usdx pitch detection has to transition from one pitch to the other and it appears to be a bit slow with that. For the last note, the ending transition might be unreliable too, as the usdx pitch detection might keep reporting the same note a bit longer even if it's not there anymore.
So the decision on the actual delay is now based on the ending of the first two notes, which should be the most reliable.
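A minimal sketch of how the delay could be derived from log lines in the format shown above. The regular expression and the choice to average the two end offsets are assumptions for illustration; the thread does not say how the PR combines them.

```python
import re
from statistics import mean

# Matches e.g. "1. [241 - 1001] ms instead of [0 - 900] ms"
LINE = re.compile(
    r"(\d+)\. \[(\d+) - (\d+)\] ms instead of \[(\d+) - (\d+)\] ms"
)


def delay_from_log(lines):
    """Mean of (measured end - ideal end) in ms for notes 1 and 2.

    Note onsets and the end of note 3 are ignored, following the
    reasoning that the endings of the first two notes are most reliable.
    """
    offsets = []
    for line in lines:
        match = LINE.search(line)
        if match is None:
            continue  # header line or unrelated output
        note, _start, end, _ideal_start, ideal_end = map(int, match.groups())
        if note in (1, 2):
            offsets.append(end - ideal_end)
    return mean(offsets)


sample = [
    "INFO:   Ping Notes Measured vs. Ideal: [TScreenOptionsRecord.DrawDelay]",
    "INFO:   1. [241 - 1001] ms instead of [0 - 900] ms [TScreenOptionsRecord.DrawDelay]",
    "INFO:   2. [1061 - 1882] ms instead of [900 - 1800] ms [TScreenOptionsRecord.DrawDelay]",
    "INFO:   3. [2102 - 2860] ms instead of [1800 - 2700] ms [TScreenOptionsRecord.DrawDelay]",
]
print(delay_from_log(sample))  # 91.5
```

For the second measurement in the log, the two end offsets are 101ms and 82ms, so this sketch reports 91.5ms.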

Should the debug output be removed?

@dgruss
Contributor Author

dgruss commented Sep 2, 2025

i tested this a bit more and interestingly, the delay measured here is much lower than what you have to configure for the microphone delay

@s09bQ5
Collaborator

s09bQ5 commented Sep 8, 2025

I think SoundLib.Ping.Position needs to be used instead of relying on SDL_GetTicks since that's what is used during normal game play. But it will probably lower the measured delay even more because it removes the time between .Play and feeding the first sample into the audio driver from the equation. This time is not constant because samples are fed into the driver not on .Play, but when the next periodic sound card interrupt happens.

Would it make sense to store a time stamp in TCaptureBuffer.ProcessNewBuffer that is retrieved with an optional argument to TCaptureBuffer.AnalyzeBuffer? That way we can shave off a few more milliseconds of jitter from our measurements.
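A language-neutral sketch of the time stamp idea in Python. The class is a hypothetical stand-in and does not mirror the real `TCaptureBuffer`; only the idea of recording the arrival time in the "new buffer" path and handing it back from the analysis call is taken from the suggestion. The peak computation is a placeholder for the actual analysis.

```python
import time


class CaptureBuffer:
    """Hypothetical stand-in for a capture buffer that remembers when
    its most recent block of samples arrived."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the behaviour can be tested
        self._samples = []
        self._arrival = None

    def process_new_buffer(self, samples):
        # Stamp the data as early as possible, before any analysis delay.
        self._arrival = self._clock()
        self._samples = list(samples)

    def analyze_buffer(self):
        """Return (peak_amplitude, arrival_time). The caller can use
        arrival_time instead of reading the clock after analysis."""
        peak = max((abs(s) for s in self._samples), default=0)
        return peak, self._arrival


ticks = iter([10.0, 10.5])
buffer = CaptureBuffer(clock=lambda: next(ticks))
buffer.process_new_buffer([0, -3, 2])
print(buffer.analyze_buffer())  # (3, 10.0)
```

The measured delay then no longer includes the variable time between the samples arriving and the analysis running.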

dgruss and others added 5 commits September 21, 2025 10:54
in the main menu: enter the sing menu and go back (i don't know why this
is necessary)
press W to play the wave file and measure the latency (shown on the top
right). try this multiple times to figure out what the actual latency
is.
press M to play the same tone via MIDI output and measure the latency.
on my system MIDI is 150ms slower than playing the wave file
@dgruss
Contributor Author

dgruss commented Sep 21, 2025

bit of cleanup and switching to the same timing method that is used for the MicDelay... now the delays make a lot more sense to me but i still have to test it on different systems

@dgruss dgruss marked this pull request as ready for review September 21, 2025 18:13
@dgruss dgruss marked this pull request as draft November 9, 2025 11:25
4 participants