Hacking Clubhouse for fun and profit

Clubhouse API explained

Stan Sobolev
17 min read · Feb 20, 2021

Clubhouse is an audio-chat social networking app launched by Alpha Exploration Co. in 2020. It has recently become popular everywhere: invitations are being sold on Carrot Market (Danggeun Market), a secondhand marketplace, and some people buy used iPhones just to try it. Celebrities such as Elon Musk and Noh Hongcheol have joined and are active on it, which seems to have drawn even more attention. Thanks to this tremendous popularity, the company was reportedly valued at over 1 trillion won a while ago. I’m envious…

In some ways it is just an audio chat app and not very new, but what seems to set it apart is that, on top of an intuitive UI/UX, you can talk and debate with a large number of people from various backgrounds without stepping out of your daily life. Behind the splendor, however, there are some regrets. First, since only an iOS (iPhone/iPad) app exists right now, Android users cannot use it. There have also been reports that voice data passes through Chinese servers, which has prompted discussion of security issues.

I couldn’t help wondering whether it is technically difficult to build an Android or web/PC version of Clubhouse, and whether my conversations are really being sent to a Chinese server. If so, why not check with my own eyes? The following analysis is based on about a day of poking around at home, since I couldn’t go anywhere over this New Year holiday because of COVID-19. I didn’t spend much time on it and didn’t dig deep, so please treat it as a reference only :)

Structure / Flow

Clubhouse consists of four components (as of v0.1.27).

  1. Clubhouse App
  2. Clubhouse API Server
  3. Agora RTC Server
  4. PubNub Server

Figure 1. Clubhouse App structure and flow diagram

Rather than implementing complex functions such as multi-party real-time voice conversation itself in a short period of time, Clubhouse seems to have adopted a strategy of making effective use of existing platforms. With the exception of user authentication and club management, core functions such as voice chat and real-time state changes rely heavily on third-party PaaS. Specifically, RTC (real-time voice conversation) uses the Agora.io service, and Pub/Sub (asynchronous messaging) uses the PubNub service.

The Clubhouse iOS app is the client and provides the UI/UX for the social network functions. It covers sign-up and login, user/club search, user invitations, the channel (room) list, schedule management, follower/following management, room creation, and notifications, and, once you are in a room, features such as the participant list and raising your hand. The app talks to the Clubhouse API server over HTTP requests, and these requests are what actually update state on the server side for the functions just described.

Once the information needed to establish an RTC session through the Agora server has been exchanged, everything related to the voice conversation is handled by the Agora RTC SDK, and real-time state changes of the participants are kept up to date by exchanging messages with other clients through the PubNub SDK.

Registration / Login Flow

  • Enter phone number
  • Enter the verification code sent by text
  • Check whether the account is on the waitlist
  • If it is, create an ID and show the waiting notice
  • If it is not, fetch and display the channel list
  • Special note: regardless of waitlist status, the server issues a user token, and some APIs can be used even while the account is on the waitlist.
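
The flow above can be sketched as a short sequence of API calls. This is only a minimal sketch, not Clubhouse's actual implementation: the three endpoint names come from the API list later in this article, but the `call` transport, the payload fields (`phone_number`, `verification_code`) and the response fields (`auth_token`, `is_waitlisted`) are assumptions made up for illustration.

```python
from typing import Callable, Dict

# Hypothetical transport: takes an endpoint name and a payload, returns the decoded response.
ApiCall = Callable[[str, Dict], Dict]


def login(call: ApiCall, phone_number: str, read_sms_code: Callable[[], str]) -> Dict:
    """Walk through phone-number auth and decide which screen comes next."""
    # Step 1: ask the server to text a verification code to the number.
    call("start_phone_number_auth", {"phone_number": phone_number})

    # Step 2: send back the code the user typed in (field names are assumed).
    auth = call(
        "complete_phone_number_auth",
        {"phone_number": phone_number, "verification_code": read_sms_code()},
    )

    # Step 3: a user token is issued regardless of waitlist status.
    token = auth["auth_token"]  # assumed response field

    # Step 4: waitlisted accounts see the waiting notice, others the channel list.
    status = call("check_waitlist_status", {})
    next_screen = "waitlist" if status["is_waitlisted"] else "channel_list"  # assumed field
    return {"token": token, "next_screen": next_screen}
```

Passing the transport in as a function keeps the sketch independent of any particular HTTP library and makes the order of calls easy to check with a fake.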

Channel Join / Leave Flow

  • Choose the room you want to join
  • Room details and participant list are delivered
  • RTC token + PubSub token are delivered
  • Listen to real-time conversation
  • Join the RTC channel and subscribe to channel information
  • Audio data reception + playback
  • Exchange of participant information (join/leave, etc.)
  • Leave the room
  • Tear down the RTC channel and release the PubSub subscription

Speaker / Moderator Flow

  • A moderator picks a participant and promotes them to moderator or speaker
  • An RTC token with the speaker role is delivered + subscribe to the related PubSub channel
  • With speaker privileges, audio data can be transmitted
  • With moderator privileges, a role can be revoked or changed back to listener

Now that we understand the basic structure, let’s take a closer look at each component.

REST API

Almost all of Clubhouse’s endpoints, including the API server, sit behind Cloudflare’s infrastructure, so most requests to the Clubhouse API server address, www.clubhouseapi.com, have __cfduid cookies attached. Requests to the API server also carry HTTP headers specific to Clubhouse, such as language, user ID, app version, and a device-specific value. The User-Agent is a custom string that includes the app build number. Lastly, every request includes the token issued by the server at sign-up/login in the Authorization header.

CH-Languages: en-US
CH-UserID: 1234567890
CH-Locale: en_US
CH-AppBuild: 297
CH-AppVersion: 0.1.27
CH-DeviceId: 7CAF8200-EC2B-4392-A62B-62D41AFB7648
User-Agent: clubhouse/297 (iPhone; iOS 14.4; Scale/2.00)
Authorization: Token ef1f1be31620226ea1dee33edfc6e3feecc5036f
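
As a rough illustration, the headers above could be assembled like this. `build_headers` and its parameters are hypothetical names introduced here; the header names and example values are taken from the capture above, and the language and locale values are simply hard-coded from it.

```python
def build_headers(user_id: str, token: str, device_id: str,
                  app_build: str = "297", app_version: str = "0.1.27") -> dict:
    """Assemble the Clubhouse-specific headers shown in the capture above."""
    return {
        "CH-Languages": "en-US",
        "CH-UserID": user_id,
        "CH-Locale": "en_US",
        "CH-AppBuild": app_build,
        "CH-AppVersion": app_version,
        "CH-DeviceId": device_id,
        # The custom User-Agent embeds the app build number.
        "User-Agent": f"clubhouse/{app_build} (iPhone; iOS 14.4; Scale/2.00)",
        # The token issued at sign-up/login goes in the Authorization header.
        "Authorization": f"Token {token}",
    }


headers = build_headers(
    user_id="1234567890",
    token="ef1f1be31620226ea1dee33edfc6e3feecc5036f",
    device_id="7CAF8200-EC2B-4392-A62B-62D41AFB7648",
)
```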

Each API endpoint has https://www.clubhouseapi.com/api/ as its base address, and the endpoint list is as follows. The role of most endpoints is self-explanatory.

API List (Total: 107, v0.1.27)

record_action_trails
start_phone_number_auth
call_phone_number_auth
resend_phone_number_auth
complete_phone_number_auth
check_waitlist_status
get_release_notes
get_all_topics
get_topic
get_clubs_for_topic
get_users_for_topic
update_name
update_displayname
update_bio
update_username
update_twitter_username
update_skintone
add_user_topic
remove_user_topic
update_notifications
add_email
get_settings
update_instagram_username
report_incident
get_followers
get_following
get_mutual_follows
get_suggested_follows_friends_only
get_suggested_follows_all
get_suggested_follows_similar
ignore_suggested_follow
follow
follow_multiple
unfollow
update_follow_notifications
block
unblock
get_profile
get_channel
get_channels
get_suggested_speakers
create_channel
join_channel
leave_channel
active_ping
end_channel
invite_speaker
uninvite_speaker
mute_speaker
make_moderator
accept_speaker_invite
reject_speaker_invite
invite_to_existing_channel
audience_reply
make_channel_public
make_channel_social
block_from_channel
get_welcome_channel
reject_welcome_channel
change_handraise_settings
get_create_channel_targets
update_channel_flags
hide_channel
get_notifications
get_actionable_notifications
ignore_actionable_notification
me
get_online_friends
search_users
search_clubs
check_for_update
get_suggested_invites
invite_to_app
invite_from_waitlist
invite_to_new_channel
accept_new_channel_invite
reject_new_channel_invite
cancel_new_channel_invite
add_club_admin
add_club_member
get_club
get_club_members
get_suggested_club_invites
remove_club_admin
remove_club_member
accept_club_member_invite
follow_club
unfollow_club
get_club_nominations
approve_club_nomination
reject_club_nomination
get_clubs
update_is_follow_allowed
update_is_membership_private
update_is_community
update_club_description
update_club_rules
update_club_topics
add_club_topic
remove_club_topic
get_events
get_events_for_user
get_events_to_start
delete_event
create_event
edit_event
get_event
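
A minimal sketch of how a request to one of these endpoints might be put together from the base address. The HTTP method (POST) and the JSON body are assumptions that this article does not confirm, `build_request` is a hypothetical helper, and the request is only built, never sent.

```python
import json
import urllib.request

API_BASE = "https://www.clubhouseapi.com/api/"


def build_request(endpoint: str, payload: dict, headers: dict) -> urllib.request.Request:
    """Build (but do not send) a request for one endpoint from the list above."""
    body = json.dumps(payload).encode("utf-8")  # JSON body is an assumption
    request_headers = dict(headers)
    request_headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        API_BASE + endpoint,
        data=body,
        headers=request_headers,
        method="POST",  # assumed HTTP method
    )


req = build_request("get_channels", {}, {"Authorization": "Token <token>"})
print(req.get_method(), req.full_url)
```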

Agora

Agora.io provides a real-time video and audio chat platform as a service. The company, based in China and California, went public on the NASDAQ in June 2020 and has attracted attention as the technology behind Clubhouse. In particular, through its own network infrastructure called SD-RTN™ (Software Defined Real-time Network), it offers ultra-low-latency real-time video/audio connections worldwide as a service. It also uses UDP for speed.

All audio and video services provided by the Agora SDK are deployed and transmitted through the Agora SD-RTN™. Agora deploys over 250 data centers worldwide that use intelligent dynamic routing algorithms to achieve millisecond latency and ensure high availability of Agora’s service.

Platform support is also very broad. It provides SDKs covering most use cases, including Android, iOS, macOS, Web, and Windows, as well as frameworks such as Electron, Unity, and React Native, so developers can add real-time video/audio to their services quickly and easily. Currently, version 4.x of the web SDK is under development, and the latest version on other platforms is 3.3.0. Clubhouse uses version 3.0.1.1 of the iOS SDK.

Most of the recent developer documentation and references are written in both English and Chinese, but for earlier versions I found only Chinese documentation and code comments. Agora’s server code is not open source, so I couldn’t review it, but I went through the developer documentation to understand the overall design and to check the security-critical parts as far as possible.

App ID, App Certificate

The App ID is a unique ID that tells Agora which app it is providing service for. In the Clubhouse app it is embedded in Info.plist, stored under the name AGORA_KEY. The App Certificate is a randomly generated string used to issue authentication tokens, and app developers can generate it in the Developer Center.

RTC Token Generation

A token, also called a dynamic key, is used for authentication when a user joins a channel. Agora’s dynamic keys broadly come in two kinds, RTM (Real-time Messaging) tokens and RTC (Real-time Communication) tokens, but Clubhouse uses only the RTC kind. Token generation is meant to be implemented on the service operator’s (i.e. Clubhouse’s) server, and sample code is provided in various languages.

Figure 2. Token Generation (from Agora.io)

Tokens are created and used as shown above. In other words, the client receives a token generated and issued by the Clubhouse server and then uses the Agora SDK to connect directly to the Agora server. So how is this token created? Example code in Python 3 can be found here.

Put simply, an RTC token is built from values including the app ID, channel name, user ID, and a role (Publisher or Subscriber) that specifies the user’s privileges in the channel, along with a validity period. The point to note is that an HMAC is computed with the App Certificate, which only the developer knows, so the user cannot arbitrarily change the role.

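# m holds the packed message bytes assembled earlier in the sample (not shown in this excerpt)
val = self.appID.encode('utf-8') + self.channelName.encode('utf-8') + self.uidStr.encode('utf-8') + m
# HMAC-SHA256 keyed with the App Certificate, which only the developer knows
signature = hmac.new(self.appCertificate.encode('utf-8'), val, sha256).digest()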

The signature value above is included in the token and is used to check that the token has not been tampered with on its way from the app server, through the client, to the Agora server. Since the Agora server code has not been disclosed, it is hard to know exactly how it is verified, but I confirmed that when only the part of the payload corresponding to the role was changed and sent, the server returned a failure code.
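
To make the tamper check concrete, here is a self-contained sketch of the general sign-then-verify technique. It is not Agora's server code, which is not public: the `sign` and `verify` helpers, the sample values, and the `b"role=..."` message encoding are all stand-ins invented for illustration, and only the HMAC-SHA256 construction mirrors the excerpt above.

```python
import hashlib
import hmac


def sign(app_certificate: str, app_id: str, channel: str, uid: str, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 over the same kind of fields as the excerpt above."""
    val = app_id.encode("utf-8") + channel.encode("utf-8") + uid.encode("utf-8") + message
    return hmac.new(app_certificate.encode("utf-8"), val, hashlib.sha256).digest()


def verify(app_certificate: str, app_id: str, channel: str, uid: str,
           message: bytes, signature: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign(app_certificate, app_id, channel, uid, message)
    return hmac.compare_digest(expected, signature)


# All values below are made up for illustration.
cert, app_id, channel, uid = "secret-certificate", "app-id", "room-1", "42"
issued = sign(cert, app_id, channel, uid, b"role=subscriber")

untouched_ok = verify(cert, app_id, channel, uid, b"role=subscriber", issued)
tampered_ok = verify(cert, app_id, channel, uid, b"role=publisher", issued)
print(untouched_ok, tampered_ok)  # True False
```

Because the client never holds the App Certificate, it cannot produce a matching signature for a message with a different role.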

Role-based Privilege Separation

Not all participants have the same privileges. They are split into Publishers (speakers in Clubhouse), who are allowed to transmit audio, and Subscribers (the audience in Clubhouse). As mentioned above, this privilege is included in the issued token. When you are given the speaker role through the Clubhouse API, you receive a token containing the corresponding permission. If you are in the audience and then move up to speaker, you will notice that the audio drops for a moment and then reconnects. This is presumably because the existing session is terminated and a new one is established with the token carrying the added permission.

In other words, since audio can be transmitted only with the Publisher role, a troll cannot connect as a plain audience member and transmit arbitrary audio.

Separate Audio Stream

Audio media streams can be transmitted separately for each user who connects to the channel and transmits, so each track can be recorded and then mixed in post-processing. Depending on the mode, the supported audio codecs include ILBC, SILK, NOVA, HVXC, and AAC.

Encryption

Agora.io supports encryption of data in two main ways.

Figure 3. Agora Data Encryption

Built-in encryption

  • It supports end-to-end (E2E; device-to-device) encryption, in which data is encrypted on one device and decrypted on another. The supported encryption modes are AES-128-XTS, AES-128-ECB, AES-256-XTS, and SM4-128-ECB, and the creation, storage, transmission, and verification of encryption keys can be controlled directly by the SDK user (i.e. Clubhouse). For example, just as the Clubhouse API server generates and delivers RTC tokens to clients, it could generate and deliver an encryption key for each channel so that the Agora server cannot see the contents of the data.

Custom encryption

  • If they wish, SDK users can implement and use an arbitrary encryption function of their own. This function is applied before the audio data is encoded and after it is decoded on the receiving side, so the Agora side cannot know the encryption method itself (short of reverse engineering the client).

PubNub

PubNub is a service that provides a real-time communication layer. I did not analyze this part in detail, so there is not much to say, but just as with the RTC token, the Clubhouse API server delivers a PubNub token for the room being joined, which is used to publish, subscribe, and exchange data. It appears to be used for real-time updates such as someone joining a room, being promoted to speaker or moderator, or raising a hand.

The following is a list of the actions carried by messages passing through the PubNub channel.

Action List (Total: 22, v0.1.27)

join_channel
leave_channel
add_speaker
remove_speaker
end_channel
make_channel_public
make_channel_social
reject_welcome_channel
make_moderator
change_handraise_settings
raise_hands
unraise_hands
invite_to_new_channel
accept_new_channel_invite
reject_new_channel_invite
cancel_new_channel_invite
invite_speaker
uninvite_speaker
reject_speaker_invite
accept_speaker_invite
remove_from_channel
mute_speaker
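
A client consuming these messages would typically route each one by its action name. The sketch below shows one way to do that; the message shape (a dict with `action` and `user_id` keys) is purely an assumption, since the actual payload format was not analyzed here, and `on`, `dispatch`, and `room_state` are hypothetical names.

```python
from typing import Callable, Dict

Handler = Callable[[dict], None]
handlers: Dict[str, Handler] = {}

# Local view of the room that the handlers keep up to date.
room_state = {"speakers": set(), "raised_hands": set()}


def on(action: str) -> Callable[[Handler], Handler]:
    """Register a handler for one action name from the list above."""
    def register(func: Handler) -> Handler:
        handlers[action] = func
        return func
    return register


@on("add_speaker")
def handle_add_speaker(msg: dict) -> None:
    room_state["speakers"].add(msg["user_id"])  # assumed field


@on("remove_speaker")
def handle_remove_speaker(msg: dict) -> None:
    room_state["speakers"].discard(msg["user_id"])  # assumed field


@on("raise_hands")
def handle_raise_hands(msg: dict) -> None:
    room_state["raised_hands"].add(msg["user_id"])  # assumed field


def dispatch(msg: dict) -> bool:
    """Route a message by its action; return False if no handler is registered."""
    handler = handlers.get(msg.get("action"))
    if handler is None:
        return False
    handler(msg)
    return True
```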

Security

The Clubhouse iOS app performs all HTTP communication over TLS, so communication with the API server is encrypted and protected. Certificate pinning is also applied, which blocks the simple man-in-the-middle (MITM) approach of registering an arbitrary root certificate to inspect traffic. However, with objection, a Frida-based tool, certificate pinning can be easily disabled and the contents of the communication inspected.

What security attacks is Clubhouse currently exposed to? The analysis was done in a hurry, so a full investigation was not possible, but here is a brief summary of the potential security weaknesses noticed while working out the above.

1. Potential Account Takeover

The only information you need to join Clubhouse is a valid phone number that can receive text messages. When you enter a phone number and proceed, a 4-digit code is sent to that number, and entering the code completes phone number verification. As described above, logging in follows the same flow as signing up. That is, if an attacker knows the victim’s phone number and the 4-digit code generated during login, the account can be taken over immediately with no additional verification. Since a 4-digit code has only 10,000 possible values, this can be very dangerous depending on the situation.

Two attack scenarios can be considered.

  1. When targeting a specific victim
  • The target’s phone number is fixed and 4-digit codes are tried by brute force until authentication succeeds.

  2. When not targeting a specific victim
  • Authentication is attempted against a large number of random phone numbers (ones linked to Clubhouse accounts), with only a few attempts per number.

Each attempt succeeds with a probability of 1 in 10,000, or 0.01%, so about 7,000 attempts (or attempts against 7,000 phone numbers) give roughly a 50% chance of at least one success, which makes this a very realistic attack. However, this is an “ideal” scenario in which Clubhouse does not limit attempts at all. Not every possibility was tested, so this may be incomplete, but a simple test confirmed that the following rate limits apply to the 4-digit code verification.

  • The total number of attempts per code is 3. After 3 wrong entries, the existing 4-digit code is invalidated and text verification has to be started over from the beginning.
  • If authentication keeps failing for the same phone number, text verification for that number is suspended for about 30 minutes.

So, what is the risk of attack given these limits? The chance of success is slightly higher, because three attempts can be made against one code instead of having to match a new code each time. In exchange, the suspension costs the attacker time. Still, a patient attacker who keeps trying day after day would be able to take over the desired account in about three weeks or so. Of course, a lot of verification code texts would be sent to the victim’s mobile phone number, so it would be quite noticeable.
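
The arithmetic behind these estimates can be checked with a few lines. This is a back-of-the-envelope sketch only: the 10,000-code space, three guesses per code, and 30-minute suspension come from the text above, while `codes_per_window` (how many codes an attacker can use up per suspension window) was not measured and is an assumed parameter. Comparing three guesses against one fixed code with three guesses against fresh codes is one reading of why the chance is "slightly higher".

```python
import math

CODE_SPACE = 10_000      # possible 4-digit codes
GUESSES_PER_CODE = 3     # rate limit described above
LOCKOUT_MINUTES = 30     # suspension described above


def tries_for(target: float, p_single: float) -> int:
    """Smallest number of independent tries, each succeeding with p_single,
    whose combined success probability reaches target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p_single))


# Unrestricted case: every guess is an independent 1-in-10,000 shot.
p_guess = 1 / CODE_SPACE
p_after_7000 = 1 - (1 - p_guess) ** 7000
print(f"7,000 guesses: {p_after_7000:.1%}")  # 50.3%

# Rate-limited case: three distinct guesses against one fixed code...
p_same_code = GUESSES_PER_CODE / CODE_SPACE
# ...versus three guesses that each face a freshly generated code.
p_fresh_codes = 1 - (1 - p_guess) ** GUESSES_PER_CODE
print(p_same_code > p_fresh_codes)  # True


def days_for(target: float, codes_per_window: int) -> float:
    """Days needed if codes_per_window codes can be used up per suspension window."""
    windows = tries_for(target, p_same_code) / codes_per_window
    return windows * LOCKOUT_MINUTES / (60 * 24)


estimate_days = days_for(0.5, codes_per_window=2)  # 2 is an assumption
```

With the assumed value of two codes per window the result lands in the same ballpark as the estimate above, and a different value shifts it proportionally.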

If an attacker gets really lucky and takes over the account of a socially influential politician, business leader, or celebrity while they sleep, it could cause great confusion, given the faceless nature of this social network. An attacker could even lean on machine learning, which has advanced a great deal for voice, and it is dizzying to think of what the masters from the voice-impersonation rooms, so popular on Clubhouse these days, could do.

Another reason this attack matters is the “multiple logins” issue discussed in item 3 below. If the attacker successfully authenticates as the target account, the victim has no way of knowing that someone else has logged in to their account, and there is no function to log out sessions on other devices. For a more secure account environment, I would recommend the following improvements to Clubhouse.

  • Increase the verification code to 6 or more digits
  • Add management of currently logged-in sessions (viewing login history and forcing a session logout)

2. Unencrypted Voice/Data Channels

As mentioned above, Agora supports encryption of data. However, the audio data and control data sent and received by Clubhouse are not currently encrypted.

The channel ID, user ID, RTC token, and internal IP are included in many packets, and RTC messages are also exposed as-is in UDP packets.

Because of this, security problems such as eavesdropping and tampering can occur. Two scenarios can be considered.

  1. Covert eavesdropping + trolling
  • With only the information contained in the packets, you can access the RTC channel and collect voice data, and with a token that has speaker rights there is a risk of sending arbitrary audio to troll the room.
  • In particular, someone who connects directly to the RTC channel without going through the Clubhouse API is not displayed as a listener or speaker in the Clubhouse app UI, so they cannot be forcibly removed.

  2. Data tampering
  • Since the UDP packets are not encrypted, an attacker in the middle (for example via ARP spoofing) can alter the packet contents and change the audio stream so that different audio is heard.

3. Miscellaneous

Multiple login sessions

As already mentioned, in the current Clubhouse, even if an attacker successfully logs in to the target account, the victim has no way of knowing that someone has logged in, and there is no function to view the list of sessions on other devices or to log them out. It would be great if management of currently logged-in sessions (viewing login history and forcing a session logout) were added. If you log in on a device that is not your own, be sure to log out.

Chinese Servers

It is difficult to obtain exact IPs because a Cloudflare proxy is used, but all of Clubhouse’s own servers are believed to run on AWS instances outside of China. However, the Agora SDK exchanges data with various servers to find the best route and reduce network latency, and in this process it can be confirmed that it communicates with servers located in China.

Figure 4. Talking with an Alibaba server in mainland China (112.126.96.46)

On the 12th, Clubhouse responded as follows.

For example, for a small percentage of our traffic, network pings containing the user ID are sent to servers around the globe — which can include servers in China — to determine the fastest route to the client. Over the next 72 hours, we are rolling out changes to add additional encryption and blocks to prevent Clubhouse clients from ever transmitting pings to Chinese servers. We also plan to engage an external data security firm to review and validate these changes

In summary: for a portion of the traffic, user IDs were sent to servers around the world, including servers in China, to find the fastest route to the client; within 72 hours, additional encryption would be applied to the Clubhouse app and it would be patched so that nothing is sent to servers located in China. Perhaps this means applying the encryption mentioned above. They also said they would hire an external security firm to verify the new patch, but it would have been nice if they had left it to us… ha ha.

What is a bit shocking, however, is that the answer above does not appear to be the whole truth. In my own experiment, not only the user ID but also the unencrypted RTC token, mentioned above as a security issue, was transmitted along with it, so it was possible to access the channel at any time and listen to the audio stream.

These SDKs support network geofencing in the following regions: global (default), North America, Europe, Asia (excluding Mainland China), Japan, India, and Mainland China. Once a customer specifies a region using geofencing, no audio, video, or message can access Agora servers outside that region.

Agora also supports geofencing. Clubhouse presumably plans to use this feature so that nothing is sent to servers located in China, but that looks difficult with only what the SDK supports. The geofencing documentation says, “After enabling geofencing, the Agora SDK only connects to Agora servers within the specified region.” That is, geofencing lets you restrict access to specific regions (e.g. North America, Europe, Japan), but there is no option to exclude one specific region (e.g. China) while allowing everything else. Still, Clubhouse probably accounts for most of Agora’s recent traffic, so I expect the two companies will work it out.

Conclusion

I hope this article has explained a little about how the much-talked-about Clubhouse service is designed and put together, and whether it has the security issues many people have been curious about. There are various functions beyond the ones covered here, but please understand that I had to be selective in order to answer these questions quickly within a single day of the Lunar New Year holidays. I did build a client that runs on a PC based on this analysis, but I am not releasing it because of possible legal issues and the risk of accounts being banned for using it.

If I find the time to look into it further, I will update this post later. Maybe after the encryption logic lands :)

Thank you for reading this long article. See you on Clubhouse~👋

Update #1: v0.1.28

Just as I published this blog post, v0.1.28 was released, so I took a quick look. There is no significant functional difference from v0.1.27, but the parts related to geofencing and encryption have changed, as anticipated above.

Geofencing

The Clubhouse app now excludes China from the areas where connections are allowed by setting the areaCode value in the AgoraRtcEngineConfig class to 0xFFFFFFFE (i.e. ~0x1). I said above that excluding just one region would be difficult, but I had only gone by the wording of the overview documentation. When I looked at the docs for the API itself, I found the further note “You can use the bitwise OR operator (|) to specify multiple areas.” In other words, multiple regions can be set at the same time.

typedef NS_ENUM(NSUInteger, AgoraIpAreaCode) {
    AgoraIpAreaCode_CN = (1 << 0),
    AgoraIpAreaCode_NA = (1 << 1),
    AgoraIpAreaCode_EUR = (1 << 2),
    AgoraIpAreaCode_AS = (1 << 3),
    AgoraIpAreaCode_GLOBAL = (0xFFFFFFFF),
};

As you can see from the definition above, CN (China) AreaCode is expressed as a value of 0x1.
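
The bit arithmetic can be checked in a few lines. This sketch simply mirrors the enum values above as Python constants (the `AREA_*` names and the `exclude` helper are introduced here for illustration); since Python integers are unbounded, the result is masked to 32 bits to match the 0xFFFFFFFE value seen in the app.

```python
AREA_CN = 1 << 0
AREA_NA = 1 << 1
AREA_EUR = 1 << 2
AREA_AS = 1 << 3
AREA_GLOBAL = 0xFFFFFFFF


def exclude(area_mask: int, excluded: int) -> int:
    """Clear the excluded area's bits, keeping the result within 32 bits."""
    return area_mask & ~excluded & 0xFFFFFFFF


no_china = exclude(AREA_GLOBAL, AREA_CN)
print(hex(no_china))             # 0xfffffffe
print(bool(no_china & AREA_CN))  # False: the CN bit is cleared
print(hex(AREA_NA | AREA_EUR))   # 0x6: several areas combined with bitwise OR
```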

Encryption

Encryption does not seem to be in use yet, but code has been added so that it can be switched on from the server side at any time. It is activated, in the form of the code shown below, only when an encryption key is received from the API server. Of the options described earlier, the built-in encryption method is used: the encryption key is delivered through the Clubhouse API, just like the RTC token, and the Agora RTC channel session is created with that key. The encryption mode used is AES256XTS, i.e. AES-256-XTS.

let config = AgoraEncryptionConfig()
config.encryptionKey = encryptionKey
config.encryptionMode = .AES256XTS
agoraKit.enableEncryption(true, encryptionConfig: config)

In other words, once the feature is turned on, the encryption key is given to the client directly by the API server, so the Agora server cannot view the encrypted data. Data carried in UDP packets will likewise be encrypted. This still needs to be verified directly once the feature is activated, but if, as the documentation says, the SDK really does not send the encryption key to the Agora server, the eavesdropping and tampering scenarios described above (for example on public Wi-Fi) become impossible.

I don’t know whether to applaud Clubhouse for acting quickly or to praise Agora for having already designed and implemented these features. It would of course have been best to implement this from the start and avoid these concerns, but I commend them both!
