Although the information in social media accounts is often publicly accessible, there should be cause for concern.
Earlier this month, Comparitech researchers found three exposed datasets housing information on more than 238 million social media profiles. The datasets are believed to have been left open by social media data broker Social Data. Per its website, “Social Data helps your business to find Influencers and get in-depth insights into demographic & psychographic data of influencers and their audience throughout different types of social media on the web.” It is unknown how long the data was accessible. Social Data took the open servers down three hours after being notified by Comparitech.
81% of the records were scraped from Instagram, with 18% from TikTok and 2% from YouTube. Data scraping (aka web data extraction) is the process of copying numerous bits of information from a website using automated computer programs. While Instagram, TikTok, and YouTube do not permit data scraping, detecting when it occurs, and therefore being able to prevent it, is not necessarily easy.
According to Comparitech, some or all of the following information was available for each record; it’s not known whether the information was obtained and used by third parties.
- Full name
- Phone number
- Email address
- Account description
- Follower engagement statistics (e.g., number of followers and audience location)
Along those lines, a similar story emerged in May of this year. TechCrunch published an article noting that an AWS-hosted database owned by Chtrbox was found exposed online and that it contained data scraped from more than 49 million Instagram influencer, celebrity and brand account records (Chtrbox contends the database held data on 350,000 people, not 49 million).
The type of data was consistent with what Comparitech found on Social Data’s servers. Chtrbox connects brands and influencers for advertising purposes. The organization took the database offline shortly after being contacted by TechCrunch.
Does it Matter? Factors for Consideration
Aggregated Data is More Vulnerable
Although the data was taken from publicly accessible accounts in both cases (influencer profiles are typically public, after all), the fact that it was compiled into one convenient source makes the data even more vulnerable to misuse and malicious activities such as:
- Phishing campaigns or spear phishing attacks
- Vishing or smishing attacks
- Fake accounts that attract followers then promote misinformation or run scams
- Spam marketing
- Facial recognition technology
- Identity theft (if additional personally identifiable information is gathered)
Why make it any easier for scammers to carry out their misdeeds?
Companies Should Be Good Data Stewards
One might also ask, if it didn’t matter that these databases were left exposed, why pull them offline? Why not just leave them up there?
Organizations should hold themselves to a high standard of data protection across the board. Given the proliferation of data in the cloud and individuals who mine it for profit or other misconduct, any organization that collects and stores any data should take steps to safeguard it.
While sensitive data is certainly the area on which to focus the bulk of effort, non-critical data en masse shouldn’t be ignored. Beyond the activities noted above, who is to say that the millions of records left unsecured couldn’t be used in combination with data gathered from elsewhere for more vicious endeavors? Cyber criminals are slick, fast and furious. Anything that can be done to thwart their efforts should be considered.
This includes not only organizations like Social Data and Chtrbox, but also the social media companies from which the data originated. It seems some steps have been taken to minimize access and help deter scraping, although the efficacy of those efforts leaves something to be desired. In 2018, while Facebook was embroiled in a number of data scandals (namely, Cambridge Analytica), Instagram appeared to take some steps to improve privacy by reducing the amount of data third-party developers could pull from the Instagram API. At the end of June this year, it disabled the remaining Instagram Legacy API permission ("Basic Permission") in favor of APIs that allow less data acquisition.
When all is said and done, it’s just good practice to secure data.
Tools Make Data Protection Easy
Unsecured databases hosted in the cloud are not a new problem. They represent an ongoing issue that needs to be proactively addressed. While we don't know for sure why Social Data and Chtrbox databases were left open, it is easy to assume that configuration mistakes and human error were root causes.
And yet, simple actions to sidestep those issues can make a world of difference. For example, in regard to scraping, something as simple as replacing the standard “awsuser” username in public cloud database configurations can help organizations avoid security breaches and misuse of cloud resources; using a custom username makes it less likely that scraping software can guess the master username.
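As a rough illustration of how such a check might be automated, the minimal Python sketch below flags databases that still use a well-known default master username or that are reachable from the public internet. The `DatabaseConfig` record, the `audit` function, the sample inventory, and every default username other than “awsuser” are hypothetical and chosen for illustration. They do not represent any cloud provider's actual schema or any vendor's implementation.

```python
from dataclasses import dataclass

# Assumption: a small set of well-known default master usernames.
# "awsuser" is the one named in the article; the rest are illustrative.
DEFAULT_USERNAMES = {"awsuser", "admin", "root", "postgres"}


@dataclass
class DatabaseConfig:
    """Hypothetical record for one cloud database (not a real provider schema)."""
    identifier: str
    master_username: str
    publicly_accessible: bool


def audit(configs):
    """Return (identifier, finding) tuples for each risky setting found."""
    findings = []
    for cfg in configs:
        # A default username gives automated tools one less thing to guess.
        if cfg.master_username.lower() in DEFAULT_USERNAMES:
            findings.append((cfg.identifier, "default master username"))
        # A publicly reachable endpoint is exposed to anyone who scans for it.
        if cfg.publicly_accessible:
            findings.append((cfg.identifier, "publicly accessible"))
    return findings


# Made-up inventory used only to demonstrate the check.
inventory = [
    DatabaseConfig("analytics-db", "awsuser", True),
    DatabaseConfig("billing-db", "svc_billing_7f3", False),
]

for identifier, finding in audit(inventory):
    print(f"{identifier}: {finding}")
```

In practice the inventory would come from the cloud provider's own configuration data rather than a hard-coded list, and a check like this would be only one of many.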
Automated cloud security solutions can help secure unsecured databases. Prevent an attacker or scammer from exploiting misconfigurations or unsecured data by identifying and fixing issues in real time with tools like SecureCloudDB. (We can also tell you if you need to update your “awsuser” name). Together, we can raise the bar and make unsecured public cloud databases a thing of the past.