Background

We allow the user to create some text that will get converted to HTML, using a rich-text editor library (called Android-RTEditor).

The output HTML text is saved as is on the server and the device.

Because in some cases we need to show a lot of this content (multiple instances), we also want to save a "preview" version of it, meaning it will be much shorter (say, 120 visible characters; the extra characters of the HTML tags are not counted).

What we want is a minimized version of the HTML. Some tags might optionally be removed, but we still want to see lists (numbered/bulleted) no matter what we choose to do, because lists look like text to the user (the bullet is a character, and so are the numbers with the dot).

The line-break tag should also be handled, since going to the next line is important.

The problem

Unlike a normal string, where I can just call substring with the required number of characters, doing the same on HTML might break the tags.
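
To make the problem concrete, here is a tiny sketch (naiveTruncate is a made-up name used only for illustration) showing how a plain substring can cut in the middle of a tag and leave another tag unclosed:

```kotlin
// Hypothetical helper: truncates by raw character count, ignoring HTML structure.
fun naiveTruncate(html: String, maxChars: Int): String =
    html.substring(0, minOf(maxChars, html.length))

fun main() {
    val html = "<b>Hello</b> <i>world</i>"
    // Cuts inside the closing </b> tag
    println(naiveTruncate(html, 10))
    // Leaves <i> open and never closed
    println(naiveTruncate(html, 18))
}
```

Both results are invalid HTML, and the tag characters were also counted against the limit, which is not what a preview of the visible text needs.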

What I've tried

I've thought of 2 possible solutions for this:

  1. Convert to plain text (while handling some tags), and then truncate: Parse the HTML, replacing some tags with Unicode alternatives and removing the others. For example, instead of a bullet list, put a bullet character (maybe this), and likewise for a numbered list (put the numbers instead). All the other tags would be removed. The same goes for the line-break tag ("<br/>"), which should be replaced with "\n". After that, I could safely truncate the plain text, because there are no more tags that could be broken.

  2. Truncate nicely inside the HTML: Parse the HTML, identify the text within it, truncate the text there, and close all open tags when reaching the truncation position. This might be even harder.
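
A minimal sketch of the first option, in plain Kotlin, might look like the following. It rests on strong assumptions: the function name htmlToPreviewText is made up, the input is well-formed, there is no "<" inside text or attribute values, and entities such as "&amp;" are not decoded. A real HTML parser would be more robust.

```kotlin
// Hypothetical helper, not part of Android-RTEditor: flattens simple HTML into preview text.
fun htmlToPreviewText(html: String, maxChars: Int): String {
    val tagRegex = Regex("<(/?)([a-zA-Z0-9]+)[^>]*>")
    val out = StringBuilder()
    // One entry per open list: null for <ul>, a running item counter for <ol>
    val listCounters = ArrayDeque<Int?>()
    var pos = 0
    for (match in tagRegex.findAll(html)) {
        out.append(html, pos, match.range.first) // text that precedes this tag
        pos = match.range.last + 1
        val closing = match.groupValues[1] == "/"
        when (match.groupValues[2].lowercase()) {
            "br" -> out.append('\n')
            "ul" -> if (closing) listCounters.removeLastOrNull() else listCounters.addLast(null)
            "ol" -> if (closing) listCounters.removeLastOrNull() else listCounters.addLast(0)
            "li" -> if (closing) {
                out.append('\n') // each list item ends its own line
            } else {
                val counter = listCounters.lastOrNull()
                if (counter == null) {
                    out.append("\u2022 ") // bullet list (or <li> outside any list)
                } else {
                    listCounters[listCounters.lastIndex] = counter + 1
                    out.append(counter + 1).append(". ")
                }
            }
            // every other tag is simply dropped
        }
    }
    out.append(html, pos, html.length) // text after the last tag
    // Plain text can be cut safely; bullets, numbers and line breaks count as characters here
    return out.toString().take(maxChars).trimEnd()
}
```

Note that in this sketch the inserted bullets, numbers and "\n" characters are counted against maxChars, which may or may not be what you want for the preview.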

I'm not sure which is easier, but I can think of possible disadvantages for each. It is just a preview though, so I don't think it matters much.

I've searched the Internet for such solutions, to see if others have done it. I've found some links that talk about "cleaning" or "optimizing" HTML, but I don't see that they can handle replacing tags or truncating the content. Not only that, but since it's HTML, most are not related to Android and use PHP, C#, Angular and other languages.

Here are some links that I've found:

The questions

  1. Are the solutions I've described possible? If so, is there a known way to implement them, or even a Java/Kotlin/Android library? How hard would it be to make such a solution?

  2. Is there another solution I haven't thought of?


EDIT: I've also tried using some old code I wrote in the past (here), which parses XML. Maybe it will work. I'm also now investigating third-party libraries for parsing HTML, such as Jsoup. I think it can help with the truncating, while supporting "faulty" HTML inputs.

android developer
  • 114,585
  • 152
  • 739
  • 1,270
  • Isn't it easier to generate the "preview" *before* the user text is converted to HTML? – assylias Mar 04 '18 at 15:02
  • @assylias How exactly ? I don't understand what you mean. – android developer Mar 04 '18 at 15:26
  • You say "We allow the user to create some text that will get converted to HTML" => can't you store the beginning of that text before it's converted to HTML and use that as a (unformatted) preview? – assylias Mar 04 '18 at 15:28
  • @assylias I still don't understand. How could you truncate HTML, before it's HTML ? We want to handle lists too (numbered/bullets). In any case, this is a very large library (uses spans), so I'm not sure how complex such a thing would be. – android developer Mar 04 '18 at 15:38
  • @assylias I've updated the question to make sure it's clear: we need a minimized version of the HTML, while still somehow showing at least the lists. It can become plain text, but lists will still be visible (using special characters, for example), and don't forget about going to the next line... – android developer Mar 04 '18 at 15:55

1 Answer

OK, I think I got it, using my old code for converting an XML string into an object. It would still be great to see more robust solutions, but I think what I got is good enough, at least for now.

The code below uses it (original XmlTag class available here):

XmlTagTruncationHelper.kt

object XmlTagTruncationHelper {
    /**@param maxLines max lines to permit. If <0, means there is no restriction
     * @param maxTextCharacters max text characters to permit. If <0, means there is no restriction*/
    class Restriction(val maxTextCharacters: Int, val maxLines: Int) {
        var currentTextCharactersCount: Int = 0
        var currentLinesCount: Int = 0
    }

    @JvmStatic
    fun truncateXmlTag(xmlTag: XmlTag, restriction: Restriction): String {
        if (restriction.maxLines == 0 || (restriction.maxTextCharacters >= 0 && restriction.currentTextCharactersCount >= restriction.maxTextCharacters))
            return ""
        val sb = StringBuilder()
        sb.append("<").append(xmlTag.tagName)
        val numberOfAttributes = if (xmlTag.tagAttributes != null) xmlTag.tagAttributes!!.size else 0
        if (numberOfAttributes != 0)
            for ((key, value) in xmlTag.tagAttributes!!)
                sb.append(" ").append(key).append("=\"").append(value).append("\"")
        val numberOfInnerContent = if (xmlTag.innerTagsAndContent != null) xmlTag.innerTagsAndContent!!.size else 0
        if (numberOfInnerContent == 0)
            sb.append("/>")
        else {
            sb.append(">")
            for (innerItem in xmlTag.innerTagsAndContent!!) {
                if (restriction.maxTextCharacters >= 0 && restriction.currentTextCharactersCount >= restriction.maxTextCharacters)
                    break
                if (innerItem is XmlTag) {
                    if (restriction.maxLines < 0)
                        sb.append(truncateXmlTag(innerItem, restriction))
                    else {
                        // Each <br> or <li> starts a new visible line; stop once the line budget is used up
                        if (innerItem.tagName == "br" || innerItem.tagName == "li") {
                            ++restriction.currentLinesCount
                            if (restriction.currentLinesCount >= restriction.maxLines)
                                break
                        }
                        sb.append(truncateXmlTag(innerItem, restriction))
                    }
                } else if (innerItem is String) {
                    if (restriction.maxTextCharacters < 0)
                        sb.append(innerItem)
                    else
                        if (restriction.currentTextCharactersCount < restriction.maxTextCharacters) {
                            // Copy only as much of this text as the remaining character budget allows
                            val extraCharactersAllowedToAdd = restriction.maxTextCharacters - restriction.currentTextCharactersCount
                            val strToAdd = innerItem.take(extraCharactersAllowedToAdd)
                            if (strToAdd.isNotEmpty()) {
                                sb.append(strToAdd)
                                restriction.currentTextCharactersCount += strToAdd.length
                            }
                        }
                }
            }
            sb.append("</").append(xmlTag.tagName).append(">")
        }
        return sb.toString()
    }
}
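
For comparison, the core idea of the helper above (text characters consume a budget, tags cost nothing, and every tag still open at the cut gets closed) can be sketched without XmlTag or XmlPullParser. This is only an illustration under assumptions: truncateHtml is a made-up name, the input is well-formed with XHTML-style self-closing tags such as "<br/>", there is no "<" inside text or attribute values, and there is no line limit as in Restriction.maxLines.

```kotlin
// Hypothetical stand-alone sketch of tag-aware truncation using a stack of open tags.
fun truncateHtml(html: String, maxTextChars: Int): String {
    require(maxTextChars >= 0) { "maxTextChars must not be negative" }
    // group 1: "/" for a closing tag, group 2: tag name, group 3: "/" for a self-closing tag
    val tagRegex = Regex("<(/?)([a-zA-Z0-9]+)[^>]*?(/?)>")
    val out = StringBuilder()
    val openTags = ArrayDeque<String>() // names of tags opened but not yet closed
    var remaining = maxTextChars
    var pos = 0
    for (match in tagRegex.findAll(html)) {
        // Copy the text that precedes this tag, limited by the remaining budget
        val text = html.substring(pos, match.range.first).take(remaining)
        out.append(text)
        remaining -= text.length
        if (remaining <= 0) break // budget used up: drop everything after this point
        pos = match.range.last + 1
        out.append(match.value) // tags are copied as-is and are not counted
        when {
            match.groupValues[3] == "/" -> Unit // self-closing, nothing to track
            match.groupValues[1] == "/" -> openTags.removeLastOrNull()
            else -> openTags.addLast(match.groupValues[2])
        }
    }
    // Text after the last tag, if there is any budget left
    if (remaining > 0) out.append(html.substring(pos).take(remaining))
    // Close whatever is still open, innermost tag first
    for (name in openTags.asReversed()) out.append("</").append(name).append(">")
    return out.toString()
}
```

For example, truncateHtml("Aaa<br/><b>Bbb<br/></b>Ccc", 5) keeps "Aaa", the line break, and two characters of the bold text, then closes the still-open b tag.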

XmlTag.kt

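import android.content.Context
import org.xmlpull.v1.XmlPullParser
import org.xmlpull.v1.XmlPullParserFactory
import java.io.StringReader
import java.util.Stack

//based on https://stackoverflow.com/a/19115036/878126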
/**
 * an xml tag , includes its name, value and attributes
 * @param tagName the name of the xml tag . for example : <a>b</a> . the name of the tag is "a"
 */
class XmlTag(val tagName: String) {
    /** a hashmap of all of the tag attributes. example: <a c="d" e="f">b</a> . attributes: {{"c"="d"},{"e"="f"}}     */
    @JvmField
    var tagAttributes: HashMap<String, String>? = null
    /**list of inner text and xml tags*/
    @JvmField
    var innerTagsAndContent: ArrayList<Any>? = null

    companion object {
        @JvmStatic
        fun getXmlFromString(input: String): XmlTag? {
            val factory = XmlPullParserFactory.newInstance()
            factory.isNamespaceAware = true
            val xpp = factory.newPullParser()
            xpp.setInput(StringReader(input))
            return getXmlRootTagOfXmlPullParser(xpp)
        }

        @JvmStatic
        fun getXmlRootTagOfXmlPullParser(xmlParser: XmlPullParser): XmlTag? {
            var currentTag: XmlTag? = null
            var rootTag: XmlTag? = null
            val tagsStack = Stack<XmlTag>()
            xmlParser.next()
            var eventType = xmlParser.eventType
            var doneParsing = false
            while (eventType != XmlPullParser.END_DOCUMENT && !doneParsing) {
                when (eventType) {
                    XmlPullParser.START_DOCUMENT -> {
                    }
                    XmlPullParser.START_TAG -> {
                        val xmlTagName = xmlParser.name
                        currentTag = XmlTag(xmlTagName)
                        if (tagsStack.isEmpty())
                            rootTag = currentTag
                        tagsStack.push(currentTag)
                        val numberOfAttributes = xmlParser.attributeCount
                        if (numberOfAttributes > 0) {
                            val attributes = HashMap<String, String>(numberOfAttributes)
                            for (i in 0 until numberOfAttributes) {
                                val attrName = xmlParser.getAttributeName(i)
                                val attrValue = xmlParser.getAttributeValue(i)
                                attributes[attrName] = attrValue
                            }
                            currentTag.tagAttributes = attributes
                        }
                    }
                    XmlPullParser.END_TAG -> {
                        currentTag = tagsStack.pop()
                        if (!tagsStack.isEmpty()) {
                            val parentTag = tagsStack.peek()
                            parentTag.addInnerXmlTag(currentTag)
                            currentTag = parentTag
                        } else
                            doneParsing = true
                    }
                    XmlPullParser.TEXT -> {
                        val innerText = xmlParser.text
                        if (currentTag != null)
                            currentTag.addInnerText(innerText)
                    }
                }
                eventType = xmlParser.next()
            }
            return rootTag
        }

        /**returns the root xml tag of the given xml resourceId , or null if not succeeded . */
        fun getXmlRootTagOfXmlFileResourceId(context: Context, xmlFileResourceId: Int): XmlTag? {
            val res = context.resources
            val xmlParser = res.getXml(xmlFileResourceId)
            return getXmlRootTagOfXmlPullParser(xmlParser)
        }
    }

    private fun addInnerXmlTag(tag: XmlTag) {
        if (innerTagsAndContent == null)
            innerTagsAndContent = ArrayList()
        innerTagsAndContent!!.add(tag)
    }

    private fun addInnerText(str: String) {
        if (innerTagsAndContent == null)
            innerTagsAndContent = ArrayList()
        innerTagsAndContent!!.add(str)
    }

    /**formats the xmlTag back to its string format,including its inner tags     */
    override fun toString(): String {
        val sb = StringBuilder()
        sb.append("<").append(tagName)
        val numberOfAttributes = if (tagAttributes != null) tagAttributes!!.size else 0
        if (numberOfAttributes != 0)
            for ((key, value) in tagAttributes!!)
                sb.append(" ").append(key).append("=\"").append(value).append("\"")
        val numberOfInnerContent = if (innerTagsAndContent != null) innerTagsAndContent!!.size else 0
        if (numberOfInnerContent == 0)
            sb.append("/>")
        else {
            sb.append(">")
            for (innerItem in innerTagsAndContent!!)
                sb.append(innerItem.toString())
            sb.append("</").append(tagName).append(">")
        }
        return sb.toString()
    }

}
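
One caveat with writing the parsed content back as a string: XmlPullParser expands entity references, so a text of "a &amp; b" comes back as "a & b", and both toString() and truncateXmlTag append text and attribute values without escaping them again. If the content can contain such characters, a small escaping step could be applied before appending. The helper below is a hypothetical sketch (escapeXml is not part of the code above):

```kotlin
// Hypothetical helper: escapes characters that are special in XML/HTML text and attribute values.
fun escapeXml(text: String): String = buildString(text.length) {
    for (c in text) {
        when (c) {
            '&' -> append("&amp;")
            '<' -> append("&lt;")
            '>' -> append("&gt;")
            '"' -> append("&quot;")
            '\'' -> append("&#39;") // numeric form, valid in both XML and HTML
            else -> append(c)
        }
    }
}
```

It would be used where text or attribute values are appended, for example sb.append(escapeXml(strToAdd)). The character count should still be taken from the unescaped text, so that the limit applies to what the user actually sees.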

Sample usage:

build.gradle

    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }

...
dependencies{
implementation 'com.1gravity:android-rteditor:1.6.7'
...
}
...

MainActivity.kt

class MainActivity : AppCompatActivity() {


    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
//        val inputXmlString = "<zz>Zhshs<br/>ABC</zz>"
        val inputXmlString = "Aaa<br/><b>Bbb<br/></b>Ccc<br/><ul><li>Ddd</li><li>eee</li></ul>fff<br/><ol><li>ggg</li><li>hhh</li></ol>"

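        // XML must have a single root tag. Always wrap the input: HTML from the editor can start with a tag
        // (e.g. "<b>a</b>bcd") and still have no single root, so checking startsWith("<") is not enough.
        val xmlString = "<html>$inputXmlString</html>"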

        val rtApi = RTApi(this, RTProxyImpl(this), RTMediaFactoryImpl(this, true))
        val mRTManager = RTManager(rtApi, savedInstanceState)
        mRTManager.registerEditor(beforeTruncationTextView, true)
        mRTManager.registerEditor(afterTruncationTextView, true)
        beforeTruncationTextView.setRichTextEditing(true, inputXmlString)
        val xmlTag = XmlTag.getXmlFromString(xmlString)

        Log.d("AppLog", "xml parsed: " + xmlTag.toString())
        val maxTextCharacters = 10
        val maxLines = 20

        val output = XmlTagTruncationHelper.truncateXmlTag(xmlTag!!, XmlTagTruncationHelper.Restriction(maxTextCharacters, maxLines))
        afterTruncationTextView.setRichTextEditing(true, output)
        Log.d("AppLog", "xml with truncation : maxTextCharacters: $maxTextCharacters , maxLines: $maxLines output: " + output)
    }
}

activity_main.xml

<LinearLayout
    xmlns:android="http://schemas.android.com/apk/res/android" xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools" android:layout_width="match_parent"
    android:layout_height="match_parent" android:gravity="center" android:orientation="vertical"
    tools:context=".MainActivity">

    <com.onegravity.rteditor.RTEditText
        android:id="@+id/beforeTruncationTextView" android:layout_width="match_parent"
        android:layout_height="wrap_content" android:background="#11ff0000" tools:text="beforeTruncationTextView"/>


    <com.onegravity.rteditor.RTEditText
        android:id="@+id/afterTruncationTextView" android:layout_width="match_parent"
        android:layout_height="wrap_content" android:background="#1100ff00" tools:text="afterTruncationTextView"/>
</LinearLayout>

And the result:

[image: screenshot of the result]

android developer